Sunday, June 23, 2013

Distributed File System, part 3

I've thought through a lot of this already, but I have not implemented much. But I have gotten started...

Wish I could put a block diagram thing in here somehow...image seems the only way, but I don't really have any.

So you want to have all the shared files available everywhere, but you sure can't keep copies everywhere. I discussed the idea of cross-mounts, or buying truly massive storage devices, etc., and none of those things are workable, really.

So what I think you do is gather the knowledge of what all the shared files are, catalog them, publish the catalog via a web service, and then transparently copy things locally when you need to use them, and age them off later (by a size heuristic, a time heuristic, or an LRU heuristic).
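To make the age-off idea a bit more concrete, here is a rough sketch of how the local bookkeeping might combine the time, size, and LRU heuristics. Nothing here is from my actual code: the `LocalCache` class, its method names, and the two limits are all made up for illustration, and it only decides which files to drop; the caller would still have to delete them from disk.

```python
import time
from collections import OrderedDict


class LocalCache:
    """Hypothetical bookkeeping for locally copied shared files."""

    def __init__(self, max_bytes, max_age_seconds):
        self.max_bytes = max_bytes              # assumed size budget
        self.max_age_seconds = max_age_seconds  # assumed time limit
        # name -> (size, last_used); least recently used entry comes first
        self._entries = OrderedDict()

    def touch(self, name, size, now=None):
        """Record that a file was copied in or used again."""
        now = time.time() if now is None else now
        self._entries.pop(name, None)
        self._entries[name] = (size, now)       # re-insert at the "recent" end

    def total_bytes(self):
        return sum(size for size, _ in self._entries.values())

    def age_off(self, now=None):
        """Return the names that should be deleted locally."""
        now = time.time() if now is None else now
        evicted = []
        # Time heuristic: drop anything unused for too long.
        for name, (_, last_used) in list(self._entries.items()):
            if now - last_used > self.max_age_seconds:
                del self._entries[name]
                evicted.append(name)
        # Size heuristic, applied in LRU order until we fit the budget.
        while self._entries and self.total_bytes() > self.max_bytes:
            name, _ = self._entries.popitem(last=False)
            evicted.append(name)
        return evicted
```

Usage would be something like calling `touch()` whenever a file is fetched or opened, and running `age_off()` periodically or right before a new copy comes in.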

Depending on what's happening across your network, you could wind up with a popular file having copies actually reside in a lot of places...for a while. Most files would only have two locations: primary shared, and whatever there is for standard backup.

At the moment I'm thinking of it rather like a library system. You have your own collection of files (books you acquired somewhere). You are willing to share some of them. Others are likewise willing to share some. There's a "library/librarian" service. You can ask the service what all is available (from all those willing to share--the "library" doesn't have its own repository), and you can have a copy of anything listed, until you bump into your local age-off restrictions. Remember how your local physical library works? You can look at the catalog, find something you want, check out a book for 30 days, take it home to be in your personal library, and then return it: i.e., locate a file, copy it locally for temporary use, and then delete it.
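Here is a minimal in-memory sketch of what the librarian side of that analogy could look like. It is only an illustration: the `Librarian` class and its `publish`, `withdraw`, `catalog`, and `locate` methods are hypothetical names, there is no web service wrapped around it, and it holds no file contents at all, only who is willing to share what.

```python
from collections import defaultdict


class Librarian:
    """Hypothetical catalog: knows who shares which files, stores none."""

    def __init__(self):
        # file name -> set of hosts currently offering a copy
        self._holders = defaultdict(set)

    def publish(self, host, names):
        """A host announces the files it is willing to share."""
        for name in names:
            self._holders[name].add(host)

    def withdraw(self, host, name):
        """A host stops sharing a file (e.g. it was aged off locally)."""
        holders = self._holders.get(name)
        if holders is None:
            return
        holders.discard(host)
        if not holders:
            del self._holders[name]  # nobody has it, so drop the listing

    def catalog(self):
        """Everything available from anyone, as a sorted list of names."""
        return sorted(self._holders)

    def locate(self, name):
        """Hosts to copy the file from; empty list if nobody shares it."""
        return sorted(self._holders.get(name, ()))
```

A borrower would call `catalog()` to browse, `locate()` to find a source, and then do the actual copy directly from one of the returned hosts. If temporary copies were also published, a popular file would show up with many holders for a while.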

If you find yourself having age-off space problems, maybe you buy some bigger bookshelves (i.e., a new and larger disk drive).

This is not a perfect analogy, but works ok for the moment.

So there are some other storage units that could/should participate in this, and they need a proxy of sorts to do so: SAN, NAS--that sort of thing. A NAS device can be just a mountable filesystem, which suggests that perhaps the Librarian needs to take on the management of that, although that doesn't quite fit the analogy the right way: I am thinking of the file-copying as being a lot more like a P2P file-transfer system.

So there's the Librarian service(s), the local shared-publishing service, the P2P file-transferring, and the local storage management. I've written a small part of the Librarian, more of the local shared-publishing service, I've been looking at file-transfer code, and merely thought about the storage management. It's all just casual so far, although it's been in the back of my mind for months. Been writing down the use cases, too. I should have a working system in a couple of months, I think.

[Later: ok, I've put less time into it recently, so not until this fall at the earliest]
