There's a lot of data around on a lot of computers everywhere...far too much to fit on any one machine, or even on some kind of larger storage in any cost-effective manner for us little guys.
At work I have a SAN with 100TB of available storage. THAT is a lot of storage; but given what I do there, it's actually not all that hard to fill up. And that kind of device STILL doesn't solve the larger problem, nor was it very cost effective--I could replace the drives, going from 2TB to 3TB, but that would only be a 50% increase...suppose I need a 10X increase? 100X? More?
2TB drives aren't very expensive any more (you know, it seems almost absurd to even be able to say that, given that my first computer had a 20 MB drive in it), and it's not hard to find dirt-cheap machines around, used or even free. Regrettably they are seldom small, and therefore tend to be a little power hungry...not a prob for a data center kinda place, but uncomfortable for me at home.
Suppose I had a problem to work on where 30TB looked like the right capacity...and let's say that means 10 machines @ 3TB each...
I've written a heterogeneous, OS-agnostic distributed Grid Engine--perfect for doing data processing on a 10-node cluster. But it really works best when all the nodes use a shared/common file system. THAT works best with a SAN and a blade server, like at work. Well, the blade-server part isn't really very expensive ($3k will buy a decent used one that is full, and pleasantly fast--look on eBay for IBM HS21 systems). But getting a SAN on there--not going to happen. OK, I could perhaps put some high-capacity 2.5" drives in the blades, etc., but that doesn't solve the resulting problem, which is still: how do they share data with each other?
Well, on a limited basis you can make file shares and cross-mount them all across all the machines--but that doesn't scale very far, those shares become a maintenance nightmare, and they STILL aren't a shared common file system.
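Quick back-of-the-envelope on why full cross-mounting falls over: if every machine mounts a share from every other machine, the number of mounts you have to configure and keep alive grows quadratically with the cluster size. A tiny sketch of the arithmetic:

```python
# Cross-mounting everything on everything: each of the n machines mounts
# the share exported by each of the other n - 1 machines, so the total
# number of mounts to configure and maintain is n * (n - 1).
def cross_mounts(n: int) -> int:
    """Total mounts when every machine mounts every other machine's share."""
    return n * (n - 1)

for n in (3, 10, 50):
    print(f"{n} machines -> {cross_mounts(n)} mounts to manage")
# 3 machines  ->    6 mounts
# 10 machines ->   90 mounts
# 50 machines -> 2450 mounts
```

Ninety mounts for my hypothetical 10-node cluster--and every one of them breaks independently when a box reboots.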
So really the problem is: how do I make a shared common file system across a bunch of machines? I need it to be heterogeneous, since I run Mac/Win/Linux machines, and am considering other things like Gumstix.
There are distributed file systems around...several, it turns out, but they are mostly homogeneous, i.e. Linux-only (FUSE-based systems, Lustre, GlusterFS, etc.), which doesn't help me. OK, I could just buy the cheap hardware and install Linux everywhere, but what happens when I have a Windows-only software tool to run?
I've been hunting for an OS-agnostic tool, and it's not really clear whether there is such a thing. OpenAFS (i.e., the open-source Andrew File System) might do it, which would perhaps be the ideal solution; I haven't tried it yet. Pretty much everything else I've read about doesn't meet my requirements--heterogeneous being the first fail point. At work I'm using StorNext with the SAN, but I can't afford that on my own.
So I think I have to solve this myself. What I kinda think I want is a BYOD approach where you'd have to run some agents to join, but you'd have access to everything shared on the network without having to cross mount a zillion things that you can't even find out about casually.
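To make the agent idea concrete, here's a toy in-process sketch of what I have in mind: each machine's agent registers its shared directories with a common registry, and any client can then browse one unified namespace without knowing (or caring) which box actually holds the data. All the names here--the `Registry` class, the `/grid/<host>/<share>` path scheme, the example hosts--are hypothetical; a real version would need networking, authentication, and actual file I/O behind `resolve`.

```python
# Toy sketch of the BYOD agent/registry idea. Each agent registers
# (host, share name, local path); clients see one flat /grid namespace.
class Registry:
    def __init__(self):
        self._shares = {}  # (host, share_name) -> local path on that host

    def register(self, host, share_name, local_path):
        """An agent calls this to announce a directory it is sharing."""
        self._shares[(host, share_name)] = local_path

    def namespace(self):
        """One unified view of everything shared, regardless of which box it's on."""
        return sorted(f"/grid/{host}/{name}" for host, name in self._shares)

    def resolve(self, grid_path):
        """Map a unified path back to (host, local path) for the actual access."""
        _, _, host, name = grid_path.split("/", 3)
        return host, self._shares[(host, name)]

# Three agents on three different OSes join the grid:
registry = Registry()
registry.register("macpro", "datasets", "/Volumes/Big/datasets")
registry.register("winbox", "tools", r"D:\shared\tools")
registry.register("linuxnode1", "scratch", "/srv/scratch")

print(registry.namespace())
print(registry.resolve("/grid/winbox/tools"))
```

The point is that nobody cross-mounts anything: you ask the registry what exists, and the agents broker the access.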
What you would NOT have is something that shows up in Finder/Windows Explorer. I can probably figure out how to finagle that too, although I don't consider it a critical requirement. I expect OpenAFS has that figured out.
Is it going to take an Advanced Degree(tm) to figure this out? It's not an easy problem.
Friday, June 14, 2013