Sunday, May 05, 2013

Advanced Software, leading to PDVFS

I generally work on somewhat exotic software projects. Cutting, if not bleeding, edge.

I was early in the Semantic Web stuff (2000-06) and the AI stuff in the 80s, plus other oddments like digital mapping (starting in the 80s) and text processing (starting mid-90s), and I wrote one of the very first GUI builders (late 80s). A health-care R&D effort from the early 90s would still be cutting edge today.

My latest bit of exotic is a Grid Engine. Granted, that's nothing new, except that mine is OS-agnostic. You can readily find other grid engines, but they are not really agnostic. Mine runs on Windows (XP/7), Linux (probably any flavor) and OSX (10.6+). The whole thing is of course written in Java, which is why it's agnostic. It should run anywhere a Java 1.6 JVM runs properly (possibly including JME; I haven't a way to test there, and it would depend on the lightweight thread support).

I'm now processing much bigger datasets than I used to, hence the Grid Engine, to distribute the processing adequately. So far I have run it on two setups: 3 machines with 64 total cores, and 12 machines with 48 total cores. It's designed to run on A LOT of machines, but I'm pretty sure there are undiscovered scale-up problems along the way. There's no imposed maximum.

Because the datasets are now bigger, I have to think about additional problems. In particular, where does that data go? Everything is fine as long as the dataset is under 2 TB, because that fits on a single disk. But then you have the issue of how many clients have to be served by that disk, and therefore how much punishment the disk takes over time. This is the arrangement I have on the 3/64 machines, with no apparent disk degradation yet.

If you use a SAN, you can certainly make a much larger apparent single partition. This is what I have with the 12/48 machines: a blade chassis with an attached FC-SAN, with 60/15/15/5/5 TB partitions. You set up the SAN for the partition sizes, and use separate software to manage how the blade units see the SAN. It works fine, and that's really a lot of space. You CAN daisy-chain another SAN onto it, but that isn't really solving the problem, because I've already burned out two disks in it.

I want/need to distribute data differently, so that I am achieving a more random spread of data over storage devices. I want to work this with the grid engine. I need it to be heterogeneous across random hardware.
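One way to get a spread that looks random but is still repeatable is hash-based placement. What follows is only a minimal sketch, assuming rendezvous (highest-random-weight) hashing as the technique; it is not a description of how the grid engine or PDVFS actually works. The class name `Placement`, the node names, and the file paths are all hypothetical, and the code sticks to what a Java 1.6 JVM provides.

```java
import java.nio.charset.Charset;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: choose a storage node for a file by rendezvous hashing.
public class Placement {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Score a (node, file) pair using the first 8 bytes of an MD5 digest.
    public static long score(String node, String fileKey) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] d = md.digest((node + "\n" + fileKey).getBytes(UTF8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the digests every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    // The node with the highest score for this file wins.
    public static String pick(List<String> nodes, String fileKey) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("no storage nodes");
        }
        String best = null;
        long bestScore = 0;
        for (String n : nodes) {
            long s = score(n, fileKey);
            if (best == null || s > bestScore) {
                best = n;
                bestScore = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Node names are made up; any mix of OS and hardware would do.
        List<String> nodes = Arrays.asList("win7-box", "linux-blade-03", "osx-mini", "nas-front");
        String[] files = {"maps/tile-0001.dat", "text/corpus-a.txt", "runs/out.bin"};
        for (String f : files) {
            System.out.println(f + " -> " + pick(nodes, f));
        }
    }
}
```

Because each choice depends only on the node list and the file key, any machine can work out where a file lives without a central index, and dropping a node only relocates the files that were on it. The sketch ignores differences in disk capacity and does no replication, both of which a real design would have to address.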

So of course Hadoop HDFS sounds like a possibility, but there are some reasons why not. Hadoop is not oriented around this kind of data, where file sizes range from 100 bytes to 3 GB. Hadoop wants a 64 MB file-chunk size, and I don't have that. I need to use native file systems and disk behavior.

Looking at various experimental file systems, nothing seems to do the right job, or be adequately OS-agnostic. There are several Linux-only possibilities, which are probably closer to what I want, apart from being Linux-only.

Initially I thought I wanted real mounted file-systems. AFS seemed the likeliest solution, but I suspect it has some problems. I don't know what, specifically, except that I wonder what it means to be writing files out: where are they? It looks like a unified file-system, and DOES appear OS-agnostic, but...I don't know.

So I'm now thinking about something that isn't actually a file-system, but a P2P-FS-like thing. I need some not-quite-normal capabilities. And I ultimately want it to run on anything that has file storage (or fronts for it, like a NAS). Going to be interesting working this...
