Ah, that was the point. No graph DB. Wrote a high-performance-but-low-overhead sorted collection class and a routine that loaded SNOMED CT objects into RAM from slightly modified RF1 files. Also wrote a library to do the traversal / query part (very much in the same mould as the official recommendations, only the query language was XML based and thus easy for poor ol’ me to write a parser for). That leaned heavily on the excellent Guava library for Set operations.
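For anyone wondering what "traversal plus Set operations" boils down to, here is a minimal sketch. It is not the actual library: the class name, method names and concept ids are all made up for illustration, and it uses plain java.util sets to stay dependency-free (Guava's Sets.difference / Sets.intersection / Sets.union would fill the same role as the minus helper here). The idea is just "everything under concept A, except everything under concept B".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not the real code: an in-memory is-a graph plus set algebra.
final class RefsetQuerySketch {
    // parent concept id -> ids of its direct children
    private final Map<Long, List<Long>> children = new HashMap<>();

    // Record one "child is-a parent" relationship.
    void addIsA(long child, long parent) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    // The concept itself plus everything reachable below it via is-a.
    Set<Long> descendantsAndSelf(long root) {
        Set<Long> seen = new HashSet<>();
        Deque<Long> todo = new ArrayDeque<>();
        todo.push(root);
        while (!todo.isEmpty()) {
            long id = todo.pop();
            if (seen.add(id)) { // skip concepts already reached via another parent
                for (long c : children.getOrDefault(id, Collections.emptyList())) {
                    todo.push(c);
                }
            }
        }
        return seen;
    }

    // Set difference: members of a that are not in b.
    static Set<Long> minus(Set<Long> a, Set<Long> b) {
        Set<Long> out = new HashSet<>(a);
        out.removeAll(b);
        return out;
    }

    public static void main(String[] args) {
        RefsetQuerySketch g = new RefsetQuerySketch();
        g.addIsA(2, 1); // toy ids, not real SCTIDs
        g.addIsA(3, 1);
        g.addIsA(4, 2);
        g.addIsA(5, 4);
        // "Everything under 1, excluding everything under 2"
        System.out.println(minus(g.descendantsAndSelf(1), g.descendantsAndSelf(2)));
    }
}
```

A refset query then becomes a small tree of these operations, which is the sort of thing a simple XML query format can describe.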
Just sticking everything in a TreeMap gives you about 400MB of overhead for that set of objects (making the whole shebang over 900MB), and it’s not very fast.
900MB is pushing the limits of available heap space on 32-bit Windows boxes (for virtual memory address layout reasons, you get about 1.2-1.4GB heap space, max, depending on the size of drivers you have loaded). But of course, we were limited to 32-bit OS installs, because various bits of legacy baggage didn’t want to run on 64-bit Windows and no-one wanted to work to fix this.
So for my poor crippled Windows users, I worked to trim that overhead; ended up with a collection class that was much faster and had about 20MB total overhead for all 3M or so objects in the core SNOMED CT module (which contains about 150MB just as strings). Being able to fit everything in around 500MB of RAM leaves you enough room to actually run useful apps on top of that data.
Collection class was what I think is called a hybrid bucket trie; low overhead and quite cache-optimized so lookups and inserts are fast. Can load the core SNOMED CT module in under 20s on reasonable hardware, can run most of the full set of UK refset queries and write the output to disk in under 80s; from comparisons we made with the official implementation (at the time, Chris Morris converting specs in Word docs to lots of PL/SQL), it’s pretty accurate.
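To give a feel for where that kind of saving comes from, here is a rough sketch. It is deliberately NOT a hybrid bucket trie, just a much simpler sorted-array map keyed by long ids, and the class name is made up. The point it illustrates is the general one: keep keys in a primitive array instead of allocating a node object and a boxed key per entry the way TreeMap does, and lookups become a binary search over contiguous memory.

```java
import java.util.Arrays;

// Hypothetical sketch: a sorted map from long keys to values, backed by two
// parallel arrays. A simpler stand-in for the structure described, not the real thing.
final class SortedLongMapSketch<V> {
    private long[] keys = new long[16];
    private Object[] values = new Object[16];
    private int size;

    // Insert or overwrite, keeping keys in ascending order.
    void put(long key, V value) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        if (i >= 0) {            // key already present: overwrite
            values[i] = value;
            return;
        }
        int at = -(i + 1);       // insertion point reported by binarySearch
        if (size == keys.length) { // grow both arrays together
            keys = Arrays.copyOf(keys, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        // Shift the tail right by one slot to make room.
        System.arraycopy(keys, at, keys, at + 1, size - at);
        System.arraycopy(values, at, values, at + 1, size - at);
        keys[at] = key;
        values[at] = value;
        size++;
    }

    // Returns the value for key, or null if absent.
    @SuppressWarnings("unchecked")
    V get(long key) {
        int i = Arrays.binarySearch(keys, 0, size, key);
        return i >= 0 ? (V) values[i] : null;
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        SortedLongMapSketch<String> m = new SortedLongMapSketch<>();
        m.put(30, "c"); // toy keys, not real SCTIDs
        m.put(10, "a");
        m.put(20, "b");
        System.out.println(m.get(20));
    }
}
```

The obvious weakness is that an insert in the middle shifts the whole tail, so this is only pleasant when the input arrives roughly sorted. Splitting the keys into buckets, as a bucketed trie does, keeps those shifts small, and sharing key prefixes is presumably how you get below one long plus one reference per entry.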
Still have the code (and it’s under a permissive license like most efforts of NHS Digital in recent years), but it ain’t pretty or fun to read. All very, very special purpose. But I think SNOMED CT is complex enough that it needs special-purpose - working at it from the level of abstraction that e.g. the Common Terminology Services spec lays out is just nuts, like making a sandwich using chopsticks while wearing a welding mask.
Edit: And SNOMED CT is in itself quite the abstraction, proposals like the representation of numbers for e.g. pharmaceutical products compounding that considerably.