# `sct` - Modern, fast SNOMED-CT tooling for the agentic age

**Category:** [open forum](https://openhealthhub.org/c/open-forum/9)
**Created:** 2026-03-27 23:48 UTC
**Views:** 82
**Replies:** 14
**URL:** https://openhealthhub.org/t/sct-modern-fast-snomed-ct-tooling-for-the-agentic-age/2956

---

## Post #1 by @pacharanero

Hi Open Health Hubbers

I had an interesting day today thinking about SNOMED-CT Terminology Servers and how inefficient it is to have to call a web server for SCT queries. I have always found it difficult to get started with SNOMED because of the requirement to set up a server, or to use one of the clunky web UIs.

I wanted a local-first SNOMED tool that I could understand and which I could play with on the command line. Maybe such things exist somewhere, but a decent search didn't turn up anything that was obviously easier to get to grips with than the Ontoservers and Snowstorms and suchlike.

Using SNOMED and LLMs via a Web Terminology Server is likely to be incredibly slow, because of the latency of the web connection, the multiple queries required, and the relatively heavy context window overhead for LLMs of crafting REST queries compared to using `jq` or `ripgrep`.

So. I wrote my own! It's a single Rust binary which can ingest the entirety of the UK Edition in 27 seconds on my laptop. From there, there's a ton of next steps you can take to handle the data in different ways.

https://github.com/pacharanero/sct

`sct` is a local-first SNOMED CT toolchain - a single Rust binary that takes an RF2 Snapshot release (the raw tab-separated files that SNOMED CT is distributed in) and converts it into a canonical NDJSON artefact, joining 800k+ concepts with their preferred terms, synonyms, hierarchy paths, and relationships in one pass.
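
To give a feel for what working with a one-record-per-line NDJSON artefact looks like outside `jq` or `ripgrep`, here is a minimal Python sketch that streams records and keeps those mentioning a search term. It is an illustration only: the field names (`id`, `preferred_term`, `synonyms`) and the identifiers in the sample records are made up and are not taken from `sct`'s actual schema, which is why the search walks every string value instead of relying on particular keys.

```python
import io
import json
from typing import Any, Iterable, Iterator


def contains_text(value: Any, needle: str) -> bool:
    """Return True if any string nested inside `value` contains `needle`."""
    if isinstance(value, str):
        return needle in value.lower()
    if isinstance(value, dict):
        return any(contains_text(v, needle) for v in value.values())
    if isinstance(value, list):
        return any(contains_text(v, needle) for v in value)
    return False


def search_ndjson(lines: Iterable[str], needle: str) -> Iterator[dict]:
    """Yield each NDJSON record that mentions `needle` (case-insensitive)."""
    needle = needle.lower()
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines
            continue
        record = json.loads(line)
        if contains_text(record, needle):
            yield record


# Two made-up records standing in for lines of the real artefact.
# Field names and ids are hypothetical, not sct's actual schema.
SAMPLE = io.StringIO(
    '{"id": "100001", "preferred_term": "Example heart disorder", '
    '"synonyms": ["Cardiac example"]}\n'
    '{"id": "100002", "preferred_term": "Example lung disorder", '
    '"synonyms": []}\n'
)

hits = list(search_ndjson(SAMPLE, "cardiac"))
print([record["id"] for record in hits])  # prints ['100001']
```

Because `search_ndjson` accepts any iterable of lines, an open file handle on the real NDJSON file could be passed in place of `SAMPLE`, so the file is processed one line at a time without being loaded whole.
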
From that artefact you can load into **SQLite** with FTS5 full-text search, export to **Parquet** for DuckDB analytics, render per-concept **Markdown** files for RAG/LLM ingestion, or generate Ollama **vector embeddings** for semantic similarity search. There's also a built-in **MCP server** that connects directly to LLMs, giving an AI assistant live access to five SNOMED tools (free-text search, concept detail, children, ancestors, hierarchy browse) with no cloud dependency and sub-5ms startup.

The whole thing runs offline, produces standard files queryable with `sqlite3`, `duckdb`, `jq`, or `ripgrep`, and is designed around the principle that the expensive RF2 join should happen once - deterministically - and everything else should be derived from the resulting stable, versionable NDJSON file.

I'd very much appreciate feedback on the work so far. It's likely to have some bugs, but it does work.

As an example, I have been able to ask Claude for the dumbest and silliest SNOMED-CT terms, in its opinion, which it dug out after 58s of querying for silly things:

![image|532x500](upload://wNavD6PCnX9BUyvWvTZOJlVi3D.png)

I finally have SNOMED-CT on my laptop in a way I can query in any way I like.

What would you like to see in `sct` next?

---

## Post #2 by @mayfield.g.kev

You are a bit further down the road than I am. I’m at a stage where I can produce output like you have using (jupyter + **[medspaCy](https://spacy.io/universe/project/medspacy) + scispacy**)

https://github.com/nw-gmsa/Testing/blob/main/PDFTextAnalytics.ipynb

But I’m going to use this to justify not producing only PDF Genomic Reports; our source data, like much in secondary care, is already coded.

So I’m going to attempt improvements around engineering the process, rather than fixing the output (as PDF)

---

## Post #3 by @mayfield.g.kev

The main problem I have is: GPs probably want the diagnostic implications in Rare Disease Genomic Reports SNOMED coded. That I think is pretty obvious.
The hard part is talking the ‘NHS’ system into making this a requirement; most of the tech around this is already done. E.g. in Manchester the PDF is shared with GPs, in Yorkshire we can in theory just plug into the main Yorkshire architecture ….. so Yorkshire GPs can see reports we’ve done for their patients. ← Again this is mostly technically done, the hard part is the NHS system.

Somehow I need to engineer an obvious user requirement.

Sorry started to waffle, I think many elements of SNOMED coding could be solved with northern engineering, rather than AI

---

## Post #4 by @navin1976

Hi Marcus,

This looks really nice! Thank you. I am looking at how to best add SNOMED CT support to my server backend.

Am I right in thinking that if I wanted to keep a trail of the full SNOMED change history, I would:

1) Download the full current release and generate the canonical NDJSON.
2) When new releases come, I download the snapshot and run `sct ndjson --rf2 `
3) Run `sct sqlite --input new-release.ndjson` and then hot-swap this new database file into my database.
4) Run: `sct diff --old 2025-01.ndjson --new 2026-01.ndjson --format ndjson > diff_25_to_26.ndjson`

So that:

a) My web app only hosts the single SQLite database generated from the latest Snapshot.
b) I store the tiny `diff.ndjson` files generated by `sct diff` to maintain a trail of the historical change.
c) I can compress (via gzip or zstd) and archive the point-in-time `.ndjson` files, to recreate the database for any exact date in the past if ever needed.

---

## Post #5 by @pacharanero

[quote="navin1976, post:4, topic:2956"]
* Download the full current release and generate the canonical NDJSON.
* When new releases come, I download the snapshot and run `sct ndjson --rf2 `
* Run `sct sqlite --input new-release.ndjson` and then hot-swap this new database file into my database.
* Run: `sct diff --old 2025-01.ndjson --new 2026-01.ndjson --format ndjson > diff_25_to_26.ndjson`
[/quote]

I think that would work yes.
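
As a rough picture of what comparing two point-in-time NDJSON snapshots involves, here is a small Python sketch that classifies records as added, removed, or changed. It is a stand-in written for illustration, not `sct diff`'s implementation or output format, and it assumes each record carries a unique identifier field, hypothetically named `id` here (the `term` field and the sample values are also made up).

```python
import json
from typing import Dict, Iterable, List


def index_by_key(lines: Iterable[str], key: str = "id") -> Dict[str, dict]:
    """Parse NDJSON lines into a dict keyed by each record's identifier."""
    index: Dict[str, dict] = {}
    for line in lines:
        if line.strip():  # skip blank lines
            record = json.loads(line)
            index[str(record[key])] = record
    return index


def diff_snapshots(
    old_lines: Iterable[str], new_lines: Iterable[str], key: str = "id"
) -> Dict[str, List[str]]:
    """Classify identifiers as added, removed, or changed between snapshots."""
    old = index_by_key(old_lines, key)
    new = index_by_key(new_lines, key)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Made-up snapshots; the `id` and `term` fields are hypothetical.
OLD = [
    '{"id": "1", "term": "Alpha"}',
    '{"id": "2", "term": "Beta"}',
    '{"id": "3", "term": "Gamma"}',
]
NEW = [
    '{"id": "1", "term": "Alpha"}',
    '{"id": "2", "term": "Beta (revised)"}',
    '{"id": "4", "term": "Delta"}',
]

result = diff_snapshots(OLD, NEW)
print(json.dumps(result))
# prints {"added": ["4"], "removed": ["3"], "changed": ["2"]}
```

This sketch holds both snapshots in memory at once, which keeps the logic short; a streaming comparison over sorted files would be the alternative if memory were a concern.
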
The entire process should be totally deterministic, so you can store any part of the 'chain' (apart from perhaps the embeddings, which may have 'temperature', i.e. small amounts of random variation) from .zip -> RF2 -> NDJSON -> SQLite -> diffs and they should all essentially hold the same information.

[quote="navin1976, post:4, topic:2956"]
I am looking at how to best add SNOMED CT support to my server backend
[/quote]

Just to flag with you, in the spirit of transparency, that I literally wrote this project **yesterday** in a single massive spec-driven agentic engineering session, and while I take care to make sure this is not throwaway 'vibe code' (it has tests, I have clear standards, I use CI and will make this 100% production ready...) - you should still exercise caution if you're incorporating this work into your platform!

---

## Post #6 by @pacharanero

[quote="mayfield.g.kev, post:3, topic:2956"]
Sorry started to waffle, I think many elements of SNOMED coding could be solved with northern engineering, rather than AI
[/quote]

I wasn't quite clear on what you're planning to do - you're right that engineering and standards are the solutions, not AI. Receiving GP clinical systems probably don't have good ways for the Sender to include 'suggested' SNOMED-CT codes, apart from bunging the code in a PDF and then this becomes an admin task from one of the practice staff. GP clinical systems should probably be better at this.

---

## Post #7 by @navin1976

Noted, thank you! This system is not in full production yet. And if it gets rid of the need to run a SNOMED server that would be great.

I wonder if there is some interesting logic for working across national / international editions with your system, maybe a flag to prefer an edition if you have multiple loaded? That could be really useful for a multi-national platform.

---

## Post #8 by @pacharanero

For multiple editions, create multiple appropriately-labelled `snomed-.db` files - then you can simply use the required edition when you issue SQLite commands.

At the moment it's been designed mainly as a CLI tool, but I do plan to make it into a Rust library which can be integrated into a Rust codebase, so that my other projects can integrate SNOMED more easily. However, once you have a local NDJSON or SQLite DB, or any of the derivative products, you can query them using any tooling you want, in any framework or language.

---

## Post #9 by @navin1976

Looking forward to the library!

---

## Post #10 by @mayfield.g.kev

[quote="pacharanero, post:6, topic:2956"]
Receiving GP clinical systems probably don’t have good ways for the Sender to include ‘suggested’ SNOMED-CT codes, apart from bunging the code in a PDF and then this becomes an admin task from one of the practice staff. GP clinical systems should probably be better at this.
[/quote]

They do, a few years ago → 5 of them (as Transfer of Care) but ……. it was complex (for small NHS Trust IT teams) and GP Suppliers only display them to GPs as HTML pages (and a 3rd party can’t access them).

The next version to come out is likely to be this https://build.fhir.org/ig/hl7-eu/hdr/ , ideally shared via NRL, hopefully without any over-engineering added on top.

But in the main most others have just been PDF, mostly Kettering XML as this was easiest for small NHS Trust teams to support ….. but in most cases these reports start off with coding present (probably not SNOMED or LOINC) with NHS trusts.

---

## Post #11 by @JW148

Interesting - as one who has gone through the pain of setting up my own SNOMED CT SQL server ingesting the Monolith edition and has to go through the tedium of updating said installation at intervals...

I think some of us, unfamiliar with Rust, may need a bit of help to get on the first rung of the ladder.
Have you got some kind of a Dummies guide that will help me from the point where I am logged in to my Windows 11 notebook, can access Github - read the files but can't easily work out where to start / the prerequisites / what I need to install etc, to have some chance of checking out what you have done?

It looks very interesting and it might be worthwhile amongst other things to compare what your SCT generates with the output from SQL queries on full server - accessible from my notebook.

Also intrigued to know how you have pulled together RF2 into a single canonical artefact. The monolith release gets rid of some of the complexity - but not all.

Anything to demystify and get to grips with the SNOMED CT beast looks good to me...

Hope that with a little bit of help I can get SCT working?

---

## Post #12 by @JW148

Important waffle.

As an ex-GP I would be interested to explore what GPs need / want from Rare Disease Genomics Reports and how that can best be conveyed in a structured manner. Would also be interesting to hear more about difficulties with the 'NHS system'.

Am involved with some work on replacing the aged PMIP / EDIFACT pathology links message with FHIR / SNOMED CT. A number of us keep asking why there is not more collaboration with the genomics folk about getting reports into GP systems. The 'system' seems to be siloed.

For the future we need a very much wider spectrum of reports to reach GP systems than the very limited selection that we currently get via PMIP. We also need to close the loop so that meaningful requests go the other way - and all designed to accommodate decision support / AI in its various forms.

We too have our problems with the 'NHS system'.
Sounds as if this may at some point need a thread separate from SCT

---

## Post #13 by @mayfield.g.kev

reply https://openhealthhub.org/t/genomic-reports-for-gp-and-future-ai-use/2963

---

## Post #14 by @pacharanero

[quote="JW148, post:11, topic:2956"]
Have you got some kind of a Dummies guide that will help me from the point where I am logged in to my Windows 11 notebook, can access Github - read the files but can’t easily work out where to start / the prerequisites / what I need to install etc, to have some chance of checking out what you have done?
[/quote]

There is a Quick Start here https://github.com/pacharanero/sct?tab=readme-ov-file#quick-start

And the Walkthrough is also useful https://pacharanero.github.io/sct/walkthrough/

Installing Rust on Windows will be documented somewhere on the internet. I use `mise` to install language toolchains now, but that is an optional route. `rustup` is the most common approach.

---

## Post #15 by @pacharanero

https://youtu.be/f-gz-MKtU44

Here's the [Walkthrough](https://pacharanero.github.io/sct/walkthrough/) section of the docs, in video form.

If anyone's struggling to get `sct` installed, feed back here with what you've tried and what's happening when you do, and I'll help.