An Open Clinical Terminology?

As far as I am given to understand right now, SNOMED-CT is the ‘only terminological game in town’. Yet it’s proprietary and only available under license, which I think is against the most fundamental principles of medicine (see my blog post here for a more detailed and ranty explanation of what I mean). We need something open, permissively-licensed, and - crucially - SIMPLE so that clinicians can use and understand it.

An open clinical terminology that can ‘guide and teach’ as you use it (to paraphrase the inestimable Larry Weed).

I’ve started this thread to explore in more detail how you could use modern tooling and development paradigms to develop a completely free, open clinical terminology.

Initially I kind of want to start a discussion here of what we understand by a clinical terminology. But I’d also like to understand how you might design a new terminology to avoid some of the problems that existing terminologies end up with:

  • General clinician-unfriendliness of terminology - as a terminology matures, it seems to become steadily more ‘terminologically perfect’, yet steadily more unusable for its primary purpose by everyday clinicians. This is the main issue we need to address. Most general clinicians don’t understand the differences between similar-sounding concepts from different parts of SNOMED’s hierarchy, for example. Older, simpler clinical coding systems like Read were smaller and less expressive, but much more understandable by clinicians. Education alone cannot fix this; the actual terminology needs to be fit for its primary purpose.

  • Lack of ready readability/comprehensibility of the ‘code’ - now that data storage technology has rendered obsolete the need to save bytes by replacing text with a 4- or 5-byte code, why do we need to use non-human-comprehensible labels such as SCT 22298006 to mean something that could simply be written as myocardial infarction?

  • Changes break queries - retiring or redefining codes and the relationships between them over time leaves historical queries inconsistent, and consistent querying can only be achieved using a ‘transitive closure table’ - can that be designed out?

  • Is it a namespace? - most clinical terminologies have ‘Concepts’ and these function as a kind of namespace - for example a Read Code |Xa18W|Tympanic membrane structure| creates a unique point of reference that ‘means’ that atomic clinical concept, and can be used to represent that clinical concept in the abstract, just as all language is a namespace in this regard. Is it enough just to define the namespace? This is what clinicians want and need after all.

  • Is it a hierarchy? - Concepts may have hierarchical relationships in multiple directions (parent/child, and association/feature_of, and more) but does this layer need necessarily to be part of the terminology? Does this layer just complicate the simplicity of a flat namespace?

  • Is it a ‘language’? - elements of terminologies such as SNOMED-CT, such as post-coordination and even the way pre-coordinated codes are created, allow for ‘composition’ of complex concepts from more basic ones. But it’s clumsy, and implementation is very bespoke to each terminology. Could we learn from regular programming languages how to do this compositional work better? Alternatively, should it just be ditched as utter madness?

  • Distribution - from what I’ve seen in the world of terminologies, the distribution mechanisms fall way short of what the rest of the tech industry uses to distribute and manage dependencies in other knowledge artefacts such as software libraries. Instead of downloading enormous TSV text files and munging them manually, I want to be able to do something that would make sense in a programming context and enable me to use existing tools to manage versions, dependencies, and updates.

  • Use existing stuff, don’t invent weird new jargon - the tech world has many excellent tools to serve us in our mission - e.g. JSON is the lingua franca of information exchange across the web, and so a terminology that is natively expressed and distributed in JSON would be easier for developers to get to grips with than something you need to parse manually.
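Several of the points above (a flat, human-readable namespace; JSON-native distribution; hierarchy computed rather than shipped as a closure table) could be sketched together. This is a toy illustration of the idea, not a proposal; every identifier and label below is invented:

```python
import json

# A sketch of a hypothetical JSON-native terminology: human-readable concept
# IDs acting as a flat namespace, with an explicit parent list per concept.
# All identifiers and labels are invented for illustration.
terminology = json.loads("""
{
  "myocardial-infarction": {"label": "Myocardial infarction", "parents": ["ischaemic-heart-disease"]},
  "ischaemic-heart-disease": {"label": "Ischaemic heart disease", "parents": ["heart-disease"]},
  "heart-disease": {"label": "Heart disease", "parents": []}
}
""")

def ancestors(concept_id: str, terminology: dict) -> set:
    """Walk the parent links to compute a concept's full ancestor set
    (the 'transitive closure', calculated on demand rather than distributed as a table)."""
    result, stack = set(), list(terminology[concept_id]["parents"])
    while stack:
        parent = stack.pop()
        if parent not in result:
            result.add(parent)
            stack.extend(terminology[parent]["parents"])
    return result

# 'Is a myocardial infarction a kind of heart disease?'
print("heart-disease" in ancestors("myocardial-infarction", terminology))  # True
```

Because the closure is derived from the current parent links rather than frozen into the distribution, a versioned release of the JSON file is all a consumer needs.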

And myriad other questions:

  • How are other similar big reference ontologies elsewhere in the tech world handled?
  • Is there any way to prevent the inevitable mention of RDF triples and OWL in this thread? :wink:
  • Can you manage a project like this on GitHub or similar, using open discussion, Issues, and Pull Requests for change management?

Let’s have your thoughts and comments right here!



Because we build solutions to non-clinical health and care challenges this does not affect us per se, but I’m happy to learn more about SNOMED-CT. I agree it should be freely available. ‘Available under license’ is a euphemism for cha-ching.


You asked a lot of questions, some of which I might attempt a direct answer to. But initially, I’ll repost something from the openEHR lists from Mar 2018 on how I would do the big picture differently. [original ref(Re: [Troll] Terminology bindings ... again)]

The killer move would be to do something I advocated for years unsuccessfully: separate SNOMED technology from content and allow them to be independently licensable and used. Here, technology means representation (RF2 for example), open source programming libraries for working with ref-sets, and specs and implementations for e.g. the constraint language, URIs and so on.

It should be possible for a country (the one I am most familiar with w.r.t. terminology today is Brazil) to create an empty ‘SNOMED container’ of its own, and put its existing terminologies in there - typically procedure lists, drug codes, lab codes, device & prosthesis codes, and packages (chargeable coarse-grained packages, like childbirth, that you get on a health plan). There are usually <20k or even <10k such codes for most countries (the UK and US would be exceptions), not counting lab analyte codes (but even there, 2000 or so codes would take care of most results). But the common situation is that nearly every country has its own version of these things, and they are far smaller than SNOMED. Now, SNOMED’s version of things is usually better for some of that content, but in some cases it is missing concepts.

The ability to easily create an empty SNOMED repo, fill it with national vocabularies, have it automatically generate non-clashing (i.e. with other countries, or the core) concept codes and mappings, and then serve it from a standard CTS2 (or other decent standard) terminology service would have revolutionised things in my view. This pathway has not been obviously available however, and has been a real blockage. The error was not understanding that the starting point for most countries isn’t the international core, it’s their own vocabularies.
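The ‘non-clashing concept codes’ part of this can be sketched very simply: reserve a registered namespace per country or extension and embed it in every generated identifier (SNOMED’s own SCTIDs do something loosely similar with a namespace segment). The namespace names, codes and mappings below are all invented for illustration:

```python
# Hypothetical sketch: a registered namespace per country/extension is embedded
# in every concept ID, so imported national vocabularies can never clash with
# the international core or with each other. All values are invented.

def make_concept_id(namespace: str, local_code: str) -> str:
    """Combine a registered namespace with a local code into a globally unique ID."""
    return f"{namespace}:{local_code}"

# A national body drops its existing procedure list into its own 'empty container'.
br_procedures = {
    make_concept_id("br-sus", "0301010010"): "Consulta medica",
}

# Optional mappings then link national codes to core concepts where they exist.
mappings = {"br-sus:0301010010": "core:outpatient-consultation"}

print(list(br_procedures))  # ['br-sus:0301010010']
```

The point is that the namespace registry, not a central editorial process, is what guarantees uniqueness, so a country can start from its own vocabularies and map to the core later.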

The second killer feature would have been to make creating and managing ref-sets for data/form fields much easier, based on a subsetting language that can be applied to the core, and tools that implement it. Ways are also needed to make imported local/legacy vocabularies look like regular ref-sets.

The third killer feature would have been to make translation tools work on the basis of legacy vocabulary and new ref-sets, not on the basis of the huge (but mostly unused) international core.

I think IHTSDO’s / SNOMED International’s emphasis has historically been on curating the core content, and making/buying tools to do that (the IHTSDO workbench, a tool that comes with its own PhD course), rather than promulgating SNOMED technology and tooling to enable the mess of real world content in each country to be rehoused in a standard way, and incrementally joined up by mapping or other means to the core. I think the latter would have been more helpful.

There is additionally an elephant in the room: IHTSDO (now SNOMED International) has been tied to a single terminology - SNOMED CT - but it would have been better to have had a terminology standards org that was independent of any particular terminology, and worked to create a truly terminology-independent technology ecosystem, along with technical means of connecting terminologies to each other, without particularly favouring any one of them. It’s just a fact that the world has LOINC, ICDx, ICPC, ICF and hundreds of other terminologies that are not going anywhere. What would be useful would be to:

  • classify them according to meta-model type - e.g. multi-hierarchy (Snomed); single hierarchy (ICDx, ICPC, … ); multi-axial (LOINC); units (UCUM, …), etc
  • build / integrate technology for each major category - I would guess <10
  • help the owning orgs slowly migrate their terminologies to the appropriate representation and tools
  • embark on an exercise to graft in appropriate upper level ontology/ies, i.e. BFO2, RO, and related ontologies (this is where the <10 comes from by the way)
  • specify standards for URIs, querying, and ref-sets that work across all terminologies, not just SNOMED CT

A further program would look at integrating units (but not by the current method of importing to SNOMED, which is a complete error because of the different meta-models), drugs and substances (same story), lab result normal and other range data, and so on. None of this can be done without properly studying and developing the underlying ontologies, which are generally small, but subtle.

I’ll stop there for now. I suspect I have kicked the hornet’s nest, but since Grahame kicked it first, and I can run faster than him, I feel oddly safe. Probably an illusion.


Have you seen “The Great Passage”? The novel tells the story of making a Japanese dictionary, and was later adapted into a film and an anime series. It follows a man who devotes his life to producing a new edition of a dictionary of about 300,000 words - roughly as many entries as there are terms in SNOMED CT.

Making a terminology is similar to making a dictionary: it takes a long time and a lot of money. I agree that SNOMED has its problems, but I have not been able to find an alternative funding model. Even Wikipedia has to collect huge donations to keep running.
If it were possible, the shortest way would be to buy SNOMED and release it for free.


Is this really a priority? It’s a massive task, and SNOMED-CT does the job.

SNOMED has its problems and carries a lot of baggage from its history, but if you ignore some of its complexity (the point Marcus makes about it trying to be a language) it does the job.

Open source purists may not like its licensing model, but it’s free to use in those countries that are members (like the UK), and the fees reflect the wealth of the members - I would suggest they are affordable even for poor countries (although I would like to see SNOMED International drop the fee where it is currently under $5,000 pa). Given that there are no examples of an economically sustainable large-scale open source project in health, since Donald Trump pulled the legs out from under VistA, SNOMED’s reluctance to go open source is understandable. SNOMED International are after all a not-for-profit NGO who I’m sure would like to go open source if they believed it was economically sustainable. As an open source enthusiast (I’m writing this on my Linux desktop) I say this with some sadness.

My view is that we should focus our efforts on fixing issues with SNOMED not starting again.



What we are suffering from is a total lack of decent infrastructure. That’s why everything is SO DAMN HARD and takes so long in health tech (unless you are selling snake oil AI in which case it’s all about the marketing not the product). So yes, stuff like this IS a priority. Anyway, what does it matter to anyone else what I prioritise? It’s not like I was doing anything else…

And… it’s not just a call of ‘we need an open terminology because Open’ - it’s also ‘we need terminologies that don’t require 25 years of experience to use’ - we need the tooling, the language-specific helper packages, the utilities, the distribution mechanisms…


I think this is an illustration of one of my core bones of contention with the focus of EHR software - all icing, no cake.

Structured data that systems can understand and make use of is a lovely goal, but only if you have the basics:

  • Universally accessible records
    • No special software required (or only universally available software)
  • Good access control / privacy
  • Excellent audit of access and edits

You don’t need SNOMED CT to make medical records useful to humans - they only need natural language to understand them. SNOMED CT is compensation for the lack of context awareness of most computer systems when processing natural language. It’s a highly unnatural language. In most cases, I would imagine its introduction reduces usability, for the humans.

IHTSDO Workbench … comes with its own PhD course

If it’s still the same software (big hulky Java thing?), it was the product of a PhD course … and There Be Dragons. If it is still the same software, and you have a copy … you’re probably entitled to the source code. Because it’s powered by Berkeley DB, which has a copyleft license unless you have a commercial license for it. But please don’t ask for it.

What we are suffering from is a total lack of decent infrastructure.

I don’t think SNOMED CT counts as infrastructure. It’s firmly in the upper reaches of Layer 7 - detail, not ditch-digging. Its main importance is not its utility (as several people have opined, and IMHO too, it’s really hard to make it useful), but the fact that it’s mandated.

What do you want to use it for? I’m interested in what is the 80%. What I’ve seen over the years is:

  • At a basic level (the 1990s GP system approach?), the clinician picks a code from something that looks like this
  • More recently (year-2000 GP systems), the clinician types a few letters and is prompted with the correct term (predictive text/coding)
  • On the technical side, I’ve seen requirements for answering some basic queries on the terminology (is this code a child of that one, what is the SNOMED equivalent of this NHS Data Dictionary code, etc.)

What I’ve not seen is a demand for:

  • large reference sets
  • pre/post coordinated concepts.

Also, I believe the license prohibits deploying SNOMED as part of a product. So SNOMED files can be picked up and used, open source server side software such as HAPI JPA Server can import these files … but distributing HAPI with SNOMED included wouldn’t be allowed. (So you need a few hours to load it in).
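For anyone facing those ‘few hours to load it in’: RF2 release files are plain tab-separated text with a header row, so a first pass needs nothing beyond the standard library. A minimal sketch, with two fabricated rows standing in for a real concept snapshot file:

```python
import csv
import io

# Two fabricated rows standing in for an RF2 concept snapshot file
# (tab-separated; header: id, effectiveTime, active, moduleId, definitionStatusId).
sample_rf2 = (
    "id\teffectiveTime\tactive\tmoduleId\tdefinitionStatusId\n"
    "22298006\t20020131\t1\t900000000000207008\t900000000000074008\n"
    "11111111\t20020131\t0\t900000000000207008\t900000000000074008\n"
)

def load_active_concepts(fileobj) -> set:
    """Return the IDs of active concepts (a snapshot holds one row per concept)."""
    reader = csv.DictReader(fileobj, delimiter="\t")
    return {row["id"] for row in reader if row["active"] == "1"}

active = load_active_concepts(io.StringIO(sample_rf2))
print(active)  # {'22298006'}
```

In practice you would stream the real multi-gigabyte files from disk rather than an in-memory string, but the parsing itself is this simple.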


the license prohibits deploying SNOMED as part of a product

Oops, done that before now…


This is a real problem and does cause severe issues everywhere. Unfortunately medicine does change over time so a certain amount of drift and change is inevitable. The alternative of not allowing changes to the hierarchy caused problems with older terminologies like Read.

The non-human-comprehensible code is a deliberate part of the design of terminologies like SNOMED CT. It comes from ‘Desiderata for Controlled Medical Vocabularies in the Twenty-First Century’ by James J. Cimino (Department of Medical Informatics, Columbia University, New York, USA).
Ideally the user of a system should never be exposed to the code; the code exists for interoperability, to uniquely identify things. Storage isn’t the reason for a non-human-comprehensible code.

A lot of the problem here is that the people tasked with developing the existing terminologies like SNOMED CT are not techies or programmers, so they don’t understand what needs to be developed.


Ta da… in JSON or XML ValueSets

It won’t work on large CodeSystems such as SNOMED or LOINC (or expand SNOMED ValueSets), because for those we need a terminology server.

Well, JSON is one of many lingua francas, and it only works some of the time. Turtle syntax is another, one more relevant to this situation. Serialised object dump formats are always a secondary consideration; the formal models of anything are what really matter. Abstract syntaxes are more important (Java, Ruby, OWL, ADL, etc.).

Some things are down to history as well. For example, with archetypes, we invented a format now called ODIN 15 years ago, when there was no JSON, and that does a lot more than JSON. If there had been JSON, we would have had to seriously upgrade it to some sort of JSON2. But today, we can pump out an archetype in JSON, YAML, XML, ODIN and ADL. These formats come and go.

Speaking as someone with a medical degree and more than 5 years of experience coding SNOMED CT tools, I don’t really understand SNOMED CT.

The best I could usually manage was to fit enough of the semantics in my head to work on the bit of tooling I was developing. Which is fine if you’re a programmer, because you get to leave chunks of your mind lying around in text files for the computer to use later.

For clinicians, who also have to remember how to medicine? I’m not really sure it’s practical…

On the ‘lack of ready readability/comprehensibility of the code’: the answer is the same reason we actually need terminology systems. Language is ambiguous; codes are not. Programs need codes; humans need terms and phrases. Codes should be opaque: not internally processable, and holding no intrinsic semantics. Codes are mapped to semantics in the terminology system, which can also store synonyms, acronyms, and related terms, so humans can find the correct concept to record; that concept is then stored as a code. Analytics works over those recorded codes, not over terms and phrases.

Also, concepts can be used not just individually, but ontologically. In SNOMED CT, for instance, you have expressions to express more generic concepts or to mix concepts together. You have a code for each type of diabetes, but you can say “any type of diabetes” in an expression, use it to query a database, and get back the patients with that health problem. The same happens with ontologies of drugs. The power is really in having all that ontology behind it, not in using individual codes directly.
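The ‘any type of diabetes’ example can be made concrete. In SNOMED’s Expression Constraint Language it would be written as a descendant-or-self query, `<< 73211009 |Diabetes mellitus|`; the toy hierarchy and patient records below are invented purely to show the shape of the idea:

```python
# Toy is-a hierarchy and patient records, invented for illustration.
IS_A = {  # child concept -> set of parent concepts
    "type-1-diabetes": {"diabetes-mellitus"},
    "type-2-diabetes": {"diabetes-mellitus"},
    "diabetes-mellitus": {"endocrine-disorder"},
    "asthma": {"respiratory-disorder"},
}

def descendants_or_self(root: str) -> set:
    """All concepts subsumed by `root`, including itself (like ECL's `<<` operator)."""
    result = {root}
    changed = True
    while changed:
        changed = False
        for child, parents in IS_A.items():
            if child not in result and parents & result:
                result.add(child)
                changed = True
    return result

# Recorded codes per patient; the query works over codes, never over free text.
patients = {"pat-1": "type-2-diabetes", "pat-2": "asthma"}
diabetic = {p for p, code in patients.items()
            if code in descendants_or_self("diabetes-mellitus")}
print(diabetic)  # {'pat-1'}
```

The recorded data never changes; only the subsumption query interprets ‘any type of diabetes’ against the ontology.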


I’m not going to quote my sources, but there is a view that the advanced features of SNOMED are actually preventing its adoption. If we didn’t push them, adoption and understanding of SNOMED would increase.


A related post, concentrating on the technical side, is here: HOWTO - HL7v3/IHE XDS OID's, URI's and HL7v2 Tables and new Terminology Support in CCRI


Interesting to read this thread, and I just wanted to add a few things (putting my cards on the table: I work at SNOMED International and so you can guess which side of the argument I fall on!)

Whilst the terminology is licensed (that’s a debate for others to have), the days of the infamous Workbench and other proprietary software mentioned in this thread are long gone (including the BDB!). We develop a lot of software, all of which is available as Apache v2 open source, and all of which is built with making SNOMED CT easier to use in mind.

Of probably most interest to those on this thread is our open source SNOMED CT terminology server, GitHub - IHTSDO/snowstorm: Scalable SNOMED CT Terminology Server using Elasticsearch, making it very easy to load SNOMED CT (in 15 minutes), query and access the terminology over both FHIR and more direct REST APIs. There’s a lot of other software available there including that used for the current NHS SNOMED CT terminology browser.
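For the curious: a FHIR-capable terminology server like Snowstorm can expand an ‘implicit’ SNOMED CT value set defined by an ECL expression via the standard `ValueSet/$expand` operation. A sketch of building such a request (the base URL is a placeholder, and you should check your server’s documentation for exact parameter support):

```python
from urllib.parse import urlencode

# Sketch of a FHIR ValueSet $expand request for an implicit SNOMED CT value
# set defined by an ECL expression. The base URL is a placeholder.
base = "https://example.org/fhir"
ecl = "<< 73211009 |Diabetes mellitus|"  # descendant-or-self of diabetes mellitus
implicit_vs = "http://snomed.info/sct?fhir_vs=ecl/" + ecl

# urlencode percent-encodes the whole value-set URI (including the ECL) once.
request_url = f"{base}/ValueSet/$expand?" + urlencode({"url": implicit_vs, "count": 10})
print(request_url)
```

The response is a standard FHIR ValueSet resource whose `expansion.contains` lists the matching concepts, so ordinary FHIR client tooling can consume it.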

The software sitting in our GitHub repos hopefully helps to make it a little more approachable, and we are always looking to help where we can, so please do reach out to us.


It sounds exactly like something I’ve been looking for.

Would this work with UK RF2 SNOMED? (Sounds like it.)


Yes, it should work with any SNOMED CT extension/edition. We’ve not tested with the UK edition, but have done with others, including those in other languages. Now my curiosity is piqued, I’ll try it out and see if it does.
