As far as I am given to understand right now, SNOMED-CT is the ‘only terminological game in town’. Yet it’s proprietary and only available under license, which I think is against the most fundamental principles of medicine (see my blog post here for a more detailed and ranty explanation of what I mean). We need something open, permissively-licensed, and - crucially - SIMPLE so that clinicians can use and understand it.
An open clinical terminology that can ‘guide and teach’ as you use it (to paraphrase the inestimable Larry Weed).
I’ve started this thread to explore in more detail how you could use modern tooling and development paradigms to develop a completely free, open clinical terminology.
Initially I kind of want to start a discussion here of what we understand by a clinical terminology. But I’d also like to understand how you might design a new terminology to avoid some of the problems that existing terminologies end up with:
- General clinician-unfriendliness of terminology - as a terminology matures, it seems to become steadily more ‘terminologically perfect’, yet steadily more unusable for its primary purpose by everyday clinicians. This is the main issue we need to address. Most general clinicians don’t understand the differences between similar-sounding concepts from different parts of SNOMED’s hierarchy, for example. Older, simpler clinical coding systems like Read were smaller and less expressive, but much more understandable by clinicians. Education alone cannot fix this; the actual terminology needs to be fit for its primary purpose.
- Lack of ready readability/comprehensibility of the ‘code’ - now that data storage technology has rendered obsolete the need to reduce the number of bytes being stored by replacing the text with a 4- or 5-byte code, why do we need to use non-human-comprehensible labels such as SCT 22298006 to mean something that could be written as ‘myocardial infarction’?
- Changes break queries - retiring or redefining codes and the relationships between codes over time results in historical queries which are inconsistent, and can only be done using a ‘transitive closure table’ - can that be designed out?
- Is it a namespace? - most clinical terminologies have ‘Concepts’ and these function as a kind of namespace - for example a Read Code |Xa18W|Tympanic membrane structure| creates a unique point of reference that ‘means’ that atomic clinical concept, and can be used to represent that clinical concept in the abstract, just as all language is a namespace in this regard. Is it enough just to define the namespace? This is what clinicians want and need after all.
- Is it a hierarchy? - Concepts may have hierarchical relationships in multiple directions (parent/child, and association/feature_of, and more) but does this layer necessarily need to be part of the terminology? Does this layer just complicate the simplicity of a flat namespace?
- Is it a ‘language’? - features of terminologies like SNOMED-CT, such as post-coordination and even the way pre-coordinated codes are created, allow for ‘composition’ of complex concepts from more basic ones. But it’s clumsy, and implementation is very bespoke to the terminology. Could we learn from regular programming languages how to do this compositional work better? Alternatively, should it just be ditched as utter madness?
- Distribution - from what I’ve seen in the world of terminologies, the distribution mechanisms fall way short of what the rest of the tech industry uses to distribute and manage dependencies in other knowledge artefacts such as software libraries. Instead of downloading enormous TSV text files and munging them manually, I want to be able to do something that would make sense in a programming context and enable me to use existing tools to manage versions, dependencies, and updates.
- Use existing stuff, don’t invent weird new jargon - the tech world has many excellent tools to serve us in our mission - e.g. JSON is the lingua franca of information exchange across the web, and so a terminology that is natively expressed and distributed in JSON would be easier to get to grips with for developers than something you need to manually
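
To make a few of the points above a bit more concrete, here is a very rough Python sketch of what a JSON-native terminology might look like. Everything in it is an assumption for the sake of discussion, not a proposal for the actual format: the JSON layout, the slug-style identifiers, the `replaced_by` field, and the function names are all invented. It shows a flat namespace of human-readable concepts, an optional `is_a` layer kept separate from that namespace, retired identifiers that still resolve, and the transitive closure derived on the fly rather than shipped as a table.

```python
import json

# Hypothetical JSON layout (an assumption, not an existing standard):
# identifiers are readable slugs, and the hierarchy is a separate layer.
TERMINOLOGY_JSON = """
{
  "version": "0.1.0",
  "concepts": {
    "heart_disease": {"label": "Heart disease"},
    "ischaemic_heart_disease": {"label": "Ischaemic heart disease"},
    "myocardial_infarction": {"label": "Myocardial infarction"},
    "heart_attack": {"label": "Heart attack",
                     "replaced_by": "myocardial_infarction"}
  },
  "is_a": [
    ["ischaemic_heart_disease", "heart_disease"],
    ["myocardial_infarction", "ischaemic_heart_disease"]
  ]
}
"""


def resolve(concepts, concept_id):
    """Follow 'replaced_by' pointers so retired identifiers still resolve."""
    seen = set()
    while "replaced_by" in concepts[concept_id]:
        if concept_id in seen:
            raise ValueError(f"replacement cycle at {concept_id}")
        seen.add(concept_id)
        concept_id = concepts[concept_id]["replaced_by"]
    return concept_id


def transitive_closure(is_a):
    """Map each concept to the set of all its ancestors, given direct pairs."""
    parents = {}
    for child, parent in is_a:
        parents.setdefault(child, set()).add(parent)

    closure = {}

    def ancestors(node, path=()):
        if node in path:
            raise ValueError(f"is_a cycle at {node}")
        if node not in closure:
            result = set()
            for parent in parents.get(node, ()):
                result.add(parent)
                result |= ancestors(parent, path + (node,))
            closure[node] = result
        return closure[node]

    for child in parents:
        ancestors(child)
    return closure


def descendants(closure, ancestor):
    """All concepts that have 'ancestor' somewhere above them."""
    return {c for c, above in closure.items() if ancestor in above}


terminology = json.loads(TERMINOLOGY_JSON)
closure = transitive_closure(terminology["is_a"])

print(resolve(terminology["concepts"], "heart_attack"))
# prints: myocardial_infarction
print(sorted(closure["myocardial_infarction"]))
# prints: ['heart_disease', 'ischaemic_heart_disease']
```

The point of the sketch is only that the namespace works on its own without the `is_a` layer, and that a query for ‘all kinds of heart disease’ can be answered from a small derived structure. Whether that scales to a real terminology is exactly the kind of thing I’d like to discuss.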
And myriad other questions:
- How are other similar big reference ontologies elsewhere in the tech world handled?
- Is there any way to prevent the inevitable mention of RDF triples and OWL in this thread?
- Can you manage a project like this on GitHub or similar, using open discussion, Issues, and Pull Requests for change management?
Let’s have your thoughts and comments right here!
Marcus