# An Open Clinical Terminology?

**Category:** [oct](https://openhealthhub.org/c/oct/58)
**Created:** 2019-03-08 14:06 UTC
**Views:** 7068
**Replies:** 58
**URL:** https://openhealthhub.org/t/an-open-clinical-terminology/2012

---

## Post #1 by @pacharanero

As far as I am given to understand right now, SNOMED-CT is the 'only terminological game in town'. Yet it's proprietary and only available under license, which I think is against the most fundamental principles of medicine. (see my blog post [here](https://medium.com/@marcus_baw/open-source-is-the-only-way-for-medicine-9e698de0447e) for a more detailed and ranty explanation of what I mean).

We need something open, permissively-licensed, and - crucially - SIMPLE so that clinicians can use and understand it. An open clinical terminology that can 'guide and teach' as you use it (to paraphrase the inestimable [Larry Weed](https://www.youtube.com/watch?v=qMsPXSMTpFI) ).

I've started this thread to explore in more detail how you could use modern tooling and development paradigms to develop a completely free, open clinical terminology.

Initially I kind of want to start a discussion here of what we understand by a clinical terminology. But I'd also like to understand how you might design a new terminology to avoid some of the problems that existing terminologies end up with:

* **General clinician-unfriendliness of terminology** - as a terminology matures, it seems to become steadily more 'terminologically perfect', yet steadily more **unusable for its primary purpose** by everyday clinicians. This is the main issue we need to address. Most general clinicians don't understand the differences between similar-sounding concepts from different parts of SNOMED's hierarchy, for example. Older, simpler clinical coding systems like Read were smaller and less expressive, but much more understandable by clinicians. Education alone cannot fix this, the actual terminology needs to be fit for its primary purpose.
* **Lack of ready readability/comprehensibility of the 'code'** - now that data storage technology has rendered obsolete the need to reduce the number of bytes being stored by replacing the text with a 4- or 5-byte code, why do we need to use non-human-comprehensible labels such as SCT `22298006` to mean something that could be written as `myocardial infarction`?
* **Changes break queries** - retiring or redefining codes and the relationships between codes over time results in historical queries which are inconsistent, and can only be done using a 'transitive closure table' - can that be designed out?
* **Is it a namespace?** - most clinical terminologies have 'Concepts' and these function as a kind of namespace - for example a Read Code `|Xa18W|Tympanic membrane structure|` creates a unique point of reference that 'means' that atomic clinical concept, and can be used to *represent that clinical concept in the abstract*, just as *all* language is a namespace in this regard. Is it enough **just** to define the namespace? This is what clinicians want and need after all.
* **Is it a hierarchy?** - Concepts may have hierarchical relationships in multiple directions (parent/child, and association/feature_of, and more) but does this layer necessarily need to be **part** of the terminology? Does this layer just complicate the simplicity of a flat namespace?
* **Is it a 'language'?** - elements of terminologies such as SNOMED-CT, such as post-coordination and even the way pre-coordinated codes are created, allow for 'composition' of complex concepts from more basic ones. But it's clumsy and implementation is very bespoke to the terminology. Could we learn from regular programming languages how to do this compositional work better? Alternatively, should it just be ditched as utter madness?
* **Distribution** - from what I've seen in the world of terminologies, the distribution mechanisms fall way short of what the rest of the tech industry uses to distribute and manage dependencies in other knowledge artefacts such as software libraries. Instead of downloading enormous TSV text files and munging them manually, I want to be able to do something that would make sense in a programming context and enable me to use existing tools to manage versions, dependencies, and updates.
* **Use existing stuff, don't invent weird new jargon** - the tech world has many excellent tools to serve us in our mission - eg JSON is the lingua franca of information exchange across the web, and so a terminology that is **natively expressed** and **distributed in** JSON would be easier to get to grips with for developers than something you need to manually

**And myriad other questions:**

* How are other similar big reference ontologies **elsewhere in the tech world** handled?
* Is there any way to prevent the inevitable mention of RDF triples and OWL in this thread? ;-)
* Can you manage a project like this on GitHub or similar, using open discussion, Issues, and Pull Requests for change management?

Let's have your thoughts and comments right here!

Marcus

---

## Post #2 by @jkmcs

Because we build solutions to non-clinical health and care challenges this does not affect us per se, but happy to learn more about SNOMED-CT. Agree it should be freely available. 'Available under license' is a euphemism for cha-ching.

---

## Post #3 by @wolandscat

Marcus, you asked a lot of questions, some of which I might attempt a direct answer to. But initially, I'll repost something from the openEHR lists from Mar 2018 on how I would do the big picture differently.
[original ref](https://www.mail-archive.com/openehr-clinical@lists.openehr.org/msg04389.html)

The killer move would be to do something I advocated for years unsuccessfully: *separate SNOMED technology from content* and allow them to be independently licensable and used. Here, technology means representation (RF2 for example), open source programming libraries for working with ref-sets, specs and implems for e.g. the constraint language, URIs and so on.

It should be possible for a country (the one I am most familiar with w.r.t. terminology today is Brazil) to create an empty 'SNOMED container' of its own, and put its existing terminologies in there - typically procedure lists, drug codes, lab codes, devices & prosthesis codes, packages (chargeable coarse-grained packages like childbirth that you get on a health plan) and so on. There are usually <20k or even 10k such codes for most countries (UK and US would be an exception), not counting lab analyte codes (but even there, 2000 or so codes would take care of most results). But the common situation is that nearly every country has its own version of these things, and they are far smaller than SNOMED.

Now, SNOMED's version of things is usually better for *some* of that content, but in some cases, *it is missing concepts*. The ability to easily create an empty SNOMED repo, fill it with national vocabularies, have it automatically generate non-clashing (i.e. with other countries, or the core) concept codes and mappings, and then serve it from a standard CTS2 (or other decent standard) terminology service would have revolutionised things in my view. This pathway has not been obviously available however, and has been a real blockage. The error was not understanding that the starting point for most countries isn't the international core, it's their own vocabularies.
The second killer feature would have been to *make creating and managing ref-sets for data/form fields much easier*, based on a subsetting language that can be applied to the core, and tools that implement that. Ways are needed to make the local / legacy vocabularies that have been imported, to look like a regular ref-set.

The third killer feature would have been to *make translation tools work* on the basis of legacy vocabulary and new ref-sets, not on the basis of the huge (but mostly unused) international core.

I think IHTSDO's / SNOMED International's emphasis has historically been on curating the core content, and making/buying tools to do that (the IHTSDO workbench, a tool that comes with its own PhD course), rather than promulgating SNOMED technology and tooling to enable the mess of real world content in each country to be rehoused in a standard way, and incrementally joined up by mapping or other means to the core. I think the latter would have been more helpful.

There is additionally an elephant in the room: *IHTSDO (now SNOMED International) has been tied to a single terminology - SNOMED CT*, but it would have been better to have had a terminology standards org that was independent of any particular terminology, and worked to create a truly terminology-independent technology ecosystem, along with technical means of connecting terminologies to each other, without particularly favouring any one of them. It's just a fact that the world has LOINC, ICDx, ICPC, ICF and hundreds of other terminologies that are not going anywhere. What would be useful would be to:

* classify them according to meta-model type - e.g. multi-hierarchy (Snomed); single hierarchy (ICDx, ICPC, ...
); multi-axial (LOINC); units (UCUM, ...), etc
* build / integrate technology for each major category - I would guess <10
* help the owning orgs slowly migrate their terminologies to the appropriate representation and tools
* embark on an exercise to graft in appropriate upper level ontology/ies, i.e. BFO2, RO, and related ontologies (this is where the <10 comes from by the way)
* specify standards for URIs, querying, ref-sets that *work across all terminologies*, not just SNOMED CT

A further program would look at integrating units (but not by the current method of importing to SNOMED, which is a complete error because of the different meta-models), drugs and substances (same story), lab result normal and other range data, and so on. None of this can be done without properly studying and developing the underlying ontologies, which are generally small, but subtle.

I'll stop there for now. I suspect I have kicked the hornet's nest, but since Grahame kicked it first, and I can run faster than him, I feel oddly safe. Probably an illusion.

---

## Post #4 by @skoba

Have you seen "The Great Passage"? The novel is a story about making a Japanese dictionary, and it was adapted into a drama film and an anime series. The story describes a man who devoted his life to updating a dictionary of about 300,000 words, as many as there are terms in SNOMED CT.

Making a terminology is similar to making a dictionary. It takes a lot of time and money. I agree that SNOMED has problems, but I could not find alternatives for funding. Even Wikipedia has to collect huge donations to run. If it were possible, the shortest way would be to buy SNOMED and release it for free.

---

## Post #5 by @ewan

Is this really a priority? It's a massive task and SNOMED-CT does the job.

SNOMED has its problems and carries a lot of baggage from its history, but if you ignore some of its complexity (the point Marcus makes about it trying to be a language) it does the job.
Open source purists may not like its licensing model, but it's free to use in those countries that are members (like the UK) and fees reflect the wealth of the members and I would suggest are affordable even for poor countries (although I would like to see SNOMED International drop the fee where this is currently under $5,000 pa).

Given that there are no examples of economically sustainable large-scale open source projects in health, since Donald Trump pulled the legs out from under VistA, SNOMED's reluctance to go open source is understandable. SNOMED International are after all a not-for-profit NGO who I'm sure would like to go open source if they believed it was economically sustainable.

As an open source enthusiast (I'm writing this on my Linux desktop) I say this with some sadness. My view is that we should focus our efforts on fixing issues with SNOMED, not starting again.

Ewan

---

## Post #6 by @pacharanero

[quote="ewan, post:5, topic:2012"]
Is this really a priority
[/quote]

What we are suffering from is a total lack of decent infrastructure. That's why everything is SO DAMN HARD and takes so long in health tech (unless you are selling snake oil AI in which case it's all about the marketing not the product). So yes, stuff like this IS a priority. Anyway, what does it matter to anyone else what I prioritise? It's not like I was doing anything else...

And... it's not just a call of 'we need an open terminology because Open' - it's also 'we need terminologies that don't require 25 years of experience to use' - we need the tooling, the language-specific helper packages, the utilities, the distribution mechanisms...

---

## Post #7 by @adrian.wilkins

I think this is an illustration of one of my core bones of contention with the focus of EHR software - all icing, no cake. Structured data that systems can understand and make use of is a lovely goal, but only if you have the basics.
* Universally accessible records
* No special software required (or only universally available software)
* Good access control / privacy
* Excellent audit of access and edits

You don't need SNOMED CT to make medical records useful to humans - they only need natural language to understand them. SNOMED CT is compensation for the lack of context awareness of most computer systems in processing natural language. It's a highly unnatural language. In most cases, I would imagine its introduction reduces usability, for the humans.

> IHTSDO Workbench ... comes with its own PhD course

If it's still the same software (big hulky Java thing?), it was the product of a PhD course ... and There Be Dragons. If it is still the same software, and you have a copy ... you're probably entitled to the source code. Because it's powered by Berkeley DB, which has a copyleft license unless you have a commercial license for it. But please don't ask for it.

> What we are suffering from is a total lack of decent infrastructure.

I don't think SNOMED CT counts as infrastructure. It's firmly in the upper reaches of Layer 7 - detail, not ditch-digging. Its main importance is not its utility (as multiple people are opining and IMHO also, it's really hard to make it useful), but the fact that it's mandated.

---

## Post #8 by @mayfield.g.kev

What do you want to use it for? I'm interested in what is the 80%. What I've seen over the years is:

* At a basic level (the 1990's GP system approach?)
is to pick a code from something that looks like this https://snomedbrowser.com/
* A more recent version (year 2000 GP system) is for the clinician to type in a few letters and be prompted for the correct term (predictive text/coding)
* On the technical side, I've had requirements for answering some basic queries on the terminology (is this code a child of this, what is the SNOMED equivalent of this NHS Data Dictionary code, etc)

What I've not seen is a demand for:

* large reference sets
* pre/post coordinated concepts.

---

## Post #9 by @mayfield.g.kev

Also, I believe the license prohibits deploying SNOMED as part of a product. So SNOMED files can be picked up and used, open source server side software such as HAPI JPA Server can import these files .... but distributing HAPI with SNOMED included wouldn't be allowed. (So you need a few hours to load it in).

---

## Post #10 by @adrian.wilkins

> the license prohibits deploying SNOMED as part of a product

Oops, done that before now...

---

## Post #11 by @stuartabbott

[quote="pacharanero, post:1, topic:2012"]
**Changes break queries** - retiring or redefining codes and the relationships between codes over time results in historical queries which are inconsistent, and can only be done using a ‘transitive closure table’ - can that be designed out?
[/quote]

This is a real problem and does cause severe issues everywhere. Unfortunately medicine does change over time so a certain amount of drift and change is inevitable. The alternative of not allowing changes to the hierarchy caused problems with older terminologies like Read.
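
For anyone unfamiliar with the 'transitive closure table' mentioned in the quote above, here is a minimal Python sketch of the idea. The tiny is-a hierarchy and its concept names are invented purely for illustration (they are not taken from SNOMED CT or Read), and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical is-a edges as (child, parent) pairs; illustrative names only.
IS_A = [
    ("type 1 diabetes", "diabetes mellitus"),
    ("type 2 diabetes", "diabetes mellitus"),
    ("diabetes mellitus", "endocrine disorder"),
    ("diabetes mellitus", "metabolic disorder"),
]


def transitive_closure(edges):
    """Return every (descendant, ancestor) pair implied by the is-a edges."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    closure = set()
    for start in list(parents):
        # Walk upwards from each concept, collecting all reachable ancestors.
        stack = list(parents[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(parents.get(node, ()))
    return closure


def descendants(closure, ancestor):
    """All concepts that sit anywhere below `ancestor` in the hierarchy."""
    return {d for d, a in closure if a == ancestor}


CLOSURE = transitive_closure(IS_A)
```

Because the table is derived from the edges of one particular release, any change to the hierarchy means rebuilding it, which is roughly why the same query run against two releases can return different results.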
[quote="pacharanero, post:1, topic:2012"]
**Lack of ready readability/comprehensibility of the ‘code’** - now that data storage technology has rendered obsolete the need to reduce the number of bytes being stored by replacing the text with a 4- or 5-byte code, why do we need to use non-human-comprehensible labels such as SCT `22298006` to mean something that could be written as `myocardial infarction`
[/quote]

The non-human-comprehensible code is a deliberate part of the design of terminologies like SNOMED CT. It comes from '[Desiderata for Controlled Medical Vocabularies in the Twenty-First Century by James J. Cimino; Department of Medical Informatics, Columbia University, New York, USA](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3415631/)'. Ideally the user of a system should never be exposed to the code; the code is for interoperability, to uniquely identify things. Storage isn't the reason for a non-human-comprehensible code.

A lot of the problem here is that the people who are tasked with developing the existing terminologies like SNOMED CT are not techies or programmers, so don't understand what needs to be developed.

---

## Post #12 by @mayfield.g.kev

[quote="pacharanero, post:1, topic:2012"]
**Use existing stuff, don’t invent weird new jargon** - the tech world has many excellent tools to serve us in our mission - eg JSON is the lingua franca of information exchange across the web, and so a terminology that is **natively expressed** and **distributed in** JSON would be easier to get to grips with for developers than something you need to manually
[/quote]

Ta da... in JSON or XML

[ValueSets](https://data.developer-test.nhs.uk/ccri/term/valuesets)
[CodeSystems](https://data.developer-test.nhs.uk/ccri/term/codesystem)

It won't work on a large CodeSystem such as SNOMED or LOINC (or expand SNOMED valuesets, because we need a terminology server)

---

## Post #13 by @wolandscat

Well, JSON is one of many lingua francas, and it only works some of the time.
Turtle syntax is another, that is more relevant to this situation. Serialised object dump format is always a secondary consideration; the formal models of anything are what really matters. Abstract syntaxes are more important (Java, Ruby, OWL, ADL, etc).

Some things are down to history as well. For example, with archetypes, we invented a format now called ODIN 15 years ago, when there was no JSON, and that does a lot more than JSON. If there had been JSON, we would have had to seriously upgrade it to some sort of JSON2. But today, we can pump out an archetype in JSON, YAML, XML, ODIN and ADL. These formats come and go.

---

## Post #14 by @adrian.wilkins

[quote="stuartabbott, post:11, topic:2012"]
the people who are tasked with developing the existing terminologies like SNOMED CT are not techies or programmers so don’t understand what needs to be developed
[/quote]

Speaking as someone with a medical degree and more than 5 years of experience coding SNOMED CT tools, I don't really understand SNOMED CT. The best I could usually manage was to fit enough of the semantics in my head to work on the bit of tooling I was developing. Which is fine if you're a programmer, because you get to leave chunks of your mind lying around in text files for the computer to use later. For clinicians, who also have to remember how to medicine? I'm not really sure it's practical...

---

## Post #15 by @ppazos

About **Lack of ready readability/comprehensibility of the ‘code’**: the answer is the same reason we actually need terminology systems: language is ambiguous, codes are not. Programs need codes, humans need terms and phrases. Codes should be opaque: not processable internally and not holding intrinsic semantics. Codes are mapped to semantics in the terminology systems, which can also store synonyms, acronyms, and related terms, so humans can find the correct concept to record, and that is then stored as a code. Analytics work over those recorded codes, not over terms and phrases.
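
A minimal Python sketch of that split between human-facing terms and opaque stored codes might look like the following. The SNOMED CT code `22298006` for myocardial infarction comes from earlier in the thread; the synonym list, data layout and function names are assumptions made up for illustration.

```python
# Opaque code -> preferred term plus synonyms.
# The synonyms here are illustrative assumptions, not an official list.
CONCEPTS = {
    "22298006": {
        "preferred": "myocardial infarction",
        "synonyms": ["heart attack", "MI"],
    },
}


def build_term_index(concepts):
    """Map every lower-cased term or synonym to the set of codes it may mean."""
    index = {}
    for code, entry in concepts.items():
        for term in [entry["preferred"], *entry["synonyms"]]:
            index.setdefault(term.lower(), set()).add(code)
    return index


def find_codes(index, text):
    """Codes whose terms exactly match what the human typed (case-insensitive)."""
    return index.get(text.strip().lower(), set())


def display(concepts, code):
    """Turn a stored code back into a term for a human reader."""
    return concepts[code]["preferred"]


TERM_INDEX = build_term_index(CONCEPTS)
```

The human types a phrase, the record stores only the code, and the term is looked up again whenever the record is shown. The index maps each term to a set of codes so that an ambiguous acronym could, in principle, point at more than one concept.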
Also concepts can be used not individually, but ontologically. In SNOMED CT for instance you have expressions to express more generic concepts or to mix concepts together. For instance you have a code for each type of diabetes, but you can say "any type of diabetes" in an expression and use that to query a database and get patients back with that health problem. The same happens with ontologies of drugs. The power is really in having all that ontology behind, not in using individual codes directly.

---

## Post #16 by @mayfield.g.kev

I’m not going to quote my sources but there is a view that the advanced features of SNOMED are actually preventing its adoption. If we don’t push them, then adoption and understanding of SNOMED would increase.

---

## Post #17 by @mayfield.g.kev

Related post is here, this is concentrating on the technical side

https://www.openhealthhub.org/t/howto-hl7v3-ihe-xds-oids-uris-and-hl7v2-tables-and-new-terminology-support-in-ccri/2036

---

## Post #18 by @rory.davidson

Interesting to read this thread, and I just wanted to add a few things (putting my cards on the table: I work at SNOMED International and so you can guess which side of the argument I fall on!)

Whilst the terminology is licensed (that's a debate for others to have), the days of the infamous Workbench and other proprietary software mentioned in this thread are long long gone (including the BDB!). We develop a lot of software, all of which is available as Apache v2 open source, and all of which is built with the aim of making it easier to use SNOMED CT.

Of probably most interest to those on this thread is our open source SNOMED CT terminology server, https://github.com/IHTSDO/snowstorm, making it very easy to load SNOMED CT (in 15 minutes), query and access the terminology over both FHIR and more direct REST APIs. There's a lot of other software available there including that used for the current NHS SNOMED CT terminology browser.
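
As a rough idea of what querying a terminology server over FHIR can look like, the Python sketch below builds a standard FHIR `CodeSystem/$lookup` request and reads the display term out of a `Parameters` response. The base URL `http://localhost:8080/fhir` is an assumption about a local install, the sample response is hand-written rather than captured from a real server, and no network call is made.

```python
from urllib.parse import urlencode

SNOMED_SYSTEM = "http://snomed.info/sct"
# Assumed base URL of a locally running FHIR terminology server.
FHIR_BASE = "http://localhost:8080/fhir"


def build_lookup_url(code, base=FHIR_BASE, system=SNOMED_SYSTEM):
    """Build the URL for a FHIR CodeSystem/$lookup GET request."""
    query = urlencode({"system": system, "code": code})
    return f"{base}/CodeSystem/$lookup?{query}"


def extract_display(parameters):
    """Pull the 'display' value out of a FHIR Parameters resource, if present."""
    for param in parameters.get("parameter", []):
        if param.get("name") == "display":
            return param.get("valueString")
    return None


# Hand-written example of the general shape of a $lookup response
# (an assumption for illustration, not real server output).
SAMPLE_RESPONSE = {
    "resourceType": "Parameters",
    "parameter": [
        {"name": "name", "valueString": "SNOMED CT"},
        {"name": "display", "valueString": "Myocardial infarction"},
    ],
}
```

In practice the URL would be fetched with an HTTP client and the JSON body passed to `extract_display`; keeping the two steps as pure functions makes them easy to test without a running server.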
The software sitting in our GitHub repos hopefully helps to make it a little more approachable, and we are always looking to help where we can, so please do reach out to us.

---

## Post #19 by @mayfield.g.kev

It sounds exactly like something I've been looking for. Would this work with UK RF2 SNOMED? (Sounds like it).

---

## Post #20 by @rory.davidson

Yes, it should work with any SNOMED CT extension/edition. We've not tested with the UK edition, but have done with others, including those with other languages. Now my curiosity is piqued, I'll try it out and see if it does.

---

## Post #21 by @pacharanero

[quote="rory.davidson, post:18, topic:2012"]
Of probably most interest to those on this thread is our open source SNOMED CT terminology server, [GitHub - IHTSDO/snowstorm: Scalable SNOMED CT Terminology Server using Elasticsearch ](https://github.com/IHTSDO/snowstorm), making it very easy to load SNOMED CT (in 15 minutes), query and access the terminology over both FHIR and more direct REST APIs. There’s a lot of other software available there including that used for the current NHS SNOMED CT terminology browser.
[/quote]

This is fantastic stuff and is exactly what I had hoped SNOMED would be doing - putting out good and usable, permissively-licensed open source tooling to make it easier to work with.

The Dockerization of the stack also really helps for those who don't want to sully their development machine with Java badness ;-) Top marks for the `docker-compose.yml` which makes getting the whole stack up and running super easy.

I'm still surprised that there hasn't been a better way developed for uploading the SNOMED-CT files - ie could this step not be managed with some kind of SNOMED 'package manager' or even using a Git server and URLs? I appreciate it's only a single manual step, but manual steps are the enemy of continuous integration and other forms of automation.

M

---

## Post #22 by @rory.davidson

Yes, absolutely.
For now, we're trying to walk before we can run and have been focussed on making sure the main functionality is good to go. This terminology server is also the one that we will use for authoring/managing the terminology so it has a rich feature set, most of which is not needed for most users.

However, we've had requests for this from developers in other countries, so our plan is to develop the functionality to allow devs/users to 'request' an edition/extension which will then be retrieved and imported without any other manual intervention, with the correct version being imported (a snapshot version or a delta if there is an existing version in the terminology server). Hopefully later this year, after we've completed full FHIR compliance and other things on our shopping list.

---

## Post #23 by @adrian.wilkins

[quote="pacharanero, post:21, topic:2012"]
some kind of SNOMED ‘package manager’ or even using a Git server and URLs
[/quote]

We got the ICD-10 toolchain going on Git (and the foundations of the UK ICD-10 tools are as a result arguably better than the WHO ones), but doing this for SNOMED CT is a harder problem, not because of the design of Git (the object model of Git is a great design for lots of collaborative works), but because of the underlying limitations of "file system" as a database. We got away with it on ICD-10 because it involved on the order of 10^5 nodes. SNOMED CT is an order of magnitude bigger at 400k nodes for just the core graph, and while *nix file systems are probably OK with that, NTFS and Windows really start to choke on that many files (not to mention, the multiplication of the typical overhead of all the virus checking and scanning tools most IT departments stick on Windows).

It's a great ambition though. Part of the problem with the (hooray!)
defunct IHTSDO Workbench was that it tried to solve the version control problem with 30 year old version control design (internally it was designed like RCS - only without the convenience of abstracting the version control layer away by) - those models were originally designed for version control of single objects and things like CVS are just hacks on top to orchestrate the companion versioning of multiple objects, rather than models like Git which treat revisions as a single composite object.

One of the things I seriously looked at when working on those tools was backend libraries for Git that used something other than a raw file system for storage, and the state of the art in that space may be a lot more mature now. I had the notion that the systems for authoring and distribution should be not that different and Git seemed like a good choice to stand that on.

The other core design decision that was a problem in those tools was trying to make a generic terminology model that was itself a generalization that would support SNOMED CT. That led to a great deal of complexity that sat squarely on top of what was already a complex metamodel designed to be very general. I note that `snowstorm` is billed as a "SNOMED CT terminology server" and not "a terminology server" and I presume (hopefully) that means it's not trying to be a general terminology server.

---

## Post #24 by @PeterfromRuralOz

The main problem about open sourcing is the ever changing nature making it impossible to be consistent and comparable over time. SNOMED is consistent and abhors change without reason and decisions are accepted and used by all their users. I can't see that happening in the open source world...

---

## Post #25 by @pacharanero

[quote="PeterfromRuralOz, post:24, topic:2012"]
The main problem about open sourcing is the ever changing nature making it impossible to be consistent and comparable over time.
[/quote]

I don't think you've fully understood open source, judging by this comment.
Are you involved directly in any open source projects? There is still centralised control of any open source project. It isn't a complete free-for-all.

[quote="PeterfromRuralOz, post:24, topic:2012"]
SNOMED is consistent and abhors change without reason and decisions are accepted and used by all their users
[/quote]

I would disagree with this quite strongly. Ironically, I am *this week* involved in some discussions at national UK level about serious and potentially breaking changes that SNOMED International are apparently imposing, which will have huge deleterious impact on our install-base of UK GP Systems. (more on this soon)

Yes the UK has representation at SNOMED International, however this tends to be an expert Terminologist rep, not a clinical rep. The proposed changes, while working towards an ontologically perfect terminology, may significantly undermine the actual **primary purpose** of the terminology (ie recording clinical care).

---

## Post #26 by @PeterfromRuralOz

Not sure that open source would be a good solution rather than changing the governance structure of SNOMED International.

---

## Post #27 by @pacharanero

My experience has generally been that "changing the governance structure" of large international organisations is non-trivial.

"Hi SNOMED International, Marcus here. Yes **that** one. I'm new here and I'm not a career terminologist, but I'm just wondering if you'd mind changing your **entire governance structure** for me? Aiming to be more community-oriented, consensual, and clinically-relevant. You know, like an open source project? Hello? Hello? They hung up - can you believe that?"

---

## Post #28 by @Hans_Hendrickx

Marcus, I have a long answer and a short one.

The short one is that I believe that codifying medicine is killing the essence of medicine. William Osler coined the essence as: "Medicine is a science of uncertainty and an art of probability."
During the 50 years I have worked in hospitals all over the world, I always have enjoyed a good letter to amice. The last 20 years at best I had to accept sort of telegram style nonsensical referral letters; mostly the patient was transferred with a one-word referral, like 'headache', 'gallbladder', 'acute abdomen'. Often the single word had no close relationship to the patient or the complaint, without any mention of context. Dr Thornley just wrote a blog about the “[Demise of Medicine](https://www.kevinmd.com/blog/2018/08/the-demise-of-medicine-a-neurologist-advocates-for-patients-and-is-silenced.html)”. It is a disaster waiting to happen?

Surprisingly so, I do like ontologies, because if designed well they represent at each step a question/decision/answer. The problem is the uncertainty we have to deal with. So, the idea of Diagnosis Related Imbursement of doctors is absurd, and yet everywhere in the world entertained by insurance companies and political parties. ICD10-11 is very popular for this purpose, even though it is designed for classifying causes of death, not daily practice. No surprise, in over 40% of death certificates, pathologists cannot relate the text in the certificates with reality; diagnostics in real life are even worse. Relating ICPC’s and ICD10-11 is impossible, and an example of the serious gap between the GP-bubbles and those of specialists.

My conclusion for long is that we have developed natural language over 1000000 years, and 60 years of codifying has been a very nice experiment, which now kills the essence of medicine, the dealing with uncertainty.

The right diagnosis can be pinpointed in 80% of cases by good Medical History Taking. We need natural language for that and smart questions. This is how doctors (should) think, based on symptoms and signs the patient can report and show. That is called communication. The gathered data should be collected into a casebook with a pattern every doctor should have learned in medical school.
In my own experience nowadays, visiting a doctor means an encounter with a person who is glued to a square screen, does not introduce 'it'self, and often turns out to be a nurse or aid. NIH informs patients that a large part of medical care is provided by nurses. In the UK this has been 'codyfied'.

As an anesthesiologist I deal with easy work, which if it goes wrong has serious consequences. This is the typical business model for insurance revenues, dealing with incidents with high impact. That has triggered me into my quest for intelligent, smart and efficient medicine.

IT has a lot to offer, because smart, intelligent and dynamic questionnaires are able to extract very useful data from patients, and those data can be translated into structured and patterned information doctors like. This is the essence of my current work, MediPrepare Open Source Project. Every doctor now could create Expert Medical Systems with our tools by creating questionnaires for all 130+ specialties and incorporate their expert knowledge. The data can be translated into valuable information leading up to a differential diagnostic path which can be started by the patient.

So, I believe that using the route of natural language in Medicine for many years to come will be superior to communicating in digitized codes. Eventually smart computers will be able to dissect our natural language to the point that we can let them communicate by digits. For now Codifying Medicine is killing patients and doctors. In the USA third cause of death is medical mishaps… Maybe we need meaningful IT, created by close cooperation of doctor and programmer, like was used in the Caduceus Project at Pittsburgh University Hospitals around 1980.
My 5 cents,
Hans

---

## Post #29 by @adrian.wilkins

Hans, thank you for so eloquently stating an opinion I arrived at after close to a decade of maintaining healthcare code system maintenance tools, and which I carry forward into my opinions about EHR software: it's all about communication, and the endpoints that really matter are the humans.

If you're going to communicate with the limited endpoints of an API to elicit a service, sure, codify your inputs. But to me, the vast bulk of the drive to codify and structure health data comes from the (noble, but also potentially profitable) desire to mine it for data, rather than the desire to serve an individual patient better.

[quote="Hans_Hendrickx, post:28, topic:2012"]
Diagnosis Related Imbursement of doctors
[/quote]

As you say, getting the right code from ICD-10, which in the UK is around 14,000 codes, is hard enough. Having [your insurance payment](https://www.verywellhealth.com/diagnosis-code-decides-if-medicare-will-pay-3989741) depend on having chosen the right code is harsh. I hear ICD-11 is far more complex. But that is probably a game of tic-tac-toe next to using SNOMED CT for the same purpose.

---

## Post #30 by @mayfield.g.kev

Do any secondary care systems code as you type? So when you type 'patient has *asthma*' it automatically prompts you to select a code for *asthma*? I've not seen it.

I'd always had the impression that doctors coded items because it made it easier for them to drill into the medical record at a later time (in primary care). As a side effect it enabled reporting. [However, in other sectors, coding seems to be done primarily for reporting, not care]

---

## Post #31 by @pacharanero

[quote="mayfield.g.kev, post:30, topic:2012"]
Do any secondary care systems code as you type?? So when type ‘patient has *asthma* ’ it automatically prompts you to select a code for *asthma* ?
[/quote]

I've got so much to say on this I've started a new thread so as not to hijack this one.
https://www.openhealthhub.org/t/clinical-autocomplete-autosuggest-how-not-to-do-it/2049

---

## Post #32 by @PeterfromRuralOz

[quote="pacharanero, post:25, topic:2012"]
The proposed changes, while working towards an ontologically perfect terminology, may significantly undermine the actual **primary purpose** of the terminology (ie recording clinical care).
[/quote]

I am glad that in Australia SNOMED has much less of a focus and impact on the system. Instead we have ICD and the DRG (diagnosis-related group) system, and SNOMED is just an internal thing within hospital systems.

However, the positive side of having better terminology and codification is that it incentivises more specific and complex diagnoses, and rewards the non-specific, minimal-documentation approach to diagnosis less.

---

## Post #33 by @Hans_Hendrickx

Adrian,

In my opinion, mining for data is better done using natural language clues. That way one could easily incorporate adjectives such as severe, thunderstrike-like etc. That is the beauty of modern IT: one can let it learn. My wife makes use of that in the analysis of millions of emails about fraud at the Fraud Help Desk.

Yes, I believe we need a worldwide medical standard vocabulary. However, it is dangerous to use ICPC with 150 codes, ICD11 with 55,000 codes, or SNOMED with 170,000 codes, because these codes do not represent the context of diseases in real patients, do not provide phenotypical clues, and new diseases are constantly being added. I have tried to use ontologies, because each step represents a question. I have given up, because ontologies can't deal well with synonyms yet.

Yes, I try to make use of standardized expressions in questionnaire design. But the main focus is to incorporate expert knowledge, not codes. I have never seen a patient’s life saved by a code.

---

## Post #34 by @Hans_Hendrickx

Kev,

http://scriptiesonline.uba.uva.nl/document/450944 describes the DBC system in the Netherlands.
The assumption is that a diagnosis has already been made on entry into healthcare. A disaster for all parties in my opinion. This has caused a lot of problems. See e.g. https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=2ahUKEwjMoOrUzcfhAhVBblAKHWwTAsgQFjAAegQIBhAC&url=http%3A%2F%2Fwww.cs.uu.nl%2Feducation%2Fscripties%2Fpdf.php%3FSID%3DINF%2FSCR-2012-005&usg=AOvVaw3n2pzYNdX_BAcASmXJjVCW

---

## Post #35 by @pacharanero

Having thought about this a bit more, I think we need to separate the two parts:

1) **Namespace** - **a single immutable label that refers to a single immutable concept for ever**. This should be like a character set eg ASCII or UTF-8. **The code means the character**. That's all. No logic and no ontology *at this stage*. This means that (apart from probably needing to expand the namespace every so often) that code will always mean that character.

2) **Everything else** - the hierarchy, ontology, combinatorial logic etc is added as a further layer or set of layers on top of 1). It doesn't have to be managed by the same people who manage the namespace. There can be a multitude of different hierarchies, ontologies, classifications etc (as indeed there already are). But this would work in the same way as French and English having their own, perfectly internally consistent rules, despite using the same character set. As long as you have a common character set, though, you can build tools that work for both.

---

## Post #36 by @pacharanero

For various reasons I have been discussing the idea of an open clinical terminology again, and actually found this thread by Kagi-searching 'open clinical terminology' - I had completely forgotten about this thread, which has some seriously interesting discussions and heavyweight contributors.

I also found this, an academic article which considers the possibility of bootstrapping an open clinical terminology using Wikipedia content. As far as I can see they concluded yes you probably could do it, but then... didn't.
https://pubmed.ncbi.nlm.nih.gov/31397314/

---

## Post #37 by @mayfield.g.kev

Might be onto something.

I’m finding general understanding of clinical coding is quite low and NHS documentation can get quite heavy quickly. Something in the middle might be quite useful.

Where I’m finding most general understanding is around QOF points (SNOMED) in primary care and financial related coding (ICD) in acute.

---

## Post #38 by @tommyyama2020

[quote="pacharanero, post:36, topic:2012"]
For various reasons I have been discussing the idea of an open clinical terminology again, and actually found this thread by Kagi-searching ‘open clinical terminology’ - I had completely forgotten about this thread, which has some seriously interesting discussions and heavyweight contributors. I also found this, an academic article which considers the possibility of bootstrapping an open clinical terminology using Wikipedia content. As far as I can see they concluded yes you probably could do it, but then… didn’t.
[/quote]

I am interested in volunteering to help develop an open clinical terminology. Is there anyone here from a clinical background who would be open to partnering or collaborating with me?

The clinical space is really broad - there’s so much to cover, from diseases and drugs to procedures and lab data - and I’m not sure which area to focus on first.

---

## Post #39 by @pacharanero

I think it needs to be done. I'm up for doing it and joining forces with anyone else that wants to do it. @tommyyama2020 I'm clinical and we could work together on getting something started. I know from recent conversations that key people in the openEHR and FHIR world are supportive of an open terminology.

### Repository

I have started a GitHub repository here to put initial work in: https://github.com/pacharanero/open-terminology

I'm happy to move it (when some progress has been made) into joint ownership in a GitHub organisation which the main contributors control.
### Vectors and cosine distance

I have been talking about an open terminology with various people over the past years, months and weeks. One very interesting comment made was that if one had an open terminology represented as embeddings (in which each concept is described as a vector in thousands of dimensions) then concepts that are *related* in clinical meaning would be related in the vector representation as well (they would have a smaller cosine distance from each other). This might help solve the problem of synonyms and could even help avoid/reduce the need for hierarchies, refsets, and other groupings.

### Namespace separate from Hierarchy

I think this fundamental design choice is necessary in order to separate the easy bit (namespacing) from the complex and gnarly bit (hierarchy, inheritance, refsets etc). A namespace could pretty easily be crowdsourced. The hierarchy, graph, tree, classifications etc are something that others will create over the top of the namespace, and different hierarchies will be necessary for different use-cases. Doing it this way makes it very flexible.

It also means that the first stage of the terminology need only be a text file with

`IDENTIFIER` `SEPARATOR` `DESCRIPTION`

### Identifiers

How would we want to represent each concept's identifier? My thoughts:

* **Meaningless**: Reluctantly I concede that a non-meaningful string of characters is necessary as an ID.
* **Appearance**: The ID should look *obviously* different from other entities eg. a SNOMED-CT ID, a Read Code, and any other existing type of identifier used in healthcare or common technologies.
* **Size**: We should overestimate the required namespace size by a few orders of magnitude so we don't repeat what happened to Read Codes.
* **Short**: For reasons of practicality we want the ID to be short.

### Separator

I'm open to views on this - initial thoughts are NOT TABS because they can't easily be visually differentiated from spaces in a text editor. Maybe a single space?
The pipe character is tempting but I want to deliberately avoid anything that will attract accusations of plagiarism from SNOMED. (They will likely make that accusation anyway; I think we should be prepared for that, hence open source from the first commit, so that there will be an audit trail of submissions.)

### Description

These can be crowdsourced or automatically scraped from open sources (as in the article [above](https://openhealthhub.org/t/an-open-clinical-terminology/2012/35))

### Reading

I have been reading around this for a while. Here are some resources worth reading:

[Alan Rector 1999: Medical Terminology - why is it so hard?.pdf|attachment](upload://XOzPd0P8ChlEun5amFIBUDnQZi.pdf) (105.1 KB)

[Desiderata for Controlled Medical Vocabularies in the Twenty First Century - JJ Cimino 1998.pdf|attachment](upload://4IhyxdhcbJxGA1gkpc06V5CFsHh.pdf) (715.9 KB)

https://www.mail-archive.com/openehr-clinical@lists.openehr.org/msg04389.html (posted by @wolandscat above)

https://pmc.ncbi.nlm.nih.gov/articles/PMC1480228/

https://www.snomed.org/news/snomed-international-releases-an-open-terminology-to-broaden-clinical-data-interoperability

---

## Post #40 by @mayfield.g.kev

FSH or SUSHI has a method of generating code systems in a similar format, eg https://github.com/nw-gmsa/nw-gmsa.github.com/blob/main/input/fsh/CodeSystem/NWGMSA.fsh

The generated code system is here: https://nw-gmsa.github.io/CodeSystem-NWGMSA.html

---

## Post #41 by @tommyyama2020

@pacharanero Sounds great, thanks for sharing the repository. I am happy to check it out and help get things started. :slightly_smiling_face: :woman_bowing:

@tommyyama2020

---

## Post #42 by @pacharanero

Please do have a read and help me figure out the way everything will work. At this stage I'm still deciding things like the identifier format.
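To make the identifier question concrete, here is a minimal sketch of one candidate format: a fixed-width string over Crockford's base32 alphabet. The 6-character width, the names `encode_id` and `decode_id`, and the idea of allocating IDs from an integer counter are assumptions for illustration only, not settled project decisions.

```python
# Crockford base32 alphabet: digits plus letters, omitting I, L, O and U.
CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
ID_WIDTH = 6  # assumed width; 32**6 is roughly 1.07 billion possible IDs

# Decoding table: canonical symbols plus the look-alike substitutions
# described in Crockford's spec (O reads as 0; I and L read as 1).
_DECODE = {ch: i for i, ch in enumerate(CROCKFORD_ALPHABET)}
_DECODE.update({"O": 0, "I": 1, "L": 1})


def encode_id(n: int, width: int = ID_WIDTH) -> str:
    """Render a non-negative integer as a zero-padded Crockford base32 string."""
    if not 0 <= n < 32 ** width:
        raise ValueError(f"{n} does not fit in {width} base32 characters")
    chars = []
    for _ in range(width):
        n, rem = divmod(n, 32)
        chars.append(CROCKFORD_ALPHABET[rem])
    return "".join(reversed(chars))


def decode_id(text: str) -> int:
    """Parse an ID, ignoring case and hyphens and tolerating O/I/L look-alikes."""
    value = 0
    for ch in text.upper().replace("-", ""):
        if ch not in _DECODE:
            raise ValueError(f"invalid Crockford base32 character: {ch!r}")
        value = value * 32 + _DECODE[ch]
    return value
```

With these assumptions, `encode_id(33)` gives `000011`, and `decode_id("oooo1l")` also gives 33 because lower case, `o` and `l` are normalised on the way in. That forgiving decode step is what makes an ID safe to read out or retype by hand.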
At the present moment the front-running idea is to use Crockford base32, which means that reading out an ID (eg over the phone) will be fairly unambiguous (no need to differentiate between upper and lower case letters) and error-resistant (similar-looking letters and numbers are removed): https://www.crockford.com/base32.html

---

## Post #43 by @tommyyama2020

Sorry for the silly question, but are you considering using the SNOMED API or building everything, including the datasets, completely from scratch? @pacharanero

---

## Post #44 by @pacharanero

**Everything from scratch**. We cannot use or even *mention* SNOMED in relation to this project. There has to be an intellectual firewall separating this project from any copyrighted works.

It is a clean-room, new build, open source terminology, built in a different way to take advantage of the 2-3 decades of improvements in technical collaboration tools and practices that have happened since existing terminologies were created.

---

## Post #45 by @tommyyama2020

That’s kind of a groundbreaking idea. :slightly_smiling_face:

---

## Post #46 by @wolandscat

Hi Marcus, I have come to the conclusion that the next generation of openEHR and any ecosystem to which it contributes needs its own new terminology. I concur on many of your points. Notes on a few details…

[quote="pacharanero, post:1, topic:2012"]
Yet it’s proprietary and only available under license, which I think is against the most fundamental principles of medicine
[/quote]

It is, but to be fair, they have been searching for a business model that allows them to survive, similar to other standards orgs. I think they’d be better doing what we do at openEHR - solicit financial sponsorship in various categories (rather than just country-level), and make the result free for everyone.

What I’m most interested in is a terminology that works (better).
SNOMED is still (10y after I represented the UK on IHTSDO standing committees) full of precoordinated codes and questionable hierarchy relationships. On the other hand, it has a decent meta-model, constraint language and expression language (the thing you use to create a post-coordinated concept expression). So it’s a mixed bag.

I have however come to the conclusion that we would be better off creating a new terminology that is:

* smaller - it doesn’t need to be as big as Snomed
* ontologically coherent, e.g. children of any node are all mutually consistent, not overlapping etc
* extensible - easily
* doesn’t end up with ‘national extensions’ hiding major sub-terminologies.

[quote="pacharanero, post:1, topic:2012"]
**General clinician-unfriendliness of terminology** - as a terminology matures, it seems to become steadily more ‘terminologically perfect’, yet steadily more **unusable for its primary purpose** by everyday clinicians. This is the main issue we need to address. Most general clinicians don’t understand the differences between similar-sounding concepts from different parts of SNOMED’s hierarchy, for example.
[/quote]

This is a consequence of ontological ambiguities, and could be fixed.

[quote="pacharanero, post:1, topic:2012"]
* **Lack of ready readibility/comprehensibility of the ‘code’** - now that data storage technology has rendered obsolete the need to reduce the number of bytes being stored by replacing the text with a 4- or 5-byte code, why do we need to use non-human-comprehensible labels such as SCT `22298006` to mean something that could be written as `myocardial infarction`
[/quote]

I came to the same conclusion a few years ago. I would (probably) still not allow codes to have spaces, and I would prefer shorter codes, because they get bound into larger syntactic elements (e.g. paths). We could consider a multi-axial approach as well. E.g. can you guess these codes?
`'sysbp-pt-tgt'` (systolic BP, point in time, target); `'hr-4h_avg-hist'` (heart rate, 4h average, historical measurement); `'myo_infarct-hist'` (historical MI event); `'myo-infarct-risk'` etc. There would be more to learn here, but a simple bit of UI could generate such codes easily for the user. Multi-axial (= post-coordinated) codes provide a lot more computability, and also mean that the terminology can be a lot smaller. These are early thoughts!

[quote="pacharanero, post:1, topic:2012"]
* **Changes break queries** - retiring or redefining codes and the relationships between codes over time results in historical queries which are inconsistent, and can only be done using a ‘transitive closure table’ - can that be designed out?
[/quote]

That probably doesn’t happen too much even with SNOMED - they do have discipline these days on not changing meanings of codes. It won’t be possible to make queries completely resilient to change. E.g. remember when the classifications of Hepatitis were A, B and non-A-non-B. Now we have 8 (or whatever it is) varieties. Older queries containing ‘non-A-non-B’ might need to be re-engineered.

[quote="pacharanero, post:1, topic:2012"]
* **Is it a hierarchy?** - Concepts may have hierarchical relationships in multiple directions (parent/child, and association/feature_of, and more) but does this layer need necessarily to be **part** of the terminology? Does this layer just complicate the simplicity of a flat namespace?
[/quote]

We definitely want hierarchy (IS-A relationships) - that’s the basis of inferencing with terminology. It also enables browsing a subset within an app, for the purpose of code selection. Just think for example of trying to choose blood type for blood bank purposes. Hierarchy can be completely hidden from any user context where it doesn’t help.
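As a rough illustration of the point about IS-A relationships, the sketch below keeps a flat namespace and a separate list of IS-A edges, then derives a pick-list and a subsumption check from them. The identifiers, descriptions and function names are all made up for the example; nothing here is taken from an existing terminology or from the `oct` repository.

```python
from collections import defaultdict

# Flat namespace: identifier -> description (hypothetical IDs and terms).
NAMESPACE = {
    "00000A": "blood group",
    "00000B": "blood group A",
    "00000C": "blood group A RhD positive",
    "00000D": "blood group O",
}

# Separate layer: (child, parent) IS-A edges, maintained apart from the namespace.
IS_A = [
    ("00000B", "00000A"),
    ("00000C", "00000B"),
    ("00000D", "00000A"),
]


def build_children(edges):
    """Index the edge list as parent -> set of direct children."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    return children


def descendants(concept, children):
    """Return every concept reachable below `concept` via IS-A edges."""
    seen = set()
    stack = [concept]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def is_a(concept, ancestor, children):
    """True if `concept` is `ancestor` itself or sits anywhere beneath it."""
    return concept == ancestor or concept in descendants(ancestor, children)


children = build_children(IS_A)
# Pick-list for a blood-bank style selector: descriptions of everything under "blood group".
pick_list = sorted(NAMESPACE[c] for c in descendants("00000A", children))
```

The user of such a selector only ever sees the descriptions in `pick_list`; the edges stay hidden, and a different hierarchy could be swapped in without touching the namespace.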
[quote="pacharanero, post:1, topic:2012"]
**Is it a ‘language’?** - elements of terminologies such as SNOMED-CT such as post-coordination and even the way pre-coordinated codes are created allow for ‘composition’ or complex concepts from more basic ones. But it’s clumsy and implementation is very bespoke to the terminology. Could we learn from regular programming languages how to do this compositional work better? Alternatively, should it just be ditched as utter madness?
[/quote]

Whether we need the exact expression language SNOMED has is a question, but it’s quite well designed. It’s just never used in real life (that I’ve ever seen). But we will need something that enables post-coordination.

[quote="pacharanero, post:1, topic:2012"]
**Use existing stuff, don’t invent weird new jargon** - the tech world has many excellent tools to serve us in our mission - eg JSON is the lingua franca of information exchange across the web, and so a terminology that is **natively expressed** and **distributed in** JSON would be easire to get to grips with for developers than something you need to manually
[/quote]

This is just a technical detail, and easy to achieve. Currently SNOMED can be obtained in OWL-RDF form (as well as DB tables), and that is a widely accepted formalism. Generating a JSON format form of any terminology is easy.

[quote="pacharanero, post:1, topic:2012"]
* Is there any way to prevent the inevitable mention of RDF triples and OWL in this thread? :wink:
[/quote]

Apparently not, I just did it :slight_smile:

[quote="pacharanero, post:1, topic:2012"]
Can you manage a project like this on GitHub or similar, using open discussion, Issues, and Pull Requests for change management
[/quote]

Yes but we would need a browser tool as well to be able to search, look at hierarchies etc.

One of the main (maybe the main) questions for this group is whether you want a better reference terminology, or an *interface* terminology, with mappings to existing terminologies.
I am personally interested in a better reference terminology where the codes are human comprehensible, that is maybe 100k terms, and reliably computable.

---

## Post #47 by @tommyyama2020

Would a client-side web browser interface (front-end) that clinicians use to search for term codes be considered for OCT? Any idea?

---

## Post #48 by @pacharanero

[quote="tommyyama2020, post:47, topic:2012"]
client-side web browser interface (front-end) that clinicians use to search for term codes be considered for OCT?
[/quote]

100% this has to be part of the `oct` project. I would like it to be a self-hostable local web server with web UI, and we would also run an instance of the UI for public web use.

If we could make the design of the web server as simple as possible then we *might* even be able to make it a static site, requiring no server resource, or at least something that might be able to run in a very low-resource runtime (container apps, function apps sort of thing).

Thanks for creating the [Issue](https://github.com/openterminology/oct/issues/8) describing some initial spec for this feature, I would propose that we continue the elaboration of the UI feature over there.

---

## Post #49 by @tommyyama2020

Hi everyone,

Although this continues an earlier discussion, I’m interested in contributing to the built-in web browser component and would greatly appreciate any input or insights from the community regarding the UI design. 🙏

At the moment, the NHS SNOMED CT browser is one of the primary reference points. Do you think a similar design approach would be suitable for universal next-generation clinical terminology tools, or are there alternative UI concepts we should consider exploring?

Your feedback would be sincerely appreciated.
---

## Post #50 by @mike.bainbridge

I’ve always found [Ontoserver / Shrimp](https://ontoserver.csiro.au/shrimp/?concept=138875005&version=http%253A%252F%252Fsnomed.info%252Fsct%252F83821000000107&valueset=http%253A%252F%252Fsnomed.info%252Fsct%252F32506021000036107%253Ffhir_vs&fhir=https%253A%252F%252Ftx.ontoserver.csiro.au%252Ffhir) helpful to show people what the Ontology means and how (where it’s been modelled properly) things fit together well…

---

## Post #51 by @tommyyama2020

@mike.bainbridge Thanks. I think this provides a solid starting point from a UI design perspective. :slightly_smiling_face:

---

## Post #52 by @tommyyama2020

@pacharanero @mike.bainbridge

I’ve submitted a pull request for the minimal UI, titled **Universal OCTOPUS Viewer**. Please take a look when you get a chance. I’d welcome any feedback from anyone. I’ll be enhancing it further as I work toward finalizing the MVP.

https://github.com/openterminology/oct/pull/15

---

## Post #53 by @pacharanero

Thanks @tommyyama2020 - will review this week. Should be able to make some progress on `oct`.

---

## Post #54 by @mike.bainbridge

Thanks @tommyyama2020

Forgive the question but I’m still coming up to speed here. We need to be careful with the intended audience - it’s important that the browser addresses all areas of the ontology such as inheritance and context, but for a *real* end user this needs to be \[almost\] completely invisible. This goes for things like concept and term_ids too, which are unlikely to have meaning outside of unique identification of the concept.

So what I can see is great for the developer community but needs a careful ‘veneer’ for the end user. Diabetes Mellitus in the screenshot can be:

* A diagnosis
* A family history
* A differential diagnosis
* A worry the patient has expressed
* etc. etc.

* Also the second term Diabetes - is this a relative of the first term? Does it take in ALL Diabetes concepts (like Diabetes insipidus) which may have nothing to do physiologically with Diabetes Mellitus?
* Is the term I am looking at the preferred term or a synonym?

Happy to jump on a call to discuss,

---

## Post #55 by @pacharanero

I think those terms are ones tommy has put in for demo purposes. They aren't `oct` terms, which are 6-character Crockford Base32 strings (unless someone can convince me to do something else)

---

## Post #56 by @tommyyama2020

@mike.bainbridge

Yes, it is. As @pacharanero pointed out, the terms aren’t clinically validated; they’re just there for demo purposes. If displaying the attributions you mentioned in the UI is more clinically appropriate, I’ll make the necessary changes at my convenience.

---

## Post #57 by @pacharanero

Just found out that CTV3 is on the Open Government Licence, which is very permissive. Looks like we can very likely pull the entirety of CTV3 into `oct`, which is a great advantage, even if some of CTV3 is out of date. Medicine doesn't actually move ***that*** fast.

[Clinical Terms Version 3](https://fairsharing.org/10.25504/FAIRsharing.t955dp) (CTV3, a successor of the [Read Codes](https://digital.nhs.uk/services/terminology-and-classifications/read-codes)) was made [available on an Open Government Licence (OGL3)](https://isd.digital.nhs.uk/trud/user/guest/group/0/pack/9/subpack/19/releases). It hasn’t been updated since 2018, but it contains a huge amount of what went on to be merged with SNOMED-RT to become SNOMED-CT.

This means we can simply take all that CTV3 content and use it in `oct`’s namespace and graphs (this is what I’m proposing to call our hierarchies/ontologies). This accelerates us significantly. That should give us a decent start on terms.

Before I start haggling with the TRUD people to give me an old copy or set up FTP access, does anyone have a copy of CTV3 from 2018?
---

## Post #58 by @pacharanero

Also, just in case anyone fancies supporting the project financially, so that we can start to build some momentum around it, I have set up a [GitHub Sponsors](https://github.com/sponsors/bawmedical) account, initially under Baw Medical Ltd which is my own company. For now this is a business entity that already exists and has a bank account.

https://github.com/sponsors/bawmedical

Over time if we can build the project into a serious terminology, then I would like there to be an independent 'foundation' or other non-profit behind `oct`, to ensure its open future. At that point donations can go direct to that organisation instead.

---

## Post #59 by @tommyyama2020

Hello,

I’ve temporarily deployed the mock UI to the GitHub Pages link below and would greatly appreciate any feedback you may have:

https://tommyyama2020.github.io/oct/static/demo.html

I’m currently focusing on refining the user experience and adding new features, so please feel free to share any suggestions or impressions. For now, regulatory and policy compliance can be set aside; any general or UI-focused feedback is welcome.

Design-wise, you may notice that the overall impression is quite different from the existing NHS browser. With that in mind, I would really appreciate your thoughts on whether this direction feels appropriate or if there are aspects I should reconsider.

---

**Canonical:** https://openhealthhub.org/t/an-open-clinical-terminology/2012
**Original content:** https://openhealthhub.org/t/an-open-clinical-terminology/2012