Earlier this month the MHRA announced it was investigating the QRisk2 calculator used in TPP’s SystmOne. QRisk2 is an algorithm that is used to estimate a patient’s risk of having a heart attack or stroke over the following ten years. It has been developed by doctors and academics using longitudinal primary care data and socio-economic indicators.
DHI - MHRA issue alert over QRisk in TPP.
Clinrisk publish QRisk2 as open source software under the GNU LGPLv3. The code available here is a fully working implementation of the algorithm. From what I can tell it’s the 2014 release. The 2015 release is not in SVN but there is a tarball available. By way of some wet Bank Holiday fun I imported all the published code from Clinrisk SVN to GitHub repos. Here’s the QRisk2 repo. I made notes on compiling the command line calculators on Ubuntu and added a wrapper script to help people get started.
I did this as I wanted to conduct a little investigation of my own. Could I identify any weaknesses in the QRisk2 code? Answer: NO; but you all know I’m not a programmer. Could I demonstrate and reproduce variance between calculations? Answer: YES; yet there are reasonably sound explanations for this, not least the variance expected between the published revisions of 2014, 2015 & 2016.
I’m comparing the score I get from the online tool and the command line. The variables I’m using are:
- Male
- Aged 64
- My home postcode (giving a Townsend score of 3.80)
- Non-smoker
- Type 1 Diabetes
- With history of angina or heart attack in a 1st degree relative
- With Chronic kidney disease
- And Atrial fibrillation
- Being treated for hypertension
- And rheumatoid arthritis
- HDL ratio of 5
- BP (Systolic) of 140 mmHg
- Height 177cm
- Weight 78kg
Here’s a screenshot of the QRisk2 form on the website.
- Score from website: 95.1%
- Score from command line tool: 95.059356
There are two obvious and simple reasons for the difference:
- The command line tool expects BMI while the web tool calculates BMI from height and weight. I used the NHS Choices BMI calculator and got 24.8 kg/m2 from a height of 177cm and weight of 78kg. The online tool calculated a BMI of 24.9 kg/m2.
- The command line gives me the result to 6 decimal places. The online tool presents the score to 1 decimal place: 95.059 has been rounded up to 95.1
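The 24.8 versus 24.9 BMI gap is what you would see if one calculator truncated and the other rounded the same underlying value. A minimal Python sketch, assuming the standard formula of weight in kg divided by height in metres squared. How either calculator actually treats the final decimal is an assumption here, not something confirmed from their code:

```python
import math


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kg divided by height in metres squared."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


raw = bmi(78, 177)

# Two plausible ways to present the same value to one decimal place.
rounded = round(raw, 1)                # round to the nearest tenth
truncated = math.floor(raw * 10) / 10  # discard everything after the first decimal

print(f"rounded={rounded} truncated={truncated}")
```

Either value can then be passed to the command line tool to see how much of the score difference is down to the BMI input alone.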
And let’s face it, if you are scored north of 90%, a few tenths of one percent are the least of your worries…
The rounding problem is a little more pronounced when I change the sex of the patient. The same values for a female patient produce scores where the rounding decision/difference between the PHP in the online version and the compiled C code could be challenged:
- Score from website: 96.7%
- Score from the command line tool: 96.340760
My 10 year old tells me that 96.34 is closer to 96.3 than 96.7 - I’m not going to argue with her.
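To check whether rounding alone can account for each pair of scores, here is a small Python sketch. It assumes the website rounds half up to one decimal place, which is a guess about the PHP rather than something taken from its source. The function name is made up for illustration:

```python
from decimal import Decimal, ROUND_HALF_UP


def to_one_decimal(score: str) -> Decimal:
    """Round a full-precision score to one decimal place, half up (assumed)."""
    return Decimal(score).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# (command line score, website score) as reported in this post.
cases = {
    "male": ("95.059356", "95.1"),
    "female": ("96.340760", "96.7"),
}

for label, (cli, web) in cases.items():
    rounded = to_one_decimal(cli)
    gap = Decimal(web) - rounded
    print(f"{label}: command line rounds to {rounded}, website shows {web}, gap {gap}")
```

On these figures the male pair agrees after rounding, while the female pair is still 0.4 apart, so something other than presentation has to be contributing.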
One potential source of this more significant difference is the deprivation index. Clinrisk only provide the University of Nottingham “Postcode to Townsend” deprivation table under license. It took me a while to find out the Townsend score for my postcode in order to submit the value on the command line. See my notes on this in the repo. It could be that the 2016 code on the website is using more recent data to derive the score. I’m not sure about this, as Townsend is calculated from census data (the last census was in 2011) and is published at Lower Layer Super Output Area level, which, from what I can gather, doesn’t change very much.
In addition to the Postcode → Townsend mapping available under license, Clinrisk state that “Missing values are handled appropriately, with details of estimated values substituted into the algorithm provided”.
Let’s compare the results between a patient where Townsend is derived from a postcode lookup and one where the missing input is handled appropriately with substitution.
- Substituted Townsend: 94.2%
- Known Townsend: 95.1%
Again, not much… Yet 0.9 percentage points could be the difference between two categories of intervention prioritisation: the edge case is important.
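The edge-case point can be shown with a few lines of Python. The cut-off below is purely hypothetical, chosen only because it sits between the two scores above. It is not a clinical threshold and does not come from QRisk2 or any guideline:

```python
def category(score_pct: float, cutoff_pct: float) -> str:
    """Place a score either side of a single cut-off."""
    return "at or above cut-off" if score_pct >= cutoff_pct else "below cut-off"


HYPOTHETICAL_CUTOFF = 95.0  # illustrative only, not a real clinical threshold

substituted_townsend = 94.2  # score with the substituted Townsend value
known_townsend = 95.1        # score with the looked-up Townsend value

for label, score in [("substituted", substituted_townsend), ("known", known_townsend)]:
    print(f"{label}: {score}% -> {category(score, HYPOTHETICAL_CUTOFF)}")
```

The same patient lands in a different category depending only on how the Townsend input was obtained, which is why small differences matter near any boundary.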
I’ve only found a couple of minor issues, easily addressed by using the licensed application with a subscription to the lookup tables.
TPP must be using the professionally supported SDKs. Clinrisk describe the advantages to a system vendor of using the licensed product on the website:
“[Clinrisk] formally accredit your implementation of QRISK2, providing a suitable test harness and data to verify your implementation. Users of our SDKs undergo an accreditation process with us. This ensures two things: that the supplier implements the score accurately, and that the end user of the software can have confidence that it has been implemented properly.”
Oh.
So what’s gone wrong here: Garbage in, garbage out? Accreditation failure? Source data mapping? Version control?
What can the collective wisdom on OHH identify?