Duplication isn’t the only, or even the biggest, problem. Two physicians might well observe e.g. the same skin condition or pallor, simply because the patient had two encounters in different health services - say, one while on holiday. There’s nothing innately wrong with that - they are just two observations of the same thing. The trick is to know whether they really refer to the same real-world instance, or to two distinct instances some time apart. For example, a mole (the same one) could be observed twice, 3 months apart, in separate visits. However, the same kind of fracture in the radius of a child, observed and described the same way twice, 2 years apart, is almost certainly not the same fracture.
To deal with this kind of thing properly requires systems that support something called Referent Tracking - look for papers by Smith & Ceusters.
EDIT: typo, muscle memory took over… it’s ‘referent tracking’, not ‘reference tracking’.
What I think we should be more interested in is having a way to create things that really are documents, such as summaries, discharge summaries and other kinds of reports. These things will generally not contain primary data; they will reference info already established and/or simply restate it in a linguistically summarised fashion, like your SOAP example. The links I provided above are more about this (and the most recent attempt at analysing the semantics of course includes your very useful input).
Joost, What you describe is the missing link in modelling.
Not only do we have to model the data entered as an item in a document,
but we also have to model more explicitly the context of the Observation, the Evaluation, the Planning, the Ordering, and the Execution. In other words, the epistemology.
This is a model about the where (in time and place), the when (in absolute or relative terms), and the why and how of the topic in the real world. It can be modelled in archetype space in a Cluster.
It is the issue Barry and Werner defined as “Referent Tracking”.
We cannot exclude that documents contain all kinds of data: O, E, P, O, E, but also parts of other documents.
It’s a bit hard for a generic engine to know about these rules. If you query for facts you expect the primary data. Other times you query all discharge summaries containing x or z - that’s another use-case. Both are needed. The models or definitions of data need to be precise and shared to make this work.
In order to prevent the problems mentioned we must be aware that we need to discern the purpose of the data:
-1 Primary (de novo) clinical data about the patient system entered to the record
-2 Secondary data (by reference) where data about the patient system is re-used for clinical reporting (referral, summary, …)
-3 Tertiary data (by reference) where data about the patient system is re-used for administrative (financial) reporting
Type 1 are de novo clinical observational facts about the patient system; these are used in most queries.
Type 2 are clinical (and non-clinical) re-used facts about the patient system: Type 1 data referenced (re-used) within the record.
Type 3 are clinical (and non-clinical) facts about the patient system referenced (re-used) from third parties, e.g. from a referral letter or report.
When admitted by the author of the record, such a fact changes state and is transformed into a Type 1 fact, entered as if it were de novo. But it must still carry a link to the data in the original document.
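To make the three types concrete, here is a minimal Python sketch; `Provenance` and `RecordedFact` are hypothetical names for illustration, not part of any openEHR specification:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Provenance(Enum):
    PRIMARY = 1    # Type 1: de novo clinical data entered directly into the record
    SECONDARY = 2  # Type 2: re-used within the record, e.g. in a summary or referral
    TERTIARY = 3   # Type 3: re-used from a third party, e.g. an external report

@dataclass
class RecordedFact:
    value: str
    provenance: Provenance
    source_ref: Optional[str] = None  # link back to the original item, if any

    def adopt(self) -> "RecordedFact":
        # Adoption by the record's author promotes the fact to primary,
        # but the link to the original document is retained.
        return RecordedFact(self.value, Provenance.PRIMARY, self.source_ref)

# Usage: a fact imported from an external report, then admitted by the author
imported = RecordedFact("HbA1c 48 mmol/mol", Provenance.TERTIARY, "ehr://lab-report/123")
adopted = imported.adopt()
```

The key point of the sketch is the last rule above: adoption changes the state, but never discards the source link.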
Yes, I’ve played with the idea that all elements (main data points) should be in a self-standing cluster archetype instead of being part of the observation/evaluation. This would allow more context info to be added to the same clinical concept. E.g. body weight is now an element in an OBSERVATION; if it were an element in a cluster, that data cluster could be an observed body weight, an interpretation of a body weight, a target body weight, etc.
So for the lab example, a summary of lab result could be a different entry archetype from obs.lab_result including the same cluster.lab_result with a summary context.
This will also solve this problem:
If this is the data you want you can query on the cluster where it’s part of the obs.lab_result (not the entry containing the cluster in a summary).
Technically, the content_item attribute of ENTRY archetypes shouldn’t allow elements, only clusters.
Edit: we also need many more attributes to express the relationships between clusters in different entries, e.g. “summary of”, “interpretation of”, “exact copy of”, “basis for decision of”. AND we need defined expected behaviour for these relationships, e.g. a correction of a lab result that is “basis for decision of” something should trigger a re-evaluation of that decision.
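A rough Python sketch of what that “defined expected behaviour” could mean; the `Registry` class and relation strings are invented here purely to illustrate the trigger rule:

```python
# Hypothetical typed relationships between data items, with one behaviour rule:
# correcting an item flags everything it was a "basis for decision of".
RELATIONS = {"summary_of", "interpretation_of", "exact_copy_of", "basis_for_decision_of"}

class Item:
    def __init__(self, item_id, value):
        self.id, self.value = item_id, value
        self.needs_reevaluation = False

class Registry:
    def __init__(self):
        self.items = {}
        self.links = []  # (source_id, relation, target_id)

    def add(self, item):
        self.items[item.id] = item

    def link(self, source_id, relation, target_id):
        assert relation in RELATIONS
        self.links.append((source_id, relation, target_id))

    def correct(self, item_id, new_value):
        # Correcting a result flags every decision that was based on it.
        self.items[item_id].value = new_value
        for src, rel, tgt in self.links:
            if src == item_id and rel == "basis_for_decision_of":
                self.items[tgt].needs_reevaluation = True

# Usage: a lab result underpins a decision; correcting it flags the decision
reg = Registry()
reg.add(Item("lab1", "K+ 5.1"))
reg.add(Item("dec1", "stop potassium supplement"))
reg.link("lab1", "basis_for_decision_of", "dec1")
reg.correct("lab1", "K+ 3.1")
```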
Yes. This pattern seems very attractive for lots of use-cases. We just developed an application for a weight plan. The primary user/customer was a center for eating disorders. They follow the patient up to change their weight. A plan might be: increase weight by x grams each week until reaching y kg.
We define the goal using the goal archetype. Here we would like to model the weight using the same structures as in observation body weight archetype. It would be good if we could reuse structure from the reviewed body weight.
We see the same pattern for other use-cases like NEWS2 and most of the vital signs concepts.
@ian.mcnicoll mentioned they tried to model the ITEM_TREE in the early days. If I understood correctly, they rejected that modelling pattern because of limitations in the tooling. Perhaps these are ideas to revisit in 2022?
There are a couple of ways that we can address this. One is a better defined vocabulary for codes on LINKs, which technically perform the linking you mention.
Secondly, if we implement the VIEW_ENTRY concept or something close to it, such Entries will contain refs of the things they reference, and they will directly represent the notion of ‘copy of’, or more accurately, ‘citation of’.
I have proposed in the past that we support not just citations, but the representation of documents (aka ‘reports’), which contain not only cited but copied and serialised content (see here).
A report is useful in three ways:
it acts as a literal content capture of some situation (e.g. discharge) at a point in time - it may even be signed
double query results can’t happen, because the report consists only of XML or some other serial document form - there are no ‘Observations’ or suchlike
a report can be sent as-is to any other party, e.g. a discharge summary document sent to an aged care home, GP etc.
What we normally call a ‘referral’ could probably be treated as a report.
Indeed, a ‘report’ could be generated for any EHR content that has to be communicated in a document-sharing paradigm. This does not preclude true sharing of content e.g. medication list, with the recipient, e.g. GP surgery.
Was this thread ever progressed further? The issue of primary record vs secondary (reused/copied) data is a problem which keeps coming back to bite us.
I agree with @ian.mcnicoll : In a lot of cases there’s a mix of primary and secondary records in a single composition.
I’ve read what @thomas.beale writes about CITATION, and I’m not sure I understand it. But we really need this to work without having to model and govern all archetypes as two separate classes in parallel. A blood pressure has the same (maximum) data set whether it’s in a primary or secondary record. But maybe I’m misunderstanding how the CITATION class would be used?
This sounds like a workable model to me as well.
What if the “flag” is not just a boolean or something, but an object containing the retrieval date, AQLs, etc?
Sure, but this goes for any kind of template modelling etc: Where there are humans, there will be errors.
Some DB users are sure to miss it - they will just query down a table full of records, and treat supposedly cited copies as originals. Relying on a Boolean flag, or any other added special attribute is bad practice. Analytics users will be sure to miss such attributes - they love just diving into relational tables, and they will ignore strange attributes they don’t understand.
Not properly separating out primary data from copies is just fraught with potential problems.
A solution I have proposed in the past represents the cited (i.e. copy) data in serialised form, e.g. JSON or similar. This means that reports, summaries etc that contain such citations have a usable representation of the cited items (e.g. if the discharge summary or report is copied elsewhere), but no structural copies exist in the DB. So no chance of duplicates in queries. The citations contain the original references to the target (cited) info items, so as long as those can be resolved, the original structural form of the cited data can be instantiated as well.
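A minimal Python sketch of that idea, assuming an in-memory structural store keyed by EHR URIs; `Citation` and `resolve` are illustrative names, not an openEHR API:

```python
import json

# A citation stores the cited content only in serialised form, plus a reference
# to the original. The structural store never holds a second copy, so
# structural queries cannot return duplicates.
class Citation:
    def __init__(self, target_ref: str, cited_content: dict):
        self.target_ref = target_ref                 # e.g. an EHR URI to the original
        self.serialised = json.dumps(cited_content)  # usable copy for transport/display

    def resolve(self, store: dict) -> dict:
        # If the original is reachable, return the live structural item;
        # otherwise fall back to the serialised snapshot.
        if self.target_ref in store:
            return store[self.target_ref]
        return json.loads(self.serialised)

# Usage: one blood pressure in the store, cited from a discharge summary
structural_store = {"ehr://obs/bp-42": {"systolic": 142, "diastolic": 91}}
cite = Citation("ehr://obs/bp-42", structural_store["ehr://obs/bp-42"])

# A structural query over the store still sees exactly one blood pressure.
bp_hits = [k for k in structural_store if k.startswith("ehr://obs/bp")]
```

If the original store is unreachable (e.g. the report was copied to an external system), `resolve` degrades gracefully to the serialised snapshot.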
Sure, but one of the guiding principles in openEHR has been to limit the possibilities of making errors. We don’t need to make it easier.
We probably have to reverse the perspective on querying anyways, from filter out what you don’t need to select only what you do need, still on the same query parameters.
Already the diagnosis of e.g. the same stroke will appear in the (virtual, federated/distributed) EHR many times: once as a working diagnosis by the GP, then by the ambulance, then by the neurologist, once more in a managed problem list and yet again in a nursing care plan. Each in its own composition.
Of course it would help a lot if relations were recorded between those different compositions.
Now if you do a ‘dump’ query for all eval.problem_diagnosis, it may look like there are many strokes. So you need to understand the data model (EHR, composition, archetype, template etc.) in order to create a clinically safe(ish) query. AQL is very powerful and a huge openEHR asset, but in the end it’s the clinician that needs to make the judgement of how many strokes there are. It’s very hard to automate this outside of a highly curated setting.
So I would suggest not caring too much that ‘copied’ data is returned multiple times, but focusing on making the writing of AQL queries safer. E.g. by, by default, only returning data if a template id is specified, suggesting additional filter parameters based on the query result set, etc.
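That “safe by default” idea could look roughly like this in Python; the `safe_query` wrapper and the in-memory `cdr` rows are invented for illustration only:

```python
# A query helper that refuses dump-style queries unless a template id
# (or an explicit override) narrows the result set.
def safe_query(cdr, archetype_id, template_id=None, allow_dump=False):
    if template_id is None and not allow_dump:
        raise ValueError(
            "Refusing a cross-template dump query; pass template_id "
            "or set allow_dump=True explicitly")
    return [row for row in cdr
            if row["archetype_id"] == archetype_id
            and (template_id is None or row["template_id"] == template_id)]

# Usage: the same stroke diagnosis recorded in two different compositions
cdr = [
    {"archetype_id": "openEHR-EHR-EVALUATION.problem_diagnosis.v1",
     "template_id": "gp_encounter", "value": "stroke"},
    {"archetype_id": "openEHR-EHR-EVALUATION.problem_diagnosis.v1",
     "template_id": "nursing_care_plan", "value": "stroke"},
]
narrowed = safe_query(cdr, "openEHR-EHR-EVALUATION.problem_diagnosis.v1",
                      template_id="gp_encounter")
```

The point is only that the unsafe behaviour (returning both rows as if they were two strokes) has to be asked for explicitly.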
Thinking a bit more on this, I think we need to separate out the problems. My example actually concerns clinically different data points: different clinical (re)interpretations (EVALUATIONs) of the same event by different clinicians and data sources (family, referral letter etc.), with different (additional) data available to make the interpretation.
Now for observational data this is different. The same observation (same observer etc.), like a blood pressure, should be recorded only once. And when needed in a different CDR, in my opinion the CDRs should be federated (an AQL federation spec is under way; distributed editing of a VERSIONED object is still a challenge technology-wise - I recently discussed this privately with @sebastian.iancu and @ian.mcnicoll).
If a federation is unfeasible, e.g. because the source data is not openEHR, or there’s no shared infrastructure, quoting makes a lot of sense. I like Thomas’s suggestion of (optionally) persisting the source data as a binary/JSON blob (logically) contained by the composition, as long as it’s clear that AQL federation is the ideal solution. The JSON could then contain any openEHR data, but the AQL engine and client apps must recognise this as special data that’s not returned by default. Maybe we could use a specific new data type for this, DV_CITATION perhaps. Or potentially a different ENTRY subclass: CITATION_ENTRY (or is that covered by DV_PARSABLE, or already by GENERIC_ENTRY?). This would pull it into the clinical modelling domain. Anyway, it would probably help if there were an attribute typing the data in the binary/JSON, to recognise whether it’s openEHR data.
How would this relate to IMPORTED_VERSION?
Another priority is improving DV_(EHR_)URI by adding constraints on the target of the URI, e.g. ‘should only point to eval.problem_diagnosis’. That would help a lot in relating data, e.g. in different compositions. The design for this is mostly agreed - just a regex (I know..). This is also politically relevant, at least in NL, because it’s one of the very few features openEHR doesn’t support, while related standards like ISO 13972 CIM/zib do.
Finally, we should relate this discussion to the other recent discussions on how to relate different data points. Currently there are a lot of options, but too little consensus and feature parity. In addition to the options mentioned above (parsable, generic entry, DV_EHR_URI, composition containment), there are also FOLDERs, LINKs, and adding a specific archetyped cluster with an identifier.
I’m still struggling to understand how this would work modelling wise. How would archetypes based on the CITATION class get modelled? Or am I misunderstanding the concept completely?
The situation we’re often seeing is something like “use this AQL query to see if there’s an existing Barthel index within scope, and if not allow the user to enter it”. Sometimes both the query result and the user entered variants would be considered “secondary” or a “copy”, while in other cases the user entered data is an actual primary record. So the application would have to be able to switch between these contexts on the fly, and persist the data in the same template path so it can be retrieved within context without a hugely complex AQL.
Well, there could easily be multiple mentions of symptoms plus an interpretation of ‘probable stroke’ or similar. However, only one formal diagnosis would appear on the master problem list, if it is being managed properly. It usually will be - physicians are very careful about distinguishing multiple strokes (heart attacks, …), since that changes the picture on prognosis.
What we have been discussing is digital copies of the same information item, not multiple real world mentions of (maybe) the same health event / state - they are different issues.
Exactly.
I used to think that too, but after 30 years of looking at EMRs, and dreams of a better future, I’d say we are no closer to that perfect world.
A more realistic way of seeing things is not a perfect federation (also requires 100% uptime to guarantee that reference resolution always works - never going to happen) but a true patient-controlled, patient-centric record, hosted in a reliable fault-tolerant (possibly distributed) infrastructure, with which any clinical worker interacts.
This will take at least 10 years to catch on, even if we assume it is now understood as a sensible aspiration, which it now is, in the US. In Europe we are further ahead in our thinking, but the practice is still the same.
Until we get such patient-centric records, we are looking at (at best) imperfect federations with multiple mentions of the same things, potentially synchronisable across locations. The openEHR version control system was designed for this reality, which is why there is branching in the versioning scheme - that’s what allows merging etc.
Re: the citation solution, we don’t need special data types, but we do need some extra classes and special attributes to manage it properly. This draft model is close (it’s not quite right though; the VIRTUAL class should have a serialised attribute, and resolved should be removed).
I would not try to do this with DV_EHR_URI, but with Citations.
That was an early version of it. The discussion is correct (IMO at least), but I developed a draft version of a better model (see link above). Unfortunately the version online is not quite correct, but it’s pretty close.
Very good question - this stuff has to work from a modelling point of view.
You will see in that model that the VIRTUAL class attributes, which are inherited into VIRTUAL_ENTRY and VIRTUAL_ITEM are of type LOCATABLE_REF. We need to add a new meta-type to the archetype model (i.e. the AOM) that allows constraining of references of such types. Then we could create constraints in an archetype editor for commit_context , entry_context, etc, that would limit them to e.g. a lab result Composition and a Lab result Observation or similar.
There’s a bit more work to get this right, so don’t take the above as 100% literal truth. I need to do a few experiments on the structures in the ADL Workbench to see what works best.
As an aside, there is a whole other issue of references to Entities like devices, substances (e.g. medicines), persons and places, which is not handled correctly in openEHR. I solved this while in the US and the solution looks like it can be retro-fitted to the openEHR RM. I’ll get a description of that online as soon as I can.
Not an uncommon situation (happens with BMI and all sorts of things). But remember the user-entered ‘copy’ is not a copy of anything, it’s a ‘repeat mention’ of something in the real world, that might have already been described - just like a BP recorded by a machine into the EHR, and then a nurse comes along 1 minute later and does a manual BP and enters that into the EHR. It’s just another ‘sample’ of the same situation.
Querying has to be able to distinguish between:
mentions of distinct events / states, e.g. 3 measurements of high BP months apart, with normal BPs in between
repeat mentions of the same event / state, i.e. multiple ‘samples’ of the same thing - could be made in different EHR instances, e.g. one in the regional health portal, and the other in a hospital system that is running openEHR;
digital copies of primary information - the thing we want to AVOID
As a clinical professional you will know better than I do that the first two are not always easy to distinguish. How do we know when it is 3 distinct episodes of high BP rather than just one long one? A Fib is probably an even better example. This is part of the diagnostic work-up - the query cannot tell you; it is a higher level of inferencing.
We certainly do not want different archetype level models for the same thing, ever.
Does that mean that the VIRTUAL class could contain, for example, a COMPOSITION or ENTRY? So if I’m entering a body weight as a secondary record, the path to the weight would look something like [the virtual container]/[some new meta-type]/openEHR-EHR-OBSERVATION.body_weight.v2/data[at0002]/events[at0003]/data[at0001]/items[at0004]?
As I mentioned, we defined a COMPOSITION.category with value 434 - report many years ago. This was also proposed to the SEC group, but no one else in the community had the same needs then, which is why it was not added to the specifications.
There are multiple use-cases for such entries into the EHR. The most common one is a discharge summary from a stay at the hospital. In such documents you will copy information from other sources like lab results, vital signs, previous diseases and other findings during the stay. When a user commits such information to the EHR it is not new information - it is a copy of previous data.
Still we wanted a way to handle versions and also to be able to query the data. Like find discharge summaries where the patient has diabetes, a lab test with a specific value, blood pressure above X and other relevant information.
So we added the new category and tuned the AQL engine in the CDR to leave out such data by default. When needed users can query for compositions matching the COMPOSITION.category.
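The default-exclusion behaviour described above can be sketched in a few lines of Python; the in-memory `comps` list is invented, and the category code strings are simplified forms of the openEHR terminology codes (“433 - event”, “815 - report”):

```python
# Compositions whose category is "report" (copied content) are left out of
# query results unless the caller asks for them explicitly.
EVENT = "433|event|"    # ordinary event composition
REPORT = "815|report|"  # report composition, per the Support Terminology

def query(compositions, archetype_id, include_reports=False):
    return [c for c in compositions
            if archetype_id in c["archetypes"]
            and (include_reports or c["category"] != REPORT)]

# Usage: one primary BP recording, plus a discharge summary that copies it
comps = [
    {"category": EVENT, "archetypes": ["OBSERVATION.blood_pressure"]},
    {"category": REPORT, "archetypes": ["OBSERVATION.blood_pressure"]},
]
primary_only = query(comps, "OBSERVATION.blood_pressure")
with_reports = query(comps, "OBSERVATION.blood_pressure", include_reports=True)
```

A ‘show me all recent blood pressures’ query then returns one result by default, while a deliberate query over discharge summaries can still reach the copies.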
I read the specs today and found a new COMPOSITION.category (815 - report). I am not sure if the intention of this category is the same as what we proposed many years ago. See the Support Terminology specification.
That spec change was indeed to support your use-case directly, @bna . As you know I was supportive, but felt that we might need to stage Entries in a similar way, as e.g. a Discharge summary is often a mix of new and copied data.
However, we explored this in some depth with a client recently and came to the conclusion that your approach is probably optimal.
The main challenge is to prevent ‘copied’ data being re-included in a cross-composition query like ‘show me all recent blood pressures’, and flagging some compositions as ‘report’ does allow these to be excluded.
It is definitely a requirement to be able to query some ‘new’ data directly via AQL, e.g. ‘Discharge medications’, but that can be done by querying on the templateId or composition/name/value, and these are very unlikely to be cross-composition queries. Even if they are required, it just makes for a more complex query.
It is possible to use Citations / Links, and there are certainly use-cases for these e.g. managing problem lists, but querying is more complex and in the use-case of ‘reports’ e.g. Discharge summary, there is an argument that the composition represents a ‘snapshot’ document and should carry the copies directly.
So I think the ‘report’ category works for discharge type reports and also agree with @joostholslag that this also requires more targeted querying to known ‘curated/managed lists.
We do also need to update the way that we manage links/citations especially as increasingly some of these links will resolve via FHIR References, and have better support in AQL to resolve those references.
The original idea of the VIRTUAL class, from which VIRTUAL_ENTRY etc are derived, is that it acts like a real ENTRY, so for example a citation of an earlier lab result as justification for a diagnosis is in the data as a VIRTUAL_ENTRY corresponding to the cited OBSERVATION . Rendering software knows it is a virtual entry so can show its contents but in a way that makes it clear it is a citation. Similarly, the querying service can ignore virtual Entries, Compositions etc easily, because they are not instances of the primary classes. But a special query could find citations as well.
At the archetype modelling level, you would model original content normally, and you would constrain virtual items within citing Compositions (such as referrals, discharge summaries, reports etc) to be of certain types if you wanted.
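The class-based distinction is the important part: no flag attribute for analytics users to miss. A minimal Python sketch, with `Entry`/`VirtualEntry` standing in for the proposed ENTRY/VIRTUAL_ENTRY classes:

```python
# Virtual Entries are a distinct class, so a query over primary Entries
# skips them by construction, not via a Boolean flag someone can ignore.
class Entry:
    def __init__(self, archetype_id, data):
        self.archetype_id, self.data = archetype_id, data

class VirtualEntry(Entry):
    def __init__(self, archetype_id, data, cited_ref):
        super().__init__(archetype_id, data)
        self.cited_ref = cited_ref  # reference to the cited original

def primary_entries(entries):
    # isinstance keeps citations out of ordinary results
    return [e for e in entries if not isinstance(e, VirtualEntry)]

def citations(entries):
    # a special query can still find the citations themselves
    return [e for e in entries if isinstance(e, VirtualEntry)]

# Usage: an original lab result and a citation of it in a referral
original = Entry("OBSERVATION.lab_result", {"HbA1c": 48})
cited = VirtualEntry("OBSERVATION.lab_result", {"HbA1c": 48}, "ehr://obs/lab-1")
entries = [original, cited]
```

Rendering software can still show the virtual entry’s contents, but knows to present it as a citation.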
I don’t claim all this is 100% worked out, at least I have not had the time to work on it for a few years, but I think the main ideas are in place. I hope to get back to this and related questions soon.
The “thinking math” meme is a good illustration of how I feel right now. I think I’ll need this explained with a whiteboard and follow-up questions in order to fully grasp how this would work. Would it be possible to have a short session about this at EHRCON, do you think?
That’s true, and that’s why I proposed that the copies are also carried in inline serialised form, i.e. not structured form. So if the report is copied to some external system, no cited content is lost. There is an argument to say that a ‘report’ should be represented by:
a ‘source’ structure, which contains virtual entry citations and so on, AND
a fully serialised document form, i.e. what people usually think of as a ‘report’
The latter is generated from the former, and can be persisted in some convenient place, and shared as needed. It knows its generator, which contains all the citation links, so all original content can be found if necessary.
The interesting question is then what querying on reports looks like. Probably text-based.
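The two-form idea above could be sketched roughly as follows; `generate_report` and the document shape are hypothetical, invented only to show the source-structure → serialised-document relationship:

```python
import json

def generate_report(source_citations):
    # source_citations: list of (target_ref, snapshot_dict) pairs taken from
    # the 'source' structure of virtual entry citations.
    document = {
        "kind": "report",
        "body": [snapshot for _, snapshot in source_citations],  # serialised content
        "generator": {"citations": [ref for ref, _ in source_citations]},
    }
    return json.dumps(document)

# Usage: a one-item discharge report generated from its source structure
doc = generate_report([("ehr://obs/bp-42", {"systolic": 142, "diastolic": 91})])
parsed = json.loads(doc)
```

The serialised document carries everything needed for sharing, while the generator section retains the citation links, so the original structural content remains findable.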
In openEHR, we should use proper Entity refs, which I mentioned earlier above (these would be a new addition). Ideally we would replace all inline references to devices, persons and so on with such references. One of the things we found at Graphite was that having these inline caused a data explosion, not to mention endless copies of the same data. This is a pretty serious issue which we need to address soon.