How accurately do we model "copied" data?

Duplication isn’t the only problem, or the biggest one. If we think about it, two physicians might well observe, say, a skin condition or pallor simply because the patient had two encounters with different health services, e.g. one while on holiday. There’s nothing innately wrong with that: they are just observations of the same thing. The trick is to know whether they really refer to the same real-world instance, or to two distinct instances some time apart. For example, the same mole could be observed twice, 3 months apart, in separate visits. However, the same kind of fracture of a child's radius, observed and described the same way twice, 2 years apart, is almost certainly not the same fracture.

Dealing with this properly requires systems that support something called Referent Tracking; look for the papers by Smith & Ceusters.

EDIT: typo, muscle memory took over… it’s ‘referent tracking’, not ‘reference tracking’.
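To make the mole-versus-fracture distinction concrete, here is a minimal sketch in the spirit of Referent Tracking: each real-world particular gets its own identifier (an IUI, "instance unique identifier", in Smith & Ceusters' terminology), and observations point at that identifier rather than only at a concept. The `ReferentRegistry` class and its methods are hypothetical and illustrative only; deciding whether a finding refers to an existing particular or a new one remains a clinical judgement that the code does not make.

```python
import uuid
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ReferentRegistry:
    # iui -> short description of the real-world particular
    particulars: dict = field(default_factory=dict)
    # (iui, date, finding) tuples; many observations may share one iui
    observations: list = field(default_factory=list)

    def new_particular(self, description: str) -> str:
        """Assign a fresh identifier to a newly recognised particular."""
        iui = str(uuid.uuid4())
        self.particulars[iui] = description
        return iui

    def observe(self, iui: str, when: date, finding: str) -> None:
        """Record an observation about an already identified particular."""
        if iui not in self.particulars:
            raise KeyError(f"unknown particular: {iui}")
        self.observations.append((iui, when, finding))

    def observations_of(self, iui: str) -> list:
        return [o for o in self.observations if o[0] == iui]

    def count_particulars(self, description: str) -> int:
        return sum(1 for d in self.particulars.values() if d == description)


reg = ReferentRegistry()

# The same mole, seen in two visits 3 months apart: one particular, two observations.
mole = reg.new_particular("mole, left shoulder")
reg.observe(mole, date(2021, 1, 10), "pigmented lesion, 4 mm")
reg.observe(mole, date(2021, 4, 12), "pigmented lesion, 4 mm")

# Two identically described radius fractures 2 years apart: two distinct particulars.
fracture_1 = reg.new_particular("fracture, left radius")
reg.observe(fracture_1, date(2019, 6, 1), "greenstick fracture, distal radius")
fracture_2 = reg.new_particular("fracture, left radius")
reg.observe(fracture_2, date(2021, 6, 3), "greenstick fracture, distal radius")
```

Identical descriptions no longer imply identity: a query can count two fractures but only one mole.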

What I think we should be more interested in is a way to create things that really are documents, such as summaries, discharge summaries and other kinds of reports. These will generally not contain primary data; they will reference information already established and/or restate it in linguistically summarised form, like your SOAP example. The links I provided above are more about this (and the most recent attempt at analysing the semantics of course includes your very useful input).


Joost, what you describe is the missing link in modelling.
Not only do we have to model the data entered as items in a document,
we also have to model more explicitly the context of the Observation, the Evaluation, the Planning, the Ordering, and the Execution. In other words, the epistemology.
This is a model of where (in time and place), when (in absolute or relative terms), why and how the topic occurs in the real world. It can be modelled in Archetype Space in a Cluster.
It is the issue Barry and Werner defined as “Referent Tracking”.

We cannot rule out that Documents contain all kinds of data (O, E, P, O, E: Observation, Evaluation, Planning, Ordering, Execution), as well as parts of other documents.


It's a bit hard for a generic engine to know about these rules. If you query for facts, you expect the primary data. Other times you query for all discharge summaries containing x or z. That’s another use case. Both are needed. The models or definitions of the data need to be precise and shared to make this work.


To prevent the problems mentioned, we must be aware that we need to discern the purpose of the data:
1. Primary (de novo) clinical data about the patient system, entered into the record
2. Secondary data (by reference), where data about the patient system is reused for clinical reporting (referral, summary, …)
3. Tertiary data (by reference), where data about the patient system is reused for administrative (financial) reporting

Type 1 data are de novo clinical observational facts about the patient system; these are used in most queries.
Type 2 data are clinical (and non-clinical) facts about the patient system that reference (reuse) Type 1 data within the record.
Type 3 data are clinical (and non-clinical) facts about the patient system referenced (reused) from third parties, e.g. from a referral letter or report.
When accepted by the author of the record, such a fact changes state and is transformed into Type 1 data, entered as if it were a de novo fact. But it must still link to the data in the original document.
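A minimal sketch of how these types could be tagged, so that a "query for facts" returns only Type 1 data and a reused copy is not counted twice. It follows the Type 3 description in the paragraph just above (facts from third parties), including the acceptance step that turns a Type 3 fact into Type 1 while keeping its link. `DataType`, `Fact`, `accept_third_party` and `primary_facts` are hypothetical names, not openEHR classes.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import List, Optional


class DataType(Enum):
    PRIMARY = 1    # Type 1: de novo facts entered in this record
    SECONDARY = 2  # Type 2: Type 1 data reused (referenced) within this record
    TERTIARY = 3   # Type 3: facts reused from a third party, e.g. a referral letter


@dataclass(frozen=True)
class Fact:
    fact_id: str
    concept: str
    value: str
    data_type: DataType
    source_link: Optional[str] = None  # ref to the originating fact or document


def accept_third_party(fact: Fact, new_id: str) -> Fact:
    """The author accepts a Type 3 fact: it becomes Type 1 but keeps its source link."""
    if fact.data_type is not DataType.TERTIARY:
        raise ValueError("only Type 3 facts are accepted this way")
    if fact.source_link is None:
        raise ValueError("a Type 3 fact must link to its source document")
    return replace(fact, fact_id=new_id, data_type=DataType.PRIMARY)


def primary_facts(facts: List[Fact], concept: str) -> List[Fact]:
    """'Query for facts': only Type 1 data, so reused copies are not counted twice."""
    return [f for f in facts if f.concept == concept and f.data_type is DataType.PRIMARY]


record = [
    Fact("f1", "body_weight", "72 kg", DataType.PRIMARY),
    Fact("f2", "body_weight", "72 kg", DataType.SECONDARY, source_link="f1"),
    Fact("f3", "diagnosis", "asthma", DataType.TERTIARY, source_link="referral-letter#dx"),
]
# The accepted fact is added as Type 1; the original Type 3 fact stays in the record.
record.append(accept_third_party(record[2], "f4"))
```

Here `primary_facts(record, "body_weight")` returns only `f1`, and the diagnosis query returns only the accepted `f4`, which still points back at the referral letter.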


Yes, I’ve played with the idea that all elements (the main data point) should be in a self-standing cluster archetype instead of being part of the observation/evaluation. This would allow more context information to be added to the same clinical concept. E.g. body weight is now an element in an obs. If it were an element in a cluster, that data cluster could be an observed body weight, an interpretation of a body weight, a target body weight, etc.
So for the lab example, a summary of a lab result could be an entry archetype different from obs.lab_result, containing the same cluster.lab_result in a summary context.
This will also solve this problem:

If this is the data you want, you can query on the cluster where it is part of obs.lab_result (not on the entry containing the cluster in a summary).

Technically, the content_item data attribute of entry archetypes shouldn’t allow elements, only clusters.
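A rough sketch of the cluster-reuse idea, using a simplified object model rather than the real openEHR RM: the same `cluster.lab_result` sits inside both an observation entry and a summary entry, entries reject bare elements, and a query asks for the cluster only where its parent is `obs.lab_result`. The class names, the `clusters_in` helper and the `summary.lab_result` archetype id are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Element:
    name: str
    value: str


@dataclass(frozen=True)
class Cluster:
    archetype_id: str
    items: Tuple[Element, ...]


@dataclass
class Entry:
    archetype_id: str
    content_items: List[Cluster] = field(default_factory=list)

    def add(self, item) -> None:
        # The proposed rule: entry content holds clusters only, never bare elements.
        if not isinstance(item, Cluster):
            raise TypeError("entry content_item must be a cluster")
        self.content_items.append(item)


def clusters_in(entries: List[Entry], cluster_id: str, entry_id: str) -> List[Cluster]:
    """Find a cluster only where it sits inside the given kind of entry."""
    return [c for e in entries if e.archetype_id == entry_id
            for c in e.content_items if c.archetype_id == cluster_id]


hb = Cluster("cluster.lab_result",
             (Element("analyte", "haemoglobin"), Element("result", "135 g/L")))

observed = Entry("obs.lab_result")
observed.add(hb)
summary = Entry("summary.lab_result")  # hypothetical summary entry archetype
summary.add(hb)

matches = clusters_in([observed, summary], "cluster.lab_result", "obs.lab_result")
```

`matches` contains the cluster once, from the observation; the summary context is excluded by the parent-entry filter rather than by duplicating the data structure.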

Edit: we also need many more attributes to express the relationships between the clusters in different entries, e.g. “summary of”, “interpretation of”, “exact copy of”, “basis for decision of”. AND we need defined expected behaviour for these relationships. E.g. a correction of a lab result that is the “basis for decision” should trigger a re-evaluation of that decision.
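One way to picture "relationship plus defined behaviour" is a small controlled vocabulary of relation codes and a rule attached to one of them. In this sketch (hypothetical names throughout; only the four relation labels come from the post), a correction to a lab result returns the decisions it was the basis for, so they can be queued for re-evaluation. No behaviour is defined here for the other relations.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Relation(Enum):
    SUMMARY_OF = "summary of"
    INTERPRETATION_OF = "interpretation of"
    EXACT_COPY_OF = "exact copy of"
    BASIS_FOR_DECISION_OF = "basis for decision of"


@dataclass(frozen=True)
class Link:
    source: str  # id of the entry/cluster the link starts from
    relation: Relation
    target: str  # read as: source <relation> target


def on_correction(corrected_id: str, links: List[Link]) -> List[str]:
    """Defined behaviour for BASIS_FOR_DECISION_OF: when data is corrected,
    every decision it was the basis for must be re-evaluated."""
    return [l.target for l in links
            if l.source == corrected_id and l.relation is Relation.BASIS_FOR_DECISION_OF]


links = [
    Link("summary-1", Relation.SUMMARY_OF, "lab-1"),
    Link("lab-1", Relation.BASIS_FOR_DECISION_OF, "decision-1"),
    Link("lab-2", Relation.BASIS_FOR_DECISION_OF, "decision-2"),
]
to_reevaluate = on_correction("lab-1", links)
```

Correcting `lab-1` flags only `decision-1`; the summary link is recorded but carries no behaviour in this sketch.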

Yes. This pattern seems very attractive for lots of use cases. We just developed an application for weight plans. The primary user/customer was a centre for eating disorders. They follow up patients whose weight needs to change. A plan might be: increase weight by x grams each week until reaching y kg.

We define the goal using the goal archetype. Here we would like to model the weight using the same structures as in the body weight observation archetype. It would be good if we could reuse the structure from the reviewed body weight archetype.
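A small sketch of the weight-plan goal, assuming a hypothetical `BodyWeight` structure shared by observed and target weights (standing in for reuse of the body weight archetype's structure). Weights are held as integer grams to avoid floating-point drift; the plan adds x grams per week and caps the last step at the y kg target.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BodyWeight:
    """Hypothetical shared structure: same shape for observed and goal weights."""
    grams: int

    @property
    def kg(self) -> float:
        return self.grams / 1000


@dataclass(frozen=True)
class WeightGoal:
    weekly_increase_g: int  # x grams per week
    target: BodyWeight      # y kg

    def weekly_targets(self, start: BodyWeight) -> List[BodyWeight]:
        """Expected weight at the end of each week until the target is reached."""
        if self.weekly_increase_g <= 0:
            raise ValueError("weekly increase must be positive")
        targets = []
        current = start.grams
        while current < self.target.grams:
            # Never overshoot: the final week is capped at the target.
            current = min(current + self.weekly_increase_g, self.target.grams)
            targets.append(BodyWeight(current))
        return targets


goal = WeightGoal(weekly_increase_g=700, target=BodyWeight(52_000))
plan = goal.weekly_targets(BodyWeight(50_000))
```

With a 50 kg start, +700 g per week and a 52 kg target, the plan is 50.7, 51.4 and 52.0 kg. Because each week's target is a `BodyWeight`, it can be compared directly with an observed `BodyWeight` of the same shape.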

We see the same pattern for other use cases, such as NEWS2 and most of the vital signs concepts.

@ian.mcnicoll mentioned they tried to model the ITEM_TREE this way in the early days. If I understood correctly, they rejected that modelling pattern because of limitations in the tooling. Perhaps these are ideas to revisit in 2022?


There are a couple of ways that we can address this. One is a better-defined vocabulary for codes on LINKs, which technically perform the linking you mention.

Secondly, if we implement the VIEW_ENTRY concept or something close to it, such Entries will contain refs to the things they reference, and they will directly represent the notion of ‘copy of’, or more accurately, ‘citation of’.
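To illustrate ‘citation of’ as opposed to a copy, here is a hypothetical VIEW_ENTRY-like object that stores only refs and resolves them at read time. This is not the openEHR design; in particular, the sketch resolves a ref to whatever is currently stored under that id, whereas a real VIEW_ENTRY might pin a specific version. `Entry`, `ViewEntry` and `query` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    entry_id: str
    concept: str
    value: str


@dataclass
class ViewEntry:
    """Hypothetical VIEW_ENTRY-like object: holds citations (refs), not copies."""
    view_id: str
    refs: List[str] = field(default_factory=list)

    def resolve(self, store: Dict[str, Entry]) -> List[Entry]:
        # Each ref is looked up at read time, so the view shows the stored entry.
        return [store[r] for r in self.refs]


def query(store: Dict[str, Entry], concept: str) -> List[Entry]:
    # Only stored entries are searched; views contribute no extra rows.
    return [e for e in store.values() if e.concept == concept]


store = {"e1": Entry("e1", "potassium", "5.9 mmol/L")}
summary = ViewEntry("v1", refs=["e1"])
copied_value = store["e1"].value  # a literal copy, taken before the correction

store["e1"] = Entry("e1", "potassium", "4.1 mmol/L")  # the original is corrected
```

The query still returns one potassium result, the cited value now reads 4.1 mmol/L, and only the literal copy is left holding the stale 5.9 mmol/L.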

I have proposed in the past that we support not just citations, but the representation of documents (aka ‘reports’), which contain not only cited but copied and serialised content (see here).

A report is useful in three ways:

  • it acts as a literal content capture of some situation (e.g. discharge) at a point in time; it may even be signed
  • double query results can’t happen, because the report consists only of XML or some other serial document form; there are no ‘Observations’ or suchlike
  • a report can be sent as is to any other party, e.g. a discharge summary document sent to an aged care home, GP etc.
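The serialised-report idea can be sketched with Python's standard `xml.etree.ElementTree`: selected content is written into a standalone XML string at a point in time. The element and attribute names (`report`, `item`, `concept`, `generated_at`) are assumptions for illustration, not an openEHR document format, and signing is not shown.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import List, Tuple


def build_report(title: str, entries: List[Tuple[str, str]], generated_at: datetime) -> str:
    """Serialise selected content into a standalone XML document (a 'report').
    The result is plain text: it holds no Observation objects, so structured
    queries over the EHR will not match its content a second time."""
    root = ET.Element("report", {"title": title, "generated_at": generated_at.isoformat()})
    for concept, value in entries:
        item = ET.SubElement(root, "item", {"concept": concept})
        item.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = build_report(
    "Discharge summary",
    [("diagnosis", "community-acquired pneumonia"), ("body_weight", "72 kg")],
    datetime(2022, 1, 15, 10, 0, tzinfo=timezone.utc),
)

# A recipient (aged care home, GP, ...) can parse the document as is.
received = ET.fromstring(xml_text)
```

The EHR would store `xml_text` as an opaque document, while the recipient works with the parsed tree.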

What we normally call a ‘referral’ could probably be treated as a report.

Indeed, a ‘report’ could be generated for any EHR content that has to be communicated in a document-sharing paradigm. This does not preclude true sharing of content (e.g. a medication list) with the recipient (e.g. a GP surgery).
