Cross-reference, citations and a solution for managed lists

thomas.beale · 5 March 2021 13:28

[All: this is a long post on an important topic, so I’ve made it a wiki post i.e. directly editable by others. Feel free to make inline additions, but please try to retain the general integrity. I suggest to add your initials to any additions. Most likely we should create extra topics on each major question described below.]

We have had a long running need to better solve cross-referencing in the openEHR EHR for managed lists such as the Problem List, Allergies list and so on. We’ve had many discussions in the past, including this recent one on Linking in openEHR. I have previously created UML for some initial ideas (‘view Entries’) if you want to look at something, but this is far from complete and could even be wrong.

There are various needs that simple LINKs and use of DV_EHR_URI don’t solve particularly nicely, many of them analysed by @ian.mcnicoll and other clinical modellers (@siljelb, @heather.leslie, @varntzen, @vanessap etc, feel free to chime in) in trying to build models for Problem List and the like. I’m going to try to articulate a few at a time, in the hope we can expose the needs and therefore the solution here.

(In the below, you can mentally trade other reference lists like Medications List, Allergies, Family History, etc for Problem List, with the same general semantics.)

Semantic Requirements

So the first thing to think about is the idea of one or more Problem Lists (at least one ‘master’ Problem List with the main Dxs) for which I propose the following semantic requirements statement (to be debated). Managed Lists:

are curated, i.e. manually managed (i.e. not query results)
have content consisting of ‘focal’ and ‘related’ data - ‘focal’ meaning the thematically central data i.e. problems, allergies, medications etc; ‘related’ meaning anything else;
are not the primary structure in which the thematically focal data (Dxs and the like) are originally recorded
have their own documentary structure, i.e. something like Section/heading structure
the focal content is citations of previously recorded diagnoses and/or other ‘problems’
may have citations of other related previous content, e.g. important observations, past procedures etc
?could have internal de novo content, i.e. not just own Sections, but Entries (probably Evaluations?) created within the List to represent notes? summaries? thoughts about care planning?
are managed over time by the usual means, with each modification creating a new version.

One key thing we have to determine is: what can be cited? Is it:

A: only Entries within previous Compositions? I.e. individual clinical statements?
B: Sections containing multiple content items within previous Compositions?
- runs the danger of pointing to too much content if you don’t check properly;
C: sub-Entry level items, e.g. Clusters and Elements, e.g. a single lab analyte inside a lab result OBSERVATION?
- runs the danger of mixing up e.g. a target value (e.g. target BP) with an actual value, or anything else taken out of context;
D: any structure anywhere in a previous COMPOSITION (let’s limit it to LOCATABLEs, which is nearly everything);
- seems dangerous in general.

I am personally strongly in favour of a type A kind of citation - having a single Entry as the target. It always seems attractive to want to refer to anything, but I think that is of limited utility, and carries dangers. It is of course technically possible to model different kinds of citation object, that can point to different kinds of target structure.

Technical Requirements - Representation

To these we need to add some technical requirements, e.g.:

does a retrieve of the Problem List:
- get all its cited contents in one go? I.e. what the clinician considers to be the content? OR
- get only the heading structure and the citation objects (some kind of direct references), with further dereferencing needed to resolve all the citations in order to build the List for display and update?

It seems fairly obvious that the first option is what we want - the whole point of the managed List after all is that you can easily get hold of it as a single logical object.

So here’s the main technical problem. To achieve the result that the full List, including all cited contents, is returned through the API on request requires a solution to either persisting or computing the full contents of what the citations point to. The options include (with some obvious dangers listed):

Persisted Copies: citations are resolved at create time, i.e. they cause copying into the persistent List structure, i.e. the EVALUATION recorded 3 years ago containing my diabetes type 2 Dx is just copied into the Problem List when it is added in the curation process.
- the obvious danger here is that copies of Entries are likely to cause duplicates in querying - we are breaking the golden rule of IT here after all;
- however, making some sort of safe, encapsulated copy is undoubtedly possible;
Generated Copies: citation references are resolved at retrieve time on the server when a retrieve request is made such that the full Problem List is instantiated prior to sending through the API
- this requires a model that includes data items that are not persisted, but generated post retrieve - more complicated;
- the query service has to do a different sort of retrieval, so that these duplicate content structures are not created prior to executing the query - again more complexity;
Persisted Serialisations: citations are resolved at create time, but don’t create structure copies (e.g. a 2nd EVALUATION etc), instead are instantiated in serialised form, e.g. XML or JSON which just need to be rendered to the screen (this kind of approach is documented in the Confluence page on Report representation).
- this approach will prevent duplication in querying and any other process that aggregates persisted EHR data;
- but it loses the native openEHR structures that might be useful on the client side.
Some other (new) native technical representation: some new converted form of the current native structures, e.g. a flattened readonly Entry or similar (see below).
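The querying danger of Persisted Copies, and how Persisted Serialisations avoid it, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the EHR is reduced to a list of dicts, and the `entries` / `serialised` keys and the `count_diagnoses` function are hypothetical, not openEHR structures or API calls.

```python
import copy
import json

# Hypothetical, heavily simplified EHR: each composition is a dict holding
# native entries (dicts) and, optionally, serialised citations (JSON strings).
diagnosis = {"type": "EVALUATION", "archetype": "problem_diagnosis",
             "name": "Diabetes mellitus type 2"}
encounter = {"name": "Encounter", "entries": [diagnosis], "serialised": []}


def count_diagnoses(ehr):
    """Naive query: count native problem/diagnosis entries in all compositions."""
    return sum(
        1
        for composition in ehr
        for entry in composition["entries"]
        if entry["archetype"] == "problem_diagnosis"
    )


# Persisted Copy: the cited entry is duplicated as a native structure.
list_with_copy = {"name": "Problem List",
                  "entries": [copy.deepcopy(diagnosis)], "serialised": []}

# Persisted Serialisation: the cited entry is stored as a JSON string, which
# the naive query does not treat as a native entry.
list_with_json = {"name": "Problem List",
                  "entries": [], "serialised": [json.dumps(diagnosis)]}

print(count_diagnoses([encounter, list_with_copy]))  # 2: the diagnosis is counted twice
print(count_diagnoses([encounter, list_with_json]))  # 1: only the original is counted
```

The serialised form still carries the full content for display (it can be parsed back with `json.loads`), but the client no longer receives it as a native structure, which is the trade-off noted above.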

As per that Confluence page, I think there are very good arguments for using the serialised approach for report-like objects, e.g. discharge summaries, referrals, etc, because they are indeed a kind of recorded statement at a point in time that is treated as a medico-legal document. Whether that same logic holds for managed lists is a question.

There is another potential requirement as well, which is that the client may want not just the cited Entries in the Problem List, but:

their context info i.e. from their containing COMPOSITIONs, indicating ‘when and where did you get your Dx of type 2 diabetes’ AND/OR
the version information, i.e. from the containing ORIGINAL_VERSION object, indicating ‘when did this information become visible in the EHR’.

So we might not just want ‘straight’ Entries, but ‘wrapped Entries’ or ‘flattened Entries’ containing that other data, for each cited Entry. Note - this need is not specific to managed lists, but could be desirable within query results in general (today we solve it by stating the bits and pieces we want in the SELECT part of a query).

Technical Requirements - Update

When a managed list is being updated, i.e. ‘curated’ as we often call it, you can’t modify the cited contents (well, you might be able to do that, if you see errors etc, but it’s not a routine part of List update).

Therefore if the ‘resolved’ (client side) representation includes native objects representing the citation targets, those latter objects have to be considered readonly. If the representation is in a serialised (or some other) form, it might be easier to do this.

Other than this, updating a managed list should allow any reasonable change - removal of references, addition of new ones etc.

Technical Requirements - Interoperability

There are some other technical questions to think about as well. For example, what happens when copying the Problem List(s) and Medication List to another EHR system, e.g. GP → hospital? This can be via an EHR Extract or some other means. How would the receiving (openEHR) system persist the data? That depends on how it is represented, according to those options above - as native openEHR structures, or as a serialised form. Would such copying require that all the cited Entries and their containing Compositions be copied over as well? For native openEHR → native openEHR, a full copy should be made (like a Git repo sync operation with branches being pushed to a target repo) but for other environments, we might want to make fewer assumptions.

We might therefore consider that there is a form of managed List that has references that no longer have targets in the system where it is persisted, due to being a copy.

Towards a Solution

My current thinking over the years on this issue is toward the following kind of solution:

within an openEHR EHR system, we represent managed lists such that citations contain direct references, which are resolved (each time) on retrieval in the server, so that the structure that goes through the API is the ‘full’ structure.
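As a rough illustration of this retrieval-time resolution, here is a minimal Python sketch. The `Citation` and `ManagedList` classes, the `retrieve` function and the dict-based `entry_store` are all hypothetical names chosen for the example; they are not part of the openEHR Reference Model. The sketch also covers the case raised under Interoperability, where a copied list holds references whose targets are not present locally.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Citation:
    target_uid: str                  # persisted: direct reference to the cited Entry
    resolved: Optional[dict] = None  # generated on retrieval, never persisted


@dataclass
class ManagedList:
    name: str
    citations: list = field(default_factory=list)


def retrieve(stored, entry_store):
    """Return the 'full' list: a copy of `stored` with each citation resolved.

    A target missing from `entry_store` (e.g. the list was copied from another
    system) leaves `resolved` as None rather than failing.
    """
    full = [Citation(c.target_uid, entry_store.get(c.target_uid))
            for c in stored.citations]
    return ManagedList(stored.name, full)


entry_store = {"e1": {"type": "EVALUATION", "name": "Diabetes mellitus type 2"}}
stored = ManagedList("Problem List", [Citation("e1"), Citation("e-missing")])
full = retrieve(stored, entry_store)
print([c.resolved for c in full.citations])
```

The stored list is left untouched, so only references are persisted and the resolved content exists only in the structure sent through the API.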

joostholslag · 6 March 2021 15:30

I’m really happy this topic gets attention! My initial thoughts are too preliminary to edit the wiki post, so I put them in a reply to get some feedback first:

I think we need to add the requirement that often you want to use different wording for the problem in the problem list than in the original entry. E.g. cardiology diagnosis/conclusion: “right dec cordis due to tricuspidalis insufficiency for which a valve replacement has been performed with bioprosthesis brand X type y serial xxxx”
In the GP problem list this would be summarised as “heartfailure (bio) valve replacement”
Furthermore it’s common practice to abbreviate the date of the diagnosis if it’s long ago. But that should be a clinical decision (not computerised), so where is this stored?
Often (unfortunately) the source of the problem information is the patient, or some other non resolvable source. How do we then record this information?

My preliminary thoughts are:

the problem in the problem list is not identical to the problem at the time of diagnosis. So a full copy is too limited and doesn’t feel semantically correct, e.g. misses the new perspective on the same problem.
I appreciate the problem of double results on query, but I think we may need multiple valid strategies: ideally I’d like some attribute on the ‘copy’ so it gets excluded at query time (maybe with a param/ select/where statement) but implementers should maybe be offered a simpler solution to avoid copying.
I academically like the idea of the persisted serialisation, but am wary of the extra work rendering these structures will cost our frontend developers.
how about storing regular openEHR entries in something that is not a composition? Maybe a new class that is a root EHR structure. Maybe we could extend the folder class to something that would make it a good fit to store curated lists?

I think we should not specify this limitation, because I expect people will come up with valid use cases we can’t think of right now. I think it makes sense to recommend citing at the composition or entry level. But in practice you usually want to (only) show the value of the “problem/diagnosis name” ELEMENT in the master problem list UI.

Shouldn’t the client/patient request this from the record holder of that composition/version? Or what is the usecase for this potential requirement?

I’m thinking more towards letting the full composition structure (containing the problem/diagnosis) go through the API only when curating (adding/summarising/editing) that entry in the problem list. And when querying the curated (problem) list, only send its contents: summary/problem name (and potentially other elements like date) and the reference to the original location of that problem. Although not a primary concern, I’m also worried about performance: if you send the referenced source compositions through the API when querying the problem list, what happens if that source structure contains a reference itself, e.g. the heart failure cardiology Dx contains a reference to the video of the surgical procedure? Do we send that as well? Also, this chain of references could loop, or otherwise be very large or endless.
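One way to keep such a chain bounded is to follow references only to a maximum depth and to stop when a reference points back to something already on the current path. The Python sketch below illustrates that idea only; the `resolve` function, the `store` dict and the `skipped` marker are hypothetical and not taken from any openEHR specification or CDR.

```python
def resolve(uid, store, max_depth, path=()):
    """Expand the references of `uid` up to `max_depth` levels, without looping.

    `store` maps a uid to {"name": str, "refs": [uid, ...]}. References that are
    not followed are kept as stubs, so the client can dereference them later.
    """
    item = store[uid]
    node = {"uid": uid, "name": item["name"], "refs": []}
    for ref in item["refs"]:
        if ref == uid or ref in path:
            # Target is already on the current path: following it would loop.
            node["refs"].append({"uid": ref, "skipped": "cycle"})
        elif max_depth == 0:
            # Depth budget used up: return a stub instead of the content.
            node["refs"].append({"uid": ref, "skipped": "depth"})
        else:
            node["refs"].append(resolve(ref, store, max_depth - 1, path + (uid,)))
    return node


store = {
    "dx": {"name": "Heart failure Dx", "refs": ["video"]},
    "video": {"name": "Surgery video", "refs": ["dx"]},  # points back to the Dx
}
print(resolve("dx", store, max_depth=5))
print(resolve("dx", store, max_depth=0))
```

With `max_depth=0` the list content is returned with reference stubs only, which is close to the "send only the summary and the reference" behaviour described above.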

As a further requirement I would like to model the problem list in an archetype (like) structure. To be able to add semantic meaning to a reference, or to add a comment on a reference or, as described above: to rephrase/ summarise a text value without changing the source composition.

I do feel this problem of how to handle linked data from a querying efficiency and clinical modelling perspective should get some answer from the openEHR specifications, since it’s so fundamental to healthcare that all data is related to other data. And interpreting/recording these relationships is a big part of the art and science of medicine.

thomas.beale · 6 March 2021 15:52

2 very good requirements - this tells me that we probably should not even consider the Entries in the Problem list to themselves be the same kind of thing as what they cite, i.e. they might be more like simple list Entries, e.g.

type I diabetes mellitus, diagnosed at age 17
previous heartfailure with (bio) valve replacement

We’d need to be careful to distinguish between a) current Dx (e.g. current CHF), b) previous Dx, now recovered (previous CHF), and c) previous procedures, which to my knowledge is a different list.

But it’s not for people like me to say what should be on a ‘problem list’ - perhaps it should be anything, and we should treat citation(s) in each list entry as more like a pointer to whatever constitutes evidence?

On the question of previous real world diagnoses that the patient tells you but occurred long before they started being seen in your practice, and therefore, long before any data was created for them, AFAIK what we do in openEHR is record a normal Problem/Dx Evaluation with the extra information indicating when initially diagnosed. That way, a query for instances of Problem/Dx Evaluations will correctly pickup patient-reported historical diagnoses, as well as ones made by the current clinic.

In fact the idea here would be to store the cited content in a proto-rendered form, e.g. a JSON or XML proto-form structure, such that rendering would be very close to a single function call in some standard modern UI framework.

yes it almost certainly has to work like that, because the curating phase (which of course is constantly repeated) has to display query results that include anything that looks like a Problem/Dx (or Allergy, etc, depending on which list you are curating) from past EHR content, such that you could drag and drop or otherwise create the intended citations.

If there is enough interest, we could transfer my original wiki post here to the openEHR Confluence site, and work on it more carefully.

ian.mcnicoll · 6 March 2021 18:31

Some ideas on this, drawn from Contsys, but I agree that basically there is a difference between a Problem ‘header’ and the individual problems within it. It can be quite tricky.

GitHub - openehr-clinical/shn-contsys: Resources for the EU SemanticHealthNet Contsys project. Never implemented.

@joostholslag - the need to express problems at different levels of granularity is definitely true but I think it might be a slightly different issue. It is one of the reasons why managing a universal problem list will be, IMO, very difficult.

thomas.beale · 7 March 2021 09:53

I’m assuming (based on your last analysis!) that we are thinking of a ‘master problem list’ (current medical diagnoses & problems), and potentially multiple other problems lists, e.g. nursing home care list (‘can’t get upstairs’ etc).

Remember also that I’m using the problem list as an example of any kind of managed list, so we could be talking medications here, in which case we really do want one list (excepting homeopathics and the like I guess).

So I think we need enough facilities to enable these kinds of lists with citations to original relevant content to be built, even if we might not necessarily achieve every last minor detail. I.e. to have a substantially better solution than we have today.

ian.mcnicoll · 7 March 2021 11:31

It is a good conversation, but it is probably mixing up the tech and the clinical semantics/governance. On the tech side, what is missing for me is a simpler approach to references between Entry objects that allows:

Nesting of Entries including inside the same composition.
Making Entries more Restful from a ‘Get’ point of view - i.e. it is easier to consume their parent composition metadata, when querying
Querying across references (simple use-cases only for now)
Standardise use of Entry Ids (sub-items like events, activities etc) esp to line up with FHIR. Can these be an alternative to AQL paths (or at least hidden from 3rd party devs by the CDR)?
Figure out how best to express these in ADL and tooling - I’m thinking LINK might be sufficient, with some tweaks, e.g. make target etc optional but add the ability to provide a uid at run-time rather than a formal AQL path (to be resolved by the CDR).

I think that gives us the playpark we need to start to give solid advice for the kind of questions that Joost is asking. I know various vendors are doing some of this already but it would be good to bring it into the standards space so we can give general advice on ‘how to do’ care-plans or allergies list etc, with some confidence that the optimal approaches do involve references. You can make that work right now but it is harder to do than it ought to be.

thomas.beale · 7 March 2021 13:53

Can we clarify the sense in which you mean ‘nesting’ … do you mean:

A: an Entry (primarily an Evaluation?) can refer to other previously recorded Entries, and the linked structure is visualised as a hierarchy?
B: a managed list can define a hierarchical (i.e. heading) structure under which citations to previous Entries occur, and this is to be visualised according to the hierarchy in the citing list?
C: something else?

A) is already supported by LINK, but archetyping would be better if we could constrain the target of LINKs - the addition to AOM/ADL to do this is not complicated. Probably most current solutions don’t have a hierarchical visualisation of this kind of LINKed chain/tree structure. Querying would also be better if we could follow LINKs (e.g. to a max depth) and pull back the whole pile of LINKed items in one hit.

B) is what the current discussion thread was about.

Your 2. was what I was talking about in terms of ‘wrapped’ or ‘flattened’ Entries. Wrapped would be: you get (a copy of) the enclosing Version, wrapping the enclosing Composition, and inside, just the Entry you want, i.e. all other contents of that Composition are removed. Technically, this might be termed a ‘wrapped projection’. The attraction of this is that no new implementations structures are needed to deal with it.

A ‘flattened’ Entry would be the same content but more flattened, e.g. a structure like:

FLAT_ENTRY
- entry: ENTRY[1] – original Entry, i.e. Eval, Obs etc
- context: EVENT_CONTEXT[0..1] – context from owning Composition
- commit_audit: AUDIT_DETAILS[0..1] – audit from owning Version
- attestations: ATTESTATION[*] – any attestations
- version_uid: OBJECT_VERSION_ID – uid of owning VERSION

This requires new modelling and new implementation in the server, APIs and applications, but might be a bit easier to deal with in apps.
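To make the flattened form concrete, here is a minimal Python sketch of the FLAT_ENTRY structure and of how a server might build it from a stored Version. The `FlatEntry` dataclass follows the attribute list above; the `flatten` function and the dict layout of `version` (keys `uid`, `data`, `commit_audit`, `attestations`, and a Composition with `context` and `content`) are simplifying assumptions for the example, not an actual API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FlatEntry:
    entry: dict                          # original Entry, i.e. Eval, Obs etc
    version_uid: str                     # uid of owning VERSION
    context: Optional[dict] = None       # context from owning Composition
    commit_audit: Optional[dict] = None  # audit from owning Version
    attestations: list = field(default_factory=list)


def flatten(version, entry_uid):
    """Build a FlatEntry for one Entry of the Composition held in `version`."""
    composition = version["data"]
    # Raises StopIteration if the Composition holds no Entry with this uid.
    entry = next(e for e in composition["content"] if e["uid"] == entry_uid)
    return FlatEntry(
        entry=entry,
        version_uid=version["uid"],
        context=composition.get("context"),
        commit_audit=version.get("commit_audit"),
        attestations=list(version.get("attestations", [])),
    )


version = {
    "uid": "v-uid::example.system::1",  # placeholder, not a real version uid
    "commit_audit": {"time_committed": "2018-03-05T10:00:00Z"},
    "data": {
        "context": {"setting": "primary medical care"},
        "content": [
            {"uid": "e1", "name": "Diabetes mellitus type 2"},
            {"uid": "e2", "name": "Blood pressure"},
        ],
    },
}
flat = flatten(version, "e1")
print(flat.entry["name"], flat.context["setting"], flat.version_uid)
```

The other Entries of the Composition are simply not carried over, which is the same projection as the ‘wrapped’ form but without the nesting.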

matijap · 8 March 2021 10:41

I’ll add my technical 2 cents here, and let the others figure out the medico-legal and otherwise content-related aspects, and I’ll also send @borut.fabjan and @vanessap this way for that reason.

From technical POV, I would only like to see a persisted copy of the data within those lists if there is a non-technical reasoning (e.g. legal) behind it. Otherwise I would very much prefer those lists would be compositions containing only the links, and that is what would also be retrieved in the usual ways (requesting a composition via API or AQL query).

As there would probably be a widespread interest in having the CDR handle the assembly of the actual list (with contents, not just links), that should be handled in another way, e.g. another API endpoint, and a server-side function or similar for AQL. Then it would be up to implementation to provide this content in a timely manner and the specifications committee should abstain from giving instructions or even advice to implementors.

thomas.beale · 8 March 2021 11:20

Right. But we need to work out what model changes are required to support this - that’s the SEC’s responsibility. E.g. the addition of at least a Citation class or similar appears likely, as per this early model UML for some initial ideas (‘view Entries’).

The question is whether the other requirements described above / emerging will require some more support in the models.

ian.mcnicoll · 8 March 2021 11:44

I’m starting to think that Citation may not be needed, with a little bit of work on the LINK class.

What are people’s views on making more use of uids vs. full AQL paths, at least at runtime, and emulating the CDA/FHIR approach of allowing in-composition local references?

matijap · 9 March 2021 20:26

If I fall under the “people” category, I must say I didn’t understand the question. Could you elaborate?

birger.haarbrandt · 10 March 2021 07:28

To my understanding, we would have to do this anyway if we need to link to a specific entry inside compositions with repeating paths. I gave it some thought that this should be a configuration of EHRbase to automatically create uids if an entry’s path inside a particular composition is not unique.
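A minimal Python sketch of that idea follows, under loud assumptions: it is not EHRbase code or configuration, and the `assign_uids` function and the flat list of `{"path": ...}` dicts are hypothetical stand-ins for the entries of one composition.

```python
import uuid
from collections import Counter


def assign_uids(entries):
    """Give a uid to each entry whose path is repeated and that has no uid yet."""
    path_counts = Counter(e["path"] for e in entries)
    for e in entries:
        if path_counts[e["path"]] > 1 and not e.get("uid"):
            e["uid"] = str(uuid.uuid4())
    return entries


# Hypothetical paths: two entries share a path, one is unique.
entries = [
    {"path": "/content[problem_diagnosis]"},
    {"path": "/content[problem_diagnosis]"},
    {"path": "/content[blood_pressure]"},
]
assign_uids(entries)
print([("uid" in e) for e in entries])
```

Only the entries that cannot be told apart by path receive a uid, so compositions with unique paths are stored unchanged.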

ian.mcnicoll · 10 March 2021 09:02

@matijap - sorry - I was being a bit confusing. For ‘id’ I was raising the issue that I see increasing use of the ENTRY.uid or sometimes ENTRY.event.uid, either for internal references or to identify the ENTRY (or a sub-ENTRY item like an event) to line up with FHIR, e.g. how do you generate a FHIR.id for an OBSERVATION?

I also sense that 3rd-party devs can get their heads around uids much more easily than formal AQL paths. Birger adds the further issue about uniquely identifying cloned paths.

I guess I am arguing that we should try to standardise the use of uids, or at least share experience, of how and where they are being used. I assume that internally a uid reference would be resolved to a specific path?

The issue about ‘local references’ is that FHIR and CDA allow the modeller to indicate that a link /reference to an ENTRY is to be stored inside the same composition.

thomas.beale · 10 March 2021 10:22

I think eventually mandating or strongly recommending UIDs in ENTRY root points (and maybe in event root points) is uncontroversial, but not because it ‘lines up with FHIR’. If it does, that’s good, but we can go back to 13606 (not sure about today’s version) which demanded UIDs on every single node, which creates a massive impost on source systems to manufacture (easy) and record forever (costly) ids that they have no use for, but some interop standard (that will be replaced one day) requires. I actually calculated out the space cost of this on a representative CDR and the extra useless ids constituted a substantial fraction of the data volume. This matters when you are buying RAID 10 Tb of storage for real systems. It also added significant complexity.

Always be careful what you wish for with interop standards… Having said that, I’ve always believed we should have UIDs in Entry root points since they are independent clinical statements.

Not sure if I get this one: a LINK from some EVAL1 (say) inside COMP1, to OBS2 in some COMP2, is part of EVAL1, and is inside COMP1. Why should the modeller have to say anything about where the LINK goes?

thomas.beale · 12 March 2021 11:05

The changes that we would need to propose on LINK would be the same ones that CITATION brings - primarily an attribute containing the resolved target of the LINK on retrieval but not persisting it. There might be advantages, and avoiding entirely new classes like CITATION also has the obvious attractions of reducing downstream implementation work. I will think more on what we can do with LINK.

thomas.beale · 18 March 2021 13:47

Just to provide a counter-argument… If we use LINKs to implement Citations, then it is likely that at runtime, a Problem or other list will contain the ‘first class’ Citation LINKs (pointing to the primary data of the list, i.e. problems, Dxs etc from past Compositions) and also other LINKs. If we want to treat Citation LINKs as special, e.g. always ‘follow and resolve’, they will need to be marked as Citations or similar, to distinguish them from other LINKs that are not automatically resolved on retrieval (e.g. some sort of ‘see also xyz’ link, or order tracking Links).

So to implement what we think we want with LINKs, we need to:

designate some specific values for LINK.type (and maybe LINK.meaning) - see spec here.
add a new field that contains the resolved value on retrieve;
possibly some other meta-data fields;
add something to the AOM to enable link targets to be constrained in a similar way to slot constraints (we need to do this in any case).
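The first two items in this list can be sketched in Python as follows. The `meaning`, `type` and `target` attributes mirror the existing LINK class, but are simplified to plain strings here; the `"citation"` type value, the `resolved` field, and the `to_persisted` and `resolve_on_retrieve` functions are hypothetical, introduced only to illustrate the proposal.

```python
from dataclasses import dataclass, asdict
from typing import Optional

CITATION_TYPE = "citation"  # hypothetical designated value for LINK.type


@dataclass
class Link:
    meaning: str
    type: str
    target: str                      # simplified: a plain string, not a DV_EHR_URI
    resolved: Optional[dict] = None  # hypothetical new field, filled on retrieve


def to_persisted(link):
    """Serialise a link for storage, leaving out the resolved value."""
    data = asdict(link)
    data.pop("resolved")
    return data


def resolve_on_retrieve(links, lookup):
    """Resolve only the links marked as citations; leave other links alone."""
    for link in links:
        if link.type == CITATION_TYPE:
            link.resolved = lookup(link.target)
    return links


targets = {"target-1": {"name": "Diabetes mellitus type 2"}}
links = [
    Link("cited problem", CITATION_TYPE, "target-1"),
    Link("see also", "related", "target-2"),
]
resolve_on_retrieve(links, targets.get)
print(links[0].resolved, links[1].resolved)
print(to_persisted(links[0]))
```

The type check is what separates ‘follow and resolve’ citations from ordinary ‘see also’ or order tracking links.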

I’m not sure if this is any less work than adding a CITATION class and VIEW_ENTRY (my original idea), but it could go either way.

Something else potentially in favour of the VIEW_ENTRY approach: I believe we may want to be able to add something similar to a Citation, but where the resolved result is an AQL query result, e.g. last 10 BPs or a list of medications matching some criterion. If we did want that, it would probably be another child of VIEW_ENTRY, i.e. we might have this model:

VIEW_ENTRY (abstract)
- CITATION
- QUERY_RESULT
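A minimal Python sketch of this hierarchy is given below. Only the three class names come from the model above; the `resolve` method, the dict-based `ehr`, and the predicate/limit form of the query are assumptions made for illustration (a real QUERY_RESULT would presumably hold an AQL query rather than a Python function).

```python
from abc import ABC, abstractmethod


class ViewEntry(ABC):
    """Abstract parent: an Entry whose content is resolved from other EHR data."""

    @abstractmethod
    def resolve(self, ehr):
        """Return the list of entries this view stands for."""


class Citation(ViewEntry):
    def __init__(self, target_uid):
        self.target_uid = target_uid

    def resolve(self, ehr):
        return [ehr[self.target_uid]]


class QueryResult(ViewEntry):
    def __init__(self, predicate, limit):
        self.predicate = predicate  # stand-in for a stored query
        self.limit = limit

    def resolve(self, ehr):
        matches = [e for e in ehr.values() if self.predicate(e)]
        # Most recent first, then keep at most `limit` results.
        matches.sort(key=lambda e: e["time"], reverse=True)
        return matches[: self.limit]


ehr = {
    "e1": {"kind": "diagnosis", "time": "2018-03-05", "value": "Diabetes mellitus type 2"},
    "e2": {"kind": "bp", "time": "2021-01-10", "value": "140/90"},
    "e3": {"kind": "bp", "time": "2021-02-10", "value": "135/85"},
    "e4": {"kind": "bp", "time": "2021-03-01", "value": "130/80"},
}
views = [Citation("e1"), QueryResult(lambda e: e["kind"] == "bp", limit=2)]
for view in views:
    print([e["value"] for e in view.resolve(ehr)])
```

Both kinds of view are resolved through the same call, so a list can mix citations and query results without the client needing to distinguish them.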

A further reason to consider this approach is that these special new ENTRYs can have LINKs attached to or pointing to them, just like any other ENTRY today, but ‘LINK citations’ cannot.

Some of these other kinds of quoted / cited Entry are described in the top bit of this old wiki post.

One more possible reason in favour of a CITATION (and related) classes: if we can routinely archetype LINKs within archetypes, it will be harder to distinguish Citations from other LINKs that might be archetyped, if they are implemented as LINKs.

I suggest we could discuss these options (and any other bright ideas people have) on Monday’s call.

thomas.beale · 18 March 2021 17:52

I’ve created an actual wiki page for this issue now.

I’ve added in the beginnings of a Citation Entry approach (explanation to be added), and also Ian’s suggested LINK-based approach. If others have a different idea, please add it as another Solution.

joostholslag · 26 April 2021 08:17

Another use-case example. Mayo score - ready for publication - #8 by joostholslag

Sidharth_Ramesh · 6 February 2024 20:13

I know I’m a bit late to this party, but what’s the plan with this implementation of unique UIDs for every LOCATABLE in EHRbase? Last time I checked, UIDs are not being generated for every archetype in a composition automatically while creating.

birger.haarbrandt · 6 February 2024 20:32

Correct. But as you bring it up again after 3 years, this still could be a valid idea