# How accurately do we model "copied" data?

**Category:** [Clinical](https://discourse.openehr.org/c/clinical/5) **Created:** 2022-06-06 16:10 UTC **Views:** 1484 **Replies:** 42 **URL:** https://discourse.openehr.org/t/how-accurately-do-we-model-copied-data/2691

---

## Post #1 by @johnmeredith

openEHR templates will frequently feature data from other repositories. In the context of a clinical referral letter, results may be added in to the narrative of the text and this is itself a copy of the source data from the pathology system. We have to be aware that this copied data exists and should not be used for querying, i.e. query only the source systems.

We have to decide how accurately we model this data, i.e. as observations, SNOMED CT codes etc. with accurate values, or whether we only store these "copies" as text strings to prevent reuse (misuse). Is there a best practice for this scenario?

---

## Post #2 by @bna

We've added a separate context/category code for report documents. The purpose of these is to reuse data from the EHR for outbound data. Content in such compositions will not be included in AQL results by default. They are queryable, but whoever writes the query must add explicit conditions to the AQL to get them.

---

## Post #3 by @joostholslag

Important point John! Some thoughts: there’s a generic entry class; one of its goals is an integration scenario like the one you described. The same problem occurs if e.g. lab data is summarised or quoted in a discharge summary, both stored in an openEHR CDR. There’s been some talk with proposals from @thomas.beale of references some months ago on this forum. Let me know if you can’t find it.

@Bna do you mean this is a local solution dips built? Or is it in the spec? Could you share a bit more? I’m really curious

---

## Post #4 by @varntzen

Hi John

[quote="johnmeredith, post:1, topic:2691"]
openEHR templates will frequently feature data from other repositories.
In the context of a clinical referral letter, results may be added in to the narrative of the text and this is itself a copy of the source data from the pathology system. [/quote] I wonder if you can be more specific. By "other repositories" you mean another in-house system, or from outside of the organisation the EHR is suited? The results that could be added in a clinical referral, is this a referral from an external part to the health provider or outbound back to the external part (i.e. a GP)? A concrete use case would help to understand your question. The way Bjørn describes how DIPS is dealing with reports "out of the domain of their system" is a nice solution. It re-uses the same archetypes as in production, but in another context ("just for reporting") and doesn't "pollute" the true production data from the hospital/health service provider. Regards, Vebjørn --- ## Post #5 by @thomas.beale [quote="joostholslag, post:3, topic:2691"] there’s been some talk with proposals from @thomas.beale of references some months ago on this forum. Let me know if you can’t find it. [/quote] [Might be this](https://discourse.openehr.org/t/cross-reference-citations-and-a-solution-for-managed-lists/1355). [quote="joostholslag, post:3, topic:2691"] do you mean this is a local solution dips built? Or is it in the spec? Could you share a bit more? I’m really curious [/quote] This is something DIPS does locally. [quote="varntzen, post:4, topic:2691"] openEHR templates will frequently feature data from other repositories. In the context of a clinical referral letter, results may be added in to the narrative of the text and this is itself a copy of the source data from the pathology system. [/quote] [This wiki page](https://openehr.atlassian.net/wiki/spaces/spec/pages/905183257/Applications+Retrieve+Data+sets+and+Bindings) may address what John is talking about here. 
Also related: [Subject Proxy Service](https://specifications.openehr.org/releases/SM/latest/openehr_platform.html#_subject_proxy_service_sps), for getting subject (patient) data from other sources. e.g. patient demographics from Oracle MPI or vital signs from wearable device. Here's a [wiki page on Reports](https://openehr.atlassian.net/wiki/spaces/spec/pages/92358988/Reports) (much based on @bna presentation from some years ago). --- ## Post #6 by @johnmeredith This is very much in line with some of our thinking. Is this implemented as a standard operating procedure when dealing with AQL i.e. don't include this code as standard on all AQL calls? Probably better if you had an example? --- ## Post #7 by @johnmeredith Sorry for not being clear! We have a pathology results service and for the time being, we would not want to replicate the data there as this component is designed to support the requests and results etc. In the medium term this will be wrapped or converted to FHIR native. My concern is breaking the clinical narrative by not modelling "properly". We hit this yesterday with @ian.mcnicoll and it occurred to me afterwards that I didn't like the notion of either dumbing down or even excluding the observation in question. Part of this is to do with our need to present a copy of the composition in our document repo. We would then be faced with both the form needed to get data from 2 places and then replicating it to create a PDF binary as a document. If we can avoid this issue with a "no reporting" flag as @Bna mentions, that seems a more elegant solution. --- ## Post #8 by @ian.mcnicoll I like the idea of the 'reporting/ secondary record' flag but I've always felt that it needs to be at Entry level, not for whole Composition, which as in this case, or even some discharge reports, there is a mix of primary and secondary records. 
In our use-case there is also an element of potentially needing to capture the exact lab results which underpinned the decision-making, as this may not be clear when the lab results are pulled dynamically, esp from an eternal source.

---

## Post #9 by @varntzen

[quote="ian.mcnicoll, post:8, topic:2691"]
eternal source
[/quote]

How wonderful, problem with data for life solved! :-D (sorry, couldn't resist)

---

## Post #10 by @ian.mcnicoll

Sometimes we have to deal with the world as-is, not as we might like it!! Ideally the lab tests would be in the same consolidated CDR, of course. You still need to flag a 'copied result' but at least everything is under your control.

---

## Post #11 by @GerardFreriks

A few cents by a relative outsider. Data in any patient record has context:

1. it is from a third party, like lab results, a referral letter, … Third party data has to be admitted to the record and occasionally annotated by the author that admits it formally to the record
2. it is entered by the author

Either third party data is separated from the data the author has entered, in a separate database with incoming or sent documents. Documents need to have a state (received, read, discarded). Any document is modelled using a Composition, since the third party data can be a complete report with many pages and various clinical facts. In the patient record it must be possible to link to data in a document that has been read and that is stored in the document repository; the links are inserted by the author. This implies that AQL must be able to deal with referred data.

Or the documents are entered in the record next to data entered by the author. In this case all Compositions need to be able to indicate that they are from a third party and have a state (received, read, discarded). The author of the patient record is able to link to data in Compositions entered as third party documents.
AQL needs to take into account the flag indicating what it is: authored by the author or third party. The first solution has my preference.

---

## Post #12 by @thomas.beale

[quote="ian.mcnicoll, post:8, topic:2691"]
I like the idea of the ‘reporting/ secondary record’ flag but I’ve always felt that it needs to be at Entry level, not for whole Composition, which as in this case, or even some discharge reports, there is a mix of primary and secondary records.
[/quote]

This is what a [Citation (Entry)](https://openehr.atlassian.net/wiki/spaces/spec/pages/1510146078/Managed+List+Model) is...

[quote="GerardFreriks, post:11, topic:2691"]
Documents needs to have a state (received, read, discarded)
[/quote]

This is something we don't deal with properly - the acknowledge/read/ etc status of received documents, at least not in the RM.

---

## Post #13 by @ian.mcnicoll

I've never used Citation - it feels clunky, involves referencing and precludes using AQL directly when appropriate.

I'd much rather have an attribute on Entry (similar to Dips 'report') that indicated that a given Entry is not a primary datasource and should not be picked up in 'standard querying' - same intent as 'report' but much more flexible and granular.

---

## Post #14 by @heather.leslie

[quote="thomas.beale, post:12, topic:2691"]
This is what a [Citation (Entry)](https://openehr.atlassian.net/wiki/spaces/spec/pages/4915215/Citations) is…
[/quote]

Sam wrote that wiki page in 2008 and built this [CITATION archetype](https://ckm.openehr.org/ckm/archetypes/1013.1.721) in 2010. There is no EVALUATION.citation that I'm aware of, as he suggests in the CLUSTER archetype 'Use'.

---

## Post #15 by @thomas.beale

[quote="heather.leslie, post:14, topic:2691"]
Sam wrote that wiki page in 2008 and built this [CITATION archetype](https://ckm.openehr.org/ckm/archetypes/1013.1.721) in 2010. There is no EVALUATION.citation that I’m aware of, as he suggests in the CLUSTER archetype ‘Use’.
[/quote]

Oops...
completely wrong link. [This is the page I intended to link](https://openehr.atlassian.net/wiki/spaces/spec/pages/1510146078/Managed+List+Model). --- ## Post #16 by @thomas.beale [quote="ian.mcnicoll, post:13, topic:2691, full:true"] I’ve never used Citation - it feels clunky, involves referencing and precludes using AQL directly when appropriate. I’d much rather have an attribute on Entry (similar to Dips ‘report’) that indicated that a given Entry is not a primary datasource and should not be picked up in ‘standard querying’- same intent as ‘report’ but much more flexible and granular. [/quote] The citation I am talking about is something like the View model that we have discussed much more recently. This covers representing: * referenced static data, e.g. an existing diagnosis or lab result * capturing an AQL result * capturing the result of an external API call, e.g. some CDS system result. The problem with putting a flag on Entry is that we are just making copies of things, and providing a way to mark the copy as a copy rather than an original. That would help a bit. But we need more than that: we need to be able to capture AQL results, API results etc. And I think it's likely that some systems / apps will not set the flag the right way, so there will be querying errors anyway. Better to have a dedicated type(s) that support different kinds of views, and also support the representation of reports and summaries as shareable documents, which a flag on Entry won't do either. --- ## Post #17 by @johnmeredith We have the notion of this "clean" architecture where pathology results will eventually be able to be referenced as a URI to a FHIR based repository. We would even be feeding some of these results where they exist primarily in openEHR, and our principle apps would interact with national services for observations or results. We have the latter now but it's behind some clunky SOAP based endpoints or direct SQL querying. 
Not at all acceptable in this day and age...! The flag might well be a suitable get-out-of-jail card though.

---

## Post #18 by @anoopshah

I agree with John and Ian - this question also applies to data held outside of the openEHR system, e.g. in another repository, and to medication / problem lists. (e.g. GP has a problem list. Patient is admitted to hospital. Doctor copies GP problem list into hospital system, and then into electronic discharge letter. Patient is discharged to GP. GP accepts hospital discharges into the problem list. GP system now has a duplicate version of the problem list. This needs to be prevented, e.g. by linking the returned entries to the originals so that the roundtrip does not create duplicates).

Could we think of the 'copied' Entry as effectively being a cached reference to the underlying source of truth? If the network were working perfectly we could forget about keeping a copy and just refer to the original source (and choose to view the version that existed at the time the composition was committed). When running queries the important thing would be deduplication if both the original and the 'copy' are retrieved.

Maybe a solution could be that the Entry contains a flag to say that it is a copy, and it also contains a reference / URI to the source of truth?

---

## Post #19 by @thomas.beale

[quote="anoopshah, post:18, topic:2691"]
Could we think of the ‘copied’ Entry as effectively being a cached reference to the underlying source of truth?
[/quote]

This is the intent of [this kind of modelling](https://openehr.atlassian.net/wiki/spaces/spec/pages/1510146078/Managed+List+Model) (solution #2 and #3).

---

## Post #20 by @joostholslag

One extra use case we need to keep in mind is summarising. E.g.
the highlights from lab results are regularly put into a SOAP report, e.g.:

* S/ no more bleeding
* O/ old blood crustae in the nose, no fresh bleeding
* Lab/ Hb 5.1 (4.9 yesterday)
* E/ slowing of anemia after nose bleeding
* P/ hb in two days, transfusion in case of hb < 4

Do we want to put the data under O/ lab/ in an obs.lab? This will lead to duplication. As pointed out, unless it’s flagged somehow it leads to duplication. (Which I don’t think is as big a problem as assumed here, since it’s obviously a duplicate, but from a different perspective: a doctor's note instead of automated lab output, which indicates increased validity and adds information to the lab result itself.)

---

## Post #21 by @thomas.beale

[quote="joostholslag, post:20, topic:2691"]
This will lead to duplication
[/quote]

Duplication isn't the only or biggest problem. If we think about it, two physicians might well observe e.g. some skin condition or pallor or whatever, just due to the patient having two encounters not in the same health service - e.g. being on holidays for one. There's nothing innately wrong with that - they are just observations of the same thing. The trick is to know if they are really referring to the same real-world instance, or two distinct instances some time apart. For example, a mole (the same one) could be observed twice 3 months apart in separate visits. However the same kind of fracture in the radius of a child, observed and described the same way twice, 2y apart, is almost certainly not the same fracture as such. To deal with this kind of thing properly requires systems that support something called Referent Tracking - look for papers by Smith & Ceusters.

EDIT: typo, muscle memory took over... it's 'referent tracking', not 'reference tracking'.

What I think we should be more interested in is having a way to create things that really are documents, such as summaries, discharge summaries and other kinds of reports.
These things will generally not contain primary data; they will reference info already established and/or simply restate it in a linguistically summarised fashion, like your SOAP example. The links I provided above are more about this (and the most recent attempt at analysing the semantics of course includes your very useful input).

---

## Post #22 by @GerardFreriks

Joost,

What you describe is the missing link in modelling. Not only do we have to model the data entered as an item in a document, but we also have to model more explicitly the **context** of the Observation, the Evaluation, the Planning, the Ordering, and the Execution. In other words, the epistemology. This is a model about: where in time and place, when in absolute or relative terms, why and how of the topic in the real world. It can be modelled in Archetype Space in a Cluster. It is the issue Barry and Werner defined as "Referent Tracking".

We cannot exclude that Documents contain all kinds of data: O,E,P,O,E but also parts of other documents.

---

## Post #23 by @bna

[quote="joostholslag, post:20, topic:2691"]
somehow it leads to duplication. (Which I don’t think is as big a problem as assumed here, since it’s obviously a duplicat
[/quote]

It's a bit hard for a generic engine to know about these rules. If you query for facts you expect the primary data. Other times you query all discharge summaries containing x or z. That’s another use case. Both are needed. The models or definition of data need to be precise and shared to make this work.
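A minimal sketch of the two query modes discussed so far: primary data by default, copies only on explicit request, plus the deduplication mentioned earlier in the thread. It assumes a hypothetical `copy_of` marker on each entry. The `Entry` class, `query` and `deduplicate` are illustrative names only; they are not part of the openEHR RM, of AQL, or of what DIPS has implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified stand-in for an openEHR ENTRY."""
    uid: str
    archetype_id: str
    value: float
    # Hypothetical marker: None for primary data, otherwise the uid of the original.
    copy_of: Optional[str] = None


def query(entries, archetype_id, include_copies=False):
    """Return matching entries; copies are only returned on explicit request."""
    hits = [e for e in entries if e.archetype_id == archetype_id]
    if not include_copies:
        hits = [e for e in hits if e.copy_of is None]
    return hits


def deduplicate(entries):
    """Drop a copy when its original is present in the same result set."""
    present = {e.uid for e in entries}
    return [e for e in entries if e.copy_of is None or e.copy_of not in present]


LAB = "openEHR-EHR-OBSERVATION.laboratory_test_result.v1"
store = [
    Entry("lab-1", LAB, 5.1),
    Entry("summary-1", LAB, 5.1, copy_of="lab-1"),  # quoted in a discharge summary
    Entry("letter-1", LAB, 4.9, copy_of="ext-77"),  # original lives in another system
]

primary = query(store, LAB)                         # default: primary data only
everything = query(store, LAB, include_copies=True)
merged = deduplicate(everything)                    # copy of lab-1 collapses away
```

The copy whose original sits in another system survives deduplication, which is the case where a stored copy is the only locally available representation.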
---

## Post #24 by @GerardFreriks

In order to prevent the problems mentioned we must be aware that we need to discern the purpose of the data:

1. Primary (de novo) clinical data about the patient system entered to the record
2. Secondary data (by reference) where data about the patient system is re-used for clinical reporting (referral, summary, …)
3. Tertiary data (by reference) where data about the patient system is re-used for administrative (financial) reporting

Type 1 are clinical observational facts: de novo facts about the patient system that are used in most queries.

Type 2 are clinical (and non-clinical) reused facts about the patient system as referenced (reused) Type 1 data within the record.

Type 3 are clinical (and non-clinical) facts about the patient system referenced (reused) from third parties, e.g. from a referral letter or report. When admitted by the author of the record it changes state and is transformed into Type 1, entered as if it is a de novo fact. But it must still use a link to the data in the original document.

---

## Post #25 by @joostholslag

[quote="GerardFreriks, post:22, topic:2691"]
It can be modellen in Archetype Space in a Cluster.
[/quote]

Yes, I’ve played with the idea that all elements (main data point) should be in a self standing cluster archetype instead of part of the observation/evaluation. This will allow for more context info to be added to the same clinical concept. E.g. body weight is now an element in an obs. If it were an element in a cluster, that data cluster could be either an observed body weight, an interpretation of a body weight, a target body weight etc.

So for the lab example, a summary of a lab result could be a different entry archetype from obs.lab_result, including the same cluster.lab_result with a summary context. This will also solve this problem:

[quote="bna, post:23, topic:2691"]
If you query for facts you expect the primary data.
[/quote]

If this is the data you want, you can query on the cluster where it’s part of the obs.lab_result (not the entry containing the cluster in a summary). Technically the data attribute content_item of entry archetypes shouldn’t allow for elements, only clusters.

Edit: we also need many more attributes to express the relationship between the clusters in different entries. E.g. “summary of”, “interpretation of”, “exact copy of”, “basis for decision of”, AND we need defined expected behaviour of these relationships. E.g. a correction of a lab result that has “basis for decision” should trigger a reevaluation of that decision.

---

## Post #26 by @bna

[quote="joostholslag, post:25, topic:2691"]
Yes, I’ve played with the idea that all elements (main data point) should be in a self standing cluster archetype instead of part of the observation/evaluation.
[/quote]

Yes. This pattern seems very attractive for lots of use cases. We just developed an application for a weight plan. The primary user/customer was a center for eating disorders. They follow up the patient to change weight. A plan might be: increase weight by x grams each week until reaching y kg. We define the goal using the goal archetype. Here we would like to model the weight using the same structures as in the observation body weight archetype. It would be good if we could reuse structure from the reviewed body weight.

We see the same pattern for other use cases like NEWS2 and most of the vital signs concepts.

@ian.mcnicoll mentioned they tried to model the ITEM_TREE in the early days. If I understood correctly, they rejected that modelling pattern because of limitations in the tooling.

Perhaps these are ideas to revisit in 2022?

---

## Post #27 by @thomas.beale

[quote="joostholslag, post:25, topic:2691"]
Edit: we also need much more attributes to express the relationship between the clusters in different entries. E.g.
“summary of”, “interpretation of” “exact copy of” “basis for descion of” AND we need defined expected behaviour of these relationships. E.g. a correction of a lab result that has “basis for decision ” should trigger a reevaluation of that decision. [/quote] There are a couple of ways that we can address this. One is a better defined vocabulary for codes on LINKs, which technically perform the linking you mention. Secondly, if we implement the VIEW_ENTRY concept or something close to it, such Entries will contain refs of the things they reference, and they will directly represent the notion of 'copy of', or more accurately, 'citation of'. I have proposed in the past that we support not just citations, but the representation of *documents* (aka 'reports'), which contain not only cited but copied and serialised content ([see here](https://openehr.atlassian.net/wiki/spaces/spec/pages/92358988/Reports)). A report is useful in three ways: * it acts as a literal content capture of some situation (e.g. discharge) at a point in time - it may even be signed * double query results can't happen, because the report consists of only XML or some other serial document form - there are no 'Observations' or suchlike * a report can be sent as is to any other party e.g. discharge summary document - >aged care home, GP etc. What we normally call a 'referral' could probably be treated as a report. Indeed, a 'report' could be generated for any EHR content that has to be communicated in a document-sharing paradigm. This does not preclude true sharing of content e.g. medication list, with the recipient, e.g. GP surgery. --- ## Post #28 by @siljelb Was this thread ever progressed further? The issue of primary record vs secondary (reused/copied) data is a problem which keeps coming back to bite us. 
[quote="thomas.beale, post:12, topic:2691"] [quote="ian.mcnicoll, post:8, topic:2691"] I like the idea of the ‘reporting/ secondary record’ flag but I’ve always felt that it needs to be at Entry level, not for whole Composition, which as in this case, or even some discharge reports, there is a mix of primary and secondary records. [/quote] This is what a [Citation (Entry)](https://openehr.atlassian.net/wiki/spaces/spec/pages/1510146078/Managed+List+Model) is… [/quote] I agree with @ian.mcnicoll : In a lot of cases there’s a mix of primary and secondary records in a single composition. I’ve read what @thomas.beale writes about CITATION, and I’m not sure I understand it. But we really need this to work without having to model and govern all archetypes as two separate classes in parallel. A blood pressure has the same (maximum) data set whether it’s in a primary or secondary record. But maybe I’m misunderstanding how the CITATION class would be used? [quote="ian.mcnicoll, post:13, topic:2691"] I’d much rather have an attribute on Entry (similar to Dips ‘report’) that indicated that a given Entry is not a primary datasource and should not be picked up in ‘standard querying’- same intent as ‘report’ but much more flexible and granular. [/quote] This sounds like a workable model to me as well. [quote="thomas.beale, post:16, topic:2691"] The problem with putting a flag on Entry is that we are just making copies of things, and providing a way to mark the copy as a copy rather than an original. That would help a bit. But we need more than that: we need to be able to capture AQL results, API results etc. [/quote] What if the “flag” is not just a boolean or something, but an object containing the retrieval date, AQLs, etc? [quote="thomas.beale, post:16, topic:2691"] And I think it’s likely that some systems / apps will not set the flag the right way, so there will be querying errors anyway. 
[/quote] Sure, but this goes for any kind of template modelling etc: Where there are humans, there will be errors. --- ## Post #29 by @thomas.beale [quote="siljelb, post:28, topic:2691"] What if the “flag” is not just a boolean or something, but an object containing the retrieval date, AQLs, etc? [/quote] Some DB users are sure to miss it - they will just query down a table full of records, and treat supposedly cited copies as originals. Relying on a Boolean flag, or any other added special attribute is bad practice. Analytics users will be sure to miss such attributes - they love just diving into relational tables, and they will ignore strange attributes they don’t understand. Not properly separating out primary data from copies is just fraught with potential problems. A solution I have proposed in the past represents the cited (i.e. copy) data in serialised form, e.g. JSON or similar. This means that reports, summaries etc that contain such citations have a usable representation of the cited items (e.g. if the discharge summary or report is copied elsewhere), but no structural copies exist in the DB. So no chance of duplicates in queries. The citations contain the original references to the target (cited) info items, so as long as those can be resolved, the original structural form of the cited data can be instantiated as well. [quote="siljelb, post:28, topic:2691"] Sure, but this goes for any kind of template modelling etc: Where there are humans, there will be errors. [/quote] Sure, but one of the guiding principles in openEHR has been to limit the possibilities of making errors. We don’t need to make it easier ;) --- ## Post #30 by @joostholslag We probably have to reverse the perspective on querying anyways, from filter out what you don’t need to select only what you do need, still on the same query parameters. Already the diagnosis of e.g. 
the same stroke will be in the (virtual, federated/distributed) EHR many times, once as a working diagnosis by the GP, then by the ambulance, then by the neurologist, another in a managed problem list and yet another in a nursing care plan. Each in its own composition. Of course it would help a lot if there are relations recorded between those different compositions. Now if you do a ‘dump’ query for all eval.problem_diagnosis, it may look like there are many strokes. So you need to understand the data model (EHR, composition, archetype, template etc.) in order to create a clinically safe(ish) query.

AQL is very powerful and a huge openEHR asset, but in the end it’s the clinician that needs to make the judgement of how many strokes there are. It’s very hard to automate this outside of a highly curated setting. So I would suggest not caring too much that ‘copied’ data is returned multiple times, but focusing on making writing AQL queries safer. E.g. by default only returning data if a template id is specified, suggesting to add filter parameters based on the query result set etc.

---

## Post #31 by @joostholslag

Thinking a bit more on this I think we need to separate out the problems. My example is actually clinically different datapoints. It’s different clinical (re)interpretations (EVALUATION) of the same event by different clinicians and datasources (family, referral letter etc) with different (additional) data available to make the interpretation.

Now for observational data this is different. The same observation (same observer etc), like blood pressure, should be recorded only once. And when needed in a different CDR, in my opinion the CDRs should be federated (AQL federation spec is under way, distributed editing of a VERSIONED is still a challenge technology wise; recently discussed this privately with @sebastian.iancu and @ian.mcnicoll).

If a federation is unfeasible, e.g.
because the source data is not openEHR, or there’s no shared infrastructure, quoting makes a lot of sense. I like Thom’s suggestion of (optionally) persisting the source data as a binary/JSON (logically) contained by the composition. As long as it’s clear AQL federation is the ideal solution.

The JSON could then contain any openEHR data, but the AQL engine and client apps must recognise this as special data that’s not returned by default. Maybe we could use a specific new data type for this, DV_CITATION perhaps. Or potentially a different ENTRY subclass: CITATION_ENTRY (or is that covered by DV_PARSABLE? Or is that already covered by [GENERIC_ENTRY](https://specifications.openehr.org/releases/RM/development/data_types.html#_dv_parsable_class)?). This would pull it into the clinical modelling domain. Anyways it would probably help if there’s an attribute for typing the data in the binary/JSON to recognise whether it’s openEHR data.

How would this relate to IMPORTED_VERSION?

Another priority is improving the DV\_(EHR)\_URI by adding constraints on the target of the URI. Eg ‘should only point’ to eval.problem_diagnosis. That would help a lot in relating data, eg in different compositions. The design for this is mostly agreed, just a regex (I know..). This is also relevant politically, at least in NL, because it’s one of the very few features openEHR doesn’t support, and related standards like ISO 13972 CIM/zib do.

Finally we should relate this discussion to the other recent discussions on how to relate different datapoints. Currently there are a lot of options, but too little consensus and feature parity. In addition to the options mentioned above (parsable, generic entry, dv ehr uri, composition containment) there are also FOLDERs and links and adding a specific archetype cluster with an identifier.
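To make the "just a regex" idea concrete, here is a rough sketch of constraining the target of an EHR URI so that it may only point at a problem/diagnosis evaluation. The URI layout, the example URIs and the `target_allowed` helper are all assumptions for illustration; the agreed design for constraining `DV_EHR_URI` may look different.

```python
import re

# Assumed, illustrative layout: ehr://<system>/<ehr_id>/.../<attribute>[<archetype_id>]
# The pattern only accepts URIs whose final node is a problem_diagnosis EVALUATION.
PROBLEM_DIAGNOSIS_ONLY = re.compile(
    r"^ehr://.+\[openEHR-EHR-EVALUATION\.problem_diagnosis\.v\d+\]$"
)


def target_allowed(uri: str, constraint: re.Pattern) -> bool:
    """Return True if the whole URI satisfies the target constraint."""
    return constraint.fullmatch(uri) is not None


# Hypothetical URIs, made up for this example.
diagnosis_uri = (
    "ehr://cdr.example.org/7d44b88c/compositions/abc123/"
    "content[openEHR-EHR-EVALUATION.problem_diagnosis.v1]"
)
weight_uri = (
    "ehr://cdr.example.org/7d44b88c/compositions/abc123/"
    "content[openEHR-EHR-OBSERVATION.body_weight.v2]"
)

diagnosis_ok = target_allowed(diagnosis_uri, PROBLEM_DIAGNOSIS_ONLY)
weight_ok = target_allowed(weight_uri, PROBLEM_DIAGNOSIS_ONLY)
```

A pattern like this only checks the shape of the URI string; it says nothing about whether the target exists or can be resolved.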
--- ## Post #32 by @siljelb [quote="thomas.beale, post:29, topic:2691"] Some DB users are sure to miss it - they will just query down a table full of records, and treat supposedly cited copies as originals. Relying on a Boolean flag, or any other added special attribute is bad practice. Analytics users will be sure to miss such attributes - they love just diving into relational tables, and they will ignore strange attributes they don’t understand. [/quote] Yeah, I think this would need to be coded into the CDR, sort of like what DIPS has done with the Report COMPOSITIONS. [quote="thomas.beale, post:29, topic:2691"] A solution I have proposed in the past represents the cited (i.e. copy) data in serialised form, e.g. JSON or similar. This means that reports, summaries etc that contain such citations have a usable representation of the cited items (e.g. if the discharge summary or report is copied elsewhere), but no structural copies exist in the DB. So no chance of duplicates in queries. The citations contain the original references to the target (cited) info items, so as long as those can be resolved, the original structural form of the cited data can be instantiated as well. [/quote] Is this the CITATION class as outlined in [Managed List Model - Specifications - Confluence](https://openehr.atlassian.net/wiki/spaces/spec/pages/1510146078/Managed+List+Model) ? I’m still struggling to understand how this would work modelling wise. How would archetypes based on the CITATION class get modelled? Or am I misunderstanding the concept completely? [quote="joostholslag, post:31, topic:2691"] Now for observational data this is different. The same observation (same observer etc), like bloodpressure, should be recorded only once. And when needed in a different CDR, in my opinion the CDRs should be federated [/quote] [quote="joostholslag, post:31, topic:2691"] If a federation is unfeasible, e.g. 
because the source data is not openEHR, or there’s no shared infrastructure, quoting makes a lot of sense. [/quote] The situation we’re often seeing is something like “use this AQL query to see if there’s an existing Barthel index within scope, and if not allow the user to enter it”. Sometimes both the query result and the user entered variants would be considered “secondary” or a “copy”, while in other cases the user entered data is an actual primary record. So the application would have to be able to switch between these contexts on the fly, and persist the data *in the same template path* so it can be retrieved within context without a hugely complex AQL. --- ## Post #33 by @thomas.beale [quote="joostholslag, post:30, topic:2691"] Already the diagnosis of e.g. the same stroke will be in the (virtual, federated/distributed) EHR many times, once as a working diagnosis by the GP, then by the ambulance, then by the neurologist, another in a managed problem list and yet another in a nursing care plan. Each in it’s own composition. [/quote] well there could easily be multiple mentions of symptoms plus an interpretation of ‘probable stroke’ or similar. However, only one formal diagnosis would appear on the master problem list, if it is being managed properly. It usually will be - physicians are very careful about distinguishing multiple strokes (heart attacks, …), since that changes the picture on prognosis. What we have been discussing is digital copies of the same information item, not multiple real world mentions of (maybe) the same health event / state - they are different issues. [quote="joostholslag, post:31, topic:2691"] Thinking a bit more on this I think we need to seperate out the problems. My example is actually clinically different datapoints. It’s different clinical (re) interpretations (EVALUATION) of the same event by different clinicians, datasources (family, referral letter etc) with different (additional) data available to make the interpretation. 
[/quote] Exactly. [quote="joostholslag, post:31, topic:2691"] If a federation is unfeasible, e.g. because the source data is not openEHR, or there’s no shared infrastructure, quoting makes a lot of sense. [/quote] I used to think that too, but after 30 years of looking at EMRs, and dreams of a better future, I’d say we are no closer to that perfect world. A more realistic way of seeing things is not a perfect federation (also requires 100% uptime to guarantee that reference resolution always works - never going to happen) but a true patient-controlled, patient-centric record, hosted in a reliable fault-tolerant (possibly distributed) infrastructure, with which any clinical worker interacts. This will take at least 10 years to catch on, even if we assume it is now understood as a sensible aspiration, which it now is, in the US. In Europe we are further ahead in our thinking, but the practice is still the same. Until we get such patient-centric records, we are looking at (at best) imperfect federations with multiple mentions of the same things, potentially synchronisable across locations. The openEHR version control system was designed for this reality, which is why there is branching in the versioning scheme - that’s what allows merging etc. Re: the [citation solution](https://specifications.openehr.org/releases/UML/development/index.html#Diagrams___19_0_83e026d_1574366530113_751508_5059), we don’t need special data types, but we do need some extra classes and special attributes to manage it properly. This draft model is close (it’s not quite right though; the VIRTUAL class should have a `serialised` attribute, and `resolved` should be removed). [quote="joostholslag, post:31, topic:2691"] Another priority, is improving the DV\_(EHR)\_URI by adding constraints on the target of the URI. Eg ‘should only point’ to eval.problem_diagnosis. That would help a lot in relating data eg in different compositions. The design for this is mostly agreed, just a regex (I know..).
This is also relevant politically, at least in NL, because it’s one of the very few features openEHR doesn’t support, and related standards like ISO 13972 CIM/zib do [/quote] I would not try to do this with `DV_EHR_URI`, but with Citations. [quote="siljelb, post:32, topic:2691"] Is this the CITATION class as outlined in [Managed List Model - Specifications - Confluence](https://openehr.atlassian.net/wiki/spaces/spec/pages/1510146078/Managed+List+Model) ? [/quote] That was an early version of it. The discussion is correct (IMO at least), but I developed a draft version of a better model (see link above). Unfortunately the version online is not quite correct, but it’s pretty close. [quote="siljelb, post:32, topic:2691"] I’m still struggling to understand how this would work modelling wise. How would archetypes based on the CITATION class get modelled? Or am I misunderstanding the concept completely? [/quote] Very good question - this stuff has to work from a modelling point of view. You will see in [that model](https://specifications.openehr.org/releases/UML/development/index.html#Diagrams___19_0_83e026d_1574366530113_751508_5059) that the `VIRTUAL` class attributes, which are inherited into `VIRTUAL_ENTRY` and `VIRTUAL_ITEM`, are of type `LOCATABLE_REF`. We need to add a new meta-type to the archetype model (i.e. the AOM) that allows constraining of references of such types. Then we could create constraints in an archetype editor for `commit_context`, `entry_context`, etc, that would limit them to e.g. a lab result Composition and a lab result Observation or similar. There’s a bit more work to get this right, so don’t take the above as 100% literal truth. I need to do a few experiments on the structures in the ADL Workbench to see what works best. As an aside, there is a whole other issue of references to Entities like devices, substances (e.g. medicines), persons and places, which is not handled correctly in openEHR.
I solved this while in the US and the solution looks like it can be retro-fitted to the openEHR RM. I’ll get a description of that online as soon as I can. [quote="siljelb, post:32, topic:2691"] The situation we’re often seeing is something like “use this AQL query to see if there’s an existing Barthel index within scope, and if not allow the user to enter it”. Sometimes both the query result and the user entered variants would be considered “secondary” or a “copy”, while in other cases the user entered data is an actual primary record. [/quote] Not an uncommon situation (happens with BMI and all sorts of things). But remember the user-entered ‘copy’ is not a copy of anything, it’s a ‘repeat mention’ of something in the real world, that might have already been described - just like a BP recorded by a machine into the EHR, and then a nurse comes along 1 minute later and does a manual BP and enters that into the EHR. It’s just another ‘sample’ of the same situation. Querying has to be able to distinguish between: * mentions of distinct events / states, e.g. 3 measurements of high BP months apart, with normal BPs in between * repeat mentions of the same event / state, i.e. multiple ‘samples’ of the same thing - could be made in different EHR instances, e.g. one in the regional health portal, and the other in a hospital system that is running openEHR; * digital copies of primary information - the thing we want to AVOID As a clinical professional you will know better than I do that the first two are not always easy to distinguish. How do we know when it is 3 distinct episodes of high BP rather than just one long one? AFib is probably an even better example. This is part of the diagnostic work up - the query cannot tell you - it is a higher level of inferencing.
[quote="siljelb, post:32, topic:2691"] So the application would have to be able to switch between these contexts on the fly, and persist the data *in the same template path* so it can be retrieved within context without a hugely complex AQL. [/quote] We certainly do not want different archetype level models for the same thing, ever. --- ## Post #34 by @siljelb [quote="thomas.beale, post:33, topic:2691"] You will see in [that model](https://specifications.openehr.org/releases/UML/development/index.html#Diagrams___19_0_83e026d_1574366530113_751508_5059) that the `VIRTUAL` class attributes, which are inherited into `VIRTUAL_ENTRY` and `VIRTUAL_ITEM` are of type `LOCATABLE_REF`. We need to add a new meta-type to the archetype model (i.e. the AOM) that allows constraining of references of such types. Then we could create constraints in an archetype editor for `commit_context`, `entry_context`, etc, that would limit them to e.g. a lab result Composition and a Lab result Observation or similar. [/quote] Does that mean that the `VIRTUAL` class could *contain* for example a `COMPOSITION` or `ENTRY`? So if I’m entering a body weight as a secondary record, the path to the weight would look something like `[the virtual container]/[some new meta-type]/openEHR-EHR-OBSERVATION.body_weight.v2/data[at0002]/events[at0003]/data[at0001]/items[at0004]`? --- ## Post #35 by @bna Time flies when having fun modelling :slight_smile: As I mentioned we defined a COMPOSITION.category with value 434 - report many years ago. This was also proposed to the SEC group. Nobody else in the community had the same needs at the time. That’s why it was not added to the specifications. There are multiple use-cases for such entries into the EHR. The most common one is a discharge summary from a stay at the hospital. In such documents you will copy information from other sources like lab results, vital signs, previous diseases and other findings during the stay.
When a user commits such information to the EHR it is not new information - it is a copy of previous data. Still we wanted a way to handle versions and also to be able to query the data. For example: find discharge summaries where the patient has diabetes, a lab test with a specific value, blood pressure above X and other relevant information. So we added the new category and tuned the AQL engine in the CDR to leave out such data by default. When needed users can query for compositions matching the COMPOSITION.category. I read the specs today and found a new COMPOSITION.category (815 - report). I am not sure if the intention of this category is the same as we proposed many years ago. See https://specifications.openehr.org/releases/TERM/development/SupportTerminology.html#_composition_category --- ## Post #36 by @ian.mcnicoll That spec change was indeed to support your use-case directly, @bna. As you know I was supportive but felt that we might need to stage Entries in a similar way, as e.g. a Discharge summary is often a mix of new and copied data. However, we explored this in some depth with a client recently and came to the conclusion that your approach is probably optimal. 1. The main challenge is to prevent ‘copied’ data being re-included in a cross-composition query like ‘show me all recent blood pressures’, and flagging some compositions as ‘report’ does allow these to be excluded. 2. It is definitely a requirement to be able to query some ‘new’ data directly via AQL e.g. ‘Discharge medications’ but that can be done by querying on the templateId or composition/name/value, and these are very unlikely to be cross-composition queries. Even if these are required, it just makes for a more complex query. 3. It is possible to use Citations / Links, and there are certainly use-cases for these e.g. managing problem lists, but querying is more complex and in the use-case of ‘reports’ e.g.
Discharge summary, there is an argument that the composition represents a ‘snapshot’ document and should carry the copies directly. So I think the ‘report’ category works for discharge type reports and also agree with @joostholslag that this also requires more targeted querying to known ‘curated/managed’ lists. We do also need to update the way that we manage links/citations especially as increasingly some of these links will resolve via FHIR References, and have better support in AQL to resolve those references. --- ## Post #37 by @thomas.beale [quote="siljelb, post:34, topic:2691"] Does that mean that the `VIRTUAL` class could *contain* for example a `COMPOSITION` or `ENTRY`? So if I’m entering a body weight as a secondary record, the path to the weight would look something like `[the virtual container]/[some new meta-type]/openEHR-EHR-OBSERVATION.body_weight.v2/data[at0002]/events[at0003]/data[at0001]/items[at0004]`? [/quote] The original idea of the `VIRTUAL` class, from which `VIRTUAL_ENTRY` etc are derived, is that it acts like a real `ENTRY`, so for example a citation of an earlier lab result as justification for a diagnosis is in the data as a `VIRTUAL_ENTRY` corresponding to the cited `OBSERVATION`. Rendering software knows it is a virtual entry so can show its contents but in a way that makes it clear it is a citation. Similarly, the querying service can ignore virtual Entries, Compositions etc easily, because they are not instances of the primary classes. But a special query could find citations as well. There are some diagrams of use cases in earlier discussion on this topic. [BMI example here](https://discourse.openehr.org/t/citations-and-references-bmi-example/3117) and [Care plan example here](https://discourse.openehr.org/t/citations-and-references-care-plan-example/3122).
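To make the querying point above more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `Observation` and `VirtualEntry` are simplified stand-ins invented for this example, not the openEHR RM classes or the draft `VIRTUAL_ENTRY` model, and the field names `target_uid` and `serialised` are hypothetical. It shows a default query that cannot return citations because they are a different type, a special query that asks for citations explicitly, and resolution of a citation back to its original.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Observation:
    """Simplified stand-in for a primary OBSERVATION (not the real RM class)."""
    uid: str
    archetype_id: str
    value: float


@dataclass
class VirtualEntry:
    """Hypothetical citation of an existing entry.

    target_uid points at the cited original; serialised carries a
    read-only rendering of it, so no structural copy is stored.
    """
    target_uid: str
    serialised: str


Entry = Union[Observation, VirtualEntry]


def query_primary(entries: list[Entry], archetype_id: str) -> list[Observation]:
    """Default query: only instances of the primary class can match."""
    return [
        e for e in entries
        if isinstance(e, Observation) and e.archetype_id == archetype_id
    ]


def query_citations(entries: list[Entry]) -> list[VirtualEntry]:
    """Special query that asks for citations explicitly."""
    return [e for e in entries if isinstance(e, VirtualEntry)]


def resolve(citation: VirtualEntry, entries: list[Entry]) -> Optional[Observation]:
    """Follow a citation back to its original, if the original is available."""
    for e in entries:
        if isinstance(e, Observation) and e.uid == citation.target_uid:
            return e
    return None


# One primary body weight, plus a citation of it (e.g. in a discharge summary).
WEIGHT = "openEHR-EHR-OBSERVATION.body_weight.v2"
weight = Observation(uid="obs-1", archetype_id=WEIGHT, value=82.5)
citation = VirtualEntry(target_uid="obs-1", serialised='{"weight_kg": 82.5}')
ehr = [weight, citation]

primary_hits = query_primary(ehr, WEIGHT)   # the citation is never counted here
citation_hits = query_citations(ehr)        # citations are found only on request
original = resolve(citation, ehr)           # the cited original, or None
```

In this sketch the separation comes from the type itself rather than from a flag that a query author has to remember to check. If the original cannot be resolved, the `serialised` text is still available for display.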
At the archetype modelling level, you would model original content normally, and you would constrain virtual items within citing Compositions (such as referrals, discharge summaries, reports etc) to be of certain types if you wanted. I don’t claim all this is 100% worked out, at least I have not had the time to work on it for a few years, but I think the main ideas are in place. I hope to get back to this and related questions soon. --- ## Post #38 by @siljelb [quote="thomas.beale, post:37, topic:2691"] The original idea of the `VIRTUAL` class, from which `VIRTUAL_ENTRY` etc are derived, is that it acts like a real `ENTRY`, so for example a citation of an earlier lab result as justification for a diagnosis is in the data as a `VIRTUAL_ENTRY` corresponding to the cited `OBSERVATION`. [/quote] The “thinking math” meme is a good illustration of how I feel right now. I think I’ll need this explained with a whiteboard and follow-up questions in order to fully grasp how this would work. Would it be possible to have a short session about this at EHRCON, do you think? --- ## Post #39 by @thomas.beale [quote="ian.mcnicoll, post:36, topic:2691"] It is possible to use Citations / Links, and there are certainly use-cases for these e.g. managing problem lists, but querying is more complex and in the use-case of ‘reports’ e.g. Discharge summary, there is an argument that the composition represents a ‘snapshot’ document and should carry the copies directly. [/quote] That’s true, and that’s why I proposed that the copies are also carried inline in *serialised* form, i.e. not in structured form. So if the report is copied to some external system, no cited content is lost. There is an argument to say that a ‘report’ should be represented by: * a ‘source’ structure, which contains virtual entry citations and so on, AND * a fully serialised document form, i.e.
what people usually think of as a ‘report’ The latter is generated from the former, and can be persisted in some convenient place, and shared as needed. It knows its generator, which contains all the citation links, so all original content can be found if necessary. The interesting question is then what querying on reports looks like. Probably text-based. [quote="ian.mcnicoll, post:36, topic:2691"] We do also need to update the way that we manage links/citations especially as increasingly some of these links will resolve via FHIR References, and have better support in AQL to resolve those references. [/quote] In openEHR, we should use proper Entity refs, which I mentioned earlier above (these would be a new addition). Ideally we would replace all inline references to devices, persons and so on with such references. One of the things we found at Graphite was that having these inline caused a data explosion, not to mention endless copies of the same data. This is a pretty serious issue which we need to address soon. --- ## Post #40 by @ian.mcnicoll Now I remember!! - the intention of COMPOSITION.category (815 - report) was exactly to meet the requirement of the DIPS ‘COMPOSITION.category with value 434’ but when we came to adding it to the spec/terminology we realised that 434 had already been allocated. --- ## Post #41 by @linforest Recently, I’ve been thinking about how to handle health data OCRed from paper-based health check-up reports and other clinical documents. --- ## Post #42 by @ian.mcnicoll I think that one is relatively easy - it is not really copied data in the sense of being ‘duplicated’ inside the CDR. You can use FEEDER_AUDIT to capture the source system, other metadata and the original content if needed. Every archetype node in openEHR can carry feeder_audit information via the LOCATABLE class.
Something like this in FLAT format: { "vitals/vitals/body_temperature:0/\_feeder_audit/original_content": "SVB$GNGE@£$%^&\*£DE", "vitals/vitals/body_temperature:0/\_feeder_audit/original_content|formalism": "jpg", "vitals/vitals/body_temperature:0/\_feeder_audit/originating_system_audit|system_id": "MylovelyOCRApp", "vitals/vitals/body_temperature:0/\_feeder_audit/originating_system_audit|version_id": "2.34" } --- ## Post #43 by @linforest That’s exactly what I was looking for! Thanks a lot, @ian.mcnicoll ! --- **Canonical:** https://discourse.openehr.org/t/how-accurately-do-we-model-copied-data/2691 **Original content:** https://discourse.openehr.org/t/how-accurately-do-we-model-copied-data/2691