Questions about ADL/AOM 1.5, archetype flattening and operational templates

Hi,

I’m reading this page to understand how to add archetype flattening and operational template support to our EHRGen project: http://www.openehr.org/wiki/pages/viewpage.action?pageId=196633#openEHRADL%26AOM1.5-TemplatesandSpecialisedArchetypes-Source%2Cflatandoperationalformsofarchetypessupported

What I don’t get is: when you have a flat archetype (e.g. without slots, internal refs and only with the specialized nodes) or an operational template (also flat), where is the reference to the original archetype nodes in the flattened AOM object for the resolved references (slots, internal refs, etc.)?

For example:

Archetype A: [at0000] OBS → [at0001] HISTORY → [at0002] EVENT (slot to archetype B)
Archetype B: [at0000] EVENT → [at0001] ITEM_TREE → …

Flattened: (Archetype A) [at0000] OBS → [at0001] HISTORY → [at0002] EVENT → (Archetype B) [at0000] EVENT → [at0001] ITEM_TREE → …

If I use the flattened archetype in my application, I would like to know which original archetype constrained my EVENT, because I could then create queries based on the paths of that archetype. Maybe there’s another way of doing the same that I can’t see yet.

Thanks a lot!

Hi Pablo,

when archetypes are flattened, their ids replace the at-codes at the root points. This example shows the flattened version of the EHR_EXTRACT template test archetype. You can see the archetype ids, also a remaining open slot. It’s not a proper OPT, so the top root node does not yet have the archetype id substituted.
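
To make the substitution concrete, here is a minimal Python sketch (not taken from any openEHR tool) of how an application might recover which archetype constrained a given node in a flattened structure: walk down from the root and remember the most recent node id that looks like an archetype id. The `Node` class, the `owners` function, the loose regex and the two archetype ids are all assumptions made up for this illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

# Loose pattern for ids such as "openEHR-EHR-OBSERVATION.a.v1".
# This is an assumption for the sketch, not the normative id grammar.
ARCHETYPE_ID = re.compile(r"^\w+-\w+-\w+\.[\w-]+\.v\d+")


@dataclass
class Node:
    """Hypothetical stand-in for a flattened object node."""
    rm_type: str
    node_id: str
    children: List["Node"] = field(default_factory=list)


def owners(node: Node, current: Optional[str] = None) -> Iterator[Tuple[str, str, Optional[str]]]:
    """Yield (rm_type, node_id, owning_archetype_id) in pre-order."""
    if ARCHETYPE_ID.match(node.node_id):
        # A root point of a flattened-in archetype: switch namespace.
        current = node.node_id
    yield node.rm_type, node.node_id, current
    for child in node.children:
        yield from owners(child, current)


# The A/B example from the first post, with hypothetical archetype ids
# substituted at the root points.
tree = Node("OBSERVATION", "openEHR-EHR-OBSERVATION.a.v1", [
    Node("HISTORY", "at0001", [
        Node("EVENT", "openEHR-EHR-EVENT.b.v1", [
            Node("ITEM_TREE", "at0001"),
        ]),
    ]),
])

for rm_type, node_id, owner in owners(tree):
    print(rm_type, node_id, "owned by", owner)
```

Both `at0001` nodes stay distinguishable because each is reported together with the archetype id of its nearest enclosing root point.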

Is the atXXXX from the solved slot lost? Is it not possible to redefine
the text or description and change it from the at0000 of the included
slot?
I think it would be useful to have it somehow

Also, are the atXXXX from the resolved template changed in any way?
I see EXTRACT_CHAPTER with [at0002] and ELEMENT with [at0002.1], which
I think may be changing specialization semantics

the information is still in the archetype - the ontology section for a template looks something like this:

ontology
    term_definitions = <
        ["en"] = <
            ["at0000.1"] = <
                text = <"Discharge summary">
                description = <"Discharge summary document for patient leaving a hospital">
            >
            ["at0006.1"] = <
                text = <"Clinical data">
                description = <"Clinical discharge data for patient">
            >
            ["at0100.1"] = <
                text = <"Patient demographics slot - closed">
                description = <"Patient demographics slot - closed">
            >
            ["at0103"] = <
                text = <"Patient discharge data slot">
                description = <"Patient discharge data slot">
            >
        >
    >
    component_ontologies = <
        ["openEHR-DEMOGRAPHIC-PERSON.t_patient_ds.v1"] = <
            term_definitions = <
                ["en"] = <
                    ["at0000.1"] = <
                        text = <"Patient demographic data">
                        description = <"Simple patient personal demographic data">
                    >
                    ["at0010.1"] = <
                        text = <"Other details - closed slot">
                        description = <"Other details - closed slot">
                    >
                    ["at0040"] = <
                        text = <"Relationship type">
                        description = <"Defines the type of relationship between related persons.">
                    >
                >
            >
            constraint_definitions = <
                ["en"] = <
                    ["ac0000"] = <
                        text = <"Codes for type of relationship">
                        description = <"Valid codes for type of relationship.">
                    >
                >
            >
        >
        ["openEHR-DEMOGRAPHIC-ORGANISATION.healthcare_establishment.v1"] = <
            term_definitions = <
                ["en"] = <
                    ["at0000"] = <
                        text = <"Organisation">
                        description = <"Organisation demographic data">
                    >
                    ["at0001"] = <
                        text = <"Identification">
                        description = <"Identification - the names the organisation is known by">
                    >
                    ["at0003"] = <
                        text = <"Name">
                        description = <"An organisation name">
                    >
                    ["at0004"] = <
                        text = <"Identifier">
                        description = <"An organisation Identifier">
                    >
                >
            >
        >
        ["openEHR-DEMOGRAPHIC-PERSON.healthcare_professional.v1"] = <
            term_definitions = <
                ["en"] = <
                    ["at0000"] = <
                        text = <"Healthcare professional demographic data">
                        description = <"Healthcare professional demographic data">
                    >
                    ["at0001"] = <
                        text = <"Identification">
                        description = <"Identification - the names the professional is known by">
                    >
                    ["at0003"] = <
                        text = <"Name">
                        description = <"Healthcare professional name">
                    >
                    ["at0004"] = <
                        text = <"Identifier">
                        description = <"Healthcare professional Identifier">
                    >
                >
            >
        >
        ["openEHR-EHR-COMPOSITION.t_clinical_info_ds.v1"] = <
            term_definitions = <
                ["en"] = <
                    ["at0000.1"] = <
                        text = <"Clinical detail">
                        description = <"Clinical detail of Simple discharge summary">
                    >
                    ["at0000"] = <
                        text = <"Discharge">
                        description = <"A summarising communication about at the time of discharge from an institution or an episode of care">
                    >
                    ["at0027"] = <
                        text = <"Details">
                        description = <"*">
                    >
                >
            >
        >
    >

these two codes don't happen to be related - the at0002 is from the Extract archetype (openEHR-EHR_EXTRACT-EXTRACT.t_basic_discharge_summary.v1), the at0002.1 is from the demographic archetype openEHR-DEMOGRAPHIC-CLUSTER.t_person_race_data_ds.v1.

Within the same archetype, a code of the form atN.M.P is a specialisation of a code atN.M from a parent, and transitively, of atN from the topmost archetype. That means a query that mentions 'atN' should match instances of the other two (within the relevant archetypes of course, not some other unrelated archetypes). How this is physically done depends on how a query processor is implemented.
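
As one possible illustration of that matching rule, here is a small Python sketch. The function name is hypothetical, and it only covers the dotted-suffix relationship described above; it assumes both codes come from the same archetype lineage and does not check that.

```python
def is_same_or_specialisation(code: str, ancestor: str) -> bool:
    """True if `code` equals `ancestor` or extends it with '.'-separated parts.

    The explicit '.' in the prefix test matters: without it, 'at00021'
    would wrongly be treated as a child of 'at0002'.
    """
    return code == ancestor or code.startswith(ancestor + ".")


# A query mentioning at0002 should match data coded with any of these.
candidates = ["at0002", "at0002.1", "at0002.1.3", "at00021", "at0003.1"]
matches = [c for c in candidates if is_same_or_specialisation(c, "at0002")]
print(matches)
```

A query processor could apply the same test as a string prefix match in its index, which is one way the "physical" matching might be done.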

- thomas

So what if we want to create two sibling slots (e.g. patient data and
relative data) which could have their own text and description (even
their own code), but are solved to the same archetype? Is it possible
to do that?

But if you have sibling nodes (see the example above) you will have
paths that won't be unique, as the original slot atXXXX is lost

Sibling nodes are no problem. Let's say you want to fill them with 2 archetypes

openEHR-demographic-PERSON.person_type_A.v1
openEHR-demographic-PERSON.person_type_B.v1

there is no problem. If you want to specialise the archetype openEHR-demographic-PERSON.person_type_A.v1 into different things, you can get e.g.

openEHR-demographic-PERSON.person_type_A1.v1
openEHR-demographic-PERSON.person_type_A2.v1

(the names can be whatever, obviously)

- thomas

Hi Thomas,

This example is very helpful, thanks.

About Diego’s questions and your answers on other emails, as I understand I have to “merge/resolve” the ontology section too, so all needed codes are there without ambiguity.
Is "component_ontologies" a construct from ADL 1.5?

About the new nodeId codes with archetype ids: should this be transparent to software applications, or at some point do I have to differentiate between normal at-codes and archetype id codes?
E.g. I see descriptions for normal at-codes in termDefinitions, but for those nodeIds with an archetype id the codes are defined in the component_ontologies section. Maybe there are other cases where those codes should be treated differently. For implementation simplicity, it would be nicer not to have to interpret the internal structure of nodeID.

so you have to define two different archetype ids even if the
archetypes are the same?
and again, the slot text, description and codes are lost with this kind of approach

if the archetypes are the same, you just use that archetype once, and allow multiple occurrences. There is never a need to duplicate an identical constraint object in an archetype.

I am not sure what you mean by the ‘slot text, description and code being lost’. Everything is right there in its archetype. A template contains all the codes. It doesn’t include copies of the description because it doesn’t need it - flattened objects are operational entities (‘compiled’ entities) not source entities. It’s the same when you compile Java source code - the comments disappear in the output.

- thomas

About Diego’s questions and your answers on other emails, as I understand I have to “merge/resolve” the ontology section too, so all needed codes are there without ambiguity.
Is "component_ontologies" a construct from ADL 1.5?

that operation is pretty simple. For each specialised archetype, you flatten the terminology simply by adding the terms - technically you can just do a Hash table merge. In the Eiffel code of the flattener, search for ‘flatten_ontology’, you will see it is a single statement, which I imagine will be nearly the same in Java.
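
For readers who want to picture the hash table merge, here is a minimal Python sketch. It is not a translation of the Eiffel `flatten_ontology` routine; the function name `flatten_terms`, the nested-dict layout (language, then code, then text/description) and the rule that a child entry wins on a key collision are all assumptions for illustration. The term texts are borrowed from the discharge summary example earlier in the thread.

```python
def flatten_terms(parent: dict, child: dict) -> dict:
    """Return a new term table containing the parent's terms plus the child's.

    Both arguments map language -> code -> {"text": ..., "description": ...}.
    Neither input table is modified.
    """
    flat = {lang: dict(terms) for lang, terms in parent.items()}
    for lang, terms in child.items():
        # Child terms are simply added; on a collision the child wins
        # (an assumption of this sketch).
        flat.setdefault(lang, {}).update(terms)
    return flat


parent_terms = {"en": {"at0000": {"text": "Discharge",
                                  "description": "..."}}}
child_terms = {"en": {"at0000.1": {"text": "Clinical detail",
                                   "description": "..."}}}

flat_terms = flatten_terms(parent_terms, child_terms)
print(sorted(flat_terms["en"]))
```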

When you do a template flatten, you have to construct the component_ontologies, which is a bit more work, but really it is just the addition of already flattened ontology sections into a container data structure. See the last routine in the above file and the routine ‘add_component_ontology’ here.
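
A rough Python sketch of that container idea follows. The routine name `add_component_ontology` is borrowed from the post, but the body, the `term_text` helper and the dict layout are my own assumptions, not the Eiffel implementation. The two entries reuse term texts from the example ontology earlier in the thread.

```python
def add_component_ontology(component_ontologies: dict,
                           archetype_id: str,
                           flat_ontology: dict) -> None:
    """Store an already-flattened ontology under its archetype id."""
    component_ontologies[archetype_id] = flat_ontology


def term_text(component_ontologies: dict, archetype_id: str,
              code: str, lang: str = "en") -> str:
    """Look up a term text inside one archetype's code namespace."""
    terms = component_ontologies[archetype_id]["term_definitions"]
    return terms[lang][code]["text"]


ORG = "openEHR-DEMOGRAPHIC-ORGANISATION.healthcare_establishment.v1"
PRO = "openEHR-DEMOGRAPHIC-PERSON.healthcare_professional.v1"

component_ontologies: dict = {}
add_component_ontology(component_ontologies, ORG, {
    "term_definitions": {"en": {"at0000": {
        "text": "Organisation",
        "description": "Organisation demographic data"}}}})
add_component_ontology(component_ontologies, PRO, {
    "term_definitions": {"en": {"at0000": {
        "text": "Healthcare professional demographic data",
        "description": "Healthcare professional demographic data"}}}})

# The same at-code resolves differently depending on the namespace.
print(term_text(component_ontologies, ORG, "at0000"))
print(term_text(component_ontologies, PRO, "at0000"))
```

Keying the container by archetype id is what keeps identical at-codes from different archetypes unambiguous.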

About the new nodeId codes with archetype ids: should this be transparent to software applications, or at some point do I have to differentiate between normal at-codes and archetype id codes?

the software does need to know that it will encounter ‘codes’ in real data built from templates that will in fact be Archetype Ids.

E.g. I see descriptions for normal at-codes in termDefinitions, but for those nodeIds with an archetype id the codes are defined in the component_ontologies section. Maybe there are other cases where those codes should be treated differently. For implementation simplicity, it would be nicer not to have to interpret the internal structure of nodeID.

actually, you can treat the archetype ids that you hit in the archetyped data simply as strings. It may seem strange at first, but in fact, it is just an efficient method of storing ‘code name spaces’ (the archetype ids) and codes within each namespace. People sometimes ask: why aren’t we using OIDs or GUIDs for this? I originally thought we would. But apart from the fact that functionally they don’t add any particular value (they are just strings as well), there are advantages with the current method:

  • at-codes are generally 4-10 byte strings (the vast majority are 6). OIDs are variable (but usually long due to the leader part) and GUIDs are stored as a 36-byte string or a 16-byte integer. Let’s say the median difference is 30 bytes if the GUID is stored in its most common string form. For an average archetype with 20 data points, this means 19 x 30 = 570 extra bytes if GUIDs are used. In a realistic COMPOSITION with an average of 3 archetypes, this is 1,710 extra bytes. This could easily be a significant proportion of the total COMPOSITION size, which means it can affect the persistence provisioning requirement. On its own, possibly not such a big problem, but if no effort is made to keep data space-efficient, all the inefficiencies can change the final deployment costs and performance significantly.
  • at-codes have the structure where child codes (‘child’ in the IS-A subsumption sense) have the same code as the parent but with ‘.’ sections appended, e.g. at0002.2.17, enabling very efficient query processing. GUIDs and OIDs don’t have this property, so your query processor has to do extra work to figure out if a code is a) specialised and b) a child of some other code.
  • the path processing of the data is simplified, and also the paths themselves are relatively short. Paths with OIDs or GUIDs would be far longer.
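
The arithmetic in the first bullet can be checked with a few lines of Python. The 6-byte at-code, the 19 coded nodes and the 3 archetypes per COMPOSITION are the figures from the post; the variable names are mine, and the result is only a back-of-envelope estimate.

```python
import uuid

guid = uuid.uuid4()
guid_text_bytes = len(str(guid))   # canonical hyphenated text form
guid_raw_bytes = len(guid.bytes)   # raw binary form

AT_CODE_BYTES = 6                  # "the vast majority are 6", e.g. "at0002"
CODED_NODES_PER_ARCHETYPE = 19     # figure used in the post
ARCHETYPES_PER_COMPOSITION = 3     # figure used in the post

extra_per_node = guid_text_bytes - AT_CODE_BYTES
extra_per_archetype = CODED_NODES_PER_ARCHETYPE * extra_per_node
extra_per_composition = ARCHETYPES_PER_COMPOSITION * extra_per_archetype

print(guid_text_bytes, guid_raw_bytes)
print(extra_per_node, extra_per_archetype, extra_per_composition)
```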

That said, in the future it may be that the archetype_ids (not the at-codes) might be replaced by archetype GUIDs (or both might be used), for the purposes of some kinds of fast GUID-based indexing.

- thomas

wow, I have a lot to study and try :smiley:
this might be a good weekend project :wink:

thanks a lot Thomas!

Ok, let me give an example so I can explain myself better. I'm not saying
this is the way we should model this case, but just to show that the
use case is there.

If we get blood pressure archetype and decide to represent systolic,
diastolic, and mean arterial pressure as slots to another archetype
(in this case pressure_measurement), you get something like this

http://img717.imageshack.us/img717/6919/a4e77856c56c4c5499c5d1b.png

this is the ADL code:

definition
    ENTRY[at0000] occurrences matches {1..1} matches { -- Blood Pressure
        items existence matches {0..1} cardinality matches {0..*; unordered} matches {
            CLUSTER[at0001] occurrences matches {0..*} matches { -- Blood Pressure Measurement
                parts existence matches {0..1} cardinality matches {0..*; unordered; unique} matches {
                    allow_archetype ELEMENT[at0003] occurrences matches {0..*} matches { -- Systolic
                        include
                            archetype_id/value matches {/CEN-EN13606-ELEMENT\.pressure_measurement\.v1/}
                    }
                    allow_archetype ELEMENT[at0006] occurrences matches {0..*} matches { -- Diastolic
                        include
                            archetype_id/value matches {/CEN-EN13606-ELEMENT\.pressure_measurement\.v1/}
                    }
                    allow_archetype ELEMENT[at0009] occurrences matches {0..*} matches { -- Mean Arterial Pressure
                        include
                            archetype_id/value matches {/CEN-EN13606-ELEMENT\.pressure_measurement\.v1/}
                    }
                }
                structure_type existence matches {1..1} matches {
                    CS occurrences matches {1..1} matches { --
                        codeValue existence matches {0..1} matches {"STRC01"}
                        codingSchemeName existence matches {0..1} matches {"CEN/TC251/EN13606-3:STRUCTURE_TYPE"}
                    }
                }
            }
        }
    }

ontology
    terminologies_available = <"SNOMED-CT", ...>
    term_definitions = <
        ["es"] = <
            items = <
                ["at0000"] = <
                    text = <"Blood Pressure">
                    description = <"Blood Pressure">
                >
                ["at0001"] = <
                    text = <"Blood Pressure Measurement">
                    description = <"a meassure of a BP">
                >
                ["at0003"] = <
                    text = <"Systolic">
                    description = <"Peak systemic arterial blood pressure - measured in systolic or contraction phase of the heart cycle.">
                >
                ["at0006"] = <
                    text = <"Diastolic">
                    description = <"Minimum systemic arterial blood pressure - measured in the diastolic or relaxation phase of the heart cycle.">
                >
                ["at0009"] = <
                    text = <"Mean Arterial Pressure">
                    description = <"The average arterial pressure that occurs over the entire course of the heart contraction and relaxation cycle.">
                >
            >
        >
    >
    constraint_definitions = <
    >
    term_binding = <
        ["SNOMED-CT"] = <
            items = <
                ["at0003"] = <[SNOMED-CT::163030003]>
                ["at0009"] = <[SNOMED-CT::6797001]>
                ["at0006"] = <[SNOMED-CT::163031004]>
            >
        >
    >

In cases like this, if you resolve pressure_measurement then you
get something like this

definition
    ENTRY[at0000] occurrences matches {1..1} matches { -- Blood Pressure
        items existence matches {0..1} cardinality matches {0..*; unordered} matches {
            CLUSTER[at0001] occurrences matches {0..*} matches { -- Blood Pressure Measurement
                parts existence matches {0..1} cardinality matches {0..*; unordered; unique} matches {
                    ELEMENT[CEN-EN13606-ELEMENT.pressure_measurement.v1] occurrences matches {1..1} matches {
                        value existence matches {0..1} matches {
                            PQ occurrences matches {0..1} matches { -- PQ
                                units existence matches {1..1} matches {
                                    CS occurrences matches {1..1} matches { --
                                        codeValue existence matches {0..1} matches {"mm[Hg]"}
                                        codingSchemeName existence matches {0..1} matches {"UCUM"}
                                    }
                                }
                                value existence matches {1..1} matches {|>0.0..<1000.0|}
                            }
                        }
                    }
                    ELEMENT[CEN-EN13606-ELEMENT.pressure_measurement.v1] occurrences matches {1..1} matches {
                        value existence matches {0..1} matches {
                            PQ occurrences matches {0..1} matches { -- PQ
                                units existence matches {1..1} matches {
                                    CS occurrences matches {1..1} matches { --
                                        codeValue existence matches {0..1} matches {"mm[Hg]"}
                                        codingSchemeName existence matches {0..1} matches {"UCUM"}
                                    }
                                }
                                value existence matches {1..1} matches {|>0.0..<1000.0|}
                            }
                        }
                    }
                    ELEMENT[CEN-EN13606-ELEMENT.pressure_measurement.v1] occurrences matches {1..1} matches {
                        value existence matches {0..1} matches {
                            PQ occurrences matches {0..1} matches { -- PQ
                                units existence matches {1..1} matches {
                                    CS occurrences matches {1..1} matches { --
                                        codeValue existence matches {0..1} matches {"mm[Hg]"}
                                        codingSchemeName existence matches {0..1} matches {"UCUM"}
                                    }
                                }
                                value existence matches {1..1} matches {|>0.0..<1000.0|}
                            }
                        }
                    }
                }

And as you can see, you have lost text, descriptions, and codes.

This kind of problem can in fact show up, e.g. an AIDS report could
require two different AIDS tests, one for the first test and another
for the confirmation test.
Another example could be having a main diagnosis (as an obligatory
slot with its own code) and secondary diagnoses (a 0..* slot with its
own code), both referring to a hypothetical diagnosis archetype

I hope it takes you longer than that… it took me a lot longer to work it out in the first place :wink:

- thomas

The example below I would say is taking things to extremes. Normally, if you are going to create separate archetypes, they have distinct semantics. Here you are trying to use one archetype for three purposes, but to nevertheless retain the semantic distinctions inside the parent archetype, rather than specifying them in the child archetypes. So one has to ask the question: why bother with separate archetypes here? If you really want to have this ELEMENT archetype for the purpose of reuse, then you can constrain ELEMENT.name to be the coded term you want in each case, i.e. ‘systolic BP’ etc.
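
A rough Python sketch of that suggestion: if the three sibling ELEMENTs all carry the same archetype id as their node id, the node id alone matches all of them, and a constrained name narrows the match to one. The dictionaries are made-up stand-ins for data instances; the field names, the `select` helper and the values are hypothetical and not a reference model binding.

```python
PM = "CEN-EN13606-ELEMENT.pressure_measurement.v1"

# Hypothetical data instances built from the resolved slots.
elements = [
    {"archetype_node_id": PM, "name": "Systolic", "value": 120.0},
    {"archetype_node_id": PM, "name": "Diastolic", "value": 80.0},
    {"archetype_node_id": PM, "name": "Mean Arterial Pressure", "value": 93.0},
]


def select(items, node_id, name=None):
    """Filter items by node id and, optionally, by their constrained name."""
    return [
        item for item in items
        if item["archetype_node_id"] == node_id
        and (name is None or item["name"] == name)
    ]


print(len(select(elements, PM)))            # node id alone is ambiguous
print(select(elements, PM, "Systolic"))     # name narrows it to one
```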

I have to admit I don’t see much use in having such an ELEMENT archetype, because it is not really saying anything much. Defining the same thing inline seems to be clearer and easier.

Do you have any more realistic examples?

- thomas

Hi All

There is no limit to the complexity we can all support - but you will lose the clinicians if the level of fragmentation and reuse gets beyond a certain point. One advantage of openEHR is that we have pushed the very common patterns (e.g. timing, distributed workflow) into the reference model.

I would recommend using examples from current models.

Cheers, Sam

Hi Sam,

I would agree. I found something similar when modelling the RCPA histopathology archetypes. While it seemed sensible to model certain aspects e.g. ‘Lymph node findings’ and ‘excision margins’ as generic clusters, it became clear that the different reporting requirements of each cancer type forced me to include more and more variants in the child cluster archetypes, most of which needed to be constrained out at template level for individual cancer reports. Increasing fragmentation also adds to the burden of authoring and reviewing archetypes. If/when I revisit the histopath archetypes I would reduce the number of separate Clusters and model more in-line for each individual cancer.

There are opportunities to re-use patterns at CLUSTER and occasionally ELEMENT level but there are drawbacks, as many of these seeming patterns end up being elusive and unhelpful.

Ian

A more realistic example:

http://img96.imageshack.us/img96/8431/8566bdf17b8b46ad85acbb3.png

definition
    COMPOSITION[at0000] occurrences matches {1..1} matches { -- HIV report
        content existence matches {0..1} cardinality matches {1..2; ordered; unique} matches {
            allow_archetype OBSERVATION[at0001] occurrences matches {0..*} matches { -- Initial Test
                include
                    archetype_id/value matches {/openEHR-EHR-OBSERVATION\.HIV_Test\.v1/}
            }
            allow_archetype OBSERVATION[at0002] occurrences matches {0..*} matches { -- Confirmation Test
                include
                    archetype_id/value matches {/openEHR-EHR-OBSERVATION\.HIV_Test\.v1/}
            }
        }
    }

This report includes an initial test and a confirmation test, both HIV
Tests (which in fact have their own snomed codes). Initial and
confirmation test can be checked using different techniques.

Again, if you resolve the slot you are losing the information that one
is an initial test and the other is a confirmation test and you .