The good, the bad and the "Wat?" of current simplified FLAT/SimSDT openEHR exchange format

erik.sundvall · 9 April 2023 06:42

Hi!

At Karolinska we are a bit confused about how to format/use some things in different forms of the FLAT/SimSDT format. We are testing at least three CDR products that are supposed to support it and at least two (non-openEHR-based) systems that try to use it correctly for exporting/converting form data. I’m starting this thread to collect some questions (and, we hope, also answers) regarding confusing things, including some that remind me of Gary Bernhardt’s “Wat?” short talk at CodeMash 2012.

The related discussion threads and documentation we have looked at so far when trying to reduce confusion are:

EHRbase’s extensive explanations of the FLAT format, in these chapters:
- 2.5. Step 4: Load Data — EHRbase documentation
- 9. Flat — EHRbase documentation
The unfinished specification documents with maturity status “DEVELOPMENT”
Discussion threads:
- Understanding Flat Composition JSON
- …

There are many really good things about the key+value approach of simSDT/FLAT (and the less used simNC). Combined with the JSON-based “web template” (as explained in 2.5 of the EHRbase docs linked above), the approach has in a very short time made it possible to use openEHR templates as a basis for configurable forms in existing non-openEHR systems (like Sectra’s IDS7 and Omniq’s Alltid Öppet). Some years ago we also investigated possible use in major non-openEHR Swedish EHR systems, see IOS Press Ebooks - Configuration of Input Forms in EHR Systems Using Spreadsheets, openEHR Archetypes and Templates

Feel free to add posts below for each kind of issue/confusion, I’ll try to start with some regarding “context”, category and identifiers shortly. Also feel free to edit this post to add more references etc. (it’s a “wiki” post).

Updates from 13 April 2023 and onwards below:

FINDINGS (described in detail further down in this thread)

The archetype authoring tool Archetype Designer had an error that added pointless extra “at-codes” to nodes that normally never have at-codes. That error caused seriously confusing results and errors in downstream tools and pipelines, including FLAT/SimSDT.
The errors went undetected for quite a while, and as a result the CKM-published archetype self_reported_data.v1 contains them. (Some draft archetypes also contain the errors.)
self_reported_data.v1 was the root archetype of the template that confused (at least) us at Karolinska when using the FLAT format with solutions from Cambio, EHRbase, Better and Omniq.
The error has been recognised by Better and will be fixed in an upcoming version of Archetype Designer. An updated ‘fixed’ version of the Composition self_reported_data has been submitted as change request to the International CKM. There appear to be 4 other composition archetypes affected - all unpublished.
There are many other somewhat confusing things (depending on perspective) mixed with the good things of the FLAT/simSDT format and “simplifications” - these things are also sometimes discussed in the thread below.

erik.sundvall · 9 April 2023 09:18

Confusing subject #1: CONTEXT, CTX etc
Regarding context, ctx and event_context, things get a bit confusing, especially when you start (fairly naturally) from having looked only at the RM spec (for example figure 14 below) and have not yet discovered https://specifications.openehr.org/releases/SM/latest/simplified_im_b.htm, the reverse-engineered object model spec of the simplified formats (“FLAT” and “STRUCTURED”), from which figures 6 and 7 below are copied.

…/ctx/…
If we understand the explanations in simplified_im_b and a related discussion correctly, the “ctx” object seems to provide a way to set a mixed bag of defaults that will be (re)used in several parts of a COMPOSITION if they are not provided in other input. Note that “ctx” is a flat map of variables, NOT a mapping to any particular RM context-related object.
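To make the “defaults” reading concrete, here is a minimal Python sketch, assuming that a ctx entry is simply copied to a target path unless the same path is given explicitly. The `CTX_TARGETS` table and the `apply_ctx_defaults` helper are hypothetical, made up for illustration; the real set of ctx variables, and where each one ends up, is defined by each CDR product.

```python
from typing import Dict

# Hypothetical mapping from a ctx variable to the template-relative path it
# would default. Real CDRs define their own ctx variables and targets.
CTX_TARGETS: Dict[str, str] = {
    "health_care_facility|name": "context/_health_care_facility|name",
    "health_care_facility|id": "context/_health_care_facility|id",
}


def apply_ctx_defaults(flat: Dict[str, str], template_id: str) -> Dict[str, str]:
    """Drop ctx/ keys and copy each known ctx value to its target path,
    unless that path was already given explicitly in the input."""
    ctx = {k[len("ctx/"):]: v for k, v in flat.items() if k.startswith("ctx/")}
    result = {k: v for k, v in flat.items() if not k.startswith("ctx/")}
    for ctx_key, value in ctx.items():
        suffix = CTX_TARGETS.get(ctx_key)
        if suffix is None:
            continue  # unknown ctx variable: silently ignored in this sketch
        # setdefault keeps an explicitly supplied value over the ctx default
        result.setdefault(f"{template_id}/{suffix}", value)
    return result


flat_input = {
    "ctx/health_care_facility|name": "Hospital",
    "ctx/health_care_facility|id": "9091",
    "laboratory_order/context/_health_care_facility|name": "Northumbria Community NHS",
}
print(apply_ctx_defaults(flat_input, "laboratory_order"))
```

Here the explicit facility name wins over the ctx name, while the facility id is filled from ctx, which matches the “if not provided in other input” reading.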

…/context/…
Examples in the spec that contain “…/context/…” point to partially different things depending on whether we’re talking about the (no longer actively developed?) “simNC” example…

    "/context/health_care_facility|name":"Northumbria Community NHS",
    "/context/health_care_facility|identifier":"999999-345",

…that points to the actual EVENT_CONTEXT object of the canonical RM model (as shown in the UML diagram in figure 14 below) - or if we are talking about the “simSDT” (now actively used by both Better and EHRbase) example…

  "laboratory_order/context/_health_care_facility|id": "999999-345",
  "laboratory_order/context/_health_care_facility|id_scheme": "2.16.840.1.113883.2.1.4.3",
  "laboratory_order/context/_health_care_facility|id_namespace": "NHS-UK",
  "laboratory_order/context/_health_care_facility|name": "Northumbria Community NHS",

…that actually points to the simplified S_EVENT_CONTEXT of the simplified_im_b model (see figure 6 below).
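Both kinds of key share the same surface grammar: slash-separated node ids, an optional leading underscore, an optional `:n` index and an optional `|attribute` suffix. The hypothetical `parse_flat_key` below is a Python sketch that splits simSDT-style keys along those lines. It knows nothing about templates, so it cannot tell whether `context` points at EVENT_CONTEXT or S_EVENT_CONTEXT, and it rejects simNC keys with a leading “/”.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One path segment: optional "_" prefix, node id, optional ":index".
_SEGMENT = re.compile(r"^(?P<underscore>_?)(?P<name>[^:|/]+?)(?::(?P<index>\d+))?$")


@dataclass
class Segment:
    name: str
    index: Optional[int]  # None when the segment has no ":n" suffix
    underscore: bool      # True when written with a leading "_"


@dataclass
class FlatKey:
    segments: List[Segment]
    attribute: Optional[str]  # e.g. "id", "name", "issuer"; None if no "|"


def parse_flat_key(key: str) -> FlatKey:
    """Split a simSDT-style FLAT key into path segments and an optional |attribute."""
    path, _, attribute = key.partition("|")
    segments: List[Segment] = []
    for raw in path.split("/"):
        m = _SEGMENT.match(raw)
        if m is None:  # e.g. the empty segment before a leading "/"
            raise ValueError(f"unparseable segment {raw!r} in key {key!r}")
        index = m.group("index")
        segments.append(Segment(m.group("name"),
                                int(index) if index is not None else None,
                                m.group("underscore") == "_"))
    return FlatKey(segments, attribute or None)


key = parse_flat_key("laboratory_order/context/_health_care_facility|id_scheme")
print([s.name for s in key.segments], key.segments[-1].underscore, key.attribute)
```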

This really confused us, since in the real canonical RM the ways of setting an identifier (or several identifiers) for health_care_facility using PARTY_IDENTIFIED…

…should, according to the Common Information Model, allow/force us to add a list of several identifiers (identifiers : List<DV_IDENTIFIER>), not just a single identifier that looks like it is attached directly to the (S_)PARTY_IDENTIFIED object. We do have use cases where it would be convenient to add more than one identifier to (S_)PARTY_IDENTIFIED objects.

So here comes actual questions:
Is it not possible (or just undocumented how) to add more than one identifier (e.g. a list of identifiers) to e.g. health_care_facility (and other S_PARTY_IDENTIFIED objects) using simSDT/FLAT format when submitting a COMPOSITION? Is it also impossible to use the external_ref attribute (that points to a PARTY_REF)?

…/event_context/…
Adding to confusion is that when you use the “OUTPUT” variant of FLAT/simSDT the …/context/… object is nowhere to be seen anymore, instead you get a structure like

  "chemoform-mba.v5/event_context/start_time": "2023-04-04T00:35:42.71+02:00",
  "chemoform-mba.v5/event_context/setting|code": "238",
  "chemoform-mba.v5/event_context/setting|value": "other care",
  "chemoform-mba.v5/event_context/setting|terminology": "openehr",

(The example comes from our experiments available in Release ChemoForm-MBA.v5.rc8 · regionstockholm/CKM-mirror-via-modellbibliotek · GitHub )

And this would probably be understandable if you knew that in “OUTPUT” mode we always get something looking more like the RM EVENT_CONTEXT, whereas in “INPUT” mode you should use the “ctx” or “context” way…
…BUT even when generating examples in “INPUT” mode we get a mix of …/ctx/… and …/event_context/…

  "ctx/health_care_facility|name": "Hospital",
  "ctx/health_care_facility|id": "9091",
  "chemoform-mba.v5/event_context/vårdenhet/namn": "Namn 74",
  "chemoform-mba.v5/event_context/vårdenhet/identifierare:0": "79f7d19f-cc7c-4f95-9d9e-6ad4499a4d58",
  "chemoform-mba.v5/event_context/vårdenhet/identifierare:0|issuer": "Issuer",
  "chemoform-mba.v5/event_context/vårdenhet/identifierare:0|assigner": "Assigner",
  "chemoform-mba.v5/event_context/vårdenhet/identifierare:0|type": "Prescription"

Wat? Ah, maybe it’s a way of enabling use of the other_context attribute of (S_)EVENT_CONTEXT - but why not make the path less confusing …/event_context/other_context/…

After this we thought: “maybe we can use the path …/event_context/health_care_facility/… for input purposes too”, e.g. to add a list of identifiers to health_care_facility, but then we get errors from the CDRs when trying to commit a COMPOSITION. (Hence the question in bold above…) Or maybe it would work better if we did not export with Swedish as the primary template language?

Phew…
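A small lint along these lines could at least flag payloads like the INPUT example above, where ctx/… and …/event_context/… keys are mixed. This is a Python sketch; `context_flavour` and `group_by_flavour` are hypothetical helpers, and the rule that the second path segment names the context attribute is an assumption based only on the examples in this post.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def context_flavour(key: str) -> str:
    """Return 'ctx', 'context', 'event_context' or 'other' for a FLAT key."""
    segments = key.split("|", 1)[0].split("/")
    if segments[0] == "ctx":
        return "ctx"
    # simSDT keys: the segment after the template root id names the attribute.
    # simNC keys start with "/", so segments[0] is "" and segments[1] is "context".
    second = segments[1] if len(segments) > 1 else ""
    if second in ("context", "event_context"):
        return second
    return "other"


def group_by_flavour(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Bucket keys by the context spelling they use."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        groups[context_flavour(key)].append(key)
    return dict(groups)
```

A pipeline could, for example, warn whenever both `context` and `event_context` buckets are non-empty in one payload.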

IMAGES referenced above

Source: EHR Information Model figure 14

Source: openEHR Simplified Information Model 'B' figure 6

Source: openEHR Simplified Information Model 'B' figure 7

pablo · 10 April 2023 16:14

Hi Erik, just out of curiosity, do you see any advantages in using FLAT and similar formats for openEHR data exchange vs. using the official canonical JSON format?

erik.sundvall · 11 April 2023 07:32

No, not between two openEHR-compliant systems. Template-specific formats can become a maintenance nightmare outside narrow use cases.

sebastian.iancu · 11 April 2023 09:21

Yep … these are shortcomings of the flat format. As the name suggests, it flattens the “structure”, so the more complex the tree structure is, the more complex the serialization gets. At some point you may need to evaluate whether the canonical format is a better option for your purpose.

Unfortunately I have no experience with implementations, so I cannot answer this directly, but it seems very important to me to tackle this in the SEC.
Just a guess, have you tried something like:

"/context/health_care_facility|name":"Facility name",
"/context/health_care_facility|identifiers:0":"999999-345",
"/context/health_care_facility|identifiers:0|issuer":"issuer-0",
"/context/health_care_facility|identifiers:0|type":"type-0",
"/context/health_care_facility|identifiers:1":"123-345",
"/context/health_care_facility|identifiers:1|issuer":"issuer-1",
"/context/health_care_facility|identifiers:1|type":"type-1",

which would be the equivalent of:

{
  "context": {
    "health_care_facility": {
      "_type": "PARTY_IDENTIFIED",
      "name": "Facility name",
      "identifiers": [
        {
          "id": "999999-345",
          "issuer": "issuer-0",
          "type": "type-0"
        },
        {
          "id": "123-345",
          "issuer": "issuer-1",
          "type": "type-1"
        }
      ]
    }
  }  
}

joshua.grisham · 11 April 2023 10:29

Hi! I work with @erik.sundvall and am also taking a look at the same template in the same environments. I just wanted to add an observation which might be helpful (and/or maybe is driving the differences and thus the point of some confusion?)

In a previous (and much simpler) version of the template, it worked for us quite similarly to what you described, except that we prefixed health_care_facility with an underscore, as it is an optional node (minOccurs=0), and used both the optional prefix (underscore) and the “singular” form of the attribute name (“_identifier:n” instead of “_identifiers:n”), roughly like this:

"{webtemplate-top-level-id}/context/_health_care_facility|name":"Facility name",
"{webtemplate-top-level-id}/context/_health_care_facility/_identifier:0|id":"999999-345",
"{webtemplate-top-level-id}/context/_health_care_facility/_identifier:0|type":"type-0",

When I look at the exported Web Template (JSON) file from this version of the template, I can find the rmType=EVENT_CONTEXT node listed only once, with id=context, with a min/max occurs both of 1, and includes as a “child” an extra item we have added under /context/other_context[at0001].

But now in a newer (and more complicated) version of this template, when I look at the exported Web Template file, there are now 2 different nodes with rmType=EVENT_CONTEXT

The first instance has id=event_context and aqlPath=/context[at0002], min/max of 1, and contains our custom other_context cluster plus, for some reason, start_time and setting (both with min/max of 1).
The second instance has id=context and aqlPath=/context, but this one has a minOccurs of 0 and only includes start_time and setting, not our additional item under other_context.

I’m not sure exactly why we seem to be getting some kind of “specialized” context/“Event Context” node (as at0002); Erik modeled this and I am about as far as you can get from a modeling expert. But it seems that in order to send attributes as part of EVENT_CONTEXT we need to use this “custom” instance with id=event_context instead of what we assumed was the “default” one with id=context.

I guess from my perspective, the two questions that I have are:

Is there a good reason that 2 different instances of EVENT_CONTEXT should exist in the Web Template with different IDs (or is this maybe some kind of “bug”)?
And/or if this is expected, how can we know which path should be used in order to actually populate the various values under EVENT_CONTEXT? (such as location, participations, health_care_facility, etc)

(and ideally in a way that our tooling can detect this and we do not have to test and set this manually template-by-template)
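As a starting point for such tooling, here is a minimal Python sketch that walks an exported web template and reports suspicious EVENT_CONTEXT nodes. It assumes the web template JSON layout seen in Better/EHRbase exports (a top-level `"tree"` whose nodes carry `id`, `rmType`, `aqlPath`, `min`, `max` and `children`). The helper names and the two heuristics (more than one EVENT_CONTEXT node, or an at-code in its aqlPath) are hypothetical, and the example tree is a trimmed, made-up one mirroring the newer template described above.

```python
from typing import Any, Dict, List, Optional


def find_nodes(node: Dict[str, Any], rm_type: str,
               found: Optional[List[Dict[str, Any]]] = None) -> List[Dict[str, Any]]:
    """Depth-first walk over a web template tree, collecting nodes of rm_type."""
    if found is None:
        found = []
    if node.get("rmType") == rm_type:
        found.append({k: node.get(k) for k in ("id", "aqlPath", "min", "max")})
    for child in node.get("children", []):
        find_nodes(child, rm_type, found)
    return found


def context_warnings(web_template: Dict[str, Any]) -> List[str]:
    """Heuristic checks for duplicated or at-coded EVENT_CONTEXT nodes."""
    nodes = find_nodes(web_template["tree"], "EVENT_CONTEXT")
    warnings: List[str] = []
    if len(nodes) > 1:
        listed = ", ".join(f"{n['id']} at {n['aqlPath']}" for n in nodes)
        warnings.append(f"{len(nodes)} EVENT_CONTEXT nodes: {listed}")
    for n in nodes:
        if "[" in (n["aqlPath"] or ""):  # an at-code such as /context[at0002]
            warnings.append(f"EVENT_CONTEXT '{n['id']}' carries a node id: {n['aqlPath']}")
    return warnings


example = {"tree": {"id": "my_template", "rmType": "COMPOSITION", "aqlPath": "", "children": [
    {"id": "event_context", "rmType": "EVENT_CONTEXT", "aqlPath": "/context[at0002]", "min": 1, "max": 1},
    {"id": "context", "rmType": "EVENT_CONTEXT", "aqlPath": "/context", "min": 0, "max": 1},
]}}
for w in context_warnings(example):
    print(w)
```

The same walk could be pointed at other COMPOSITION-level nodes, such as category, with a different filter.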

joshua.grisham · 11 April 2023 10:53

Also, not sure it is worth mentioning, but we see the same difference in the newer version of this template with /category:

Before there was only id=category at aqlPath=/category
Now there is both id=category at aqlPath=/category and a new id=coded_text at aqlPath=/category[at0001]. In some of our implementations it no longer works to create a composition using {id}/category|code etc.; instead we now have to target the “new” customized version (at0001) of category using {id}/coded_text|code.

Not sure if this is the same “issue”/interpretation, but the smell check tells me that it is at least in the same family.

sebastian.iancu · 11 April 2023 10:55

Is this the way both EHRbase and Better platform behaves?

joshua.grisham · 11 April 2023 12:09

First day back from a short vacation for me, I will try to take some time this afternoon and see if I can test a bit more to be sure.

One thing I have now noticed is that Better’s Archetype Designer now seems to give us a node_id (this at0002 and at0001) on these nodes when we export (both as Web Template and as OPT), where it did not before. So I wonder if this issue is “new”, caused by a change in Archetype Designer (one that I think we were also involved in recently)?

Before for example in OPT the export looked like this:

...
        <attributes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="C_SINGLE_ATTRIBUTE">
            <rm_attribute_name>context</rm_attribute_name>
...
            <children xsi:type="C_COMPLEX_OBJECT">
                <rm_type_name>EVENT_CONTEXT</rm_type_name>
...
                <node_id></node_id>
...

With special attention that node_id is empty. And now the same template exports like this:

...
        <attributes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="C_SINGLE_ATTRIBUTE">
            <rm_attribute_name>context</rm_attribute_name>
...
            <children xsi:type="C_COMPLEX_OBJECT">
                <rm_type_name>EVENT_CONTEXT</rm_type_name>
...
                <node_id>at0002</node_id>
...

… noting that EVENT_CONTEXT now has this node_id of at0002.

Interestingly, if I remove this value so that node_id is empty again before importing the OPT into EHRbase, it seems to “work” as before: reading the Web Template back from EHRbase gives only /context with id=context (no second node), and writing a composition works the “old way” again using /context.

pablo · 11 April 2023 16:47

I’m of the same opinion, but hadn’t thought about the maintainability aspect.

I think flat was proposed to simplify something for developers, but I think it’s as difficult to implement flat as it is to implement the official canonical formats. They still need to understand what they are doing, what those paths mean, learn how to transform those paths back and forth to an RM instance, etc. If they don’t know what they are doing and are just putting values where somebody told them to, it makes no difference whether they put the values in certain parts of a flat or a canonical format.

Until now nobody has shown a real argument why we need these extra formats as part of the specs, but I guess implementers can do whatever they think they need, and that’s OK, they can even coordinate between vendors to share the same formats, though having too many official options is a little messy.

birger.haarbrandt · 12 April 2023 07:02

Hi,

without going into all the details, just my 2 cents from the EHRbase perspective (which the developers can further elaborate on).

Regarding the use of WebTemplate and FLAT:

I very much disagree with the statement that using FLAT is as difficult as using the RM directly. Our experience is that it lowers the learning curve and improves readability enormously.
Any openEHR procurement I have come across in the last 2 years asked for the WebTemplate and FLAT formats (and sometimes the STRUCTURED format as well).
If we want systems to be interoperable and interchangeable, such formats need to be part of the specifications, and this was already the view of the SEC some time ago.

On the topic itself:

The implementation in EHRbase is basically reverse engineered from Better’s implementation
However, this thread is a good case to further work on the conformance framework to ensure that ALL the tiny bits and pieces are aligned
There was/is awareness that the first version of the format is Better’s approach which we implemented (and as far as I know Solit Clouds did the same). When discussing with Better and doing the implementation, we found some things that might be straightened out, but there was also a deliberate decision in the SEC to not let perfect be the enemy of good.
For the use-case stated by @erik.sundvall: you can also have a try by mixing FLAT with RAW (which actually should become canonical JSON when simSDT is used through the official REST API…) like shown in this example: openEHR_SDK/corona_with_raw.json at develop · ehrbase/openEHR_SDK · GitHub
This should allow you to use multiple identifiers without any ambiguity.

ian.mcnicoll · 12 April 2023 09:13

Thanks Joshua,

There is/was something odd around the way that Web templates are handling context, which coincided, I think, with Archetype Designer adding these (I think redundant) atCodes on category and context.

@erik.sundvall - I’ll look at your examples, but I don’t think this is related to or affected by the use of the ctx blocks. Personally I always use/recommend the OUTPUT format, as I have found that the asymmetry between the read and write formats causes confusion.

@joshua.grisham I think you have correctly documented the support for PARTY_IDENTIFIED.identifiers

@erik.sundvall - The Simplified Info model in Figure 6 is incorrect in that both PARTY_PROXY.external_ref and PARTY_IDENTIFIED.identifiers are supported:

".../_health_care_facility|name":"Facility name",
".../_health_care_facility/_identifier:0|id":"999999-345",
".../_health_care_facility/_identifier:0|type":"type-0",

which I can commit in both EHRbase and Better

versus the PARTY_PROXY.external_ref

  ".../_health_care_facility|id": "999999-345",
  ".../_health_care_facility|id_scheme": "2.16.840.1.113883.2.1.4.3",
  ".../_health_care_facility|id_namespace": "NHS-UK",
  ".../_health_care_facility|name": "Northumbria Community NHS",

also working correctly in both EHRbase and Better
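To show how the two spellings relate to the canonical RM, here is a minimal Python sketch that folds the flat keys under one `_health_care_facility` prefix into a canonical-JSON-style PARTY_IDENTIFIED. The helper name is hypothetical, and the exact canonical shape of the external_ref (GENERIC_ID for the id, `"PARTY"` as the PARTY_REF type) is an assumption about what a CDR would produce, not something confirmed in this thread.

```python
import re
from typing import Any, Dict

# Matches "/_identifier:<n>|<field>" after the party prefix.
_IDENTIFIER = re.compile(r"^/_identifier:(\d+)\|(id|issuer|assigner|type)$")


def party_identified_from_flat(flat: Dict[str, str], prefix: str) -> Dict[str, Any]:
    """Fold flat keys under `prefix` into a canonical-style PARTY_IDENTIFIED."""
    party: Dict[str, Any] = {"_type": "PARTY_IDENTIFIED"}
    identifiers: Dict[int, Dict[str, str]] = {}
    for key, value in flat.items():
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        m = _IDENTIFIER.match(rest)
        if m:  # PARTY_IDENTIFIED.identifiers: List<DV_IDENTIFIER>
            entry = identifiers.setdefault(int(m.group(1)), {"_type": "DV_IDENTIFIER"})
            entry[m.group(2)] = value
        elif rest == "|name":
            party["name"] = value
        elif rest in ("|id", "|id_scheme", "|id_namespace"):  # PARTY_PROXY.external_ref
            ref = party.setdefault("external_ref", {
                "_type": "PARTY_REF",
                "type": "PARTY",  # assumption: the value a CDR would fill in
                "id": {"_type": "GENERIC_ID"},
            })
            if rest == "|id":
                ref["id"]["value"] = value
            elif rest == "|id_scheme":
                ref["id"]["scheme"] = value
            else:
                ref["namespace"] = value
    if identifiers:
        party["identifiers"] = [identifiers[i] for i in sorted(identifiers)]
    return party


p = "my_template/context/_health_care_facility"
flat = {
    f"{p}|name": "Facility name",
    f"{p}/_identifier:0|id": "999999-345",
    f"{p}/_identifier:0|type": "type-0",
    f"{p}/_identifier:1|id": "123-345",
    f"{p}|id": "999999-345",
    f"{p}|id_scheme": "2.16.840.1.113883.2.1.4.3",
    f"{p}|id_namespace": "NHS-UK",
}
print(party_identified_from_flat(flat, p))
```

Both forms are handled in one call here purely for illustration; whether a given CDR accepts them together in one composition is a separate question.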

So I think the main confusion remains around the use of ‘context’ and ‘event_context’.

There definitely was an issue, but I’m now getting consistent use of …/context/… and /category when asking for example compositions, i.e. I don’t see event_context appearing in paths.

What version of the Better CDR server are you using? My guess is that at some point adding the atCodes into the opt caused a snafu in Web templates that has since been corrected?

stefanspiska · 12 April 2023 14:43

Yes, this is because composition archetypes are generated with an at-code on EVENT_CONTEXT.

So you need to remove this incorrect at-code.

ian.mcnicoll · 13 April 2023 13:42

Thanks Stefan. That explains exactly what has been happening. Those redundant codes were only being added to new compositions created in AD, which then appeared in the .opt and were messing up the web template/example generation. This only affects fairly new compositions created in AD, from about 12 months ago onwards.

I agree that the fix is to remove the atCodes from the composition and regenerate the .opts - I have reported this directly on the AD JIRA, as it should be fixed properly.

ian.mcnicoll · 13 April 2023 13:55

That’s not really true, Pablo: the CDR handles transforming the short paths to and from the RM instance, and developers do not, technically, have to understand what the paths mean.

Like Birger, our experience has been that using the FLAT formats has been transformative in our ability to upskill new, non-specialist devs into our world. Not perfect by any means, but a heck of a lot easier for newbies to work with than canonical. FLAT is particularly helpful when documenting integrations against source xpath.

I’m obviously very happy if people want to use canonical, but so far our clients have successfully used FLAT for more or less all of our projects.

But maybe we should argue the for/against case elsewhere and keep this thread for raising/resolving issues with FLAT.

@erik.sundvall - I wonder if it is worth documenting our ‘findings’ and solutions in a separate thread/channel/wiki. Already I think we have nailed down the issues you raised with Identifiers and context paths - it would be good to have these clearly identifiable to point others towards.

erik.sundvall · 13 April 2023 15:03

The top post in this thread is a “wiki post”, so many active users (but not recently registered ones) should be able to edit it. I’ll add a “findings” heading where we can all help by briefly summing up findings. (I won’t be at a computer until Monday, so I hope someone else can start the summary.)

pablo · 13 April 2023 16:29

That’s the issue: if that is the case, we are providing something to developers who don’t know what they are doing!

So putting a value in a path someone told them to use has the same technical complexity as putting a value in a specific place of a canonical JSON. In fact we have a tool that generates canonical instances with known tags in the places where values go, and developers can just replace the tags with values.
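The tag-replacement approach described here can be sketched in a few lines of Python. The `{{tag}}` placeholder syntax, the `fill_tags` helper and the skeleton below are hypothetical illustrations, not the actual tool mentioned above.

```python
import json


def fill_tags(node, values: dict):
    """Return a copy of a parsed JSON tree with '{{tag}}' string leaves replaced."""
    if isinstance(node, dict):
        return {k: fill_tags(v, values) for k, v in node.items()}
    if isinstance(node, list):
        return [fill_tags(v, values) for v in node]
    if isinstance(node, str) and node.startswith("{{") and node.endswith("}}"):
        return values[node[2:-2]]  # KeyError signals a tag with no value supplied
    return node


skeleton = json.loads("""{
  "_type": "PARTY_IDENTIFIED",
  "name": "{{facility_name}}",
  "identifiers": [
    {"_type": "DV_IDENTIFIER", "id": "{{facility_id}}", "issuer": "{{facility_issuer}}"}
  ]
}""")
filled = fill_tags(skeleton, {"facility_name": "Facility name",
                              "facility_id": "999999-345",
                              "facility_issuer": "issuer-0"})
print(json.dumps(filled, indent=2))
```

Replacing tags on the parsed JSON rather than by string substitution means values containing quotes or backslashes cannot break the document.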

On the other hand, if developers use in-memory RM instances and put the values there, then it is a tool that generates the flat format, so the developer doesn’t even know about it. Again, this has the same technical complexity as putting the values in an in-memory instance and then serializing to canonical JSON. If they do this, they actually need to understand the RM because they work with it in memory.

The third option is that they generate the JSON from scratch, including the paths. In this case they surely need to know what they are doing, since the paths in the flat format are based on the OPT definitions. And again, if they can do this, they can do canonical JSON.

That is why I don’t see any advantage in using flat other than making the messages a little lighter. Is that really worth it? I mean, is implementing a specific architecture to generate, validate and process another format really worth it just to save a few bytes?

Bear in mind I’m talking as a developer!

I think if that is needed, it is because there is no formal training in place, which is key for any project. If we just throw developers at it, it won’t work.

When I work with openEHR JSON I need to have the RM in one screen, the JSON schema in another screen and my code in the third screen. This is not magic, I don’t know all the models from memory, and I’m not really smart, but with the right specifications as references when developing, there is no way this approach is complicated. It just takes time to do it right.

thomas.beale · 14 April 2023 11:00

Just catching up with this. I read through the thread (ok, not every single word) and have the following thoughts:

Attribute naming: we should have a clear model basis for any attribute names - the context/event_context/ctx problem. Ideally we’d just use the ones from the RM, but it’s no problem to define another derived model that contains shorter names. Such a model has to have a formal transform to the canonical one though.
Attribute single/plural names: some people like using singular names for any use of an attribute whether for a container or a single-valued attribute. The openEHR RM doesn’t do this (it uses the old school naming approach). I’m unclear on the formal reasons to use singular naming only. There might be some but I don’t see them yet.
Optional ‘underscore’ naming: I didn’t know that prepending an underscore to an attribute name indicated it was optional in the model. What is the purpose of doing this? It seems to be an attempt to convey model information in the data - is it to make validation easier somehow?

I am (or was) the editor of the SDT, ‘Simplified Model B’ and SDF draft specs. These specs contain information from the various threads of development to do with path-based ‘flat templates’. I’d like to help get these rationalised, but we would need to get some agreement. I see these as follows:

Serial Data Formats (SDF): this was the most recent attempt to document serialised forms of data types and higher-level structures. This spec was purely reverse-engineered from reality. It contains ‘EhrScape variants’ in some places. I would have thought that progressing this to completion would be helpful and probably fairly easy. Note that it is in the Service Model (SM) component, meaning that the serialisations would be usable for any kind of API, not just REST (might be wrong - it’s easy to move).
Simplified Information Model B (SIM-B): this was an attempt to define a reduced information model that a) has a formal transform to the RM and b) can act as a direct model basis for shorter names and paths and so on. This model can be repurposed in any way that makes sense to achieve this goal, and I would have thought was still useful for that purpose. But can be removed if of no use as well.
Simplified Data Template (SDT): my original attempt to document what products were doing, with a design basis for formalising flat templates. I still think the design thinking is at least potentially relevant. If not, we remove the spec; if it is, I suggest we merge its useful content with the SIM B spec, assuming that will be used in some form. This spec is currently in the ITS group.

I have certainly missed some details from current conversations on all this, so feel free to correct any of the above.

ian.mcnicoll · 17 April 2023 11:21


@borut.fabjan has just told me that this issue with new compositions is an error and will be fixed in AD.

ian.mcnicoll · 17 April 2023 11:47

the context/event_context/ctx

Is a mixture of confusion on how the ctx directives work, and a bug/misinterpretation in how AD creates new compositions (being fixed).

TBH I was unaware of this; I’ve certainly never noticed it in the documentation. I had assumed that the underscores were only there where there might be a risk of conflict with identically named archetyped nodes.