The good, the bad and the "Wat?" of current simplified FLAT/SimSDT openEHR exchange format

Is this the way both EHRbase and the Better platform behave?

First day back from a short vacation for me, I will try to take some time this afternoon and see if I can test a bit more to be sure.

One thing I have now noticed is that Better’s Archetype Designer now seems to include a node_id (at0002 and at0001) on these nodes when we export (both as Web Template and as OPT), where it did not seem to before. So I wonder if this issue is “new”, caused by a recent change in Archetype Designer (one I think we were also involved in :smile:)?

Before for example in OPT the export looked like this:

...
        <attributes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="C_SINGLE_ATTRIBUTE">
            <rm_attribute_name>context</rm_attribute_name>
...
            <children xsi:type="C_COMPLEX_OBJECT">
                <rm_type_name>EVENT_CONTEXT</rm_type_name>
...
                <node_id></node_id>
...

Note in particular that node_id is empty. Now the same template exports like this:

...
        <attributes xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="C_SINGLE_ATTRIBUTE">
            <rm_attribute_name>context</rm_attribute_name>
...
            <children xsi:type="C_COMPLEX_OBJECT">
                <rm_type_name>EVENT_CONTEXT</rm_type_name>
...
                <node_id>at0002</node_id>
...

… noting that EVENT_CONTEXT now has this node_id of at0002.

Interestingly, if I remove this value so that node_id is empty again before importing the OPT into EHRbase, it seems to “work” as before: reading the Web Template back from EHRbase gives only /context with id=context (not a second one), and writing a composition works our “old way” again using /context.
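
For anyone who wants to script this manual workaround, here is a minimal sketch. It assumes the OPT is plain XML in which each constraint is a `children` element with direct `rm_type_name` and `node_id` child elements, as in the excerpts above. The function name `blank_node_ids` and the trimmed sample document are hypothetical, and namespaces are matched by local name only, so it should be checked against a real OPT before being relied on.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Return the tag name without any '{namespace}' prefix."""
    return tag.rsplit("}", 1)[-1]


def blank_node_ids(root: ET.Element, rm_type: str = "EVENT_CONTEXT") -> int:
    """Empty the node_id of every constraint whose rm_type_name equals rm_type.

    Returns the number of node_id elements that were changed.
    """
    changed = 0
    for elem in root.iter():
        if local(elem.tag) != "children":
            continue
        # Only look at direct children of this constraint element.
        direct = {local(child.tag): child for child in elem}
        type_el = direct.get("rm_type_name")
        id_el = direct.get("node_id")
        if type_el is None or id_el is None:
            continue
        if (type_el.text or "").strip() == rm_type and (id_el.text or "").strip():
            id_el.text = ""
            changed += 1
    return changed


# Hypothetical, heavily trimmed stand-in for an exported OPT.
SAMPLE = """
<template>
  <attributes>
    <rm_attribute_name>context</rm_attribute_name>
    <children>
      <rm_type_name>EVENT_CONTEXT</rm_type_name>
      <node_id>at0002</node_id>
    </children>
  </attributes>
</template>
"""

root = ET.fromstring(SAMPLE)
changed_count = blank_node_ids(root)
```

The default only touches EVENT_CONTEXT constraints; blanking codes on other types indiscriminately would remove node_ids that are genuinely needed.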

I’m of the same opinion, but hadn’t thought about the maintainability aspect.

I think flat was proposed to simplify something to developers, but I think it’s as difficult to implement flat as it is to implement the official canonical formats. They still need to understand what they are doing, what those paths mean, learn how to transform those paths back and forth to a RM instance, etc. If they don’t know what they are doing and are just putting values in a place somebody told them to put the values, it is the same to put the values on certain parts of a flat or a canonical format.

Until now nobody has shown a real argument why we need these extra formats as part of the specs, but I guess implementers can do whatever they think they need, and that’s OK, they can even coordinate between vendors to share the same formats, though having too many official options is a little messy.

Hi,

without going into all the details, just my 2 cents from the EHRbase perspective (which the developers can further elaborate on).

Regarding the use of WebTemplate and FLAT:

  • I very much disagree with the statement that using FLAT is as difficult as using the RM directly. Our experience is that it lowers the learning curve and improves readability enormously.
  • Any openEHR procurement I came across in the last 2 years asked for the WebTemplate and FLAT formats (maybe even the STRUCTURED format)
  • If we want systems to be interoperable and interchangeable, such formats need to be part of the specifications, and this was the view of the SEC already some time ago.

On the topic itself:

  • The implementation in EHRbase is basically reverse engineered from Better’s implementation
  • However, this thread is a good case to further work on the conformance framework to ensure that ALL the tiny bits and pieces are aligned
  • There was/is awareness that the first version of the format is Better’s approach which we implemented (and as far as I know Solit Clouds did the same). When discussing with Better and doing the implementation, we found some things that might be straightened out, but there was also a deliberate decision in the SEC to not let perfect be the enemy of good.
  • For the use-case stated by @erik.sundvall: you can also try mixing FLAT with RAW (which actually should become canonical JSON when simSDT is used through the official REST API…) as shown in this example: openEHR_SDK/corona_with_raw.json at develop · ehrbase/openEHR_SDK · GitHub
  • This should allow you to use multiple identifiers without any doubts

Thanks Joshua,

There is/was something odd around the way that Web templates are handling context, which coincided, I think, with Archetype Designer adding these (I think redundant) atCodes on category and context.

@Erik - I’ll look at your examples, but I don’t think this is related to or affected by the use of the ctx blocks. Personally I always use and recommend the OUTPUT format, as I have found that having asymmetrical read and write formats causes confusion.

@joshua.grisham I think you have documented correctly the support for PARTY_IDENTIFIED.identifiers

@erik.sundvall - The Simplified Info model in Figure 6 is incorrect, in that both PARTY_PROXY.external_ref and PARTY_IDENTIFIED.identifiers are supported.

".../_health_care_facility|name":"Facility name",
".../_health_care_facility/_identifier:0|id":"999999-345",
".../_health_care_facility/_identifier:0|type":"type-0",

which I can commit in both EHRbase and Better

versus the PARTY_PROXY.external_ref

  ".../_health_care_facility|id": "999999-345",
  ".../_health_care_facility|id_scheme": "2.16.840.1.113883.2.1.4.3",
  ".../_health_care_facility|id_namespace": "NHS-UK",
  ".../_health_care_facility|name": "Northumbria Community NHS",

which also works correctly in both EHRbase and Better
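
As a sketch of how the two variants above could be generated, the helpers below build the same FLAT keys from plain values. The function names and the `my_template/context` prefix are hypothetical (the real prefix is elided in the examples above and depends on the template); only the key suffixes are taken from the examples in this post.

```python
from typing import Dict, List, Tuple


def facility_with_identifiers(
    prefix: str, name: str, identifiers: List[Tuple[str, str]]
) -> Dict[str, str]:
    """FLAT entries using PARTY_IDENTIFIED.identifiers, one (id, type) pair each."""
    base = f"{prefix}/_health_care_facility"
    flat = {f"{base}|name": name}
    for index, (id_value, id_type) in enumerate(identifiers):
        flat[f"{base}/_identifier:{index}|id"] = id_value
        flat[f"{base}/_identifier:{index}|type"] = id_type
    return flat


def facility_with_external_ref(
    prefix: str, name: str, id_value: str, id_scheme: str, id_namespace: str
) -> Dict[str, str]:
    """FLAT entries using PARTY_PROXY.external_ref."""
    base = f"{prefix}/_health_care_facility"
    return {
        f"{base}|id": id_value,
        f"{base}|id_scheme": id_scheme,
        f"{base}|id_namespace": id_namespace,
        f"{base}|name": name,
    }


# Hypothetical prefix; in practice it comes from the Web Template.
PREFIX = "my_template/context"

by_identifiers = facility_with_identifiers(
    PREFIX, "Facility name", [("999999-345", "type-0")]
)
by_ref = facility_with_external_ref(
    PREFIX,
    "Northumbria Community NHS",
    "999999-345",
    "2.16.840.1.113883.2.1.4.3",
    "NHS-UK",
)
```

Passing more than one pair to `facility_with_identifiers` simply increments the `:0`, `:1`, ... index on `_identifier`.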

So I think the main confusion remains around the use of ‘context’ and ‘event_context’.

There definitely was an issue, but I’m now getting consistent use of …/context/… and /category when asking for example compositions, i.e. I don’t see event_context appearing in paths.

What version of the Better CDR server are you using? My guess is that at some point adding the atCodes into the opt caused a snafu in Web templates that has since been corrected?

Yes, this is because Composition archetypes are being generated with an atCode on the EventContext.

So you need to remove this wrong atcode.

Thanks Stefan. That does exactly explain what has been happening. Those redundant codes were only being added to new compositions created in AD, which then appeared in the .opt and were messing up the web template/example generation. This only affects fairly new compositions created in AD, from about 12 months ago.

I agree that the fix is to remove the atCodes from the composition and regenerate the .opts - I have reported this directly on the AD JIRA, as it should be fixed properly.

That’s not really true, Pablo - the CDR handles transforming the short paths to and from the RM instance, and developers do not have to understand what the paths mean, technically.

Like Birger, our experience has been that using the FLAT formats has been transformative in our ability to upskill new, non-specialist devs into our world. Not perfect by any means, but a heck of a lot easier to work with, for newbies, than canonical. FLAT is particularly helpful when documenting integrations against source xpath.

I’m obviously very happy if people want to use canonical, but so far all of our clients have successfully used FLAT for nearly all of our projects.

But maybe we should argue the for/against case elsewhere and keep this thread for raising/resolving issues with FLAT.

@erik.sundvall - I wonder if it is worth documenting our ‘findings’ and solutions in a separate thread/channel/wiki. Already I think we have nailed down the issues you raised with Identifiers and context paths - it would be good to have these clearly identifiable to point others towards.

The top post in this thread is a “wiki-post”, so many active users (but not recently registered ones) should be able to edit it. I’ll add a “findings” heading where we can all help by briefly summing up findings. (I am not by a computer until Monday so I hope someone else can start to summarize.)

That’s the issue, if that is the case we are providing something to developers that don’t know what they are doing!

So putting a value in a path someone told them to use has the same technical complexity as putting a value in a specific place of a canonical json. In fact we have a tool that generates canonical instances with known tags in the places where values are, and developers can just do a replace of the tags for a value.
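
To illustrate the tag-replacement idea, here is a minimal sketch. The `[[NAME]]` tag syntax, the `fill_tags` helper and the JSON fragment are all hypothetical; this is not the tool mentioned above, and the fragment is not a complete or validated canonical composition.

```python
import json
from typing import Any, Dict


def fill_tags(node: Any, values: Dict[str, Any]) -> Any:
    """Return a copy of node where every string equal to a known tag is replaced."""
    if isinstance(node, dict):
        return {key: fill_tags(child, values) for key, child in node.items()}
    if isinstance(node, list):
        return [fill_tags(child, values) for child in node]
    if isinstance(node, str) and node in values:
        return values[node]
    return node


# Hypothetical pre-generated fragment with a tag where the value belongs.
TEMPLATE = json.loads("""
{
  "_type": "DV_QUANTITY",
  "magnitude": "[[SYSTOLIC]]",
  "units": "mm[Hg]"
}
""")

filled = fill_tags(TEMPLATE, {"[[SYSTOLIC]]": 120})
```

Replacing on the parsed structure rather than on the raw text keeps value types intact (the magnitude ends up as a number, not a quoted string).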

On the other hand, if developers use in-memory RM instances and put the values there, then there is a tool that generates the flat format, so the developer doesn’t even know about it, which again has the same technical complexity as putting the values in an in-memory instance and then serializing to a canonical json. If they do this, they actually need to understand the RM because they work with it in-memory.

The third option is that they generate the json from scratch, even generating the paths. In this case they surely need to know what they are doing, since the paths in the flat format are based on the opt definitions. And again, if they can do this, they can do canonical json.

That is why I don’t see any advantage in using flat other than making the messages a little lighter. Is that really worth it? I mean, is implementing a specific architecture to generate, validate and process another format really worth it just to save a few bytes?

Consider I’m talking as a developer!

I think if that is needed, it is because there is no formal training in place, which is key for any project. If we just throw developers at it, it won’t work.

When I work with openEHR JSON I need to have the RM in one screen, the JSON schema in another screen and my code in the third screen. This is not magic, I don’t know all the models from memory, and I’m not really smart, but with the right specifications as references when developing, there is no way this approach is complicated. It just takes time to do it right.

Just catching up with this. I read through the thread (ok, not every single word :wink:) and have the following thoughts:

  • Attribute naming: we should have a clear model basis for any attribute names - the context/event_context/ctx problem. Ideally we’d just use the ones from the RM, but it’s no problem to define another derived model that contains shorter names. Such a model has to have a formal transform to the canonical one though.
  • Attribute single/plural names: some people like using singular names for any use of an attribute whether for a container or a single-valued attribute. The openEHR RM doesn’t do this (it uses the old school naming approach). I’m unclear on the formal reasons to use singular naming only. There might be some but I don’t see them yet.
  • Optional ‘underscore’ naming: I didn’t know that prepending an underscore to an attribute name indicated it was optional in the model. What is the purpose of doing this? It seems to be an attempt to convey model information in the data - is it to make validation easier somehow?

I am (was :wink:) the editor of the SDT, ‘Simplified Model B’ and SDF draft specs. These specs contain information from the various threads of development to do with path-based ‘flat templates’. I’d like to help get these rationalised but we would need to get some agreement. I see these as follows:

  • Serial Data Formats (SDF): this was the most recent attempt to document serialised forms of data types and higher-level structures. This spec was purely reverse-engineered from reality. It contains ‘EhrScape variants’ in some places. I would have thought that progressing this to completion would be helpful and ?easy. Note that it is in the Service Model (SM) component, meaning that the serialisations would be usable for any kind of API, not just REST (might be wrong - it’s easy to move).
  • Simplified Information Model B (SIM-B): this was an attempt to define a reduced information model that a) has a formal transform to the RM and b) can act as a direct model basis for shorter names and paths and so on. This model can be repurposed in any way that makes sense to achieve this goal, and I would have thought was still useful for that purpose. But can be removed if of no use as well.
  • Simplified Data Template (SDT): my original attempt to document what products were doing, with a design basis for formalising flat templates. I still think the design thinking is at least potentially relevant. If not, we remove the spec; if it is, I suggest we merge its useful content with the SIM-B spec, assuming that will be used in some form. This spec is currently in the ITS group.

I have certainly missed some details from current conversations on all this, so feel free to correct any of the above.

[quote=“birger.haarbrandt, post:11, topic:3819”]

@borut.fabjan has just told me that this issue with new compositions is an error and will be fixed in AD.

the context/event_context/ctx

Is a mixture of confusion on how the ctx directives work, and a bug/misinterpretation in how AD creates new compositions (being fixed).

TBH I was unaware of this; I’ve certainly never noticed that in the documentation. I had assumed that the underscores were only there where there might be a risk of conflict with identically named archetyped nodes.

Digging through the docs, I can now see that the underscore = ‘optional RM attribute’ might well be the case

It doesn’t sound like a good idea, for exactly the reason you mentioned just above - the usual interpretation of such names is they are some special / meta / pseudo attribute…
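
To make the naming discussion a bit more concrete, here is a small sketch that splits a FLAT key into its segments, optional `:n` index and `|suffix`, and flags underscore-prefixed names. It only reflects the key shapes seen in the examples in this thread; what the underscore actually means is exactly the open question here, so the code merely reports it. The helper name and the `my_template` prefix are hypothetical.

```python
import re
from typing import Any, Dict, List

# A segment is a name optionally followed by ':<index>', e.g. '_identifier:0'.
SEGMENT = re.compile(r"^(?P<name>[^:|]+)(?::(?P<index>\d+))?$")


def parse_flat_key(key: str) -> Dict[str, Any]:
    """Split a FLAT key into path segments and the attribute suffix after '|'."""
    path, _, suffix = key.partition("|")
    segments: List[Dict[str, Any]] = []
    for raw in path.split("/"):
        match = SEGMENT.match(raw)
        if match is None:
            raise ValueError(f"unexpected segment {raw!r} in {key!r}")
        name = match.group("name")
        index = match.group("index")
        segments.append(
            {
                "name": name,
                "index": int(index) if index is not None else None,
                "underscore": name.startswith("_"),
            }
        )
    return {"segments": segments, "suffix": suffix or None}


parsed = parse_flat_key("my_template/context/_health_care_facility/_identifier:0|id")
underscored = [seg["name"] for seg in parsed["segments"] if seg["underscore"]]
```

In this example `underscored` contains `_health_care_facility` and `_identifier`, which are the names whose interpretation is being debated above.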

We have looked into this a bit more today and had a few more observations (especially when we hit yet another problem, now with /context/other_context !)

Our theory is that at some level (generating a Web Template? or maybe in other places as well?) there is an interpretation happening where, at a minimum,

  • “/category” is expected to have a DV_CODED_TEXT child without node_id specified
  • “/context” is expected to have a child EVENT_CONTEXT without node_id specified
  • “/context/other_context” is expected to have a child ITEM_TREE with node_id = at0001

It seems like this is generally the case for many of the different specialized Composition archetypes but not for all of them, and especially not for some of the “newer” ones that we are looking at.

For example, openEHR-EHR-COMPOSITION.report-result.v1 seems to follow this pattern, but not openEHR-EHR-COMPOSITION.self_reported_data.v1, which instead looks like this:

  • “/category” has a DV_CODED_TEXT child with node_id at0001 (instead of blank)
  • “/context” has a child EVENT_CONTEXT with a node_id at0002 (instead of blank)
  • “/context/other_context” has a child ITEM_TREE with node_id = at0003 (instead of at0001)

When these kinds of differences occur, we don’t seem to get the “normal” paths for these nodes in the Web Template, which leads to this problem/confusion when we try to write compositions and build more standardized tooling.

@thomas.beale Does it seem right/ok to have different node_ids for these different specialized Compositions (e.g. sometimes other_context can be at0002, at0003, etc) or does it seem more that all types of compositions should follow the same pattern when it comes to node_id for these nodes?

And as a follow-up question, does it seem reasonable that we should be able to expect for example to “always” write to /context/ and /category (etc) in FLAT format, or that we should instead expect to get different paths for these in every new Web Template (depending on the template itself and its own node_ids)?

Hi Joshua,

I’m pretty sure (and have confirmed with @borut.fabjan) that this is simply down to a change made to AD when creating new compositions.

The atCodes on /context and /category are not required, but are legal in ADL, but the main problem was that when the related terms were generated, they were added as ‘Event Context’ not ‘context’ , and as ‘DV coded text’ not as ‘category’.

In any case, I would argue that the web template generator should ignore any non-LOCATABLE name overrides, since these never find their way into the CDR data.

  • “/context/other_context” has a child ITEM_TREE with node_id = at0003 (instead of at0001)

This is correct and simply reflects that the archetype has had a slot constraint added. The ITEM_TREE atcodes will be inconsistent across archetypes and this is normal and expected.

I don’t think the specialisation is causing any issues here, and I’m pretty sure the confusion is wholly down to the combination of the redundant atCodes being added to ‘context’ and ‘category’, the incorrect terms that were being created, and finally the web template generation picking up these terms, when they should be ignored.
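
A rough sketch of a check that follows this reasoning: it flags a node_id only on /category and /context, and deliberately ignores /context/other_context, where differing atCodes are normal. The helper name and the dictionary layout are hypothetical; the node_id values are the ones reported earlier in this thread for the two archetypes.

```python
from typing import Dict, List

# Paths where, per this thread, an atCode is legal ADL but redundant.
PATHS_EXPECTED_WITHOUT_NODE_ID = ("/category", "/context")


def redundant_at_codes(node_ids: Dict[str, str]) -> List[str]:
    """Return the paths that carry a node_id although none is expected there."""
    return [
        path
        for path in PATHS_EXPECTED_WITHOUT_NODE_ID
        if node_ids.get(path, "").strip()
    ]


# node_ids as described above for openEHR-EHR-COMPOSITION.self_reported_data.v1
self_reported_data = {
    "/category": "at0001",
    "/context": "at0002",
    "/context/other_context": "at0003",
}

# node_ids as described above for openEHR-EHR-COMPOSITION.report-result.v1
report_result = {
    "/category": "",
    "/context": "",
    "/context/other_context": "at0001",
}

flagged_self_reported = redundant_at_codes(self_reported_data)
flagged_report_result = redundant_at_codes(report_result)
```

Running something like this over the COMPOSITION archetypes in a repository would be one way to spot other archetypes affected by the same AD behaviour.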

Can somebody with enough CKM privileges (@sebastian.garde, @siljelb or somebody else?) then please patch and re-release that self_reported_data.v1 archetype to remove those two extra at codes for /category and /context and perhaps somebody could also later scan through other COMPOSITION archetypes edited ~last year to see if the error has slipped into any other archetype.

I have just uploaded a ‘fixed’ self_reported_data composition as a change request on CKM.

That is the only published archetype on CKM that is affected as far as I can tell. There are 3 other non-published archetypes with the same issue

I think there is an open question on whether this is, or at least should be regarded as, a breaking change.

On the one hand, strictly speaking, this is not a breaking change since the extra atCodes have no impact on run-time / persisted data; this is really an issue only for design-time tooling. OTOH, given the confusion potentially being caused for folks who use web templates and FLAT formats, there is an argument for bumping it up to v2. @sebastian.garde ??

The other compositions (all unpublished) are

Obstetric history, Draft archetype [Internet]. openEHR Foundation, openEHR Clinical Knowledge Manager [cited: 2023-04-17]. Available from: Clinical Knowledge Manager

Advance care, Draft archetype [Internet]. openEHR Foundation, openEHR Clinical Knowledge Manager [cited: 2023-04-17]. Available from: Clinical Knowledge Manager

Certificate, Draft archetype [Internet]. openEHR Foundation, openEHR Clinical Knowledge Manager [cited: 2023-04-17]. Available from: Clinical Knowledge Manager

Disease surveillance, Draft archetype [Internet]. openEHR Foundation, openEHR Clinical Knowledge Manager [cited: 2023-04-17]. Available from: Clinical Knowledge Manager

Note that I had to edit the ADL outside Archetype designer to make it ‘stick’.