Migrating CDR data to a newer RM release

@sebastian.iancu asked whether different versions of RM models can be used/loaded at the same time.

Yes, they can. Each AM/RM release is generated as a separate package/library and they can all be imported at the same time (see #1 below).

The example code is for reading/processing OPTs, but a similar approach can be used to:

  • Migrate CDR data from one RM release to another (e.g. RM 1.0.4 to RM 1.1.0).
  • Convert archetypes to a newer ADL release (e.g. AOM 2.0.6 to AOM 2.3.0).
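
As a rough illustration of the point above about loading several RM releases at once, here is a minimal sketch; the generated package and class names (org.openehr.rm104, org.openehr.rm110, ELEMENT) are hypothetical stand-ins for whatever a generator actually produces:

// Hypothetical generated packages, one per RM release; fully qualified
// names keep the two ELEMENT classes apart on the same classpath.
public class DualRmExample {
    public static void main(String[] args) {
        org.openehr.rm104.ELEMENT oldElement = new org.openehr.rm104.ELEMENT();
        org.openehr.rm110.ELEMENT newElement = new org.openehr.rm110.ELEMENT();
        System.out.println(oldElement.getClass() + " / " + newElement.getClass());
    }
}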

#2 walks through an OPT file to produce an OO class tree of a template.

#3 shows a helper class to access RM models at runtime (the RM release and RM class name are known only at runtime).
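
A minimal sketch of such a helper, assuming the same hypothetical package layout as above; reflection is used because both names only arrive as strings at runtime:

class RmReflection {
    // Resolves an RM class by release and class name at runtime; the
    // package layout is an assumption, e.g. newRmInstance("rm110", "ELEMENT").
    static Object newRmInstance(String rmRelease, String rmClassName) throws Exception {
        String fqcn = "org.openehr." + rmRelease + "." + rmClassName;
        return Class.forName(fqcn).getDeclaredConstructor().newInstance();
    }
}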

Note that for inter-version migration of data, a migration path between models should be specified, for instance: “in RM 1.0.2, data point a.b.c of type T1 should map to a.c.d of type T2 in RM 2.x.y”.
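
One hedged way to capture such a declared migration step in code (all names here are invented for illustration; a real mapping language would carry richer semantics):

// A single declared attribute mapping between two RM releases.
record AttributeMapping(String sourcePath, String sourceType,
                        String targetPath, String targetType) {}

class MigrationPath {
    // "in RM 1.0.2, a.b.c of type T1 maps to a.c.d of type T2 in RM 2.x.y"
    static final AttributeMapping EXAMPLE =
            new AttributeMapping("a.b.c", "T1", "a.c.d", "T2");
}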

That way we can have backwards-incompatible changes in major version releases of the RM and still migrate code and data between the models.

We might need such “mapping” language in the near future to keep things consistent for everyone.

That’s usually a can of worms, with many ways to do it, but something that’s likely to be needed.
I’d suggest focusing on an ITS approach and piggybacking on something that already exists, such as XSLT over the XML representation of compositions.
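
For example, a stylesheet like that can be applied with the standard JAXP transformation API; this sketch only assumes made-up file names:

import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CompositionTransform {
    public static void main(String[] args) throws Exception {
        // Compile the stylesheet that maps the old XML representation to the new one.
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new File("rm104-to-rm110.xsl")));
        // One document-to-document transformation per serialized composition.
        t.transform(new StreamSource(new File("composition-rm104.xml")),
                    new StreamResult(new File("composition-rm110.xml")));
    }
}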

I know it’s not what the young people want to do these days, but it strikes a pretty good balance between re-inventing the wheel and reaching a large user base. Personal opinion of course.


XSLT works for document-to-document transformations, though it is a mess to maintain one big XSLT or a big set of smaller XSLTs.

Before jumping into a solution, we need to understand the problems; IMO that means specifying how we in the SEC will document changes in the RM in a preferable way. Then you can generate XSLTs, code, or whatever ITS is used to implement the mappings.

The VeraTech guys have experience in that area. @yampeku @damoca

Ok, take two, with less subtlety and more clarity :slight_smile:

I think that’s a good idea.

I don’t think that’s a good idea, if you’re suggesting that the openEHR specs invent yet another language to do something which can be done with existing technologies. We have enough hoops to jump through as it is.

Always a good idea, but orthogonal to what I’m trying to say. What I’m trying to say is: given your pretty good problem statement above, regardless of the details, let’s try to avoid inventing yet another “openEHR way” to solve the problem, if we can, and focus first on existing implementation technology specifications as the way to implement it.


We use XQuery for that. We automatically generate the XQuery scripts from point-to-point mappings (and optional grouping semantics). Probably the most common problem is understanding and fine-tuning the grouping semantics when source and target have different hierarchies (e.g. think of the source having Persons grouped under Departments while the target has Departments grouped under Persons).
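
The hierarchy-inversion part of that problem, sketched in plain Java rather than XQuery (types simplified to strings):

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class GroupingInversion {
    // Source groups persons under departments; the target wants
    // departments grouped under persons.
    static Map<String, List<String>> invert(Map<String, List<String>> personsByDept) {
        return personsByDept.entrySet().stream()
                .flatMap(e -> e.getValue().stream()
                        .map(person -> Map.entry(person, e.getKey())))
                .collect(Collectors.groupingBy(Map.Entry::getKey,
                        Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        // Prints {Ana=[Cardiology, Radiology], Bo=[Cardiology]} (order may vary).
        System.out.println(invert(Map.of(
                "Cardiology", List.of("Ana", "Bo"),
                "Radiology", List.of("Ana"))));
    }
}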

Hi all,

I guess I missed the start of this conversation. To my understanding, as long as we don’t release RM 2.0, data should be backwards compatible if we stick to semantic versioning. It would be great to get some more background. I actually see it as a big advantage of openEHR that we avoid the version-to-version mappings that plague so many people within the FHIR community.


I had in mind a process similar to what Rails Active Record (and similar ORMs) are using for migrating databases.

They use “up” and “down” methods:

class RenameReadingListsToWishLists < ActiveRecord::Migration[6.0]
  def up
    # Raw SQL goes through execute; rename_table :reading_lists, :wish_lists
    # would be the more idiomatic equivalent.
    execute "ALTER TABLE reading_lists RENAME TO wish_lists;"
  end

  def down
    execute "ALTER TABLE wish_lists RENAME TO reading_lists;"
  end
end

Instead of altering table structure, openEHR migrations would make changes to the data. Pseudo code:

rm200.ELEMENT.new_attribute = rm110.ELEMENT.old_attribute + '<some change>'

There would be “up” and “down” functions between each RM release:

  • rm104.upgrade() would update data to RM 1.1.0
  • rm110.upgrade() would update data to RM 2.0.0
  • …
  • rm200.downgrade() would update data BACK to RM 1.1.0

These functions may be hand-written, or they could use the expression language.
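
A rough Java analogue of the idea, with all type and attribute names invented; real data would of course be RM objects rather than a plain map:

import java.util.Map;

// One migration per RM release boundary, mirroring the Rails up/down pattern.
interface RmMigration {
    Map<String, Object> up(Map<String, Object> data);   // e.g. RM 1.0.4 -> RM 1.1.0
    Map<String, Object> down(Map<String, Object> data); // e.g. RM 1.1.0 -> RM 1.0.4
}

class Rm104ToRm110 implements RmMigration {
    public Map<String, Object> up(Map<String, Object> data) {
        // new_attribute = old_attribute + '<some change>'
        data.put("new_attribute", data.remove("old_attribute") + "<some change>");
        return data;
    }

    public Map<String, Object> down(Map<String, Object> data) {
        data.put("old_attribute",
                String.valueOf(data.remove("new_attribute")).replace("<some change>", ""));
        return data;
    }
}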

Hi Birger,
Frankly, I don’t remember if we have any breaking changes in RM within 1.x. I was commenting in relation to the case of a breaking change in the RM, which we may or may not have at hand yet.

So as far as my comments are concerned, it is a hypothetical scenario, but a good one to discuss, so that we can link back to these from SEC discussions when the time comes :slight_smile:

I agree re the difficulties of inter-version mapping, though the situation may be a bit apples-and-oranges if you’re referring to mappings between different versions of resources. That’s level 2 of our thing. Some interesting thought exercises lie that way, but it’s a Sunday and I’d rather leave this response here and go for a walk and some clean air :wink:

Hi Seref,

thanks for the clarification. The issue I see in FHIR is that it has no clear distinction between the RM and Archetypes/Profiles. Make a single change within the “Observation” or “Condition” Resource and you will need to migrate millions of resources across thousands of implementations (in case people still think it is a good idea to use FHIR as their database). In openEHR, if Blood Pressure changes, only the Blood Pressure Archetype changes to v2, no big drama.

You have been around the openEHR community for much longer than I have, but the positive implications of a very stable RM for long-term system maintainability can hardly be overstated.


Nope, I’m saying we need such a language, not that we need to develop a new one. I’m not against adopting current useful technologies or standards. But first we need to understand the requirements, for RM-to-RM migration and other requirements like RM to custom, custom to RM, or CDA/FHIR/HL7v2.x/DICOM/xxx to RM and vice versa, because a mapping language could be used for data migration between different versions of the RM, but also for integration, ETL, etc.


Thanks Pablo, we’re on the same page on these things I think.

Hello again :slight_smile:

I think you said something very interesting here! Here’s the question that follows from your point: what happens when these changes take place in a system where semantic harmonisation via openEHR is in place? I think we have some interesting options here, based on the fact that archetypes are maximal data definitions. My gut feeling is we can actually mitigate some problematic situations by leveraging the way archetypes are built. If we can nail down some specifics, it’d make a very convincing argument re future-proofing systems based on FHIR + openEHR. Some dedicated effort would be needed for this, but it’d be worth it.

Maybe, but the situation you’re describing did not emerge until recently :slight_smile: Our HL7 input was rather stable, but now we have new challenges in a FHIR + openEHR world. What I’m trying to say is: you’re right, the stability of RM and the maximal dataset design of Archetypes help a lot, and we may have a case in our hands in which they’ll help even more going forward. An obvious, imminent scenario is changes in the FHIR sources of data in the Covid platform: some sources switching to v5 at some point.

Thanks, you’ve given me something really interesting to think about.

I’ll add one more screenshot to extend my previous thoughts :blush:

The expression language should cover most simple data transformation needs. In such a scenario, a “migrate_from_previous_rm_release” method specification could be added to the BMM for each type that has changed.

From the “migrate_from_previous_rm_release” method in the expression language, the code for the “ItemList.from()” method in the screenshot could be generated.
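
A sketch of what such generated code might look like, with class and attribute names invented for illustration:

// Hypothetical generated classes, one per RM release.
class ItemList104 { String legacyName; }

class ItemList110 {
    String name;

    // Generated from the "migrate_from_previous_rm_release" specification:
    // builds a new-release instance from an old-release one.
    static ItemList110 from(ItemList104 old) {
        ItemList110 migrated = new ItemList110();
        migrated.name = old.legacyName; // declared attribute mapping
        return migrated;
    }
}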

When data needs to be migrated to the new release, the old element would be read with its own RM release’s model and used to create an instance in the new RM release.
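
Usage might then look like this (the reader object stands in for whichever hypothetical deserializer handles the old release):

// Read the stored data with the old RM release's model, then migrate.
ItemList104 old = rm104Reader.read(storedJson);  // hypothetical reader
ItemList110 migrated = ItemList110.from(old);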

Using code generated from the computable specifications means that the result is 100% conformant with the specifications.

This could be extended to the archetype level too. An OPT SDK could be used to call all the needed migrate methods for the archetypes in use.

If a vendor’s codebase isn’t suitable for such an approach, it could instead run as a simple API that processes the migrations against the vendor’s CDR. Vendor neutral, programming-language neutral,…
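
Such a service might, for instance, walk the CDR and apply the generated migrations; every name in this sketch is hypothetical:

import java.util.Map;

// "Cdr" stands in for whatever read/write interface a vendor exposes.
interface Cdr {
    Iterable<String> listCompositionIds();
    Map<String, Object> read(String id);
    void write(String id, Map<String, Object> data);
}

class MigrationService {
    // Applies one up-migration (see the RmMigration sketch above) to every
    // stored composition.
    static void migrateAll(Cdr cdr, RmMigration migration) {
        for (String id : cdr.listCompositionIds()) {
            cdr.write(id, migration.up(cdr.read(id)));
        }
    }
}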

Hi @birger.haarbrandt, my comment about the migrations is about having a clear migration path between versions of the RM. This is in general; I’m not implying specific changes or versioning. This is a wider topic.

This migration path could support breaking and non-breaking changes. The key thing is that these migrations could be processed by software. Right now, changes to the spec are stated in free text in different RM documents.

So I was commenting about having some kind of language that starts from an initial RM version and then states all the non-breaking changes (additions) and breaking changes (modifications and deletions). Breaking changes should include a mapping mechanism from the previous version to the next major version (this is what you mentioned: breaking changes should bump the major version, as in semver).

That is similar to the DB migrations in the Rails framework, as Borut commented.

BTW I think all this is necessary for what @NeoEHR is doing with the models.

And to add to the soup, a little off topic, some breaking changes are:

  • the improvements to the entry model we’ve been discussing for years
  • recently, the support for a DV_PLAIN_TEXT class, to avoid having a concrete superclass for coded text, which causes some issues while working with the model