Duplicate check during composition upload

Hey folks!

I’m currently working on a proof of concept that transforms data from an existing medical information system and loads it into my test EHRbase server. I am now ready to send my old data through my parser and create new compositions.

It has happened to me a few times now that I ended up with several compositions for the same source record (e.g. one laboratory result), because I restarted my parsing process.

Does openEHR/EhrScape not offer a way to recognize exactly identical compositions during upload? If I interpret the result correctly, a new composition UID is simply assigned each time, which gives me multiple compositions with the exact same content (except for the uid).

Thanks for any advice!

I believe clinically it’s possible to have two documents with the same information.


Roughly speaking: no. The openEHR specification does not specify any behaviour regarding the commit of the same content, but I don’t think it’d be possible at all to commit the same composition twice with openEHR, because every commit ends up with its own identity.

Why? Because openEHR’s design actually favours what you may call immutability via versioning of compositions. In other words, when it comes to operations on compositions in an EHR, well… you can’t step in the same river twice because of versioning.

Think about it: if you commit a composition without a uid, it is a composition not yet known to the CDR, i.e. not yet identified by openEHR and its implementation. Once it is identified, its version is part of the identifier. We also have the rule that to apply an operation to a composition, you must supply an identifier for the composition you’re applying the operation to, which implies that you can only apply operations to a specific version of a composition. Since every operation triggers versioning and produces a new version, even pushing the same content as an update produces a new version, i.e. a new identity.
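To make the point above concrete, here is a toy sketch of that behaviour. It is not EHRbase code: `ToyCDR`, `create` and `update` are names I made up for illustration, and the only thing borrowed from openEHR is the `object_id::system_id::version` shape of a version uid.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyCDR:
    """Hypothetical in-memory stand-in for a CDR; illustration only."""

    system_id: str = "toy.cdr"
    # object id -> list of committed contents, oldest first
    versions: dict = field(default_factory=dict)

    def create(self, content: dict) -> str:
        # No uid supplied: the content is unknown to the CDR,
        # so it always gets a brand new object id.
        object_id = str(uuid.uuid4())
        self.versions[object_id] = [content]
        return f"{object_id}::{self.system_id}::1"

    def update(self, preceding_version_uid: str, content: dict) -> str:
        # An update must name the version it is applied to.
        object_id, system_id, version = preceding_version_uid.split("::")
        history = self.versions[object_id]
        if int(version) != len(history):
            raise ValueError("preceding version is not the latest version")
        # Identical content still produces a new version.
        history.append(content)
        return f"{object_id}::{system_id}::{len(history)}"
```

Committing the same lab result twice through `create` gives two unrelated uids, and pushing the same content through `update` gives a uid ending in `::2` rather than returning the original one.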

The above approach enables openEHR to support medico-legal requirements related to auditability etc., and poking a hole in that would be a bad idea, taking away a huge design benefit.

If we allowed silently identifying and re-associating content with an existing identifier (with a version), that’d open the door to a collision attack, allowing an ill-intentioned person to modify health records, at least in theory. So I’d argue against it in SEC meetings based on the above reasoning, if it ever made it to a change request :)

All that being said, synchronising systems with health data is a complicated task. You’ll most likely have some sort of logic sitting between your source system and your CDR to handle the situation you’re facing and run some checks before you push your compositions. I’d suggest checking whether you have a reliable identifier for the source data; then you can ask the modellers if/how you can associate that identifier with your compositions, so that you can perform an insert-if-not-exists, otherwise-update operation.
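A minimal sketch of what that in-between logic could look like, under a few assumptions of mine: each source record has a stable `source_id`, you keep a registry from that id to the last committed version uid plus a content hash, and `create` / `update` are whatever functions you already use to talk to your CDR (they are hypothetical callables here, not a real client API). In a real setup the registry would need to live somewhere that survives a restart of the parser, e.g. a database table, not a plain dict.

```python
import hashlib
import json
from typing import Callable, Dict, Tuple

# source_id -> (last committed version uid, hash of the committed content)
Registry = Dict[str, Tuple[str, str]]


def content_hash(composition: dict) -> str:
    # Serialise with sorted keys so equal content gives an equal hash.
    canonical = json.dumps(composition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sync(
    source_id: str,
    composition: dict,
    registry: Registry,
    create: Callable[[dict], str],
    update: Callable[[str, dict], str],
) -> Tuple[str, str]:
    """Return (version uid, action), action is created/updated/skipped."""
    digest = content_hash(composition)
    known = registry.get(source_id)
    if known is None:
        # Never seen this source record: commit a new composition.
        uid = create(composition)
        action = "created"
    else:
        uid, old_digest = known
        if old_digest == digest:
            # Same record, same content: a re-run should do nothing.
            return uid, "skipped"
        # Same record, changed content: commit a new version.
        uid = update(uid, composition)
        action = "updated"
    registry[source_id] = (uid, digest)
    return uid, action
```

With this in place, restarting the parser sends every record through `sync` again, but only records that are new or changed reach the CDR. The duplicate check stays on your side of the fence, keyed on an identifier you trust, and the CDR keeps versioning every commit as usual.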