Duplicate check during composition upload

mc1995 · 16 May 2022 11:19

Hey folks!

I’m currently working on a proof of concept to migrate data from an existing medical information system to my test EHRbase server. I am now ready to send my old data through my parser and create new compositions.

It has happened to me a few times now that I created the same composition several times, e.g. for one laboratory result, because I restarted my parsing process.

Does openEHR/ehrScape not offer the possibility to recognize exactly identical compositions during the upload? If I interpret the result correctly, currently a new CompositionUID is simply assigned each time, which gives me multiple compositions with the exact same content (except for the uid).

Thanks for any advice!

pablo · 16 May 2022 12:49

I believe clinically it’s possible to have two documents with the same information.

Seref · 17 May 2022 08:37

Roughly speaking: no. The specification (openEHR) does not specify any behaviour regarding the commit of the same content, but I don’t think it’d be possible at all to commit the same composition twice with openEHR.

Why? Because openEHR’s design actually favours what you may call immutability via versioning of compositions. In other words, when it comes to operations on compositions in an EHR, well… you can’t step in the same river twice because of versioning.

Think about it: if you commit a composition without a uid, it is a composition not known to the CDR, i.e. not yet identified by openEHR and its implementation. Once it is identified, its version is part of the identifier. We also have the rule that if you apply an operation to a composition, you must have an identity for the composition you’re applying the operation to, which implies that you can only apply operations to a version of a composition. Since every operation triggers versioning and produces a new version, even if you push the same content for an update, you’ll produce a new version, i.e. a new identity.
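To make "the version being part of the identifier" concrete, here is a minimal sketch that splits a version identifier into its three parts. It assumes the openEHR OBJECT_VERSION_ID layout `object_id::creating_system_id::version_tree_id`; the helper name `parse_version_uid` and the sample values used with it are made up for illustration.

```python
from typing import NamedTuple


class VersionUid(NamedTuple):
    object_id: str           # identifies the versioned composition
    creating_system_id: str  # system that created this version
    version_tree_id: str     # version within the versioned object


def parse_version_uid(value: str) -> VersionUid:
    """Split 'object_id::creating_system_id::version_tree_id' (hypothetical helper)."""
    parts = value.split("::")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a full version uid: {value!r}")
    return VersionUid(*parts)
```

Two identifiers that differ only in the last part, for example `...::openEHRSys.example.com::1` and `...::openEHRSys.example.com::2`, share the same `object_id` but are different identities, which is why pushing identical content as an update still yields something new.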

The above approach enables openEHR to support medico-legal requirements related to auditability etc., and poking a hole into that would be a bad idea, taking away a huge design benefit.

If we allowed silently identifying and re-associating content with an existing identifier (with a version), that’d open the door to a collision attack, allowing an ill-intentioned person to modify health records, at least in theory. So I’d argue against it in SEC meetings based on the above reasoning, if it made it to a change request.

All that being said, synchronising systems with health data is a complicated task. You’ll most likely have some sort of logic sitting between your source system and your CDR to handle the situation you’re facing, and have some checks run before you push your compositions. I’d suggest checking if you have a reliable identifier for the source data; then you can ask the modellers if/how you can associate that identifier with your compositions so that you can perform an insert-if-not-exists/update operation.
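A minimal sketch of such an in-between layer, assuming the source system provides a stable identifier per record. All names here (`UploadRegistry`, `upload_once`, the `commit` callable) are hypothetical, and `commit` stands in for whatever call actually posts the composition to the CDR and returns its uid. The registry is kept in SQLite so that it can survive a restart of the parsing process when given a file path.

```python
import sqlite3
from typing import Callable, Optional


class UploadRegistry:
    """Remembers which source records were already committed (hypothetical helper)."""

    def __init__(self, db_path: str = ":memory:"):
        # Pass a file path instead of ":memory:" to keep the registry across restarts.
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS uploaded ("
            "source_id TEXT PRIMARY KEY, composition_uid TEXT NOT NULL)"
        )

    def lookup(self, source_id: str) -> Optional[str]:
        row = self._db.execute(
            "SELECT composition_uid FROM uploaded WHERE source_id = ?",
            (source_id,),
        ).fetchone()
        return row[0] if row else None

    def record(self, source_id: str, composition_uid: str) -> None:
        with self._db:  # commits the insert on success
            self._db.execute(
                "INSERT INTO uploaded VALUES (?, ?)", (source_id, composition_uid)
            )


def upload_once(
    registry: UploadRegistry,
    source_id: str,
    composition: dict,
    commit: Callable[[dict], str],
) -> str:
    """Commit the composition only if this source record was not committed before."""
    existing = registry.lookup(source_id)
    if existing is not None:
        return existing  # already uploaded: skip the commit, reuse the known uid
    uid = commit(composition)
    registry.record(source_id, uid)
    return uid
```

Calling `upload_once` twice with the same `source_id` invokes `commit` only once. The duplicate is detected from the source identifier, not from comparing composition content.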

SevKohler · 28 April 2023 14:05

@seref So if I’m getting this right: if I POST another composition with the same UUID, a newer version will be written to the CDR?
Where do I normally find this information?
Inside the UUID string?

PS: I love immutability and that was/is a very smart move in the spec.

ian.mcnicoll · 28 April 2023 14:17

No, I think you will get a 409, which I think is the correct behaviour. I’d be very wary of making an assumption that a duplicate composition should create a new version. You should never have that happen.

Regarding the original question: I would say this is exactly the correct behaviour. In a production environment, this would never happen in normal circumstances, unless perhaps a message service was resending the same message (kinda what you are doing here). I’d want to pick up that duplication very explicitly from a message header or identifier, and not by trying to do some sort of diff on the composition content.

So I would say this is largely an artefact of your test environment.

ian.mcnicoll · 28 April 2023 14:20

And I’m happy that my learned friend @Seref and I seem to actually agree for once!!

There are also interesting rules defined in the specs around copy and move operations between CDRs, as well as gnarly issues of de-duplicating EHRs. They have certainly been implemented and tested in real-world examples, so they seem to be sound.

Seref · 28 April 2023 14:35

You should not. As things stand, POST semantics for the openEHR REST API are defined as ‘server assigns resource id’. So that’s where the process should fail, ideally with a 409 as @ian.mcnicoll suggests.

Sorry, I don’t understand, which information? At the REST layer, semantics are supposed to be expressed with status codes and headers, so “if” this were to be allowed, these would be the places to express it.

We recently had a discussion about assigning the resource ids from the client side; it’s somewhere around here. That would probably be a context where what you’re suggesting could happen: the exact same resource pushed more than once via REST, with the exact same id. REST kicks in again in that case: if we use PUT, it is supposed to be idempotent, etc.
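For illustration only, the status-code behaviour discussed in this thread can be modelled with a toy in-memory store. This is not any real CDR or the REST API itself: `ToyCdr`, `SYSTEM_ID` and the method signatures are invented for the sketch. It assumes that POST lets the server assign the id (and answers 409 if the client sends a uid the store already knows, as suggested above) and that PUT must name the version it replaces, in the spirit of an `If-Match` precondition.

```python
import uuid

SYSTEM_ID = "toy.cdr.example"  # hypothetical creating-system id


class ToyCdr:
    """In-memory stand-in for a CDR; models status codes only."""

    def __init__(self):
        # versioned object id -> list of stored contents, one entry per version
        self._versions = {}

    def _latest_uid(self, object_id):
        return f"{object_id}::{SYSTEM_ID}::{len(self._versions[object_id])}"

    def post(self, content, uid=None):
        """Create: the server assigns the id, so an already known id is a conflict."""
        if uid is not None and uid.split("::")[0] in self._versions:
            return 409, None
        object_id = str(uuid.uuid4())
        self._versions[object_id] = [content]
        return 201, self._latest_uid(object_id)

    def put(self, object_id, if_match, content):
        """Update: the caller must name the latest version it is replacing."""
        if object_id not in self._versions:
            return 404, None
        if if_match != self._latest_uid(object_id):
            return 412, None  # stale or repeated precondition
        self._versions[object_id].append(content)
        return 200, self._latest_uid(object_id)
```

In this toy model, resending the same PUT with the same `if_match` value succeeds once and then fails with 412, because the first call already moved the latest version forward. A blind resend therefore cannot silently pile up versions.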

It’s more of a love-hate relationship for me.

SevKohler · 28 April 2023 14:48

Well, I will have the case in the future where the provider is not allowed to know what they send, so they just bulk POST/PUT stuff.
Obviously it should be PUT, not POST, when the UUID should be maintained.
I will check the implementations, thanks @ian.mcnicoll

Seref · 28 April 2023 14:57

You may find the discussion we had interesting. It sounds obvious initially, but when it comes to REST it’s all angels and pinheads.

SevKohler · 28 April 2023 15:01

True, the problem with REST is more one of interpreting it the correct way. On top of that, vendors may still implement stuff differently.
