Duplicate check during composition upload

Hey folks!

I'm working on a proof of concept to migrate data from an existing medical information system to my test EHRbase server. I'm now at the point of sending my old data through my parser and creating new compositions.

A few times now I have ended up with several compositions for the same item (e.g. a laboratory result) because I restarted my parsing process.

Does openEHR/ehrScape not offer a way to recognize exactly identical compositions during upload? If I interpret the result correctly, a new composition UID is simply assigned each time, which leaves me with multiple compositions with exactly the same content (except for the UID).

Thanks for any advice!

I believe clinically it’s possible to have two documents with the same information.

Roughly speaking: no. The openEHR specification does not define any behaviour for committing the same content twice, but I don't think it would be possible at all to commit the same composition twice with openEHR.

Why? Because openEHR’s design actually favours what you may call immutability via versioning of compositions. In other words, when it comes to operations on compositions in an EHR, well… you can’t step in the same river twice because of versioning.

Think about it: if you commit a composition without a uid, it is a composition not yet known to the CDR, i.e. not identified by openEHR or by its implementation. Once it is identified, its version is part of its identifier. We also have the rule that to apply an operation to a composition, you must have an identity for the composition you're applying it to, which implies that you can only apply operations to a version of a composition. Since every operation triggers versioning and produces a new version, even pushing the same content as an update gives you a new version, i.e. a new identity.
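
A toy, in-memory model of this versioning argument may help. It is a sketch, not the code of EHRbase or any other CDR: `ToyVersionedStore`, `create` and `update` are hypothetical names. Identifiers follow the openEHR `OBJECT_VERSION_ID` layout `object_id::creating_system_id::version_tree_id`, simplified here to a plain integer version with no branching.

```python
import uuid


class ToyVersionedStore:
    """Hypothetical in-memory model of versioned compositions (not a real CDR API)."""

    def __init__(self, system_id: str = "example.system") -> None:
        self.system_id = system_id
        # object_id -> list of committed contents; index i holds version i + 1
        self.versions: dict = {}

    def create(self, content: dict) -> str:
        # No uid supplied: the CDR does not know this content, so it gets a brand-new identity.
        object_id = str(uuid.uuid4())
        self.versions[object_id] = [content]
        return f"{object_id}::{self.system_id}::1"

    def update(self, preceding_version_uid: str, content: dict) -> str:
        # An operation must name the exact version it applies to.
        object_id, system_id, version = preceding_version_uid.split("::")
        history = self.versions[object_id]
        if int(version) != len(history):
            raise ValueError("preceding version is not the latest version")
        # Even identical content produces a new version, i.e. a new identity.
        history.append(content)
        return f"{object_id}::{system_id}::{len(history)}"
```

Committing the same content twice without a uid yields two unrelated identities, and "updating" with identical content still yields version 2 of the same object rather than a no-op.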

The above approach enables openEHR to support medico-legal requirements such as auditability, and poking a hole into that would be a bad idea, taking away a huge design benefit.

If we allowed silently identifying and re-associating content with an existing identifier (with a version), that would, at least in theory, open the door to a collision attack, allowing an ill-intentioned person to modify health records. So I'd argue against it in SEC meetings on the above reasoning, if it ever made it into a change request 🙂

All that being said, synchronising systems with health data is a complicated task. You'll most likely need some logic sitting between your source system and your CDR to handle the situation you're facing, running some checks before you push your compositions. I'd suggest checking whether you have a reliable identifier for the source data; then you can ask the modellers if and how you can associate that identifier with your compositions, so that you can perform an insert/update-if-not-exists operation.
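
A minimal sketch of such an insert-if-not-exists step, assuming the source system has a stable record id. `SourceIdRegistry` and the `commit` callable (whatever function actually uploads one composition and returns its uid) are hypothetical; the mapping is kept in a local JSON file so that it survives a restart of the parser.

```python
import json
import os
from typing import Callable, Dict


class SourceIdRegistry:
    """Hypothetical client-side map: source record id -> composition uid, persisted as JSON."""

    def __init__(self, path: str, commit: Callable[[dict], str]) -> None:
        self.path = path
        self.commit = commit  # hypothetical uploader: takes a composition, returns its uid
        self.uid_by_source_id: Dict[str, str] = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.uid_by_source_id = json.load(f)

    def _save(self) -> None:
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.uid_by_source_id, f)

    def insert_if_not_exists(self, source_id: str, composition: dict) -> str:
        existing = self.uid_by_source_id.get(source_id)
        if existing is not None:
            return existing  # migrated in an earlier run: skip the upload
        uid = self.commit(composition)
        self.uid_by_source_id[source_id] = uid
        self._save()  # record right away so a restart does not upload it again
        return uid
```

A small window remains between a successful upload and the file write, where a crash would still cause a re-upload; carrying the source identifier inside the composition itself, as suggested above, would let you check the CDR instead of a local file.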

@seref So if I'm getting this right, if I POST another composition with the same UUID, a newer version will be written to the CDR?
Where would I normally find this information?
Inside the UUID string?

PS: I love immutability and that was/is a very smart move in the spec.

No, I think you will get a 409, which I think is the correct behaviour. I'd be very wary of assuming that a duplicate composition should create a new version; you should never have that happen.

On the original question:

I would say this is exactly the correct behaviour. In a production environment this would never happen under normal circumstances, unless perhaps a message service was resending the same message (kind of what you are doing here). I'd want to pick up that duplication very explicitly from a message header or identifier, and not by trying to do some sort of diff on the composition content.

So I would say this is largely an artefact of your test environment.
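
A sketch of that idea, assuming each incoming message carries a unique id in its header (the `Message` and `MessageConsumer` names and the `message_id` field are hypothetical). Duplicates are detected by id only; payloads are never compared, so two genuinely separate documents with the same clinical content are both kept.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Message:
    message_id: str  # hypothetical header field assigned by the sending system
    payload: dict


@dataclass
class MessageConsumer:
    seen_ids: Set[str] = field(default_factory=set)
    committed: List[dict] = field(default_factory=list)

    def handle(self, msg: Message) -> bool:
        """Commit the payload unless this exact message was already processed."""
        if msg.message_id in self.seen_ids:
            return False  # a resend of the same message: drop it explicitly
        self.seen_ids.add(msg.message_id)
        # No diff on the content: identical payloads from distinct messages are both kept.
        self.committed.append(msg.payload)
        return True
```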

And I’m happy that my learned friend @Seref and I seem to actually agree for once!!

There are also interesting rules defined in the specs around copy and move operations between CDRs, as well as gnarly issues of de-duplicating EHRs. They have certainly been implemented and tested in real-world examples, so they seem to be sound.

You should not. As things stand, POST semantics in the openEHR REST API are defined as "server assigns resource id". So that's where the process should fail, ideally with a 409 as @ian.mcnicoll suggests.
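
To make the expected behaviour concrete, here is a toy handler (the hypothetical `ToyCompositionServer`, not EHRbase code or text from the REST specification) where POST always lets the server assign the id, and re-POSTing a composition whose uid is already known returns 409 instead of creating a new version. How a real server treats an unknown client-supplied uid on POST is not settled in this thread; the toy simply ignores it.

```python
import uuid
from http import HTTPStatus
from typing import Dict, Optional, Tuple


class ToyCompositionServer:
    """Hypothetical model of 'POST means the server assigns the resource id'."""

    def __init__(self) -> None:
        self.store: Dict[str, dict] = {}

    def post(self, body: dict) -> Tuple[int, Optional[str]]:
        client_uid = body.get("uid")
        if client_uid is not None and client_uid in self.store:
            # Re-POSTing a known composition: refuse it, never silently create a version.
            return HTTPStatus.CONFLICT, None
        # Server-assigned id; an unknown client-supplied uid is ignored in this toy.
        uid = str(uuid.uuid4())
        self.store[uid] = body
        return HTTPStatus.CREATED, uid
```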

Sorry, I don't understand: which information? At the REST layer, semantics are supposed to be expressed with status codes and headers, so if this were to be allowed, those would be the places to express it.

We recently had a discussion about assigning resource ids on the client side; it's somewhere around here. That would probably be a context where what you're suggesting could happen: the exact same resource pushed more than once via REST, with the exact same id. REST kicks in again in that case: if we use PUT, it is supposed to be idempotent, etc.
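
A toy illustration of that client-assigned-id variant, assuming the client picks the resource id and sends the full resource with PUT. This is hypothetical and not what the openEHR REST API currently specifies; `ToyPutServer` is an illustrative name. The point is idempotency: sending the same PUT twice leaves the server in the same state as sending it once.

```python
from http import HTTPStatus
from typing import Dict


class ToyPutServer:
    """Hypothetical server where the client chooses the id and PUT stores the whole resource."""

    def __init__(self) -> None:
        self.store: Dict[str, dict] = {}

    def put(self, resource_id: str, body: dict) -> int:
        created = resource_id not in self.store
        # Storing the full body under a fixed id: repeating the request changes nothing.
        # (A real CDR would turn a changed body into a new version; this toy just overwrites.)
        self.store[resource_id] = dict(body)
        return HTTPStatus.CREATED if created else HTTPStatus.NO_CONTENT
```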

It’s more of a love-hate relationship for me.

Well, in the future I will have the case where the provider is not allowed to know what they send, so they just bulk POST/PUT stuff.
Obviously it should be PUT, not POST, when the UUID should be maintained.
I will check the implementations, thanks @ian.mcnicoll

You may find the discussion we had interesting. It sounds obvious initially, but when it comes to REST it's all angels and pinheads 🙂

True, the problem with REST is more one of interpreting it the correct way. On top of that, vendors may still implement things differently.