I have had this question in mind for a long time: has anyone implemented the distributed versioning of openEHR?
The specs define a great distributed versioning mechanism, but it is a little tricky to implement. Also, it is not clear who will do the work of managing it, or how that structure will be queried. It is very difficult for me to think of an amendment being sent to an EHR and not being available to all the parties looking at the patient's EHR.
In the EHRServer I built, only linear versioning is possible: there is only one latest version for each compo, and queries only get data from latest versions.
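To make the linear model concrete, here is a minimal sketch of a store with one version chain per compo and queries that only see the latest version. It is not EHRServer code; the class `LinearVersionStore`, its methods, and the sample data are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class LinearVersionStore:
    """Toy store: each composition has exactly one linear chain of versions."""
    _chains: dict = field(default_factory=dict)  # compo_id -> list of contents

    def commit(self, compo_id, content):
        """Append a new version and return its 1-based version number."""
        chain = self._chains.setdefault(compo_id, [])
        chain.append(content)
        return len(chain)

    def latest(self, compo_id):
        """Return the content of the latest version only."""
        return self._chains[compo_id][-1]

    def query(self, predicate):
        """Evaluate a predicate against latest versions only."""
        return [cid for cid, chain in self._chains.items() if predicate(chain[-1])]


store = LinearVersionStore()
store.commit("compo-1", {"systolic": 150})
store.commit("compo-1", {"systolic": 120})  # amendment becomes the latest version
store.commit("compo-2", {"systolic": 160})

# Only latest versions are searched, so the amended compo-1 no longer matches.
high = store.query(lambda c: c["systolic"] > 140)
```

Older versions stay in the chain but are invisible to queries, which is the trade-off described above.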
Just wondering: what did others do for versioning, and, if you implemented the distributed approach, what policies do you have in terms of branching, merging and querying?
Hi Pablo, I did it a few years ago. I just dumped non-current versions in a slow XML database because in normal cases they are never queried, and when they do need to be queried, a faster solution can always be found.
But of course, this was a linear version system. ExistDB supports distributed versioning on XML out of the box. And you can also use a normal, non-OpenEHR version system like Git or VCL.
But when looking at how OpenEHR works, is there ever a need for merging? Do people edit the same datasets concurrently? I think they are always working on new versions of datasets. There is only one exception, the persistent Composition, where merging problems could occur.
But I think you don't need distributed versioning to handle this; a locking system (like databases have) is, I think, good enough. That is how classic EHR builders handle concurrency.
We have the data structures set up in our database to handle branching and merging. But we do not yet support it for users and do not currently have plans to build it.
The merge process looks like it could be complicated for users.
Merging while keeping the data within the constraints of the archetypes is nearly impossible to automate. What do you do when person A adds an item to a structure and at the same moment person B adds an item to the same structure, but the archetype defines that only one item is allowed in that specific structure? There you have the problem: inconsistent data because of merging. I agree with you that distributed versioning is not feasible and even, sometimes, dangerous. It would be good to remove it from the specs. Bert
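The clash described above can be sketched in a few lines. This is an illustration only: `merge_items` is a deliberately naive three-way merge (it ignores deletions), and `violates_max` stands in for a real archetype validator; both names and the sample data are hypothetical.

```python
def merge_items(base, branch_a, branch_b):
    """Naive three-way merge: keep base items plus whatever each branch added."""
    added_a = [item for item in branch_a if item not in base]
    added_b = [item for item in branch_b if item not in base]
    return base + added_a + [item for item in added_b if item not in added_a]


def violates_max(items, max_items):
    """Stand-in for an archetype check: at most max_items entries allowed."""
    return len(items) > max_items


base = []                          # structure is empty in the common ancestor
branch_a = ["penicillin allergy"]  # person A adds one item
branch_b = ["latex allergy"]       # person B adds one item at the same time

merged = merge_items(base, branch_a, branch_b)

# Each branch is valid on its own, but the automatic merge is not.
each_valid = not violates_max(branch_a, 1) and not violates_max(branch_b, 1)
merged_invalid = violates_max(merged, 1)
```

Both branches pass validation individually, so the conflict only shows up after the merge, at which point someone has to choose which item survives.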
Hi Bert, the case you mention is not version control, it is concurrency control. I am not sure which system would allow two users to insert invalid data on an item when the archetype constraint says it is not valid. If an implementation does that, I don’t think it is safe in terms of concurrency.
Versioning would be when a user commits a document that is “complete”. IMO incomplete compos should not be final versions, and if one user is working on an incomplete version, no other user can work on it (read-write work). If two users need to read-write incomplete compos, then 2 separate versions are needed and there you have branches. Linear versioning would not allow branches to be created, and new versions would not be created until the user that has the current version in read-write mode finishes and commits the completed version. That is the only way to keep it linear: with locks.
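A minimal sketch of that lock-based linear policy, assuming an in-memory store and a single process. `LockingStore`, `LockedError` and the user names are hypothetical; a real system would also need lock timeouts and persistence.

```python
class LockedError(Exception):
    """Raised when a user tries to work on a compo locked by someone else."""


class LockingStore:
    def __init__(self):
        self._versions = {}  # compo_id -> list of committed contents
        self._locks = {}     # compo_id -> user holding the read-write lock

    def checkout(self, compo_id, user):
        """Take the read-write lock and return the latest committed content."""
        holder = self._locks.get(compo_id)
        if holder is not None and holder != user:
            raise LockedError(f"{compo_id} is locked by {holder}")
        self._locks[compo_id] = user
        chain = self._versions.get(compo_id, [])
        return chain[-1] if chain else None

    def commit(self, compo_id, user, content):
        """Append a completed version and release the lock."""
        if self._locks.get(compo_id) != user:
            raise LockedError(f"{user} does not hold the lock on {compo_id}")
        self._versions.setdefault(compo_id, []).append(content)
        del self._locks[compo_id]
        return len(self._versions[compo_id])


store = LockingStore()
store.checkout("med-list", "alice")
try:
    store.checkout("med-list", "bob")  # second writer is refused
    bob_blocked = False
except LockedError:
    bob_blocked = True

v1 = store.commit("med-list", "alice", ["aspirin"])
bob_sees = store.checkout("med-list", "bob")  # lock is free again after commit
```

Because only the lock holder can commit, there is never more than one successor to the latest version, so no branch can appear.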
Not sure about removing the current approach from the specs, but creating a simpler alternative might be of use to enable more and quicker implementations.
I must have expressed myself badly. I will try again; if I misunderstand you, I am sorry, but this is what I make of your issue, which I think is a valid one.
--------------------------------
It can be a concurrency/locking problem to solve when there is no versioning at all, or only linear versioning. Then it is simply a matter of not allowing a dataset to be edited while someone else has it open for editing; there are variants for optimistic and pessimistic locking.
It becomes a merging problem if you allow two instances to create a new version of a dataset, and there is only one dataset to be the result. In that case, the two versions have to be merged. In source code this is also sometimes a problem, and sometimes hard to do. For example, some lines may only occur once in a Java file: there may be only one package line, there may be only one public class and it must have the same name as the file, etc. Then the version system mostly gives up and hands the file with the problem to the developer, who mostly immediately starts to curse.
In OpenEHR, merging conflicts will typically occur in persistent compositions. Two users are concurrently changing a persistent composition dataset. User A posts his change, and then User B wants to post his. The version system will not accept it from User B, because the dataset has been changed since User B opened it. The versioning system must then judge whether the change of User B is in conflict with the change of User A. There is a conflict when an automated merge would:
1) overwrite the changes of User A, or
2) be in conflict with an archetype constraint.
The best thing the versioning system can do in either of these cases is to hand the change of User A to User B and let him decide what to do. I am pretty sure that 99% of the users of a medical system do not have the talent to solve such a problem. This makes the merging system useless when there is a conflict. And a merging system which can have such a high failure rate is useless.
This leaves us with the only versioning option, a linear version system in which only one user at a time is able to change a dataset. So locking is then needed. Bert
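The rejection of User B's post described above is essentially an optimistic version check. Here is a minimal sketch of it, assuming a single dataset held in memory; `OptimisticStore` and `StaleVersionError` are hypothetical names, and the conflict judgement itself (overwrite or archetype violation) is left out.

```python
class StaleVersionError(Exception):
    """Raised when a commit is based on a version that is no longer the latest."""


class OptimisticStore:
    def __init__(self, initial):
        self.version = 1
        self.content = list(initial)

    def read(self):
        """Return the current version number together with a copy of the data."""
        return self.version, list(self.content)

    def commit(self, base_version, content):
        """Accept the change only if nobody committed since base_version was read."""
        if base_version != self.version:
            raise StaleVersionError(
                f"based on version {base_version}, latest is {self.version}"
            )
        self.content = list(content)
        self.version += 1
        return self.version


store = OptimisticStore(["aspirin"])
base_a, meds_a = store.read()  # User A opens the dataset
base_b, meds_b = store.read()  # User B opens the same version

new_version = store.commit(base_a, meds_a + ["metformin"])  # User A posts first

try:
    store.commit(base_b, meds_b + ["ibuprofen"])  # User B posts a stale change
    b_rejected = False
except StaleVersionError:
    b_rejected = True
```

What happens after the rejection is the policy question: reload and redo the edit (linear), or attempt a merge and hand any conflict to the user.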
Sorry Pablo, I am seeing your full reply for the first time now. I didn't see it before on my phone; it seems my email app was hiding part of the text.
Versioning would be when a user commits a document that is "complete". IMO incomplete compos should not be final versions, and if one user is working on an incomplete version, no other user can work on it (read-write work). If two users need to read-write incomplete compos, then 2 separate versions are needed and there you have branches. Linear versioning would not allow branches to be created, and new versions would not be created until the user that has the current version in read-write mode finishes and commits the completed version. That is the only way to keep it linear: with locks.
We have the same conclusion, but you were first. Linear versioning with locks is needed, with only one user at a time changing a persistent composition (or another kind of dataset, but that is unlikely to happen).
Not sure about removing the current approach from the specs, but creating a simpler alternative might be of use to enable more and quicker implementations.
I think it is the only possible conclusion, since a distributed version system cannot be used because of the inability of users to solve merging problems manually.
Sorry again for writing what you had already written just before that.
Distributed versioning with branching was included to allow syncing of data gathered about the same patient in different EHR repositories. For most data, syncing the repos is trivial, since it’s different data.
The classic cases of potential clashes are the medication list, problem list, basic clinical demographic data, etc. If a sync is started and two medication lists are found that are forks of a single earlier one, a manual merge will be required.
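As a rough sketch of how a sync might tell a trivial case from a fork, assume each version records the id of the version it was derived from. The `parent_of` map and the ids `v1`, `v2a`, etc. are hypothetical and do not follow the openEHR version id format; both tips are assumed to belong to the same versioned object.

```python
def ancestors(uid, parent_of):
    """Return the chain of version ids from uid back to the first version."""
    chain = []
    while uid is not None:
        chain.append(uid)
        uid = parent_of.get(uid)
    return chain


def classify(tip_a, tip_b, parent_of):
    """Compare the latest version held by repo A with the one held by repo B."""
    if tip_a == tip_b:
        return "in sync"
    if tip_a in ancestors(tip_b, parent_of):
        return "fast-forward A to B"
    if tip_b in ancestors(tip_a, parent_of):
        return "fast-forward B to A"
    return "fork: manual merge required"


# Medication list history: v1 was copied to both repos, then edited separately.
parent_of = {"v1": None, "v2a": "v1", "v3a": "v2a", "v2b": "v1"}

forked = classify("v3a", "v2b", parent_of)  # both sides changed since v1
behind = classify("v1", "v3a", parent_of)   # repo A simply missed updates
same = classify("v2b", "v2b", parent_of)
```

Only the fork case needs a human; the other two can be resolved by copying versions across.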
We are only just starting to see the implementation of systems where syncing may be a question, so although there may be adjustments to make to the branched versioning model, I would not be in favour of throwing it out at this point.
We are however going to move it to the BASE component and make it a standalone model.
I did not realise that this discussion reached the point of suggesting that distributed versioning is taken out from the specs. I have been designing and implementing lots of openEHR data syncing functionality which relies on the distributed versioning specifications. I have heaps of work pending, which will also use that part of the specs. I can tell you that the current specs have worked just fine for me so far and they are up to the same high quality as the rest of the specifications, so they are absolutely usable and useful.
The challenges of distributed versioning do not eliminate the need for it, so I cannot agree with the suggestion to remove it. Sure, move it around in the specs all you want, but please keep it.
From Thomas's comments, it is clear that we have two use cases: internal versioning and cross-system versioning / sync / consolidation.
Consider that people here are talking about their own experience with the specs under the use case they implemented. We can argue that internal versioning is needed in 100% of the implementations, while cross-system versioning is a much less frequently implemented case. This doesn’t mean that the current specs are not usable and useful in the abstract, but we should contextualize the discussion by use case and by how frequently each is implemented.
For internal versioning, it is clear that the distributed versioning spec generates some friction at implementation time. IMO we need to address both use cases while trying to minimize friction for new developers. That can improve the quality of the specs without throwing anything out.
Also, I would like to hear more about implementations of both use cases and the challenges implementers had to really validate the idea of addressing both use cases explicitly in the specs.
Naturally I am all for revising the specs (it’s what we do) and throwing out dead stuff. But one thing I have realised over the years is that many of the scenarios (such as multi-system syncing) we thought of in the 1990s and early 2000s are only just coming onto the radar now. Progress is much slower than many of us thought.
Consequently, some types of implementation experience gained so far, particularly anything cross-enterprise or regional, are not going to be an indicator of the future. Of course, some kinds of experience, say with using the RM, or ADL 1.4, AQL etc, have been giving us all the feedback that we needed to make the updates we are currently making to the specifications.
Probably what we should consider in this case is an update to the Change control spec that describes a variant or guideline for using the model to implement linear versioning, while allowing for later addition of branched versioning when/if needed.
I agree that merging is (normally) only interesting for persistent compositions, which are the only kind of compositions that are candidates for simultaneous editing (branching), after which merging of the branches is needed.
I think getting rid of persistent compositions would solve these problems. I don’t see objections against medication lists in normal compositions. Maybe the persistent composition idea was something from the old days, to have all medications handy together?
If that is so, then we can consider that this way of pre-arranging data is no longer needed, because modern systems can quickly find medication entries, and the extra advantage is that branching and merging are then also no longer needed.
We are using persistent compositions a lot in our system. These are compositions with content that lasts and might be updated several times. To make persistent compositions usable we have introduced “scope”. Examples of scopes are episode of care, period of care, ward stay, etc. A persistent composition contains information where only the latest version holds the correct data.
So far we haven’t implemented (no need for) branching in versions. But I know that kind of requirement will come.
I think we should keep persistent compositions (and even extend them) and the versioning chapters in the specification. The conformance levels will tell what kind of functions that will be needed in the different profiles.
I see three different topics: (non-)persistent, versioning and synchronisation/consolidation.
I see no use for non-persistent flags.
I see two reasons for versioning.
Synchronisation/consolidation is too complex for now.
ERS systems document events.
Each documented event is equally important to document.
What is important now is not later. And vice versa.
Always a subjective opinion is documented by the author.
Persistent
Depending on circumstances an event is important or not.
In this context I fail to grasp the need to label certain events as persistent and others not.
All documented events need to persist in EHR-systems.
Lists re-used data.
Some events warrant to be re-used in context dependent lists: Active medication, Problem list, Previous diagnosis, etc.
Each context, HcProvider will need different lists for different purposes.
These lists are the result of documented events and persist.
Creating lists is an example of re-using data, because list content is derived from pre-existing events/data.
These lists are either changing or not changing over time.
Versioning Lists
Lists can be updated as the result of new events in the patient system and therefore need to be versioned.
These are non-technical new versions. They result from changes in the patient system.
Versioning events
While documenting events and committing them to the database, event data sometimes needs to be changed, updated or corrected.
The same event gets a new technical version, because nothing in the patient system changed but the documenting HcProvider initiated the change.
Synchronisation, merging, syncing
This is a complex topic.
Is there an extensive list of examples and requirements?
@Bert Persistent records are a well-known pattern in EHRs and their usefulness should not be in question. Of course, systems that focus on primary care might not implement them. But for hospital or even regional / national records, which need a wider view of the patient, persistent records show their value.
@Bjorn, what is the relationship between a scope and OPTs/folders? The concepts you mentioned are likely to be modeled with OPTs and folders.
It is very interesting that you didn’t need branch versioning yet.
Which problem do you solve with persistent records that cannot be solved in another way? Do you agree that persistent records are the only reason branching/merging is necessary?
Well, they are likely to be the most common element of an EHR to which branches and merging would be applied. However, they are ubiquitous and are also likely to be extended to things like care plans and so on. But in principle, branch and merge could happen to anything in the record, e.g. for reasons like adding competing translations of clinical notes etc.