# Special treatment of "incomplete" VERSIONs

**Category:** [RM](https://discourse.openehr.org/c/rm/42) **Created:** 2020-03-05 14:26 UTC **Views:** 1187 **Replies:** 33 **URL:** https://discourse.openehr.org/t/special-treatment-of-incomplete-versions/420

---

## Post #1 by @matijap

At Better we have requests from multiple customers to implement the following change of behaviour for `VERSION`s that have `lifecycle_state` set to `incomplete`:

* When committing incomplete data, the validation should be somewhat relaxed. At the moment we think it is sufficient to skip validation of cardinality lower bounds, in other words, to allow missing data. All other validation (cardinality upper bounds and constraints on values) would be performed normally.
* When querying, data from incomplete versions would be completely ignored, unless the query acknowledged the existence of `VERSION` and explicitly mentioned its `lifecycle_state` anywhere; in other words, there would have to be a `CONTAINS VERSION v`, and anywhere in the SELECT or WHERE part of the query there would have to be a reference to `v/lifecycle_state`. In this case the filter that only allowed seeing `complete` data would not be applied.

I think we have only discussed this on Slack and in private discussions with @thomas.beale and @bna. Given the multiple requests we're getting for this we'll implement it ASAP, but we'd like to see it standardised in the near future, and it would be nice to hear opinions on whether we got the details of this wrong. It seems to apply nicely to the use cases we foresee, but we might have tunnel vision. :)

I'd appreciate input from @ian.mcnicoll as well.

---

## Post #2 by @thomas.beale

Agree on the general idea. But the second bullet makes for a complicated query. It seems to me that we might want to contemplate a method of options or switches in queries, e.g. `INCLUDE DRAFT`: include data classified as 'draft'.
A rule that states what 'is_draft' means for openEHR data would need to be written once and hidden in the system (something that looks for an owner object of type `VERSION` with `lifecycle_state = incomplete`). Maybe the same approach could be used for the exclusion of data not relating to the data subject of the record, e.g. `EXCLUDE NON-SUBJECT` would translate to a rule such as one that looks for an owner object of type `ENTRY` with `typeof(subject) != PARTY_SELF`. I'm just thinking of ways to make life easier for the query author.

---

## Post #3 by @matijap

May I be so rude as to point out that you're adding keywords to a generic language that relate to concepts in a specific RM? ;)

I assume someone working with incomplete versions will know about the existence of this class of objects and which field stores the information about the contents being incomplete. (I avoid using the word draft because we have a separate functionality named like that, although it might get deprecated when we implement this.)

I'd like to hear @ian.mcnicoll's and @bna's thoughts.

---

## Post #4 by @thomas.beale

[quote="matijap, post:3, topic:420"]
May I be so rude as to point out that you’re adding keywords to a generic language that relate to concepts in a specific RM?
[/quote]

haha I knew someone would say that. But it's not really anything to do with openEHR or its RM if we add notions of 'draft' and 'subject'. However, it is more specific than 'any possible model'.

As I said in other posts, I would prefer not to put anything in AQL for this kind of thing - I think it goes in a level above that further filters a result set based on semantic rules specific to the broad class of data (let's say: 'recorded information about a subject'). I just suggested the above as at least a better option (maybe) than having to remember how to mention `VERSION` and `lifecycle_state` so as to get rid of draft data.

---

## Post #5 by @sebastian.iancu

Agree.
Do you think some of these lines should also appear in the REST or CNF specs?

---

## Post #6 by @matijap

[quote="thomas.beale, post:4, topic:420"]
mention `VERSION` and `lifecycle_state` so as to get rid of draft data.
[/quote]

Our idea was you had to mention those to get access to draft data, not to get rid of it, but you probably understood that correctly and made a mistake in writing.

Well, I'll wait for a few more opinions. :slight_smile:

---

## Post #7 by @thomas.beale

[quote="matijap, post:6, topic:420"]
Our idea was you had to mention those to get access to draft data, not to get rid of it, but you probably understood that correctly and made a mistake in writing.
[/quote]

Yes, sorry, I misquoted you there. But I think the principle is clear.

---

## Post #8 by @matijap

I need to report that we couldn't afford to wait for more opinions, so we just pushed our idea through. :)

Incomplete data can now only be seen if the query has `CONTAINS VERSION v` and a reference to `v/lifecycle_state` (or any of its fields, naturally) in the SELECT or WHERE part. As for validation, we only ignore lower bounds of _cardinality_ and _occurrences_. _existence_ is validated normally. All of this is optional behaviour of the EHR Server, disabled by default.

---

## Post #9 by @thomas.beale

[quote="matijap, post:1, topic:420"]
At Better we have requests from multiple customers to implement the following change of behaviour for `VERSION`s that have `lifecycle_state` set to `incomplete`:

* When committing incomplete data, the validation should be somewhat relaxed. At the moment we think it is sufficient to skip validation of [occurrences and] cardinality lower bounds, in other words, to allow missing data. All other validation (cardinality upper bounds and constraints on values) would be performed normally.
* When querying, data from incomplete versions would be completely ignored, unless the query acknowledged the existence of `VERSION` and explicitly mentioned its `lifecycle_state` anywhere; in other words, there would have to be a `CONTAINS VERSION v`, and anywhere in the SELECT or WHERE part of the query there would have to be a reference to `v/lifecycle_state`. In this case the filter that only allowed seeing `complete` data would not be applied.

I think we have only discussed this on Slack and in private discussions with @thomas.beale and @bna.
[/quote]

The above is pretty close to how I think we should treat 'incomplete' data: relax the validation such that missing items that are required in the archetype are OK, but if data is there, it has to be valid (i.e. no actually broken data). In addition, for now at least, the only way a query would return incomplete data would be as Matija states above. I still think this is ugly and painful, but nevertheless it's regular AQL and not something new. So as a default, I think it's OK.

I want to create an RM 1.1.0 CR for this, so I'd like to know:

* if we have general agreement on the validation relaxation principle;
* if we think this is for any versioned data (COMPOSITION, PARTY, WORK_PLANs...) or just EHR content - if the former, it becomes part of the Change_control chapter in the Common IM; otherwise it is in the EHR spec;
* if we agree on an AQL 1.1.0 (or later) CR to state that 'incomplete' versioned content is never returned in a query, unless it is requested according to the approach above.

[edit: there is now [SPECRM-97](https://openehr.atlassian.net/browse/SPECRM-97) for this.]

We have some changes in Task Planning that will rely on this, and Better has already implemented it. I imagine other implementations will need to decide on an approach - it would be preferable for all of us to agree on a common one sooner rather than later.

Thoughts?

---

## Post #10 by @ian.mcnicoll

Matija's suggestions look sensible to me.
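The query rule from Posts #1, #8 and #9 boils down to one predicate: hide incomplete versions unless the query binds a `VERSION` variable and references its `lifecycle_state` in SELECT or WHERE. Below is a minimal Python sketch of that predicate, assuming a hypothetical, already-parsed query (`ParsedQuery`) that exposes its `CONTAINS` bindings and referenced paths. `ParsedQuery`, `VersionRecord` and both functions are illustrative names, not part of AQL, the openEHR specs or Better's implementation, and `lifecycle_state` is simplified to a plain string (in the RM it is a coded text value).

```python
from dataclasses import dataclass, field


@dataclass
class ParsedQuery:
    """Hypothetical, simplified view of an already-parsed AQL query."""
    # CONTAINS bindings as (rm_type, variable), e.g. ("VERSION", "v").
    contains: list[tuple[str, str]] = field(default_factory=list)
    # Paths referenced in SELECT and WHERE, e.g. "v/lifecycle_state/value".
    select_paths: list[str] = field(default_factory=list)
    where_paths: list[str] = field(default_factory=list)


@dataclass
class VersionRecord:
    """Hypothetical stored version; lifecycle_state simplified to a string."""
    uid: str
    lifecycle_state: str  # e.g. "complete" or "incomplete"


def opts_into_incomplete(query: ParsedQuery) -> bool:
    """True if the query binds a VERSION variable and references its
    lifecycle_state (or any sub-path of it) in SELECT or WHERE."""
    version_vars = {var for rm_type, var in query.contains if rm_type == "VERSION"}
    for path in query.select_paths + query.where_paths:
        for var in version_vars:
            target = f"{var}/lifecycle_state"
            if path == target or path.startswith(target + "/"):
                return True
    return False


def visible_versions(query: ParsedQuery, versions: list[VersionRecord]) -> list[VersionRecord]:
    """Default filter keeps only 'complete' versions unless the query opts in."""
    if opts_into_incomplete(query):
        return list(versions)
    return [v for v in versions if v.lifecycle_state == "complete"]
```

Note that binding `CONTAINS VERSION v` alone does not opt in; the query must also touch `v/lifecycle_state` or one of its sub-paths, which is the condition Post #8 describes. Following the wording of Post #1, the default filter keeps only `complete` versions.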

---

## Post #11 by @ian.mcnicoll

Hi Matija,

Can I ask how the updated rules around composition `lifecycle_state` work with the Better idea of /draft compositions - will /draft become largely redundant, or do you see a case for both?

Ian

---

## Post #12 by @matijap

First let me note that in practice it turned out that other constraints may need to be lifted as well, such as _existence_ and RM-mandated mandatory attributes.

I personally see _incomplete_ versions as a better alternative to drafts, which would make the draft functionality obsolete, yes. The draft functionality is most likely here to stay, though.

---

## Post #13 by @ian.mcnicoll

Thanks. I guess it might be simpler just to drop any requirement for validation. As long as the crappy composition is not 'discoverable' by normal querying, as you have suggested.

I understand you may need to keep '/draft' for legacy reasons, but it is good to understand that the need for it is essentially deprecated by the new lifecycle behaviour.

---

## Post #14 by @thomas.beale

[quote="ian.mcnicoll, post:13, topic:420"]
I guess it might be simpler just to drop any requirement for validation. As long as the crappy composition is not 'discoverable' by normal querying, as you have suggested
[/quote]

I don't agree - I don't see any reason to allow technically broken data in the record (for a start, most tools won't function to produce that), and it is likely to break any other tools that try to process it in whatever special mode. I think it is far preferable to stick to the rule that mandatoriness can be ignored, but any structures that are in the data are technically correct. This is easy to understand and code for, and won't break anything that knows this simple rule.

---

## Post #15 by @bna

[quote="ian.mcnicoll, post:13, topic:420"]
Thanks. I guess it might be simpler just to drop any requirement for validation.
[/quote]

I assume you need to validate against the maximal dataset defined by the RM, leaving out the constraints defined by archetypes and templates? If so, I think I agree, as long as the data is serializable.

---

## Post #16 by @pieterbos

A clear use case for saving invalid data is some sort of auto-save to prevent data loss when people do things like accidentally closing a browser window, or when they are suddenly called away to do something else. You cannot ask anything of users in these cases, and it is not acceptable to just throw invalid data away. We do this with a different mechanism, so it is not a problem for us, but if that is a use case that needs to be covered, I'd say one would need to be able to skip more validations.

I know the ADL 1.4 Java library combines RM validation with object creation, so you cannot create invalid RM objects. This is of course not archetype validation, which should always be separate. That could be a problem here. Are there more tools out there that do such things?

---

## Post #17 by @ian.mcnicoll

That makes more sense than my original suggestion!

---

## Post #18 by @bna

This seems OK to me. One question: what about versioning? I assume incomplete goes through the same versioning regime as ordinary compositions? If so, this is what distinguishes it from drafts, which don't have to be versioned. As a consequence, we need both!?

---

## Post #19 by @pieterbos

We version incomplete compositions. I just checked: we allow both invalid and incomplete data, although I do not know if the invalid part is actually used.

We have a concept that we call draft, which is for the use case I mentioned above, to prevent data loss. It is not versioned, only visible to the user who authored the data, and of a very temporary nature. I have no idea how that relates to the draft concept in other CDRs.
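Posts #8, #9 and #14 settle on a rule for committing incomplete data: missing data is tolerated, but whatever is present must still be valid. The following minimal Python sketch illustrates that rule on a deliberately flattened, hypothetical constraint model (`Constraint` with a `lower`/`upper` interval and a value predicate); it is not the ADL/AOM object model or any vendor's validator. For brevity one interval stands for both occurrences and cardinality, and the _existence_ and RM-mandated attributes discussed in Posts #12 and #15 are not modelled.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Constraint:
    """Hypothetical, flattened node constraint (not the AOM classes)."""
    name: str
    lower: int = 0               # minimum number of values (occurrences/cardinality)
    upper: Optional[int] = None  # maximum number of values; None means unbounded
    check: Callable[[object], bool] = lambda value: True  # value constraint


def validate(data: dict, constraints: list[Constraint], incomplete: bool) -> list[str]:
    """Validate a flat mapping of name -> list of values.

    For incomplete versions only the lower bounds are skipped; upper bounds
    and value constraints are always enforced.
    """
    errors = []
    for c in constraints:
        values = data.get(c.name, [])
        # Missing data is tolerated only when committing an incomplete version.
        if len(values) < c.lower and not incomplete:
            errors.append(f"{c.name}: expected at least {c.lower}, got {len(values)}")
        # Too much data is never tolerated.
        if c.upper is not None and len(values) > c.upper:
            errors.append(f"{c.name}: expected at most {c.upper}, got {len(values)}")
        # Any data that is present must satisfy its value constraint.
        for value in values:
            if not c.check(value):
                errors.append(f"{c.name}: invalid value {value!r}")
    return errors
```

Under this rule an incomplete version may omit a mandatory diagnosis, but a score outside its allowed range, or a second value where at most one is allowed, is still rejected.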

---

## Post #20 by @ian.mcnicoll

The scenario we have is an end-of-life resuscitation form (a persistent composition), which may have a clinically invalid 'draft' state (I guess that's what you would say does not need to be versioned) and a series of updates (with potentially interim drafts). So we can have a situation where there are multiple updated versions, some of which are `incomplete`. I know these do not need to be versioned, but I can't see the harm, as long as the querying rules are followed. So I would say that this could replace the current 'draft/unversioned' alternatives that everyone seems to have developed independently.

---

## Post #21 by @ian.mcnicoll

How do you currently decide if something is 'draft' or 'incomplete'?

---

## Post #22 by @bna

We actually don't 🤔

The client application (the EHR system using the #openehr CDR) has a document workflow for incomplete compositions. A user may store a draft. Those entries go into a workflow called "documents for approval". A user might save the draft several times before finishing and approving the content. This is the composition which gets committed to the CDR. For this use case, versioning is not needed.

I don't want to open Pandora's box here... But there are use cases with multiple authors working simultaneously on a composition. This will require a different protocol for sharing updates and versioning. I think we can leave this out of the current discussion.

---

## Post #23 by @ian.mcnicoll

[quote="bna, post:22, topic:420"]
I don’t want to open Pandora's box here… But there are use cases with multiple authors working simultaneously on a composition. This will require a different protocol for sharing updates and versioning. I think we can leave this out of the current discussion.
[/quote]

I think we need to open the box!! We need a clear idea of the full requirements in this space, and then we can decide what needs to be part of the spec (and part of the versioned CDR). These are universal requirements.

---

## Post #24 by @pieterbos

So what we do:

Draft is auto-saved - there is no user involvement other than using a form and returning to it later. It is deleted when committing the composition. Note that this could just be a feature of an app, perhaps even client-side, not necessarily part of a CDR.

The option to save an incomplete composition is presented if a user tries to save with validation errors. They are given the choice between saving an incomplete version or going back to fix the problems. It is an explicit choice of a user to present their input to other users or allow other users to complete their input.

@bna Simultaneous multi-user editing is something else indeed. If that needs to be solved, that sounds to me like a separate problem :)

---

## Post #25 by @ian.mcnicoll

Simultaneous (Google Docs type) editing - yikes!! I agree, but a more common scenario is where the document is 'open' for a number of contributors, e.g. a multidisciplinary team document. We may not need to be able to solve all the use cases, but it would be good to understand them and possibly flag those that are definitely out of scope.

For your incomplete example where someone cannot produce a validated composition, do you mean, for example, that a form demanded a 'diagnosis' (mandated in the template) but the user could not actually give one, and the UI did not allow an alternative?

---

## Post #26 by @pieterbos

Anything that is not complete, so I have no full list of use cases ready. A score that is part of several events that need to be recorded separately was our first use case. It could be modelled differently, of course.
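The save flow Pieter describes in Post #24 can be sketched as follows: an unversioned auto-saved draft that is deleted on commit, and an explicit user choice between committing an `incomplete` version and going back when validation fails. This is an illustrative Python sketch, not Pieter's system: `DraftStore`, `Cdr`, `Choice` and `save` are hypothetical stand-ins, and `validate` is any callable returning a list of errors. Server-side relaxed validation of the incomplete commit is not modelled here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class Choice(Enum):
    """What the user picks when the form fails validation."""
    SAVE_INCOMPLETE = "save_incomplete"
    GO_BACK = "go_back"


@dataclass
class DraftStore:
    """Hypothetical unversioned, per-user auto-save area (could be client-side)."""
    drafts: dict = field(default_factory=dict)

    def autosave(self, user: str, form_id: str, data: dict) -> None:
        self.drafts[(user, form_id)] = dict(data)

    def delete(self, user: str, form_id: str) -> None:
        self.drafts.pop((user, form_id), None)


@dataclass
class Cdr:
    """Hypothetical stand-in for a versioning CDR: each commit appends a version."""
    versions: list = field(default_factory=list)

    def commit(self, data: dict, lifecycle_state: str) -> int:
        self.versions.append((dict(data), lifecycle_state))
        return len(self.versions)  # illustrative version number


def save(user: str, form_id: str, data: dict,
         validate: Callable[[dict], list],
         ask_user: Callable[[list], Choice],
         drafts: DraftStore, cdr: Cdr) -> Optional[int]:
    """Commit as complete if valid; otherwise let the user choose."""
    errors = validate(data)
    if not errors:
        state = "complete"
    elif ask_user(errors) is Choice.SAVE_INCOMPLETE:
        state = "incomplete"
    else:
        return None  # user goes back to fix the problems; the draft is kept
    version = cdr.commit(data, state)
    drafts.delete(user, form_id)  # the draft is deleted once committed
    return version
```

Keeping the draft store separate from the CDR reflects the point made in Posts #24 and #28 that auto-save could be an application feature, possibly client-side, while `incomplete` versions go through normal versioning (Post #19).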

---

## Post #27 by @thomas.beale

[quote="pieterbos, post:16, topic:420"]
A clear use case for saving invalid data is some sort of auto-save to prevent data loss when people do things like accidentally closing a browser window
[/quote]

I would solve this with timestamped encrypted snapshots of form fills in the application space - I don't think this kind of partial data could be regarded as reliable in any sense at all (except where, by luck, the user happened to have filled enough things in completely).

---

## Post #28 by @thomas.beale

[quote="pieterbos, post:24, topic:420"]
Draft is auto-saved - there is no user involvement other than using a form and returning to it later. It is deleted when committing the composition. Note that this could just be a feature of an app, perhaps even client-side, not necessarily part of a CDR.
[/quote]

This seems like the right kind of solution to me.

---

## Post #29 by @Seref

I think the definition of the incomplete state, or to be more precise, the inconsistency of the definition, is problematic. The way I see it, an RM instance either passes validation with respect to a template (archetype) or it does not. It is a binary outcome. I cannot understand why the spec allows a 'little bit inconsistent' state based on relaxed cardinality constraints. From the point of view of the validators, that is an invalid instance, period.

Would any clinician be able to give a blanket guarantee, from a clinical safety p.o.v., that some missing items are safer than, say, a free-text observation that was not completely typed (because someone interrupted the clinician)?

As long as implementers commit to ensuring that incomplete state is not visible to any party until it is committed (serialised transaction isolation level in relational db server terms...) what difference does it make 'how incomplete it is'?
Btw, as long as a vendor guarantees that data won't be displayed to anybody other than the party/parties working on it, where it is kept is an implementation detail and should be up to the implementer.

---

## Post #30 by @thomas.beale

[quote="Seref, post:29, topic:420"]
As long as implementers commit to ensuring that incomplete state is not visible to any party until it is committed (serialised transaction isolation level in relational db server terms…) what difference does it make ‘how incomplete it is’?
[/quote]

Just one thing to note here: we are talking about data that is being committed, so it is 'visible' in the EHR, in a limited way, so it can be reviewed, edited to completion etc. This is why we are talking about the query processor ignoring it - which it wouldn't need to do with uncommitted data.

---

## Post #31 by @Seref

The spec does not seem to clarify for whom this data would be visible, so I'm assuming it'd be limited to the author, which is what the common inf. model spec seems to say, or at least that's my impression from reading it. If it is visible to others, or if this is undefined behaviour, then it makes sense (to me) to limit the visibility of this data to its author and clearly express this in the spec.

My main points are: it'd be better if the spec simply defined all data that fails validation for whatever reason as incomplete, and it'd also be better if the spec clarified to whom data in this state should be visible.

My 2 pennies.

---

## Post #32 by @thomas.beale

[quote="Seref, post:31, topic:420"]
The spec does not seem to clarify for whom this data would be visible, so I’m assuming it’d be limited to the author, which is what the common inf. model spec seems to say, or at least that’s my impression from reading it. If it is visible to others, or if this is undefined behaviour, then it makes sense (to me) to limit the visibility of this data to its author and clearly express this in the spec.
[/quote]

This is a good point and I don't remember it being discussed. @ian.mcnicoll and/or others - any opinions? I guess at least the senior clinicians to whom the author reports need to be able to see the content.

---

## Post #33 by @ian.mcnicoll

I think in principle that is correct: it is a temporary workspace for the original author(s). The only complication is that some documents do have multiple authors; ultimately the access rules would have to be quite local/application-specific.

---

## Post #34 by @pieterbos

I think the information should be visible to at least potential future authors, plus people who may need to interpret the data, so they know there is more data on its way, or so they can ask someone to complete something they need. Very much application-specific.

For us it is a way to record something that makes sense as a single composition, that cannot be completely recorded in one session, for any reason, but can safely be completed later.
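Posts #31 to #34 leave open who may see incomplete versions: the author(s), the clinicians the author reports to, or potential future authors, with the final rule being local and application-specific. One way to keep that decision configurable is a list of pluggable rules, any of which may grant visibility. The Python sketch below is purely illustrative; `Viewer`, `IncompleteVersion`, the three rules and `can_see` are hypothetical names, not part of any openEHR specification.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Viewer:
    """Hypothetical user requesting access."""
    user_id: str
    supervises: frozenset = frozenset()  # ids of users this viewer supervises


@dataclass
class IncompleteVersion:
    """Hypothetical incomplete version with its (co-)authors."""
    uid: str
    authors: set = field(default_factory=set)
    invited_authors: set = field(default_factory=set)  # potential future authors


Rule = Callable[[Viewer, IncompleteVersion], bool]


def is_author(viewer: Viewer, version: IncompleteVersion) -> bool:
    return viewer.user_id in version.authors


def supervises_an_author(viewer: Viewer, version: IncompleteVersion) -> bool:
    return bool(viewer.supervises & version.authors)


def is_invited_author(viewer: Viewer, version: IncompleteVersion) -> bool:
    return viewer.user_id in version.invited_authors


def can_see(viewer: Viewer, version: IncompleteVersion, rules: list[Rule]) -> bool:
    """Visible if any locally configured rule grants access."""
    return any(rule(viewer, version) for rule in rules)
```

A deployment that follows Seref's suggestion would configure only `is_author`; one closer to Posts #32 and #34 would add the supervisor and invited-author rules.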