Special treatment of "incomplete" VERSIONs

At Better we have had requests from multiple customers to implement the following change in behaviour for VERSIONs whose lifecycle_state is set to incomplete:

  • When committing incomplete data, the validation should be somewhat relaxed. At the moment we think it is sufficient to skip validation of cardinality lower bounds, in other words, to allow missing data. All other validation (cardinality upper bounds and constraints on values) would be performed normally.
  • When querying, data from incomplete versions would be completely ignored unless the query acknowledged the existence of VERSION and explicitly mentioned its lifecycle_state. In other words, the query would have to contain CONTAINS VERSION v and reference v/lifecycle_state somewhere in its SELECT or WHERE part; in that case, the filter that normally only allows complete data to be seen would not be applied.
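A minimal sketch of the proposed validation relaxation, assuming a hypothetical, heavily simplified constraint model (`Constraint` and `validate` are illustrative names, not part of any openEHR library or Better's EHR Server): for incomplete versions only the lower bounds are skipped, while upper bounds and value constraints are still checked.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Constraint:
    """Hypothetical, simplified constraint on one attribute path (not an openEHR API)."""
    path: str
    lower: int = 0                # cardinality/occurrences lower bound
    upper: Optional[int] = None   # None means unbounded
    value_ok: Callable[[Any], bool] = lambda v: True  # constraint on each value


def validate(constraints: List[Constraint], data: Dict[str, list],
             incomplete: bool) -> List[str]:
    """Return validation errors; lower bounds are skipped for incomplete versions."""
    errors = []
    for c in constraints:
        items = data.get(c.path, [])
        # Missing data is tolerated only when the version is incomplete.
        if not incomplete and len(items) < c.lower:
            errors.append(f"{c.path}: expected at least {c.lower}, got {len(items)}")
        # Upper bounds and value constraints are always enforced.
        if c.upper is not None and len(items) > c.upper:
            errors.append(f"{c.path}: expected at most {c.upper}, got {len(items)}")
        errors.extend(f"{c.path}: invalid value {v!r}"
                      for v in items if not c.value_ok(v))
    return errors


# Hypothetical example constraint: exactly one systolic value in a plausible range.
systolic = Constraint("/systolic", lower=1, upper=1,
                      value_ok=lambda v: 0 <= v <= 1000)

print(validate([systolic], {}, incomplete=False))  # one lower-bound error
print(validate([systolic], {}, incomplete=True))   # no errors: missing data allowed
print(validate([systolic], {"/systolic": [1200]}, incomplete=True))  # value still checked
```

The point of the design is that an incomplete version may have gaps, but whatever it does contain is held to the same rules as complete data.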

I think we have only discussed this on Slack and in private discussions with @thomas.beale and @bna.

Given the multiple requests we’re getting for this, we’ll implement it ASAP, but we’d like to see it standardised in the near future, and it would be nice to hear opinions on whether we got any of the details wrong. It seems to fit the use cases we foresee nicely, but we might have tunnel vision. :slight_smile: I’d appreciate input from @ian.mcnicoll as well.

Agree on the general idea. But the second bullet makes for a complicated query.

It seems to me that we might want to consider some kind of options or switches in queries, e.g.

INCLUDE DRAFT : include data classified as ‘draft’.

A rule that states what ‘is_draft’ means for openEHR data would need to be written once and hidden in the system (something that looks for an owner object of type VERSION with lifecycle_state = incomplete).
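Such a hidden ‘is_draft’ rule could be sketched as below, assuming a hypothetical in-memory object model where every RM object knows its owner. `RmNode`, `owner_of_type` and `is_draft` are illustrative names only; in the real RM, VERSION is abstract (ORIGINAL_VERSION is the usual concrete type) and lifecycle_state is a DV_CODED_TEXT rather than a plain string, both simplified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RmNode:
    """Hypothetical, minimal RM object: its RM type and a link to its owner (parent)."""
    rm_type: str
    parent: Optional["RmNode"] = None
    lifecycle_state: Optional[str] = None  # only set on VERSION nodes in this sketch


def owner_of_type(node: Optional[RmNode], rm_type: str) -> Optional[RmNode]:
    """Return the node itself or its nearest owner with the given RM type, if any."""
    while node is not None:
        if node.rm_type == rm_type:
            return node
        node = node.parent
    return None


def is_draft(node: RmNode) -> bool:
    """'Draft' = owned by a VERSION whose lifecycle_state is incomplete."""
    version = owner_of_type(node, "VERSION")
    return version is not None and version.lifecycle_state == "incomplete"


v = RmNode("VERSION", lifecycle_state="incomplete")
obs = RmNode("OBSERVATION", parent=RmNode("COMPOSITION", parent=v))
print(is_draft(obs))
```

A query option like INCLUDE DRAFT would then only need to switch this predicate on or off, without the query author knowing where lifecycle_state lives.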

Maybe the same approach could be used to exclude data not relating to the subject of the record, e.g.

EXCLUDE NON-SUBJECT

would translate to a rule, e.g. one that looks for an owner object of type ENTRY with typeof(subject) != PARTY_SELF.
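The EXCLUDE NON-SUBJECT rule could look like this, again assuming a hypothetical, simplified object model in which each ENTRY records the RM type of its subject. `RmNode`, `subject_type` and `is_non_subject` are illustrative; the set below lists the concrete ENTRY subtypes of the openEHR RM.

```python
from dataclasses import dataclass
from typing import Optional

# Concrete ENTRY subtypes in the openEHR RM.
ENTRY_TYPES = {"OBSERVATION", "EVALUATION", "INSTRUCTION", "ACTION", "ADMIN_ENTRY"}


@dataclass
class RmNode:
    """Hypothetical, minimal RM object; subject_type is the RM type of ENTRY.subject."""
    rm_type: str
    parent: Optional["RmNode"] = None
    subject_type: Optional[str] = None


def is_non_subject(node: RmNode) -> bool:
    """True if the nearest owning ENTRY's subject is not PARTY_SELF."""
    cur: Optional[RmNode] = node
    while cur is not None:
        if cur.rm_type in ENTRY_TYPES:
            return cur.subject_type != "PARTY_SELF"
        cur = cur.parent
    return False  # not inside an ENTRY: nothing to exclude


family_history = RmNode("EVALUATION", subject_type="PARTY_RELATED")
print(is_non_subject(RmNode("ELEMENT", parent=family_history)))
```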

I’m just thinking of ways to make life easier for the query author.

May I be so rude as to point out that you’re adding keywords to a generic language that relate to concepts in a specific RM? :wink:

I assume someone working with incomplete versions will know about the existence of this class of objects and which field records that the contents are incomplete. (I avoid the word draft because we have a separate feature with that name, although it might be deprecated once we implement this.) I’d like to hear @ian.mcnicoll and @bna’s thoughts.

Haha, I knew someone would say that. But adding notions of ‘draft’ and ‘subject’ doesn’t really have anything to do with openEHR or its RM. However, it is more specific than ‘any possible model’.

As I said in other posts, I would prefer not to put anything in AQL for this kind of thing - I think it belongs in a level above, one that further filters a result set based on semantic rules specific to the broad class of data (let’s say: ‘recorded information about a subject’).

I just suggested the above as at least a better option (maybe) than having to remember how to mention VERSION and lifecycle_state so as to get rid of draft data.

Agree.
Do you think some of these points should also appear in the REST or CNF spec?

Our idea was that you would have to mention those to get access to draft data, not to get rid of it, but you probably understood that correctly and just made a slip in writing.

Well, I’ll wait for a few more opinions. :slight_smile:

Yes, sorry, I misquoted you there. But I think the principle is clear.

I need to report that we couldn’t afford to wait for more opinions, so we just pushed our idea through. :slight_smile: Incomplete data can now only be seen if the query has “CONTAINS VERSION v” and a reference to “v/lifecycle_state” (or any of its fields, naturally) in the SELECT or WHERE part. As for validation, we only ignore the lower bounds of cardinality and occurrences; existence is validated normally. All of this is optional behaviour of the EHR Server, disabled by default.
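The query-side rule can be approximated as below. This is a crude, regex-based sketch under stated assumptions, not how Better's EHR Server does it: a real implementation would inspect the parsed AQL, whereas this version can be fooled by string literals or unusual formatting. `includes_incomplete`, `visible_rows` and the row dictionaries are hypothetical.

```python
import re
from typing import Dict, List


def includes_incomplete(aql: str) -> bool:
    """Crude check: CONTAINS VERSION <alias> plus <alias>/lifecycle_state in SELECT or WHERE."""
    m = re.search(r"\bCONTAINS\s+VERSION\s+(\w+)", aql, re.IGNORECASE)
    if m is None:
        return False
    # Matches alias/lifecycle_state and any deeper path under it.
    ref = re.compile(rf"\b{re.escape(m.group(1))}/lifecycle_state\b")
    select = re.search(r"\bSELECT\b(.*?)\bFROM\b", aql, re.IGNORECASE | re.DOTALL)
    where = re.search(r"\bWHERE\b(.*?)(?:\bORDER\s+BY\b|\bLIMIT\b|$)",
                      aql, re.IGNORECASE | re.DOTALL)
    clauses = [c.group(1) for c in (select, where) if c is not None]
    return any(ref.search(text) for text in clauses)


def visible_rows(rows: List[Dict[str, str]], aql: str) -> List[Dict[str, str]]:
    """Apply the default 'hide incomplete data' filter unless the query opts in."""
    if includes_incomplete(aql):
        return rows
    return [r for r in rows if r["lifecycle_state"] != "incomplete"]


opt_in = ("SELECT c FROM EHR e CONTAINS VERSION v CONTAINS COMPOSITION c "
          "WHERE v/lifecycle_state/value = 'incomplete'")
print(includes_incomplete(opt_in))
```

Note that a reference that only appears in ORDER BY, or CONTAINS VERSION v with no lifecycle_state reference at all, does not lift the filter under this rule.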


The above is pretty close to how I think we should treat ‘incomplete’ data: relax the validation such that missing items that are required in the archetype are ok, but if data is there, it has to be valid (i.e. no actually broken data).

In addition, for now at least, the only way a query would return incomplete data would be as Matija states above. I still think this is ugly and painful, but nevertheless it’s regular AQL and not something new. So as a default, I think it’s ok.

I want to create an RM 1.1.0 CR for this, so I’d like to know:

  • if we have general agreement on the validation relaxation principle
  • if we think this applies to any versioned data (COMPOSITION, PARTY, WORK_PLANs…) or just EHR content - if the former, it becomes part of the Change_control chapter in Common IM; otherwise it goes in the EHR spec.
  • if we agree on an AQL 1.1.0 (or later) CR stating that ‘incomplete’ versioned content is never returned by a query unless it is requested according to the approach above.

[edit: there is now SPECRM-97 for this].

We have some changes in Task Planning that will rely on this, and Better has already implemented it. I imagine other implementations will need to decide on an approach; it would be preferable for everyone to agree on a common one sooner rather than later.

Thoughts?


Matija’s suggestions look sensible to me.


Hi Matija,

Can I ask how the updated rules around composition lifecycle_state work with the Better idea of /draft compositions? Will /draft become largely redundant, or do you see a case for both?

Ian

First let me note that in practice it has turned out that other constraints may need to be lifted as well, such as existence and attributes that the RM makes mandatory.

I personally see incomplete versions as a better alternative to drafts, which would make the draft functionality obsolete, yes. The draft functionality is most likely here to stay, though.

Thanks. I guess it might be simpler just to drop any requirement for validation, as long as the crappy composition is not ‘discoverable’ by normal querying, as you have suggested.

I understand you may need to keep ‘/draft’ for legacy reasons, but it is good to understand that the need for it is essentially deprecated by the new lifecycle behaviour.

I don’t agree - I don’t see any reason to allow technically broken data in the record (for a start, most tools won’t produce it), and it is likely to break any other tools that try to process it in whatever special mode. I think it is far preferable to stick to the rule that mandatoriness can be ignored, but any structures present in the data must be technically correct. This is easy to understand and code for, and won’t break anything that knows this simple rule.

I assume you mean validating against the maximal dataset defined by the RM, and leaving out the constraints defined by archetypes and templates?

If so I think I agree. As long as the data is serializable.

A clear use case for saving invalid data is some sort of auto-save to prevent data loss when people do things like accidentally closing a browser window, or when they are suddenly called away to do something else. You cannot ask users anything in these cases, and it is not acceptable to just throw invalid data away.
We do this with a different mechanism, so it is not a problem for us, but if that is a use case that needs to be covered, I’d say one would need to be able to skip more validations.

I know the ADL 1.4 Java library combines RM validation with object creation, so you cannot create invalid RM objects. This is of course not archetype validation, which should always be separate. That could be a problem here. Are there other tools out there that do such things?

That makes more sense than my original suggestion!

This seems OK to me.
One question: what about versioning? I assume incomplete compositions go through the same versioning regime as ordinary compositions?

If so, this is what distinguishes them from drafts, which don’t have to be versioned.

As a consequence we need both!?

We version incomplete compositions. I just checked: we allow both invalid and incomplete data, although I do not know whether the invalid part is actually used.
We have a concept that we call draft, that is for the use case I mentioned above, to prevent data loss. It is not versioned, only visible for the user who authored the data and of a very temporary nature. I have no idea how that relates to the draft concept in other CDRs.

The scenario we have is an end-of-life resuscitation form (a persistent composition) which may have a clinically invalid ‘draft’ state (I guess that’s what you would say does not need to be versioned) and a series of updates (with potentially interim drafts).

So we can have a situation where there are multiple updated versions, some of which are incomplete.

I know these do not need to be versioned but I can’t see the harm, as long as the querying rules are followed. So I would say that this could replace the current ‘draft/unversioned’ alternatives that everyone seems to have developed independently.