Option to record free text as choice against DV_CODED_TEXT - do we need DV_PLAIN_TEXT

Thinking a bit more about this… if it’s going to help, it’s a reasonably easy change to make to the RM, and we could add it into 1.1.0. Is the group convinced (apart from @ian.mcnicoll :wink:)?
I’m ok with the change, so if the consensus is to do it, we might as well do it sooner rather than later.

It is also an easy change to make to the BMM files, and that is what drives ArchetypeDesigner, so the change will appear more or less instantly. It would be interesting to know if making the addition to the BMM (e.g. as a test) indeed fixes the problem @borut.fabjan reported last year.

Isn’t that the whole idea of the AOM to validate RM instances? I don’t see the issue, this is what we’ve been doing since day one in openEHR.

The RM is just a container; the AOM is what allows us to make sense of, and validate, whatever you put in the RM.

What I’m concerned about is adding solutions to modelling issues by changing the RM instead of the AOM, which is the model for modelling. If we need more constraints over the RM, we know that belongs at the AOM level. I would be happy to change the RM if we really need new RM semantics (new constructs), but not for new constraints.

Just to be clear: yes :wink: The sense of your response is 100% correct. It seems that modellers need a ‘pure text’ type (which we did actually have in an earlier design).

I’m still not clear myself how this change would fix a problem in ArchetypeDesigner but hope we can find that out.

Please re-consider the part of my response before the “but…”

Let me try to rephrase my previous responses.

We’d like as much of the validation as possible to be done as early as possible in the software development lifecycle. If validation can be projected down to the programming language level, it stops the programmer from creating invalid data in the first place. Validation can always catch errors at a later stage, but by then time has been spent, code has been written, and figuring out what went wrong takes effort. This is especially true if code created an RM instance, then serialised it and pushed it to some openEHR API implementation: now the error is caught by validation that happens all the way down in some other system’s stack, behind a REST API, etc.

Some constraints used in openEHR modelling are easier to map to programming languages and artefacts, some are harder. Using some sort of ‘final’ modifier at the modelling level is hard to enforce at the programming language level as I tried to briefly explain. Possible, but harder than the case where we introduce a dedicated RM type.

Therefore, my preference to avoid a ‘final’ approach is based on my view of pros/cons of implementability. It is also based on an assumption of mainstream, statically typed programming languages such as C# or Java.

Hope that clarifies it.

Hi Thomas,

Currently .oet-based templates support the DV_CODED_TEXT/DV_TEXT construct, where coded-term constraints are introduced at template level. This is not supported by ADL2, and my understanding from Fabio was that he was reluctant to add this capacity to AD, precisely because of the issue of people potentially working around strict coded_text constraints. Right now we can constrain a DV_TEXT to a DV_CODED_TEXT at template level in AD, but we cannot maintain any existing DV_TEXT ‘choice’, whether that exists in the original archetype or is something we would like to persist as we subclass DV_TEXT to DV_CODED_TEXT. So it is not a limitation of the tool per se; rather, Fabio was reluctant to implement the current arrangement until this was clearly supported in ADL2 (along with some other issues that I have raised elsewhere).

I may have got that wrong, and I’ll try to get Fabio to confirm, but that’s my understanding.

As I understand it, this would resolve Fabio’s discomfort, which I do understand. From a modelling perspective I have no preference re an attribute on DV_TEXT vs. a sub-class. I think the sub-class is easier to understand, particularly when we have to explain sub-classing of DV_TEXT to modellers anyway. There are definitely a few use-cases for DV_PLAIN_TEXT apart from that.

This doesn’t sound right to me. An ADL2 template is just more archetypes, with the addition of a structure that allows overlays to be ‘inline’ in the template (as we designed some years ago ;-). So there’s no problem adding the same kind of coded-term constraints as you can in an archetype.

Or - are we talking about the strict v non-strict idea of ‘introducing coded constraints’ in a template?

If that’s so, we agreed on the fix for that, which is to add a strength attribute to C_TERMINOLOGY_CODE and a required() function to indicate it (so a quick check can be done), as per this discussion.
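For illustration only, the strength attribute might surface in constraint syntax something like this (the syntax shown and the [ac1] value-set reference are hypothetical, not agreed ADL2):

value matches {
    DV_CODED_TEXT[id4] matches {
        -- hypothetical strength qualifier on the terminology constraint;
        -- a required() function on C_TERMINOLOGY_CODE would allow a quick check of it
        defining_code matches {[ac1; extensible]}
    }
}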

So assuming we implement that, is there still a problem for AD?

I’m fine with that. But let’s deal with each issue in turn…

I think we have (had) 3 issues around current .oet/.opt functionality that Fabio saw as a blocker.

  1. The issue around sub-classing DV_TEXT, as discussed here.
  2. The issue with ‘loosening’ the ‘binding-strength’ of coded-term constraints, which we have resolved.
  3. The ability of .oet to include ‘in-line’ extensional code sets (not bindings), i.e. rather than creating an internal list of at-codes, I can create an ‘internal list’ of SNOMED terms. Example [here](https://ckm.apperta.org/ckm/templates/1051.57.259), under ‘significant co-morbidities’.
     We have started discussing this, but I don’t think we have got it fully thought through. It is a bit of a complex space that includes the discussion on bindings/mappings.
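For reference, an inline extensional code set of the kind described in point 3 might be expressed in archetype constraint syntax along these lines. This is purely an illustrative sketch (the SNOMED CT codes are real concepts, but the node id is invented, and how ADL2 should admit such a construct is exactly the open question):

value matches {
    DV_CODED_TEXT[id4] matches {
        defining_code matches {
            [SNOMED-CT::
            38341003,     -- Hypertensive disorder
            44054006]     -- Diabetes mellitus type 2
        }
    }
}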

My understanding is that once these are resolved we are in a much better position to move quickly to adopting AOM2 as the primary ‘template’ formalism. In discussion with Fabio, there were a couple of other ‘technical’ gaps related to embedded templates, but nothing that would be a major blocker.

If I am not mistaken, this is about the question of what codes you actually want in the data - the at-codes, or some ‘native codes’ that are in use, or even just SNOMED codes. This is about representation in data, not archetypes or templates per se. This is discussed in the ADL2 spec.

The reasoning there might need to be updated, so don’t take it as gospel. However, the basic idea is that the choice of what kind of codes to store in the data (i.e. what CODE_PHRASE instances look like) should be made at the OPT-generation stage. See the above discussion for examples.

I’ll have another look at that, but choosing the optimal mapping/binding/defining_code approach for instance data will ultimately need to be done on a per-element basis. It’s not a generic per-template setting that can be applied.

Agree with that - take a look at the ref to see how it could be done (not saying there aren’t better ways).

Off-topic: unfortunately I did not notice this thread in the last few weeks, so I could not reply in time. For some unknown reason, lately I don’t get the right notifications in Discourse, and new/active threads are not marked as new/unread.

From my point of view, it would be better if we could somehow avoid having the new class DV_PLAIN_TEXT, because I think otherwise it will not make much sense to newbies or clinical modellers when they have to pick between DV_TEXT and DV_PLAIN_TEXT. Even with a proper explanation, they will have to have a good understanding of ADL/AOM/archetypes/templates, or very clear, user-friendly tooling. Plus, if we are going to introduce this, I also expect that in the next period a lot of (published) archetypes are going to be changed to follow the new pattern, which will eventually cause a chain reaction (or at least pressure) on RM/AQL/persistence implementations.

From my point of view, if the goal is to block the extension of a DV_TEXT alternative, then it is reasonable to work on the AOM/ADL for a ‘final’ kind of keyword.

Thanks Sebastian,

From a modelling POV, it probably does not matter too much in terms of education/training etc. We still have to explain the idea of sub-classing re DV_CODED_TEXT. Either way, I would expect the tooling to help people do the ‘right thing’, e.g. if DV_TEXT was sub-classed to DV_CODED_TEXT, I would expect the tooling to offer either DV_TEXT with ‘final’ preset or DV_PLAIN_TEXT as the default. Make it hard for people to do the wrong thing. We can make it easier still by asking very direct questions such as ‘Do you want to allow free text alongside this codelist?’. The only disadvantage is that you disconnect the modelling view a little from the technical constraint that is given to devs, which can then be hard to communicate.

I also doubt very much if we would (either way) do a major retrospective routine update of archetypes. That should be non-breaking, in any case.

So, I would consider this a purely technical decision. The impact on training/tooling etc is, I think going to be much the same, and manageable.

I really do not have a strong preference between PLAIN_TEXT and ‘final’.

DV_NARRATIVE_TEXT as an alternative name?

If we change it from DV_TEXT to DV_NARRATIVE_TEXT (or whatever we can call it), I think that needs to be considered as a breaking change (even if conversion is trivial where DV_TEXT is not a DV_CODED_TEXT)?

DV_NARRATIVE_TEXT or maybe simply DV_UNCODED_TEXT sounds better to me than DV_PLAIN_TEXT in the light of the recent changes to DV_TEXT’s formatting attribute.

Having said that, I am still not sure I see a need that strong here that it justifies a new data type…
If someone really wants to provide additional information by coding the alternative free text as well, is this really so bad that we need to prevent this at the cost of introducing a new data type?

Assuming we go ahead with this addition, I think DV_PLAIN_TEXT is still a better name, we just need to document it clearly to say that it’s about text not something coded, i.e. the ‘plain’ is in contra-distinction to ‘coded’ (text).

If we had our time again we might have gone for something like this:

  • DV_TEXTUAL
    • DV_TEXT
    • DV_CODED

… but we didn’t…

I’d say keep it simple and move on…

I took some time to think about this discussion this morning and realised that I could not really see where exactly the problem is.
I understand the mechanics of the issue, but I cannot see how that turns into a problem.

I also thought about the pros and cons briefly discussed in yesterday’s meeting and realised that we’re looking at some hairy downstream changes in tooling and runtimes no matter which way we go.

Can we have a description of the problem with clear definitions of actors affected by it and how exactly they are affected? Who ends up in a less than ideal situation and why?


Most of us had the same problem. Have a look at the first post again, Ian’s bit on Fabio’s problem in the tooling.

So a little bit of clarity + coffee for me this morning made me realise the following: if you have an RM class (e.g. DV_TEXT) that can itself be concretely instantiated and also has concretely instantiable children (e.g. DV_CODED_TEXT), then to enable a constraint that forces the runtime instance to conform to the parent class, we need the ability to constrain a node to an instance of ‘just this class’, i.e. with no child-class instances allowed.

This is a generic OO model property; it’s not specific to openEHR in any way, as all the devs/tech people here will know perfectly well (i.e. the fact that you can easily have instantiable parent classes up the inheritance hierarchy, not only abstract ones).

To fix this, we need an additional constrainer flag on C_OBJECT in the AOM, something like ‘rm_type_fixed’. Now, that is more or less the ‘final’ keyword that @sebastian.iancu was suggesting, and I think someone suggested ‘frozen’ as well. @Seref and I were against this, but I now think the variant I am talking about, i.e. ‘rm type fixed’, is needed - maybe ‘rm_type_frozen’ or ‘rm_type_final’ are better names (I would agree).

However… this doesn’t fix everything… see @pieterbos’s post (#16) above - you could still add another alternative, e.g. like this:

value matches {
    DV_PLAIN_TEXT[id5] occurrences matches {0}
    DV_CODED_TEXT[id4] occurrences matches {0}
    DV_CODED_TEXT[id0.2]
} 

We have another need: to be able to prevent the addition of any more ‘alternatives’ (single-valued attribute) or other children (container attribute). We do have this on ARCHETYPE_SLOT already, it’s called is_closed, i.e. closed to more ‘filling’.

I think we need this kind of idea but on C_ATTRIBUTE. That would prevent the situation Pieter was talking about.

However, we are not done yet! @pieterbos thought of another case related to this, see SPECAM-59. The idea here is to prohibit instances of some specific RM type while allowing others. I don’t think this applies to our current case, since we don’t want to prevent instances of DV_CODED_TEXT, we want to prevent new constraints for that type, subverting the existing constraint. I just mention this here for completeness, in case anyone thinks it does apply.

So, I think to achieve everything @ian.mcnicoll and the clinical modellers want, we need to be able to write the following:

value closed matches {
    DV_CODED_TEXT[id4] matches {<value set with or without bindings>}
    DV_TEXT[id5] rm_type_final matches {<e.g. max 100 chars>}
} 

Those two keywords ‘closed’ and ‘rm_type_final’ achieve what we need:

  • no new alternative constraints e.g. in the template (or a child template) that could subvert the 2 constraints we have
  • no redefining of the DV_TEXT into DV_CODED_TEXT.

I think this gets us out of needing any DV_PLAIN_TEXT. In any case, the above AOM capabilities can be justified with respect to any OO RM, so I think we should add them. If we still want DV_PLAIN_TEXT later, that will be another matter.

Obviously how we make this ‘easy to understand’ on a tool UI is a question for tool designers.

SEC: see if you can break this analysis; it would be good if we can get @borut.fabjan to have a look as well.

@thomas.beale I’m trying to follow this while debugging some code written by an idiot (named Seref according to git commit history), so I’ll just say that you’re still discussing the mechanics of the cure without describing the problem those mechanics are meant to cure.

What I cannot see in Ian’s relayed message from @borut.fabjan is how a dv_coded_text constraint can be weakened.
If a field is declared dv_text, it won’t declare any terminology constraints. So the modeller should use a dv_coded_text, which cannot be implicitly overridden. Someone has to create a new archetype to weaken the term-set bindings of that; no sneaking up here.

If the field is dv_text, the modeller has already given up the option of constraining its value set/term set.

This is why I’m confused: I cannot see a case where the modeller is unable to protect their constraints via existing mechanisms.


So the starting point is the very common pattern:

value matches {
    DV_CODED_TEXT[id4] matches {<something>}
    DV_TEXT[id5] matches {<something>}
}

which is intended to say:

  • EITHER match this terminology constraint
  • OR a plain text is ok (maybe limited in length or whatever)

The design intention of the DV_TEXT in this circumstance really is ‘just plain text here!!!’

And that’s the problem - you can always redefine a DV_TEXT in an archetype to be a DV_CODED_TEXT in a template or a more specialised archetype, and that latter DV_CODED_TEXT can always subvert the constraint in the DV_CODED_TEXT node that you already have.
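To make the subversion concrete, a specialised archetype or template overlay could legally redefine the DV_TEXT node along these lines (the id codes and the [ac0.1] value set are illustrative, not taken from any real archetype):

value matches {
    DV_CODED_TEXT[id5.1] matches {    -- redefinition of DV_TEXT[id5]
        defining_code matches {[ac0.1]}    -- any locally chosen value set
    }
}

Data built against the specialisation still conforms to the parent archetype, but the ‘just plain text here’ intention of DV_TEXT[id5] has been lost.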

I did write a comment expecting this, then deleted it, because I expected Ian to write this.

The modeller, as I see it, is defining two options here. One of them is strictly controlled in terms of its terminology bindings, the other not.

As per my comment yesterday, the archetype path of this node, along with its name, implies that the semantics of both options are the same, but in this case they are not. So they should not be alternatives; they should have their own paths.

Yes, someone can extend dv_text even then and introduce a relaxed set of terminology values, but that is clearly using the wrong node in the model when there is a dedicated node to provide coded data.

The modeller can save everyone else a whole lot of trouble by making their concern explicit. There’ll be some inconvenience to live with, but compared to the complexity of the remedies we’re discussing here, I’d go for what I said above, unless I’m missing something.

PS: I’d assume it would not be too difficult to catch this tricky condition in the modelling tools and warn the modeller.