Option to record free text as choice against DV_CODED_TEXT - do we need DV_PLAIN_TEXT

Thanks Sebastian,

From a modelling POV, it probably does not matter too much in terms of education/training etc. We still have to explain the idea of sub-classing re DV_CODED_TEXT. Either way I would expect the tooling to help people do the ‘right thing’ e.g. if DV_TEXT wa sub-classed to DV_CODED_TEXT, I would expect the tooling to offer, either DV_TEXT with ‘final’ preset or DV_PLAIN_TEXT, as the default. Make it hard for people to do the wrong thing. We can make it easier still by asking very direct questions such as ‘Do you want to allow free text, alongside this codelist’ - the only disadvantage is that you disconnect the modelling view a little from the technical constraint that is given to devs which can then be hard to communicate.

I also doubt very much if we would (either way) do a major retrospective routine update of archetypes. That should be non-breaking, in any case.

So, I would consider this a purely technical decision. The impact on training/tooling etc is, I think going to be much the same, and manageable.

I really do not have a strong preference between PLAIN_TEXT and ‘final’.

DV_NARRATIVE_TEXT as an alternative name?

If we change it from DV_TEXT to DV_NARRATIVE_TEXT (or whatever we can call it), I think that needs to be considered as a breaking change (even if conversion is trivial where DV_TEXT is not a DV_CODED_TEXT)?

DV_NARRATIVE_TEXT or maybe simply DV_UNCODED_TEXT sounds better to me than DV_PLAIN_TEXT in the light of the recent changes to DV_TEXT’s formatting attribute.

Having said that, I am still not sure I see a need that strong here that it justifies a new data type…
If someone really wants to provide additional information by coding the alternative free text as well, is this really so bad that we need to prevent this at the cost of introducing a new data type?

Assuming we go ahead with this addition, I think DV_PLAIN_TEXT is still a better name, we just need to document it clearly to say that it’s about text not something coded, i.e. the ‘plain’ is in contra-distinction to ‘coded’ (text).

If we had our time again we might have gone for something like this:

  • DV_TEXTUAL
    • DV_TEXT
    • DV_CODED

… but we didn’t…

I’d say keep it simple and move on…

I took some time to think about this discussion this morning and realised that I could not really see where exactly the problem is.
I understand the mechanics of the issue, but I cannot see how that turns into a problem.

I also thought about the pros and cons briefly discussed in yesterday’s meeting and realised that we’re looking at some hairy downstream changes in tooling and runtimes no matter which way we go.

Can we have a description of the problem with clear definitions of actors affected by it and how exactly they are affected? Who ends up in a less than ideal situation and why?

1 Like

Most of us had the same problem. Have a look at the first post again, Ian’s bit on Fabio’s problem in the tooling.

So a little bit of clarity + coffee for me this morning made me realise the following: if you have an RM class (eg. DV_TEXT) that can itself be concretely instantiated and also has a concretely instantiable child(ren) (e.g. DV_CODED_TEXT), to enable a constraint to enforce the runtime instance to conform to the parent class, we need an ability to constrain a node to an instance of ‘just this class’, i.e. no child class instances.

This is a generic OO model property, it’s not specific to openEHR in any way, as all the devs/tech people here will know perfectly well (i.e. the fact that you can easily have instantiable parent classes, not only abstract ones up the inheritance hierarchy).

To fix this, we need an additional constrainer flag on C_OBJECT in the AOM, something like ‘rm_type_fixed’. Now, that is more or less the ‘final’ keyword that @sebastian.iancu was suggesting, and I think someone suggested ‘frozen’ as well. @Seref and I were against this, but I now think the variant I am talking about i.e. ‘rm type fixed’ is needed - maybe ‘rm type frozen’ or ‘rm type final’ are better names (I would agree).

However… this doesn’t fix everything… see @pieterbos’s post (#16) above - you could still add another alternative, i.e. by this:

value matches {
    DV_PLAIN_TEXT[id5] occurrences matches {0}
    DV_CODED_TEXT[id4] occurrences matches {0}
    DV_CODED_TEXT[id0.2]
} 

We have another need: to be able to prevent the addition of any more ‘alternatives’ (single-valued attribute) or other children (container attribute). We do have this on ARCHETYPE_SLOT already, it’s called is_closed, i.e. closed to more ‘filling’.

I think we need this kind of idea but on C_ATTRIBUTE. That would prevent the situation Pieter was talking about.

However, we are not done yet! @pieterbos thought of another case related to this, see SPECAM-59. The idea here is to prohibit instances of some specific RM type while allowing others. I don’t think this applies to our current case, since we don’t want to prevent instances of DV_CODED_TEXT, we want to prevent new constraints for that type, subverting the existing constraint. I just mention this here for completeness, in case anyone thinks it does apply.

So, I think to achieve everything @ian.mcnicoll and the clinical modellers want, we need to be able to write the following:

value closed matches {
    DV_CODED_TEXT[id4] matches {<value set with or without bindings>}
    DV_TEXT[id5] rm_type_final matches {<e.g. max 100 chars>}
} 

Those two keywords ‘closed’ and ‘rm_type_final’ achieve what we need:

  • no new alternative constraints e.g. in the template (or a child template) that could subvert the 2 constraints we have
  • no redefining of the DV_TEXT into DV_CODED_TEXT.

I think this gets us out of needed any DV_PLAIN_TEXT. In any case, the above AOM capabilities can be justified with respect to any OO RM, so I think we should add them. If we still want DV_PLAIN_TEXT later, that will be another matter.

Obviously how we make this ‘easy to understand’ on a tool UI is a question for tool designers.

SEC: see if you can break this analysis; would be good if we can get @borut.fabjan to have a look as well.

@thomas.beale I’m trying to follow this while debugging some code written by an idiot (named Seref according to git commit history) so I’ll just say that you’re still discussing the mechanics of the cure, without commenting on those said mechanics.

What I cannot see in Ian’s relayed message from @borut.fabjan is how a dv_coded_text constraint can be weakened.
if a field is declared dv_text, it won’t declare any terminology constraints. so the modeller should use a dv_coded_text, which cannot be implicitly overriden. Someone has to create a new archetype to weaken the term-set bindings of that. no sneaking up here.

if the field is dv_text, the modeller already gave up on the option of constraining its value set/term set.

This is why I’m confused. I cannot see the case the modeller is unable to protect their constraints via existing mechanisms.

1 Like

So the starting point is the very common pattern:

value matches {
    DV_CODED_TEXT[id4] matches {<something>}
    DV_TEXT[id5] matches {<something>}
}

which is intended to say:

  • EITHER match this terminology constraint
  • OR a plain text is ok (maybe limited in length or whatever)

The design intention of the DV_TEXT in this circumstance really is ‘just plain text here!!!’

And that’s the problem - you can always redefine a DV_TEXT in an archetype to be a DV_CODED_TEXT in a template or more specialised archetype, and that latter DV_CODED_TEXT can always subvert the constraint in a DV_CODED_TEXT node that you already have.

I did write a comment expecting this, then deleted it, because I expected Ian to write this.

The modeller, as I see it, is defining two options here. One of them is strictly controlled in terms of its terminology bindings, the other not.

As per my comment yesterday, the archetype path of this node, along with its name, implies that the semantics of both options are the same, but in this case, they are not. so they should not be alternatives, they should have their own paths.

Yes, someone can extend dv_text even then, and introduce a relaxed set for terminology values but that’s clearly using the wrong node in the model when there is a dedicated node to provide coded data.

The modeller can save a whole lot of others by making their concern explicit. There’ll be some inconvenience to live with but compared to complexity of the remedies we’re discussing here, I’d go for what I said above. Unless I’m missing something here.

ps: I’d assume it would not be too difficult to catch this tricky condition in the modelling tools and warn the modeller.

Not sure I understand this bit - the two alternative constraints have distinct paths:

  • .../value[id4]
  • .../value[id5]

You might be thinking in ADL1.4 terms, but in AOM2, every node has an id-code (it’s just that not all id-codes have terminology definitions). So in the final data you can always tell which node the instance data conformed to. Indeed, you could have a DV_CODED_TEXT in the data containing a term that doesn’t match the id4 constraint, and it would still be allowed to match the id5 constraint, since a DV_TEXT constraint node means ‘a DV_TEXT instance, or an instance of any RM sub-type is ok here’. Which is exactly the thing we are trying to prevent, which is why we need to do something special. So for now, I am proposing only additions to the AOM, nothing on the RM.

We humans know that, how to tell the tools that? :slight_smile:

Ok, I see what you mean now. I think my point re the misuse of [id5] still stands though, and it is not easy to enable that misuse either, unless you’re using some sort of form generator approach that automatically allows a UI widget etc to offer dv_coded_text based input for every dv_text node in the archetype.

I (currently) think, if we are to disallow this, some AOM level annotation approach would be the cleanest option, after having this discussion, but I’m very reluctant to make use of that anywhere but validation. Implementers can choose to do whatever they want of course, but there be dragons…

Anyway, this is all I can contribute today, hopefully I’ll go over this again, if the above comments don’t make sense, just ignore them on account of being made with less attention than the issue deserves.

1 Like

Ah but that is the key issue. In the UK allergies context I want to enforce the use of agreed terminologies but I also have t to accept that there are cases where only free text can be used (foreign drugs, experimental drugs). What I don’t want to be used is any other kind of terminology. It’s either use our agreed terminology constraints or free text - no other terminologies allowed.

and I’m happy with the AOM suggestion.

There is possibly some additional value beyond just the terminology issue. There are a few places where being able to prevent the use of codes in a DV_TEXT constraint i.e prevent sub-classing to DV_CODED_TEXT, would be valuable e.g where the purpose of the element is to carry pure narrative.

I’m being very rude, I know, but how about telling the software implementer not to allow coded entry to dv_text field and let validation take care of your terminology constraint for the other field?

It is not like models are thrown into a magic room where software comes from another room. these things are not meant to be and never turned into software fully automatically. I wish they could, we’d then all find new jobs and have normal lives…

You are being rude but in this case ‘English rude’ but not ‘Turkish/Scottish rude’ , which is a whole other thing :wink:

In the context of a single developer environment you are probably correct but it a world where we are trying establish the rules for a wide range of different downstream suppliers, the less we have to depend on written guidance, the better.

We definitely need to get @borut.fabjan into the discussion as this was a real concern for him which he raised in Braunschweig.

So I think my rm_type_final proposal, or something close to it, will do what you want, even in circumstances other than the DV_CODED_TEXT + DV_TEXT situation.

Well sometimes you do want to allow this - maybe with other RM types like PARTY_PROXY and subtypes, or classes in another RM entirely. We need to be able to signal either design intention to the tools.

1 Like

@thomas.beale proposal above seems to me a good one, and it might avoid introducing a new class as a patch (for the problem), and like @Seref mentioned, is a cleaner solution to the real problem.

1 Like

The not-yet-quite-expert-panelist @borut.fabjan does not seem to have access to this category. Could somebody grant that access to him? Does it matter whether he somewhat-formally agrees to his status of an expert? :slight_smile:

Not anymore. He can now see everything…

1 Like

For us the closed/rm_type_final part would be much easier to implement than a new data type - just add the changes in the AOM and the grammar, add two archetype validations and an RM object validation and this is fully supported. DV_PLAIN_TEXT would require changes to be made in any app that could potentially encounter the data type.
If any apps exist that allow for a DV_TEXT to be replaced with a DV_CODED_TEXT at data entry time, so without template or specialisation, that will require work - but those require work for a DV_PLAIN_TEXT type as well.

However, I still fail to see that this solves a real problem, and not just a theoretical one.
We could also add a warning to the archetype validation if someone tries to specialise a DV_TEXT in a DV_CODED_TEXT in the specific case that another DV_CODED_TEXT alternative is already available, explaining why doing this is generally a bad idea and why any recorded data will fail to validate against the parent archetype.

I did also think of this, and indeed, we could go as far as stating it as a rule. It seems sensible but I suppose someone might eventually come up with a reason why it should not apply in some other circumstances…