AOM 1.5 - single v multiple attributes

A discussion I think has occurred in the distant past, and it is now
more relevant than before, given the new interest in ADL/AOM from the CIMI
group.

Do we really need C_SINGLE_ATTRIBUTE and C_MULTIPLE_ATTRIBUTE
descendants of C_ATTRIBUTE in the AOM? In the reference compiler I have
always had just C_ATTRIBUTE with an is_multiple flag. This flag is set
at object creation time (or it could be later), but is independent of
whether the C_ATTRIBUTE has multiple 'children' (C_OBJECTs). This is
because even a C_ATTRIBUTE representing a single-valued attribute (like
OBSERVATION.protocol) can have multiple children, each representing an
alternative. Thus, almost all the logic to do with multiple children is
the same, regardless of the is_multiple flag.

No structural logic in my implementation at least is affected by the
is_multiple flag - i.e. everything to do with managing
C_ATTRIBUTE.children is the same, regardless of whether the C_ATTRIBUTE
object represents a constraint on a single-valued or a container RM
attribute.

Qualitatively, the reason for this is: regardless of the attribute
type (single-valued or container), the constraint on the child object/s
is always potentially a number of C_OBJECTs. The only difference is in
the actual data - can the attribute accommodate one or more child
objects? But in constraint-land, this distinction is nearly invisible.
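To make this concrete, here is a hypothetical Java sketch of the unified class (the class and method names are illustrative only, not any actual AOM implementation); note that the child-management logic is identical for both kinds of attribute:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for C_OBJECT
class CObject {
    final String rmTypeName;
    CObject(String rmTypeName) { this.rmTypeName = rmTypeName; }
}

// Hypothetical unified C_ATTRIBUTE: one class, with an isMultiple flag
// set at creation time; no C_SINGLE_ATTRIBUTE / C_MULTIPLE_ATTRIBUTE subtypes.
class CAttribute {
    private final String rmAttributeName;
    private final boolean isMultiple; // is the RM attribute a container?
    private final List<CObject> children = new ArrayList<>();

    CAttribute(String rmAttributeName, boolean isMultiple) {
        this.rmAttributeName = rmAttributeName;
        this.isMultiple = isMultiple;
    }

    // Identical for single-valued and container attributes: even a
    // single-valued attribute can carry several children, each an alternative.
    void addChild(CObject child) {
        children.add(child);
    }

    boolean isMultiple() { return isMultiple; }
    List<CObject> children() { return children; }
    String rmAttributeName() { return rmAttributeName; }
}
```

A single-valued attribute such as OBSERVATION.protocol can then hold two alternative structure constraints in exactly the same way a container attribute holds its members.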

So my proposal for AOM 1.5 is to reduce C_ATTRIBUTE / C_SINGLE_ATTRIBUTE
/ C_MULTIPLE_ATTRIBUTE to just C_ATTRIBUTE. Having implemented 90% of
ADL/AOM 1.5, this feels safer to me right now, because I know I can
state all of the class logic (e.g. notion of conformance of a C_OBJECT
node in a specialised archetype to the counterpart in its parent
archetype).

The impact should be limited to breaking some code in the Java and other
ADL/AOM 1.4 compilers when being upgraded to ADL/AOM 1.5; but it should
be a simplification (I say this because I have never implemented the two
C_ATTRIBUTE subtypes, not even in the 1.4 compiler, and there has never
been any pressing need to....).

In summary: I think it is more appropriate to do this simplification in
the AOM1.5 spec, because it means that the entirety of the spec is
backed by a real compiler, with fully implemented logic.

reactions?

- thomas

I just implemented this specification.
Agree!

Shinji

Thomas,

I agree with you.
This has always been annoying to deal with, and is a pretty much useless complication of things.
I believe it will break things in a couple of places, including in my code, but nothing that shouldn’t be found and fixed easily.

Sebastian


Hi Thomas,
It may not affect archetype compilers, but I think it will cause issues with instance validation. Off the top of my head, I think our OPT validator uses different logic depending on whether the attribute is single or multiple. For example, a single attribute is always assumed to be a choice, and a multiple attribute is always a sequence. I don’t see how we can know this if you drop the differentiation.

I’d be interested to know whether Shinji has updated his validator, and if he did this, whether it was a problem.

Heath


I don’t think there should be a problem, because C_ATTRIBUTE still has the following two properties:

  • is_multiple: Boolean – True if the attribute constrained by this C_ATTRIBUTE is a container
  • cardinality: CARDINALITY – non-Void if … (as above)

The child objects are always in an ordered list, it is just that the order is ignored in the case of C_ATTRIBUTEs for which is_multiple is False.
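A validator can therefore still recover the choice-vs-container distinction from these two properties alone. The following hypothetical Java sketch (not any particular validator’s actual code) shows the branching:

```java
// Hypothetical sketch: deriving the validation mode from
// C_ATTRIBUTE.is_multiple and CARDINALITY.is_ordered, without needing
// distinct C_SINGLE_ATTRIBUTE / C_MULTIPLE_ATTRIBUTE types.
class AttributeConstraint {
    final boolean isMultiple; // C_ATTRIBUTE.is_multiple
    final boolean isOrdered;  // CARDINALITY.is_ordered; ignored when !isMultiple

    AttributeConstraint(boolean isMultiple, boolean isOrdered) {
        this.isMultiple = isMultiple;
        this.isOrdered = isOrdered;
    }

    // Single-valued: data must match exactly one of the alternative children
    // (a choice). Container: each item must match some child; order matters
    // only if the cardinality says so.
    String validationMode() {
        if (!isMultiple) {
            return "choice";
        }
        return isOrdered ? "ordered-container" : "unordered-container";
    }
}
```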

- thomas

Hi Heath,

2011/12/20 Heath Frankel <heath.frankel@oceaninformatics.com>

Interested if Shinji updated his validator and he did this if it was no problem

It does not matter for me. I am just writing a validator and will do it with no problem.
For the Ruby implementation, it is easy to adapt to such refactoring.

Shinji.

Hi,

The only problem I see here is that the ADL parser won’t know when the is_multiple flag must be set to true. Yes, it could know it through the reference model, but I prefer to assume that it is not always available. Moreover, a person directly reading the ADL will not be able to distinguish between a single and a multiple attribute.

Some time ago I proposed a solution, which is just to add a reserved keyword in ADL to indicate that the attribute is a container, without having to mention or modify the cardinality part. Something like:

CLUSTER matches {
    items container matches {*}
}

David

2011/12/20 Thomas Beale <thomas.beale@oceaninformatics.com>

The Java Parser currently differentiates this via the presence of cardinality

if (cardinality == null) {
    attribute = new CSingleAttribute(path, name, existence, children);
} else {
    attribute = new CMultipleAttribute(path, name, existence, cardinality, children);
}

This is consistent with what Thomas says below, but probably not as straightforward as your approach of a distinct keyword.

I can only speak for myself, but I had a look through my code, and while a couple of minor changes will be required, I don’t think there would be a big problem without the differentiation.

To be honest, I cannot remember ever seeing the is_multiple flag, neither in the Java Reference Implementation nor the Specs at http://www.openehr.org/uml/release-1.0.1/Browsable/S.040.1433.36.147Report.html

Sebastian
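Under the proposed unification, the cardinality-presence test above would instead consult the reference model. A hypothetical sketch (the RmSchema and factory names are illustrative, not the actual Java reference implementation API):

```java
import java.util.HashSet;
import java.util.Set;

// Minimal stand-in for an RM schema that knows which attributes are containers
class RmSchema {
    private final Set<String> containerAttributes = new HashSet<>();

    RmSchema container(String qualifiedName) { // e.g. "CLUSTER.items"
        containerAttributes.add(qualifiedName);
        return this;
    }

    boolean isContainer(String rmType, String attribute) {
        return containerAttributes.contains(rmType + "." + attribute);
    }
}

// One class, one constructor: the flag comes from the RM, not from the
// accidental presence of a cardinality constraint in the archetype text.
class UnifiedCAttribute {
    final String name;
    final boolean isMultiple;

    UnifiedCAttribute(String name, boolean isMultiple) {
        this.name = name;
        this.isMultiple = isMultiple;
    }

    static UnifiedCAttribute fromRm(RmSchema rm, String rmType, String attrName) {
        return new UnifiedCAttribute(attrName, rm.isContainer(rmType, attrName));
    }
}
```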


2011/12/20 Sebastian Garde <sebastian.garde@oceaninformatics.com>

The Java Parser currently differentiates this via the presence of cardinality

if (cardinality == null) {
    attribute = new CSingleAttribute(path, name, existence, children);
} else {
    attribute = new CMultipleAttribute(path, name, existence, cardinality, children);
}

Yes, I know. The question appeared some time ago when thinking about the case where you don’t want to constrain the RM cardinality. In that case, the “cardinality” keyword will not be present in the ADL, and thus you cannot differentiate between a CSingleAttribute and a CMultipleAttribute (or between setting is_multiple to false or true).

Yes, this is exactly why I doubt that the above approach is sustainable, at least if we do not allow ‘bogus’ constraints.
So it seems we can either

  • not care at all if it is multiple or single (but then Heath’s concerns apply).

  • allow [and in fact in some scenarios mandate] bogus constraints

  • use a distinct keyword like the one suggested by you.

[Note that this seems to be orthogonal to the decision of whether this should then be expressed as C_SINGLE_ATTRIBUTE and C_MULTIPLE_ATTRIBUTE subclasses, respectively, or via a cardinality or is_multiple property.]

Sebastian

Hi David,

Hi,

The only problem I see here is that the ADL parser won’t know when the is_attribute must be set true. Yes, it could know it through the reference model, but I prefer to think that it is not always available.

that is the assumption we used to use with the previous parser, a few years ago. But in general now there is a reference model description available - why wouldn’t there be, in any realistic production context? This means superfluous cardinality constraints are no longer needed just to indicate that the attribute is a container. If for some reason the RM is not available in the tooling, then there is nothing to stop such extra cardinality constraints being added, as they used to be.

Moreover, a person directly reading the ADL will not distinguish between a single and a multiple attribute.

if they are directly reading the differential source, that’s true - the source only expresses constraints in addition to the RM. If they read the flattened view with RM included, they see everything. The following two screen shots are of the same top-level archetype - source form and flattened.

This is the RM-flattened version

Having the RM available (as you have in your tools as well) is not that hard, and it seems to me non-RM enabled tools are a thing of the past. So I guess I am still struggling to see the context in which it makes sense for working without an RM.

- thomas

The Java Parser currently differentiates this via the presence of cardinality

if (cardinality == null) {
    attribute = new CSingleAttribute(path, name, existence, children);
} else {
    attribute = new CMultipleAttribute(path, name, existence, cardinality, children);
}

This is consistent with what Thomas says below, but probably not as straight-forward as your approach of a distinct keyword.

I can only speak for myself, but I had a look through my code, and while a couple of minor changes will be required, I don’t think there would be a big problem without the differentiation.

To be honest, I cannot remember ever seeing the is_multiple flag, neither in the Java Reference Implementation nor the Specs at http://www.openehr.org/uml/release-1.0.1/Browsable/S.040.1433.36.147Report.html

I probably should clarify one thing: with no hint in the archetype as to whether a given attribute is single or multiple, the parser does have to use the RM. This added a few lines of code to my parser. It seems a small price to pay for ‘clean’ archetypes, and as a bonus, the parser now detects many other errors earlier.

- thomas

c_attr_head: V_ATTRIBUTE_IDENTIFIER c_existence c_cardinality
    {
        rm_attribute_name := $1
        if not object_nodes.item.has_attribute (rm_attribute_name) then
            if rm_schema.has_property (object_nodes.item.rm_type_name, rm_attribute_name) then
                bmm_prop_def := rm_schema.property_definition (object_nodes.item.rm_type_name, rm_attribute_name)
                if bmm_prop_def.is_container then
                    create attr_node.make_multiple (rm_attribute_name, $2, $3)
                    c_attrs.put (attr_node)
                    object_nodes.item.put_attribute (attr_node)
                elseif $3 = Void then
                    create attr_node.make_single (rm_attribute_name, $2)
                    c_attrs.put (attr_node)
                    object_nodes.item.put_attribute (attr_node)
                else -- error: cardinality stated, but on a non-container attribute
                    abort_with_error ("VSAM2", <<rm_attribute_name>>)
                end
            else
                abort_with_error ("VCARM", <<rm_attribute_name, object_nodes.item.path, object_nodes.item.rm_type_name>>)
            end
        else
            abort_with_error ("VCATU", <<rm_attribute_name>>)
        end
    }
    | absolute_path c_existence c_cardinality
    {
        ..
    }
    ;

Thomas,
My issue here is needing to know if attribute children are a choice or a sequence. It is not true that children are always ordered; this depends on the cardinality is_ordered flag. If this defaults to false when cardinality is null, then perhaps it will be OK.
My concern is the implicit rules necessary to interpret the resulting model. Currently it is explicit via the type, and the AOM XSD uses this, allowing no special processing.
If we had a grouping construct allowing choice vs sequence, as per XSD, then we would have an explicit constraint without the need for C_ATTRIBUTE subtypes.
My greatest concern is the lack of backward compatibility; this is a breaking change.
If this was a v2 proposal I would not be raising this, but for me a 1.5 release should be backward compatible with 1.4.

Heath

Hi Thomas,

I think we have already discussed this several times :-)

Moreover, a person directly reading the ADL will not distinguish between a single and a multiple attribute.

if they are directly reading the differential source, that’s true - the source only expresses constraints in addition to the RM. If they read the flattened view with RM included, they see everything. The following two screen shots are of the same top-level archetype - source form and flattened.

I’m thinking in the first case, reading a differential ADL without specific tools, just a plain text editor.

This is the RM-flattened version

Having the RM available (as you have in your tools as well) is not that hard, and it seems to me non-RM enabled tools are a thing of the past. So I guess I am still struggling to see the context in which it makes sense for working without an RM.

We are absolutely in favour of having the RM available for advanced tooling, but we cannot forget the case when it is not possible. Not all systems can be aware of all possible models, but at least they should be able to work at a pure archetype level. In other words, I’m in favour of having the most basic tools (ADL parser, ADL viewer) be capable of working with “standalone” archetypes. ADL parsers should work solely at the syntax level and not depend on other semantics.

I don’t know about Eiffel, but seeing your code, what happens if you try to parse an ADL with a misspelled attribute name? If you cannot retrieve it from the RM, you cannot finish the parsing to correct it directly in the tool.

Or if we go further, what happens if you try to parse an ADL for a non-available RM? For example, a CDA archetype (the next example is not completely accurate, since it still maintains the cardinality information):

POCD_MT000040_Person[at0079] occurrences matches {0..1} matches {  -- POCD_MT000040_Person
    name existence matches {0..1} cardinality matches {0..*; unordered; unique} matches {
        PN[at0081] occurrences matches {0..1} matches {  -- Person Name
            family existence matches {0..1} cardinality matches {0..2; ordered; unique} matches {
                En_family[at0083] occurrences matches {0..1} matches {  -- First surname

Hi Thomas,

I think we have already discussed this several times :-)

yep - I’d like to find a conclusion!

Moreover, a person directly reading the ADL will not distinguish between a single and a multiple attribute.

if they are directly reading the differential source, that’s true - the source only expresses constraints in addition to the RM. If they read the flattened view with RM included, they see everything. The following two screen shots are of the same top-level archetype - source form and flattened.

I’m thinking in the first case, reading a differential ADL without specific tools, just a plain text editor.

I wonder how many people will really do this?

Having the RM available (as you have in your tools as well) is not that hard, and it seems to me non-RM enabled tools are a thing of the past. So I guess I am still struggling to see the context in which it makes sense for working without an RM.

We are absolutely in favour of having the RM available for advanced tooling, but we cannot forget the case when it is not possible. Not all systems can be aware of all possible models, but at least they should be able to work at a pure archetype level.

Hm… this is something I find a bit strange - what use can an archetype be with no access at all to its reference model? Well, it’s true that in a design context like CKM, it could be useful to view the archetypes to look at their clinical design, but I find it hard to believe that the kind of people who would do that - clinical people, surely - would want to look at raw ADL, especially differential ADL. I think they are more likely to look at the mindmap or HTML, and all that is generated.

In other words, I’m in favour of having the most basic tools (ADL parser, ADL viewer) that are capable of working with “standalone” archetypes. ADL parsers should work solely at the syntax level and not depend on other semantics.

I have to admit that was my view some years ago. Then I got sick of a) not being able to properly validate anything - e.g. does OBSERVATION have a ‘data’ attribute or not? and b) having to include spurious constraints that didn’t say anything, except to signal attribute multiplicity.

I don’t know about Eiffel, but seeing your code, what happens if you try to parse an ADL with a misspell of an attribute name? If you cannot retrieve it from the RM, you cannot finish the parsing to correct it directly in the tool.

you get the relevant error, then you go edit the archetype at the line number indicated, and fix the attribute name. The RM validation is in the phase 2 validator these days. Multi-phase validation is certainly needed in order to allow the author to fix RM errors in an otherwise correct archetype.

Or if we go further, what happens if you try to parse an ADL for a non-available RM? For example, a CDA archetype (the next example is not completely accurate, since it still maintains the cardinality information):

in the current tool, you can’t. I could probably re-instate a mode like that, but … I am not sure I see the point. Let’s say I have the archetype below, my first instinct is to want to know what is in the reference model, such as in the following view (using an openEHR archetype):

what can I do with this CDA archetype that is (by definition) a set of partial constraints on a model I don’t have access to?

- thomas

Well, I have been able to define a complete reference model (MedXML
MML) by defining archetypes for all of the concepts defined in the
model documentation. So I have no reference model to check things
against in the first place.
For LinkEHR, the reference models are (a set of) archetypes. You can
define new archetypes through specialization.

So at least I need to be able to parse archetypes without a reference model :-)

Now this is getting interesting :-)

Moreover, a person directly reading the ADL will not distinguish between a single and a multiple attribute.

if they are directly reading the differential source, that’s true - the source only expresses constraints in addition to the RM. If they read the flattened view with RM included, they see everything. The following two screen shots are of the same top-level archetype - source form and flattened.

I’m thinking in the first case, reading a differential ADL without specific tools, just a plain text editor.

I wonder how many people will really do this?

I don’t know, but it is not our role to decide what people can or cannot do, or to oblige them to use specific tools. What I miss here is a bit of coherence. If we assume that archetypes are always edited with specific tools, why are we bothering about whether ADL is readable or not? Why do we use structures like domain types in the ADL to ease reading? The tool could hide all these things…

Having the RM available (as you have in your tools as well) is not that hard, and it seems to me non-RM enabled tools are a thing of the past. So I guess I am still struggling to see the context in which it makes sense for working without an RM.

We are absolutely in favour of having the RM available for advanced tooling, but we cannot forget the case when it is not possible. Not all systems can be aware of all possible models, but at least they should be able to work at a pure archetype level.

Hm… this is something I find a bit strange - what use can an archetype be with no access at all to its reference model? Well, it’s true that in a design context like CKM, it could be useful to view the archetypes to look at their clinical design, but I find it hard to believe that the kind of people who would do that - clinical people, surely - would want to look at raw ADL, especially differential ADL. I think they are more likely to look at the mindmap or HTML, and all that is generated.

All that stuff can mostly be generated without knowing anything about the RM. So why require it at the previous step (during parsing)?

In other words, I’m in favour of having the most basic tools (ADL parser, ADL viewer) that are capable of working with “standalone” archetypes. ADL parsers should work solely at the syntax level and not depend on other semantics.

I have to admit that was my view some years ago. Then I got sick of a) not being able to properly validate anything - e.g. does OBSERVATION have a ‘data’ attribute or not? and b) having to include spurious constraints that didn’t say anything, except to signal attribute multiplicity.

Again, here you are mixing two different processes. One thing is the archetype parsing and validation, following the AOM rules and ADL syntax. The archetype should be parsed and syntactically validated to instantiate the AOM. Then, a different thing is to validate it against a RM in a second phase. In fact this is not different from the typical steps of any compiler.

I don’t know about Eiffel, but seeing your code, what happens if you try to parse an ADL with a misspelled attribute name? If you cannot retrieve it from the RM, you cannot finish the parsing to correct it directly in the tool.

you get the relevant error, then you go edit the archetype at the line number indicated, and fix the attribute name. The RM validation is in the phase 2 validator these days. Multi-phase validation is certainly needed in order to allow the author to fix RM errors in an otherwise correct archetype.

And is it not better to do this correction inside the tool, in a guided environment? Here, absolutely yes: guided by a reference model.

David

I don’t know, but it is not our role to decide what people can or cannot do, or to oblige them to use specific tools. What I miss here is a bit of coherence. If we assume that archetypes are always edited with specific tools, why are we bothering about whether ADL is readable or not? Why do we use structures like domain types in the ADL to ease reading? The tool could hide all these things…

well I don’t think we are obliging anyone to use specific tools. In my view, the reason for ADL being human readable is to enable a relatively small number of people - generally not end users - to understand the semantics of the formalism, in the same way that certain people understand OWL by using/studying OWL-abstract resources. So I think ADL source is useful for:

  • learning / teaching / self-education - typically by some software developers, educators, standards people
  • understanding archetypes in debugging / testing - knowing what the archetype really says, in case a tool seems to be lying to you.

These people are usually doing something specialised, and understand the formalism properly (or are learning it). I don’t think many end users are going to read ADL, other than educators/software developers/theorists. I don’t think this requirement means that the archetype has to stand alone as a modelling construct. I say this mainly because the very definition of an archetype is about constraining an information model, so I am not sure what it means to have one with no access to the information model.

For example the node

ELEMENT [at0004] -- systolic pressure

means… what? If you don’t know what an ELEMENT is? So while ADL is designed to be readable, that alone doesn’t tell you what it means if you don’t know what the RM references (class names & attribute names) mean…

Having the RM available (as you have in your tools as well) is not that hard, and it seems to me non-RM enabled tools are a thing of the past. So I guess I am still struggling to see the context in which it makes sense for working without an RM.

We are absolutely in favour of having the RM available for advanced tooling, but we cannot forget the case when it is not possible. Not all systems can be aware of all possible models, but at least they should be able to work at a pure archetype level.

Hm… this is something I find a bit strange - what use can an archetype be with no access at all to its reference model? Well, it’s true that in a design context like CKM, it could be useful to view the archetypes to look at their clinical design, but I find it hard to believe that the kind of people who would do that - clinical people, surely - would want to look at raw ADL, especially differential ADL. I think they are more likely to look at the mindmap or HTML, and all that is generated.

All that stuff can mostly be generated without knowing anything about the RM. So why require it at the previous step (during parsing)?

some of those views can be generated, but only if the archetype has already been validated, including against the RM. You can’t generate any view that includes RM elements in it though, and clinicians have been screaming to have that in the modelling tools and CKM. It is now (sort of) in CKM. The reason they need it is because they want a ‘total view’ of the data. If they can’t see RM elements, they start to think those items have not been included.

In the end, archetypes and templates define an object structure + variants; but you can only see the full object structure if you have the RM there as well. Some archetypes are very small, and define very few constraints; on their own, I find it hard to understand what use they are.

In other words, I’m in favour of having the most basic tools (ADL parser, ADL viewer) that are capable of working with “standalone” archetypes. ADL parsers should work solely at the syntax level and not depend on other semantics.

I have to admit that was my view some years ago. Then I got sick of a) not being able to properly validate anything - e.g. does OBSERVATION have a ‘data’ attribute or not? and b) having to include spurious constraints that didn’t say anything, except to signal attribute multiplicity.

Again, here you are mixing two different processes. One thing is the archetype parsing and validation, following the AOM rules and ADL syntax. The archetype should be parsed and syntactically validated to instantiate the AOM. Then, a different thing is to validate it against a RM in a second phase. In fact this is not different from the typical steps of any compiler.

sure - that’s how the reference compiler works as well. But if we want to do early compiler stages with no RM available, the cost is adding spurious non-constraints - extra syntax. So let’s say we do that, and now we get to phase 3 validation (or whatever it is in each tool), and there is a mismatch between that syntax marker for multiple-valued attributes and what the RM says. Now we have a new source of errors we did not have before.

So the question is (in my mind at least): is it worth having that, so we can perform some basic parsing and partial validation of an archetype? Consider also:

  • the question of how the syntax marker got there in the first place: presumably with an editor. All editors of the future will obviously be RM-aware, because every user of the current tools that are not RM-aware complains about it (and rightly so). So if we assume editors are RM-aware, why would we assume compilers are not RM-aware?

  • I think it is also reasonable to assume that there will only be a small number of good quality compilers in the end, so here again, it seems hard to see why they would not all be RM-enabled.

  • If you agree that the RM is likely to be present in a compiler, even for say stage 3 or 4 validation, then it is available… and it can be used at any point. Why not use it earlier on to detect some basic errors as well?

A marker for an attribute being multiple or single is not the only thing you have to have in archetypes to correctly express attributes. You also have to have [at-codes] on child objects of container attributes. But if there is only one child under a particular attribute in a particular archetype, then it still has to be marked with an at-code, whereas under a single-valued attribute it doesn’t. So what happens if the attribute is marked ‘container’ but the child object(s) do not have at-codes? The only way to deal with that properly is with the RM, because you have to determine what kind of attribute it really is before you can say whether the at-codes have to be there or not. So yes, you could ignore it at that stage in parsing and generate AOM objects, but they are likely to be wrong AOM objects.

These are some of the reasons I find it hard to see much value in a specific marker for container attributes, or doing much parsing of archetypes with no RM present.

I would be interested to know what some others think (I know what I think, and what the UPV group thinks;-)

- thomas

Thomas,
My issue here is needing to know if attribute children are a choice or
a sequence. It is not true that children are always ordered; this
depends on the cardinality is_ordered flag. If this defaults to false
when cardinality is null, then perhaps it will be OK.

what I meant was that in physical representation the children are always
ordered, so the question of whether the order is significant in validity
then requires looking at the Cardinality is_ordered flag. I must admit I
don't know what the UML default for that is, and probably we should
state the same default (one would guess 'not ordered'?).

My concern is the implicit rules necessary to interpret the resulting
model. Currently it is explicit via the type and the AOM xsd uses this
allowing no special processing.

it is still explicit, just via attributes rather than the type.

If we had a grouping construct allowing choice vs sequence as per xsd
then we would have an explicit constraint without the need for
c_attribute sub types.

well this is a different problem, which a grouping construct would
address, allowing choice subgroups within the children of multiple attributes.

My greatest concern is the lack of backward compatibility, this is a
breaking change.

it is a small breaking change yes, but from what I can see the breakages
are small, and the gain in clarity is worthwhile. From your point of
view, where is the main impact, and is it too great to deal with?

- thomas

ok, but the reference model is there - you have just chosen archetypes
to represent it (we did think about that in the past as well). It's a
design decision, but it doesn't mean you have no RM present.

- thomas