AOM 1.5 - single v multiple attributes

Hi Tom,
Since you’ve asked what others think, I’ll take the liberty of sharing my thoughts.

Initially I was not happy about the dependence on the existence of a computable RM just to be able to parse ADL. For me, it meant either waiting for Rong to handle it in the parser, or finding a way to work around this requirement (I am lazy).

However, the need for a computable RM is not specific to ADL parsing. Unless you want to hand-code a large amount of functionality, you need to deal with this at several points. For example, when serialized RM data arrives over the wire (XML, JSON, YAML, Protocol Buffers, you name it), you may have to handle assignment of a concrete class instance to an abstract-class-typed field (my usual example). You have to know the relationship of RM types to do that. You can hand-code it, or you can use a computable model.
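A hedged sketch of that deserialisation case: `DataValue`/`DvQuantity` follow openEHR RM naming, but the registry API itself is invented for illustration; a real system would generate the registrations from a computable RM schema rather than hand-code them.

```java
import java.util.HashMap;
import java.util.Map;

// Abstract RM type and one concrete subtype (names follow openEHR RM).
abstract class DataValue {}

class DvQuantity extends DataValue {
    double magnitude;
    DvQuantity(double magnitude) { this.magnitude = magnitude; }
}

// Hypothetical registry standing in for a computable RM: it maps the type
// name found in the wire format to a concrete class.
class RmTypeRegistry {
    private final Map<String, Class<? extends DataValue>> types = new HashMap<>();

    void register(String rmTypeName, Class<? extends DataValue> type) {
        types.put(rmTypeName, type);
    }

    // Resolve the concrete class named in the serialized data (e.g.
    // "DV_QUANTITY") so an instance can be assigned to an abstract-typed field.
    Class<? extends DataValue> resolve(String rmTypeName) {
        Class<? extends DataValue> c = types.get(rmTypeName);
        if (c == null)
            throw new IllegalArgumentException("unknown RM type: " + rmTypeName);
        return c;
    }
}
```

The hand-coded alternative is a switch over type names in every deserialiser; the registry centralises the type knowledge in one place.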

In an archetype/template editor, you may like to introduce some smart features (such as auto-complete, or suggestions), or you may simply need to disable various options based on the context. For example, a container node in an archetype will have a type, and only that type or its subtypes (for specialization of that node in a t-archetype) should be allowed in the tool. Again, the tool will need to process the RM type info. You can again hand-code it, or use a computable model.
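A minimal sketch of that subtype filtering, using Java reflection as a stand-in for a computable RM type hierarchy; the class names loosely follow openEHR RM naming, and the `TypeFilter` helper is invented for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy hierarchy standing in for RM types.
abstract class Item {}
class Element extends Item {}
class Cluster extends Item {}

class TypeFilter {
    // Return only the declared type and its subtypes -- the set an editor
    // should offer when specializing a node of that declared type.
    static List<Class<?>> allowedFor(Class<?> declared, List<Class<?>> candidates) {
        return candidates.stream()
                .filter(declared::isAssignableFrom)
                .collect(Collectors.toList());
    }
}
```

With a generated RM model in place of reflection, the same filter drives both auto-complete suggestions and the disabling of invalid options.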

As implementers discover other use cases, they will see that the RM must be computable. I don’t think hand-coding RM-related functionality is something that can be managed. The AOM is one of the things that makes openEHR better than the alternatives, and having a different approach to one level of two-level modelling really does not make sense; both levels should be computable, and the use cases require it anyway.

So: if I’m going to have to process the RM at more than one point, it is not such a big deal if I have to do it in the parser too. I know that one way or another this functionality is going to be necessary, so why try to avoid it at one point when you know you’ll have to deal with it somewhere anyway?

regards
Seref

2011/12/21 Thomas Beale <thomas.beale@oceaninformatics.com>

I would be interested to know what some others think (I know what I think, and what the UPV group thinks;-)

Yes, I think that is necessary :-)

Let me just make two final comments:

Regarding the proposal of a single object for attributes, which was the original topic of the thread, I just want to clarify that we completely agree. It is a much better design.

Regarding the other discussion, I’ll try to describe it in simple words to clarify it. We have a model (the AOM) and a serialization of that model (ADL). The question is whether the transition between them can or should be done independently, or whether we need to put the RM in the middle to help in the process.

David

actually it does mean that: I am parsing the ADL without a reference
model (currently I don't need to know anything about it). Then when it is
loaded, I have my reference model. We decided to use a format to
represent reference models that could take advantage of current tools
(parsers, serializers...), instead of defining our own format :-)

See the other bit of this thread: the reason (I think) you think we don’t need any RM is because you do in fact have it, just cunningly expressed as archetypes. But it is there.

- thomas

See the other bit of this thread: the reason (I think) you think we don’t need any RM is because you do in fact have it, just cunningly expressed as archetypes. But it is there.

This is what I (more or less) did in Opereffa, but it bit me a little bit further down the road :-)

See the other bit of this thread: the reason (I think) you think we don’t need any RM is because you do in fact have it, just cunningly expressed as archetypes. But it is there.

Yes, but not for the parsing.

Thomas,
I think the point that David is making is that we probably want some separation of concerns when it comes to ADL (or AOM in general) parsing. A parsing operation to generate the AOM and a separate RM validation operation allow better reuse:
I can use your RM validator after deserialising an XML archetype, and David can use the Java ADL parser to apply his AOM-based RM validator.
The benefits of loosely coupled components.
Heath

I am not so sure that this is true, and the same goes for the current Java parser, I believe:
The RM is not required for parsing, just for subsequent validation.
I think that this separation of concerns makes a lot of sense and has served us well so far.
Enforcing tighter coupling here for this (in the grand scheme of things) minor issue feels wrong to me.
Likely most of us agree that we can do great things when we have a computable RM - but I would argue that the real value of this comes well after parsing.

Sebastian

Hi!

For LinkEHR, the reference models are (a set of) archetypes. You can
define new archetypes through specialization.

The ability to express the RM using the AM can provide pretty elegant
(recursive?) solutions (like the one under the hood in LinkEHR-Ed I
believe), so whatever we change in the AM it would be nice if this
ability is kept somehow. But I guess this is not changed by having an
is_multiple attribute on one class instead of having three separate
classes, right?

Also being able to parse archetypes into AOM object-trees as a first
step irrespective of what RM was used is valuable, but does the
suggested change really change that possibility? Archetype format
conversion utilities doing ADL<->JSON<->YAML is an example. (And yes
of course an RM of some kind is necessary to make full use of the AOM
in many other cases.)

Best regards,
Erik Sundvall
erik.sundvall@liu.se http://www.imt.liu.se/~erisu/ Tel: +46-13-286733

I don't think there is any problem with having the attribute instead of 3
classes. It would be even simpler and clearer to test things
(instanceof vs checking whether an attribute is true or not). The only
problem I see is what is needed to set a value for that attribute from
ADL :-)

it's a great idea, and we thought seriously about it for some time. But
... ADL doesn't represent everything that is needed in a proper object
model... including generic types, open/closed generic parameters,
multiple inheritance and a few other things.

- thomas

Hi!


For LinkEHR, the reference models are (a set of) archetypes. You can
define new archetypes through specialization.

The ability to express the RM using the AM can provide pretty elegant
(recursive?) solutions (like the one under the hood in LinkEHR-Ed I
believe), so whatever we change in the AM it would be nice if this
ability is kept somehow. But I guess this is not changed by having an
is_multiple attribute on one class instead of having three separate
classes, right?

see other post - ADL/AOM don’t do quite a few things that a normal object meta-model does (which is what you need if you want to express a full OO IM), including multiple inheritance; everything relating to generic types; specific OO rules of redefinition; abstract v concrete classes; computed and functional properties… and so on.

Also being able to parse archetypes into AOM object-trees as a first
step irrespective of what RM was used is valuable, but does the
suggested change really change that possibility? Archetype format
conversion utilities doing ADL<->JSON<->YAML is an example. (And yes
of course an RM of some kind is necessary to make full use of the AOM
in many other cases.)

no, I think we agree on the suggested change (reduce the complexity of the AOM to the following). The thing I am having trouble with is that if you parse with no RM present, you need to add some kind of marker to distinguish single-valued v multiple-valued (container) classes. I can’t see the post right now, but David proposed something like a ‘container’ keyword.

- thomas

Thomas,
I think the point that David is making is that we probably want some separation of concerns when it comes to ADL (or AOM in general) parsing. A parsing operation to generate the AOM and a separate RM validation operation allow better reuse:
I can use your RM validator after deserialising an XML archetype, and David can use the Java ADL parser to apply his AOM-based RM validator.
The benefits of loosely coupled components.
Heath

But you can do that right now. If you were to use the reference compiler components, you could easily use:

  • parser*: syntax validation, some basic RM checking, generate AOM objects
  • phase 1 validator*: validate languages, slots, slot-fillers, basic ontology validation
  • phase 2 validator*: (validation requiring flat parent): validate constraint structure, main RM validation, further ontology validation
  • flattener: flatten child archetype onto parent
  • phase 3 validator: (validation on flattened archetype): VACMC validation

The starred components use the RM to do some work.

This compiler is not particularly an example of great design; it just implements the specification in a rigorous way so others can test against it. But you could deploy any of the above phases separately. If I was to supply any of them to you as a software component, I would include the RM schema component as well.
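The staged design above can be sketched as independently deployable passes. The stage names mirror the list; the `Pipeline` and `Archetype` classes here are invented for illustration and are not the reference compiler's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Stand-in for the archetype being compiled; records which stages ran.
class Archetype {
    final List<String> trace = new ArrayList<>();
}

// Each stage is a loosely coupled component; any subset can be used alone,
// which is exactly what supplying individual phases to another tool means.
class Pipeline {
    private final List<String> names = new ArrayList<>();
    private final List<UnaryOperator<Archetype>> ops = new ArrayList<>();

    Pipeline stage(String name, UnaryOperator<Archetype> op) {
        names.add(name);
        ops.add(op);
        return this;
    }

    Archetype run(Archetype a) {
        for (int i = 0; i < ops.size(); i++) {
            a.trace.add(names.get(i));   // record the stage, then apply it
            a = ops.get(i).apply(a);
        }
        return a;
    }
}
```

The point of the design is that a consumer can take, say, only the parser and phase 1 validator, and supply their own later phases.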

A version of the parser could be made with no access to the RM, but then you are back to creating possibly broken AOM objects. I still think we need to address the following problems:

sure - that’s how the reference compiler works as well. But if we want to do early compiler stages with no RM available, the cost is adding spurious non-constraints - extra syntax. So let’s say we do that: now we get to phase 3 validation (or whatever it is in each tool), and we get a mismatch between the syntax marker for multiple-valued attributes and what the RM says. Now we have a new source of errors we did not have before.

So the question is (in my mind at least): is it worth having that, so we can perform some basic parsing and partial validation of an archetype? Consider also:

  • the question of how the syntax marker got there in the first place: presumably with an editor. All editors of the future will obviously be RM-aware, because every user of the current tools that are not RM-aware complains about it (and rightly so). So if we assume editors are RM-aware, why would we assume compilers are not?

  • I think it is also reasonable to assume that there will only be a small number of good quality compilers in the end, so here again, it seems hard to see why they would not all be RM-enabled.

  • If you agree that the RM is likely to be present in a compiler, even for say stage 3 or 4 validation, then it is available… and it can be used at any point. Why not use it earlier on to detect some basic errors as well?

A marker for an attribute being multiple or single is not the only thing you have to have in archetypes to correctly express attributes. You also have to have [at-codes] on child objects of container attributes. But if there is only one child under a particular attribute in a particular archetype, then it still has to be marked with an at-code, whereas under a single-valued attribute it doesn’t. So what happens if the attribute is marked ‘container’ but the child object(s) do not have at-codes? The only way to deal with that properly is with the RM, because you have to determine what kind of attribute it really is before you can say whether the at-codes have to be there or not. So yes, you could ignore it at that stage in parsing and generate AOM objects, but they are likely to be wrong AOM objects.
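The at-code rule above can be sketched as a small check. `CObject`/`nodeId` loosely follow AOM naming; the validator itself and the `rmSaysMultiple` flag (standing in for a real RM query) are illustrative only.

```java
import java.util.List;

// Minimal stand-in for an AOM C_OBJECT: nodeId is the at-code, may be null.
class CObject {
    String nodeId;
    CObject(String nodeId) { this.nodeId = nodeId; }
}

class AtCodeValidator {
    // rmSaysMultiple must come from the reference model, not from the
    // archetype text: only the RM can say what kind of attribute it really is.
    static boolean childrenValid(boolean rmSaysMultiple, List<CObject> children) {
        if (!rmSaysMultiple)
            return true;                 // single-valued: at-codes not required
        for (CObject c : children)
            if (c.nodeId == null)
                return false;            // container child missing its at-code
        return true;
    }
}
```

Without the RM input, a parser seeing one un-coded child cannot tell a legal single-valued attribute from a broken container attribute.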

These are some of the reasons I find it hard to see much value in a specific marker for container attributes, or doing much parsing of archetypes with no RM present.

- thomas

Yes, the problem is that is_multiple is not an attribute but a property derived from the existence of cardinality. It is my understanding that cardinality will become optional in AOM 1.5, due to the assumed knowledge of an RM or parent archetype. Now, this is necessary to support specialisation and templates, but it will cause ambiguity when parsing base models. From an XML perspective I would prefer a type over an attribute, but ADL doesn’t have the same means of making it explicit.

Heath
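Heath's point can be sketched with simplified stand-ins for the AOM's C_ATTRIBUTE and CARDINALITY classes (fields reduced to the minimum for illustration):

```java
// Simplified stand-in for AOM CARDINALITY.
class Cardinality {
    int lower;
    Integer upper;   // null means unbounded, i.e. "*"
}

// Simplified stand-in for AOM C_ATTRIBUTE.
class CAttribute {
    Cardinality cardinality;   // becomes optional in AOM 1.5 drafts

    // Derived, not stored: once cardinality is optional, a parser with no RM
    // can no longer compute this reliably -- the ambiguity under discussion.
    boolean isMultiple() {
        return cardinality != null;
    }
}
```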

I still don’t think all the questions I raised are yet addressed, but in any case, I hope we can agree that the current draft of the AOM constraint model is ok? The debate we are having here is whether we add something to ADL (and by implication, all other serialised forms) that informs the parser of whether an attribute node is single- or multiple-valued (i.e. something that can be directly used to set the C_ATTRIBUTE.is_multiple flag).

If we agree on this, then I will proceed with the ongoing work on the AOM. If the community direction is that it wants an extra flag in ADL, then we can deal with that separately.

Seem reasonable?

- thomas

all,

I want to get back to the issue of whether we mark (in ADL) multiple-valued attributes, and solve it. As I said in a post aeons ago (i.e. before Christmas), I was not convinced by the arguments in favour of needing to be able to parse ADL archetypes in standalone mode, i.e. with no RM present. But I could be wrong, or not understanding the arguments. In addition, I have found an argument that I think persuades me (but might not persuade others!)… anyway, here are the arguments FOR adding an indicator that I think I have heard so far, with my responses:

  • the parser (at least the initial stage that generates an AOM structure) will be simpler with no RM required
    • TB: this is clearly true, but does it matter? The added complexity is pretty low, at least in the reference parser.
  • we should be able to generate an AOM structure even if the RM details of the model are wrong
    • TB: I am struggling to see the utility of this… what’s the use of an AOM structure that might contain basic errors? Assuming one of the following stages of the parser does do the full RM check (mine does it in phase 2 validation), then the question is: between the point of having generated an AOM, but before having done the full RM check, what useful thing could you do with the AOM structure? One obvious answer is: display it in some way… but why? Maybe there are reasons that make sense, I just can’t see them yet.
  • we should be able to parse an archetype if there is no representation of the RM available
    • TB: this implies that you could do stage 1 parsing, i.e. generate a raw AOM structure… but what would you do with it? Again, the most obvious answer is probably to display it. Now I can see some utility here, since the basic node visualisation (like AWB and LinkEHR do) is easier to understand than raw ADL, especially for large archetypes, and it would mean that a user could get their mind around some new kind of archetype in a basic way, even with no RM availability at all.

I think that was it. I still struggle with solid reasons for point #2 above… until just today, when I was looking at the structure of the compiler for parsing generated ADL flat files, e.g. flattened archetypes (.adlf files from the AWB) and operational archetypes (.opt files).

Most people seem to want an XML format for OPTs (that’s what all Ocean’s software uses, and I believe most other implementers have done the same). However, some people (at least Rong and Seref) have voiced interest in an ADL representation of OPT. This is already available in the AWB (there are test EHR Extract archetypes for it).

Now, if this ADL-format OPT was saved to a file and parsed later on, one thing we know is that it must be 100% correct, assuming no software bugs in the ADL-OPT serialiser. So processing it with the full heavy-weight ADL compiler is unnecessary; it could be parsed to AOM structures (again, as long as the parser has no bugs, this AOM tree will be correct) and there is likewise no need for RM checking. This ‘guaranteed-correct’ AOM structure could be useful for all kinds of things, and it is obviously nice to be able to generate it as fast as possible.

Thoughts on this?

- thomas

Hi Thomas,

All the arguments mentioned are of course of relative importance. Your answers to them (with which I could easily agree) are from an end-user perspective, but not from a coherent specification-design perspective. You say that you can’t imagine the utility of parsing without the RM, and the answer is that it shouldn’t matter. Maybe someone designs a web-based editor where the ADL parser runs on the client side while the RM remains on the server. I don’t know and I don’t mind. The RM-aware ADL parser can be part of a technical implementation specification or best practices for implementers, but you cannot bring that requirement into the model specifications.

The really important point to be solved is that if ADL is the serialized representation of the AOM, it must be able to represent all of its properties (and the way back). If a marker for is_multiple is not added to ADL, it will be the only property of the AOM that cannot be instantiated from standalone ADL. In other words, ADL will not be a correct specification for its theoretical scope. That is clearly an exception (if not an error) and a source of future limitations.

David

2012/1/9 Thomas Beale <thomas.beale@oceaninformatics.com>

Thomas Beale wrote:

  • we should be able to generate an AOM structure even if the RM details of the model are wrong
    • TB: I am struggling to see the utility of this... what's the use of an AOM structure that might contain basic errors? Assuming one of the following stages of the parser does do the full RM check (mine does it in phase 2 validation) then the question is: between the point of having generated an AOM, but before having done the full RM check, what useful thing could you do with the AOM structure? One obvious answer is: display it in some way... but why? Maybe there are reasons that make sense, I just can't see them yet.

It would be useful to be able to display an AOM structure that is invalid with respect to the RM. For example, an archetype editing tool that validates against the RM during parsing would presumably be incapable of loading the archetype and would throw an incomprehensible parser error at the user. Not nice. What would be much better would be if the editor could display the AOM structure, and then highlight the RM validation errors in a way that made it easy for the user to find and correct them.

- Peter

I am warming up to the idea;-)

Practical question: what ADL syntax addition should be made to indicate multiple-valued attributes? It’s not a constraint, so it’s not part of the ‘matches {}’ syntax. I guess something like:

container_attr is_multiple matches {

}

or

container_attr is_container matches {

}

Questions:

  • is it required if there happens to be a cardinality constraint on the same attribute (which also gives away that it is a container)?
    • having “container_attr is_multiple cardinality matches {2..*} matches {” seems clumsy

  • if a compiler uses the RM even in the parser stage, does it ignore this flag, or report errors when the flag doesn’t match the RM definition?

  • presumably the flag would be obligatory on serialisation; then the first question applies also.

- thomas

Thomas Beale wrote:

  • is it required if there happens to be a cardinality constraint on the same attribute (which also gives away that it is a container)?
    • having "container_attr is_multiple cardinality matches {2..*} matches {" seems clumsy

I'm sure this must have been discussed before but I've forgotten the answer ...

Is there some reason why cardinality can't be mandatory on container attributes?

- Peter