occurrences and cardinality in ADL, XML, JSON

thomas.beale · 10 November 2011 18:11

In the current ADL 1.4-based XSDs used in openEHR, occurrences, cardinality and existence are expressed as XML elements. We will want to improve this for ADL 1.5 based XML. Now, we don’t want to take care of XML only; we also need to make it work for JSON, and (internally) for dADL - neither of which has XML’s ‘attributes’. Many people have asked for more efficient ways of serialising. Here are some ideas for ADL 1.5 XML, JSON etc.

~~~~~~~~~~ first question: occurrences and cardinality ~~~~~~~~~~~~~~~~
Occurrences and cardinality are proper intervals in the AOM representation. The most simplified object structure (JSON and dADL) for occurrences and cardinality could look as follows (I use dADL & occurrences here):

occurrences = <
    lower = <2> -- Integer field
    upper = <10> -- Integer field
>

but the upper limit is commonly unbounded, i.e. '*' in typical UML-like syntax. We could do:

occurrences = <
    lower = <2> -- Integer field
    upper_bounded = <True> -- Boolean field
>

meaning that 3 possible attributes could occur for occurrences, but only ever 2 at the same time. Or we could make everything into a string:

occurrences = <
    lower = <"2"> -- String field
    upper = <"*"> -- String field
>

The upside is that the 'upper' attribute now handles both bounded and unbounded values. The downside is that the JSON / dADL parsers would have to do a bit more work to generate the required Interval<Integer> object - since the 'upper' attribute now has to be treated as a little fragment of syntax and checked before being turned into an Integer.
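The extra work each option asks of a reader can be sketched in Python. This is a minimal sketch, not workbench code: it assumes the dADL/JSON has already been read into a plain dict, and it uses a hypothetical `MultiplicityInterval` class in which `upper=None` stands for '*'. The flag is spelled `upper_unbounded` here, following the correction made later in the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MultiplicityInterval:
    """Hypothetical stand-in for Interval<Integer>; upper=None means '*'."""
    lower: int
    upper: Optional[int]


def from_integer_form(d: dict) -> MultiplicityInterval:
    # Option 1: both fields are integers, so '*' cannot be expressed at all.
    return MultiplicityInterval(d["lower"], d["upper"])


def from_flag_form(d: dict) -> MultiplicityInterval:
    # Option 2: integer 'lower' plus either an integer 'upper' or
    # upper_unbounded = True (never both at the same time).
    if d.get("upper_unbounded", False):
        return MultiplicityInterval(d["lower"], None)
    return MultiplicityInterval(d["lower"], d["upper"])


def from_string_form(d: dict) -> MultiplicityInterval:
    # Option 3: both fields are strings, so 'upper' is a tiny piece of syntax
    # that has to be checked before it can be converted to an integer.
    upper = d["upper"]
    return MultiplicityInterval(int(d["lower"]), None if upper == "*" else int(upper))


print(from_integer_form({"lower": 2, "upper": 10}))
print(from_flag_form({"lower": 2, "upper_unbounded": True}))
print(from_string_form({"lower": "2", "upper": "*"}))
```

All three readers build the same kind of object; the string form simply moves the '*' test out of the data structure and into the reader.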

If we were just doing JSON, dADL and other 'proper' OO syntaxes, the first one would be the obvious one. But since we are also targeting XML, we have to think whether it makes more sense to do:

<children node_id="at0005" occurrences_lower="2" occurrences_upper="10"> <!-- xsi:type=C_OBJECT -->
<rm_type_name>CLUSTER</rm_type_name>

and

<children node_id="at0005" occurrences_lower="2" occurrences_unbounded="true"> <!-- xs:boolean has to support 0/1 and true/false -->
<rm_type_name>CLUSTER</rm_type_name>

which is the analog of the first approach above, or it could be:

<children node_id="at0005" occurrences_lower="2" occurrences_upper="10">
<rm_type_name>CLUSTER</rm_type_name>

and

<children node_id="at0005" occurrences_lower="2" occurrences_upper="*">
<rm_type_name>CLUSTER</rm_type_name>

with both attributes defined in the XSD as xs:string. This means that, as with JSON/dADL, the standard XML parser only generates strings, and something further has to be done to obtain a proper Interval object.
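To make the trade-off concrete, here is a Python sketch using the standard library's `xml.etree.ElementTree`. It assumes the attribute names from the examples above and represents the interval as a `(lower, upper)` tuple with `None` for unbounded; the function names are hypothetical. In both styles some code after the XML parser decides what the upper bound is; the difference is only whether it reads a Boolean attribute or inspects a string.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Hypothetical fragments in the two styles above, written as empty elements
# so that they are well-formed XML on their own.
FLAG_STYLE = '<children node_id="at0005" occurrences_lower="2" occurrences_unbounded="true"/>'
STRING_STYLE = '<children node_id="at0005" occurrences_lower="2" occurrences_upper="*"/>'


def parse_xs_boolean(text: str) -> bool:
    # xs:boolean accepts the lexical forms true/false and 1/0.
    text = text.strip()
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError(f"not an xs:boolean: {text!r}")


def occurrences_flag_style(elem: ET.Element) -> Tuple[int, Optional[int]]:
    # First approach: typed attributes; 'occurrences_upper' is only present
    # when the interval is bounded.
    lower = int(elem.get("occurrences_lower"))
    if parse_xs_boolean(elem.get("occurrences_unbounded", "false")):
        return lower, None
    return lower, int(elem.get("occurrences_upper"))


def occurrences_string_style(elem: ET.Element) -> Tuple[int, Optional[int]]:
    # Second approach: both attributes are xs:string, so the '*' check
    # happens in application code after the XML parser has finished.
    lower = int(elem.get("occurrences_lower"))
    upper = elem.get("occurrences_upper")
    return lower, (None if upper == "*" else int(upper))


print(occurrences_flag_style(ET.fromstring(FLAG_STYLE)))
print(occurrences_string_style(ET.fromstring(STRING_STYLE)))
```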

My preference is still to go with the first way of doing things. Do others agree with this? If so, it is what I will implement in the ADL 1.5 workbench.

~~~~~~~~~~~~~ second question: existence ~~~~~~~~~~~~
Existence as an interval can be 0..0 (prohibited, commonly used in templates), 0..1 (optional, typical in the RM) and 1..1 (used in templates and sometimes in archetypes). Now, since archetypes and templates are *constraint* structures, they can only *further* constrain the RM in ADL/AOM 1.5. The only possibilities for this are actually "0..0" and "1..1", so we could collapse existence onto a single Boolean for serialised representation (it could also be a single Boolean in the AOM, but that would be a breaking change, and since we already use Intervals for occurrences and cardinality, it does not seem worth the trouble).

Thus in JSON/dADL it could be:

some_attr = <
existence = <True|False>
>

In XML:

<attributes rm_attribute_name="name" existence="true">
....
</attributes>

Now, this is cheating a bit because we are making it look like there is an AOM property 'existence' of type Boolean, but there isn't. Should it be named something else to make this clear? I.e. a pseudo attribute that only exists in serialised format but not in AOM internal format, e.g. 'existence_constraint'? I would favour this. In my current implementation, the serialised format actually has its own object model, and this would have to be true for JSON as well. I think it also makes sense in XML - that there will be a level of classes corresponding to the space-efficient serial form, which are not the same as the internal AOM classes.
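A minimal Python sketch of the Boolean collapse, assuming the attribute's RM-declared existence is 0..1 (so that "no Boolean present" can mean "not further constrained"). The function names are hypothetical; a serialiser would write the result under whatever pseudo-attribute name is agreed, e.g. the suggested `existence_constraint`.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # (lower, upper); existence is never unbounded

# Assumption: the RM declares the attribute as optional (0..1).
RM_DEFAULT_EXISTENCE: Interval = (0, 1)


def existence_to_serial(existence: Interval) -> Optional[bool]:
    """Collapse an AOM existence interval to the serial-only Boolean.

    Returns None when the archetype does not narrow the RM's 0..1, meaning
    the pseudo-attribute can simply be left out of the serialised form.
    """
    if existence == (1, 1):
        return True       # mandatory
    if existence == (0, 0):
        return False      # prohibited
    if existence == RM_DEFAULT_EXISTENCE:
        return None       # no further constraint
    raise ValueError(f"not a valid existence constraint: {existence}")


def existence_from_serial(flag: Optional[bool]) -> Interval:
    """Rebuild the AOM interval from the serialised Boolean (or its absence)."""
    if flag is None:
        return RM_DEFAULT_EXISTENCE
    return (1, 1) if flag else (0, 0)
```

Keeping the interval in the AOM and converting only at the serialisation boundary is what lets the Boolean stay a pseudo-attribute rather than an AOM property.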

thoughts?

- thomas beale

Andrew_Patterson · 11 November 2011 03:36


~~~~~~~~~~ first question: occurrences and cardinality ~~~~~~~~~~~~~~~~
Occurrences and cardinality are proper intervals in the AOM representation. The most simplified object structure (JSON and dADL) for occurrences and cardinality could look as follows (I use dADL & occurrences here):

occurrences = <
    lower = <2> -- Integer field
    upper = <10> -- Integer field
>

but the upper limit is commonly unbounded, i.e. '*' in typical UML-like syntax. We could do:

occurrences = <
    lower = <2> -- Integer field
    upper_bounded = <True> -- Boolean field
>

Why can't the absence of a value mean unbounded?

occurrences = <
lower = <2>
>

Means 2..*

I vaguely remember us discussing this many moons ago but I've forgotten the rationale..
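Under this suggestion the unbounded case needs no extra field at all. A minimal Python sketch with the standard `json` module, assuming the plain `{"lower": n, "upper": m}` object shape from the original post (function names are hypothetical):

```python
import json
from typing import Optional, Tuple


def occurrences_from_json(text: str) -> Tuple[int, Optional[int]]:
    """Read {"lower": n, "upper": m}, treating a missing 'upper' as unbounded."""
    obj = json.loads(text)
    # dict.get returns None when the key is absent, i.e. {"lower": 2} is 2..*
    return obj["lower"], obj.get("upper")


def occurrences_to_json(lower: int, upper: Optional[int]) -> str:
    """Write the interval, omitting 'upper' entirely when it is unbounded."""
    obj = {"lower": lower}
    if upper is not None:
        obj["upper"] = upper
    return json.dumps(obj)
```

One consequence, which comes up later in the thread, is that an absent 'upper' can then no longer also mean "not constrained here".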

Also, what about inclusive/exclusive values at either end
of the interval? I know that this isn't an issue for occurrence and
cardinality intervals, which are always inclusive - but are we proposing that
the representation of normal intervals will not use the same mechanisms
as you are proposing here?

~~~~~~~~~~~~~ second question: existence ~~~~~~~~~~~~
Existence as an interval can be 0..0 (prohibited, commonly used in templates), 0..1 (optional, typical in the RM) and 1..1 (used in templates and sometimes in archetypes). Now, since archetypes and templates are /constraint/ structures, they can only /further/ constrain the RM in ADL/AOM 1.5. The only possibilities for this are actually "0..0" and "1..1", so we could collapse existence onto a single Boolean for serialised representation (it could also be a single Boolean in the AOM, but that would be a breaking change, and since we already use Intervals for occurrences and cardinality, it does not seem worth the trouble).

Thus in JSON/dADL it could be:

some_attr = <
existence = <True|False>
>

In XML:

<attributes rm_attribute_name="name" existence="true">
....
</attributes>

If it was just to optimize the XML I'd give this a vote of 'meh'.. but given that existence is not really an interval because
as you say it has very few possible valid values, I think the removal of the ambiguity by turning it into a single boolean
is probably worthwhile.

Andrew

system · 11 November 2011 04:59

Hi All

As ADL only states constraints, there is no logical reason to include unbounded: if no constraint is expressed, the RM maximum applies. This is likely to be one or unbounded.

Occurrences and cardinality are proper intervals in the AOM representation. The most simplified object structure (JSON and dADL) for occurrences and cardinality could look as follows (I use dADL & occurrences here):

occurrences = <
lower = <2> -- Integer field
upper = <10> -- Integer field
>

but the upper limit is commonly unbounded, i.e. '*' in typical UML-like syntax. We could do:

occurrences = <
lower = <2> -- Integer field
upper_bounded = <True> -- Boolean field

Sam: no need for this.

meaning that 3 possible attributes could occur for an occurrences, but only ever 2 at the same time. Or we could make everything into a string:

occurrences = <
    lower = <"2"> -- String field
    upper = <"*"> -- String field

Sam: no need for this

Existence as an interval can be 0..0 (prohibited, commonly used in templates), 0..1 (optional, typical in the RM) and 1..1 (used in templates and sometimes in archetypes). Now, since archetypes and templates are *constraint* structures, they can only *further* constrain the RM in ADL/AOM 1.5. The only possibilities for this are actually "0..0" and "1..1", so we could collapse existence onto a single Boolean for serialised representation (it could also be a single Boolean in the AOM, but that would be a breaking change, and since we already use Intervals for occurrences and cardinality, it does not seem worth the trouble).

Thus in JSON/dADL it could be:

some_attr = <
existence = <True|False>
>

In XML:

<attributes rm_attribute_name="name" existence="true">
....
</attributes>

Now, this is cheating a bit because we are making it look like there is an AOM property 'existence' of type Boolean, but there isn't. Should it be named something else to make this clear? I.e. a pseudo attribute that only exists in serialised format but not in AOM internal format, e.g. 'existence_constraint'? I would favour this. In my current implementation, the serialised format actually has its own object model, and this would have to be true for JSON as well. I think it also makes sense in XML - that there will be a level of classes corresponding to the space-efficient serial form, which are not the same as the internal AOM classes.

thoughts?

Agree, it could be 0 or 1

yampeku · 11 November 2011 07:34

Although this would work, I think that it would make ADL far less
readable and would oblige people always to know the reference model
underneath AND their parent archetype (if for some reason the parent
archetype is not available then you are completely screwed). Even if
you say that people should know very well the model they are defining
archetypes for, I think that you would agree with me that they should
not be obliged to remember all archetypes in the specialization
hierarchy.

This could be even worse for the minimum, as no constraint
expressed = RM min (and again, also taking into account the parent
archetype), which is almost always 0 or 1. And not being able to tell
at first glance whether something is not needed is really bad (IMHO).

system · 11 November 2011 08:15

Hi Thomas and colleagues,

I would also like to discuss other serialization forms for archetypes.
I think YAML could be one alternative.
However, JSON/YAML are based on weakly typed languages and do not have
an established schema definition, such as XSD/ADL.

inline.

~~~~~~~~~~ first question: occurrences and cardinality ~~~~~~~~~~~~~~~~
but the upper limit is commonly unbounded, i.e. '*' in typical UML-like
syntax. We could do:

occurrences = <
lower = <2> -- Integer field
upper_bounded = <True> -- Boolean field

I think upper_bounded is a typo for upper_unbounded, but this format conforms
most closely to the INTERVAL specification of the assumed types library.
I agree with this, because this form makes it easier to parse and generate an
INTERVAL instance.
For the same reason, I also agree with the first XML scheme.

BTW, Rubyists might prefer this format (YAML):

occurrence:
  2..

~~~~~~~~~~~~~ second question: existence ~~~~~~~~~~~~
Thus in JSON/dADL it could be:

some_attr = <
existence = <True|False>

I prefer the shorter method.
To keep backward compatibility, a new "exist" property
could be defined as a Boolean, because it reads more naturally.
e.g.

attribute.exist == true?
attribute.existence == 1..1

Shinji Kobayashi

system · 11 November 2011 08:19

Hi!

Although this would work, I think that it would make ADL far less
readable

Some readability thoughts...

When a value (e.g. upper bound) may be either a number or a symbol (*
or infinity) most receiving software will need to have logic
separating the cases anyway, no matter how they are serialized.
So then I wonder how much harder it would be to include string parsing
logic so that we can have JSON-fields with string values like...
"occurrences": "1..*"

Will a string pattern be good enough for validation by auto-generated
validators or does separation into fields clearly make auto-generated
validators more capable in this case?

Archetypes and templates will likely often be re-used as in-memory
objects anyway so a little bit of string parsing overhead at startup
might not have any significant overhead cost.

On the other hand if we want to be verbose we could re-use some of the
formalisms from http://json-schema.org/ Then we get schema validators
in many programming languages for free
(http://json-schema.org/implementations.html). Or perhaps json-schema
should be an output format from something similar to the TDS (template
data schema) approach?

Best regards,
Erik Sundvall
erik.sundvall@liu.se http://www.imt.liu.se/~erisu/ Tel: +46-13-286733

thomas.beale · 11 November 2011 12:08

Why can't the absence of a value mean unbounded?

occurrences = <
lower = <2>
>

Means 2..*

ok - if you are thinking in an XML mode, the implication is that the
default for upper is 'unbounded'.

I vaguely remember us discussing this many moons ago but I've
forgotten the rationale..

Also, what about inclusive/exclusive values at either end
of the interval? I know that this isn't an issue for occurrence and
cardinality intervals, which are always inclusive - but are we
proposing that the representation of normal intervals will not use
the same mechanisms as you are proposing here?

yes - the point here is a specific simplification of representation of
Intervals, because a) the standard representation requires 6 properties
and b) occurrences, cardinality and existence are so frequent in
archetypes that serialisation in the standard way can greatly increase
the size of the file in XML (even in dADL or JSON, which are nearly
twice as efficient as XML, the size is significantly increased).
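To give a feel for point a), here is a small Python comparison of the JSON text for a single 2..* occurrences. It assumes the full form spells out the six openEHR Interval properties (lower, upper, lower_included, upper_included, lower_unbounded, upper_unbounded); the values are illustrative. Exact figures depend on format and whitespace; the point is the relative size, repeated for every node in an archetype.

```python
import json

# Hypothetical full ("standard") serialisation of the interval 2..*,
# spelling out all six Interval properties.
full = {
    "lower": 2,
    "upper": None,
    "lower_included": True,
    "upper_included": False,
    "lower_unbounded": False,
    "upper_unbounded": True,
}

# Simplified forms for multiplicity intervals, which are always inclusive
# and never unbounded below.
compact_object = {"lower": 2, "upper_unbounded": True}
compact_string = "2..*"

for label, value in [("full", full), ("object", compact_object), ("string", compact_string)]:
    print(label, len(json.dumps(value)))
```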

- thomas

thomas.beale · 11 November 2011 12:16

Hi Thomas and colleagues,

I would like to discuss about the other serialization form of archetype, too.
I thought YAML could be an alternative of them.

I had forgotten about YAML I have to admit. It would be interesting to support that in the ADL 1.5 tools as well. I will look into it.

However, JSON/YAML are based on weakly typing languages, do not have
established scheme definition, such as XSD/ADL.

inline.

2011/11/11 Thomas Beale <thomas.beale@oceaninformatics.com>:

~~~~~~~~~~ first question: occurrences and cardinality  ~~~~~~~~~~~~~~~~
but the upper limit is commonly unbounded, i.e. '*' in typical UML-like
syntax. We could do:

occurrences = <
    lower = <2> -- Integer field
    upper_bounded = <True> -- Boolean field

I think upper_bounded is typo for upper_unbounded, but this format has the

oops - you are right. Sorry about that.

most conformance to INTERVAL specification of assumed types library.
I agree this, because this form is easier to parse and generate an
INTERVAL instance.
I also agree with the first way of XML scheme with the same reason.

BTW, Rubyist might be prefer this format(YAML):

occurrence:
  2..

well, that’s close to what I generate in dADL right now:

but XML developers don’t like that.

thomas

thomas.beale · 11 November 2011 12:30

Although this would work, I think that it would make ADL far less
readable and would oblige people to know always the reference model

to be clear, I am not proposing to make any change at all to ADL. ADL is
meant as a proper readable, mathematical formal expression of archetype
semantics. It is the other serialisations we are concerned with here -
i.e. serialisations of AOM structures.

underneath AND their parent archetype (if for some reason the parent
archetype is not available then you are completely screwed). Even if
you say that people should know very well the model they are defining
archetypes for, I think that you would agree with me that they should
not be obliged to remember all archetypes on the specialization
hierarchy.

yes, that is another issue here, which is whether you are seeing an
archetype in differential or flattened form. If we use the ADL format
for occurrences, cardinality and existence ranges, you can always just
look at the most specialised archetype and you know the resulting
occurrences / card/ ex, because you always have the full range e.g. occ
= 2..5 or whatever. But in the scheme I am proposing, this is not so
easy to work out visually. The tools of course should generate the right
result in 'flat' view. If you play around with the AWB, you will see the
diff & flat views, but currently these intervals are easy to understand
because of always being in the full n..m form (even in the dADL and XML
serialisation). So... good point....

This could be even worse for the minimum, as if no constraint is
expressed = RM min (and again, also taking into account parent
archetype), which is almost always 0 or 1. And not being able to tell
at first look if something is not needed is really bad (IMHO).

well it would be bad if there were no flattener, but it is always
possible to implement a flattener. The way the AWB tool works is that
the serialised form of a differential archetype is converted to AOM form
- which has proper MULTIPLICITY_INTERVAL objects (these are essentially
just Interval<Integer>) before flattening; then serialisation occurs in
the other direction. So a flattened archetype will show the result of
the archetype lineage and also the RM, if the 'flatten RM' option is on.
I am not saying all tools have to work this way - this is the way I have
done the reference compiler, but others may come up with more
stream-based approaches in the future.
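The flattening step described above can be sketched in Python. This is not the AWB's algorithm; it only illustrates the rule under discussion, assuming a hypothetical `MultiplicityInterval` class and treating a missing constraint at any level as "inherit from above". A conformance check rejects a child interval that widens its parent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MultiplicityInterval:
    """Hypothetical stand-in for MULTIPLICITY_INTERVAL; upper=None means '*'."""
    lower: int
    upper: Optional[int]

    def contains(self, other: "MultiplicityInterval") -> bool:
        # True if 'other' is the same as or narrower than this interval.
        if other.lower < self.lower:
            return False
        if self.upper is None:
            return True
        return other.upper is not None and other.upper <= self.upper

    def __str__(self) -> str:
        return f"{self.lower}..{'*' if self.upper is None else self.upper}"


def flatten_occurrences(
    rm: MultiplicityInterval,
    *differentials: Optional[MultiplicityInterval],
) -> MultiplicityInterval:
    """Resolve the effective occurrences along an archetype lineage.

    'differentials' runs from the top-level parent down to the most
    specialised archetype; None means "not constrained at this level",
    in which case the inherited value stands.
    """
    effective = rm
    for diff in differentials:
        if diff is None:
            continue
        if not effective.contains(diff):
            raise ValueError(f"{diff} does not conform to inherited {effective}")
        effective = diff
    return effective


print(flatten_occurrences(MultiplicityInterval(0, None),
                          MultiplicityInterval(1, None), None,
                          MultiplicityInterval(2, 5)))
```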

Anyway, this is a good point to be careful of.

- thomas

thomas.beale · 11 November 2011 12:50

Hi!

Although this would work, I think that it would make ADL far less
readable

Some readability thoughts...

When a value (e.g. upper bound) may be either a number or a symbol (*
or infinity) most receiving software will need to have logic
separating the cases anyway, no matter how they are serialized.
So then I wonder how much harder it would be to include string parsing
logic so that we can have JSON-fields with string values like...
"occurrences": "1..*"

well that's my opinion as well, and XML-ers always react badly! The
'proper' parser code for dealing with this form, used in the ADL parser
is (from the .y file):

...
%type <MULTIPLICITY_INTERVAL> c_occurrences c_existence occurrence_spec existence_spec
...
c_occurrences: -- empty is ok
     | SYM_OCCURRENCES SYM_MATCHES SYM_START_CBLOCK occurrence_spec SYM_END_CBLOCK
         {
             $$ := $4
         }
     | SYM_OCCURRENCES error
         {
             abort_with_error("SOCCF", Void)
         }
     ;

occurrence_spec: cardinality_limit_value -- single integer or '*'
         {
             if not cardinality_limit_pos_infinity then
                 create multiplicity_interval.make_point($1)
             else
                 create multiplicity_interval.make_upper_unbounded(0)
                 cardinality_limit_pos_infinity := False
             end
             $$ := multiplicity_interval
         }
     | V_INTEGER SYM_ELLIPSIS cardinality_limit_value
         {
             if cardinality_limit_pos_infinity then
                 create multiplicity_interval.make_upper_unbounded($1)
                 cardinality_limit_pos_infinity := False
             else
                 create multiplicity_interval.make_bounded($1, $3)
             end
             $$ := multiplicity_interval
         }
     ;

....

cardinality_limit_value: integer_value
         {
             $$ := $1
         }
     | '*'
         {
             cardinality_limit_pos_infinity := True
         }
     ;

But the 'fast dADL' parser doesn't bother with any of that. Here is the
Eiffel code - you can see how simple it is, and how it would work in
Java, Python etc. Note that this parser only handles correct
Interval strings, i.e. those generated by the serialiser, not by
some erroneous human hand!

class MULTIPLICITY_INTERVAL

inherit
    INTERVAL [INTEGER]

feature -- Initialisation

    make_from_string (a_str: attached STRING)
            -- make from a string of the form "n..m" or just "n", where
            -- n and m are integers, or m may be '*'
        require
            valid_multiplicity_string: valid_multiplicity_string (a_str)
        local
            a_lower, an_upper, delim_pos: INTEGER
            a_mult_str: STRING
        do
            a_mult_str := a_str.twin

            -- remove any spaces
            a_mult_str.prune_all (' ')

            -- make the interval
            delim_pos := a_mult_str.substring_index (Multiplicity_range_delimiter, 1)
            -- n..m case
            if delim_pos > 0 then
                a_lower := a_mult_str.substring (1, delim_pos - 1).to_integer
                if a_mult_str.item (a_mult_str.count) = Multiplicity_unbounded_marker then
                    make_upper_unbounded (a_lower)
                else
                    an_upper := a_mult_str.substring (delim_pos + Multiplicity_range_delimiter.count, a_mult_str.count).to_integer
                    make_bounded (a_lower, an_upper)
                end
            -- * case
            elseif a_mult_str.item (1) = Multiplicity_unbounded_marker then
                make_upper_unbounded (0)
            -- m (single integer) case
            else
                a_lower := a_mult_str.to_integer
                make_bounded (a_lower, a_lower)
            end
        end

    -- (other features, including valid_multiplicity_string and the
    -- Multiplicity_* constants, are not shown in this excerpt)

end
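As a rough illustration of how the same logic carries over, here is a Python port of `make_from_string`. The constant values ".." and "*" are assumptions inferred from the comment in the Eiffel code, and the result is a plain `(lower, upper)` tuple rather than a real interval class. Like the original, it assumes well-formed, serialiser-generated input.

```python
from typing import Optional, Tuple

# Assumed values of the Eiffel constants referenced above.
MULTIPLICITY_RANGE_DELIMITER = ".."
MULTIPLICITY_UNBOUNDED_MARKER = "*"


def multiplicity_from_string(a_str: str) -> Tuple[int, Optional[int]]:
    """Parse "n..m", "n..*", "n" or "*" into (lower, upper); upper=None means '*'."""
    mult_str = a_str.replace(" ", "")  # remove any spaces
    delim_pos = mult_str.find(MULTIPLICITY_RANGE_DELIMITER)
    if delim_pos >= 0:  # n..m or n..* case
        lower = int(mult_str[:delim_pos])
        if mult_str.endswith(MULTIPLICITY_UNBOUNDED_MARKER):
            return lower, None
        return lower, int(mult_str[delim_pos + len(MULTIPLICITY_RANGE_DELIMITER):])
    if mult_str.startswith(MULTIPLICITY_UNBOUNDED_MARKER):  # bare '*' means 0..*
        return 0, None
    lower = int(mult_str)  # a single integer n means n..n
    return lower, lower


print(multiplicity_from_string("1..*"))
```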

Not exactly hard..... but I think XML developers are not used to this,
and seem to prefer the XML-attributes style, which of course is not an
OO structure, but does reduce the size of the XML file significantly.

Will a string pattern be good enough for validation by auto-generated
validators or does separation into fields clearly make auto-generated
validators more capable in this case?

Archetypes and templates will likely often be re-used as in-memory
objects anyway so a little bit of string parsing overhead at startup
might not have any significant overhead cost.

On the other hand if we want to be verbose we could re-use some of the
formalisms from http://json-schema.org/ Then we get schema validators
in many programming languages for free
(http://json-schema.org/implementations.html). Or perhaps json-schema
should be an output format from something similar to the TDS (template
data schema) approach?

I guess my assumption is that ADL will always use the most efficient and
human readable form (the proper 'n..m' form), while XML will probably
require the <attributes rm_attr_name='xxx' card_lower='n'
card_upper='m'> kind of approach. What we do in JSON/YAML/dADL is up for
grabs. My personal feeling for YAML and dADL would probably be
to stick with the string 'n..m' form or else the simplified object
property form proposed initially in this thread, but JSON's mind-numbing
simplicity might imply using the proper full object explosion form - I
assume no one cares how big a JSON file is?

- thomas

Andrew_Patterson · 11 November 2011 13:44

I'd agree with Erik here. The minute the receiving end has to deal with
"*" or "number", the data binder is going to need some special logic. You
might as well make that logic deal with parsing "1..*". It's not much more
of a challenge.

So from an XML point of view we could then have

or for where we need elements

Specifying wildcards for "upper" in XSD would have taken a regex string
restriction anyway - the regex for the "n..*" form is of similar complexity.

The range string is easily implementable for JSON and YAML.
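Both points can be checked with a short Python sketch. It assumes the accepted strings are exactly those the ADL grammar quoted earlier in the thread allows ("n", "n..m", "n..*" and a bare "*"). The same pattern body could serve as an `xs:pattern` facet on an `xs:string` restriction, since XSD patterns are implicitly anchored to the whole value. A regular expression alone cannot check that lower <= upper, which also bears on Erik's question about what auto-generated validators can do.

```python
import re

# Assumed grammar: digits, optionally followed by ".." and digits or "*",
# or a bare "*". fullmatch() anchors the pattern to the whole string.
MULTIPLICITY_PATTERN = re.compile(r"[0-9]+(\.\.([0-9]+|\*))?|\*")


def is_valid_multiplicity(s: str) -> bool:
    m = MULTIPLICITY_PATTERN.fullmatch(s)
    if m is None:
        return False
    upper = m.group(2)
    if upper is not None and upper != "*":
        # The regex cannot compare the two numbers; one line of code can.
        return int(s.split("..")[0]) <= int(upper)
    return True


print(is_valid_multiplicity("1..*"), is_valid_multiplicity("3..2"))
```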

Andrew

Andrew_Patterson · 11 November 2011 13:56

Well I consider myself an XML-er and I don't see massive problems with it, but
maybe I have become soft in my old age.

My main argument would be that the XML at one point was almost a straight serialization
of the object model, as supported by various XML data binding libraries. So
XML -> AOM memory objects -> XML was all doable with very standard
binding libraries.

BUT

I was happy with status quo because I don't really care about the
size of the XML or how often elements are repeated or the fact that is looks
ugly to people - if people want compressed data then they should use fastinfoset
or exi, and then gzip and it'll compress beautifully. The size/format/look
is a concern to others.

BUT

If I have lost the battle and if we are going to do customised
XML serializations then once you've taken it outside the
normal data binding by introducing "*" forms or even
'properties' that aren't really properties but kind of quasi computed fields
then you might as well give up the pretence that the XML serialization
will bind straight into an AOM compatible object model..
in which case parsing "1..*" is not a problem

Andrew

ian.mcnicoll · 11 November 2011 14:16

Apart from the size issue, readability is a particular problem because
of the verbosity of the current XML schema.

Ian

Dr Ian McNicoll
office +44 (0)1536 414 994
fax +44 (0)1536 516317
mobile +44 (0)775 209 7859
skype ianmcnicoll
ian.mcnicoll@oceaninformatics.com

Clinical Modelling Consultant, Ocean Informatics, UK
openEHR Clinical Knowledge Editor www.openehr.org/knowledge
Honorary Senior Research Associate, CHIME, UCL
BCS Primary Health Care www.phcsg.org

Andrew_Patterson · 11 November 2011 14:29

I'm not convinced that human readability should matter too much
(especially as ADL is meant to be the human readable format
- if we have readable XML can we ditch the ADL??)

But I'm not passionately opposed to it or anything. It's just that when it
was brought up in the past, many moons ago, I thought we had other
more pressing issues. But if the change is happening as part of
an update to 1.5 then I'm all for it.

Andrew

ian.mcnicoll · 11 November 2011 14:37

Hi Andrew,

In principle I agree. I speak only as one of the poor sods who
sometimes has to visually check the .opt template schemas, which
use the same format. I know - get a tool. But even in something
like XMLSpy it can get hard to see the clinical wood for the
occurrences trees.

Ian


pablo · 11 November 2011 16:21

Hi, I have been thinking about this a lot: using schema-less formats to represent archetypes and RM instances.

I think if we agree on a common language/standard/definition, we don’t need to define the types of any node in a JSON/YAML structure, because those types are defined by the language/standard/definition those structures follow. And if we define a good serialization of archetypes and RM instances to JSON/YAML, we don’t need a schema to share instances of those structures; we just need to implement the serialization definitions and base the parsing on the attribute names.
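This idea can be sketched in Python: the JSON carries no type tags, and the decoder picks the class for nested objects purely from the property name, as fixed by the model. The classes and the mapping table below are hypothetical cut-downs of AOM classes, not the real AOM or any published JSON format.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


# Hypothetical, heavily cut-down stand-ins for AOM classes. They play the role
# of "the specification": the types live here, not in the JSON.
@dataclass
class CObject:
    rm_type_name: str
    node_id: str = ""
    occurrences: Optional[str] = None   # None = not constrained here
    attributes: List["CAttribute"] = field(default_factory=list)


@dataclass
class CAttribute:
    rm_attribute_name: str
    existence: Optional[str] = None     # None = not constrained here
    children: List[CObject] = field(default_factory=list)


# Which class the items of each multi-valued property belong to. Because the
# model fixes this, the parser can go by property names alone.
ITEM_CLASS_BY_PROPERTY: Dict[str, type] = {"attributes": CAttribute, "children": CObject}


def decode(cls: type, data: Dict[str, Any]) -> Any:
    """Build an instance of cls from untyped JSON data (no type tags)."""
    kwargs = {}
    for name, value in data.items():
        item_cls = ITEM_CLASS_BY_PROPERTY.get(name)
        if item_cls is not None:
            value = [decode(item_cls, item) for item in value]
        kwargs[name] = value
    return cls(**kwargs)  # unknown property names raise TypeError


doc = json.loads("""
{"rm_type_name": "CLUSTER", "node_id": "at0005", "occurrences": "2..*",
 "attributes": [{"rm_attribute_name": "items",
                 "children": [{"rm_type_name": "ELEMENT", "node_id": "at0006"}]}]}
""")
root = decode(CObject, doc)
print(root.attributes[0].children[0])
```

A single flat name table only works while each property name means the same thing everywhere; a fuller version would key the table by (class, property).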

What do you think?

PS: I was thinking of archetypes serialized to JSON because I want to build a web-based GUI generation layer implemented completely in Javascript (JSON objects are Javascript objects), so we can use and share this thin layer to demonstrate archetype-based GUI generation easily. And if we have a REST layer that implements EHR server services, we can use that GUI layer to send data input to the server and get information to show (a complete circle). If anyone wants to collaborate on the JSON format of ADL/AOM, please contact me.

thomas.beale · 12 November 2011 00:54

“occurrences”: “1..*”
well that’s my opinion as well, and XML-ers always react badly! The
‘proper’ parser code for dealing with this form, used in the ADL parser
is (from the .y file):

Well I consider myself an XML-er and I don’t see massive problems with it, but
maybe I have become soft in my old age.

My main argument would be that the XML at one point was almost a straight serialization
of the object model, as supported by various XML data binding libraries. So
XML → AOM memory objects → XML was all doable with very standard
binding libraries.

yes, that’s the way the current schemas look. It seems most people don’t like them because the XML docs end up being ‘big’, which seems an interminable obsession in XML-land, although no one cares about it much in other (more efficient) serialisations. Size sometimes matters - e.g. in EHR data. In archetypes, which might one day number in the thousands but not more, I think it is of questionable importance. But who am I to say…

BUT

I was happy with status quo because I don’t really care about the
size of the XML or how often elements are repeated or the fact that it looks
ugly to people - if people want compressed data then they should use Fast Infoset
or EXI, and then gzip and it’ll compress beautifully. The size/format/look
is a concern to others.

BUT

If I have lost the battle and if we are going to do customised
XML serializations then once you’ve taken it outside the
normal data binding by introducing “*” forms or even
‘properties’ that aren’t really properties but kind of quasi computed fields
then you might as well give up the pretence that the XML serialization
will bind straight into an AOM compatible object model..
in which case parsing "1..*" is not a problem

that would also be my point of view, and the current release of the ADL Workbench produces this kind of XML, viz:

ELEMENT __2..*; ordered__ True False __1..*; unordered__ True False protocol **1** False

At least some people don’t like this. If the consensus is to use this form, then great, it is more reliable, but again, others might not like it.

thomas

thomas.beale · 12 November 2011 01:04

Again, I agree with this point of view. But XML people may not… but now I should clarify something…

I should have explained one other thing: what I have done in the current AOM 1.5 implementation (but not yet documented) is to create a parallel set of P_XX classes (‘P_’ means ‘persistent’) like P_ARCHETYPE, P_C_OBJECT and so on. These classes formally specify the serialised form of the archetype so there can be no ambiguity. It is these classes that currently have occurrences, cardinality and existence defined as String properties. There are a few other simplifications as well. My proposal is to add these P_XX class definitions to the specification. It might seem like slight overkill (and I resisted it for a long time), but once I implemented it, it seemed worthwhile, and it allows us to separate the in-memory computable version of the AOM from a P_ version whose sole purpose is serialisation. The Eiffel P_ classes are here; it is easy to imagine what the Java, Python etc would look like.

So Pablo’s argument, applied to the P_ classes would indeed mean that the serialised form in JSON, YAML (also dADL) is a pure consequence of the P_AOM classes, and no extra logic is needed. That is why I built the P_ classes.
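A minimal Python sketch of the idea, assuming hypothetical `CObject` / `PCObject` classes (these are not the Eiffel P_ classes, whose code is not reproduced here). The persistent class holds occurrences as a String and converts to and from the in-memory class, so the JSON writer itself can stay completely generic.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CObject:
    """Hypothetical in-memory (computable) class: occurrences is a real interval."""
    rm_type_name: str
    node_id: str
    occurrences: Optional[Tuple[int, Optional[int]]]  # (lower, upper); upper None = '*'


@dataclass(frozen=True)
class PCObject:
    """Hypothetical persistent counterpart: occurrences is just a String, so a
    generic JSON/YAML/dADL writer can serialise it with no extra logic."""
    rm_type_name: str
    node_id: str
    occurrences: Optional[str] = None

    @classmethod
    def from_aom(cls, obj: CObject) -> "PCObject":
        occ = None
        if obj.occurrences is not None:
            lower, upper = obj.occurrences
            occ = f"{lower}..{'*' if upper is None else upper}"
        return cls(obj.rm_type_name, obj.node_id, occ)

    def to_aom(self) -> CObject:
        occ = None
        if self.occurrences is not None:
            # Only needs to read what from_aom writes: "n..m" or "n..*".
            lower_str, upper_str = self.occurrences.split("..")
            occ = (int(lower_str), None if upper_str == "*" else int(upper_str))
        return CObject(self.rm_type_name, self.node_id, occ)


p = PCObject.from_aom(CObject("CLUSTER", "at0005", (2, None)))
print(json.dumps(asdict(p)))
```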

thomas

pablo · 12 November 2011 02:05

Hi Thomas, do you have some examples of the JSON produced with your P_ classes from a couple of AOM instances? It would be nice to see the results.

I don’t see why anyone would object to not having each node’s type specified in the serialized form when we are talking about a schema-less format (I mean: we don’t need to put each node’s class in every instance of a JSON/YAML serialization of an AOM instance), provided we can agree on a specification of this format (and that specification would have each node’s type, or a mapping to an AOM object that has a type defined in the AOM specs).

This is not the issue, but I don’t like the name “persistence” for the package, because it gives the idea that this is only for persisting something, whereas what I really want to do is use the serialization for “archetype interchange” (between a server and a web browser).

Heath_Frankel3 · 13 November 2011 22:43

I too have no problem with this custom serialisation as I have a hand-coded
serializer that does the job (I gave up on the auto-generated ones years
ago).

However, I think we need to go back a step and get agreement from the
community what the most important features of an XML serialization are:
readability, size, auto-generation etc. Once we get some sort of ranking
then we can score each implementation choice accordingly.

I personally don't see the need for consistency between different
serialization formats; I think we should make the decisions that are best
for the particular format. Having said that, I would be surprised if the
logical features of the different formats were different unless their
intended uses are dramatically different (i.e. the importance of
auto-generation is likely to be the same for both JSON and XML).

Heath
