# Could YAML replace dADL as human readable AOM serialization format?

**Category:** [Technical (archive)](https://discourse.openehr.org/c/technical-archive/156)
**Created:** 2011-11-22 11:51 UTC
**Views:** 5
**Replies:** 33
**URL:** https://discourse.openehr.org/t/could-yaml-replace-dadl-as-human-readable-aom-serialization-format/15105

---

## Post #1 by @system

Hi!

A little suggestion/thought (that might be of value also for CIMI-folks and others looking at "archetyping" using ADL and AOM and wondering if a specific language is needed).

**Limitations:**
For efficient handling of RM (Reference Model) instances (patient data) flying back and forth between systems you'd probably want some binary format ([protobuf](http://code.google.com/p/protobuf/), [thrift datatypes](http://thrift.apache.org/), serialized Java objects or whatever), this is NOT what this suggestion is about. For development and debugging RM-instance exchange you may also want some fairly human-readable serialization that is supported by many platforms (Like [JSON](http://www.json.org/), [YAML](http://www.yaml.org/), XML or whatever) this is NOT what the suggestion is about either. Also note that the current suggestion only aims at looking for replacement of dADL not cADL. Also note that the AOM and XML serialisations of the AOM are not affected by this suggestion.

**Background:**
cADL (Constraint ADL) is a compact [DSL](http://en.wikipedia.org/wiki/Domain-specific_language) that is aimed at defining constraints on an object model, while dADL (Data ADL) on the other hand is mainly a general object-graph serialization format.
If I understand section 1.7.5 in the [ADL 1.5 spec](http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/adl1.5.pdf) correctly, ADL 2.0 will allow the option to define **all** parts of an archetype (including what is now done in cADL) as a dADL serialization of the [AOM](http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/aom1.5.pdf) (Archetype Object Model). Is that correct, Tom?

**Suggestion:**
Investigate if YAML can replace or complement dADL as object-graph serialization format for archetypes. (Perhaps there is interest from people using an openEHR AOM implementation in a language that already has YAML serializers to make a quick experiment?)

**Motivation:**

- YAML parsers converting YAML documents to native object graphs already exist for [a number of languages](http://www.yaml.org/) (C/C++, Ruby, Python, Java, Perl, C#/.NET, PHP, OCaml, Javascript, Actionscript, Haskell) so there would be less work creating and maintaining archetype parsers that turn archetype files into in-memory object graphs. (If you write an archetype authoring tool and need to validate archetypes, not just instantiate already validated archetypes, then the "Validity Rules" (such as the ones in blue under 4.3.1.1 in the AOM spec) will of course still need to be implemented in software.)
- Having an archetype-specific object-serialization language like dADL might make "archetyping" look more mysterious and suspect and might hide the fact that the semantics expressed in the AOM is the interesting thing that can be serialised in many different ways.
- And (admittedly subjective) YAML lists and objects look slightly better and more readable than dADL. A notable exception is probably intervals/ranges that have a compact representation in dADL (see section [4.5.2 of the ADL 1.5 spec](http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/adl1.5.pdf)) but not natively in YAML.
**Observations:** YAML is extensible, so data types for intervals etc can be added like in [http://yaml.org/YAML_for_ruby.html#ranges](http://yaml.org/YAML_for_ruby.html#ranges), also see discussion at [http://stackoverflow.com/questions/3337020/how-to-specify-ranges-in-yaml](http://stackoverflow.com/questions/3337020/how-to-specify-ranges-in-yaml). A similar approach could be taken to dADLs "Plug-in Syntaxes" (see [section 4.6](http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/adl1.5.pdf)) using YAML. A number of language-independent extra YAML datatypes ([timestamp ](http://yaml.org/type/timestamp.html)for example) are listed at [http://yaml.org/type/index.html](http://yaml.org/type/index.html) and you can define your own if you need more. It seems like specification 1.1 ([http://yaml.org/spec/1.1/](http://yaml.org/spec/1.1/)) is the most implemented, so any dADL comparisons should probably be done towards that version to be fair. Best regards, Erik Sundvall [erik.sundvall@liu.se](mailto:erik.sundvall@liu.se) [http://www.imt.liu.se/~erisu/](http://www.imt.liu.se/~erisu/) Tel: +46-13-286733 P.s. Tom Beale and I sort of started a brief off-list discussion about YAML, here is now an attempt to get input from more people. --- ## Post #2 by @thomas.beale > Hi! > > A little suggestion/thought (that might be of value also for CIMI-folks and others looking at "archetyping" using ADL and AOM and wondering if a specific language is needed). > > **Limitations:** > For efficient handling of RM (Reference Model) instances (patient data) flying back and forth between systems you'd probably want some binary format ([protobuf](http://code.google.com/p/protobuf/), [thrift datatypes](http://thrift.apache.org/), serialized Java objects or whatever), this is NOT what this suggestion is about. 
For development and debugging RM-instance exchange you may also want some fairly human-readable serialization that is supported by many platforms (Like [JSON](http://www.json.org/), [YAML](http://www.yaml.org/), XML or whatever) this is NOT what the suggestion is about either. Also note that the current suggestion only aims at looking for replacement of dADL not cADL. Also note that the AOM and XML serialisations of the AOM are not affected by this suggestion. > > **Background:** > cADL (Constraint ADL) is a compact [DSL](http://en.wikipedia.org/wiki/Domain-specific_language) that is aimed at defining constraints on an object model, while dADL (Data ADL) on the other hand is mainly a general object-graph serialization format. > If I understand section 1.7.5 in the [ADL 1.5 spec](http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/adl1.5.pdf) correctly, ADL 2.0 will allow the option to define **all** parts of an archetype (including what is now done in cADL) as a dADL serialization of the [AOM](http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/aom1.5.pdf) (Archetype Object Model). Is that correct Tom? actually, ADL 2.0 as reported in this document is now obsolete. The ADL 1.5 compiler already does this, and will use it as a fast save/retrieve format. See below for example, or download the current release of the ADL Workbench to play. I am intending to document the 'P_' classes on which this serialisation is based, and on which I think any JSON / YAML / XML serialisation should be based - when we can agree on it. It is in these classes that things like occurrences are changed from MULTIPLICITY_INTERVAL to String. > **Suggestion:** > Investigate if YAML can replace or complement dADL as object-graph serialization format for archetypes. (Perhaps there is interest from people using an openEHR AOM implementation in a language that already has YAML serializers to make a quick experiment?) 
My motivation for making pure dADL archetypes is to have a fast, efficient serialisation of the object graph of an archetype, so that when an archetype compiles successfully, it can be saved in this form and later retrieved, bypassing the ADL compiler. The value in this is that formats like dADL / JSON / YAML are low-level graph serialisations, and that really fast parsers can be written for them for use on persisted files __*known to be correct*__ (i.e. generated by a serialiser in a previous save). My own dADL parser is not such a fast parser, but that's only a matter of time ;-)

So the same arguments would apply to JSON or YAML in my view. At least for this purpose (fast save & retrieve of previously compiled archetypes), any such format could be used.

> **Motivation:**
>
> - YAML parsers converting YAML documents to native object graphs already exist for [a number of languages](http://www.yaml.org/) (C/C++, Ruby, Python, Java, Perl, C#/.NET, PHP, OCaml, Javascript, Actionscript, Haskell) so there would be less work creating and maintaining archetype parsers that turn archetype files into in-memory object graphs. (If you write an archetype authoring tool an need to validate archetypes, not just instantiate already validated archetypes, then the "Validity Rules" (such as the ones in blue under 4.3.1.1 in the AOM spec.) will of course still need to be implemented in software.
> - Having an archetype specific object-serialization language like dADL might make "archetyping" look more mysterious and suspect and might hide the fact that the semantics expressed in the AOM is the interesting thing that can be serialised in many different ways.
> - And (admittedly subjective) YAML lists and objects look slightly better and more readable than dADL.
A notable exception is probably intervals/ranges that have a compact representation in dADL (see section [4.5.2 of the ADL 1.5 spec](http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/adl1.5.pdf)) but not natively in YAML. > > **Observations:** > YAML is extensible, so data types for intervals etc can be added like in [http://yaml.org/YAML_for_ruby.html#ranges](http://yaml.org/YAML_for_ruby.html#ranges), also see discussion at [http://stackoverflow.com/questions/3337020/how-to-specify-ranges-in-yaml](http://stackoverflow.com/questions/3337020/how-to-specify-ranges-in-yaml). A similar approach could be taken to dADLs "Plug-in Syntaxes" (see [section 4.6](http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/adl1.5.pdf)) using YAML. A number of language-independent extra YAML datatypes ([timestamp ](http://yaml.org/type/timestamp.html)for example) are listed at [http://yaml.org/type/index.html](http://yaml.org/type/index.html) and you can define your own if you need more. One area where dADL beats JSON and YAML (I think) is its better support for Xpath-like paths. Plus its much more compact than JSON. Personally I find YAML hard to read because there are so many syntax elements (triple '-', triple '.' etc) but that might just be me. 
- thomas ~~~~~~~~~~~~~~~~~~~ openEHR-DEMOGRAPHIC-PERSON.person.v1 as dADL via P_XX classes ~~~~~~~~~~~~~~~~~ (P_ARCHETYPE) < original_language = <[ISO_639-1::pt-br]> translations = < ["en"] = < language = <[ISO_639-1::en]> author = < ["name"] = <"Sergio Miranda Freire"> ["organisation"] = <"Universidade do Estado do Rio de Janeiro - UERJ"> ["email"] = <"sergio@lampada.uerj.br"> > > > description = < original_author = < ["name"] = <"Sergio Miranda Freire & Rigoleta Dutra Mediano Dias"> ["organisation"] = <"Universidade do Estado do Rio de Janeiro - UERJ"> ["email"] = <"sergio@lampada.uerj.br"> ["date"] = <"22/05/2009"> > details = < ["en"] = < language = <[ISO_639-1::en]> purpose = <"Representation of a person's demographic data."> use = <"Used in demographic service to collect a person's data."> keywords = <"demographic service", "person's data"> misuse = <""> copyright = <"© 2011 openEHR Foundation"> > ["pt-br"] = < language = <[ISO_639-1::pt-br]> purpose = <"Representação dos dados demográficos de uma pessoa."> use = <"Usado em serviço demográficos para coletar os dados de uma pessoa."> keywords = <"serviço demográfico", "dados de uma pessoa"> misuse = <""> copyright = <"© 2011 openEHR Foundation"> > > lifecycle_state = <"Authordraft"> other_contributors = <"Sebastian Garde, Ocean Informatics, Germany (Editor)", "Omer Hotomaroglu, Turkey (Editor)", "Heather Leslie, Ocean Informatics, Australia (Editor)"> other_details = < ["references"] = <"ISO/TS 22220:2008(E) - Identification of Subject of Care - Technical Specification - International Organization for Standardization."> > > artefact_object_type = <"DIFFERENTIAL_ARCHETYPE"> archetype_id = <"openEHR-DEMOGRAPHIC-PERSON.person.v1"> adl_version = <"1.5"> artefact_type = <"archetype"> definition = < rm_type_name = <"PERSON"> node_id = <"at0000"> attributes = < ["1"] = < rm_attribute_name = <"details"> children = < ["1"] = (P_ARCHETYPE_SLOT) < 
rm_type_name = <"ITEM_TREE"> node_id = <"at0001"> occurrences = <"1"> includes = < ["1"] = < expression = (EXPR_BINARY_OPERATOR) < type = <"Boolean"> operator = < value = <2007> > left_operand = (EXPR_LEAF) < type = <"String"> reference_type = <"attibute"> item = <"archetype_id/value"> > right_operand = (EXPR_LEAF) < type = <"C_STRING"> reference_type = <"constraint"> item = (C_STRING) < regexp = <"(person_details)[a-zA-Z0-9_-]*\\.v1"> is_open = regexp_default_delimiter = > > precedence_overridden = > > > is_closed = > > is_multiple = > ["2"] = < rm_attribute_name = <"identities"> children = < ["1"] = (P_ARCHETYPE_SLOT) < rm_type_name = <"PARTY_IDENTITY"> node_id = <"at0002"> occurrences = <"1"> includes = < ["1"] = < expression = (EXPR_BINARY_OPERATOR) < type = <"Boolean"> operator = < value = <2007> > left_operand = (EXPR_LEAF) < type = <"String"> reference_type = <"attibute"> item = <"archetype_id/value"> > right_operand = (EXPR_LEAF) < type = <"C_STRING"> reference_type = <"constraint"> item = (C_STRING) < regexp = <"(person_name)[a-zA-Z0-9_-]*\\.v1"> is_open = regexp_default_delimiter = > > precedence_overridden = > > > is_closed = > > is_multiple = > ["3"] = < rm_attribute_name = <"contacts"> children = < ["1"] = (P_C_COMPLEX_OBJECT) < rm_type_name = <"CONTACT"> node_id = <"at0003"> occurrences = <"1"> attributes = < ["1"] = < rm_attribute_name = <"addresses"> children = < ["1"] = (P_ARCHETYPE_SLOT) < rm_type_name = <"ADDRESS"> node_id = <"at0030"> occurrences = <"1"> includes = < ["1"] = < expression = (EXPR_BINARY_OPERATOR) < type = <"Boolean"> operator = < value = <2007> > left_operand = (EXPR_LEAF) < type = <"String"> reference_type = <"attibute"> item = <"archetype_id/value"> > right_operand = (EXPR_LEAF) < type = <"C_STRING"> reference_type = <"constraint"> item = (C_STRING) < regexp = <"(address)([a-zA-Z0-9_-]+)*\\.v1"> is_open = regexp_default_delimiter = > > precedence_overridden = > > ["2"] = < expression = (EXPR_BINARY_OPERATOR) < type = 
<"Boolean"> operator = < value = <2007> > left_operand = (EXPR_LEAF) < type = <"String"> reference_type = <"attibute"> item = <"archetype_id/value"> > right_operand = (EXPR_LEAF) < type = <"C_STRING"> reference_type = <"constraint"> item = (C_STRING) < regexp = <"(electronic_communication)[a-zA-Z0-9_-]*\\.v1"> is_open = regexp_default_delimiter = > > precedence_overridden = > > > is_closed = > > is_multiple = > > > > is_multiple = > ["4"] = < rm_attribute_name = <"relationships"> children = < ["1"] = (P_C_COMPLEX_OBJECT) < rm_type_name = <"PARTY_RELATIONSHIP"> node_id = <"at0004"> attributes = < ["1"] = < rm_attribute_name = <"details"> children = < ["1"] = (P_C_COMPLEX_OBJECT) < rm_type_name = <"ITEM_TREE"> attributes = < ["1"] = < rm_attribute_name = <"items"> children = < ["1"] = (P_C_COMPLEX_OBJECT) < rm_type_name = <"ELEMENT"> node_id = <"at0040"> attributes = < ["1"] = < rm_attribute_name = <"value"> children = < ["1"] = (P_C_COMPLEX_OBJECT) < rm_type_name = <"DV_TEXT"> > ["2"] = (P_C_COMPLEX_OBJECT) < rm_type_name = <"DV_CODED_TEXT"> attributes = < ["1"] = < rm_attribute_name = <"defining_code"> children = < ["1"] = (P_CONSTRAINT_REF) < rm_type_name = <"CODE_PHRASE"> target = <"ac0000"> > > is_multiple = > > > > is_multiple = > > > > is_multiple = > > > > is_multiple = > > > > is_multiple = > > > ontology = < term_definitions = < ["pt-br"] = < ["at0000"] = < text = <"Dados da pessoa"> description = <"Dados da pessoa."> > ["at0001"] = < text = <"Detalhes"> description = <"Detalhes demográficos da pessoa."> > ["at0002"] = < text = <"Nome"> description = <"Conjunto de dados que especificam o nome da pessoa."> > ["at0003"] = < text = <"Contatos"> description = <"Contatos da pessoa."> > ["at0004"] = < text = <"Relacionamentos"> description = <"Relacionamentos de uma pessoa, especialmente laços familiares."> > ["at0030"] = < text = <"Endereço"> description = <"Endereços vinculados a um único contato, ou seja, com o mesmo período de validade."> > ["at0040"] = < 
text = <"Grau de parentesco"> description = <"Define o grau de parentesco entre as pessoas envolvidas."> > > ["en"] = < ["at0000"] = < text = <"Person"> description = <"Personal demographic data."> > ["at0001"] = < text = <"Demographic details"> description = <"A person's demographic details."> > ["at0002"] = < text = <"Name"> description = <"A person's name."> > ["at0003"] = < text = <"Contacts"> description = <"A person's contacts."> > ["at0004"] = < text = <"Relationships"> description = <"A person's relationships, especially family ties."> > ["at0030"] = < text = <"Addresses"> description = <"Addresses linked to a single contact, i.e. with the same time validity."> > ["at0040"] = < text = <"Relationship type"> description = <"Defines the type of relationship between related persons."> > > > constraint_definitions = < ["pt-br"] = < ["ac0000"] = < text = <"Códigos para tipo de parentesco"> description = <"códigos válidos para tipo de parentesco."> > > ["en"] = < ["ac0000"] = < text = <"Codes for type of relationship"> description = <"Valid codes for type of relationship."> > > > > is_controlled = is_generated = is_valid = --- ## Post #3 by @system Hi! Let the battle begin :-) see: [http://www.imt.liu.se/~erisu/2011/AOM-beauty-contest.html](http://www.imt.liu.se/~erisu/2011/AOM-beauty-contest.html) > actually, ADL 2.0 as reported in this document is now obsolete. The ADL 1.5 compiler already does this, and will use it as a fast save/retrieve format. Will cADL become optional or go away somehow? > One area where dADL beats JSON and YAML (I think) is its better support for Xpath-like paths. Why would that be different? I guess most path queries will run on instantiated object trees rather than on documents and then there is no difference - and if paths were run directly on documents, then please explain why dADL would support them better. > Plus its much more compact than JSON. Much? Less noisy I would agree to though. 
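Editorial sketch of the point above that path queries usually run on the instantiated object tree rather than on the document text. The JSON fragment is a hypothetical rendering of a few nodes from the dADL dump earlier in the thread, and the `name[key]` path syntax and the `resolve` helper are assumptions made for illustration, not an openEHR API. Once the document is parsed, the resolver only sees nested maps, so it should not matter which serialisation the tree came from.

```python
import json

# Hypothetical JSON rendering of a small part of the archetype shown above.
DOC = """
{
  "definition": {
    "rm_type_name": "PERSON",
    "node_id": "at0000",
    "attributes": {
      "1": {
        "rm_attribute_name": "details",
        "children": {
          "1": {"rm_type_name": "ITEM_TREE", "node_id": "at0001"}
        }
      }
    }
  }
}
"""


def resolve(tree, path):
    """Walk a '/'-separated path; 'name[key]' selects a keyed child of 'name'."""
    node = tree
    for segment in path.strip("/").split("/"):
        name, _, rest = segment.partition("[")
        node = node[name]
        if rest:
            # Strip the closing bracket to get the key, e.g. "1]" -> "1".
            node = node[rest.rstrip("]")]
    return node


tree = json.loads(DOC)
print(resolve(tree, "/definition/attributes[1]/children[1]/node_id"))
```

The same `resolve` call would work unchanged on a tree produced by a YAML or dADL parser, provided that parser yields the same nested maps.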
> Personally I find YAML hard to read because there are so many syntax elements (triple '-', triple '.' etc) but that might just be me.

Have a look at... [http://www.imt.liu.se/~erisu/2011/AOM-beauty-contest.html](http://www.imt.liu.se/~erisu/2011/AOM-beauty-contest.html) ...again.

The triple '-' and triple '.' are (mostly optional) start and end markers of documents that make life easier when concatenating streams/documents, see the YAML specification.

Am I the only one that thinks YAML is more readable than dADL?

Best regards, Erik Sundvall

---

## Post #4 by @thomas.beale

> Hi!
>
> Let the battle begin :-) see:
> [http://www.imt.liu.se/~erisu/2011/AOM-beauty-contest.html](http://www.imt.liu.se/%7Eerisu/2011/AOM-beauty-contest.html)

Nice page - that's quite fun to see them all pasted up there. My question is: what's the/your purpose for human readability? Is it:

- education e.g. in some kind of class-room / training situation
- debugging
- self-learning
- something else

Just a question....

> > actually, ADL 2.0 as reported in this document is now obsolete. The ADL 1.5 compiler already does this, and will use it as a fast save/retrieve format.
>
> Will cADL become optional or go away somehow?

It's not my intention. To be honest, I am not sure if a streaming cADL parser that knows it is parsing guaranteed-correct cADL might not be faster than the equivalent dADL parser for the archetype definition. But either way, cADL is a notation that really gives you a direct feel for the implicated semantics, so for understanding what you are looking at it has to be better. dADL / XML / JSON etc don't give you a direct picture, they give you a serialised object picture from which your brain has to infer an object structure (but admittedly this is unambiguous, so your brain will probably get it right).
In my view 'proper syntax' is nicer for direct comprehension and therefore learning.

> > One area where dADL beats JSON and YAML (I think) is its better support for Xpath-like paths.
>
> Why would that be different? I guess most path queries will run on instantiated object trees rather than on documents and then there is no difference - and if paths were run directly on documents, then please explain why dADL would support them better.

Looking at the JSON again, I might have to eat my words... I guess if the attribute names / hash tags are turned into Xpath predicates the implied set of paths has to be the same.

> > Plus its much more compact than JSON.
>
> Much? Less noisy I would agree to though.
>
> > Personally I find YAML hard to read because there are so many syntax elements (triple '-', triple '.' etc) but that might just be me.
>
> Have a look at...
> [http://www.imt.liu.se/~erisu/2011/AOM-beauty-contest.html](http://www.imt.liu.se/%7Eerisu/2011/AOM-beauty-contest.html)
> ...again.
>
> The triple '-' and triple '.' are (mostly optional) start and end markers of documents that make life easier when concatenating streams/documents, see the YAML specification.
>
> Am I the only one that thinks YAML is more readable than dADL?

When I get a moment I will add YAML to the serialiser club in the tool and we can then see if proper YAML is or isn't better to read (I am assuming that it will be somewhat different from the inferred YAML you generated with that web tool).

I think 'readability' is starting to come down to cognitive and linguistic / semiotic issues, which is very interesting. There may be no objective answer to this question; if there is it will be interesting to know what the criteria are.

Nice work on the contest!

- thomas

---

## Post #5 by @Heath_Frankel3

Thanks Erik,

Interesting to see the line up. Can’t believe that XML wasn’t the longest file in the list; that kills one of the arguments for JSON vs XML.
For someone who is not aware of YAML: is the white space significant? If so, this kind of kills it for me; otherwise, for a human reader it's fairly natural to read without lots of brackets of various kinds.

Heath

---

## Post #6 by @system

Hi Erik,

is the Javascript Object Dump missing regexps for 'address' and 'electronic_communications'? Or is that irrelevant?

In the YAML, some comma separated key-value pairs are condensed into 1 line; it would be nicer if they could all be on their own line: makes it lengthier, but more readable and a fairer comparison to the other formats.

Cheers, Roger

---

## Post #7 by @system

Hi!

> > [http://www.imt.liu.se/~erisu/2011/AOM-beauty-contest.html](http://www.imt.liu.se/~erisu/2011/AOM-beauty-contest.html)
>
> is the Javascript Object Dump missing regexps for 'address' and
> 'electronic_communications'? Or is that irrelevant?

Thanks for spotting that, obviously something went wrong in the object dump. I have now commented on that on the web page.

> In the YAML, some comma separated key-value pairs are condensed into 1
> line; it would be nicer if they could all be on their own line: makes
> it lengthier, but more readable and a fairer comparison to the other
> formats.

I think this is the default way of nesting flow style within block style with limited line length, but we should double check that. I agree that one line per thing would be more readable. Perhaps that can be configured in serializers.

> Interesting to see the line up. Can’t believe that XML wasn’t the longest file in the list, that kills one of the arguments for JSON vs XML.

Well, that depends on how you measure length or weight in bytes, in readable or compact form. Have a look at the bottom of [http://www.imt.liu.se/~erisu/2011/AOM-beauty-contest.html](http://www.imt.liu.se/~erisu/2011/AOM-beauty-contest.html), where I have now added some length comparison of whitespace-compressed formats.

> For someone that is not aware of YAML, are the white space significant.

Indentation level is significant when using YAML block style but not YAML flow style. See the YAML specification for details.

> If so, this kinds of kills it for me, otherwise for a Human reader its fairly natural to read without lots of brackets of various kinds.

Well, aren't the most common ways of defining tree structures either to use brackets/tags/delimiters of some kind or to use indentation? Do you have any other obvious and still readable methods that avoid brackets etc but where whitespace or indentation is not significant?

---

## Post #8 by @thomas.beale

Thanks Erik,

Interesting to see the line up. Can’t believe that XML wasn’t the longest file in the list; that kills one of the arguments for JSON vs XML.

For someone who is not aware of YAML: is the white space significant? If so, this kind of kills it for me; otherwise, for a human reader it's fairly natural to read without lots of brackets of various kinds.

Heath

---

## Post #9 by @Heath_Frankel3

I think previously I had indicated I had no problem with the stringified interval approach in XML, but I am reverting my thinking on this and feel that it would be counter-intuitive for those who want to use the XML schemas for code generation purposes. I think in this case the computable requirement outweighs the human readable requirement.

I think we can come up with a much more concise representation of these intervals without compromising the computable requirement, something similar to XML schema maxOccurs/minOccurs.

Heath

Please everyone remember that the dADL, JSON and XML generated from AWB all currently use the stringified expression of cardinality / occurrences / existence. Now, these are usually the most numerous constraints in an archetype and if expressed in the orthodox way, take up 6 lines of text, hence the giant files (e.g. AOM 1.4 based XML we currently use) - and thus the much reduced files you see on Erik's page, because we are using ADL 1.5 flavoured serialisations not the ADL 1.4 one.
Now, I think we should probably go with the stringified form in all of these formalisms. The cost of doing this is a small micro-parser, but it is the same micro-parser for everyone, which seems attractive to me.

The alternative that Erik mentioned was more native, but still efficient interval expressions, e.g. dADL has it built in (0..* is |>=0| in dADL), and YAML and JSON could probably be persuaded to make some sort of array of integer-like things be used. XML still doesn't have any such support. In theory this approach would be the best if each syntax supported it properly, but XML doesn't at all, and the others don't support Intervals with unbounded upper limit (i.e. the '*' in '0..*').

But Erik's exercise certainly proved that efficient representation of the humble Interval is actually worthwhile. (Once again thanks for that page, it's quite a good way to get a good feel for these syntaxes very quickly).

- thomas

---

## Post #10 by @system

Hi All

I am going to say it once more: If there is an expression on occurrences of ‘0..*’ anywhere in ADL then it is an error, for that is not a constraint – and can only be wrong (ie the RM may have a narrower constraint).

We just need a max int and a min int – both optional.

I won’t say it again – but it does keep it simple and it is correct!

Cheers, Sam

---

## Post #11 by @yampeku

And if you want to express something like 'a set with all the past test results for this patient' (that could have none)? It would be a constraint as you are only allowing some kinds of entries (children of a certain Snomed code for example).

---

## Post #12 by @thomas.beale

Hi All

I am going to say it once more: If there is an expression on occurrences of ‘0..*’ anywhere in ADL then it is an error, for that is not a constraint – and can only be wrong (ie the RM may have a narrower constraint).

We just need a max int and a min int – both optional.

I won’t say it again – but it does keep it simple and it is correct!
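Editorial sketch of the "small micro-parser" for stringified occurrences mentioned in the posts above. It assumes only the forms that appear in this thread (`1` and `0..*`) plus the obvious `N..M`; the `Occurrences` type and `parse_occurrences` function are hypothetical names, and reading a bare `N` as `N..N` is an assumption rather than something the thread states. An unbounded upper limit is represented as `None`, which also fits the "max int and min int, both optional" idea.

```python
from typing import NamedTuple, Optional


class Occurrences(NamedTuple):
    lower: int
    upper: Optional[int]  # None means unbounded, i.e. the '*' in '0..*'


def parse_occurrences(text: str) -> Occurrences:
    """Parse 'N', 'N..M' or 'N..*' into an Occurrences tuple."""
    text = text.strip()
    if ".." not in text:
        # Assumption: a single number N is shorthand for N..N.
        n = int(text)
        return Occurrences(n, n)
    low, high = text.split("..", 1)
    lower = int(low)
    upper = None if high.strip() == "*" else int(high)
    if upper is not None and upper < lower:
        raise ValueError(f"upper bound below lower bound: {text!r}")
    return Occurrences(lower, upper)


for s in ("1", "0..1", "0..*"):
    print(s, "->", parse_occurrences(s))
```

Because the input is a plain string, the same few lines could sit behind a dADL, JSON, YAML or XML reader, which is the attraction of the stringified form argued for above.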
--- ## Post #13 by @system Hi! > I think previously I had indicated I had no problem with the stringified interval approach in XML, but I am reverting my thinking on this and feel that it would be counter intuitive for those who what to use the XML schemas for code generation purposes. I think in this case the computable requirement outweighs the human readable requirement. You are probably right regarding XML, and maybe this is valid also for most JSON use-cases where the desire for an as simple as possible object-serialization-mapping outweighs human readability. I think the openEHR community is best served by having different archetype serialization format categories with different priorities for different purposes. E.g.: 1a. An XML format optimized for mapping to XML-schema generated code. 1b. A JSON format optimized for mapping to AOM object models handcrafted or generated from AOM-specifications. 2. A cADL-variant wrapped in YAML optimized for human readability. It could be used for archetype files stored in version control systems (making version diffs readable) and as textual format when you need textual examples in documentation, teaching etc. In 1a & 1b easy implementation should be prioritized over readability but in #2 human readability should be prioritized. Prioritizing both in the same format would likely fail. Things like default ordering of nodes and attributes could be recommended but optional for #1 but should be mandatory for #2 (otherwise readability suffers and diffs get messed up). > I think we can come up with a much more concise representation of these intervals without compromising the computable requirement, something similar to XML schema maxOccurs/minOccurs. Probably, but for #1 maybe being close to the AOM should be prioritized over being concise. After all, archetypes will not be sent over the wire at the same scale as patient data (RM instances). 
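Editorial sketch of why a fixed ordering of attributes matters for readable version diffs, as argued for format category #2 above. JSON from the Python standard library is used here only as a stand-in for whatever text format is chosen, and alphabetical key order stands in for a recommended AOM ordering; both are assumptions, and `canonical` is a hypothetical helper name.

```python
import json


def canonical(obj) -> str:
    """Serialise with sorted keys and fixed indentation so output is reproducible."""
    return json.dumps(obj, sort_keys=True, indent=2, ensure_ascii=False)


# Same content, built in a different attribute order.
a = {"rm_type_name": "PERSON", "node_id": "at0000"}
b = {"node_id": "at0000", "rm_type_name": "PERSON"}

# With a fixed ordering both serialise to identical text, so a version
# control diff between them would be empty.
print(canonical(a) == canonical(b))
print(canonical(a))
```

Without the fixed ordering, two tools saving the same archetype could produce files that differ on every line even though nothing changed semantically.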
By the way, is the AOM open for changes (like renaming attributes) if that would increase clarity?

If we changed the subject and discussed RM instance serialization, then binary formats (like Protobuf and Thrift) could form a third category where message size and speed of conversion would be prioritized over ease of implementation or readability. XML and JSON would likely be good to have also for interoperability and debugging purposes. YAML for the RM would not be an obvious "over the wire"-format, but can be very useful for compact human readable long term EHR archiving storage as plain text files and for documentation examples.

Best regards, Erik Sundvall

---

## Post #14 by @Seref

Hi Erik,

I'll repeat a point I've tried to make before, since it is relevant in the context of binary serialization.

I've used protocol buffers serialization of AOM in Bosphorus (I'll put the source code under Opereffa's svn soon, it appears I don't even have time to clean it up). These are very fast, but much more simplistic formalisms to represent data. You can use them to improve the performance of many things, but you'll be writing a lot of code, and you'll have to find non-standard ways of dealing with the simplicity of the formalism.

Here is the simplest example from Bosphorus: Eiffel is an object oriented language, Java is also an object oriented language. openEHR specs use inheritance, which is reflected into type hierarchies of both Eiffel and Java classes. You have the protocol buffers language which does not support inheritance. How do you represent instances of abstract types in protocol buffers? How do you read/write them from/to Eiffel/Java? I've done these in my own way, but it will be a problem every time someone uses formalisms which are not designed for OO languages and frameworks. In a way, it is a conceptual distance from OO.
Every alternative mentioned here is at a particular position to a particular level of OO support (take it as a point in a multidimensional space). Every alternative has values higher than the rest in a particular dimension, but none of them is absolutely closer to the OO support point (represented by Java/Eiffel/C#/Python etc). In my opinion, without this evaluation of OO support, which is what we use in the actual languages of system development, other discussions are not really relevant. What if protocol buffers are fast? What if YAML, ADL, or JSON are easier to read, space efficient?

Maybe I'm being too rigid about this particular issue, but the programming language, together with the tools and frameworks built on it, is what determines industry adoption more than everything else today. I don't think this is being considered in these discussions, but that is just me.

Kind regards
Seref

---

## Post #15 by @system

Hi Seref!

> I'll repeat a point I've tried to make before, since it is relevant in the context of binary serialization.
> I've used protocol buffers serialization of AOM in Bosphorus

Why do you use binary serialization for AOM? (Just curious, I thought text formats would cater for most AOM use cases.)

I have not looked deeply into protobuf so I'll take your word on the lack of OO support. Looking at [http://wiki.apache.org/thrift/ThriftTypes](http://wiki.apache.org/thrift/ThriftTypes) their "Structs" also seem to lack inheritance. So I'll try to keep quiet about cross-platform binary formats at least until I have tried applying any of them to openEHR for real.

> ... you'll have to find non standard ways of dealing with the simplicity of the formalism.


For JSON I would agree that the formalism is sometimes too simple and one may need to make an openEHR specification for how to convey object type when needed, perhaps inspired by something like

- [http://flexjson.sourceforge.net/](http://flexjson.sourceforge.net/) that adds a "class" attribute, or
- by exploring if introspection of the target object type, like [http://code.google.com/p/google-gson/](http://code.google.com/p/google-gson/) does, is enough for openEHR data.

> Here is the simplest example from Bosphorus: Eiffel is an object oriented language, Java is also an object oriented language. openEHR specs use inheritance, which is reflected into type hierarchies of both Eiffel and Java classes. You have the protocol buffers language which does not support inheritance. How do you represent instances of abstract types in protocol buffers?

Sorry if I'm dense, but when do you need to instantiate abstract types in RM data?

> In a way, it is a conceptual distance from OO. Every alternative mentioned here is at a particular position to a particular level of OO support (take it as a point in a multidimensional space). Every alternative has values higher than the rest in a particular dimension, but none of them is absolutely closer to the OO support point (represented by Java/Eiffel/C#/Python etc) In my opinion, without this evaluation of OO support, which is what we use in the actual languages of system development, other discussions are not really relevant. What if protocol buffers are fast? What if YAML, ADL, or JSON are easier to read, space efficient?

Do you bundle YAML and XML into that opinion (lacking OO support the same way as protobuf)? Do you think that dADL can carry everything needed for openEHR (both AM and RM)? If so why wouldn't YAML? What in basic dADL semantics is missing in YAML? YAML (using a !-prefixed syntax) and partly XML (using e.g. xsi:type) have ways of conveying object type in the case it cannot be inferred from data.
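
A minimal Python sketch of what "conveying object type" could look like for JSON, in the spirit of the flexjson "class" attribute mentioned above. Everything here is illustrative: the `_type` key, the `REGISTRY` lookup and the two simplified classes are assumptions, not an openEHR convention or real AOM classes. The two classes deliberately share the same attribute signature, so the concrete type cannot be inferred from the data alone.

```python
import json
from dataclasses import asdict, dataclass

TYPE_KEY = "_type"  # hypothetical discriminator name, not an openEHR convention


@dataclass
class SingleAttribute:
    """Simplified stand-in for one concrete subtype of an abstract attribute."""
    rm_attribute_name: str


@dataclass
class MultipleAttribute:
    """Second subtype with an identical attribute signature."""
    rm_attribute_name: str


# Maps the type name found in the document back to a concrete class.
REGISTRY = {cls.__name__: cls for cls in (SingleAttribute, MultipleAttribute)}


def dump(obj) -> str:
    """Serialize a dataclass instance, adding its class name as a discriminator."""
    payload = {TYPE_KEY: type(obj).__name__, **asdict(obj)}
    return json.dumps(payload, sort_keys=True)


def load(text: str):
    """Rebuild the concrete object by looking up the discriminator."""
    data = json.loads(text)
    cls = REGISTRY[data.pop(TYPE_KEY)]
    return cls(**data)
```

A YAML `!`-tag or an XML `xsi:type` attribute plays the same role as `_type` here. The difference is that those formats define where the type information goes, whereas for plain JSON the placement has to be agreed on separately.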
> Maybe I'm being too rigid about this particular issue, but the programming language, its tools and frameworks built on it is what determines industry adoption more than everything else today. I don't think this is being considered in these discussions, but that is just me.

I guess language-specific binary formats (like serialized Java objects) may be better for binary representation then. Thanks for the word of warning regarding protobuf.

Do you think that all openEHR instance serializations really need to be "object oriented" themselves or is it enough that the classes of the receiving application are object oriented and that the deserialization code (or the transfer format) is clever enough to put the data into the right objects? There are some cases where different openEHR datatypes may have the same attribute signature, and for those cases even transport formats aiming to reduce verbosity will need to explicitly declare the class type since it cannot be safely inferred.

Best regards, Erik Sundvall [erik.sundvall@liu.se](mailto:erik.sundvall@liu.se) [http://www.imt.liu.se/~erisu/](http://www.imt.liu.se/~erisu/) Tel: +46-13-286733

---

## Post #16 by @Seref

A bunch of responses, most of which should actually go to a wiki page for Bosphorus.

I've used binary serialization for AOM because although Eiffel is a very impressive language, I am not happy about its libraries. Some of them are mature, but for XML, I could not find anything that'd be guaranteed to be maintained.

Protocol buffers is a technology that is used very heavily in Google, and has a large community. Performance is the key aspect of protocol buffers. It is very, very fast. When I'm exchanging simple messages over ZeroMQ (a very fast queue framework that is used in Bosphorus) I can achieve microsecond level performance (not even millisecond!) for Java to Eiffel communication. For desktop tooling purposes, this is much faster than XML.

You need to instantiate concrete instances of abstract types every time you use single or multiple attributes in AOM. Both classes descend from CAttribute. So the AOM specification gives you a field with type CAttribute (abstract), and instances of this type always have either a single or multiple attribute object assigned to this field. The Eiffel parser creates an AOM object when it parses an archetype. On the other side of the bridge, a Java object awaits to be filled with the data in the Eiffel object. Both Java and Eiffel know the relationship between these types but protocol buffers does not have inheritance. So when you're defining a protocol buffer message with its language, you have a problem: what should be the type of the field that represents CAttribute?

I've had to come up with a method of handling this case. Someone may use another method and that is my point: when we have to do these things, they become a source of bugs and obstacles to implementation. So we may benefit from the format and readability of JSON, but the type of issues I've been describing would introduce a lot more problems than bandwidth efficiency or human friendliness. Hence, my priorities are slightly different when it comes to what makes a formalism convenient in openEHR implementation.

With this view: I find XML seriously crippled for OO support, but at least there is some inheritance support and there is huge tooling and framework support. My job would be to find ways of walking around issues using these frameworks. I'd prefer this to having less tooling and less OO support (for JSON). I can't speak for YAML, but in terms of maturity and support for mechanisms such as schemas, I'd be surprised if it ends up better than XML. For XML, I have JAXB, support in Java, Python, .NET, you name it... dADL has the advantage of being designed in a strong openEHR context.
I guess both YAML (based on the feature you've mentioned) and XML can match dADL to the extent that any required workarounds could be justified based on industry adoption. I do not know YAML well enough to compare it in detail, but I'd love to hear from someone the type of things I've been sharing here, only with YAML this time instead of JSON and XML.

Given this, if you or someone else thinks that YAML can be an alternative to dADL, there is nothing stopping anyone from implementing it and using it. Absolutely nothing. This is what I do. If I think that an XML form of ADL would help, then I take what is out there (Tom's Eiffel code), use it, and move on.

I have a feeling that all these discussions about if this or that could replace dADL are too hypothetical. Most of the time they are academic discussions. There is nothing wrong with academic discussions, I am doing a PhD here, but if the openEHR community is spending its time and resources for academic discussions which do not necessarily connect to real life implementations in the near term, then I think we have a problem.

Thomas is heroically responding to all queries without judgement, and he is even implementing a lot of code, to give grounded answers, to provide proofs. I guess I am not as mature and as dedicated as he is. I'd rather have him working on ADL 1.5 XSD schemas than proving to people that openEHR can do JSON if necessary. Because having XSDs for ADL 1.5 is going to increase adoption of openEHR a lot more than having JSON output. If anybody out there does not agree, please come forward and talk about your JSON usage in your project which is about an actual information system that is running, or is supposed to run in a clinical setting.

Please do not get me wrong, all the discussion we are having here is useful, it is just that in my humble opinion, some discussions are more useful than others if this standard into which I am heavily investing is to go forward.

Best regards
Seref

---

## Post #17 by @Stef_Verlinden1

+1

Cheers, Stef

---

## Post #18 by @Koray_Atalag

Yeah I was also wondering what is the driver/motivation/aspiration behind using JSON, YAML etc. instead of good old ADL? Is this to do with making openEHR easier to digest for the ‘traditional’ IT community because perhaps they don’t want to let go everything at once and leverage some existing skills like these?

I also think that we as a community should look at getting more organised and get our efforts in tune as I know that quite interesting and tough times are about to come…

---

## Post #19 by @system

Oh sigh...

Trying to be open minded, thinking a few steps ahead, sharing thoughts and regularly reevaluating design decisions does not seem to be appreciated by all on this list. Perhaps we need to mark some discussions or sections with...

[Warning: may contain new thoughts]

...so that those of us that enjoy such discussions may continue to have them and those that get distracted by them or can't stand them could filter out those parts.

> Yeah I was also wondering what is the driver/motivation/aspiration behind using JSON, YAML etc. instead of good old ADL?

Good old which ADL? Please go back in the thread and note the difference between dADL and cADL in the reasoning: dADL is a reinvention of the wheel (object tree serialization), while cADL is an optimized DSL that I have not seen any obvious widespread alternative to if brevity and readability are sought.

Regarding the motivation you ask for, I would recommend going back in the thread again to the first message... [http://www.openehr.org/mailarchives/openehr-technical/msg06186.html](http://www.openehr.org/mailarchives/openehr-technical/msg06186.html) ...under the boldface heading "**Motivation:**", that you may have missed, and read the three bullet points. You may not agree but that and the rest of this current message might reduce your wondering about the discussion origins.

> I also think that we as a community should look at getting more organised and get our efforts in tune

Yes, a bit of diversity is good in order to best explore design space, but duplicating work is a waste of time. If we are allowed to discuss future-directed thoughts on this list (without people getting too upset) that may also help us tune our efforts. If we must implement first and then discuss it will be a lot harder to avoid duplication of work.

> as I know that quite interesting and tough times are about to come…

Are you referring to the CIMI-discussions or is it a general observation about how the future usually is :-)

Regarding CIMI I think it is valuable to try to look upon openEHR with the eyes of newcomers. If there is unnecessary legacy in models or formats that we don't easily see because we have gotten used to it, then now is a good time to try reducing it while the amount of patient data using openEHR is limited. It will be harder to change things later. Getting the template formalism integrated with the AOM 1.5 was great in this sense, and so is Tom's experimentation with RM 2.0 constructs that may reduce the ITEM_STRUCTURE hierarchy.

> **From:** ... **On Behalf Of** Stef Verlinden
> +1

+/- infinity

Yay, I love flame wars :-)

> Given this, if you or someone else thinks that YAML can be an alternative to dADL, there is nothing stopping anyone from implementing it and using it. Absolutely nothing.

Do you assume that if somebody is talking about a subject, then they can't possibly be in the middle of implementing it and wanting to share thoughts at an early stage? Please try to be a bit more open minded, I did not ask **you** to be the first to implement YAML support. You are not the only one implementing openEHR stuff, but I will admit that you deserve credit for, and are great at, "release early, release often" and I am not (yet).

> Thomas is heroically responding to all queries without judgement...


I think that is an unfair description of Tom's judgment.

> I have a feeling that all these discussions about if this or that could replace dADL are too hypothetical. Most of the time they are academic discussions. There is nothing wrong with academic discussions, I am doing a PhD here, but if the openEHR community is spending its time and resources for academic discussions which do not necessarily connect to real life implementations in the near term, then I think we have a problem.

So if something is not on your personal implementation agenda in the near term, then it is "academic" and a waste of resources since it can not possibly be on the implementation agenda of somebody else... :-)

The reason I started looking into both JSON and YAML is that they are part of our current implementation (partly using JSON, Javascript etc) (primarily for RM objects). In this process I happened to see that YAML might do the job of dADL and that we could then reuse the parser/serializer work of others (for many programming languages) instead of maintaining dADL frameworks. I wanted to share this thought at an early stage and I do appreciate that some have at least responded with positive interest and curiosity.

Sometimes time can be saved by discussion before implementation, especially carefully considering what should or should not be implemented. People at UCL or Ocean Informatics can probably regularly speak in person to core openEHR decision makers and designers; the rest of us have the mailing lists as major channels, please try to respect that too.

> Please do not get me wrong, all the discussion we are having here is useful, it is just that in my humble opinion, some discussions are more useful than others if this standard into which I am heavily investing is to go forward.

You are not the only one having invested a lot of years and work in openEHR.
I would ask you and others to please allow those that want to discuss things before and during implementation to do so if they wish to.

Regarding YAML the p.s. on the start message of this thread said:

> P.s. Tom Beale and I sort of started a brief off-list discussion about YAML, here is now an attempt to get input from more people.

I think it is better for the openEHR community to have things that are of potential interest to others, even things that are not yet tested, as on-list discussions rather than off-list discussions, but I am no longer sure everyone agrees and this is a bit worrying to me. I do still think there are enough people appreciating early open discussions and will try to continue along that path, but try to remember tagging such sections with [Warning: may contain new thoughts] :-)

Best regards, Erik Sundvall [erik.sundvall@liu.se](mailto:erik.sundvall@liu.se) [http://www.imt.liu.se/~erisu/](http://www.imt.liu.se/~erisu/) Tel: +46-13-286733

P.s. [Warning: may contain new thoughts] I suspect a current off-list discussion of scalable distributed alternatives to the CKM based on GIT might be unwelcome on the list too and it might be better to keep off-list for a long time until it has been at least partially tested some time in the distant future, since there are other things (like releasing other software) that need to be prioritized first before we have time to test anything.

---

## Post #20 by @yampeku

I have no problems on having different representations. In fact, having different representations means more happy people, not less (for example, people have been using RDF to describe archetypes for some time). Anyway I love this kind of thread, as they are great for seeing new perspectives and technologies.

P.s. I like your idea of a GIT based distributed concept repository.
If you want to start an off-list discussion please count us in, as we are also working on a reference model independent concept repository :D

---

## Post #21 by @Seref

Erik,

Add my sigh next to yours... Lots of misunderstandings, will try to respond to most obvious ones.

I have clearly expressed that all discussions here are useful. I've made no connection to my agenda. My academic work does not even require the things I've mentioned as high priority for openEHR. I've been enjoying the discussions, and will continue to do so.

Your comments about dADL below, as well as your original motivations, hint at what I'm opposed to. Your own words:

"*Having an archetype specific object-serialization language like dADL might make "archetyping" look more mysterious and suspect and might hide the fact that the semantics expressed in the AOM is the interesting thing that can be serialised in many different ways.*"

This is a negative statement about ADL, right? Nothing wrong with negative statements about ADL, I have a bunch of them in my pocket. But if this is your motivation to discuss YAML, and if the thread you've started is about "replacing adl", you're talking about replacing something that has taken lots of time and effort to create.

This is where we have our difference. I agree with many of the criticisms of ADL, and it is exactly at this point I try to be open minded. I can see that there are also significant advantages of ADL, and rather than suggesting that it be replaced, I first hypothesize and then go ahead and prove that it can coexist with xml, json, yaml etc. My work is out there showing that adl can coexist alongside these formalisms. From my point of view, this is quite an open minded approach, at least more open minded than replacing it, without considering what it would actually mean in other contexts. This is not the first time I'm having these types of discussions, and won't be the last.
I make my point whenever I see a discussion that seems to suggest switching horses midstream. I'm sorry if I'm being a buzz killer, but I'm in favor of discussing things in a larger context, including consequences for the openEHR standard and its adoption. Reminding people of these consequences does not mean I'm ruling out other options. I have been discussing them in light of all the proof I have (through my work) and I've asked others to do so. I can not know about your work in advance, can I?

Let us try to eliminate the misunderstanding at this point:

If this discussion concludes with the common view that yaml can be an alternative to dADL, do you think openEHR specification should replace ADL?

If the answer to the previous question is yes, then do you realize that this would mean replacing all the software that uses ADL, both open source and proprietary?

If the answer to the previous question is yes, then do you have a suggestion for funding these changes?

I think this is the best I can do to explain what I'm trying to include in the discussions.

Best regards
Seref

---

## Post #22 by @Peter_Gummer1

> Your comments about dADL below, as well as your original motivations, hint at what I'm opposed to. Your own words:
> "Having an archetype specific object-serialization language like dADL might make "archetyping" look more mysterious and suspect and might hide the fact that the semantics expressed in the AOM is the interesting thing that can be serialised in many different ways."
>
> This is a negative statement about ADL, right?

I don't think so, Seref. It's a negative about dADL ... not ADL per se. Going back to Erik's original post ... http://www.openehr.org/mailarchives/openehr-technical/msg06187.html ... it's pretty clear that he is _not_ suggesting that YAML should replace ADL:

"... Also note that the current suggestion only aims at looking for replacement of dADL not cADL.
Also note that the AOM and XML serialisations of the AOM are not affected by this suggestion."

Now I think Erik made a typo in that last sentence. I don't know what an "AOM serialisation of the AOM" would be. I assume that Erik meant to say that "ADL and XML serialisations of the AOM are not affected by this suggestion."

Seref also wrote:

> Let us try to eliminate the misunderstanding at this point:
>
> If this discussion concludes with the common view that yaml can be an alternative to dADL, do you think openEHR specification should replace ADL?
> If the answer to the previous question is yes, then do you realize that this would mean replacing all the software that uses ADL, both open source and proprietary ?

In response to the first question, I would say no. If YAML replaced dADL as a serialisation format, it wouldn't imply replacement of ADL too.

And so, in response to your second question, I'd argue that it wouldn't imply replacing any software at all that currently uses ADL. The only software that would have to be replaced is anything currently doing serialisation with dADL ... which would be nothing yet, as far as I'm aware.

- Peter

---

## Post #23 by @Seref

Thanks Peter,

In that case the suggestion I'm objecting to does not exist. Though I have to confess I don't seem to clearly understand the suggestion here, better re-read the thread with more coffee at hand.

Best regards
Seref

---

## Post #24 by @thomas.beale

> Hi!
>
> > I think previously I had indicated I had no problem with the stringified interval approach in XML, but I am reverting my thinking on this and feel that it would be counterintuitive for those who want to use the XML schemas for code generation purposes. I think in this case the computable requirement outweighs the human readable requirement.
>
> You are probably right regarding XML, and maybe this is valid also for most JSON use-cases where the desire for an as simple as possible object-serialization-mapping outweighs human readability.
>
> I think the openEHR community is best served by having different archetype serialization format categories with different priorities for different purposes. E.g.:
>
> 1a. An XML format optimized for mapping to XML-schema generated code.
> 1b. A JSON format optimized for mapping to AOM object models handcrafted or generated from AOM-specifications.
>
> 2. A cADL-variant wrapped in YAML optimized for human readability. It could be used for archetype files stored in version control systems (making version diffs readable) and as textual format when you need textual examples in documentation, teaching etc.

I had never thought of that but the AWB has a multi-part serialiser component, so it would be possible. When I get a bit of time ;-)

> In 1a & 1b easy implementation should be prioritized over readability but in #2 human readability should be prioritized.

Erik,

You didn't answer the question a while ago - who are the 'readers'? I am just asking to know if you are talking about some particular kind of educational usage, and what your criteria are for 'readability'.

> Prioritizing both in the same format would likely fail. Things like default ordering of nodes and attributes could be recommended but optional for #1 but should be mandatory for #2 (otherwise readability suffers and diffs get messed up).

good point, you reminded me I have to fix the order in the AWB serialisations.

> > I think we can come up with a much more concise representation of these intervals without compromising the computable requirement, something similar to XML schema maxOccurs/minOccurs.
>
> Probably, but for #1 maybe being close to the AOM should be prioritized over being concise. After all, archetypes will not be sent over the wire at the same scale as patient data (RM instances).


how can a string like "1" or "2..*" be more concise? I think this is the most concise possible format (or some slight variation, e.g. the dADL interval syntax).

> By the way, is the AOM open for changes (like renaming attributes) if that would increase clarity?

well the AOM 1.5 is a draft, so in principle yes. But we need to assess the impact. Breaking archetype authoring tools probably does not matter so much - there are not many, so we can deal with that. Impacts on EHR system software will have to be more closely evaluated before we agree to any such changes. But let us know your fantasies anyway ;-)

- thomas

---

## Post #25 by @thomas.beale

> A bunch of responses, most of which should actually go to a wiki page for Bosphorus
>
> I've used binary serialization for AOM because although Eiffel is a very impressive language, I am not happy about its libraries. Some of them are mature, but for XML, I could not find anything that'd be guaranteed to be maintained.

I don't think there is any problem with them being maintained, they are part of the main Eiffel tool. The choice of Protocol buffers (or maybe there is another better variant?) makes sense on the basis of performance...

> Protocol buffers is a technology that is used very heavily in Google, and has a large community.
> Performance is the key aspect of protocol buffers. It is very, very fast. When I'm exchanging simple messages over ZeroMQ (a very fast queue framework that is used in Bosphorus) I can achieve microsecond level performance (not even millisecond!) for Java to Eiffel communication. For desktop tooling purposes, this is much faster than XML.

orders of magnitude...

> Thomas is heroically responding to all queries without judgement, and he is even implementing a lot of code, to give grounded answers, to provide proofs.


don't give me too much credit: my lightweight serialisation library allowed me to implement JSON output in about 4 hours, plus two days background debugging of the {[]} ...

> I guess I am not as mature and as dedicated as he is. I'd rather have him working on ADL 1.5 XSD schemas than proving to people that openEHR can do JSON if necessary. Because having XSDs for ADL 1.5 is going to increase adoption of openEHR a lot more than having JSON output. If anybody out there does not agree, please come forward and talk about your JSON usage in your project which is about an actual information system that is running, or is supposed to run in a clinical setting.

yes, I think it is about time we posted a proposed AOM 1.5 XSD...

- thomas

---

## Post #26 by @thomas.beale

> Good old which ADL? Please go back in the thread and note the difference between dADL and cADL in the reasoning, dADL is a reinvention of the wheel (object tree serialization)

Erik,

out of academic interest: was either YAML or JSON around in 2000, when I made a first version of dADL (I'm in a plane typing this, can't check)? If they were, I look silly ;-) If not...

In any case, JSON is seriously semantically deficient for proper serialisation purposes and is in need of at least 2 basic enhancements to work correctly on any realistic data. I agree it is fairly readable, although why attribute names are in quotes is completely beyond me... I have not yet looked at YAML properly, but it looks like it probably does the job properly.

> Yes, a bit of diversity is good in order to best explore design space, but duplicating work is a waste of time.
> If we are allowed to discuss future-directed thoughts on this list (without people getting too upset) that may also help us tune our efforts. If we must implement first and then discuss it will be a lot harder to avoid duplication of work.


I don't actually think there is any harm in messing around with variations on serialisation - it's not hard to implement (XML being the hardest), but at some point I think a wiki page with a summary of real world requirements behind each variant would be useful.

> Are you referring to the CIMI-discussions or is it a general observation about how the future usually is :-)
>
> Regarding CIMI I think it is valuable to try to look upon openEHR with the eyes of newcomers. If there is unnecessary legacy in models or formats that we don't easily see because we have gotten used to it, then now is a good time to try reducing it while the amount of patient data using openEHR is limited. It will be harder to change things later. Getting the template formalism integrated with the AOM 1.5 was great in this sense, and so is Tom's experimentation with RM 2.0 constructs that may reduce the ITEM_STRUCTURE hierarchy.

I have to do a bit more work to get the first proposal defined properly - there is a half done wiki page on that. Should have it fixed in a couple of days, then we can discuss. (I'm not online but if others find the page, feel free to put your own RM 2.0 variations on there somewhere).

> +/- infinity
>
> Yay, I love flame wars :-)

you can't win like that. Godel or someone showed that there are different sizes of infinity :)

> The reason I started looking into both JSON and YAML is that they are part of our current implementation (partly using JSON, Javascript etc) (primarily for RM objects). In this process I happened to see that YAML might do the job of dADL and that we could then reuse the parser/serializer work of others (for many programming languages) instead of maintaining dADL frameworks. I wanted to share this thought at an early stage and I do appreciate that some have at least responded with positive interest and curiosity.


at some point I intend to finalise the ultimate dADL grammar and publish dADL as a standalone with at least C#, Java, Eiffel and possibly C/C++ fast & full parsers + serialisers. This is less work than you might think, and it would make dADL just as available as YAML. Well, ok it won't be in Erlang or Haskell for a while, but I doubt if that will make much difference.

- thomas

---

## Post #27 by @thomas.beale

all,

one of the good decisions I think we made early on in openEHR's history was to have few mailing lists rather than many. One of the consequences is that discussions about new / fun ideas are on the same list and sometimes same thread as discussions about real world implementation priorities.

Please continue to enjoy :)

- thomas

---

## Post #28 by @Peter_Gummer1

According to http://en.wikipedia.org/wiki/JSON ...

"JSON was used at State Software, a company co-founded by Crockford, starting around 2001. The JSON.org website was launched in 2002."

And http://en.wikipedia.org/wiki/YAML ...

"YAML was first proposed by Clark Evans in 2001 ..."

Clearly you were not alone, ten years ago, in thinking that there had to be a better way than XML!

- Peter

---

## Post #29 by @Koray_Atalag

Oh, just my personal thoughts without any sanity check - should have read the whole thread first! My reaction was just to what was written in the subject line of the thread and after reading Seref’s comments about the need to focus on outstanding/high priority issues.

Sorry if I have offended - I can’t possibly be against free discussions here - even the most blue sky ones which I seldom broadcast myself ;)

Cheers,

-koray

---

## Post #30 by @yampeku

After reading Pablo's post on domain types I am curious about how they should be represented in each of the different formats. I feel they should be 'expanded' before trying to represent them in any other format, but I might be wrong. Any ideas or opinions?
---

## Post #31 by @thomas.beale

I have to say, the more I look at YAML, the more I wonder what the designers were thinking. For example, in this section of the spec, multi-line quoted strings are only allowed if the 'key' is also quoted (the strange looking JSON approach); if the key is not quoted (i.e. 'simple') then the value can't be quoted either. That's just nonsense!

I am glad I am only implementing a serialiser, not a parser...

- thomas

---

## Post #32 by @thomas.beale

> I have to say, the more I look at YAML, the more I wonder what the designers were thinking. For example, in this section of the spec,

http://yaml.org/spec/current.html#id2532720

---

## Post #33 by @system

Hi!

>> I have to say, the more I look at YAML, the more I wonder what the designers were thinking. For example, in this section of the spec,
>
> http://yaml.org/spec/current.html#id2532720

>> multi-line quoted strings are only allowed if the 'key' is also quoted (the strange looking JSON approach); if the key is not quoted (i.e. 'simple') then the value can't be quoted either. That's just nonsense!

Are you sure that is what it says?

"Double quoted scalars are restricted to a single line when contained inside a simple key."

Is it not rather that you may not use a multiline double quoted string as a KEY (at all)? It does NOT forbid you to use multiline double quoted strings in the value, no matter if or how you quote your keys. I have certainly seen double quoted values for unquoted keys coming from serializers claiming to be specification conformant.

Are any of your keys so long and complicated that they would need multiline quoted strings?

>> I am glad I am only implementing a serialiser, not a parser...

In many less exotic languages they are already implemented :-) Then you configure them and then throw your object trees at them.

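To make the "throw your object trees at them" step concrete, here is a minimal Python sketch of what such a serialiser does with a nested structure. It is not the openEHR java-ref-impl code or any real YAML library: `to_yaml` and `scalar` are hypothetical helpers written only for illustration, the sample tree borrows a few field names from the archetype discussed in this thread, and the line break inside `purpose` was added to show a multi-line string. It assumes keys are plain attribute names that are safe to leave unquoted, and it leans on the fact that YAML 1.2 accepts JSON-style scalars, so every string value is written as `unquoted_key: "double quoted string"`.

```python
import json


def scalar(value):
    # JSON scalar syntax (double-quoted strings, true/false/null, numbers,
    # {} and []) is used here on the assumption of a YAML 1.2 parser.
    return json.dumps(value, ensure_ascii=False)


def to_yaml(node, indent=0):
    """Return block-style YAML lines for a tree of dicts, lists and scalars.

    Assumption: every dict key is a short identifier that can be emitted
    unquoted (true for attribute names such as rm_type_name).
    """
    pad = " " * indent
    lines = []
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, (dict, list)) and value:
                # Non-empty container: key on its own line, children indented.
                lines.append(f"{pad}{key}:")
                lines.extend(to_yaml(value, indent + 2))
            else:
                lines.append(f"{pad}{key}: {scalar(value)}")
    elif isinstance(node, list):
        for item in node:
            if isinstance(item, (dict, list)) and item:
                # Render the child, then put "- " in front of its first line.
                child = to_yaml(item, indent + 2)
                lines.append(f"{pad}- {child[0].lstrip()}")
                lines.extend(child[1:])
            else:
                lines.append(f"{pad}- {scalar(item)}")
    else:
        lines.append(f"{pad}{scalar(node)}")
    return lines


# Illustrative tree only; not a complete or validated archetype.
archetype = {
    "adl_version": "1.4",
    "is_controlled": False,
    "description": {
        "lifecycle_state": "Authordraft",
        "details": [
            {
                "language": "ISO_639-1::en",
                "purpose": "Representation of a person's\ndemographic data.",
            }
        ],
    },
}

print("\n".join(to_yaml(archetype)))
```

The multi-line `purpose` value comes out as a single double-quoted scalar with an escaped `\n` behind an unquoted key. In practice an existing YAML library would normally do this job; the sketch only shows how little structure is involved.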
An example of very unfinished work in progress, using poorly readable ordering and based on the openEHR java-ref-impl (and probably exposing too many fields) is attached below.

Best regards,
Erik Sundvall
erik.sundvall@liu.se http://www.imt.liu.se/~erisu/ Tel: +46-13-286733

    !<http://www.openehr.org/releases/1.0.2/class/openehr.am.archetype.ARCHETYPE>
    adl_version: '1.4'
    archetype_id: openEHR-DEMOGRAPHIC-PERSON.person.v1
    concept: at0000
    original_language: ISO_639-1::pt-br
    translations:
      en:
        language: ISO_639-1::en
        author: {email: sergio@lampada.uerj.br, organisation: Universidade do Estado do Rio de Janeiro - UERJ, name: Sergio Miranda Freire}
    description:
      original_author: {email: sergio@lampada.uerj.br, organisation: Universidade do Estado do Rio de Janeiro - UERJ, name: Sergio Miranda Freire & Rigoleta Dutra Mediano Dias,
        date: 22/05/2009}
      other_contributors: ['Sebastian Garde, Ocean Informatics, Germany (Editor)', 'Omer Hotomaroglu, Turkey (Editor)', 'Heather
          Leslie, Ocean Informatics, Australia (Editor)']
      lifecycle_state: Authordraft
      details:
      - language: ISO_639-1::en
        purpose: Representation of a person's demographic data.
        keywords: [demographic service, person's data]
        use: Used in demographic service to collect a person's data.
        copyright: © openEHR Foundation
        original_resource_uri: {}
      - language: ISO_639-1::pt-br
        purpose: Representação dos dados demográficos de uma pessoa.
        keywords: [serviço demográfico, dados de uma pessoa]
        use: Usado em serviço demográficos para coletar os dados de uma pessoa.
        copyright: © openEHR Foundation
        original_resource_uri: {}
      other_details: {references: 'ISO/TS 22220:2008(E) - Identification of Subject of Care - Technical Specification - International
          Organization for Standardization.'}
    definition:
      attributes:
      - rm_attribute_name: details
        children:
        - includes:
          - expression:
              left_operand: {item: archetype_id/value, reference_type: CONSTANT, type: STRING}
              right_operand:
                item: {pattern: '(person_details)[a-zA-Z0-9_-]*\.v1'}
                reference_type: CONSTANT
                type: String
              operator: OP_MATCHES
              precedence_overridden: false
              type: BOOLEAN
          rm_type_name: ITEM_TREE
          occurrences: [1, 1]
          node_i_d: at0001
          any_allowed: false
          path: /details[at0001]
        any_allowed: false
        path: /details
      - rm_attribute_name: identities
        children:
        - includes:
          - expression:
              left_operand: {item: archetype_id/value, reference_type: CONSTANT, type: STRING}
              right_operand:
                item: {pattern: '(person_name)[a-zA-Z0-9_-]*\.v1'}
                reference_type: CONSTANT
                type: String
              operator: OP_MATCHES
              precedence_overridden: false
              type: BOOLEAN
          rm_type_name: PARTY_IDENTITY
          occurrences: [1, 1]
          node_i_d: at0002
          any_allowed: false
          path: /identities[at0002]
        any_allowed: false
        path: /identities
      - rm_attribute_name: contacts
        children:
        - attributes:
          - rm_attribute_name: addresses
            children:
            - includes:
              - expression:
                  left_operand: {item: archetype_id/value, reference_type: CONSTANT, type: STRING}
                  right_operand:
                    item: {pattern: '(address)([a-zA-Z0-9_-]+)*\.v1'}
                    reference_type: CONSTANT
                    type: String
                  operator: OP_MATCHES
                  precedence_overridden: false
                  type: BOOLEAN
              - expression:
                  left_operand: {item: archetype_id/value, reference_type: CONSTANT, type: STRING}
                  right_operand:
                    item: {pattern: '(electronic_communication)[a-zA-Z0-9_-]*\.v1'}
                    reference_type: CONSTANT
                    type: String
                  operator: OP_MATCHES
                  precedence_overridden: false
                  type: BOOLEAN
              rm_type_name: ADDRESS
              occurrences: [1, 1]
              node_i_d: at0030
              any_allowed: false
              path: /contacts[at0003]/addresses[at0030]
            any_allowed: false
            path: /contacts[at0003]/addresses
          rm_type_name: CONTACT
          occurrences: [1, 1]
          node_i_d: at0003
          any_allowed: false
          path: /contacts[at0003]
        any_allowed: false
        path: /contacts
      - rm_attribute_name: relationships
        children:
        - attributes:
          - rm_attribute_name: details
            children:
            - attributes:
              - rm_attribute_name: items
                children:
                - attributes:
                  - rm_attribute_name: value
                    children:
                    - attributes: []
                      rm_type_name: DV_TEXT
                      occurrences: [1, 1]
                      any_allowed: true
                      path: /relationships[at0004]/details/items[at0040]/value
                    - attributes:
                      - rm_attribute_name: defining_code
                        children:
                        - reference: ac0000
                          rm_type_name: CodePhrase
                          occurrences: [1, 1]
                          any_allowed: false
                          path: /relationships[at0004]/details/items[at0040]/value/defining_code
                        any_allowed: false
                        path: /relationships[at0004]/details/items[at0040]/value/defining_code
                      rm_type_name: DV_CODED_TEXT
                      occurrences: [1, 1]
                      any_allowed: false
                      path: /relationships[at0004]/details/items[at0040]/value
                    any_allowed: false
                    path: /relationships[at0004]/details/items[at0040]/value
                  rm_type_name: ELEMENT
                  occurrences: [1, 1]
                  node_i_d: at0040
                  any_allowed: false
                  path: /relationships[at0004]/details/items[at0040]
                any_allowed: false
                path: /relationships[at0004]/details/items
              rm_type_name: ITEM_TREE
              occurrences: [1, 1]
              any_allowed: false
              path: /relationships[at0004]/details
            any_allowed: false
            path: /relationships[at0004]/details
          rm_type_name: PARTY_RELATIONSHIP
          occurrences: [1, 1]
          node_i_d: at0004
          any_allowed: false
          path: /relationships[at0004]
        any_allowed: false
        path: /relationships
      rm_type_name: PERSON
      occurrences: [1, 1]
      node_i_d: at0000
      any_allowed: false
      path: /
    ontology:
      term_definitions_list:
      - language: pt-br
        definitions:
        - code: at0000
          items: {text: Dados da pessoa, description: Dados da pessoa.}
        - code: at0001
          items: {text: Detalhes, description: Detalhes demográficos da pessoa.}
        - code: at0002
          items: {text: Nome, description: Conjunto de dados que especificam o nome da pessoa.}
        - code: at0003
          items: {text: Contatos, description: Contatos da pessoa.}
        - code: at0004
          items: {text: Relacionamentos, description: 'Relacionamentos de uma pessoa, especialmente laços familiares.'}
        - code: at0030
          items: {text: Endereço, description: 'Endereços vinculados a um único contato, ou seja, com o mesmo período de validade.'}
        - code: at0040
          items: {text: Grau de parentesco, description: Define o grau de parentesco entre as pessoas envolvidas.}
      - language: en
        definitions:
        - code: at0000
          items: {text: Person, description: Personal demographic data.}
        - code: at0001
          items: {text: Demographic details, description: A person's demographic details.}
        - code: at0002
          items: {text: Name, description: A person's name.}
        - code: at0003
          items: {text: Contacts, description: A person's contacts.}
        - code: at0004
          items: {text: Relationships, description: 'A person''s relationships, especially family ties.'}
        - code: at0030
          items: {text: Addresses, description: 'Addresses linked to a single contact, i.e. with the same time validity.'}
        - code: at0040
          items: {text: Relationship type, description: Defines the type of relationship between related persons.}
      constraint_definitions_list:
      - language: pt-br
        definitions:
        - code: ac0000
          items: {text: Códigos para tipo de parentesco, description: códigos válidos para tipo de parentesco.}
      - language: en
        definitions:
        - code: ac0000
          items: {text: Codes for type of relationship, description: Valid codes for type of relationship.}
      term_binding_list: []
      constraint_binding_list: []
    is_controlled: false

---

## Post #34 by @thomas.beale

well I read this to say:

- if you double quote a long String containing line breaks (if you don't yet get into different trouble) THEN
  - this scalar cannot be the value of a 'simple key';
- a 'simple key' is defined as:
  - A [*simple key*](http://yaml.org/spec/current.html#index-entry-simple%20key) has no identifying mark. It is recognized as being a key either due to being inside a flow mapping, or by being followed by an explicit value.
Hence, to avoid unbound lookahead in YAML [processors](http://yaml.org/spec/current.html#processor/), simple keys are restricted to a single line and must not span more than 1024 [stream](http://yaml.org/spec/current.html#stream/syntax) characters (hence the need for the [*flow-key context*](http://yaml.org/spec/current.html#index-entry-flow-key%20context)). Note the 1024 character limit is in terms of Unicode characters rather than stream octets, and that it includes the [separation](http://yaml.org/spec/current.html#separation%20space/) following the key itself.

maybe I misunderstood that a 'simple key' can't have quotes, but in any case, the concept of a 'simple key', if the object of YAML is object data serialisation, is ... pretty strange (if they are hash keys, then they are normal strings, and there should be no problem. Not distinguishing between hash keys and attribute names seems to be a problem in YAML as in JSON. Very odd design IMO). Why the syntactic structure of a 'value' should have any dependence on the syntactic structure of a 'key' is beyond me.

Anyway, for the moment I will stick with the format (for Strings):

    unquoted_key: "double quoted string"

this format passes the online parser tests, and handles multi-line strings better. Otherwise you have to use '|', '>' and/or '\' markers all over the place.

- thomas

---

**Canonical:** https://discourse.openehr.org/t/could-yaml-replace-dadl-as-human-readable-aom-serialization-format/15105 **Original content:** https://discourse.openehr.org/t/could-yaml-replace-dadl-as-human-readable-aom-serialization-format/15105