JSON for definitions-notation

I always admired OpenEhr for its ability to notate archetype-definitions and now also BMM definitions in any type.

I saw experiments in XML, but the officially endorsed notation language is ADL.

I wonder, would it also be possible to write archetypes and reference-models in JSON?

If so, it would save us tons of code, no grammars needed, no parsers needed. Many programming languages support JSON out of the box, with only some annotations needed. NoSQL Databases often support JSON, and have their own JSON-path based hierarchical query-languages.

Venkat Subramaniam, who is a java-guru, said: "Don't walk away from complexity, RUN!!!"

But Einstein said: "Everything Should Be Made as Simple as Possible, But Not Simpler"

So the question is: Are there any technical objections to expressing archetypes and reference-models in JSON?

Best regards

Bert Verhees

JSON, YAML and ODIN are all just object-dump serial formats that result from traversing an in-memory object graph, so it is a generic operation to generate them from tools (XML is more problematic due to being irregular in many ways and being schema-dependent).

In the case of archetypes, the dump is just of objects that are instances of the AOM, i.e., ARCHETYPE, C_ATTRIBUTE, various kinds of C_OBJECT and so on.
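As a minimal sketch of such an object dump in Python: the classes below are simplified, hypothetical stand-ins for AOM types (the real ARCHETYPE, C_ATTRIBUTE and C_OBJECT classes carry many more fields), and the archetype id is made up for illustration. The point is only that a generic traversal of the in-memory graph produces the JSON, with no per-class serialisation code.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class CObject:
    # Simplified stand-in for an AOM C_OBJECT: a constraint on one RM type.
    rm_type_name: str
    node_id: str
    attributes: List["CAttribute"] = field(default_factory=list)


@dataclass
class CAttribute:
    # Simplified stand-in for an AOM C_ATTRIBUTE: a constraint on one RM attribute.
    rm_attribute_name: str
    children: List[CObject] = field(default_factory=list)


@dataclass
class Archetype:
    # Simplified stand-in for the AOM ARCHETYPE root object.
    archetype_id: str
    definition: CObject


def dump_archetype(archetype: Archetype) -> str:
    # asdict() walks the object graph recursively, so the dump is generic.
    return json.dumps(asdict(archetype), indent=2)


# Hypothetical archetype, for illustration only.
example = Archetype(
    archetype_id="openEHR-EHR-OBSERVATION.example.v1",
    definition=CObject(
        rm_type_name="OBSERVATION",
        node_id="id1",
        attributes=[CAttribute("data", [CObject("HISTORY", "id2")])],
    ),
)

print(dump_archetype(example))
```

Note that this plain dump records no class names; that is fine while every field has exactly one possible type, but not once polymorphic fields are involved.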

The ADL Workbench has had an export mode (for around 5 years, I think) that generates the first 3 for any archetype, and also for a whole archetype library. The folks doing CIMI use at least the JSON mode. It also generates XML, via a custom serialiser.

One of the jobs I never completed is a deserialiser for the 3 regular formats, but it is nearly trivial. I am not sure if Archie or Marand’s ADL-designer tools do the same but I think it should be trivial for them to implement as well.

I will look into this again…

  • thomas

The object dump is a common use-case for JSON.

There are a few things that are needed beyond the object dump.

What we would still need is standardised naming-notation of classes and properties, so there cannot be a conflict on that. I think the current format used in OpenEhr is very good, although not the same convention as in Java, but that is a minor issue.

Besides being used for object dumps, JSON also has room for meta-information such as type information. The class name of an object, for example, is information that programmers use often. In Java it is tested with the instanceof operator.
Other ecosystems have similar operators.

So to use JSON for archetype notation some things must be agreed on. I would be happy with such agreements.

The advantages of JSON are huge. Apart from the easy, standard implementation, there are many libraries, and they are very efficient. Also, inserting JSON into JSON makes flattening and template creation a matter of a few lines of code.

And validation routines can, in part, easily be generated in memory and applied to JSON datasets.
The only parts that need a grammar and parsing are the leaf and cardinality constraints. These are easy to define and parse, and can be extracted from the current grammars.
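A minimal sketch of how small such a leaf-level parser could be, assuming the cardinality is stored in the JSON as a string such as "0..*" or "1..3" (a notation loosely modelled on ADL intervals; the exact syntax, and the names `Multiplicity` and `parse_multiplicity`, are assumptions for illustration):

```python
import re
from typing import NamedTuple, Optional


class Multiplicity(NamedTuple):
    lower: int
    upper: Optional[int]  # None means unbounded ("*")

    def allows(self, count: int) -> bool:
        # True when `count` items satisfy this constraint.
        return count >= self.lower and (self.upper is None or count <= self.upper)


# Accepts "N", "N..M" or "N..*", with optional surrounding whitespace.
_PATTERN = re.compile(r"^\s*(\d+)\s*(?:\.\.\s*(\d+|\*))?\s*$")


def parse_multiplicity(text: str) -> Multiplicity:
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a multiplicity: {text!r}")
    lower = int(match.group(1))
    upper_text = match.group(2)
    if upper_text is None:
        return Multiplicity(lower, lower)  # "1" is treated as shorthand for "1..1"
    if upper_text == "*":
        return Multiplicity(lower, None)
    upper = int(upper_text)
    if upper < lower:
        raise ValueError(f"upper bound below lower bound: {text!r}")
    return Multiplicity(lower, upper)
```

For example, `parse_multiplicity("0..*").allows(5)` is true, while `parse_multiplicity("1..3").allows(4)` is false.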

Bert

Archie offers a JSON serializer and deserializer. For ODIN they are present as well, but they have not been tested with archetypes and may need a small bit of work. YAML should be a matter of adding a dependency and configuring it.
We're still working on XML: the bindings are there and it works, but the AOM schemas have not been finished yet, so there will be changes. See the specifications-ITS-XML repository on GitHub for details.

One could argue ADL is easier for humans to read and write than JSON, YAML, ODIN or XML. The other formats have a lot more tools available. Good thing we have both.

Regards,

Pieter Bos

Not many people find archetypes readable. I can read them, and have done so many times, but most modelers I know are lost within a second when they see ADL text; I have seen that many times too. But it is more readable than XML or JSON, I agree.

For readability I would like to promote a documentation protocol; modeling tooling should take care of that. It should not be the purpose of an archetype.

The archetypes in code are for machine processing, that should be the main purpose, and this purpose should be exercised to a maximum of simplicity and efficiency.

I was thinking about the type information ADL has, and whether it has a function in validating datasets.

I don't think so, and I think it can be omitted. For validation purposes it is not needed.
An archetype is a tree of properties leading to further properties and ending, via cardinality, in leaf nodes. That is the information that is needed, and JSON can deliver it without diverging from the object-dump idea. All that information is already in the AOM. As a result, archetypes can be read into the AOM without any need for parsing.

I was also thinking about the name-convention, but that is a reference model (BMM) issue. The naming convention in the reference model will be used in datasets.

BMM is very powerful, it extends the purpose of reference models and archetypes to virtually every domain, also OpenEhr :wink:.

It is in fact a wonderful invention, especially in combination with NoSQL databases, but it needs an overhaul towards simplicity for the sake of efficiency. A general connection to programming ecosystems or standards can be achieved by using their conventions.

Bert

A few last words on this.

It is easy for a JSON-based archetype repository to cooperate with an ADL-based repository. Serializing an AOM structure to ADL is very easy to do; this holds for both the dADL and the cADL part. The other way around, converting the ADL definition part to JSON, is harder: it involves the parser code and grammars, which are hard to maintain.

It is fragile code.

I remember how hard it was to get stable grammars ready, about two years ago, and they are also hard to test. You can only test the grammar and the generated code after having written thousands of lines of derived code and testing that. This is not unit testing. There is a lot of untested code in a parser.

I know this from my own experience: I have written an ADL parser a few times, and other parsers too, in Java and golang, and it takes a year or more. Pieter has been working on it for two years, not every day of course, but still. It really is a lot of complex code. I remember that a year ago he was using a visitor pattern, and now he has dropped it. That was quite a big change. He finds these things out while doing it, and he does a very good job. He is a very good programmer, but with a difficult task.

And even small updates in a grammar can cause great code change, many difficulties.
It does, in my opinion, not belong in a modern programming environment where simplicity, maintainability and testability are important goals.

Changing just a few simple paradigms can make a difference of thousands of lines of code.

JSON has the elegance of proven technology, backed by many years of worldwide programming effort. Its simplicity guarantees code that is easy to control and maintain (sustainable).

But, as said, for the purpose of readability it is easy to serialize to ADL, if that is an argument.

Why do I write all this?

I am writing a generic multi purpose BMM/AOM kernel as a hobby project, I work an hour a day on it, sometimes less, I have a job also.

I do not want to write an ADL parser because I don't need it. That saves months of work. The code I write can also serve OpenEhr, ISO13606 or CIMI in any version, or accountancy software, or many other domains, and it is so simple that a novice programmer can read it.

And I think that is how code should be. Easy to test, easy to debug, easy to read, easy to understand.

That is my story, I wanted to see how other people think about these ideas, thanks for sharing your opinions.

Best regards
Bert

---- Bert Verhees wrote ----

The folks doing CIMI use at least the JSON mode. It also generates XML, via custom serialiser.
One of the jobs I never completed is a deserialiser for the 3 regular formats, but it is nearly trivial.

Exactly my point, I completely agree with this.

Bert

I agree with you, except for how damn limiting pure json* is. Any attempt to introduce long-ints or annotation take you to vertical-specific json+.

  • json is javascript, so has type and other limitations.

Thanks very much for your reply. I found a link on this which helps to study it further.

https://stackoverflow.com/questions/13502398/json-integers-limit-on-size/39681707#39681707

Of course this is not a problem for archetypes, but for datasets it can be.
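A small Python illustration of the limit discussed in that link: JSON itself does not bound integer size, but parsers that store every number as an IEEE 754 double (as JavaScript does) represent integers exactly only up to 2^53 - 1. The helper `encode_safely` is a hypothetical workaround that writes larger values as strings.

```python
import json

# Largest integer below which every integer is exactly representable as a double.
MAX_SAFE_INTEGER = 2**53 - 1

big = 2**53 + 1  # 9007199254740993

# Python's json module keeps arbitrary-precision integers intact on a round trip.
round_tripped = json.loads(json.dumps({"id": big}))["id"]

# A consumer that stores numbers as doubles silently rounds the same value.
as_double = int(float(big))  # 9007199254740992, one less than `big`


def encode_safely(value: int):
    """Emit large integers as strings so double-based parsers cannot round them."""
    return value if abs(value) <= MAX_SAFE_INTEGER else str(value)


print(json.dumps({"small": encode_safely(42), "large": encode_safely(big)}))
```

The cost of the string workaround is that producer and consumer must agree on it, which is exactly the kind of vertical-specific convention mentioned above.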

Also worth considering is that there are more kinds of JSON like for example GeoJSON. And there is JSON for expressing mathematical operations. Useful for storing mathematical procedures in a database.

Best regards
Bert

---- William Archibald wrote ----

I agree with you, except for how damn limiting pure json* is. Any attempt to introduce long-ints or annotation take you to vertical-specific json+.

  • json is javascript, so has type and other limitations.

Bert,

if you serialise an AOM archetype to any object dump format, you need typing information for the simple reason that there is polymorphism in the model, mainly places where the static type is C_OBJECT, C_DEFINED_OBJECT or C_PRIMITIVE_OBJECT but the attached type in a real archetype can be a number of descendant types.
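A minimal Python sketch of why the type marker matters on the way back in. The classes are heavily simplified stand-ins for AOM types, and the "_type" key is an assumed convention for the discriminator, not something prescribed in this thread. Without the marker, the deserialiser could not know which descendant of the static type to instantiate.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class CObject:
    # Simplified stand-in for the static type C_OBJECT.
    rm_type_name: str


@dataclass
class CComplexObject(CObject):
    # Simplified stand-in for C_COMPLEX_OBJECT.
    attribute_names: List[str]


@dataclass
class CString(CObject):
    # Simplified stand-in for C_STRING, a primitive constraint.
    pattern: str


# Maps the discriminator written into the JSON to the concrete class.
TYPE_REGISTRY = {"C_COMPLEX_OBJECT": CComplexObject, "C_STRING": CString}


def to_json_dict(obj: CObject) -> dict:
    for name, cls in TYPE_REGISTRY.items():
        if type(obj) is cls:
            return {"_type": name, **vars(obj)}
    raise TypeError(f"unregistered type: {type(obj).__name__}")


def from_json_dict(data: dict) -> CObject:
    fields = dict(data)  # copy, so the caller's dict is not modified
    cls = TYPE_REGISTRY[fields.pop("_type")]
    return cls(**fields)


original = CString(rm_type_name="STRING", pattern="[a-z]+")
restored = from_json_dict(json.loads(json.dumps(to_json_dict(original))))
```

Here `restored` is again a `CString`, because the dump carried its concrete type along with its fields.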

W.r.t. the naming convention of RM types, attributes etc, I assume you are referring to the fact that openEHR archetypes use the published RM type and attribute name format, which is so-called 'snake-case' rather than 'camelCase'. To make archetypes that refer to data objects usable across software written in different languages using different case conventions, it may make sense to add an option to OPT generation which indicates the flavour of RM casing. I had not thought of this before but it would be easy to implement in an OPT generator.
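A rough sketch of what such a casing option could do to RM names, in Python. The function names are hypothetical, and the mapping shown (snake_case attributes to camelCase, upper-case type names to PascalCase) is just one plausible flavour.

```python
def snake_to_camel(name: str) -> str:
    """Convert an RM attribute name such as 'archetype_node_id' to camelCase."""
    stripped = name.lstrip("_")
    prefix = name[: len(name) - len(stripped)]  # keep any leading underscores
    head, *tail = stripped.split("_")
    return prefix + head + "".join(part.capitalize() for part in tail)


def snake_to_pascal(name: str) -> str:
    """Convert an RM type name such as 'DV_QUANTITY' to PascalCase."""
    return "".join(part.capitalize() for part in name.split("_") if part)


for attribute in ["archetype_node_id", "null_flavor", "value"]:
    print(attribute, "->", snake_to_camel(attribute))

for rm_type in ["DV_QUANTITY", "OBSERVATION"]:
    print(rm_type, "->", snake_to_pascal(rm_type))
```

A generator option would apply such a function wherever an RM type or attribute name is written out, leaving the archetype semantics untouched.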

BMM is getting more powerful by the way. I have recently extended it so that it allows types to be annotated with a 'value-set' identifier, which can be used to limit the values of e.g. data fields of type CodePhrase or TerminologyCode to particular terminologies.

- thomas

Actually, the AOM -> JSON serialiser is generic code - in my Eiffel code base, it is a generic converter from in-memory objects (AOM instances) to something like a DOM tree (another in-memory structure consisting of just a few types of node and leaf objects) which is then trivially serialised to JSON, ODIN and YAML.

More modern libraries as found in Java and all other mainstream languages these days do the same, and additionally offer streaming parser tools that can potentially make the conversion quicker (in bulk).

Have a look at the Archie project <https://github.com/openEHR/archie>, you'll find very vanilla Java facilities used to do most of this work.

- thomas

I understand that you need type information to make an archetype understandable for humans, but you don't need it for validating datasets, because in validation only the properties/constraints defined in the archetype matter, and the JSON or XML datasets do not carry type information either. The constraint in the leaf node indicates which type the leaf node is of. So type information is unnecessary load in the validation part (definition) of an archetype.

Here you see a JSON and an XML document holding the same information. Neither contains explicit type information.

{
    "glossary": {
        "title": "example glossary",
        "GlossDiv": {
            "title": "S",
            "GlossList": {
                "GlossEntry": {
                    "ID": "SGML",
                    "SortAs": "SGML",
                    "GlossTerm": "Standard Generalized Markup Language",
                    "Acronym": "SGML",
                    "Abbrev": "ISO 8879:1986",
                    "GlossDef": {
                        "para": "A meta-markup language, used to create markup languages such as DocBook.",
                        "GlossSeeAlso": ["GML", "XML"]
                    },
                    "GlossSee": "markup"
                }
            }
        }
    }
}

<!DOCTYPE glossary PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
<glossary><title>example glossary</title>
  <GlossDiv><title>S</title>
    <GlossList>
      <GlossEntry ID="SGML" SortAs="SGML">
        <GlossTerm>Standard Generalized Markup Language</GlossTerm>
        <Acronym>SGML</Acronym>
        <Abbrev>ISO 8879:1986</Abbrev>
        <GlossDef>
          <para>A meta-markup language, used to create markup languages such as DocBook.</para>
          <GlossSeeAlso OtherTerm="GML"/>
          <GlossSeeAlso OtherTerm="XML"/>
        </GlossDef>
        <GlossSee OtherTerm="markup"/>
      </GlossEntry>
    </GlossList>
  </GlossDiv>
</glossary>

But for humans, type information must be available to understand an archetype; that is the semantic part, and it must be taken care of. There are several solutions to that problem. There can be a documentation file, separate or included. Another solution is to extend the ontology with "type", next to "text" and "description", and maybe more ontology items. I think I favor an ontology-based solution.

Important for the JSON solution to work is that the JSON must not go beyond the extent of the classes: there may not be anything in the JSON which is not in the classes. Otherwise you might lose the advantages JSON offers when flattening or creating templates, which can be done just by filling AOM-object properties with objects from other archetypes and then serializing again for validation purposes.
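A deliberately naive Python sketch of the "filling up" step described above. The dictionary layout (an "attributes" object mapping attribute names to lists of child objects) is a hypothetical one chosen for illustration, and real flattening of specialised archetypes involves more rules than this; the sketch only shows how nesting one JSON object inside another builds a template-like structure.

```python
import copy
import json


def fill_slot(parent: dict, attribute: str, child: dict) -> dict:
    """Return a copy of `parent` with `child` appended under `attribute`."""
    result = copy.deepcopy(parent)  # leave the source archetype untouched
    children = result.setdefault("attributes", {}).setdefault(attribute, [])
    children.append(copy.deepcopy(child))
    return result


# Hypothetical, heavily reduced archetype objects.
composition = {"rm_type_name": "COMPOSITION", "node_id": "id1", "attributes": {}}
observation = {"rm_type_name": "OBSERVATION", "node_id": "id1", "attributes": {}}

template = fill_slot(composition, "content", observation)
print(json.dumps(template, indent=2))
```

Because the result contains only keys that the (assumed) classes define, it can be serialised again and handed to the same validation code as any single archetype.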

I have not thought all through, but I think that there will be very many advantages in handling AOM objects/JSON instead of fiddling around with paths and finding ways to find the type of a path-node.

There is always discussion about naming conventions. In ISO13606, properties follow two different naming conventions, snake-case and camelCase, within one standard and even within one class. In DATA_VALUE there is "nullFlavor", and in URI, which descends from DATA_VALUE, there is "fragment_id". But then again, it is up to the reference-model designer. Some property names came from HL7, and the desire to be in line with HL7 for properties with the same name made them keep the same property-name convention.

So the tooling and definition for reference-model-design must be agnostic to this.

Regarding your third point, I don't know if I understand it well, but for me it is best when BMM is domain-neutral, and bringing in terminology information could harm this idea, unless its use is made completely voluntary. But I don't know why it should be there, and I don't see it as powerful. Sometimes, less is more.

Bert

Thank you for pointing this out. But I already knew this. My point is not that it is easy to dump an ADL archetype via a parser to a JSON representation. My point is that the JSON representation must be the result and working modus of archetype/data-handling, in the archetype-designer and in the repositories. So the ADL parser and all that complex code around it becomes superfluous. There should be no temptation to do any processing with ADL.

If there is desire to have ADL-files, it is easy to serialize to them, as you already indicated.

And yes, there is YAML too, and ODIN, but both have the disadvantage that there is less support in industry than there is for JSON, for which there are many libraries to do fancy things. I also do not see why JSON would be a bad choice, so there is no reason for looking further than that.

Bert

Even in tools that read ADL archetypes, hardly any of the processing is done in ADL, it is done on the in-memory AOM structure - that's true both in Archetype modelling tools and runtime validation tools.

The utility of ADL as a syntax, apart from being human-readable (at least for those who understand the semantic basis of archetypes) is that it never changes, whereas object syntaxes, XML etc are changing all the time. This month we are using this flavour of JSON, next month it is another flavour. Next year, everyone decides they like YAML instead. IT is mostly a fashion show of these kinds of changes.

It's the same with programming languages - their syntax changes only with the semantic changes (e.g. Java 1.5 -> 1.8), but not with compiled representation.

So I am all for using JSON or whatever in the way that you say, but we should just remember that such syntaxes are machine formats optimised for something, and they will always be replaced by another format(s) optimised for something else. Saving out to ADL can be very useful sometimes, in case you want to guarantee to preserve semantics across these changes.

The other reason ADL exists is that it is a lot easier for humans to learn the archetype formalism via the ADL specification than the AOM specification, even though the latter is the one containing most of the formal semantics.

- thomas

These are all very good reasons.

So I will make an ADL serializer, so that I can always represent my archetypes in ADL for archival purposes.
But the complexity of the parser, that I don't want anymore. Life is short. And there are better things to do.

As said, I am building a BMM/AOM environment as a hobby project, just for fun. I don't know where I will end up. But one of my important goals is to avoid complexity.
We'll see where it goes too.

I started this discussion yesterday with the question whether there are technical objections to representing archetypes in JSON. I was not sure that I had the full picture.
Now I do, and I understand that there are no technical objections, so that makes me glad.

Have a nice day
Bert