A first-attempt JSON archetype

The JSON archetype below was generated by traversing the P_XX class structure I have developed, which simplifies the persisted form of an archetype. It mainly turns a lot of fields into Strings, including our old friends occurrences and cardinality. Things to note:

  • the ordering of attributes is somewhat random
  • JSON is pretty mindless in some ways: hash keys are serialised in the same way as attribute names, which must make input parsing pretty annoying. At least I think I have this right.
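The hash-key point is easy to demonstrate: once a keyed structure is serialised, its keys are indistinguishable from attribute names, so a generic parser sees both as the same kind of object. A minimal sketch (the fragments are hypothetical):

```python
import json

# A typed object: the member names are attribute names of a class.
typed = '{"language": "ISO_639-1::en", "purpose": "demo"}'

# A hash table: the member names are data (language codes used as keys).
keyed = '{"en": {"purpose": "demo"}, "pt-br": {"purpose": "demo"}}'

# Both parse to the same structure; nothing in the JSON itself tells a
# parser whether "en" is an attribute name or a hash key.
assert type(json.loads(typed)) is type(json.loads(keyed))
```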

The Eiffel P_ classes can be found here; as yet I have not documented them, but the idea would be for the community to agree on the attribute structure of each class. I can change things quickly in the AWB for people to look at different options, and I have some rule-driven capability.
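As a hypothetical illustration of the kind of simplification involved (the function name and the exact string forms are my own, not the actual P_ class behaviour), flattening an occurrences interval to a string might look like:

```python
def occurrences_to_string(lower, upper):
    """Render an occurrences interval as a compact string:
    (1, 1) -> "1", (0, None) -> "0..*", (0, 3) -> "0..3"."""
    if upper is None:          # unbounded upper limit
        return f"{lower}..*"
    if lower == upper:         # degenerate interval collapses to one number
        return str(lower)
    return f"{lower}..{upper}"

assert occurrences_to_string(1, 1) == "1"   # matches "occurrences": "1" below
```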

My first question is: can anyone validate this in some basic way? Then - can we work out what flavour of JSON we want for ADL 1.5 archetypes?

Feedback welcome.

  • thomas

{
"original_language": "ISO_639-1::pt-br",
"translations": [
"en": {
"language": "ISO_639-1::en",
"author": [
"name": "Sergio Miranda Freire",
"organisation": "Universidade do Estado do Rio de Janeiro - UERJ",
"email": ["sergio@lampada.uerj.br"](mailto:sergio@lampada.uerj.br)
]
},
],
"description": {
"original_author": [
"name": "Sergio Miranda Freire & Rigoleta Dutra Mediano Dias",
"organisation": "Universidade do Estado do Rio de Janeiro - UERJ",
"email": ["sergio@lampada.uerj.br"](mailto:sergio@lampada.uerj.br),
"date": "22/05/2009",
],
"details": [
"en": {
"language": "ISO_639-1::en",
"purpose": "Representation of a person's demographic data.",
"use": "Used in demographic service to collect a person's data.",
"keywords": ["demographic service", "person's data"],
"misuse": "",
"copyright": "© 2011 openEHR Foundation"
},
"pt-br": {
"language": "ISO_639-1::pt-br",
"purpose": "Representação dos dados demográficos de uma pessoa.",
"use": "Usado em serviço demográficos para coletar os dados de uma pessoa.",
"keywords": ["serviço demográfico", "dados de uma pessoa"],
"misuse": "",
"copyright": "© 2011 openEHR Foundation"
},
],
"lifecycle_state": "Authordraft",
"other_contributors": ["Sebastian Garde, Ocean Informatics, Germany (Editor)", "Omer Hotomaroglu, Turkey (Editor)", "Heather Leslie, Ocean Informatics, Australia (Editor)"],
"other_details": [
"references": "ISO/TS 22220:2008(E) - Identification of Subject of Care - Technical Specification - International Organization for Standardization."
]
},
"artefact_object_type": "DIFFERENTIAL_ARCHETYPE",
"archetype_id": "openEHR-DEMOGRAPHIC-PERSON.person.v1",
"adl_version": "1.5",
"artefact_type": "archetype",
"definition": {
"rm_type_name": "PERSON",
"node_id": "at0000",
"attributes": [
{
"rm_attribute_name": "details",
"children": [
{
"rm_type_name": "ITEM_TREE",
"node_id": "at0001",
"occurrences": "1",
"includes": [
{
"expression": {
"type": "Boolean",
"operator": {
"value": 2007
},
"left_operand": {
"type": "String",
"reference_type": "attibute",
"item": "archetype_id/value"
},
"right_operand": {
"type": "C_STRING",
"reference_type": "constraint",
"item": {
"regexp": "(person_details)[a-zA-Z0-9_-]*\\.v1",
"is_open": False,
"regexp_default_delimiter": True
}
},
"precedence_overridden": False
}
},
],
"is_closed": False
},
],
"is_multiple": False
},
{
"rm_attribute_name": "identities",
"children": [
{
"rm_type_name": "PARTY_IDENTITY",
"node_id": "at0002",
"occurrences": "1",
"includes": [
{
"expression": {
"type": "Boolean",
"operator": {
"value": 2007
},
"left_operand": {
"type": "String",
"reference_type": "attibute",
"item": "archetype_id/value"
},
"right_operand": {
"type": "C_STRING",
"reference_type": "constraint",
"item": {
"regexp": "(person_name)[a-zA-Z0-9_-]*\\.v1",
"is_open": False,
"regexp_default_delimiter": True
}
},
"precedence_overridden": False
}
},
],
"is_closed": False
},
],
"is_multiple": True
},
{
"rm_attribute_name": "contacts",
"children": [
{
"rm_type_name": "CONTACT",
"node_id": "at0003",
"occurrences": "1",
"attributes": [
{
"rm_attribute_name": "addresses",
"children": [
{
"rm_type_name": "ADDRESS",
"node_id": "at0030",
"occurrences": "1",
"includes": [
{
"expression": {
"type": "Boolean",
"operator": {
"value": 2007
},
"left_operand": {
"type": "String",
"reference_type": "attibute",
"item": "archetype_id/value"
},
"right_operand": {
"type": "C_STRING",
"reference_type": "constraint",
"item": {
"regexp": "(address)([a-zA-Z0-9_-]+)*\\.v1",
"is_open": False,
"regexp_default_delimiter": True
}
},
"precedence_overridden": False
}
},
{
"expression": {
"type": "Boolean",
"operator": {
"value": 2007
},
"left_operand": {
"type": "String",
"reference_type": "attibute",
"item": "archetype_id/value"
},
"right_operand": {
"type": "C_STRING",
"reference_type": "constraint",
"item": {
"regexp": "(electronic_communication)[a-zA-Z0-9_-]*\\.v1",
"is_open": False,
"regexp_default_delimiter": True
}
},
"precedence_overridden": False
}
},
],
"is_closed": False
},
],
"is_multiple": True
}
]
},
],
"is_multiple": True
},
{
"rm_attribute_name": "relationships",
"children": [
{
"rm_type_name": "PARTY_RELATIONSHIP",
"node_id": "at0004",
"attributes": [
{
"rm_attribute_name": "details",
"children": [
{
"rm_type_name": "ITEM_TREE",
"attributes": [
{
"rm_attribute_name": "items",
"children": [
{
"rm_type_name": "ELEMENT",
"node_id": "at0040",
"attributes": [
{
"rm_attribute_name": "value",
"children": [
{
"rm_type_name": "DV_TEXT"
},
{
"rm_type_name": "DV_CODED_TEXT",
"attributes": [
{
"rm_attribute_name": "defining_code",
"children": [
{
"rm_type_name": "CODE_PHRASE",
"target": "ac0000"
},
],
"is_multiple": False
}
]
},
],
"is_multiple": False
}
]
},
],
"is_multiple": True
}
]
},
],
"is_multiple": False
}
]
},
],
"is_multiple": True
}
]
},
"ontology": {
"term_definitions": [
"pt-br": {
"at0000": {
"text": "Dados da pessoa",
"description": "Dados da pessoa."
},
"at0001": {
"text": "Detalhes",
"description": "Detalhes demográficos da pessoa."
},
"at0002": {
"text": "Nome",
"description": "Conjunto de dados que especificam o nome da pessoa."
},
"at0003": {
"text": "Contatos",
"description": "Contatos da pessoa."
},
"at0004": {
"text": "Relacionamentos",
"description": "Relacionamentos de uma pessoa, especialmente laços familiares."
},
"at0030": {
"text": "Endereço",
"description": "Endereços vinculados a um único contato, ou seja, com o mesmo período de validade."
},
"at0040": {
"text": "Grau de parentesco",
"description": "Define o grau de parentesco entre as pessoas envolvidas."
}
},
"en": {
"at0000": {
"text": "Person",
"description": "Personal demographic data."
},
"at0001": {
"text": "Demographic details",
"description": "A person's demographic details."
},
"at0002": {
"text": "Name",
"description": "A person's name."
},
"at0003": {
"text": "Contacts",
"description": "A person's contacts."
},
"at0004": {
"text": "Relationships",
"description": "A person's relationships, especially family ties."
},
"at0030": {
"text": "Addresses",
"description": "Addresses linked to a single contact, i.e. with the same time validity."
},
"at0040": {
"text": "Relationship type",
"description": "Defines the type of relationship between related persons."
}
},
],
"constraint_definitions": [
"pt-br": {
"ac0000": {
"text": "Códigos para tipo de parentesco",
"description": "códigos válidos para tipo de parentesco."
}
},
"en": {
"ac0000": {
"text": "Codes for type of relationship",
"description": "Valid codes for type of relationship."
}
}
]
},
"is_controlled": False,
"is_generated": True,
"is_valid": True
}

small problem with commas fixed.

Hi Thomas!

A question, is this for compiled archetypes or just ordinary archetypes?

I can understand the reason why we need an efficient serialization
format for compiled archetypes (as discussed on the technical list) so
they can be loaded into system memory quickly knowing that they are
already validated.

For that my guess human-readability is not that important as to more
ordinary archetypes, am I right here? Because the chance for human
user to actually take a look of the complied archetype is very slim.
It's similar to the relationship between Java source code which is
very readable and the compile java bytecode which is hardly readable
but very efficient.

Following that reasoning I would say we probably would need a very
efficient but less human-readable format for serialization of compiled
archetypes (including templates of course).

There are several alternatives in java for serialization objects.
There is the native one from JDK and the Xtream library
(http://xstream.codehaus.org/) which is also very good. Question is
do we need a openEHR standard for serialized archetypes to start with?
Or we leave this to the implementer on different platforms? The
advantage of having a openEHR standard is that we could then have the
possibility to share compiled archetypes and templates in 'binary'
format, similar as released Java components in bytecode compressed in
a jar, across platforms.

/Rong

Actually I did it speculatively, assuming someone working with Java might find it useful. But I discovered JSON is an extremely weak ‘formalism’, and unlikely to be useful in realistic computing without a lot of extra tooling. Here’s why:

  • it doesn’t directly support representation of Hash tables or any keyed data structure; the keys have to be output as attribute names, which makes it hard for parsers to know what they are parsing. Every data structure is just an ‘Array’

  • it has no support for dynamically bound types, i.e. the data has a C_COMPLEX_OBJECT for an attribute statically declared as a C_OBJECT. This really kills any possibility of reparsing a JSON document unless it is like one of those simple examples you see at JSON.org.

  • its leaf types are limited

I can’t see how JSON could work properly without some sort of schema, a fact that Seref made obvious to me in the last couple of days, so this exercise is more for demonstration than anything else.

All of the serialisation requirements of true and realistic object data are met by dADL. I know it is not any kind of standard, but - to become a ‘standard’ all it needs is more implementations. I think it could be a useful de facto openEHR standard for compiled objects that doesn’t involved compression. Then at least we know we have control over one serialisation format. For a binary / compressed format, we need to look elsewhere of course, and I leave it to others to advise on that, as you have below.

  • thomas
(attachments)

OceanInformaticsl.JPG

Hi Rong,
I am not sure if Tom’s effort is towards finding an official serialization method for archetypes. Please correct me if I’m wrong here Tom.
It is more about exploring the feasibility of marshalling/unmarshalling with various formalisms. At least for me it is. Serialization is actually a dirty business, at least as tricky as GUI generation, i.e. quite varying across platforms and technologies. The variation exists around the formalism chosen for serialization/marshalling. Tool support, and handling of corner cases is tricky.

Here is an example we’ve pin pointed yesterday with Tom: you have an object model (AOM), and you’re serializing it to JSON. Go see any of the JSON outputs in the Bosphorus page. There are many fields where you have a concrete type instance that represents value of a field with an abstract type, such as CSingleAttribute and CMultipleAttribute, which are descendants of CAttribute. The field in the AOM has type CATtribute, meaning that both of these types can be assigned. Any medical informatics standard that claims to have a relation to object oriented paradigm would have features like these. So it is not only openEHR.

When the JSON with CSingleAttribute arrives, how would your deserializer know that it should be assigned to a field of CAttribute? If it does not, it would try to create an instance of CAttribute (which is an abstract type!) for the JSON data, and it would fail. Guess what? This is what happened with JSON output. Here comes the critical point:

if the formalism (JSON/YAML/XSD) includes a mechanism to solve this, than tooling is likely to cover it, and it works. If not, everyone using the mechanism would have to find a way of doing it, hence ad hoc solutions, hence, expensive, error prone, and not properly working system to system communication.

That is why it is quite tricky to choose a formalism for this requirement, and the type of problems I’ve just mentioned should be considered for every option, including XSTream, protocol buffers, etc etc.. We may not be able to find an optimum solution, but if we are to choose one, we should go with the best overall solution. What that solution is, is another topic :slight_smile:

Kind regards
Seref

one more:

  • in JSON, Boolean values true and false are case-sensitive - True and False don’t work. That’s really a bad mistake…

  • thomas

Hi Thomas,

The JSON format looks great for me. I see two main uses to this format.

  1. Is a natural way to interchange information between a server and a web browser, and do some client-side archetype procesing, like GUI generation (that’s my prefered topic :D) or input data validation on the client side.

  2. We can load the JSON archetypes on a document based DB, like CouchDB or MongoDB, to enable querying of archetype nodes, this could be useful to validate input data with its correspondent archetype node.

Just my two cents.

(attachments)

OceanInformaticsl.JPG

Warning: the posted JSON is actually a bit broken. We fixed it yesterday, with Seref’s help. I won’t post it again, because it is voluminous, but there will be a save as JSON option on the next beta of the ADL workbench. This appears to generate the correct JSON for archetypes I looked at, but people here will obviously find out whether it is really 100% correct or not.

In theory I agree with the 2 uses above; but now having been forced to use JSON, I think it will be harder than you think. Seref will probably provide some details on this.

The next ADL Workbench due out in less than a week. I am thinking of making a command-line version as well - anyone interested? It would enable you to compile an archetype repository (multiple repositories in fact) and then do things like:

  • provide the error results for any given archetype

  • ask for a specific archetype or template in a specific form as output

The compiler would probably run as a local service. Seref’s Bosphorus is just ‘doing this properly’.

  • thomas

not at this stage. My views are:

  • there may not be a need for one ‘holy’ object serialisation method, and over time, many will get built
  • but it is probably a good idea to have at least one, and I suggest this would be dADL, as we know it is completely regular and lossless. I know people will say: but you should use a ‘standard’. All the standards are broken in some way… I think we should have at least one ‘clean’ fallback serialisation.

Maybe over time, there will end up being an ‘official’ way of doing JSON, XML (XSD), YAML, and so on as well.

  • t

Well, there is almost always a need to handle various problems with the lack of support for proper OO in these formalisms. Like the way lack of generics in XSD has been handled in the published schemas.

It would be good to talk to users of these particular formalisms to arrive at suggested profiles (a nice name for agreed hacks…).
If we can’t find any users to talk to, then I think that is also a good sign that would save people from spending time for something that nobody is using :slight_smile:

Regards
Seref