Multi RM JSON schema validation and current schema issues

I’ve been playing around with the JSON Schemas and found some issues in the published ones.

First, the entry points to validate the root nodes are missing some types (ITEM_LIST, ITEM_SINGLE, ITEM_TABLE, ELEMENT, HISTORY, etc.).

Then some types for OBJECT_REF could be constrained. For instance, EHR.ehr_access and EHR.ehr_status have PARTY_REF available as a possible type, which won’t happen, and I believe ACCESS_GROUP_REF won’t happen for EHR.ehr_status either. There are other examples of this in the schemas: every OBJECT_REF item allows all the reference types. The same applies to EHR.directory, EHR.compositions and EHR.contributions.
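
As a rough sketch of what that narrowing could look like, the snippet below builds a `oneOf` fragment per attribute that leaves out the reference types I don't expect to see. The exclusion table only reflects my reading above, the `#/definitions/` layout is an assumption about how the schema files are organized, and `narrowed_ref_schema` is a hypothetical helper name.

```python
import json

# The four reference types that the published schemas currently accept everywhere.
ALL_REF_TYPES = {"OBJECT_REF", "LOCATABLE_REF", "PARTY_REF", "ACCESS_GROUP_REF"}

# Hypothetical exclusions, taken from my reading above (not from the specs).
EXCLUDED_REF_TYPES = {
    "ehr_access": {"PARTY_REF"},
    "ehr_status": {"PARTY_REF", "ACCESS_GROUP_REF"},
}


def narrowed_ref_schema(attribute: str) -> dict:
    """Build a oneOf fragment that only lists the reference types still allowed."""
    allowed = ALL_REF_TYPES - EXCLUDED_REF_TYPES.get(attribute, set())
    # Assumes the types live under "#/definitions/", which may not match the real layout.
    return {"oneOf": [{"$ref": f"#/definitions/{name}"} for name in sorted(allowed)]}


if __name__ == "__main__":
    print(json.dumps(narrowed_ref_schema("ehr_status"), indent=2))
```

Attributes that are not in the exclusion table keep all four types, so the helper never narrows more than what is explicitly listed.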

In some schemas EHR.contributions is required while in the RM it’s not.

DV_QUANTITY had a property attribute that is not currently in the specs. I checked the DV spec history and it seems that attribute was there but removed for v0.9 (like 15 years ago!).

In LOCATABLE there is an assertion Archetyped_valid: is_archetype_root xor archetype_details = Void, and I think types COMPOSITION, EHR_STATUS, FOLDER, and subtypes of PARTY (ROLE, PERSON, etc.) will be archetype roots, so why not add archetype_details as required for those types?

The question above has a second consideration: the rm_version field is in archetype_details, and without it there is no way to know which schema version to use if a system supports more than one RM version. So IMO archetype_details should be mandatory, so any system can look it up before validating and then choose the right JSON schema version to validate the instance.

Making archetype_details mandatory in the schema doesn’t really solve the problem, though, because a system has to read archetype_details BEFORE the validation is executed, but it helps to clarify that point to implementers. So the presence of archetype_details should be checked by code before the JSON Schema validation runs. This of course assumes the system can handle multiple RM versions. I guess most systems just assume JSON objects will comply with the one specific version they support, and have rules to check for that. I’m more concerned about Conformance Verification, though, and for that I need to use different RM versions.
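
To make the pre-validation step concrete, here is a minimal sketch of the lookup that would run before any JSON Schema validation. The schema file names in `SCHEMA_FILES` are made up for illustration, and `select_schema_file` is a hypothetical helper, not part of any existing tool.

```python
# Hypothetical mapping from RM version to schema file; real file names may differ.
SCHEMA_FILES = {
    "1.0.2": "rm_1.0.2.json",
    "1.0.3": "rm_1.0.3.json",
    "1.0.4": "rm_1.0.4.json",
    "1.1.0": "rm_1.1.0.json",
}


def select_schema_file(instance: dict) -> str:
    """Pick the schema file from archetype_details.rm_version, before validating."""
    details = instance.get("archetype_details")
    if not isinstance(details, dict):
        raise ValueError("archetype_details is missing, cannot determine the RM version")
    rm_version = details.get("rm_version")
    if rm_version not in SCHEMA_FILES:
        raise ValueError(f"unsupported or missing rm_version: {rm_version!r}")
    return SCHEMA_FILES[rm_version]


if __name__ == "__main__":
    composition = {"_type": "COMPOSITION", "archetype_details": {"rm_version": "1.0.4"}}
    print(select_schema_file(composition))
```

Failing early with an explicit error keeps "no version information" separate from "instance does not match the schema", which would otherwise be reported as the same kind of validation failure.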

DV_EHR_URI and DV_URI are missing the value attribute as required.

Then I modified the 1.0.3 schema to be compliant with 1.0.2, and used that 1.0.2 schema to create a schema that can validate the openEHR REST API JSON payloads, which is different from the schema to validate RM JSON instances (e.g. in the API payloads EHR.ehr_status is EHR_STATUS instead of OBJECT_REF).

I will check the 1.0.4 and 1.1.0 schemas in the next few days.


In JSON Schema 1.0.4 there is an ARCHETYPE_HRID class. It seems to be a class from AOM2, not from the RM; should it be in the RM JSON Schema?

I found ISO8601_TYPE in the schemas, and that class is abstract. Other abstract RM classes are not in the schemas, so should that one be there?

Schemas 1.0.4 have also URI which is abstract, same question as above.

EXTRACT_ENTITY_MANIFEST.other_ids in schema 1.0.3 is “object” while in 1.0.4 it’s array of string, which is the most accurate for the RM type other_ids: List<String>.

RESOURCE_DESCRIPTION_ITEM.other_details is optional in the RM but required in the schemas.

RESOURCE_DESCRIPTION_ITEM.language is CODE_PHRASE in the RM (1.0.3, 1.0.4) but in the 1.0.4 schema it has type TERMINOLOGY_CODE.

FOLDER.details is in the 1.0.4 Schema but was added in the RM 1.1.0, so should be removed from the 1.0.4 Schema.

GENERIC_CONTENT_ITEM.other_details is a Hash<String, String> in the RM. In the 1.0.3 schema it is “object”, but in the 1.0.4 schema it’s “array of string”, which I think doesn’t represent a map.
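
For comparison, one way a String-to-String map could be expressed is an object whose values are all strings, using the standard `additionalProperties` keyword. The fragment below is only a sketch of that idea, not what the published schemas contain, and `is_string_map` is a hypothetical helper that mirrors the same constraint in plain Python.

```python
# Sketch of a JSON Schema fragment for a String-to-String map (an assumption,
# not the published definition of GENERIC_CONTENT_ITEM.other_details).
STRING_MAP_SCHEMA = {
    "type": "object",
    "additionalProperties": {"type": "string"},
}


def is_string_map(value) -> bool:
    """Mirror the fragment above: a JSON object whose keys and values are strings."""
    return isinstance(value, dict) and all(
        isinstance(key, str) and isinstance(item, str) for key, item in value.items()
    )


if __name__ == "__main__":
    print(is_string_map({"source": "lab system"}))  # a map keeps key and value together
    print(is_string_map(["source", "lab system"]))  # an array of strings loses the keys
```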

TRANSLATION_DETAILS.language is CODE_PHRASE in RM 1.0.3 and 1.0.4 but in schema 1.0.4 its type is TERMINOLOGY_CODE.

ACTIVITY.action_archetype_id is mandatory in the RM but in schemas 1.0.3 and 1.0.4 it is not required.
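
Several of the findings above are mismatches between what the RM marks as mandatory and what the schema lists under `required`. A check like that can be scripted; the sketch below assumes the schema keeps its types under a top-level `definitions` key with a `required` array per type, and `required_mismatches` plus the toy schema are hypothetical, for illustration only.

```python
def required_mismatches(schema: dict, expectations: dict) -> list:
    """Compare RM optionality against the schema's 'required' arrays.

    expectations maps type name -> {attribute: True if mandatory in the RM}.
    """
    definitions = schema.get("definitions", {})  # assumed layout of the schema file
    problems = []
    for type_name, attributes in expectations.items():
        required = set(definitions.get(type_name, {}).get("required", []))
        for attribute, mandatory in attributes.items():
            if mandatory and attribute not in required:
                problems.append(f"{type_name}.{attribute}: mandatory in RM, not required in schema")
            elif not mandatory and attribute in required:
                problems.append(f"{type_name}.{attribute}: optional in RM, required in schema")
    return problems


if __name__ == "__main__":
    # Toy schema reproducing two of the findings above; not the real file.
    toy_schema = {
        "definitions": {
            "ACTIVITY": {"required": ["description", "timing"]},
            "RESOURCE_DESCRIPTION_ITEM": {"required": ["language", "other_details"]},
        }
    }
    expectations = {
        "ACTIVITY": {"action_archetype_id": True},
        "RESOURCE_DESCRIPTION_ITEM": {"other_details": False},
    }
    for problem in required_mismatches(toy_schema, expectations):
        print(problem)
```

The expectations table still has to be written by hand from the RM specs, so this only automates the comparison, not the reading of the specs.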

Current schemas are defined here: openEHR - JSON Schemas (ITS-JSON) Component - latest

Fixed schemas can be found here: openEHR-OPT/src/main/resources/json_schema at master · ppazos/openEHR-OPT · GitHub

Open questions from my review:

  1. VERSIONED_OBJECT.owner_id is “Reference to object to which this version container belongs, e.g. the id of the containing EHR or other relevant owning entity.” (a somewhat vague definition; in EHRServer it is actually the EHR id).

In the schemas that can be OBJECT_REF or any of its subclasses: LOCATABLE_REF, PARTY_REF or ACCESS_GROUP_REF. If owner_id IS actually the EHR, then only the OBJECT_REF type should be allowed in the schema: the EHR is not LOCATABLE, so the reference is not a LOCATABLE_REF; it is certainly not a PARTY, so it wouldn’t be a PARTY_REF; and it’s not an access group, so no ACCESS_GROUP_REF is needed there. But the definition leaves the door open to other possible owners, so I don’t know if the other possible types can be safely removed from the schemas.

  2. What would be the role of the EXTRACT types in the API JSON schemas? Do we have an API for that? If not, I would remove those types from the API schemas.

  3. For the ENTRIES, workflow_id and protocol_id are OBJECT_REF in the RM. I guess no protocol or workflow will be represented as a PARTY class or as an ACCESS_GROUP class, so I would like to remove PARTY_REF and ACCESS_GROUP_REF from the possible types of those attributes in the schemas for all the entries.

  4. If the generic data attributes (of type T) are just object in the schema, then when we have a VERSION of something, the schema won’t validate the data. IMO it should match one of the versionable types: even though the RM leaves that open to any type T, we know it will be COMPOSITION, FOLDER, EHR_STATUS or the concrete subclasses of PARTY (maybe EHR_ACCESS too).
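
To illustrate the last point, this is a small sketch of the check I have in mind for the content of a VERSION, based on the `_type` field used in the JSON serialization. The list of versionable types is my assumption from the paragraph above (with the concrete PARTY subclasses spelled out), EHR_ACCESS is left as an opt-in because I'm not sure about it, and `version_data_type_ok` is a hypothetical helper name.

```python
# Assumed set of versionable types; the RM itself leaves T open.
VERSIONABLE_TYPES = {
    "COMPOSITION", "FOLDER", "EHR_STATUS",
    # concrete subclasses of PARTY
    "PERSON", "ORGANISATION", "GROUP", "AGENT", "ROLE",
}


def version_data_type_ok(version: dict, allow_ehr_access: bool = False) -> bool:
    """Check that version['data'] declares one of the expected versionable types."""
    allowed = VERSIONABLE_TYPES | ({"EHR_ACCESS"} if allow_ehr_access else set())
    data = version.get("data")
    return isinstance(data, dict) and data.get("_type") in allowed


if __name__ == "__main__":
    print(version_data_type_ok({"_type": "ORIGINAL_VERSION", "data": {"_type": "COMPOSITION"}}))
    print(version_data_type_ok({"_type": "ORIGINAL_VERSION", "data": {"_type": "ELEMENT"}}))
```

In a schema the same idea would be a `oneOf` over those types instead of a plain object, so that the content of the VERSION is actually validated.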

Here you can find all the fixed JSON schemas, for all the RM versions, and also the API “flavor” of the JSON schemas: openEHR-OPT/src/main/resources/json_schema at master · ppazos/openEHR-OPT · GitHub