JSON Schema and OpenAPI: current state, and how to progress

We recently had a discussion on the SEC call about JSON Schema. I was asked to write down the current state and to get the discussion going on how to progress. So, here it is:

Several options exist to define a JSON format. Two of the most often used are:

  • JSON Schema
  • OpenAPI

Tools for working with JSON Schema are widespread. It is very suitable for validation purposes, including built-in support in text editors. However, with openEHR's extensive use of polymorphism, it is not suited for code generation.

OpenAPI is an extension of JSON Schema. The latest version is fully compatible with JSON Schema, but it still contains extensions. To make it compatible, the extensions were defined as a JSON Schema dialect with added vocabularies, which JSON Schema allows. Processing it completely requires OpenAPI tooling: JSON Schema based tooling works, but the result will not be complete.
One of the extensions defined in OpenAPI is a feature to specify a discriminator column, which means it can be used to generate code for models using polymorphism. Its use cases include API specification, validation and code generation. If we include it in the REST API specification, it is possible to generate code for an openEHR REST API client in many languages, including all archetype and RM models. It is also possible to generate documentation for these APIs in many formats, and to plug this definition into tooling to manually try an API.
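
To make the discriminator idea concrete, here is a minimal sketch in Python of an OpenAPI-style schema for an ITEM hierarchy, together with the lookup a generated client would perform. The `#/components/schemas/...` paths and the `resolve_schema_ref` helper are illustrative assumptions, not taken from any generated openEHR model.

```python
# Hypothetical fragment of an OpenAPI "components/schemas" section, written as a
# Python dict. ITEM is the polymorphic base type; "_type" is the discriminator.
ITEM_SCHEMA = {
    "oneOf": [
        {"$ref": "#/components/schemas/CLUSTER"},
        {"$ref": "#/components/schemas/ELEMENT"},
    ],
    "discriminator": {
        "propertyName": "_type",
        "mapping": {
            "CLUSTER": "#/components/schemas/CLUSTER",
            "ELEMENT": "#/components/schemas/ELEMENT",
        },
    },
}


def resolve_schema_ref(schema: dict, instance: dict) -> str:
    """Return the $ref of the concrete schema selected by the discriminator."""
    discriminator = schema["discriminator"]
    type_name = instance[discriminator["propertyName"]]
    return discriminator["mapping"][type_name]


cluster = {"_type": "CLUSTER", "items": []}
print(resolve_schema_ref(ITEM_SCHEMA, cluster))
```

Because the concrete type is found with a single lookup rather than by trying every alternative, a code generator can map each entry to one class.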

Current state of JSON Schema

We currently have two JSON Schema files:

specifications-ITS-JSON

The schema in specifications-ITS-JSON is generated by Sebastian Iancu from the UML models. The code to generate this is closed source, and not available to others. It has the following benefits:

  • very extensive, including all documentation about models, including which functions are defined
  • it has a well organised structure, with separate files in packages

And the following drawbacks:

  • it cannot be used for validation, because it does not implement openEHR polymorphism, including the _type column used to discriminate types. If it encounters a CLUSTER in a place where the model says ITEM, the items attribute of the cluster and all of its content will not be checked by the validator at all.
  • no root nodes are specified
  • it cannot be hand-edited: it is very big, all information is duplicated in several files, and the code to generate it is not released
  • it is slower to parse because it has many extra fields.
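
The first drawback can be illustrated with a toy example. The sketch below is not a JSON Schema validator and does not use the real schema files; the two definitions and the `validate` helper are simplified assumptions, meant only to show why content of a subtype goes unchecked when the schema names just the base type.

```python
JSON_TYPES = {"object": dict, "array": list, "string": str}

DEFINITIONS = {
    # Base type: knows nothing about the attributes of its subtypes.
    "ITEM": {
        "type": "object",
        "properties": {"name": {"type": "string"}},
        "required": ["name"],
    },
    "CLUSTER": {
        "type": "object",
        "properties": {"name": {"type": "string"}, "items": {"type": "array"}},
        "required": ["name"],
    },
}


def validate(schema_name: str, instance: dict) -> list:
    """Toy check of "required" and of the "type" of listed properties only.

    Properties that the schema does not mention are accepted silently, which
    mirrors the JSON Schema default of allowing additional properties.
    """
    schema = DEFINITIONS[schema_name]
    errors = []
    for key in schema.get("required", []):
        if key not in instance:
            errors.append(f"missing required property: {key}")
    for key, sub in schema.get("properties", {}).items():
        if key in instance and not isinstance(instance[key], JSON_TYPES[sub["type"]]):
            errors.append(f"{key} is not of type {sub['type']}")
    return errors


broken_cluster = {"_type": "CLUSTER", "name": "vitals", "items": "not a list"}
print(validate("ITEM", broken_cluster))     # "items" is never looked at
print(validate("CLUSTER", broken_cluster))  # the wrong type of "items" is reported
```

Checked against ITEM the broken cluster passes, because nothing tells the check to switch to the CLUSTER definition based on _type.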

The Archie generated JSON Schema

The Archie version is autogenerated from BMM. The code to do so is available as part of Archie, at archie/JSONSchemaCreator.java at master · openEHR/archie · GitHub. It has the following benefits:

  • can be used to validate, including all polymorphism except some generics
  • tuned for speed of validation and quality of validation messages
  • extensively tested against the Archie JSON mapping for the RM, even with additionalProperties: false set everywhere to test for completeness
  • the code to generate it is open source
  • contains no extra information, so fast to parse.

And the following drawbacks:

  • not currently separated in packages or files, just one file, but the information to do so is present in the BMM models.
  • nearly no documentation included, because the BMM files contain very little documentation.

Proposal for JSON Schema

We probably need a standard JSON Schema. It can be (a next iteration of) the Archie JSON Schema, or a next iteration of the current specifications-ITS-JSON. To switch to the Archie JSON Schema, we have to decide if the current form is good enough, or if it needs a couple more improvements, for example in the form of documentation or a split into a different package structure. It will also need to be tested against other implementations: the current one works with Archie and EHRBase, but it has not been tested against other implementations yet.
To keep the current schema, we will need to adjust it so that it contains the constraints for polymorphism, and it will need to be extensively tested. We also need to decide whether it is acceptable that this JSON Schema can only be generated with unreleased tooling, or whether we want an open variant to generate it.

Opinions?

OpenAPI

There is only one current openEHR OpenAPI model that I know of, covering both the archetypes and the RM. It is generated from BMM. For the AOM, a BMM is first generated from the Archie implementation, because no BMM for the AOM is available. The code is open source at First version of a working open API model generator by pieterbos · Pull Request #180 · openEHR/archie · GitHub, and the output, including a demo of how to use it for code generation, is available at GitHub - nedap/openehr-openapi: An example project to show how OpenAPI can work with OpenEHR
The current model works for validation and for generating code and human-readable documentation. However, it has the following problems:

  • code to generate the files is still in a branch in Archie, needs to be updated, reviewed and merged
  • one file, not structured in packages
  • nearly no documentation from the models, will have to be added to the BMM files
  • it contains only the models, no REST API definition yet.

I think it would be good to do further work on it, by creating an OpenAPI definition of the openEHR REST API, referencing the autogenerated models. This would mean clients could be autogenerated, rather than having to rely on someone hand-coding a library such as Archie. The generated models can be referenced in such an API easily, and it is possible to mark fields as required for specific APIs from within the API definition, without changing the schema at all. It would also be easier to test whether a given implementation conforms to the openEHR models.
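
As a rough sketch of what such a definition could look like, the Python dict below describes one endpoint that references a generated model and tightens it with an extra `required` list for that endpoint only. The `models.json` file name, the chosen required fields and the rest of the fragment are assumptions for illustration; none of it is taken from the openEHR REST specification.

```python
import json

# Hypothetical OpenAPI 3.0 fragment. The COMPOSITION model lives in a separate,
# autogenerated file and is only referenced here, never modified.
API_DEFINITION = {
    "openapi": "3.0.3",
    "info": {"title": "Example openEHR REST API sketch", "version": "0.0.1"},
    "paths": {
        "/ehr/{ehr_id}/composition": {
            "post": {
                "parameters": [
                    {
                        "name": "ehr_id",
                        "in": "path",
                        "required": True,
                        "schema": {"type": "string"},
                    }
                ],
                "requestBody": {
                    "required": True,
                    "content": {
                        "application/json": {
                            "schema": {
                                "allOf": [
                                    {"$ref": "models.json#/components/schemas/COMPOSITION"},
                                    # Extra constraint that applies to this endpoint only.
                                    {"required": ["name", "archetype_node_id"]},
                                ]
                            }
                        }
                    },
                },
                "responses": {"201": {"description": "Composition created"}},
            }
        }
    },
}

print(json.dumps(API_DEFINITION, indent=2))
```

The `allOf` wrapper is what keeps the generated model untouched: the endpoint adds its own constraint next to the reference instead of editing the referenced schema.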

Again, opinions?

Excellent summary. I’ll just make the initial observation that I think we could benefit if we (most likely me) implement the UML → BMM automatic extractor (i.e. the same thing that already generates the specifications class tables, but targeted to BMM files instead). This would have the effect of:

  • extracting all the documentation elements as well as structure
  • performing the corrections to the UML to generate correct BMM (the UML doesn’t represent container types properly, among other things; I have special code to fix this).
  • always being up to date with the original models (as long as we have them in UML at any rate :wink:)
  • open source.

I may try an initial version of this quite soon, and I think it would be a couple of weeks’ work, so it wouldn’t be instant, but not too far off either.

Adding the documentation to the BMM would solve the documentation problem right away, or with about 10 minutes of work on the generator, both for the JSON Schema and the OpenAPI models for the RM - sounds good!

Right now for the AOM we use a BMM generated from the Archie AOM implementation with reflection. Does the UML → BMM automatic extractor work on the AOM by any chance?

It will work on everything in UML, which includes AOM2 and AOM1.4, even BMM itself. That’s how the AOM and BMM specifications are generated (the class tables). So it will literally suck out everything. Of course, I have to write that outputter :wink: But the extractor side code (UML 2.x openAPI calls - ugly) won’t change, or not much. So it’s mainly a question of instantiating BMM objects and writing them out to P_BMM2 format (the current format); later I will write a BMM3 outputter… which will look like Xcore format, and includes the new EL.

Many thanks for writing this @pieterbos . The mail notification somehow escaped me. I’ll keep this on the screen and make time to read it.

I just recently concluded the project I was working on, so I’m only commenting on this now. Once again thanks @pieterbos , much appreciated.

The way I see it, OpenAPI is significantly better than JSON Schema for our technical and political goals. Where JSON matters most from a standards perspective is the system periphery, where it is the serialisation format.

However, JSON in the context of OpenAPI has a much larger ecosystem around it compared to JSON Schema. Code generation, API specification etc. help us a lot more than just being able to check that payload content is valid. I think the nice thing about the OpenAPI approach is that we can do it incrementally. Even if we only have an OpenAPI model (which you already support as a downstream artefact from BMM), that gives us code generation and validation for data, which we can extend to service definitions later (which’ll be based on data definitions anyway).

This actually takes us beyond FHIR, despite FHIR always being very bullish on the system periphery by design, because last time I checked, FHIR did not have an OpenAPI spec (not sure where it is now).

With JSON Schema, whatever we produce will leave developers in the cold in terms of figuring out how to use it from their applications, whereas with OpenAPI, we have a well supported stack/eco-system to point at, even if we don’t have it initially (i.e. working on service definitions later).

From a specification p.o.v., even if we only had the current OpenAPI output of the code in the Archie branch, that’s a usable artefact, giving us what we’d get from JSON Schema pretty much right now, with more to add if we want to.

Re adding documentation capabilities to BMM: I’d see that as a layer above BMM, no different than using templates for UI generation. A meta mechanism in BMM similar to annotations, which’d let the user link to some documentation, may be a better separation of concerns, but I’m not really taking part in BMM development so I’ll stop at this point.

I was not aware of your work on OpenAPI, great job as usual, I’ll go and take a look now.

Just an aside - the ‘documentation’ @pieterbos is talking about here is just the primary model documentation, i.e. class descriptions, feature descriptions etc. - all that stuff you see in the Class Definitions tables in the openEHR spec. Because we don’t yet extract BMM from the UML, which is where those documentation fragments currently sit, the BMM files are missing them.

My goal is to get a UML → BMM extractor working, so that those documentation fragments will be exported automatically, along with all changes made to the UML. It’s not 100% ideal having the UML as the primary expression in the toolchain, but I guess its utility is still sufficient to justify it (if we stop using UML, we have no diagrams ;), and we can compensate for its annoyances with extractor hacks, which I already do.

Being able to add annotations to BMM in another layer could indeed be useful…

Back to the main conversation on OpenAPI, also very educational for me.

Ah, sorry, I misunderstood the documentation bit then, though my suggestion still stands, but back to the main convo as you say.

The way I see it, OpenAPI will slowly but almost surely kill JSON Schema. As its adoption at REST endpoints increases, it’ll be the primary source of formalism (it hurts me to use this word for JSON, but anyway…) for the JSON content pushed to backends, so I cannot see how any greenfield work can kick-start with JSON Schema. Anyway, the point is that for once we may have a latecomer advantage here, by not having invested too much in JSON Schema and jumping to OpenAPI.

The nice thing is - we get to have both, without much work!

Note that the current .apib files for the REST API can also represent models, even if that is not currently used in those files. For those, a generator that does OpenAPI → .apib is available. Or OpenAPI could replace them, eventually.

Now we just have to update the Archie OpenAPI generator and make sure it works in the latest version again. I created that pull request ages ago as more of an experiment…

Thanks, Pieter, I missed the point re apib files. Good news then!

Re the latest version: do you think it’d be better to target OpenAPI 3.0 now rather than 3.1? Last I checked almost the whole tool stack, including UI tools etc were still supporting 3.0 max, so the exact latest version may not be the most convenient for our purposes :slight_smile:

Well, to my knowledge JSON Schema is not replacing OpenAPI, nor the other way around.
OpenAPI is an alternative to API Blueprint. I guess both (OpenAPI and especially API Blueprint) can use JSON Schema for models.

The plan we had a few years ago in SEC was to document the API using API blueprint, with references to JSON Schema for the resources definition, and optionally export it as OpenAPI specs so that we can generate code, etc.

There are also tools to convert .apib files to OpenAPI equivalent, although I never tried them.

Nowadays API Blueprint is not that popular anymore; OpenAPI probably won the battle on documenting APIs. But I think there are still benefits to keeping our current REST specs as .apib files, while keeping an eye on developments in these fields and adapting later if necessary.

I see @pieterbos responded also, supporting my thoughts about conversion from .apib files.

I’m not sure, but I think the differences should be small between those two versions.

Yes, except that the OpenAPI dialect of JSON Schema has some extensions that are absolutely necessary if you want to use any kind of automated tooling to map to classes with polymorphism. So, for the model part of OpenAPI, you would need a different file than plain JSON Schema to express the openEHR model. That is why I generated both a JSON Schema file for validation, and an OpenAPI file with just the models for validation, API specification and code generation. The difference is that they express polymorphism in very different ways: one only describes validation rules for JSON, the other a way to map to the OO concepts.

I know we had this discussion before, but can’t remember the answer - does the orthodox variety of JSON schema not handle polymorphic typing?

Yes and no. You can use oneOf or anyOf and then reference several types. However, we need a discriminator column to determine which type is used, so we need to do a bit more.
For that, you can do the following (pseudocode):

if ( _type == "DV_TEXT") {
 apply this part of the schema
}
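
In JSON Schema draft-07 and later, this pseudocode roughly corresponds to the `if`/`then` keywords. The sketch below writes such a schema as a Python dict; the `#/definitions/...` targets are placeholders, and `selected_refs` is a toy helper that understands only this one pattern, not a validator.

```python
# Each entry pairs an "if" test on the _type property with the schema to apply.
# The "#/definitions/..." targets are placeholders, not real schema paths.
POLYMORPHIC_DATA_VALUE = {
    "allOf": [
        {
            "if": {"properties": {"_type": {"const": "DV_TEXT"}}, "required": ["_type"]},
            "then": {"$ref": "#/definitions/DV_TEXT"},
        },
        {
            "if": {"properties": {"_type": {"const": "DV_CODED_TEXT"}}, "required": ["_type"]},
            "then": {"$ref": "#/definitions/DV_CODED_TEXT"},
        },
    ]
}


def selected_refs(schema: dict, instance: dict) -> list:
    """List the $ref targets whose "if" condition matches the instance.

    Only the const-on-_type pattern used above is understood; this is a toy
    illustration, not a JSON Schema validator.
    """
    refs = []
    for branch in schema["allOf"]:
        expected = branch["if"]["properties"]["_type"]["const"]
        if instance.get("_type") == expected:
            refs.append(branch["then"]["$ref"])
    return refs


print(selected_refs(POLYMORPHIC_DATA_VALUE, {"_type": "DV_TEXT", "value": "abc"}))
```

Only the branch whose condition matches is applied, so a validation error is reported once, against the selected type.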

Or you can do (slightly less pseudocode):

"oneOf": [
  {
    "allOf": [
      { "$ref": "reference to the type DV_TEXT" },
      {
        "type": "object",
        "properties": {
          "_type": { "const": "DV_TEXT" }
        }
      }
    ]
  },
  ... add more subtypes here
]

The first works well with validators in the sense that the output is understandable and it is fast. The second one with most validators produces tons of possible output on a validation error, basically a message per possible variant, and the validators I have tried were rather slow with it. It should be possible to write tools that recognise these patterns and validate well and generate code, but I have not found any tools available that can generate any type of code from these constructions.

OpenAPI defines a discriminator column, as described in the OpenAPI documentation under ‘Inheritance and Polymorphism’. That solves the problem entirely. Note that the paragraphs ‘model composition’ and ‘polymorphism’ there are still standard JSON Schema. Tools to generate code are widely available, but I am not sure if they all support the discriminator column mechanism.
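
For a feel of what discriminator-aware generated code tends to look like, here is a hand-written Python sketch. The classes are heavily simplified stand-ins for the RM types (for example, name and value are plain strings here), and `TYPE_REGISTRY` and `item_from_json` are hypothetical names, not the output of any particular generator.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Type


@dataclass
class Item:
    """Base class; stands in for the RM type ITEM."""
    name: str


@dataclass
class Element(Item):
    value: str = ""


@dataclass
class Cluster(Item):
    items: List[Item] = field(default_factory=list)


# Discriminator mapping: value of "_type" -> class to instantiate.
TYPE_REGISTRY: Dict[str, Type[Item]] = {"ELEMENT": Element, "CLUSTER": Cluster}


def item_from_json(data: dict) -> Item:
    """Build the concrete Item subclass named by the "_type" property."""
    cls = TYPE_REGISTRY[data["_type"]]
    if cls is Cluster:
        children = [item_from_json(child) for child in data.get("items", [])]
        return Cluster(name=data["name"], items=children)
    return Element(name=data["name"], value=data.get("value", ""))


tree = item_from_json({
    "_type": "CLUSTER",
    "name": "vitals",
    "items": [{"_type": "ELEMENT", "name": "pulse", "value": "72"}],
})
print(type(tree.items[0]).__name__)
```

The registry plays the role of the discriminator mapping: one lookup on _type decides which class to build, at every level of nesting.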
