# JSON Schema and OpenAPI: current state, and how to progress

**Category:** [ITS](https://discourse.openehr.org/c/its/41)
**Created:** 2021-03-14 14:56 UTC
**Views:** 4665
**Replies:** 40
**URL:** https://discourse.openehr.org/t/json-schema-and-openapi-current-state-and-how-to-progress/1385

---

## Post #1 by @pieterbos

We recently had a discussion on the SEC call about JSON Schema. I was asked to write down the current state and to get the discussion going on how to progress. So, here it is:

Several options exist to define a JSON format. Two of the most often used are:

- JSON Schema
- OpenAPI

Tools for working with JSON Schema are widespread. It is very suitable for validation purposes, including built-in support in text editors. However, with openEHR's extensive use of polymorphism, it is not well suited for code generation.

OpenAPI is an extension of JSON Schema. The latest version is fully compatible with JSON Schema, but it still contains extensions. To make it compatible, the OpenAPI authors defined the extensions as a JSON Schema dialect with added vocabularies, which is possible in JSON Schema. It requires OpenAPI tooling to process fully; JSON Schema based tooling works, but will not be complete. One of the extensions defined in OpenAPI is a feature to specify a discriminator property, which makes it possible to generate code for models using polymorphism. Its use cases include API specification, validation, code generation and documentation generation.

If we include it in the REST API specification, it is possible to generate code for an openEHR REST API client in many languages, including all archetype and RM models. It is also possible to generate documentation for these APIs in many formats, and to plug this definition into tooling to manually try an API.
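For context, here is what the polymorphism problem looks like in data: openEHR canonical JSON carries a `_type` property on each object, which is exactly what a discriminator-aware schema can dispatch on. A small illustrative instance fragment (the values are made up):

```
{
  "_type": "DV_CODED_TEXT",
  "value": "Present",
  "defining_code": {
    "_type": "CODE_PHRASE",
    "terminology_id": {
      "_type": "TERMINOLOGY_ID",
      "value": "local"
    },
    "code_string": "at0047"
  }
}
```

A validator or code generator that does not understand `_type` cannot tell that this object must satisfy the DV_CODED_TEXT constraints rather than those of some other DATA_VALUE subtype.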
# Current state of JSON Schema

We currently have two JSON Schema files:

- The official one in https://github.com/openehr/specifications-ITS-JSON
- The one generated from BMM in Archie, at https://gist.github.com/pieterbos/ff8a9c67fd3d346b423a8cd69befe67a

## specifications-ITS-JSON

The schema in specifications-ITS-JSON is generated by Sebastian Iancu from the UML models. The code to generate it is closed source, and not available to others.

It has the following benefits:

- very extensive, including all documentation about the models, including which functions are defined
- it has a well organised structure, with separate files per package

And the following drawbacks:

- it cannot be used for validation, because it does not implement openEHR polymorphism, including the `_type` property used to discriminate types. If a validator encounters a CLUSTER in a place where the model says ITEM, the items attribute of the cluster, plus any of its content, will not be checked at all
- no root nodes are specified
- it cannot be hand edited because it is so big, all information is duplicated across several files, and the code to generate it is not released
- slower to parse, because it has many extra fields.

## The Archie-generated JSON Schema

The Archie version is autogenerated from BMM. The code to do so is available as part of Archie, at https://github.com/openEHR/archie/blob/master/tools/src/main/java/com/nedap/archie/json/JSONSchemaCreator.java.

It has the following benefits:

- can be used to validate, including all polymorphism except some generics
- tuned for speed of validation and quality of validation messages
- extensively tested against the Archie JSON mapping for the RM, even with `additionalProperties: false` set everywhere to test for completeness
- the code to generate it is open source
- contains no extra information, so it is fast to parse.
And the following drawbacks:

- not currently separated into packages or files, just one file, although the information to do so is present in the BMM models
- nearly no documentation included, because the BMM files contain very little documentation.

# Proposal for JSON Schema

We probably need a standard JSON Schema. It can be (a next iteration of) the Archie JSON Schema, or a next iteration of the current specifications-ITS-JSON.

To switch to the Archie JSON Schema, we have to decide whether the current form is good enough, or whether it needs a couple more improvements, for example in the form of documentation or splitting into a different package structure. It will also need to be tested against other implementations - the current one works with Archie and EHRBase, but it has not been tested against other implementations yet.

To keep the current schema, we will need to adjust it so that it contains the constraints for polymorphism, and it will need to be extensively tested. We also need to decide whether it is acceptable that this JSON Schema can only be generated with unreleased tooling, or whether we want an open variant to generate it.

Opinions?

# OpenAPI

There is only one current openEHR OpenAPI model that I know of, covering both the archetype and the RM models. It is generated from BMM. For the AOM, a BMM is first generated from the Archie implementation, because no model is available. The code is open source at https://github.com/openEHR/archie/pull/180/files and the output, including a demo of how to use it for code generation, is available at https://github.com/nedap/openehr-openapi

The current model works to validate and to generate code and human-readable documentation. However, it has the following problems:

- the code to generate the files is still in a branch in Archie, and needs to be updated, reviewed and merged
- one file, not structured in packages
- nearly no documentation from the models; this will have to be added to the BMM files
- it contains only the models, no REST API definition yet.
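As an illustration of that last point, a REST API definition layered on top of the generated models could reference them with `$ref` - a hypothetical sketch (the path follows the openEHR REST API; the schema name is an assumption about how the generated components would be named):

```
{
  "paths": {
    "/ehr/{ehr_id}/composition": {
      "post": {
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": { "$ref": "#/components/schemas/COMPOSITION" }
            }
          }
        },
        "responses": {
          "201": {
            "description": "COMPOSITION created",
            "content": {
              "application/json": {
                "schema": { "$ref": "#/components/schemas/COMPOSITION" }
              }
            }
          }
        }
      }
    }
  }
}
```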
I think it would be good to do further work on it, by creating an OpenAPI definition of the openEHR REST API, referencing the autogenerated models. This would mean clients could be autogenerated, rather than having to rely on someone hand-coding a library such as Archie. The generated models can be referenced in such an API easily, and it is possible to mark fields as mandatory for specific APIs from within the API definition, without changing the schema at all. It would also be easier to test whether a given implementation conforms to the openEHR models.

Again, opinions?

---

## Post #2 by @thomas.beale

Excellent summary. I'll just make the initial observation that I think we could benefit if we (most likely me) implemented the UML -> BMM automatic extractor (i.e. the same thing that already generates the specifications class tables, but targeted at BMM files instead). This would have the effect of:

* extracting all the documentation elements as well as structure
* performing the corrections to the UML needed to generate correct BMM (the UML doesn't represent container types properly, among other things; I have special code to fix this)
* always being up to date with the original models (as long as we have them in UML at any rate ;)
* being open source.

I may try an initial version of this quite soon, and I think it would be a couple of weeks' work, so it wouldn't be instant, but not too far off either.

---

## Post #3 by @pieterbos

Adding the documentation to the BMM would solve the documentation right away, or with about 10 minutes of work on the generator, both for the JSON Schema and the OpenAPI models for the RM - sounds good!

Right now for the AOM we use a BMM generated from the Archie AOM implementation with reflection. Does the UML -> BMM automatic extractor work on the AOM by any chance?

---

## Post #4 by @thomas.beale

It will work on everything in UML, which includes AOM2 and AOM1.4, even BMM itself. That's how the AOM and BMM specifications (the class tables) are generated.
So it will literally suck out everything. Of course, I have to write that outputter ;) But the extractor-side code (UML 2.x openAPI calls - ugly) won't change, or not much. So it's mainly a question of instantiating BMM objects and writing them out to P_BMM2 format (the current format); later I will write a BMM3 outputter... which will look like Xcore format, and include the new EL.

---

## Post #5 by @Seref

Many thanks for writing this @pieterbos . The mail notification somehow escaped me. I'll keep this on the screen and make time to read it.

---

## Post #6 by @Seref

I just recently concluded the project I was working on, so I'm only commenting on this now. Once again thanks @pieterbos , much appreciated.

The way I see it, OpenAPI is significantly better than JSON Schema for our technical and political goals. Where JSON matters most from a standards perspective is the system periphery, where it is the serialisation format. JSON in the context of OpenAPI has a much larger ecosystem around it compared to JSON Schema: code generation, API specification etc help us a lot more than just being able to validate payloads.

I think the nice thing about the OpenAPI approach is that we can do it incrementally. Even if we only have an OpenAPI model (which you already support as a downstream artefact from BMM), that gives us code generation and validation for data, which we can extend to service definitions later (which'll be based on the data definitions anyway). This actually takes us beyond FHIR, despite FHIR always being very bullish on the system periphery by design, because last time I checked, FHIR did not have an OpenAPI spec (not sure where it is now).

With JSON Schema, whatever we produce will leave developers in the cold in terms of figuring out how to use it from their applications, whereas with OpenAPI we have a well supported stack/ecosystem to point at, even if we don't have all of it initially (i.e. working on service definitions later).
From a specification p.o.v., even if we only had the current OpenAPI output of the code in the Archie branch, that's a usable artefact, giving us pretty much what we'd get from JSON Schema right now, with more to add if we want to.

Re adding documentation capabilities to BMM: I'd see that as a layer above BMM, no different from using templates for UI generation. A meta mechanism in BMM similar to annotations, which'd let the user link to some documentation, may be a better separation of concerns, but I'm not really taking part in BMM development so I'll stop at this point.

I was not aware of your work on OpenAPI, great job as usual, I'll go and take a look now.

---

## Post #7 by @thomas.beale

[quote="Seref, post:6, topic:1385"]
Re adding documentation capabilities to BMM: I’d see that as a layer above BMM, no different than using templates for UI generation. A meta mechanism in BMM similar to annotations, which’d let the user link to some documentation may be a better separation of concerns, but I’m not really taking part on BMM development so I’ll stop at this point.
[/quote]

Just an aside - the 'documentation' @pieterbos is talking about here is just the primary model documentation, i.e. class descriptions, feature descriptions etc - all that stuff you see in the Class Definitions tables in the openEHR specs. Because we don't yet extract BMM from the UML, which is where those documentation fragments currently sit, the BMM files are missing them. My goal is to get a UML -> BMM extractor working, so that those documentation fragments will be exported automatically, along with all changes made to the UML.

It's not 100% ideal having the UML as the primary expression in the toolchain, but I guess its utility is still sufficient to justify it (i.e. if we stop using UML, we have no diagrams ;), and we can compensate for its annoyances with extractor hacks, which I already do.

Being able to add annotations to BMM in another layer could indeed be useful...
Back to the main conversation on OpenAPI, also very educational for me.

---

## Post #8 by @Seref

Ah, sorry, I misunderstood the documentation bit then, though my suggestion still stands, but back to the main convo as you say.

The way I see it, OpenAPI will slowly but almost surely kill JSON Schema. As its adoption at REST endpoints increases, it'll be the primary source of formalism (it hurts me to use this word for JSON, but anyway..) for the JSON content pushed to backends, so I cannot see how any greenfield work can kick-start with JSON Schema. Anyway, the point is that for once we may have a latecomer advantage here, by not having invested too much into JSON Schema, and jumping to OpenAPI.

---

## Post #9 by @pieterbos

[quote="Seref, post:6, topic:1385"]
The way I see it, OpenAPI is significantly better than JsonSchema for our technical and political goals. Where JSON matters most from a standard perspective is the system periphery, where it is the serialisation format.
[/quote]

The nice thing is - we get to have both, without much work!

Note that the current apib files for the REST API can also represent models, even if that is not currently used in those files. For those, a generator that does OpenAPI -> .apib is available. Or OpenAPI could replace it, eventually.

Now we just have to update the Archie OpenAPI generator and make sure it works with the latest version again - I created that pull request ages ago as more of an experiment...

---

## Post #10 by @Seref

Thanks, Pieter, I missed the point re apib files. Good news then!

Re the latest version: do you think it'd be better to target OpenAPI 3.0 now rather than 3.1? Last I checked, almost the whole tool stack, including UI tools etc, still supported 3.0 at most, so the exact latest version may not be the most convenient for our purposes :)

---

## Post #11 by @sebastian.iancu

Well, to my knowledge JSON Schema is not replacing OpenAPI, nor the other way around. OpenAPI is an alternative to API Blueprint.
I guess both (OpenAPI and especially API Blueprint) can use JSON Schema for models. The plan we had a few years ago in the SEC was to document the API using API Blueprint, with references to JSON Schema for the resource definitions, and optionally export it as OpenAPI specs so that we can generate code, etc.

---

## Post #12 by @sebastian.iancu

There are also tools to convert .apib files to their OpenAPI equivalent, although I have never tried them. Nowadays API Blueprint is not that popular anymore - OpenAPI has probably won the battle on documenting APIs - but I think there are still benefits to keeping our current REST specs as .apib files, while keeping an eye on developments in these fields and adapting later if necessary.

---

## Post #13 by @sebastian.iancu

[quote="pieterbos, post:9, topic:1385"]
Note that the current apib-files for the REST API also can represent models, even if it is not currently used in those files. For those a generator that does openAPI → .apib is available. Or openAPI could replace it, eventually.
[/quote]

I see @pieterbos responded also, supporting my thoughts about conversion from .apib files.

---

## Post #14 by @pieterbos

[quote="Seref, post:10, topic:1385"]
Re the latest version: do you think it’d be better to target OpenAPI 3.0 now rather than 3.1? Last I checked almost the whole tool stack, including UI tools etc were still supporting 3.0 max, so the exact latest version may not be the most convenient for our purposes :slight_smile:
[/quote]

I'm not sure, but I think the differences between those two versions should be small.

[quote="sebastian.iancu, post:11, topic:1385"]
OpenAPI is an alternative to API Blueprint. I guess both (OpenAPI and especially API Blueprint) can use JSON Schema for models.
[/quote]

Yes, except that the OpenAPI dialect of JSON Schema has some extensions that are absolutely necessary if you want to use any kind of automated tooling to map to classes with polymorphism.
So, for the model part of OpenAPI, you would need a different file than plain JSON Schema to express the openEHR model. This is why I generated both a JSON Schema file for validation, and an OpenAPI file with just the models, for validation, API specification and code generation. The difference is that they have very different ways of expressing polymorphism - one only describes validation rules for JSON, the other a way to map to OO concepts.

---

## Post #15 by @thomas.beale

[quote="pieterbos, post:14, topic:1385"]
dialect of JSON Schema has some extensions that are absolutely necessary if you want to use any kind of automated tooling to map to classes with polymorphism
[/quote]

I know we had this discussion before, but can't remember the answer - does the orthodox variety of JSON Schema not handle polymorphic typing?

---

## Post #16 by @pieterbos

[quote="thomas.beale, post:15, topic:1385"]
I know we had this discussion before, but can’t remember the answer - does the orthodox variety of JSON schema not handle polymorphic typing?
[/quote]

Yes and no. You can do oneOf or anyOf, then reference several types. However, we need a discriminator property to determine which type is used, so we need to do a bit more. For that, you can use JSON Schema's if/then keywords:

```
{
  "if": {
    "properties": { "_type": { "const": "DV_TEXT" } }
  },
  "then": { "$ref": "#/definitions/DV_TEXT" }
}
```

Or you can do:

```
"oneOf": [
  {
    "allOf": [
      { "$ref": "#/definitions/DV_TEXT" },
      {
        "type": "object",
        "properties": {
          "_type": { "const": "DV_TEXT" }
        }
      }
    ]
  }
  ... add more subtypes here ...
]
```

The first works well with validators, in the sense that the output is understandable and it is fast. With most validators, the second one produces tons of possible output on a validation error - basically a message per possible variant - and the validators I have tried were rather slow with it.
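For comparison, the OpenAPI extension mentioned earlier expresses the same dispatch directly with a `discriminator` keyword - a sketch, with illustrative schema names and locations:

```
"ITEM": {
  "oneOf": [
    { "$ref": "#/components/schemas/ELEMENT" },
    { "$ref": "#/components/schemas/CLUSTER" }
  ],
  "discriminator": {
    "propertyName": "_type",
    "mapping": {
      "ELEMENT": "#/components/schemas/ELEMENT",
      "CLUSTER": "#/components/schemas/CLUSTER"
    }
  }
}
```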
It should be possible to write tools that recognise these patterns, validate well and generate code, but I have not found any available tools that can generate any kind of code from these constructions.

OpenAPI defines a discriminator property, as in https://swagger.io/docs/specification/data-models/inheritance-and-polymorphism/ . That solves the problem entirely. Note that the sections 'model composition' and 'polymorphism' there are still standard JSON Schema. Tools to generate code are widely available, but I am not sure if they all support the discriminator mechanism.

---

## Post #17 by @joostholslag

Just learning about OpenAPI and openEHR, and I had a question that might be very stupid. If the reference and archetype models are in OpenAPI, could we also model archetypes in OpenAPI (instead of ADL)? And templates? This would make it a lot easier for people new to openEHR (or the occasional strong-headed CIO not willing to learn ADL) to get into openEHR, since with OpenAPI code generation they could use the models in their native programming language directly. It would be even better if e.g. CKM could export archetypes/templates that would self-validate, so the relevant RM+AOM would be part of the OpenAPI schema file. And it would mean we would no longer have to maintain ODIN and ADL.

---

## Post #18 by @thomas.beale

[quote="joostholslag, post:17, topic:1385"]
But if the reference and archetype models are in OpenAPI, could we also model archetypes in OpenAPI (instead of ADL)?
[/quote]

There is a categorical difference between what generic formalisms like OpenAPI (or even JSON) can represent and native syntaxes. When a generic syntax is used, it is just representing instances of some meta-model. AOM is a meta-model for archetypes; there are meta-models for every programming language, and so on.
Now, one might ask: why write Java code, or Python, C#, TypeScript etc - why have all these syntaxes when we could just have everything written in JSON, or maybe (somehow) in OpenAPI (which is really a cut-down version of [OMG IDL](https://www.omg.org/spec/IDL/4.2/PDF))? The reason is that the native syntaxes allow us to represent the semantics of the language directly, using keywords, specific symbols etc - so we can think in the concepts that the language supports.

ADL is just another programming language - its main features not found in most other languages are:

* constraints, including nested ones (e.g. `x ∈ {|0.0 .. 250.0|}`)
* semantic overloading of model instances with domain markers (achieved by the id-codes, e.g. `ELEMENT[id41|systolic|]`)
* terminology & terminology binding

To make these things happen (just like all the specific features in all those other languages) you need a meta-model whose classes express exactly those features. For example, the ability to write `ELEMENT[id41]` in ADL is supported by the [`C_OBJECT` class in the AOM meta-model](https://specifications.openehr.org/releases/UML/latest/index.html#Diagrams___18_1_83e026d_1428951189859_3475_5433). Specifically:

![c_object|365x187](upload://2NDxXYxhRq179oGXVyjb1q1EEuL.png)

The two fields `rm_type_name` and `node_id` are what allow `ELEMENT[id41]` to be written.

So when we write a fragment of ADL (or Java, or Python, or anything) we are using a native syntax that makes sense to humans to write out (i.e. serialise) an instance of some class in the meta-model of that language. But there is always a generic way to serialise instances of a meta-model, which is in 'object dump' syntaxes like XML, JSON, YAML etc (that's what openEHR ODIN is).
Here's an example. Native ADL:

```
EVALUATION[id1] matches {    -- Adverse reaction risk
    data matches {
        ITEM_TREE[id2] matches {
            items cardinality matches {1..*; unordered} matches {
                ELEMENT[id3] matches {    -- Substance
                    value matches {
                        DV_TEXT[id130]
                    }
                }
                ELEMENT[id64] occurrences matches {0..1} matches {    -- Status
                    value matches {
                        DV_CODED_TEXT[id131] matches {
                            defining_code matches {[ac1]}    -- Status (synthesised)
                        }
                        DV_TEXT[id132]
                    }
                }
```

And the equivalent in JSON dump format:

```
"definition": {
  "rm_type_name": "EVALUATION",
  "node_id": "id1",
  "attributes": [
    {
      "rm_attribute_name": "data",
      "children": [
        {
          "rm_type_name": "ITEM_TREE",
          "node_id": "id2",
          "attributes": [
            {
              "rm_attribute_name": "items",
              "children": [
                {
                  "rm_type_name": "ELEMENT",
                  "node_id": "id3",
                  "attributes": [
                    {
                      "rm_attribute_name": "value",
                      "children": [
                        {
                          "rm_type_name": "DV_TEXT",
                          "node_id": "id130"
                        }
                      ]
                    }
                  ]
                },
                {
                  "rm_type_name": "ELEMENT",
                  "node_id": "id64",
                  "occurrences": "0..1",
                  "attributes": [
                    {
                      "rm_attribute_name": "value",
                      "children": [
                        {
                          "rm_type_name": "DV_CODED_TEXT",
                          "node_id": "id131",
                          "attributes": [
                            {
                              "rm_attribute_name": "defining_code",
                              "children": [
                                {
                                  "rm_type_name": "CODE_PHRASE",
                                  "node_id": "id9999",
                                  "constraint": "ac1"
                                }
                              ]
                            }
                          ]
                        },
                        {
                          "rm_type_name": "DV_TEXT",
                          "node_id": "id132"
                        }
                      ]
                    }
                  ]
                },
```

That's 18 lines of native syntax compared to 60 in JSON. The ADL is also directly comprehensible (assuming one has read the ADL manual ;), whereas the JSON serialisation just looks like... a pile of objects.

The same argument holds for any programming language - consider an 'if / then' statement in Java, PHP, TS etc - generally easy to read and write, and making sense at a high mathematical / logic level. However, written out as generic instances of the language meta-model, it would be impossible to read.

So the purpose of any native syntax is for humans to write and read, but also for computers to be able to understand.
To do that, a native language parser is required, which consumes text in native ADL, Java etc, and pumps out (usually) an 'augmented abstract syntax tree' (augmented AST), an in-memory representation much more like the JSON. That in-memory structure could be serialised out using e.g. Java's native JSON serialiser, or any similar tool, to save it as JSON. It's then no longer useful to humans, but it can be read back in in an instant by a standard JSON reader, to re-instantiate those in-memory objects, whereas parsing the native form is quite a lot more complex and resource-intensive. For files that are known to be valid, writing and reading JSON or some other object dump format is thus quite useful.

Native languages have another under-rated purpose: to teach the language, i.e. its concepts. This fragment of Java using a lambda - `myIntegerList.forEach( (n) -> { System.out.println(n); } );` - makes sense to Java programmers and is very concise. Without that special syntax, it would be very hard to teach and learn.

Can native languages be avoided? Sure, with visual programming apps that allow you to program purely in the UI. Archetypes are in fact a candidate - most clinical modellers just use the Archetype Designer or a similar tool. But it's surprising how many people write or modify native ADL. Visual programming of Java or Kotlin or Dart isn't going to happen any time soon, however, because the concepts are more sophisticated than any possible visualisation.

On to OpenAPI. It's a generic language for expressing APIs, so directly comparable to OMG IDL - it's like a programming language, except missing the ability to actually write the code inside routines. It doesn't (as far as I can tell) contain any constraint semantics beyond the notion of cardinality, and it doesn't know about terminology in any native way. So it's probably not a good candidate for trying to express archetypes.
The use of new languages for specific purposes used to be thought of as bad, but today everything has its own language, and instead of one language to rule them all, we notionally have one tool + meta-model approach to rule them all, since everything can be made to fit the scheme: native language -> native language parser -> in-memory AST (meta-model instance) -> tools. The important thing is to have meta-models strong enough to represent native language semantics. Environments like [JetBrains MPS](https://www.jetbrains.com/mps/) are addressing this ecosystem. With these kinds of tools, having lots of languages is no longer a problem, since they allow us to represent concepts in numerous domains natively.

---

## Post #19 by @borut.jures

[quote="thomas.beale, post:18, topic:1385"]
Can native languages be avoided? Sure, with visual programming Apps that allow you to program purely in the UI. Archetypes are in fact a candidate - most clinical modellers just use the Archetype Designer or similar tool. But it’s surprising how many people write or modify native ADL.
[/quote]

I was thinking about this yesterday. I "need" a computable version of the specifications for my approach, but I'll use JSON for the operational templates. That means I won't use ADL.

I was wondering whether there has been a poll asking clinical modellers whether they prefer GUI tools or text (ADL)? My thinking is that most would prefer a GUI tool, making ADL optional. But you are saying there are people writing directly in ADL. It would be interesting to know what percentage are using GUI vs ADL.

---

## Post #20 by @thomas.beale

[quote="borut.jures, post:19, topic:1385"]
My thinking is that most would prefer a GUI tool making ADL optional. But you are saying there are people writing directly in ADL. It would be interesting to know how many % are using GUI / ADL.
[/quote]

It's undoubtedly more GUI, less ADL. There is another reason to maintain ADL and a native parser, however.
In the past, it was assumed that there would always be a fixed generic syntax to use. This used to be XSD 1.0. But that depends on the schema you design - the XML will be different for different variant schemas. If you move to XSD 1.1, then it all changes, multiplied by variant schemas. Then the world decides XSD is annoying (it is...) and wants everything in JSON. So we do that (quite easily). Then we have to sort out the `'_type'` thing in JSON. Then some of us move to JSON Schema based JSON, which (probably) changes things again in some annoying way. Then (imagine) some people want to move to JSON5 (I would ;), others want YAML (but which variant?). And so it goes.

Meanwhile, there's only one ADL syntax at each release, and we always know what it means. We could imagine serialising a repository of archetypes into today's YAML vX.Y.Z, and forgetting about it for a few years. Then you come along later, and want to work with those archetypes, and have trouble locating a YAML vX.Y.Z reader, because the world's moved on.

So things are not as black and white as one might first imagine.

---

## Post #21 by @borut.jures

[quote="thomas.beale, post:20, topic:1385"]
There is another reason to maintain ADL
[/quote]

I agree. With "optional" I had modelers (and my use case) in mind. ADL is not optional for CKM and CDR builders.

---

## Post #22 by @siljelb

[quote="borut.jures, post:19, topic:1385"]
I was wondering if there was a poll asking clinical modelers whether they prefer GUI tools or text (ADL)?
[/quote]

In practice you need to be able to do both. Tools do most of the work for sure, but no tool will ever be perfect. Search and replace for changing a word throughout an archetype, for example. Or transforming an archetype from one class to another. AFAIK no current tool does either of those.

As for doing this in a generic language like JSON... Please, no! 😱 We've seen what operational templates look like, whether in XML, JSON or whatever.
ADL is heaven compared to that 😇

---

## Post #23 by @borut.jures

Thank you for the great insight into the life of a clinical modeller. Has nobody asked what modellers need to do their work?

It is clear to me that ADL is useful to you. Even with additional features added to the tools, ADL + text editor is the most flexible format. However, the search and replace using semantics you mention would be a great feature.

They say the user is king/queen. And when a queen says :scream: about JSON, we should stop pushing it :flushed: Nobody pushes developers to use a programming language they don't like. Developers shouldn't push the modellers either.

p.s. I probably don't know all the background of why some prefer JSON over ADL. I'm new to openEHR and unencumbered by the past. I just see the future :innocent:

---

## Post #24 by @borut.jures

[quote="joostholslag, post:17, topic:1385"]
But if the reference and archetype models are in OpenAPI, could we also model archetypes in OpenAPI (instead of ADL)?
[/quote]

Good news from GitHub: "decided to go further with current JSON schema (**generated with Archie from BMM**)"

Can this make the few of us who like BMM happy, and let everybody who prefers JSON Schema have the format they like? :smiling_face_with_three_hearts:

---

## Post #25 by @pablo

[quote="borut.jures, post:23, topic:1385"]
I probably don’t know all the needed background information why some prefer JSON over ADL
[/quote]

Hi Borut, I'm late to the party...

At the implementation level and at runtime, you can use any format you like, as long as you can transform back and forth to/from ADL or OPT, which means the format you use must be semantically equivalent to the standard ones. So if JSON, XML, YAML, CSV or whatever simplifies your work at the technology level, USE IT! Though what you think might simplify your work today might make it difficult in the future when things get complicated, like modelling a complex EMR with openEHR.
If you are new to openEHR, maybe you want to solve problems you don't really have, or that are already solved elsewhere. You just need the experience to go through the learning process, understand the specs and the tech available, and how to apply both to your specific business requirements. This might take a while, so even if you are not sure of what you are doing, it's all time invested in learning. Try, make mistakes, ask and learn. It could be a painful but enjoyable journey! (I've been there and made many mistakes, but learned a lot.)

That is just my 2 cents :slight_smile:

All the best and welcome!

---

## Post #26 by @joostholslag

I needed some time to realise you're right. I think you summed up the arguments for domain-specific languages nicely, in order of priority. But I also still feel openEHR DSLs are a big barrier to entry, from both a learning and an implementation perspective. So I would still like us to make 'more' effort to do work based on standards, and I think there is more room for compromise on the requirements than we currently allow. But maybe that compromise is making more effort to support standards. I'm thinking of:

- OpenAPI export of flattened templates
- deprecating ODIN for JSON
- sharing libraries (not just Archie for Java)
- creating subsets of standards instead of inventing everything ourselves (expression language, task planning etc)

But this is ultimately not my domain, so it's up to the engineers to decide on these issues. There's a catch-22 here though: engineers able to build the stuff described are so well versed in openEHR that they don't benefit from the simplifications.

---

## Post #27 by @borut.jures

[quote="joostholslag, post:26, topic:1385"]
deprecating Odin for json
[/quote]

If you try hard enough you could use JSON instead of BMM (ODIN). But it is not the same experience editing JSON compared to editing a BMM file. The JSON would be too complex for a human to edit.

That is why DSLs are used: to simplify writing/editing.
To not lose focus with formatting while trying to write your thoughts before you get distracted by another thought. To stay in the flow for as long as possible. What if the openEHR specifications web site had to be written in pure HTML since it is easier for others? Let the author pick his tools for writing specifications. The BMM format (ODIN) is not that hard for anyone willing to spend a few days learning/programming. There are ANTLR parsers in almost any language (and ANTLR is a "standard"). Why hasn't somebody converted BMMs into JSON (e.g. OpenAPI) if that is THE answer? Using Archie solves the issue of having to know how to parse BMMs. It shouldn't take more than a few days. I propose somebody uses Archie and the existing BMM files to prepare an initial proposal for the specifications in JSON. Then both formats may be compared and a decision can be made on which one to use in the future. [quote="joostholslag, post:26, topic:1385"] this is ultimately not my domain [/quote] Be careful with engineers :wink: It is easy to convince non-engineers something is hard if we don't feel like doing it. Or that it would take a loooong time to do it. Can you tell when they are telling you the facts? In medicine you ask for a second opinion to be sure. --- ## Post #28 by @pieterbos [quote="borut.jures, post:27, topic:1385"] If you try hard enough you could use JSON instead of BMM (ODIN). But it is not the same experience editing a JSON compared to editing a BMM file. The JSON would be too complex for a human to edit. [/quote] Well... replace JSON with JSON5 or YAML, and it is a direct and simple transformation. These file formats are very closely related, and BMM can just as well be expressed in another object serialisation format, for which proper tools are available in every editor. So, I do not agree with that part. Writing and parsing BMM in JSON and YAML is already supported out of the box in Archie, without any changes to Archie. 
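To illustrate how direct that mapping is, here is a minimal sketch (a hand-made fragment for illustration, not Archie's actual output): the ODIN header of a BMM file and its JSON rendering carry exactly the same nested key/value structure, so a standard JSON parser reads it with no special tooling.

```python
import json

# An ODIN fragment from a BMM schema header looks roughly like:
#
#   rm_publisher = <"openehr">
#   rm_release = <"1.0.4">
#   includes = <
#       ["1"] = <
#           id = <"openehr_rm_structures_1.0.4">
#       >
#   >
#
# The same information as JSON -- a one-to-one structural mapping:
bmm_json = """
{
    "rm_publisher": "openehr",
    "rm_release": "1.0.4",
    "includes": {
        "1": { "id": "openehr_rm_structures_1.0.4" }
    }
}
"""

schema = json.loads(bmm_json)
print(schema["includes"]["1"]["id"])  # -> openehr_rm_structures_1.0.4
```

Keys map to keys, nested `<...>` blocks map to nested objects; the only real work is the syntax translation, which is why a handful of `toJson()` methods on the ODIN parse tree is enough.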
--- ## Post #29 by @borut.jures [quote="pieterbos, post:28, topic:1385"] Writing and parsing BMM in json and yaml is already supported out of the box in Archie, without any changes to Archie. [/quote] **This is my main point - Archie has a good BMM parser. Somebody should use it to convert them to a proposed replacement for BMM files.** The result would be JSON (or whatever format is selected) versions of the existing BMM files. Then the new format is ready to be considered. p.s. I'll be happy to use the new format if it includes everything that is in the BMM files. [quote="pieterbos, post:28, topic:1385"] proper tools are available in every editor [/quote] Currently UML is the primary source of the specifications. BMMs are generated from UML. Not much editing is required. Same would be true for the new format if UML remains the primary source. But if the new format becomes the primary source, then it makes a difference how easy the editing is. Compared to ADL2, the BMM is verbose, but JSON is even more verbose. I believe that JSON is for computers. DSLs like ADL2/BMM are for humans. JSON should be the result of a conversion from a DSL, not the primary source. To me the question is not replacing BMM/ODIN but providing alternative formats generated from them, so that the specifications may be used in formats that developers are more familiar with. --- ## Post #30 by @thomas.beale [quote="borut.jures, post:29, topic:1385"] This is my main point - Archie has a good BMM parser. Somebody should use it to convert them to a proposed replacement for BMM files. [/quote] That is coming (Expression Language); but there will always be a JSON / YAML / xyz equivalent. As long as we make all of this bullet-proof, everyone gets what they want ;) [quote="borut.jures, post:29, topic:1385"] Currently UML is the primary source of the specifications. BMMs are generated from UML. Not much editing is required. Same would be true for the new format if UML remains the primary source. 
But if the new format becomes the primary source, then it makes the difference how easy the editing is. Compared to ADL2, the BMM is verbose but JSON is even more verbose. [/quote] That is my personal thinking as well, but some people like JSON etc, and to be fair, YAML is pretty acceptable in certain editors. [quote="borut.jures, post:29, topic:1385"] JSON should be a result of a conversion from DSL not the primary source. [/quote] Yes, computationally it does have to be a serialisation of an in-memory meta-model, which is the BMM in this case. --- ## Post #31 by @borut.jures I was writing an OPT2 to JSON converter. Since some sections in ADL2 use the ODIN grammar, I added 4 `toJson()` methods to the ODIN classes in my ODIN parser. This returned JSON for the ODIN sections in ADL2. Then I wondered whether those 4 `toJson()` methods would also convert BMM files to JSON. Yes, they do! On the next run of my generator I saved JSON versions of all the BMM files: ```json { "bmm_version": "2.3", "rm_publisher": "openehr", "rm_release": "1.0.4", "model_name": "EHR", "schema_name": "rm_ehr", "schema_revision": "1.0.4.1", "schema_lifecycle_state": "stable", "schema_description": "openEHR Release 1.0.4 EHR schema", "schema_author": "Thomas Beale ", "includes": { "1": { "id": "openehr_rm_structures_1.0.4" } }, "packages": { "org.openehr.rm.ehr": { "name": "org.openehr.rm.ehr", "classes": [ "EHR", "EHR_ACCESS", "EHR_STATUS", "ACCESS_CONTROL_SETTINGS" ] }, "org.openehr.rm.composition": { "name": "org.openehr.rm.composition", "classes": [ "COMPOSITION", "EVENT_CONTEXT" ], "packages": { "content": { "name": "content", "classes": "CONTENT_ITEM", "packages": { "navigation": { "name": "navigation", "classes": "SECTION" }, "entry": { "name": "entry", "classes": [ "ENTRY", "CARE_ENTRY", "ADMIN_ENTRY", "OBSERVATION", "EVALUATION", "INSTRUCTION", "ACTION", "ACTIVITY", "ISM_TRANSITION", "INSTRUCTION_DETAILS" ] }, "integration": { "name": "integration", "classes": "GENERIC_ENTRY" } } } } } }, 
"class_definitions": { "EHR": { "name": "EHR", "properties": { "system_id": { "name": "system_id", "type": "HIER_OBJECT_ID", "is_mandatory": true }, "ehr_id": { "name": "ehr_id", "type": "HIER_OBJECT_ID", "is_mandatory": true }, "time_created": { "name": "time_created", "type": "DV_DATE_TIME", "is_mandatory": true }, "ehr_access": { "name": "ehr_access", "type": "OBJECT_REF", "is_mandatory": true }, "ehr_status": { "name": "ehr_status", "type": "OBJECT_REF", "is_mandatory": true }, "directory": { "name": "directory", "type": "OBJECT_REF" }, "compositions": { "name": "compositions", "type_def": { "container_type": "List", "type": "OBJECT_REF" }, "cardinality": [] }, "contributions": { "name": "contributions", "type_def": { "container_type": "List", "type": "OBJECT_REF" }, "cardinality": [], "is_mandatory": true }, "most_recent_composition": { "name": "most_recent_composition", "type": "COMPOSITION", "is_computed": true } } }, "EHR_ACCESS": { "name": "EHR_ACCESS", "ancestors": "LOCATABLE", "properties": { "settings": { "name": "settings", "type": "ACCESS_CONTROL_SETTINGS" } } }, "ACCESS_CONTROL_SETTINGS": { "name": "ACCESS_CONTROL_SETTINGS", "ancestors": "Any", "is_abstract": true }, "EHR_STATUS": { "name": "EHR_STATUS", "ancestors": "LOCATABLE", "properties": { "subject": { "name": "subject", "type": "PARTY_SELF", "is_mandatory": true }, "is_queryable": { "name": "is_queryable", "type": "Boolean", "is_mandatory": true }, "is_modifiable": { "name": "is_modifiable", "type": "Boolean", "is_mandatory": true }, "other_details": { "name": "other_details", "type": "ITEM_STRUCTURE" } } }, "COMPOSITION": { "name": "COMPOSITION", "ancestors": "LOCATABLE", "properties": { "language": { "name": "language", "type": "CODE_PHRASE", "is_mandatory": true }, "territory": { "name": "territory", "type": "CODE_PHRASE", "is_mandatory": true }, "category": { "name": "category", "type": "DV_CODED_TEXT", "is_mandatory": true }, "composer": { "name": "composer", "type": "PARTY_PROXY", 
"is_mandatory": true }, "context": { "name": "context", "type": "EVENT_CONTEXT" }, "content": { "name": "content", "type_def": { "container_type": "List", "type": "CONTENT_ITEM" }, "cardinality": [] } } }, "EVENT_CONTEXT": { "name": "EVENT_CONTEXT", "ancestors": "PATHABLE", "properties": { "health_care_facility": { "name": "health_care_facility", "type": "PARTY_IDENTIFIED" }, "start_time": { "name": "start_time", "type": "DV_DATE_TIME", "is_mandatory": true }, "end_time": { "name": "end_time", "type": "DV_DATE_TIME" }, "participations": { "name": "participations", "type_def": { "container_type": "List", "type": "PARTICIPATION" }, "cardinality": [] }, "location": { "name": "location", "type": "String" }, "setting": { "name": "setting", "type": "DV_CODED_TEXT", "is_mandatory": true }, "other_context": { "name": "other_context", "type": "ITEM_STRUCTURE" } } }, "CONTENT_ITEM": { "name": "CONTENT_ITEM", "ancestors": "LOCATABLE", "is_abstract": true }, "SECTION": { "name": "SECTION", "ancestors": "CONTENT_ITEM", "properties": { "items": { "name": "items", "type_def": { "container_type": "List", "type": "CONTENT_ITEM" }, "cardinality": [] } } }, "ENTRY": { "name": "ENTRY", "is_abstract": true, "ancestors": "CONTENT_ITEM", "properties": { "language": { "name": "language", "type": "CODE_PHRASE", "is_mandatory": true, "is_im_infrastructure": true }, "encoding": { "name": "encoding", "type": "CODE_PHRASE", "is_mandatory": true, "is_im_infrastructure": true }, "subject": { "name": "subject", "type": "PARTY_PROXY", "is_mandatory": true }, "provider": { "name": "provider", "type": "PARTY_PROXY" }, "other_participations": { "name": "other_participations", "type_def": { "container_type": "List", "type": "PARTICIPATION" }, "cardinality": [] }, "workflow_id": { "name": "workflow_id", "type": "OBJECT_REF", "is_im_runtime": true } } }, "ADMIN_ENTRY": { "name": "ADMIN_ENTRY", "ancestors": "ENTRY", "properties": { "data": { "name": "data", "type": "ITEM_STRUCTURE", "is_mandatory": true 
} } }, "CARE_ENTRY": { "name": "CARE_ENTRY", "is_abstract": true, "documentation": "Abstract ENTRY subtype corresponding to any type of ENTRY in the clinical care cycle.", "ancestors": "ENTRY", "properties": { "protocol": { "name": "protocol", "type": "ITEM_STRUCTURE" }, "guideline_id": { "name": "guideline_id", "type": "OBJECT_REF", "is_im_runtime": true } } }, "OBSERVATION": { "name": "OBSERVATION", "documentation": "ENTRY subtype used to represent observation information in time, as either a single or multiple samples.", "ancestors": "CARE_ENTRY", "properties": { "data": { "name": "data", "documentation": "Data of the observation, in the form of a HISTORY of EVENTs.", "is_mandatory": true, "type_def": { "root_type": "HISTORY", "generic_parameters": "ITEM_STRUCTURE" } }, "state": { "name": "state", "type_def": { "root_type": "HISTORY", "generic_parameters": "ITEM_STRUCTURE" } } } }, "EVALUATION": { "name": "EVALUATION", "ancestors": "CARE_ENTRY", "properties": { "data": { "name": "data", "type": "ITEM_STRUCTURE", "is_mandatory": true } } }, "INSTRUCTION": { "name": "INSTRUCTION", "ancestors": "CARE_ENTRY", "properties": { "narrative": { "name": "narrative", "type": "DV_TEXT", "is_mandatory": true }, "expiry_time": { "name": "expiry_time", "type": "DV_DATE_TIME" }, "wf_definition": { "name": "wf_definition", "type": "DV_PARSABLE", "is_im_runtime": true }, "activities": { "name": "activities", "type_def": { "container_type": "List", "type": "ACTIVITY" }, "cardinality": [] } } }, "ACTIVITY": { "name": "ACTIVITY", "ancestors": "LOCATABLE", "properties": { "description": { "name": "description", "type": "ITEM_STRUCTURE", "is_mandatory": true }, "timing": { "name": "timing", "type": "DV_PARSABLE" }, "action_archetype_id": { "name": "action_archetype_id", "type": "String", "is_mandatory": true } } }, "ACTION": { "name": "ACTION", "ancestors": "CARE_ENTRY", "properties": { "time": { "name": "time", "type": "DV_DATE_TIME", "is_mandatory": true, "is_im_runtime": true }, 
"description": { "name": "description", "type": "ITEM_STRUCTURE", "is_mandatory": true }, "ism_transition": { "name": "ism_transition", "type": "ISM_TRANSITION", "is_mandatory": true }, "instruction_details": { "name": "instruction_details", "type": "INSTRUCTION_DETAILS" } } }, "INSTRUCTION_DETAILS": { "name": "INSTRUCTION_DETAILS", "ancestors": "PATHABLE", "properties": { "instruction_id": { "name": "instruction_id", "type": "LOCATABLE_REF", "is_mandatory": true, "is_im_runtime": true }, "wf_details": { "name": "wf_details", "type": "ITEM_STRUCTURE", "is_im_runtime": true }, "activity_id": { "name": "activity_id", "type": "String", "is_mandatory": true, "is_im_runtime": true } } }, "ISM_TRANSITION": { "name": "ISM_TRANSITION", "ancestors": "PATHABLE", "properties": { "current_state": { "name": "current_state", "type": "DV_CODED_TEXT", "is_mandatory": true }, "transition": { "name": "transition", "type": "DV_CODED_TEXT" }, "careflow_step": { "name": "careflow_step", "type": "DV_CODED_TEXT" }, "reason": { "name": "reason", "type_def": { "container_type": "List", "type": "DV_TEXT" }, "cardinality": [] } } }, "GENERIC_ENTRY": { "name": "GENERIC_ENTRY", "ancestors": "CONTENT_ITEM", "properties": { "data": { "name": "data", "type": "ITEM_TREE", "is_mandatory": true } } } } } ``` And then I re-read what Pieter wrote: [quote="pieterbos, post:28, topic:1385"] Writing and parsing BMM in json and yaml is already supported out of the box in Archie, without any changes to Archie. [/quote] It took me 10 days to understand what Pieter tried to say. He was probably referring to the [specifications-ITS-JSON](https://github.com/openEHR/specifications-ITS-JSON/tree/master/components). It contains JSON Schema versions of the BMM files. --- I still like the ANTLR4 grammars and generated parsers. A person would go crazy writing a grammar in JSON. But at least if somebody wants to use JSON instead of BMM as their starting point they can. 
And they could do it a long time ago using Archie. Is there anyone who uses BMM JSON files as computable specifications for openEHR? --- ## Post #32 by @sebastian.iancu [quote="borut.jures, post:31, topic:1385"] Is there anyone who uses BMM JSON files as computable specifications for openEHR? [/quote] I guess everybody is using JSON Schema to validate data, or perhaps to generate instance models compliant with the RM. The BMM-derived JSON does not contain all of the specifications, such as the full textual descriptions, inheritance info, or invariants. Nevertheless they are useful, and some are experimenting with JSON Schema (OpenAPI) for various use cases. --- ## Post #33 by @joostholslag @pieterbos seen this: native support for Swift code generation from OpenAPI: https://developer.apple.com/wwdc23/10171 --- ## Post #35 by @sebastian.iancu Hi, Welcome to openEHR discourse. I am not sure about your level of experience with openEHR technology and specifications, but based on your question I would assume you want to quickly solve a problem without diving too much into openEHR. However, I would suggest you first identify the data models you'll need (I see in your dictionary some demographic and some health information), by choosing the right archetypes and templates. For some data you might choose other standards (FHIR?). Once that step is done, you will need to work with a CDR, perhaps EHRbase. If you upload the right template there, which is in fact the schema of your data, then you'll be able to use their REST API to publish data. Try to follow some presentations about this stack; for instance you can find on YouTube @Sidharth_Ramesh presentations which might be helpful. --- ## Post #37 by @sebastian.iancu [quote="s.abedian, post:36, topic:1385"] Analyze the data dictionary and design any Archetypes and templates that are needed. 
From my analysis, I have found 7 templates in this data: 1) Stress Analysis, 2) Validation Info, 3) Person Activity, 4) Clinical Trial, 5) Sleep Info, 6) Epidemiologic Information, and 7) Demographic/Person/User Information [/quote] That might be right, but I cannot truly evaluate it; better ask the clinical modelers ;) [quote="s.abedian, post:36, topic:1385"] I will need to assemble a developer team to set up the EHRbase platform and develop APIs to implement databases based on the designed Templates. [/quote] If you can handle containers, EHRbase can be deployed using Docker; you can easily find people here on Discourse who already have experience with that. [quote="s.abedian, post:36, topic:1385"] The developer team will then need to design the necessary API to receive data from the JSON file and store it in the designed CDM that it has developed based on Templates [/quote] Yes, that's a way of doing it. [quote="s.abedian, post:36, topic:1385"] Regarding your comment about using other standards such as FHIR for some data, do you think it is possible and correct to design a template for ‘other data’, without applying other standards like FHIR. [/quote] That is also possible; there are use-cases where demographic data was modelled with archetypes and stored in the CDR. A better alternative would be to store it in a Demographic Repository based on demographic archetypes and templates - not widely supported yet, but perhaps nowadays possible - ask @pablo and CaboLabs/Atomik. [quote="s.abedian, post:36, topic:1385"] Thank you once again for your assistance, and I apologize for any basic questions I may have asked. [/quote] The question is not that basic actually, and no need to apologize anyway - everybody was new to openEHR at some point. --- ## Post #38 by @s.abedian Thanks a lot for the comprehensive response. 1. You are a professional and a clinical modeler too :pray: :star_struck: 2. Sure, I need to improve my knowledge of using Docker and containers with EHRbase. 3. 
My issue was about some data which doesn't fit into the demographic or clinical categories. I thought that maybe it is possible to design an archetype or template with an 'other' name to hold some unstructured and fragmented information. 4. Thank you, that is very kind. --- ## Post #39 by @sebastian.iancu [quote="s.abedian, post:38, topic:1385"] My issue was about some data which doesn’t fit into the demographic or clinical categories. I thought that maybe it is possible to design an archetype or template with an ‘other’ name to hold some unstructured and fragmented information [/quote] Yes, that is possible - usually you do that by having an administrative entry composition; see [this chapter on Entry Package](https://specifications.openehr.org/releases/RM/latest/ehr.html#_information_ontology) and fig. 20. --- ## Post #40 by @joostholslag I noticed a discrepancy between the [ITS json schema file](https://specifications.openehr.org/releases/ITS-JSON/development/components/AM/Release-2.2.0/Aom2/AUTHORED_ARCHETYPE-detailed.json)s of am2.2 and the spec [class diagram](https://specifications.openehr.org/releases/AM/development/AOM2.html#_archetype_class) for the am2.3 description of AOM2.ARCHETYPE.PARENT_ARCHETYPE_ID. It turns out this is due to a change introduced in 2.3. There’s a difference in semantics that is relevant to a recent issue: https://openehr.atlassian.net/browse/SPECPR-461 I couldn’t find details on why this change was made in: https://specifications.openehr.org/releases/AM/Release-2.3.0/AOM2.html#_amendment_record The reason for posting in this thread is to revive the idea of improving the generation of the json-schemas, so they can be kept up to date with spec changes. Ideally with an algorithm that’s available to openEHR, so a spec change can trigger a CI/CD pipeline for the conversion and publication process. I’m also curious whether we could further integrate json-schema and OpenAPI. As @pieterbos described, OpenAPI 3.1 is now fully compatible with json-schema. 
And 3.1 support seems to have improved significantly in the ecosystem. Also, @sebastian.iancu created (3 different) versions of the OpenAPI descriptions of the REST API. Additionally, the situation with UML editing has deteriorated significantly, given the limited time the only capable author now has available. So there have been significant changes that warrant a reopening of this topic imho. One question I have would be whether it would be possible to generate an OpenAPI spec for AOM2.OPERATIONAL_TEMPLATE? Maybe from the [existing json-schema](https://github.com/openEHR/specifications-ITS-JSON/blob/master/components/AM/Release-2.2.0/Aom2/OPERATIONAL_TEMPLATE.json)? We still have to specify the [file formats for opt2](https://specifications.openehr.org/releases/AM/development/OPT2.html#_file_formats). Isn’t the Json-schema enough for that? Now, if we have OpenAPI for opt2, would it be possible to generate an OpenAPI model for a specific opt2, e.g. vital signs? This would be so exciting because it would allow a client app to work at a specific template level, instead of having to implement all the openEHR generic complexity. --- ## Post #41 by @pieterbos Generating json schema is supported in Archie, as long as you have a BMM input. In several flavours of json schema. So go ahead. It specifies OPT 2 in json as well. OpenAPI can be a bit tricky. Yes, you can specify the format in json schema, and tools can validate the input. But in order to generate code from that, you need code generators that support polymorphism. As far as I know, code generators for json schema that support polymorphism do not exist, and are hard to write. So, the move to 3.1 probably does very little to help openEHR. What you need for that is better code generators that support the openAPI-specific discriminator keyword. See https://spec.openapis.org/oas/v3.1.0.html#composition-and-inheritance-polymorphism . If that has improved, much more is possible with openAPI and openEHR. 
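For reference, the discriminator mechanism referred to here looks roughly like this in an OpenAPI 3.1 document (a hand-written sketch with two illustrative RM-like types, not generated from the actual openEHR models): the `_type` property names the concrete subtype, which is exactly the hint a polymorphism-aware code generator needs to emit the right subclass.

```json
{
  "components": {
    "schemas": {
      "ITEM": {
        "oneOf": [
          { "$ref": "#/components/schemas/CLUSTER" },
          { "$ref": "#/components/schemas/ELEMENT" }
        ],
        "discriminator": {
          "propertyName": "_type",
          "mapping": {
            "CLUSTER": "#/components/schemas/CLUSTER",
            "ELEMENT": "#/components/schemas/ELEMENT"
          }
        }
      },
      "CLUSTER": {
        "type": "object",
        "required": ["_type"],
        "properties": {
          "_type": { "const": "CLUSTER" },
          "items": {
            "type": "array",
            "items": { "$ref": "#/components/schemas/ITEM" }
          }
        }
      },
      "ELEMENT": {
        "type": "object",
        "required": ["_type"],
        "properties": {
          "_type": { "const": "ELEMENT" },
          "value": { "type": "object" }
        }
      }
    }
  }
}
```

A plain JSON Schema validator ignores `discriminator` and falls back to trying every `oneOf` branch; only OpenAPI-aware tooling uses it to dispatch directly on `_type`, which is part of why validation works much more widely today than code generation.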
There is also openAPI code generation at https://github.com/openEHR/archie/pull/180 You could indeed build an openAPI model for data corresponding to a specific opt. It might run into the same problems; I have not built that generator. --- ## Post #42 by @Seref [quote="pieterbos, post:41, topic:1385"] But in order to generate code from that, you need code generators that support polymorphism. As far as I know, code generators for json schema that support polymorphism do not exist, and are hard to write [/quote] That has more or less been the story for myself, @sebastian.iancu and @pablo when we worked on this (Sebastian did most of the work). OpenAPI had support for polymorphism, but it was not good enough to be reliable across mainstream languages. There are multiple ways of expressing inheritance in OpenAPI, and code generated from them kept having issues. As far as I know, neither json schema nor OpenAPI is a reliable source for code generation at this point in time. --- ## Post #43 by @pablo @sebastian.iancu worked on two flavors (and tried many alternatives) of the JSON schemas for the REST APIs; I think one was focused on data validation (syntax) and the other on code generation (correct types and hierarchies, etc.). For the work I do in our CDR, I only use the data validation one for checking the commits in JSON. I think the schemas I use were generated from the BMMs at some point, though I fixed some errors and then merged them all together into a single-file JSON Schema for convenience (it's easier to validate against a single schema than against many schemas that reference each other). I'm sorry I can't add much value to this discussion right now; I personally don't see a use case for code generation based on schemas yet. 
For instance, what I did in the openEHR Toolkit to generate JSON examples is all based on the OPT 1.4 model, the RM and the JSON schema, but without using the schema directly: basically, I've built a JSON serializer for the RM that complies with the JSON schema. --- **Canonical:** https://discourse.openehr.org/t/json-schema-and-openapi-current-state-and-how-to-progress/1385