# Archetypes in YAML

**Category:** [ADL](https://discourse.openehr.org/c/adl/40) **Created:** 2024-06-13 09:49 UTC **Views:** 450 **Replies:** 26 **URL:** https://discourse.openehr.org/t/archetypes-in-yaml/5357

---

## Post #1 by @sebastian.iancu

There have been several discussions over the last 3 years about the possibility of expressing archetypes in a more popular format than ODIN-ADL and XML. One of the options considered was YAML. I explored this a bit, tried a few things, and came up with the following results for two of the archetypes on CKM (`blood_pressure.v2` and `demo.v1`); see the attachments.

These are just experiments, nothing final, and the result is certainly not standardized, but I'm curious about your feedback so far. From my point of view it looks OK, readable and workable, almost as good as ODIN, and in any case it is just plain standard YAML, with a few extra tags to indicate overloaded openEHR types.

[openEHR-EHR-OBSERVATION.demo.v1.yml|attachment](upload://stWuMY4JFFm91m89b2lctT0bvzI.yml) (194.1 KB)
[openEHR-EHR-OBSERVATION.blood_pressure.v2.yml|attachment](upload://142mLjHjOTbWq9hRwfOBPe2NKvo.yml) (294.4 KB)

---

## Post #2 by @siljelb

This is interesting! I agree the `description` and `ontology` sections are more readable in YAML than in ADL, but the `definition` section is much *less* readable, especially in how it expresses numerical constraints like occurrences.
See how the DV_QUANTITY element "Systolic" from the Blood pressure archetype is expressed in ADL and YAML, respectively:

```
ELEMENT[at0004] occurrences matches {0..1} matches {    -- Systolic
    value matches {
        C_DV_QUANTITY <
            property = <[openehr::125]>
            list = <
                ["1"] = <
                    units = <"mm[Hg]">
                    magnitude = <|0.0..<1000.0|>
                    precision = <|0|>
                >
            >
        >
    }
}
```

```
- !C_COMPLEX_OBJECT
  rm_type_name: ELEMENT
  occurrences:
    lower_included: true
    upper_included: true
    lower_unbounded: false
    upper_unbounded: false
    lower: 0
    upper: 1
  node_id: at0004
  attributes:
  - !C_SINGLE_ATTRIBUTE
    rm_attribute_name: value
    existence:
      lower_included: true
      upper_included: true
      lower_unbounded: false
      upper_unbounded: false
      lower: 0
      upper: 1
    children:
    - !C_DV_QUANTITY
      rm_type_name: DV_QUANTITY
      occurrences:
        lower_included: true
        upper_included: true
        lower_unbounded: false
        upper_unbounded: false
        lower: 1
        upper: 1
      node_id: ''
      property:
        terminology_id:
          value: openehr
        code_string: '125'
      list:
      - magnitude:
          lower_included: true
          upper_included: false
          lower_unbounded: false
          upper_unbounded: false
          lower: 0.0
          upper: 1000.0
        precision:
          lower_included: true
          upper_included: true
          lower_unbounded: false
          upper_unbounded: false
          lower: 0
          upper: 0
        units: 'mm[Hg]'
```

---

## Post #3 by @borut.jures

Avoiding ADL would lower a barrier to entry for new implementers, but some clinical modelers like to write archetypes in text editors. For them YAML requires more “key strokes” than ADL.

I’m always advocating for canonical JSON when serializing ADL (canonical to me means strictly following the specifications in BMM). However, if we expect clinical modelers to consider YAML as an alternative to ADL, I would expect us to use a “shorthand” variant for `occurrences` and `existence` (e.g. `{0..1}`). (I knew Silje would be quick to comment about this :wink:)

```YAML
occurrences: 0..1
```

One advantage of using a well supported syntax is that text editors support things like “folding”.
ADL files are looong and they require a lot of scrolling to get to a specific part of it. For YAML, editors support collapsing `translations` and `description` to quickly get to the `definition` section:

![Screenshot 2024-06-13 at 12.47.14|506x364, 50%](upload://fBxEpFPGFr2tGszhUxBUWQNzSOO.png)

I had to search the YAML specification for the use of the `!` indicator (e.g. `!C_COMPLEX_OBJECT`). In case others are also unfamiliar with it, it is a “node tag”: https://yaml.org/spec/1.2-old/spec.html#id2784064

p.s. I guess BMM files could also be changed to YAML, and that change would have less impact :thinking: This would eliminate the need for an ODIN parser everywhere.

---

## Post #4 by @ian.mcnicoll

Just to add to the conversation, here is an archetype expressed as JSON out of the Archetype Designer API. It looks very like the YAML, and I agree the main problem is with the cADL aspect: the constraints themselves.

There are definitely places that could be simplified, like the expression of occurrences and slot fills, but I suspect this might still feel very 'abstract' as it is raw AOM. It would be interesting to see what a cleaner version (occurrences / slot fills) might look like.

[archetype_response.json|attachment](upload://2LlZnlLfyzN4XzqOhprtBHP0pY9.json) (4.6 KB)

---

## Post #5 by @ian.mcnicoll

I noticed that the JSON version above does handle multiplicities like "0..*", so that at least makes the archetype much more readable. Slot fills are a bit more problematic, but I could probably live with that extra complexity, especially if we could annotate node ids to be human-readable, as this makes navigation much easier:

```
"@type": "C_COMPLEX_OBJECT",
"rmTypeName": "ELEMENT",
"occurrences": "0..1",
"nodeId": "at0002| Certifed impairment |",
```

---

## Post #6 by @borut.jures

[quote="ian.mcnicoll, post:5, topic:5357"]
I noticed that the json version above does handle the multiplicities like “0..*”
[/quote]

We could use https://specifications.openehr.org/releases/SM/latest/serial_data_formats.html for ADL YAML too. This would simplify `occurrences`, `existence`, `interval`, `Terminology_code`, date/time (and others).

p.s. `@type` in your example should be `_type` :thinking:

---

## Post #7 by @ian.mcnicoll

[quote="borut.jures, post:6, topic:5357"]
We could use [openEHR Serial Data Formats (SDF)](https://specifications.openehr.org/releases/SM/latest/serial_data_formats.html) for ADL YAML too. This would simplify `occurrences`, `existence`, `interval`, `Terminology_code`, date/time (and others).
[/quote]

I like that idea, but does it play nicely with the idea of ADL as a constraint language, not a simple schema? It is certainly nicer to understand at a human level, but does it remain computable 'down the tree'?

---

## Post #8 by @borut.jures

[quote="ian.mcnicoll, post:7, topic:5357"]
does it remain computable ‘down the tree’?
[/quote]

This cannot be specified in the current version of BMM. Thomas says it could be added to BMM3. Implementers must be careful to add those simpler formats to the serialization/deserialization code. This way it should remain computable.
For example, my code generators use BMM plus the previously linked Serial Data Formats specification to “overrule” the default behavior found in BMM files.

---

## Post #9 by @pablo

I thought we were going to do JSON first and then other formats, since for JSON we have a canonical JSON Schema (I don't remember if it covers the whole AOM/TOM 1.4 though). For other formats we might need some kind of schema to be able to validate the syntax.

---

## Post #10 by @joostholslag

[quote="borut.jures, post:3, topic:5357"]
Avoiding ADL would lower a barrier to entry for new implementers
[/quote]

I really care about this one.

[quote="borut.jures, post:3, topic:5357"]
For them YAML requires more “key strokes” than ADL.
[/quote]

I don’t think writing ADL by hand to produce an archetype is really done by anyone. Mostly it’s done 99% from a GUI editor. But making specific edits for edge cases or tool failure by hand in ADL is a key requirement IMHO. So the number of keystrokes is not really an issue, I think. But legibility, and thus the number of lines, is.

[quote="borut.jures, post:3, topic:5357"]
One advantage of using a well supported syntax is that text editors support things like “folding”
[/quote]

Yes, this would be amazing.

[quote="pablo, post:9, topic:5357"]
I thought we were going to do JSON first then other formats,
[/quote]

I remember it differently from the SEC discussion at Nedap last November, but it wasn’t a final decision, so maybe I missed something. Editing in YAML is much, much easier than JSON, to me. Curious what the others think.

---

## Post #11 by @sebastian.iancu

About those simplifications that flatten some values (occurrences, terminology codes, etc.): we can certainly do that, but I intended to first produce a YAML that is close to the AOM serializations, without any extra logic. But, like I said, this is just an experiment and a discussion point.
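To make the flattening discussed above concrete, here is a minimal Python sketch (standard library only) that converts between the canonical AOM interval mapping used in the YAML excerpt in post #2 (`lower`, `upper`, `lower_included`, `upper_included`, `lower_unbounded`, `upper_unbounded`) and the compact `0..1` / `0..*` strings seen in the Archetype Designer JSON in post #5. The function names and the regular expression are hypothetical. The sketch only handles integer occurrences/existence ranges, not real-valued intervals like `|0.0..<1000.0|`, and it assumes that an unbounded upper limit is stored with `upper_included: false`, which is our reading of the openEHR `Interval` invariants.

```python
import re

# Hypothetical micro-syntax for occurrences/existence: "<lower>..<upper>" or "<lower>..*".
# The same pattern could be reused as a JSON Schema "pattern" for string-valued fields.
MULTIPLICITY_PATTERN = re.compile(r"^(\d+)\.\.(\d+|\*)$")


def to_shorthand(interval: dict) -> str:
    """Flatten a canonical AOM-style integer interval into e.g. '0..1' or '1..*'."""
    if interval.get("lower_unbounded", False) or not interval.get("lower_included", True):
        raise ValueError("shorthand needs an included, finite lower bound")
    if interval.get("upper_unbounded", False):
        return f"{interval['lower']}..*"
    if not interval.get("upper_included", True):
        raise ValueError("shorthand cannot express an excluded upper bound")
    return f"{interval['lower']}..{interval['upper']}"


def from_shorthand(text: str) -> dict:
    """Expand '0..1' / '0..*' back into the canonical mapping used in post #2."""
    match = MULTIPLICITY_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a multiplicity: {text!r}")
    lower, upper = match.groups()
    unbounded = upper == "*"
    interval = {
        "lower_included": True,
        "upper_included": not unbounded,  # assumed: unbounded implies not included
        "lower_unbounded": False,
        "upper_unbounded": unbounded,
        "lower": int(lower),
    }
    if not unbounded:
        interval["upper"] = int(upper)
    return interval


if __name__ == "__main__":
    occurrences = {
        "lower_included": True, "upper_included": True,
        "lower_unbounded": False, "upper_unbounded": False,
        "lower": 0, "upper": 1,
    }
    print(to_shorthand(occurrences))              # 0..1
    print(from_shorthand("0..1") == occurrences)  # True
    print(to_shorthand(from_shorthand("1..*")))   # 1..*
```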

---

## Post #12 by @damoca

[quote="joostholslag, post:10, topic:5357"]
I don’t think writing ADL by hand to produce an archetype is really done by anyone. Mostly it’s done 99% from a GUI editor. But making specific edits for edge cases or tool failure by hand in ADL is a key requirement IMHO. So the number of keystrokes is not really an issue, I think. But legibility, and thus the number of lines, is.
[/quote]

In that case, I prefer JSON. Not only because of what @pablo said about having a canonical JSON schema, but also for that 1% of manual editing. An archetype is not a simple configuration file, as the examples provided show. It is a deeply nested structure where the use of brackets would be very welcome, instead of having to control the indentation levels with spaces. Just imagine having to add a sibling node here in a plain text editor. It is not about the number of keystrokes, but about the possibility of introducing errors.

![image|690x217](upload://2rPw7GU2LqPUywG3tbFmYbxjx1Y.png)

---

## Post #13 by @sebastian.iancu

I get your point, but besides readability, YAML also comes with some advantages over JSON, like comments, typing and tagging; basically, JSON is a subset of YAML.

---

## Post #14 by @thomas.beale

For experimentation, the ADL Workbench generates a 100% YAML flat-form archetype for any archetype. I have not checked whether the YAML is completely correct for a long time, but if people want to use it, I can fix any errors pretty easily.

[quote="siljelb, post:2, topic:5357"]
See how the DV_QUANTITY element “Systolic” from the Blood pressure archetype is expressed in ADL and YAML, respectively:
[/quote]

Note that the above is a particular way of expressing DV_QUANTITY constraints in ADL1.4, but it's not ADL.
The ADL looks like this:

```
items matches {
    ELEMENT[id5] occurrences matches {0..1} matches {    -- Systolic
        value matches {
            DV_QUANTITY[id1061] matches {
                property matches {[at1057]}
                magnitude matches {|0.0..<1000.0|}
                precision matches {0}
                units matches {"mm[Hg]"}
            }
        }
    }
    ELEMENT[id6] occurrences matches {0..1} matches {    -- Diastolic
        value matches {
            DV_QUANTITY[id1062] matches {
                property matches {[at1057]}
                magnitude matches {|0.0..<1000.0|}
                precision matches {0}
                units matches {"mm[Hg]"}
            }
        }
    }
```

I am in favour of using YAML where ODIN is used in an ADL2 archetype, but for the definition part, which is normally in ADL, there are two things you can do:

* serialise and read in cADL, enabling human and machine readability (and far fewer lines than YAML etc.)
* serialise and read the definition part in YAML as well, i.e. a 100% YAML archetype.

The second approach is most useful for saving validated archetypes for operational use, because once an archetype or template is validated, it doesn't need to be validated again if unchanged. For archetype and template building solely using a GUI tool (like Archetype Designer) it doesn't matter which representation is used, and the 100% YAML approach is possibly easier. However, in that approach there is no 'source' level of representation: there would be no 'look at the ADL to see what's really going on'. If there are expressions in the ADL (as used by Nedap), then having no source level of representation might be quite problematic. There are various people and companies who consider a source syntax view of archetypes essential, and they would want the YAML + cADL (+ EL) form of an archetype.

Either approach is reasonable, but I would say that users and tool builders need to know the consequences of their choices. For loading validated archetypes and templates into a service of an operational system, the YAML (or any similar object meta-model format) is the most attractive.
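As a rough illustration of loading an already validated definition with nothing but a standard parser, here is a minimal Python sketch that rebuilds typed objects from canonical JSON by dispatching on the `_type` field (Python's standard library has no YAML parser, so JSON stands in for YAML here). The classes mirror a tiny, hypothetical subset of the AOM (`C_COMPLEX_OBJECT`, `C_SINGLE_ATTRIBUTE`, `C_MULTIPLE_ATTRIBUTE`) and are not Archie's or any other library's API. Note that the sketch performs no validation at all and silently drops attributes it does not model, which is only acceptable because the input is assumed to be validated already.

```python
import json
from dataclasses import dataclass, field, fields
from typing import Any, List, Optional


@dataclass
class CComplexObject:
    rm_type_name: str
    node_id: str
    occurrences: Optional[dict] = None  # kept as the canonical interval mapping
    attributes: List["CAttribute"] = field(default_factory=list)


@dataclass
class CAttribute:
    rm_attribute_name: str
    children: List[CComplexObject] = field(default_factory=list)


@dataclass
class CSingleAttribute(CAttribute):
    pass


@dataclass
class CMultipleAttribute(CAttribute):
    pass


# Hypothetical mapping between "_type" discriminators and the classes above.
TYPES = {
    "C_COMPLEX_OBJECT": CComplexObject,
    "C_SINGLE_ATTRIBUTE": CSingleAttribute,
    "C_MULTIPLE_ATTRIBUTE": CMultipleAttribute,
}
TYPE_NAMES = {cls: name for name, cls in TYPES.items()}


def _decode(obj: dict) -> Any:
    """json object_hook: turn dicts with a known "_type" into typed objects."""
    cls = TYPES.get(obj.get("_type"))
    if cls is None:
        return obj  # plain data, e.g. an occurrences interval
    known = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in obj.items() if k in known})


def _encode(obj: Any) -> dict:
    """json.dumps default: write typed objects back out with their "_type" field."""
    name = TYPE_NAMES.get(type(obj))
    if name is None:
        raise TypeError(f"cannot serialise {type(obj).__name__}")
    return {"_type": name, **{f.name: getattr(obj, f.name) for f in fields(obj)}}


def load_definition(text: str) -> Any:
    return json.loads(text, object_hook=_decode)


def dump_definition(root: Any) -> str:
    return json.dumps(root, default=_encode, indent=2)


if __name__ == "__main__":
    source = """{"_type": "C_COMPLEX_OBJECT", "rm_type_name": "OBSERVATION", "node_id": "id1",
      "attributes": [{"_type": "C_SINGLE_ATTRIBUTE", "rm_attribute_name": "data",
        "children": [{"_type": "C_COMPLEX_OBJECT", "rm_type_name": "HISTORY",
          "node_id": "id2", "occurrences": {"lower": 1, "upper": 1}}]}]}"""
    root = load_definition(source)
    print(root.attributes[0].children[0].rm_type_name)     # HISTORY
    print(load_definition(dump_definition(root)) == root)  # True
```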

[quote="borut.jures, post:3, topic:5357"]
One advantage of using a well supported syntax is that text editors support things like “folding”. ADL files are looong and they require a lot of scrolling to get to a specific part of it.
[/quote]

Well, I guess this is easy to add to typical modern language-server-based editors for cADL, while for YAML it probably comes for free. Folding still doesn't make the definition part readable if it's in YAML (or JSON)...

[quote="ian.mcnicoll, post:5, topic:5357"]
I noticed that the json version above does handle the multiplicities like “0..*” so that at least makes the archetype much more readable
[/quote]

I used to advocate doing this as well, but 'normal' devs using standard tools will start screaming at you if you bury micro-syntax (which is what '0..1' is) inside standard JSON. They want to see standard JSON, which is:

```
"_type": "C_COMPLEX_OBJECT",
"rm_type_name": "INTERVAL_EVENT",
"node_id": "id1043",
"occurrences": {
    "lower": 0,
    "upper": 1
},
```

I don't know if JSON Schema helps surface micro-syntaxes; if so, that might be an avenue to convincing people to handle JSON with embedded micro-syntax.

[quote="borut.jures, post:6, topic:5357"]
We could use [openEHR Serial Data Formats (SDF)](https://specifications.openehr.org/releases/SM/latest/serial_data_formats.html) for ADL YAML too. This would simplify `occurrences`, `existence`, `interval`, `Terminology_code`, date/time (and others).
[/quote]

Right, it's a general approach. But the result isn't standard JSON (or YAML or whatever); it's custom stuff.

[quote="borut.jures, post:8, topic:5357"]
For example my code generators use BMM + the previously linked specifications for Serial Data Formats to “overrule” the default behavior found in BMM files.
[/quote]

Exactly: you have to have special processing.
Personally, I have no problem with this, but again, normal devs using standard tools will complain, at least until there is a JSON++ standard that formalises syntax in String fields...

[quote="joostholslag, post:10, topic:5357"]
I don’t think writing ADL by hand to produce an archetype is really done by anyone.
[/quote]

It's done more frequently than you realise, and it's pretty easy as well. Not that I'm advocating it as the main method, but it's very useful to be able to do it. Like BMM files: we write them by hand today, and the thing they replace is XMI, which no-one can write by hand.

[quote="joostholslag, post:10, topic:5357"]
But making specific edits for edge cases or tool failure by hand in ADL is a key requirement IMHO.
[/quote]

Exactly right.

[quote="sebastian.iancu, post:11, topic:5357, full:true"]
About those simplifications that flatten some values (occurrences, terminology codes, etc.): we can certainly do that, but I intended to first produce a YAML that is close to the AOM serializations, without any extra logic. But, like I said, this is just an experiment and a discussion point.
[/quote]

The ADL WB version just does this as well: canonical YAML.

---

## Post #15 by @sebastian.iancu

OK, but for the record, I don't want to divert this thread into advocating YAML vs JSON vs ODIN benefits. I only want to explore the following: "if" we were to choose YAML, what should it look like? No decision has been taken so far towards using YAML, but we need to explore it.

@thomas.beale, if you say AWB has canonical YAML, could you export those two archetypes from CKM (`blood_pressure.v2` and `demo.v1`) in YAML so we can compare the results?

If others (@siljelb @joostholslag @borut.jures) would like to use that SDF flattening notation, I would give it a try too. A combination of ADL+YAML is (to my understanding, based on the last two face-to-face SEC meetings) not what we want to investigate, as we would still be stuck with non-mainstream serialization formats.
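Terminology codes are the other value Sebastian mentions flattening. The minimal Python sketch below converts between the canonical mapping from the YAML in post #2 (`terminology_id: {value: openehr}`, `code_string: '125'`) and the ADL-style `[openehr::125]` form from the ADL 1.4 excerpt in the same post. The function names and pattern are hypothetical, and the sketch does not claim to follow the exact textual form the SDF specification prescribes for `Terminology_code`.

```python
import re

# Hypothetical pattern for ADL-style codes such as "[openehr::125]" or "[local::at0004]".
TERM_CODE_PATTERN = re.compile(r"^\[(?P<terminology>[^:\[\]]+)::(?P<code>[^\[\]]+)\]$")


def flatten_code(code: dict) -> str:
    """Canonical {'terminology_id': {'value': ...}, 'code_string': ...} -> '[tid::code]'."""
    return f"[{code['terminology_id']['value']}::{code['code_string']}]"


def expand_code(text: str) -> dict:
    """'[tid::code]' -> canonical mapping; raises ValueError for anything else."""
    match = TERM_CODE_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a terminology code: {text!r}")
    return {
        "terminology_id": {"value": match["terminology"]},
        "code_string": match["code"],
    }


if __name__ == "__main__":
    canonical = {"terminology_id": {"value": "openehr"}, "code_string": "125"}
    print(flatten_code(canonical))                     # [openehr::125]
    print(expand_code("[openehr::125]") == canonical)  # True
```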

---

## Post #16 by @thomas.beale

My openEHR blood_pressure is attached. It's from the ADL2 form, so id-coded (ADL WB doesn't know about at-coded ADL2 yet). The differences appear to be:

* the AWB version is outputting lists ('-' leader) in some places where it should be outputting maps (no '-')
* AWB doesn't output any type names
* the AWB version only outputs the necessary lines of occurrences & cardinality ranges

I can fix the first two issues in ADL WB pretty easily, and that would provide a tool to output YAML for any archetype repo.

[openEHR-EHR-OBSERVATION.blood_pressure.v2.0.8.yaml|attachment](upload://1OyzsqBshUVeGgY9SOcH7tz8FJb.yaml) (289.8 KB)

---

## Post #17 by @pablo

[quote="joostholslag, post:10, topic:5357"]
[quote="pablo, post:9, topic:5357"]
I thought we were going to do JSON first then other formats,
[/quote]

I remember it differently from the SEC discussion at Nedap last November, but it wasn’t a final decision, so maybe I missed something. Editing in YAML is much, much easier than JSON, to me. Curious what the others think.
[/quote]

I can only refer to the meeting notes: https://openehr.atlassian.net/wiki/spaces/spec/pages/2201812993/2023-11-15+16+Arnhem+SEC+Meeting

Search for "@Pablo Pazos introducing ADL3 is also an opportunity to “switch” to other serialization format..."

There are mixed opinions, I guess because of different experiences and use cases. What I remember mentioning in the meeting, which might not be in the minutes, is that for JSON there is a schema, which is more convenient for validation, though there are some adaptations for using JSON Schema to validate YAML, and JSON <-> YAML conversion is certainly simpler than converting XML or other standard formats to JSON.

Before making any decision, we (SEC) should consider the use cases and the pros and cons of each format.

Though my personal preference would be not to have one single "preferred" format, but instead a standard serialization and deserialization process to/from many formats for AOM 1.4 and 2.x, which then allows bidirectional format transformations like format1 -> AOM -> format2 (replace format1 and format2 with whatever you want). That way we can support multiple use cases. For instance, if I need to display an archetype in a web app, I would prefer JSON because of the browser's native support for JS. For storing ADL I would prefer YAML because it's smaller. So: use the best format for the job.

+1 on the comment by @sebastian.iancu. I don't want to start a discussion on which format is "better"; I just wanted to point out that those formats were mentioned in the NL SEC meeting.

---

## Post #18 by @joostholslag

![image|344x500](upload://n9hXPRJIUszQlWco1tLsewg8vDB.png)

All the indentation, which is expected to occur a lot, will decrease legibility. This is on an 11” iPad, so a relatively small screen, and I had never viewed ADL on an iPad before, so it's definitely not the standard way of viewing archetypes. But it's still probably an issue.

---

## Post #19 by @pablo

For YAML there should be 2 spaces per indentation; in your image it seems to be more than that.

---

## Post #20 by @joostholslag

![image|352x500](upload://5XmK1UQnb6WFlVlCGv7qCEDzuQ6.png)
![image|321x500](upload://mKiov9JbK1hsbLlgw8UC6IGVQ1h.png)

I don't think it's my editor, because in another editor (VS Code) both examples also use more spaces as indent (often 4, but not consistently).

---

## Post #21 by @pablo

[quote="pablo, post:19, topic:5357, full:true"]
For YAML there should be 2 spaces per indentation; in your image it seems to be more than that.
[/quote]

@joostholslag I should correct myself: YAML needs at least 2 spaces for indentation, but can take any number above that and will still be valid. The only thing is: all indentations should be consistent.
I think having some 2-space and some 4-space indentations will break the format, since it uses the indentation as delimiters to know where a tree starts and ends.

---

## Post #22 by @damoca

[quote="pablo, post:21, topic:5357"]
I think having some 2-space and some 4-space indentations will break the format, since it uses the indentation as delimiters to know where a tree starts and ends.
[/quote]

That's why I said before that I see YAML as problematic for manual editing.

---

## Post #23 by @ian.mcnicoll

I would hope that editors would help with that, but I still think there is little to choose between JSON and YAML in terms of editability / human readability, and JSON has much more utility in terms of further processing. I'm happy to see both, plus others like XML, but if we had to choose one, or a primary target for tooling, I would go for JSON.

But I'm quite clear that we need to have something that parses out of the box without any kind of custom parser. Regrettably, I think that means that we need to move beyond cADL now.

---

## Post #24 by @thomas.beale

[quote="ian.mcnicoll, post:23, topic:5357"]
But I'm quite clear that we need to have something that parses out of the box without any kind of custom parser.
[/quote]

We already have that. JSON. An in-memory archetype can be saved and read in today from JSON. But it only works for validated artefacts. It's perfect for post-validation computational use. For the entire post-authoring / post-validation sphere of activity, this is exactly what tool environments should do. But we are mixing that up with upstream authoring and validation, i.e. the production of artefacts in the first place.

[quote="ian.mcnicoll, post:23, topic:5357"]
Regrettably, I think that means that we need to move beyond cADL now.
[/quote]

It is very difficult to represent a partially written or draft artefact of any kind, containing formal errors (which may be intended for draft purposes for some reason), in the post-parse meta-model form.

And as I noted above, it's close to incomprehensible for human editing. There's a reason we have 'programming languages' that are human-comprehensible syntaxes... They are also indispensable for educating people in the first place, so that they understand the semantics of the meta-model.

In sum, if you want to go solely with a meta-model representation, with no ability for humans to edit source, you still have to solve the problem of partial / draft / error-containing artefacts, probably with some private syntax / markup. This is certainly doable, but it's limiting.

Aside from all that, what's the problem with using a 'language'? There must be 50 languages targeting the JVM. No-one suggests we stop using them and just write in bytecode (which is the kind of thing we wrote in during the 70s...)

Other attempts to do what ADL does are layers of language / syntax over the top of some generic representation:

* W3C SHACL - a layer over RDF ([spec page](https://www.w3.org/TR/shacl/))
* OMG SysML2 - its own language ([spec PDF](https://www.omg.org/spec/SysML/2.0/Beta2/Language/PDF))

Here's a bit of SHACL:

```
ex:ClassExampleShape
    a sh:NodeShape ;
    sh:targetNode ex:Bob, ex:Alice, ex:Carol ;
    sh:property [
        sh:path ex:address ;
        sh:class ex:PostalAddress ;
    ] .
```

Here's a bit of SysML2 (p. 643 of the spec):

```
individual a:VehicleRoadContext_1 {
    timeslice t0_t2_a {
        snapshot t0_a {
            attribute t0 redefines time=0 [s];
            snapshot t0_r:Road_1 {
                :>>incline=0;
                :>>friction=.1;
            }
            snapshot t0_v:Vehicle_1 {
                :>>position=0 [m];
                :>>velocity=0 [m];
                :>>acceleration=1.96 [m/s**2];
                snapshot t0_fa:FrontAxleAssembly_1{
                    snapshot t0_leftFront:Wheel_1;
                    snapshot t0_rightFront:Wheel_2;
                }
            }
        }
        snapshot t1_a {
            attribute t1 redefines time=1 [s];
            snapshot t1_r:Road_1 {
                :>>incline=0;
                :>>friction=.1;
            }
            snapshot t1_v:Vehicle_1 {
                :>>position=.98 [m];
                :>>velocity=1.96 [m/s];
                :>>acceleration=1.96 [m/s**2];
                snapshot t1_fa:FrontAxleAssembly_1 {
                    snapshot t1_leftFront:Wheel_1;
                    snapshot t1_rightFront:Wheel_2;
```

---

## Post #25 by @borut.jures

[quote="thomas.beale, post:24, topic:5357"]
We already have that. JSON. An in-memory archetype can be saved and read in today from JSON. But it only works for validated artefacts.
[/quote]

If I understand Ian, the wish is to lower the barrier to entry for new openEHR implementers. For them, using a validated JSON representation of archetypes and OPT2s would simplify things a lot. But as you mentioned, this can be done even if ADL is one of the supported formats.

I initially thought that even non-valid archetypes could be saved as JSON, but the JSON would at least have to be valid enough for the editing tools to import it again after export. This can be done by skipping the JSON Schema validation step, but it could be tricky. However, non-valid ADL cannot be read by the tools either (but, as you mention, it can be easier for humans to edit in order to fix the errors).

[quote="thomas.beale, post:24, topic:5357"]
what’s the problem with using a ‘language’?
[/quote]

I came to openEHR from another ANTLR project, so it was exciting to use it again. However, things turned less exciting after I realized there are multiple “official” grammars for ADL. I ended up making my ADL parser “compatible” with the grammars Archie is using.

Even with my experience with ANTLR, I wouldn’t mind using JSON instead of ADL, especially since my tools can generate all the code required to deserialize JSON to AOM in 5 programming languages so far (the Kotlin version of my tools uses no BMM or ADL parsers and relies only on the JSON version of the archetypes/OPT2s).

However, my initial experience with the JSON version of the archetypes/OPT2s was also problematic, since Archie, ADL WB and my tools serialized them differently. All these differences are fixed now (thanks to all the mentioned tools strictly following the BMM for AM).

Replacing ADL for archetypes is probably possible if all the editing is done with GUI editors (by serializing AOM into JSON and other formats). I learned that many implementers will shy away from openEHR because of the DSLs used. I’m sure there will be beautiful new BMM3, ADL3 and EL grammars, at least in the “US version”, so I might be able to experience both approaches :wink:

---

## Post #26 by @joostholslag

I'm trying to open this file in VS Code and associate it with the JSON schema, so it supports validation and code completion. I cloned [the schemas](https://github.com/openEHR/specifications-ITS-JSON) to "/Users/joostholslag/src/specifications-ITS-JSON/components/AM". I put the .yml file in the ~/src/openEHR workspace and edited ~/src/.vscode/settings.json to contain only:

```
{
    "json.schemas": [{
        "fileMatch": [
            "*.yml"
        ],
        "url": "/Users/joostholslag/src/specifications-ITS-JSON/components/AM/Release-1.4/Archetype/all.json"
    }]
}
```

Based on https://code.visualstudio.com/Docs/languages/json#_mapping-to-a-schema-defined-in-settings this should be enough. But it's not recognised:

![Screenshot 2024-06-21 at 11.24.46|690x431](upload://afperZl0Fhye2INk3ML8BamnZ6Z.jpeg)

Any pointers?

---

## Post #27 by @joostholslag

I'm getting 'unresolved tag' errors in VS Code: is this a (known) issue?
![Screenshot 2024-06-21 at 11.31.49|690x448](upload://18nV7tqBMHCiNE1C1X0gGYHvw66.jpeg)

---

**Canonical:** https://discourse.openehr.org/t/archetypes-in-yaml/5357