ADL formalisms

My point is to use the same format that is used as a serialization for the AOM to represent the logic too. That is (from the examples you shared, openEHR CDS, Guidelines and Planning Examples) to represent (tnm_t > '1a' or tnm_n > '0') as, for instance in JSON, something like:

{
    "or": [
        {
            ">": [
                "tnm_t",
                "1a"
            ]
        },
        {
            ">": [
                "tmn_n",
                "0"
            ]
        }
    ]
}

Analogous with YAML, XML or whatever.

I wouldn’t get into that: coupling a specific language into our metadata artifacts is bad for anyone not using that specific language. In order to be ecumenical, something generic enough is better. That is why I mentioned using declarative expressions in the same serialization format, to simplify parsing the logic part.

Right - but that’s an object dump of in-memory meta-model objects, not a syntax. So it’s fine for machine read/write, but of no use for humans to understand or write.

I’m generically assuming that object dump (of meta-model objects) representation in JSON or anything else is always technically available. It’s just not a format that humans would ever use.

There isn’t anything, only general purpose languages mal-adapted to the problem. Users have to figure out how to make these work for the HIT / CDS space, and it’s not trivial.

It’s not exactly an object dump from memory, and it’s not meant for humans to understand, read or write manually - nor should the cADL part be, though as said before, modeling tools are not there yet, so modelers have to do manual tweaks.

The problem with logic expressions is that, for them to be written manually, you need more than the syntax: as with any programming language, you also need, for instance, a compiler/interpreter/transpiler and a debugger. You can’t just define the syntax and say “implement this”; there is no way to check the expressions if those other elements are not defined.

What the declarative notation allows is to avoid manual writing of expressions in favor of higher level tools, like visual programming with blocks. So humans would really drag and drop building blocks like in scratch (https://scratch.mit.edu/projects/238763/editor/) so the expression will be syntactically correct at design time.

I would prefer to adopt “things” that simplify implementation and avoid manual intervention as much as possible in formats representing openEHR artifacts. I see this as a pro, not a con. The manual intervention today is mostly due to shortcomings of modeling tools, not a hard requirement from modelers. We could even have a binary format for archetypes, and that would be OK if and only if modeling tools are expressive enough that no modeler needs manual intervention.

I’m not sure. If you choose Java to be your expression language, then .NET devs will need to transpile the expressions to C#, making their lives a little more miserable in the process. The problem is not technical, since modern programming languages don’t have many shortcomings in terms of representing logic. The argument is more political than anything else: instead of favoring one group over all others, we can make an abstraction that is easily mappable to any technology, and we could even provide reference implementations in different languages.

Behind all this is the issue of having different languages for things that are very similar, since we need to also manage their versions. We have archetype assertions, two expression languages, I think there are also two GDL expression languages, etc., each with their own versioning and their own syntax, which IMHO is a mess to maintain going forward. That is why I would prefer a declarative approach (and because it’s something I have already tested and it worked OK; I can provide details of my implementation if anyone is interested).

The core expression language is the easiest part of the whole equation. Nearly every host language does expressions in the same way. There might be differences to do with e.g. '&&' versus 'and' or snake_case v CamelCase v kebab-case, but that’s about it. Basic logic expressions aren’t the problem.

It’s when you want to do more than write a simple expression - e.g. use temporal logic operators, use coded terms as first-order elements, write a decision table, convert numeric ranges to symbolic ones (a basic necessity in lab systems) - that you have to do a lot more work. You can do these kinds of things using more basic capabilities of each native language - but they will be a) a lot of code and b) not interoperable, since the solution will be different in each language. So there is no sharing of guideline logic modules (for example) possible. This is what companies do right now - and one reason why we have no shared guidelines language.
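To make the lab example concrete, below is the kind of thing each vendor ends up hand-rolling natively (a sketch in TypeScript - the analyte, bands and names are all made up), which is exactly why none of it is shareable:

// Hypothetical range-to-symbolic conversion for one lab analyte.
// Each vendor writes some variant of this, with their own bands,
// in their own language - so none of it interoperates.
function potassiumBand(mmolPerL: number): "low" | "normal" | "high" {
  if (mmolPerL < 3.5) return "low";
  if (mmolPerL <= 5.2) return "normal";
  return "high";
}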

That is true. That is why we started work on the Expression Language (EL), to unite all this, but also to support needs not met by general-purpose programming languages. EL isn’t finished, but there is a pretty complete set of grammars and test cases here.

We can start that effort again, or try to make all this work in a general purpose language, but the challenges will not disappear - we will just be moving deck-chairs around on the Titanic.

I think you are solving a different problem here - the question of a standardised machine representation. But the jsonlogic solution won’t address most of the list above e.g. temporal operators, coded terms and so on. So it doesn’t get us that far. It still might be useful for sharing simple expressions though.

My point is: you always need a mapping between the expression and the language that implements the evaluation of the expression. Then if for some languages that mapping is 1-to-1 and for others it requires extra work, that’s unfair (what I mentioned before about the political argument).

Though technically you can do a syntactic mapping (expression syntax to programming language syntax, a.k.a. a transpiler), the expression will require some extra context to be able to run, which is not included in the original expression. You could also do a mapping into an expression interpreter.
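As a toy sketch of that transpiler option (everything below is invented for illustration, in TypeScript), the JSON form from the start of this thread could be walked and emitted as infix source:

type Expr = string | number | { [op: string]: Expr[] };

// Toy transpiler: declarative form -> infix source string.
// Note it cannot even distinguish variables ("tnm_t") from literals
// ("1a") - exactly the extra context the original expression lacks.
function transpile(e: Expr): string {
  if (typeof e !== "object") return JSON.stringify(e);
  const [op, args] = Object.entries(e)[0];
  const glue: Record<string, string> = { or: " || ", and: " && " };
  return "(" + args.map(transpile).join(glue[op] ?? ` ${op} `) + ")";
}

// transpile({ or: [{ ">": ["tnm_t", "1a"] }, { ">": ["tnm_n", "0"] }] })
// => '(("tnm_t" > "1a") || ("tnm_n" > "0"))'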

Besides all the tech part, which I think is clear, what I would like to be at least considered is the possibility of exploring syntax-less expressions with a declarative approach.

I think I understand what you mean from the use-case perspective: what you mention are higher-level constructs. I think there we have two big alternatives:

a) Build our basic expression model and base all the higher-level constructs on it, so all the temporal logic, coded terms, conversions, decision tables/trees, etc. are just different arrangements of the same operations/clauses/constructs.

So an engine that can process the basic expression model can process any other higher-level construct (see the sketch after option b) below). This approach makes adding initial extensions more difficult, since the building-block set is more constrained, but it is easier to manage, run and standardize.

b) Build our basic expression model plus a set of extensions with defined APIs, so that those can be used by the expression model.

That would be easier to extend, since there are not many constraints on adding functionality, though it would be more difficult to manage and standardize.
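To illustrate option a): purely for illustration (the "if"/"var"/"not" operator names are made up), a small decision table could be lowered to the same basic clauses, so an engine that only understands the basic model can still run it:

// Hypothetical lowering of a two-row decision table into basic clauses.
// Row 1: age >= 65 and smoker      -> "high"
// Row 2: age >= 65 and not smoker  -> "medium"
// Default                          -> "low"
const riskTableLowered = {
  "if": [
    { "and": [{ ">=": [{ "var": "age" }, 65] }, { "var": "smoker" }] }, "high",
    { "and": [{ ">=": [{ "var": "age" }, 65] }, { "not": [{ "var": "smoker" }] }] }, "medium",
    "low",
  ],
};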

I agree, and that might be similar to approach b) mentioned above, which is more difficult to standardize, though with a good set of rules and tools everything is possible. That is partly (though not the main reason) why I would like “the approach of having a syntax-less expression model represented natively in a common serialization format” to be considered.

I see a lot of value in the model behind the expression language, but I don’t much like having a “language”; I would prefer to have an expression model and manage artifacts in a syntax-less way. Now I’m starting to repeat myself, I’m getting old…

I’m not pushing for JSON Logic, I just provided an example so people not familiar with the declarative approach can take a look. I think the expression model should be defined in order to comply with all the requirements you mentioned, and then the evaluation flow should also be defined, so a runtime actually does what it’s supposed to do in order to comply with those requirements. Note I didn’t mention syntax at all, since the syntax itself is not so important IMHO: all the requirements could be met without a specific language.

If we take a step back and think about that, and about what you mentioned regarding the little syntactic differences between languages, I think most languages will look almost the same in their AST form, which is the model behind the syntax. That is why I insist on focusing on the model and not on the syntax :slight_smile:

This is what I tried to do 12 years ago:

  1. Design a model that can: a) declare and resolve variables (for instance, executing a query in a CDR and extracting single or aggregated values in one operation), b) have functional modules (functions like time logic, calculations, assignment), c) have flow control, d) have actions (send an HTTP request, print, log, …).

  2. Have an engine that can manage and evaluate rules on demand, providing input data if needed.

REF: cabolabs-xre-core/src/com/cabolabs/xre/core at master · ppazos/cabolabs-xre-core · GitHub

  3. Have a simple representation of rules for storing and sharing.

REF: cabolabs-xre-engine/rules/rule9_logic_functions.xrl.xml at master · ppazos/cabolabs-xre-engine · GitHub (I know this rule is stupid, bear with me)

  4. Have a platform that allows you to see which rules are loaded, see their execution log, see errors, test them, etc.

REF: GitHub - ppazos/cabolabs-xre-iu: XML Rule Engine UI for management of rule execution for Clinical Decision Support

I also had a test app that was a client of the rule engine, to display alerts about women that didn’t have a PAP test in the last two years. Here is a presentation with some details about the rule evaluation and execution: XRE demo presentation | PPT

The missing part of that work was a visual rule editor, which today would be easy to build.

I know I would do a lot of things differently today, but I have learned a lot from the process of designing and building that platform.

Or… you build the evaluator. It’s easy. I’ve done many, including a financial rules engine that evaluated expressions with vector and scalar variables.

If you don’t do this, you have to write a transpiler. Both ways work, but I’d rather maintain a native execution engine than a transpiler.

This is pretty much what a transpiler has to do. Every higher-level operation is compiled into 50 lines of code that awkwardly represent one line of source code. It’s not out of the question by any means - it’s what any compiler does, in fact. My main interest is in the source language. Whether it is transpiled into TypeScript or whatever, or else executed natively, is just a technical choice. But to just test the language and make sure the intended semantics are really working, writing a native execution engine (i.e. an interpreter) is usually what is needed. Not hard at all.
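For instance, a minimal interpreter for the JSON form at the top of this thread fits in a few lines (a sketch only, in TypeScript - variable resolution, typing and coded-term semantics are exactly the parts it glosses over):

type Expr = string | number | boolean | { [op: string]: Expr[] };
type Context = Record<string, string | number | boolean>;

// Minimal native evaluator for the declarative form shown earlier.
// Bare strings are looked up in `ctx` and fall back to literals - a
// real engine would need an explicit variable construct and type rules.
function evaluate(e: Expr, ctx: Context): string | number | boolean {
  if (typeof e === "string") return ctx[e] ?? e;
  if (typeof e !== "object") return e;
  const [op, args] = Object.entries(e)[0];
  const vals: any[] = args.map((a) => evaluate(a, ctx));
  switch (op) {
    case "or":  return vals.some(Boolean);
    case "and": return vals.every(Boolean);
    case ">":   return vals[0] > vals[1];
    default:    throw new Error("unknown operator: " + op);
  }
}

// evaluate({ or: [{ ">": ["tnm_t", "1a"] }, { ">": ["tnm_n", "0"] }] },
//          { tnm_t: "2", tnm_n: "0" });
// => true - though note that "2" > "1a" only works lexically by luck;
//    TNM codes really need coded-term semantics, per the list above.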

So you might not want the language per se, but we will need a meta-model - see here.

They don’t (if they did, we wouldn’t have JVM updates every 6 months, or 21 versions of Java), but you are right, that is the correct general argument. That’s why the meta-model is the thing of primary importance. A/the syntax just makes it easy to understand and write. But you are correct: it is not necessary for those who want to author logic expressions etc. purely through GUI tools.

The meta-model of a language like Haskell is very different from that of Java and similar languages…

This sounds like a description of what we were aiming for with Task Planning and Decision Logic, so I think we are nearly on the same page :wink:

If “code” is the specific language that serves to implement or evaluate the rule/expression, what I tried to describe is what happens before going to a specific implementation. I meant that the higher-level constructs you mentioned could be based on a basic set of expressions, all part of the same abstract expression/rule-language model. If that is an option we want to consider, it can make everything we create on top of it compatible and manageable - like every archetype being based on the same RM.

Maybe it’s a dumb question, but why is that a “meta” model and not just a “model”?

Checking the UML, it appears to be the AST model for the syntax. When I talk about the model, I mean the representation of the rule itself, not of the syntax. The model represents the constructs that can be used in the rules. Something like this (sorry, I don’t have a diagram): cabolabs-xre-core/src/com/cabolabs/xre/core/logic at master · ppazos/cabolabs-xre-core · GitHub

Note I’m not referring to specific language features, but to common parts like variable declarations, assignment, comparison, flow control, loops, code blocks and functions.
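As a sketch of what that could look like (every name below is invented, not from any spec), those common parts map naturally onto a small set of node types:

// Illustrative expression-model node types - all names are invented.
type Node =
  | { kind: "declare";  name: string; type: string }
  | { kind: "assign";   name: string; value: Node }
  | { kind: "compare";  op: "<" | "<=" | "=" | ">=" | ">"; left: Node; right: Node }
  | { kind: "logic";    op: "and" | "or" | "not"; args: Node[] }
  | { kind: "if";       cond: Node; then: Node[]; else?: Node[] }
  | { kind: "loop";     cond: Node; body: Node[] }
  | { kind: "block";    body: Node[] }
  | { kind: "function"; name: string; params: string[]; body: Node[] }
  | { kind: "call";     fn: string; args: Node[] }
  | { kind: "variable"; name: string }
  | { kind: "literal";  value: string | number | boolean };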

I think that is worth exploring at least. Another thing could be to create expressions programmatically; I can imagine that being of great value for testing the model and the evaluation without the syntax part, and maybe for conformance verification.
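Continuing with the invented node types above, creating expressions programmatically is then plain object construction - no parser involved, which is what would make it handy for conformance tests:

// The tnm expression from earlier, built programmatically.
const tnmRule: Node = {
  kind: "logic", op: "or", args: [
    { kind: "compare", op: ">",
      left:  { kind: "variable", name: "tnm_t" },
      right: { kind: "literal",  value: "1a" } },
    { kind: "compare", op: ">",
      left:  { kind: "variable", name: "tnm_n" },
      right: { kind: "literal",  value: "0" } },
  ],
};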

Note I never said “ditch the syntax”, I’m just saying it’s not required for rules to work.

It didn’t have the task planning execution and status management; it was pretty stateless and focused on rules triggered by events, with each rule totally isolated and autonomous (if all the data needed was available, it was able to execute and trigger other actions and/or return a result to the caller). It was also REST-based, so an app could connect, fire a set of rules and get some feedback, for instance for CDS to show recommendations or alerts to clinicians. So it was more like a very simplified GDL with decision logic. The action part, I think, was very powerful, since more actions could be defined by devs, so the rule model was pretty extensible for specific needs while staying syntactically similar to rules using the basic set of actions (it was just XML). As always, lack of time and the need to feed the family cut my research time short. I hope I can get back to that some day. It was a nice piece of engineering, and sadly I couldn’t share it with the openEHR folks.

I’ve had a thought.
The goal of this topic is to find a formalism, incl. serialisation and file extension, for expressing design-time information model artefacts of clinical concepts: archetypes and templates (and additionally a use-case-specific technical model, conforming to those templates, that’s easy to implement for client apps).

I agree with Thomas that cADL/ODIN is easier to read and edit than JSON or even YAML - a key requirement for a design-time formalism. The target audience is clinical modellers: clinicians with some IT affinity. So the primary interaction with this formalism should be (and is) through a GUI. So it requires a schema and serialisation in order for multiple tools to edit the same artefact. And in order to work in the openEHR tool chain, there should be a common formalism, incl. serialisation and file extension, between the tools for editing, reviewing, publishing, importing to a CDR etc.
And as clearly stated by many, and summarised by Borut, all our DSLs, like ODIN, hamper adoption of openEHR. And they require specialised software that has to be maintained by a tiny group of overly busy people (mainly those in this topic).
Now, both YAML and JSON are good candidates to meet all those requirements. The only point is that we concluded that being able to hand edit the formalism is a key requirement, as analysed by many, especially Silje. Now, if hand editing is done by clinical modellers, it’s important that the formalism is legible and ‘easy’ to understand for non-programmers. ADL is better than JSON in this regard, but still hard. The assumption seems to be that hand editing the ADL (JSON/ODIN, whatever) should be possible in any text editor (that supports schema validation).
But I think we don’t need this for those cases that require hand editing by clinical modellers. I think it makes much more sense to have the current GUI editing tools support hand editing of this formalism. And suddenly many downsides of a standard serialisation as described in this topic (accidentally breaking the indentation hierarchy, microsyntaxes, invalid JSON etc.) can be fixed by the tool.
At the same time, ADL is still hard to understand for clinicians due to e.g. meaningless codes for nodes. ADL would be so much easier to edit if, instead of “at0009”, the node were displayed as “systolic blood pressure”.
Another one would be a “live preview” of manual changes to a model: ‘easy’ in a GUI tool, very hard to do in a way that any editor supports out of the box.
So I propose to specify one required formalism for design-time artefacts that fully conforms to a standard serialisation and the AOM schema, and let GUI tools support hand editing of that formalism.
Both the Nedap archetype editor and the Better Archetype Designer have a rudimentary implementation of this idea - see, for example, the “adl” tab in Archetype Designer.

I’m very curious what the people actually involved in building these tools think: @borut.fabjan @VeraPrinsen @sebastian.garde


At the same time, ADL is still hard to understand for clinicians due to e.g. meaningless codes for nodes. ADL would be so much easier to edit if, instead of “at0009”, the node were displayed as “systolic blood pressure”.

I agree in principle, but there are all sorts of issues around the stability of that generated key (if the human name changes), and in any case that would need to be an ADL3 issue.

My interim suggestion would be either to

  1. Allow the use of “at0009 | Systolic |” as the node identifier, where everything that is piped is ignored.

  2. Just add another pseudo-node like ‘@node_name’ alongside the node identifier, generated from the archetypeNodeID ontology, used only as a navigational aid.

Either would solve the problem of navigating the tree based solely on the atCodes, without needing the comments that are currently in ADL.

There is a separate argument for human-readable keys, but also a lot of issues to think through.


I understood this as a GUI tool looking up “at0009” and displaying it as “systolic blood pressure”.

The GUI tool would always use the current human name from the terminology section. It wouldn’t serialize it to the JSON.

These two approaches are still possible (and nice to have in the future), but not necessary if the GUI tool displays the “systolic blood pressure” found in the terminology section of an archetype.

Of course, that is assumed. This is about where someone has to look at the JSON/YAML directly. What I think @joost was saying is that a major problem is that it is difficult to navigate the tree when the nodes are identified by atCodes alone - so you have to keep doing lookups to orient yourself. ADL solves this by adding comments against the nodeIds.

So I can easily find ‘Overall description’ via a simple text search, or eyeballing.

I’m not saying replace the at-code with a human-readable identifier in the ADL. I’m saying: let the tool show the human-readable identifier in place of the at-code.

so exactly this:

So I’m saying: if a clinical modeller has to look at the JSON/YAML/ADL directly, let them do it from a tool that visualises the ‘code’/constraints/model according to the suggestions above. If you open the formalism in a regular IDE, it will show the ‘normal’/raw YAML/JSON.

This is where the idea came from, but it’s still not ideal. Visualising the human name where an at-code is would help a lot. We would also need a way to actually see the code, e.g. with a hover, or a toggle between codes and human names.


Of course I agree, but I’d still like the fall-back position of being able to open a plain old text editor with a syntax that allows me to easily identify the nodes - this needs to work for developers and other users, not just those of us who use GUI tools.


I agree. This can be solved by your suggestion:

GUI tools can implement a lookup for the codes and when serialized add the “piped” version to them.

It shouldn’t cause an issue with implementers since we already have the code to skip the “piped” part.
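For what it’s worth, skipping the piped part is tiny; a sketch, assuming the “at0009 | Systolic |” form suggested above:

// Strip the " | human name |" decoration from a node identifier, so
// "at0009 | Systolic |" and "at0009" resolve to the same id.
function nodeId(raw: string): string {
  return raw.split("|")[0].trim();
}

// nodeId("at0009 | Systolic |") === "at0009"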

Sure, but for these edge-of-edge cases for clinical modellers, and frequently for developers, probably the native YAML/JSON as created by Sebastian and the ADL Workbench is fine?

I’m fine with this, but it’s a little beside my point of accepting microsyntaxes etc. only in openEHR-specific tools, and keeping the required formalism for archetypes and templates vanilla JSON/YAML.


Some things can be made to work this way, e.g. probably to make Terminology code work as a built-in type. But some semantics are just not supported by a basic language meta-model. E.g. lambdas, monads etc.

It just means ‘model of a language/formalism’, as distinguished from ‘model’, which is an instance of the language but is also a model of something from a domain. So in this scheme of things, the AOM is a meta-model, and the heart rate archetype is a model. These terms are relative, of course. There’s a good paper on levels of meta - see particularly figure 7.

This is currently how GDL2 is created - only via GUI tools. Certainly works but also limits what you can do by other means.

ODIN is annoying now, because you can do the same thing in JSON, YAML etc, i.e. there is an alternative (and that’s why we should replace ODIN with one or both of those where it is currently in use).

However, not all DSLs have obvious industry-standard equivalents, in our case, cADL, and GDL.

That’s true, hence the ADL3 proposal to move to symbolic keys plus asserted IS-A relationships (since once you leave the dotted code form, you lose the implicit knowledge that at0.4.20 is a specialisation of at0.4 etc).
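For reference, the implicit knowledge being lost is just a dot-prefix check on the codes (a sketch):

// With dotted codes, IS-A is implicit: at0.4.20 specialises at0.4
// because "at0.4." is a prefix of "at0.4.20". Symbolic keys lose this,
// hence the need for asserted IS-A relationships.
function isSpecialisationOf(child: string, parent: string): boolean {
  return child.startsWith(parent + ".");
}

// isSpecialisationOf("at0.4.20", "at0.4")  // => true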