ADL 1.4 embedded ODIN versus ODIN

thomas.beale · 26 October 2021 11:27

I’m working on an improved set of Antlr4 grammars, including for ADL 1.4. These are modal grammars, which provide much better ability to deal with changing syntaxes.

So an example of what we can find in an ADL 1.4 archetype is this:

	ELEMENT[at0002] occurrences matches {0..1} matches {    -- X offset
		value matches {
			C_DV_QUANTITY <
				property = <[openehr::122]>
			>
			DV_COUNT matches {
				magnitude matches {|>=0|}
			}
		}
	}

Here we have a block of ODIN inline within CADL, with a leading type marker (‘C_DV_QUANTITY’). It follows the examples shown in the ADL 1.4 spec, which unfortunately don’t follow ODIN (type-markers described here). (This incompatibility is of course my fault from many years ago!)

If official ODIN were followed, the above would look like this:

	ELEMENT[at0002] occurrences matches {0..1} matches {    -- X offset
		value matches {
			(C_DV_QUANTITY) <
				property = <[openehr::122]>
			>
			DV_COUNT matches {
				magnitude matches {|>=0|}
			}
		}
	}

A line with '(C_DV_QUANTITY) <' is then much easier to distinguish in a parser from a normal CADL block (e.g. 'DV_COUNT matches {', in the above).
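
As a rough illustration of why the parenthesised form is easier to tell apart, here is a minimal Java sketch that classifies single lines with regular expressions. The class name, the category labels and the patterns are all assumptions made for this example; they are not taken from any openEHR tool, and a real lexer would work on tokens rather than whole lines.

```java
import java.util.regex.Pattern;

// Hypothetical helper: classifies one line of ADL 1.4 definition text.
final class CadlLineKind {
    // '(TYPE) <' : the leading '(' already rules out a CADL object.
    private static final Pattern PAREN_ODIN =
        Pattern.compile("^\\s*\\([A-Z][A-Za-z0-9_]*\\)\\s*<\\s*$");
    // 'TYPE <' : only the trailing '<' separates this from 'TYPE matches {'.
    private static final Pattern BARE_ODIN =
        Pattern.compile("^\\s*[A-Z][A-Za-z0-9_]*\\s*<\\s*$");
    // 'TYPE[atNNNN] ... matches {' : an ordinary CADL object block.
    private static final Pattern CADL_OBJECT =
        Pattern.compile("^\\s*[A-Z][A-Za-z0-9_]*(\\[[a-z0-9.]+\\])?\\s.*matches\\s*\\{.*$");

    static String classify(String line) {
        if (PAREN_ODIN.matcher(line).matches()) return "ODIN_TYPED_BLOCK";
        if (BARE_ODIN.matcher(line).matches()) return "ODIN_BARE_BLOCK";
        if (CADL_OBJECT.matcher(line).matches()) return "CADL_OBJECT";
        return "OTHER";
    }
}
```

With the parenthesised form the decision can be made on the first non-blank character; with the bare form the scanner has to read past the whole type name before it knows which kind of block it is in.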

Since ADL 1.4 archetypes are going to be around for a while yet, I am wondering if:

in CKM, we could rewrite the ODIN blocks with type markers to the proper form (i.e. add parentheses) (Q for @sebastian.garde )
current modelling tools like AD handle the proper form, or only the ‘wrong’ form - this is mainly a question for serialisers I think (Q for @borut.fabjan and/or others)
Archie’s ADL1.4 reader handles both forms (Q for @pieterbos )

I can work around the wrong ODIN, but it is pretty painful. Maybe fixing the situation is even more painful, but I thought I’d ask.

pieterbos · 26 October 2021 12:22

It does not.

I also see no benefit in changing this, and quite a high cost overall.

For Archie specifically, it would be an easy change to support it.

sebastian.garde · 26 October 2021 12:31

Doesn’t this change the canonical MD5 hash? If so it would be rather painful.
I assume that this requires changes not only to the Archie parser, but also to the older Java Ref Impl and Eiffel parsers to properly support this at the moment.
Even then, that would only cover the parsing side; we would still need to be able to serialise it properly with this change for all existing archetypes.
I am sure all of this could be done, but it is a rather high cost coordinating this across tooling and has potential for problems and confusion - also knowing the pain of currently dealing with the specifics of C_DV_QUANTITY.
I personally think that fixing this is more painful than leaving it as it is.
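
To make the hash concern concrete, here is a small Java sketch showing that any MD5 digest computed over the text changes once the parentheses are added. It hashes the raw string only; the actual canonicalisation applied before the canonical MD5 is computed is not reproduced here, and the class and method names are assumptions for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: plain MD5 over UTF-8 text, NOT the canonical archetype hash.
final class Md5Sketch {
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("C_DV_QUANTITY <"));
        System.out.println(md5Hex("(C_DV_QUANTITY) <"));
    }
}
```

The two printed digests differ, so unless the canonical form strips the parentheses again, every rewritten archetype would get a new hash.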

pieterbos · 26 October 2021 12:57

Archie parses it as:

domainSpecificExtension: type_id '<' odin_text? '>';

github.com: openEHR/archie/blob/master/grammars/src/main/antlr/cadl14.g4#L16

// license:     Apache 2.0 License <http://www.apache.org/licenses/LICENSE-2.0.html>
//

grammar cadl14;
import adl_rules14, odin14;

//
//  ======================= Top-level Objects ========================
//

domainSpecificExtension: type_id '<' odin_text? '>';

c_complex_object: type_id (atTypeId)? c_occurrences? ( SYM_MATCHES '{' (c_attribute_def+ | '*') '}' )? ;

atTypeId: '[' ( AT_CODE ) ']';

// ======================== Components =======================

c_objects: c_non_primitive_object_ordered+ | c_primitive_object ;

sibling_order: ( SYM_AFTER | SYM_BEFORE ) '[' AT_CODE ']' ;

Doesn’t that just fit into the modular grammar?

thomas.beale · 26 October 2021 18:36

Erm… yes! But what is the general approach to rewriting ADL1.4 syntax to correct errors? The MD5 will always break. New patch version at least, I guess.

Agree with your other comments.

It turns out it does. I had pulled my much more generic ODIN handling from CADL2 (used for _default representation), which allows swapping to JSON etc, but this was getting a bit complicated (because of the problem alluded to above). But in fact just doing inline non-modal ODIN processing for ADL1.4 does still work (given I have more complex and reusable modal grammars now); I just didn’t think to use that old rule.

So… no change needed - and I have a nearly complete set of new modal Antlr4 parsers now, including full ADL1.4 coverage.

Thanks for the input.

sebastian.garde · 26 October 2021 19:25

Ah, ok then. I thought you were talking about changing the [serialised] ADL for all existing archetypes in all revisions. Rewriting history.

pieterbos · 27 October 2021 07:33

Good to hear that solves it. Otherwise perhaps the type_id followed by ‘<’ at that point could be a hint to switch modes?
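
A minimal Java sketch of that idea, for illustration only: treat a type name followed by ‘<’ as the signal to enter an ODIN mode, and leave it again once the angle brackets balance. The class name, the tag strings and the bracket-counting exit condition are assumptions for this example; it ignores ‘<’ and ‘>’ inside quoted ODIN strings, and it is not how any existing parser is known to do it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical line-level mode switcher: tags each line as CADL or ODIN.
final class ModeSwitchSketch {
    // 'TYPE <' or '(TYPE) <' alone on a line is the hint to switch modes.
    private static final Pattern ODIN_START =
        Pattern.compile("^\\s*\\(?[A-Z][A-Za-z0-9_]*\\)?\\s*<\\s*$");

    static List<String> tagLines(List<String> lines) {
        List<String> tags = new ArrayList<>();
        int depth = 0; // 0 means CADL mode; > 0 means inside an ODIN block
        for (String line : lines) {
            if (depth == 0 && ODIN_START.matcher(line).matches()) {
                depth = 1; // the opening '<' of the embedded block
                tags.add("ODIN");
            } else if (depth > 0) {
                tags.add("ODIN");
                for (char c : line.toCharArray()) {
                    if (c == '<') depth++;
                    else if (c == '>') depth--;
                }
            } else {
                tags.add("CADL"); // '>' in e.g. {|>=0|} is never counted here
            }
        }
        return tags;
    }
}
```

Because brackets are only counted while in ODIN mode, CADL constraints such as {|>=0|} do not disturb the depth.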

thomas.beale · 27 October 2021 09:48

Indeed - that’s how that kind of thing usually works. For academic amusement, this is the code I had created to do this previously. It actually does work (a further tweak is still required), but in the end the native ODIN solution is much simpler, because we are not planning to make ODIN-in-ADL1.4 generically replaceable by JSON etc. We just need to parse it and handle it, most likely just well enough to convert through to AOM2, which is what I assume Archie is doing, and what AD already does.

// -------------------------- Modal lexers -----------------------------
// match condition and mode to grab included ODIN blocks;
// these do explicit whitespace handling since they have to capture
// everything so it can be passed to other parsers.
// Here we match a type marker and ODIN open bracket; if there
// are no parentheses around the type, we add them

ODIN_BLOCK_START: TYPE_MARKER WS? '<' WS? EOL
    {
        String typeStr = getText();
        if (typeStr.charAt(0) != '(')
            setText ("(" + typeStr + ")");
    }
    -> mode (ODIN_BLOCK);

fragment TYPE_MARKER: '(' TYPE_INFO ')' | TYPE_INFO ;
fragment TYPE_INFO: ALPHA_CHAR ALPHANUM_US_CHAR* ;

//
// The end of an ADL 1.4 ODIN block is tricky because the type markers
// do not have (), so we resort to various tricks to catch what we
// hope are all the cases
//
mode ODIN_BLOCK ;
// Case 1: we hit another ODIN line starting with
//         'TYPE_MARKER <'
//      which we assume is a sibling, and therefore
//      a distinct constrainer object.
ODIN_BLOCK_START_OB: TYPE_MARKER WS? '<' WS? EOL
    {
        String typeStr = getText();
        if (typeStr.charAt(0) != '(')
            setText ("(" + typeStr + ")");
    }
    -> type (ODIN_BLOCK_START);

// Case 2: we hit a sibling CADL constrainer line e.g. the line
//      starting with 'DV_COUNT' in the example below
//          ELEMENT[at0002] occurrences matches {0..1} matches {
//              value matches {
//                  C_DV_QUANTITY <
//                      property = <[openehr::122]>
//                  >
//                  DV_COUNT matches {
//                      magnitude matches {|>=0|}
//                  }
//              }
//          }
//      We want to revert to normal parsing for this.
//      This will not catch difficult cases like type names
//      with generics e.g. POINT_EVENT<ITEM_TREE>, but these
//      never occur in real ADL 1.4 archetypes anyway.
CADL_BLOCK_START: WS? UC_ID WS?
    { setText (getText().trim()); }
    -> mode (DEFAULT_MODE), type (UC_ID) ;

// Case 3: we spot a following '}'; this means the outer block has been closed,
//      and we can return the usual token (SYM_RCURLY)
//      Just in case, we rewrite the matched text to just the symbol.
CADL_BLOCK_END: WS '}' WS? EOL
    { setText ("}"); }
    -> mode(DEFAULT_MODE), type(SYM_RCURLY) ;

// anything else, we just suck up.
ODIN_BLOCK_LINE : ODIN_TEXT EOL ;
fragment ODIN_TEXT: ~[{}\n]* ; 
// More tricks required to handle {} inside ""...
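
One detail of the lexer actions above is worth a sketch. In an ANTLR4 lexer action, getText() returns the whole text matched by the rule, so for ODIN_BLOCK_START it includes the ‘<’ and the line end as well as the type name. Wrapping that whole string in parentheses would put the closing ‘)’ after the ‘<’. The following stand-alone Java helper shows one way to wrap only the type name; it is an assumption about what such a fix might look like, not the tweak referred to in the post, and the class name and regular expression are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical normaliser for the text matched by a 'TYPE <' block-start token.
final class TypeMarkerNormaliser {
    // Groups: 1 leading whitespace, 2 optional '(', 3 type name,
    //         4 optional ')', 5 whitespace + '<' + trailing whitespace/EOL.
    private static final Pattern BLOCK_START =
        Pattern.compile("^(\\s*)(\\(?)([A-Za-z][A-Za-z0-9_]*)(\\)?)(\\s*<\\s*)$");

    static String normalise(String matched) {
        Matcher m = BLOCK_START.matcher(matched);
        if (!m.matches()) {
            return matched; // not a block start: leave untouched
        }
        // Re-emit with exactly one pair of parentheses around the type name only.
        return m.group(1) + "(" + m.group(3) + ")" + m.group(5);
    }
}
```

Inside a lexer action this could be called as setText(TypeMarkerNormaliser.normalise(getText())), assuming the helper class is on the generated lexer’s classpath.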