Hello Thomas,
I had already noticed the expressions part and based my experimental implementation on that. This email got quite long, so let’s start with a summary:
Summary:
- The current spec is quite similar to XPath. We could bring it even closer by referencing the XPath specification in more places. That allows tool reuse and resolves ambiguities in the specification.
- I also found some other problems and questions in the spec. These include grammar ambiguities and how to handle them, and a question about node ids that exist in the AOM but not always in the RM.
I have not implemented the full expression language yet, so I might find more, for example when I implement functions.
XPath and its relation to the expression language:
Before I note my issues, I would like to point out that the language is very similar to XPath. In fact, you can convert almost all of the expression language to valid XPath 2.0 expressions with a few simple steps:
1. Split the text into separate statements. Then, for every statement:
2. Replace the Apath shorthand notation with XPath predicates: [id1] becomes [@archetype_node_id = 'id1'], etc.
3. Replace the symbolic form of operators with the textual form.
4. Replace for_all … in … … with 'every $var in /path satisfies …'
5. Replace implies with 'if (…) then … else true()' (XPath 2.0 requires the else branch)
6. Replace exists(expression) with count(expression) > 0
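To make steps 1, 2 and 4 to 6 concrete, here is a rough Python sketch of the rewriting. It is only an illustration under assumptions. The helper names are hypothetical. It expects one statement per line and assumes the path after for_all contains no spaces. It does not handle nested parentheses inside exists(…). Step 3 is left out because it is a plain token mapping.

```python
import re

# Hypothetical helpers sketching steps 1, 2, 4, 5 and 6 (step 3 omitted).
NODE_ID = re.compile(r"\[(id\d+(?:\.\d+)*)\]")
FOR_ALL = re.compile(r"^for_all\s+\$(\w+)\s+in\s+(\S+)\s+(.+)$")
EXISTS = re.compile(r"exists\(([^()]*)\)")


def rewrite_body(expr: str) -> str:
    """Rewrite exists(...) and 'implies' inside one expression."""
    # Step 6: exists(x) -> count(x) > 0; nested parentheses are not handled.
    expr = EXISTS.sub(r"count(\1) > 0", expr)
    # Step 5: XPath 2.0 needs an else branch; a false premise makes the
    # implication true, hence 'else true()'.
    if " implies " in expr:
        premise, conclusion = expr.split(" implies ", 1)
        expr = f"if ({premise.strip()}) then ({conclusion.strip()}) else true()"
    return expr


def apath_to_xpath(statement: str) -> str:
    statement = statement.strip()
    # Step 4: assumes the binding path is one whitespace-free token, which is
    # exactly the whitespace sensitivity this email discusses.
    m = FOR_ALL.match(statement)
    if m:
        var, path, body = m.groups()
        xpath = f"every ${var} in {path} satisfies {rewrite_body(body)}"
    else:
        xpath = rewrite_body(statement)
    # Step 2: [idN] shorthand -> explicit attribute predicate.
    return NODE_ID.sub(r"[@archetype_node_id = '\1']", xpath)


def convert(text: str) -> list[str]:
    # Step 1: assumes one statement per non-empty line.
    return [apath_to_xpath(line) for line in text.splitlines() if line.strip()]
```

For example, apath_to_xpath("/data[id2]") gives "/data[@archetype_node_id = 'id2']". The result can then be handed to any XPath 2.0 engine.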
Then get an XPath implementation that works on your reference model, or just convert the data to XML first. For every assertion, evaluate the expression to a boolean. For every variable declaration, evaluate the expression to the type given in the declaration and store the result under the given name.
Finally, implement the standard functions and variables. Functions and variables are part of standard XPath, and XPath lets the host environment supply additional ones of your own.
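Here is a very small Python sketch of that evaluation loop. It uses the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath: relative paths and simple attribute predicates, but no boolean expressions or quantifiers. The XML layout is a made-up simplification, not the official openEHR XML schema. evaluate() is a hypothetical helper. It treats each assertion as count(path) > 0 and stores each declared variable after converting it to the declared type. A real implementation would use a full XPath 2.0 engine instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified XML rendering of an RM object.
DOC = """
<composition archetype_node_id="id1">
  <content archetype_node_id="id3">
    <value>7</value>
  </content>
  <context>
    <setting>home</setting>
  </context>
</composition>
"""


def evaluate(root, assertions, declarations):
    """Evaluate exists-style assertions and typed variable declarations.

    assertions: list of relative ElementTree paths, each checked as
    count(path) > 0. declarations: {name: (path, converter)}, where the
    converter stands in for the declared type (e.g. int for Integer).
    """
    variables = {}
    for name, (path, to_type) in declarations.items():
        node = root.find(path)
        # Store the converted value, or None if the path matches nothing.
        variables[name] = to_type(node.text) if node is not None else None
    results = [len(root.findall(path)) > 0 for path in assertions]
    return results, variables


root = ET.fromstring(DOC.strip())
results, variables = evaluate(
    root,
    ["content[@archetype_node_id='id3']", "content[@archetype_node_id='id9']"],
    {"v": ("content[@archetype_node_id='id3']/value", int)},
)
```

Here results is [True, False] and variables is {"v": 7}.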
If you do this, you have implemented full assertion support with very little effort and code, and with very little chance of mistakes!
(If all you have is XPath 1.0, for_all and implies require manual handling. You might also need some extra work for certain data types, especially terminology codes.)
Having noticed this, I'm strongly in favour of keeping the syntax as close to XPath as possible, so that we can reuse tools. Or, if you have reasons to write your own (I do, unfortunately), at least you can validate your implementation easily by testing it against a known implementation.
So I would argue strongly in favour of keeping the $var syntax, because it is the same as in the XPath standard.
Some constructions in the expression language have a valid reason to differ from XPath. For example, the shorthand notation for archetype node ids really helps. I would say this could include the exists operator, because it expresses something that is often needed, and expressing it explicitly allows for some really nice features in user interfaces.
However, I think this does not apply to the for_all and implies statements. If they can be replaced with the corresponding XPath syntax, I think that would be a good idea.
Problems in the specification
Here are the problems I have found in the spec so far:
Multiple-valued paths and type conversion:
* The spec does not say how to handle multiple-valued expressions outside for_all statements. We could just follow the XPath standard.
* The spec says nothing about type conversion. We could just follow the XPath standard.
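To show what following the XPath standard would mean here, this Python sketch mimics the XPath 1.0 rule for comparing a node-set with a number. The comparison is true if at least one node's value, converted to a number, satisfies it. Values that do not convert become NaN. The function name and the list-of-strings input are assumptions for illustration.

```python
import operator

OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def general_compare(values, op, number):
    """XPath 1.0-style comparison of a node-set (given as string values)
    with a number: true if ANY converted value satisfies the comparison."""
    compare = OPS[op]
    for v in values:
        try:
            converted = float(v)
        except ValueError:
            # XPath 1.0 number() yields NaN for non-numeric strings.
            converted = float("nan")
        if compare(converted, number):
            return True
    # An empty node-set never satisfies any comparison.
    return False
```

Note the consequence. For the values "1" and "3", both = 3 and != 3 are true, and for an empty node-set both are false. That is exactly the kind of behaviour the spec should state explicitly, whichever rule it picks.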
Whitespace-aware grammar
The current definition of the language needs a whitespace-aware grammar. Otherwise, the following is ambiguous:
$var:Integer ::= /path/to/value
/path/to/another/value > 3
That is because there is no way to tell which part of /path/to/value/path/to/another/value belongs to the first statement and which to the second without considering whitespace in your parser. That is easy to handle in a lexer, but harder in a parser, although still possible. Alternatively, it is easily solved by demarcating your assertions, for example by requiring a ';' after every assertion.
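Here is a small Python sketch of the problem, with hypothetical helper names. A tokenizer that drops whitespace before recognising paths sees a single long path. A ';' terminator, as suggested above, makes the split trivial. The path regex is deliberately simplified to names and optional [..] predicates only.

```python
import re

# Simplified path token: one or more /name steps, each with an optional [..].
PATH = re.compile(r"(?:/[A-Za-z_]\w*(?:\[[^\]]*\])?)+")


def paths_ignoring_whitespace(text):
    # Without whitespace, the two statements' paths fuse into one token.
    return PATH.findall(re.sub(r"\s+", "", text))


def split_statements(text):
    # With an explicit ';' terminator, statement boundaries are unambiguous.
    # (Naive: a ';' inside a string literal would break this.)
    return [s.strip() for s in text.split(";") if s.strip()]
```

Applied to the example above, paths_ignoring_whitespace returns the single path /path/to/value/path/to/another/value. The ';'-terminated version splits into the two intended statements.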
The same problem happens in a second place:
for_all $var in /path /some/other/path > $var/subpath
This is even a bit hard to read for a human, because the space after /path is easily overlooked. Both the whitespace sensitivity and the readability problem could easily be solved by replacing for_all with the every … in … satisfies syntax of XPath.
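With the XPath form, the keyword satisfies marks where the binding path ends, so even a path containing spaces inside a predicate splits cleanly. Below is a minimal Python sketch, assuming a hypothetical parse_every() helper that only handles a single, non-nested quantifier.

```python
import re

# 'satisfies' acts as the delimiter between the binding path and the body.
EVERY = re.compile(r"^every\s+\$(\w+)\s+in\s+(.+?)\s+satisfies\s+(.+)$")


def parse_every(expr):
    """Split 'every $v in <path> satisfies <condition>' into its parts."""
    m = EVERY.match(expr.strip())
    if m is None:
        raise ValueError(f"not a quantified expression: {expr!r}")
    variable, path, condition = m.groups()
    return variable, path, condition
```

For the example above, parse_every("every $var in /path satisfies /some/other/path > $var/subpath") yields ("var", "/path", "/some/other/path > $var/subpath").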
Node ids in archetype/reference model objects
In archetypes, some nodes have node ids even though the corresponding reference model object has none. This is tricky, because a valid path to an archetype node, converted to XPath, is NOT a valid path to the corresponding reference model objects. For example, the context attribute of a Composition is an EVENT_CONTEXT, which has no archetype node id in the reference model. But it always has one in the ADL/AOM. So if you write the path /context[id2], you can convert it to XPath as /composition/context[@archetype_node_id = 'id2']. But this results in an empty node set, because there is no matching archetype_node_id attribute. Writing just /context instead works.
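The effect is easy to reproduce with Python's xml.etree.ElementTree. The XML below is a hypothetical, simplified rendering of a Composition, not the official openEHR XML schema. Its context element has no archetype_node_id, because EVENT_CONTEXT is not LOCATABLE.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: the archetype gives context the id id2, but the
# RM object itself carries no archetype_node_id attribute.
COMPOSITION = """<composition archetype_node_id="id1">
  <context>
    <start_time>2016-01-01T10:00:00</start_time>
  </context>
</composition>"""

root = ET.fromstring(COMPOSITION)

# Literal conversion of /context[id2]: matches nothing.
with_id = root.findall("context[@archetype_node_id='id2']")
# The same path without the node id predicate: finds the context element.
without_id = root.findall("context")
```

Here with_id is an empty list, while without_id contains the one context element.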
So, there are several options to address this in the specification, for example:
1. Specify that paths to non-locatables should NOT have a [idx] predicate, even though the id in the archetype is present
2. Specify that paths to non-locatables can have a [idx] predicate, but it should be ignored in implementations
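Option 2 could look roughly like the following Python sketch. NON_LOCATABLE_ATTRIBUTES is a hypothetical stand-in for real model knowledge. An actual implementation would look up the type of each attribute in the reference model rather than going by attribute name. Only simple steps of the form name or name[idN] are supported.

```python
import re

# Hypothetical model knowledge: attributes whose values are not LOCATABLE.
NON_LOCATABLE_ATTRIBUTES = {"context"}

STEP = re.compile(r"^(\w+)(?:\[(id\d+(?:\.\d+)*)\])?$")


def apath_to_xpath_ignoring_ids(apath):
    """Option 2 sketch: drop [idN] on non-locatable steps, keep it elsewhere."""
    steps = []
    for step in apath.strip("/").split("/"):
        m = STEP.match(step)
        if m is None:
            raise ValueError(f"unsupported path step: {step!r}")
        name, node_id = m.groups()
        if node_id is None or name in NON_LOCATABLE_ATTRIBUTES:
            steps.append(name)
        else:
            steps.append(f"{name}[@archetype_node_id = '{node_id}']")
    return "/" + "/".join(steps)
```

With this, /context[id2] becomes /context. A path through locatable nodes, such as /content[id3]/items[id4], keeps its predicates.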
Option 2 is harder to implement, because you can no longer convert from Apath to XPath without knowledge of the model. But as Apath expressions are not new, I expect some other people will have an opinion on this.
Regards,
Pieter Bos