Hi Bert,
Why should the validator need to continue traversing the instance?
Hi Pablo, because the attributes often hold complex openEHR datatypes too, so the validator needs to check those complex datatypes in the attributes as well, and those datatypes can in turn contain complex datatypes. In the case of this example, Dv_Text matches {*}, you'll need to check everything, every structure, until you reach the leaf nodes, which in this example can be anything. Only then can you be sure that the data set is openEHR compliant.
That was my point
The validation that needs to reach leaf nodes is not the archetype validation, but the IM structure validation. That has nothing to do with the open constraint {*} in the archetype. In fact, that validation can be done completely without considering the archetype. What I said about using the XSD is just one way to implement it; you can also do it in code.
The thing is that a DvText can have the attribute mappings, inside which you can find the attribute purpose, of type DvCodedText, which again can have an attribute mappings, which can again have an attribute purpose, etc.
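To make that chain concrete, here is a minimal Python sketch. It assumes heavily simplified stand-ins for the RM classes (DvText, DvCodedText, and a TermMapping that sits between mappings and purpose); the field set and the helpers build_chain and nesting_depth are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DvText:
    value: str
    mappings: List["TermMapping"] = field(default_factory=list)


@dataclass
class DvCodedText(DvText):
    code_string: str = ""  # simplified stand-in for the real defining_code


@dataclass
class TermMapping:
    match: str = "="
    purpose: Optional[DvCodedText] = None


def build_chain(depth: int) -> DvText:
    """Build a DvText whose leaf is `depth` mappings/purpose hops away."""
    if depth == 0:
        return DvText(value="root")
    node = DvCodedText(value="leaf", code_string="at0000")
    for level in range(depth - 1):
        node = DvCodedText(
            value=f"level-{level}",
            code_string="at0000",
            mappings=[TermMapping(purpose=node)],
        )
    return DvText(value="root", mappings=[TermMapping(purpose=node)])


def nesting_depth(text: DvText) -> int:
    """Count the mappings/purpose hops from `text` to its deepest leaf."""
    deepest = 0
    stack = [(text, 0)]
    while stack:
        current, level = stack.pop()
        deepest = max(deepest, level)
        for mapping in current.mappings:
            if mapping.purpose is not None:
                stack.append((mapping.purpose, level + 1))
    return deepest
```

Every instance built this way is finite, but nothing in the types themselves bounds how deep it can go; that is the situation "DvText matches {*}" leaves open.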
I got it
So the leaf node can be far away, and the data can still be compliant with the statement DvText matches {*}, and a 100% compliant validator will need to follow all these steps. Of course this is not a normal situation, but it can happen. As said, we cannot always control incoming data sets. There may be buggy software in the ecosystem where a kernel runs.
That really depends on the implementation. Let's say the system doesn't control the input, so you can receive anything, for example binary data where you expect a dv_quantity. In that case, what I implicitly proposed is a two-phase validator: first syntactic (against the IM; yes, we need to reach leaf nodes here!), second semantic (IMO we can prune the validation when we reach stuff like {*}). If the first phase returns invalid, there's no need to execute the second. If you do execute the second, you'll never run into infinite recursion, because of the pruning.
Sorry, maybe I'm not explaining myself clearly; it is difficult to show this over email. Maybe others can confirm or refute this.
To be safe, and with feasibility in mind, a validator would need to stop validating at some arbitrary point, even though there is no error. So a validator which follows the rules 100% is dangerous! It can crash a system.
With a two-phase validator, I don't know of any case where you either don't cover 100% and might accept invalid data as valid, or cover 100% and end with a stack overflow. Finding one counterexample would be enough to invalidate my proposal.
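Roughly what I have in mind, as a Python sketch rather than a real implementation. Everything here is an assumption made up for the example: the instance is a nested dict with a "_type" key, IM is a tiny hypothetical slice of the reference model (not the real RM or XSD), and the archetype is reduced to a dict where "*" stands for the open constraint {*} and a list gives the allowed leaf values. Phase 1 only checks attribute names and types, not required attributes or cardinalities.

```python
from typing import Any, Dict, List

# Hypothetical, heavily simplified slice of the IM:
# type name -> {attribute name: expected type}. Not the real openEHR RM.
IM: Dict[str, Dict[str, str]] = {
    "DV_TEXT": {"value": "str", "mappings": "list[TERM_MAPPING]"},
    "DV_CODED_TEXT": {"value": "str", "mappings": "list[TERM_MAPPING]",
                      "code_string": "str"},
    "TERM_MAPPING": {"match": "str", "purpose": "DV_CODED_TEXT"},
    "ELEMENT": {"name": "DV_TEXT", "value": "DV_TEXT"},
}
# Which concrete types are acceptable where a given type is expected.
SUBTYPES = {"DV_TEXT": {"DV_TEXT", "DV_CODED_TEXT"}}


def validate_im(node: Any, expected: str) -> List[str]:
    """Phase 1 (syntactic): check the instance against the IM, down to the leaves."""
    if expected == "str":
        return [] if isinstance(node, str) else [
            f"expected string, got {type(node).__name__}"]
    if expected.startswith("list["):
        if not isinstance(node, list):
            return [f"expected list, got {type(node).__name__}"]
        inner = expected[5:-1]
        return [e for item in node for e in validate_im(item, inner)]
    if not isinstance(node, dict):
        return [f"expected {expected}, got {type(node).__name__}"]
    actual = node.get("_type", expected)
    if actual not in SUBTYPES.get(expected, {expected}):
        return [f"{actual} is not a {expected}"]
    errors: List[str] = []
    for attr, value in node.items():
        if attr == "_type":
            continue
        if attr not in IM[actual]:
            errors.append(f"{actual} has no attribute {attr}")
        else:
            errors.extend(validate_im(value, IM[actual][attr]))
    return errors


def validate_am(node: Any, constraint: Any) -> List[str]:
    """Phase 2 (semantic): check archetype constraints, pruning at {*}."""
    if constraint == "*":
        return []  # open constraint: do not descend, phase 1 covered this subtree
    if isinstance(constraint, list):  # allowed leaf values
        return [] if node in constraint else [f"{node!r} not in {constraint}"]
    if not isinstance(node, dict):
        return [f"expected an object, got {type(node).__name__}"]
    errors: List[str] = []
    for attr, sub in constraint.items():
        if attr not in node:
            errors.append(f"missing attribute {attr}")
        else:
            errors.extend(validate_am(node[attr], sub))
    return errors


def validate(instance: Any, rm_type: str, constraint: Any) -> List[str]:
    errors = validate_im(instance, rm_type)
    if errors:
        return errors  # phase 1 failed, so phase 2 is skipped
    return validate_am(instance, constraint)


archetype = {"name": {"value": ["Diagnosis"]}, "value": "*"}
element = {
    "_type": "ELEMENT",
    "name": {"_type": "DV_TEXT", "value": "Diagnosis"},
    "value": {
        "_type": "DV_TEXT",
        "value": "flu",
        "mappings": [{"_type": "TERM_MAPPING", "match": "=",
                      "purpose": {"_type": "DV_CODED_TEXT", "value": "billing",
                                  "code_string": "at0001"}}],
    },
}
print(validate(element, "ELEMENT", archetype))
```

The recursion depth of phase 2 is bounded by the depth of the archetype, not of the data, because it stops at "*". Phase 1 is the only one that follows the data all the way down.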
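On the stack overflow part: the phase that has to reach the leaves does not need to be recursive in the programming sense. Below is a sketch, assuming the instance is a nested dict/list structure and using an explicit stack instead of function recursion. The names check_leaves, deep_text and LEAF_TYPES are made up for the example, and the optional max_depth cap is a safety switch I am adding for illustration, not something the specifications define.

```python
from typing import Any, List, Optional, Tuple

# Primitive types accepted as leaves in this sketch (an assumption).
LEAF_TYPES = (str, int, float, bool)


def check_leaves(instance: Any, max_depth: Optional[int] = None) -> List[str]:
    """Walk the whole instance with an explicit stack and report bad leaves."""
    errors: List[str] = []
    stack: List[Tuple[Any, int]] = [(instance, 0)]
    while stack:
        node, depth = stack.pop()
        if max_depth is not None and depth > max_depth:
            errors.append(f"depth {depth}: nesting deeper than {max_depth}")
            continue  # give up on this subtree instead of descending further
        if isinstance(node, dict):
            stack.extend((value, depth + 1) for value in node.values())
        elif isinstance(node, list):
            stack.extend((item, depth + 1) for item in node)
        elif not isinstance(node, LEAF_TYPES):
            errors.append(f"depth {depth}: unsupported leaf {type(node).__name__}")
    return errors


def deep_text(hops: int) -> dict:
    """Build a mappings/purpose chain `hops` levels long, without recursion."""
    node: dict = {"value": "leaf"}
    for _ in range(hops):
        node = {"value": "x", "mappings": [{"purpose": node}]}
    return node
```

With this shape, a finite instance always terminates and only uses heap memory proportional to its size, so a very deep but valid chain is slow rather than fatal. Whether to reject data beyond some depth then becomes a policy decision (max_depth) and not something forced by the call stack.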
That was my point.
You are right in your statement that, when a part of an archetype is wildcarded, the XSD is the place to find the validation rules.
Maybe the problem is trying to validate against the archetype first and then against the IM. I think it should be IM first and AM second. But of course, I may have overlooked some pathological case, and this might not work in 100% of the cases.
Another thing that might be helpful is not to use archetypes directly but to use OPTs. I learned that the hard way. OPTs can contain the whole structure and constraints of specific compositions. So if someone specifies DV_TEXT in the OPT, my interpretation is that they don't need a DV_CODED_TEXT there. Also, an OPT is all in one file, while with archetypes you have to deal with slots (argghhhh). In fact, right now I'm changing all my systems to add OPT support. Simpler to validate, simpler to query.
Cheers,
Pablo.
Best regards
Bert