Validating an object against its archetype

Jim Alateras wrote:

Question 1

I use the following code fragment to create an object from an archetype:

rmobj = archetype.buildRMObject(valueMap, errorMap, sysmap());

One of my constraints for this archetype is optional, as shown below

ELEMENT[at0004] occurrences matches {0..1} matches {
    value matches {
        DV_TEXT matches {
            value matches {/.+/}
        }
    }
}

In the valueMap I don't supply a value for this element and therefore it is not created. The 'rmobj' instance is created as expected.

At some later stage, I want to record a value against ELEMENT[at0004]. Do I need to manipulate 'rmobj' directly, as shown below,

Account account = (Account)rmobj;
Element element = new Element("isDebitAccount",
    new DvText("isDebitAccount"), isDebitAccount);
account.getDetails().items().add(element);

You can manipulate the in-memory object as you normally would in non-archetype-based software. But by doing that you risk leaving logic specific to certain archetypes in your code, and that code would violate the archetype's constraints if, for example, the details structure is changed in a new definition of the archetype. The result is that your software cannot be as adaptive as it could be, because it depends on a particular archetype.

or can I create it using a path expression?

No such feature exists at the moment.

Instead of modifying the object directly, you should perhaps create a new one using archetype.buildRMObject(). The reasons are:
1. You don't want your code to hold specific knowledge of the details of particular archetypes.
2. Since the high-level objects (Composition, Party, EHR, etc.) are always versioned, you can always create new ones instead of overwriting the old ones.

Rong,

I just want to discuss this a bit further. From a programming perspective, would you work with valueMap objects (where the key is a path) and only create an instance of an information model object from an archetype when you have collected all the required information?

Yes, this is exactly how I would like to improve the current object creation logic of the kernel. Using a path, more specifically a runtime path, as the key is probably a more sensible way to organize input values. The key benefit is that it allows more than one data node to be created from one archetype node (e.g. under a CMultipleAttribute).
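
As a rough illustration of collecting values by path before building anything, here is a minimal Java sketch. The class PathValueCollector, its methods, and the idea of a fixed set of required paths are assumptions made for the example; none of them is part of the kernel, and the call to archetype.buildRMObject() is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not a kernel class): gathers input values keyed by path
// and reports whether every required path has been supplied.
class PathValueCollector {
    private final Set<String> requiredPaths;
    private final Map<String, Object> values = new LinkedHashMap<>();

    PathValueCollector(Set<String> requiredPaths) {
        this.requiredPaths = new LinkedHashSet<>(requiredPaths);
    }

    // Records (or overwrites) the value for one path.
    void put(String path, Object value) {
        values.put(path, value);
    }

    // Required paths that have no value yet.
    Set<String> missing() {
        Set<String> missing = new LinkedHashSet<>(requiredPaths);
        missing.removeAll(values.keySet());
        return missing;
    }

    boolean isComplete() {
        return missing().isEmpty();
    }

    // Returns a copy suitable for handing to an object builder; refuses to do so
    // while required values are still missing.
    Map<String, Object> toValueMap() {
        if (!isComplete()) {
            throw new IllegalStateException("Missing values for: " + missing());
        }
        return new LinkedHashMap<>(values);
    }
}
```

The application would keep calling put() as the user enters data and only ask for the value map, and hence for an information model object, once isComplete() returns true.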

Alternatively, if you are performance sensitive, do you work with the information model and use the archetype for validation (i.e. validate the model against the constraints)?

Suppose archetypes were Java classes and rim objects were instances of those classes. The rim objects should then comply with the constraints in archetypes, just as Java objects are always valid according to their class definitions, because the Java runtime environment enforces this. We can perhaps achieve the same with the kernel by only allowing rim objects to be created directly from archetypes; then there is no need to validate rim objects against archetypes.

On the other hand, if one would like to validate input data without creating any rim objects, this can be done on raw data (non-rim objects) at the leaf constraint level, where the constraints are mostly about the value ranges of primitive data types.
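
To make the leaf-level idea concrete, here is a small Java sketch that checks a raw string against a pattern constraint such as the /.+/ used in the ELEMENT[at0004] example, without creating any rim object. LeafPatternConstraint is a hypothetical class, and treating the ADL pattern as a whole-string java.util.regex match is an assumption of the sketch.

```java
import java.util.regex.Pattern;

// Hypothetical leaf-level check (not a kernel class): validates raw input
// against a regular-expression constraint before any object is built.
class LeafPatternConstraint {
    private final Pattern pattern;

    // regex is the pattern without the surrounding slashes, e.g. ".+" for /.+/
    LeafPatternConstraint(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // True when the raw value is present and the whole value matches the pattern.
    boolean isValid(String raw) {
        return raw != null && pattern.matcher(raw).matches();
    }
}
```

With new LeafPatternConstraint(".+"), an empty string or a missing value is rejected and any non-empty single-line text is accepted.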

For other types of validation, which usually involve large objects and their internal structures (defined by archetypes), I am not sure it is worth the risk of coupling to archetypes for better performance (object creation in memory is pretty fast anyway).

[Apologies if I haven't quite grasped the core concepts of building applications using archetypes]

No problem, we are all exploring :-)

It would be interesting to see others' views on this...

Rong

Thomas Beale wrote:

Hi Jim, Rong,

I am not yet up to speed on the Java code, but will give some generic answers, based on our previous Australian kernel implementation (based on the pre-openEHR GeHR models).

Thomas,

Thanks for sharing this with us.

Until we get a proper standardised API defined for this, you can do it how you like - as long as the archetype is respected. In our GeHR implementation, we had two ways of doing it, approximately as follows:

1. insert_at_path(some_path, an_element)
Note that the path here is the runtime path, which is always unique. You can thus always insert exactly where you want.

2. routines which navigated a cursor through a structure and then allowed an insert once you had arrived where you wanted. This kind of API had a lot of routines, and was a lot like a data structure library.

I now believe the path-based approach to be clearer, more concise (i.e. smaller API) and more in line with how people think today (i.e. due to XML-indoctrination ;-). But neither is more right, technically speaking.

The main thing is that the insert_at_path() (or whatever you call it) call must check the relevant archetype node that the insertion is allowable, and it should fail if it is not. Your code should always know in advance what node this is, due to the choosing of archetypes beforehand (e.g. by the GUI forms).
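
A minimal Java sketch of an insert that consults the archetype before accepting a node might look as follows. ArchetypeCheckedStore, its allow() registration method, and the use of a plain upper occurrence bound are all assumptions for illustration; this is not the GeHR API or the Java kernel API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: every insertion is checked against the occurrences
// allowed by the archetype node it claims to instantiate.
class ArchetypeCheckedStore {
    // Upper occurrence bound per "path|nodeId"; Integer.MAX_VALUE stands for "*".
    private final Map<String, Integer> maxOccurrences = new HashMap<>();
    // Items inserted so far, under the same key.
    private final Map<String, List<Object>> items = new HashMap<>();

    // Registers what the archetype allows at a given path.
    void allow(String path, String archetypeNodeId, int upper) {
        maxOccurrences.put(key(path, archetypeNodeId), upper);
    }

    // Fails if the archetype has no such node at the path, or if the node's
    // upper occurrence bound has already been reached.
    void insertAtPath(String path, String archetypeNodeId, Object item) {
        String key = key(path, archetypeNodeId);
        Integer upper = maxOccurrences.get(key);
        if (upper == null) {
            throw new IllegalArgumentException(
                "No archetype node " + archetypeNodeId + " at " + path);
        }
        List<Object> existing = items.computeIfAbsent(key, k -> new ArrayList<>());
        if (existing.size() >= upper) {
            throw new IllegalStateException(
                "Occurrences exceeded for " + archetypeNodeId + " at " + path);
        }
        existing.add(item);
    }

    int count(String path, String archetypeNodeId) {
        List<Object> existing = items.get(key(path, archetypeNodeId));
        return existing == null ? 0 : existing.size();
    }

    private static String key(String path, String archetypeNodeId) {
        return path + "|" + archetypeNodeId;
    }
}
```

For an optional node with occurrences 0..1, the first insert succeeds and a second one fails, which is the behaviour asked for above.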

This should work for single object node insertion since most of the object nodes with their runtime name are already created.

But I start to wonder if runtime path is enough for data creation from scratch, which requires:
1. path is unique so it can be used as key to group input values;
2. link to the original archetype node;

1) is satisfied by using current runtime path,
2) is not fully satisfied, since runtime path consists of mainly data node names (LOCATABLE.name()) and node names could either be the text value in the local language of archetype_node_id code of the node or explicitly set by the user. If more than one data node should be created by the same archetype node, the name should include both the text value and a modifier to make it unique. The algorithm used for generating unique name modifier can be predefined so it is possible to find the original archetype node. But it will fail if the name is explicitly set by a user.

Besides, it is preferred to have archetype node id as the direct link instead of some text values in local languages, which mainly is meant to be used by humans.

Perhaps a combination of the archetype node id and some kind of modifier based on a simple algorithm, e.g. a counter, could be used to form the path for object creation. It is a bit like an archetype path, but it is unique and easier to process.

Examples:

Suppose the "occurrences" of an archetype node [at0004] is "0..*", meaning that from zero to many nodes can be created from this archetype node. The archetype path to this node is:

/[at0001]/action[at0002]/representation[at0003]/items[at0004]/

The paths used to bind input data to the archetype node are as follows. Note the "-1" and "-2" suffixes on the archetype node id, which stand for the first and second object nodes created from the same archetype node.

/[at0001]/action[at0002]/representation[at0003]/items[at0004]-1/value/
/[at0001]/action[at0002]/representation[at0003]/items[at0004]-2/value/

This should also work well with multiple optional nodes from the same level or from different levels.

Paths for two data nodes created from each of two archetype nodes at the same level:
/[at0001]/action[at0002]/representation[at0003]/items[at0004]-1/value/
/[at0001]/action[at0002]/representation[at0003]/items[at0004]-2/value/
/[at0001]/action[at0002]/representation[at0003]/items[at0005]-1/value/
/[at0001]/action[at0002]/representation[at0003]/items[at0005]-2/value/

Paths for two data nodes created from each of two archetype nodes whose parent data nodes differ but originate from the same archetype node:
/[at0001]/action[at0002]/representation[at0003]/items[at0004]-1/items[at0005]-1/value/
/[at0001]/action[at0002]/representation[at0003]/items[at0004]-1/items[at0005]-2/value/
/[at0001]/action[at0002]/representation[at0003]/items[at0004]-2/items[at0006]-1/value/
/[at0001]/action[at0002]/representation[at0003]/items[at0004]-2/items[at0006]-2/value/
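
Taking the suffix notation in these examples at face value, the following Java sketch shows how such a binding path could be mapped back to the archetype path and to the instance counter. CounterPath and both method names are hypothetical, and the regular expression assumes plain four-digit 'at' codes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for the counter-suffixed paths proposed above.
class CounterPath {
    // A "[atNNNN]" qualifier immediately followed by a "-<n>" counter.
    private static final Pattern COUNTER = Pattern.compile("(\\[at\\d{4}\\])-\\d+");

    // Removes every counter, leaving the plain archetype path.
    static String toArchetypePath(String bindingPath) {
        return COUNTER.matcher(bindingPath).replaceAll("$1");
    }

    // Returns the counter attached to the first occurrence of the given node id,
    // or -1 when that node id carries no counter in the path.
    static int instanceIndex(String bindingPath, String nodeId) {
        Pattern p = Pattern.compile("\\[" + Pattern.quote(nodeId) + "\\]-(\\d+)");
        Matcher m = p.matcher(bindingPath);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }
}
```

Because the node id stays inside the brackets and the counter sits outside, the link to the original archetype node is recovered by simple string processing, independent of any local-language node name.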

Cheers,

Rong

Rong Chen wrote:

The main thing is that the insert_at_path() (or whatever you call it) call must check the relevant archetype node that the insertion is allowable, and it should fail if it is not. Your code should always know in advance what node this is, due to the choosing of archetypes beforehand (e.g. by the GUI forms).

This should work for single object node insertion since most of the object nodes with their runtime name are already created.

But I start to wonder if runtime path is enough for data creation from scratch, which requires:
1. path is unique so it can be used as key to group input values;
2. link to the original archetype node;

You are right - in my explanation, I did not try to be complete. There is a step of major importance which should be carried out by the assembled archetype structure in memory (i.e. after slot-filling has occurred), namely to generate a default data structure. This should be done by defining a method create_default() or similar, which can be called recursively down the archetype hierarchy; the effect of this method called on every C_OBJECT is to create the appropriate data instance, e.g. an OBSERVATION or ELEMENT object. The overall result is that in the vast majority of cases, there is a reasonable (often complete, apart from values) data object to work with. Then modifications can proceed by using paths.

However, even when following this design approach, there are occasions when this is not enough. Imagine the archetype in question is the openEHR BP measurement archetype; in this archetype, there is a subtree called 'protocol', which has existence = 0..1, i.e. optional. It would be reasonable for the create_default() method not to create this subtree at all in the data. But what if the user does want it (1 time out of 50)? Then what the software has to do is to call create_default() on the protocol subtree of the BP archetype, and attach the result into the correct place in the main data structure created with the original create_default() call.
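
The recursive default creation and the later attachment of an optional subtree can be sketched in Java roughly as below. ConstraintNode and DataNode are simplified, hypothetical stand-ins for C_OBJECT and for reference model data objects, and a single "mandatory" flag replaces the real existence and occurrences constraints.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for an archetype constraint node (C_OBJECT).
class ConstraintNode {
    final String nodeId;
    final boolean mandatory;
    final List<ConstraintNode> children = new ArrayList<>();

    ConstraintNode(String nodeId, boolean mandatory) {
        this.nodeId = nodeId;
        this.mandatory = mandatory;
    }

    // Adds a child constraint and returns this node to allow chaining.
    ConstraintNode add(ConstraintNode child) {
        children.add(child);
        return this;
    }

    // Creates the default data for this node, recursing into mandatory
    // children only; optional subtrees are left out.
    DataNode createDefault() {
        DataNode data = new DataNode(nodeId);
        for (ConstraintNode child : children) {
            if (child.mandatory) {
                data.children.add(child.createDefault());
            }
        }
        return data;
    }
}

// Hypothetical stand-in for a data node; it keeps the id of the archetype
// node it was created from.
class DataNode {
    final String archetypeNodeId;
    final List<DataNode> children = new ArrayList<>();

    DataNode(String archetypeNodeId) {
        this.archetypeNodeId = archetypeNodeId;
    }
}
```

When the user does want the optional part, the application calls createDefault() on that constraint subtree alone and adds the result to the children of the already created parent data node.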

In general I would never expect data to be created completely from scratch - the only archetype which would justify this is the notional 'any' archetype which allows absolutely anything.

During this data create/modify phase inside the kernel, there are both data instances and archetype instances. Each data instance has to be connected logically to its corresponding archetype node. This could be effected by pointers/references (which is what we did in our GeHR kernel of 3 years ago) or could just be tracked logically by using paths (every data node has embedded in it a value for the attribute archetype_node_id, inherited from the LOCATABLE class).

1) is satisfied by using current runtime path,
2) is not fully satisfied, since runtime path consists of mainly data node names (LOCATABLE.name()) and node names could either be the text value in the local language of archetype_node_id code of the node or explicitly set by the user. If more than one data node should be created by the same archetype node, the name should include both the text value and a modifier to make it unique.

Not 'could'; it must: it is a condition of correctness that no two sibling LOCATABLEs can ever have the same name value.

The algorithm used for generating unique name modifier can be predefined so it is possible to find the original archetype node. But it will fail if the name is explicitly set by a user.

Automatic generation (e.g. of "_1", "_2", or "(1)", "(2)" etc.) will work fine in many cases, but user setting has to be allowed as well. Everything will be fine as long as a uniqueness precondition is observed whenever the relevant insert() or append() method is called, as shown in the following signature:
    insert(new_item: LOCATABLE) is
       pre: not this.has(new_item)

where the 'has()' method compares LOCATABLE objects on the basis of their names.
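
In Java, which has no built-in preconditions, the same contract could be approximated as in the sketch below. NamedItemList is a hypothetical class that tracks sibling names only, and the "_1", "_2" suffix scheme is just one of the automatic generation options mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sibling container: names must be unique among siblings.
class NamedItemList {
    private final List<String> names = new ArrayList<>();

    // Compares on the basis of name only, like has() in the signature above.
    boolean has(String name) {
        return names.contains(name);
    }

    // Precondition: not has(name). A violation is reported as an exception.
    void insert(String name) {
        if (has(name)) {
            throw new IllegalArgumentException("Sibling name already in use: " + name);
        }
        names.add(name);
    }

    // Tries the base name first, then base_1, base_2, ... until one is free,
    // inserts it and returns the name actually used.
    String insertWithGeneratedName(String base) {
        String candidate = base;
        for (int i = 1; has(candidate); i++) {
            candidate = base + "_" + i;
        }
        insert(candidate);
        return candidate;
    }
}
```

A user-chosen name goes through insert() and is rejected if it clashes, while machine-generated names go through insertWithGeneratedName() and can never clash.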

Besides, it is preferred to have archetype node id as the direct link instead of some text values in local languages, which mainly is meant to be used by humans.

Perhaps a combination of the archetype node id and some kind of modifier based on a simple algorithm, e.g. a counter, could be used to form the path for object creation. It is a bit like an archetype path, but it is unique and easier to process.

Examples:

Suppose the "occurrences" of an archetype node [at0004] is "0..*", meaning that from zero to many nodes can be created from this archetype node. The archetype path to this node is:

/[at0001]/action[at0002]/representation[at0003]/items[at0004]/

The paths used to bind input data to the archetype node are as follows. Note the "-1" and "-2" suffixes on the archetype node id, which stand for the first and second object nodes created from the same archetype node.

/[at0001]/action[at0002]/representation[at0003]/items[at0004]-1/value/
/[at0001]/action[at0002]/representation[at0003]/items[at0004]-2/value/

This is the right kind of idea, but not technically legal. The [] characters delimit the qualifier part of each path segment (i.e. the optional part required to differentiate the children of an attribute which is a List<T> or similar). So any other characters which serve this purpose have to go inside it. In theory you would achieve this by doing [at0004-1]. However, in _archetype_ paths, the thing inside the [] has to be a legal 'at' code - 'at0004-1' is not. The underlying problem is that we are mixing lexical data (the "-1") with codes ('at0004'). In a runtime path on the other hand, the items inside the [] _are_ lexical; as Rong said above - they are the values of the codes, in the local language. You can legally add "-1" or whatever to these with no problem.

The downside may seem to be that runtime paths are language-dependent, and somehow not universal. This is indeed the case, and it is the intention: runtime paths are made of values chosen at runtime for the 'name' attribute in data nodes; by definition this is in some language and relates to the local context. The archetype_node_ids are always there to allow universal, language-independent archetype paths to be re-constructed.

- thomas

Rong

Some ideas from the experience we have had…

The main thing is that the insert_at_path() (or whatever you call it) call must check the relevant archetype node that the insertion is allowable, and it should fail if it is not. Your code should always know in advance what node this is, due to the choosing of archetypes beforehand (e.g. by the GUI forms).

Generally, if it is an insert it will be a section or an entry, so you need to check that there is a valid slot at the node specified by the path. If the slot is not specified, then it should go in the first valid slot.

This should work for single object node insertion since most of the object nodes with their runtime name are already created.

But I start to wonder if runtime path is enough for data creation from scratch, which requires:

  1. path is unique so it can be used as key to group input values;
  2. link to the original archetype node;

You are right - in my explanation, I did not try to be complete. There is a step of major importance which should be carried out by the assembled archetype structure in memory (i.e. after slot-filling has occurred), namely to generate a default data structure. This should be done by defining a method create_default() or similar, which can be called recursively down the archetype hierarchy; the effect of this method called on every C_OBJECT is to create the appropriate data instance, e.g. an OBSERVATION or ELEMENT object. The overall result is that in the vast majority of cases, there is a reasonable (often complete, apart from values) data object to work with. Then modifications can proceed by using paths.

However, even when following this design approach, there are occasions when this is not enough. Imagine the archetype in question is the openEHR BP measurement archetype; in this archetype, there is a subtree called ‘protocol’, which has existence = 0..1, i.e. optional. It would be reasonable for the create_default() method not to create this subtree at all in the data. But what if the user does want it (1 time out of 50)? Then what the software has to do is to call create_default() on the protocol subtree of the BP archetype, and attach the result into the correct place in the main data structure created with the original create_default() call.

Another approach is for create_default to build a ‘prototype’ in memory, which changes state when it is populated. Saving then skips any clusters and elements that are still prototypes. This simplifies implementation, unless any call to a path calls create_default when the node is not there.

In general I would never expect data to be created completely from scratch - the only archetype which would justify this is the notional ‘any’ archetype which allows absolutely anything.

During this data create/modify phase inside the kernel, there are both data instances and archetype instances. Each data instance has to be connected logically to its corresponding archetype node. This could be effected by pointers/references (which is what we did in our GeHR kernel of 3 years ago) or could just be tracked logically by using paths (every data node has embedded in it a value for the attribute archetype_node_id, inherited from the LOCATABLE class).

  1. is satisfied by using current runtime path,
  2. is not fully satisfied, since runtime path consists of mainly data node names (LOCATABLE.name()) and node names could either be the text value in the local language of archetype_node_id code of the node or explicitly set by the user. If more than one data node should be created by the same archetype node, the name should include both the text value and a modifier to make it unique.

Not ‘could’; it must: it is a condition of correctness that no two sibling LOCATABLEs can ever have the same name value.

Generally the ‘modifier’ should be set by the user, but this requires a temporary machine-generated name (e.g. with a number appended) until the user has chosen one.

The algorithm used for generating unique name modifier can be predefined so it is possible to find the original archetype node. But it will fail if the name is explicitly set by a user.

Automatic generation (e.g. of “_1”, “_2”, or “(1)”, “(2)” etc.) will work fine in many cases, but user setting has to be allowed as well. Everything will be fine as long as a uniqueness precondition is observed whenever the relevant insert() or append() method is called, as shown in the following signature:
    insert(new_item: LOCATABLE) is
       pre: not this.has(new_item)

where the ‘has()’ method compares LOCATABLE objects on the basis of their names.

Besides, it is preferred to have archetype node id as the direct link instead of some text values in local languages, which mainly is meant to be used by humans.

Perhaps a combination of the archetype node id and some kind of modifier based on a simple algorithm, e.g. a counter, could be used to form the path for object creation. It is a bit like an archetype path, but it is unique and easier to process.

Examples:

Suppose the “occurrences” of an archetype node [at0004] is “0..*”, meaning that from zero to many nodes can be created from this archetype node. The archetype path to this node is:

/[at0001]/action[at0002]/representation[at0003]/items[at0004]/

The paths used to bind input data to the archetype node are as follows. Note the “-1” and “-2” suffixes on the archetype node id, which stand for the first and second object nodes created from the same archetype node.

/[at0001]/action[at0002]/representation[at0003]/items[at0004]-1/value/
/[at0001]/action[at0002]/representation[at0003]/items[at0004]-2/value/

The syntax for an enumeration is available in XPath - we should stay with that.

This is the right kind of idea, but not technically legal. The [] characters delimit the qualifier part of each path segment (i.e. the optional part required to differentiate the children of an attribute which is a List<T> or similar). So any other characters which serve this purpose have to go inside it. In theory you would achieve this by doing [at0004-1]. However, in archetype paths, the thing inside the [] has to be a legal ‘at’ code - ‘at0004-1’ is not. The underlying problem is that we are mixing lexical data (the “-1”) with codes (‘at0004’). In a runtime path on the other hand, the items inside the [] are lexical; as Rong said above - they are the values of the codes, in the local language. You can legally add “-1” or whatever to these with no problem.

The downside may seem to be that runtime paths are language-dependent, and somehow not universal. This is indeed the case, and it is the intention: runtime paths are made of values chosen at runtime for the ‘name’ attribute in data nodes; by definition this is in some language and relates to the local context. The archetype_node_ids are always there to allow universal, language-independent archetype paths to be re-constructed.

- thomas

If you have any questions about using this list,
please send a message to d.lloyd@openehr.org
