# loss of type information in ID classes

**Category:** [Implementers (archive)](https://discourse.openehr.org/c/implementers-archive/158)

**Created:** 2007-02-07 04:52 UTC

**Views:** 2

**Replies:** 18

**URL:** https://discourse.openehr.org/t/loss-of-type-information-in-id-classes/14622

---

## Post #1 by @Andrew_Patterson

I am looking at a round trip of information through an XML serialization. Normally, type information can be retained in an XML serializer through the use of xsi:type (so where an attribute is defined as type PARTY_PROXY, we can assign an object of type PARTY_SELF to that attribute, do a round-trip serialization, and it will still be PARTY_SELF).

The OBJECT_ID hierarchy defines an abstract root class with a single attribute called 'value' that is a string. It then defines other types of IDs, such as VERSION_TREE_ID and ARCHETYPE_ID, that vary only in their interpretation of the 'value' attribute. This is fine because the typing information (i.e. which ID we are dealing with) is retained in serialization.

However, there is also a UID hierarchy that defines classes such as INTERNET_ID (com.microsoft.vista) and UUID (ae435abc-3424-...). It is objects of this hierarchy that VERSION_TREE_ID, ARCHETYPE_ID etc. are supposed to return when interpreting their 'value'.

For example, an OBJECT_VERSION_ID could be created with 'value' set to

`F7C5C7B7-75DB-4b39-9A1E-C0BA9BFDBDEC::87284370-2D4B-4e3d-A3F3-F303D2F4F34B::2`

The function 'creating_system_id' on this type is defined to return a UID from this value. However, without some magic, it has no way of knowing that 87284370-2D4B-4e3d.. is of type UUID and not an ISO_OID or INTERNET_ID.

The choices I see are:

a) is a magic routine needed here that can correctly guess the type based on the string value?
(which is probably not that hard to write; there aren't that many UID types)

b) should systems not need to know the types in these cases, and therefore the loss of typing is not a problem? (though if this is the case, UID needs to be changed to a concrete class rather than an abstract one)

c) is this a problem for the XML ITS rather than openEHR in general? (the XML serialization could be augmented with extra typing info)

Thoughts?

Andrew

---

## Post #2 by @thomas.beale

Andrew Patterson wrote:

> I am looking at a round trip of information through an XML serialization. Normally, type information can be retained in an XML serializer through the use of xsi:type (so where an attribute is defined as type PARTY_PROXY, we can assign an object of type PARTY_SELF to that attribute, do a round-trip serialization, and it will still be PARTY_SELF).
>
> The OBJECT_ID hierarchy defines an abstract root class with a single attribute called 'value' that is a string. It then defines other types of IDs, such as VERSION_TREE_ID and ARCHETYPE_ID, that vary only in their interpretation of the 'value' attribute. This is fine because the typing information (i.e. which ID we are dealing with) is retained in serialization.
>
> However, there is also a UID hierarchy that defines classes such as INTERNET_ID (com.microsoft.vista) and UUID (ae435abc-3424-...). It is objects of this hierarchy that VERSION_TREE_ID, ARCHETYPE_ID etc. are supposed to return when interpreting their 'value'.
>
> For example, an OBJECT_VERSION_ID could be created with 'value' set to
>
> `F7C5C7B7-75DB-4b39-9A1E-C0BA9BFDBDEC::87284370-2D4B-4e3d-A3F3-F303D2F4F34B::2`
>
> The function 'creating_system_id' on this type is defined to return a UID from this value. However, without some magic, it has no way of knowing that 87284370-2D4B-4e3d.. is of type UUID and not an ISO_OID or INTERNET_ID.
>
> The choices I see are:
>
> a) is a magic routine needed here that can correctly guess the type based on the string value? (which is probably not that hard to write; there aren't that many UID types)

this is what I have envisaged; as far as I know:

- Guids/uuids always follow the same pattern (I don't have it to hand, but everyone knows it) - fixed number of segments of hexadecimal digits and only '-' separators
- ISO oids only have '.' separators and numeric segments
- domain names have '.' separators and only alpha-numeric segments, and the top-level domain names are all alphabetic names (no numerics).

I don't believe there is any real danger of software getting an oid and a domain name confused, although I have not read all the relevant rules on this.

> b) should systems not need to know the types in these cases, and therefore the loss of typing is not a problem? (though if this is the case, UID needs to be changed to a concrete class rather than an abstract one)

this is also realistic, since the use of each kind of identifier is likely to be systematic in wide jurisdictions, e.g. the whole of Australia or the whole of the UK. In fact, the current openEHR model is a bit of a cop-out, since there are few if any standards for what kind of ids will be used in e-health networks around the world; this may change over the next few years, in which case openEHR may adjust. Note that an openEHR EHR has an EHR_STATUS object, which might be a smart place to put some kind of clues about identifier use in the EHR. We may also need some kind of application profile object to hold meta-information.

> c) is this a problem for the XML ITS rather than openEHR in general? (the XML serialization could be augmented with extra typing info)

although I am not an XML specialist, my general feeling is to use XML (particularly XSD) in pretty much its expected way; being too esoteric gets one into trouble.
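
To make the loss discussed in this thread concrete, here is a minimal Python sketch using only the standard library's `xml.etree.ElementTree`. The element names `uid` and `value` are assumptions loosely modelled on the openEHR XSD rather than copied from it, and `serialise_object_id`/`deserialise_object_id` are hypothetical helpers. The `xsi:type` attribute carries the OBJECT_ID subtype through the round trip, but the UID subtype of each `::`-separated segment exists only as unmarked text inside `value`, so whoever reads it back has to recover that subtype from the syntax.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI)
XSI_TYPE = f"{{{XSI}}}type"  # Clark notation for the xsi:type attribute


def serialise_object_id(type_name: str, value: str) -> str:
    """Write an OBJECT_ID the 'efficient' way: subtype as xsi:type, value as plain text."""
    elem = ET.Element("uid", {XSI_TYPE: type_name})
    ET.SubElement(elem, "value").text = value
    return ET.tostring(elem, encoding="unicode")


def deserialise_object_id(xml_text: str) -> Tuple[Optional[str], Optional[str]]:
    """Read back (OBJECT_ID subtype name, raw value string)."""
    elem = ET.fromstring(xml_text)
    return elem.get(XSI_TYPE), elem.findtext("value")


ovid = "F7C5C7B7-75DB-4b39-9A1E-C0BA9BFDBDEC::87284370-2D4B-4e3d-A3F3-F303D2F4F34B::2"
xml_text = serialise_object_id("OBJECT_VERSION_ID", ovid)
type_name, value = deserialise_object_id(xml_text)
# type_name survives the round trip, but nothing in xml_text says that the
# second '::' segment (the creating system id) is a UUID rather than an ISO_OID.
```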
Since these fields are string fields with typing only visible in the object model, I guess completely orthodox XSD won't help (and it doesn't; see our current XSD: [http://svn.openehr.org/specification/BRANCHES/Release-1.1-candidate/publishing/its/XML-schema/documentation/BaseTypes.xsd.html_h1689213245.html](http://svn.openehr.org/specification/BRANCHES/Release-1.1-candidate/publishing/its/XML-schema/documentation/BaseTypes.xsd.html_h1689213245.html)). But I don't know if this matters. I think it is reasonable to assume that openEHR XML will always be processed by software that knows at least the reference model classes it is expecting, so it will always do the right thing. In that case, magic guessing based on syntax, or a priori site-based or EHR-based clues, will still be required, which I think is safe enough if my statements above are correct.

\- thomas

---

## Post #3 by @Andrew_Patterson

> a) is a magic routine needed here that can correctly guess the type based on the string value? (which is probably not that hard to write; there aren't that many UID types)
>
> this is what I have envisaged; as far as I know:
>
> - Guids/uuids always follow the same pattern (I don't have it to hand, but everyone knows it) - fixed number of segments of hexadecimal digits and only '-' separators
> - ISO oids only have '.' separators and numeric segments
> - domain names have '.' separators and only alpha-numeric segments, and the top-level domain names are all alphabetic names (no numerics).
>
> I don't believe there is any real danger of software getting an oid and a domain name confused, although I have not read all the relevant rules on this

ok - this makes sense.

Any thoughts on UID as an abstract class vs. a concrete one? In a system that doesn't care about the type of UID, a receiving system still has to make a choice as to the type of the UID just to construct one (because UID is declared abstract).
I can't see any downside to allowing UID to be a concrete, instantiable class.

Andrew

---

## Post #4 by @thomas.beale

Andrew Patterson wrote:

>> a) is a magic routine needed here that can correctly guess the type based on the string value? (which is probably not that hard to write; there aren't that many UID types)
>>
>> this is what I have envisaged; as far as I know:
>>
>> - Guids/uuids always follow the same pattern (I don't have it to hand, but everyone knows it) - fixed number of segments of hexadecimal digits and only '-' separators
>> - ISO oids only have '.' separators and numeric segments
>> - domain names have '.' separators and only alpha-numeric segments, and the top-level domain names are all alphabetic names (no numerics).
>>
>> I don't believe there is any real danger of software getting an oid and a domain name confused, although I have not read all the relevant rules on this
>
> ok - this makes sense.
>
> Any thoughts on UID as an abstract class vs. a concrete one? In a system that doesn't care about the type of UID, a receiving system still has to make a choice as to the type of the UID just to construct one (because UID is declared abstract). I can't see any downside to allowing UID to be a concrete, instantiable class.

Andrew, sorry to take so long to respond. Not sure what you want to achieve here: even if the UID type were concrete, you still have to instantiate it following some particular model of an id; all that we have done is limit those to GUID, Oid and Internet name. Are you suggesting 'anything goes' is preferable?

\- thomas

---

## Post #5 by @Andrew_Patterson

> sorry to take so long to respond. Not sure what you want to achieve here: even if the UID type were concrete, you still have to instantiate it following some particular model of an id; all that we have done is limit those to GUID, Oid and Internet name. Are you suggesting 'anything goes' is preferable?

I'm suggesting you could instantiate it with no constraints on the model of the id; this may be what a receiving system needs to do because it has no knowledge of the actual correct model (given that the information has been lost in transit and it may not want to guess). Otherwise, even though the receiving system has no information to base it on, you force it to _choose_ one of the concrete instantiations, even though the system may want to deal with it using only the semantics of a UID, i.e. the sending system has given me a unique identifier; I don't care how it came up with the identifier string or what format it is, I just want to use it.

Andrew

---

## Post #6 by @thomas.beale

Andrew Patterson wrote:

>> sorry to take so long to respond. Not sure what you want to achieve here: even if the UID type were concrete, you still have to instantiate it following some particular model of an id; all that we have done is limit those to GUID, Oid and Internet name. Are you suggesting 'anything goes' is preferable?
>
> I'm suggesting you could instantiate it with no constraints on the model of the id; this may be what a receiving system needs to do because it has no knowledge of the actual correct model (given that the information has been lost in transit and it may not want to guess). Otherwise, even though the receiving system has no information to base it on, you force it to _choose_ one of the concrete instantiations, even though the system may want to deal with it using only the semantics of a UID, i.e. the sending system has given me a unique identifier; I don't care how it came up with the identifier string or what format it is, I just want to use it.

But you don't need to have a concrete class to do that: you can statically declare a reference to be of an abstract type, and simply access whatever features are defined in that class, which is just the attribute 'value' in the case of UID.
So although your variable (say my_uid) will actually be attached to a UUID object, it will be of type UID, and will act accordingly.

\- thomas

---

## Post #7 by @Andrew_Patterson

> But you don't need to have a concrete class to do that: you can statically declare a reference to be of an abstract type, and simply access whatever features are defined in that class, which is just the attribute 'value' in the case of UID. So although your variable (say my_uid) will actually be attached to a UUID object, it will be of type UID, and will act accordingly.

Yes, I understand that once you have a live object graph in whatever environment, it is no longer important what the actual object type is. My specific use case is at the boundary to a system accepting XML RM objects: the deserializer needs to construct in memory a concrete instance of a UID class, based purely on the XML structure it is presented with. Given that the typing information has been lost in the serialization _to_ XML, it is forced to guess the proper concrete class to instantiate using a magic algorithm. I don't have a problem with that, but thought it might also be useful if it could alternatively say "hey, I don't really care whether I was sent a UUID etc., I'll just instantiate this concrete UID class with the relevant value and carry on".

Andrew

---

## Post #8 by @thomas.beale

Andrew Patterson wrote:

>> But you don't need to have a concrete class to do that: you can statically declare a reference to be of an abstract type, and simply access whatever features are defined in that class, which is just the attribute 'value' in the case of UID. So although your variable (say my_uid) will actually be attached to a UUID object, it will be of type UID, and will act accordingly.
>
> Yes, I understand that once you have a live object graph in whatever environment, it is no longer important what the actual object type is.
> My specific use case is at the boundary to a system accepting XML RM objects: the deserializer needs to construct in memory a concrete instance of a UID class, based purely on the XML structure it is presented with. Given that the typing information has been lost in the serialization _to_ XML, it is forced to guess the proper concrete class to instantiate using a magic algorithm. I don't have a problem with that, but thought it might also be useful if it could alternatively say "hey, I don't really care whether I was sent a UUID etc., I'll just instantiate this concrete UID class with the relevant value and carry on".

Sorry, my mistake; I had forgotten that this was the root question you were asking in the original mail. My natural response is: why should the XML tail wag the dog? It doesn't usually do software any good... but thinking practically... the real problem is that we are using 'efficient' XSD as shown in http://svn.openehr.org/specification/BRANCHES/Release-1.1-candidate/publishing/its/XML-schema/documentation/BaseTypes.xsd.html_h619733846.html; this is space-efficient, but loses typing at the leaf level, as you say. Solutions seem to be:

I am still not convinced this is a problem, however; the deserialiser will deserialise an entire ObjectId in one go, and for that it does have typing information. So let's say a HIER_OBJECT_ID is found; the string value is given to a constructor of the HIER_OBJECT_ID class, which pulls it apart according to the syntax of that class (see the online spec http://svn.openehr.org/specification/BRANCHES/Release-1.1-candidate/publishing/architecture/rm/support_im.pdf for details). The constructor will have to figure out what to do with what it sees in the root. I don't see how it would help for it to instantiate a plain UID when it is likely to need to know what it has; it will have to use the magic code to determine what it is looking at and create the right thing.
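
The "magic code" can be sketched in Python as follows, assuming the syntax rules Thomas listed earlier in the thread: UUIDs as 8-4-4-4-12 groups of hexadecimal digits separated by '-', ISO OIDs as '.'-separated numeric segments, and INTERNET_IDs as '.'-separated labels of letters, digits and inner hyphens. As a simplification, the top-level-label rule is replaced by "contains at least one letter", which is enough to stop an OID from ever matching. Class names mirror the openEHR RM rather than PEP 8, and `string_to_uid`, `hier_object_id_parts` and `object_version_id_parts` are hypothetical helpers, not part of any openEHR library; the `::` layouts follow the examples used in this thread.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical regexes encoding the syntax rules quoted in the thread.
UUID_RE = re.compile(r"[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}")
OID_RE = re.compile(r"[0-9]+(?:\.[0-9]+)+")
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"
INTERNET_ID_RE = re.compile(rf"{LABEL}(?:\.{LABEL})+")


@dataclass(frozen=True)
class UID:
    value: str


class UUID(UID):
    pass


class ISO_OID(UID):
    pass


class INTERNET_ID(UID):
    pass


def string_to_uid(s: str) -> UID:
    """Guess the concrete UID subtype from syntax alone: UUID, then ISO_OID, then INTERNET_ID."""
    if UUID_RE.fullmatch(s):
        return UUID(s)
    if OID_RE.fullmatch(s):
        return ISO_OID(s)
    # Requiring a letter keeps an all-numeric OID from ever looking like a domain name.
    if INTERNET_ID_RE.fullmatch(s) and any(c.isalpha() for c in s):
        return INTERNET_ID(s)
    raise ValueError(f"unrecognised UID syntax: {s!r}")


def hier_object_id_parts(value: str) -> Tuple[UID, Optional[str]]:
    """HIER_OBJECT_ID value: root ['::' extension]; split at the first '::'."""
    root, sep, extension = value.partition("::")
    return string_to_uid(root), (extension if sep else None)


def object_version_id_parts(value: str) -> Tuple[UID, UID, str]:
    """OBJECT_VERSION_ID value: object_id '::' creating_system_id '::' version_tree_id."""
    object_id, creating_system_id, version_tree_id = value.split("::")
    return string_to_uid(object_id), string_to_uid(creating_system_id), version_tree_id


# The creating_system_id from Andrew's example comes back as a UUID instance.
creating_system = object_version_id_parts(
    "F7C5C7B7-75DB-4b39-9A1E-C0BA9BFDBDEC::87284370-2D4B-4e3d-A3F3-F303D2F4F34B::2"
)[1]
```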
At least that magic code will be shared across all OBJECT_ID subtypes (or more).

\- thomas

---

## Post #9 by @Heath_Frankel2

Tom,

I can see what Andrew is saying here. We either need to have some fancy logic to determine which subclass of UID to construct, make UID concrete, or just treat the attributes of type UID as strings. I guess the patterns for the 3 subtypes are reasonably well known and different enough to be able to determine which subclass to create in a UID factory.

The alternative is to change OBJECT_ID to not have a value attribute and specify the more precise attributes in the subclasses, so that the UID type can be provided in the XML. However, this will make the XML more verbose.

I really wonder: what is the value of having the UID subtypes at all, apart from pattern validation?

Heath

---

## Post #10 by @thomas.beale

Heath Frankel wrote:

> Tom,
>
> I can see what Andrew is saying here. We either need to have some fancy logic to determine which subclass of UID to construct, make UID concrete, or just treat the attributes of type UID as strings. I guess the patterns for the 3 subtypes are reasonably well known and different enough to be able to determine which subclass to create in a UID factory.
>
> The alternative is to change OBJECT_ID to not have a value attribute and specify the more precise attributes in the subclasses, so that the UID type can be provided in the XML. However, this will make the XML more verbose.

well, we don't want to do this; the whole idea is that we have efficient representation without losing semantics. The real problem is that XML is not good at doing these two together, and that's where the problem has to be solved, in my view. The object model is 'right' in its own terms.

> I really wonder: what is the value of having the UID subtypes at all, apart from pattern validation?

well, at some point in the system, the logic is going to need to be able to create a new Guid, dereference an Oid etc.
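
Thomas's answer is that the subtypes carry behaviour, not just validation, and the extensibility argument that follows rests on being able to add a subtype without touching UID. Both points can be illustrated with a rough Python sketch, under stated assumptions: the self-registering factory (`__init_subclass__` plus a hypothetical `from_string`) and the `generate` and `arcs` members are illustrations only, not openEHR API, and INTERNET_ID is omitted for brevity. `UUID.generate` stands in for "create a new Guid" and `ISO_OID.arcs` for OID-specific processing; declaring another subclass with its own `pattern` is enough for `from_string` to recognise it.

```python
import re
import uuid
from typing import ClassVar, List, Type


class UID:
    """Abstract UID: only 'value' is common; each subtype supplies its own syntax."""

    pattern: ClassVar[re.Pattern]
    _subtypes: ClassVar[List[Type["UID"]]] = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        UID._subtypes.append(cls)  # declaring a subclass is all it takes to register it

    def __init__(self, value: str) -> None:
        if not self.pattern.fullmatch(value):
            raise ValueError(f"{value!r} is not a valid {type(self).__name__}")
        self.value = value

    @staticmethod
    def from_string(s: str) -> "UID":
        """Hypothetical factory: try each registered subtype in declaration order."""
        for cls in UID._subtypes:
            if cls.pattern.fullmatch(s):
                return cls(s)
        raise ValueError(f"no UID subtype matches {s!r}")


class UUID(UID):
    pattern = re.compile(r"[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}")

    @classmethod
    def generate(cls) -> "UUID":
        """UUID-only behaviour: mint a fresh random (version 4) identifier."""
        return cls(str(uuid.uuid4()))


class ISO_OID(UID):
    pattern = re.compile(r"[0-9]+(?:\.[0-9]+)+")

    @property
    def arcs(self) -> List[int]:
        """OID-only behaviour: the numeric arcs, e.g. as input to resolution."""
        return [int(arc) for arc in self.value.split(".")]


oid = UID.from_string("1.2.840.113554")  # an ISO_OID, found by syntax
```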
The alternative would be to have a type UID that was concrete, just having a String value, and a bunch of functions of the form is_uuid, is_iso_oid, is_internet_id etc. This is a hack, and is non-extensible (since adding a new subtype means changing and re-deploying the UID class), whereas the current solution is extensible (just add a new subclass).

Currently the only solution I can see that doesn't break the object model (and remember, XML is just one serialised form; maybe the hype will be over in 5 years' time and we can get on with using something that actually works ;-)) is to use string-based pattern matching, as discussed earlier in the thread. It seems solid to me. If we don't do that, then we get this:

* system A has an OBJECT_ID containing a UUID value
* the object network of the Composition gets serialised and sent to system B
* system B deserialises, but all the OBJECT_ID.values end up just as UIDs, not UUIDs, ISO_OIDs etc.

So we lose information. I don't see this as acceptable...

\- thomas

---

## Post #11 by @Andrew_Patterson

> well, we don't want to do this; the whole idea is that we have efficient representation without losing semantics. The real problem is that XML is not good at doing these two together, and that's where the problem has to be solved, in my view. The object model is 'right' in its own terms.

I'm not sure that this is purely an XML problem; a Java implementation will have to go to reasonably extraordinary lengths to internally maintain the correct type information as well.
From a typing point of view, the OBJECT_ID hierarchy has the attributes and functions the wrong way around (the problem is not so much in the UID hierarchy, it's in the classes that reference the UID hierarchy):

    OBJECT_ID
        attribute value : string

    UID_BASED_ID
        function root : UID
        function extension : string

The implementation of UID_BASED_ID has to duplicate the storage of data, both setting the 'value' attribute to be xxxxx::yyyyy and also maintaining the actual object reference for 'root' so that it can be returned in the function call. The XML serializer has a problem here because it has no way of storing the 'meta' information of the UID type (which is why the problem is most noticeable in XML).

I would suggest the correct model should be:

    OBJECT_ID
        abstract function value : string

    UID_BASED_ID
        attribute root : UID
        attribute extension : string
        redefine function value : string
            (to return the value of 'root' and 'extension' separated by '::')

Andrew

---

## Post #12 by @Andrew_Patterson

btw, just looking at the draft for Support, INTERNET_ID seems to be interspersed with all the OBJECT_ID types, when it is actually a UID subtype (so it should probably be moved earlier in the section, assuming all the UID types are meant to be documented together).

Andrew

---

## Post #13 by @thomas.beale

Andrew Patterson wrote:

>> well, we don't want to do this; the whole idea is that we have efficient representation without losing semantics. The real problem is that XML is not good at doing these two together, and that's where the problem has to be solved, in my view. The object model is 'right' in its own terms.
>
> I'm not sure that this is purely an XML problem; a Java implementation will have to go to reasonably extraordinary lengths to internally maintain the correct type information as well.
> From a typing point of view, the OBJECT_ID hierarchy has the attributes and functions the wrong way around (the problem is not so much in the UID hierarchy, it's in the classes that reference the UID hierarchy)

well, let's go back to the requirements. The design intent is to have String identifiers that are efficient for storage and serialisation, while being able to treat them (or subparts) as properly typed artefacts. Doing it the other way round means that there is no way in openEHR to treat ids as Strings; they are always multi-attribute items. In XML this will make for a lot of unnecessary volume. So the choice we made quite a long time ago was to use String representation and internal parsing to access the bits and pieces, just like for the ISO 8601 date/time types. The current model does this; I wouldn't say it is the wrong way round, it is just a different design decision than for higher-level objects.

>     OBJECT_ID
>         attribute value : string
>
>     UID_BASED_ID
>         function root : UID
>         function extension : string
>
> The implementation of UID_BASED_ID has to duplicate the storage of data, both setting the 'value' attribute to be xxxxx::yyyyy and also maintaining the actual object reference for 'root' so that it can be returned in the function call.

I must be missing something here; all it does in my implementation is extract the piece before (or, for extension, after) the '::' when you call the function.

> The XML serializer has a problem here because it has no way of storing the 'meta' information of the UID type (which is why the problem is most noticeable in XML).

but all it has to do is inspect the string. I have a dirty bit of code as follows:

    string_to_uid (s: STRING): UID is
            -- Convert `s' into an instance of the UID subtype whose syntax
            -- it matches, trying UUID, ISO_OID and INTERNET_ID in that order.
        require
            string_valid: s /= Void and then not s.is_empty
        do
            create {UUID} Result.default_create
            if Result.valid_id (s) then
                create {UUID} Result.make (s)
            else
                create {ISO_OID} Result.default_create
                if Result.valid_id (s) then
                    create {ISO_OID} Result.make (s)
                else
                    create {INTERNET_ID} Result.default_create
                    if Result.valid_id (s) then
                        create {INTERNET_ID} Result.make (s)
                    else
                        -- error
                    end
                end
            end
        end

(there are nicer ways to do this, obviously).

> I would suggest the correct model should be
>
>     OBJECT_ID
>         abstract function value : string
>
>     UID_BASED_ID
>         attribute root : UID
>         attribute extension : string
>         redefine function value : string
>             (to return the value of 'root' and 'extension' separated by '::')

this is exactly what we are trying to avoid. But I don't have any difficulty implementing it either, so maybe there is a misunderstanding.

\- thomas

---

## Post #14 by @Andrew_Patterson

> artefacts. Doing it the other way round means that there is no way in openEHR to treat ids as Strings; they are always multi-attribute items.

Well, for serialisation it might not be able to treat them as strings, but the abstract UID class could always have a 'value' function that returns the data in string form; other aspects of the system could use that regardless of whether internally they were stored as multi-attribute items.

> In XML this will make for a lot of unnecessary volume.
> So the choice we made quite a long time ago was to use String representation and internal parsing to access the bits and pieces, just like for the ISO 8601 date/time types. The current model does this; I wouldn't say it is the wrong way round, it is just a different design decision than for higher-level objects.

Fair enough; the result of this decision is that typing information is lost. I think the trade-off needs to be documented explicitly in the spec.

> I must be missing something here; all it does in my implementation is extract the piece before (or, for extension, after) the '::' when you call the function.

If you're not worried about guessing at the UID type, this is the way to do it.

> but all it has to do is inspect the string. I have a dirty bit of code as follows:

We seem to have come to the agreement, then, that some form of string_to_uid() function is not just one way of implementing an openEHR system, but is actually _required_ in any openEHR system. I think some mention of this should be in the section on UIDs.

Andrew

---

## Post #15 by @thomas.beale

Andrew Patterson wrote:

>> artefacts. Doing it the other way round means that there is no way in openEHR to treat ids as Strings; they are always multi-attribute items.
>
> Well, for serialisation it might not be able to treat them as strings, but the abstract UID class could always have a 'value' function that returns the data in string form; other aspects of the system could use that regardless of whether internally they were stored as multi-attribute items.
>
>> In XML this will make for a lot of unnecessary volume. So the choice we made quite a long time ago was to use String representation and internal parsing to access the bits and pieces, just like for the ISO 8601 date/time types.
>> The current model does this; I wouldn't say it is the wrong way round, it is just a different design decision than for higher-level objects.
>
> Fair enough; the result of this decision is that typing information is lost. I think the trade-off needs to be documented explicitly in the spec.

There is another design reason I forgot to mention for using Strings: it allows identification schemes to change over time without invalidating existing data. This might happen with Archetype_ids, and we will need a Template_id, which we have not defined yet; but if we follow the current design approach, it won't matter: the Ids will just be strings as stored.

> We seem to have come to the agreement, then, that some form of string_to_uid() function is not just one way of implementing an openEHR system, but is actually _required_ in any openEHR system. I think some mention of this should be in the section on UIDs.

Sure, but I don't see this as controversial; it seems pretty minor. But it is no problem to add some implementation notes.

\- thomas

---

## Post #16 by @Andrew_Patterson

> Sure, but I don't see this as controversial; it seems pretty minor.

Yes, no problems from me; I just like arguing.. :-)

Andrew

---

## Post #17 by @thomas.beale

Andrew Patterson wrote:

>> Sure, but I don't see this as controversial; it seems pretty minor.
>
> Yes, no problems from me; I just like arguing.. :-)

remind me never to be in court with you ;-0 (unless you are defending me ;-)

I will make some additions to the text in the Support IM identification package around this thread and upload them in the next day or two. Thanks for the input.

\- thomas

---

## Post #18 by @Gerke_Geurts

Hello all,

It seems to me that various standard URN schemes can be used to unambiguously define unique identifiers in a single string, for example:

- urn:uuid: scheme for UUIDs
- urn:oid: scheme for OIDs

Using URIs in the XML documents is the XML way to achieve a 'fairly concise' but extensible serialisation of identifiers, and for that reason seems a feasible solution for openEHR XML serialisations.

Kind Regards,
Gerke Geurts.

---

## Post #19 by @thomas.beale

I have now uploaded a version of the Support IM that contains some design & implementation paragraphs on the topic of this thread. See http://svn.openehr.org/specification/BRANCHES/Release-1.1-candidate/publishing/architecture/rm/support_im.pdf

\- thomas
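
Gerke's URN suggestion in Post #18 keeps the UID subtype inside the string itself, so no syntax guessing is needed on the way back in. A minimal Python sketch of that round trip, assuming only the two namespaces named in the post (`urn:uuid:`, defined by RFC 4122, and `urn:oid:`, defined by RFC 3061); `to_urn` and `from_urn` are hypothetical helpers, and INTERNET_ID is left out because the post names no scheme for it.

```python
from dataclasses import dataclass
from typing import Dict, Type


@dataclass(frozen=True)
class UID:
    value: str


class UUID(UID):
    pass


class ISO_OID(UID):
    pass


# URN prefixes named in the post: urn:uuid: (RFC 4122) and urn:oid: (RFC 3061).
URN_PREFIXES: Dict[Type[UID], str] = {UUID: "urn:uuid:", ISO_OID: "urn:oid:"}


def to_urn(uid: UID) -> str:
    """Serialise a UID so that its subtype travels inside the string."""
    prefix = URN_PREFIXES.get(type(uid))
    if prefix is None:
        raise ValueError(f"no URN scheme for {type(uid).__name__}")
    return prefix + uid.value


def from_urn(text: str) -> UID:
    """Rebuild the concrete UID subtype from the URN prefix alone."""
    lowered = text.lower()  # 'urn' and the namespace identifier are case-insensitive
    for cls, prefix in URN_PREFIXES.items():
        if lowered.startswith(prefix):
            return cls(text[len(prefix):])
    raise ValueError(f"unsupported URN: {text!r}")


oid = ISO_OID("1.2.840.113554")
restored = from_urn(to_urn(oid))  # same subtype and value as oid
```

Because the prefix comparison ignores case, `URN:OID:1.2.3` is also read back as an ISO_OID, while the identifier part after the prefix is kept exactly as written.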