Adam Flinton wrote:
I would like though to enquire wrt the rationale of containing _id info
in a separate <value/> element.
If you are being consistent
instead of :
<terminology_id>
<value>ISO_639-1</value>
</terminology_id>
it should be simply:
<terminology_id>ISO_639-1</terminology_id>
or <terminology_id value="ISO_639-1"/>
Adam,
when you say it 'should' be either pulled up a level (with an object
attribute removed) or represented as an XML attribute, what is the
driver? Is it semantic (you think there is something wrong with the
representation of the object structure defined by the specification), or
is it to do with space/signal-to-noise (either of the last two
methods uses fewer characters)?
The current form is the result of a direct machine-performed object
serialisation process - in other words, it simply follows the same rules
used for transforming any object data into XML. Your suggestion (I presume)
is a special case of the general idea of representing all so-called
basic types (Strings, Integers, Dates etc.) as XML attributes rather than
as XML elements. But we have only just discussed and agreed that long
text strings (especially those containing Unicode, backslash quoting and
whitespace) should be XML elements.
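
To illustrate what "the same rules for any object" means, here is a minimal sketch in Python. It is not the actual serialiser; the class names TerminologyId and CodePhrase and the functions to_element and serialise are hypothetical, chosen only for illustration. The single rule is: every attribute of an object becomes a child element, and a basic value becomes element text.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields, is_dataclass


# Hypothetical classes, for illustration only.
@dataclass
class TerminologyId:
    value: str


@dataclass
class CodePhrase:
    terminology_id: TerminologyId
    code_string: str


def to_element(name, obj):
    """Apply one rule to every object: each attribute becomes a child element."""
    elem = ET.Element(name)
    if is_dataclass(obj):
        for f in fields(obj):
            elem.append(to_element(f.name, getattr(obj, f.name)))
    else:
        # Basic types (strings, integers, dates etc.) become element text.
        elem.text = str(obj)
    return elem


def serialise(name, obj):
    """Return the XML text for obj, wrapped in an element called name."""
    return ET.tostring(to_element(name, obj), encoding="unicode")


xml_text = serialise("terminology_id", TerminologyId("ISO_639-1"))
print(xml_text)
```

Because TerminologyId is an object with one attribute called value, the nested <value/> element falls out of the rule with no special handling. Emitting <terminology_id>ISO_639-1</terminology_id> instead would need an extra branch for single-attribute objects, which is the kind of exceptional rule discussed in this message.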
As I have said before, what I think is most important is a regular
encoding of data to and from XML, so that a) software is as simple and
clean as possible and b) changes are not needed due to particular
content (i.e. data). Now, ideally we would minimise use of bandwidth /
space with the representation as well. The problem is that XML is pretty
poorly designed for efficiently representing data, and has a poor
signal-to-noise ratio. Making data serialise in a way that is either 'more
aesthetic' or smaller always implies more complex software containing
exceptional rules. Further, although XML isn't well designed for data
representation, in its original design 'attributes' were intended for
meta-data items rather than 'data'. Whether this semantic needs to be
retained in the XML we are talking about here is an open question.
So the question is: at what level do we include exceptional processing
to reduce space wastage, since this complicates the software? How much
do we compromise the intended semantics of XML, where attributes are
designed for holding meta-data (including real meta-data, e.g. things
like xsi:type etc)?
Any attempt to save space has to be based on a study of high
volumes of representatively diverse data. Saving 10 bytes is not
interesting, but saving 10 GB/minute in a large data processing system
is. I will go out on a limb and say that 'style' has no place in good
engineering; only engineering criteria do - correctness, performance,
maintainability etc.
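
As a starting point for such a study, the three representations under discussion can be compared by encoded size. The sketch below is illustrative only: the FORMS table and the size_report function are hypothetical names, and the strings are written without the line breaks and indentation that real documents may carry.

```python
# The three candidate encodings of the same terminology id, written compactly.
FORMS = {
    "nested value element": "<terminology_id><value>ISO_639-1</value></terminology_id>",
    "text pulled up a level": "<terminology_id>ISO_639-1</terminology_id>",
    "XML attribute": '<terminology_id value="ISO_639-1"/>',
}


def size_report(forms, baseline="nested value element"):
    """Map each label to (size in UTF-8 bytes, bytes saved versus the baseline)."""
    sizes = {label: len(xml.encode("utf-8")) for label, xml in forms.items()}
    return {label: (n, sizes[baseline] - n) for label, n in sizes.items()}


for label, (size, saved) in size_report(FORMS).items():
    print(f"{label}: {size} bytes, saves {saved}")
```

A single element says little on its own; the figure that matters is the saving multiplied across a large, representative body of real data, weighed against the cost of the exceptional rules needed to produce the shorter forms.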
With all that in mind - if the community wants to carry out the appropriate
analysis of data and propose a more space-efficient schema, I am not
against it. But the needs of correctness (= patient safety) must be
satisfied.
- thomas beale