Dear All,
1) Problem statement
2) Solution
3) Points to Note
4) XSLT Sheet
5) Summary
1) Problem statement
For the NHS, I have been writing an OpenEHR publishing & QA routine. It is basically an Ant build which includes running XSLT tasks.
The problem with the current structure of the XML archetypes & templates is that values are held as the text() child of an element, and sometimes as the text() child of a value child of that element.
This is dangerous & (IMHO) wrong.
The reasons are:
A) a single value of that sort should be contained in an attribute.
B) It leads to a world of pain wrt "pretty-print"/indentation.
As an example, XMLSpy will automatically pretty-print XML because that makes it readable to the (human) reader. Equally, XSLT sheets often use indent="yes" in the output declaration:
<xsl:output method="xml" version="1.0" encoding="utf-8"
indent="yes" />
Firstly it means that what looks like
<rm_type_name>
ELEMENT
</rm_type_name>
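is actually the following text value (a line break, a run of tabs, ELEMENT, then another line break and more tabs):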

													ELEMENT
												
As a really quick example of this, get an XML Archetype, open it in XMLSpy, press save, now open it in the Ocean Archetypes editor.
Admire the way the text is now all over the place, with empty square boxes where the line endings are.
Now try and save as an ADL.
If you save as XML, the formatting etc. is retained. In short: open an XML archetype in XMLSpy, click save, and you have a corrupt archetype.
Before people decry pretty printing per se, bear in mind that:
i) a single long string is not readable
ii) the ADL is itself pretty printed, i.e. your ADL files do not come as one long string but are formatted in much the same way as XML is pretty printed. The ADL copes with this in basically the same way I am going to suggest the XML should, i.e. by delimiting the value:
description = <"Clinical description of the meconium">
vs
description = Clinical description of the meconium
or
description =
Clinical description of the meconium
etc.
Any XSLT which tries to extract values from the present structure must engage in code such as:
<!-- Character references are used because a literal tab or newline in an attribute value is normalised to a space -->
<xsl:variable name="tab" select="'&#9;'" />
<xsl:variable name="nl" select="'&#10;'" />
<xsl:variable name="v_rm_type_name_no_pp"
select="translate(translate($v_rm_type_name/text(),$tab,''),$nl,'')" />
& that in itself is dangerous, as some editor might put in other formatting chars (a carriage return, say) which are not being filtered out.
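For comparison, the same extraction can be sketched with the standard XPath 1.0 function normalize-space(), which trims leading and trailing whitespace (space, tab, CR, LF) and collapses internal runs to a single space. This is only a minimal sketch: matching on local-name() is my assumption so that it works whatever namespace the archetype declares, and it still only works around the structure rather than fixing it.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="utf-8"/>

  <!-- Print every rm_type_name value, one per line, with the
       pretty-print whitespace removed. -->
  <xsl:template match="/">
    <!-- local-name() is used (an assumption) so the match does not
         depend on the namespace of the input document. -->
    <xsl:for-each select="//*[local-name() = 'rm_type_name']">
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Note that normalize-space() also collapses whitespace inside the value, so a value which legitimately contains line breaks would be altered. That is one more reason the value needs an unambiguous container.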
2) Solution:
Instead of using a text child, any value should go in a value attribute e.g.
<items id="description">
Clinical description of the meconium
</items>
becomes:
<items value="Clinical description of the meconium" id="description"/>
3) Points to Note:
A) The result is actually closer to the adl e.g.
<items code="at0061">
<items value="Clinical description of the meconium" id="description"/>
<items value="Description" id="text"/>
</items>
<items code="at0062">
<items value="Colour of meconium" id="description"/>
<items value="Colour" id="text"/>
</items>
vs
["at0061"] = <
description = <"Clinical description of the meconium">
text = <"Description">
>
["at0062"] = <
description = <"Colour of meconium">
text = <"Colour">
>
B) The files are approximately two thirds the size of the originals. This could be reduced further by using a shorter attribute name (e.g. val or even v).
C) The Archetypes are much more readable to the average human e.g.
<details>
<language>
<terminology_id value="ISO_639-1"/>
<code_string value="en"/>
</language>
<purpose value="To describe body fluids and secretions"/>
<use/>
<misuse/>
</details>
vs:
<details>
<language>
<terminology_id>
<value>ISO_639-1</value>
</terminology_id>
<code_string>en</code_string>
</language>
<purpose>To describe body fluids and secretions</purpose>
<use/>
<misuse/>
</details>
or
<occurrences>
<lower_included value="true"/>
<upper_included value="true"/>
<lower_unbounded value="false"/>
<upper_unbounded value="false"/>
<lower value="1"/>
<upper value="1"/>
</occurrences>
vs:
<occurrences>
<lower_included>true</lower_included>
<upper_included>true</upper_included>
<lower_unbounded>false</lower_unbounded>
<upper_unbounded>false</upper_unbounded>
<lower>1</lower>
<upper>1</upper>
</occurrences>
4) XSLT Sheet
I have attached a mini-XSLT sheet which takes a template or XML Archetype & renders it into this format.
Run the XSLT with Saxon rather than Xalan. Xalan shows how fragile the current situation is: it picks up the "pretty-print" chars as text children & writes them out as values even where an element has no text child other than the formatting chars.
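For readers without the attachment, a transform of this kind might look roughly like the sketch below. It is not the attached setTextAsVal.xslt, just an illustration under my own assumptions: whitespace-only text nodes are dropped with xsl:strip-space, values are trimmed with normalize-space(), and the attribute is called value as in the examples above.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>

  <!-- Drop whitespace-only text nodes (i.e. pretty-print debris)
       from the input tree before any template sees them. -->
  <xsl:strip-space elements="*"/>

  <!-- Default: copy everything through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A leaf element with real text: move the text into a value
       attribute. normalize-space() also collapses internal
       whitespace, which is an assumption about what is wanted. -->
  <xsl:template match="*[not(*) and normalize-space(.) != '']">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="value">
        <xsl:value-of select="normalize-space(.)"/>
      </xsl:attribute>
    </xsl:copy>
  </xsl:template>

  <!-- An element whose only child element is a leaf <value> with
       real text: fold that child into a value attribute on the
       parent, e.g. terminology_id/value. -->
  <xsl:template match="*[count(*) = 1
                         and *[local-name() = 'value'][not(*)]
                         and normalize-space(*) != '']">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:attribute name="value">
        <xsl:value-of select="normalize-space(*)"/>
      </xsl:attribute>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Under these assumptions <code_string>en</code_string> becomes <code_string value="en"/>, and a terminology_id wrapping <value>ISO_639-1</value> becomes <terminology_id value="ISO_639-1"/>. Empty elements such as <use/> pass through the copy template untouched, and downstream sheets can then read @value directly with no whitespace filtering.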
5) Summary
i) The present situation/structure is dangerous.
ii) Pretty-print is the norm & even the ADL is pretty printed and has adopted a similar method to cope.
iii) The solution simplifies the XML in terms of both processing and human readability.
iv) The solution shrinks the file sizes.
Yours
Adam Flinton
(attachments)
setTextAsVal.xslt (1.55 KB)