Q on openEHR XML-schema versioning

We are about to publish Release 1.0.2 of the openEHR specifications. The
CRs in this release have necessitated some very small (non-data
affecting) changes in the schema BaseTypes.xsd (impact statement of
Release 1.0.2 at
http://www.openehr.org/svn/specification/BRANCHES/Release-1.0.2-candidate/publishing/release_notes_1.0.2.htm
; published Release 1.0.1 schemas at
http://www.openehr.org/releases/1.0.1/its/XML-schema/index.html)

The question has come up as to how changes in versions of the schemas
should be identified. Currently the schemas have the following kind of
heading:

<?xml version="1.0" encoding="utf-8"?>
<!-- openEHR Release 1.0.1 BaseTypes XML schema -->
<!-- Authored by Ocean Informatics 2007.04.13 -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://schemas.openehr.org/v1"
  targetNamespace="http://schemas.openehr.org/v1" elementFormDefault="qualified" version="v1.0.1"
  id="BaseTypes.xsd">

Firstly, openEHR publishes a number of schemas, not just one. Each
carries only the release id, but not an individual version number. I
would argue that they should carry an individual version id (and
possibly not the Release number?). Can the XML experts here comment on
what the usual way to manage the kind of componentised schemas we use in
openEHR is? Should we, as of this release put a per-schema version id in
each schema?

Secondly, you will notice that these schemas are authored by Ocean
Informatics. This has been the case historically (I believe the
community understands the 'bootstrap' nature of openEHR development),
but of course is unlikely to remain so. Adam Flinton (NHS) and others
will certainly propose an improved Archetype.xsd, and most likely
similar improvements to the other schemas for forthcoming releases. When
this happens, I expect that the 'author' line will change to something
like "Authored by openEHR XML Schema project" or similar. However, for
this 1.0.2 release, I guess it should stay as is, since at least it
allows people to know the author, and who to blame, rather than the
schemas being anonymous. Objections to this are welcome of course.

I am more concerned to correct the version id situation.

All feedback welcomed.

- thomas beale

CRs in this release have necessitated some very small (non-data
affecting) changes in the schema BaseTypes.xsd (impact statement of
Release 1.0.2 at
http://www.openehr.org/svn/specification/BRANCHES/Release-1.0.2-candidate/publishing/release_notes_1.0.2.htm
; published Release 1.0.1 schemas at
http://www.openehr.org/releases/1.0.1/its/XML-schema/index.html)

I am going through the 1.1 release candidate changes and it appears
that some of these are going to be 'breaking' changes to the
schema. So our plan for handling the 1.0.2 schema should keep
an eye on 1.1, which will need to be handled in a
somewhat compatible manner.

Firstly, openEHR publishes a number of schemas, not just one. Each
carries only the release id, but not an individual version number. I
would argue that they should carry an individual version id (and
possibly not the Release number?). Can the XML experts here comment on
what the usual way to manage the kind of componentised schemas we use in
openEHR is? Should we, as of this release put a per-schema version id in
each schema?

I don't think the version id etc _inside_ the schema file is too
important - I worry a great deal about the XML instances that are out
there and how instances can be automatically matched up to their
corresponding schema files. That to me is the fundamental issue.
I would contend that 1.1 will require a change to the schema
namespace to "http://schemas.openehr.org/v1.1" or
"http://schemas.openehr.org/v2" or "http://schemas.openehr.org/2008/12"
(if we decide to break
the correspondence between schema set version and openehr
release version)
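
To make the matching problem concrete, here is a minimal sketch, assuming Python and its standard xml.etree.ElementTree module, of how a tool might read the namespace of an instance's root element and look up the schema set to validate against. The SCHEMA_SETS table, the function names and the v1.1 entry are hypothetical illustrations, not anything openEHR has published.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: namespace URI -> directory holding that schema set.
# Only the v1 entry uses a path mentioned in this thread; the v1.1 entry is
# a placeholder for whatever namespace is eventually chosen.
SCHEMA_SETS = {
    "http://schemas.openehr.org/v1": "releases/1.0.1/its/XML-schema",
    "http://schemas.openehr.org/v1.1": "releases/1.1/its/XML-schema",
}


def root_namespace(xml_text):
    """Return the namespace URI of the root element ('' if it has none)."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{namespace}localname".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return ""


def schema_set_for(xml_text):
    """Look up the schema set for an instance; None if the namespace is unknown."""
    return SCHEMA_SETS.get(root_namespace(xml_text))


instance = '<composition xmlns="http://schemas.openehr.org/v1"/>'
print(schema_set_for(instance))
```

With this approach the namespace is the only handle a receiver has, which is why a namespace change is visible to every system holding instances.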

I do not know how the instance versioning will work with
multiple independent schemas (I presume Thomas is referring
to the Template schema vs AM schema etc). I think schemas
need to be released in a "set" that is consistent and whole
- so would prefer that the template/AM/RM schema are all
released at the same time with the same schema namespace
(even if the individual schema files document different
"releases" of their own specs)

Andrew

Andrew Patterson wrote:

CRs in this release have necessitated some very small (non-data
affecting) changes in the schema BaseTypes.xsd (impact statement of
Release 1.0.2 at
http://www.openehr.org/svn/specification/BRANCHES/Release-1.0.2-candidate/publishing/release_notes_1.0.2.htm
; published Release 1.0.1 schemas at
http://www.openehr.org/releases/1.0.1/its/XML-schema/index.html)
    
I am going through the 1.1 release candidate changes and it appears
that some of these are going to be 'breaking' changes to the
schema. So our plan for handling the 1.0.2 schema should keep
an eye on 1.1, which will need to be handled in a
somewhat compatible manner.
  
there will certainly be breaking changes in the archetype schema, which
does not cause a great problem as long as it is handled properly.
Handling it properly means tools of the future knowing the difference
between the current Archetype.xsd schema and later more efficient variants.

Firstly, openEHR publishes a number of schemas, not just one. Each
carries only the release id, but not an individual version number. I
would argue that they should carry an individual version id (and
possibly not the Release number?). Can the XML experts here comment on
what the usual way to manage the kind of componentised schemas we use in
openEHR is? Should we, as of this release put a per-schema version id in
each schema?
    
I don't think the version id etc _inside_ the schema file is too
important - I worry a great deal about the XML instances that are out
there and how instances can be automatically matched up to their
corresponding schema files. That to me is the fundamental issue.
I would contend that 1.1 will require a change to the schema
namespace to "http://schemas.openehr.org/v1.1" or
"http://schemas.openehr.org/v2" or "http://schemas.openehr.org/2008/12"
(if we decide to break
the correspondence between schema set version and openehr
release version)
  
ok - this approach more or less replicates the release id approach
already in use, but converts it to a URL. We already have the URL
pattern http://www.openehr.org/releases/N.M.P, so we could do
http://www.openehr.org/releases/N.M.P/schemas

Apparently it is not always expected that the URL mentioned in a schema
actually exists so http://schemas.openehr.org/v1.0.2 could be used as
well. Or we could create it for real. (What are the requirements here?)

I do not know how the instance versioning will work with
multiple independent schemas (I presume Thomas is referring
to the Template schema vs AM schema etc). I think schemas
  
well actually in this case I was thinking more of the RM schemas like
Basic_types.xsd and the others -
http://www.openehr.org/releases/1.0.1/its/XML-schema/index.html .
Starting from Composition.xsd, there is a cascade of nested inclusions
down to Basic_types.xsd. If an XML document representing a Composition
is to correctly indicate its schema, it should point to e.g.
http://www.openehr.org/releases/1.0.2/schemas/Composition.xsd or
http://schemas.openehr.org/1.0.2/Composition.xsd

Inclusions are currently done with a statement like

<xs:include schemaLocation="Content.xsd"/>

which does not mention the version of the included schema, so one
assumes it is from the same release, which is the effect we want. Is
this the orthodox way to do this in XML land?

- thomas

ok - this approach more or less replicates the release id approach
already in use, but converts it to a URL.

Except, this is a change that occurs in all xml _instances_, not just
the schema files. So every reference model document in every
system in existence now has to handle two different schemas
and convert between them. We have to decide whether this is
what we want.. do the xml schemas aggressively track the
exact spec versions, or do we only increment xml schema
versions when necessary (and therefore should the xml
schemas have a separate version)

We already have the URL
pattern http://www.openehr.org/releases/N.M.P, so we could do
http://www.openehr.org/releases/N.M.P/schemas

Apparently it is not always expected that the URL mentioned in a schema
actually exists so http://schemas.openehr.org/v1.0.2 could be used as
well. Or we could create it for real. (What are the requirements here?)

The schema identifier is a URI - there is no requirement that it is
accessible at the same identifier, and in fact it seems like there
is a trend towards using other URN syntaxes rather than
URLs.

Starting from Composition.xsd, there is a cascade of nested inclusions
down to Basic_types.xsd. If an XML document representing a Composition
is to correctly indicate its schema, it should point to e.g.
http://www.openehr.org/releases/1.0.2/schemas/Composition.xsd or
http://schemas.openehr.org/1.0.2/Composition.xsd

Inclusions are currently done with a statement like

<xs:include schemaLocation="Content.xsd"/>

which does not mention the version of the included schema, so one
assumes it is from the same release, which is the effect we want. Is
this the orthodox way to do this in XML land?

There was a decision by Heath (and others at Ocean) to have a
single namespace for all the openehr XML classes around the time of the
1.0 release. The schemaLocation of XSD files is a separate issue and one
that I would not worry about - assume that all XSD files bundled
together in the same directory belong to the same "release set"
(as is currently the case)
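
A relative schemaLocation such as "Content.xsd" is conventionally resolved against the location of the including schema, which is what keeps a bundled directory self-consistent. The following is a minimal sketch of that resolution, assuming Python's standard urllib.parse module; the function name resolve_include is hypothetical.

```python
from urllib.parse import urljoin


def resolve_include(including_schema_url, schema_location):
    """Resolve an xs:include schemaLocation relative to the including schema."""
    return urljoin(including_schema_url, schema_location)


# URL pattern taken from the published 1.0.1 schema directory.
composition = "http://www.openehr.org/releases/1.0.1/its/XML-schema/Composition.xsd"
print(resolve_include(composition, "Content.xsd"))
```

Because the release id is part of the directory path, the included file is picked up from the same release without the include statement having to mention a version.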

Andrew

Andrew Patterson wrote:

ok - this approach more or less replicates the release id approach
already in use, but converts it to a URL.


Except, this is a change that occurs in all xml _instances_, not just
the schema files. So every reference model document in every
system in existence now has to handle two different schemas
and convert between them. We have to decide whether this is
what we want.. do the xml schemas aggressively track the
exact spec versions, or do we only increment xml schema
versions when necessary (and therefore should the xml
schemas have a separate version)

I don’t think this is the case - each document should just indicate which schema it is derived from. There will always be new and improved XML-schemas - that is just the nature of a formalism that is inherently inefficient - people will keep coming up with ways to improve it. Any document created from a previous version of the schema will point to the earlier version. Since all schemas in openEHR are designed to convert into the same reference model, the data remain interoperable (unlike purely schema-based approaches to health data).

The main point it seems to me is what the schema should carry as its namespace… is it (as today):

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://schemas.openehr.org/v1"
	targetNamespace="http://schemas.openehr.org/v1" elementFormDefault="qualified" version="v1.0.2"
	id="BaseTypes.xsd">

or more like:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://schemas.openehr.org/v1"
	targetNamespace="http://schemas.openehr.org/v1.0.2" elementFormDefault="qualified"
	id="BaseTypes.xsd">

I don’t know what weight the ‘version’ attribute carries in the xs:schema tag - I don’t understand why there appear to be two ways of indicating the version in fact.

The schema identifier is a URI - there is no requirement that it is
accessible at the same identifier, and in fact it seems like there
is a trend towards using other URN syntaxes rather than
URLs.

so sticking with the current style of URI is no problem.

- thomas beale

Except, this is a change that occurs in all xml _instances_, not just
the schema files. So every reference model document in every
system in existence now has to handle two different schemas
and convert between them. We have to decide whether this is
what we want.. do the xml schemas aggressively track the
exact spec versions, or do we only increment xml schema
versions when necessary (and therefore should the xml
schemas have a separate version)

I don't think this is the case - each document should just indicate which
schema it is derived from.

Each document does - using the namespace.

There's no other 'version'
information in the XML instances - we could put a mandatory
'version' element in at the root of 'composition' or something
but it isn't there at the moment. There is also the possibility
of using the xsi:schemaLocation within the instances as another
technique.

There will always be new and improved XML-schemas
- that is just the nature of a formalism that is inherently inefficient -
people will keep coming up with ways to improve it. Any document created
from a previous version of the schema will point to the earlier version.
Since all schemas in openEHR are designed to convert into the same reference
model, the data remain interoperable (unlike purely schema-based approaches
to health data).

Well yes, in an abstract sense that is true - but where the rubber
meets the road so to speak, you have actual XML instances, and
XSD schemas and I'm telling you as someone who does this day
in and day out - changing the namespace of a schema/instances
means that they will no longer validate using standard xml
tools - so you now have an issue with multiple schemas and
how you will handle them. I agree with you that this is a
boundary problem, because eventually they convert into a
standard reference model. But let's not make the mistake
HL7 makes and pretend that the actual technical artifacts
are unimportant details - xml instances and schemas are
the things that you give to programmers to get acquainted
with the standard. And nothing will annoy them more than
having no clear guidance on how the
instances/schemas/specs relate to each other, or finding
a document that purports to be version A but does
not validate against schema A.

The main point it seems to me is what the schema should carry as its
namespace... is it (as today):

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://schemas.openehr.org/v1"
  targetNamespace="http://schemas.openehr.org/v1"
elementFormDefault="qualified" version="v1.0.2"
  id="BaseTypes.xsd">

This will result in instance documents that do not describe
whether they conform to the 1.0.0, 1.0.1, or 1.0.2 spec

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://schemas.openehr.org/v1"
  targetNamespace="http://schemas.openehr.org/v1.0.2"
elementFormDefault="qualified"
  id="BaseTypes.xsd">

This will result in instance documents that describe exactly
what version of the spec they conform to, but will result
in every system needing to interoperate with as many
schema files as there are minor releases of openehr.

I don't know what weight the 'version' attribute carries in the xs:schema
tag - I don't understand why there appear to be two ways of indicating the
version in fact.

From the XML schema spec:

"The other attributes (id and version) are for user
convenience, and this specification defines no
semantics for them."
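
Since the spec assigns no meaning to these attributes, a tool can read them for information only. Here is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, that pulls the three values out of a schema header like the one above; the function name describe_schema is hypothetical.

```python
import xml.etree.ElementTree as ET

# Header modelled on the BaseTypes.xsd heading quoted in this thread,
# closed as an empty element so that it parses on its own.
schema_text = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns="http://schemas.openehr.org/v1"
  targetNamespace="http://schemas.openehr.org/v1"
  elementFormDefault="qualified" version="v1.0.2" id="BaseTypes.xsd"/>"""


def describe_schema(text):
    """Return the identifying attributes of an xs:schema root element."""
    root = ET.fromstring(text)
    return {
        # Determines which instances the schema applies to.
        "targetNamespace": root.get("targetNamespace"),
        # Informative only: the XML Schema spec defines no semantics for these.
        "version": root.get("version"),
        "id": root.get("id"),
    }


print(describe_schema(schema_text))
```

Only targetNamespace influences which instances match the schema; version and id are labels for human readers and build tooling.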

so sticking with the current style of URI is no problem.

yes.

Andrew

Andrew Patterson wrote:

Andrew,

There was a decision by Heath (and others at Ocean) to have a
single namespace for all the openehr XML classes around the time of the
1.0 release. The schemaLocation of XSD files is a separate issue and one
that I would not worry about - assume that all XSD files bundled
together in the same directory belong to the same "release set"
(as is currently the case)

Actually the namespace decision including its current format was made in
conjunction with other openEHR members including yourself and Rong as far as
I remember.

Heath

Actually the namespace decision including its current format was made in
conjunction with other openEHR members including yourself and Rong as far as
I remember.

Yes, sorry - wasn't trying to blame anyone :) I meant that the final
decision as to what actually went into the schemas was
done by Ocean because you guys were editing the schema files -
but we all did have a discussion about the various pros and
cons and the group consensus was to have a versioned namespace..

Andrew

Thomas,

The original namespace was designed in a way that would not require a change until there was a radical RM (or schema design) change.

I would suggest that the principles are similar to archetype versions and revisions: the namespace will require a change when a schema change breaks existing data; otherwise it is just a version change.

The version attribute in the schema does not affect the data at all; it is version control information for the users of the schema. I don’t care what goes there, but at the time the openEHR release seemed logical. If a new release changes the schema (in a compatible way), the release number of that specific schema can change; otherwise I don’t see any need to upgrade the version ID just because of a new release. As I said, I don’t care that much, because it has no effect on the data as long as I know what schema I need to deploy with my version-specific openEHR components.

The question is, what do we do when we do have a data breaking schema change, like potentially in r1.1? I suggest that we just go with http://schemas.openehr.org/v1.1 and we can assume the old http://schemas.openehr.org/v1 meant r1.0.x. It wasn’t expected to have such a substantial schema change until r2 but I guess that is the reality of software.

Jumping ship to another style such as http://schemas.openehr.org/2009/03 would also be reasonable, it would just mean we have to correlate release dates with release numbers.

BTW Andrew,

There is an optional rm_version in the archetype_details attached to any archetyped locatable. We currently populate this only when we specify a template_id on the composition, which is all compositions in our applications. We could suggest that this is required for all compositions and make this the handle to determine what schema to use for validation.
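
As an illustration of using rm_version as the handle, here is a minimal sketch assuming Python's standard xml.etree.ElementTree module. The element layout (archetype_details directly under the root, containing rm_version) and the function name rm_version_of are assumptions made for this example and should be checked against the published schemas.

```python
import xml.etree.ElementTree as ET

NS = {"oe": "http://schemas.openehr.org/v1"}

# Assumed instance layout, for illustration only.
instance = """<composition xmlns="http://schemas.openehr.org/v1">
  <archetype_details>
    <rm_version>1.0.1</rm_version>
  </archetype_details>
</composition>"""


def rm_version_of(xml_text):
    """Return the rm_version recorded on the root locatable, or None if absent."""
    root = ET.fromstring(xml_text)
    return root.findtext("oe:archetype_details/oe:rm_version", namespaces=NS)


print(rm_version_of(instance))
```

A receiver could fall back to the namespace alone when the function returns None, which is the situation for any instance that does not populate the optional field.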

Heath

My attempt to summarise this discussion, and how we should proceed:

  • we should stick with name-spaces based on major releases, where ‘major’ means a change in the first or second place of the release id, e.g. 1.0 → 1.1 or 1.1 → 1.2 or 1.2 → 2.0;

  • so currently for Release 1.0.2 we should leave the name space as is, i.e. xmlns="http://schemas.openehr.org/v1"

  • in terms of handling minor version ids inside the schemas…

  • since there is a chain of inclusion going on, a change to one schema will cause changes to all others above it in the inclusion chain (but not below). Therefore the ‘version’ of each schema should be allowed to change independently.

  • if we say that all current Release 1.0.1 schemas are ‘version 1.0.1’ then we should update the ‘version’ to ‘1.0.2’ in the schemas from Base_types.xsd all the way up the chain (which just happens to be all the schemas in this case).

  • but generally, we only change version numbers on a schema if either it has a local modification, or if an included schema has had a new release.
    As Andrew says, systems will end up dealing with multiple releases of schemas, that is inevitable. But this will be minimised, since data will only have to know about changes in major version, which we cater for by the above method (issue a schema with a new namespace).
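
The namespace rule in this summary can be sketched as a small function, assuming Python. The v1 namespace for 1.0.x is the existing one; the v<major>.<minor> pattern for later releases follows Heath's v1.1 suggestion and is otherwise an assumption, as are the function names.

```python
BASE = "http://schemas.openehr.org"


def namespace_for_release(release_id):
    """Map a release id such as '1.0.2' to a schema namespace URI."""
    major, minor = release_id.split(".")[:2]
    if (major, minor) == ("1", "0"):
        # Existing namespace, unchanged for every 1.0.x release.
        return f"{BASE}/v1"
    # Assumption: later namespaces follow the v<major>.<minor> pattern.
    return f"{BASE}/v{major}.{minor}"


def same_namespace(release_a, release_b):
    """True if instances from both releases share one namespace."""
    return namespace_for_release(release_a) == namespace_for_release(release_b)


print(namespace_for_release("1.0.2"))
print(same_namespace("1.0.1", "1.0.2"))
```

Under this rule the third place of the release id never reaches the instance data, so it can only be recorded in the 'version' attribute of each schema.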

- thomas beale

Thomas Beale wrote:

The question is, what do we do when we do have a data breaking schema
change, like potentially in r1.1? I suggest that we just go with
http://schemas.openehr.org/v1.1 and we can assume the old
http://schemas.openehr.org/v1 meant r1.0.x. It wasn't expected to have such
a substantial schema change until r2 but I guess that is the reality of
software.

I am also concerned about 'non-breaking' changes though - well, it depends
on your definition of a breaking change - but say we add an
optional element to a xml type? This doesn't break any existing data
because all instances will still comply with the new schema - however,
new instances will now potentially not comply with the old schema. Do
we consider these breaking changes? Are we expecting to create a
new namespace if we do this type of change?

Jumping ship to another style such as http://schemas.openehr.org/2009/03
would also be reasonable, it would just mean we have to correlate release
dates with release numbers.

This has one advantage in that it is possible to release new major
versions of the spec _without_ updating the schema. So if v1.2 needs
no schema changes over v1.1, the 1.2 released schemas can be left as
http://schemas.openehr.org/2009/03 - and people won't be confused
thinking that their data isn't the right 'version'.

There is an optional rm_version in the archetype_details attached to any
archetyped locatable. We currently populate this only when we specify a
template_id on the composition, which is all compositions in our
applications. We could suggest that this is required for all compositions
and make this the handle to determine what schema to use for validation.

Yes, but we would also need a similar mechanism for all top-level XML
artifacts (archetype instances, extracts, PARTY? etc).

The other suggestion I have seen on the interwebs is to make the
xsi:schemaLocation attribute compulsory in instances

<Composition xsi:schemaLocation="http://schemas.openehr.org/v1
       http://www.openehr.org/releases/1.0.1/its/XML-schema/Composition.xsd">

<Composition xsi:schemaLocation="http://schemas.openehr.org/v1.1
       http://www.openehr.org/releases/1.1.3/its/XML-schema/Composition.xsd">

The schema checker would not actually go and 'fetch' the XSD but would
look for known URLs that indicate the exact schema version the instance
was released against.

http://www.openehr.org/releases/1.0.1/its/XML-schema/Composition.xsd
means 1.0.1 etc

So the schema namespace indicates the major structural version of
the schema - and the xsi:schemaLocation gives the exact schema version
that the instance was created for (even though the minor schema
differences between 1.0.2 and 1.0.3 may not have any data changes).
I don't know whether storing this version is any use - I guess it depends
on the definition of 'breaking changes' I discussed above.
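
A checker along these lines could be sketched as follows, assuming Python's standard re and xml.etree.ElementTree modules. The function name schema_hints and the regular expression for the /releases/N.M.P/ URL pattern are illustrative assumptions; nothing is fetched from the network.

```python
import re
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
# Assumes release URLs follow the /releases/N.M.P/ pattern used in this thread.
RELEASE_RE = re.compile(r"/releases/(\d+\.\d+\.\d+)/")

instance = """<Composition xmlns="http://schemas.openehr.org/v1"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://schemas.openehr.org/v1
        http://www.openehr.org/releases/1.0.1/its/XML-schema/Composition.xsd"/>"""


def schema_hints(xml_text):
    """Return (namespace, location, release) triples from xsi:schemaLocation."""
    root = ET.fromstring(xml_text)
    # The attribute value is a whitespace-separated list of namespace/URL pairs.
    tokens = root.get(f"{{{XSI}}}schemaLocation", "").split()
    hints = []
    for namespace, location in zip(tokens[0::2], tokens[1::2]):
        match = RELEASE_RE.search(location)
        hints.append((namespace, location, match.group(1) if match else None))
    return hints


for hint in schema_hints(instance):
    print(hint)
```

The namespace in each pair gives the major structural version, and the release parsed from the URL gives the exact schema release the instance was produced against.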

Andrew

Hi Andrew,
See below.

Heath

> The question is, what do we do when we do have a data breaking schema
> change, like potentially in r1.1? I suggest that we just go with
> http://schemas.openehr.org/v1.1 and we can assume the old
> http://schemas.openehr.org/v1 meant r1.0.x. It wasn't expected to have such
> a substantial schema change until r2 but I guess that is the reality of
> software.

I am also concerned about 'non-breaking' changes though - well, it depends
on your definition of a breaking change - but say we add an
optional element to a xml type? This doesn't break any existing data
because all instances will still comply with the new schema - however,
new instances will now potentially not comply with the old schema. Do
we consider these breaking changes? Are we expecting to create a
new namespace if we do this type of change?

[Heath Frankel]
I understand your point here but if we cannot have some kind of schema
migration mechanism we will need a new schema per release, which is
something that I don't think anyone wants.

Using your example, adding an optional element will cause systems that use
an older schema to reject an instance that populates that element, but at
least an older system can continue to produce valid instances. I am not
sure how much schema validation will be used in a production system; the
overhead of validating every instance may be too great, so schema validation
will probably be just a testing and accreditation issue.

We may need to have some parser rules a bit like HL7 V2 where you accept
additional elements that you don't expect, within reason, so that we can
support this forward-compatibility. This will mean that we may not be able
to use auto-generated XML serialisers but there are other XML APIs that can
be used to support this kind of rules.
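
A tolerant reader of this kind can be sketched in a few lines, assuming Python's standard xml.etree.ElementTree module. The element names (dv_quantity, magnitude, units, precision) and the function name read_known_children are hypothetical and chosen only to show the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of child elements known to an older schema.
KNOWN_CHILDREN = {"magnitude", "units"}

# Instance produced against a newer schema that added an optional element.
newer_instance = """<dv_quantity xmlns="http://schemas.openehr.org/v1">
  <magnitude>37.5</magnitude>
  <units>Cel</units>
  <precision>1</precision>
</dv_quantity>"""


def read_known_children(xml_text, known):
    """Collect text of known children; list unexpected ones instead of failing."""
    root = ET.fromstring(xml_text)
    accepted, ignored = {}, []
    for child in root:
        # Strip the "{namespace}" prefix to get the local element name.
        name = child.tag.split("}", 1)[-1]
        if name in known:
            accepted[name] = child.text
        else:
            ignored.append(name)
    return accepted, ignored


accepted, ignored = read_known_children(newer_instance, KNOWN_CHILDREN)
print(accepted, ignored)
```

Returning the ignored names rather than silently dropping them lets the caller decide what "within reason" means, for example by logging them or refusing data with too many unknown elements.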

> Jumping ship to another style such as http://schemas.openehr.org/2009/03
> would also be reasonable, it would just mean we have to correlate release
> dates with release numbers.

This has one advantage in that it is possible to release new major
versions of the spec _without_ updating the schema. So if v1.2 needs
no schema changes over v1.1, the 1.2 released schemas can be left as
http://schemas.openehr.org/2009/03 - and people won't be confused
thinking that their data isn't the right 'version'.

[Heath Frankel]
So you're suggesting moving to a date oriented namespace so that there is no
tie to the release number?

> There is an optional rm_version in the archetype_details attached to any
> archetyped locatable. We currently populate this only when we specify a
> template_id on the composition, which is all compositions in our
> applications. We could suggest that this is required for all compositions
> and make this the handle to determine what schema to use for validation.

Yes, but we would also need a similar mechanism for all top-level XML
artifacts (archetype instances, extracts, PARTY? etc).

[Heath Frankel]
All archetyped locatables can have this and I would expect all top-level
objects to be archetyped otherwise they have no domain semantics.

The other suggestion I have seen on the interwebs is to make the
xsi:schemaLocation attribute compulsory in instances

<Composition xsi:schemaLocation="http://schemas.openehr.org/v1
       http://www.openehr.org/releases/1.0.1/its/XML-schema/Composition.xsd">

<Composition xsi:schemaLocation="http://schemas.openehr.org/v1.1
       http://www.openehr.org/releases/1.1.3/its/XML-schema/Composition.xsd">

The schema checker would not actually go and 'fetch' the XSD but would
look for known URLs that indicate the exact schema version the instance
was released against.

http://www.openehr.org/releases/1.0.1/its/XML-schema/Composition.xsd
means 1.0.1 etc

So the schema namespace indicates the major structural version of
the schema - and the xsi:schemaLocation gives the exact schema version
that the instance was created for (even though the minor schema
differences between 1.0.2 and 1.0.3 may not have any data changes).
I don't know whether storing this version is any use - I guess it depends
on the definition of 'breaking changes' I discussed above.

[Heath Frankel]
I can see the utility of this, but I am not sure that we should mandate it,
seems a bit of a hack.

I can see some better schemes in the offing from the experts! I would
not think we should change anything in the current schema approach, as
this is a minor release. I propose to change only the version ids in the
relevant schemas to reflect the small change in Base_types. We will
leave it to a new major version (1.1 or later) to change tack on how to
manage schema versions. I would suggest that XML experts here would need
to develop a bullet-proof approach to this (as opposed to just
suggestions), so that we can implement it in a major release.

- thomas beale

Heath Frankel wrote:

[Heath Frankel]
I understand your point here but if we cannot have some kind of schema
migration mechanism we will need a new schema per release, which is
something that I don't think anyone wants.

Yes - I don't want that either. I bring it up because if we are allowing
minor schema changes as well as major ones, we need two versioning
mechanisms.

For the major releases the schema namespace is the indicator.

For minor releases, we have no information about what minor release
the instance was designed for, unless we mandate a mechanism
such as the one you suggested with rm_version in archetype_details.

[Heath Frankel]
So you're suggesting moving to a date oriented namespace so that there is no
tie to the release number?

Well, the tie would be through documentation - the 1.1.3 release would say
something like - we are using the 2009/01 schemas. It then puts them
on a separate release trajectory. Of course, this would only be useful
if we think there will actually be divergence between the XML schema
changes needed, and the major releases needed.

[Heath Frankel]
All archetyped locatables can have this and I would expect all top-level
objects to be archetyped otherwise they have no domain semantics.

What about actual ARCHETYPE objects? Some tools or systems may
persist the xml serialization of the AOM rather than the ADL. Same
with templates. I'm saying that every XML instance that might ever be
in the wild using the openehr schemas should have some clear mechanism
to tie it to the actual schema files that it _strictly_ conforms to.

[Heath Frankel]
I can see the utility of this, but I am not sure that we should mandate it,
seems a bit of a hack.

It does feel hackish. But it is one of the few attributes you can add to an xml
instance without needing to change the schema, and the xsi:schemaLocation
attribute is actually roughly designed for this purpose.

The other option is to add a mandatory fixed "meta" attribute for Composition,
Archetype etc that is explicitly in the XML ITS, that holds the full
schema release version number.

Andrew