distributed development, governance and artefact identification for openEHR

I have completed drafts of two documents I believe will come to be
important in openEHR in the near future. The first describes a model of
distributed development and governance of knowledge artefacts, including
archetypes and templates. The second defines an identification system
for these artefacts. The first document is a rewrite of a document
called the 'Archetype System' from previous releases of openEHR; the
second is new. A detailed description of a governance structure and also
quality assurance will come in later documents, but key aspects of both
subjects are summarised in the first of the above-mentioned documents.

These are both development phase documents and are available for
community review at
http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/dist_dev_model.pdf
and
http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/knowledge_id_system.pdf

A wiki page is available at
http://www.openehr.org/wiki/display/spec/Development+and+Governance+of+Knowledge+Artefacts
for discussion purposes.

All feedback welcome.

- thomas beale

Hi Tom,

This is a good document - thanks. I have posted this to the clinical list as
well.
http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/dist_dev_model.pdf

My comments:

Page 11:

Current text:
Archetypes based on different classes from the same information model to
have the same name, e.g. an archetype for 'vital signs' headings based on
the SECTION class, and a 'vital signs' archetype based on OBSERVATION.

Comment:
I believe there will be archetypes for sections and entry that have the same
name but this is not a good example. The entries for vital signs are BP,
Pulse etc. I think it would be better to just raise the problem or get an
example. The nearest one I can think of is a plural form - e.g: Problems
(Section) and Problem (Entry).

Page 16:

Current text:
Also in common with software, there is a strong interest among archetype and
template users in one or a few high-level shared libraries of high-quality
artefacts rather than scattered development with poor quality control. In an
ideal world, there might be only a single repository, or a purely
hierarchical set of repositories. However, we know from experience with
software development that this is extremely unlikely. Each organisation
initially starts out with its own point of view, timelines and priorities.
If a coherent ecosystem of cooperating repositories, or a single
international repository were ever to exist, it would be as an emergent
result of collaboration among initially separate authoring groups, rather
than being established at the outset.

Comment:

With my software hat on I concur with your sentiment. The reality is that it
is likely to be the clinical colleges and other stakeholders outside the
software domain that produce the authoritative archetypes that are sought
after. I have no doubt that there will be both entropic and negentropic
forces. I believe that the lowest energy state is so profoundly associated
with collaborative work in this area that we are unlikely to see many
competing efforts. This is for many reasons including:
* There are many choices involved in developing archetypes - each has
implications for other archetypes that have to be developed and maintained,
translated etc
* Clinicians are interested in alignment of recording around best practice
and providing suitable flexibility - this is best done within one or a very
small number of archetypes for a given clinical concept
* Interoperability will be maximised with collaboration around the
development of archetypes and vendors will have a much smaller footprint to
manage
* The cost of creating competing sets of archetypes is massive and probably
not sustainable
* The number of clinicians and technical people in a position to contribute
is not high considering the work required
* Structuring clinical information is more fractal than orthogonal in
nature. Consensus provides a means of getting to grips with the issues and
managing complexity

In essence I am counselling everyone to see the centralised hierarchy of
repositories as infinitely more useful than the software model of go and do,
try, build and let's sort out the differences in the future. I guess I am
proposing that archetypes are a lot closer to terminology than software from
a clinical perspective.

Current text:

Other national and institutional repositories are likely to come online
from 2009 onward. This trend appears to be similar to the emergence of
large-scale open source projects.

The collaborative nature is similar. I hope that the national repositories
will be used more to organise comments and development of archetypes for the
collective effort in these early days than starting out on their own. In the
end I hope the national repositories will be where the releases of largely
international artefacts (with local specialisations) will be managed. Some
archetypes for national or local use will also be developed but these will
be edge cases. I guess I would see archetypes as the operating system of
clinical interoperability if I were to use the technical analogy.

Current text:

Based on the above considerations, the 'modern' model of software
development is assumed to provide a suitable paradigm for openEHR knowledge
artefact development, with emphasis on collaborative development of one or a
small number of well-known artefact repositories.

I therefore do not agree with this statement. It may appear realistic but I
do not think it is sustainable nor to be encouraged. I am interested in what
other people think.

It is really an important consideration. Terminology also could be organised
primarily by country (or even regionally) and then at the core when people
agree on things. If the clinical leaders in the openEHR community agree with
me that we should really work with the centralised model it does not greatly
influence the technical issues in this paper but it does mean we can work
with the assumption that everyone will see a central authoritative source
and then a hierarchy of repositories rather than a flat set of repositories
competing for dominance.

Cheers, Sam

Hi Tom
Another very thoughtful document. I have been involved to some degree in the
discussion of this area over the years as have many others prior to this
draft release.
http://www.openehr.org/svn/specification/TRUNK/publishing/architecture/am/dist_dev_model.pdf

This document suggests approaches that need very careful consideration and
at present I do not support the direction of this document - specifically in
the changes proposed. It is important that the community consider the
implications for clinical interoperability etc. This document has been
released to the community prior to careful consideration by the ARB to gauge
the views of implementers and clinicians. It will be difficult for many to
understand the implications but I want to raise some of my concerns so
people have an insight into the implications.

Page 5:

Current text:
1.4 Changes proposed in this document:
. augmented form of ARCHETYPE_ID to include organisational / package
  namespace, e.g. org.openehr.ehr::openEHR-EHR-EVALUATION.problem.v1
. concept name section of archetype identifiers (middle part of
  ARCHETYPE_ID) now relaxed to no longer require structure based on
  specialisation parents, e.g. 'problem-diagnosis' can now just be
  'diagnosis', or any name preferred by designers;
. addition of commit_id sub-section archetype description section;
. addition of id_history sub-section to description section, e.g.
  id_history = <"se.skl.epj::openEHR-EHR-EVALUATION.problem.v1">
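
As a rough illustration of the augmented identifier form quoted above, here
is a minimal Python sketch that splits such an identifier into its parts.
It is not taken from the draft specification: the regular expression, the
field names and the function name are assumptions made for illustration,
and the namespace is treated as optional so that current identifiers still
parse.

```python
import re
from typing import NamedTuple, Optional


class ArchetypeId(NamedTuple):
    """Parts of an archetype identifier (field names are hypothetical)."""
    namespace: Optional[str]   # e.g. "org.openehr.ehr", or None if absent
    rm_publisher: str          # e.g. "openEHR"
    rm_package: str            # e.g. "EHR"
    rm_class: str              # e.g. "EVALUATION"
    concept: str               # e.g. "problem"
    version: str               # e.g. "v1" or "v1.12"


# Assumed grammar: [namespace::]publisher-package-CLASS.concept.vN[.N...]
_ID_RE = re.compile(
    r"^(?:(?P<namespace>[A-Za-z0-9_.]+)::)?"
    r"(?P<rm_publisher>[A-Za-z0-9_]+)-"
    r"(?P<rm_package>[A-Za-z0-9_]+)-"
    r"(?P<rm_class>[A-Za-z0-9_]+)"
    r"\.(?P<concept>[A-Za-z0-9_-]+)"
    r"\.(?P<version>v\d+(?:\.\d+)*)$"
)


def parse_archetype_id(text: str) -> ArchetypeId:
    """Split an identifier into its parts, or raise ValueError."""
    match = _ID_RE.match(text)
    if match is None:
        raise ValueError("not a recognised archetype id: " + repr(text))
    return ArchetypeId(**match.groupdict())


if __name__ == "__main__":
    print(parse_archetype_id("org.openehr.ehr::openEHR-EHR-EVALUATION.problem.v1"))
    print(parse_archetype_id("openEHR-EHR-EVALUATION.problem-diagnosis.v1"))
```

An identifier without a namespace simply yields namespace = None here; which
authority that should imply is exactly the governance question under
discussion, so the sketch does not decide it.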

This document, when taken in combination with the distributed development
model, raises the stakes for participants. The outcome is the promise of the
ability to interoperate from a knowledge artefact point of view but I have
doubts whether the proposed changes will support clinical interoperability.
I understand the wish of technical people to produce artefacts wherever and
whenever they like (just as terminologists do) but I would propose that we
have to manage complexity as well. In a world that is immensely complex
already (clinical systems) we may have to sacrifice some possibilities to
ensure we can perform the sort of functions people seek from a standard EHR
architecture.

I would like to add the requirements that are fundamental from my
perspective so that the community can raise these:

1. Primacy of openEHR: I would propose that we need a hierarchy of
authority. Although openEHR artefacts are presently managed within the
Foundation it is possible that the governance will move to a more
authoritative organisation in the near future. That said, I believe that
archetypes released by the openEHR Foundation should not be identified
specially (i.e. no name space). This means that openEHR becomes the default
namespace for archetypes and begins to provide a hierarchy of authority that
I think is so important in this space. One might argue that anyone can
produce archetypes with no namespace - but really anyone can produce
anything with any namespace so that is not sufficient.

2. Archetype IDs
How archetypes are identified in the universe is of no great concern to me.
I am happy to accept that people may want to give them arbitrary names and
live with complexity from an archetype management point of view. What I am
concerned about is how they are identified in data. And I do not want this
to be any more complicated than is required to support the vision we have
for openEHR. I am happy that we may need to extend this at some point but we
have seen very successful extensions of many identifiers as systems grow
(IPv4 -> IPv6, ASCII -> UTF-8). I understand the technical vision - anyone can
do anything and we will be able to sort out what is going on - but I believe
we have to keep pushing for things to be right for where we are up to.

In data the model is known - therefore the openEHR-EHR is redundant in an
EHR implementation. The namespace of the archetype is not.

In data, in XML expression of openEHR data, the archetype ID class name and
concept part provide a means of returning the data without knowledge of the
archetype. An openEHR repository can be fully implemented, use AQL etc
without any knowledge of archetypes whatsoever. The reason for this is that
the archetype IDs are used in queries, and specialisations can be found
without reference to any archetypes based on the ID. This is a fundamental
benefit for implementers and losing this will require a considerably more
complex engine with, potentially, access to every archetype ID in the world.
This is not useful.

So my fundamental requirement is, in openEHR systems, to be able to query
for specialisations without the need to go to an archetype knowledgebase
(which will by definition be incomplete).
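
The requirement above can be illustrated with a minimal Python sketch. It
assumes the current identifier convention, in which the concept part lists
the specialisation parents separated by hyphens (as in
openEHR-EHR-EVALUATION.problem-diagnosis-genetic_diagnosis.v1). The function
names are hypothetical, and the version part is deliberately ignored to keep
the example short.

```python
from typing import List, Tuple


def split_id(archetype_id: str) -> Tuple[str, List[str]]:
    """Return (reference-model part, concept lineage) for a current-style id.

    "openEHR-EHR-EVALUATION.problem-diagnosis.v1"
        -> ("openEHR-EHR-EVALUATION", ["problem", "diagnosis"])
    """
    rm_part, concept, _version = archetype_id.split(".", 2)
    return rm_part, concept.split("-")


def is_same_or_specialisation_of(data_id: str, query_id: str) -> bool:
    """True if data recorded with data_id can be found by querying query_id.

    Only the identifiers are inspected; no archetype repository is consulted.
    """
    data_rm, data_lineage = split_id(data_id)
    query_rm, query_lineage = split_id(query_id)
    if data_rm != query_rm:
        return False
    # The query concept path must be a leading segment of the data path.
    return data_lineage[:len(query_lineage)] == query_lineage


if __name__ == "__main__":
    data = "openEHR-EHR-EVALUATION.problem-diagnosis-genetic_diagnosis.v1"
    print(is_same_or_specialisation_of(data, "openEHR-EHR-EVALUATION.problem.v1"))
```

Once the concept part no longer encodes the parents, a check like this has
nothing to work from unless the lineage is carried in the data or looked up
in a knowledge base.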

Page 17:

Current text:
As a general principle, for a given archetype used to
create data (e.g. an openEHR OBSERVATION object), the following archetypes
could be used for querying:
. the same archetype, i.e. exact same version, revision & commit;
. any previous revision of the same archetype;
. any of the specialisation parents of the archetype;
. any previous revision of any of the specialisation parents of the
archetype.
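
The rule quoted above can be sketched in Python as follows. This is only an
illustration under stated assumptions: the lineage of the data (the
authoring archetype followed by its specialisation parents, each with the
version and revision in force when the data was created) is assumed to be
available as a list, the commit identifier is ignored, and a different major
version is treated as not usable. The type and function names are
hypothetical.

```python
from typing import List, Tuple

# (namespaced concept name, major version, revision)
Entry = Tuple[str, int, int]


def usable_for_query(query: Entry, data_lineage: List[Entry]) -> bool:
    """True if the query archetype is the authoring archetype or one of its
    specialisation parents, at the same or an earlier revision."""
    q_name, q_major, q_revision = query
    for name, major, revision in data_lineage:
        if name == q_name and major == q_major and q_revision <= revision:
            return True
    return False


if __name__ == "__main__":
    lineage = [
        ("se.skl.epj::genetic_diagnosis", 1, 12),  # archetype used to create the data
        ("org.openehr.ehr::diagnosis", 1, 29),     # specialisation parent
        ("org.openehr.ehr::problem", 2, 4),        # parent of the parent
    ]
    print(usable_for_query(("org.openehr.ehr::problem", 2, 3), lineage))
```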

Comment:
I would add a specialisation of this archetype to the list. It will be easy
to determine in the query space whether the nodes sought are shared with
parents and whether a query on the parent is iso-semantic, overlaps (to what
extent) or is unique to the specialisation.

Current text:
To address this situation, it may be useful to include the configuration
meta-data from the operational
template(s) with the data when it is transferred outside of its normal
environment, e.g. in an EHR
Extract.

Comment:
Tom raises the issue of no longer being able to query on specialisations.
This is one suggestion which I do not think appropriate as it creates
massive complexity and allows huge holes for errors in automatic processing.
He goes on to the other alternative:

Current text:
The other possibility is to include archetype lineage information in the
data itself..... The simplest form of this would be as a list of operational
identifiers, e.g.
se.skl.epj::openEHR-EHR-EVALUATION.genetic_diagnosis.v1.12,
org.openehr.ehr::openEHR-EHR-EVALUATION.diagnosis.v1.29,
org.openehr.ehr::openEHR-EHR-EVALUATION.problem.v2.4
... The above example could then become:
se.skl.epj::openEHR-EHR-EVALUATION.genetic_diagnosis.v1.12,
org.openehr.ehr::~diagnosis.v1.29,~problem.v2.4

Comment:
This is a large overhead for the query engine and the data but it is in
essence what we have at the moment in the form of:
openEHR-EHR-EVALUATION.problem-diagnosis-genetic_diagnosis.v1
We have obvious problems with our current approach in that there can be only
one version of the specialisation. This has to be overcome.

Within the data we know some things - which class, which reference model. If
we accept the authority of openEHR we can accept a default namespace (as in
current systems). We can then see that we could reduce Tom's in-data string
to:

EVALUATION.problem.v2.4,diagnosis.v1.29,se.skl.epj::genetic_diagnosis.v1.12
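
To make the reduced form concrete, here is a minimal Python sketch that
expands such a string back into fully qualified entries. It assumes that the
class name appears once at the front, that entries are comma-separated from
the most general parent to the most specialised archetype, and that an entry
without a namespace belongs to a default namespace. The default value
"org.openehr.ehr" and all names in the code are assumptions for
illustration, not part of any specification.

```python
from typing import List, NamedTuple

DEFAULT_NAMESPACE = "org.openehr.ehr"  # assumed default authority


class LineageEntry(NamedTuple):
    namespace: str
    rm_class: str
    concept: str
    version: str


def parse_lineage(text: str,
                  default_namespace: str = DEFAULT_NAMESPACE) -> List[LineageEntry]:
    """Expand 'CLASS.concept.vN[.R],[ns::]concept.vN[.R],...' into entries."""
    rm_class, _, rest = text.partition(".")
    entries = []
    for item in rest.split(","):
        namespace, separator, local = item.rpartition("::")
        if not separator:
            # No explicit namespace: fall back to the default authority.
            namespace = default_namespace
        concept, _, version = local.partition(".")
        entries.append(LineageEntry(namespace, rm_class, concept, version))
    return entries


if __name__ == "__main__":
    for entry in parse_lineage(
        "EVALUATION.problem.v2.4,diagnosis.v1.29,"
        "se.skl.epj::genetic_diagnosis.v1.12"
    ):
        print(entry)
```

The same function handles the string with or without revision numbers, since
everything after the concept name is kept as the version.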

Let's consider the revision information. If versions are entirely backwardly
compatible, is it helpful to have the revision in the data? An optional
element may or may not exist. If I have an old archetype (or the one that I
use in my system) I can still use it to query data entered against future
revisions. I think we need to consider carefully the revision information
and whether it should be in the data.

If we go in that direction the id becomes:

EVALUATION.problem.v2,diagnosis.v1,se.skl.epj::genetic_diagnosis.v1

Not so far away from:
openEHR-EHR-EVALUATION.problem-diagnosis-genetic_diagnosis.v1

It may be better to take the syntax to:
EVALUATION.problem.v2-diagnosis.v1-se.skl::genetic_diagnosis.v1 as this
would be more backwardly compatible.

In summary, I would like this to proceed in a manner that fits the clinical
and technical vision. Is it a hierarchy of authorities for artefacts or not?
Do we stay backwardly compatible with current implementation processes or
not? I think you can understand where I am coming from. By accepting a
hierarchy of authority it does mean that we have a lot less complexity.
Namespaces in the longer term would be for specialisations and I would argue
would probably be unique for a country in the foreseeable future. If another
country wanted to use archetypes developed within a different country, I
would argue that this specialisation should be promoted to the international
set.

I look forward to your responses.

Cheers, Sam

Sam Heard wrote:

1. Primacy of openEHR: I would propose that we need a hierarchy of
authority. Although openEHR artefacts are presently managed within the
Foundation it is possible that the governance will move to a more
authoritative organisation in the near future. That said, I believe that
archetypes released by the openEHR Foundation should not be identified
specially (i.e. no name space). This means that openEHR becomes the default
namespace for archetypes and begins to provide a hierarchy of authority that
I think is so important in this space. One might argue that anyone can
produce archetypes with no namespace - but really anyone can produce
anything with any namespace so that is not sufficient.
  
Hi Sam,

The primacy of openEHR sounds good, but wouldn't it be better to stamp
the archetypes with "the openEHR seal of approval"? Your proposal above
means that all of the home-grown local archetypes sitting on people's
own computers at the moment are indistinguishable from the authoritative
openEHR archetypes.

I don't buy the argument that producing an archetype with no namespace
is equivalent to producing an archetype with any namespace:

    * Archetypes with no namespace can (and will!) be produced
      frequently, innocently and by accident.
    * Producing an archetype with the "openehr" namespace would be a
      deliberate act, a conscious choice.

- Peter

Tom and Sam,

Page 11:

Current text:
Archetypes based on different classes from the same information model to
have the same name, e.g. an archetype for 'vital signs' headings based on
the SECTION class, and a 'vital signs' archetype based on OBSERVATION.

Comment:
I believe there will be archetypes for sections and entry that have the same
name but this is not a good example. The entries for vital signs are BP,
Pulse etc. I think it would be better to just raise the problem or get an
example. The nearest one I can think of is a plural form - e.g: Problems
(Section) and Problem (Entry).

[HKF: ] The examples that exist at present are INSTRUCTION.medication,
ACTION.medication and ITEM_TREE.medication. This happens for procedure as
well.