proposed change to ADL 1.4 to handle languages & translations better

Dear all,

I would like to propose a small change to the archetype structure of ADL 1.4. See the published structure below:

[...]

Tom

As 1.4 is in CEN, I think this has to be 1.5 to be safe. People can get the available languages from the ontology but if we ever separate the languages into separate files this will no longer be possible.

Cheers, Sam

Thomas Beale wrote:

2007/1/15, Thomas Beale <Thomas.Beale@oceaninformatics.biz>:

I don’t believe any of us have implemented this part of the spec properly at this stage (well, Rong has just upgraded the Java parser, but it is not deployed yet), and the current openEHR archetypes don’t contain this section. So it seems to me that the consequences should not be too great.

Hi,

I agree, but adding support for translations in the editors is probably rather easy.

Can I have some feedback as follows:

  • do you agree in principle with this change?

Yes, but will the original_language section replace the primary_language attribute in the ontology section?

  • is it going to badly break any software?

Not badly, but there’s no doubt it will break.

  • who thinks that we should call the result ADL 1.5?

I do think that’s a good idea so that parsers which read the version in the archetype section will know what to expect.
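To illustrate Mattias's point, here is a minimal Python sketch of a parser choosing which sections to expect from the version in the archetype header. It assumes a header line of the form `archetype (adl_version=1.4)`; the `SECTION_LAYOUTS` table and `expected_sections` function are hypothetical, and the "1.5" entry only mirrors the layout proposed in this thread, not any published specification.

```python
import re
from typing import List

# Hypothetical top-level section keywords per ADL version (some are optional).
# The "1.5" entry mirrors the layout proposed in this thread; it is not a spec.
SECTION_LAYOUTS = {
    "1.4": ["archetype", "specialize", "concept", "language", "description",
            "definition", "invariant", "ontology", "revision_history"],
    "1.5": ["archetype", "specialize", "concept", "original_language",
            "translations", "description", "definition", "invariant",
            "ontology", "revision_history"],
}

# Matches a header line such as "archetype (adl_version=1.4)".
HEADER_RE = re.compile(r"^\s*archetype\s*\(\s*adl_version\s*=\s*([0-9.]+)\s*\)")


def expected_sections(adl_text: str) -> List[str]:
    """Return the section keywords a parser should expect, based on the header."""
    lines = adl_text.strip().splitlines()
    match = HEADER_RE.match(lines[0]) if lines else None
    if match is None:
        raise ValueError("no adl_version found in the archetype header")
    version = match.group(1)
    if version not in SECTION_LAYOUTS:
        raise ValueError(f"unsupported ADL version: {version}")
    return SECTION_LAYOUTS[version]


header = "archetype (adl_version=1.4)\n    openEHR-EHR-OBSERVATION.example.v1\n"
print(expected_sections(header)[3])  # language
```

An unknown version fails loudly instead of being parsed against the wrong layout, which is the practical benefit of bumping the version number.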

Regards,

Mattias

Dear all,

I would like to propose a small change to the archetype structure of ADL 1.4. See the published structure below:

******* current published ADL 1.4 archetype structure *****
archetype
    archetype_id
[specialize
    parent_archetype_id]
concept
    coded_concept_name
language
    dADL language description section
description
    dADL meta-data section
definition
    cADL structural section
invariant
    assertions
ontology
    dADL definitions section
[revision_history
    dADL section]


As you can see from the AUTHORED_RESOURCE class used in the AOM (inherited into ARCHETYPE) (http://www.openehr.org/uml/release-1.0.1/Browsable/_9_5_1_76d0249_1140013971401_826330_4871Report.html), we have the separate attributes original_language and translations. In ADL2 (which is a direct dADL serialisation of this model), it looks as follows:

********** current published ADL2 archetype structure **********
archetype_id = <"some.archetype.id">
adl_version = <"2.0">
is_controlled = <...>
parent_archetype_id = <"some.other.archetype.id">
concept = <[concept_code]>
original_language = <"lang">
translations = <
    ...
>
description = <
    ...
>
definition = (cadl) <#
    cADL plug-in section
#>
invariant = (aadl) <#
    assertions plug-in section
#>
ontology = <
    ...
>
revision_history = <
    ...
>


Here you can see original_language and translations as distinct top-level attributes, as they are in the AOM / AUTHORED_RESOURCE classes. My proposal is to change ADL 1.4 to mimic this, i.e.

******* proposed ADL 1.4 archetype structure *****
archetype
    archetype_id
[specialize
    parent_archetype_id]
concept
    coded_concept_name
original_language
    language_name
translations
    dADL translations section
description
    dADL meta-data section
definition
    cADL structural section
invariant
    assertions
ontology
    dADL definitions section
[revision_history
    dADL section]


This will facilitate my software, and I imagine everyone’s, since it makes ADL 1.4 closer to the AOM. It won’t be that long before we go to ADL2, which means just using a dADL parser for the whole archetype (+ plug-in section parsers for the cADL & invariants section).
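A minimal Python sketch of why the proposed layout sits closer to the AOM. The names follow AUTHORED_RESOURCE (`original_language`, `translations`, `is_controlled`, `languages_available`) but are simplified: plain strings stand in for CODE_PHRASE, and the translator is a single author string. `to_adl2_header` is a hypothetical helper showing how, in a direct dADL serialisation, the two language attributes come out at top level.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TranslationDetails:
    # Simplified from the AOM: author is a single string here.
    language: str
    author: str
    accreditation: Optional[str] = None


@dataclass
class AuthoredResource:
    original_language: str
    translations: Dict[str, TranslationDetails] = field(default_factory=dict)
    is_controlled: bool = False

    def languages_available(self) -> List[str]:
        # Original language first, then translation codes in sorted order.
        return [self.original_language, *sorted(self.translations)]


@dataclass
class Archetype(AuthoredResource):
    # ARCHETYPE inherits the language attributes from AUTHORED_RESOURCE.
    archetype_id: str = ""
    concept: str = ""


def to_adl2_header(a: Archetype) -> str:
    """Hypothetical helper: emit language-related top-level dADL attributes."""
    lines = [
        f'archetype_id = <"{a.archetype_id}">',
        f"concept = <[{a.concept}]>",
        f'original_language = <"{a.original_language}">',
        "translations = <",
    ]
    for code, details in sorted(a.translations.items()):
        lines.append(f'    ["{code}"] = <')
        lines.append(f'        author = <"{details.author}">')
        if details.accreditation:
            lines.append(f'        accreditation = <"{details.accreditation}">')
        lines.append("    >")
    lines.append(">")
    return "\n".join(lines)


arch = Archetype(
    original_language="en",
    translations={"de": TranslationDetails("de", "freddy@something.somewhere.co.uk")},
    archetype_id="some.archetype.id",
    concept="at0000",
)
print(arch.languages_available())  # ['en', 'de']
print(to_adl2_header(arch))
```

Because the model attributes and the serialised attributes line up one to one, a parser for this layout needs no special mapping for the language data.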

Hi Thomas,

There is one more mandatory attribute “is_controlled” from the AUTHORED_RESOURCE class which is currently embedded in the archetype header instead of being a top-level attribute.

It would be useful to group the attributes from AUTHORED_RESOURCE under one top-level attribute, something like “authored_resource”. This would be possible if 1) ARCHETYPE didn't inherit directly from AUTHORED_RESOURCE but included it as a member attribute; and 2) AUTHORED_RESOURCE were made a concrete class instead of an abstract class, since it's directly usable as is.

All this would make the parser even simpler, and would also follow the same pattern by which definition and ontology are included in ARCHETYPE.
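For comparison, a minimal Python sketch of the composition Rong describes, where ARCHETYPE holds an `authored_resource` member alongside `definition` and `ontology` instead of inheriting from AUTHORED_RESOURCE. `ComposedArchetype` and the stub `Definition` and `Ontology` classes are hypothetical stand-ins, not published AOM classes.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AuthoredResource:
    # Concrete under this proposal, so it can be instantiated on its own.
    original_language: str
    translations: Dict[str, str] = field(default_factory=dict)  # code -> translator (simplified)
    is_controlled: bool = False


@dataclass
class Definition:
    # Stub for the cADL structural section.
    rm_type_name: str


@dataclass
class Ontology:
    # Stub for the ontology section: term code -> {language: text}.
    term_definitions: Dict[str, Dict[str, str]] = field(default_factory=dict)


@dataclass
class ComposedArchetype:
    # The authored-resource data is a member, like definition and ontology.
    archetype_id: str
    authored_resource: AuthoredResource
    definition: Definition
    ontology: Ontology = field(default_factory=Ontology)


arch = ComposedArchetype(
    archetype_id="openEHR-EHR-OBSERVATION.example.v1",
    authored_resource=AuthoredResource(original_language="en"),
    definition=Definition(rm_type_name="OBSERVATION"),
)
# With inheritance this would be arch.original_language; here it takes one extra hop.
print(arch.authored_resource.original_language)  # en
```

The trade-off is visible in the last line: every client that reads the language data has to go through the extra attribute, so existing code written against the inheritance-based model would need changing.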

Since we are most likely heading for ADL 1.5, I feel it’s the right time to propose these changes now.

Regards,
Rong

Rong Chen wrote:

Hi Thomas,

There is one more mandatory attribute “is_controlled” from the AUTHORED_RESOURCE class which is currently embedded in the archetype header instead of being a top-level attribute.

that one can stay as it is in ADL 1.4 (seeing we have all already implemented it anyway) - it will be a proper attribute in ADL2.

It would be useful to group the attributes from AUTHORED_RESOURCE under one top-level attribute, something like “authored_resource”. This would be possible if 1) ARCHETYPE didn't inherit directly from AUTHORED_RESOURCE but included it as a member attribute; and 2) AUTHORED_RESOURCE were made a concrete class instead of an abstract class, since it's directly usable as is.

well, this is not how we did the modelling - the model we have works fine as it is, and has been adopted by CEN, so we couldn’t really make changes like this. Also, this really would break a lot of software, and I don’t think it would add any value. Note that the CEN specification is now starting to be used in Europe and the UK.

All this would make the parser even simpler, and would also follow the same pattern by which definition and ontology are included in ARCHETYPE.

you just have to think of AUTHORED_RESOURCE as grouping together a bunch of things that always make sense together for online resources. It would probably make pre-ADL2 parsers slightly easier if we do what you say, but our main aim is correct semantics in the AOM and ADL2, which we should go for this year - so changes that only make pre-ADL2 parsers slightly easier don’t seem warranted, since they will be irrelevant pretty soon.

Since we are most likely heading for ADL 1.5, I feel it’s the right time to propose these changes now.

I agree with everyone, we would have to make this change in ADL 1.5

- thomas

Tom, Sam,

If this is needed, I agree that it should be 1.5.

I think it will break the software we are currently developing a little, but that's no big deal.

Another question to consider is whether there really is always only one primary language.
In a more and more cooperative archetype development, I can imagine archetypes that are developed simultaneously in English, German and Dutch for example. (Or British and American English, not sure if regional differences would be a translation as such...).

What is the extra value of separating primary language and translations that cannot wait until ADL 2.0?
Does this separation implicitly mean we do not trust the translations as much as the primary language?

Regards
Sebastian

Sebastian Garde wrote:

Another question to consider is whether there really is always only one primary language.
In a more and more cooperative archetype development, I can imagine archetypes that are developed simultaneously in English, German and Dutch for example. (Or British and American English, not sure if regional differences would be a translation as such...).

What is the extra value of separating primary language and translations that cannot wait until ADL 2.0?
Does this separation implicitly mean we do not trust the translations as much as the primary language?
  

I think that is always the case. The activity of translation is in
general going to be quite different from that of archetype development -
the latter is (hopefully) workshop-based, involves reviews and other QA
activities. Translation on the other hand in its simplest form is the
passing of some file to a translation company to generate the same thing
in a new language. I have some experience with how this works (due to
having been married to a translator in a past life;-) and I can tell you
that translation invariably throws up numerous questions about ambiguous
meanings, which are inevitably answered by the translators themselves,
due to lack of access to the original authors. Sometimes there is a bit
of access, but it is not typical. Even specialist translations suffer
from this problem.

So in practical terms the problem can be stated like this:
- let's say an archetype is written in French
- let's say that some colleagues of the French group translate it into
English (unwittingly introducing a minor error)
- now let's say you want to translate into German. Which language should
you translate from? Finding a translator for de/en will be slightly
easier than de/fr (but not much I wouldn't think).
- Let's say you did it from the English, and unwittingly introduced a
further minor error.

By the time you get to Korean, via Urdu, you have an accumulation of
translation errors, and the quality of each translation is suspect, but
to an unknown degree. By contrast, if you always translate from the
original language, the translation errors are more or less constant, and
also fixable, since we know what the reference language is.
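The arithmetic behind this argument can be made concrete with a toy Python model. It assumes, purely for illustration, that every translation step adds exactly one error and that no error is ever corrected; `errors_per_language` is a hypothetical helper, and the language sequence follows the French, English, German, ..., Urdu, Korean example above.

```python
from typing import Dict, List


def errors_per_language(languages: List[str], chained: bool) -> Dict[str, int]:
    """Toy model: each translation step adds exactly one error.

    languages[0] is the original language. If chained is True, each language
    is translated from the one before it; otherwise every language is
    translated directly from the original.
    """
    errors = {languages[0]: 0}
    for previous, lang in zip(languages, languages[1:]):
        source = previous if chained else languages[0]
        errors[lang] = errors[source] + 1
    return errors


langs = ["fr", "en", "de", "ur", "ko"]
print(errors_per_language(langs, chained=True))
# {'fr': 0, 'en': 1, 'de': 2, 'ur': 3, 'ko': 4}
print(errors_per_language(langs, chained=False))
# {'fr': 0, 'en': 1, 'de': 1, 'ur': 1, 'ko': 1}
```

In the chained case the error count grows with the length of the chain; translating from the original keeps it at a single step per language, and recording original_language is what tells a translator which text that is.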

I don't think an archetype can be developed in more than one language at
once. Even if a Dutch, German and English speaker do the development,
they need to commit their agreements in one language or another -
presumably English. That means there is still a translation activity
into Dutch and German, but in this situation, the translations are
likely to be safer medically speaking. However, many doctors don't spell
or write grammatically any better than the average person, whereas a
professional translator would not make such errors.

I think translation is a necessarily imperfect world, and the original
language notion just helps to reduce errors, but not remove them entirely.

BTW Umberto Eco's book "Mouse or Rat" is an amusing read on the subject...

- thomas

Hi All

You do need a primary language, as changes in this language should potentially be propagated throughout the translations if there is a change of meaning. Also, other translations should ideally be based on the primary language.

Cheers, Sam

Thomas Beale wrote:

I am working on the specifications and software now, and the more I think about it, the more I think it is better to leave the 1.4 spec as it is, i.e. with the language section as follows:

language
    original_language = <"en">
    translations = <
        ["de"] = <
            author = <"freddy@something.somewhere.co.uk">
            accreditation = <"British Medical Translator id 00400595">
        >
        ["ru"] = <
            author = <"vladimir@something.somewhere.ru">
            accreditation = <"Russian Translator id 892230A">
        >
    >

While it means that original_language and translations are not at the same level as the items under description, I now think it is probably better to stay with this the way it is stated in the 1.4 spec, for the following reasons:

  • our software is built to this spec (except my parser…)

  • it is what is published, and I think that more small changes that are not really necessary should probably be avoided. As Sam has said, stability needs to be the name of the game…

  • having a top-level section original_language doesn’t really work in the 1.4 style syntax, since it would look something like:

        original_language
            "en"

    whereas the way we have documented it looks more natural (i.e. as per the first example above)

I know it is a bit of an anomaly, but as I try to write the code and manage the specification, I am inclined to stick with the spec as is, while we are with 1.4.
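In software, keeping the section nested need only cost one mapping step. A minimal Python sketch, assuming a dADL parser has already turned the archetype into nested dictionaries (`parsed_archetype` below is hand-written to match the language section example above); `lift_language_section` is a hypothetical helper that maps the nested 1.4 `language` section onto AOM-style `original_language` and `translations` values.

```python
from typing import Any, Dict


def lift_language_section(parsed: Dict[str, Any]) -> Dict[str, Any]:
    """Map the nested ADL 1.4 'language' section onto AOM-style attributes."""
    language = parsed["language"]
    return {
        "original_language": language["original_language"],
        # Keyed by language code, like the translations attribute in the AOM.
        "translations": {
            code: dict(details)
            for code, details in language.get("translations", {}).items()
        },
    }


# Hand-written stand-in for what a dADL parser might return for the
# language section shown above.
parsed_archetype = {
    "language": {
        "original_language": "en",
        "translations": {
            "de": {"author": "freddy@something.somewhere.co.uk",
                   "accreditation": "British Medical Translator id 00400595"},
            "ru": {"author": "vladimir@something.somewhere.ru",
                   "accreditation": "Russian Translator id 892230A"},
        },
    },
}

aom_attrs = lift_language_section(parsed_archetype)
print(aom_attrs["original_language"])     # en
print(sorted(aom_attrs["translations"]))  # ['de', 'ru']
```

The anomaly then stays confined to the 1.4 syntax, while the in-memory model has the same shape as the AOM.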

Further thoughts?

BTW I will post reworked versions of ADL 1.4, ADL 2, AOM, oAP online containing many if not all the changes and error-fixes posted in the last couple of months.

- thomas

Thomas Beale wrote:

Hi Tom,

Does this mean we will stay with ADL 1.4 and the next natural step would be ADL 2.0? If so, I think it’s good, because there’s no need to publish many small changes and fixes - we need to have stable software working soon. Is there any parser grammar for the translations part of the language section available? I think Rong mentioned it isn’t available yet or maybe it was the revision history grammar (or both)…

Regards,

Mattias

2007/1/25, Thomas Beale <Thomas.Beale@oceaninformatics.biz>:

Thomas Beale wrote:

Hi Thomas,

* it is what is published, and I think that more small changes
that are not really necessary should probably be avoided. As Sam
has said, stability needs to be the name of the game...

I can only agree with Sam. We need time to build the systems without too
often having to go back and make small changes to follow updated
specifications.

We have also had setbacks when presenting openEHR in Sweden and getting
questions about how stable it is. The answer ‘it is not stable now, but we
hope it will be stable any week now’ is not the best answer to convince the
audience of openEHR’s excellence.

  Greetings
  Mikael Nyström
  Ph D student
  Department of Biomedical Engineering
  Linköping University
  Sweden

I am working on the specifications and software now, and the more I think about it, the more I think it is better to leave the 1.4 spec as it is, i.e. with the language section as follows:

[...]

Agree here. This would mean only minimal changes need to be made to the parser (and test archetypes).
/Rong