language translation

thomas.beale · 23 June 2002 02:17

I have just been contemplating the ISO requirement:

STR4.6 The EHRA must support the identification of information that has been translated from the language in which it was originally recorded. Such identification must describe the faithfulness or reliability of the translation. (1.4.3)

We have previously said that a new TRANSACTION would be created for the purposes of translation. Let's imagine that it is necessary to have a Portuguese translation of the current Care Plan transaction (let's say a UK traveller is in hospital in Portugal or Brazil...). What needs to happen?

step 1: create a new TRANSACTION version in the same VT, which is a copy of the existing latest one
step 2: all DV_PLAIN_TEXTs (whether self-standing or in DV_PARAGRAPHs) have to be edited into Portuguese
step 3: either we assume that all DV_TERM_TEXTs will just appear in Portuguese via a terminology (unlikely, since not many termsets have Portuguese translations) or the author translates all these as well. Probably the system should convert them all to DV_PLAIN_TEXTs first.
step 4: commit new version
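
The four steps above could be sketched roughly as follows. This is only an illustration in Python under stated assumptions: the class names, the `coded` flag standing in for the DV_TERM_TEXT / DV_PLAIN_TEXT distinction, and the lookup-table "translator" are all hypothetical and not part of the reference model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical stand-ins for the model classes discussed in this thread.


@dataclass(frozen=True)
class TextItem:
    value: str
    coded: bool = False  # True ~ DV_TERM_TEXT, False ~ DV_PLAIN_TEXT


@dataclass(frozen=True)
class Transaction:
    language: str
    items: Tuple[TextItem, ...]
    reason: str = "initial commit"


def commit_translation(versions: List[Transaction],
                       phrasebook: Dict[str, str],
                       target_language: str) -> Transaction:
    """Append a translated copy of the latest version and return it."""
    latest = versions[-1]  # step 1: start from a copy of the latest version
    translated = tuple(
        # steps 2 and 3: coded terms are demoted to plain text, and every
        # text is replaced by its translation (left unchanged if unknown)
        TextItem(value=phrasebook.get(item.value, item.value), coded=False)
        for item in latest.items
    )
    new_version = Transaction(
        language=target_language,
        items=translated,
        reason=f"translation to {target_language}",
    )
    versions.append(new_version)  # step 4: commit; earlier versions are untouched
    return new_version


history = [Transaction("en", (TextItem("bed rest"), TextItem("fracture", coded=True)))]
pt = commit_translation(history, {"bed rest": "repouso", "fracture": "fratura"}, "pt")
```

Note that the sketch appends to a flat list, so the Portuguese copy becomes the "latest" version, which is exactly the problem raised next.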

Now, what is in the new version? If it is only in Portuguese, it breaks the rule that the latest version contains the latest information, in the sense that it can no longer be read by the usual owners of the record (UK doctors, hospitals, software). Also, what if the Portuguese/Brazilian clinicians decide to add some new information? It would only be in Portuguese. (Actually, that's a bit unfair: doctors in both countries are much more likely to be able to use English in an English-language EHR than most English speakers are to do the reverse. Nevertheless... let's imagine the consultant does not know how to write English; maybe they are a local nurse or other kind of carer in rural Brazil.)

There are two solutions I can see.

1. versions for the purpose of language translation only could be "side" branches on the version tree, meaning that the current top English-language version and the newly translated Portuguese version are both the latest (but the English-language one would take precedence because it is on the "trunk"; if the record had been in Portuguese, and we were doing an English translation of a part of it, it would be the other way round). This would slightly change the version identification scheme, and would complicate the management of versions a bit. On the other hand, branching is a very well understood idea in version control, and there are many models for how to do it.

However, we would probably have to impose a rule that the translated version was just a translation - no additions etc. If additions were wanted, then yet another new version would be created on top of the new translated version. This version then would be the "latest" in the tree.
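
To make the branching idea concrete, here is a rough sketch. The identifier scheme (trunk versions "1", "2", ...; side versions "2.pt.1", "2.pt.2", ...) is invented purely for illustration, as are the class and method names; nothing here is the actual version identification scheme.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VersionTree:
    trunk_language: str
    trunk: List[str] = field(default_factory=list)                # e.g. ["1", "2"]
    branches: Dict[str, List[str]] = field(default_factory=dict)  # "2.pt" -> ids

    def commit_trunk(self) -> str:
        version_id = str(len(self.trunk) + 1)
        self.trunk.append(version_id)
        return version_id

    def commit_side(self, trunk_id: str, language: str) -> str:
        """First call creates the pure translation; later calls stack additions on it."""
        if trunk_id not in self.trunk:
            raise ValueError(f"unknown trunk version {trunk_id}")
        key = f"{trunk_id}.{language}"
        ids = self.branches.setdefault(key, [])
        version_id = f"{key}.{len(ids) + 1}"
        ids.append(version_id)
        return version_id

    def latest(self, language: str) -> str:
        """The trunk takes precedence unless a side branch in the requested
        language hangs off the newest trunk version."""
        newest_trunk = self.trunk[-1]
        if language == self.trunk_language:
            return newest_trunk
        side = self.branches.get(f"{newest_trunk}.{language}")
        return side[-1] if side else newest_trunk


tree = VersionTree(trunk_language="en")
tree.commit_trunk()                         # "1"
tree.commit_trunk()                         # "2"
translation = tree.commit_side("2", "pt")   # "2.pt.1"
addition = tree.commit_side("2", "pt")      # "2.pt.2"
```

A reader asking for a language with no side branch simply falls back to the trunk version.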

2. make the translation a new version in the trunk, but make the model so that each fragment of text which has been translated is also retained in English - i.e. for every single text item in the English original, there are now two items - the original and the Portuguese equivalent. Viewers of the record choose whichever language, and see the appropriate view.

This approach breaks the current rule that the contents of a TRANSACTION are all in one language only. We would have to move the language indicator to DV_TEXT, DV_PARAGRAPH etc. to enable different fragments to each have their own translation.

If the Portuguese authors decide to add more information in Portuguese, where will it go? Presumably in a new transaction on top of the translation one, but the problem now is that there are item(s) which are only in Portuguese, salted through the latest TRANSACTION...

Solution 1 is preferable from a clean version control perspective, since it is absolutely obvious where the latest information is, and where any information in a given language will be.

But solution 2 probably corresponds more closely to the reality that complete translations might not be done - just enough fragments might be translated to enable the treating physician to do his/her work. This means that solution 1 would result in transaction versions which remained mostly in English, with bits of Portuguese sprinkled through them, being saved as "Portuguese" language transactions.

I think the rules we should stick to are as follows:
a) one primary language for the whole transaction. Transactions in which the totality of information is not available in one language are not viable for safe health care
b) new versions on the trunk of the version tree should never change their primary language. This means that all trunk versions in a VERSIONED_TRANSACTION have the same primary language.

A solution which would respect these rules and achieve what we want takes another approach:
- define the primary language at the VERSIONED_TRANSACTION level (i.e. all TRANSACTIONs in the tree have the one primary language)
- any translation of an existing TRANSACTION, no matter how partial, causes a side version whose reason for creation is "translation to XXX", where XXX is the target language
- translation of fragments in the side version is done by allowing DV_TEXTs to have translations, which do not replace the current value, but are in addition to it. As many or as few textual fragments as are needed can be translated.
- if the 2nd language author wants to write new content into an existing transaction in the 2nd language only, create a new TRANSACTION version containing the new content items on top of the side version.
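
A minimal sketch of this proposal, assuming a simplified text type whose translations are held alongside the original value rather than replacing it. All class, field and method names are hypothetical, and the partial "translation" is just a lookup table.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Text:
    """Hypothetical DV_TEXT-like item: translations add to `value`, never replace it."""
    value: str
    translations: Dict[str, str] = field(default_factory=dict)

    def render(self, language: str) -> str:
        # Fall back to the original wording when no translation exists.
        return self.translations.get(language, self.value)


@dataclass
class Transaction:
    items: List[Text]
    reason: str
    side_branch: bool = False


@dataclass
class VersionedTransaction:
    primary_language: str                        # defined once for the whole tree
    trunk: List[Transaction] = field(default_factory=list)
    side: Dict[str, List[Transaction]] = field(default_factory=dict)

    def translate(self, language: str, fragments: Dict[str, str]) -> Transaction:
        """Create a side version of the latest trunk version; may be partial."""
        source = self.trunk[-1]
        items = [
            Text(t.value, {**t.translations, language: fragments[t.value]})
            if t.value in fragments
            else Text(t.value, dict(t.translations))
            for t in source.items
        ]
        version = Transaction(items, f"translation to {language}", side_branch=True)
        self.side.setdefault(language, []).append(version)
        return version


vt = VersionedTransaction(
    "en", [Transaction([Text("bed rest"), Text("review in 2 days")], "initial")]
)
pt = vt.translate("pt", {"bed rest": "repouso"})
```

Here only one fragment is translated; rendering the side version in "pt" shows the Portuguese fragment and leaves the rest in the primary language, while the trunk version is untouched.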

Now, let's imagine the UK person gets back home after their bad accident in Brazil. The Brazilian doctors have been brilliant, and fixed them up; the only problem is that the patient's new care plan (and maybe a lot of other bits and pieces in the record) all appear in Portuguese only. According to the above solution, the new information must be in latest versions on side branches of existing versioned transaction trees, or else in completely new transaction trees. This makes it easy for the software to find the 2nd language additions, and for the health service to get them translated back to English, and put in as new trunk versions, now bringing the trunks up to date. What about the transaction trees newly created in Brazil - does English now become the side version here? Or do we force them to be created with Portuguese as the side branches, even though they were created for the first time by Portuguese-speaking authors? I favour the second approach.
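
As a rough illustration of how software might find the 2nd language additions, the sketch below scans a log of side versions and reports the trees whose newest side version holds new content rather than a pure translation. The record layout and the reason strings are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SideVersion:
    tree_id: str
    language: str
    reason: str  # e.g. "translation to pt" or "new content"


def needs_back_translation(side_versions: List[SideVersion],
                           home_language: str) -> List[str]:
    """Return ids of trees whose newest side version carries new foreign-language content.

    `side_versions` is assumed to be in committal order.
    """
    newest: Dict[str, SideVersion] = {}
    for version in side_versions:
        newest[version.tree_id] = version  # later entries overwrite earlier ones
    return sorted(
        tree_id
        for tree_id, version in newest.items()
        if version.language != home_language
        and not version.reason.startswith("translation to")
    )


log = [
    SideVersion("care_plan", "pt", "translation to pt"),
    SideVersion("care_plan", "pt", "new content"),
    SideVersion("medications", "pt", "translation to pt"),
]
todo = needs_back_translation(log, "en")  # ["care_plan"]
```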

Thoughts? This is a tricky area, and I'm sure there are holes in my proposals so far.

- thomas beale

Jean_Roberts · 24 June 2002 07:31

Whilst I cannot comment on the architecture detail, I would flag up a
need to consider the international environment around the handling of
records. We may well be subject to generic CEC Directives about how
records should be produced and the Brazilians (or in fact even the US or
somewhere even with same language bases) could have national / regional
requirements for how they must collect or present data on healthcare
delivered locally. I am not saying that there are these types of
difference at present, but like with trans-border data transmission
across 'unsafe' areas such potential external inconsistencies must be
recognised as needing to be considered.
Jean Roberts

Phoenix Associates, 19 Church Meadow, Ipstones, Staffs, ST10 2LS UK
email : jean@hcjean.demon.co.uk http://www.hcjean.demon.co.uk
tel 07771 804472 or tel/fax+44 1538 266944

thomas.beale · 24 June 2002 12:54

Two other scenarios I thought of which are more likely than the one I used:

- people from non-English-language countries receiving treatment in English-language countries for whatever reason (holiday, only place the procedure is available)
- situations where both the origin and translation language are not English, and there is no guarantee that translators will be available, e.g. Bulgarian/Norwegian etc.

- thomas beale


Tom_Culpepper · 25 June 2002 14:54

Jean,
You may find the work that was done at the OMG HDTF worth looking at in terms of doing international distributed computing in the medical environment. There are several areas:

Person Identification Service (http://www.omg.org/cgi-bin/doc?formal/2001-04-04)
Terminology Query Service (http://www.omg.org/cgi-bin/doc?formal/2000-06-31)
Resource Access Decision (http://www.omg.org/cgi-bin/doc?formal/2001-04-01)
Clinical Observation Access Service (http://www.omg.org/cgi-bin/doc?formal/2001-04-06)
All of these are geared toward doing secure distributed computing for medical information.

The full catalog of specifications can be found at:
http://www.omg.org/technology/documents/spec_catalog.htm

Tom

David_Forslund · 26 June 2002 18:10

I second this. You can also see demonstration Java implementations of these specifications at http://OpenEMed.org

Dave

thomas.beale · 1 July 2002 00:50

The following posted on behalf of Dr Jean-Luc Mommaerts

I would take as general principle the following:
"One meaning - one transaction (or 'equivalent' transaction, cf. infra);
difference in meaning - different transactions."
This goes more or less against what has been previously said, but still I
think it to be a logical solution. A doctor primarily wants to convey
meaning in his writings, independent of the format. Whether it's there on
paper, in bits and bytes, in codes or in another human language doesn't
matter for that.

Some notes:
1) I am well aware that a translation never provides 100% the same meaning.
But we talk about intentions now. If you (as reader of the EHR) know that
the intention of the writer was to convey the same meaning, you know that
you only have to look at one version, namely that of which you know the
language best.

2) In many cases, a doctor will understand some of the meaning, even if the
text is in another language. Medicine has many terms that etymologically
come from the same roots. So, partial translations will be common. But
almost always a partial translation will be supplemented with additional
information.

3) Not only can one go to Brazil on holiday. One can move to that country.
Then after a year or two (or twenty?), one moves again. And again. Anything
is possible.
This is only to depict reality. The solution that I propose transcends these
difficulties.

Let's take a look at some use cases now:
1) A translator translates one or more transactions and wants to indicate
that he translated them 'as good as possible', i.e. without intention to
change anything of the meaning. In that case, I would not make a new
'direct' instance of <TRANSACTION>. The translator needs to be able, though, to
indicate his intention of conveying the same meaning in another language. In
any case, at this point one just has two 'forms' of the same thing, i.e. the
same transaction. Maybe make it an instance of <EQUIVALENT TRANSACTION>.
<EQUIVALENT TRANSACTION> can be a conceptual child of <TRANSACTION>, with an
additional attribute indicating its reason for existence, in this case:
language translation. Additional attributes can be where and by whom the
translation is done and with what degree of reliability (according to ISO).

2) A translator translates a transaction only partially, with no additional
information. Same result. It is the same ('equivalent') transaction. There
is a fuzzy border now between the 'primary languages'. But the migration is
from language A to language B. Let the translator be able to indicate this.
You can not ask from a doctor who translates parts of a transaction, that he
always indicates for each part whether his translation is 'pure' or with
additions.

3) There is a (complete or partial) translation + at the same time
additional information. In this case, it's a different transaction. Treat it
as any other new transaction, but with an indication of the language
migration.

4) There is an automatic translation of (some of the) terms. Here, one can
look at what happens, completely from the level of concepts instead of
terms. The same thing is there inside the computer. The only difference is
how the user looks at it. So: same transaction. In fact, wherever possible,
a user should always be able to take a look at terms in different languages.
Terms explicitly related to concepts shouldn't be present in the computer as
unrelated terms, but always as concepts (concept-ID's).

5) The doctor doesn't know anything of the language and no translation is
done at all. Then a new transaction is made with indication of language.
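
The five use cases above can be read as a small decision rule. The sketch below is one possible encoding, not a proposal for the model: the `Change` flags and `Outcome` names are hypothetical, and in practice the flags would have to come from the translator's own declaration of intent.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    EQUIVALENT_TRANSACTION = "equivalent transaction"
    NEW_TRANSACTION = "new transaction"
    DISPLAY_ONLY = "same transaction, display only"


@dataclass
class Change:
    translated: bool                    # some or all text was translated by a person
    adds_information: bool              # new content beyond the original meaning
    automatic_terms_only: bool = False  # only concept-to-term lookup in another language


def classify(change: Change) -> Outcome:
    if change.automatic_terms_only:
        return Outcome.DISPLAY_ONLY              # use case 4
    if change.translated and not change.adds_information:
        return Outcome.EQUIVALENT_TRANSACTION    # use cases 1 and 2
    return Outcome.NEW_TRANSACTION               # use cases 3 and 5


full_translation = classify(Change(translated=True, adds_information=False))
translation_plus_notes = classify(Change(translated=True, adds_information=True))
```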

We come back to the general principle: "One meaning - one transaction", but
WITH an indication that it concerns a translation (whether partial or
complete). In all other cases, make a new transaction. As for 'primary
language', I think it's best to always have it indicated on the level of
transaction, instead of versioned transaction. This avoids possible errors
when a person moves and moves and moves.

This solution is simple and IMO encompassing enough. But: I don't know the
arguments why it was previously said to make a translated transaction always
a different transaction. In a sense, an 'equivalent transaction' IS also
different. Thus as an alternative, you can put it in the same list
(versioned transaction). The version identification scheme doesn't have to
be changed. The most recent transaction is then always the latest
transaction in the list, but if it's an instance of <EQUIVALENT
TRANSACTION>, then the one before is also the latest one, which, by meaning,
is true.

Kind regards

Jean-Luc Mommaerts

thomas.beale · 2 July 2002 01:48

Thomas Beale wrote:

The following posted on behalf of Dr Jean-Luc Mommaerts

I would take as general principle the following:
"One meaning - one transaction (or 'equivalent' transaction, cf. infra);
difference in meaning - different transactions."

The only problem with literally agreeing with this is that if a transaction has been committed in language L1, there is no way in the architecture to go and modify it, even just for the purpose of translating it to L2, without creating a new transaction. That's the principle of indelibility, required to ensure medico-legal investigations can occur. So any change, no matter how small, must cause a new transaction; we then need to decide among the following possibilities:

1. add a new "trunk" transaction in L2; this transaction is now the "latest"
2. add a "branch" transaction in L2; both the most recent trunk and the new branch are candidates for being the "latest", depending on what language you want to read in, and whether the writers of the new branch added anything
3. create a new VERSIONED_TRANSACTION whose first TRANSACTION is the one in L2, based on a transaction in an existing versioned transaction.

However, I agree with the spirit of Jean-Luc's statement (one meaning / one transaction), but I would apply it to the whole record: there is just one record, of which some information might be in language L1, some in L2, some in L3 etc. This is in contrast to a possible view which says that there might be different language "views" of the record, e.g. a Portuguese view, a Bulgarian view, a Basque view. From the scenarios described below, and from most people's experience, this is unrealistic. The reality is that bits here and there are translated, some completely, some not, some information is added in a new language, etc. And all the while, practitioners (at least at higher levels of training) can read all those Latin/Greek-based words used in the European tradition of medicine, so the difference between what is written (in various languages) and what is understood is a grey area (but note: it would be easy not to understand some little words in another language, which might be things like "no evidence of", "fear of" etc. - food for thought).

Let's take a look at some use cases now:
1) A translator translates one or more transactions and wants to indicate
that he translated them 'as good as possible', i.e. without intention to
change anything of the meaning. In that case, I would not make a new
'direct' instance of <TRANSACTION>. The translator needs to be able, though, to
indicate his intention of conveying the same meaning in another language. In
any case, at this point one just has two 'forms' of the same thing, i.e. the
same transaction. Maybe make it an instance of <EQUIVALENT TRANSACTION>.
<EQUIVALENT TRANSACTION> can be a conceptual child of <TRANSACTION>, with an
additional attribute indicating its reason for existence, in this case:
language translation. Additional attributes can be where and by whom the
translation is done and with what degree of reliability (according to ISO).

I don't know if there is any need for another "reason" attribute, since TRANSACTION already has one, but maybe for recording quality or type of translation...

2) A translator translates a transaction only partially, with no additional
information. Same result. It is the same ('equivalent') transaction. There
is a fuzzy border now between the 'primary languages'. But the migration is
from language A to language B. Let the translator be able to indicate this.
You can not ask from a doctor who translates parts of a transaction, that he
always indicates for each part whether his translation is 'pure' or with
additions.

I agree; no-one is going to do this. The only way to know if anything has been added is to read the new material, which may require getting in someone fluent in the other language to do it.

3) There is a (complete or partial) translation + at the same time
additional information. In this case, it's a different transaction. Treat it
as any other new transaction, but with an indication of the language
migration.

Given that there is no way for software to tell the difference between 2) and 3), I think that both cases have to lead to a new transaction.

4) There is an automatic translation of (some of the) terms. Here, one can
look at what happens, completely from the level of concepts instead of
terms. The same thing is there inside the computer. The only difference is
how the user looks at it. So: same transaction. In fact, wherever possible,
a user should always be able to take a look at terms in different languages.
Terms explicitly related to concepts shouldn't be present in the computer as
unrelated terms, but always as concepts (concept-ID's).

I agree - this should be done in the display, and does not change anything in the EHR.

5) The doctor doesn't know anything of the language and no translation is
done at all. Then a new transaction is made with indication of language.

agree.

We come back to the general principle: "One meaning - one transaction", but
WITH an indication that it concerns a translation (whether partial or
complete). In all other cases, make a new transaction.

But due to the indelibility requirement we have to make a new TRANSACTION anyway...

As for 'primary
language', I think it's best to always have it indicated on the level of
transaction, instead of versioned transaction. This avoids possible errors
when a person moves and moves and moves.

I think I probably agree with this - it maintains the principle that "the current view of the EHR is given by the latest TRANSACTION in every VERSIONED_TRANSACTION". If just translating a transaction causes a new VERSIONED_TRANSACTION, then this rule is broken - some VTs will have out-of-date information as their latest version.

This solution is simple and IMO encompassing enough. But: I don't know the
arguments why it was previously said to make a translated transaction always
a different transaction. In a sense, an 'equivalent transaction' IS also
different. Thus as an alternative, you can put it in the same list
(versioned transaction). The version identification scheme doesn't have to
be changed. The most recent transaction is then always the latest
transaction in the list, but if it's an instance of <EQUIVALENT
TRANSACTION>, then the one before is also the latest one, which, by meaning,
is true.

I'm not sure how much I like the name, but the idea of EQUIVALENT_TRANSACTION probably should be considered. One point to think about though: what happens if the latest version of, say, a care plan is partially translated, and this new version now goes back as a new TRANSACTION in the VT stack, but it contains only the translated bits? What is the clinical meaning of this: that some parts were deleted because they were not translated, or that some parts were deleted because they are no longer clinically relevant, due to the new care situation of the patient... or both?

When I think of this situation, I find I come back to branch versions, where you can always see the latest version in a given language, but also know, due to committal date/times, what is the "latest" information. In general, this leads to a real version "tree", not just a "trunk".
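
A small sketch of that last point, assuming each version simply records its language and committal time. The names and the example identifiers are hypothetical; the point is only that "latest in my language" and "latest information overall" are two separate queries over the same tree.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Version:
    version_id: str
    language: str
    committed: datetime


def latest_overall(versions: List[Version]) -> Version:
    return max(versions, key=lambda v: v.committed)


def latest_in(versions: List[Version], language: str) -> Optional[Version]:
    candidates = [v for v in versions if v.language == language]
    return max(candidates, key=lambda v: v.committed) if candidates else None


def is_stale(versions: List[Version], language: str) -> bool:
    """True if the newest version in `language` predates the newest version overall."""
    mine = latest_in(versions, language)
    return mine is None or mine.committed < latest_overall(versions).committed


tree = [
    Version("1", "en", datetime(2002, 6, 1)),
    Version("1.pt.1", "pt", datetime(2002, 6, 20)),  # translation
    Version("1.pt.2", "pt", datetime(2002, 6, 22)),  # additions in Portuguese
]
english_is_stale = is_stale(tree, "en")  # True: newer content exists only in "pt"
```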

- thomas beale

Sam · 2 July 2002 01:53

Dear All

There are many issues that have to be addressed when translation becomes the
norm. In the meantime, it is clear that there will need to be a block on
double translations - that is into one language and then into another. For
this reason, I propose that in the first version of openEHR we do the
following:

1. Force each versioned transaction to have a single root language.
2. Allow 'branch versions' that translate the original versions to another
language - but make these read only.

This would mean some changes to the model - but would ensure that we did not
get translations of translations happening and would enable read only
translations. This would mean that if a care plan was to be updated in
another language for example - a new versioned transaction would have to be
created. The application could do this quite quickly. The previous versioned
transaction - if it clashed with the later - would have to be archived or
moved to another folder.
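
The two rules and the block on double translations could be enforced along the following lines. This is a sketch under assumptions: the classes, the `translated_from` link and the error messages are all made up for illustration, and archiving of the superseded versioned transaction is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Version:
    language: str
    translated_from: Optional["Version"] = None  # set only on branch translations

    @property
    def read_only(self) -> bool:
        return self.translated_from is not None


@dataclass
class VersionedTransaction:
    root_language: str
    trunk: List[Version] = field(default_factory=list)
    branches: List[Version] = field(default_factory=list)

    def commit(self, language: str) -> Version:
        if language != self.root_language:  # rule 1: single root language
            raise ValueError("updates in another language need a new versioned transaction")
        version = Version(language)
        self.trunk.append(version)
        return version

    def translate(self, source: Version, language: str) -> Version:
        if source.read_only:  # block translations of translations
            raise ValueError("cannot translate a translation")
        branch = Version(language, translated_from=source)  # rule 2: read-only branch
        self.branches.append(branch)
        return branch


vt = VersionedTransaction("en")
original = vt.commit("en")
pt = vt.translate(original, "pt")
```

Updating the care plan in Portuguese would then mean creating a second `VersionedTransaction` with `root_language="pt"`, as described above.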

This might mean that the folders at the highest level might be used for
language.

I think that we need to get more experience with this before we will make
sensible choices in this regard.

Cheers, Sam Heard
