Deprecated languages in openEHR external terminologies file

Hi!
Over the last few weeks, I have been demonstrating how to create translations of Archetypes using the CKM and ADL Designer. In this case, the focus is on European Portuguese (pt-PT) and Brazilian Portuguese (pt-BR), where the differences in language can sometimes be greater than those between British English (en-GB) and American English (en-US).

I was surprised that ADL Designer flagged pt-PT as deprecated, while the CKM did not.

I contacted Borut from Better and he told me that this comes from a copy of the external terminologies from openEHR, that can be found in this repository:

What is the rationale behind this?

I have no idea about the reasoning behind this, but it happened here: renaming descriptions, reordering, deprecating and fixing some langua
 · openEHR/specifications-TERM@17c3125

The reason for having PT, EN, and ES as the default languages, rather than the dialects, is the assumption that pt-PT is PT, es-ES is ES, and en-GB is EN (not sure if there are more, probably FR?). I understand the rationale: even if you are translating archetypes, it makes no sense to have 20 different dialect translations for the same archetype (which I remember was starting to happen at some point in the past). So the default is to just create translations into the language in general. And in the unlikely case that there is actually a difference, you are still able to create a specific dialect translation.

I assume pt-PT is marked as deprecated rather than completely removed for the reason you mentioned: there could be bigger differences that would be good to point out. But in general my understanding is that you would use pt for European Portuguese.

The differences between European and Brazilian Portuguese are far more substantial than for the English variants (in both vocabulary and grammar).

While I generally agree with Diego’s principle above, we may have made the wrong call for Portuguese, because the likelihood of having to deal with more than very simple differences such as colo(u)r, centre/center, speciali[s,z]e is far larger than in English.


Please correct if I am wrong @vanessap - this is just my limited experience.

Tbh I don’t think it’s our place to define which languages are appropriate or not for translating archetypes, and I don’t think that “dialects” should necessarily be a criterion for exclusion. As an example, Norwegian has two official written standards: nynorsk (nn) and bokmål (nb). These are almost by definition written dialect variants of the same language as used in the same country, but we still need to be able to translate archetypes into both.

The intention of this was not to exclude dialects, but to assume we don’t need to create 10 translations, one for each dialect, when the rubrics would always be the same. Nobody is excluding dialects; if anything, it is the other way around: one dialect is assumed to be the “main” one.

But in some cases there isn’t a main one, like for the Norwegian example the two written variants are legally equal. And for variations of a language across several countries, how can you define one as “main”? By number of speakers, by country of origin, or something else?

You are lucky there because you have two different language codes nb and nn, rather than (say) no-nb and no-nn. Otherwise, you’d have to ask yourself as well what a generic “no” means.

  • For nn and nb, the two are probably so different that a generic Norwegian code doesn’t make sense.
  • For English, Spanish or German, it is so close that typically/often you get away with not using regional variants.
  • For Portuguese you are somewhere in between.

There is nothing more complex than languages and their variants, macro-languages etc.

The thing is: if we want to have a list to choose languages from at all, there is no escaping from making these kinds of decisions:

  • The first decision is to focus on “Simple language subtags” (see ISO 639-1) and then the “most important” “Language-Region” codes. But not for example Language-Script-Region or Language-Region-Variant codes. See RFC 5646 - Tags for Identifying Languages for some details on the complexity.
  • The above is not simple because there are a total of approx. 7000 living languages and a rough guess of 15000-40000 dialects / regional variants, depending on where you set the boundaries - which ones do we want to include, which ones not.
  • Practically: how do we avoid unnecessary translations into regional variants, as well as guide translators? Looking at the list, I notice there is no de-DE (for German in Germany), no es-ES (for Spanish in Spain), and no fr-FR (for French in France), but no such decision has been made for English/en (to exclude e.g. en-GB or en-US, or if you ask me: en-AU - and thus mark one as “the” main variant). You could of course include de-DE, es-ES, fr-FR, but then when to pick “de” and when “de-DE”, etc.?
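
To make the first decision above concrete, here is a rough Python sketch of a check that accepts only the two tag shapes mentioned (a simple language subtag, or Language-Region) and rejects longer forms such as Language-Script-Region. The function name `parse_simple_tag` and the `LangTag` type are made up for illustration; this is not how ADL Designer or the CKM actually validate tags, and it checks the shape only, not whether the code is registered.

```python
import re
from typing import NamedTuple, Optional


class LangTag(NamedTuple):
    language: str          # primary language subtag, e.g. "pt"
    region: Optional[str]  # region subtag, e.g. "BR", or None


# Shape check only (assumption: 2-3 letter language, optional 2-letter
# or 3-digit region). Script, variant and extension subtags are rejected.
_SIMPLE_TAG = re.compile(r"([A-Za-z]{2,3})(?:-([A-Za-z]{2}|[0-9]{3}))?")


def parse_simple_tag(tag: str) -> Optional[LangTag]:
    """Return a normalised LangTag, or None if the tag has another shape."""
    match = _SIMPLE_TAG.fullmatch(tag.strip())
    if match is None:
        return None
    language = match.group(1).lower()
    region = match.group(2).upper() if match.group(2) else None
    return LangTag(language, region)


print(parse_simple_tag("pt-br"))       # language "pt", region "BR"
print(parse_simple_tag("nb"))          # language "nb", no region
print(parse_simple_tag("zh-Hant-TW"))  # None: Language-Script-Region
```

Which of the accepted tags actually go on the list is still the judgement call discussed here; the sketch only narrows down the shapes.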

It is unfortunate that it is not as simple as with countries. Even there we have enough disputes, of course, but at least there are ISO 3166 codes (and official names) to refer people to, e.g. is it “Taiwan” or “Taiwan (Province of China)”? See List of ISO 3166 country codes - Wikipedia. Not for openEHR to decide or have an official opinion on, luckily.

For languages, however, it is and always will be a living document based on judgement calls - assuming we want to have a list.
And if openEHR is suddenly getting massive interest in the small region of the Philippines where Abenlen Ayta is an important indigenous Austronesian language, we may consider adding that to the list.

Back to Portuguese: There is not only European Portuguese and Brazilian Portuguese, but quite a few other variants in Africa and Asia. Glottolog 5.2 - Portuguese is a really nice resource to explore the languages.

These other variants are typically more similar to the European variant, especially in writing. What does it mean for this list? I don’t know! Add all variants and imply that the main “pt” code should not be used? From the perspective that we want to avoid having to create many variants it makes sense to have European Portuguese as the main language, but since pt-BR is too different and also important, it feels condescending to do so at the same time.

My practical opinion on the concrete issue at hand: It seems to me that typically pt-PT and pt-BR are used in localisation, and we should do the same in this case and remove the “Deprecated” text from pt-PT.

Agree.

As for others, I think native language users will need to decide among themselves whether it makes sense to include variants or not. What we do know though is that ISO 639-1 is not sufficient to cover our official minority languages, ref ISO 639 language codes - Specifications - openEHR.

It doesn’t legally exist :laughing:

First of all, we need to reflect on the requirement: to maintain meaning so that no errors of understanding occur when e.g. a term text is displayed in some language, to some supposed consumer of that language. Possible categories of difference between regions using the same language:

  • a ‘hard’ difference in usage of a medical term or use of a different term; I don’t have one to hand, but I’d say if we investigate ‘angina’ in most languages and their regional / country variants, we’ll see some interesting things; Brazil is likely also to have more / differing terms relating to tropical diseases and their symptoms
  • grammar differences, e.g. present continuous tense in pt-PT v pt-BR
  • colloquial differences, i.e. ‘how we say it’ is just different, for no good reason

Unless one of these exists, I would not be trying to use the regional variants as a way to allow different translation styles that encode no semantic differences - then we are just into competing translations of e.g. War and Peace into English (there are quite a few).

I have to admit I am not sure how much work we should put into spelling differences. As a speaker of international English, it is intensely annoying to see ‘pediatric’ instead of ‘paediatric’ (still universally used in the UK), but if you look at the word ‘foetal’ / ‘fetal’, it’s completely mixed all over the world - not only the Americans use ‘fetal’. If the text said something like ‘according to the protocol originally developed at Northampton Paediatric clinic, UK’, then you’d want the proper spelling, even in the default ‘en’ translation. At least for English, theoretically we could say that ‘en’ is International English, but the use of ‘colour’ etc may be more annoying to the North Americans than ‘color’ is to us.

And don’t get me started on ‘plow’ or ‘modeling’. Nevertheless, one must keep things in perspective.


My original vision was that things like archetype terminologies would have (say)

  • a ‘pt’ translation of all ‘en’ items, and where needed (= in real semantic cases),
  • override ‘pt-PT’ terms and
  • override ‘pt-BR’ terms
  • override ‘pt-xx’ terms, if applicable (Angola, Sao Tome, Timor etc)

In this scheme we have to decide what the (default) ‘pt’ version of a term is - is it the Portugal one, Brazilian, or other? I don’t think it matters, as long as the overrides are done correctly.

The processing should always be that the translation relating to the locale is chosen, and then if not present (usually it won’t be), just get the default translation of that language. Then what is shown on the screen should always be right.
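
A minimal sketch of that lookup in Python, just to illustrate the idea. The function name `resolve_term`, the dictionary layout and the placeholder texts are all hypothetical; real archetype terminologies are structured differently, and the sketch assumes the tags are already normalised (e.g. ‘pt-BR’, not ‘pt-br’).

```python
from typing import Dict, Optional


def resolve_term(translations: Dict[str, str], locale: str) -> Optional[str]:
    """Pick the locale-specific text if present, else the parent language text."""
    # 1. An exact match on the requested locale wins (e.g. "pt-BR").
    if locale in translations:
        return translations[locale]
    # 2. Otherwise fall back to the default translation of the language ("pt").
    parent_language = locale.split("-", 1)[0]
    return translations.get(parent_language)


# Hypothetical term with a default 'pt' text and one regional override.
term = {
    "en": "default en text",
    "pt": "default pt text",
    "pt-BR": "pt-BR override text",
}

print(resolve_term(term, "pt-BR"))  # the pt-BR override
print(resolve_term(term, "pt-PT"))  # no override, so the default pt text
print(resolve_term(term, "de-CH"))  # None: no German translation at all
```

With this kind of fallback, an override only needs to exist where there is a real semantic difference; everything else is served from the parent language.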

Any argument over pure style should be resolved within the parent language first, not result in regional variants for no good reason.

@emmanuel.eschmann @HHeiser this thread is very relevant to our discussion in DMEG about Swiss localisations and language variants

Thank you very much, @Olha_Nikolaieva , for bringing this thread to our attention.

The discussion here (especially at https://discourse.openehr.org/t/deprecated-languages-in-openehr-external-terminologies-file/11646/10) confirms the decision we made at the openEHR.ch DMEG meeting on 16 December 2025 (https://openehr.atlassian.net/wiki/x/AQDtxg / Section “Übersetzungen”) to contribute and use DE translations for German-speaking Switzerland in a first iteration. Only in a later iteration, and only if there is a real need, would we then add a de-CH regionalisation.