Deprecated languages in openEHR external terminologies file

vanessap · 9 December 2025 15:12

Hi!
Over the last few weeks, I have been demonstrating how to create translations of Archetypes using the CKM and ADL Designer. In this case, the focus is on European Portuguese (pt-PT) and Brazilian Portuguese (pt-BR), where the differences in language can sometimes be greater than those between British English (en-GB) and American English (en-US).

I was surprised that ADL Designer flags pt-PT as deprecated, while the CKM does not.

I contacted Borut from Better and he told me that this comes from a copy of the external terminologies from openEHR, that can be found in this repository:

github.com/openEHR/specifications-TERM

computable/XML/openehr_external_terminologies.xml

master

<terminology name="openehr" language="en" version="3.1.0" date="2024-04-11">
	<codeset issuer="ISO" openehr_id="countries" name="countries" external_id="ISO_3166-1">
		<code value="AF" description="AFGHANISTAN"/>
		<code value="AX" description="ÅLAND ISLANDS"/>
		<code value="AL" description="ALBANIA"/>
		<code value="DZ" description="ALGERIA"/>
		<code value="AS" description="AMERICAN SAMOA"/>
		<code value="AD" description="ANDORRA"/>
		<code value="AO" description="ANGOLA"/>
		<code value="AI" description="ANGUILLA"/>
		<code value="AQ" description="ANTARCTICA"/>
		<code value="AG" description="ANTIGUA AND BARBUDA"/>
		<code value="AR" description="ARGENTINA"/>
		<code value="AM" description="ARMENIA"/>
		<code value="AW" description="ARUBA"/>
		<code value="AU" description="AUSTRALIA"/>
		<code value="AT" description="AUSTRIA"/>
		<code value="AZ" description="AZERBAIJAN"/>
		<code value="BS" description="BAHAMAS"/>
		<code value="BH" description="BAHRAIN"/>

This file has been truncated.

What is the rationale behind this?
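
For anyone who wants to check this locally, here is a minimal sketch (not taken from ADL Designer or the CKM) of listing entries flagged as deprecated in a file shaped like the one above. The `<codeset>` and `<code value="..." description="..."/>` shape follows the excerpt; the `languages` codeset attributes, the sample entries, and the idea that deprecation shows up as the word "deprecated" in the description are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names follow the excerpt above, but the
# "languages" codeset attributes and the "(deprecated)" wording are assumptions.
SAMPLE = """\
<terminology name="openehr" language="en">
  <codeset issuer="ISO" openehr_id="languages" name="languages" external_id="ISO_639-1">
    <code value="pt" description="Portuguese"/>
    <code value="pt-PT" description="Portuguese (Portugal) (deprecated)"/>
    <code value="pt-BR" description="Portuguese (Brazil)"/>
  </codeset>
</terminology>
"""


def deprecated_codes(xml_text, codeset_id="languages"):
    """Return code values whose description mentions 'deprecated'."""
    root = ET.fromstring(xml_text)
    found = []
    for codeset in root.iter("codeset"):
        # Only look at the requested codeset (e.g. languages, not countries).
        if codeset.get("openehr_id") != codeset_id:
            continue
        for code in codeset.iter("code"):
            if "deprecated" in code.get("description", "").lower():
                found.append(code.get("value"))
    return found


print(deprecated_codes(SAMPLE))
```

To run it against the real file, read the file contents and pass them in place of `SAMPLE`, after checking how deprecation is actually marked there.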

siljelb · 9 December 2025 16:51

I have no idea about the reasoning behind this, but it happened here: renaming descriptions, reordering, deprecating and fixing some langua… · openEHR/specifications-TERM@17c3125

yampeku · 10 December 2025 20:30

The reason to have PT, EN, and ES as the defaults, rather than the dialects, is the assumption that pt-PT is PT, es-ES is ES, and en-GB is EN (not sure if there are more, probably FR?). I understand the rationale: even if you are translating archetypes, it makes no sense to have 20 different dialect translations of the same archetype (which, as I remember, was starting to happen at some point in the past). So the default is to create translations for the language in general. And in the unlikely case that there is actually a difference, you are still able to create a specific dialect translation.

I assume pt-PT is marked as deprecated rather than removed completely for the reason you mentioned: there could be bigger differences that would be good to point out. In general, though, my understanding is that you would use pt for European Portuguese.

sebastian.garde · 10 December 2025 21:00

The differences between European and Brazilian Portuguese are far more substantial than for the English variants (in both vocabulary and grammar).

While I generally agree with Diego’s principle above, we may have made the wrong call for Portuguese, because the likelihood of having to deal with more than very simple differences such as colo(u)r, centre/center, speciali[s,z]e is far larger than in English…

Please correct if I am wrong @vanessap - this is just my limited experience.

siljelb · 11 December 2025 09:36

Tbh I don’t think it’s our place to define which languages are appropriate or not for translating archetypes, and I don’t think that “dialects” should necessarily be a criterion for exclusion. As an example, Norwegian has two official written standards: nynorsk (nn) and bokmål (nb). These are almost by definition written dialect variants of the same language as used in the same country, but we still need to be able to translate archetypes into both.

yampeku · 11 December 2025 09:44

The intention of this was not to exclude dialects, but to assume we don’t need to create 10 translations, one for each dialect, when the rubrics would always be the same. Nobody is excluding dialects; if anything it is the other way around: one dialect is assumed to be the “main” one.

siljelb · 11 December 2025 10:01

But in some cases there isn’t a main one, like for the Norwegian example the two written variants are legally equal. And for variations of a language across several countries, how can you define one as “main”? By number of speakers, by country of origin, or something else?

sebastian.garde · 11 December 2025 11:58

You are lucky there because you have two different language codes nb and nn, rather than (say) no-nb and no-nn. Otherwise, you’d have to ask yourself as well what a generic “no” means.

NN and NB are probably so different that a generic Norwegian code doesn’t make sense.
English, Spanish or German variants are so close that you typically get away with not using regional variants.
Portuguese is somewhere in between.

There is nothing more complex than languages and their variants, macro-languages etc.

The thing is: if we want to have a list to choose languages from at all, there is no escaping from making these kinds of decisions:

The first decision is to focus on “Simple language subtags” (see ISO 639-1) and then the “most important” “Language-Region” codes, but not, for example, Language-Script-Region or Language-Region-Variant codes. See RFC 5646 - Tags for Identifying Languages for some details on the complexity.
The above is not simple because there are approx. 7000 living languages in total and a rough guess of 15000-40000 dialects / regional variants, depending on where you set the boundaries. Which ones do we want to include, and which ones not?
Practically: how do we avoid unnecessary translations into regional variants while also guiding translators? Looking at the list, I notice there is no de-DE (for German in Germany), no es-ES (for Spanish in Spain), and no fr-FR (for French in France), but no such decision has been made for English/en (to exclude e.g. en-GB or en-US, or if you ask me: en-AU - and thus mark it as “the” main variant). You could of course include de-DE, es-ES, fr-FR, but then when to pick “de” and when “de-DE”, etc.?

It is unfortunate that it is not as simple as with countries. Even there we have enough disputes of course, but at least there are ISO 3166 codes (and official names) to refer people to, e.g. is it “Taiwan” or “Taiwan (Province of China)”? See List of ISO 3166 country codes - Wikipedia. Not for openEHR to decide or have an official opinion on, luckily.

For languages, however, it is and always will be a living document based on judgement calls - assuming we want to have a list… And if openEHR suddenly gets massive interest from the small region of the Philippines where Abenlen Ayta is an important indigenous Austronesian language, we may consider adding that to the list.

Back to Portuguese: There is not only European Portuguese and Brazilian Portuguese, but quite a few other variants in Africa and Asia. Glottolog 5.2 - Portuguese is a really nice resource to explore the languages.

These other variants are typically more similar to the European variant, especially in writing. What does it mean for this list? I don’t know! Add all variants and imply that the main “pt” code should not be used? From the perspective that we want to avoid having to create many variants it makes sense to have European Portuguese as the main language, but since pt-BR is too different and also important, it feels condescending to do so at the same time.

My practical opinion on the concrete issue at hand: It seems to me that typically pt-PT and pt-BR are used in localisation, and we should do the same in this case and remove the “Deprecated” text from pt-PT.

siljelb · 11 December 2025 12:50

Agree.

As for others, I think native language users will need to decide among themselves whether it makes sense to include variants or not. What we do know though is that ISO 639-1 is not sufficient to cover our official minority languages, ref ISO 639 language codes - Specifications - openEHR.

It doesn’t legally exist

thomas.beale · 11 December 2025 12:53

First of all, we need to reflect on the requirement: to maintain meaning so that no errors of understanding occur when e.g. a term text is displayed in some language, to some supposed consumer of that language. Possible categories of difference between regions using the same language:

a ‘hard’ difference in usage of a medical term or use of a different term; I don’t have one to hand, but I’d say if we investigate ‘angina’ in most languages and their regional / country variants, we’ll see some interesting things; Brazil is likely also to have more / differing terms relating to tropical diseases and their symptoms
grammar differences, e.g. present continuous tense in pt-PT v pt-BR
colloquial differences, i.e. ‘how we say it’ is just different, for no good reason

Unless one of these exists, I would not be trying to use the regional variants as a way to allow different translation styles that encode no semantic differences - then we are just into competing translations of e.g. War and Peace into English (there are quite a few).

I have to admit I am not sure how much work we should put into spelling differences. As a speaker of international English, it is intensely annoying to see ‘pediatric’ instead of ‘paediatric’ (still universally used in the UK), but if you look at the word ‘foetal’ / ‘fetal’, it’s completely mixed all over the world - not only the Americans use ‘fetal’. If the text said something like ‘… according to the protocol originally developed at Northampton Paediatric clinic, UK…’, then you’d want the proper spelling, even in the default ‘en’ translation. At least for English, theoretically we could say that ‘en’ is International English, but the use of ‘colour’ etc may be more annoying to the North Americans than ‘color’ is to us.

And don’t get me started on ‘plow’, ‘modeling’. Nevertheless, one must keep things in perspective…

My original vision was that things like archetype terminologies would have (say)

a ‘pt’ translation of all ‘en’ items, and where needed (= in real semantic cases),
override ‘pt-PT’ terms and
override ‘pt-BR’ terms
override ‘pt-xx’ terms, if applicable (Angola, Sao Tome, Timor etc)

In this scheme we have to decide what the (default) ‘pt’ version of a term is - is it the Portugal one, Brazilian, or other? I don’t think it matters, as long as the overrides are done correctly.

The processing should always be that the translation relating to the locale is chosen, and then if not present (usually it won’t be), just get the default translation of that language. Then what is shown on the screen should always be right.
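
The lookup described above could be sketched roughly as follows. The `TERM` dictionary, the placeholder texts and the `resolve` helper are hypothetical and not part of any openEHR tooling; the sketch only shows the "locale first, then the default for that language" order.

```python
# Hypothetical translations for a single term: one default per language,
# plus a regional override only where the meaning really differs.
TERM = {
    "en": "default en text",
    "pt": "default pt text",
    "pt-BR": "pt-BR override text",
}


def resolve(translations, locale):
    """Pick the locale-specific text, else the default for its language.

    Returns None when neither the locale nor its parent language is present.
    """
    if locale in translations:
        return translations[locale]
    # "pt-PT" -> "pt"; a bare language code is its own parent.
    language = locale.split("-")[0]
    return translations.get(language)


print(resolve(TERM, "pt-BR"))  # regional override exists
print(resolve(TERM, "pt-PT"))  # no override, falls back to "pt"
```

With this order it does not matter which regional variant the default `pt` text is based on, as long as an override exists wherever the meaning differs.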

Any argument over pure style should be resolved within the parent language first, not result in regional variants for no good reason.

Olha_Nikolaieva · 17 December 2025 16:43

@emmanuel.eschmann @HHeiser this thread is very relevant to our discussion in DMEG about Swiss localisations and language variants

emmanuel.eschmann · 18 December 2025 16:34

Thank you very much, @Olha_Nikolaieva , for bringing this thread to our attention.

The discussion here (especially at https://discourse.openehr.org/t/deprecated-languages-in-openehr-external-terminologies-file/11646/10) confirms our decision made at the openEHR.ch DMEG meeting on 16 December 2025 (https://openehr.atlassian.net/wiki/x/AQDtxg / Section “Übersetzungen”) to contribute and use DE translations for German-speaking Switzerland in a first iteration. Only in a later iteration, and if there is a real need, could we then add a de-CH regionalisation.

vanessap · 16 January 2026 17:27

Sorry for the delayed response. Thanks everyone for your input. The PT case is indeed tricky because the same words can have very different meanings across dialects. For example, “rapariga” in pt-PT simply means “young girl/girl”, while in pt-BR it can be understood as an offensive term, which can easily lead to uncomfortable misunderstandings if we are not aware of these differences.

I am going to ask the Portuguese affiliate for their input.

I will also open a change request to remove the “deprecated” marker from pt-PT.

sebastian.iancu · 18 January 2026 23:04

Sorry for the late reply on this, I missed the whole thread.

The change in the terminology XML file was made by me in order to correct some technical errors. I have no grounded opinion or “jurisdiction”, nor anything to say about how it is, or is supposed to be, used in archetypes.

The problem that I fixed (not only for Portuguese) was related to incorrect ISO codes, particularly ISO 639-1. A code is supposed to be a two-letter string, but what we actually had in the files was a mix of ISO 639-1 codes and BCP-47 language tags.

To explain:

ISO 639-1 language codes: short language identifiers like pt (Portuguese), en, fr.
BCP 47 / IETF language tags: the common “language[-region[-…]]” format used in HTTP (Accept-Language), HTML (lang), many frameworks. Examples: pt, pt-PT, pt-BR.

From the perspective of BCP-47, pt-PT is equivalent to just pt (which in fact means Portuguese, unspecified region) - but pt-PT is not a valid ISO 639-1 language code, hence my change to deprecate it. The other problem we have is that all those regions are in lowercase (pt-br), but according to the standard's recommended format the region should be uppercase (pt-BR).
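
As a rough illustration of the distinction, the sketch below checks only the shape of a tag (it does not consult the ISO 639-1 list or the IANA subtag registry) and applies the conventional casing of lowercase language and uppercase region. The function names are made up for this example.

```python
import re

# Shape-only patterns; these do not verify that a code is actually registered.
LANGUAGE_ONLY = re.compile(r"[A-Za-z]{2,3}")
LANGUAGE_REGION = re.compile(r"([A-Za-z]{2,3})-([A-Za-z]{2})")


def normalise(tag):
    """Lowercase the language and uppercase the region; leave other shapes alone."""
    match = LANGUAGE_REGION.fullmatch(tag)
    if match:
        return f"{match.group(1).lower()}-{match.group(2).upper()}"
    if LANGUAGE_ONLY.fullmatch(tag):
        return tag.lower()
    return tag


def kind(tag):
    """Classify a tag by shape after normalising its casing."""
    tag = normalise(tag)
    if re.fullmatch(r"[a-z]{2}", tag):
        return "two-letter language code"
    if LANGUAGE_REGION.fullmatch(tag):
        return "language-region tag"
    return "other"


for tag in ["pt", "pt-br", "pt-PT", "zh-Hant-TW"]:
    print(tag, "->", normalise(tag), "|", kind(tag))
```

A real check would also need to look each code up in the relevant code list, since a two-letter string is not necessarily an assigned ISO 639-1 code.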

If we need to de-deprecate pt-PT for good reasons, then of course we can do it. Let me know, I'm happy to have feedback on it.

sebastian.iancu · 4 March 2026 09:37

Coming back to this - should we revert (de-deprecate) the changes on pt-PT for an upcoming Terminology release?
