ISO 639 language codes

Currently we use ISO 639-1 codes to specify languages in archetypes. This works fine for most major languages. We may at some point in the future need to be able to tell the difference between several different Sámi languages of which one exists in ISO-639-1 (se/sme), and two only in ISO 639-2 and -3 (sma, smj).

Is it valid to use ISO-639-2 or -3 in archetypes today? If not, are there any plans to extend the specs to include them?

1 Like

Maybe this is related to openEHR JIRA

(thinking a little more about it maybe not, as I assume they share country)
I think your use case would be solved if we update descs, as I’m assuming no tool does a hard check over the lenght or existence of these languages

I agree with the need to specify more languages in archetypes as raised by Silje.

Howwever, from my point of view this involves more than just updating the descriptions in the specs:
I am pretty sure this is checked in one way or another in various tooling, either by using the openEHR external terminology file (as the Java Ref Impl does if I remember correctly), or in a more lenient form if you use for example .NET’s CultureInfo in some way, etc.
This is for a) enforcing the specs, b) retrieving and displaying the name of the language, etc.
In CKM, we use the same set of languages for users and archetypes (as specified in the external terminology file) and essentially tag archetypes as well as users with these languages. A different user interface is required, especially for ISO639-3 with its 6900 languages.

The Jira proposal Diego mentions referring to RFC-5646 is useful and would probably really just be a description update because as it is said there, this is what is done anyway (ISO 639-1 language code optionally followed by an ISO 3166-1 alpha-2 country). Would this actually work the same way with ISO 639-3?

Also a careful consideration if we want to allow all the parts of ISO639 or only one - especially in the common cases where a languge has (different) codes in more than one part of ISO639. It seems important that we don’t use various codes for the same language. There seems to be a canonical form in RFC-5646 which may be be usable for this purpose (not sure) - creating and verifying this canonical form seems rather hard though in its general form.