Representation of ethnicity

A conversation has started on the CKM Discussion page, but I’m bringing it here to open it up for broader community involvement…

Aljoscha Kindermann started the thread on 22 Jan:
In the HiGHmed cardiology use case we need to represent e.g. "Caucasian ethnicity" and "black skin color" as anamnesis parameters. Ethnicity has been shown to play a role in different disease characteristics in cardiology. Our first thought was to include it in the "Health risk assessment" evaluation archetype, which we also use to store information about family prevalence.
However, classifying ethnicity as a risk factor would be insufficient because it can also give information about different necessary treatment approaches.
Therefore I fear that this classification could be not only technically incorrect but also regarded as racist.
Is there any experience here of how to represent ethnicity correctly?

@ian.mcnicoll responded on 23 Jan
It is a very tricky one, mostly for obvious cultural / human reasons and it came up recently in a conversation amongst Scottish clinical informaticians, in relation to Covid.
There are no universal lists of ethnicity because the granularity is often predicated by local sensitivity/custom, e.g. in the UK the ‘ethnicity’ list separates ‘UK Irish’ from other Caucasian ethnicities because of historical prejudice, i.e. in this case it is designed to help prevent ongoing prejudice but, of course, in the wrong hands it might do exactly the opposite.
The natural place for ethnicity is probably really in the demographics space, but that can be seen as potentially threatening, and for a perfectly reasonable scientific purpose (which may very well be anonymised) it may make sense to have it in the EHR, but where?
Perhaps there is a case for an Ethnicity archetype akin to the Gender archetype (which shares similar cultural challenges), really as a placeholder but explaining the challenges, and leave it to implementation to locate it safely in the HR and populate an appropriate value set.

Aljoscha responded again on 26 Jan
Thank you for your thoughts! Using the gender archetype as an example for orientation is a very good suggestion in my opinion. This is helpful as the concept of gender is also a challenging discussion.
I like how it (the gender archetype) leaves freedom for many different implementations. I will try to come up with an idea of how to build an archetype which also leaves freedom to use it with different concepts.

@varntzen responded later on 26 Jan
Nice, have a go at a new archetype akin to the Gender archetype for a start. Actually, it is not allowed to register that information in Norwegian EHRs.
As an alternative, can we use the specific genetic markers in question for various health risks or Nota Bene information?

@natalia.strauch responded later on 26 Jan
The recording of ethnicity and race is even recommended in the clinic by the FDA: Collection of Race and Ethnicity Data in Clinical Trials | FDA
And the CDISC standard therefore assigns these items together with SEX and AGE to the DEMOGRAPHICS module.
It would therefore be right to also have a specific archetype for Race and Ethnicity Data in openEHR.

@heatherleslie responded on 27 Jan
I’ve been grappling with the notion of ethnicity and race for some time. It is quite contentious and I think needs careful consideration.
I’ve uploaded a candidate model to a CKM incubator based on work I’ve done in Australia, for consideration -
The Use currently reads as:
“Use to record the identification with one or more cultural and ethnic groupings, usually self-described by the individual.
The concept of ethnicity allows individuals to self-nominate a kinship or connection with a cultural or social group. This may often, but not always, be associated with a geographic region or place of origin.
The concept of categorisation by race or skin colour is often contentious and in some places, the term ‘race’ may be considered interchangeable with ‘ethnicity’. This is common and acceptable in some places, such as the USA, yet is illegal in others, such as Norway. Contributing to the confusion, many value sets for ethnicity also contain values that describe physical qualities such as skin colour or geographical origin. In view of this, ‘race’ has not been explicitly modelled as a separate data element, but instead ‘Ethnicity’ has been represented with the option for multiple occurrences so that it could be represented and renamed in a template, or it may be feasible for ‘Race’ to be added as a separate data element in a future specialisation.
Typically ethnicity is considered as a component of a demographic record for an individual, however it has been represented within this clinical archetype, for when it needs to represent clinical data or be used in an algorithm within a clinical system and access to appropriate demographic data or value sets is not feasible.”


It would be nice to restart this discussion. We’ve encountered a use case for an archetype dealing with data closely connected to ‘race’ and physical appearance when categorising donors of eggs or semen for artificial reproductive medicine. The draft archetype (see link in this thread) made by @heather.leslie in January, and updated in August, is a good candidate for the ethnicity/race data. The concept name is a bit inaccurate - as ‘ethnicity’ is a cultural (and therefore changeable) concept while ‘genetic origin’ or ‘ancestry’ is stable.

I suggest that we move this archetype forward for review.

Perhaps “Group affiliation” covers the concept better, and is more politically correct.

Group is a very broad concept though. I’m affiliated in some way with many groups, including Norwegians, registered nurses, clinical modellers, people with Roma ancestry, foster parents, and judoka. Are all of those relevant to the intended semantics?

Population group?

Hi there, I am new to the forum. May I suggest “race and ethnicity” as it may be the most precise and inclusive description: Updated Guidance on the Reporting of Race and Ethnicity in Medical and Science Journals.


Thanks for sharing, Peter.

This paper teases out so many of the evolving issues that relate to this area that are critical to good data recording, especially into the future. However I’m not sure it is advocating for labelling this concept as ‘Race and ethnicity’ so much as saying “…if you’re already recording data about race and ethnicity this is what you need to consider…”. We need to educate people about these issues and do better as a community, especially where this applies to social determinants of health.

However, I have major ethical issues with naming the archetype concept ‘Race and …’ Clearly race is captured in some parts of the world, but it is illegal to do so in others. While one of the data elements supports capturing data about race, for when it is required, I definitely don’t think it should be the focus.




Thank you for the reply.
I understand your point. Indeed, direct translations of “race and ethnicity” would fare very poorly in some locales, like Norway, where one would almost never use the term “rase” about people. How about “ancestry”?


How does healthcare work if capturing race is illegal? It can be highly relevant, right? Race is about genetic ancestry, right, or am I missing something?

I might indeed be wrong:

It’s widely agreed that race is a classification system designed by humans that lacks a genetic basis

There is no evidence in the article to back that claim though.

The article does cover the status quo of race in medical risk calculation nicely, I believe. My summary would be: race is problematic in risk calculation, because it’s possibly based on underdiagnosis due to social influences and will lead to systemic underdiagnosis and undertreatment. There is, however, little evidence presented that underdiagnosis and undertreatment are worse than overdiagnosis and overtreatment.
So for openEHR I have not found a reason not to allow the capture of race in an archetype. But a reference or comment on the difficulties and alternatives may be warranted.

Hi Joost,
Race is included in the candidate archetype because it is used as part of decision support algorithms in places like the US. I believe that this is being reconsidered, but it will take a long time for this to be phased out, if ever.

But in Norway I’m told it is illegal to record race, so other factors must be used in their algorithms.

It’s beyond my pay grade to argue the merits one way or another, but the archetype should carry all aspects that contribute to how we humans group ourselves, without moral judgement :zipper_mouth_face:


In general I think most of the race/ethnicity subsets are kind of a mess. The concepts themselves are not that clear/well defined. E.g. for Hispanic, there is no consensus on whether it’s a race, an ethnicity or both.

I think nowadays the consensus is that ‘race’ has no biological basis, and ethnicity provides an inexact genetic background. Differences in outcomes are more about the environmental and social contexts of the patient than the ‘race’ itself.

I agree that archetypes should carry everything, but also be as informative as possible. Maybe the use/misuse part of the archetype should state the parts of the world where we know it’s illegal or not recommended to capture this information (or where it is usually captured).

Wikipedia helps me out:

This article is about the biological taxonomy term. For the anthropological term, see Race (human categorization)
Race (biology) - Wikipedia

Which in turn explains:

While partially based on physical similarities within groups, race does not have an inherent physical or biological meaning
Race (human categorization) - Wikipedia

I didn’t know much about race being a human categorization, and have just been educated that there is a crucial difference between “race” as a biological taxonomy term and “race” as a human categorization without inherent biological meaning. In my (biomedical) mind race only existed to describe biological concepts.

Should this all be stored in the EHR? Or are there human groupings that belong in the demographic server?

In England, this information is carried on the PDS national demographics service, so yes, that is definitely an option. But for very obvious reasons the whole issue is immensely politically and culturally charged, with local legislative pressures, so it is very unlikely IMO that we will ever see true international consensus, even on definitions of the element names.

Having said that, I think the current draft archetype is a very good attempt.


Totally agree - it’s a mess. The most helpful resource I’ve found so far is Wikipedia :woozy_face: - see Ethnic group
"Depending on which source of group identity is emphasized to define membership, the following types of (often mutually overlapping) groups can be identified:

In many cases, more than one aspect determines membership: for instance, Armenian ethnicity can be defined by citizenship of Armenia, native use of the Armenian language, or membership of the Armenian Apostolic Church."

It does already, but please suggest improvements. All welcome.


Hi Joost,

There is no black and white, simple answer here.

Almost all of these data elements in the archetype are focused on personal identification with a group - this is all about social context and a feeling of belonging, possibly entered into a PHR. It is so important in the EHR context re mental health and wellness, allowing people to properly see the patient in their social context. Also some of these things might be used to trigger clinical decision support.

There is an enormous overlap between these data elements and a formal demographic server. What is kept on a server can vary considerably, and whether it actually belongs more appropriately in one place or the other is a discussion above my paygrade and expertise.

My modelling philosophy in this situation tries to keep it simple - if it is relevant in the clinical context we should model it. Then we can let the end users/implementers negotiate/decide what is kept where or a hybrid solution based on clinical requirements, cultural context etc. We can only provide good quality options and can’t control how it is rolled out.

I’ve struggled with how to capture these nuances for a couple of years now - especially in recording social determinants of health. After much trial and error I think it is ready for public scrutiny and discussion (like now :sunglasses:)

If anyone can tease it out further or better… please be my guest :raised_hands:



If you ask a geneticist, he/she will most probably say that there is only one race of humans today. But we have variations in some genes that alter physical properties, such as the colour of the skin. (Which is btw defined by more than 100 genes: if you have “white” skin colour you have more of the “light skin tone variants”; if you have a dark skin colour, you have more of the dark variants. The frequencies of those vary, but they are still the same genes.) It is equally stupid to divide humans by the size of their nose, length of their right index finger, or colour of their eyebrows.

We need to get rid of “race” related to humans. Period. BUT as it is in use as a proxy for determinants of health, social status (and racism), AND also is an identity marker with positive value for some individuals (for example Black Lives Matter), there is probably reason for us to reflect it in an archetype nevertheless. Whether it is this proposed archetype or another, I do not know.

A problem with this Cultural and Ethnic identity archetype is to scope it down. As humans, we have more identities. A gender identity - is that a cultural invention? Sexual orientation identity? Do I have an identity as “Nerd” or “Archetype modeller”?

The more I think of it, the more unsure I become. We do already have archetypes (published or in draft) for what is considered to be other health determinants: Income, Education, Housing, Occupation, Alcohol and Tobacco use, and … maybe Religious affiliation. We miss some related to “Minority within the dominant population” - and “Connection to the society”-type of archetypes.

Sorry, folks. I need to ponder more about this. Now you at least know what my brain struggles with! :smiley:

Hi Vebjørn,

I share your pain, but I can’t take the responsibility for past choices made by others.

We have what we have and need to cater for it as sensibly as possible (à la FHIR representing what is currently in systems).

That said, we also have an opportunity to guide better practice (note it may not be best… yet) in our archetypes.

We need to manage that tension and find a balance and then let the archetypes run free… :running_woman: :running_man:


Nobody is blaming you, Heather. We just need to figure out how to, or even if, we want to use the race notion. I agree that if we do, we need to guide.

