Issues and ideas on the demographic model

pablo · 17 March 2023 03:41

Hi all, I’m progressing with a demographics data repository and a demographics REST API, and found some issues I would like to share to get your opinions.

For reference, this is the demographics UML:

1. Many-to-one association between ACTOR and ROLE

ACTOR has a reference to 1 or many ROLEs
ROLE has a reference to 1 ACTOR

When creating those objects, since the references between them are bidirectional and mandatory, you can’t actually create an ACTOR that references an existing ROLE, or vice versa. If the ACTOR is created first, its roles collection will be empty; then the ROLE is created with a reference to the existing ACTOR; and after that you update the ACTOR, adding the reference to the new ROLE in roles.

So the model won’t be valid until this “transaction” is completed.

IMO bidirectional relationships in this kind of generic model are problematic, and there are many ways of representing the same thing without bidirectional associations.

One way is leaving ROLE as it is and removing ACTOR.roles. Then ACTOR.roles() could be a method that returns all ROLEs whose performer is this ACTOR (by uid).

Another way is not making one of the sides mandatory, like ACTOR.roles [0…*], though that could lead to ACTORs without a ROLE.
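To make the first option concrete, here is a minimal sketch in which only ROLE stores the reference (performer) and the ACTOR's roles are computed by lookup. The DemographicStore class, its method names and the role name 'patient' are hypothetical and exist only for illustration; they are not part of the openEHR specs or of any implementation mentioned in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Actor:
    uid: str
    name: str  # no stored `roles` attribute: it is computed instead


@dataclass(frozen=True)
class Role:
    uid: str
    name: str
    performer_uid: str  # mandatory reference to an existing ACTOR


class DemographicStore:
    """Hypothetical in-memory repository, used only for this sketch."""

    def __init__(self):
        self.actors = {}
        self.roles = {}

    def create_actor(self, uid: str, name: str) -> Actor:
        actor = Actor(uid, name)
        self.actors[uid] = actor
        return actor

    def create_role(self, uid: str, name: str, performer: Actor) -> Role:
        # The ACTOR must already exist, so every stored ROLE is valid on creation.
        if performer.uid not in self.actors:
            raise ValueError("performer must be an existing ACTOR")
        role = Role(uid, name, performer.uid)
        self.roles[uid] = role
        return role

    def roles_of(self, actor: Actor) -> list[Role]:
        # Stands in for ACTOR.roles(): all ROLEs whose performer is this ACTOR.
        return [r for r in self.roles.values() if r.performer_uid == actor.uid]


store = DemographicStore()
p = store.create_actor("a1", "Pablo")
store.create_role("r1", "doctor", p)
store.create_role("r2", "patient", p)
role_names = sorted(r.name for r in store.roles_of(p))
```

With this layout there is no intermediate invalid state: each object is complete as soon as it is created, at the cost of a lookup (ideally indexed) whenever the roles of an ACTOR are needed.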

2. PARTY_IDENTITY and CONTACT have a purpose() method

In the spec, the purpose() is said to be taken from the inherited LOCATABLE.name

However, the semantics of a purpose and a name are very different, and what the spec says depends on how modelers designed the archetypes and set the names there. That is, to be compliant with the spec, modelers should use the LOCATABLE.name field to actually set a purpose for that class (identity or contact), which is difficult to check.

Another thing is that we might be missing terminology items for those “purposes”. The spec doesn’t specify how the purpose is formalized, and doesn’t mention whether it needs to be formalized or coded at all, which I think might be useful.

3. Overwhelming container objects

Since both ACTOR and ROLE are LOCATABLE, when creating an ACTOR (PERSON, ORGANISATION, etc.) or a ROLE, a CONTRIBUTION, VERSIONED_OBJECT and ORIGINAL_VERSION should be created for each.

So let’s say we have an API to create PERSON and ROLE, and the ACTOR.roles was removed (see 1.), follow this flow:

a. p = create_person(…) > creates CONTRIBUTION, VERSIONED_PERSON, ORIGINAL_VERSION and data PERSON.
b. r1 = create_role(…, p) > creates CONTRIBUTION, VERSIONED_ROLE, ORIGINAL_VERSION and data ROLE, with performer = PARTY_REF(p)
c. r2 = create_role(…, p) > same as above

So when creating one PERSON with two ROLEs we have 3 CONTRIBUTIONS, 3 VERSIONED_OBJECTS, 3 ORIGINAL_VERSIONs, etc.

Since ROLE is a per-instance role, meaning that two PERSONs with the same ROLE ‘doctor’ will generate two ROLE objects, each with ‘doctor’ and a different performer (PERSON), we can consider ROLE a weak entity in relation to PERSON or any ACTOR. A simpler solution would be to create VERSIONED data at the ACTOR level and leave ROLE dependent on the ACTOR versioning, like subfolders in a directory FOLDER structure.

With that, creating a PERSON with 2 ROLEs will generate just one CONTRIBUTION, VERSIONED_OBJECT and ORIGINAL_VERSION, and the version.data will be the PERSON containing the 2 ROLEs. Of course, for this to happen, the relationships between ACTOR and ROLE also need to be modified, as I mentioned in 1., or by removing ROLE.performer and keeping only ACTOR.roles on the ACTOR side. That way the current bidirectional, graph-like structure could be just a tree. In the current model ACTOR and ROLE are both top-level classes, though conceptually and technically a ROLE, as currently modeled, can’t exist without a performer.
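The difference in container counts can be illustrated with a small sketch. VersionStore is a hypothetical stand-in that only counts what a commit would create (one CONTRIBUTION, one VERSIONED_OBJECT and one ORIGINAL_VERSION per committed top-level object); it is an assumption for illustration, not an openEHR API, and the role name 'patient' is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    name: str


@dataclass
class Person:
    name: str
    roles: list = field(default_factory=list)  # used only by the nested layout


class VersionStore:
    """Hypothetical store that only counts the containers it creates."""

    def __init__(self):
        self.contributions = 0
        self.versioned_objects = 0
        self.original_versions = 0

    def commit(self, data):
        # Each committed top-level object gets its own set of containers.
        self.contributions += 1
        self.versioned_objects += 1
        self.original_versions += 1
        return data

    def counts(self):
        return (self.contributions, self.versioned_objects, self.original_versions)


# Current layout: the PERSON and each ROLE are committed separately.
separate = VersionStore()
separate.commit(Person("Pablo"))
separate.commit(Role("doctor"))
separate.commit(Role("patient"))

# Proposed layout: ROLEs live inside the PERSON, so there is a single commit.
nested = VersionStore()
nested.commit(Person("Pablo", roles=[Role("doctor"), Role("patient")]))
```

The trade-off of the nested layout is that any change to a single ROLE creates a new version of the whole PERSON.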

Note these considerations and changes to the model can simplify how the demographic API could work in the near future.

What do others think?

thomas.beale · 17 March 2023 09:02

Question 1 - ACTOR/ROLE relationship

Correct. This model doesn’t indicate something that it should, which is that one of those relationships is not to be persisted, only populated on retrieval. In other places in the RM I have usually shown the relationship on the ‘N’ side being persisted i.e. ACTOR.roles in this case, with the other being populated on retrieval, in this case ROLE.performer being computed by finding the ACTOR that has this ROLE in its roles property.

That might not be very efficient, and it could be done the other way around, such that ACTOR.roles is computed on retrieval, which is likely to be easier based on indexing.

In a logical sense, it could be, but of course there is no method on ACTOR that could know how to go and find the ROLEs that have it as parent - this has to be a higher level operation whose result could be put into ACTOR.

A third possibility is to get rid of both of these attributes, and just use the generic PARTY_RELATIONSHIP to represent them.

Semantically that would be no problem - ACTORs can exist with no ROLE, since an ACTOR is a real entity, and there is always a default role of ‘self’, i.e. citizen or company etc, that just exists within society.

I started looking at some other ways of modelling this kind of relationship in the draft Entity model, see here.

Question 2 - PARTY_IDENTITY and CONTACT purpose() method

I agree we should review this.

Question 3 - Overwhelming container objects

See the Entity model - it models ROLE as PERSONA. ‘Role’ is badly overloaded in demographic models and most people get confused among:

capability of a person, e.g. General Practitioner
role of the person, acting as a GP in an organisation, e.g. state health system
a job post within an organisation, that could be advertised and filled by someone who is a GP

You will also see that ACTOR → PERSONA is a direct relationship in that model, not via an id. This will have the effect you are talking about. I am not 100% sure this is a good idea, but if the assumption holds that an ACTOR’s number of PERSONAs is almost always low (e.g. < 10), it should work. Note that PERSONA instances are completely dependent on the ACTOR, so it would be reasonable to expect that an update to a PERSONA might often involve updates to other PERSONAs of the same ACTOR.

There is also a class ACCOUNTABILITY to describe job post definition, which defines the responsibility of a PARTY (usually a PERSONA) doing that job.

damoca · 17 March 2023 10:04

This is exactly what I was thinking. Simplify the model (ROLE could be just an empty class with the “capabilities” relation), and use PARTY_RELATIONSHIP to link ACTOR and ROLE

thomas.beale · 17 March 2023 11:02

You’ll see on the Entity model some PARTY/PARTY relationships whose names are of the form r_xxx. For now I am considering these as analysis level relationships that could all be implemented using PARTY_RELATIONSHIPs, marked in a specific way (see ENTITY_RELATIONSHIP.type - could be used for this).

These models are tricky for reasons like the ones Pablo mentioned - you only have to imagine a few million instances to see that some thought is required on efficiency. And of course a real MPI or other Entity service could easily have billions or 100bn instances over time.

My current view is to use the Entity model as a place to do some analysis and then either:

propose the result as a new (better) model for demographics and other entities (places, things etc - see here)
and/or backport just a few things to create a v1.x or v2 demographic model with just a few additions / modifications, particularly Accountability and the improved hierarchy below AGENT.

Others may have better ideas on all this - I’d be very happy to see this as a group activity.

pablo · 17 March 2023 14:30

That is what I thought of as the best alternative, though I didn’t consider what you mentioned about persisting vs. populating on retrieval.

The problem with that approach is it generates two conceptual models, one with constraints/rules for input and another for output, adding complexity.

I think if the purpose is needed, it shouldn’t rely on the inherited LOCATABLE.name; maybe it’s a new field?

The model looks cleaner. Related to item 3 on my list, it might avoid having too many top-level classes and give a more tree-like structure to the demographic/entity model.

Do you have a text spec of that model? I would like to read what the new relationships there mean. For instance, I’m guessing AGGREGATE_AGENT.r_members is a virtual relationship using PARTY_RELATIONSHIP behind the scenes. Same for ORG_ENTITY.r_parts.

Then we can discuss other details, like the meaning of AGENT there, but that’s a different thread.

Thanks for taking the time to review this.

pablo · 17 March 2023 14:50

These are the summarized challenges I see with current demographics:

Top-level archetypable class relationships make instances graphs, not trees.
Modeling such relationships is difficult; PARTY_RELATIONSHIP is itself a top-level LOCATABLE class.
ACTOR-ROLE relationships are the same: top-level archetypable class relationships, with the added complexity of bidirectionality.
Querying such graph data is difficult for current AQL, which works with CONTAINS to navigate parent/descendants. Think of this:

Pablo → RELATIONSHIP(spouse) → Barbara
Pablo → RELATIONSHIP(father) → Lorenzo

Query: get all family members of Lorenzo.

Pseudo AQL:

SELECT p
FROM PERSON lorenzo, PERSON p CONTAINS PARTY_RELATIONSHIP r
WHERE r.target = p AND r.source = lorenzo AND lorenzo.id = xxxx

Issues:

a. the query needs to have the relationships declared with the source being the record that you have the id for (Lorenzo)
b. there are some relationship semantics, like the bidirectionality of “spouse”, or “father” in one direction meaning “child” in the other, that are not considered by the query
c. so if all relationships are not explicitly declared, the query doesn’t work
d. this is more like an inference engine thing than a query (like an OWL ontology graph), since there are semantics in the relationships that can affect query results

This would be simpler if the “family” were modeled as a GROUP and we then queried all its members. That way the graph is flattened to a one-to-many relationship, which is actually like running the inference engine over the graph and collecting all the inferred relationships to form the GROUP.
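A minimal sketch of that idea, using the two relationships from the example above: the naive lookup mirrors the pseudo AQL and only follows relationships whose source is the given person, while the second function adds the inverse relationships and walks the graph to collect the whole “family” GROUP. The INVERSE table and the function names are illustrative assumptions, and treating everyone reachable through any relationship as family is a simplification.

```python
from collections import deque

# (source, type, target) as stored: only one direction is explicit.
RELATIONSHIPS = [
    ("Pablo", "spouse", "Barbara"),
    ("Pablo", "father", "Lorenzo"),
]

# Assumed inverse semantics, for illustration only (not from any terminology).
INVERSE = {"spouse": "spouse", "father": "child"}


def naive_family(person):
    # Mirrors the pseudo AQL: only relationships whose source is `person`.
    return {target for source, _, target in RELATIONSHIPS if source == person}


def with_inverses(relationships):
    # The "inference" step: make every implied reverse relationship explicit.
    edges = set(relationships)
    for source, kind, target in relationships:
        edges.add((target, INVERSE[kind], source))
    return edges


def family_group(person):
    # Breadth-first walk over explicit plus inferred relationships.
    edges = with_inverses(RELATIONSHIPS)
    seen = {person}
    queue = deque([person])
    while queue:
        current = queue.popleft()
        for source, _, target in edges:
            if source == current and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen - {person}


naive_result = naive_family("Lorenzo")
group_result = family_group("Lorenzo")
```

Here the naive lookup finds nobody for Lorenzo, because he only appears as a target, while the group lookup reaches both Pablo and Barbara.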

I’m just starting to think about this and my head hurts

thomas.beale · 17 March 2023 15:04

Yes (here), but… the documentation is sparse right now

Your guesses correspond to how I am currently thinking of specifying this.

pablo · 17 March 2023 20:54

Modeling all relationships between PARTIES in the same way makes implementation easier, because it’s like a JOIN table in SQL, and you can query on the source side, the target side, or both.
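As a rough sketch of the JOIN-table analogy, using Python's built-in sqlite3 module: the table and column names below are made up for this example and do not come from any openEHR implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE party (uid TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE party_relationship (
        source_uid TEXT NOT NULL REFERENCES party(uid),
        target_uid TEXT NOT NULL REFERENCES party(uid),
        rel_type   TEXT NOT NULL
    );
    INSERT INTO party VALUES ('p1', 'Pablo'), ('p2', 'Barbara'), ('p3', 'Lorenzo');
    INSERT INTO party_relationship VALUES
        ('p1', 'p2', 'spouse'),
        ('p1', 'p3', 'father');
    """
)


def related(uid):
    """Return (name, rel_type) for parties linked to `uid` on either side."""
    return conn.execute(
        """
        SELECT p.name, r.rel_type
        FROM party_relationship r JOIN party p ON p.uid = r.target_uid
        WHERE r.source_uid = ?
        UNION
        SELECT p.name, r.rel_type
        FROM party_relationship r JOIN party p ON p.uid = r.source_uid
        WHERE r.target_uid = ?
        ORDER BY 1
        """,
        (uid, uid),
    ).fetchall()


from_pablo = related("p1")    # Pablo is the source of both relationships
from_lorenzo = related("p3")  # Lorenzo only appears as a target
```

Because every relationship is a row with a source and a target, the same table answers both directions. Note that the relationship type is still stated from the source's point of view, so 'father' comes back for Lorenzo unless inverse semantics are applied.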

Also it’s kind of like RDF triples or OWL verbs, which can be used to infer, if all relationships are well defined, extra relationships that might not be explicitly in the data (what I mentioned in the previous message when analyzing a test AQL for demographics). @sebastian.iancu this is related to what we are discussing about querying demographics.

pablo · 17 March 2023 23:16

Another potential simplification to both the demographic and entity models is:

PARTY.details vs. identities and contacts/addresses
ENTITY.description vs. identities and contacts/addresses

I see identities and contacts/addresses might be used in some cases, not always, and are pretty generic. They are also LOCATABLE classes, which means they can be archetyped.

What would happen if we drop PARTY_IDENTITY, CONTACT and ADDRESS from the model and we provide examples on how to represent those by archetyping the PARTY.details: ITEM_STRUCTURE and ENTITY.description: CLUSTER?

So identities, contacts and addresses could be just CLUSTER archetypes in both models, and be used when those are required. That removes a lot of complexity from the “fixed” model itself and is flexible enough.

The only open questions there are what to do with CONTACT.time_validity and with the purpose() methods.

If the time_validity is needed, I think that would be something a modeler would include in the CLUSTER archetype. The purpose methods could then return the purpose of the corresponding archetype (from the archetype header: purpose, use, misuse, etc.)

What do you think?

PS: I might need to draw this UML

sebastian.iancu · 18 March 2023 23:09

Nice analysis @pablo, you hit a few of the problems we encountered over the years implementing this Dem RM.

But before you get too enthusiastic about removing or dropping parts of this spec, I would like to remind you that this is a stable spec that is in use in production (at least by Code24 and Ocean). Any desired change should be carefully considered, not only with respect to the quality of the spec design (as in “… this is not good enough, let’s improve it”), but also with respect to the persisted data instances existing at customers. In any case, changes should be possible, compliant with release semver.

At Code24 we had a few major rounds of implementation while adopting this spec, including 2-3 major refactors or functional redesigns, all resulting in data conversions. Out of the 20+ customer instances we have, the most complex exceeds 200k persons and 1m relations, and some instances have as many as 15 versions/person on average. So my point is that we managed to make this model work (functionally and efficiently) in production, and even though it contains issues, some are just theoretical and can be solved by being a bit more pragmatic in the implementation.

Some of our solutions (listed here off the top of my head):

we don’t use those type() and purpose() methods, for the reasons you mentioned. Instead we implemented similar methods which extract the information from the underlying archetype (we have several types of Group or Organisation archetypes, and similarly for Address); formalization of these name/value types() is captured by archetype terminology
we ‘store’ the Role with the Person as a relationship; ACTOR.roles is technically a runtime function that computes the list of roles. We solved ROLE.performer similarly, which is in line with the proposal above to turn these into functions in the spec.
we always ‘store’ the relationship by the source only; on the target we compute the list of reverse_relationships
in practice we never had graphs, just trees, as everything is created and used in the context of a patient-centric view (for example: a patient instance may have a mother instance, but if in reality the mother is also a patient, then she will be another instance in the system)
we use archetyped PARTY_RELATIONSHIP, as well as ADDRESS; the use of generic CLUSTERs instead of PARTY_IDENTITY, CONTACT, etc. would be, from my point of view, a regression, as we’d lose ‘fixed’ semantics
we don’t persist actor’s roles, but just use generic roles (so role is not per-actor instance) - this was a design trade-off to optimize data and simplify functionality
other than the issues mentioned above, we didn’t have problems with the number of top-level containers, nor with the number of Versions and Contributions needed to maintain all the data.

There are a few other issues with the current Dem RM:

a missing top-level structure - i.e. the SYSTEM, which I mentioned in a couple of SEC meetings
unclear use-case of FOLDERs in the context of Dem RM
LOCATABLE.links [SPECPR-281] - openEHR JIRA
PARTY.reverse_relationship should be a function, and would be easier to use if it were a List<PARTY_REF>
some other missing resource/device types, which @thomas.beale addresses in his new proposal;
the new model fixes some issues from the past, but it also introduces some new aspects and changes which I have not yet been able to assess completely.

thomas.beale · 18 March 2023 23:28

The intention in any reference model is to represent domain-invariant concepts that can be relied on in any specific instance. So if you agree that attributes called identities and contacts are potential properties of any PARTY (even if some PARTY instances don’t have them) then they belong in the RM. They don’t necessarily need their own types (i.e. PARTY_IDENTITY, CONTACT), but such types often make implementation easier. For example, if PARTY_IDENTITY is a class, it can have a function as_string(): String added that will stringify the pieces of the identity. If it’s just a CLUSTER, you have to create a class PARTY_IDENTITY (or whatever it is called) to put that function in. CONTACT has a fixed relationship addresses. If it is just a CLUSTER, this has to be fixed by archetype, and standardisation relies on global use of that archetype.
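A small sketch of the as_string() point: when the identity has its own type, the function has an obvious home; when it is only a generic CLUSTER, the same logic has to live in some extra helper that knows which clusters are identities. The classes below are heavily simplified stand-ins (in particular, details is reduced to a flat list of elements) and the sample values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    value: str


@dataclass
class Cluster:
    name: str
    items: list = field(default_factory=list)


@dataclass
class PartyIdentity:
    """Simplified stand-in for PARTY_IDENTITY, for illustration only."""

    purpose: str
    details: Cluster

    def as_string(self) -> str:
        # Stringify the pieces of the identity, in order.
        return " ".join(e.value for e in self.details.items)


def identity_cluster_as_string(cluster: Cluster) -> str:
    # Without a dedicated type, callers must know this CLUSTER is an identity.
    return " ".join(e.value for e in cluster.items)


name_parts = Cluster("name", [Element("given", "Pablo"), Element("family", "Example")])
typed = PartyIdentity("legal name", name_parts).as_string()
untyped = identity_cluster_as_string(name_parts)
```

Both calls produce the same string; the difference is only where the knowledge that this structure is an identity lives: in the model, or in the archetype plus helper code.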

It’s always a balance of course. You can remove ‘complexity’ from the RM, but it will just be moved to the archetype space, and achieving the same standardisation relies on looser methods.

Well, it depends on whether it is always potentially needed; if so, it’s better to have it in the RM, since no extra modelling effort in archetypes is needed. I don’t say that it is really needed; I am contemplating pulling it out in fact.

pablo · 19 March 2023 15:43

Hi Sebastian, I think we can do an unconstrained analysis exercise without worrying too much about the current use in production; it is just an exercise. From that we can note issues of the current model in the abstract, get input from implementers that actually use the model, and note issues, challenges and design decisions from real-life usage. Only after that can we have a good backlog of things we could or could not consider as potential improvements to the specs. Some of those might be applicable on the current model’s revision branch; others, with breaking changes, will require a new major version branch.

IMO the current demographic model is stable because we didn’t dedicate much time to reviewing it and detecting issues or improvement opportunities. I’m not saying this doesn’t happen from time to time, but compared with what we did for the EHR, AQL and REST specs, the time spent on demographics is just a small fraction. At this moment it just happens that I’m implementing the demographic model, and I’m probably facing issues you faced years ago.

pablo · 21 March 2023 16:22

I agree common patterns should be part of the RM, but my point wasn’t really about instances but about the model itself, an example: what’s the meaning of AGENT having a CONTACT? As I understand AGENTs could be from a software system to an ultrasound machine, through vital signs monitors and laboratory sample analyzers.

In the same context, what would be the use of the languages field?

About that, I see languages as metadata, and I’m not sure why it’s part of the model. In a COMPOSITION I see the importance for global interoperability, like knowing the language a clinical document is written in, but I’m not so sure about ACTOR.languages.

About the general discussion on the current RM and the new ENTITY RM, I’m sure you and others might have heard of the Universal Data Models concept from Len Silverston. Check the diagram on page 2; it is a kind of generic Party model.
Universal_Data_Models_for_Health_Care.pdf (129.5 KB)

This is also really nice documentation with UMLs: UCMDB CI Data Model

I think there is a huge overlap between what we are doing and what these guys do, and I’m currently ordering a book from Len to get into the details of the Party model and other models.

thomas.beale · 21 March 2023 16:51

An Agent is any autonomous entity with agency. Any kind of agent, including a robot, may have contact information. An ultrasound machine (at least the ones we have today) is not an Agent, it’s just a Device.

Only makes sense for Agents, since they can communicate, potentially linguistically. The knowledge of language(s) is useful so as to know how to talk to a Person, or even a chatbot.

I have the books on my shelf ;) And took quite a few ideas from them. One thing that they don’t have is archetypes, so they create extra types for ‘types’ of things, which we don’t have to do, because Archetypes can define the types of Actor etc.

I had not seen the UCMDB one - it’s a kind of technical ontology. Could indeed be useful.

pablo · 24 March 2023 16:56

Can we do a list of things that could be AGENTs and check if devices might have a place in the demographic/entity models? For me the difference is not clear, maybe the concept of “agency” in Spanish means some other thing I don’t fully understand.

thomas.beale · 24 March 2023 19:30

For something to be an ‘agent’ it has to be autonomous and capable of independent action, decision-making etc. Generally speaking devices do not fit this definition. So devices cannot act as Parties.

In the draft model I have so far, they would come under ARTEFACT in the physical entity part of the model.

‘Agency’ is a common term in philosophy and psychology; see here for agencia / filosofia wikipedia page.

pablo · 24 March 2023 19:55

That is why I asked for examples; that definition fits, for instance, a person. Is an algorithm an agent? A software system? A lab analyzer?

But not all parties are agents, right? So why a device can’t be a party is not clear.

If all parties are agents, we have a big issue in our current class hierarchy.

thomas.beale · 25 March 2023 14:33

None of those things could be agents, since they can’t act autonomously, i.e. initiate acts.

A Party is either an Agent, i.e. a real world autonomous entity acting as itself, or it is a Persona (what is called Role in the current Demographic model), which is an Agent acting in some defined capacity, e.g. as a General Practitioner, licensed insurance broker and so on.

In these kinds of models, a Party cannot be a passive object.

pablo · 25 March 2023 17:37

@thomas.beale so, do you have a list of things that could be agents? You didn’t answer that.

So person is an agent?

Please look at the current demographic model, I’m talking/asking about that.

thomas.beale · 25 March 2023 18:18

Ah sorry - I had forgotten that the name of Agent in the demographic model is Actor. We could use either name. For now, just assume they are synonyms. So everything I said above applies equally to ‘Actor’ in the published demographic model. Sorry for the confusion!