I would like to announce an open-source Model Context Protocol (MCP) server for openEHR that I developed as part of my learning journey at @Sidharth_Ramesh 's excellent openEHR bootcamp:
MCP is an emerging standard for connecting external systems with large language models (LLMs), often described as the “USB-C port for AI”, and has demonstrated significant adoption in recent months (see https://modelcontextprotocol.io/ for more information).
The openEHR MCP Server enables composition creation for any template available on the server. Here is an example conversation using a simple vital signs template I used for testing:
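To make the idea concrete, here is a minimal sketch (not the actual server's code) of what a "create composition" tool might assemble before handing the request to an HTTP client. The EHRbase ECIS-style endpoint, EHR id, and template id below are illustrative assumptions:

```python
# Hypothetical sketch of the kind of tool an MCP server can expose to an LLM.
# The endpoint path, ids, and flat keys are illustrative, not the server's API.
import json

def ehrbase_url(base: str, ehr_id: str, template_id: str) -> str:
    """Build an EHRbase-style endpoint for posting a composition in FLAT format."""
    return (f"{base}/rest/ecis/v1/composition/"
            f"?ehrId={ehr_id}&templateId={template_id}&format=FLAT")

def create_composition(base: str, ehr_id: str, template_id: str,
                       flat: dict) -> dict:
    """Assemble the HTTP request an MCP 'create composition' tool would send."""
    return {
        "method": "POST",
        "url": ehrbase_url(base, ehr_id, template_id),
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(flat),
    }

req = create_composition("http://localhost:8080/ehrbase",
                         "7d44b88c-4199-4bad-97dc-d78268e01398",
                         "vital_signs.v1",
                         {"vital_signs/language|code": "en"})
print(req["url"])
```

The point is that the LLM only needs to produce the flat key/value payload; the tool handles the transport details.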
@joostholslag discovered this work by chance and suggested bringing it to the community’s attention.
While this remains alpha-stage software, I was surprised by the effectiveness of the LLM-MCP integration (tested with Claude so far) in generating compositions from natural language based on template data and example compositions obtained from the EHRbase server.
I welcome your feedback, particularly if you’re able to test the implementation. Several development directions are possible, as already discussed with Joost:
AQL support (though we agreed that starting with stored queries would be safest)
Complete mapping of openEHR OpenAPI endpoints
Support for additional CDRs
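On the stored-queries point: the safety argument is that the AQL text lives on the CDR under a qualified name, and the LLM-facing tool only ever references that name. A rough sketch, where the query name and AQL are invented examples (the `/query/{name}/{version}` path follows the openEHR REST API spec):

```python
# Illustrative stored query: the AQL is registered once on the CDR, and
# clients execute it by qualified name instead of sending free-form AQL.
STORED_QUERY_NAME = "org.example::vital-signs-latest"  # hypothetical name
AQL_TEXT = (
    "SELECT o/data[at0001]/events[at0006]/time/value AS time "
    "FROM EHR e CONTAINS COMPOSITION c "
    "CONTAINS OBSERVATION o[openEHR-EHR-OBSERVATION.blood_pressure.v2]"
)

def stored_query_path(name: str, version: str = "1.0.0") -> str:
    """openEHR REST path for executing a named stored query."""
    return f"/rest/openehr/v1/query/{name}/{version}"

print(stored_query_path(STORED_QUERY_NAME))
```

An MCP tool exposing only `stored_query_path`-style invocations cannot be talked into running arbitrary AQL, which is the "safest starting point" discussed above.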
Looking forward to your thoughts and contributions.
Oliver
If nothing else, this is a really great training tool, allowing someone to get started quickly and ‘tweak’ the resulting code.
The idea of generating examples with natural language is quite appealing - hooking up to SNOMED via FHIR Terminology services might be another extension that would help here.
We can definitely hook this up to multiple CDRs. There is already quite a high degree of conformance between the EHRbase and Better CDRs, with just a few fairly minor discrepancies.
One question @odeak asked me was whether he should support other formats than flat webtemplate for compositions. We concluded that this is probably the easiest format for LLMs to work with. And other clients could ask the CDR to return the composition in another format. But I’d be interested in dissenting opinions.
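For readers unfamiliar with the flat web template format, here is an illustrative fragment for a simple vital-signs template. The path names depend on the concrete template, so treat these keys as assumptions:

```python
# Illustrative only: a flat (web template / simSDT) composition fragment.
# Every leaf is a single path string mapped to a primitive value.
flat_composition = {
    "vital_signs/language|code": "en",
    "vital_signs/language|terminology": "ISO_639-1",
    "vital_signs/territory|code": "DE",
    "vital_signs/context/start_time": "2024-05-01T10:00:00Z",
    "vital_signs/body_temperature/any_event:0/temperature|magnitude": 37.2,
    "vital_signs/body_temperature/any_event:0/temperature|unit": "°C",
}
```

This flatness, one path-to-leaf string per value rather than deeply nested canonical JSON, is plausibly what makes the format easy for LLMs to emit.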
In our approach to generating examples programmatically, we use constructors for the different RM types that take into account the OPT constraints for that object. Then, with the RM instance generated, we just use a serializer to get the final format.
I think that if Claude, instead of generating the final resource, generates the code that creates the RM instance, then we can output any available format by just hooking up the corresponding serializer.
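A minimal sketch of that idea in Python rather than the Groovy SDK: the LLM emits code that builds RM instances via constructors, and the output format becomes a pluggable serializer. The class and field names below are simplified stand-ins for the real RM types:

```python
# Sketch: LLM-generated code builds RM instances; serializers are swappable.
import json
from dataclasses import dataclass, asdict

@dataclass
class DvQuantity:          # simplified stand-in for the RM DV_QUANTITY type
    magnitude: float
    units: str

@dataclass
class Element:             # simplified stand-in for the RM ELEMENT type
    name: str
    value: DvQuantity

def to_canonical_json(e: Element) -> str:
    """One serializer; a flat-path or XML serializer could be swapped in."""
    return json.dumps(asdict(e))

elem = Element("Temperature", DvQuantity(37.2, "°C"))
print(to_canonical_json(elem))
```

The serialization step is then independent of what the LLM produced, which is the decoupling Pablo describes.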
In that approach, Claude needs to know the RM and the OPT models, though it will depend on a specific RM SDK implementation. I think there are two or three implementations being maintained:
Thank you for the ideas and links, @pablo, I will need to dig a bit deeper.
Related to this, I have been looking at other LLMs on the list, especially smaller ones that can run locally. This brings additional complexities, such as ensuring structured outputs and fine-tuning on the RM, as well as the infrastructure code needed to connect both sides.
Though the difficult part would be to feed in the ADL/OPT constraints so the generated RM instance can comply with the OPT rules (i.e. be semantically valid). The serialization to whatever format (canonical JSON, canonical XML, flat, structured, etc.) is then just two lines of code.
For the OPT part, I'm guessing the LLM should know about the OPT model (openEHR-SDK/src/main/groovy/com/cabolabs/openehr/opt/model at master · CaboLabs/openEHR-SDK · GitHub). Providing specific OPTs might then let the LLM know which constraints apply to each node of the RM. It will depend on the capabilities of the LLM to understand that an OPT is a model of constraints over the RM and how those constraints apply, but it might be possible to get some results. If this is too complex for the LLM, an alternative is to express the OPT constraints over the RM as OCL (Object Constraint Language) rules, which is common when working with object-oriented models and UML.
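To illustrate the OCL idea: an OPT constraint on a node could be handed to the LLM as an explicit rule, or enforced after generation by a validator. The OCL-style text and the range below are invented for illustration:

```python
# OCL-ish statement of an (invented) OPT constraint on a DV_QUANTITY node:
#   context DV_QUANTITY inv:
#     units = 'Cel' implies magnitude >= 0.0 and magnitude <= 100.0

def check_quantity(magnitude: float, units: str) -> list:
    """Validate a generated DV_QUANTITY against the illustrative rule above."""
    errors = []
    if units != "Cel":
        errors.append(f"units must be 'Cel', got {units!r}")
    if not (0.0 <= magnitude <= 100.0):
        errors.append(f"magnitude {magnitude} outside [0.0, 100.0]")
    return errors

print(check_quantity(37.2, "Cel"))   # valid: no errors
print(check_quantity(-5.0, "Cel"))   # violates the range constraint
```

Even if the LLM cannot reason over the full OPT, feeding back validator errors like these could let it repair its own output iteratively.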
Excellent to see this here @odeak!
I was watching this video, and it reminded me of your work:
You’ve done really well to abstract away “actions” from just the usual openEHR REST API. I think this pattern works really well with MCP-like protocols.
And I’ve seen LLMs struggle with the canonical format and OPTs, just as some of my devs struggle with these.
AQL writes that simplify the data models even further, FTW? @sebastian.iancu
The flats are easy in the short term, but the moment you want to apply them to other use cases or other languages they become a problem, since the output fields are derived from the template names, which users change.
So within your own domain it's easy, but the moment you want to transfer-learn or apply the model in another country, it's questionable whether the LLM can abstract that logic (in a clean way). On top of that, specific parts of the RM are missing from flats.
Canonicals are much better in that regard; we are currently testing with both.
The moment you have to verify your net on another dataset, problems may appear.
I have pushed a new version that also supports the canonical format (in json).
A quick test with my vital signs template and the MCP server's prompt template showed that this works, albeit more slowly with Claude when populating a composition (77 s vs 45 s for a single test), while the first part, processing the prompt template and generating the example composition, took roughly the same time (25 s vs 30 s). More details can be found here:
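For context on the timing difference, here is an illustrative comparison of the two serializations of the same temperature reading; both shapes are simplified assumptions, not exact CDR output:

```python
# Same leaf value, two serializations (both simplified for illustration).
flat = {
    "vital_signs/body_temperature/any_event:0/temperature|magnitude": 37.2,
    "vital_signs/body_temperature/any_event:0/temperature|unit": "°C",
}
canonical_fragment = {      # canonical JSON nests the full RM structure
    "value": {
        "_type": "DV_QUANTITY",
        "magnitude": 37.2,
        "units": "°C",
    }
}
# The canonical form carries explicit RM typing ("_type") and nesting,
# which plausibly explains why the LLM needs more tokens, and more time,
# to produce a whole composition in it.
print(len(canonical_fragment["value"]))
```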
It’s more of the semantics of the RM and the complexity of the things around the semantic models. The models, archetypes and templates are usually super simple to understand.
I think about how most of the world’s software is in English. The variables, functions etc. Much of this is authored by programmers whose native tongue is not English.
Shouldn’t templates and archetypes also be such a technical artefact? Default to English when rendering paths, and as a fallback use other languages for the paths?
LLMs are actually pretty good at translating and understanding what things mean.
But the translation / flat path problem is real though. Maybe this is why node names should have aliases in English (like variable names in code)? Think this is something ADL2 was proposing.
Check. Which RM semantics are difficult?
The semantics of a composition with an event context containing data conforming to a template is quite key. Technically it's possible to abstract this away with sensible defaults, but that can easily mess things up. As a developer you need to understand when the defaults should be overridden, so you have to understand the composition model very well in order to produce clinically safe data.
The contribution and version change model I imagine could also be a struggle to understand. But this is usually abstracted away by the CDR quite well, right?
There’s also complexity in the BASE and AOM, but I guess that’s not too relevant for developers working with data and existing templates?
Aliases are OK. But English as native node IDs is also an issue for versioning: suddenly a typo correction in an element name becomes a breaking change in an archetype…
(Can someone move this discussion to a different topic, I think we’ve gone beyond the original topic)