Hi all,
Over the past weeks I’ve been working on a set of changes to specifications.openehr.org aimed at making the openEHR specifications easier for LLMs, AI assistants, and agent-based tools to read, cite, and reason about — while at the same time reducing load and token usage on the hosting side.
Why bother
When someone asks ChatGPT, Claude, Gemini or a coding agent a question about openEHR, those assistants end up fetching and paraphrasing from this very site. A cleaner surface means more accurate citations back to the normative specs — assistants that quote us correctly rather than hallucinate around the edges. There’s also a hosting angle: AI systems typically issue 10–16 parallel sub-queries per user question, and for a small standards site that traffic adds up fast.
What changed
- Robots and sitemap tuning. Per-bot crawl delays for the major AI crawlers, and proper `<priority>`/`<changefreq>` signals in the sitemap so crawlers understand that released specifications never change once published.
- An `/llms.txt` index. Following the emerging llmstxt.org convention (still a proposal rather than a formal standard, but already honoured by most of the major engines), a curated plain-text map of the site is now at `specifications.openehr.org/llms.txt`.
- A Markdown representation of every spec page. Append `.md` to any spec URL, or send `Accept: text/markdown`. That cuts per-page token cost by roughly 80% compared with the rendered HTML. Class attribute and function tables remain HTML-only for now; `/api/classes.json` resolves any class name to its authoritative location.
- Structured JSON APIs. Small, cache-friendly endpoints under `/api/` (`components.json`, `classes.json`, `releases.json`) for tool-builders, MCP servers, and RAG pipelines that want the catalogue without scraping HTML.
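For anyone who wants to try the Markdown and JSON endpoints from a script, here is a minimal Python sketch using only the standard library. The helper names are my own, the page path in the usage note is just an illustrative example, and since the JSON schemas are not described in this post the code returns the parsed JSON as-is rather than assuming any field names.

```python
import json
import urllib.request

# Site root as named in the post; everything below is built on top of it.
BASE = "https://specifications.openehr.org"


def markdown_url(spec_url: str) -> str:
    """Option 1: append '.md' to a spec page URL to get its Markdown form."""
    return spec_url + ".md"


def markdown_request(spec_url: str) -> urllib.request.Request:
    """Option 2: keep the URL and ask for Markdown via the Accept header."""
    return urllib.request.Request(spec_url, headers={"Accept": "text/markdown"})


def catalogue_url(name: str) -> str:
    """URL of a JSON catalogue: 'components', 'classes' or 'releases'."""
    return f"{BASE}/api/{name}.json"


def fetch_text(target, timeout: float = 10.0) -> str:
    """Fetch a URL string or a prepared Request and decode the body as text."""
    with urllib.request.urlopen(target, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


def fetch_catalogue(name: str, timeout: float = 10.0):
    """Fetch and parse one catalogue; the structure is returned unmodified
    because its schema is not documented in this post."""
    return json.loads(fetch_text(catalogue_url(name), timeout))
```

Usage would look like `fetch_text(markdown_url(BASE + "/releases/RM/latest/ehr.html"))` for a page, or `fetch_catalogue("classes")` for the class index. Both Markdown routes should return the same document; the header variant is handy when the URL comes from somewhere you do not control.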
None of this changes what we see as human readers, and the normative specification content is untouched.
A few more AI-related features are planned, some of them tied to our move to Antora-based rendering, but they will come a bit later.
Feedback welcome. Two questions in particular:
- If you work on tooling — MCP servers, copilots, openEHR SDKs, internal RAG systems — would these representations be useful as they stand? What else would help?
- If you’ve caught an AI assistant giving an inaccurate answer about the openEHR specifications, please share the example. Concrete cases are genuinely useful for evaluating whether these changes actually move the needle.
Best regards,
Sebastian