Semi-structured narrative data

thomas.beale · 2 November 2021 17:11

This post highlights one of the challenges with moving away from fully structured/atomised text representation (the old DV_PARAGRAPH approach) to narrative-oriented representation (the DV_TEXT markdown approach).

The way to think about these two representations is that DV_PARAGRAPH is something like a post-parse AST (abstract syntax tree) representation, i.e. the kind of in-memory structured object tree that results from parsing some text into pieces.

The markdown representation is a pre-parse representation.

Narrative is nice (or at least acceptable) for authoring and reading, but bad for computing; that’s why you parse it.

Structure is great for computing, but annoying for authoring and reading.

Given that we have adopted a markdown narrative representation for lumps of text larger than a single atom, i.e. paragraphs, ‘lines of text’ etc., we need a way to represent terminology mappings.

This could be done as links, as @ian.mcnicoll suggested below. The question is: links to what? SNOMED URIs like http://snomed.info/sct/id/1234567890 can be constructed, but they won’t function like URLs, i.e. they are not really links.

Another approach would be to write them inline, to be processed by another layer, i.e. they don’t constitute markdown as such. To achieve this, some means similar to markdown linking [] has to be used to establish which words the coding applies to.

This might be something like the following:

"^dysuria^[snomedct::49650001] warranting a urinary sediment to exclude a ^UTI^[snomedct::68566005]"

If the text you want is exactly the same as the preferred term, we could do:

"[snomedct::49650001|dysuria|] warranting a ..."

But that shorthand does not work for the second mapping: the term for 68566005 is “urinary tract infection”, not “UTI”.
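To make the two caret-based forms concrete, here is a rough Python sketch of how a separate processing layer might pull the mappings out of a string. The regular expressions, the `Mapping` type and the `find_mappings` function are my own assumptions for illustration; nothing here is an agreed openEHR syntax, and the allowed characters for terminology ids and codes are a guess.

```python
import re
from typing import List, NamedTuple


class Mapping(NamedTuple):
    text: str         # the words the code applies to
    terminology: str  # e.g. "snomedct"
    code: str         # e.g. "49650001"


# Assumed character set for terminology ids and codes (hypothetical).
_ID = r"[A-Za-z0-9_.-]+"

# Long form: ^text^[terminology::code]
LONG_FORM = re.compile(rf"\^([^^\[\]]+)\^\[({_ID})::({_ID})\]")

# Short form: [terminology::code|preferred term|]
SHORT_FORM = re.compile(rf"\[({_ID})::({_ID})\|([^|\]]+)\|\]")


def find_mappings(narrative: str) -> List[Mapping]:
    """Return all mappings in order of appearance, whichever form they use."""
    found = []
    for m in LONG_FORM.finditer(narrative):
        found.append((m.start(), Mapping(m.group(1), m.group(2), m.group(3))))
    for m in SHORT_FORM.finditer(narrative):
        # In the short form the displayed text is the preferred term itself.
        found.append((m.start(), Mapping(m.group(3), m.group(1), m.group(2))))
    return [mapping for _, mapping in sorted(found)]


mixed = ("[snomedct::49650001|dysuria|] warranting a urinary sediment "
         "to exclude a ^UTI^[snomedct::68566005]")
for mapping in find_mappings(mixed):
    print(mapping)
```

Note that the short form only tells the parser what the preferred term is claimed to be; checking it against the terminology would need a terminology service, which is outside this sketch.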

The above is not super-readable, so some better choice of syntax might be possible:

"/dysuria/[snomedct::49650001] warranting a urinary sediment to exclude a /urinary traction infection/[snomedct::68566005]"

I used ‘urinary traction infection’ as the text to emphasise the fact that the delimiters (here, //) are needed to indicate the text the mapping applies to.
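One worry with `/` as a delimiter is that slashes occur in ordinary clinical text ("120/80", "and/or"). A minimal sketch, assuming the rule that a closing `/` only counts when it is immediately followed by `[terminology::code]`, suggests this can still be parsed unambiguously. The pattern and the `find_slash_mappings` name are hypothetical.

```python
import re

# Assumed rule: /text/ must be followed directly by [terminology::code].
SLASH_FORM = re.compile(
    r"/([^/\[\]]+)/\[([A-Za-z0-9_.-]+)::([A-Za-z0-9_.-]+)\]"
)


def find_slash_mappings(narrative):
    """Return (text, terminology, code) tuples in order of appearance."""
    return [m.groups() for m in SLASH_FORM.finditer(narrative)]


example = ("/dysuria/[snomedct::49650001] warranting a urinary sediment "
           "to exclude a /urinary traction infection/[snomedct::68566005]")
print(find_slash_mappings(example))

# Stray slashes in ordinary text are not mistaken for delimiters.
noisy = "BP 120/80 and/or /dysuria/[snomedct::49650001]"
print(find_slash_mappings(noisy))
```

The delimited text itself cannot contain a `/` under this rule, so something like an escape mechanism would still be needed for phrases that do.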

I’m sure someone else can do better, but what we aim to do here with this kind of solution is:

define (yet another) openEHR micro-syntax that allows terminology mappings to be represented in structured plain text that is not markdown (i.e. won’t do something weird when seen by a markdown renderer) but can be reliably parsed into a structured form.
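As an illustration of what "parsed into a structured form" could mean, the sketch below splits a narrative string using the `^text^[terminology::code]` form into a flat list of segments, some coded and some not, which is roughly the post-parse shape described earlier. The `Segment` class and both functions are assumptions of mine, not part of any openEHR specification.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    text: str
    terminology: Optional[str] = None  # None for uncoded narrative
    code: Optional[str] = None


# Assumed pattern for ^text^[terminology::code].
MAPPING = re.compile(
    r"\^([^^\[\]]+)\^\[([A-Za-z0-9_.-]+)::([A-Za-z0-9_.-]+)\]"
)


def parse_segments(narrative: str) -> List[Segment]:
    """Split narrative into alternating uncoded and coded segments."""
    segments = []
    pos = 0
    for m in MAPPING.finditer(narrative):
        if m.start() > pos:
            segments.append(Segment(narrative[pos:m.start()]))
        segments.append(Segment(m.group(1), m.group(2), m.group(3)))
        pos = m.end()
    if pos < len(narrative):
        segments.append(Segment(narrative[pos:]))
    return segments


def plain_text(segments: List[Segment]) -> str:
    """Recover the readable narrative with the mapping syntax stripped."""
    return "".join(s.text for s in segments)


source = ("^dysuria^[snomedct::49650001] warranting a urinary sediment "
          "to exclude a ^UTI^[snomedct::68566005]")
segments = parse_segments(source)
print(plain_text(segments))
```

The same segment list serves both audiences: `plain_text` gives the string a human reads, while the coded segments are what a computing layer would query.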

Edit: I tested the string above in the CommonMark tester page; it comes through OK.