# Semi structured narrative data
**Category:** [Apps](https://discourse.openehr.org/c/app-dev/8)
**Created:** 2021-10-30 15:44 UTC
**Views:** 2060
**Replies:** 44
**URL:** https://discourse.openehr.org/t/semi-structured-narrative-data/2007
---
## Post #1 by @joostholslag
A lot of our clinical data is in narrative reports form. Let's say for EVALUATION.clinical_synopsis.synopsis there's a DV_TEXT that contains a paragraph of text where some words in an individual sentence could be mapped to e.g. SNOMED.
e.g.: "The patient has shown signs of dysuria (<http://snomedct.info/id/49650001>) warranting a urinary sediment to exclude a UTI (<http://snomedct.info/id/68566005>)."
Currently it's only possible to map the entire DV_TEXT to a code system like SNOMED. The deprecated DV_PARAGRAPH offered the option to 'concat' multiple DV_TEXTs so that you could have separate DV_TEXTs like (syntax probably incorrect) :
1. DV_TEXT.value = "The patient has shown signs of"
2. DV_TEXT.value = "dysuria"
   DV_TEXT.mapping: <http://snomedct.info/id/49650001>
3. DV_TEXT.value = "warranting a urinary sediment to exclude a"
4. DV_TEXT.value = "UTI"
   DV_TEXT.mapping: <http://snomedct.info/id/68566005>
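A minimal sketch of the segmented form above, in Python. The class names `Segment` and `Mapping` are hypothetical stand-ins for illustration only; they are not the openEHR RM classes, and the terminology label is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mapping:
    # Hypothetical stand-in for a term mapping: terminology plus code.
    terminology: str
    code: str


@dataclass
class Segment:
    # Hypothetical stand-in for one DV_TEXT in a DV_PARAGRAPH-like list.
    value: str
    mapping: Optional[Mapping] = None


segments: List[Segment] = [
    Segment("The patient has shown signs of"),
    Segment("dysuria", Mapping("SNOMED-CT", "49650001")),
    Segment("warranting a urinary sediment to exclude a"),
    Segment("UTI", Mapping("SNOMED-CT", "68566005")),
]


def render(parts: List[Segment]) -> str:
    # Concatenate the segment values back into the narrative sentence.
    return " ".join(p.value for p in parts)


def codes(parts: List[Segment]) -> List[str]:
    # Collect the codes of all mapped segments, in reading order.
    return [p.mapping.code for p in parts if p.mapping is not None]
```

The point of the structure is that each code stays attached to exactly the words it describes, while `render` still gives the reader one sentence.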
The use would be to offer the user possibly relevant similar reports or information from the EHR using SNOMED to find synonyms. Not to query based on definitive data e.g. "has UTI"
I understood DV_PARAGRAPH was deprecated in favour of markdown. But markdown doesn't support term mappings. So what is the current advice on how to achieve this? And do other implementers share this problem?
I know the proper openEHR way would be to do fully structured data where the `entry` is the smallest unit of information. So the dysuria should be a CLUSTER.symptom_sign and the UTI possibly an EVALUATION.differential_diagnoses. But I'm struggling to imagine a user interface that facilitates recording all narrative information in such a strictly structured way, without driving clinicians crazy.
(The EVALUATION.clinical_synopsis states recording (semi) structured data as misuse btw. So unless you know what you're doing, don't just implement what I described above.)
---
## Post #2 by @ian.mcnicoll
My MSc was on precisely that topic!!
https://www.scribd.com/document/50351864/Supporting-Narrative-based-medicine-in-GP-systems
or at a Dropbox link https://www.dropbox.com/s/b9mr9fse3zxpvf5/50351864-Supporting-Narrative-based-medicine-in-GP-systems.pdf?dl=0 Skip to page 57!!
I think this would require something like Markdown but with custom markup. CDA had a go at something similar, but I think you need to go well beyond just embedding SNOMED terms to embedding links to all sorts of structured entries (e.g. prescriptions).
---
## Post #3 by @borut.jures
*I'm new to SNOMED (and many other things here) so I might be completely wrong.*
Would it be possible to use SNOMED to check against everything a clinician is typing? Computers and SNOMED APIs are fast enough to do this without a clinician noticing any delays.
If a partially typed word maps to something in SNOMED, a full term would be offered as an auto-suggestion (like when we write on a phone). A clinician can select the offered term or continue writing.
Similar checks could be done against other structured data repositories.
Similar solutions probably already exist for popular markdown web components (I've seen some but would have to search again). They would need to be adapted to use SNOMED instead of whatever they are using for their example.
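A rough sketch of the auto-suggest idea, in Python. The term list is a tiny hypothetical sample; a real editor would call a terminology service instead, and nothing here reflects an actual SNOMED API.

```python
from bisect import bisect_left
from typing import List

# Hypothetical, tiny term list; a real system would query a terminology server.
TERMS = sorted(["dysuria", "dyspnoea", "urinary sediment", "urinary tract infection"])


def suggest(prefix: str, terms: List[str] = TERMS, limit: int = 5) -> List[str]:
    """Return up to `limit` terms that start with `prefix` (case-insensitive)."""
    p = prefix.lower()
    if not p:
        return []
    # Binary search for the first term that could match the prefix.
    start = bisect_left(terms, p)
    out: List[str] = []
    for term in terms[start:]:
        if not term.startswith(p):
            break
        out.append(term)
        if len(out) == limit:
            break
    return out
```

Calling `suggest("dys")` on every keystroke would offer both "dyspnoea" and "dysuria"; the clinician picks one or keeps typing.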
---
Edit: And one day we will be able to replace "typed word" with "spoken word".
---
Edit #2: Of course the system would still have to put everything into a DV_TEXT or some other mappings but this is "trivial" after the clinician selected/indicated what he/she wants to record. My suggestion is only about the data entry type that wouldn't drive the clinicians crazy.
---
## Post #4 by @joostholslag
[quote="borut.jures, post:3, topic:2007"]
Similar solutions probably already exist for popular markdown web components (I’ve seen some but would have to search again). They would need to be adapted to use SNOMED instead of whatever they are using for their example.
[/quote]
Hi Borut, thanks for chipping in :D You're completely right, systems like what you describe exist, e.g. https://twitter.com/mwardle/status/1452920467799609345?s=20
What I'm looking for is how to store that code mapping data in an openEHR system, preferably in a standardised way. Compared to [TERM_MAPPING](https://specifications.openehr.org/releases/RM/latest/data_types.html#_term_mapping_class), markdown only offers the inline text (`.value`), the link (`code_string`) and a text "title", not a `match`, `purpose`, `terminology_id` or `preferred_term`.
So apart from being a 3rd/4th way of doing mappings, it also has fewer features and is non-standardised. It's debatable whether a term_mapping/binding should be openEHR specific.
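To make the gap concrete, a small Python sketch that pulls links out of markdown and shows which slots stay empty. `LinkMapping` is a hypothetical class for illustration, not an RM type, and the regex only covers the simple `[text](url "title")` form.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LinkMapping:
    # Fields a plain markdown link can carry.
    text: str
    url: str
    title: Optional[str]
    # Fields named in the post that a plain link has no slot for.
    match: Optional[str] = None
    purpose: Optional[str] = None


# [text](url "optional title")
LINK = re.compile(r'\[([^\]]+)\]\((\S+?)(?:\s+"([^"]*)")?\)')


def links(markdown: str) -> List[LinkMapping]:
    # Missing titles come back as empty strings from findall; store None instead.
    return [LinkMapping(t, u, title or None) for t, u, title in LINK.findall(markdown)]
```

Whatever parser is used, `match` and `purpose` can only be filled from somewhere other than the link itself.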
@edit2 If it's trivial could you give an example syntax of what a DV_TEXT instance would look like, please?
---
## Post #5 by @joostholslag
Hi Ian,
Really curious. But do I really have to pay to view the doc? What would Markdown with custom markup look like? See my comments on Borut's post for why I'm not too enthusiastic about the idea.
I fully agree it should go well beyond SNOMED. I'd especially like us to solve linking to other openEHR datapoints in the EHR.
edit: Should it be possible to record a link in a DV_TEXT that is validated against [Ehr_scheme](https://specifications.openehr.org/releases/RM/latest/data_types.html#_definitions)?
---
## Post #6 by @borut.jures
[quote="joostholslag, post:4, topic:2007"]
If it’s trivial could you give an example syntax of what a DV_TEXT instance would look like, please?
[/quote]
The DV_PARAGRAPH example could be the end result of what the clinicians enter. But I'm not qualified to discuss the DV_ types (I have read about them only once so far - I'll know more after my 3rd or Nth reading).
My comment was only for the part about using markdown for data entry. Since the markdown field is free-form text, it needs to be converted into DV_TEXT or other structures.
It looked to me that you would like markdown to work the same way as DV_PARAGRAPH. My suggestion is to **separate** data entry (using free form text with auto-suggestions) and create DV_TEXT or other structures in the background. I would expect clinicians aren't too happy to "build" these structures during their data entry. They shouldn't be bothered with that if we can perform the transformation in the background.
---
## Post #7 by @joostholslag
[quote="borut.jures, post:6, topic:2007"]
My comment was only for the part about using markdown for data entry. Since the markdown field is free-form text, it needs to be converted into DV_TEXT or other structures.
It looked to me that you would like markdown to work the same way as DV_PARAGRAPH. My suggestion is to **separate** data entry (using free form text with auto-suggestions) and create DV_TEXT or other structures in the background. I would expect clinicians aren’t too happy to “build” these structures during their data entry. They shouldn’t be bothered with that if we can perform the transformation in the background.
[/quote]
A DV_TEXT.value can contain the text in markdown format if DV_TEXT.formatting is set to `markdown`. That's not the issue. The issue is how to then query based on SNOMED codes and the other context info, like what kind of match it is (exact, narrower, broader etc.), how to recognise it's SNOMED (not LOINC), the purpose of the mapping, the preferred term displayed etc. And this with the openEHR premise of a specified data format where different client apps can natively interact with different backend CDRs.
I agree you want to partly separate data entry from storage. You probably want user confirmation on the mapping, and may want to suggest a fully structured form based on the text input. And you definitely do not want to render separate text fields for the 4 data parts I previously described.
---
## Post #8 by @ian.mcnicoll
I think the only way to do this is to embed links to the relevant Entries, which contain the SNOMED codes, in the usual way.
Apart from anything else, how do you know 'Essential Hypertension' in the narrative (with an embedded SNOMED code) is a diagnosis, and not a family history or a reason for encounter?
The embed of the codes is the least of the problems!!
---
## Post #9 by @joostholslag
[quote="ian.mcnicoll, post:8, topic:2007"]
Apart from anything else, how do you know ‘Essential Hypertension’ in the narrative (with an embedded SNOMED code) is a diagnosis, and not a family history or a reason for encounter?
[/quote]
If it's just about showing related information, that the user then skims through, it's ok not to know it's one or the other. Both may be relevant. I'm not proposing decision logic here, only presenting information to the user.
> The embed of the codes is the least of the problems!!
So how would you do it?
---
## Post #10 by @ian.mcnicoll
[quote="joostholslag, post:9, topic:2007"]
it’s ok not to know it’s one or the other.
[/quote]
I don't understand why you are capturing SNOMED CT codes then. I'm not understanding clearly!!
---
## Post #11 by @joostholslag
Given that most data is currently free text narrative data: progress notes/clinical synopsis etc. It's a good feature to search through those. e.g. give me anything related to "urinary tract infection". SNOMED is really useful to query on synonyms: UTI, Bacterial urinary infection etc. etc.
If words in the free text are classified by the original author using SNOMED codes, that would make the results a lot more precise than string matching. So you will want to record the SNOMED code with/in the text.
The search use case I describe is a lot less specific and quicker than the specific AQL queries in openEHR, where the query should have very high sensitivity and specificity to be fit for e.g. decision support, and thus takes lots of time and effort to get precisely right.
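A minimal Python sketch of that kind of search. Both the synonym table and the notes are hypothetical sample data; in practice the synonyms would come from a SNOMED CT service and the codes from whatever mapping storage is chosen.

```python
from typing import Dict, List

# Hypothetical synonym table; in practice this would come from a terminology service.
SYNONYMS: Dict[str, str] = {
    "urinary tract infection": "68566005",
    "uti": "68566005",
    "dysuria": "49650001",
}

# Hypothetical notes: free text plus the codes the author confirmed while writing.
NOTES = [
    {"id": "n1", "text": "Signs of dysuria, sediment to exclude a UTI.",
     "codes": {"49650001", "68566005"}},
    {"id": "n2", "text": "Bacterial urinary infection treated last year.",
     "codes": {"68566005"}},
    {"id": "n3", "text": "Sprained ankle.", "codes": set()},
]


def search(term: str) -> List[str]:
    """Return ids of notes tagged with the code that `term` resolves to."""
    code = SYNONYMS.get(term.strip().lower())
    if code is None:
        return []
    return [n["id"] for n in NOTES if code in n["codes"]]
```

Note `n2` is found for "UTI" even though that string never appears in its text, which plain string matching would miss.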
Maybe this screenshot from our demo app (in Dutch) will help.
---
## Post #12 by @ian.mcnicoll
In that case, why not just put some kind of markdown link in there?
I think he may have a [UTI](https://snomed.info/12345678)
---
## Post #13 by @joostholslag
Haha yes, glad you understood the Dutch.
Yes putting it in markdown can work, but it has several downsides, see my post above: https://discourse.openehr.org/t/semi-structured-narrative-data/2007/7?u=joostholslag
---
## Post #14 by @thomas.beale
This post highlights one of the challenges with moving away from fully structured/atomised text representation (the old DV_PARAGRAPH approach) to narrative-oriented representation (the DV_TEXT markdown approach).
The way to think about these two representations is that DV_PARAGRAPH is something like a post-parse AST (abstract syntax tree) representation, i.e. the kind of in-memory structured object tree that results from parsing some text into pieces.
The markdown representation is a pre-parse representation.
Narrative is nice (or at least acceptable) for authoring and reading, but bad for computing; that's why you parse it.
Structure is great for computing, but annoying for authoring and reading.
Given that we have adopted a markdown narrative representation for lumps of text larger than a single atom - i.e. paragraphs, or 'lines of text' etc, we need a way to represent terminology mappings.
This could be done as links, as @ian.mcnicoll suggested below - the question is links to what? SNOMED URIs like `http://snomed.info/sct/id/1234567890` can be constructed, but they won't function like URLs, i.e. they are not really links.
Another approach would be to write them inline, to be processed by another layer, i.e. they don't constitute markdown as such. To achieve this, some means similar to markdown linking `[]` has to be used to establish which words the coding applies to.
This might be something like the following:
```
"^dysuria^[snomedct::49650001] warranting a urinary sediment to exclude a ^UTI^[snomedct::68566005]"
```
If the text you want is exactly the same as the preferred term, we could do:
```
"[snomedct::49650001|dysuria|] warranting a ..."
```
But the term for 68566005 is "urinary tract infection", not "UTI".
The above is not super-readable, so some better choice of syntax might be possible:
```
"/dysuria/[snomedct::49650001] warranting a urinary sediment to exclude a /urinary traction infection/[snomedct::68566005]"
```
I used 'urinary traction infection' as the text to emphasise the fact that the delimiters (here, `//`) are needed to indicate the text the mapping applies to.
I'm sure someone else can do better, but what we aim to do here with this kind of solution is:
* define (yet another) openEHR micro-syntax that allows terminology mappings to be represented in structured plain text that is not markdown (i.e. won't do something weird when seen by a markdown renderer) but can be reliably parsed into a structured form.
Edit: I tested the string above in the [CommonMark tester page](https://spec.commonmark.org/dingus/); it comes through OK.
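To show that such a micro-syntax can be "reliably parsed into a structured form", here is a small Python sketch for the first variant (`^text^[terminology::code]`). The regex and the tuple layout are assumptions made for illustration, not a proposed specification.

```python
import re
from typing import List, Tuple

# ^covered text^[terminology::code], following the first syntax proposed above.
TAG = re.compile(r"\^([^^]+)\^\[(\w+)::(\w+)\]")


def parse(text: str) -> Tuple[str, List[Tuple[str, str, str]]]:
    """Return (plain text, [(covered text, terminology, code), ...])."""
    mappings = [(m.group(1), m.group(2), m.group(3)) for m in TAG.finditer(text)]
    # Replace each tagged span by the covered text only.
    plain = TAG.sub(lambda m: m.group(1), text)
    return plain, mappings
```

The plain text is what a reader or a non-openEHR system would see; the list is what would feed the structured mappings.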
---
## Post #15 by @joostholslag
Thank you for your elaborate response. I also like that you proposed a syntax. It makes it easier to discuss.
What I don't like is that it's yet another micro syntax and yet another way of dealing with mappings in openEHR. And, as stated before, that it's less feature rich than TERM_MAPPINGs. But mostly that it will require a custom parser for DV_TEXT data instead of just treating it as markdown. Besides the coding work (we'll have to do some anyway), it gives ugliness if e.g. you sync the text to a non-openEHR system. You'll have to clean out the openEHR micro syntax. Or what if markdown introduces a different function for the openEHR unique syntax?
The thought that just came to mind: could we do it the other way around? Instead of annotating part of the text with a mapping, could we add an attribute to the TERM_MAPPING that's part of the DV_TEXT that records what part of the text the mapping refers to?
---
## Post #16 by @thomas.beale
[quote="joostholslag, post:15, topic:2007"]
What I don’t like is that it’s yet another micro syntax and yet another way of dealing with mappings in openEHR
[/quote]
Welcome to the (horrible) world of markdown ;)
[quote="joostholslag, post:15, topic:2007"]
could we add an attribute to the TERM_MAPPING that’s part of the DV_TEXT that records what part of the text the mapping refers to?
[/quote]
Theoretically yes. You'd have a single DV_TEXT for the sentence
"dysuria warranting a urinary sediment to exclude a UTI" with TERM_MAPPINGS carrying some sort of position data like:
mappings:
* [1]
  * charpos = |1..7|
  * target = [snomedct::49650001]
* [2]
  * charpos = |52..54|
  * target = [snomedct::68566005]
That would not be super-reliable, since people would get mixed up on whether the character positions referred to the number of characters in the output, or in the input (potentially full of other markdown text like links etc).
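The fragility is easy to demonstrate with a short Python sketch. It assumes 1-based, inclusive positions (one possible reading of the `|a..b|` notation); the helper names are hypothetical.

```python
from typing import Tuple


def charpos(text: str, phrase: str) -> Tuple[int, int]:
    """Return 1-based inclusive (start, end) of the first occurrence of `phrase`."""
    i = text.index(phrase)  # raises ValueError if the phrase is absent
    return i + 1, i + len(phrase)


def covered(text: str, pos: Tuple[int, int]) -> str:
    """Return the substring that a 1-based inclusive position range refers to."""
    start, end = pos
    return text[start - 1:end]


# The same sentence as rendered output and as markdown source.
plain = "dysuria warranting a urinary sediment to exclude a UTI"
marked_up = "**dysuria** warranting a urinary sediment to exclude a UTI"
```

Positions computed on `plain` no longer point at "UTI" when applied to `marked_up`, because the four `*` characters shift everything after them.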
Another possibility might be to adopt a kind of referencing/citation approach. E.g.
"dysuria[1] warranting a urinary sediment to exclude a UTI[2]"
The numbers [1] and [2] refer to the 1st and 2nd items in the `mappings` list. We assume that when there is a single word to have a mapped term, we just do "word[n]". If there are more words, then we need some delims, e.g. "dysuria[1] warranting a urinary sediment to exclude a (urinary traction infection)[2]"
We'd have to mess around to figure out which kind of brackets or other delimiters would work best, but this is at least readable, and would not even break the RM.
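A possible Python sketch of resolving such references, assuming `word[n]` for single words and `(several words)[n]` for phrases, with `n` a 1-based index into the mappings list. The choice of round brackets is just the one used in the example above, and the function name is hypothetical.

```python
import re
from typing import List, Tuple

# "word[n]" or "(several words)[n]"; n is a 1-based index into the mappings list.
REF = re.compile(r"(?:\(([^)]+)\)|(\w+))\[(\d+)\]")


def resolve(text: str, mappings: List[str]) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (plain text, [(covered text, mapping target), ...])."""
    pairs: List[Tuple[str, str]] = []

    def strip(m: re.Match) -> str:
        covered = m.group(1) or m.group(2)
        # An out-of-range reference raises IndexError, which flags broken data.
        pairs.append((covered, mappings[int(m.group(3)) - 1]))
        return covered

    plain = REF.sub(strip, text)
    return plain, pairs
```

Because the targets live in the ordinary `mappings` list, only the reference markers need a custom parser.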
---
## Post #17 by @ian.mcnicoll
I'll be honest and say right away that capturing these SNOMED codes like this without context, and just as context-free mappings, is probably a very bad idea!!
But (if you insist!!), why not just use something like my simple URI type of idea, and just lob the list of SNOMED codes into mappings against the whole DV_TEXT element. I don't think you really need character positions; the codes themselves align the correct positions. And if there are duplicate SNOMED codes - well so what!!
---
## Post #18 by @joostholslag
Yes, my first thought was something like the `charpos` as you described. But I agree it can get mixed up pretty easily. Maybe it could be solved in an acceptable way by implementers, since it doesn't have to be super reliable for today's use case. But I also like the other possibility, since mappings are conceptually quite close to citations, and it reuses the TERM_MAPPINGS. I agree it's very readable. Would there be a way not to break the markdown syntax: not just valid, but also something that makes sense without openEHR? There is the footnote syntax (`[^1]`), but it doesn't have delimiters [as far as I can tell](https://www.markdownguide.org/extended-syntax/). And we still have to go from footnote to TERM_MAPPING. We could put a URI in the footnote that references the term mapping.
But how about putting an Ehr_scheme uri in the url part of the markdown link syntax?
e.g.
`"[dysuria](ehr://system_id/ehr_id/top_level_structure_locator/path_inside_top_level_structure|mapping1) warranting a urinary sediment to exclude a [urinary tract infection](ehr://system_id/ehr_id/top_level_structure_locator/path_inside_top_level_structure|mapping2)"`
---
## Post #19 by @joostholslag
Why do you think it's a bad idea? Whole of SNOMED is without context, right? The value of the feature is clear right, would you solve it another way?
You could be right about not needing character positions. It indeed is mostly about the report as a whole that should be highlighted in a query. But you probably still would want to highlight the matching characters in the text. But that could be done by matching again when rendering the result of the query, may not be necessary to store the characters the original match relates to. But I struggle to accept that such a simple problem can not be solved well :roll_eyes:
I'm curious for the view of others, maybe @bna has an opinion?
---
## Post #20 by @ian.mcnicoll
[dysuria](openehr_mapping://snomed.info/sct/49650001) warranting a urinary sediment to exclude a [urinary tract infection](openehr_mapping://snomed.info/sct/431309003)
Bad idea - it is very, very easy for the wrong SNOMED codes to be picked up - this is a good example because actually this is arguably not a diagnosis at all, but an indication for an investigation.
Or better example
[Painless haematuria](openehr_mapping://snomed.info/sct/197938001) but warrants a urinary sediment to exclude a [urinary tract infection](openehr_mapping://snomed.info/sct/431309003)
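For what it is worth, turning such links into a flat list of codes for the whole text is straightforward. A Python sketch, assuming the `openehr_mapping://snomed.info/sct/<code>` form used in the examples above (that scheme is only a suggestion in this thread, not a registered one):

```python
import re
from typing import List, Tuple

# [text](openehr_mapping://snomed.info/sct/<code>); tolerates whitespace before the code.
MAPPING_LINK = re.compile(r"\[([^\]]+)\]\(openehr_mapping://snomed\.info/sct/\s*(\d+)\)")


def extract(markdown: str) -> Tuple[str, List[str]]:
    """Return (text with links unwrapped, SNOMED CT codes in reading order)."""
    codes = [m.group(2) for m in MAPPING_LINK.finditer(markdown)]
    plain = MAPPING_LINK.sub(lambda m: m.group(1), markdown)
    return plain, codes
```

The extraction is the easy part; whether the extracted code is the right one for the context is a separate question the parser cannot answer.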
---
## Post #21 by @thomas.beale
[quote="ian.mcnicoll, post:17, topic:2007"]
capturing these snomed codes like this without context, and just as context-free mappings, is probably a very bad idea!!.
[/quote]
In passing, I should note that I agree with this.
At an HL7 meeting years ago, I was asked by the very eminent lead of a research group that had produced a product that NLP-processed text notes and added SNOMED codes to them, each code connected to specific words in the text, to throw him an example.
I proposed: 'patient expresses a fear of lung cancer'.
The software, hitherto tested on thousands of notes at Mayo Clinic supposedly without error, coded for 'lung cancer'.
When it should have coded for anxiety.
I rest my case ;)
---
## Post #22 by @bna
This is a very interesting topic which we have visited many times over the last decade. Currently we are doing work on NLP capabilities for a smart editor. We call it "EHR Notes". The EHR could be a metaphor for "air" and also the EHR. What we want to achieve is an editing capability which feels as lightweight as air, and still have the power to detect structure in the content.
The editor works in the space above openEHR data. Since the content might address any type of clinical concept, it will have to be able to inject any type of archetype/template/clinical model, e.g. "Patient admitted with pain in left knee. Temp. 37 C, BP: 120/80."
We've learned from many research programs that the training of AI robots takes lots of time and resources. When finished, they only cover specific domains within health and care. The lesson learned from this is that the editor must be able to support different types of robots (functions, etc.). This is why we are exploring a way to define a generic API which takes a corpus as input and gives a structured result as output. The output will be handled by the editor to add links in the content and also create structured content for the openEHR CDR.
As many have commented above, this is a very complex problem and the specificity of the NLP functions is not that good. This is why we think of them as assistants. They are the newly educated doctor and you should treat them as such. The output from an NLP function should be considered as potential advice. We have to let the clinician be the one to decide if the advice is to be used.
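A sketch of what a generic "text in, structured suggestions out" contract could look like, in Python. Everything here (`Suggestion`, `NlpFunction`, `KeywordTagger`, `accepted`) is hypothetical and is not taken from the EHR Notes code base; the toy tagger merely stands in for a trained model.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class Suggestion:
    start: int  # 0-based offset of the matched span in the input text
    end: int    # exclusive end offset
    label: str  # what the function thinks the span is


class NlpFunction(Protocol):
    # Any "robot" only has to offer this one method.
    def annotate(self, text: str) -> List[Suggestion]: ...


class KeywordTagger:
    """Toy NlpFunction: case-insensitive exact keyword lookup."""

    def __init__(self, keywords: Dict[str, str]) -> None:
        self.keywords = {k.lower(): v for k, v in keywords.items()}

    def annotate(self, text: str) -> List[Suggestion]:
        lowered = text.lower()
        found: List[Suggestion] = []
        for word, label in self.keywords.items():
            i = lowered.find(word)
            while i != -1:
                found.append(Suggestion(i, i + len(word), label))
                i = lowered.find(word, i + 1)
        return sorted(found, key=lambda s: s.start)


def accepted(suggestions: List[Suggestion], keep: List[int]) -> List[Suggestion]:
    # Only suggestions the clinician explicitly confirmed are kept.
    return [s for n, s in enumerate(suggestions) if n in keep]
```

Keeping `accepted` as a separate step mirrors the point above: the function advises, the clinician decides.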
I will publish a video showing this kind of features later.
As part of this work I implemented a very simple NLP service. The source code is here: https://github.com/bjornna/ehrnotes-ask
The NLP engine is based on spaCy and is trained to do NER (named entity recognition). There are multiple input sources like: SNOMED-CT for anatomy, ICNP with its axis, a medication list, etc.
Currently we are involved in research programs to work out improved NLP functions. They will be trained and developed by real NLP experts. The source code above is done by an amateur (me). Still it works reasonably well.
---
## Post #23 by @bna
I finally found time to create and publish a video on our NLP based EHR Notes: https://twitter.com/bjornna/status/1456359961383026694?s=20
---
## Post #24 by @joostholslag
Hi Thomas,
As Ian remarked, the example I gave intentionally has an ambiguous SNOMED coding. Probably 314940005 Suspected urinary tract infection (situation) would have been better. But my point is, since SNOMED is only terminology, not information, an archetype indeed is the smallest unit of information. So when building queries you can never draw conclusions/compute based on only the SNOMED code. You'll need to check the SNOMED code is recorded as part of an EVALUATION.problem_diagnosis to compute that the patient has a UTI. But you can suggest to the user that a certain narrative report has 'something to do with' UTI. In this case it's not a big problem that the wrong SNOMED code was recorded. And there still is a lot of value here, right? Or how would you solve the problem I've shown with the prototype screenshot?
---
## Post #25 by @joostholslag
Hi Bjørn, thanks for the great prototype on EHR Notes, this is very similar to what we have in mind. Could you please share a bit more about how you technically record the mappings from the plain text in the note, to ICNP and openEHR OBSERVATIONs?
---
## Post #26 by @thomas.beale
[quote="joostholslag, post:24, topic:2007"]
You’ll need to check the snomed code is recorded as part of an EVALUATION.problem_diagnosis to compute that the patient has a UTI. But you can suggest to the user a certain narrative report has ‘something to do with’ UTI
[/quote]
Yes, within openEHR system environments whose software and models were written by semantically conscious people, this is all pretty reasonable.
Just be aware that when the data get sucked into some other environment, users there may make the assumption that the codes embedded in the data express the whole and true semantics of the data. If they do, incautious use of codes in our nice openEHR environment may have unintended consequences later.
This is not to say don't do it; just that this is the kind of risk being run. It might be a low / no risk.
---
## Post #27 by @joostholslag
Aah yes, that’s a valid concern. And another reason I hate mapping to outside systems. Assumptions that make sense in one system are crazy dangerous in another. I hope we can agree that in an openEHR system mapping (a piece of) a DV_TEXT in a EVALUATION.clinical_synopsis.synopsis to a snomed clinical finding, it’s not a diagnosis. (And I would argue the same for other SNOMED uses, a terminology is not a fully computable information system, why else would we need openEHR archetypes.)
Then I’m willing to take the risk other people do something stupid.
But having said that: could it help to add a char to TERM_MAPPING.match indicating an approximate match, for example `~`? This would make the intention of the mapping even clearer in openEHR. And if we did the URI as Ian suggested, as a markdown URL with the protocol set to `openehr_mapping://`, there is an indication for an implementer in another system to have a look at the information in the openEHR TERM_MAPPING class, and the `~` match should be a second warning not to assume too much.
---
## Post #28 by @mikael
No. Most parts of SNOMED CT have a context, although the context often is expressed as a default context. See for example [6.2.3. Default Context - Search and Data Entry Guide](https://confluence.ihtsdotools.org/display/DOCSEARCH/6.2.3.+Default+Context).
Btw, I think that the entire [SNOMED CT Search and Data Entry Guide](https://confluence.ihtsdotools.org/display/DOCSEARCH/SNOMED+CT+Search+and+Data+Entry+Guide) would be interesting for this discussion.
---
## Post #29 by @mikael
I think that you oversimplify SNOMED CT here. SNOMED CT is both a terminology and an ontology and you can therefore perfectly well interpret and draw conclusions based on the meaning of a SNOMED CT concept.
---
## Post #30 by @ian.mcnicoll
Hi Mikael,
I agree - my concern was not so much about the power of SNOMED CT itself but the ability of any NLP to correctly pick up the appropriate context and apply it, or associate other parts of attribution like dates. I know there has been a lot of interest in this approach and I have a UK colleague working on it - I'll see if I can get him to do a demo of their narrative-> SNOMED CT solution
---
## Post #31 by @joostholslag
Hi @mikael , interesting, I didn't know about snomed default contexts. Thank you for educating me.
I read the default context for a finding (e.g. UTI) to be:
```
* The finding has actually occurred (vs. being absent or not found).
* It is occurring to the subject of the record (the patient).
* It is occurring currently or at a stated past time.
```
But this still leaves out a lot of the context you need to be able to programmatically conclude a patient 'has' a UTI, e.g. is it a diagnosis? Who made the diagnosis (doctor/nurse/neighbour/facebook)? Is the diagnosis clinically significant or just a mild bacteriuria? Etc. etc.
Otherwise we wouldn't need information models at all, right?
The downside of this default context is that terms that do not match that context, like 'family history of UTI', are not codable in SNOMED.
The search and data entry guide sure seems interesting. Any recommendation on how to approach it? Aside from starting at page 1 and spending multiple weekend days before you end up at page 65? (a)
I do now better appreciate Ian's concern about automated SNOMED encoding. But this also goes for average users: they won't understand default context, which means the scope of usage of SNOMED is much smaller than I hoped.
---
## Post #32 by @mikael
Yes @ian.mcnicoll, I know that there are good NLP solutions that can tag text with SNOMED CT concepts and I agree that capturing the context is the hard(est) part of the process.
However, I have also seen less than good NLP solutions for SNOMED CT tagging that have missed the default context and similar SNOMED CT features. Hence my comments.
---
## Post #33 by @thomas.beale
[quote="mikael, post:32, topic:2007"]
However, I have also seen less than good NLP solutions for SNOMED CT tagging that have missed the default context and similar SNOMED CT features. Hence my comments
[/quote]
Keep your comments coming - I think we are somewhat out of date on some aspects of recent Snomed technology, so don't be afraid to correct us.
---
## Post #34 by @mikael
Hi @joostholslag,
Yes, you have understood the default context correctly. I also agree that SNOMED CT, despite the default context, leaves quite a lot to the information model to specify.
It is also perfectly fine to override the default context of the clinical finding and procedure concepts in SNOMED CT. That is why the context is only a default and not a stated context. However, I would argue that the override needs to be done in some machine-readable format. It would be perfectly fine to use the SNOMED CT concept `254837009 | Malignant neoplasm of breast (disorder) |` inside a `Family history` attribute in an archetype, and it would be formally interpreted as `Family history of malignant neoplasm of breast`. However, I would strongly advise against doing some tagging in free text like
`The patient has a family history of malignant neoplasm of breast.`
Because then the override of the default context would not be stored in a machine readable format and information systems would then, for good reasons, assume that the default context is present. This is the main reason why I think that we should be very careful with allowing partial tagging of free text.
It is true that quite few `Family history of X` concepts exist as stand-alone concepts in SNOMED CT. (Currently there are 680 of them. :smiley:) However, it is possible to use the [SNOMED CT Compositional Grammar](https://confluence.ihtsdotools.org/display/DOCSCG) to express `Family history of X` with a post-coordinated expression, like
```
416471007 | Family history of clinical finding (situation) | :
246090004 |Associated finding (attribute)| = 254837009 |Malignant neoplasm of breast (disorder)|
```
for all clinical findings and procedures.
(In this specific case, there actually exists a SNOMED CT concept that expresses this, `429740004 | Family history of malignant neoplasm of breast (situation) |`, and a classifier would automatically understand that this concept is semantically equivalent to the post-coordinated expression above.)
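Since the pattern is the same "for all clinical findings and procedures", it can be generated mechanically. A small Python sketch that only builds the expression text shown above; the helper names are hypothetical, it assumes whitespace is not significant in the grammar, and it does no validation against SNOMED CT itself.

```python
def concept(code: str, term: str) -> str:
    # Render a concept reference as "code |term|".
    return f"{code} |{term}|"


def family_history_of(finding: str) -> str:
    """Wrap an already rendered finding in the family-history expression shown above."""
    focus = concept("416471007", "Family history of clinical finding (situation)")
    attribute = concept("246090004", "Associated finding (attribute)")
    return f"{focus} : {attribute} = {finding}"
```

For example, `family_history_of(concept("254837009", "Malignant neoplasm of breast (disorder)"))` yields the expression from the post on one line.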
I haven't read the Search and Data Entry Guide for a while, but I think that chapter [6. Data Entry](https://confluence.ihtsdotools.org/display/DOCSEARCH/6.+Data+Entry) is the most relevant for this use case.
---
## Post #35 by @joostholslag
Hi Mikael, this helps a lot for me to better understand snomed. And it’s valuable advice to be careful with nlp tagging of free text. I understand the issue you present. But I’m curious what precautions do you take (on query, or otherwise) to let the query understand not to return a breast cancer if it’s in a family history archetype, do you use AQL, to filter only snomed findings in problem/diagnosis archetypes? If so we could do the same for clinical synopsis archetype, right?
And could we use the snomed composition grammar to do proper snomed encoding of free text with NLP?
I’m curious of actual use for querying datasets using snomed, it’s even harder than I thought to pick the right code. And I assume many errors are made in implementing systems? I’m quite sceptical our implementation is reliable now that I learn more.
---
## Post #36 by @mikael
Hi @joostholslag,
I am happy to help.
>But I’m curious what precautions do you take (on query, or otherwise) to let the query understand not to return a breast cancer if it’s in a family history archetype, do you use AQL, to filter only snomed findings in problem/diagnosis archetypes? If so we could do the same for clinical synopsis archetype, right?
My view is that a combination of AQL and [SNOMED CT Expression Constraint Language](https://confluence.ihtsdotools.org/display/DOCECL) is a good combination to query these kinds of content. And it can be used for all kinds of situations where the archetype specify a specific context, including clinical synopsis.
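The combination can be illustrated without a CDR or terminology server. In this Python sketch the archetype filter plays the role of the AQL part and a hard-coded code set stands in for the result of an ECL query; the records, the set contents and the function name are all hypothetical sample data.

```python
from typing import Dict, List, Set

# Hypothetical stand-in for the result of an ECL query such as "<< 254837009"
# (the concept and its descendants); a terminology server would supply this.
BREAST_CANCER_AND_DESCENDANTS: Set[str] = {"254837009"}

# Hypothetical flattened records: which archetype held the code.
RECORDS: List[Dict[str, str]] = [
    {"ehr": "A", "archetype": "openEHR-EHR-EVALUATION.problem_diagnosis.v1", "code": "254837009"},
    {"ehr": "B", "archetype": "openEHR-EHR-EVALUATION.family_history.v2", "code": "254837009"},
    {"ehr": "C", "archetype": "openEHR-EHR-EVALUATION.clinical_synopsis.v1", "code": "254837009"},
]


def ehrs_with(codes: Set[str], archetype: str) -> List[str]:
    """Return EHR ids where one of `codes` is recorded inside `archetype`."""
    return [r["ehr"] for r in RECORDS if r["archetype"] == archetype and r["code"] in codes]
```

The same code appears in all three records, but only the archetype context tells a diagnosis apart from a family history or a mention in a synopsis.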
> And could we use the snomed composition grammar to do proper snomed encoding of free text with NLP?
Yes, as long as we are careful. :slight_smile: If the example above would be changed to
`The patient has a family history of malignant neoplasm of breast.`
it would have been a correct free text tagging.
> I’m curious about actual use of SNOMED for querying datasets; it’s even harder than I thought to pick the right code. And I assume many errors are made in implementing systems? I’m quite sceptical that our implementation is reliable now that I have learned more.
Well, as usual in the healthcare sector, I assume that you need dedicated people with good knowledge of each method you use. But that also applies to openEHR. :slight_smile:
---
## Post #37 by @erik.sundvall
[quote="joostholslag, post:1, topic:2007"]
I know the proper openEHR way would be to do fully structured data where the `entry` is the smallest unit of information. So the dysuria should be a CLUSTER.symptom_sign and the UTI possibly a EVALUATION.differential_diagnoses. But I’m struggling to imagine a user interface that facilitates recording all narrative information in such a strictly structured way, without driving clinicians crazy.
[/quote]
Have a look at the PEN&PAD user interface from the 90's and pair it with currently available voice recognition and some other context-aware AI. Also look at the generated text summary in the upper right corner of PEN&PAD. https://youtu.be/PGEAmJJ4frU (Demo starts at 11:25)
---
## Post #38 by @joostholslag
Finally found time to read chapter 6 of the SNOMED guide. It now makes much more sense to me. The key takeaway for me is that the soft default context can be overruled by the information model, but the override must be computer processable. So a finding's default context can be overruled by using it in a family history archetype. But it can’t be overruled from free texts ppm, since computers cannot be assumed to understand that. I do hope all SNOMED implementers are aware of this, and that they don’t just collect a list of all SNOMED codes for a patient in a single db column (without specifying the information model context).
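
To make that last point concrete, here is a minimal Python sketch. It assumes each code is stored together with the archetype it was recorded in; the `CodedEntry` class, the `codes_in_context` helper, the archetype ids and the concept id are illustrative assumptions, not taken from any real system or from the guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedEntry:
    """A SNOMED CT code stored together with its information-model context."""
    archetype_id: str  # archetype the code was recorded in (illustrative ids below)
    code: str          # SNOMED CT concept id, kept as a string


# Hypothetical records for one patient: the same finding code appears twice,
# once as the patient's own diagnosis and once inside a family history entry.
ENTRIES = [
    CodedEntry("openEHR-EHR-EVALUATION.problem_diagnosis.v1", "254837009"),
    CodedEntry("openEHR-EHR-EVALUATION.family_history.v2", "254837009"),
]


def codes_in_context(entries, archetype_prefix):
    """Return only the codes recorded under archetypes matching the prefix."""
    return [e.code for e in entries if e.archetype_id.startswith(archetype_prefix)]


# A flat list of codes loses the context: both entries look like the same fact.
flat_codes = [e.code for e in ENTRIES]

# Filtering on the archetype keeps only what was recorded about the patient.
own_diagnoses = codes_in_context(ENTRIES, "openEHR-EHR-EVALUATION.problem_diagnosis")
```

In a real system the filter would presumably live in the query (for example the archetype path in AQL) rather than in application code; the sketch only shows why the context has to be stored next to the code.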
---
## Post #39 by @nuno.abreu
Hi
You can only override the default context if the adopted context is compatible. In “Family History”, clinical findings are not stated to have occurred in the patient, so the meaning of the concept used can be critically affected. The [editorial guide](https://confluence.ihtsdotools.org/display/DOCEG/Situation+with+Explicit+Context+Modeling) has a reference explaining this situation.

---
## Post #40 by @mikael
Thank you! (I haven't read the Editorial guide for a while, so I didn't know that it had been clarified.)
---
## Post #41 by @joostholslag
Dear Nuno, thank you for your reply.
I’m struggling to understand the piece of the editorial guideline. Am I to understand that contrary to earlier conclusions by @mikael and me from the snomed data search and entry guide, it’s not acceptable to record findings in a family history field?
---
## Post #42 by @mikael
[quote="joostholslag, post:41, topic:2007, full:true"]
Am I to understand that contrary to earlier conclusions by @mikael and me from the snomed data search and entry guide, it’s not acceptable to record findings in a family history field?
[/quote]
No. It seems only to be a more complicated way of expressing what we already had concluded.
---
## Post #43 by @mikael
Maybe @joostholslag and others would be interested in this free webinar about "SNOMED CT Terminology Binding - a state-of-the Art Review with Recommendations for Practice and Research" that is scheduled for 2022-01-19 15:00 UTC? More information can be found on the page [SNOMED - Events](https://www.snomed.org/news-and-events/events/web-series) if you scroll down to the "Upcoming Research Webinars" heading.
---
## Post #44 by @nuno.abreu
Hi @joostholslag
It depends on how you want to use it. You can use it as a value set for the user to pick from, but the expression recorded has to be similar to the example shown by @mikael with the compositional grammar. I think we should use precoordinated terms if available, but it's virtually impossible to cover all the needs. So, having the possibility to use compositional grammar seems a good way to go.
---
## Post #45 by @joostholslag
[quote="thomas.beale, post:16, topic:2007"]
mappings:
* [1]
* charpos = |1…7|
* target = [snomedct::49650001]
* [2]
* charpos = |51…53|
* target = [snomedct::68566005]
[/quote]
Just came across this: Amazon does something similar in their snomed omop API:
> * **BeginOffset and EndOffset** –The beginning and ending location of the text in the input note, respectively.
https://aws.amazon.com/blogs/machine-learning/clinical-text-mining-using-the-amazon-comprehend-medical-new-snomed-ct-api/
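
As a rough illustration of the offset idea, here is a minimal Python sketch that attaches term mappings to character spans of a single narrative string. The `SpanMapping` class and both helpers are hypothetical, and the 0-based, end-exclusive offset convention is an assumption of this sketch, not something taken from the openEHR proposal or the Amazon API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanMapping:
    """A term mapping attached to a character span of a narrative text."""
    begin: int   # 0-based, inclusive (convention assumed for this sketch)
    end: int     # exclusive
    target: str  # e.g. "snomedct::49650001"


def map_term(text, term, target):
    """Create a mapping for the first occurrence of `term` in `text`."""
    begin = text.find(term)
    if begin < 0:
        raise ValueError(f"{term!r} not found in text")
    return SpanMapping(begin, begin + len(term), target)


def annotate(text, mappings):
    """Render the text with each mapped span followed by its target in brackets."""
    parts = []
    pos = 0
    for m in sorted(mappings, key=lambda m: m.begin):
        parts.append(text[pos:m.begin])
        parts.append(f"{text[m.begin:m.end]} [{m.target}]")
        pos = m.end
    parts.append(text[pos:])
    return "".join(parts)


TEXT = ("The patient has shown signs of dysuria warranting "
        "a urinary sediment to exclude a UTI.")
MAPPINGS = [
    map_term(TEXT, "dysuria", "snomedct::49650001"),
    map_term(TEXT, "UTI", "snomedct::68566005"),
]

print(annotate(TEXT, MAPPINGS))
```

The narrative stays a single string, so it can still be shown to the clinician as written, while the spans let software find the coded fragments.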
---
**Canonical:** https://discourse.openehr.org/t/semi-structured-narrative-data/2007