# Markdown in archetype definitions?

**Category:** [Specifications](https://discourse.openehr.org/c/specifications/6)
**Created:** 2025-01-09 10:17 UTC
**Views:** 286
**Replies:** 22
**URL:** https://discourse.openehr.org/t/markdown-in-archetype-definitions/6133

---

## Post #1 by @siljelb

Hi all!

Are there any plans for including markdown in archetype definitions? In general it would be very useful to be able to format the larger text blocks of a lot of archetypes, but my main use case right now is to be able to use subscript for things other than numbers. For example in cancer staging with molecular classification, the molecular classification part is often styled in subscript, for example "IAmPOLEmut".

---

## Post #2 by @joostholslag

Anything inside a dv_text can be markdown. Do you mean the name of an element? That inherits from locatable, the name is dv_text, so markdown should be possible, right?

---

## Post #3 by @siljelb

In the archetype *definition*, not in the data. In my example it would be in the description or comment of a data element, or in 'Use'. I don't think it's a good idea to use markdown in a data element *name* 😅

---

## Post #4 by @ian.mcnicoll

It was discussed before with mixed views. I think it is a good idea, especially if we stick to the GitHub Markdown limits used by FHIR (no tables).

The main issue would clearly be in tooling - @sebastian.garde and I discussed the impact and it is likely to be doable in CKM without too much disruption, especially if, as an interim, we flagged an archetype as potentially containing Markdown in the descriptions/comments etc.

I think it is particularly useful in PROMS to be able to add simple formatting like bold/italics etc. to help keep aligned with formatting rules.

---

## Post #5 by @siljelb

I'd be happy with the [Github limits](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax) :raised_hands:

---

## Post #6 by @sebastian.garde

For DV_TEXT the spec says that for markdown the "use of CommonMark [is] strongly recommended". GitHub Flavoured Markdown is a bit more powerful - it has a few extensions (such as tables, strikethrough, autolinks).

From FHIR https://build.fhir.org/datatypes.html#markdown:

> ...requires and uses the [GFM (Github Flavored Markdown)](https://github.github.com/gfm/) extensions on [CommonMark](http://spec.commonmark.org/0.28/) format, with the exception of support for inline HTML which is not supported.

I guess nobody is stopping anyone from putting markdown in the archetype in these places (yes, probably not in the data element name :-) !) It is just a matter of how this is interpreted and rendered everywhere. But even if it is just rendered as plain text by all or some tooling at the moment, it may be ok.

There are likely some edge cases where things may go wrong if special chars such as *, &, #, [, <, >, and ' are used and we don't know for sure if this is meant to be markdown or plain text.

Whether tables, strikethrough etc. or not, I certainly agree with FHIR that inline HTML elements should not be used. Tables in use, purpose, misuse etc. may be ok; in ontology/description or comment fields they may be a bit over the top (also considering typical screen real estate).

---

## Post #7 by @siljelb

I don't think tables, headings or strikethrough are particularly relevant, and underline is IMHO a legacy thing from before we had bold and italic. I think the following are likely relevant though:

1. Numbered lists

* Bullet lists

*italic text*
**bold text**
***bold italic text***
superscript
subscript
[links](https://openehr.org)
`inline preformatted text`

The only ones of these that require html tags are subscript and superscript. Could those be allowed specifically, without allowing any other html tags?

---

## Post #8 by @ian.mcnicoll

The other html tag might be underline - quite common in PROMS documents. It is probably not a good idea visually because of the confusion with links but we can't control what the PROMS authors write.

---

## Post #9 by @grahamegrieve

There's a lot of legacy italics and underlining out there, so the GFM extensions are useful.

With regard to markdown, special chars such as *, &, #, [, <, >, and ’ appear in some units etc, and there's been some problems in the FHIR ecosystem with people seeing something that might be markdown, and not being sure. I wrote this routine for use in the java implementation community:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Note: Utilities (used in isPunctation below) is an external helper class that is not shown in this post.

/**
 * Returns true if this is intended to be processed as markdown
 *
 * this is a guess, based on textual analysis of the content.
 *
 * Uses of this routine:
 *   In general, the main use of this is to decide to escape the string so erroneous markdown processing doesn't munge characters
 *   If it's a plain string, and it's being put into something that's markdown, then you should escape the content
 *   If it's markdown, but you're not sure whether to process it as markdown
 *
 * The underlying problem is that markdown processing plain strings is problematic because some technical characters might
 * get lost. So it's good to escape them... but if it's meant to be markdown, then it'll get trashed.
 *
 * This method works by looking for character patterns that are unlikely to occur outside markdown - but it's still only unlikely
 *
 * @param content
 * @return
 */
// todo: dialect dependency?
public boolean isProbablyMarkdown(String content, boolean mdIfParagrapghs) {
  if (content == null) {
    return false;
  }
  if (mdIfParagrapghs && content.contains("\n")) {
    return true;
  }
  String[] lines = content.split("\\r?\\n");
  for (String s : lines) {
    if (s.startsWith("* ") || isHeading(s) || s.startsWith("1. ") || s.startsWith(" ")) {
      return true;
    }
    if (s.contains("```") || s.contains("~~~") || s.contains("[[[")) {
      return true;
    }
    if (hasLink(s)) {
      return true;
    }
    if (hasTextSpecial(s, '*') || hasTextSpecial(s, '_')) {
      return true;
    }
  }
  return false;
}

private boolean isHeading(String s) {
  if (s.length() > 7 && s.startsWith("###### ") && !Character.isWhitespace(s.charAt(7))) {
    return true;
  }
  if (s.length() > 6 && s.startsWith("##### ") && !Character.isWhitespace(s.charAt(6))) {
    return true;
  }
  if (s.length() > 5 && s.startsWith("#### ") && !Character.isWhitespace(s.charAt(5))) {
    return true;
  }
  if (s.length() > 4 && s.startsWith("### ") && !Character.isWhitespace(s.charAt(4))) {
    return true;
  }
  if (s.length() > 3 && s.startsWith("## ") && !Character.isWhitespace(s.charAt(3))) {
    return true;
  }
  //
  // not sure about this one. # [string] is something that could easily arise in non-markdown,
  // so this appearing isn't enough to call it markdown
  //
  // if (s.length() > 2 && s.startsWith("# ") && !Character.isWhitespace(s.charAt(2))) {
  //   return true;
  // }
  return false;
}

private boolean hasLink(String s) {
  int left = -1;
  int mid = -1;
  for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
    if (c == '[') {
      mid = -1;
      left = i;
    } else if (left > -1 && i < s.length()-1 && c == ']' && s.charAt(i+1) == '(') {
      mid = i;
    } else if (left > -1 && c == ']') {
      left = -1;
    } else if (left > -1 && mid > -1 && c == ')') {
      return true;
    } else if (mid > -1 && c == '[' || c == ']' || (c == '(' && i > mid+1)) {
      left = -1;
      mid = -1;
    }
  }

  // Detect autolinks, which should start with a scheme, followed by a colon, followed by some content. Whitespace
  // is not allowed and for practical purposes, the scheme is considered to consist of lowercase ASCII characters
  // only.
  Pattern autolinkPattern = Pattern.compile("<[a-z]+:[^\\s]+>");
  Matcher autolinkMatcher = autolinkPattern.matcher(s);
  return autolinkMatcher.find();
}

private boolean hasTextSpecial(String s, char c) {
  boolean second = false;
  for (int i = 0; i < s.length(); i++) {
    char prev = i == 0 ? ' ' : s.charAt(i-1);
    char next = i < s.length() - 1 ? s.charAt(i+1) : ' ';
    if (s.charAt(i) != c) {
      // nothing
    } else if (second) {
      if (Character.isWhitespace(next) && (isPunctation(prev) || Character.isLetterOrDigit(prev))) {
        return true;
      }
      second = false;
    } else {
      if (Character.isWhitespace(prev) && (isPunctation(next) || Character.isLetterOrDigit(next))) {
        second = true;
      }
    }
  }
  return false;
}

private boolean isPunctation(char ch) {
  return Utilities.existsInList(ch, '.', ',', '!', '?');
}
```

---

## Post #10 by @ian.mcnicoll

@sebastian.garde and I had a wee look at this issue and the latest GFM does allow some limited HTML tags to support underline, subscript, superscript.

Supporting MD technically in tooling is not particularly an issue, and if more complex constructs like tables are excluded, any MD text will still remain very human readable.

As @grahamegrieve has said in his post, the most likely area of confusion is where a new or existing archetype has some characters which mimic MD markup but incompletely, e.g. `**hello`, which will confuse an MD parser that is looking for the terminating ** to signify bold.

Perhaps we could introduce support as follows:

1. A new tag in top-level other_details to signify `usesMarkdown`
2. Restrict use, for now, to Use, Misuse, Purpose
3. Add a new node-level annotation that can be used for licensed PROMS text but which allows markdown, and if present, is used in tooling/UI instead of description.

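As a rough illustration of how a `usesMarkdown` marker could interact with the escaping concern raised earlier in the thread, here is a minimal sketch. It assumes that other_details is available to the tool as a string-to-string map and that the key is literally `usesMarkdown`; the class name `DescriptionText` and both method names are hypothetical and not part of any openEHR tooling.

```java
import java.util.Map;

final class DescriptionText {
    // ASCII punctuation that commonly triggers Markdown markup. CommonMark allows
    // any ASCII punctuation character to be backslash-escaped.
    private static final String SPECIALS = "\\`*_[]<>#&~^";

    // Prefix every special character with a backslash so a Markdown renderer
    // shows the text literally.
    static String escapePlain(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // "usesMarkdown" is an assumed key name; a missing key is treated as plain text.
    static String toMarkdownSource(String text, Map<String, String> otherDetails) {
        boolean usesMarkdown =
            Boolean.parseBoolean(otherDetails.getOrDefault("usesMarkdown", "false"));
        return usesMarkdown ? text : escapePlain(text);
    }
}
```

With this sketch, an unflagged archetype containing `**hello` would be handed to the renderer as `\*\*hello`, while a flagged one would be passed through unchanged.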
We can also almost certainly run current CKM archetypes through markdown validation to see which are potentially problematic, and in future, I think MD should be the default for tooling, i.e. use of MD reserved characters is flagged.

I should add that MD should never be supported in Node name/values.

---

## Post #11 by @siljelb

[quote="ian.mcnicoll, post:10, topic:6133"]
MD should never be supported in Node name/values
[/quote]

Is this another limitation resulting from the flat json format?

---

## Post #12 by @ian.mcnicoll

In part yes, but whatever the merits of FLAT format or otherwise, I feel we need to keep the node name as being brief and 'semi-technical' to allow it to be tokenised to a more technical format. AQL node name aliases are another use-case. Do we allow Tables or Links in Node names?

If there's a need to somehow support e.g. bold or italic for UI purposes, I would far prefer to use a separate annotation that could be picked up by UI tools.

---

## Post #13 by @grahamegrieve

You also said node value (as opposed to node name) which was more of a surprise, and I don't think you've answered that part.

---

## Post #14 by @ian.mcnicoll

I was not wrong, just confusing!!

The attribute we are talking about is actually ELEMENT/name/value which is inherited from LOCATABLE.

https://neoehr.com/openehr/uml/rm110/c/element

ELEMENT/name/value carries the textual representation of the node name, whereas the ELEMENT value is carried at ELEMENT/value.

---

## Post #15 by @thomas.beale

[quote="siljelb, post:1, topic:6133"]
Are there any plans for including markdown in archetype definitions? In general it would be very useful to be able to format the larger text blocks of a lot of archetypes, but my main use case right now is to be able to use subscript for things other than numbers.
[/quote]

There's nothing to stop it right now - if there were a common agreement across all tools to accept some flavour / subset of markdown, as discussed below (personally I would also allow tables). Remember that the idea of markdown is that you can read it in its raw form as well as the rendered form (although links are pretty annoying in raw form). So a tool that didn't support rendering should still just display the raw form with no problems.

The meta-data fields you are talking about are covered by the [ODIN spec for string data](https://specifications.openehr.org/releases/LANG/development/odin.html#_string_data). This allows nearly anything but does have a couple of rules:

* [Section 3.2](https://specifications.openehr.org/releases/LANG/development/odin.html#_special_character_sequences): some old skool backslash quoting is allowed, because everyone knows it e.g. '\t' etc;
* [Section 3.1](https://specifications.openehr.org/releases/LANG/development/odin.html#_file_encoding): non-ASCII chars have to be represented with UTF8 char strings

There's nothing to say we couldn't change any of these rules today, but I think they are sufficient to allow Markdown.

According to most markdown documentation, subscript and superscript are done with either `<sub>` and `<sup>` or (as with [Asciidoc](https://docs.asciidoctor.org/asciidoc/latest/text/subscript-and-superscript/) and [pandoc](https://pandoc.org/MANUAL.html#superscripts-and-subscripts)) a pair of tilde characters for subscript and a pair of caret chars for superscript. Aesthetically, much nicer.

Raw form:

```
"`Well the H~2~O formula written on their whiteboard could be part of a shopping list, but I don't think the local bodega sells E=mc^2^,`" Lazarus replied.
```

Rendered:

"Well the H₂O formula written on their whiteboard could be part of a shopping list, but I don't think the local bodega sells E=mc²," Lazarus replied.

I would allow both forms.

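To show what supporting the tilde/caret form might involve in a tool whose Markdown library lacks such an extension, here is a minimal sketch that rewrites `~x~` and `^x^` into `<sub>`/`<sup>` tags before rendering. The class `SubSupSyntax` and the method `toHtmlTags` are hypothetical names, and the rule that the marked text contains no whitespace is an assumption borrowed loosely from pandoc's behaviour, not something agreed in this thread.

```java
import java.util.regex.Pattern;

final class SubSupSyntax {
    // One ~...~ pair with no whitespace or tildes inside. The lookarounds
    // leave GFM-style ~~strikethrough~~ untouched.
    private static final Pattern SUB = Pattern.compile("(?<!~)~([^~\\s]+)~(?!~)");

    // One ^...^ pair with no whitespace or carets inside.
    private static final Pattern SUP = Pattern.compile("\\^([^\\^\\s]+)\\^");

    // Rewrites tilde/caret markup into <sub>/<sup> tags; all other text is unchanged.
    static String toHtmlTags(String text) {
        String withSub = SUB.matcher(text).replaceAll("<sub>$1</sub>");
        return SUP.matcher(withSub).replaceAll("<sup>$1</sup>");
    }
}
```

For example, `SubSupSyntax.toHtmlTags("H~2~O")` yields `H<sub>2</sub>O`, and a string such as `~~struck~~` passes through unchanged.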
Since we expect to replace ODIN in archetypes with JSON or YAML in the near future, we would want to make sure we know how to process ODIN encapsulated Markdown into those formats.

---

## Post #16 by @varntzen

[quote="sebastian.garde, post:6, topic:6133"]
I guess nobody is stopping anyone from putting markdown in the archetype in these places (yes, probably not in the data element name :slight_smile: !)
[/quote]

Why not in data element name? It's in the https://ckm.openehr.org/ckm/archetypes/1013.1.393

![image|490x53](upload://3yN4QUnYfvPJ4839AO9EGtM8DLt.png)

---

## Post #17 by @sebastian.garde

[quote="varntzen, post:16, topic:6133"]
Why not in data element name? It’s in the [Cluster Archety](https://ckm.openehr.org/ckm/archetypes/1013.1.393)
[/quote]

Hah - I was sure someone would find an example! This one though is simply a Unicode char I think: https://unicodeplus.com/U+2082

---

## Post #18 by @sebastian.garde

[quote="ian.mcnicoll, post:10, topic:6133"]
which mimic MD markup but incompletely e.g `**hello` which will confuse MD parser which is looking for the terminating ** to signify bold.
[/quote]

Is this really a problem? This would simply not be displayed in bold because that is not what it says, but exactly as if it were not markdown: `**hello`. (It is not like an xml tag that must be closed to be valid xml.)

Another somewhat common thing is that there are some archetypes using three hyphens for a level underneath (see e.g. https://ckm.openehr.org/ckm/archetypes/1013.1.4439):

\- first level
\-\-\- second level

The second level would display as \-\-\- in both markdown and non-markdown, whereas the first level is rendered as a list.

- first level
\-\-\- second level

Not harmful, and easily changed to use indentation for the second level if desired.

Grahame's algorithm to decide whether it is markdown or not is pretty good.
It has the (side) effect that if the *only* markdown you use are hyphens for lists (see first level example above), then this would be interpreted as NOT markdown (if I read it correctly). This would be fairly common with current archetypes. If something that is more clearly markdown is added, the hyphen would then be suddenly interpreted as markdown. But again, whether a hyphen at the start of a line is displayed as a hyphen or a dot is probably not a big problem. We could also include it in the algorithm I assume, but either way it is an interpretation.

Unless of course, we add Ian's suggested uses_markdown marker to the other_details field. It can be done, but I haven't convinced myself yet that it is really necessary, especially as long as this is only for use, misuse, purpose.

My question is whether anybody has any more compelling real-life examples that make a per archetype 'uses_markdown' flag a necessity?

---

## Post #19 by @ian.mcnicoll

Sure - I'm probably over-thinking this !! I'm fairly confident that any downstream issue is likely to be very rare and at worst very manageable.

---

## Post #20 by @thomas.beale

[quote="sebastian.garde, post:18, topic:6133"]
Unless of course, we add Ian’s suggested uses_markdown marker to the other_details field. It can be done, but I haven’t convinced myself yet that it is really necessary, especially as long as this is only for use,misuse, purpose.
[/quote]

The problem with such a field is that it might not be populated for description text that its authors think of (in principle) as markdown, but which happens to use no markdown syntax so far. Maybe the tool has some visual switch 'use markdown' that the user clicks, and thinks they are using, but they actually create no markdown elements. But at some point in time a modification to the text will contain markdown bullet lists or whatever. What will make them remember to set the 'uses_markdown' tag? Most likely an algorithm in the tool that detects markdown, and suggests to set that field.
But if there is an algorithm that does this well, I would use it differently: use it to suggest corrections to make the text proper markdown, e.g. to replace dashes in bullet lists with asterisks or whatever. And more obviously tools should have a rendering window (like the Asciidoctor plugin for IntelliJ, or Github etc.), so the author can see the rendered form of their text.

So I would not be inclined to bother with an extra meta-data field that I think would be tricky to maintain.

---

## Post #21 by @sebastian.garde

Looks like we are getting close here.

My suggestion:

1. Support GFM in `purpose`, `use`, `misuse`.
2. Use the algorithm to check if anything that is found may be markdown and if so render these fields accordingly when displayed. I.e. no special markdown flag.
3. For simplicity and clarity, I would stick with what GFM (and typical libraries) supports out of the box as much as possible.
4. Use markdown judiciously in these fields. Less may be more, especially as all numbers exist as superscript and subscript in Unicode and thus won't require markup; some selected letters as well (those that are typically used in math formulas).
5. I don't expect many problems with existing archetypes but if any come up the fields would need to be updated, probably very slightly so.

Based on that suggestion this leaves the following questions:

1. Should we support superscript via surrounding carets (^test^) and subscript via surrounding tildes (~test~)? This is typically available via extensions to GFM and could be enabled if desired.
2. Related, but maybe less compelling: should we support strikethrough as well via double tilde: \~~This text can be interpreted as strikethrough.~~
3. Should we allow the GFM supported HTML tags? This can typically be turned on or off (as a whole, not per tag). This relates to the set of raw HTML tags that GFM permits. The only somewhat compelling reason to have it is for an underline tag.

My opinion on these questions is:

1. Yes - support subscript and superscript using ^ and ~
2. Maybe - strikethrough is far less compelling and slightly dangerous (either way)
3. No. Why use markdown if what you really want to do is html? The advantage of markdown is its human readability without being too distracting if it is just rendered as plain text.

---

## Post #22 by @ian.mcnicoll

Based on discussions re PROMS I would also add Description to the list. I think we might find that over time we actually extend this e.g. to annotations.

I'm fine with GFM. We will need underline and sub/superscript (for some units), though I agree numerics should be Unicode.

I agree we should not allow strikethrough, and am happy to use the caret/tildes.

I agree we should avoid HTML tags but the only tricky area is which I think is supported in FHIR. I'd limit the HTML to

Getting really close.

---

## Post #23 by @sebastian.garde

[quote="ian.mcnicoll, post:22, topic:6133"]
I would also add Description
[/quote]

You mean all descriptions in the ontology?

[quote="ian.mcnicoll, post:22, topic:6133"]
I agree we should avoid HTML tags but the only tricky area is which I think is supported in FHIR. I’d limit the HTML to
[/quote]

Not sure what you want to limit the html to, but in any case, I think this needs to be either all the GFM supported ones or none. You don't have to use anything other than \ of course (or whatever you had in mind), but starting to define and then implement specific extensions in all tools to support a different subset seems overkill.

---

**Canonical:** https://discourse.openehr.org/t/markdown-in-archetype-definitions/6133