Are there any plans for including markdown in archetype definitions? In general it would be very useful to be able to format the larger text blocks of a lot of archetypes, but my main use case right now is to be able to use subscript for things other than numbers.
In cancer staging with molecular classification, the molecular classification part is often styled in subscript, for example “IAmPOLEmut”.
Anything inside a dv_text can be markdown. Do you mean the name of an element? That inherits from locatable, and the name is a dv_text, so markdown should be possible, right?
In the archetype definition, not in the data. In my example it would be in the description or comment of a data element, or in ‘Use’. I don’t think it’s a good idea to use markdown in a data element name.
It was discussed before with mixed views. I think it is a good idea, especially if we stick to the GitHub Markdown limits used by FHIR (no tables).
The main issue would clearly be in tooling. @sebastian.garde and I discussed the impact and it is likely to be doable in CKM without too much disruption, especially if, as an interim measure, we flagged an archetype as potentially containing Markdown in the descriptions, comments etc.
I think it is particularly useful in PROMS to be able to add simple formatting like bold, italics etc. to help keep aligned with formatting rules.
I guess nobody is stopping anyone from putting markdown in the archetype in these places (yes, probably not in the data element name!).
It is just a matter of how this is interpreted and rendered everywhere. But even if it is just rendered as plain text by all or some tooling at the moment, it may be ok.
There are likely some edge cases where things may go wrong if special chars such as *, &, #, [, <, >, and ’ are used and we don’t know for sure if this is meant to be markdown or plain text.
Whether tables, strikethrough etc. or not, I certainly agree with FHIR that inline HTML elements should not be used.
Tables in use, purpose, misuse etc. may be ok, in ontology/description or comment fields they may be a bit over the top (also considering typical screen real estate).
I don’t think tables, headings or strikethrough are particularly relevant, and underline is IMHO a legacy thing from before we had bold and italic.
I think the following are likely relevant though:
Numbered lists
Bullet lists
italic text
bold text
bold italic text
superscript
subscript
links
inline preformatted text
The only ones of these that require html tags are subscript and superscript. Could those be allowed specifically, without allowing any other html tags?
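One way to picture "sub and sup only" is an allow-list filter that escapes every other HTML-looking tag before the text reaches a markdown renderer. The following is a minimal sketch, not anything CKM or another tool actually does: the class name `SubSupFilter` and the method `allowOnlySubSup` are hypothetical, and tags carrying attributes are escaped as well on the assumption that only bare `<sub>` and `<sup>` are wanted. Autolinks such as `<https://...>` are deliberately left alone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: escapes every HTML tag except bare sub and sup tags. */
class SubSupFilter {
    // Anything that looks like an opening or closing tag. The possessive name
    // match plus (?!:) skips markdown autolinks such as <https://example.org>.
    private static final Pattern TAG =
        Pattern.compile("</?([A-Za-z][A-Za-z0-9]*+)(?!:)[^<>]*>");

    static String allowOnlySubSup(String text) {
        Matcher m = TAG.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            String tag = m.group();
            String name = m.group(1);
            String lower = name.toLowerCase();
            // "Bare" means no attributes: exactly <name> or </name>.
            boolean bare = tag.equals("<" + name + ">") || tag.equals("</" + name + ">");
            if (bare && (lower.equals("sub") || lower.equals("sup"))) {
                out.append(tag); // allowed, passed through unchanged
            } else {
                out.append(tag.replace("<", "&lt;").replace(">", "&gt;"));
            }
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }
}
```

With this sketch, a staging value written as `IA<sub>mPOLEmut</sub>` (assuming that is how the subscript part would be marked up) passes through unchanged, while `<b>` or `<script>` would be shown as literal text.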
The other html tag might be underline, which is quite common in PROMS documents. It is probably not a good idea visually because of the confusion with links, but we can't control what the PROMS authors write.
There's a lot of legacy italics and underlining out there, so the GFM extensions are useful.
With regard to markdown, special chars such as *, &, #, [, <, >, and ’ appear in some units etc., and there have been some problems in the FHIR ecosystem with people seeing something that might be markdown, and not being sure. I wrote this routine for use in the Java implementation community:
/**
 * Returns true if this is intended to be processed as markdown.
 *
 * This is a guess, based on textual analysis of the content.
 *
 * Uses of this routine:
 *   In general, the main use of this is to decide whether to escape the string so erroneous markdown processing doesn't munge characters
 *   If it's a plain string, and it's being put into something that's markdown, then you should escape the content
 *   If it's markdown, but you're not sure whether to process it as markdown
 *
 * The underlying problem is that markdown processing plain strings is problematic because some technical characters might
 * get lost. So it's good to escape them... but if it's meant to be markdown, then it'll get trashed.
 *
 * This method works by looking for character patterns that are unlikely to occur outside markdown - but it's still only unlikely
 *
 * @param content the text to inspect
 * @param mdIfParagrapghs if true, any content containing a line break is treated as markdown
 * @return true if the content is probably markdown
 */
// todo: dialect dependency?
public boolean isProbablyMarkdown(String content, boolean mdIfParagrapghs) {
  if (content == null) {
    return false;
  }
  if (mdIfParagrapghs && content.contains("\n")) {
    return true;
  }
  String[] lines = content.split("\\r?\\n");
  for (String s : lines) {
    // bullet, heading, numbered list, or a four-space indent (indented code block)
    if (s.startsWith("* ") || isHeading(s) || s.startsWith("1. ") || s.startsWith("    ")) {
      return true;
    }
    if (s.contains("```") || s.contains("~~~") || s.contains("[[[")) {
      return true;
    }
    if (hasLink(s)) {
      return true;
    }
    if (hasTextSpecial(s, '*') || hasTextSpecial(s, '_')) {
      return true;
    }
  }
  return false;
}
private boolean isHeading(String s) {
  if (s.length() > 7 && s.startsWith("###### ") && !Character.isWhitespace(s.charAt(7))) {
    return true;
  }
  if (s.length() > 6 && s.startsWith("##### ") && !Character.isWhitespace(s.charAt(6))) {
    return true;
  }
  if (s.length() > 5 && s.startsWith("#### ") && !Character.isWhitespace(s.charAt(5))) {
    return true;
  }
  if (s.length() > 4 && s.startsWith("### ") && !Character.isWhitespace(s.charAt(4))) {
    return true;
  }
  if (s.length() > 3 && s.startsWith("## ") && !Character.isWhitespace(s.charAt(3))) {
    return true;
  }
  //
  // not sure about this one. # [string] is something that could easily arise in non-markdown,
  // so this appearing isn't enough to call it markdown
  //
  // if (s.length() > 2 && s.startsWith("# ") && !Character.isWhitespace(s.charAt(2))) {
  //   return true;
  // }
  return false;
}
// Needs java.util.regex.Pattern and java.util.regex.Matcher to be imported.
private boolean hasLink(String s) {
  int left = -1; // index of the most recent '['
  int mid = -1;  // index of the "](" that follows it
  for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
    if (c == '[') {
      mid = -1;
      left = i;
    } else if (left > -1 && i < s.length()-1 && c == ']' && s.charAt(i+1) == '(') {
      mid = i;
    } else if (left > -1 && c == ']') {
      left = -1;
    } else if (left > -1 && mid > -1 && c == ')') {
      return true;
    } else if (mid > -1 && c == '[' || c == ']' || (c == '(' && i > mid+1)) {
      left = -1;
      mid = -1;
    }
  }
  // Detect autolinks, which should start with a scheme, followed by a colon, followed by some content. Whitespace
  // is not allowed and for practical purposes, the scheme is considered to consist of lowercase ASCII characters
  // only.
  Pattern autolinkPattern = Pattern.compile("<[a-z]+:[^\\s]+>");
  Matcher autolinkMatcher = autolinkPattern.matcher(s);
  return autolinkMatcher.find();
}
private boolean hasTextSpecial(String s, char c) {
  boolean second = false;
  for (int i = 0; i < s.length(); i++) {
    char prev = i == 0 ? ' ' : s.charAt(i-1);
    char next = i < s.length() - 1 ? s.charAt(i+1) : ' ';
    if (s.charAt(i) != c) {
      // nothing
    } else if (second) {
      if (Character.isWhitespace(next) && (isPunctation(prev) || Character.isLetterOrDigit(prev))) {
        return true;
      }
      second = false;
    } else {
      if (Character.isWhitespace(prev) && (isPunctation(next) || Character.isLetterOrDigit(next))) {
        second = true;
      }
    }
  }
  return false;
}
private boolean isPunctation(char ch) {
  return Utilities.existsInList(ch, '.', ',', '!', '?');
}
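The comment at the top of the routine says a plain string going into markdown should be escaped. As a rough illustration of that step, here is a minimal sketch of a backslash escaper. The class `MarkdownEscaper`, the method `escapePlainText`, and the particular set of escaped characters are all assumptions made for this example, and are not part of the routine above or of any FHIR library.

```java
/** Hypothetical helper: backslash-escapes characters that markdown may interpret. */
class MarkdownEscaper {
    // Assumed set, based on the characters named in this thread plus backslash
    // and backtick. CommonMark lets any ASCII punctuation be backslash-escaped.
    private static final String SPECIALS = "\\`*_#[]<>&";

    static String escapePlainText(String plain) {
        StringBuilder out = new StringBuilder(plain.length() + 8);
        for (int i = 0; i < plain.length(); i++) {
            char c = plain.charAt(i);
            if (SPECIALS.indexOf(c) >= 0) {
                out.append('\\'); // prefix the special character with a backslash
            }
            out.append(c);
        }
        return out.toString();
    }
}
```

A caller could then pass text through unchanged when `isProbablyMarkdown(text, false)` returns true, and through `MarkdownEscaper.escapePlainText(text)` otherwise, so that a unit like `10*3/uL` survives rendering.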
@sebastian.garde and I had a wee look at this issue and the latest GFM does allow some limited HTML tags to support underline, subscript, superscript.
Supporting MD technically in tooling is not particularly an issue, and if more complex constructs like Tables are excluded, any MD text will still remain very human readable.
As @grahamegrieve has said in his post, the most likely area of confusion is where a new or existing archetype has some characters which mimic MD markup but incompletely, e.g. **hello, which will confuse an MD parser that is looking for the terminating ** to signify bold.
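The **hello case can be caught with a very crude check: count the `**` tokens and treat an odd count as unterminated. This is only a sketch under that assumption, with hypothetical names (`EmphasisCheck`, `hasUnterminatedBold`); it ignores code spans, backslash escapes and the real CommonMark delimiter rules, so it can give both false positives and false negatives.

```java
/** Hypothetical helper: flags text with an odd number of "**" tokens. */
class EmphasisCheck {
    // Counts non-overlapping occurrences of token in text, left to right.
    static int countOccurrences(String text, String token) {
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(token, from)) >= 0) {
            count++;
            from += token.length();
        }
        return count;
    }

    // An odd count means at least one "**" has no partner.
    static boolean hasUnterminatedBold(String text) {
        return countOccurrences(text, "**") % 2 != 0;
    }
}
```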
Perhaps we could introduce support, as follows…
A new tag in top-level other_details to signify usesMarkdown
Restrict use, for now, to Use, Misuse, Purpose
Add a new node-level annotation that can be used for licensed PROMS text but which allows markdown, and if present, is used in tooling/ UI instead of description.
We can also almost certainly run current CKM archetypes through markdown validation to see which are potentially problematic, and in future, I think MD should be the default for tooling, i.e. use of MD reserved characters is flagged.
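A first pass at that kind of flagging could be as simple as reporting which free-text fields contain reserved characters at all. The sketch below assumes the archetype text has already been pulled out into a map from field name to text; the class `ReservedCharScan`, its methods, and the list of reserved characters are hypothetical and say nothing about how CKM stores or validates archetypes.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: reports which fields contain markdown-reserved characters. */
class ReservedCharScan {
    // Assumed list, taken from the characters mentioned in this thread.
    private static final String RESERVED = "*_#[]<>&`";

    // Returns the distinct reserved characters in text, in order of first appearance.
    static Set<Character> reservedCharsIn(String text) {
        Set<Character> found = new LinkedHashSet<>();
        for (char c : text.toCharArray()) {
            if (RESERVED.indexOf(c) >= 0) {
                found.add(c);
            }
        }
        return found;
    }

    // Maps each field name to its reserved characters; clean fields are left out.
    static Map<String, Set<Character>> flagFields(Map<String, String> fields) {
        Map<String, Set<Character>> flagged = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            Set<Character> found = reservedCharsIn(e.getValue());
            if (!found.isEmpty()) {
                flagged.put(e.getKey(), found);
            }
        }
        return flagged;
    }
}
```

Fields that come back flagged would still need a human or a heuristic such as `isProbablyMarkdown` to decide whether the characters are markup or plain text.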
I should add that MD should never be supported in Node name/values.
In part yes, but whatever the merits of FLAT format or otherwise, I feel we need to keep the node name as being brief and ‘semi-technical’ to allow it to be tokenised to a more technical format. AQL node name aliases are another use-case.
Do we allow Tables or Links in Node names?
If there’s a need to somehow support e.g. bold or italic for UI purposes, I would far prefer to use a separate annotation that could be picked up by UI tools.