# More RM statistics

**Category:** [Technical (archive)](https://discourse.openehr.org/c/technical-archive/156)
**Created:** 2011-10-24 14:43 UTC
**Views:** 5
**Replies:** 12
**URL:** https://discourse.openehr.org/t/more-rm-statistics/15091

---

## Post #1 by @thomas.beale

I have refined the generated statistics in AWB. The results for the CKM archetypes (snapshot of June 2011) are shown below.

Notes:

- The stats cover the 197 archetypes that validated under the ADL 1.5 compiler (note that it is stricter than any of the ADL 1.4 tools). Another 80 archetypes did not pass, mostly for trivial ADL 1.5 validity reasons such as repeated occurrences and cardinality constraints.
- The stats are obviously skewed toward the archetypes that have actually been built and are currently valid. These are predominantly OBSERVATION- and EVALUATION-related types.
- The first group of stats is aggregated ontology codes, e.g. 3794 at-code definitions over those 197 archetypes. (I will add an average computation, but it is easy enough to see that it's about 19 at-codes per archetype.)
  - This is a guide to the cost and scale of translation and binding work.
- The RM metrics stats group gives aggregated counts of archetyped nodes of any kind, i.e. there are 4327 nodes of any kind of C_OBJECT. This is an average of around 22 nodes per archetype.
  - This is potentially a guide to complexity and maintainability.
- In the RM breakdown stats group, each row indicates how many times that RM class was mentioned (i.e. constrained) in the 197 valid archetypes. For example, the ADDRESS class was constrained 7 times.
  - For each class, the number of constraints on each attribute is available in the next-level breakout (see further down).
  - What this (usually) tells us is that only a relatively small subset of the attributes defined for a given type (e.g. DV_QUANTITY) are archetyped.
  - Within the DATA_VALUE types, it also tells us which DV_XXX types clinical modellers actually need, e.g. DV_DATE, DV_DURATION and DV_DATE_TIME are all used.
  - Not surprisingly, CLUSTERs, ELEMENTs, ITEM_TREEs and a few other classes figure prominently.
  - Re: the recent debates about ITEM_STRUCTURE simplification, note that the ITEM_SINGLE and ITEM_LIST types turn up in the stats. Maybe these classes are candidates for turning into ITEM_TREEs, or maybe they are evidence that we need to retain ITEM_LIST?

There is obviously more work to do here, and I won't try to draw any hard conclusions from what we see so far. In terms of tooling, the next steps are:

- add a few further stats such as average, max and min
- clean up the visual presentation
- generate an XML report form of this data, with a .css for use with normal browsers
- publish a new beta release of the Workbench, including all this functionality

Any suggestions and feedback are of course welcome.

[details="(attachments)"]
![jgafeceb.png|490x502](upload://kmgfr1XffVZ9Y1rQgtCIq4ewydN.png)
![gbccjicj.png|432x210](upload://54tsexjyH6d6zMuuo4yHhqxTK9I.png)
[/details]

---

## Post #2 by @system

Thanks, Thomas!

This is definitely the feature I have been waiting for. Do the RM class/attribute statistics also work for a customised RM, i.e. a non-standard openEHR RM?

Cheers,
Rong

---

## Post #4 by @thomas.beale

> Thanks, Thomas!
>
> This is definitely the feature I have been waiting for. Do the RM class/attribute statistics also work for a customised RM, i.e. a non-standard openEHR RM?
>
> Cheers,
> Rong

They certainly do. The adltest repository is based on a non-standard test RM with all kinds of funny classes in it. Here are the results for it.

Here is the output for the EHR Extract archetypes, based on the [proposed EHR Extract RM](http://www.openehr.org/svn/knowledge2/BRANCHES/P_schema/rm_schemas/openehr_ehr_extract_200.bmm), which includes new and changed classes:

For any RM you can also get a separate tool that displays RM stats and info. Here it is for 13606:

[details="(attachments)"]
![ehiccghd.png|602x387](upload://l7zVfyBlDiS6Wd8V8cmULOEZQyi.png)
![egcefbeh.png|263x234](upload://xvlgX4H1V3c8NgR4OgRU2tkO2BA.png)
[/details]

---

## Post #5 by @system

Excellent!

/Rong

---

## Post #6 by @yampeku

Just out of curiosity... how many archetypes use partial ISO dates?

---

## Post #7 by @thomas.beale

Good question. I will look at adding a special query to pick that up.

\- thomas

---

## Post #8 by @thomas.beale

I should point out that the .bmm file format has changed, so if you have a custom model in the previous format, it probably won't work. Converting it is easy, and you can see the new format [here](http://www.openehr.org/svn/knowledge2/BRANCHES/P_schema/rm_schemas/). I will replace the old format files on the trunk in a few days, once I have done a bit more testing.

\- thomas

---

## Post #9 by @system

Hi Diego,

Generally there is no constraint on dates, so partial dates are allowed.
When dates occur as data points, they will usually need to allow partial dates. The concise dates are almost always part of OBSERVATIONs or ACTIONs, where they are reference model attributes and generally not constrained. Some administrative dates are examples where partial dates may not be acceptable.

Cheers,
Sam

---

## Post #10 by @ian.mcnicoll

Hi Diego,

Interesting question. Although I suspect the number of archetyped partial dates is extremely low, partial dates are very common in real-world clinical data, e.g. year of diagnosis or month/year of a procedure.

Ian

Dr Ian McNicoll
office +44 (0)1536 414 994
fax +44 (0)1536 516317
mobile +44 (0)775 209 7859
skype ianmcnicoll
ian.mcnicoll@oceaninformatics.com

Clinical Modelling Consultant, Ocean Informatics, UK
openEHR Clinical Knowledge Editor www.openehr.org/knowledge
Honorary Senior Research Associate, CHIME, UCL
BCS Primary Health Care www.phcsg.org

---

## Post #11 by @system

Thanks for pointing out the change. I will let you know if I run into any issues.

/Rong

---

## Post #12 by @thomas.beale

This file, together with the existing schemas, should help with writing a new schema: http://www.openehr.org/svn/knowledge2/BRANCHES/P_schema/rm_schemas/EXAMPLE.bmm.txt (these files will all move into the trunk in a day or so).

\- thomas

---

## Post #13 by @thomas.beale

The statistics view finally looks as follows. This will be released in the next few days.

---

**Canonical:** https://discourse.openehr.org/t/more-rm-statistics/15091
**Original content:** https://discourse.openehr.org/t/more-rm-statistics/15091
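The averages and RM breakdown counts described in Post #1 are simple aggregations over the archetypes. The following Python sketch shows one way such a per-class and per-attribute breakdown could be computed. It is not AWB's implementation: the `ConstrainedObject` type, the `rm_breakdown` and `average_per_archetype` helpers and the toy archetypes are all hypothetical. It also reproduces the two averages from the figures quoted in Post #1 (3794 at-codes and 4327 C_OBJECT nodes over 197 archetypes).

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ConstrainedObject:
    """Hypothetical summary of one archetyped node (one C_OBJECT); not an AWB type."""
    rm_type: str                                         # e.g. "DV_QUANTITY"
    attributes: list[str] = field(default_factory=list)  # RM attributes constrained on this node


def rm_breakdown(archetypes: list[list[ConstrainedObject]]):
    """Count how often each RM class, and each (class, attribute) pair, is constrained."""
    class_counts: Counter[str] = Counter()
    attribute_counts: Counter[tuple[str, str]] = Counter()
    for nodes in archetypes:
        for node in nodes:
            class_counts[node.rm_type] += 1
            for attribute in node.attributes:
                attribute_counts[(node.rm_type, attribute)] += 1
    return class_counts, attribute_counts


def average_per_archetype(total: int, archetype_count: int) -> float:
    """Plain mean, e.g. at-codes or C_OBJECT nodes per archetype."""
    return total / archetype_count


# Toy input (hypothetical, not CKM data): two archetypes, each a list of constrained nodes.
toy_archetypes = [
    [ConstrainedObject("OBSERVATION", ["data"]),
     ConstrainedObject("ELEMENT", ["value"]),
     ConstrainedObject("DV_QUANTITY", ["magnitude", "units"])],
    [ConstrainedObject("EVALUATION", ["data"]),
     ConstrainedObject("ELEMENT", ["value"]),
     ConstrainedObject("DV_DATE")],
]

if __name__ == "__main__":
    classes, attributes = rm_breakdown(toy_archetypes)
    print(classes["ELEMENT"])                    # 2
    print(attributes[("DV_QUANTITY", "units")])  # 1
    # Figures quoted in Post #1, over the 197 valid archetypes.
    print(f"{average_per_archetype(3794, 197):.1f} at-codes per archetype")        # 19.3
    print(f"{average_per_archetype(4327, 197):.1f} C_OBJECT nodes per archetype")  # 22.0
```

In this sketch, each constrained node counts once toward its RM class. This matches the "how many times was this class constrained" reading of the RM breakdown group, but whether AWB counts in exactly this way is an assumption.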
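Posts #6, #9 and #10 turn on the difference between complete and partial ISO 8601 dates (year only, or year and month). As a hedged illustration, this Python sketch classifies an extended-format date string by precision. It also applies a hypothetical `allow_partial` rule of the kind Sam describes for administrative dates. The sketch uses only the standard library, not any openEHR library. The names `DatePrecision`, `date_precision`, `is_partial` and `accepts` are illustrative, and basic format, week dates and ordinal dates are deliberately not handled.

```python
import datetime
import re
from enum import Enum


class DatePrecision(Enum):
    YEAR = 1   # e.g. "2009"        (year of diagnosis)
    MONTH = 2  # e.g. "2009-03"     (month/year of a procedure)
    DAY = 3    # e.g. "2009-03-17"  (complete date)


# Extended-format ISO 8601 calendar dates only: YYYY, YYYY-MM or YYYY-MM-DD.
_DATE_RE = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")


def date_precision(value: str) -> DatePrecision:
    """Return how precise a date string is; raise ValueError if it is malformed."""
    match = _DATE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not an extended-format ISO 8601 date: {value!r}")
    year, month, day = match.groups()
    if month is None:
        return DatePrecision.YEAR
    if not 1 <= int(month) <= 12:
        raise ValueError(f"month out of range: {value!r}")
    if day is None:
        return DatePrecision.MONTH
    # datetime.date raises ValueError for impossible dates such as 2011-02-30.
    datetime.date(int(year), int(month), int(day))
    return DatePrecision.DAY


def is_partial(value: str) -> bool:
    """True for year-only or year-month dates."""
    return date_precision(value) is not DatePrecision.DAY


def accepts(value: str, allow_partial: bool = True) -> bool:
    """Hypothetical constraint check: malformed dates are always rejected,
    and partial dates are rejected when allow_partial is False
    (e.g. for an administrative date)."""
    try:
        partial = is_partial(value)
    except ValueError:
        return False
    return allow_partial or not partial


if __name__ == "__main__":
    for sample in ("2009", "2009-03", "2011-10-24"):
        print(sample, date_precision(sample).name, accepts(sample, allow_partial=False))
```

Validating the string before applying the partial-date rule means a malformed value is rejected even where partial dates are allowed.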