# More RM statistics
**Category:** [Technical (archive)](https://discourse.openehr.org/c/technical-archive/156)
**Created:** 2011-10-24 14:43 UTC
**Views:** 5
**Replies:** 12
**URL:** https://discourse.openehr.org/t/more-rm-statistics/15091
---
## Post #1 by @thomas.beale
I have refined the generated statistics in AWB; the results for the CKM archetypes (snapshot June 2011) are shown below. Notes:
- the stats are over the 197 archetypes validated by the ADL 1.5 compiler (which is stricter than any of the ADL 1.4 tools). Another 80 archetypes did not pass, mostly for trivial ADL 1.5 validity reasons such as repeated occurrences and cardinality constraints.
- the stats are obviously skewed toward the archetypes that have actually been built and are currently valid, which are weighted toward OBSERVATION and EVALUATION types.
- the first group of stats covers aggregated ontology codes, e.g. 3794 at-code definitions over those 197 archetypes (I will add an average computation, but it is easy enough to see that it's about 19 at-codes per archetype).
  - this is a guide to the cost and scale of translation and binding work
- in the RM metrics stats group, these are aggregated counts of archetyped nodes of any kind, i.e. there are 4327 nodes of any kind of C_OBJECT. This is an average of around 22 nodes per archetype.
  - this is potentially a guide to complexity and maintainability
- in the RM breakdown stats group, each row indicates how many times that RM class was mentioned (i.e. constrained) in the 197 valid archetypes. E.g. the ADDRESS class was constrained 7 times.
  - for each class, the number of constraints on each attribute is available in the next-level breakout (see further down)
  - what this (usually) tells us is that only a relatively small subset of the attributes defined for a given type (e.g. DV_QUANTITY) are archetyped
  - within the DATA_VALUE types, it also tells us which DV_XXX types are actually needed according to clinical modellers, e.g. DV_DATE, DV_DURATION and DV_DATE_TIME are all used.
  - not surprisingly, CLUSTERs, ELEMENTs, ITEM_TREE and a few other classes figure prominently
  - re: the recent debates about ITEM_STRUCTURE simplification, note that ITEM_SINGLE and ITEM_LIST types turn up in the stats (maybe these classes are candidates for turning into ITEM_TREEs, or maybe they are evidence that we need to retain ITEM_LIST?)
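The RM breakdown described above amounts to counting, per RM class, how many archetype nodes constrain it, with a per-attribute tally underneath, plus an overall nodes-per-archetype average. The Python sketch below assumes each archetype has already been flattened into hypothetical `(rm_class, constrained_attributes)` pairs, one per C_OBJECT node; this input shape and the name `rm_breakdown` are illustrative only, not how AWB represents archetypes internally.

```python
from collections import Counter, defaultdict

# Hypothetical input shape: one archetype = a list of C_OBJECT nodes, each
# reduced to (rm_class, names_of_constrained_attributes). Not the AWB model.
Node = tuple[str, list[str]]


def rm_breakdown(archetypes: list[list[Node]]):
    """Return per-class counts, per-class attribute counts, total nodes, avg nodes."""
    class_counts: Counter = Counter()
    attr_counts: defaultdict = defaultdict(Counter)
    total_nodes = 0
    for archetype in archetypes:
        for rm_class, attributes in archetype:
            total_nodes += 1
            class_counts[rm_class] += 1                # e.g. "ADDRESS constrained 7 times"
            attr_counts[rm_class].update(attributes)  # next-level attribute breakout
    avg_nodes = total_nodes / len(archetypes) if archetypes else 0.0
    return class_counts, attr_counts, total_nodes, avg_nodes
```

Applied to a full archetype set, `avg_nodes` corresponds to the RM metrics figure (total C_OBJECT nodes divided by the number of archetypes), and each inner `Counter` in `attr_counts` is one row of the attribute breakout.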
There is obviously more work to do here, and I won't try to draw any hard conclusions yet from what we see here. In terms of tooling, the next steps are:
- add a few further stats like average, max, min
- some cleanup in visual presentation
- generate an XML report form of this data with a .css for use with normal browsers
- publish a new beta release of the Workbench, including all this functionality
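The planned average/max/min figures and the CSS-styled XML report could look roughly like the following sketch. It assumes the per-archetype counts (e.g. at-codes per archetype) are already available as a non-empty dict; `summarise`, `to_xml_report`, the element names and the `stats.css` file name are all hypothetical, and the real AWB report format may well differ. The `xml-stylesheet` processing instruction is the standard way to ask a browser to style raw XML with a CSS file.

```python
import xml.etree.ElementTree as ET


def summarise(per_archetype: dict[str, int]) -> dict[str, float]:
    """Total, average, max and min of one per-archetype count (assumes non-empty input)."""
    values = list(per_archetype.values())
    return {
        "total": sum(values),
        "average": sum(values) / len(values),
        "max": max(values),
        "min": min(values),
    }


def to_xml_report(metric: str, summary: dict[str, float], css_href: str = "stats.css") -> str:
    """Render the summary as XML that a browser will style with the CSS at css_href."""
    root = ET.Element("statistics", metric=metric)
    for name, value in summary.items():
        ET.SubElement(root, name).text = f"{value:g}"
    header = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<?xml-stylesheet type="text/css" href="{css_href}"?>\n'
    )
    return header + ET.tostring(root, encoding="unicode")
```

Keeping the numbers in XML and the layout in a separate stylesheet means the same report can be restyled without regenerating it.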
Any suggestions or feedback are of course welcome.
---
## Post #2 by @system
Thanks, Thomas!
This is definitely the feature I have been waiting for. Do the RM class/attribute statistics also work for a customised RM, i.e. a non-standard openEHR RM?
Cheers,
Rong
---
## Post #4 by @thomas.beale
> Thanks, Thomas!
>
> This is definitely the feature I have been waiting for. Do the RM class/attribute statistics also work for a customised RM, i.e. a non-standard openEHR RM?
>
> Cheers,
> Rong
They certainly do. The adltest repository uses a non-standard test RM with all kinds of odd classes in it; here are the results for it.
Here is the output for the EHR Extract archetypes, based on the [proposed EHR Extract RM](http://www.openehr.org/svn/knowledge2/BRANCHES/P_schema/rm_schemas/openehr_ehr_extract_200.bmm), which includes new and changed classes:
For any RM, there is also a separate tool that displays RM stats and info. Here it is for 13606:
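Because the statistics are keyed by class and attribute names taken from whichever schema is loaded, the same computation carries over to a custom RM. As an illustration, the sketch below measures attribute coverage: the fraction of each class's schema-defined attributes that archetypes actually constrain. The schema is a plain dict standing in for a parsed `.bmm` file, and the class `CUSTOM_READING` and its attributes are invented for the example.

```python
def attribute_coverage(schema: dict[str, set[str]],
                       constrained: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of each class's schema-defined attributes that archetypes constrain.

    `schema` stands in for a parsed RM schema (class -> attribute names);
    `constrained` holds the attribute names actually constrained per class.
    """
    coverage: dict[str, float] = {}
    for rm_class, used in constrained.items():
        if rm_class not in schema:
            # An archetype constrains a class the loaded RM does not define.
            raise KeyError(f"{rm_class} is not defined in the loaded RM schema")
        defined = schema[rm_class]
        coverage[rm_class] = len(used & defined) / len(defined) if defined else 0.0
    return coverage


# Invented custom-RM class, purely for illustration.
CUSTOM_RM = {"CUSTOM_READING": {"value", "unit", "method", "device", "timestamp"}}
```

For example, if archetypes only ever constrain `value` and `unit`, coverage for `CUSTOM_READING` is 2 of 5 attributes, i.e. 0.4.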
---
## Post #5 by @system
Excellent!
/Rong
---
## Post #6 by @yampeku
Just out of curiosity: how many archetypes use partial ISO dates?
---
## Post #7 by @thomas.beale
Good question. I will look at adding a special query to pick that up.
\- thomas
---
## Post #8 by @thomas.beale
I should point out that the .bmm file format has changed, so if you have a custom model to add, it probably won't work if it is in the previous format. Changing it is easy, and you can see the new format [here](http://www.openehr.org/svn/knowledge2/BRANCHES/P_schema/rm_schemas/). I will replace the old format files on the trunk in a few days once I have done a bit more testing.
- thomas
---
## Post #9 by @system
Hi Diego,
Generally there is no constraint on dates, so partial dates are allowed. When dates occur as data points, modellers will usually want to allow partial dates. The concise dates are almost always part of OBSERVATIONs or ACTIONs, where they are reference model attributes and generally not constrained. Some administrative dates are examples where partial dates may not be acceptable.
Cheers, Sam
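Sam's point suggests one way the proposed partial-date query could work: a DV_DATE node either carries no pattern (so any valid ISO 8601 date, partial or not, is accepted) or carries an ADL date constraint pattern such as `yyyy-mm-dd` or `yyyy-??-??`, where `?` marks an optional component and `X` a forbidden one (check the ADL specification for the exact set of allowed patterns). The sketch below assumes the patterns have already been extracted per archetype into a dict, with `None` for an unconstrained DV_DATE node; the input shape and function names are hypothetical.

```python
from typing import Dict, List, Optional


def allows_partial(pattern: Optional[str]) -> bool:
    """True if a DV_DATE node with this pattern can hold a partial date."""
    if pattern is None:
        # Unconstrained: any valid ISO 8601 date, including partial ones, is accepted.
        return True
    # '?' = optional component, 'X' = forbidden component (e.g. "yyyy-mm-XX").
    return "?" in pattern or "X" in pattern


def count_partial_date_archetypes(date_patterns: Dict[str, List[Optional[str]]]) -> int:
    """Count archetypes with at least one DV_DATE node that permits partial dates."""
    return sum(
        1 for patterns in date_patterns.values()
        if any(allows_partial(p) for p in patterns)
    )
```

Counting unconstrained nodes as "partial allowed" follows Sam's observation; a stricter query that only counts explicit `?`/`X` patterns would simply drop the `None` branch.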
---
## Post #10 by @ian.mcnicoll
Hi Diego,
Interesting question. Although I suspect the number of archetyped partial dates is extremely low, partial dates are very common in real-world clinical data, e.g. year of diagnosis or month/year of procedure.
Ian
Dr Ian McNicoll
Clinical Modelling Consultant, Ocean Informatics, UK
openEHR Clinical Knowledge Editor www.openehr.org/knowledge
Honorary Senior Research Associate, CHIME, UCL
BCS Primary Health Care www.phcsg.org
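To make Ian's examples concrete: in ISO 8601 extended format, a year of diagnosis is written `2011` and a month/year of procedure `2011-10`, alongside full dates such as `2011-10-24`. Python's `datetime.date` always needs a year, month and day, so the minimal sketch below (the function name `date_precision` is hypothetical) matches the reduced-precision forms with a regular expression and validates only the components that are present.

```python
import re
from datetime import date

# ISO 8601 extended calendar date at year, year-month or full precision.
_ISO_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def date_precision(value: str) -> str:
    """Return 'year', 'month' or 'day' for an (optionally partial) ISO 8601 date."""
    match = _ISO_DATE.fullmatch(value)
    if match is None:
        raise ValueError(f"not an ISO 8601 extended calendar date: {value!r}")
    year, month, day = match.groups()
    # datetime.date needs all three parts, so missing ones are filled with 1
    # purely to validate the parts that are present (raises ValueError if invalid).
    date(int(year), int(month or 1), int(day or 1))
    if day is not None:
        return "day"
    if month is not None:
        return "month"
    return "year"
```

Storing the precision alongside the value, rather than padding `2011` out to `2011-01-01`, avoids inventing a day and month that were never recorded.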
---
## Post #11 by @system
Thanks for pointing out the change. I will let you know if I run into any issues.
/Rong
---
## Post #12 by @thomas.beale
This file, along with the existing schemas, should help with writing a new schema:
http://www.openehr.org/svn/knowledge2/BRANCHES/P_schema/rm_schemas/EXAMPLE.bmm.txt
(These files will all move into the TRUNK in a day or so.)
\- thomas
---
## Post #13 by @thomas.beale
The statistics view finally looks as follows. This will be released in the next few days.