archetypes for genomic data

Koray_Atalag · 15 July 2008 08:35

Hi Nicola,

It is great that someone from the genomics community is involved with openEHR.
As an M.D. who has done Ph.D. work (thesis not submitted) in
Molecular Biology & Genetics and worked on genomic databases (I am still
the curator of the Turkish Human Mutation Database, a former HUGO MDI
initiative and now HGVS), my primary problem was to associate phenotypic
(i.e. clinical) data with genomic data. It was 1997-1998 and openEHR did
not exist! However, I found out that the fundamental scientific problem
was not sequencing a whole lot of genes but handling the vast amount of
existing data, extracting useful information and discovering knowledge
through informatics. I have just finished my Ph.D. in Information Systems
and my thesis topic was related to openEHR archetypes...

The reason I have given my background is that I am aware of your
requirements and possible contribution... I have previously tried to
raise some interest in this in both the openEHR foundation and the
discussion lists. Unfortunately I was not successful, in part because
that was not my primary concern at the time and perhaps in bigger part
due to the non-existence of a "critical mass" within this community in
this area. But I know that some people exist who might also
contribute... Perhaps your message might be the right trigger for some of
us to establish an ad-hoc special interest group in genomics?

Your current proposal to represent genomic data with DV_MULTIMEDIA is
possibly the only way currently... BUT this is not acceptable in the
long run, as many of the inherent attributes and constraints of genomics
should be integrated into the Reference Model. For example, a future
"DV_GENOMIC" type may further specialise into "DV_NUCLEICACIDSEQUENCE"
or "DV_RNASEQUENCE" or "DV_AMINACIDSEQUENCE"... Or the inherent openEHR
terminology should contain "viral DNA", "bacterial DNA", "mitochondrial
DNA" and so on... Most importantly, the "Codons" might be integrated in
the Reference Model and "AUG" might be defined as the universal START
codon for all genomic sequences??
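
To make the idea of sequence-specific types more concrete, here is a minimal Python sketch of how such a specialisation hierarchy could constrain its content. The class names (GenomicSequence, DnaSequence, RnaSequence) are hypothetical and only loosely mirror the proposed "DV_GENOMIC" family; nothing like this exists in the openEHR Reference Model. The codon constants follow the standard genetic code, in which AUG is the start codon and UAA, UAG and UGA are the stop codons.

```python
from dataclasses import dataclass

# Standard genetic code: AUG starts translation, three codons stop it.
START_CODON = "AUG"
STOP_CODONS = frozenset({"UAA", "UAG", "UGA"})


@dataclass(frozen=True)
class GenomicSequence:
    """Hypothetical base type: a sequence restricted to a fixed alphabet."""
    residues: str
    source: str = "unspecified"  # e.g. "viral DNA", "mitochondrial DNA"

    # Not a dataclass field (no annotation); subclasses override it.
    ALPHABET = frozenset()

    def __post_init__(self):
        bad = set(self.residues) - self.ALPHABET
        if bad:
            raise ValueError(
                f"invalid symbols for {type(self).__name__}: {sorted(bad)}"
            )


class DnaSequence(GenomicSequence):
    ALPHABET = frozenset("ACGT")


class RnaSequence(GenomicSequence):
    ALPHABET = frozenset("ACGU")

    def codons(self):
        """Split into triplets from position 0, dropping an incomplete tail."""
        r = self.residues
        return [r[i:i + 3] for i in range(0, len(r) - len(r) % 3, 3)]

    def open_reading_frame(self):
        """Codons from the first in-frame AUG up to, but excluding, a stop."""
        cs = self.codons()
        if START_CODON not in cs:
            return []
        orf = []
        for codon in cs[cs.index(START_CODON):]:
            if codon in STOP_CODONS:
                break
            orf.append(codon)
        return orf


rna = RnaSequence("AUGGCCUAAGG", source="illustrative example")
print(rna.open_reading_frame())
```

The point of the sketch is only that a dedicated type can reject invalid content (a "U" in a DNA sequence, say) at construction time, which a generic multimedia container cannot do.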

Those are just immediate thoughts....I am sure much more can be set
forth if we can get together and work.

Cheers,

Koray Atalag, MD, Ph.D.
skype: atalagk

thomas.beale · 15 July 2008 09:26

Koray Atalag wrote:

Your current proposal to represent genomic data with DV_MULTIMEDIA is
possibly the only way currently... BUT this is not acceptable in the
long run, as many of the inherent attributes and constraints of genomics
should be integrated into the Reference Model. For example, a future
"DV_GENOMIC" type may further specialise into "DV_NUCLEICACIDSEQUENCE"
or "DV_RNASEQUENCE" or "DV_AMINACIDSEQUENCE"... Or the inherent openEHR
terminology should contain "viral DNA", "bacterial DNA", "mitochondrial
DNA" and so on... Most importantly, the "Codons" might be integrated in
the Reference Model and "AUG" might be defined as the universal START
codon for all genomic sequences??

I would think that these kinds of data types will be needed in the
future, to allow transparent querying of structured protein & DNA data.
The DNA / protein information might reside on a special server, but why
not make it an openEHR one with the required data types and so on?

- thomas beale

ian.mcnicoll · 15 July 2008 09:31

Hi Nicola, Koray,

Very interesting. It is perhaps worth thinking about this issue in 2 parts:

1) Developing archetypes for the application of genomic assessment in normal clinical practice, e.g. for myself as a GP, with perhaps a DV_URI to the source genomic data. For this category of user the genomics is essentially ‘signals’ data, akin to raw ECG lead data, and I am much more interested in the decision support input and ‘results’ that fall out of the genomic analysis.

Is it possible to create a generic ‘Genomic Assessment’ archetype, which can be further specialised for particular conditions? Some of the work I was doing on the Family History archetype started to intrude on this area.

2) A further look at how the genomic data itself can be represented in openEHR. Does this really need additions to the Reference Model, as Koray suggests, or can much of this be archetyped using existing RM datatypes?

Ian

Dr Ian McNicoll
office / fax +44(0)141 560 4657
mobile +44 (0)775 209 7859
skype ianmcnicoll

Consultant - Ocean Informatics ian.mcnicoll@oceaninformatics.com
Consultant - IRIS GP Accounts

Member of BCS Primary Health Care Specialist Group – www.phcsg.org

system · 16 July 2008 12:17

Hi All

Good to see another genomic.

I am inclined to agree with Ian that the right place to start is with an EVALUATION and see what you come up with. Probably best to even start with a mind map. We need to choose a mindmap product that we all use so that we can all look at a collective set!

Cheers, Sam

system · 16 July 2008 22:21

Hi all,

I got a similar question when I was introducing openEHR and archetypes in Porto, Portugal. My answer (or guess =) was that the generic data types and data structure models could probably be reused to model genomic data but it could be useful to introduce some high level common “container” classes dedicated to genomic data.

Cheers,
Rong

Thilo_Schuler1 · 17 July 2008 10:19

Hi,

I agree with Ian and Sam that we definitely need a "genomic
assessment" EVALUATION archetype and, for some genetic disorders,
specialisations of it. It should probably include links (DV_URI) to
reference sources on the web and/or encapsulated data (DV_MULTIMEDIA
or DV_PARSABLE) to optionally store existing genomic information
formats like MAGE (http://www.mged.org/Workgroups/MAGE/mage.html)
directly in the openEHR instances for research purposes (e.g. in a
university hospital).

The introduction of new datatypes into the RM is a sensitive issue. The
RM was explicitly designed to be small and stable by using *generic*
classes/types.
Although I recognize the increasing importance of genomics/proteomics
(including phenotype-genotype relations) for medicine at the beginning
of the age of information-based or personalised medicine, I am
currently of the opinion that we should not try to model everything in
openEHR. This is for two main reasons:

1) openEHR excels at "putting the clinician in the driver's seat"
through archetypes. The vast majority of clinicians are not interested
in the genomic data itself but in its consequences/conclusions (and IMO
this will stay that way). As Ian said, a genomic EVALUATION
archetype would be needed. The archetype modeller/author should not be
overwhelmed by a growing number of RM classes; it is hard enough to
use the existing ones properly...

2) By referencing (DV_URI) or including (DV_MULTIMEDIA or
DV_PARSABLE) external data, we can reuse the work, experience and tools
(!, e.g. for analysis) of other groups and can focus on what openEHR is
really good at: letting clinicians decide what they want to record. Not
many clinicians are interested in DNA sequences etc., but they do need
to know, for example, whether a leukaemia is Philadelphia chromosome
(bcr-abl fusion gene) positive, because certain drugs work only in that
instance.
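
As a rough illustration of the "record the conclusion, reference the raw data" approach, a record could be sketched in Python as below. All class and field names are hypothetical stand-ins (UriReference plays the role of DV_URI, EncapsulatedData the role of an encapsulated type such as DV_MULTIMEDIA); this is not an openEHR API, and the URL is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class UriReference:
    """Stand-in for DV_URI: the raw data stays in an external system."""
    value: str


@dataclass(frozen=True)
class EncapsulatedData:
    """Stand-in for an encapsulated type: the raw data travels with the record."""
    media_type: str
    data: bytes


@dataclass(frozen=True)
class GenomicAssessment:
    """Hypothetical 'genomic assessment' EVALUATION content."""
    finding: str                # what was tested for
    positive: bool              # the result the clinician cares about
    clinical_implication: str   # the consequence for care
    raw_data: Optional[Union[UriReference, EncapsulatedData]] = None

    def summary(self) -> str:
        status = "positive" if self.positive else "negative"
        return f"{self.finding}: {status}; {self.clinical_implication}"


assessment = GenomicAssessment(
    finding="Philadelphia chromosome (bcr-abl fusion gene)",
    positive=True,
    clinical_implication="tyrosine kinase inhibitor therapy may apply",
    raw_data=UriReference("https://example.org/lab/results/placeholder"),
)
print(assessment.summary())
```

The clinician-facing part (finding, result, implication) is fully structured, while the raw data is optional and opaque to the record, so existing external formats and tools remain usable.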

Cheers, Thilo

thomas.beale · 17 July 2008 10:39

In principle I agree, Thilo: use openEHR (as it is today) to record the
'report' based on the 'data', but not necessarily the raw data (image,
scan, genetic map etc.). However, if we see openEHR as an engineering
approach, there is no reason not to create new pieces of the reference
model suited to specific purposes such as genomic/proteomic data
structures, or similarly for, say, workflow representation (as we have
considered in the past). Then the archetypes for this part of the RM
would define models of content in those spaces. Not that I am suggesting
doing this right now, but the possibility is certainly technically
available. After all, all such data has to be represented in some kind
of model and technical framework, and the openEHR approach may well offer
advantages.

- thomas beale

ian.mcnicoll · 17 July 2008 10:53

Hi Thilo,

I am in complete agreement with you here.

A couple of things:

Can you or anyone else point to or provide some real-life clinical forms or other documents pertaining to a clinical assessment where genomics plays a significant part? I tried to google for this without much success.

I agree re not further complicating the openEHR reference model, but I can see an attraction in using the openEHR 2/3 layer modelling paradigm in other specialist domains such as genomics, with perhaps a different or extended RM that encapsulates some of the basic data structures of that domain which are likely to be rigid and invariant over time, such as the suggestions already made.

As I write this, Thomas has just suggested something similar.

Ian

Thilo_Schuler1 · 17 July 2008 12:25

Hi Ian,

see inline

Hi Thilo,

I am in complete agreement with you here.

A couple of things:

Can you or anyone else point to or provide some real-life clinical forms or other documents pertaining to a clinical assessment where genomics plays a significant part? I tried to google for this without much success.

I googled two disorders off the top of my head…

As mentioned, the Philadelphia chromosome status is very important in certain leukaemias (which can be treated with a tyrosine kinase inhibitor), most prominently in CML. Here is a link regarding CML workup and treatment, including genetic testing (FISH or PCR):

http://www.nccn.org/professionals/physician_gls/PDF/cml.pdf

The Factor V Leiden mutation is also very important for assessing thrombosis risk. Two more links:

Both these examples are more or less monogenic disorders that are well understood and have direct clinical relevance, which is why genetic testing for them is used in routine medicine.

I agree re not further complicating the openEHR reference model, but I can see an attraction in using the openEHR 2/3 layer modelling paradigm in other specialist domains such as genomics, with perhaps a different or extended RM that encapsulates some of the basic data structures of that domain which are likely to be rigid and invariant over time, such as the suggestions already made.

As I write this, Thomas has just suggested something similar.

True, and as Tom said, the well-designed (in an engineering sense) openEHR approach could be extended to genomics/proteomics to provide a uniform environment. But my argument is still that openEHR should focus on the standardised but flexible recording of “traditional” chart/record information (which has turned out to be one of the hardest problems) and refer to other established, specialised formats for raw data (ECG lead readings, microarray data…) via the generic datatypes (DV_URI, DV_PARSABLE, DV_MULTIMEDIA).

Thilo

Knut_Bernstein · 18 July 2008 09:09

Hi

My interest in the genomic archetypes is from the clinical perspective. I acknowledge that we need a way to represent the genomic information, sequence data etc. But we know that in the clinical setting these data are of little use unless they are accompanied by a clinical assessment, i.e. we need to be able to represent the consequences for the patient (and the family) of any genomic information.

This means that the reference to the genomic information may be rather simple, i.e. the name of a specific mutation, classification of a hereditary disease/variant, reference/link to a specific locus etc.

However, what we need is a standardized way of linking genomic information to phenotypic information, i.e. diagnoses, symptoms and findings.
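
A very small Python sketch of such a link might look like the following. The class names, the three link kinds and the sample entries are illustrative assumptions only, not a proposed standard and not real patient data.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three kinds of phenotypic information named above.
LINK_KINDS = frozenset({"diagnosis", "symptom", "finding"})


@dataclass(frozen=True)
class GenomicFinding:
    """A simple reference to genomic information (hypothetical type)."""
    name: str        # e.g. the name of a specific mutation
    locus: str = ""  # optional reference to a locus


@dataclass(frozen=True)
class PhenotypeLink:
    """Connects one genomic finding to one piece of phenotypic information."""
    finding: GenomicFinding
    kind: str
    description: str

    def __post_init__(self):
        if self.kind not in LINK_KINDS:
            raise ValueError(f"unknown link kind: {self.kind!r}")


def phenotypes_by_finding(links):
    """Group (kind, description) pairs under the name of each genomic finding."""
    index = defaultdict(list)
    for link in links:
        index[link.finding.name].append((link.kind, link.description))
    return dict(index)


# Illustrative entries only.
fvl = GenomicFinding(name="Factor V Leiden")
links = [
    PhenotypeLink(fvl, "diagnosis", "increased thrombosis risk"),
    PhenotypeLink(fvl, "finding", "family history of thrombosis"),
]
print(phenotypes_by_finding(links))
```

Keeping the genomic side down to a name and an optional locus matches the point that the reference to the genomic information may be rather simple, while the phenotypic side carries the clinically relevant detail.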

Furthermore we need to be able to express family relations in an explicit way.

There is some experience in this area from the Danish registry for hereditary non-polyposis colorectal cancer (www.hnpcc.dk), which has been working with data definitions and communication standards for this, among others in the context of the EU project Infobiomed (www.infobiomed.org/).

Regarding the representation of family relations, GEDCOM is a well-known format for the representation and communication of genealogical data, but newer models and formats are also available. There is an overview at http://xml.coverpages.org/genealogy.html#gdmuml
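
For readers who have not seen GEDCOM, the sketch below parses a tiny, made-up fragment and derives explicit parent links from it. It is a simplified illustration that handles only level-0 INDI/FAM records and their level-1 NAME, HUSB, WIFE and CHIL lines; it is not a complete GEDCOM parser, and the function names are my own.

```python
# Made-up GEDCOM fragment: three individuals and one family record.
SAMPLE = """\
0 HEAD
0 @I1@ INDI
1 NAME Anna /Example/
0 @I2@ INDI
1 NAME Bo /Example/
0 @I3@ INDI
1 NAME Cai /Example/
0 @F1@ FAM
1 HUSB @I2@
1 WIFE @I1@
1 CHIL @I3@
0 TRLR
"""


def parse_gedcom(text):
    """Return (names, families) from a minimal subset of GEDCOM lines."""
    names, families = {}, {}
    current = None  # (xref, record type) of the enclosing level-0 record
    for raw in text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        level = int(parts[0])
        if level == 0:
            # Records with an identifier look like "0 @I1@ INDI".
            if len(parts) == 3 and parts[1].startswith("@"):
                current = (parts[1], parts[2])
                if parts[2] == "FAM":
                    families[parts[1]] = {"parents": [], "children": []}
            else:
                current = None
        elif level == 1 and current is not None:
            xref, rectype = current
            tag = parts[1]
            value = parts[2] if len(parts) == 3 else ""
            if rectype == "INDI" and tag == "NAME":
                # GEDCOM marks the surname with slashes.
                names[xref] = value.replace("/", "").strip()
            elif rectype == "FAM" and tag in ("HUSB", "WIFE"):
                families[xref]["parents"].append(value)
            elif rectype == "FAM" and tag == "CHIL":
                families[xref]["children"].append(value)
    return names, families


def parents_of(xref, families):
    """List the parent identifiers of one individual."""
    return [
        parent
        for fam in families.values()
        if xref in fam["children"]
        for parent in fam["parents"]
    ]


names, families = parse_gedcom(SAMPLE)
print([names[p] for p in parents_of("@I3@", families)])
```

The family record is what makes the relations explicit: individuals carry no relationship data themselves, and parentage is read off the FAM record that lists them as a child.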

Regards

Knut

system · 20 July 2008 02:52

Hi Knut

At present we have the ability in openEHR to describe people other than the subject of care, both in terms of their relationship (father/mother etc.) and also by an ID, name or other identifiers. I believe that this is all we should do within the EHR. The aggregation of this data from a set of EHRs is what will lead to the information which is of value. It may be that keeping the parents' genomic data in the EHR is of value, as this may help untangle what is going on genetically when people have obvious problems, although the privacy implications are great, since non-biological parentage will be detected, as will spontaneous mutations.

We have had an archetype as a specialisation of problem for genetic condition. I attach it here…
Let me know what you think of it clinically.

Cheers, Sam

(attachments)

openEHR-EHR-EVALUATION.problem-genetic.v1.adl (8.72 KB)
