Random / Synthetic Data Generation over Templates

Hello everyone

I remember that, some time ago, there was a tool that, given a detailed template description, would populate it with random data,
taking into account only knowledge about the datatypes.

I vaguely remember Heather Leslie mentioning it, but I may be wrong.

Is that functionality available from one of Ocean’s tools? (e.g. the Template Designer)

Is similar functionality available through other tools (e.g. LinkEHR or others)?

Looking forward to hearing from you

Athanasios Anastasiou

Hi, I have developed a tool that does that. It is the openEHR-OPT project in my GitHub account (ppazos).

Pablo,

is that a tool we can list on the openEHR tools page?

- thomas

Of course, it does several things:

  • parses and loads OPTs in memory and keeps them in a cache for easy access (I use it as a jar lib in a couple of projects, including the EHRServer)
  • generates XML instances based on OPTs
  • validates the instances with XSD
  • generates basic HTML from an OPT

The last three are command line tools.

Yeah, LinkEHR can do that

Regards

Would there be any interest in making these services available if we could find some free UK hosting? I guess ideally we should be able to post the template and get back a sample composition without any actual persistence of either.

In my case, I have that on my To Do list. I want to complete the services with an XML instance validator based on the OPT constraints, not just on the XSD. It will take some time to have a usable release.

Hello Pablo and all

Thank you very much for the quick response. That’s not too far from what I had in mind.

Ian, were you asking about “reverse” models offered as a service?

All the best

Athanasios Anastasiou

Hi Athanasios,

I’m not quite sure what you mean by ‘reverse models’?

A couple of CDR vendors have implemented services that allow you to generate a dummy composition for any template that is registered with the CDR. That is nice but even nicer would be to have a service that allowed a dummy composition to be generated ‘on-the-fly’ from a submitted template, without leaving the template permanently on the CDR.

Ian

Hello Ian

Yes, that is what I understood and these services would be very useful indeed.

A forward “model” for a 4 digit hexadecimal number could be something like K = “[0-9A-F][0-9A-F][0-9A-F][0-9A-F]”.

K recognises things like “0F0F”, “FFFF”, etc. Using K in reverse would be to use the regular expression to generate (all the) strings that would match the model.
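To make the forward/reverse distinction concrete, here is a minimal Python sketch. It is only an illustration: the function names are made up for this example, and the "reverse" direction is written by hand for this one pattern rather than derived automatically from the regular expression.

```python
import itertools
import random
import re

HEX_DIGITS = "0123456789ABCDEF"

# Forward model: K recognises 4 digit uppercase hexadecimal strings.
K = re.compile(r"[0-9A-F][0-9A-F][0-9A-F][0-9A-F]")


def matches(candidate):
    """Forward use: does the whole string conform to K?"""
    return K.fullmatch(candidate) is not None


def generate_all():
    """Reverse use: enumerate every string that K would accept."""
    for digits in itertools.product(HEX_DIGITS, repeat=4):
        yield "".join(digits)


def generate_random(rng):
    """Reverse use: draw one conforming string at random."""
    return "".join(rng.choice(HEX_DIGITS) for _ in range(4))


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make runs repeatable
    sample = generate_random(rng)
    print(sample, matches(sample))
```

Every string produced by the reverse direction is accepted by the forward direction, which is the property a template-driven generator would also need to satisfy.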

Similarly, a Template is a model that “matches” its data with constraints dictated by ADL and a reverse model would be a Template that is used to generate (all the) data that would conform with it. Of course, this can be done in a “typical” way, i.e. match the data type, or in a more realistic way by taking into account the condition (i.e. synthetic data).
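As a rough sketch of the “typical” (datatype only) approach, the Python below generates random data from a toy constraint description and then checks it against the same description. The dictionary layout, field names and value ranges are all hypothetical and chosen for illustration; this is not the OPT format and not how any of the tools mentioned in this thread work.

```python
import random
import string

# Hypothetical, heavily simplified stand-in for template constraints.
TOY_TEMPLATE = {
    "systolic": {"type": "quantity", "min": 0.0, "max": 300.0, "units": "mm[Hg]"},
    "position": {"type": "coded_text", "codes": ["sitting", "standing", "lying"]},
    "comment": {"type": "text"},
}


def generate(template, rng):
    """Reverse use: produce random data that respects each constraint."""
    data = {}
    for name, c in template.items():
        if c["type"] == "quantity":
            magnitude = round(rng.uniform(c["min"], c["max"]), 1)
            data[name] = {"magnitude": magnitude, "units": c["units"]}
        elif c["type"] == "coded_text":
            data[name] = rng.choice(c["codes"])
        elif c["type"] == "text":
            length = rng.randint(1, 20)
            data[name] = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        else:
            raise ValueError("unsupported type: " + c["type"])
    return data


def conforms(template, data):
    """Forward use: check data against the same constraints."""
    if set(data) != set(template):
        return False
    for name, c in template.items():
        value = data[name]
        if c["type"] == "quantity":
            if not isinstance(value, dict) or value.get("units") != c["units"]:
                return False
            magnitude = value.get("magnitude")
            if not isinstance(magnitude, (int, float)):
                return False
            if not c["min"] <= magnitude <= c["max"]:
                return False
        elif c["type"] == "coded_text":
            if value not in c["codes"]:
                return False
        elif c["type"] == "text":
            if not isinstance(value, str) or not value:
                return False
    return True


if __name__ == "__main__":
    instance = generate(TOY_TEMPLATE, random.Random(1))
    print(instance, conforms(TOY_TEMPLATE, instance))
```

Data generated this way is valid but not realistic: a uniformly random systolic value says nothing about a plausible patient. The “synthetic data” variant would replace the uniform draws with distributions that depend on the condition being simulated.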

All the best

Athanasios Anastasiou
