We have developed data pipelines that insert retrospective clinical data from hospital systems into the EHRbase CDR, which we installed using Docker.
The data volume we are piloting with is on the order of 800K+. Although the insertions work fine, we see a large performance deterioration after a few hundred insertions: it takes almost 1 second per composition on average. We also noticed that every few inserts there is a delay of 5-6 seconds.
Before we investigate further, I wanted to know whether any performance benchmarks are available for EHRbase composition creation. Is it designed to handle batch inserts the way we are attempting them?
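To pin down both the gradual slowdown and the periodic 5-6 second pauses described above, it can help to record the latency of every single insert rather than only an overall average. The following is a minimal sketch, not part of EHRbase: `insert_fn` is a hypothetical callable that wraps whatever the pipeline uses to post one composition, and the 2-second stall threshold and 100-insert window are arbitrary assumptions.

```python
import time
from statistics import mean


def time_inserts(insert_fn, items, stall_threshold_s=2.0):
    """Call insert_fn(item) for each item and record how long each call takes.

    insert_fn is a hypothetical wrapper around one composition insert.
    Returns (latencies, stalls), where stalls lists (index, seconds) for
    every call that took at least stall_threshold_s.
    """
    latencies = []
    stalls = []
    for i, item in enumerate(items):
        start = time.perf_counter()
        insert_fn(item)
        elapsed = time.perf_counter() - start
        latencies.append(elapsed)
        if elapsed >= stall_threshold_s:
            stalls.append((i, elapsed))
    return latencies, stalls


def windowed_means(latencies, window=100):
    """Average latency per consecutive window, to show drift over time."""
    return [
        mean(latencies[i:i + window])
        for i in range(0, len(latencies), window)
    ]
```

If the windowed means climb steadily, the cause is more likely something that grows with the data (indexes, table size), whereas isolated entries in `stalls` at regular intervals point toward periodic activity such as checkpoints or disk flushes. Both readings are only hints and would need to be confirmed on the database side.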
This appears strange to me, as our benchmarks and performance tests show stable insert performance. We tested with a Postgres cluster of 5 nodes and a dedicated write instance, and saw insert times of around 200 ms with 50 concurrent threads firing at EHRbase.
Even with a single instance this should work better than what you experienced, so there might be an issue with your Postgres configuration.
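For anyone who wants to reproduce a test with concurrent threads, a rough sketch of such a load driver is shown here. It is not the tooling used for the benchmark mentioned above; `insert_fn` is a hypothetical callable that performs one composition insert, and the default of 50 workers simply mirrors the thread count quoted in this post.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_concurrent(insert_fn, items, workers=50):
    """Run insert_fn(item) for every item on a pool of worker threads.

    insert_fn is a hypothetical wrapper around one composition insert.
    Returns (latencies, wall_seconds): the per-call latencies in input
    order and the total wall-clock time for the whole batch.
    """
    def timed(item):
        start = time.perf_counter()
        insert_fn(item)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(timed, items))
    wall_seconds = time.perf_counter() - wall_start
    return latencies, wall_seconds
```

Reporting both numbers matters: the per-call latency says how long one insert takes under load, while `len(items) / wall_seconds` gives the overall throughput, and the two can diverge widely once requests run in parallel.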
I did some tests almost a year and a half ago; check the CSV reports here: GitHub - ppazos/testehr: Load tester for the openEHR API
I also tried JMeter: GitHub - ehrbase/load-tests-jmeter: EHRBASE Load Testing
Remember that the delays may depend on the amount and complexity of the data, so the total number of compositions is not the only indicator to look at. You also need to consider the size of the OPTs and of the compositions.
There are so many variables in that setup that it’s hard to know where to begin speculating.
Docker first: are you mounting a folder on the host for the Postgres image? Did you consider setting up Postgres in a VM, or even better on an actual machine? Are we talking SSDs here or magnetic disks?
Those delays sound too large for Java’s garbage collector to be the culprit, especially the 5-6 seconds, unless you’re running your performance tests on a Nintendo Switch… So my first attempt would be to give Postgres some room to breathe and try it that way.
Hi @anupama, I’d be interested in any follow-up. Did you manage to fix your problem?
@all Sorry for the delayed follow-up.
We started the insertions in a new environment: a single-node cloud cluster with a dedicated write instance, running single-threaded.
We inserted 844K compositions and 31K EHRs (patients) at a rate of 200 ms per composition.
As the next step, we are planning to use multiple nodes and multithreading. We will post our findings.
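To put those figures in perspective, the total load time can be estimated with simple arithmetic. The sketch below uses the numbers from this thread (844K compositions, 200 ms versus roughly 1 second per insert); the `workers` parameter assumes ideal linear scaling across parallel writers, which is an optimistic assumption and not a measured result.

```python
def estimated_hours(n_compositions, seconds_per_insert, workers=1):
    """Estimate total load time in hours.

    Assumes every insert takes the same time and that work divides
    evenly across workers (ideal linear scaling, an assumption).
    """
    total_seconds = n_compositions * seconds_per_insert / workers
    return total_seconds / 3600


# Single-threaded at 200 ms per composition: roughly 47 hours.
print(f"{estimated_hours(844_000, 0.2):.1f} h")

# Single-threaded at about 1 s per composition: roughly 234 hours.
print(f"{estimated_hours(844_000, 1.0):.1f} h")
```

Even at 200 ms per composition, a single-threaded run of this size takes about two days, which is why the move to multiple threads is the natural next step.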
@anupama that seems like a reasonable “insert time”. I’m not sure whether you are measuring the full request time or just the DB transaction time.
Do the insertions run on the same machine that runs EHRbase? For a remote (internet) test, the request round trip alone will add a delay of 200+ ms.