REST API for creating compositions with id

ian.mcnicoll · 17 March 2023 12:00

I’ve read up a bit more, and using PUT just potentially gets us into a tangle with the URL path or complexities around headers.

From my reading of the RESTful difference between PUT and POST, this is still a POST.

We are adding a new unique resource to the collection of Compositions, not updating an existing composition. The only difference is that we are assigning the uid client-side, not server-side.

The only downside I can see of using the uid within the Composition body is that some generated example compositions populate the uid, which might cause confusion, so I’d want that behaviour to change.
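A minimal sketch of the request Ian has in mind: a normal POST to the collection URL, with the client-generated uid carried in the Composition body. It assumes the server honours a body `uid` on create, which is what is being proposed here, not something current CDRs are guaranteed to do. The helper name `build_composition_post`, the system id and the trimmed composition are hypothetical.

```python
import json
import uuid


def build_composition_post(ehr_id: str, composition: dict, system_id: str) -> tuple[str, str, str]:
    """Return (method, path, body) for a POST carrying a client-assigned uid.

    Hypothetical helper: assumes the target CDR keeps body["uid"] on create.
    """
    object_id = str(uuid.uuid4())  # uid generated on the client, not the server
    body = dict(composition)  # shallow copy so the caller's dict is untouched
    # OBJECT_VERSION_ID form: <object_id>::<creating_system_id>::<version_tree_id>
    body["uid"] = {"_type": "OBJECT_VERSION_ID", "value": f"{object_id}::{system_id}::1"}
    path = f"/v1/ehr/{ehr_id}/composition"  # collection URL: no id in the path
    return "POST", path, json.dumps(body)
```

The URL stays identical to an ordinary create; only the body differs, which is the point of keeping this a POST.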

ian.mcnicoll · 17 March 2023 12:12

We are looking at full vendor-to-vendor transfer of all EHRs. This would obviously be more efficient at a lower level but may not be a high priority for CDR vendors to implement, so this was an experiment to see how much could be done with the standard API.

If nothing else it is a good advert for tech/vendor neutrality in a way that is non-opaque, and helps tease out the exact requirements.

Good point re the commit_time - we need to think about that.

damoca · 17 March 2023 13:26

That’s an interesting scenario, and probably needed sooner rather than later.

Since different vendors can implement different persistence technologies, it’s reasonable to work it out at the API level. But it’s not only about copying parts of the loaded EHRs. As @thomas.beale said, you should replicate all metadata (commit times, etc.), templates loaded, FOLDERs, CONTRIBUTIONs, demographics, and any other configuration at the EHR level.

The solution probably is to design a dump/load API endpoint, having some kind of container for all that information. Maybe the EHR_EXTRACT class, or an evolution of it, could be a good candidate.

This would be computationally heavy, but that’s another topic.

ian.mcnicoll · 17 March 2023 13:40

We are hoping to cover everything in there other than Folders and Demographics, as neither is included in our current scope. Folders would probably not be too hard to do, but there really are very few, if any, independent openEHR Demographics services out there, and in any case the coupling is, at least right now, very loose or non-standardised.

But we are doing / have done:

Templates
Ehrs
Contributions
Composition versions

Agree that ideally this should be nicely wrapped up into a simpler service, but for now it is probably pretty useful to see the moving parts, if only so we can raise some of these issues as we go along.

If we can agree on POST /composition with uid resolution and have a list-contributions endpoint, we are probably very close to having something working. By “we”, I mean the Future Perfect guys like @Simon!! I’m just giving a little advice.

Seref · 17 March 2023 13:48

This would be my preferred approach to design. Even though we did not explicitly define it, there is an implicit context to the operations supported by the REST API (or so I think). Restoring data from another CDR is a richer operation in terms of its semantics: we’re replaying an actual insert that took place in another CDR, and more data is of interest than in a direct call to a REST endpoint from, say, a mobile app.

Consider the time of the original operation (as Tom points out), the fact that this operation is a replay of a previous one, the time of the replay (which is potentially meaningful for audit purposes), and information about the original CDR. Then there is the identity of the bulk import itself, which has to be assigned and tracked so that we know how many items succeeded, failed, etc. And there is the need to keep the original uids.

I would not like to encode all of this within the semantics of a single REST endpoint; it’s too much semantic overload for my taste, because current REST APIs have no awareness of a bulk operation (this is PUT/POST 16 of 345,000… where do you keep that info? REST is supposed to encapsulate the whole state in the call.)
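To make Seref’s list concrete, here is a sketch of the state a bulk replay has to carry somewhere outside any single REST call: a job identity, the source CDR, progress, and the failures keyed by original uid. The class `ImportJob` and its fields are hypothetical; nothing like this exists in the current openEHR REST API.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ImportJob:
    """Hypothetical bookkeeping for one bulk replay from a source CDR."""
    source_system_id: str  # the original CDR
    total: int             # e.g. 345,000 items to replay
    job_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    started: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    succeeded: int = 0
    failed: list[tuple[str, str]] = field(default_factory=list)  # (original uid, reason)

    def record(self, original_uid: str, ok: bool, reason: str = "") -> None:
        # Failures keep the original uid so they can be retried later.
        if ok:
            self.succeeded += 1
        else:
            self.failed.append((original_uid, reason))

    @property
    def progress(self) -> str:
        done = self.succeeded + len(self.failed)
        return f"{done} of {self.total}"
```

None of this fits naturally into one stateless request, which is the argument for a dedicated bulk mechanism.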

Finally, things get interesting when you scale up to tens of millions of compositions. We’re currently planning a data migration between two Ocean systems which will take days using specialised tools. If it was REST endpoints, we’d be talking months…

ian.mcnicoll · 17 March 2023 14:28

Totally agree but both for immediate need and for education and a bit of ‘look how easy this actually is’, still keen to show the moving parts via chained REST calls.

@Seref - what’s your take on POST vs. PUT for creating new compositions with a given uid, which has wider use cases than this one, e.g. linked compositions within a contribution?

Can we continue this specific discussion in JIRA ([SPECITS-62] - openEHR JIRA) and hopefully reach consensus? Everyone seems to agree it is a reasonable idea.

thomas.beale · 17 March 2023 14:44

This is definitely a dump/load thing. Some initial modelling on the abstract version of that here.

The reason why it has to be done as a dump-load service is to allow for differences in concrete data representation inside CDRs of different vendors and even different releases from the same vendor. No vendor would be able to guarantee anything if someone just slides the DB tables or even full JSON dump straight into the target DB, even if in some cases it might work. The creation of Versions and Contributions has to be done by appropriate calls taking the relevant content as an argument.

There are clearly multiple forms of this:

a partial EHR extract → import (e.g. as for transfer of care) - this is a merging scenario and you might want to preserve original commit times if the source and target CDRs are within the same health service, or you might not, if the patient treatment is now being taken over at the destination.
the dump-load situation for a whole CDR’s contents, to upgrade it in situ to a new CDR - preserve original commits
dump-load of a full EHR from one CDR to another one - probably want new commit times, indicating the earliest time the local staff at the destination could have seen the data.
etc.

These correspond to real world scenarios like:

merging a piece of an EHR to a CDR in a new location
moving an entire EHR to a CDR in a new location (for patient move, e.g. new country, region etc)
replacing the CDR in situ, data should look the same as before
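The commit-time rules in Tom’s list can be written down as a small policy function. This is only a sketch of the reasoning above; the enum and function names are hypothetical and not part of any openEHR specification.

```python
from enum import Enum, auto


class TransferScenario(Enum):
    PARTIAL_MERGE = auto()    # partial EHR extract merged into another CDR
    IN_SITU_UPGRADE = auto()  # whole CDR replaced in place
    FULL_EHR_MOVE = auto()    # full EHR moved to a CDR in a new location


def preserve_commit_times(scenario: TransferScenario, same_health_service: bool = False) -> bool:
    """Should original commit times be kept? Sketch of the policy described above."""
    if scenario is TransferScenario.IN_SITU_UPGRADE:
        return True  # data should look the same as before
    if scenario is TransferScenario.FULL_EHR_MOVE:
        return False  # new times: earliest the destination staff could see the data
    # Partial merge: depends on whether source and target share a health service.
    return same_health_service
```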

I agree.

damoca · 17 March 2023 14:50

Of course, and what you say is a totally needed approach. Anything else we could propose now would not be available for several years.

@Seref I was not talking about any specific solution at this point, just about how to approach this problem. Then will come the technical problems and the possible solutions:

Probably we are not talking of a REST transfer of data, but an export to some kind of file. Or maybe communication via other formats (protobuf). We could also consider using a compression format to reduce the size of data.
Probably this is not a single method, but has more granular operations.
We can learn a lot from SQL dumps: we could allow users to export just the “schema” (templates, etc.) or the schema + data. Or just the data for some templates or patients.

And for sure, there will be many other optimizations.
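By analogy with SQL dump options, an export request could carry a few switches: schema only, schema + data, or data restricted to some templates or patients. The `ExportOptions` class below is a hypothetical illustration of those choices, not an existing openEHR API.

```python
from dataclasses import dataclass, field


@dataclass
class ExportOptions:
    """Hypothetical dump options, modelled on SQL dumps (schema vs data)."""
    include_templates: bool = True  # the "schema"
    include_data: bool = True       # compositions, contributions, etc.
    template_ids: list[str] = field(default_factory=list)  # empty means all templates
    ehr_ids: list[str] = field(default_factory=list)       # empty means all patients

    def selects(self, template_id: str, ehr_id: str) -> bool:
        """Would a composition from this template and EHR be exported?"""
        if not self.include_data:
            return False  # schema-only dump
        if self.template_ids and template_id not in self.template_ids:
            return False
        return not self.ehr_ids or ehr_id in self.ehr_ids
```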

sebastian.iancu · 18 March 2023 20:36

As mentioned in the past, in the context of the [SPECITS-62] - openEHR JIRA issue, I’m still in favor of extending the semantics of our PUT Composition to be used also for creation, not only for update. This would be in line with RFC 9110: HTTP Semantics and is most likely exactly what a programmer using a REST API would expect from the openEHR API.
The PUT Contribution should also be changed to support creation of a Contribution with a given uid.
We should also keep this behavior consistent, and specify PUT for all resources working in the same way (i.e. create & update).

These changes will be done in the upcoming REST API specification release, and should facilitate most of the replay functionality described above, keeping original IDs in the target system, although I can imagine system creation timestamps might not always be honored - which is an aspect we can still discuss in the SEC meetings.

stefanspiska · 20 March 2023 13:29

@sebastian.iancu

The composition uid is just the Common Information Model LOCATABLE uid, so it’s a bit strange that this one is special and needs a specific syntax.

Also, a use case we have is to create a composition and then add it to the directory in one contribution, which is only possible if you can set the uid of the composition in the contribution and then put that uid in the folder update in the same contribution.
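A sketch of Stefan’s use case: because the client picks the composition uid up front, the folder update in the same contribution can already point at it. The helper `build_contribution` is hypothetical, the contribution is heavily trimmed (no `commit_audit` etc.), and the OBJECT_REF shape (`namespace`, `type`, `id`) is an assumption about how the folder item would be expressed.

```python
import uuid


def build_contribution(composition: dict, folder: dict, system_id: str) -> dict:
    """Hypothetical: one contribution that creates a composition and files it in a folder."""
    versioned_uid = str(uuid.uuid4())  # chosen by the client before anything is sent
    comp = dict(composition, uid={"_type": "OBJECT_VERSION_ID",
                                  "value": f"{versioned_uid}::{system_id}::1"})
    # The folder item refers to a composition that does not exist on the server yet.
    ref = {"namespace": "local", "type": "VERSIONED_COMPOSITION",
           "id": {"_type": "HIER_OBJECT_ID", "value": versioned_uid}}
    updated_folder = dict(folder, items=list(folder.get("items", [])) + [ref])
    return {"versions": [{"data": comp}, {"data": updated_folder}]}
```

Without a client-set uid, the folder reference could only be added in a second contribution, after the server had assigned one.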

thomas.beale · 20 March 2023 14:09

I don’t see any reason in principle not to allow this, as long as UUID generation rules are being followed by the client.

Here’s a nice article on UUID generation.
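For the client side, Python’s standard `uuid` module is enough to follow the usual rules. The check below is one possible server-side sanity test before accepting a client uid; requiring canonical lowercase, version-4 (random) UUIDs is an assumed policy, since other UUID versions are valid too.

```python
import uuid


def new_client_uid() -> str:
    """Generate a random (version 4) UUID on the client side."""
    return str(uuid.uuid4())


def is_valid_client_uid(value: str) -> bool:
    """Assumed policy: canonical lowercase form, RFC 4122 variant, version 4."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False  # not a UUID at all
    return (str(parsed) == value
            and parsed.variant == uuid.RFC_4122
            and parsed.version == 4)
```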

ian.mcnicoll · 20 March 2023 14:58

(clinical hacker alert!) I looked at RFC 9110: HTTP Semantics and I still could not see a good argument for using PUT. Can you point to the exact part of the spec which makes you prefer PUT over POST? I can see that a create is allowed by PUT, but the spec does not explain why or when it is preferred. We are definitely creating new resources here, not re-allocating uids to an existing resource (or at least that’s my reading!)

The requirements for POST seem fully met, and the spec does not help me understand when a create via PUT might be reasonable.

Seref · 21 March 2023 10:35

Sorry if I failed to express myself clearly. To clarify: I didn’t think you were suggesting a specific solution; I just thought your diagnosis was correct, then moved on to a particular suggestion of mine.
Based on your follow-up, I get the feeling we’re on the same page, or maybe I’m confused.

Seref · 21 March 2023 11:00

This is where angels and pinheads emerge I’m afraid. Is http semantics to be taken directly as REST semantics? Http has no problem using put for a creation while indicating the id to be created, but my understanding of REST favoured POST for that.

I have to repeat, though, that I don’t think overloading the REST API with bulk transfer semantics is a good approach. Having said that, if we’re talking about setting the identifier of a resource on the caller side, I’d be inclined to use POST with the id included in the resource contents, which is what uid gives us. That is slightly better than PUT in my humble opinion because it sticks to the create semantics of POST rather than resorting to PUT, which is acceptable but not as obvious as POST; but, as I said, it feels like angels and pinheads to me. Also, in the PUT scenario we have to make sure the id in the URL and the one in the body match, which is more moving parts than I like.
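The “moving part” Seref mentions for PUT is a consistency check between the path and the body. A minimal sketch, assuming the body uid is an OBJECT_VERSION_ID string (`<object_id>::<system_id>::<version>`) and the path carries the versioned object uid; the function name is hypothetical.

```python
def check_put_ids(url_versioned_object_uid: str, body: dict) -> None:
    """Reject a PUT whose path id disagrees with the uid in the body (sketch only)."""
    body_uid = body.get("uid", {}).get("value")
    if body_uid is None:
        return  # no uid in the body, so there is nothing to compare against
    object_id = body_uid.split("::", 1)[0]  # drop system id and version
    if object_id != url_versioned_object_uid:
        raise ValueError(f"path id {url_versioned_object_uid} != body id {object_id}")
```

With POST and the uid only in the body, this check disappears, which is part of Seref’s preference.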

The real question I have with setting the ids is: what about versioning? This pops up every now and then and there are discussions about this somewhere here: the uid field is set to a type relatively higher up in the identifier type hierarchy from RM, but in reality it is a more concrete type, set by a convention of Ocean’s first use and everybody else following track (vendor see, vendor do) As in:

First of all, how will we deal with system id and version when pushing a composition from another CDR via REST? If we use the object_version_id, then I remember “:” not being allowed on some frameworks for being a reserved character in a URL (not a problem in POST). Then comes the versioning. What would the server do if you want to replicate the history of a composition (all versions) and you push v1 after you push v2? Assume the DB timed out under heavy write load while trying to grow the db file size (yes, it happens) and the v1 insert failed but v2 worked. Now you have some missing history which you have to fix. Then there are other invariants, such as the contribution and folders, as @stefanspiska mentioned.
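Both problems can be sketched briefly: percent-encoding the `::` separators before putting an object_version_id in a URL, and a server-side guard that refuses v2 while v1 is missing. `VersionHistory` is a hypothetical in-memory stand-in for a CDR, and it assumes simple trunk version numbers (1, 2, 3…), not branched version tree ids.

```python
from urllib.parse import quote


def composition_path(ehr_id: str, version_uid: str) -> str:
    # Encode ':' for frameworks that treat it as reserved in a path segment.
    return f"/v1/ehr/{ehr_id}/composition/{quote(version_uid, safe='')}"


class VersionHistory:
    """Hypothetical guard against replaying versions out of order."""

    def __init__(self) -> None:
        self.latest: dict[str, int] = {}  # object_id -> highest committed version

    def commit(self, version_uid: str) -> None:
        object_id, _system_id, version = version_uid.split("::")
        expected = self.latest.get(object_id, 0) + 1
        if int(version) != expected:
            # e.g. v1 failed under load, v2 arrives: reject instead of leaving a gap
            raise ValueError(f"{object_id}: got v{version}, expected v{expected}")
        self.latest[object_id] = int(version)
```

Whether the server should reject, queue, or accept-and-flag such a gap is exactly the behaviour that would need to be specified.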

Since the id of the resource is in reality overloaded with more information than you can represent with REST’s built-in constructs, as is the case here, we’ll have to define a lot of situations with headers at the minimum, for example the server behaviour when the above out-of-order insert happens.

I’m not writing all this to be purist or pedantic. Maybe I’m missing the point but all of the above is pointing at REST just not having the semantic bandwidth IMHO.

Well, I’m afraid I’m of the opinion that it actually is not easy. I give @thomas.beale a lot of grief when it comes to (over)pragmatism on my part, but I don’t think this is a case where you can cut corners, and I mean you in particular, because then it’ll become “this is how it’s done in openEHR” because Ian McNicoll did it that way. The same goes for other highly visible personas of openEHR, btw.

There’s valuable work and lessons to take from FHIR here, who went with some file exports for bulk data transfer if my memory is correct.

I hope I could make your day once again Ian

ian.mcnicoll · 21 March 2023 11:22

Thanks Seref, - mostly

POST vs. PUT - thanks for the clarification that this is quite a tight decision. We need to make a decision on this regardless of the bulk transfer usage, as it is needed, particularly when constructing links within a contribution, as @stefanspiska mentioned.

And I agree that we need something similar for Contribution, which retains existing identifiers, though I’m not clear if anything breaks if these are re-created.

The way we planned it was basically to replay each original Contribution, and each ascending version, in the correct order, but I can see the value of additional checks, both that the uid does not already exist and that the version is not out of order. As I understand it, retaining the original systemId is not a problem.

In terms of failing commits, for whatever reason, I’d expect a whole Contribution would fail and be logged, whether that means the whole EHR should be deleted is an interesting question.
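The replay Ian describes, with the extra checks, might look like the loop below: contributions in original order, skip any whose uid already exists, and treat each commit as all-or-nothing, logging failures. `commit` is a hypothetical callable that sends one whole contribution to the target CDR and raises on failure; whether a failure should also abort the rest of that EHR is left open, as in the post.

```python
import logging
from collections.abc import Callable, Iterable

log = logging.getLogger("replay")


def replay(contributions: Iterable[dict], commit: Callable[[dict], None],
           existing_uids: set[str]) -> list[str]:
    """Replay contributions in order; return the uids that were not committed."""
    failed = []
    for contribution in contributions:  # assumed sorted by original commit time
        uid = contribution["uid"]["value"]
        if uid in existing_uids:
            failed.append(uid)
            log.warning("contribution %s already exists, skipped", uid)
            continue
        try:
            commit(contribution)  # whole contribution or nothing
        except Exception as exc:
            failed.append(uid)
            log.error("contribution %s failed: %s", uid, exc)
        else:
            existing_uids.add(uid)  # record success for later duplicate checks
    return failed
```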

I completely agree that this is not likely to be a good solution for anything other than small-scale use cases / experimentation, but if nothing else we are forcing the community to address some of these questions re Contributions, version order and fail behaviour, which would equally apply to a more elegant Bulk Import/Export process.

And as a test of whether there are indeed ‘corners being cut’.

sebastian.iancu · 21 March 2023 21:03

Sorry @ian.mcnicoll for late reply (with answer to your question above), but here is the way I see it:

POST: you create a new resource by POSTing against a collection url, so you basically add (create) a composition to the EHR-composition-collection, and rely on the collection “manager” (i.e. EHR) to assign and return you an identifier for your composition. Thus the server-side will assign the uid.
PUT: use it when you want to create a resource and you want to be in charge of (or actually you have requirements regarding) the identification of your composition within the collection. Basically the client side decides the uid.
In both situations the server may refuse to execute the operation (to create the composition) if certain conditions are not fulfilled. Consider here a conflict on uid, a version mismatch, etc.

So, I would say you should use PUT only if you need a specific uid, otherwise use POST.
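Sebastian’s rule of thumb, expressed as a client-side helper: POST to the collection when the server should assign the uid, PUT to the item URL when the client needs a specific one. The function name is hypothetical; the paths follow the `/v1/ehr/{ehr_id}/composition` pattern quoted later in the thread.

```python
from typing import Optional


def choose_request(ehr_id: str, versioned_object_uid: Optional[str] = None) -> tuple[str, str]:
    """Return (method, path): PUT only when the client needs a specific uid."""
    base = f"/v1/ehr/{ehr_id}/composition"
    if versioned_object_uid is None:
        return "POST", base  # server assigns the uid
    return "PUT", f"{base}/{versioned_object_uid}"  # client decides the uid
```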

The bulk import/export might be a use case (although, as mentioned above, not performant enough for replaying/cloning large CDRs); another use case could be creating compositions & an associated folder with or without a contribution (an operation orchestrated by the client side).

This is just HTTP/REST - we might not like it enough, or we might say that it is unhandy or not rich enough semantically, but this is “standard”, and programmers like it (I guess…).
Btw, FHIR also supports it - see FHIR Http 3.1.0.4.1 Update as Create.

sebastian.iancu · 21 March 2023 21:20

I agree there are a lot of things we can discuss on this subject of system ids, versions & identification, and a lot of points where API implementations might go wild with subtle flavors. But on the other hand, for PUT (with a uid), is it not as simple as this:

if no If-Match header is set, and uid is not allocated it will be version=1 (or the one specified), thus a create.
if you need to create a new version, make sure the payload and the headers are right so that operations will be an update (of the previous)?
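The two rules above can be sketched as a server-side decision function. The status codes are illustrative, not normative: the thread leaves the error for a forgotten If-Match open (412, 409 or 400), and 409 is picked here arbitrarily. It also assumes the If-Match value has already been unquoted into a plain version uid.

```python
from http import HTTPStatus
from typing import Optional


def put_outcome(exists: bool, current_version: Optional[str],
                if_match: Optional[str]) -> HTTPStatus:
    """Create-vs-update on PUT, decided by the If-Match header (sketch only).

    exists: whether the versioned object uid from the URL is already allocated.
    current_version: its latest version uid, if it exists.
    if_match: the If-Match value, or None when the header is absent.
    """
    if if_match is None:
        # No If-Match: the client asks for a create.
        return HTTPStatus.CREATED if not exists else HTTPStatus.CONFLICT
    if not exists or if_match != current_version:
        return HTTPStatus.PRECONDITION_FAILED  # nothing to update, or stale version
    return HTTPStatus.OK  # update: the server commits the next version
```

This also covers Ian’s worry below: a “normal” PUT without If-Match on an existing composition gets an error instead of silently doing something else.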

We could ask to rely only on POST to honor a client-side uid, based just on the payload (composition.uid), but that would make the spec unpredictable for a REST API programmer. I’m afraid in that case we’d go against the mainstream - but please correct me if I’m wrong about this practice.
Btw, in case of POST, FHIR ignores the id from the resource - see FHIR Http create.

Seref · 22 March 2023 08:17

Frankly: I don’t know. I don’t think using POST with an id in the body would violate the so called REST principles but it may be non-idiomatic, so these are all grey areas.

Your suggestion re headers is what I was referring to, and thinking about this clearly, I realised it actually follows a design principle from programming and API design I noticed, though I have not seen it named specifically: make the non-obvious harder than the obvious. So by using PUT with well-documented and quite possibly custom headers to define behaviour for edge cases etc., we can support creating resources with caller-set identifiers, versions etc. It won’t be as easy to use as POST, but that’s actually the point.

This works rather well in programming language and API design: 80-90% of use cases are conveniently supported out of the box in terms of syntax/tooling etc., and when you need the remaining cases, you need a strong cup of coffee and some documentation.

Thanks for pointing out what FHIR has done. I respect the effort they’re putting into their work and where possible we should learn from them, so here the argument in favour of PUT becomes stronger for me.

ian.mcnicoll · 22 March 2023 12:40

Thanks Seref / Sebastian,

I’m also gently persuaded towards PUT by knowing that is the current FHIR approach, however …

@Sebastian - are you suggesting that for

PUT https://{baseUrl}/v1/ehr/{ehr_id}/composition/{uid_based_id}

the meaning of the {uid_based_id} is dependent on the existence of the If-Match header? That feels a little tricky to me, as a common problem for newbies will be to forget to add If-Match when trying to do a ‘normal’ PUT on an existing composition.

Also, on @stefanspiska’s question about Contributions: how would that work, as within a Contribution there might be a mix of creates and updates?

{
  "uid": {
    "_type": "UID_BASED_ID",
    "value": "6cb19121-4307-4648-9da0-d62e4d51f19b"
  },
  "versions": [
    {
      "preceding_version_uid": {},
      "signature": "string",
      "lifecycle_state": {},
      "attestations": [],
      "data": {
        "archetype_node_id": "openEHR-EHR-COMPOSITION.encounter.v1",
        "name": {},
        "uid": {},
        "archetype_details": {},
        "language": {},
        "territory": {},
        "category": {},
        "composer": {},
        "context": {},
        "content": []
      },
      "commit_audit": {}
    }
  ]
}
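One way a server could tell creates from updates inside such a Contribution is per version, from `preceding_version_uid`. This is an assumption made for illustration: an empty or missing `preceding_version_uid` is read as a create, anything else as an update of that predecessor. The real RM also carries a change type in `commit_audit`, which this sketch ignores.

```python
def classify_versions(contribution: dict) -> list[tuple[str, str]]:
    """Label each version in a contribution as ("create", "") or ("update", predecessor)."""
    result = []
    for version in contribution.get("versions", []):
        preceding = (version.get("preceding_version_uid") or {}).get("value")
        if preceding:
            result.append(("update", preceding))  # builds on an existing version
        else:
            result.append(("create", ""))  # first version of a new object
    return result
```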

sebastian.iancu · 22 March 2023 21:20

Yes, the missing If-Match header implies that the client requests to create a resource, while its presence implies that the client requests to update a resource (the one identified by the versioned_object_uid from the URL, while the header is just a check of version validity). So when someone wants to update a resource and forgets to add the header (assuming a previous version already exists on the server), they will get an HTTP error (we have to see whether it would be 412, 409 or 400).

The meaning of the PUT operation (create vs update) changes depending on client assumptions (which will have to be set in the headers). But the meaning of the {uid_based_id} is always the same: a versioned_object_uid identifying a VERSIONED_OBJECT in the form of a HIER_OBJECT_ID.

For PUT with a contribution (which is not yet part of the specs): it could be used to create a contribution with a client-generated uid; the rest should work the same as POST. In case a contribution with that uid already exists, or one of the versions could not be committed, it should return a 409 (or something else).
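A toy model of the proposed PUT Contribution behaviour: the client chooses the contribution uid, and the call either stores everything or nothing, returning 409 on a uid clash or when any version cannot be committed. `ContributionStore` is hypothetical and reduces “cannot be committed” to a duplicate version uid; 409 is the code Sebastian suggests, with the caveat that it may change.

```python
from http import HTTPStatus


class ContributionStore:
    """Hypothetical in-memory CDR accepting client-chosen contribution uids."""

    def __init__(self) -> None:
        self.contributions: dict[str, list[str]] = {}
        self.versions: set[str] = set()

    def put(self, uid: str, version_uids: list[str]) -> HTTPStatus:
        if uid in self.contributions:
            return HTTPStatus.CONFLICT  # contribution uid already allocated
        clash = any(v in self.versions for v in version_uids)
        if clash or len(set(version_uids)) != len(version_uids):
            return HTTPStatus.CONFLICT  # a version cannot be committed: store nothing
        self.versions.update(version_uids)
        self.contributions[uid] = list(version_uids)
        return HTTPStatus.CREATED
```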