`sct` - a fast, free set of local-first file-based SNOMED-CT tools in Rust

I’ve been involved in healthtech and clinical informatics for a fair old while now, and have always found the lack of simple tooling for SNOMED-CT quite frustrating. Everything is either a janky online Term Browser with limited functionality, or it’s a REST API which requires you to fully understand FHIR, ECL and SNOMED before you can even start.

I have no idea how people learn SNOMED completely in the abstract without anything to actually tinker with and learn from. In the UK we have the TRUD which makes even the process of obtaining the RF2 files a challenging and bewildering experience. (Hint: you have to ‘Subscribe’ before the Download button will be visible 🤣)

Anyway, rants aside (more in the video though), I decided to build something better and came up with this:

In simple benchmarking it’s between 6x and 60x faster than Snowstorm or Ontoserver for all queries I’ve tried.

I’m sure it will be of use to many in the openEHR community. Here’s a video that walks through the main features so far.

Would love to get feedback from the openEHR community. If you find it useful let me know, if you find bugs let me know, and if you want new features - let me know!


Do you know Mark Wardle’s tools? wardle/hermes on GitHub: a library and microservice implementing the health and care terminology SNOMED CT, with support for cross-maps, inference, fast full-text search, autocompletion, compositional grammar and the expression constraint language.


Hi @pablo - yes I know Mark well and his work was part of the inspiration for this. I have struggled with Clojure and the JVM in the past and prefer simpler tools and setups, but hermes is still great stuff.

I also did a fairly extensive search to see what else was out there before I built this, but there was a gap in the area of simple, file-based toolsets. I find databases add additional cognitive overhead which files do not.

He has a Java wrapper around the Clojure library, which is what I use in EHRServer and Atomik to expand SNOMED expressions in openEHR queries. He actually helped me a lot with the setup, and I was able to report back issues and ideas, which he later addressed.


With Mark’s tools I treat the database as a black box. All I needed to do was follow the setup phase, just a few commands, and that’s it.

Though I find it difficult to imagine sct tools without a database that can do search, expand expressions, and other stuff with sct and still have good performance, unless you plan to put everything in memory.


Search is millisecond fast on disk with this setup, running it on a normal laptop that has an NVMe SSD. I’m starting to think the ‘database advantage’ is minimal now that disks are so fast.

But after 50+ years of the industry believing that databases are the only possible performant technology, that belief dies hard.

I’ll have a go with Hermes to see what it’s like in comparison.

Part of the reason for building this was simply for me to properly understand SNOMED internals by building something using it.


I think that, like all tools, databases of different kinds and flavors have their own use cases, though those cases are “general use”, meaning the same tools can be used in the same way to solve different problems. For me the database is just a standardization layer which deals with, among other things, data at scale.

With sct, I understand it’s not the same case: for instance, the data doesn’t grow every day, and the schema is pretty static. So I understand where you are coming from: implementing tools for sct is a very constrained domain and files are fine. Though without seeing the implementation, I’m guessing you are dealing with a lot of data in memory. Of course if you want to do stuff with LOINC, ICD10, or even mix that with clinical data, the same solution might not apply (not “general use”), so you might need to develop tools from scratch; that is where a database could help.

It would be nice to have a functional comparison in terms of performance and tools available with Hermes and other libraries. I have watched some videos from Mark showing what tools are available in Hermes and the other libs he created around it. For instance I think he has a simple FHIR TS that works on top of Hermes.

Best,
Pablo.

The implementation is open source so you can see it.

Nothing is in memory, this is all files on disk. SQLite is probably caching a bit. But the NDJSON part is using jq over a file and it’s still acceptably fast.
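To illustrate the "nothing in memory" point, here is a minimal Python sketch of scanning an NDJSON file one line at a time, which is roughly what a jq filter over the file does. The file name and the field names (`id`, `term`) are assumptions for illustration only, not necessarily the layout sct produces.

```python
import json
import os
import tempfile


def scan_ndjson(path, needle):
    """Yield records whose 'term' contains needle, reading one line at a time."""
    needle = needle.lower()
    with open(path, encoding="utf-8") as fh:
        for line in fh:  # streamed from disk: only the current line is held
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            if needle in record.get("term", "").lower():
                yield record


# Hypothetical sample records; the field names are illustrative only.
SAMPLE = [
    {"id": "22298006", "term": "Myocardial infarction"},
    {"id": "195967001", "term": "Asthma"},
    {"id": "73211009", "term": "Diabetes mellitus"},
]


def write_sample(path):
    """Write the sample records as NDJSON, one JSON object per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for rec in SAMPLE:
            fh.write(json.dumps(rec) + "\n")


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "concepts.ndjson")
    write_sample(path)
    for hit in scan_ndjson(path, "infarction"):
        print(hit["id"], hit["term"])
```

A linear scan like this touches every line, so its cost grows with file size; it stays acceptable only because sequential reads from a fast disk are cheap.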

Maybe I will add an option to do the operations in memory, I suspect that will make it even faster.

Sure, though I’m not proficient in Rust, so I’m not sure what the code is doing or how it’s organized.

It converts SNOMED into NDJSON and then SQLite, that’s the Rust bit. After that you are just literally executing sqlite3 ... commands and there’s no Rust involved.
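For readers who don't know Rust, the NDJSON-to-SQLite step can be sketched in a few lines of Python using only the standard library. This is an illustration of the general idea, not sct's actual code: the table name `concepts`, its columns, and the sample records are all assumptions.

```python
import json
import sqlite3

# Hypothetical NDJSON input; field names are illustrative only.
NDJSON_LINES = [
    '{"id": "22298006", "term": "Myocardial infarction"}',
    '{"id": "195967001", "term": "Asthma"}',
    '{"id": "73211009", "term": "Diabetes mellitus"}',
]


def load_ndjson(conn, lines):
    """Create an assumed 'concepts' table and insert one row per NDJSON line."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS concepts ("
        "id TEXT PRIMARY KEY, term TEXT NOT NULL)"
    )
    # A generator keeps the load streaming: rows are parsed as they are inserted.
    rows = (
        (rec["id"], rec["term"])
        for rec in (json.loads(line) for line in lines if line.strip())
    )
    conn.executemany(
        "INSERT OR REPLACE INTO concepts (id, term) VALUES (?, ?)", rows
    )
    conn.commit()


def search(conn, text):
    """Substring search on term; SQLite's LIKE ignores case for ASCII letters."""
    cur = conn.execute(
        "SELECT id, term FROM concepts WHERE term LIKE ? ORDER BY term",
        (f"%{text}%",),
    )
    return cur.fetchall()


if __name__ == "__main__":
    # ":memory:" keeps the demo self-contained; a file path gives an on-disk DB.
    conn = sqlite3.connect(":memory:")
    load_ndjson(conn, NDJSON_LINES)
    print(search(conn, "infarction"))
```

Once the data is in a SQLite file, the same `SELECT` can be run directly from the `sqlite3` command-line shell, which is the "no Rust involved" part.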