# `sct` - a fast, free set of local-first file-based SNOMED-CT tools in Rust

**Category:** [General Discussion](https://discourse.openehr.org/c/general-discussion/132)
**Created:** 2026-04-04 22:01 UTC
**Views:** 58
**Replies:** 11
**URL:** https://discourse.openehr.org/t/sct-a-fast-free-set-of-local-first-file-based-snomed-ct-tools-in-rust/11906

---

## Post #1 by @marcusbaw

I've been involved in healthtech and clinical informatics for a fair old while now, and have always found the lack of simple tooling for SNOMED-CT quite frustrating. Everything is either a janky online Term Browser with limited functionality, or it's a REST API which requires you to **fully understand** FHIR, ECL ***and*** SNOMED before you can even start. I have no idea how people learn SNOMED completely in the abstract without anything to actually tinker with and learn from.

In the UK we have [TRUD](https://isd.digital.nhs.uk/trud), which makes even the process of obtaining the RF2 files a challenging and bewildering experience. (Hint: you have to 'Subscribe' before the Download button will be visible :rofl:)

Anyway, rants aside (more in the video though), I decided to build something better and came up with this:

https://github.com/pacharanero/sct

In simple benchmarking it's between 6x and 60x faster than Snowstorm or Ontoserver for all queries I've tried. I'm sure it will be of use to many in the openEHR community.

Here's a video that walks through the main features so far.

https://youtu.be/f-gz-MKtU44

Would love to get feedback from the openEHR community. If you find it useful let me know, if you find bugs let me know, and if you want new features - let me know!

---

## Post #2 by @pablo

Do you know Mark Wardle's tools?

https://github.com/wardle/hermes

---

## Post #3 by @marcusbaw

Hi @pablo - yes, I know Mark well and his work was part inspiration for this. I have struggled with Clojure and the JVM in the past and prefer simpler tools and setups, but `hermes` is still great stuff.
I also did a fairly extensive search to see what else was out there before I built this, but there was a gap in the area of simple, file-based toolsets. I find databases add additional cognitive overhead which files do not.

---

## Post #4 by @pablo

He has a Java wrapper around the Clojure library, which is what I use in EHRServer and Atomik to expand SNOMED expressions in openEHR queries. He actually helped me a lot with the setup, and I was able to report back issues and ideas he later fixed.

---

## Post #5 by @pablo

[quote="marcusbaw, post:3, topic:11906"]
I find databases add additional cognitive overhead which files do not.
[/quote]

With Mark's tools I treat the database as a black box; all I needed to do was follow the setup steps, just a few commands, and that's it. Though I find it difficult to imagine SNOMED CT tooling without a database that can do search, expand expressions, and the rest with good performance, unless you plan to put everything in memory.

---

## Post #6 by @marcusbaw

Search is millisecond-fast on disk with this setup, running on a normal laptop with an NVMe SSD. I'm starting to think the 'database advantage' is minimal now that disks are so fast. But 50+ years of an industry believing that databases are the **only** possible performant technology is a belief that dies hard.

I'll have a go with Hermes to see what it's like in comparison. Part of the reason for building this was simply for me to properly understand SNOMED internals by building something with it.

---

## Post #7 by @pablo

I think, like all tools, databases of different kinds and flavours have their own use cases, though those cases are "general use", meaning the same tools can be used in the same way to solve different problems. For me the database is just a standardization layer which deals with, among other things, data at scale.
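Post #6's point about file-based search on fast disks can be illustrated with a toy NDJSON scan. This is a sketch only: the record shape (`id`, `term`) is hypothetical, not sct's actual NDJSON schema, and a `StringIO` stands in for the file on disk.

```python
import io
import json

# Hypothetical NDJSON records: one JSON object per line.
# Field names here are illustrative, not sct's actual schema.
ndjson = io.StringIO(
    '{"id": 22298006, "term": "Myocardial infarction"}\n'
    '{"id": 38341003, "term": "Hypertensive disorder"}\n'
)

# A linear scan over the lines - roughly the file-based equivalent of
# `jq 'select(.term | test("infarction"; "i"))' concepts.ndjson`.
hits = []
for line in ndjson:
    rec = json.loads(line)
    if "infarction" in rec["term"].lower():
        hits.append(rec)

print(hits[0]["id"])  # 22298006
```

On NVMe storage a scan like this streams sequentially, which is why keyword search over a flat file can feel database-fast for a dataset of SNOMED CT's size.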
With sct, I understand it's not the same case: for instance, the data doesn't grow every day, and the schema is pretty static. So I understand where you are coming from; implementing tools for SNOMED CT is a very constrained domain and files are fine. Though without seeing the implementation, I'm guessing you are dealing with a lot of data in memory.

Of course, if you want to do stuff with LOINC, ICD-10, or even mix that with clinical data, the same solution might not apply (not "general use"), so you might need to develop tools from scratch; that is where a database could help.

It would be nice to have a functional comparison, in terms of performance and available tools, with Hermes and other libraries. I have watched some videos from Mark showing what tools are available in Hermes and the other libs he created around it. For instance, I think he has a simple FHIR terminology server that works on top of Hermes.

Best,
Pablo.

---

## Post #8 by @marcusbaw

[quote="pablo, post:7, topic:11906"]
Though without seeing the implementation I'm guessing you are dealing with a lot of data in memory.
[/quote]

The implementation is open source so you can see it. Nothing is in memory; this is all files on disk. SQLite is probably caching a bit. But the NDJSON part is using `jq` over a file and it's still acceptably fast. Maybe I will add an option to do the operations in memory; I suspect that will make it even faster.

---

## Post #9 by @pablo

[quote="marcusbaw, post:8, topic:11906"]
The implementation is open source so you can see it.
[/quote]

Sure, though I'm not proficient in Rust, so I'm not sure what the code is doing or how it's organized.

---

## Post #10 by @marcusbaw

It converts SNOMED into NDJSON and then SQLite; that's the Rust bit. After that you are literally just executing `sqlite3 ...` commands and there's no Rust involved.

---

## Post #11 by @pablo

Aha! I thought it was file-only and you did the operations over the file directly. There is in fact a database, a small one, but a database anyway.
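Since Post #10 says the query layer after the Rust ingest is plain SQLite, the pattern can be sketched with the standard `sqlite3` module. The `concepts(id, term)` table below is a hypothetical stand-in, not sct's real schema, and the two rows use well-known SNOMED CT concept IDs purely as sample data.

```python
import sqlite3

# ":memory:" gives an in-memory database; swapping in a path such as
# sqlite3.connect("sct.db") is the file-on-disk variant - the queries
# themselves are identical either way.
conn = sqlite3.connect(":memory:")

# Illustrative toy schema, not sct's actual table layout.
conn.execute("CREATE TABLE concepts (id INTEGER PRIMARY KEY, term TEXT)")
conn.executemany(
    "INSERT INTO concepts VALUES (?, ?)",
    [
        (22298006, "Myocardial infarction"),
        (38341003, "Hypertensive disorder"),
    ],
)

row = conn.execute(
    "SELECT term FROM concepts WHERE id = ?", (22298006,)
).fetchone()
print(row[0])  # Myocardial infarction
```

This is also the shape of the in-memory option discussed below: only the connect string changes, so "files on disk" and "everything in memory" are one configuration switch apart.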
About the memory thing: you can configure SQLite to keep the data in memory for faster access, which is especially good if all your data fits in memory.

---

## Post #12 by @marcusbaw

**sct — what's new since the end of March**

Three weeks of fast-moving work. Here's the highlight reel.

### Easier to install

Three ways to get sct now exist where there was only `cargo install`:

- `curl -sSL https://... | sh` installer for macOS/Linux, PowerShell installer for Windows - auto-detects OS/arch, verifies SHA-256 against SHA256SUMS, drops the binary in the right place.
- Homebrew tap and Scoop bucket auto-bumped on every release.
- `cargo binstall sct-rs` for the Rust crowd who want prebuilt binaries without compiling.
- Prebuilt binaries shipped for **Linux** x86_64 + aarch64, **macOS** Intel + Apple Silicon, and **Windows** x86_64.

### New things you can actually do

- `sct lookup` — direct SCTID and CTV3 code lookup with the full concept page.
- `sct lexical` — FTS5 keyword search, with phrase / prefix / boolean operators.
- `sct semantic` — Ollama-backed vector similarity search over Arrow IPC embeddings.
- `sct refset` — list, inspect, and enumerate members of any Simple refset, with end-to-end RF2 ingest support.
- `sct codelist` — build, validate, diff, stat, and export clinical code lists in a YAML-front-matter Markdown format.
- `sct trud` — download SNOMED CT releases straight from NHS TRUD, with SHA-256 verification.
- `sct info` + `sct diff` — inspect any artefact, compare two NDJSONs to see what changed between releases.
- `sct tui` — full-screen interactive terminal explorer.
- `sct gui` — browser-based UI with a D3.js neighbourhood graph visualisation.
- `sct completions` — bash/zsh/fish/PowerShell/elvish.

### More data linked together

- **CTV3 and Read v2 cross-maps** loaded from the UK Monolith RF2 — reverse-lookup legacy codes to SNOMED.
- **Transitive closure tables** for fast subsumption queries.
- **ZIP auto-extraction** — point `sct ndjson` at the release zip and skip the unzip step.

### MCP server matured

- Codelist tools now exposed (build/edit code lists from your LLM client).
- Newline-delimited JSON transport for the 2025-03-26 spec alongside the older Content-Length framing — works with both Claude Desktop and Claude Code 2.x.
- Schema-version validation at startup, so a stale binary against a new DB fails loud, not silent.

### Provenance (this week!)

Every artefact now carries the SNOMED edition, release date, full release identifier, and the `sct` version that built it — captured in NDJSON headers, SQLite metadata tables, and Arrow schema metadata. `sct info` displays it, query commands show a footer (TTY-aware so pipes stay clean), a `--provenance` flag overrides the default, MCP advertises it on every handshake and embeds it in `snomed_concept` responses, and `sct codelist add` auto-fills the `snomed_release` frontmatter from the DB.

### Maintenance, bugfixes & security

- Crate split into library + binary so integration tests can live properly under `tests/`.
- Pre-commit hook running `cargo fmt --check` + `cargo clippy --all-targets -- -D warnings`.
- Replaced the unmaintained `serde_yml` with `serde_yaml_ng`; bumped `ratatui` to 0.30 — clears three open RUSTSEC advisories.
- Clean SIGPIPE exit so `sct … | head` no longer panics.
- Configurable one-line concept format shared across `lookup`/`lexical`/`refset`/`semantic`.

### Docs

- Walkthrough split into focused per-topic pages, hosted on the GitHub Pages site (Zensical).
- Per-command reference in `docs/commands/`.
- Devcontainer with `sqlite3`, `duckdb`, `jq`, `ripgrep`, Python, and Ollama pre-installed.

---

**Canonical:** https://discourse.openehr.org/t/sct-a-fast-free-set-of-local-first-file-based-snomed-ct-tools-in-rust/11906