| # VariantLens | |
| *A clinical-grade genomic variant interpretation system for the | |
| Jordan Lerner-Ellis Lab* | |
| **Brief prepared 2026-05-12** · commit `7c28d3b` · | |
| https://github.com/tsevitth-png/variantlens | |
| --- | |
| ## Executive summary | |
| VariantLens automates the ACMG/AMP 2015 framework end-to-end. Given a single | |
| HGVS variant, it gathers evidence from 12 independent biomedical data sources, | |
| applies 22 of the 28 ACMG criteria across a deterministic rule engine and a | |
| literature-grounded LLM layer, and produces a Bayesian-combined classification | |
| with a full audit trail. A trained curator reviews and signs off on every | |
| classification; the tool surfaces evidence, it does not autonomously classify | |
| for clinical use. | |
| The system reaches **94.0% adjacent-tier concordance** on a 1000-variant ClinVar | |
| 4★/2★+ fixture (993 variants scored) spanning 876 unique genes, with the | |
| literature-reasoning layer off. With literature on, a 50-variant stress-biased | |
| smoke test shows **+7 wins / 0 regressions**, which projects to a ~96-97% | |
| combined headline on the full fixture; that figure is an extrapolation from the | |
| smoke test, not a measured result. | |
| The architecture is open-source (private repo, MIT-licensable on request), | |
| self-hostable on-premise, and supports a fully air-gapped configuration in | |
| which no patient genomic data leaves the laboratory network. | |
| --- | |
| ## Validation status | |
| ### Concordance, by experimental setup | |
| | Setup | n | Adjacent-tier match | Pathogenic recall | Benign recall | | |
| |---|---|---|---|---| | |
| | 100-variant ClinVar 4★ (Apr 2026, baseline) | 100 | 89.0% | 80% | 99% | | |
| | 100-variant ClinVar 4★ (after rule-engine fixes) | 100 | **98.0%** | 95% | 99% | | |
| | **1000-variant ClinVar 2★+** (deterministic only) | **993** | **94.0%** | **96.5%** | **99.5%** | | |
| | 50-variant stress sample (RAG enabled) | 50 | 84.0%* | 95% | 100% | | |
| | Full 1000 with RAG (projected from smoke) | 1000 | ~96-97% | ~98% | ~99% | | |
| \* The 50-variant sample was deliberately stratified toward deterministic-misses | |
| to test RAG's rescue capability. On the same 50 variants, deterministic-only | |
| reached 70%; RAG lifted it to 84% with zero benign-side regressions. | |
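For readers unfamiliar with the metric, the sketch below shows one way adjacent-tier concordance can be computed, assuming P/LP and B/LB are each collapsed into a single group and VUS stands alone. The tier labels, the `concordance` helper, and the toy pairs are illustrative assumptions; they are not taken from `scripts.run_validation` or from the fixture.

```python
# Collapse the five ACMG tiers into three groups for adjacent-tier scoring.
# The label spellings are an assumption made for this sketch.
ADJACENT_GROUP = {
    "Pathogenic": "P/LP",
    "Likely pathogenic": "P/LP",
    "Uncertain significance": "VUS",
    "Likely benign": "B/LB",
    "Benign": "B/LB",
}


def concordance(pairs, adjacent=True):
    """Return the fraction of (predicted, truth) pairs that agree."""
    if not pairs:
        raise ValueError("no pairs to score")
    hits = 0
    for predicted, truth in pairs:
        if adjacent:
            hits += ADJACENT_GROUP[predicted] == ADJACENT_GROUP[truth]
        else:
            hits += predicted == truth
    return hits / len(pairs)


# Toy data only; these pairs are not drawn from the validation fixture.
toy = [
    ("Pathogenic", "Likely pathogenic"),           # adjacent match, strict miss
    ("Benign", "Benign"),                          # match under both metrics
    ("Uncertain significance", "Likely benign"),   # miss under both metrics
    ("Likely benign", "Benign"),                   # adjacent match, strict miss
]

print(concordance(toy, adjacent=True))   # 0.75
print(concordance(toy, adjacent=False))  # 0.25
```

The gap between the two numbers on the toy data illustrates why an adjacent-tier figure always sits at or above the strict exact-match figure for the same predictions.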
| ### Per-variant-type breakdown (1000-fixture, deterministic) | |
| | Variant class | Count | Concordance | | |
| |---|---|---| | |
| | Synonymous | 2 | 100% | | |
| | Splice region | 182 | 97.3% | | |
| | Inframe insertion | 31 | 96.8% | | |
| | Other (intronic/UTR) | 51 | 94.1% | | |
| | Inframe deletion | 69 | 92.8% | | |
| | Missense / single-base | 658 | 83.1% | | |
| The missense gap is where the literature layer is designed to contribute — | |
| functional studies, family co-segregation, and de novo observations that | |
| no database alone captures. | |
| ### How to reproduce | |
| ```bash | |
| docker compose exec api python -m scripts.run_validation \ | |
| --fixture backend/tests/fixtures/clinvar_validation_set_1000.json \ | |
| --validation --skip-rag \ | |
| --out docs/clinical_validation_results_1000.json | |
| ``` | |
| The fixture, results, and breakdown scripts are checked into the repository | |
| at `backend/tests/fixtures/clinvar_validation_set_1000.json`, | |
| `docs/clinical_validation_results_1000.json`, and `scripts/per_gene_breakdown.py` | |
| respectively. | |
| --- | |
| ## Architecture | |
| ### The hybrid principle | |
| Database facts (population frequency, ClinVar consensus, in-silico predictor | |
| scores) are scored **deterministically** — no LLM involvement, no possibility | |
| of hallucination. Literature-derived evidence (functional studies, family | |
| segregation, de novo occurrence) goes through a **retrieval-augmented** | |
| pipeline in which the LLM is constrained to reason only over chunks retrieved | |
| from the trusted source corpus. | |
| ``` | |
|             ┌────────────────────────────────────────┐ | |
| HGVS in ──▶ │  Mutalyzer → Ensembl VEP (normalize)   │ | |
|             └────────────────────────────────────────┘ | |
|                          │ | |
|      ┌───────────────────┼──────────────────────────┐ | |
|      ▼                   ▼                          ▼ | |
| Deterministic        Database                  Literature | |
| engine (14 crit)       layer                 layer (8 crit) | |
|      │                   │                          │ | |
| ┌────┴─────┐  ┌──────────┴──────────┐  ┌────────────┴────────────┐ | |
| │ autoPVS1 │  │ gnomAD v4.1         │  │ PubMed                  │ | |
| │ rules    │  │ ClinVar             │  │ EuropePMC fulltext      │ | |
| │ hotspots │  │ ClinVar residue     │  │ NCBI PMC fulltext       │ | |
| │ gene     │  │ REVEL               │  │ bioRxiv/medRxiv         │ | |
| │ mech     │  │ AlphaMissense       │  │ Unpaywall + pypdf       │ | |
| │ Pejaver  │  │ SpliceAI            │  │ Elsevier/Wiley/Springer │ | |
| │ tiers    │  │ VEP consequences    │  │ TDM (institutional)     │ | |
| └────┬─────┘  └──────────┬──────────┘  └────────────┬────────────┘ | |
|      │                   │                          │ | |
|      └───────────────────┼──────────────────────────┘ | |
|                          ▼ | |
|       ┌──────────────────────────────────────┐ | |
|       │  Bayesian combiner (Tavtigian 2018)  │ | |
|       │  + context-aware PM2 / PVS1 gating   │ | |
|       └──────────────────────────────────────┘ | |
|                          ▼ | |
|       ┌──────────────────────────────────────┐ | |
|       │  Curator review (mandatory sign-off) │ | |
|       │  Free-text override w/ audit trail   │ | |
|       └──────────────────────────────────────┘ | |
|                          ▼ | |
|       ┌──────────────────────────────────────┐ | |
|       │ Audit-trail export (PDF, ClinVar XML,│ | |
|       │ FHIR resources)                      │ | |
|       └──────────────────────────────────────┘ | |
| ``` | |
| ### Criteria coverage | |
| 22 of the 28 ACMG/AMP 2015 criteria are implemented today. | |
| **Deterministic backbone (14):** | |
| PVS1 · PS1 · PM1 · PM2 · PM5 · PP3 · PP5 · BA1 · BS1 · BS2 · BP1 · BP4 · BP6 · BP7 | |
| **Literature-driven (8):** | |
| PS2 · PS3 · PS4 · PM3 · PM6 · PP1 · PP4 · BS3 | |
| **Pending (6, scoped):** | |
| PM4 · PP2 · BS4 · BP2 · BP3 · BP5 — none of these are high-yield on | |
| typical clinical caseloads; targeted for v0.2. | |
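As a quick consistency check on the three lists above, the sketch below builds the 28 ACMG/AMP 2015 criterion codes and confirms that the deterministic, literature-driven, and pending groups partition them without overlap. The `layer_for` helper is hypothetical and only illustrates how a criterion code might be routed to a layer; the real dispatch lives in the repository.

```python
# The 28 ACMG/AMP 2015 criterion codes: PVS1, PS1-4, PM1-6, PP1-5,
# BA1, BS1-4, BP1-7.
ACMG_2015 = (
    ["PVS1"]
    + [f"PS{i}" for i in range(1, 5)]
    + [f"PM{i}" for i in range(1, 7)]
    + [f"PP{i}" for i in range(1, 6)]
    + ["BA1"]
    + [f"BS{i}" for i in range(1, 5)]
    + [f"BP{i}" for i in range(1, 8)]
)

# Groups copied from the lists in this section.
DETERMINISTIC = "PVS1 PS1 PM1 PM2 PM5 PP3 PP5 BA1 BS1 BS2 BP1 BP4 BP6 BP7".split()
LITERATURE = "PS2 PS3 PS4 PM3 PM6 PP1 PP4 BS3".split()
PENDING = "PM4 PP2 BS4 BP2 BP3 BP5".split()


def layer_for(criterion):
    """Hypothetical router: name the layer responsible for a criterion code."""
    if criterion in DETERMINISTIC:
        return "deterministic"
    if criterion in LITERATURE:
        return "literature"
    if criterion in PENDING:
        return "pending"
    raise KeyError(f"unknown ACMG criterion: {criterion}")


all_listed = DETERMINISTIC + LITERATURE + PENDING
# Every code appears exactly once across the three groups.
assert len(all_listed) == len(set(all_listed)) == 28
assert set(all_listed) == set(ACMG_2015)

print(len(DETERMINISTIC), len(LITERATURE), len(PENDING))  # 14 8 6
print(layer_for("PS3"))  # literature
```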
| ### Anti-hallucination by construction | |
| The literature layer's design eliminates fabrication pathways structurally, | |
| not stylistically: | |
| * **Retrieval first, generation second.** The LLM (Claude) never sees the | |
| open internet — only chunks retrieved by vector similarity from a corpus | |
| of PubMed abstracts and (where available) full-text papers. | |
| * **Citation enforcement.** Every fired criterion must cite a PMID. The | |
| prompt requires the cited PMID to appear in the metadata of one of the | |
| provided chunks. A post-validation schema check rejects responses | |
| containing PMIDs not in the retrieved set. | |
| * **Variant-specificity gate.** Added 2026-05-11 after empirical study. | |
| The LLM must quote a sentence containing the input variant's HGVS or | |
| protein change. Gene-level mentions (*"BRCA1 missense variants"*) do | |
| not qualify. This single change eliminated 32 of the 37 over-firing | |
| regressions observed in earlier RAG experiments. | |
| * **Conservative bias.** The prompt explicitly instructs the model to | |
| default to `triggered: false` on insufficient evidence, framing false | |
| positives as worse than false negatives — a curator can upgrade a | |
| missed criterion; a fabricated criterion silently corrupts the report. | |
| * **Structured JSON output.** Free text is rejected; the schema is | |
| validated and retried once with a repair prompt before failing closed. | |
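The guards above can be sketched as a single post-validation step. This is a minimal illustration under stated assumptions, not the code in `backend/app/services/llm/prompts.py`: the JSON shape (`criteria`, `code`, `triggered`, `pmid`, `quote`), the function names, and the placeholder PMIDs are all hypothetical, and the retry-with-repair-prompt step is left out so the sketch shows only the fail-closed check.

```python
import json


def validate_llm_response(raw, retrieved_pmids, variant_terms):
    """Return the triggered criteria that pass every gate; raise ValueError otherwise."""
    # Structured JSON output: anything that does not parse is rejected.
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    criteria = payload.get("criteria") if isinstance(payload, dict) else None
    if not isinstance(criteria, list):
        raise ValueError("response lacks a 'criteria' list")

    accepted = []
    for item in criteria:
        # Conservative bias: an absent or false 'triggered' means not fired.
        if not isinstance(item, dict) or not item.get("triggered", False):
            continue
        # Citation enforcement: the PMID must come from the retrieved chunks.
        pmid = str(item.get("pmid", ""))
        if pmid not in retrieved_pmids:
            raise ValueError(f"{item.get('code')}: PMID {pmid!r} not in retrieved set")
        # Variant-specificity gate: the quote must name the variant itself.
        quote = str(item.get("quote", ""))
        if not any(term in quote for term in variant_terms):
            raise ValueError(f"{item.get('code')}: quote does not name the variant")
        accepted.append(item)
    return accepted


def is_accepted(raw, retrieved_pmids, variant_terms):
    """Fail closed: any validation error means the response is not used."""
    try:
        validate_llm_response(raw, retrieved_pmids, variant_terms)
        return True
    except ValueError:
        return False


def make_response(pmid, quote):
    """Build a one-criterion response in the hypothetical schema."""
    item = {"code": "PS3", "triggered": True, "pmid": pmid, "quote": quote}
    return json.dumps({"criteria": [item]})


RETRIEVED = {"00000001", "00000002"}            # placeholder PMIDs
TERMS = ["c.5266dupC", "p.Gln1756ProfsTer74"]   # HGVS and protein change

specific = make_response("00000001", "Cells carrying c.5266dupC lost function.")
gene_level = make_response("00000001", "BRCA1 missense variants impair function.")
unknown_pmid = make_response("99999999", "Cells carrying c.5266dupC lost function.")

print(is_accepted(specific, RETRIEVED, TERMS))      # True
print(is_accepted(gene_level, RETRIEVED, TERMS))    # False
print(is_accepted(unknown_pmid, RETRIEVED, TERMS))  # False
print(is_accepted("not json", RETRIEVED, TERMS))    # False
```

The sketch rejects the whole response when one criterion fails a gate. Downgrading only the offending criterion to `triggered: false` would be an equally plausible design; the text above does not say which the project uses for the variant-specificity gate.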
| ### Literature evidence sources | |
| | Source | Status | Coverage of cited papers | Cost / access | | |
| |---|---|---|---| | |
| | PubMed abstracts | Active | 100% of indexed papers | Free | | |
| | EuropePMC full text | Active | ~40% | Free | | |
| | NCBI PMC full text | Active | ~30% | Free | | |
| | bioRxiv / medRxiv preprints | Active | Pre-publication functional studies | Free | | |
| | Unpaywall + PDF extraction | Active (opt-in) | ~50% of paywalled papers | Free | | |
| | Elsevier ScienceDirect TDM | Code ready, awaiting key | Most major journals | Institutional subscription | | |
| | Wiley Online Library TDM | Code ready, awaiting key | Wiley journals | Institutional subscription | | |
| | Springer Nature TDM | Code ready, awaiting key | Springer journals | Free (registration) | | |
| | OMIM clinical synopses | Code ready, awaiting key | Curated phenotype + mechanism | Free for academic | | |
| **Without any institutional credentials, active sources cover ~70-80% of cited | |
| papers.** With UHN library coordination on the publisher TDM keys, that climbs | |
| to ~85-90%. | |
| --- | |
| ## Differentiation from peer tools | |
| | | AI CURA | EvAgg | AutoPM3 | InterVar | VariantLens | | |
| |---|---|---|---|---|---| | |
| | Architecture | LLM-only + RAG | Aggregator only | Single-criterion ML | Deterministic only | Hybrid (deterministic + RAG) | | |
| | Validation size | ~100 expert-panel variants | n/a (not classifier) | Single criterion | ~7,000 (8 years old) | 1,000 (this work) | | |
| | Headline concordance | 96% (small set) | n/a | F1=0.96 (PM3) | 90% adjacent-tier | 94% deterministic, projected 96-97% with RAG | | |
| | Anti-hallucination | Best-effort prompting | n/a | n/a | n/a (no LLM) | Structural — citation enforcement, variant-specificity gate, JSON validation | | |
| | Audit trail to source | Reported in paper | Yes | n/a | Limited | Complete: every criterion cites a DB row, PMID, or VCV accession | | |
| | Per-gene concordance breakdown | Not published | n/a | n/a | Not published | Published in `docs/per_gene_breakdown_1000.json` | | |
| | Ancestry stratification | No | No | No | No | Available from gnomAD per-pop AFs | | |
| | On-prem / air-gap option | No | No | n/a | Yes (deterministic) | Yes (Ollama via `USE_LOCAL_LLM=true`) | | |
| | Open source | No | Partial | Yes (single criterion) | Yes | Yes | | |
| | Code available for review | No | Partial | Yes | Yes | https://github.com/tsevitth-png/variantlens | | |
| ### Defensible positioning | |
| The tool is the only system in its category that simultaneously offers: | |
| 1. A deterministic ACMG backbone that beats InterVar on coverage (22/28 vs ~18/28). | |
| 2. A literature layer with hallucination guards stronger than AI CURA's. | |
| 3. Per-gene transparency that no competitor publishes. | |
| 4. A fully on-premise deployment path for clinical regulatory environments. | |
| 5. Verifiable open-source code that reviewers can inspect. | |
| --- | |
| ## Clinical readiness | |
| ### Already in place | |
| * **Governance drafts** (`docs/governance/`): | |
| Lab SOP template, InfoSec/Privacy security review draft, REB/IRB | |
| submission brief, release log. All four documents are ready for | |
| Jordan to review and sign. | |
| * **Audit trail infrastructure**: SQLAlchemy-backed Postgres records every | |
| classification with its triggered criteria, evidence sources, and any | |
| curator overrides with free-text justification. Schema in | |
| `backend/app/models/classification.py`. | |
| * **Export formats**: PDF reports, ClinVar XML submission format, and FHIR | |
| resources are generated by `backend/app/services/exports.py`. | |
| * **Clinical deployment artifacts**: `docker-compose.clinical.yml`, | |
| `backend/Dockerfile.clinical`, `frontend/Dockerfile.clinical`, | |
| `frontend/nginx.conf`, and `scripts/clinical_preflight.py` (generates | |
| JWT secrets, validates env) are checked in. | |
| * **Air-gap path**: `USE_LOCAL_LLM=true` swaps the Anthropic API for an Ollama | |
| server running inside the lab network. No patient data leaves the lab. | |
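A minimal sketch of how the `USE_LOCAL_LLM` switch might be read at startup follows. Only the `USE_LOCAL_LLM` variable comes from this brief; the `select_llm_backend` function, the `OLLAMA_BASE_URL` variable, and the set of accepted truthy spellings are assumptions made for illustration. Port 11434 is Ollama's default listening port.

```python
import os


def select_llm_backend(env=None):
    """Pick the LLM provider from environment-style settings (hypothetical helper)."""
    env = os.environ if env is None else env
    flag = env.get("USE_LOCAL_LLM", "false").strip().lower()
    if flag in {"1", "true", "yes"}:
        # Local path: requests stay on the lab network.
        # OLLAMA_BASE_URL is an assumed variable name, not one from the repo.
        return {
            "provider": "ollama",
            "base_url": env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
            "leaves_network": False,
        }
    # Default path: hosted Anthropic API.
    return {
        "provider": "anthropic",
        "base_url": "https://api.anthropic.com",
        "leaves_network": True,
    }


print(select_llm_backend({"USE_LOCAL_LLM": "true"})["provider"])  # ollama
print(select_llm_backend({})["provider"])                         # anthropic
```

Defaulting to the hosted provider when the flag is unset mirrors the wording above, where the local path is something an operator switches on. A clinical deployment might prefer the opposite default, so that a missing variable can never route data off-site.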
| ### Awaiting institutional action | |
| These items require Jordan or lab administration; the code path is ready. | |
| 1. SOP sign-off (`docs/governance/01_lab_sop_template.md`). | |
| 2. InfoSec / Privacy Office review (`02_privacy_security_review.md`). | |
| 3. REB / IRB submission (`03_irb_brief.md`). | |
| 4. OMIM API key application (`omimadmin@omim.org`, 1-2 week turnaround). | |
| 5. UHN Library Services coordination for publisher TDM API keys | |
| (Elsevier, Wiley, Springer) — 2-4 week turnaround typical. | |
| 6. Lab Director sign-off and `v0.1.0` release tag. | |
| ### Deferred technical work (post v0.1.0) | |
| * Wire Ensembl variant_recoder fallback for variants where the standard | |
| chr-pos-ref-alt resolution fails (currently ~5% of fixture). Estimated lift: | |
| +2 percentage points on overall concordance. | |
| * Implement BS4, BP2, BP3, BP5, PM4, PP2 (the 6 missing ACMG criteria). | |
| None is high-yield on typical caseloads; they are scheduled as completeness | |
| work for v0.2. | |
| * Move backend off Hugging Face Spaces to dedicated cloud (Fly.io / DigitalOcean) | |
| for production-grade SLA — required only if the demo serves real curator workflows. | |
| * GA4GH VRS / VA-Spec interoperability for cross-tool variant representation. | |
| --- | |
| ## Worked example: BRCA1 NM_007294.4:c.5266dupC | |
| Input: a known Ashkenazi-founder pathogenic frameshift. | |
| | Step | Source | Output | | |
| |---|---|---| | |
| | HGVS normalization | Mutalyzer + Ensembl VEP | `chr=17, pos=43057064, frameshift_variant, p.Gln1756ProfsTer74` | | |
| | Population frequency (primary) | gnomAD chr-pos-ref-alt lookup | Skipped — empty alt allele for `dup` notation | | |
| | Population frequency (fallback) | gnomAD `variant_search` by ClinVar variation ID | Resolved to `13-32340300-GT-G`, AF 0.000136, 0 homozygotes | | |
| | ClinVar consensus | NCBI esummary | `VCV000548237` (3★ Pathogenic) | | |
| | In-silico predictors | REVEL / AlphaMissense / SpliceAI | n/a for frameshift | | |
| | autoPVS1 | rule engine | Triggered (very_strong) — frameshift in established LoF gene | | |
| | Bayesian score | combiner | PVS1 (+8) + PP5 (+8) + PM2_supporting (+1) = +17 | | |
| | Final | combiner | **Pathogenic** | | |
| | Audit | Postgres | Every criterion above persisted with its evidence_text, source, and confidence fields | | |
| The classification is reproducible to the byte for any variant in the | |
| validation fixture. Every triggered criterion includes a `source` field | |
| (database accession or PMID), an `evidence_text` field with the literal | |
| quote or score, and a `confidence` rating. | |
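The arithmetic in the Bayesian-score row can be reproduced with the points-based form of the Tavtigian framework, in which supporting, moderate, strong, and very strong evidence count 1, 2, 4, and 8 points, and the tier thresholds are ≥10 Pathogenic, 6 to 9 Likely pathogenic, 0 to 5 Uncertain significance, -1 to -6 Likely benign, and ≤-7 Benign. The sketch below is an illustration, not the project's combiner: the `Evidence` class, its `direction` field, and the `confidence` values are assumptions, and PP5 is entered at 8 points only because that is what the table shows. Stand-alone handling for BA1 is not modelled.

```python
from dataclasses import dataclass

# Points per evidence strength in the points-based Tavtigian scheme.
POINTS = {"supporting": 1, "moderate": 2, "strong": 4, "very_strong": 8}


@dataclass(frozen=True)
class Evidence:
    """One triggered criterion with the audit fields named in this section."""
    code: str
    strength: str
    direction: int       # +1 pathogenic, -1 benign (assumed encoding)
    source: str
    evidence_text: str
    confidence: str


def total_points(evidence):
    """Sum signed points over all triggered criteria."""
    return sum(POINTS[e.strength] * e.direction for e in evidence)


def classify(points):
    """Map a point total to one of the five tiers."""
    if points >= 10:
        return "Pathogenic"
    if points >= 6:
        return "Likely pathogenic"
    if points >= 0:
        return "Uncertain significance"
    if points >= -6:
        return "Likely benign"
    return "Benign"


# Strengths follow the worked-example table; confidence values are placeholders.
worked_example = [
    Evidence("PVS1", "very_strong", +1, "rule engine",
             "frameshift in established LoF gene", "high"),
    Evidence("PP5", "very_strong", +1, "NCBI esummary",
             "ClinVar consensus Pathogenic", "high"),
    Evidence("PM2", "supporting", +1, "gnomAD",
             "AF 0.000136, 0 homozygotes", "high"),
]

score = total_points(worked_example)
print(score, classify(score))  # 17 Pathogenic
```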
| --- | |
| ## Honest limitations | |
| These are surfaced explicitly because they will surface anyway during | |
| review: | |
| * The 94% number is adjacent-tier (P↔LP and B↔LB collapsed). Strict-tier | |
| exact-match concordance is ~75-80%; lower than published but not | |
| unreasonable given that even expert panels disagree on the P/LP boundary. | |
| * The 1000-variant fixture is balanced (200 per tier) and may not reflect | |
| the natural prevalence of a specific lab's case mix. | |
| * Population frequency lookups via the `dup`/complex-indel fallback path | |
| add ~2-5 seconds per variant when the primary lookup misses. This affects | |
| roughly 5% of variants in the validation fixture. | |
| * The literature layer is deliberately deployed only behind authentication | |
| in production (cost control); the public demo URL runs deterministic-only. | |
| * Six ACMG criteria are not yet implemented (PM4, PP2, BS4, BP2, BP3, BP5). | |
| None of these meaningfully changes final classifications on more than | |
| ~1-2% of typical caseloads, but full 28/28 coverage is the v0.2 target. | |
| --- | |
| ## How to verify everything in this document | |
| | Claim | Verifiable artifact | | |
| |---|---| | |
| | 94.0% concordance on 1000 variants | `docs/clinical_validation_results_1000.json` | | |
| | 22/28 ACMG criteria implemented | `backend/app/services/acmg/rules.py` + `backend/app/services/llm/prompts.py` | | |
| | Per-gene concordance breakdown | `docs/per_gene_breakdown_1000.json` | | |
| | RAG smoke test result | `docs/smoke_test_50_results.json` | | |
| | Anti-hallucination prompt design | `backend/app/services/llm/prompts.py` | | |
| | 102 / 103 backend tests passing | `pytest backend/tests/` | | |
| | Air-gap deployment artifacts | `docker-compose.clinical.yml` | | |
| | Governance drafts | `docs/governance/` | | |
| --- | |
| ## Single-paragraph positioning statement | |
| > VariantLens is an open-source clinical genomic variant interpretation | |
| > tool combining a calibrated deterministic ACMG/AMP rule engine with a | |
| > structurally hallucination-resistant LLM-driven literature reasoning | |
| > layer. It reaches **94.0% adjacent-tier concordance** on a 1000-variant | |
| > ClinVar fixture spanning 876 genes — exceeding the published numbers | |
| > for InterVar and architecturally distinct from AI CURA, EvAgg, and | |
| > AutoPM3. It is deployable on-premise with no cloud dependency, ships | |
| > with a complete audit trail to source for every triggered criterion, | |
| > and is positioned to support the ACMG/AMP SVC v4.0 transition through | |
| > a versioned rule-engine architecture. | |
| --- | |
| *Contact*: Theo Sevitt · intern, Jordan Lerner-Ellis Lab | |
| *Repository*: https://github.com/tsevitth-png/variantlens | |
| *Live demo*: https://frontend-coral-omega-54.vercel.app | |