# VariantLens: Lab-Grade Variant Interpretation Tool
## Full Implementation Plan: Claude Code Build
*Jordan Lerner-Ellis Lab · University of Toronto · April 2026*
---
## 1. Design Philosophy
**Core principle:** Human-in-the-loop augmentation. The tool accelerates evidence gathering, applies ACMG criteria, and uses Claude to synthesize unstructured literature, but a trained curator makes every final classification decision. This matches the design of all three tools from the November 2025 CGLC session (AI CURA, EvAgg, AutoPM3) and is the safest path to clinical adoption.
**Non-negotiables:**
- All patient data stays on-premise (no genomic data sent to cloud APIs without explicit opt-in)
- Full evidence audit trail: every criterion is traceable to a source
- Compatible with ACMG SVC v4.0 when finalized
- Export to ClinVar, PDF, and HL7 FHIR
---
## 2. Selected Tools & Frameworks to Integrate
### 2.1 Existing tools to build ON TOP OF (don't reinvent)
| Tool | Role in VariantLens | Why |
|---|---|---|
| **autoPVS1** | PVS1 criterion automation | Best-in-class null variant assessment; open-source Python; integrates with pyhgvs |
| **InterVar** | ACMG rule engine scaffold | Implements ~18 criteria; open-source; use as base then extend to all 28 |
| **Mutalyzer** | HGVS normalization | Industry standard; Python API available; solves the nomenclature inconsistency problem |
| **PyHGVS** | Secondary normalization | Lightweight Python library; good fallback |
| **SpliceAI** | Splice effect prediction | Pre-scored lookup tables available (avoid running the model per variant) |
| **REVEL** | Missense pathogenicity | Pre-computed for all possible human missense variants; load as SQLite |
| **AlphaMissense** | Missense pathogenicity | 2023 DeepMind model; scores for all ~71 million possible human missense variants; download as flat file |
| **CADD** | Combined annotation | Pre-scored tracks; REST API available |
| **ChromaDB** | Vector store for RAG | Local, embedded, no server needed; Python-native; HIPAA-friendly |
| **sentence-transformers** | Embeddings for RAG | `all-MiniLM-L6-v2` for speed; `BioLinkBERT` for biomedical accuracy |
### 2.2 Data sources to connect
| Source | Data | Access method |
|---|---|---|
| **gnomAD v4.1** | Population allele frequencies | GraphQL API + local SQLite for BA1/BS1/BS2/PM2 |
| **ClinVar** | Existing classifications | Entrez E-utilities + local VCF download (weekly sync) |
| **OMIM** | Gene-disease + inheritance | API (free for academic use) |
| **ClinGen VCEPs** | Expert panel rules | ClinGen Allele Registry API |
| **HGMD (lite)** | Published variants | Public variant lists (full version if lab has license) |
| **PubMed** | Literature | E-utilities for abstract retrieval; full-text via PMC API |
| **UniProt** | Protein domain / functional domains | REST API for PM1 |
### 2.3 What NOT to rebuild
- Do not implement your own in silico predictors; use pre-scored tables
- Do not build your own variant normalizer; Mutalyzer handles this
- Do not build your own vector database; ChromaDB is production-ready
---
## 3. System Architecture
```
┌─────────────────────────────────────────────────┐
│                FRONTEND (React)                 │
│  Variant input · HPO terms · Curator dashboard  │
└────────────────────┬────────────────────────────┘
                     │ REST API
┌────────────────────▼────────────────────────────┐
│           BACKEND (FastAPI / Python)            │
│                                                 │
│  ┌──────────────────────────────────────────┐   │
│  │  1. Normalization Layer                  │   │
│  │     Mutalyzer → canonical HGVS           │   │
│  └─────────────────┬────────────────────────┘   │
│                    │                            │
│  ┌─────────────────▼────────────────────────┐   │
│  │  2. Evidence Gathering (parallel async)  │   │
│  │                                          │   │
│  │  Databases:          RAG Pipeline:       │   │
│  │  • gnomAD            • PubMed fetch      │   │
│  │  • ClinVar           • ChromaDB query    │   │
│  │  • OMIM              • Relevant chunks   │   │
│  │  • REVEL/SpliceAI    • Context assembly  │   │
│  │  • AlphaMissense                         │   │
│  │  • autoPVS1                              │   │
│  └─────────────────┬────────────────────────┘   │
│                    │                            │
│  ┌─────────────────▼────────────────────────┐   │
│  │  3. ACMG Rule Engine                     │   │
│  │     InterVar base + custom extensions    │   │
│  │     28 criteria → weighted scores        │   │
│  └─────────────────┬────────────────────────┘   │
│                    │                            │
│  ┌─────────────────▼────────────────────────┐   │
│  │  4. Claude Reasoning Layer               │   │
│  │     RAG context + ACMG pre-scores        │   │
│  │     → literature evidence synthesis      │   │
│  │     → VUS reasoning + uncertainty flags  │   │
│  └─────────────────┬────────────────────────┘   │
│                    │                            │
│  ┌─────────────────▼────────────────────────┐   │
│  │  5. Classification Combiner              │   │
│  │     Table 5 (Richards 2015) logic        │   │
│  │     → provisional 5-tier + confidence    │   │
│  └─────────────────┬────────────────────────┘   │
│                    │                            │
│  ┌─────────────────▼────────────────────────┐   │
│  │  6. Output: audit trail + report draft   │   │
│  └──────────────────────────────────────────┘   │
│                                                 │
└────────────────────┬────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────┐
│                CURATOR REVIEW UI                │
│ Evidence table · Criterion override · Sign-off  │
│ ClinVar export · PDF report · LIMS integration  │
└─────────────────────────────────────────────────┘
```
---
## 4. RAG Pipeline Design (Hallucination Reduction)
This is the most critical architectural decision. The RAG system is what separates a reliable clinical tool from a hallucination-prone chatbot.
### 4.1 Why RAG works here
Instead of asking Claude to "recall" information about a variant from training data (which is stale and unverifiable), RAG:
1. Retrieves the actual PubMed abstracts/PMC full-texts relevant to the variant
2. Chunks and embeds them into a vector store
3. At query time, retrieves only the most semantically relevant chunks
4. Passes those chunks as explicit context to Claude
5. Claude is instructed to reason only over what is in the context window, so every claim can be checked against a retrieved passage; this reduces hallucination but does not make it impossible
### 4.2 Index Construction
```python
# Pseudocode for the index build pipeline (helper functions are placeholders)
# Step 1: Query PubMed for the gene plus either variant notation
pubmed_results = fetch_pubmed(
    query=f'"{gene_symbol}" AND ("{variant_hgvs}" OR "{protein_change}")',
    max_results=200,
)
# Step 2: Fetch full text where available (PMC), otherwise the abstract
papers = [fetch_fulltext(pmid) or fetch_abstract(pmid)
          for pmid in pubmed_results]
# Step 3: Chunk with overlap (preserve context around variant mentions).
# Assumes each chunk keeps its text and a reference to its source paper,
# so metadata stays aligned even though one paper yields many chunks.
chunks = sliding_window_chunk(
    papers,
    chunk_size=512,   # tokens
    overlap=128,      # tokens
    anchor_keywords=[variant_hgvs, protein_change, gene_symbol],
)
# Step 4: Embed (BioLinkBERT for biomedical domain accuracy)
embeddings = model.encode([c.text for c in chunks])
# Step 5: Store with metadata (ChromaDB needs a unique id per document)
chroma_collection.add(
    ids=[f"{c.paper.pmid}-{i}" for i, c in enumerate(chunks)],
    documents=[c.text for c in chunks],
    embeddings=embeddings.tolist(),
    metadatas=[{
        "pmid": c.paper.pmid,
        "year": c.paper.year,
        "variant": variant_hgvs,
        "gene": gene_symbol,
        "criteria_hint": detect_criteria_signals(c.text),  # PM3, PP1, PS3 etc.
    } for c in chunks],
)
```
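The chunking step can be sketched as follows. This is a minimal illustration, not the project's chunker: it counts whitespace-delimited words as a stand-in for model tokens, works on a single text rather than a list of papers, and the names `Chunk` and `sliding_window` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    start: int         # index of the first token in the source text
    has_anchor: bool   # True if any anchor keyword occurs in the chunk


def sliding_window(text, chunk_size=512, overlap=128, anchor_keywords=()):
    """Split text into overlapping windows of whitespace-delimited tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = " ".join(tokens[start:start + chunk_size])
        has_anchor = any(k.lower() in window.lower() for k in anchor_keywords)
        chunks.append(Chunk(window, start, has_anchor))
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks
```

The `has_anchor` flag is one simple way to favour chunks that mention the variant at retrieval time; a real tokenizer would replace `str.split` so that chunk sizes match the embedding model's limits.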
### 4.3 Retrieval Strategy (Criterion-Aware)
Different ACMG criteria need different retrieval strategies:
| Criterion | Retrieval focus | Query augmentation |
|---|---|---|
| **PM3** | in trans compound het | `"in trans" OR "compound heterozygous" OR "biallelic"` |
| **PP1** | co-segregation | `"segregation" OR "affected family members" OR "co-segregates"` |
| **PS3/BS3** | functional studies | `"functional" OR "in vitro" OR "in vivo" OR "assay"` |
| **PS4** | case-control prevalence | `"cases" OR "prevalence" OR "odds ratio"` |
| **PP4** | phenotype specificity | `"phenotype" OR "clinical features" OR "presentation"` |
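One way to turn the table above into queries is a lookup of focus terms per criterion combined with the gene and variant names. The sketch below assumes PubMed-style boolean syntax; `CRITERION_TERMS` and `build_query` are illustrative names, not part of any library.

```python
# Focus terms per criterion, copied from the table above.
CRITERION_TERMS = {
    "PM3": ["in trans", "compound heterozygous", "biallelic"],
    "PP1": ["segregation", "affected family members", "co-segregates"],
    "PS3/BS3": ["functional", "in vitro", "in vivo", "assay"],
    "PS4": ["cases", "prevalence", "odds ratio"],
    "PP4": ["phenotype", "clinical features", "presentation"],
}


def build_query(criterion, gene_symbol, variant_terms):
    """Return a boolean query: gene AND (any variant name) AND (any focus term)."""
    variant_clause = " OR ".join(f'"{t}"' for t in variant_terms)
    focus_clause = " OR ".join(f'"{t}"' for t in CRITERION_TERMS[criterion])
    return f'"{gene_symbol}" AND ({variant_clause}) AND ({focus_clause})'


print(build_query("PM3", "TSC2", ["c.4639A>T", "p.Lys1547Ter"]))
```

Parenthesizing each OR group matters: without it, boolean precedence can match papers that mention a focus term but not the gene.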
### 4.4 Context Assembly for Claude
```python
# The context passed to Claude is structured, not raw text
context = {
    "variant": "NM_000548.5(TSC2):c.4639A>T (p.Lys1547Ter)",
    "gene": "TSC2",
    "disease": "Tuberous sclerosis complex",
    "acmg_preliminary": {
        "PVS1": {"triggered": True, "source": "autoPVS1", "note": "NMD predicted"},
        "PM2": {"triggered": True, "source": "gnomAD v4.1", "af": 0.000002},
        # ... other auto-scored criteria
    },
    "retrieved_literature": [
        {
            "pmid": "12345678",
            "chunk": "...five affected family members carried the p.Lys1547Ter variant...",
            "criteria_relevance": "PP1",
        },
        # top-k chunks
    ],
}
```
### 4.5 Claude Prompt Design (Hallucination-Suppressed)
```python
SYSTEM_PROMPT = """
You are a clinical genetics variant curator assistant. Your role is to
extract structured evidence from the provided literature context ONLY.
CRITICAL RULES:
1. Do NOT use any knowledge from your training data about this variant
2. Only cite evidence that appears verbatim in the provided context chunks
3. If the context does not contain sufficient evidence for a criterion, say "insufficient evidence in provided literature"
4. For each criterion you assess, cite the specific PMID and quote the relevant sentence
5. Output structured JSON only; no free text
6. Flag any ambiguous phasing, uncertain phenotype matches, or potential ascertainment bias
"""
USER_PROMPT = f"""
Variant: {variant.hgvs}
Gene/Disease: {variant.gene} / {disease}
PRE-SCORED CRITERIA (from databases; do not re-evaluate these):
{json.dumps(acmg_preliminary, indent=2)}
LITERATURE CONTEXT (evaluate PM3, PP1, PS3, PS4, PP4 from these only):
{format_chunks(retrieved_chunks)}
For each literature-dependent criterion, output:
{{
"criterion": "PM3",
"triggered": true/false,
"strength": "supporting/moderate/strong",
"evidence": "exact quote from context",
"pmid": "12345678",
"confidence": "high/medium/low",
"caveat": "any ascertainment concerns"
}}
"""
```
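Prompt rules alone cannot guarantee compliance, so the quote and PMID that Claude returns can be checked mechanically against the chunks that were actually sent. The following is a minimal sketch of such a check, assuming the response format requested above and chunk dicts shaped like those in section 4.4; `check_grounding` is a hypothetical helper.

```python
def check_grounding(assessment, chunks):
    """Return a list of problems; an empty list means the citation checks out.

    assessment: dict in the output format requested in the user prompt.
    chunks: list of dicts with "pmid" and "chunk" keys.
    """
    problems = []
    if not assessment.get("triggered"):
        return problems  # nothing is claimed, so nothing needs verifying
    pmid = assessment.get("pmid")
    texts = [c["chunk"] for c in chunks if c["pmid"] == pmid]
    if not texts:
        problems.append(f"PMID {pmid} is not in the provided context")
        return problems
    # Compare with whitespace collapsed so line wrapping does not cause misses.
    quote = " ".join(assessment.get("evidence", "").split())
    if not quote or not any(quote in " ".join(t.split()) for t in texts):
        problems.append("evidence quote not found verbatim in the cited chunk")
    return problems
```

Any non-empty result would be a reason to discard the criterion or route it to the curator rather than trust the model's claim.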
---
## 5. ACMG Criteria Coverage Map
### Automated (database-driven, no LLM needed)
| Criterion | Automation approach | Tool |
|---|---|---|
| **PVS1** | Loss-of-function prediction + transcript check | autoPVS1 |
| **BA1** | gnomAD AF > 5% | gnomAD API |
| **BS1** | gnomAD AF > expected for disorder | gnomAD + disease incidence table |
| **BS2** | Healthy homozygote/heterozygote in gnomAD | gnomAD |
| **PM2** | Absent from gnomAD / very low AF | gnomAD API |
| **PM4** | In-frame indel length + conservation | Custom rule |
| **PM5** | Same aa position as known pathogenic missense | ClinVar lookup |
| **PS1** | Same aa change as established pathogenic | ClinVar lookup |
| **PP3** / **BP4** | REVEL, SpliceAI, AlphaMissense, CADD | Pre-scored tables |
| **BP1** | Missense in truncation-only gene | ClinGen curated gene list |
| **BP3** | In-frame indel in repeat region | RepeatMasker annotation |
| **BP7** | Synonymous + no splice prediction + non-conserved | SpliceAI + PhyloP |
| **PP2** | Missense in low-benign-missense gene | ClinGen gene-level stats |
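The frequency-based rows of this table reduce to a few threshold comparisons. The sketch below is illustrative only: it uses the 5% BA1 cutoff from the table and the 0.0001 PM2 cutoff mentioned in Phase 2, treats `expected_max_af` as a per-disease value supplied by the caller, and leaves out BS2, which needs genotype counts rather than a single frequency. The function name is hypothetical.

```python
def frequency_criteria(af, expected_max_af, ba1_threshold=0.05, pm2_threshold=0.0001):
    """Return the frequency-based criteria met by a gnomAD allele frequency.

    af: allele frequency, or None if the variant is absent from gnomAD.
    expected_max_af: highest AF compatible with the disorder (curated per disease).
    """
    if af is None:
        return ["PM2"]   # absent from the population database
    if af > ba1_threshold:
        return ["BA1"]   # stand-alone benign; BS1 would add nothing further
    if af > expected_max_af:
        return ["BS1"]   # more common than the disorder allows
    if af < pm2_threshold:
        return ["PM2"]   # present but very rare
    return []


print(frequency_criteria(0.000002, expected_max_af=0.001))
```

Checking BS1 before PM2 keeps the two from firing together when a disorder's expected maximum frequency is itself below the PM2 cutoff.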
### LLM-assisted (RAG + Claude)
| Criterion | Claude task |
|---|---|
| **PM3** | Extract in trans observations from literature (AutoPM3 approach) |
| **PP1** / **BS4** | Count segregating/non-segregating family members |
| **PS3** / **BS3** | Identify and assess functional assay data |
| **PS4** | Extract case counts and odds ratios |
| **PP4** | Assess phenotype specificity match |
| **PS2** / **PM6** | Identify confirmed/assumed de novo reports |
| **PP5** / **BP6** | Check recent authoritative database submissions |
### Requires curator input (cannot automate)
| Criterion | Why manual |
|---|---|
| **PM1** | Requires domain expert judgment about "critical" functional domains |
| **BP5** | Requires knowledge of the specific patient's alternative diagnosis |
| **PM3** (phasing) | Parental testing results needed from clinician |
---
## 6. Tech Stack
```
Backend:     Python 3.12 · FastAPI · SQLAlchemy · Celery (async jobs)
Frontend:    React 18 · TypeScript · Tailwind CSS
Databases:   PostgreSQL (variants, audit trail) · SQLite (REVEL, gnomAD offline)
Vector DB:   ChromaDB (embedded, on-premise)
Embeddings:  sentence-transformers (BioLinkBERT or all-MiniLM-L6-v2)
LLM:         Claude API (on-premise option: Ollama + open-source LLM as fallback)
Auth:        OAuth2 / LDAP (for hospital integration)
Containers:  Docker + docker-compose (single-command deployment)
Tests:       pytest · hypothesis (property-based testing of ACMG logic)
```
---
## 7. Claude Code Implementation Plan
Use Claude Code (`claude` CLI) to build this in phases. Run from the project root.
### Prerequisites
```bash
# Install Claude Code
npm install -g @anthropic-ai/claude-code
# Verify
claude --version
```
### Phase 0: Project Setup (Day 1)
```bash
mkdir variantlens && cd variantlens
claude "Create a Python FastAPI project called VariantLens for clinical genomic
variant interpretation. Set up:
- /backend: FastAPI app with routers for variants, evidence, classification
- /frontend: React + TypeScript + Tailwind project
- /data: SQLite databases for REVEL and gnomAD offline lookups
- docker-compose.yml with services: api, frontend, postgres, chroma
- pyproject.toml with dependencies: fastapi, sqlalchemy, chromadb,
sentence-transformers, anthropic, biopython, requests, httpx, celery
- .env.example with ANTHROPIC_API_KEY, NCBI_API_KEY, OMIM_API_KEY
Include a README with setup instructions."
```
### Phase 1: Variant Normalization (Day 2–3)
```bash
claude "In /backend/app/services/normalization.py, implement a VariantNormalizer
class that:
1. Accepts variants in HGVS, VCF, or protein notation
2. Uses the Mutalyzer REST API (https://mutalyzer.nl/api/v2/) for normalization
3. Falls back to PyHGVS for offline normalization
4. Returns: canonical HGVS (genomic + coding + protein), transcript, gene symbol
5. Handles batch normalization with rate limiting
6. Includes comprehensive unit tests with 20+ test variants including edge cases
(stop-loss, indels, splice variants, mitochondrial)
Use pydantic models for all inputs/outputs."
```
### Phase 2: Database Integrations (Day 4–7)
```bash
# gnomAD integration
claude "In /backend/app/services/gnomad.py, implement a GnomADClient that:
1. Queries gnomAD v4.1 GraphQL API for variant allele frequencies
2. Returns AF by genetic ancestry group (AFR, NFE, ASJ, EAS, SAS, AMR, FIN)
3. Implements local SQLite caching to avoid redundant API calls
4. Computes BA1 (>5% AF), BS1 (>expected), BS2 (healthy homozygotes), PM2 (<0.0001)
5. Handles missing data and low coverage warnings
Include the gnomAD GraphQL query template as a constant."
# ClinVar integration
claude "In /backend/app/services/clinvar.py, implement a ClinVarClient that:
1. Queries ClinVar via NCBI Entrez E-utilities for a given variant
2. Parses existing classifications and review status (star rating)
3. Extracts PS1 evidence (same aa change, different nucleotide)
4. Extracts PM5 evidence (same position, different pathogenic missense)
5. Extracts PP5/BP6 evidence (recent reputable submissions)
6. Downloads weekly ClinVar VCF for local lookup (faster batch queries)
Use BioPython's Entrez module."
# In silico predictors
claude "In /backend/app/services/insilico.py, implement InSilicoPredictor that:
1. Loads REVEL scores from a local SQLite database (build script included)
2. Loads AlphaMissense scores from the downloaded TSV (2.5GB flat file)
3. Calls the SpliceAI lookup API (https://spliceailookup-api.broadinstitute.org)
4. Calls the CADD REST API (https://cadd.gs.washington.edu)
5. Aggregates concordant/discordant predictions for PP3/BP4
6. Follows the ACMG rule: concordant predictions = 1 piece of evidence (not additive)
Returns a structured InSilicoResult with per-tool scores and overall PP3/BP4 call."
# autoPVS1 integration
claude "Integrate the autoPVS1 Python package into /backend/app/services/pvs1.py.
Create a PVS1Assessor wrapper that:
1. Takes a normalized HGVS variant
2. Runs autoPVS1 to classify null variant strength (PVS1 at very strong, strong, moderate, or supporting strength, or not met)
3. Returns structured output with reasoning for the rule applied
4. Handles the PVS1 caveats from the ACMG guidelines (LOF mechanism, 3' end, splice variants,
multiple transcripts, alternatively spliced exons)
Include comprehensive tests for CFTR, MYH7, and BRCA1 known variants."
```
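The rule that concordant predictions count as one piece of evidence can be sketched without committing to any score thresholds. The example assumes each tool's score has already been mapped to a call of "damaging", "benign", or `None` (no score) upstream; `pp3_bp4_call` is a hypothetical name.

```python
def pp3_bp4_call(tool_calls):
    """Collapse per-tool calls into at most one criterion.

    tool_calls maps a tool name to "damaging", "benign", or None (no score).
    """
    informative = {t: c for t, c in tool_calls.items() if c in ("damaging", "benign")}
    calls = set(informative.values())
    if calls == {"damaging"}:
        criterion = "PP3"
    elif calls == {"benign"}:
        criterion = "BP4"
    else:
        criterion = None  # no usable scores, or the tools disagree
    return {
        "criterion": criterion,
        "tools": sorted(informative),      # which tools contributed
        "discordant": len(calls) > 1,      # flag disagreement for the curator
    }


print(pp3_bp4_call({"REVEL": "damaging", "AlphaMissense": "damaging", "SpliceAI": None}))
```

However many tools agree, the result is a single PP3 or BP4, never a stack of supporting criteria.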
### Phase 3: RAG Pipeline (Day 8–11)
```bash
claude "Build the RAG literature pipeline in /backend/app/services/rag/:
1. literature_fetcher.py:
- Query PubMed E-utilities with variant-aware search queries
- Fetch full text from PMC where available, abstract otherwise
- Build criterion-specific queries for PM3, PP1, PS3, PS4
- Cache results to avoid re-fetching the same papers
2. chunker.py:
- Sliding window chunker (512 tokens, 128 overlap)
- Anchor chunks near variant mention sentences
- Detect which ACMG criteria each chunk is relevant to (keyword heuristic)
3. embedder.py:
- Use sentence-transformers BioLinkBERT for biomedical-domain embeddings
- Batch embedding with progress tracking
- Store to ChromaDB with full metadata (pmid, year, variant, gene, criteria_hint)
4. retriever.py:
- Criterion-aware query construction (different for PM3 vs PP1 vs PS3)
- Retrieve top-k chunks (k=8 per criterion)
- Deduplicate across criteria
- Return structured context for Claude
Each module must have typed interfaces (pydantic) and unit tests."
```
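The retriever's top-k and deduplication steps might look like the sketch below. It assumes each hit is a dict with `pmid`, `chunk`, and `distance` keys (smaller distance meaning a closer match), which is an illustrative shape rather than the format ChromaDB returns directly; `merge_hits` is a hypothetical name.

```python
def merge_hits(hits_by_criterion, k=8):
    """Keep the k closest hits per criterion, then merge duplicates across criteria.

    hits_by_criterion maps a criterion name to a list of hit dicts.
    A chunk retrieved for several criteria appears once, tagged with all of them.
    """
    merged = {}
    for criterion, hits in hits_by_criterion.items():
        for hit in sorted(hits, key=lambda h: h["distance"])[:k]:
            key = (hit["pmid"], hit["chunk"])
            entry = merged.setdefault(
                key, {"pmid": hit["pmid"], "chunk": hit["chunk"], "criteria": []}
            )
            if criterion not in entry["criteria"]:
                entry["criteria"].append(criterion)
    return list(merged.values())
```

Deduplicating before prompt assembly keeps the context shorter and avoids the same sentence being counted as two independent observations.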
### Phase 4: ACMG Rule Engine (Day 12–15)
```bash
claude "Build the ACMG rule engine in /backend/app/services/acmg/:
1. criteria.py: Pydantic models for each of the 28 criteria with:
- triggered (bool)
- strength (very_strong/strong/moderate/supporting/standalone)
- source (database name or PMID)
- evidence_text (quote or numeric value)
- confidence (high/medium/low)
- caveat (optional warning text)
2. rules.py: Implement all auto-scorable criteria:
- PVS1 (from autoPVS1 result)
- PS1, PM5 (from ClinVar)
- BA1, BS1, BS2, PM2 (from gnomAD)
- PP3/BP4 (from InSilico concordant predictions)
- PM4/BP3 (in-frame indel: PM4 outside a repeat region, BP3 inside one)
- BP1 (gene-level truncation-only flag from ClinGen)
- BP7 (synonymous + no splice impact + non-conserved)
- PP2 (low benign missense gene)
3. combiner.py: Implement Table 5 from Richards 2015 exactly:
- All combination rules for Pathogenic/Likely Pathogenic/Benign/Likely Benign
- Returns provisional classification + list of triggered criteria
- Flags conflicting evidence (pathogenic + benign criteria both present)
- Exports to structured JSON for audit trail
4. validator.py: Unit tests using 50 known ClinVar variants
(10 P, 10 LP, 10 VUS, 10 LB, 10 B); verify combiner matches ClinVar
classification at ≥85% concordance."
```
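For orientation, the combining rules the combiner is asked to implement can be sketched as below. This is an illustration of the Richards 2015 combination logic as commonly summarized, not the generated `combiner.py`; the function name is hypothetical, strength labels follow the `criteria.py` description above, and direction is inferred from the first letter of the criterion code.

```python
from collections import Counter


def classify(criteria):
    """Combine (code, strength) pairs into a provisional 5-tier call.

    Returns (label, conflicting). Codes starting with "P" are pathogenic
    evidence, codes starting with "B" are benign evidence.
    """
    n = Counter((code[0], strength) for code, strength in criteria)
    pvs, ps = n["P", "very_strong"], n["P", "strong"]
    pm, pp = n["P", "moderate"], n["P", "supporting"]
    ba, bs, bp = n["B", "standalone"], n["B", "strong"], n["B", "supporting"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps == 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and (pm >= 1 or pp >= 2))
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    path_call = "Pathogenic" if pathogenic else "Likely pathogenic" if likely_pathogenic else None
    benign_call = "Benign" if benign else "Likely benign" if likely_benign else None
    if path_call and benign_call:
        return "Uncertain significance", True   # contradictory evidence
    return path_call or benign_call or "Uncertain significance", False
```

Passing strength separately from the code lets a curator's strength modification (for example PVS1 downgraded to strong) flow through without special cases.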
### Phase 5: Claude Reasoning Layer (Day 16–18)
```bash
claude "Build the Claude reasoning layer in /backend/app/services/llm/:
1. prompts.py: Structured prompt templates for each literature-dependent criterion:
- PM3 prompt (in trans extraction, inspired by AutoPM3)
- PP1 prompt (segregation counting with anti-hallucination guards)
- PS3 prompt (functional assay quality assessment)
- PS4 prompt (case count and OR extraction)
- PP4 prompt (phenotype specificity matching)
Each prompt must:
- Explicitly instruct Claude to only use provided context (no training recall)
- Request JSON output with evidence quotes + PMIDs
- Include uncertainty and caveat detection
- Include examples of what hallucination looks like and how to avoid it
2. reasoner.py: LLM reasoning orchestrator that:
- Takes pre-scored criteria + RAG context
- Calls Claude for each literature-dependent criterion
- Parses and validates JSON responses
- Falls back gracefully if Claude response is malformed
- Logs all LLM calls with input/output for audit
3. synthesizer.py: Final synthesis pass that:
- Merges database-scored + LLM-scored criteria
- Produces human-readable evidence summary (for curator)
- Highlights conflicting or ambiguous evidence
- Generates uncertainty flags for VUS cases
Use the Anthropic Python SDK. Model: claude-sonnet-4-6. max_tokens: 2000."
```
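The "falls back gracefully" requirement for `reasoner.py` could be met with a strict parser along these lines. It is a sketch that assumes the response fields listed in section 4.5; `parse_assessment` and the fallback wording are hypothetical.

```python
import json

REQUIRED = {"criterion", "triggered", "strength", "evidence", "pmid", "confidence", "caveat"}
STRENGTHS = ("supporting", "moderate", "strong")
CONFIDENCES = ("high", "medium", "low")


def parse_assessment(raw, criterion):
    """Parse one criterion response; return a non-triggered record on any defect."""
    fallback = {
        "criterion": criterion,
        "triggered": False,
        "caveat": "LLM response malformed; needs manual review",
    }
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return fallback
    if not isinstance(data, dict) or not REQUIRED <= data.keys():
        return fallback
    if data["criterion"] != criterion or not isinstance(data["triggered"], bool):
        return fallback
    if data["triggered"] and (
        data["strength"] not in STRENGTHS or data["confidence"] not in CONFIDENCES
    ):
        return fallback
    return data
```

Failing closed (never triggering a criterion from an unparseable response) keeps malformed output from silently shifting a classification.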
### Phase 6: Frontend (Day 19–22)
```bash
claude "Build the React frontend in /frontend/src/:
1. VariantInput component:
- Text field for HGVS entry with live validation against Mutalyzer
- VCF file upload (single variant or batch)
- HPO term autocomplete (using HPO API)
- Gene/disease context selector
2. EvidenceDashboard component (the main curator view):
- Criteria table showing all 28 criteria with status (triggered/not triggered/pending)
- Color coding: green (benign criteria), red (pathogenic), gray (not triggered)
- Each row expandable to show evidence source, quote, and confidence
- Override button per criterion with required free-text justification
- Literature panel showing RAG-retrieved papers with relevant quotes highlighted
3. ClassificationPanel component:
- Shows provisional 5-tier classification
- Shows confidence and any conflicting evidence flags
- Curator sign-off button with authentication
- Classification history / previous submissions
4. ReportGenerator component:
- Preview of clinical report in standard format
- Export to PDF, ClinVar submission XML, HL7 FHIR R4
Use React Query for API calls, Zustand for state, Tailwind for styling."
```
### Phase 7: Testing & Validation (Day 23–25)
```bash
claude "Create a comprehensive validation suite in /tests/:
1. test_known_variants.py:
- Use 100 variants from ClinVar with 3-star expert panel reviews (or 4-star practice guidelines)
- Assert classification concordance ≥ 85%
- Assert all triggered criteria are traceable to a source
- Assert no criterion is triggered without evidence
2. test_hallucination_guard.py:
- Feed the LLM prompts with deliberately wrong literature (controls)
- Assert Claude does not trigger PM3/PP1 when context contains no relevant evidence
- Assert Claude cites only PMIDs present in the provided context
3. test_acmg_combiner.py:
- Property-based tests using hypothesis
- Test all combination rules from Table 5 of Richards 2015
- Test edge cases: conflicting evidence, single criterion only
4. performance_benchmark.py:
- Time per variant (target: < 30 seconds including RAG)
- Batch throughput (target: 100 variants/hour)
- Memory usage per worker"
```
---
## 8. Directory Structure
```
variantlens/
├── backend/
│   ├── app/
│   │   ├── api/                    # FastAPI routers
│   │   │   ├── variants.py         # POST /variants/classify
│   │   │   ├── evidence.py         # GET /variants/{id}/evidence
│   │   │   └── reports.py          # GET /variants/{id}/report
│   │   ├── services/
│   │   │   ├── normalization.py    # Mutalyzer wrapper
│   │   │   ├── gnomad.py           # gnomAD client
│   │   │   ├── clinvar.py          # ClinVar client
│   │   │   ├── insilico.py         # REVEL, SpliceAI, CADD, AlphaMissense
│   │   │   ├── pvs1.py             # autoPVS1 wrapper
│   │   │   ├── rag/
│   │   │   │   ├── fetcher.py      # PubMed fetch
│   │   │   │   ├── chunker.py      # Text chunking
│   │   │   │   ├── embedder.py     # sentence-transformers
│   │   │   │   └── retriever.py    # ChromaDB query
│   │   │   ├── acmg/
│   │   │   │   ├── criteria.py     # Pydantic models
│   │   │   │   ├── rules.py        # 28 criteria automation
│   │   │   │   └── combiner.py     # Table 5 logic
│   │   │   └── llm/
│   │   │       ├── prompts.py      # Criterion-specific prompts
│   │   │       ├── reasoner.py     # Claude API calls
│   │   │       └── synthesizer.py
│   │   └── models/                 # SQLAlchemy DB models
│   └── tests/
├── frontend/
│   └── src/
│       ├── components/
│       │   ├── VariantInput.tsx
│       │   ├── EvidenceDashboard.tsx
│       │   ├── ClassificationPanel.tsx
│       │   └── ReportGenerator.tsx
│       └── hooks/
├── data/
│   ├── revel_scores.db             # SQLite: pre-scored missense positions
│   ├── alphamissense.tsv.gz        # Downloaded AlphaMissense flat file
│   └── gnomad_cache.db             # Local AF cache
├── docker-compose.yml
├── .env.example
└── README.md
```
---
## 9. Privacy & Deployment
### On-premise deployment (recommended for clinical data)
```yaml
# docker-compose.yml excerpt
services:
  api:
    build: ./backend
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - USE_LOCAL_LLM=false   # set true + configure Ollama for air-gap
    volumes:
      - ./data:/app/data      # all patient data stays local
  chroma:
    image: chromadb/chroma    # runs locally, no cloud
    volumes:
      - chroma_data:/chroma

# Named volumes must be declared at the top level
volumes:
  chroma_data:
```
### Air-gapped option
If patient data cannot touch external APIs even for Claude:
- Replace Claude with a locally-hosted open-source LLM via Ollama
- Recommended model: `mistral-nemo` or `qwen2.5` (strong instruction following)
- Performance will be lower than Claude but maintains privacy
- Toggle via `USE_LOCAL_LLM=true` in `.env`
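How the backend might read this toggle is sketched below. Only the two environment variable names come from this plan; the function name, the accepted truthy values, and the returned labels are assumptions made for illustration.

```python
import os


def llm_backend(env=None):
    """Return "local" when USE_LOCAL_LLM is truthy, otherwise "claude".

    env defaults to os.environ; passing a dict makes the function testable.
    """
    env = os.environ if env is None else env
    use_local = env.get("USE_LOCAL_LLM", "false").strip().lower() in ("1", "true", "yes")
    if not use_local and not env.get("ANTHROPIC_API_KEY"):
        # Fail at startup instead of on the first classification request.
        raise RuntimeError("ANTHROPIC_API_KEY is required unless USE_LOCAL_LLM=true")
    return "local" if use_local else "claude"
```

Resolving the backend once at startup makes an air-gapped misconfiguration visible immediately, before any variant data is processed.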
### Keys you need
| Key | Source | Free? |
|---|---|---|
| `ANTHROPIC_API_KEY` | console.anthropic.com | Pay per token |
| `NCBI_API_KEY` | ncbi.nlm.nih.gov/account | Free |
| `OMIM_API_KEY` | omim.org/api | Free for academic |
| `GNOMAD` | No key needed (GraphQL API) | Free |
---
## 10. Development Timeline
| Phase | Duration | Milestone |
|---|---|---|
| 0: Setup | Day 1 | Project scaffolded, Docker running |
| 1: Normalization | Day 2–3 | Mutalyzer integration + tests passing |
| 2: Databases | Day 4–7 | gnomAD, ClinVar, REVEL, SpliceAI, autoPVS1 integrated |
| 3: RAG | Day 8–11 | Literature retrieval + ChromaDB indexing working |
| 4: ACMG engine | Day 12–15 | All auto-scorable criteria + combiner; ≥85% concordance |
| 5: LLM layer | Day 16–18 | Claude synthesizing PM3/PP1/PS3 from RAG context |
| 6: Frontend | Day 19–22 | Full curator dashboard; report export |
| 7: Validation | Day 23–25 | 100-variant benchmark suite passing |
**Total: ~5 weeks of focused development using Claude Code throughout**
---
## 11. Key Design Decisions Summary
| Decision | Choice | Rationale |
|---|---|---|
| LLM for literature only | Claude handles PM3, PP1, PS3, PS4, PP4, not DB criteria | Reduces hallucination surface area; DB facts never go through LLM |
| RAG over training-data recall | ChromaDB + BioLinkBERT embeddings | Grounds Claude in actual retrieved text; avoids relying on stale training data |
| Prompt includes only context | System prompt explicitly forbids using training recall | Mirrors AI CURA's anti-hallucination strategy that achieved 96% concordance |
| autoPVS1 for PVS1 | Don't reinvent PVS1 logic | autoPVS1 has been validated extensively; reuse it |
| InterVar as ACMG scaffold | Build on existing 18-criteria implementation | Extend rather than rewrite; saves ~2 weeks |
| Human-in-the-loop always | Curator must review + sign off every classification | Matches ACMG guidance; required for clinical lab accreditation |
| On-premise ChromaDB | No patient data leaves the network | HIPAA/PHIPA compliance |
| JSON-only LLM output | All Claude responses are structured JSON | Enables reliable parsing + audit trail |