VariantLens: Lab-Grade Variant Interpretation Tool

Full Implementation Plan — Claude Code Build

Jordan Lerner-Ellis Lab · University of Toronto · April 2026

1. Design Philosophy

Core principle: Human-in-the-loop augmentation. The tool accelerates evidence gathering, applies ACMG criteria, and uses Claude to synthesize unstructured literature — but a trained curator makes every final classification decision. This matches the design of all three tools from the November 2025 CGLC session (AI CURA, EvAgg, AutoPM3) and is the safest path to clinical adoption.

Non-negotiables:

All patient data stays on-premise (no genomic data sent to cloud APIs without explicit opt-in)
Full evidence audit trail — every criterion is traceable to a source
Compatible with ACMG SVC v4.0 when finalized
Export to ClinVar, PDF, and HL7 FHIR

2. Selected Tools & Frameworks to Integrate

2.1 Existing tools to build ON TOP OF (don't reinvent)

Tool	Role in VariantLens	Why
autoPVS1	PVS1 criterion automation	Best-in-class null variant assessment; open-source Python; integrates with pyhgvs
InterVar	ACMG rule engine scaffold	Implements ~18 criteria; open-source; use as base then extend to all 28
Mutalyzer	HGVS normalization	Industry standard; Python API available; solves the nomenclature inconsistency problem
PyHGVS	Secondary normalization	Lightweight Python library; good fallback
SpliceAI	Splice effect prediction	Pre-scored lookup tables available (avoid running the model per variant)
REVEL	Missense pathogenicity	Pre-computed for all possible human missense variants; load as SQLite
AlphaMissense	Missense pathogenicity	2023 DeepMind model; pre-computed scores for ~71 million possible human missense variants; download as flat file
CADD	Combined annotation	Pre-scored tracks; REST API available
ChromaDB	Vector store for RAG	Local, embedded, no server needed; Python-native; HIPAA-friendly
sentence-transformers	Embeddings for RAG	`all-MiniLM-L6-v2` for speed; `BioLinkBERT` for biomedical accuracy

2.2 Data sources to connect

Source	Data	Access method
gnomAD v4.1	Population allele frequencies	GraphQL API + local SQLite cache for BA1/BS1/BS2/PM2
ClinVar	Existing classifications	Entrez E-utilities + local VCF download (weekly sync)
OMIM	Gene-disease + inheritance	API (free for academic use)
ClinGen VCEPs	Expert panel rules	ClinGen Criteria Specification Registry (CSpec)
HGMD (lite)	Published variants	Public variant lists (full version if lab has license)
PubMed	Literature	E-utilities for abstract retrieval; full-text via PMC API
UniProt	Protein and functional domains	REST API for PM1

2.3 What NOT to rebuild

Do not implement your own in silico predictors — use pre-scored tables
Do not build your own variant normalizer — Mutalyzer handles this
Do not build your own vector database — ChromaDB is production-ready

3. System Architecture

┌─────────────────────────────────────────────────┐
│                  FRONTEND (React)                │
│   Variant input · HPO terms · Curator dashboard  │
└────────────────────┬────────────────────────────┘
                     │ REST API
┌────────────────────▼────────────────────────────┐
│               BACKEND (FastAPI / Python)         │
│                                                  │
│  ┌──────────────────────────────────────────┐   │
│  │  1. Normalization Layer                  │   │
│  │     Mutalyzer → canonical HGVS           │   │
│  └─────────────────┬────────────────────────┘   │
│                    │                             │
│  ┌─────────────────▼────────────────────────┐   │
│  │  2. Evidence Gathering (parallel async)  │   │
│  │                                          │   │
│  │  Databases:         RAG Pipeline:        │   │
│  │  • gnomAD           • PubMed fetch       │   │
│  │  • ClinVar          • ChromaDB query     │   │
│  │  • OMIM             • Relevant chunks    │   │
│  │  • REVEL/SpliceAI   • Context assembly   │   │
│  │  • AlphaMissense                         │   │
│  │  • autoPVS1                              │   │
│  └─────────────────┬────────────────────────┘   │
│                    │                             │
│  ┌─────────────────▼────────────────────────┐   │
│  │  3. ACMG Rule Engine                     │   │
│  │     InterVar base + custom extensions    │   │
│  │     28 criteria → weighted scores        │   │
│  └─────────────────┬────────────────────────┘   │
│                    │                             │
│  ┌─────────────────▼────────────────────────┐   │
│  │  4. Claude Reasoning Layer               │   │
│  │     RAG context + ACMG pre-scores        │   │
│  │     → literature evidence synthesis      │   │
│  │     → VUS reasoning + uncertainty flags  │   │
│  └─────────────────┬────────────────────────┘   │
│                    │                             │
│  ┌─────────────────▼────────────────────────┐   │
│  │  5. Classification Combiner              │   │
│  │     Table 5 (Richards 2015) logic        │   │
│  │     → provisional 5-tier + confidence    │   │
│  └─────────────────┬────────────────────────┘   │
│                    │                             │
│  ┌─────────────────▼────────────────────────┐   │
│  │  6. Output: audit trail + report draft   │   │
│  └──────────────────────────────────────────┘   │
│                                                  │
└─────────────────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────┐
│              CURATOR REVIEW UI                   │
│   Evidence table · Criterion override · Sign-off │
│   ClinVar export · PDF report · LIMS integration │
└─────────────────────────────────────────────────┘

4. RAG Pipeline Design (Hallucination Reduction)

This is the most critical architectural decision. The RAG system is what separates a reliable clinical tool from a hallucination-prone chatbot.

4.1 Why RAG works here

Instead of asking Claude to "recall" information about a variant from training data (which is stale and unverifiable), RAG:

Retrieves the actual PubMed abstracts/PMC full-texts relevant to the variant
Chunks and embeds them into a vector store
At query time, retrieves only the most semantically relevant chunks
Passes those chunks as explicit context to Claude
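Claude is instructed to reason ONLY over what's in the context window, and every claim must cite a retrieved chunk, so unsupported statements can be caught by checking them against the retrieved text. This reduces hallucination; it does not make it impossible.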
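Rule 2 of the system prompt in 4.5 (cite only verbatim evidence) is a request to the model; it can also be enforced after the call. A minimal sketch, assuming retrieved chunks are dicts with `pmid` and `text` keys and each finding carries `pmid` and `evidence` fields as in the 4.5 output schema; `is_grounded` is a hypothetical helper. Whitespace and case are normalized before matching, which is a judgment call about how strict "verbatim" should be:

```python
import re

def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line wraps and case don't defeat matching."""
    return re.sub(r"\s+", " ", text).strip().lower()

def is_grounded(finding: dict, chunks: list[dict]) -> bool:
    """True only if the quoted evidence occurs in a retrieved chunk from the cited PMID."""
    quote = _normalize(finding.get("evidence") or "")
    if not quote:
        return False
    return any(
        chunk["pmid"] == finding.get("pmid") and quote in _normalize(chunk["text"])
        for chunk in chunks
    )

# Hypothetical retrieved chunk and three candidate findings
chunks = [{"pmid": "12345678",
           "text": "...five affected family members carried\nthe p.Lys1547Ter variant..."}]
quoted = {"pmid": "12345678",
          "evidence": "five affected family members carried the p.Lys1547Ter variant"}
wrong_pmid = {**quoted, "pmid": "99999999"}
invented = {"pmid": "12345678", "evidence": "the variant segregated in seven relatives"}

print(is_grounded(quoted, chunks), is_grounded(wrong_pmid, chunks), is_grounded(invented, chunks))
# True False False
```

Findings that fail the check can be dropped or shown to the curator as unverified rather than presented as evidence.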

4.2 Index Construction

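# Pseudocode for index build pipeline
# fetch_pubmed, fetch_fulltext, fetch_abstract, sliding_window_chunk and
# detect_criteria_signals are project helpers still to be written (Phase 3).
# Chunk objects are assumed (hypothetically) to expose .text and .paper.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("michiyasunaga/BioLinkBERT-base")
# Embedded client (assumed); use chromadb.HttpClient to reach the compose "chroma" service
chroma_collection = chromadb.PersistentClient(path="data/chroma").get_or_create_collection("literature")

# Step 1: Query PubMed for variant + gene
# PubMed evaluates Boolean operators left to right, so the OR group must be
# parenthesized; otherwise the protein change alone matches papers on any gene.
pmids = fetch_pubmed(
    query=f'"{gene_symbol}" AND ("{variant_hgvs}" OR "{protein_change}")',
    max_results=200
)

# Step 2: Fetch full text where available (PMC), otherwise the abstract
papers = [fetch_fulltext(pmid) or fetch_abstract(pmid)
          for pmid in pmids]

# Step 3: Chunk with overlap (preserve context around variant mentions).
# One paper yields many chunks, so each chunk keeps a reference to its paper.
chunks = sliding_window_chunk(
    papers,
    chunk_size=512,      # tokens
    overlap=128,         # tokens
    anchor_keywords=[variant_hgvs, protein_change, gene_symbol]
)

# Step 4: Embed (BioLinkBERT for biomedical domain accuracy)
embeddings = model.encode([c.text for c in chunks]).tolist()

# Step 5: Store with metadata: one entry per chunk, each with a unique id
chroma_collection.add(
    ids=[f"{c.paper.pmid}:{i}" for i, c in enumerate(chunks)],
    documents=[c.text for c in chunks],
    embeddings=embeddings,
    metadatas=[{
        "pmid": c.paper.pmid,
        "year": c.paper.year,
        "variant": variant_hgvs,
        "gene": gene_symbol,
        # Chroma metadata values must be scalars, so join the criteria list
        "criteria_hint": ",".join(detect_criteria_signals(c.text)),  # e.g. "PM3,PP1"
    } for c in chunks]
)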
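`sliding_window_chunk` is left abstract above. One possible reading, sketched with hypothetical names: split on whitespace (a stand-in for the embedding model's tokenizer, so the "tokens" here are words), advance by `chunk_size - overlap`, and treat anchoring as keeping only windows that mention an anchor keyword. A production version would count tokens with the model's own tokenizer so each chunk fits BioLinkBERT's 512-token input limit.

```python
def sliding_window(tokens: list[str], chunk_size: int = 512, overlap: int = 128) -> list[list[str]]:
    """Split tokens into windows of `chunk_size` that share `overlap` tokens with the next one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return windows

def anchored_chunks(text: str, anchors: list[str], chunk_size: int = 512, overlap: int = 128) -> list[str]:
    """Keep only windows that mention at least one anchor keyword (case-insensitive)."""
    tokens = text.split()  # whitespace tokens stand in for model tokens (assumption)
    needles = [a.lower() for a in anchors]
    kept = []
    for window in sliding_window(tokens, chunk_size, overlap):
        chunk = " ".join(window)
        if any(n in chunk.lower() for n in needles):
            kept.append(chunk)
    return kept
```

If the gene symbol is passed as an anchor, almost every window of a gene-focused paper survives; passing only the variant spellings makes the filter selective.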

4.3 Retrieval Strategy (Criterion-Aware)

Different ACMG criteria need different retrieval strategies:

Criterion	Retrieval focus	Query augmentation
PM3	in trans compound het	`"in trans" OR "compound heterozygous" OR "biallelic"`
PP1	co-segregation	`"segregation" OR "affected family members" OR "co-segregates"`
PS3/BS3	functional studies	`"functional" OR "in vitro" OR "in vivo" OR "assay"`
PS4	case-control prevalence	`"cases" OR "prevalence" OR "odds ratio"`
PP4	phenotype specificity	`"phenotype" OR "clinical features" OR "presentation"`
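The augmentations in the table can be combined with the gene and variant clause into one PubMed query per criterion. A sketch with a hypothetical `build_query` helper; each OR group is parenthesized because PubMed applies Boolean operators left to right.

```python
# Augmentation terms from the table above; BS3 reuses the PS3 terms.
CRITERION_TERMS = {
    "PM3": ['"in trans"', '"compound heterozygous"', '"biallelic"'],
    "PP1": ['"segregation"', '"affected family members"', '"co-segregates"'],
    "PS3": ['"functional"', '"in vitro"', '"in vivo"', '"assay"'],
    "PS4": ['"cases"', '"prevalence"', '"odds ratio"'],
    "PP4": ['"phenotype"', '"clinical features"', '"presentation"'],
}
CRITERION_TERMS["BS3"] = CRITERION_TERMS["PS3"]

def build_query(criterion: str, gene: str, variant_terms: list[str]) -> str:
    """PubMed query: gene AND (any variant spelling) AND (any criterion-specific term)."""
    variants = " OR ".join(f'"{v}"' for v in variant_terms)
    terms = " OR ".join(CRITERION_TERMS[criterion])
    return f'"{gene}" AND ({variants}) AND ({terms})'

print(build_query("PM3", "TSC2", ["c.4639A>T", "p.Lys1547Ter"]))
# "TSC2" AND ("c.4639A>T" OR "p.Lys1547Ter") AND ("in trans" OR "compound heterozygous" OR "biallelic")
```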

4.4 Context Assembly for Claude

# The context passed to Claude is structured, not raw text
context = {
    "variant": "NM_000548.5(TSC2):c.4639A>T (p.Lys1547Ter)",
    "gene": "TSC2",
    "disease": "Tuberous sclerosis complex",
    "acmg_preliminary": {
        "PVS1": {"triggered": True, "source": "autoPVS1", "note": "NMD predicted"},
        "PM2": {"triggered": True, "source": "gnomAD v4.1", "af": 0.000002},
        # ... other auto-scored criteria
    },
    "retrieved_literature": [
        {
            "pmid": "12345678",
            "chunk": "...five affected family members carried the p.Lys1547Ter variant...",
            "criteria_relevance": "PP1"
        },
        # top-k chunks
    ]
}
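The prompt template in 4.5 calls `format_chunks`, which the plan does not define. A hypothetical sketch, assuming chunks shaped like the `retrieved_literature` entries above; putting the PMID in each block header gives Claude an explicit label to cite (rule 4 of the system prompt).

```python
def format_chunks(chunks: list[dict]) -> str:
    """Render retrieved chunks as numbered blocks headed by PMID and criterion relevance."""
    blocks = []
    for i, chunk in enumerate(chunks, start=1):
        relevance = chunk.get("criteria_relevance", "unspecified")
        blocks.append(f"[{i}] PMID {chunk['pmid']} (relevance: {relevance})\n{chunk['chunk'].strip()}")
    return "\n\n".join(blocks)

retrieved = [
    {"pmid": "12345678",
     "chunk": "...five affected family members carried the p.Lys1547Ter variant...",
     "criteria_relevance": "PP1"},
]
print(format_chunks(retrieved))
```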

4.5 Claude Prompt Design (Hallucination-Suppressed)

import json

SYSTEM_PROMPT = """
You are a clinical genetics variant curator assistant. Your role is to 
extract structured evidence from the provided literature context ONLY.

CRITICAL RULES:
1. Do NOT use any knowledge from your training data about this variant
2. Only cite evidence that appears verbatim in the provided context chunks
3. If the context does not contain sufficient evidence for a criterion, say "insufficient evidence in provided literature"
4. For each criterion you assess, cite the specific PMID and quote the relevant sentence
5. Output structured JSON only — no free text
6. Flag any ambiguous phasing, uncertain phenotype matches, or potential ascertainment bias
"""

USER_PROMPT = f"""
Variant: {variant.hgvs}
Gene/Disease: {variant.gene} / {disease}

PRE-SCORED CRITERIA (from databases — do not re-evaluate these):
{json.dumps(acmg_preliminary, indent=2)}

LITERATURE CONTEXT (evaluate PM3, PP1, PS3, PS4, PP4 from these only):
{format_chunks(retrieved_chunks)}

Return a JSON array with one object per literature-dependent criterion, each of the form:
{{
  "criterion": "PM3",
  "triggered": true/false,
  "strength": "supporting/moderate/strong",
  "evidence": "exact quote from context",
  "pmid": "12345678",
  "confidence": "high/medium/low",
  "caveat": "any ascertainment concerns"
}}
"""

5. ACMG Criteria Coverage Map

Automated (database-driven — no LLM needed)

Criterion	Automation approach	Tool
PVS1	Loss-of-function prediction + transcript check	autoPVS1
BA1	gnomAD AF > 5%	gnomAD API
BS1	gnomAD AF > expected for disorder	gnomAD + disease incidence table
BS2	Healthy homozygote/heterozygote in gnomAD	gnomAD
PM2	Absent from gnomAD / very low AF	gnomAD API
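PM4	Protein length change from an in-frame indel in a non-repeat region, or stop-loss	Custom rule
PM5	Different missense change at a residue where a pathogenic missense is known	ClinVar lookup
PS1	Same amino acid change as an established pathogenic variant (different nucleotide change)	ClinVar lookup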
PP3 / BP4	REVEL, SpliceAI, AlphaMissense, CADD	Pre-scored tables
BP1	Missense in truncation-only gene	ClinGen curated gene list
BP3	In-frame indel in repeat region	RepeatMasker annotation
BP7	Synonymous + no splice prediction + non-conserved	SpliceAI + PhyloP
PP2	Missense in low-benign-missense gene	ClinGen gene-level stats

LLM-assisted (RAG + Claude)

Criterion	Claude task
PM3	Extract in trans observations from literature (AutoPM3 approach)
PP1 / BS4	Count segregating/non-segregating family members
PS3 / BS3	Identify and assess functional assay data
PS4	Extract case counts and odds ratios
PP4	Assess phenotype specificity match
PS2 / PM6	Identify confirmed/assumed de novo reports
PP5 / BP6	Check recent authoritative database submissions

Requires curator input (cannot automate)

Criterion	Why manual
PM1	Requires domain expert judgment about "critical" functional domains
BP5	Requires knowledge of the specific patient's alternative diagnosis
PM3 (phasing)	Parental testing results needed from clinician

6. Tech Stack

Backend:    Python 3.12 · FastAPI · SQLAlchemy · Celery (async jobs)
Frontend:   React 18 · TypeScript · Tailwind CSS
Databases:  PostgreSQL (variants, audit trail) · SQLite (REVEL, gnomAD offline)
Vector DB:  ChromaDB (embedded, on-premise)
Embeddings: sentence-transformers (BioLinkBERT or all-MiniLM-L6-v2)
LLM:        Claude API (on-premise option: Ollama + open-source LLM as fallback)
Auth:       OAuth2 / LDAP (for hospital integration)
Containers: Docker + docker-compose (single-command deployment)
Tests:      pytest · hypothesis (property-based testing of ACMG logic)

7. Claude Code Implementation Plan

Use Claude Code (claude CLI) to build this in phases. Run from the project root.

Prerequisites

# Install Claude Code
npm install -g @anthropic-ai/claude-code

# Verify
claude --version

Phase 0 — Project Setup (Day 1)

mkdir variantlens && cd variantlens

claude "Create a Python FastAPI project called VariantLens for clinical genomic 
variant interpretation. Set up:
- /backend: FastAPI app with routers for variants, evidence, classification
- /frontend: React + TypeScript + Tailwind project
- /data: SQLite databases for REVEL and gnomAD offline lookups
- docker-compose.yml with services: api, frontend, postgres, chroma
- pyproject.toml with dependencies: fastapi, sqlalchemy, chromadb, 
  sentence-transformers, anthropic, biopython, requests, httpx, celery
- .env.example with ANTHROPIC_API_KEY, NCBI_API_KEY, OMIM_API_KEY
Include a README with setup instructions."

Phase 1 — Variant Normalization (Day 2–3)

claude "In /backend/app/services/normalization.py, implement a VariantNormalizer 
class that:
1. Accepts variants in HGVS, VCF, or protein notation
2. Uses the Mutalyzer REST API (https://mutalyzer.nl/api/v2/) for normalization
3. Falls back to PyHGVS for offline normalization
4. Returns: canonical HGVS (genomic + coding + protein), transcript, gene symbol
5. Handles batch normalization with rate limiting
6. Includes comprehensive unit tests with 20+ test variants including edge cases 
   (stop-loss, indels, splice variants, mitochondrial)
Use pydantic models for all inputs/outputs."
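Accepting three notations implies a routing step before Mutalyzer is called. A hypothetical sketch of that step; the patterns are deliberately loose and only decide which code path to take, leaving real validation to Mutalyzer. One caveat worth building into the class: a protein description generally cannot be mapped back to a single nucleotide change (several codon changes can give the same amino acid substitution), so protein input should yield candidate variants for the curator rather than one canonical HGVS.

```python
import re
from enum import Enum

class Notation(str, Enum):
    HGVS = "hgvs"        # nucleotide-level: c., g., n., r., m.
    PROTEIN = "protein"  # p. descriptions
    VCF = "vcf"          # chrom-pos-ref-alt

# Loose routing patterns (assumed); Mutalyzer performs the real validation.
_HGVS = re.compile(r"^(?:NM|NR|NC|NG|XM|ENST)_?[\w.]*(?:\([\w-]+\))?:[cgnrm]\.")
_PROTEIN = re.compile(r"^(?:NP_\d+(?:\.\d+)?(?:\([\w-]+\))?:)?p\.")
_VCF = re.compile(r"^(?:chr)?(?:[1-9]|1\d|2[0-2]|X|Y|MT?)[-:]\d+[-:][ACGTN]+[-:][ACGTN]+$")

def detect_notation(text: str) -> Notation:
    """Classify a user-entered variant string by notation, or raise ValueError."""
    s = text.strip()
    if _HGVS.match(s):
        return Notation.HGVS
    if _PROTEIN.match(s):
        return Notation.PROTEIN
    if _VCF.match(s):
        return Notation.VCF
    raise ValueError(f"Unrecognized variant notation: {text!r}")
```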

Phase 2 — Database Integrations (Day 4–7)

# gnomAD integration
claude "In /backend/app/services/gnomad.py, implement a GnomADClient that:
1. Queries gnomAD v4.1 GraphQL API for variant allele frequencies
2. Returns AF by genetic ancestry group using the gnomAD labels (e.g. AFR, AMR, ASJ, EAS, FIN, MID, NFE, SAS; gnomAD reports NFE and FIN rather than EUR)
3. Implements local SQLite caching to avoid redundant API calls
4. Computes BA1 (>5% AF), BS1 (>expected), BS2 (healthy homozygotes), PM2 (<0.0001)
5. Handles missing data and low coverage warnings
Include the gnomAD GraphQL query template as a constant."
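Point 4 of this prompt reduces to threshold comparisons. A sketch with hypothetical names, using the plan's cut-offs (BA1 above 5%, PM2 below 0.0001) and a caller-supplied BS1 threshold from the disease incidence table. Which frequency to pass in (highest group AF or a filtering allele frequency, with a minimum allele-number requirement) is a policy choice the sketch leaves to the caller; BS2 is omitted because it needs homozygote/hemizygote counts and the inheritance mode.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FrequencyCall:
    BA1: bool
    BS1: bool
    PM2: bool

def frequency_criteria(
    max_af: Optional[float],   # highest gnomAD group AF; None = not observed
    bs1_threshold: float,      # disease-specific, from prevalence/penetrance table
    *,
    ba1_threshold: float = 0.05,
    pm2_threshold: float = 0.0001,
    adequately_covered: bool = True,
) -> FrequencyCall:
    """Derive BA1/BS1/PM2 from a single allele frequency."""
    if max_af is None:
        # Absence only counts as evidence where gnomAD coverage is adequate.
        return FrequencyCall(BA1=False, BS1=False, PM2=adequately_covered)
    ba1 = max_af > ba1_threshold
    bs1 = not ba1 and max_af > bs1_threshold  # BA1 is stand-alone; BS1 adds nothing
    # Never let PM2 fire alongside a benign frequency criterion.
    pm2 = adequately_covered and not (ba1 or bs1) and max_af < pm2_threshold
    return FrequencyCall(BA1=ba1, BS1=bs1, PM2=pm2)
```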

# ClinVar integration  
claude "In /backend/app/services/clinvar.py, implement a ClinVarClient that:
1. Queries ClinVar via NCBI Entrez E-utilities for a given variant
2. Parses existing classifications and review status (star rating)
3. Extracts PS1 evidence (same aa change, different nucleotide)
4. Extracts PM5 evidence (same position, different pathogenic missense)  
5. Extracts PP5/BP6 evidence (recent reputable submissions)
6. Downloads weekly ClinVar VCF for local lookup (faster batch queries)
Use BioPython's Entrez module."
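PS1 and PM5 both come down to comparing the query's protein change with pathogenic ClinVar missense variants at the same residue. A sketch with hypothetical names, assuming the ClinVar client has already produced (c., p.) description pairs for pathogenic/likely pathogenic variants on the same transcript. It does not handle the ACMG caveat that the variants should not be acting through splicing; the splice predictors still have to check that.

```python
import re
from typing import NamedTuple, Optional

_MISSENSE = re.compile(r"^p\.\(?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})\)?$")

class AAChange(NamedTuple):
    ref: str
    pos: int
    alt: str

def parse_missense(hgvs_p: str) -> Optional[AAChange]:
    """Parse three-letter p. missense (e.g. p.Arg34Trp); None for nonsense, stop-loss, synonymous."""
    m = _MISSENSE.match(hgvs_p)
    if not m or "Ter" in (m.group(1), m.group(3)) or m.group(1) == m.group(3):
        return None
    return AAChange(m.group(1), int(m.group(2)), m.group(3))

def ps1_pm5(query_c: str, query_p: str, pathogenic: list[tuple[str, str]]) -> dict[str, bool]:
    """pathogenic: (c., p.) pairs for P/LP ClinVar variants on the same transcript."""
    q = parse_missense(query_p)
    if q is None:
        raise ValueError(f"PS1/PM5 apply to missense variants only: {query_p!r}")
    ps1 = pm5 = False
    for known_c, known_p in pathogenic:
        k = parse_missense(known_p)
        if k is None:
            continue  # skip non-missense entries
        if k == q and known_c != query_c:
            ps1 = True  # same amino acid change from a different nucleotide change
        elif k.pos == q.pos and k.ref == q.ref and k.alt != q.alt:
            pm5 = True  # different missense change at the same residue
    return {"PS1": ps1, "PM5": pm5}

# Hypothetical descriptions, for illustration only
known = [("c.100C>T", "p.Arg34Trp"), ("c.101G>A", "p.Arg34Gln"), ("c.49A>T", "p.Lys17Ter")]
print(ps1_pm5("c.100C>T", "p.Arg34Trp", known))
# {'PS1': False, 'PM5': True}
```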

# In silico predictors
claude "In /backend/app/services/insilico.py, implement InSilicoPredictor that:
1. Loads REVEL scores from a local SQLite database (build script included)
2. Loads AlphaMissense scores from the downloaded TSV (2.5GB flat file)
3. Calls the SpliceAI lookup API (https://spliceailookup-api.broadinstitute.org)
4. Calls the CADD REST API (https://cadd.gs.washington.edu)
5. Aggregates concordant/discordant predictions for PP3/BP4
6. Follows the ACMG rule: concordant predictions = 1 piece of evidence (not additive)
Returns a structured InSilicoResult with per-tool scores and overall PP3/BP4 call."
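Point 6 is the key design constraint: however many predictors agree, the result is a single PP3 or BP4 call. A sketch with hypothetical names. The numeric cut-offs are placeholders for illustration, not validated thresholds; calibrated values should come from the calibration or VCEP specification the lab adopts, and `min_tools=2` is likewise an assumption about what "concordant" requires. SpliceAI and CADD would be added as further entries once their cut-offs are chosen.

```python
from typing import Optional

# Placeholder cut-offs (assumed, not validated):
# tool -> (benign if score <= first value, pathogenic if score >= second value)
THRESHOLDS = {
    "revel": (0.290, 0.644),
    "alphamissense": (0.340, 0.564),
}

def vote(score: float, benign_max: float, pathogenic_min: float) -> str:
    if score >= pathogenic_min:
        return "pathogenic"
    if score <= benign_max:
        return "benign"
    return "indeterminate"

def pp3_bp4(scores: dict[str, Optional[float]], thresholds=THRESHOLDS, min_tools: int = 2) -> Optional[str]:
    """Return one 'PP3', 'BP4' or None: concordant tools count once, never additively."""
    votes = [vote(s, *thresholds[tool]) for tool, s in scores.items() if s is not None]
    if len(votes) < min_tools:
        return None  # too few predictors to call concordance
    if all(v == "pathogenic" for v in votes):
        return "PP3"
    if all(v == "benign" for v in votes):
        return "BP4"
    return None  # discordant or indeterminate: neither criterion applies
```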

# autoPVS1 integration
claude "Integrate the autoPVS1 Python package into /backend/app/services/pvs1.py.
Create a PVS1Assessor wrapper that:
1. Takes a normalized HGVS variant
2. Runs autoPVS1 to assign PVS1 strength (very strong, strong, moderate, supporting, or not applicable) following the ClinGen SVI PVS1 decision tree
3. Returns structured output with reasoning for the rule applied
4. Handles the PVS1 caveats from the ACMG guidelines (LOF as disease mechanism, variants at the 
   extreme 3' end, splice variants causing in-frame exon skipping, multiple transcripts and alternatively spliced exons)
Include comprehensive tests for CFTR, MYH7, and BRCA1 known variants."
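For nonsense and frameshift variants, the PVS1 decision tree (Abou Tayoun et al., 2018) starts by asking whether the variant is predicted to trigger nonsense-mediated decay. autoPVS1 handles this; the hypothetical sketch below only spells out a simplified 50-nucleotide rule so its output can be spot-checked in tests. It assumes coding-sequence coordinates and that the final exon-exon junction lies inside the CDS (no 3'-UTR-only exons), which real transcripts do not always satisfy; thresholds of 50 to 55 nt are commonly cited.

```python
from itertools import accumulate

def predicts_nmd(ptc_cds_pos: int, exon_cds_lengths: list[int], window: int = 50) -> bool:
    """Simplified 50-nt rule on coding-sequence coordinates.

    ptc_cds_pos: c. position of the first base of the premature stop codon.
    exon_cds_lengths: coding bases contributed by each exon, 5' to 3'.
    """
    if len(exon_cds_lengths) < 2:
        return False  # single-exon transcript: no downstream junction to trigger NMD
    exon_ends = list(accumulate(exon_cds_lengths))
    last_junction = exon_ends[-2]  # end of the penultimate coding exon
    # NMD expected when the stop lies more than `window` nt upstream of that junction
    return last_junction - ptc_cds_pos > window
```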

Phase 3 — RAG Pipeline (Day 8–11)

claude "Build the RAG literature pipeline in /backend/app/services/rag/:

1. literature_fetcher.py:
   - Query PubMed E-utilities with variant-aware search queries
   - Fetch full text from PMC where available, abstract otherwise
   - Build criterion-specific queries for PM3, PP1, PS3, PS4
   - Cache results to avoid re-fetching the same papers

2. chunker.py:
   - Sliding window chunker (512 tokens, 128 overlap)
   - Anchor chunks near variant mention sentences
   - Detect which ACMG criteria each chunk is relevant to (keyword heuristic)

3. embedder.py:
   - Use sentence-transformers BioLinkBERT for biomedical-domain embeddings
   - Batch embedding with progress tracking
   - Store to ChromaDB with full metadata (pmid, year, variant, gene, criteria_hint)

4. retriever.py:
   - Criterion-aware query construction (different for PM3 vs PP1 vs PS3)
   - Retrieve top-k chunks (k=8 per criterion)
   - Deduplicate across criteria
   - Return structured context for Claude

Each module must have typed interfaces (pydantic) and unit tests."
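The "top-k per criterion, then deduplicate" step is where a single chunk (say, a pedigree description relevant to both PM3 and PP1) would otherwise be sent to Claude twice. A hypothetical sketch that merges per-criterion hits given as (chunk_id, distance) pairs, mirroring the `ids` and `distances` fields of a ChromaDB query result, keeps each chunk once with its best distance, and records every criterion that retrieved it (that list could fill the `criteria_relevance` field in 4.4).

```python
def merge_hits(hits_by_criterion: dict[str, list[tuple[str, float]]], k: int = 8) -> list[dict]:
    """Merge per-criterion (chunk_id, distance) hits; a chunk found for several criteria appears once.

    Lower distance means more similar, as in ChromaDB query results.
    """
    merged: dict[str, dict] = {}
    for criterion, hits in hits_by_criterion.items():
        for chunk_id, distance in sorted(hits, key=lambda h: h[1])[:k]:
            entry = merged.setdefault(chunk_id, {"chunk_id": chunk_id, "distance": distance, "criteria": []})
            entry["distance"] = min(entry["distance"], distance)
            entry["criteria"].append(criterion)
    return sorted(merged.values(), key=lambda e: e["distance"])
```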

Phase 4 — ACMG Rule Engine (Day 12–15)

claude "Build the ACMG rule engine in /backend/app/services/acmg/:

1. criteria.py: Pydantic models for each of the 28 criteria with:
   - triggered (bool)
   - strength (very_strong/strong/moderate/supporting/standalone)
   - source (database name or PMID)
   - evidence_text (quote or numeric value)
   - confidence (high/medium/low)
   - caveat (optional warning text)

2. rules.py: Implement all auto-scorable criteria:
   - PVS1 (from autoPVS1 result)
   - PS1, PM5 (from ClinVar)
   - BA1, BS1, BS2, PM2 (from gnomAD)
   - PP3/BP4 (from InSilico concordant predictions)
   - PM4/BP3 (in-frame indel outside vs. inside a repeat region; PM4 also covers stop-loss)
   - BP1 (gene-level truncation-only flag from ClinGen)
   - BP7 (synonymous + no splice impact + non-conserved)
   - PP2 (low benign missense gene)

3. combiner.py: Implement Table 5 from Richards 2015 exactly:
   - All combination rules for Pathogenic/Likely Pathogenic/Benign/Likely Benign
   - Returns provisional classification + list of triggered criteria
   - Flags conflicting evidence (pathogenic + benign criteria both present)
   - Exports to structured JSON for audit trail

4. validator.py: Unit tests using 50 known ClinVar variants 
   (10 P, 10 LP, 10 VUS, 10 LB, 10 B) — verify combiner matches ClinVar
   classification at ≥85% concordance."
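Table 5 reduces to counting triggered criteria by direction and strength. A minimal sketch of the core of `combiner.py`, assuming criteria arrive as (code, strength) pairs using the strength labels from `criteria.py`; the function name is hypothetical. It encodes the 2015 table only, so later ClinGen SVI refinements (for example applying PM2 at supporting strength) would change some outcomes and need their own rules.

```python
from collections import Counter

def classify(triggered: list[tuple[str, str]]) -> str:
    """Apply Richards et al. 2015 Table 5 to (criterion, strength) pairs.

    Direction comes from the code prefix (P... pathogenic, B... benign); strength
    is taken as applied, so e.g. PM3 used at "strong" counts as a strong criterion.
    """
    path = Counter(s for c, s in triggered if c.startswith("P"))
    ben = Counter(s for c, s in triggered if c.startswith("B"))
    vs, st, mo, su = (path[k] for k in ("very_strong", "strong", "moderate", "supporting"))

    pathogenic = (
        (vs >= 1 and (st >= 1 or mo >= 2 or (mo >= 1 and su >= 1) or su >= 2))
        or st >= 2
        or (st >= 1 and (mo >= 3 or (mo >= 2 and su >= 2) or (mo >= 1 and su >= 4)))
    )
    likely_pathogenic = (
        (vs >= 1 and mo >= 1)
        or (st >= 1 and mo >= 1)  # "1-2 moderate"; 3+ already met Pathogenic
        or (st >= 1 and su >= 2)
        or mo >= 3
        or (mo >= 2 and su >= 2)
        or (mo >= 1 and su >= 4)
    )
    benign = ben["standalone"] >= 1 or ben["strong"] >= 2
    likely_benign = (ben["strong"] >= 1 and ben["supporting"] >= 1) or ben["supporting"] >= 2

    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"  # contradictory evidence
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "Uncertain significance"
```

Written this way, the combiner is a natural target for the hypothesis property tests in Phase 7; for example, adding a benign criterion should never move a result toward Pathogenic.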

Phase 5 — Claude Reasoning Layer (Day 16–18)

claude "Build the Claude reasoning layer in /backend/app/services/llm/:

1. prompts.py: Structured prompt templates for each literature-dependent criterion:
   - PM3 prompt (in trans extraction, inspired by AutoPM3)
   - PP1 prompt (segregation counting with anti-hallucination guards)
   - PS3 prompt (functional assay quality assessment)
   - PS4 prompt (case count and OR extraction)
   - PP4 prompt (phenotype specificity matching)
   
   Each prompt must:
   - Explicitly instruct Claude to only use provided context (no training recall)
   - Request JSON output with evidence quotes + PMIDs
   - Include uncertainty and caveat detection
   - Include examples of what hallucination looks like and how to avoid it

2. reasoner.py: LLM reasoning orchestrator that:
   - Takes pre-scored criteria + RAG context
   - Calls Claude for each literature-dependent criterion
   - Parses and validates JSON responses
   - Falls back gracefully if Claude response is malformed
   - Logs all LLM calls with input/output for audit

3. synthesizer.py: Final synthesis pass that:
   - Merges database-scored + LLM-scored criteria
   - Produces human-readable evidence summary (for curator)
   - Highlights conflicting or ambiguous evidence
   - Generates uncertainty flags for VUS cases

Use the Anthropic Python SDK. Model: claude-sonnet-4-6. max_tokens: 2000."
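A sketch of the "parse, validate, fall back" steps in `reasoner.py`. It uses stdlib dataclasses to stay dependency-free where the plan calls for pydantic; names are hypothetical and the field set follows the output schema in 4.5. Nothing malformed is guessed at: an unusable or missing criterion is returned in a review list so the curator sees it as unassessed.

```python
import json
from dataclasses import dataclass
from typing import Optional

STRENGTHS = ("supporting", "moderate", "strong")
CONFIDENCE = ("high", "medium", "low")

@dataclass
class Finding:
    criterion: str
    triggered: bool
    strength: Optional[str]
    evidence: str
    pmid: Optional[str]
    confidence: str
    caveat: str = ""

def parse_findings(raw: str, expected: set[str]) -> tuple[list[Finding], list[str]]:
    """Return (valid findings, criteria to route to the curator as unassessed)."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return [], sorted(expected)  # whole response unusable
    if isinstance(items, dict):
        items = [items]  # tolerate a single object instead of an array
    if not isinstance(items, list):
        return [], sorted(expected)
    findings = []
    for item in items:
        try:
            f = Finding(
                criterion=item["criterion"],
                triggered=item["triggered"],
                strength=item.get("strength"),
                evidence=item.get("evidence") or "",
                pmid=item.get("pmid"),
                confidence=item["confidence"],
                caveat=item.get("caveat") or "",
            )
        except (KeyError, TypeError, AttributeError):
            continue  # not an object with the required keys
        if not (isinstance(f.criterion, str) and f.criterion in expected
                and isinstance(f.triggered, bool) and f.confidence in CONFIDENCE):
            continue
        if f.triggered and (f.strength not in STRENGTHS or not f.evidence or not f.pmid):
            continue  # a triggered criterion must carry a strength, a quote and a PMID
        findings.append(f)
    assessed = {f.criterion for f in findings}
    return findings, sorted(expected - assessed)
```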

Phase 6 — Frontend (Day 19–22)

claude "Build the React frontend in /frontend/src/:

1. VariantInput component:
   - Text field for HGVS entry with live validation against Mutalyzer
   - VCF file upload (single variant or batch)
   - HPO term autocomplete (using HPO API)
   - Gene/disease context selector

2. EvidenceDashboard component (the main curator view):
   - Criteria table showing all 28 criteria with status (triggered/not triggered/pending)
   - Color coding: green (benign criteria), red (pathogenic), gray (not triggered)
   - Each row expandable to show evidence source, quote, and confidence
   - Override button per criterion with required free-text justification
   - Literature panel showing RAG-retrieved papers with relevant quotes highlighted

3. ClassificationPanel component:
   - Shows provisional 5-tier classification
   - Shows confidence and any conflicting evidence flags
   - Curator sign-off button with authentication
   - Classification history / previous submissions

4. ReportGenerator component:
   - Preview of clinical report in standard format
   - Export to PDF, ClinVar submission XML, HL7 FHIR R4
   
Use React Query for API calls, Zustand for state, Tailwind for styling."

Phase 7 — Testing & Validation (Day 23–25)

claude "Create a comprehensive validation suite in /tests/:

1. test_known_variants.py:
   - Use 100 ClinVar variants with 3-star (expert panel) or 4-star (practice guideline) review status
   - Assert classification concordance ≥ 85%
   - Assert all triggered criteria are traceable to a source
   - Assert no criterion is triggered without evidence

2. test_hallucination_guard.py:
   - Feed the LLM prompts deliberately irrelevant literature (negative controls)
   - Assert Claude does not trigger PM3/PP1 when context contains no relevant evidence
   - Assert Claude cites only PMIDs present in the provided context

3. test_acmg_combiner.py:
   - Property-based tests using hypothesis
   - Test all combination rules from Table 5 of Richards 2015
   - Test edge cases: conflicting evidence, single criterion only

4. performance_benchmark.py:
   - Time per variant (target: < 30 seconds including RAG)
   - Batch throughput (target: 100 variants/hour)
   - Memory usage per worker"
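Two of the assertions in `test_known_variants.py` can be expressed as small helpers. A sketch with hypothetical names; `triggered`, `source` and `evidence_text` mirror the fields listed for `criteria.py`, and the `criterion` key is an assumption. Reporting which tiers were confused (e.g. LP predicted as P) alongside the headline rate shows whether misses are near-misses or category errors.

```python
from collections import Counter

def concordance(predicted: list[str], expected: list[str]) -> tuple[float, Counter]:
    """Fraction of exact tier matches, plus (expected, predicted) counts for each miss."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("need two non-empty lists of equal length")
    misses = Counter((e, p) for p, e in zip(predicted, expected) if p != e)
    return 1 - sum(misses.values()) / len(expected), misses

def untraceable(criteria: list[dict]) -> list[str]:
    """Triggered criteria that lack a source or evidence text (should always be empty)."""
    return [c["criterion"] for c in criteria
            if c.get("triggered") and not (c.get("source") and c.get("evidence_text"))]
```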

8. Directory Structure

variantlens/
├── backend/
│   ├── app/
│   │   ├── api/                  # FastAPI routers
│   │   │   ├── variants.py       # POST /variants/classify
│   │   │   ├── evidence.py       # GET /variants/{id}/evidence
│   │   │   └── reports.py        # GET /variants/{id}/report
│   │   ├── services/
│   │   │   ├── normalization.py  # Mutalyzer wrapper
│   │   │   ├── gnomad.py         # gnomAD client
│   │   │   ├── clinvar.py        # ClinVar client
│   │   │   ├── insilico.py       # REVEL, SpliceAI, CADD, AlphaMissense
│   │   │   ├── pvs1.py           # autoPVS1 wrapper
│   │   │   ├── rag/
│   │   │   │   ├── literature_fetcher.py  # PubMed fetch
│   │   │   │   ├── chunker.py    # Text chunking
│   │   │   │   ├── embedder.py   # sentence-transformers
│   │   │   │   └── retriever.py  # ChromaDB query
│   │   │   ├── acmg/
│   │   │   │   ├── criteria.py   # Pydantic models
│   │   │   │   ├── rules.py      # auto-scorable criteria
│   │   │   │   └── combiner.py   # Table 5 logic
│   │   │   └── llm/
│   │   │       ├── prompts.py    # Criterion-specific prompts
│   │   │       ├── reasoner.py   # Claude API calls
│   │   │       └── synthesizer.py
│   │   └── models/               # SQLAlchemy DB models
│   └── tests/
├── frontend/
│   └── src/
│       ├── components/
│       │   ├── VariantInput.tsx
│       │   ├── EvidenceDashboard.tsx
│       │   ├── ClassificationPanel.tsx
│       │   └── ReportGenerator.tsx
│       └── hooks/
├── data/
│   ├── revel_scores.db           # SQLite: pre-scored missense positions
│   ├── alphamissense.tsv.gz      # Downloaded AlphaMissense flat file
│   └── gnomad_cache.db           # Local AF cache
├── docker-compose.yml
├── .env.example
└── README.md

9. Privacy & Deployment

On-premise deployment (recommended for clinical data)

# docker-compose.yml excerpt
services:
  api:
    build: ./backend
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - USE_LOCAL_LLM=false  # set true + configure Ollama for air-gap
    volumes:
      - ./data:/app/data     # all patient data stays local
  
  chroma:
    image: chromadb/chroma    # runs locally, no cloud
    volumes:
      - chroma_data:/chroma

volumes:
  chroma_data:                # named volumes must be declared at top level

Air-gapped option

If patient data cannot touch external APIs even for Claude:

Replace Claude with a locally-hosted open-source LLM via Ollama
Recommended model: mistral-nemo or qwen2.5 (strong instruction following)
Performance will be lower than Claude but maintains privacy
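Toggle via USE_LOCAL_LLM=true in .env
Replace the other remote lookups (Mutalyzer, gnomAD, ClinVar, SpliceAI, CADD, PubMed) with local mirrors or pre-downloaded files, since each of those calls also sends the variant off-site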
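A minimal sketch of reading the toggle, with hypothetical function names. The point it illustrates is that environment variables are strings, so `bool(os.environ["USE_LOCAL_LLM"])` is True even when the value is `false`. A missing variable defaults to the cloud backend, matching `USE_LOCAL_LLM=false` in the compose file; an unrecognized value raises instead of guessing, since a wrong guess could send data off-site.

```python
import os
from collections.abc import Mapping
from typing import Optional

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}

def use_local_llm(env: Optional[Mapping[str, str]] = None) -> bool:
    """Parse USE_LOCAL_LLM explicitly rather than relying on truthiness."""
    env = os.environ if env is None else env
    value = env.get("USE_LOCAL_LLM", "false").strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    # Fail loudly instead of silently defaulting to the cloud API.
    raise ValueError(f"USE_LOCAL_LLM must be true/false, got {value!r}")

def llm_backend(env: Optional[Mapping[str, str]] = None) -> str:
    return "ollama" if use_local_llm(env) else "anthropic"
```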

Keys you need

Key	Source	Free?
`ANTHROPIC_API_KEY`	console.anthropic.com	Pay per token
`NCBI_API_KEY`	ncbi.nlm.nih.gov/account	Free
`OMIM_API_KEY`	omim.org/api	Free for academic
`GNOMAD`	No key needed (GraphQL API)	Free

10. Development Timeline

Phase	Duration	Milestone
0 — Setup	Day 1	Project scaffolded, Docker running
1 — Normalization	Day 2–3	Mutalyzer integration + tests passing
2 — Databases	Day 4–7	gnomAD, ClinVar, REVEL, SpliceAI, autoPVS1 integrated
3 — RAG	Day 8–11	Literature retrieval + ChromaDB indexing working
4 — ACMG engine	Day 12–15	All auto-scorable criteria + combiner; ≥85% concordance
5 — LLM layer	Day 16–18	Claude synthesizing PM3/PP1/PS3 from RAG context
6 — Frontend	Day 19–22	Full curator dashboard; report export
7 — Validation	Day 23–25	100-variant benchmark suite passing

Total: ~5 weeks of focused development using Claude Code throughout

11. Key Design Decisions Summary

Decision	Choice	Rationale
LLM for literature only	Claude handles PM3, PP1, PS3, PS4, PP4 — not DB criteria	Reduces hallucination surface area; DB facts never go through LLM
RAG over training-data recall	ChromaDB + BioLinkBERT embeddings	Grounds Claude in actual retrieved text; avoids reliance on stale training data
Prompt includes only context	System prompt explicitly forbids using training recall	Mirrors AI CURA's anti-hallucination strategy that achieved 96% concordance
autoPVS1 for PVS1	Don't reinvent PVS1 logic	autoPVS1 has been validated extensively; reuse it
InterVar as ACMG scaffold	Build on existing 18-criteria implementation	Extend rather than rewrite; saves ~2 weeks
Human-in-the-loop always	Curator must review + sign off every classification	Matches ACMG guidance; required for clinical lab accreditation
On-premise ChromaDB	No patient data leaves the network	Supports HIPAA/PHIPA compliance
JSON-only LLM output	All Claude responses are structured JSON	Enables reliable parsing + audit trail