# VariantLens: Lab-Grade Variant Interpretation Tool
## Full Implementation Plan — Claude Code Build

*Jordan Lerner-Ellis Lab · University of Toronto · April 2026*

---

## 1. Design Philosophy

**Core principle:** Human-in-the-loop augmentation. The tool accelerates evidence gathering, applies ACMG criteria, and uses Claude to synthesize unstructured literature — but a trained curator makes every final classification decision. This matches the design of all three tools from the November 2025 CGLC session (AI CURA, EvAgg, AutoPM3) and is the safest path to clinical adoption.

**Non-negotiables:**
- All patient data stays on-premise (no genomic data sent to cloud APIs without explicit opt-in)
- Full evidence audit trail — every criterion is traceable to a source
- Compatible with ACMG SVC v4.0 when finalized
- Export to ClinVar, PDF, and HL7 FHIR

---

## 2. Selected Tools & Frameworks to Integrate

### 2.1 Existing tools to build ON TOP OF (don't reinvent)

| Tool | Role in VariantLens | Why |
|---|---|---|
| **autoPVS1** | PVS1 criterion automation | Best-in-class null variant assessment; open-source Python; integrates with pyhgvs |
| **InterVar** | ACMG rule engine scaffold | Implements ~18 criteria; open-source; use as base then extend to all 28 |
| **Mutalyzer** | HGVS normalization | Industry standard; Python API available; solves the nomenclature inconsistency problem |
| **PyHGVS** | Secondary normalization | Lightweight Python library; good fallback |
| **SpliceAI** | Splice effect prediction | Pre-scored lookup tables available (avoid running the model per variant) |
| **REVEL** | Missense pathogenicity | Pre-computed for all possible human missense variants; load as SQLite |
| **AlphaMissense** | Missense pathogenicity | 2023 DeepMind model; scores for ~71 million possible human missense variants; download as flat file |
| **CADD** | Combined annotation | Pre-scored tracks; REST API available |
| **ChromaDB** | Vector store for RAG | Local, embedded, no server needed; Python-native; HIPAA-friendly |
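| **sentence-transformers** | Embeddings for RAG | `all-MiniLM-L6-v2` for speed; `BioLinkBERT` for biomedical accuracy (a plain Hugging Face encoder, so it needs a pooling layer and should be benchmarked for retrieval before adoption) |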

### 2.2 Data sources to connect

| Source | Data | Access method |
|---|---|---|
| **gnomAD v4.1** | Population allele frequencies | GraphQL API + local SQLite for BA1/BS1/BS2/PM2 |
| **ClinVar** | Existing classifications | Entrez E-utilities + local VCF download (weekly sync) |
| **OMIM** | Gene-disease + inheritance | API (free for academic use) |
| **ClinGen VCEPs** | Expert panel rules | ClinGen Criteria Specification Registry (CSpec); Allele Registry API for canonical allele IDs |
| **HGMD (lite)** | Published variants | Public variant lists (full version if lab has license) |
| **PubMed** | Literature | E-utilities for abstract retrieval; full-text via PMC API |
| **UniProt** | Protein domain / functional domains | REST API for PM1 |

### 2.3 What NOT to rebuild

- Do not implement your own in silico predictors — use pre-scored tables
- Do not build your own variant normalizer — Mutalyzer handles this
- Do not build your own vector database — ChromaDB is production-ready

---

## 3. System Architecture

```
┌─────────────────────────────────────────────────┐
│                  FRONTEND (React)                │
│   Variant input · HPO terms · Curator dashboard  │
└────────────────────┬────────────────────────────┘
                     │ REST API
┌────────────────────▼────────────────────────────┐
│               BACKEND (FastAPI / Python)         │
│                                                  │
│  ┌──────────────────────────────────────────┐   │
│  │  1. Normalization Layer                  │   │
│  │     Mutalyzer → canonical HGVS           │   │
│  └─────────────────┬────────────────────────┘   │
│                    │                             │
│  ┌─────────────────▼────────────────────────┐   │
│  │  2. Evidence Gathering (parallel async)  │   │
│  │                                          │   │
│  │  Databases:         RAG Pipeline:        │   │
│  │  • gnomAD           • PubMed fetch       │   │
│  │  • ClinVar          • ChromaDB query     │   │
│  │  • OMIM             • Relevant chunks    │   │
│  │  • REVEL/SpliceAI   • Context assembly   │   │
│  │  • AlphaMissense                         │   │
│  │  • autoPVS1                              │   │
│  └─────────────────┬────────────────────────┘   │
│                    │                             │
│  ┌─────────────────▼────────────────────────┐   │
│  │  3. ACMG Rule Engine                     │   │
│  │     InterVar base + custom extensions    │   │
│  │     28 criteria → weighted scores        │   │
│  └─────────────────┬────────────────────────┘   │
│                    │                             │
│  ┌─────────────────▼────────────────────────┐   │
│  │  4. Claude Reasoning Layer               │   │
│  │     RAG context + ACMG pre-scores        │   │
│  │     → literature evidence synthesis      │   │
│  │     → VUS reasoning + uncertainty flags  │   │
│  └─────────────────┬────────────────────────┘   │
│                    │                             │
│  ┌─────────────────▼────────────────────────┐   │
│  │  5. Classification Combiner              │   │
│  │     Table 5 (Richards 2015) logic        │   │
│  │     → provisional 5-tier + confidence    │   │
│  └─────────────────┬────────────────────────┘   │
│                    │                             │
│  ┌─────────────────▼────────────────────────┐   │
│  │  6. Output: audit trail + report draft   │   │
│  └──────────────────────────────────────────┘   │
│                                                  │
└─────────────────────────────────────────────────┘
                     │
┌────────────────────▼────────────────────────────┐
│              CURATOR REVIEW UI                   │
│   Evidence table · Criterion override · Sign-off │
│   ClinVar export · PDF report · LIMS integration │
└─────────────────────────────────────────────────┘
```
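The "parallel async" step in layer 2 could be sketched with `asyncio.gather`, as below. This is a minimal illustration, not the project's implementation: `fetch_gnomad`, `fetch_clinvar`, `SOURCES`, and `gather_evidence` are hypothetical names, and the two fetchers are stubs standing in for the real clients. The point it shows is that one failing source should be recorded in the audit trail rather than aborting the whole evidence run.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


# Hypothetical stand-ins for the real database clients.
async def fetch_gnomad(variant: str) -> Dict[str, Any]:
    await asyncio.sleep(0.01)  # simulates network latency
    return {"af": 0.000002}


async def fetch_clinvar(variant: str) -> Dict[str, Any]:
    await asyncio.sleep(0.01)
    raise TimeoutError("ClinVar did not respond")  # simulated outage


SOURCES: Dict[str, Callable[[str], Awaitable[Dict[str, Any]]]] = {
    "gnomad": fetch_gnomad,
    "clinvar": fetch_clinvar,
}


async def gather_evidence(variant: str) -> Dict[str, Dict[str, Any]]:
    """Query every source concurrently; record failures instead of raising."""
    names = list(SOURCES)
    results = await asyncio.gather(
        *(SOURCES[name](variant) for name in names),
        return_exceptions=True,  # exceptions come back as values
    )
    evidence: Dict[str, Dict[str, Any]] = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            evidence[name] = {"status": "error", "detail": str(result)}
        else:
            evidence[name] = {"status": "ok", "data": result}
    return evidence


if __name__ == "__main__":
    print(asyncio.run(gather_evidence("NM_000548.5(TSC2):c.4639A>T")))
```

A source marked `error` would leave its criteria in a pending state for the curator instead of silently counting as "not triggered".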

---

## 4. RAG Pipeline Design (Hallucination Reduction)

This is the most critical architectural decision. The RAG system is what separates a reliable clinical tool from a hallucination-prone chatbot.

### 4.1 Why RAG works here

Instead of asking Claude to "recall" information about a variant from training data (which is stale and unverifiable), RAG:
1. Retrieves the actual PubMed abstracts/PMC full-texts relevant to the variant
2. Chunks and embeds them into a vector store
3. At query time, retrieves only the most semantically relevant chunks
4. Passes those chunks as explicit context to Claude
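5. Claude is instructed to reason only over what is in the context window. This reduces hallucination but does not eliminate it, so every quote and PMID in the output must still be checked against the retrieved chunks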

### 4.2 Index Construction

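```python
# Pseudocode for index build pipeline.
# fetch_pubmed, fetch_fulltext, fetch_abstract, sliding_window_chunk and
# detect_criteria_signals are project helpers built in Phase 3; `model` is a
# loaded sentence-transformers model and `chroma_collection` a ChromaDB collection.

# Step 1: Query PubMed for variant + gene.
# The parentheses keep the gene filter applied to both variant spellings.
pubmed_results = fetch_pubmed(
    query=f'"{gene_symbol}" AND ("{variant_hgvs}" OR "{protein_change}")',
    max_results=200
)

# Step 2: Fetch full text where available (PMC), abstract otherwise
papers = [fetch_fulltext(pmid) or fetch_abstract(pmid)
          for pmid in pubmed_results]

# Step 3: Chunk with overlap (preserve context around variant mentions).
# Each chunk is assumed to expose .text and its source paper as .paper.
chunks = sliding_window_chunk(
    papers,
    chunk_size=512,      # tokens; must not exceed the embedding model's max input length
    overlap=128,         # tokens
    anchor_keywords=[variant_hgvs, protein_change, gene_symbol]
)

# Step 4: Embed (BioLinkBERT for biomedical domain accuracy)
embeddings = model.encode([c.text for c in chunks])

# Step 5: Store with metadata. Metadata is built per chunk; zipping papers
# with chunks would misalign as soon as one paper yields several chunks.
chroma_collection.add(
    ids=[f"{c.paper.pmid}-{i}" for i, c in enumerate(chunks)],  # ids are required
    documents=[c.text for c in chunks],
    embeddings=embeddings.tolist(),
    metadatas=[{
        "pmid": c.paper.pmid,
        "year": c.paper.year,
        "variant": variant_hgvs,
        "gene": gene_symbol,
        # Joined into one string because metadata values must be scalars
        "criteria_hint": ",".join(detect_criteria_signals(c.text))  # PM3, PP1, PS3 etc.
    } for c in chunks]
)
```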

### 4.3 Retrieval Strategy (Criterion-Aware)

Different ACMG criteria need different retrieval strategies:

| Criterion | Retrieval focus | Query augmentation |
|---|---|---|
| **PM3** | in trans compound het | `"in trans" OR "compound heterozygous" OR "biallelic"` |
| **PP1** | co-segregation | `"segregation" OR "affected family members" OR "co-segregates"` |
| **PS3/BS3** | functional studies | `"functional" OR "in vitro" OR "in vivo" OR "assay"` |
| **PS4** | case-control prevalence | `"cases" OR "prevalence" OR "odds ratio"` |
| **PP4** | phenotype specificity | `"phenotype" OR "clinical features" OR "presentation"` |
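One way to turn the table above into code is a lookup of criterion terms that gets appended to the gene and variant clauses. The sketch below assumes a boolean keyword query of the kind PubMed accepts; `CRITERION_TERMS` and `build_query` are hypothetical names, and a vector-store query would likely use the same terms as free text instead.

```python
from typing import Dict, List

# Terms copied from the retrieval table; the dict layout is an assumption.
CRITERION_TERMS: Dict[str, List[str]] = {
    "PM3": ["in trans", "compound heterozygous", "biallelic"],
    "PP1": ["segregation", "affected family members", "co-segregates"],
    "PS3/BS3": ["functional", "in vitro", "in vivo", "assay"],
    "PS4": ["cases", "prevalence", "odds ratio"],
    "PP4": ["phenotype", "clinical features", "presentation"],
}


def _or_clause(terms: List[str]) -> str:
    """Quote each term and join with OR inside one parenthesised group."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def build_query(gene: str, variant_terms: List[str], criterion: str) -> str:
    """Combine gene, variant spellings, and criterion terms with AND."""
    if criterion not in CRITERION_TERMS:
        raise KeyError(f"no retrieval terms defined for {criterion}")
    return " AND ".join(
        [f'"{gene}"', _or_clause(variant_terms), _or_clause(CRITERION_TERMS[criterion])]
    )


if __name__ == "__main__":
    print(build_query("TSC2", ["c.4639A>T", "p.Lys1547Ter"], "PM3"))
```

Grouping each OR list in parentheses matters: without them the boolean operators are evaluated left to right and the gene filter no longer applies to every term.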

### 4.4 Context Assembly for Claude

```python
# The context passed to Claude is structured, not raw text
context = {
    "variant": "NM_000548.5(TSC2):c.4639A>T (p.Lys1547Ter)",
    "gene": "TSC2",
    "disease": "Tuberous sclerosis complex",
    "acmg_preliminary": {
        "PVS1": {"triggered": True, "source": "autoPVS1", "note": "NMD predicted"},
        "PM2": {"triggered": True, "source": "gnomAD v4.1", "af": 0.000002},
        # ... other auto-scored criteria
    },
    "retrieved_literature": [
        {
            "pmid": "12345678",
            "chunk": "...five affected family members carried the p.Lys1547Ter variant...",
            "criteria_relevance": "PP1"
        },
        # top-k chunks
    ]
}
```
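The prompt in 4.5 calls a `format_chunks` helper that this plan does not define. A minimal sketch of what it might look like follows, assuming each retrieved chunk is a dict with the `pmid`, `chunk`, and `criteria_relevance` keys used in the context example above. The numbering and label layout are illustrative choices, not a fixed format.

```python
from typing import Dict, List


def format_chunks(chunks: List[Dict[str, str]]) -> str:
    """Render retrieved chunks as numbered, PMID-labelled text blocks."""
    blocks = []
    for i, c in enumerate(chunks, start=1):
        hint = c.get("criteria_relevance", "none")
        blocks.append(f"[{i}] PMID {c['pmid']} (hint: {hint})\n{c['chunk'].strip()}")
    # A blank line between blocks keeps chunk boundaries unambiguous.
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"pmid": "12345678", "chunk": "...five affected family members...", "criteria_relevance": "PP1"},
        {"pmid": "23456789", "chunk": "...observed in trans with...", "criteria_relevance": "PM3"},
    ]
    print(format_chunks(demo))
```

Labelling every block with its PMID is what lets the model cite a source per quote and lets a later check confirm the citation.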

### 4.5 Claude Prompt Design (Hallucination-Suppressed)

```python
SYSTEM_PROMPT = """
You are a clinical genetics variant curator assistant. Your role is to 
extract structured evidence from the provided literature context ONLY.

CRITICAL RULES:
1. Do NOT use any knowledge from your training data about this variant
2. Only cite evidence that appears verbatim in the provided context chunks
3. If the context does not contain sufficient evidence for a criterion, say "insufficient evidence in provided literature"
4. For each criterion you assess, cite the specific PMID and quote the relevant sentence
5. Output structured JSON only — no free text
6. Flag any ambiguous phasing, uncertain phenotype matches, or potential ascertainment bias
"""

USER_PROMPT = f"""
Variant: {variant.hgvs}
Gene/Disease: {variant.gene} / {disease}

PRE-SCORED CRITERIA (from databases — do not re-evaluate these):
{json.dumps(acmg_preliminary, indent=2)}

LITERATURE CONTEXT (evaluate PM3, PP1, PS3, PS4, PP4 from these only):
{format_chunks(retrieved_chunks)}

For each literature-dependent criterion, output:
{{
  "criterion": "PM3",
  "triggered": true/false,
  "strength": "supporting/moderate/strong",
  "evidence": "exact quote from context",
  "pmid": "12345678",
  "confidence": "high/medium/low",
  "caveat": "any ascertainment concerns"
}}
"""
```
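Rules 2 and 4 of the system prompt are only useful if they are enforced after the fact. The sketch below shows one possible post-check: it confirms that the cited PMID was in the provided context and that the quoted evidence appears in a chunk from that PMID. `verify_citation` is a hypothetical helper, and the whitespace and case normalization is an assumption about how strict "verbatim" should be.

```python
import re
from typing import Dict, List


def _normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line wraps do not break matching."""
    return re.sub(r"\s+", " ", text).strip().lower()


def verify_citation(finding: Dict[str, str], chunks: List[Dict[str, str]]) -> List[str]:
    """Return a list of problems; an empty list means the citation checks out."""
    same_pmid = [c for c in chunks if c["pmid"] == finding.get("pmid")]
    if not same_pmid:
        return ["pmid not in provided context"]
    quote = _normalize(finding.get("evidence", ""))
    if not quote:
        return ["empty evidence quote"]
    if not any(quote in _normalize(c["chunk"]) for c in same_pmid):
        return ["quote not found in cited chunk"]
    return []


if __name__ == "__main__":
    context = [{"pmid": "12345678", "chunk": "Five affected family\nmembers carried the variant."}]
    finding = {"pmid": "12345678", "evidence": "five affected family members"}
    print(verify_citation(finding, context))
```

Any finding that returns a non-empty problem list could be downgraded to "not triggered" and flagged for the curator rather than shown as evidence.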

---

## 5. ACMG Criteria Coverage Map

### Automated (database-driven — no LLM needed)

| Criterion | Automation approach | Tool |
|---|---|---|
| **PVS1** | Loss-of-function prediction + transcript check | autoPVS1 |
| **BA1** | gnomAD AF > 5% | gnomAD API |
| **BS1** | gnomAD AF > expected for disorder | gnomAD + disease incidence table |
| **BS2** | Healthy homozygote/heterozygote in gnomAD | gnomAD |
| **PM2** | Absent from gnomAD / very low AF | gnomAD API |
| **PM4** | Protein length change: in-frame indel in a non-repeat region, or stop-loss | Custom rule |
| **PM5** | Same aa position as known pathogenic missense | ClinVar lookup |
| **PS1** | Same aa change as established pathogenic | ClinVar lookup |
| **PP3** / **BP4** | REVEL, SpliceAI, AlphaMissense, CADD | Pre-scored tables |
| **BP1** | Missense in truncation-only gene | ClinGen curated gene list |
| **BP3** | In-frame indel in repeat region | RepeatMasker annotation |
| **BP7** | Synonymous + no splice prediction + non-conserved | SpliceAI + PhyloP |
| **PP2** | Missense in low-benign-missense gene | ClinGen gene-level stats |
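The allele-frequency rows of this table reduce to a few threshold comparisons. The sketch below uses the BA1 cut-off from the table (AF > 5%) and the PM2 cut-off from the Phase 2 gnomAD prompt (AF < 0.0001); the BS1 threshold is passed in because it depends on the disorder. `frequency_criteria` is a hypothetical name, and treating "absent from gnomAD" as PM2 assumes the site has adequate coverage, which the caller would need to confirm.

```python
from typing import List, Optional

BA1_THRESHOLD = 0.05    # AF > 5%
PM2_THRESHOLD = 0.0001  # AF < 0.0001


def frequency_criteria(af: Optional[float], bs1_threshold: float) -> List[str]:
    """Map a gnomAD allele frequency to the criterion it triggers, if any.

    af is None when the variant is absent from gnomAD.
    bs1_threshold is the disorder-specific maximum credible frequency.
    """
    if not PM2_THRESHOLD <= bs1_threshold <= BA1_THRESHOLD:
        raise ValueError("bs1_threshold should lie between the PM2 and BA1 cut-offs")
    if af is None:
        return ["PM2"]  # absent; assumes coverage was checked upstream
    if af > BA1_THRESHOLD:
        return ["BA1"]
    if af > bs1_threshold:
        return ["BS1"]
    if af < PM2_THRESHOLD:
        return ["PM2"]
    return []  # frequency is uninformative for these criteria


if __name__ == "__main__":
    print(frequency_criteria(0.000002, bs1_threshold=0.001))
```

BS2 is left out on purpose: it needs genotype counts and the disorder's penetrance, not just a single frequency.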

### LLM-assisted (RAG + Claude)

| Criterion | Claude task |
|---|---|
| **PM3** | Extract in trans observations from literature (AutoPM3 approach) |
| **PP1** / **BS4** | Count segregating/non-segregating family members |
| **PS3** / **BS3** | Identify and assess functional assay data |
| **PS4** | Extract case counts and odds ratios |
| **PP4** | Assess phenotype specificity match |
| **PS2** / **PM6** | Identify confirmed/assumed de novo reports |
| **PP5** / **BP6** | Check recent authoritative database submissions |

### Requires curator input (cannot automate)

| Criterion | Why manual |
|---|---|
| **PM1** | UniProt domain annotations can be pre-fetched, but deciding whether a domain is a "critical" hotspot requires expert judgment |
| **BP5** | Requires knowledge of the specific patient's alternative diagnosis |
| **PM3** (phasing) | Parental testing results needed from clinician |

---

## 6. Tech Stack

```
Backend:    Python 3.12 · FastAPI · SQLAlchemy · Celery (async jobs)
Frontend:   React 18 · TypeScript · Tailwind CSS
Databases:  PostgreSQL (variants, audit trail) · SQLite (REVEL, gnomAD offline)
Vector DB:  ChromaDB (embedded, on-premise)
Embeddings: sentence-transformers (BioLinkBERT or all-MiniLM-L6-v2)
LLM:        Claude API (on-premise option: Ollama + open-source LLM as fallback)
Auth:       OAuth2 / LDAP (for hospital integration)
Containers: Docker + docker-compose (single-command deployment)
Tests:      pytest · hypothesis (property-based testing of ACMG logic)
```

---

## 7. Claude Code Implementation Plan

Use Claude Code (`claude` CLI) to build this in phases. Run from the project root.

### Prerequisites

```bash
# Install Claude Code
npm install -g @anthropic-ai/claude-code

# Verify
claude --version
```

### Phase 0 — Project Setup (Day 1)

```bash
mkdir variantlens && cd variantlens

claude "Create a Python FastAPI project called VariantLens for clinical genomic 
variant interpretation. Set up:
- /backend: FastAPI app with routers for variants, evidence, classification
- /frontend: React + TypeScript + Tailwind project
- /data: SQLite databases for REVEL and gnomAD offline lookups
- docker-compose.yml with services: api, frontend, postgres, chroma
- pyproject.toml with dependencies: fastapi, sqlalchemy, chromadb, 
  sentence-transformers, anthropic, biopython, requests, httpx, celery
- .env.example with ANTHROPIC_API_KEY, NCBI_API_KEY, OMIM_API_KEY
Include a README with setup instructions."
```

### Phase 1 — Variant Normalization (Day 2–3)

```bash
claude "In /backend/app/services/normalization.py, implement a VariantNormalizer 
class that:
1. Accepts variants in HGVS, VCF, or protein notation
2. Uses the Mutalyzer REST API (https://mutalyzer.nl/api/v2/) for normalization
3. Falls back to PyHGVS for offline normalization
4. Returns: canonical HGVS (genomic + coding + protein), transcript, gene symbol
5. Handles batch normalization with rate limiting
6. Includes comprehensive unit tests with 20+ test variants including edge cases 
   (stop-loss, indels, splice variants, mitochondrial)
Use pydantic models for all inputs/outputs."
```
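Before calling Mutalyzer, the normalizer has to decide which of the three accepted notations it was given. A rough sketch of that dispatch step is below. The regular expressions are deliberately simplified assumptions, not the full HGVS grammar, and `detect_notation` is a hypothetical name; anything it cannot place should be rejected rather than guessed.

```python
import re
from typing import Optional

# Simplified patterns for illustration only.
HGVS_NUCLEOTIDE = re.compile(r"^[A-Z]{2}_\d+\.\d+(?:\([^)]+\))?:[cgmn]\.\S+$")
HGVS_PROTEIN = re.compile(r"^(?:[A-Z]{2}_\d+\.\d+(?:\([^)]+\))?:)?p\.\S+$")
VCF_LIKE = re.compile(
    r"^(?:chr)?(?:\d{1,2}|X|Y|MT?)[-:]\d+[-:][ACGT]+[-:>][ACGT]+$", re.IGNORECASE
)


def detect_notation(raw: str) -> Optional[str]:
    """Classify input as 'hgvs_nucleotide', 'hgvs_protein', 'vcf', or None."""
    text = raw.strip()
    if HGVS_NUCLEOTIDE.match(text):
        return "hgvs_nucleotide"
    if HGVS_PROTEIN.match(text):
        return "hgvs_protein"
    if VCF_LIKE.match(text):
        return "vcf"
    return None  # unknown format: surface a validation error to the user


if __name__ == "__main__":
    for example in ["NM_000548.5(TSC2):c.4639A>T", "p.Lys1547Ter", "chr16:2088000:A:T", "rs12345"]:
        print(example, "->", detect_notation(example))
```

The genomic coordinate in the demo is a made-up placeholder, not the real position of the TSC2 variant.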

### Phase 2 — Database Integrations (Day 4–7)

```bash
# gnomAD integration
claude "In /backend/app/services/gnomad.py, implement a GnomADClient that:
1. Queries gnomAD v4.1 GraphQL API for variant allele frequencies
2. Returns AF by genetic ancestry group using gnomAD labels (AFR, AMR, ASJ, EAS, FIN, MID, NFE, SAS)
3. Implements local SQLite caching to avoid redundant API calls
4. Computes BA1 (>5% AF), BS1 (>expected), BS2 (healthy homozygotes), PM2 (<0.0001)
5. Handles missing data and low coverage warnings
Include the gnomAD GraphQL query template as a constant."

# ClinVar integration  
claude "In /backend/app/services/clinvar.py, implement a ClinVarClient that:
1. Queries ClinVar via NCBI Entrez E-utilities for a given variant
2. Parses existing classifications and review status (star rating)
3. Extracts PS1 evidence (same aa change, different nucleotide)
4. Extracts PM5 evidence (same position, different pathogenic missense)  
5. Extracts PP5/BP6 evidence (recent reputable submissions)
6. Downloads weekly ClinVar VCF for local lookup (faster batch queries)
Use BioPython's Entrez module."

# In silico predictors
claude "In /backend/app/services/insilico.py, implement InSilicoPredictor that:
1. Loads REVEL scores from a local SQLite database (build script included)
2. Loads AlphaMissense scores from the downloaded TSV (2.5GB flat file)
3. Calls the SpliceAI lookup API (https://spliceailookup-api.broadinstitute.org)
4. Calls the CADD REST API (https://cadd.gs.washington.edu)
5. Aggregates concordant/discordant predictions for PP3/BP4
6. Follows the ACMG rule: concordant predictions = 1 piece of evidence (not additive)
Returns a structured InSilicoResult with per-tool scores and overall PP3/BP4 call."

# autoPVS1 integration
claude "Integrate the autoPVS1 Python package into /backend/app/services/pvs1.py.
Create a PVS1Assessor wrapper that:
1. Takes a normalized HGVS variant
2. Runs autoPVS1 to classify null variant strength (PVS1, PVS1_Strong, PVS1_Moderate, PVS1_Supporting, or not applicable)
3. Returns structured output with reasoning for the rule applied
4. Handles the 5 caveats from the ACMG guidelines (LOF mechanism, 3' end, splice variants, 
   multiple transcripts, alternatively spliced exons)
Include comprehensive tests for CFTR, MYH7, and BRCA1 known variants."
```
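The "concordant predictions count as one piece of evidence" rule from the in silico prompt could look roughly like this. All cut-offs in `THRESHOLDS` are illustrative assumptions and would need to be replaced with the lab's validated values; `call_pp3_bp4` and the `min_tools` parameter are hypothetical. Tools whose score falls in the indeterminate band are ignored, and any disagreement between informative tools yields no call.

```python
from typing import Dict, Optional, Tuple

# (benign_below, pathogenic_above) per tool. Illustrative values only.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "revel": (0.3, 0.7),
    "alphamissense": (0.34, 0.564),
    "cadd_phred": (15.0, 25.0),
}


def _vote(tool: str, score: float) -> str:
    benign_below, pathogenic_above = THRESHOLDS[tool]
    if score > pathogenic_above:
        return "pathogenic"
    if score < benign_below:
        return "benign"
    return "uncertain"


def call_pp3_bp4(scores: Dict[str, Optional[float]], min_tools: int = 2) -> Optional[str]:
    """Return 'PP3', 'BP4', or None; never more than one piece of evidence."""
    votes = [
        _vote(tool, score)
        for tool, score in scores.items()
        if score is not None and tool in THRESHOLDS
    ]
    informative = [v for v in votes if v != "uncertain"]
    if len(informative) < min_tools or len(set(informative)) != 1:
        return None  # too few tools, or discordant predictions
    return "PP3" if informative[0] == "pathogenic" else "BP4"


if __name__ == "__main__":
    print(call_pp3_bp4({"revel": 0.91, "alphamissense": 0.88, "cadd_phred": 20.0}))
```

Returning a single label regardless of how many tools agree is what keeps the predictors from being counted additively.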

### Phase 3 — RAG Pipeline (Day 8–11)

```bash
claude "Build the RAG literature pipeline in /backend/app/services/rag/:

1. fetcher.py:
   - Query PubMed E-utilities with variant-aware search queries
   - Fetch full text from PMC where available, abstract otherwise
   - Build criterion-specific queries for PM3, PP1, PS3, PS4
   - Cache results to avoid re-fetching the same papers

2. chunker.py:
   - Sliding window chunker (512 tokens, 128 overlap)
   - Anchor chunks near variant mention sentences
   - Detect which ACMG criteria each chunk is relevant to (keyword heuristic)

3. embedder.py:
   - Use sentence-transformers BioLinkBERT for biomedical-domain embeddings
   - Batch embedding with progress tracking
   - Store to ChromaDB with full metadata (pmid, year, variant, gene, criteria_hint)

4. retriever.py:
   - Criterion-aware query construction (different for PM3 vs PP1 vs PS3)
   - Retrieve top-k chunks (k=8 per criterion)
   - Deduplicate across criteria
   - Return structured context for Claude

Each module must have typed interfaces (pydantic) and unit tests."
```
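The chunker described in step 2 is a sliding window whose stride is `chunk_size - overlap`. A small sketch is below, assuming the input is already tokenized; here tokens are plain whitespace-separated words, whereas the real chunker would count tokens with the embedding model's tokenizer. The function name matches the pseudocode in section 4.2, but the dict-based return shape is an assumption.

```python
from typing import Dict, List, Sequence, Union


def sliding_window_chunk(
    tokens: Sequence[str],
    chunk_size: int = 512,
    overlap: int = 128,
    anchor_keywords: Sequence[str] = (),
) -> List[Dict[str, Union[int, str, bool]]]:
    """Split tokens into overlapping windows and flag windows with anchors."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    stride = chunk_size - overlap
    chunks: List[Dict[str, Union[int, str, bool]]] = []
    for start in range(0, len(tokens), stride):
        text = " ".join(tokens[start:start + chunk_size])
        chunks.append({
            "start": start,
            "text": text,
            # Substring match: true if any anchor keyword occurs in the window.
            "has_anchor": any(k in text for k in anchor_keywords),
        })
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


if __name__ == "__main__":
    words = "the p.Lys1547Ter variant was found in five affected relatives".split()
    for c in sliding_window_chunk(words, chunk_size=4, overlap=2, anchor_keywords=["p.Lys1547Ter"]):
        print(c)
```

The early `break` avoids emitting a trailing window that is fully contained in the previous one.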

### Phase 4 — ACMG Rule Engine (Day 12–15)

```bash
claude "Build the ACMG rule engine in /backend/app/services/acmg/:

1. criteria.py: Pydantic models for each of the 28 criteria with:
   - triggered (bool)
   - strength (very_strong/strong/moderate/supporting/standalone)
   - source (database name or PMID)
   - evidence_text (quote or numeric value)
   - confidence (high/medium/low)
   - caveat (optional warning text)

2. rules.py: Implement all auto-scorable criteria:
   - PVS1 (from autoPVS1 result)
   - PS1, PM5 (from ClinVar)
   - BA1, BS1, BS2, PM2 (from gnomAD)
   - PP3/BP4 (from InSilico concordant predictions)
   - PM4/BP3 (in-frame indel outside vs. inside a repeat region)
   - BP1 (gene-level truncation-only flag from ClinGen)
   - BP7 (synonymous + no splice impact + non-conserved)
   - PP2 (low benign missense gene)

3. combiner.py: Implement Table 5 from Richards 2015 exactly:
   - All combination rules for Pathogenic/Likely Pathogenic/Benign/Likely Benign
   - Returns provisional classification + list of triggered criteria
   - Flags conflicting evidence (pathogenic + benign criteria both present)
   - Exports to structured JSON for audit trail

4. validator.py: Unit tests using 50 known ClinVar variants 
   (10 P, 10 LP, 10 VUS, 10 LB, 10 B) — verify combiner matches ClinVar
   classification at ≥85% concordance."
```
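For orientation, the Table 5 combining rules from Richards 2015 can be written as boolean expressions over counts of triggered criteria per strength level. The sketch below is one reading of that table; `Counts` and `classify` are hypothetical names, and criteria applied at a modified strength (for example PVS1 downgraded to strong) are assumed to be counted at their applied strength before this function is called.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Number of triggered criteria at each evidence strength."""
    pvs: int = 0  # very strong pathogenic
    ps: int = 0   # strong pathogenic
    pm: int = 0   # moderate pathogenic
    pp: int = 0   # supporting pathogenic
    ba: int = 0   # stand-alone benign
    bs: int = 0   # strong benign
    bp: int = 0   # supporting benign


def classify(c: Counts) -> str:
    pathogenic = (
        (c.pvs >= 1 and (c.ps >= 1 or c.pm >= 2 or (c.pm >= 1 and c.pp >= 1) or c.pp >= 2))
        or c.ps >= 2
        or (c.ps >= 1 and (c.pm >= 3 or (c.pm >= 2 and c.pp >= 2) or (c.pm >= 1 and c.pp >= 4)))
    )
    likely_pathogenic = (
        (c.pvs >= 1 and c.pm >= 1)
        or (c.ps >= 1 and c.pm >= 1)
        or (c.ps >= 1 and c.pp >= 2)
        or c.pm >= 3
        or (c.pm >= 2 and c.pp >= 2)
        or (c.pm >= 1 and c.pp >= 4)
    )
    benign = c.ba >= 1 or c.bs >= 2
    likely_benign = (c.bs >= 1 and c.bp >= 1) or c.bp >= 2

    # Contradictory pathogenic and benign evidence defaults to VUS.
    if (pathogenic or likely_pathogenic) and (benign or likely_benign):
        return "Uncertain significance"
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    if benign:
        return "Benign"
    if likely_benign:
        return "Likely benign"
    return "Uncertain significance"


if __name__ == "__main__":
    # PVS1 + PM2, as in the TSC2 example from section 4.4
    print(classify(Counts(pvs=1, pm=1)))
```

Pathogenic is checked before likely pathogenic because every combination that satisfies the stricter rule also satisfies a weaker one.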

### Phase 5 — Claude Reasoning Layer (Day 16–18)

```bash
claude "Build the Claude reasoning layer in /backend/app/services/llm/:

1. prompts.py: Structured prompt templates for each literature-dependent criterion:
   - PM3 prompt (in trans extraction, inspired by AutoPM3)
   - PP1 prompt (segregation counting with anti-hallucination guards)
   - PS3 prompt (functional assay quality assessment)
   - PS4 prompt (case count and OR extraction)
   - PP4 prompt (phenotype specificity matching)
   
   Each prompt must:
   - Explicitly instruct Claude to only use provided context (no training recall)
   - Request JSON output with evidence quotes + PMIDs
   - Include uncertainty and caveat detection
   - Include examples of what hallucination looks like and how to avoid it

2. reasoner.py: LLM reasoning orchestrator that:
   - Takes pre-scored criteria + RAG context
   - Calls Claude for each literature-dependent criterion
   - Parses and validates JSON responses
   - Falls back gracefully if Claude response is malformed
   - Logs all LLM calls with input/output for audit

3. synthesizer.py: Final synthesis pass that:
   - Merges database-scored + LLM-scored criteria
   - Produces human-readable evidence summary (for curator)
   - Highlights conflicting or ambiguous evidence
   - Generates uncertainty flags for VUS cases

Use the Anthropic Python SDK. Model: claude-sonnet-4-6. max_tokens: 2000."
```
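The "falls back gracefully if Claude response is malformed" requirement for `reasoner.py` might be handled along these lines. This is a sketch under assumptions: `parse_finding` is a hypothetical helper, the required keys are taken from the output schema in section 4.5, and the fallback policy (treat the criterion as not triggered and route it to manual review) is one reasonable choice among several.

```python
import json
from typing import Any, Dict

# Keys from the per-criterion JSON schema in section 4.5.
REQUIRED_KEYS = {"criterion", "triggered", "strength", "evidence", "pmid", "confidence"}


def parse_finding(raw: str, criterion: str) -> Dict[str, Any]:
    """Parse one LLM response; on any problem return a safe fallback record."""
    fallback: Dict[str, Any] = {
        "criterion": criterion,
        "triggered": False,           # never trigger on unparseable output
        "needs_manual_review": True,
        "raw_response": raw,          # kept verbatim for the audit trail
    }
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {**fallback, "error": "response was not valid JSON"}
    if not isinstance(data, dict):
        return {**fallback, "error": "expected a JSON object"}
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        return {**fallback, "error": f"missing keys: {sorted(missing)}"}
    if not isinstance(data["triggered"], bool):
        return {**fallback, "error": "'triggered' must be true or false"}
    return {**data, "needs_manual_review": False}


if __name__ == "__main__":
    print(parse_finding("Sure! Here is the JSON you asked for...", "PM3"))
```

Keeping the raw response in the fallback record means a malformed answer is still logged alongside the input that produced it.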

### Phase 6 — Frontend (Day 19–22)

```bash
claude "Build the React frontend in /frontend/src/:

1. VariantInput component:
   - Text field for HGVS entry with live validation against Mutalyzer
   - VCF file upload (single variant or batch)
   - HPO term autocomplete (using HPO API)
   - Gene/disease context selector

2. EvidenceDashboard component (the main curator view):
   - Criteria table showing all 28 criteria with status (triggered/not triggered/pending)
   - Color coding: green (benign criteria), red (pathogenic), gray (not triggered)
   - Each row expandable to show evidence source, quote, and confidence
   - Override button per criterion with required free-text justification
   - Literature panel showing RAG-retrieved papers with relevant quotes highlighted

3. ClassificationPanel component:
   - Shows provisional 5-tier classification
   - Shows confidence and any conflicting evidence flags
   - Curator sign-off button with authentication
   - Classification history / previous submissions

4. ReportGenerator component:
   - Preview of clinical report in standard format
   - Export to PDF, ClinVar submission XML, HL7 FHIR R4
   
Use React Query for API calls, Zustand for state, Tailwind for styling."
```

### Phase 7 — Testing & Validation (Day 23–25)

```bash
claude "Create a comprehensive validation suite in /tests/:

1. test_known_variants.py:
   - Use 100 variants from ClinVar with 3-star (expert panel) or 4-star (practice guideline) review status
   - Assert classification concordance ≥ 85%
   - Assert all triggered criteria are traceable to a source
   - Assert no criterion is triggered without evidence

2. test_hallucination_guard.py:
   - Feed the LLM prompts with deliberately wrong literature (controls)
   - Assert Claude does not trigger PM3/PP1 when context contains no relevant evidence
   - Assert Claude cites only PMIDs present in the provided context

3. test_acmg_combiner.py:
   - Property-based tests using hypothesis
   - Test all combination rules from Table 5 of Richards 2015
   - Test edge cases: conflicting evidence, single criterion only

4. performance_benchmark.py:
   - Time per variant (target: < 30 seconds including RAG)
   - Batch throughput (target: 100 variants/hour)
   - Memory usage per worker"
```

---

## 8. Directory Structure

```
variantlens/
├── backend/
│   ├── app/
│   │   ├── api/                  # FastAPI routers
│   │   │   ├── variants.py       # POST /variants/classify
│   │   │   ├── evidence.py       # GET /variants/{id}/evidence
│   │   │   └── reports.py        # GET /variants/{id}/report
│   │   ├── services/
│   │   │   ├── normalization.py  # Mutalyzer wrapper
│   │   │   ├── gnomad.py         # gnomAD client
│   │   │   ├── clinvar.py        # ClinVar client
│   │   │   ├── insilico.py       # REVEL, SpliceAI, CADD, AlphaMissense
│   │   │   ├── pvs1.py           # autoPVS1 wrapper
│   │   │   ├── rag/
│   │   │   │   ├── fetcher.py    # PubMed fetch
│   │   │   │   ├── chunker.py    # Text chunking
│   │   │   │   ├── embedder.py   # sentence-transformers
│   │   │   │   └── retriever.py  # ChromaDB query
│   │   │   ├── acmg/
│   │   │   │   ├── criteria.py   # Pydantic models
│   │   │   │   ├── rules.py      # Auto-scorable criteria
│   │   │   │   └── combiner.py   # Table 5 logic
│   │   │   └── llm/
│   │   │       ├── prompts.py    # Criterion-specific prompts
│   │   │       ├── reasoner.py   # Claude API calls
│   │   │       └── synthesizer.py
│   │   └── models/               # SQLAlchemy DB models
│   └── tests/
├── frontend/
│   └── src/
│       ├── components/
│       │   ├── VariantInput.tsx
│       │   ├── EvidenceDashboard.tsx
│       │   ├── ClassificationPanel.tsx
│       │   └── ReportGenerator.tsx
│       └── hooks/
├── data/
│   ├── revel_scores.db           # SQLite: pre-scored missense positions
│   ├── alphamissense.tsv.gz      # Downloaded AlphaMissense flat file
│   └── gnomad_cache.db           # Local AF cache
├── docker-compose.yml
├── .env.example
└── README.md
```

---

## 9. Privacy & Deployment

### On-premise deployment (recommended for clinical data)

```yaml
# docker-compose.yml excerpt
services:
  api:
    build: ./backend
    environment:
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - USE_LOCAL_LLM=false  # set true + configure Ollama for air-gap
    volumes:
      - ./data:/app/data     # all patient data stays local
  
  chroma:
    image: chromadb/chroma    # runs locally, no cloud
    volumes:
      - chroma_data:/chroma

volumes:
  chroma_data:                # named volumes must be declared at the top level
```

### Air-gapped option

If patient data cannot touch external APIs even for Claude:
- Replace Claude with a locally-hosted open-source LLM via Ollama
- Recommended model: `mistral-nemo` or `qwen2.5` (strong instruction following)
- Performance will be lower than Claude but maintains privacy
- Toggle via `USE_LOCAL_LLM=true` in `.env`

### Keys you need

| Key | Source | Free? |
|---|---|---|
| `ANTHROPIC_API_KEY` | console.anthropic.com | Pay per token |
| `NCBI_API_KEY` | ncbi.nlm.nih.gov/account | Free |
| `OMIM_API_KEY` | omim.org/api | Free for academic |
| `GNOMAD` | No key needed (GraphQL API) | Free |

---

## 10. Development Timeline

| Phase | Duration | Milestone |
|---|---|---|
| 0 — Setup | Day 1 | Project scaffolded, Docker running |
| 1 — Normalization | Day 2–3 | Mutalyzer integration + tests passing |
| 2 — Databases | Day 4–7 | gnomAD, ClinVar, REVEL, SpliceAI, autoPVS1 integrated |
| 3 — RAG | Day 8–11 | Literature retrieval + ChromaDB indexing working |
| 4 — ACMG engine | Day 12–15 | All auto-scorable criteria + combiner; ≥85% concordance |
| 5 — LLM layer | Day 16–18 | Claude synthesizing PM3/PP1/PS3 from RAG context |
| 6 — Frontend | Day 19–22 | Full curator dashboard; report export |
| 7 — Validation | Day 23–25 | 100-variant benchmark suite passing |

**Total: ~5 weeks of focused development using Claude Code throughout**

---

## 11. Key Design Decisions Summary

| Decision | Choice | Rationale |
|---|---|---|
| LLM for literature only | Claude handles PM3, PP1, PS3, PS4, PP4; DB criteria are scored by rules | Reduces hallucination surface area; DB-derived criteria reach Claude only as read-only context and are never re-scored by it |
| RAG over training-data recall | ChromaDB + BioLinkBERT embeddings | Grounds Claude in actual retrieved text; reduces reliance on stale training data |
| Prompt includes only context | System prompt explicitly forbids using training recall | Mirrors AI CURA's anti-hallucination strategy that achieved 96% concordance |
| autoPVS1 for PVS1 | Don't reinvent PVS1 logic | autoPVS1 has been validated extensively; reuse it |
| InterVar as ACMG scaffold | Build on existing 18-criteria implementation | Extend rather than rewrite; saves ~2 weeks |
| Human-in-the-loop always | Curator must review + sign off every classification | Matches ACMG guidance; required for clinical lab accreditation |
| On-premise ChromaDB | No patient data leaves the network | HIPAA/PHIPA compliance |
| JSON-only LLM output | All Claude responses are structured JSON | Enables reliable parsing + audit trail |