---
title: GenoTriage
emoji: 🧬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 8000
tags:
  - openenv
---

# GenoTriage 🧬

**An OpenEnv environment where AI agents classify real ClinVar SNP variants using ACMG criteria across three clinical difficulty tiers.**

[![OpenEnv](https://img.shields.io/badge/OpenEnv-compatible-blue)](https://meta-pytorch.org/OpenEnv/)
[![PyPI](https://img.shields.io/pypi/v/openenv-core?color=blue)](https://pypi.org/project/openenv-core/)

---

## Overview

Clinical geneticists classify genetic variants daily to determine whether a mutation causes disease. This judgment (Pathogenic, Likely Pathogenic, Uncertain Significance, Likely Benign, or Benign) directly impacts patient care, yet remains time-consuming, expert-dependent, and difficult to scale.

**GenoTriage** turns this into a structured RL environment. Agents receive real SNP variants from [ClinVar](https://www.ncbi.nlm.nih.gov/clinvar/) enriched with population frequency data from [gnomAD](https://gnomad.broadinstitute.org/), and must classify them using the standard [ACMG/AMP five-tier system](https://www.acmg.net/).

Each episode is single-step: the agent reads the evidence and submits one classification. This makes the environment fast, deterministic, and well suited to both RL training and LLM evaluation.

---

## Environment Description

| Property | Value |
|---|---|
| Variant type | SNPs (single nucleotide polymorphisms) only |
| Data source | ClinVar (NCBI) + gnomAD v4 population frequencies |
| Genome build | GRCh38 |
| Episode structure | Single-step (reset → observe → classify → reward → done) |
| Tasks | 3 (easy, medium, hard) |
| Variants per task | 8 |
| Interface | OpenEnv-compatible (step / reset / state) |

---

## Action Space

The agent submits a `VepAction` with three fields:

| Field | Type | Description |
|---|---|---|
| `classification` | `str` (one of 5) | ACMG tier: `Pathogenic`, `Likely_pathogenic`, `Uncertain_significance`, `Likely_benign`, or `Benign` |
| `reasoning` | `str` | Explanation citing specific evidence from the observation (min 20 chars encouraged) |
| `criteria_used` | `list[str]` | List of specific criteria that drove the decision (e.g. `"high population frequency"`, `"nonsense variant"`) |

---

## Observation Space

The agent receives a `VepObservation` with the following fields:

| Field | Type | Description |
|---|---|---|
| `gene` | `str` | Gene symbol (e.g. `BRCA1`, `CFTR`, `MSH2`) |
| `chromosome` | `str` | Chromosome (e.g. `17`) |
| `position` | `int` | GRCh38 genomic position |
| `ref` / `alt` | `str` | Reference and alternate alleles |
| `hgvs` | `str` | HGVS genomic notation |
| `consequence` | `str \| None` | Molecular consequence (e.g. `missense_variant`, `nonsense`, `synonymous_variant`) |
| `disease` | `str` | Primary disease associated with this gene |
| `population_frequency` | `float \| None` | gnomAD v4 allele frequency (`None` if absent from gnomAD) |
| `evidence_snippets` | `list[str]` | 3–4 evidence snippets: gene-disease context, consequence interpretation, frequency context, functional evidence |
| `task_description` | `str` | Instructions for the agent |
| `feedback` | `str` | Grader feedback after `step()`; empty on `reset()` |
| `done` | `bool` | `True` after the first step |
| `reward` | `float` | Reward received (0.0 on reset) |

---

## Tasks

### Task 1: `easy` (Benign / Likely Benign)

Variants with clear benign signals: moderate-to-high population frequency, synonymous or non-coding consequence, and no functional evidence linking the specific variant to disease. Agents should score well by correctly reading population frequency and consequence type.

**Expected agent score: 0.75 – 0.95**

### Task 2: `medium` (Pathogenic / Likely Pathogenic)

Variants with clear pathogenic signals: loss-of-function consequences (nonsense, splice-site), absence from gnomAD, and strong gene-disease association with clinical literature support. Agents must distinguish signal from noise and identify loss-of-function as a strong pathogenicity indicator.

**Expected agent score: 0.55 – 0.80**

### Task 3: `hard` (Uncertain Significance)

Variants where evidence is genuinely ambiguous: missense or regulatory variants in disease genes with no functional studies, conflicting computational predictions, or intermediate frequency. Agents must recognise when evidence is insufficient rather than defaulting to a confident classification.

**Expected agent score: 0.35 – 0.60**

---

## Reward Function

Each step returns a reward in `[0.0, 1.0]` composed of three components:

| Component | Max | Criteria |
|---|---|---|
| Classification accuracy | 0.70 | Exact match = 0.70, one tier off = 0.25, two off = 0.05, three+ off = 0.00 |
| Reasoning quality | 0.20 | Keyword matches in reasoning (+0.12) + length ≥50 chars (+0.08) |
| Criteria used | 0.10 | Non-empty list (+0.04) + ≥2 items (+0.06) |

> **Important:** Reasoning and criteria bonuses are fully suppressed when the classification is 3+ tiers away from ground truth (e.g. Benign for a Pathogenic variant). Good writing cannot rescue a catastrophically wrong answer.

---

## Setup

### Prerequisites

- Python 3.10+
- Docker Desktop or Docker Engine
- A Hugging Face API token (free at [huggingface.co](https://huggingface.co))

### Install

```bash
git clone https://huggingface.co/spaces/fierce74/GenoTriage
cd GenoTriage
# Quote the requirement so the shell does not treat ">" as a redirect
pip install "openenv-core>=0.2.2"
```

### Configure environment variables

Copy `.env.example` to `.env` and fill in your values:

```bash
cp .env.example .env
```

```env
HF_TOKEN=hf_your_token_here
API_BASE_URL=https://router.huggingface.co/v1
MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
LOCAL_IMAGE_NAME=vep_env_env:latest
```

### Build the Docker image

```bash
docker build -t vep_env_env:latest .
```

### Run the server locally (without Docker)

```bash
pip install -e .
uvicorn server.app:app --host 0.0.0.0 --port 8000
```

---

## Usage

### Run the baseline inference script

```bash
python inference.py
```

This runs all 3 tasks sequentially (easy → medium → hard), printing structured logs:

```
[START] task=easy env=vep_env model=Qwen/Qwen2.5-72B-Instruct
[STEP] step=1 action=Benign|CFTR reward=1.00 done=true error=null
...
[END] success=true steps=8 score=0.875 rewards=1.00,0.90,...
```

### Use the client in your own code

```python
import asyncio

from vep_env import VepAction, VepEnv


async def main():
    async with VepEnv(base_url="http://localhost:8000") as env:
        # Reset: receive a variant case
        result = await env.reset()
        obs = result.observation
        print(f"Gene: {obs.gene} | Disease: {obs.disease}")
        print(f"Consequence: {obs.consequence}")
        print(f"Population frequency: {obs.population_frequency}")
        for snippet in obs.evidence_snippets:
            print(f"  - {snippet}")

        # Submit classification
        action = VepAction(
            classification="Pathogenic",
            reasoning="Nonsense variant in MSH2, absent from gnomAD, causes Lynch syndrome.",
            criteria_used=["nonsense variant", "absent from gnomAD", "disease gene"],
        )
        result = await env.step(action)
        print(f"Reward: {result.reward}")
        print(f"Feedback: {result.observation.feedback}")


asyncio.run(main())
```

### Control the task tier

```bash
VEP_TASK=medium python inference.py     # run medium tier only
VEP_TASK=hard uvicorn server.app:app    # start server in hard mode
```

---

## Baseline Scores

Evaluated using `Qwen/Qwen2.5-72B-Instruct` via the Hugging Face Inference Router.

| Task | Score | Notes |
|---|---|---|
| easy | 0.875 | Model correctly identifies benign signals in most cases |
| medium | 0.800 | Strong on loss-of-function; occasionally misses subtle pathogenic signals |
| hard | 0.738 | Tends toward confident classifications when VUS is the correct answer |
| **overall** | **0.804** | Average across all 3 tasks |

---

## Project Structure

```
GenoTriage/
├── __init__.py                  # Package exports
├── models.py                    # VepAction, VepObservation (Pydantic)
├── client.py                    # VepEnv client (WebSocket)
├── inference.py                 # Baseline inference script
├── variants.json                # Curated ClinVar variants (ground truth)
├── openenv.yaml                 # OpenEnv spec manifest
├── pyproject.toml               # Package config
├── Dockerfile                   # Container definition
└── server/
    ├── app.py                   # FastAPI application
    ├── vep_env_environment.py   # Environment logic + grader
    └── requirements.txt         # Server dependencies
```

---

## Data

Variants are sourced from ClinVar (April 2026 release, GRCh38) filtered to:

- SNPs only (`CLNVC=single_nucleotide_variant`)
- Trusted review status (`criteria_provided` or better)
- Named disease association
- 8 well-known disease genes: MSH2, MLH1, VHL, CFTR, SCN5A, APC, TSC1, RET

Population allele frequencies are from gnomAD v4 (queried at curation time and stored statically, so no live API calls are made at runtime).

---
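The ClinVar filters listed under Data could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual curation script: the helper names (`parse_info`, `keep_variant`, `TARGET_GENES`) are hypothetical, the input is assumed to be the INFO column of a ClinVar VCF record, and "`criteria_provided` or better" is interpreted here as any status beginning with `criteria_provided`, plus `reviewed_by_expert_panel` and `practice_guideline`.

```python
# Hypothetical sketch of the curation filters; not the repository's real code.
TARGET_GENES = {"MSH2", "MLH1", "VHL", "CFTR", "SCN5A", "APC", "TSC1", "RET"}
# Assumed reading of "criteria_provided or better".
HIGHER_REVIEW = {"reviewed_by_expert_panel", "practice_guideline"}
# CLNDN placeholders treated here as "no named disease" (assumption).
UNNAMED_DISEASE = {"", "not_provided", "not_specified"}


def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO string such as 'A=1;B=2' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def keep_variant(info: str) -> bool:
    """Return True if a ClinVar record passes all four filters."""
    f = parse_info(info)

    # 1. SNPs only
    if f.get("CLNVC") != "single_nucleotide_variant":
        return False

    # 2. Trusted review status. A prefix check is used because a plain
    #    substring test would also match "no_assertion_criteria_provided".
    status = f.get("CLNREVSTAT", "")
    if not (status.startswith("criteria_provided") or status in HIGHER_REVIEW):
        return False

    # 3. At least one named disease
    diseases = set(f.get("CLNDN", "").split("|"))
    if diseases <= UNNAMED_DISEASE:
        return False

    # 4. Gene is one of the eight target genes (GENEINFO is "SYMBOL:ID|...")
    genes = {entry.split(":")[0] for entry in f.get("GENEINFO", "").split("|")}
    return bool(genes & TARGET_GENES)
```

A record would then be kept only if `keep_variant(info_string)` returns `True`; whether conflicting-classification records should also be excluded is a curation decision this sketch does not make.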