---
title: GenoTriage
emoji: 🧬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
app_port: 8000
tags:
  - openenv
---

# GenoTriage 🧬

**An OpenEnv environment where AI agents classify real ClinVar SNP variants using ACMG criteria across three clinical difficulty tiers.**

[![OpenEnv](https://img.shields.io/badge/OpenEnv-compatible-blue)](https://meta-pytorch.org/OpenEnv/)
[![PyPI](https://img.shields.io/pypi/v/openenv-core?color=blue)](https://pypi.org/project/openenv-core/)

---

## Overview

Clinical geneticists classify genetic variants daily to determine whether a mutation causes disease. This judgment (Pathogenic, Likely Pathogenic, Uncertain Significance, Likely Benign, or Benign) directly affects patient care, yet it remains time-consuming, expert-dependent, and difficult to scale.

**GenoTriage** turns this into a structured RL environment. Agents receive real SNP variants from [ClinVar](https://www.ncbi.nlm.nih.gov/clinvar/), enriched with population frequency data from [gnomAD](https://gnomad.broadinstitute.org/), and must classify them using the standard [ACMG/AMP five-tier system](https://www.acmg.net/). Each episode is single-step: the agent reads the evidence and submits one classification. This keeps episodes fast and deterministic, and makes the environment well-suited for both RL training and LLM evaluation.

---

## Environment Description

| Property | Value |
|---|---|
| Variant type | SNPs (single nucleotide polymorphisms) only |
| Data source | ClinVar (NCBI) + gnomAD v4 population frequencies |
| Genome build | GRCh38 |
| Episode structure | Single-step (reset → observe → classify → reward → done) |
| Tasks | 3 (easy, medium, hard) |
| Variants per task | 8 |
| Interface | OpenEnv-compatible (step / reset / state) |

---

## Action Space

The agent submits a `VepAction` with three fields:

| Field | Type | Description |
|---|---|---|
| `classification` | `str` (one of 5) | ACMG tier: `Pathogenic`, `Likely_pathogenic`, `Uncertain_significance`, `Likely_benign`, or `Benign` |
| `reasoning` | `str` | Explanation citing specific evidence from the observation (at least 20 characters encouraged; the reward's length bonus applies at 50 or more) |
| `criteria_used` | `list[str]` | List of specific criteria that drove the decision (e.g. `"high population frequency"`, `"nonsense variant"`) |
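
The field layout above can be sketched with a plain dataclass. This is only an illustration: `ActionSketch` and `ACMG_TIERS` are hypothetical names, and the project's real `VepAction` is a Pydantic model in `models.py` whose exact validation rules are not shown here.

```python
from dataclasses import dataclass, field

# The five ACMG tiers, spelled exactly as in the table above.
ACMG_TIERS = (
    "Pathogenic",
    "Likely_pathogenic",
    "Uncertain_significance",
    "Likely_benign",
    "Benign",
)


@dataclass
class ActionSketch:
    """Hypothetical stand-in for VepAction; not the project's Pydantic model."""

    classification: str
    reasoning: str
    criteria_used: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject anything that is not one of the five tier labels.
        if self.classification not in ACMG_TIERS:
            raise ValueError(f"unknown ACMG tier: {self.classification!r}")


action = ActionSketch(
    classification="Likely_benign",
    reasoning="Synonymous variant with high gnomAD allele frequency.",
    criteria_used=["high population frequency", "synonymous variant"],
)
```

Note that the labels are case-sensitive and use underscores, so a string such as `"likely benign"` would be rejected by this sketch.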

---

## Observation Space

The agent receives a `VepObservation` with the following fields:

| Field | Type | Description |
|---|---|---|
| `gene` | `str` | Gene symbol (e.g. `BRCA1`, `CFTR`, `MSH2`) |
| `chromosome` | `str` | Chromosome (e.g. `17`) |
| `position` | `int` | GRCh38 genomic position |
| `ref` / `alt` | `str` | Reference and alternate alleles |
| `hgvs` | `str` | HGVS genomic notation |
| `consequence` | `str \| None` | Molecular consequence (e.g. `missense_variant`, `nonsense`, `synonymous_variant`) |
| `disease` | `str` | Primary disease associated with this gene |
| `population_frequency` | `float \| None` | gnomAD v4 allele frequency (None if absent from gnomAD) |
| `evidence_snippets` | `list[str]` | 3–4 evidence snippets: gene-disease context, consequence interpretation, frequency context, functional evidence |
| `task_description` | `str` | Instructions for the agent |
| `feedback` | `str` | Grader feedback after `step()`; empty after `reset()` |
| `done` | `bool` | `True` after the first step |
| `reward` | `float` | Reward received (`0.0` after `reset()`) |
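
For LLM agents, the observation fields usually need to be flattened into a prompt. The following is a minimal sketch of one way to do that, assuming the observation has been converted to a plain `dict` keyed by the field names above. `build_prompt` is a hypothetical helper, and every value in `example` (including the position and HGVS string) is a placeholder, not a real ClinVar record.

```python
def build_prompt(obs: dict) -> str:
    """Render observation fields as a plain-text prompt for an LLM agent."""
    freq = obs.get("population_frequency")
    # None means the variant is absent from gnomAD, which is itself evidence.
    freq_text = "absent from gnomAD" if freq is None else f"{freq:.2e}"
    lines = [
        obs["task_description"],
        f"Gene: {obs['gene']} ({obs['disease']})",
        f"Variant: chr{obs['chromosome']}:{obs['position']} {obs['ref']}>{obs['alt']}",
        f"HGVS: {obs['hgvs']}",
        f"Consequence: {obs.get('consequence') or 'not annotated'}",
        f"gnomAD v4 allele frequency: {freq_text}",
        "Evidence:",
    ]
    lines += [f"- {snippet}" for snippet in obs["evidence_snippets"]]
    return "\n".join(lines)


# Placeholder observation for illustration only.
example = {
    "task_description": "Classify this variant using the ACMG five-tier system.",
    "gene": "MSH2",
    "disease": "Lynch syndrome",
    "chromosome": "2",
    "position": 12345,
    "ref": "C",
    "alt": "T",
    "hgvs": "NC_000002.12:g.12345C>T",
    "consequence": "nonsense",
    "population_frequency": None,
    "evidence_snippets": ["Placeholder snippet about gene-disease context."],
}
prompt = build_prompt(example)
```

Handling `None` explicitly matters here because both `consequence` and `population_frequency` are optional.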

---

## Tasks

### Task 1 — `easy` (Benign / Likely Benign)

Variants with clear benign signals: moderate-to-high population frequency, synonymous or non-coding consequence, and no functional evidence linking the specific variant to disease. Agents should score well by correctly reading population frequency and consequence type.

**Expected agent score: 0.75 – 0.95**

### Task 2 — `medium` (Pathogenic / Likely Pathogenic)

Variants with clear pathogenic signals: loss-of-function consequences (nonsense, splice-site), absence from gnomAD, and a strong gene-disease association supported by clinical literature. Agents must distinguish signal from noise and identify loss-of-function as a strong pathogenicity indicator.

**Expected agent score: 0.55 – 0.80**

### Task 3 — `hard` (Uncertain Significance)

Variants where evidence is genuinely ambiguous: missense or regulatory variants in disease genes with no functional studies, conflicting computational predictions, or intermediate frequency. Agents must recognise when evidence is insufficient rather than defaulting to a confident classification.

**Expected agent score: 0.35 – 0.60**
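
The signals described for the three tiers can be turned into a naive rule-based agent, which may be useful as a sanity check before trying an LLM. This sketch is not part of the project and is not how the grader works. `naive_classify` and `LOF_CONSEQUENCES` are hypothetical, the splice consequence names are assumed Sequence Ontology terms, and the rules are deliberately crude. The 5% cutoff mirrors the ACMG stand-alone benign frequency criterion (BA1).

```python
from typing import Optional

# Assumed consequence labels for loss-of-function variants.
LOF_CONSEQUENCES = {"nonsense", "splice_donor_variant", "splice_acceptor_variant"}


def naive_classify(consequence: Optional[str], population_frequency: Optional[float]) -> str:
    """Map two observation fields to an ACMG tier with illustrative rules."""
    observed = population_frequency is not None
    # Common variants are treated as benign regardless of consequence.
    if observed and population_frequency > 0.05:
        return "Benign"
    # Synonymous variants seen in the population lean benign.
    if observed and consequence == "synonymous_variant":
        return "Likely_benign"
    # Loss-of-function variants absent from gnomAD lean pathogenic.
    if not observed and consequence in LOF_CONSEQUENCES:
        return "Pathogenic"
    # Everything else is treated as insufficient evidence.
    return "Uncertain_significance"
```

A rule set like this defaults to `Uncertain_significance` whenever the evidence is mixed, which is the behaviour the `hard` tier is designed to test.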

---

## Reward Function

Each step returns a reward in `[0.0, 1.0]` composed of three components:

| Component | Max | Criteria |
|---|---|---|
| Classification accuracy | 0.70 | Exact match=0.70, one tier off=0.25, two off=0.05, three+ off=0.00 |
| Reasoning quality | 0.20 | Keyword matches in reasoning (+0.12) + length ≥50 chars (+0.08) |
| Criteria used | 0.10 | Non-empty list (+0.04) + ≥2 items (+0.06) |

> **Important:** Reasoning and criteria bonuses are fully suppressed when the classification is 3+ tiers away from ground truth (e.g. Benign for a Pathogenic variant). Good writing cannot rescue a catastrophically wrong answer.
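
The scoring table can be sketched as a small function. This is an approximation under stated assumptions, not the grader in `server/vep_env_environment.py`: it assumes tiers are ordered from Benign to Pathogenic and that "tiers off" is the distance on that scale, it treats the keyword bonus as all-or-nothing on any match, and the `keywords` argument is a hypothetical stand-in for whatever keyword list the real grader uses.

```python
# Assumed ordering of the five tiers, from benign to pathogenic.
TIERS = [
    "Benign",
    "Likely_benign",
    "Uncertain_significance",
    "Likely_pathogenic",
    "Pathogenic",
]
# Accuracy component by tier distance; 3 or more tiers off scores nothing.
ACCURACY_BY_DISTANCE = {0: 0.70, 1: 0.25, 2: 0.05}


def reward_sketch(predicted, truth, reasoning, criteria_used, keywords):
    """Approximate the three-component reward described in the table."""
    distance = abs(TIERS.index(predicted) - TIERS.index(truth))
    if distance >= 3:
        # Bonuses are suppressed for catastrophically wrong answers.
        return 0.0
    score = ACCURACY_BY_DISTANCE[distance]
    # Reasoning quality: keyword match (assumed any-match) plus length bonus.
    if any(word.lower() in reasoning.lower() for word in keywords):
        score += 0.12
    if len(reasoning) >= 50:
        score += 0.08
    # Criteria used: non-empty list, then a further bonus for two or more items.
    if len(criteria_used) >= 1:
        score += 0.04
    if len(criteria_used) >= 2:
        score += 0.06
    return round(score, 2)


full = reward_sketch(
    "Pathogenic",
    "Pathogenic",
    "Nonsense variant in MSH2, absent from gnomAD, causes Lynch syndrome.",
    ["nonsense variant", "absent from gnomAD"],
    keywords=["nonsense", "gnomAD"],
)
```

Under these assumptions, an exact match with keyword-bearing reasoning of at least 50 characters and two criteria reaches the maximum of 1.0, while a one-tier miss with no reasoning and no criteria scores 0.25.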

---

## Setup

### Prerequisites

- Python 3.10+
- Docker Desktop or Docker Engine
- A Hugging Face API token (free at [huggingface.co](https://huggingface.co))

### Install

```bash
git clone https://huggingface.co/spaces/fierce74/GenoTriage

cd GenoTriage
pip install "openenv-core>=0.2.2"
```

### Configure environment variables

Copy `.env.example` to `.env` and fill in your values:

```bash
cp .env.example .env
```

```env
HF_TOKEN=hf_your_token_here
API_BASE_URL=https://router.huggingface.co/v1
MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
LOCAL_IMAGE_NAME=vep_env_env:latest
```

### Build the Docker image

```bash
docker build -t vep_env_env:latest .
```

### Run the server locally (without Docker)

```bash
pip install -e .
uvicorn server.app:app --host 0.0.0.0 --port 8000
```

---

## Usage

### Run the baseline inference script

```bash
python inference.py
```

This runs all 3 tasks sequentially (easy → medium → hard), printing structured logs:

```
[START] task=easy env=vep_env model=Qwen/Qwen2.5-72B-Instruct
[STEP] step=1 action=Benign|CFTR reward=1.00 done=true error=null
...
[END] success=true steps=8 score=0.875 rewards=1.00,0.90,...
```
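
If you want to post-process these logs, the `key=value` layout shown above is easy to parse. The helper below is a hypothetical sketch based only on the sample lines; it assumes values never contain spaces, which holds for the example output but is not guaranteed by the script.

```python
import re

# Matches the bracketed tag at the start of a structured log line.
LOG_LINE = re.compile(r"^\[(START|STEP|END)\]\s+(.*)$")


def parse_log_line(line):
    """Return (tag, fields) for a structured log line, or None otherwise."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    tag, rest = match.groups()
    # Split on whitespace, then on the first "=" of each pair.
    pairs = (item.split("=", 1) for item in rest.split() if "=" in item)
    return tag, dict(pairs)


parsed = parse_log_line(
    "[STEP] step=1 action=Benign|CFTR reward=1.00 done=true error=null"
)
tag, fields = parsed
step_reward = float(fields["reward"])
```

All values come back as strings, so numeric fields such as `reward` and `score` need an explicit `float()` conversion.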

### Use the client in your own code

```python
import asyncio
from vep_env import VepAction, VepEnv

async def main():
    async with VepEnv(base_url="http://localhost:8000") as env:
        # Reset — receive a variant case
        result = await env.reset()
        obs = result.observation
        print(f"Gene: {obs.gene} | Disease: {obs.disease}")
        print(f"Consequence: {obs.consequence}")
        print(f"Population frequency: {obs.population_frequency}")
        for snippet in obs.evidence_snippets:
            print(f"  - {snippet}")

        # Submit classification
        action = VepAction(
            classification="Pathogenic",
            reasoning="Nonsense variant in MSH2, absent from gnomAD, causes Lynch syndrome.",
            criteria_used=["nonsense variant", "absent from gnomAD", "disease gene"],
        )
        result = await env.step(action)
        print(f"Reward: {result.reward}")
        print(f"Feedback: {result.observation.feedback}")

asyncio.run(main())
```

### Control the task tier

```bash
VEP_TASK=medium python inference.py   # run medium tier only
VEP_TASK=hard uvicorn server.app:app  # start server in hard mode
```
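
The selection behaviour described above (all three tiers by default, one tier when `VEP_TASK` is set) could be implemented along these lines. This is a sketch, not the code in `inference.py`: `tasks_to_run` is a hypothetical helper, and raising an error on an unrecognised value is an assumption about how invalid input should be handled.

```python
import os

ALL_TASKS = ("easy", "medium", "hard")


def tasks_to_run(environ=os.environ):
    """Return the task tiers to run, based on the VEP_TASK variable."""
    task = environ.get("VEP_TASK")
    if not task:
        # Unset or empty: run every tier in order of difficulty.
        return list(ALL_TASKS)
    if task not in ALL_TASKS:
        raise ValueError(f"VEP_TASK must be one of {ALL_TASKS}, got {task!r}")
    return [task]
```

Passing the environment as a parameter keeps the function easy to test with a plain `dict`.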

---

## Baseline Scores

Evaluated using `Qwen/Qwen2.5-72B-Instruct` via Hugging Face Inference Router.

| Task | Score | Notes |
|---|---|---|
| easy | 0.875 | Model correctly identifies benign signals in most cases |
| medium | 0.800 | Strong on loss-of-function; occasionally misses subtle pathogenic signals |
| hard | 0.738 | Tends toward confident classifications when Uncertain Significance (VUS) is the correct answer |
| **overall** | **0.804** | Average across all 3 tasks |
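
The overall figure is the unweighted mean of the three task scores, which can be checked directly:

```python
scores = {"easy": 0.875, "medium": 0.800, "hard": 0.738}

# Unweighted mean across the three tiers.
overall = sum(scores.values()) / len(scores)
print(f"{overall:.3f}")  # prints 0.804
```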

---

## Project Structure

```
GenoTriage/
├── __init__.py              # Package exports
├── models.py                # VepAction, VepObservation (Pydantic)
├── client.py                # VepEnv client (WebSocket)
├── inference.py             # Baseline inference script
├── variants.json            # Curated ClinVar variants (ground truth)
├── openenv.yaml             # OpenEnv spec manifest
├── pyproject.toml           # Package config
├── Dockerfile               # Container definition
└── server/
    ├── app.py               # FastAPI application
    ├── vep_env_environment.py  # Environment logic + grader
    └── requirements.txt     # Server dependencies
```

---

## Data

Variants are sourced from ClinVar (April 2026 release, GRCh38) filtered to:
- SNPs only (`CLNVC=single_nucleotide_variant`)
- Trusted review status (`criteria_provided` or better)
- Named disease association
- 8 well-known disease genes: MSH2, MLH1, VHL, CFTR, SCN5A, APC, TSC1, RET

Population allele frequencies come from gnomAD v4. They were queried at curation time and stored statically, so no live API calls are made at runtime.
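
The ClinVar filters listed above can be expressed as a predicate over a record's VCF `INFO` fields. The sketch below is not the project's curation script. It assumes the `INFO` column has already been parsed into a `dict`, uses the ClinVar VCF keys `CLNVC`, `CLNREVSTAT`, `CLNDN`, and `GENEINFO`, reads "or better" as expert-panel or practice-guideline review, and only looks at the first gene in `GENEINFO`. `keep_variant` and the constant names are hypothetical.

```python
GENES = {"MSH2", "MLH1", "VHL", "CFTR", "SCN5A", "APC", "TSC1", "RET"}
# Review statuses assumed to rank above "criteria_provided".
EXPERT_STATUSES = {"reviewed_by_expert_panel", "practice_guideline"}
# Disease-name values treated as "no named disease".
UNNAMED_DISEASES = {"", "not_provided", "not_specified"}


def keep_variant(info: dict) -> bool:
    """Apply the four curation filters to one parsed INFO record."""
    if info.get("CLNVC") != "single_nucleotide_variant":
        return False
    status = info.get("CLNREVSTAT", "")
    # startswith avoids matching "no_assertion_criteria_provided".
    if not (status.startswith("criteria_provided") or status in EXPERT_STATUSES):
        return False
    if info.get("CLNDN", "") in UNNAMED_DISEASES:
        return False
    # GENEINFO looks like "SYMBOL:id"; only the first symbol is checked.
    gene = info.get("GENEINFO", "").split(":", 1)[0]
    return gene in GENES
```

A substring test on `CLNREVSTAT` would be a subtle bug here, because the untrusted status `no_assertion_criteria_provided` also contains the text `criteria_provided`.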

---