
	---
	title: GenoTriage
	emoji: 🧬
	colorFrom: blue
	colorTo: green
	sdk: docker
	pinned: false
	app_port: 8000
	tags:
	- openenv
	---

	# GenoTriage 🧬

	An OpenEnv environment where AI agents classify real ClinVar SNP variants using ACMG criteria across three clinical difficulty tiers.

	[![OpenEnv](https://img.shields.io/badge/OpenEnv-compatible-blue)](https://meta-pytorch.org/OpenEnv/)
	[![PyPI](https://img.shields.io/pypi/v/openenv-core?color=blue)](https://pypi.org/project/openenv-core/)

	---

	## Overview

	Clinical geneticists classify genetic variants daily to determine whether a mutation causes disease. This judgment — Pathogenic, Likely Pathogenic, Uncertain, Likely Benign, or Benign — directly impacts patient care, yet remains time-consuming, expert-dependent, and difficult to scale.

	GenoTriage turns this into a structured RL environment. Agents receive real SNP variants from [ClinVar](https://www.ncbi.nlm.nih.gov/clinvar/) enriched with population frequency data from [gnomAD](https://gnomad.broadinstitute.org/), and must classify them using the standard [ACMG/AMP five-tier system](https://www.acmg.net/). Each episode is single-step — the agent reads the evidence and submits one classification — making it fast, deterministic, and well-suited for both RL training and LLM evaluation.

	---

	## Environment Description

	| Property | Value |
	|---|---|
	| Variant type | SNPs (single nucleotide polymorphisms) only |
	| Data source | ClinVar (NCBI) + gnomAD v4 population frequencies |
	| Genome build | GRCh38 |
	| Episode structure | Single-step (reset → observe → classify → reward → done) |
	| Tasks | 3 (easy, medium, hard) |
	| Variants per task | 8 |
	| Interface | OpenEnv-compatible (step / reset / state) |

	---

	## Action Space

	The agent submits a `VepAction` with three fields:

	| Field | Type | Description |
	|---|---|---|
	| `classification` | `str` (one of 5) | ACMG tier: `Pathogenic`, `Likely_pathogenic`, `Uncertain_significance`, `Likely_benign`, or `Benign` |
	| `reasoning` | `str` | Explanation citing specific evidence from the observation (min 20 chars encouraged) |
	| `criteria_used` | `list[str]` | List of specific criteria that drove the decision (e.g. `"high population frequency"`, `"nonsense variant"`) |
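
As a rough illustration of the three fields above, the sketch below models an action with a plain dataclass. `ActionSketch` and `ACMG_TIERS` are hypothetical names for this example only. The real `VepAction` lives in `models.py` and is Pydantic-based, and whether it rejects unknown tiers the way this sketch does is an assumption.

```python
from dataclasses import dataclass, field

# The five tier strings listed in the table above.
ACMG_TIERS = (
    "Benign",
    "Likely_benign",
    "Uncertain_significance",
    "Likely_pathogenic",
    "Pathogenic",
)


@dataclass
class ActionSketch:
    """Hypothetical stand-in for VepAction; not the class from models.py."""

    classification: str
    reasoning: str
    criteria_used: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Assumption: an unknown tier string is treated as an error.
        if self.classification not in ACMG_TIERS:
            raise ValueError(f"unknown ACMG tier: {self.classification!r}")


action = ActionSketch(
    classification="Likely_benign",
    reasoning="Synonymous change with high gnomAD frequency.",
    criteria_used=["high population frequency", "synonymous variant"],
)
```

Note that the tier strings are case-sensitive and use underscores, so `"likely benign"` would not match.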

	---

	## Observation Space

	The agent receives a `VepObservation` with the following fields:

	| Field | Type | Description |
	|---|---|---|
	| `gene` | `str` | Gene symbol (e.g. `BRCA1`, `CFTR`, `MSH2`) |
	| `chromosome` | `str` | Chromosome (e.g. `17`) |
	| `position` | `int` | GRCh38 genomic position |
	| `ref` / `alt` | `str` | Reference and alternate alleles |
	| `hgvs` | `str` | HGVS genomic notation |
	| `consequence` | `str \| None` | Molecular consequence (e.g. `missense_variant`, `nonsense`, `synonymous_variant`) |
	| `disease` | `str` | Primary disease associated with this gene |
	| `population_frequency` | `float \| None` | gnomAD v4 allele frequency (`None` if absent from gnomAD) |
	| `evidence_snippets` | `list[str]` | 3–4 evidence snippets: gene-disease context, consequence interpretation, frequency context, functional evidence |
	| `task_description` | `str` | Instructions for the agent |
	| `feedback` | `str` | Grader feedback after `step()`; empty after `reset()` |
	| `done` | `bool` | `True` after the first step |
	| `reward` | `float` | Reward received (0.0 after `reset()`) |
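
For LLM evaluation, the observation usually has to be flattened into a text prompt. The following is a minimal sketch of one way to do that, assuming the observation has been converted to a plain `dict` keyed by the field names above. `format_prompt` is a hypothetical helper, and the example record uses placeholder values, not a real ClinVar entry.

```python
def format_prompt(obs: dict) -> str:
    """Render observation fields (names from the table above) as plain text."""
    freq = obs.get("population_frequency")
    # A missing frequency is itself evidence, so spell it out for the model.
    freq_text = "absent from gnomAD" if freq is None else f"{freq:.2e}"
    lines = [
        obs["task_description"],
        f"Gene: {obs['gene']} ({obs['disease']})",
        f"Variant: chr{obs['chromosome']}:{obs['position']} {obs['ref']}>{obs['alt']}",
        f"HGVS: {obs['hgvs']}",
        f"Consequence: {obs.get('consequence') or 'unknown'}",
        f"gnomAD allele frequency: {freq_text}",
        "Evidence:",
    ]
    lines += [f"- {snippet}" for snippet in obs["evidence_snippets"]]
    return "\n".join(lines)


# Placeholder values for illustration only.
example = {
    "task_description": "Classify this variant using the ACMG five-tier system.",
    "gene": "MSH2",
    "disease": "Lynch syndrome",
    "chromosome": "2",
    "position": 12345,
    "ref": "C",
    "alt": "T",
    "hgvs": "placeholder-hgvs",
    "consequence": "nonsense",
    "population_frequency": None,
    "evidence_snippets": ["Placeholder snippet A.", "Placeholder snippet B."],
}
prompt = format_prompt(example)
print(prompt)
```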

	---

	## Tasks

	### Task 1 — `easy` (Benign / Likely Benign)

	Variants with clear benign signals: moderate-to-high population frequency, synonymous or non-coding consequence, and no functional evidence linking the specific variant to disease. Agents should score well by correctly reading population frequency and consequence type.

	Expected agent score: 0.75 – 0.95

	### Task 2 — `medium` (Pathogenic / Likely Pathogenic)

	Variants with clear pathogenic signals: loss-of-function consequences (nonsense, splice-site), absent from gnomAD, and strong gene-disease association with clinical literature support. Agents must distinguish signal from noise and identify loss-of-function as a strong pathogenicity indicator.

	Expected agent score: 0.55 – 0.80

	### Task 3 — `hard` (Uncertain Significance)

	Variants where evidence is genuinely ambiguous: missense or regulatory variants in disease genes with no functional studies, conflicting computational predictions, or intermediate frequency. Agents must recognise when evidence is insufficient rather than defaulting to a confident classification.

	Expected agent score: 0.35 – 0.60
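
To make the tier descriptions concrete, here is a deliberately naive rule-based classifier that reads only `consequence` and `population_frequency`. It is a sketch, not the grader and not the baseline agent: the frequency thresholds and the set of loss-of-function labels are illustrative assumptions, and the consequence strings the environment actually emits may differ.

```python
from typing import Optional

# Assumed loss-of-function labels; the real consequence strings may differ.
LOF_CONSEQUENCES = {
    "nonsense",
    "stop_gained",
    "splice_donor_variant",
    "splice_acceptor_variant",
}


def heuristic_tier(consequence: Optional[str],
                   population_frequency: Optional[float]) -> str:
    """Toy decision rule mirroring the easy / medium / hard descriptions."""
    # Easy tier signal: a common variant is unlikely to cause rare disease.
    # The 0.05 cutoff is an illustrative assumption.
    if population_frequency is not None and population_frequency >= 0.05:
        return "Benign"
    # Easy tier signal: synonymous and not vanishingly rare (assumed cutoff).
    if consequence == "synonymous_variant" and (population_frequency or 0.0) >= 0.001:
        return "Likely_benign"
    # Medium tier signal: loss of function and absent from gnomAD.
    if consequence in LOF_CONSEQUENCES and population_frequency is None:
        return "Pathogenic"
    # Hard tier: anything else is treated as insufficient evidence.
    return "Uncertain_significance"


print(heuristic_tier("synonymous_variant", 0.12))
print(heuristic_tier("nonsense", None))
print(heuristic_tier("missense_variant", 0.0004))
```

A rule like this ignores `evidence_snippets` entirely, which is where most of the signal for the hard tier is expected to live.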

	---

	## Reward Function

	Each step returns a reward in `[0.0, 1.0]` composed of three components:

	| Component | Max | Criteria |
	|---|---|---|
	| Classification accuracy | 0.70 | Exact match = 0.70, one tier off = 0.25, two tiers off = 0.05, three or more off = 0.00 |
	| Reasoning quality | 0.20 | Keyword matches in reasoning (+0.12) + length ≥ 50 chars (+0.08) |
	| Criteria used | 0.10 | Non-empty list (+0.04) + ≥ 2 items (+0.06) |

	> Important: Reasoning and criteria bonuses are fully suppressed when the classification is 3+ tiers away from ground truth (e.g. Benign for a Pathogenic variant). Good writing cannot rescue a catastrophically wrong answer.
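
The scoring rules above can be sketched as a single function. This is an approximation under stated assumptions, not the grader in `server/vep_env_environment.py`: tiers are assumed to be ordered from `Benign` to `Pathogenic`, the `keywords` argument is a hypothetical per-variant list, and a single keyword hit is assumed to earn the full +0.12.

```python
# Assumed ordering used to measure how many tiers apart two labels are.
TIERS = [
    "Benign",
    "Likely_benign",
    "Uncertain_significance",
    "Likely_pathogenic",
    "Pathogenic",
]
ACCURACY_BY_DISTANCE = {0: 0.70, 1: 0.25, 2: 0.05}


def sketch_reward(predicted: str, truth: str, reasoning: str,
                  criteria_used: list, keywords: list) -> float:
    """Approximate the three-component reward described in the table."""
    distance = abs(TIERS.index(predicted) - TIERS.index(truth))
    if distance >= 3:
        # Bonuses are fully suppressed for answers three or more tiers off.
        return 0.0
    reward = ACCURACY_BY_DISTANCE[distance]
    text = reasoning.lower()
    # Assumption: any one keyword match earns the full keyword bonus.
    if any(keyword.lower() in text for keyword in keywords):
        reward += 0.12
    if len(reasoning) >= 50:
        reward += 0.08
    if criteria_used:
        reward += 0.04
    if len(criteria_used) >= 2:
        reward += 0.06
    return round(reward, 2)


full = sketch_reward(
    "Pathogenic",
    "Pathogenic",
    "Nonsense variant in MSH2, absent from gnomAD, causes Lynch syndrome.",
    ["nonsense variant", "absent from gnomAD"],
    keywords=["nonsense", "gnomAD"],
)
print(full)
```

Under these assumptions, a well-argued answer that is two tiers off tops out at 0.35, while the same text attached to an answer three tiers off scores 0.0.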

	---

	## Setup

	### Prerequisites

	- Python 3.10+
	- Docker Desktop or Docker Engine
	- A Hugging Face API token (free at [huggingface.co](https://huggingface.co))

	### Install

	```bash
	git clone https://huggingface.co/spaces/fierce74/GenoTriage

	cd GenoTriage
	pip install "openenv-core>=0.2.2"
	```

	### Configure environment variables

	Copy `.env.example` to `.env` and fill in your values:

	```bash
	cp .env.example .env
	```

	```env
	HF_TOKEN=hf_your_token_here
	API_BASE_URL=https://router.huggingface.co/v1
	MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
	LOCAL_IMAGE_NAME=vep_env_env:latest
	```

	### Build the Docker image

	```bash
	docker build -t vep_env_env:latest .
	```

	### Run the server locally (without Docker)

	```bash
	pip install -e .
	uvicorn server.app:app --host 0.0.0.0 --port 8000
	```

	---

	## Usage

	### Run the baseline inference script

	```bash
	python inference.py
	```

	This runs all 3 tasks sequentially (easy → medium → hard), printing structured logs:

	```
	[START] task=easy env=vep_env model=Qwen/Qwen2.5-72B-Instruct
	[STEP] step=1 action=Benign|CFTR reward=1.00 done=true error=null
	...
	[END] success=true steps=8 score=0.875 rewards=1.00,0.90,...
	```
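
If you want to post-process these logs, the `key=value` layout is easy to parse. The sketch below assumes every line follows the shape shown in the sample, with a bracketed tag followed by space-separated pairs whose values contain no spaces. `parse_log_line` is a hypothetical helper and is not part of `inference.py`.

```python
import re

# Matches the bracketed tag and captures the rest of the line.
LINE_RE = re.compile(r"^\[(START|STEP|END)\]\s+(.*)$")


def parse_log_line(line: str):
    """Return (tag, fields) for a structured log line, or None otherwise."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    tag, rest = match.groups()
    # Split on the first "=" only, so values may themselves contain "=".
    fields = dict(pair.split("=", 1) for pair in rest.split() if "=" in pair)
    return tag, fields


tag, fields = parse_log_line(
    "[STEP] step=1 action=Benign|CFTR reward=1.00 done=true error=null"
)
print(tag, fields["action"], float(fields["reward"]))
```

All values come back as strings, so numeric fields such as `reward` and `score` need an explicit `float()` conversion.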

	### Use the client in your own code

	```python
	import asyncio

	from vep_env import VepAction, VepEnv


	async def main():
	    async with VepEnv(base_url="http://localhost:8000") as env:
	        # Reset: receive a variant case
	        result = await env.reset()
	        obs = result.observation
	        print(f"Gene: {obs.gene} | Disease: {obs.disease}")
	        print(f"Consequence: {obs.consequence}")
	        print(f"Population frequency: {obs.population_frequency}")
	        for snippet in obs.evidence_snippets:
	            print(f" - {snippet}")

	        # Submit a classification; the episode ends after this single step
	        action = VepAction(
	            classification="Pathogenic",
	            reasoning="Nonsense variant in MSH2, absent from gnomAD, causes Lynch syndrome.",
	            criteria_used=["nonsense variant", "absent from gnomAD", "disease gene"],
	        )
	        result = await env.step(action)
	        print(f"Reward: {result.reward}")
	        print(f"Feedback: {result.observation.feedback}")


	asyncio.run(main())
	```

	### Control the task tier

	```bash
	VEP_TASK=medium python inference.py # run medium tier only
	VEP_TASK=hard uvicorn server.app:app # start server in hard mode
	```
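
One plausible way to honour `VEP_TASK` in a script is sketched below: run every tier when the variable is unset, otherwise run only the requested one. How `inference.py` and the server actually read the variable is not shown here, so the lowercasing and the error on unknown values are assumptions, and `tasks_to_run` is a hypothetical helper.

```python
import os
from collections.abc import Mapping

ALL_TASKS = ("easy", "medium", "hard")


def tasks_to_run(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Pick the task tiers to run based on the VEP_TASK variable."""
    requested = environ.get("VEP_TASK", "").strip().lower()
    if not requested:
        # Unset or empty: run all tiers in order of difficulty.
        return list(ALL_TASKS)
    if requested not in ALL_TASKS:
        # Assumption: fail loudly rather than silently fall back.
        raise ValueError(f"VEP_TASK must be one of {ALL_TASKS}, got {requested!r}")
    return [requested]


print(tasks_to_run({}))
print(tasks_to_run({"VEP_TASK": "medium"}))
```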

	---

	## Baseline Scores

	Evaluated using `Qwen/Qwen2.5-72B-Instruct` via Hugging Face Inference Router.

	| Task | Score | Notes |
	|---|---|---|
	| easy | 0.875 | Model correctly identifies benign signals in most cases |
	| medium | 0.800 | Strong on loss-of-function; occasionally misses subtle pathogenic signals |
	| hard | 0.738 | Tends toward confident classifications when VUS is the correct answer |
	| overall | 0.804 | Average across all 3 tasks |
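
The overall figure is the unweighted mean of the three per-task scores, which can be checked directly:

```python
from statistics import mean

# Per-task baseline scores copied from the table above.
scores = {"easy": 0.875, "medium": 0.800, "hard": 0.738}

overall = round(mean(scores.values()), 3)
print(overall)  # 0.804
```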

	---

	## Project Structure

	```
	GenoTriage/
	├── __init__.py                  # Package exports
	├── models.py                    # VepAction, VepObservation (Pydantic)
	├── client.py                    # VepEnv client (WebSocket)
	├── inference.py                 # Baseline inference script
	├── variants.json                # Curated ClinVar variants (ground truth)
	├── openenv.yaml                 # OpenEnv spec manifest
	├── pyproject.toml               # Package config
	├── Dockerfile                   # Container definition
	└── server/
	    ├── app.py                   # FastAPI application
	    ├── vep_env_environment.py   # Environment logic + grader
	    └── requirements.txt         # Server dependencies
	```

	---

	## Data

	Variants are sourced from ClinVar (April 2026 release, GRCh38) filtered to:
	- SNPs only (`CLNVC=single_nucleotide_variant`)
	- Trusted review status (`criteria_provided` or better)
	- Named disease association
	- 8 well-known disease genes: MSH2, MLH1, VHL, CFTR, SCN5A, APC, TSC1, RET

	Population allele frequencies are from gnomAD v4 (queried at curation time and stored statically — no live API calls at runtime).
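
The ClinVar filters listed above can be sketched as a predicate over the INFO fields of one VCF record. This is not the curation script used for `variants.json`. It assumes the record has already been parsed into a `dict` of ClinVar's INFO keys (`CLNVC`, `CLNREVSTAT`, `CLNDN`, `GENEINFO`), and `keep_record` is a hypothetical name. Whether records with conflicting classifications count as trusted is a judgment call that this sketch does not settle.

```python
GENES = {"MSH2", "MLH1", "VHL", "CFTR", "SCN5A", "APC", "TSC1", "RET"}
# Disease-name values assumed to mean "no named disease".
UNNAMED = {"", "not_provided", "not_specified"}
TRUSTED_PREFIXES = (
    "criteria_provided",
    "reviewed_by_expert_panel",
    "practice_guideline",
)


def keep_record(info: dict) -> bool:
    """Apply the four filters to one record's parsed INFO fields."""
    if info.get("CLNVC") != "single_nucleotide_variant":
        return False
    # Use a prefix test: "no_assertion_criteria_provided" contains the
    # substring "criteria_provided" but must not pass.
    # Note: this also admits criteria_provided records that are conflicting.
    if not info.get("CLNREVSTAT", "").startswith(TRUSTED_PREFIXES):
        return False
    diseases = set(info.get("CLNDN", "").split("|"))
    if not diseases - UNNAMED:
        return False
    # GENEINFO entries look like SYMBOL:ID, separated by "|".
    genes = {entry.split(":")[0] for entry in info.get("GENEINFO", "").split("|")}
    return bool(genes & GENES)


# Made-up record in ClinVar's INFO style, for illustration only.
record = {
    "CLNVC": "single_nucleotide_variant",
    "CLNREVSTAT": "criteria_provided,_single_submitter",
    "CLNDN": "Lynch_syndrome",
    "GENEINFO": "MSH2:4436",
}
print(keep_record(record))
```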

	---