---
title: Annotation QA Env
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 8000
---
# Annotation QA Environment

An OpenEnv environment where an AI agent reviews and corrects intentionally flawed ML annotations on synthetic scenes. Built for the Meta OpenEnv × SST Hackathon.
## The Challenge

Real-world ML training data is noisy. Annotation teams make mistakes: bounding boxes drift, class labels get swapped, objects get missed. This environment simulates that review pipeline:
- Agent receives a scene description + current annotations (some are wrong)
- Agent identifies errors by comparing annotations to scene objects
- Agent corrects errors through bbox adjustments, class changes, additions, and removals
- Agent submits and receives a score based on annotation quality improvement
## Tasks (3 Difficulty Levels)

| Task | Difficulty | Errors | Max Steps |
|---|---|---|---|
| `fix_bboxes` | Easy | Bbox expansion, shifting, shrinking, spurious, missing | 15 |
| `fix_classes` | Medium | Bbox errors + class label confusion (car→truck, dog→cat) | 20 |
| `batch_audit` | Hard | Subtle bbox shifts + similar-class confusion + cross-batch issues | 30 |
## Architecture

```
annotation_qa_env/
├── models.py        – Action, Observation, State (Pydantic)
├── client.py        – EnvClient for WebSocket interaction
├── inference.py     – Baseline LLM agent (OpenAI client)
├── server/
│   ├── environment.py – Core game logic (reset, step, state)
│   ├── grader.py      – IoU-based deterministic grading
│   ├── corruption.py  – Annotation corruption strategies
│   ├── app.py         – FastAPI server
│   └── Dockerfile     – Container definition
└── data/
    └── generate_dataset.py – Synthetic scene generator
```
## Quick Start

### Install & Run Locally

```bash
cd annotation_qa_env
pip install -e .
uvicorn server.app:app --host 0.0.0.0 --port 8000
```

### Use the Client

```python
from annotation_qa_env import AnnotationQAEnv, AnnotationQAAction

with AnnotationQAEnv(base_url="http://localhost:8000").sync() as env:
    result = env.reset(task="fix_bboxes")
    print(result.observation.annotations)

    result = env.step(AnnotationQAAction(
        action_type="adjust_bbox",
        annotation_id=0,
        new_bbox=[0.1, 0.2, 0.15, 0.1],
    ))
    print(f"Reward: {result.reward}")
```

### Docker

```bash
docker build -t annotation-qa-env:latest -f server/Dockerfile .
docker run -d -p 8000:8000 annotation-qa-env:latest
```

### Deploy to HF Spaces

```bash
openenv push --repo-id username/annotation-qa-env
```
## Grading

The grading function is deterministic and returns scores in [0.0, 1.0]:

```
score = (final_quality - initial_quality) / (1.0 - initial_quality)
```

where quality is a weighted composite of:

- Mean IoU (40%) – how well do predicted bboxes overlap with gold?
- Class Accuracy (30%) – are class labels correct?
- Precision (15%) – are there spurious annotations?
- Recall (15%) – are there missing annotations?
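The composite and the normalization can be sketched in a few lines. This is an illustrative sketch using the weights above; the function names are hypothetical and `server/grader.py` is the authoritative implementation:

```python
def iou(a, b):
    """Intersection-over-union of two [x, y, w, h] boxes (normalized coords)."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def quality(mean_iou, class_acc, precision, recall):
    """Weighted composite with the 40/30/15/15 weights from the README."""
    return 0.4 * mean_iou + 0.3 * class_acc + 0.15 * precision + 0.15 * recall

def score(initial_quality, final_quality):
    """Normalized improvement in [0, 1]: 1.0 means every error was fixed."""
    if initial_quality >= 1.0:
        return 1.0  # nothing left to fix
    return max(0.0, (final_quality - initial_quality) / (1.0 - initial_quality))
```

Normalizing by `1.0 - initial_quality` makes episodes comparable: fixing half the damage scores 0.5 regardless of how corrupted the scene started out.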
## Actions

| Action | Required Fields | Description |
|---|---|---|
| `adjust_bbox` | `annotation_id`, `new_bbox` | Fix a bounding box |
| `change_class` | `annotation_id`, `new_class` | Fix a class label |
| `add_annotation` | `new_bbox`, `new_class` | Add a missing annotation |
| `remove_annotation` | `annotation_id` | Remove a spurious annotation |
| `submit` | (none) | Finalize corrections |
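The table above can be mirrored as example payloads plus a required-field check. The dict shapes and example values here are assumptions for illustration; `models.py` defines the authoritative Pydantic schema:

```python
# One example payload per action type (values are made up).
ACTIONS = [
    {"action_type": "adjust_bbox", "annotation_id": 3, "new_bbox": [0.1, 0.2, 0.3, 0.25]},
    {"action_type": "change_class", "annotation_id": 1, "new_class": "truck"},
    {"action_type": "add_annotation", "new_bbox": [0.6, 0.5, 0.2, 0.2], "new_class": "dog"},
    {"action_type": "remove_annotation", "annotation_id": 4},
    {"action_type": "submit"},
]

# Required fields per action type, straight from the table above.
REQUIRED = {
    "adjust_bbox": {"annotation_id", "new_bbox"},
    "change_class": {"annotation_id", "new_class"},
    "add_annotation": {"new_bbox", "new_class"},
    "remove_annotation": {"annotation_id"},
    "submit": set(),
}

def is_valid(action):
    """Check that an action dict carries every field its type requires."""
    return REQUIRED[action["action_type"]] <= set(action) - {"action_type"}
```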
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `API_BASE_URL` | `https://router.huggingface.co/v1` | LLM API endpoint |
| `MODEL_NAME` | `Qwen/Qwen2.5-72B-Instruct` | Model for inference |
| `HF_TOKEN` | (required) | API key |
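A minimal sketch of reading this configuration with fallbacks to the table's defaults. Variable names and defaults come from the table; how `inference.py` actually wires them into the OpenAI client is an assumption:

```python
import os

# Defaults mirror the table above; HF_TOKEN has no safe default and stays None
# when unset, which should fail fast at the first authenticated request.
API_BASE_URL = os.environ.get("API_BASE_URL", "https://router.huggingface.co/v1")
MODEL_NAME = os.environ.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
HF_TOKEN = os.environ.get("HF_TOKEN")
```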
## Why Synthetic Scenes?

We use programmatic scene descriptions instead of real COCO images because:

- Docker size: COCO train2017 is ~18 GB, far beyond container limits
- Memory: base64 images in observations would spike past 8 GB RAM
- LLM text-only: evaluation uses text-only LLMs (no vision models)
- Determinism: same seed = same data = reproducible scores
- Zero setup: no dataset download; everything is self-contained
The annotation QA task is fundamentally about spatial + categorical reasoning, which text captures fully.
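The determinism point can be illustrated with a seeded generator sketch. The class list, bbox ranges, and function name here are hypothetical; `data/generate_dataset.py` is the authoritative generator:

```python
import random

def generate_scene(seed, n_objects=3):
    """Seeded scene sketch: the same seed always yields the same objects."""
    rng = random.Random(seed)  # local RNG, so global state never leaks in
    classes = ["car", "truck", "dog", "cat", "person"]
    return [
        {
            "class": rng.choice(classes),
            "bbox": [
                round(rng.uniform(0.0, 0.7), 3),   # x
                round(rng.uniform(0.0, 0.7), 3),   # y
                round(rng.uniform(0.05, 0.3), 3),  # width
                round(rng.uniform(0.05, 0.3), 3),  # height
            ],
        }
        for _ in range(n_objects)
    ]
```

Because the RNG is seeded per scene, regenerating a task never requires shipping a dataset, and every grader run sees byte-identical gold annotations.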
## License

BSD-3-Clause (matching OpenEnv)