
title: Context Corruption Env
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: mit

ContextCorruption-Env

OpenEnv Hackathon | Meta x Hugging Face x PyTorch

ContextCorruption-Env is an OpenEnv environment for training epistemic robustness in LLMs. The agent receives a factual question plus retrieved documents, some of which are deliberately corrupted. It must answer the question and flag unreliable sources.

This submission targets Theme #3.1: World Modeling / Professional Tasks. The environment simulates a partially observable information workspace where some evidence is trustworthy and some evidence lies.

Required Materials

Environment Space: https://huggingface.co/spaces/Siddh12334/context-corruption-env
Mini-blog / writeup: BLOG.md
Training Space: https://huggingface.co/spaces/Siddh12334/context-corruption-training
Trained LoRA checkpoint: https://huggingface.co/Siddh12334/qwen-1.5b-context-corruption
Training logs/history: assets/training_history_rl5jygl8.csv
Raw training output log: assets/wandb_run_rl5jygl8/output.log
Completion samples: assets/completions_samples.md
Training script: training/train_grpo.py
Notebook: training/ContextCorruption_GRPO.ipynb

Environment Summary

Each episode contains:

1 factual question
8 retrieved documents
1-4 corrupted documents
12-step budget
deterministic reward

The agent can take four actions:

read_doc: spend budget to inspect a document;
flag_suspicious: mark a document as likely corrupted;
unflag_doc: remove a flag;
submit_answer: finish with an answer and confidence score.

The environment is intentionally simple to run but hard to master. A weak agent can guess an answer. A stronger agent must notice contradictions and avoid over-flagging clean documents.
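The action space and step budget described above can be sketched as a small state tracker. This is a minimal illustration, not the code in environment/: the names `ActionType`, `Action`, and `EpisodeState` are hypothetical, and it assumes that every action (not only read_doc) consumes one step of the 12-step budget.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional, Set


class ActionType(str, Enum):
    # The four actions listed in this README.
    READ_DOC = "read_doc"
    FLAG_SUSPICIOUS = "flag_suspicious"
    UNFLAG_DOC = "unflag_doc"
    SUBMIT_ANSWER = "submit_answer"


@dataclass
class Action:
    # Hypothetical payload: doc_id for document actions,
    # answer and confidence for submit_answer.
    type: ActionType
    doc_id: Optional[int] = None
    answer: Optional[str] = None
    confidence: Optional[float] = None


@dataclass
class EpisodeState:
    num_docs: int = 8   # 8 retrieved documents per episode
    budget: int = 12    # 12-step budget
    steps_used: int = 0
    read: Set[int] = field(default_factory=set)
    flagged: Set[int] = field(default_factory=set)
    answer: Optional[str] = None
    confidence: Optional[float] = None
    done: bool = False

    def step(self, action: Action) -> None:
        if self.done:
            raise RuntimeError("episode already finished")
        if action.type is ActionType.SUBMIT_ANSWER:
            self.answer = action.answer
            self.confidence = action.confidence
            self.done = True
        else:
            if action.doc_id is None or not 0 <= action.doc_id < self.num_docs:
                raise ValueError("doc_id must be in range(num_docs)")
            if action.type is ActionType.READ_DOC:
                self.read.add(action.doc_id)
            elif action.type is ActionType.FLAG_SUSPICIOUS:
                self.flagged.add(action.doc_id)
            else:  # UNFLAG_DOC: removing a missing flag is a no-op
                self.flagged.discard(action.doc_id)
        # Assumption: every valid action costs one step.
        self.steps_used += 1
        if self.steps_used >= self.budget:
            self.done = True


state = EpisodeState()
state.step(Action(ActionType.READ_DOC, doc_id=3))
state.step(Action(ActionType.FLAG_SUSPICIOUS, doc_id=3))
state.step(Action(ActionType.SUBMIT_ANSWER, answer="Paris", confidence=0.8))
```

Keeping the flag set separate from the read set makes over-flagging visible: an agent can flag a document it never read, which is exactly the behaviour the precision term in the reward is meant to discourage.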

Interactive Demo UI

The FastAPI app serves a lightweight frontend at /. It lets users start an episode, inspect the eight retrieved documents, spend read budget, flag suspicious documents, submit an answer with confidence, and optionally call the trained model through /model/infer.

Run locally with:

uvicorn environment.server:app --host 0.0.0.0 --port 7860

Reward

The reward is deterministic and compositional. There is no hidden LLM judge.

| Component | What It Rewards | Weight |
| --- | --- | --- |
| Answer correctness | exact match after normalization | +0.40 |
| Corruption recall | fraction of corrupt docs found | +0.30 |
| Precision | avoids false accusations | +0.20 |
| Confidence calibration | confidence helps only when correct | ±0.10 |
| Efficiency | small bonus for conserving budget | +0.05 |

Reward range: -0.5 to 1.05.
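As an illustration of how the listed components could combine, here is a minimal sketch using the weights from the table. The per-component formulas are assumptions, not the code in environment/: the normalization rule, the choice to score precision as 0 when nothing is flagged, the signed-confidence calibration term, and the linear efficiency bonus are all guesses. The function name `episode_reward` is hypothetical. The sketch reproduces the stated maximum of 1.05 but bottoms out at -0.10, so the actual environment must apply penalties, not described here, to reach -0.5.

```python
import string

# Weights taken from the reward table in this README.
WEIGHTS = {
    "answer": 0.40,
    "recall": 0.30,
    "precision": 0.20,
    "calibration": 0.10,
    "efficiency": 0.05,
}


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop ASCII punctuation, collapse spaces."""
    stripped = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.split())


def episode_reward(answer, gold, flagged, corrupted, confidence,
                   steps_used, budget=12):
    """Combine the five components into one deterministic score."""
    correct = normalize(answer) == normalize(gold)
    hits = len(flagged & corrupted)
    # Recall: fraction of corrupted documents that were flagged.
    recall = hits / len(corrupted) if corrupted else 0.0
    # Precision: fraction of flags that were right (assumed 0 with no flags).
    precision = hits / len(flagged) if flagged else 0.0
    # Calibration: confidence counts for you when right, against you when wrong.
    calibration = confidence if correct else -confidence
    # Efficiency: share of the step budget left unused.
    efficiency = max(budget - steps_used, 0) / budget
    return (
        WEIGHTS["answer"] * float(correct)
        + WEIGHTS["recall"] * recall
        + WEIGHTS["precision"] * precision
        + WEIGHTS["calibration"] * calibration
        + WEIGHTS["efficiency"] * efficiency
    )


best = episode_reward("Paris.", "paris", {2, 5}, {2, 5},
                      confidence=1.0, steps_used=0)
print(round(best, 2))
```

Because every term is a closed-form function of the episode outcome, the same trajectory always gets the same score, which is what allows the reward to run without an LLM judge.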

Results

We trained Qwen2-1.5B-Instruct with GRPO using Unsloth / TRL. The run was sized for hackathon constraints, but its final logged reward (0.3289) is clearly above the random baseline's 100-episode average (0.1302).

| Agent | Reward Evidence |
| --- | --- |
| Random baseline | 0.1302 avg reward over 100 episodes |
| Qwen2-1.5B GRPO | 0.3289 final logged reward in the finished WandB run |
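To put the two figures in proportion, the short calculation below derives the absolute gain, the ratio, and the share of the 1.05 maximum reward. It uses only the numbers reported above. Note that the two figures are measured differently (an average over 100 evaluation episodes versus the final logged training reward), so the comparison is indicative rather than a controlled evaluation.

```python
# Figures reported in this README.
random_baseline = 0.1302   # avg reward over 100 episodes
grpo_final = 0.3289        # final logged reward in the WandB run
max_reward = 1.05          # upper end of the stated reward range

abs_gain = grpo_final - random_baseline
ratio = grpo_final / random_baseline
share_of_max = grpo_final / max_reward

print(f"absolute gain: {abs_gain:.4f}")        # absolute gain: 0.1987
print(f"ratio to baseline: {ratio:.2f}x")      # ratio to baseline: 2.53x
print(f"share of max reward: {share_of_max:.1%}")  # share of max reward: 31.3%
```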

The trained LoRA adapter is pushed to the Hub and is loaded by the hosted Space through /model/infer for a live sanity check.

Additional exported charts are stored under assets/.

The WandB run was exported into this repo so judges do not need access to a private project. See the raw log, scalar history, config, summary, and completion tables under assets/wandb_run_rl5jygl8/.

Repo Structure

environment/   # OpenEnv environment, actions, reward, server, model inference
data/          # QA loading, corruptions, document generation
training/      # GRPO training script and notebook
eval/          # random baseline evaluation
assets/        # charts, exported training logs, completion samples