---
title: Context Corruption Env
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: mit
---

# ContextCorruption-Env

> OpenEnv Hackathon | Meta x Hugging Face x PyTorch

ContextCorruption-Env is an OpenEnv environment for training epistemic robustness in LLMs. The agent receives a factual question plus retrieved documents, some of which are deliberately corrupted. It must answer the question and flag unreliable sources.

This submission targets **Theme #3.1: World Modeling / Professional Tasks**. The environment simulates a partially observable information workspace where some evidence is trustworthy and some evidence lies.

## Required Materials

- **Environment Space:** https://huggingface.co/spaces/Siddh12334/context-corruption-env
- **Mini-blog / writeup:** [`BLOG.md`](BLOG.md)
- **Training Space:** https://huggingface.co/spaces/Siddh12334/context-corruption-training
- **Trained LoRA checkpoint:** https://huggingface.co/Siddh12334/qwen-1.5b-context-corruption
- **Training logs/history:** [`assets/training_history_rl5jygl8.csv`](assets/training_history_rl5jygl8.csv)
- **Raw training output log:** [`assets/wandb_run_rl5jygl8/output.log`](assets/wandb_run_rl5jygl8/output.log)
- **Completion samples:** [`assets/completions_samples.md`](assets/completions_samples.md)
- **Training script:** [`training/train_grpo.py`](training/train_grpo.py)
- **Notebook:** [`training/ContextCorruption_GRPO.ipynb`](training/ContextCorruption_GRPO.ipynb)

## Environment Summary

Each episode contains:

- **1 factual question**
- **8 retrieved documents**
- **1-4 corrupted documents**
- **12-step budget**
- **deterministic reward**
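
As a rough illustration of the episode shape listed above, the sketch below samples which of the eight documents are corrupted. `EpisodeLayout` and `sample_layout` are hypothetical names, and seeding the sampler is an assumption made here for reproducibility; the repository's own generation code under `data/` may differ.

```python
import random
from dataclasses import dataclass

# Episode constants taken from the summary above.
NUM_DOCS = 8
MIN_CORRUPTED, MAX_CORRUPTED = 1, 4
STEP_BUDGET = 12


@dataclass(frozen=True)
class EpisodeLayout:
    """Hypothetical container: ids of the corrupted documents plus the step budget."""
    corrupted_ids: frozenset
    budget: int = STEP_BUDGET


def sample_layout(seed: int) -> EpisodeLayout:
    """Pick 1-4 distinct document ids (out of 8) to mark as corrupted."""
    rng = random.Random(seed)  # assumption: a per-episode seed
    k = rng.randint(MIN_CORRUPTED, MAX_CORRUPTED)  # inclusive on both ends
    return EpisodeLayout(corrupted_ids=frozenset(rng.sample(range(NUM_DOCS), k)))


layout = sample_layout(seed=0)
print(sorted(layout.corrupted_ids), layout.budget)
```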

The agent can take four actions:

- `read_doc`: spend budget to inspect a document;
- `flag_suspicious`: mark a document as likely corrupted;
- `unflag_doc`: remove a flag;
- `submit_answer`: finish with an answer and confidence score.

The environment is intentionally simple to run but hard to master. A weak agent can guess an answer. A stronger agent must notice contradictions and avoid over-flagging clean documents.
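
The four actions can be pictured as transitions on a small workspace state. This is a minimal sketch, not the environment's implementation: the `Workspace` class, its field names, and the rule that every non-submit action costs one step are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Workspace:
    """Hypothetical agent-side state for one episode."""
    budget: int = 12
    read: set = field(default_factory=set)
    flagged: set = field(default_factory=set)
    answer: Optional[str] = None
    confidence: Optional[float] = None

    def step(self, action: str, doc_id: Optional[int] = None,
             answer: Optional[str] = None,
             confidence: Optional[float] = None) -> bool:
        """Apply one action; return True once the episode is finished."""
        if action == "submit_answer":
            self.answer, self.confidence = answer, confidence
            return True
        if self.budget <= 0:
            raise RuntimeError("no budget left; only submit_answer is possible")
        if doc_id is None or not 0 <= doc_id < 8:
            raise ValueError("doc_id must be in 0..7")
        if action == "read_doc":
            self.read.add(doc_id)
        elif action == "flag_suspicious":
            self.flagged.add(doc_id)
        elif action == "unflag_doc":
            self.flagged.discard(doc_id)  # no error if the doc was not flagged
        else:
            raise ValueError(f"unknown action: {action}")
        self.budget -= 1  # assumption: each non-submit action costs one step
        return False


ws = Workspace()
ws.step("read_doc", 2)
ws.step("flag_suspicious", 2)
ws.step("submit_answer", answer="Paris", confidence=0.9)
```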

## Interactive Demo UI

The FastAPI app serves a lightweight frontend at `/`. It lets users start an episode, inspect the eight retrieved documents, spend read budget, flag suspicious documents, submit an answer with confidence, and optionally call the trained model through `/model/infer`.

Run locally with:

```bash
uvicorn environment.server:app --host 0.0.0.0 --port 7860
```

## Reward

The reward is deterministic and compositional. There is no hidden LLM judge.

| Component | What It Rewards | Weight |
|---|---|---:|
| Answer correctness | exact match after normalization | +0.40 |
| Corruption recall | fraction of corrupt docs found | +0.30 |
| Precision | avoids false accusations | +0.20 |
| Confidence calibration | confidence helps only when correct | +/-0.10 |
| Efficiency | small bonus for conserving budget | +0.05 |

Reward range: **-0.5 to 1.05**.
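
To make the composition concrete, here is a hedged sketch that combines the five weights from the table. Only the weights come from this README; the per-component formulas (how answers are normalized, how precision is scored when nothing is flagged, a linear efficiency bonus) are assumptions, and `reward_sketch` is a hypothetical name. This simplified version bottoms out at -0.10, so the documented -0.5 floor must come from penalties that are not modeled here.

```python
import re


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", "", text.lower()).split())


def reward_sketch(answer: str, gold: str, flagged: set, corrupted: set,
                  confidence: float, steps_used: int, budget: int = 12) -> float:
    """Weighted sum of the five components, using the weights from the table."""
    correct = normalize(answer) == normalize(gold)
    hits = len(flagged & corrupted)
    recall = hits / len(corrupted) if corrupted else 0.0
    precision = hits / len(flagged) if flagged else 0.0  # assumption: no flags scores 0
    calibration = confidence if correct else -confidence  # helps only when correct
    efficiency = (budget - steps_used) / budget  # assumption: linear in unused steps
    return (0.40 * correct + 0.30 * recall + 0.20 * precision
            + 0.10 * calibration + 0.05 * efficiency)


# A partially successful episode: right answer, one of two corrupted docs found,
# one clean doc wrongly flagged.
print(round(reward_sketch("Paris", "paris", {1, 4}, {1, 3}, 0.8, 6), 4))
```

Under these assumptions, a perfect episode (correct answer, every corrupted document flagged and nothing else, full confidence, no steps spent) reaches the documented maximum of 1.05.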

## Results

We trained **Qwen2-1.5B-Instruct** with GRPO using Unsloth / TRL. The run was sized for hackathon constraints, but its final logged reward (0.3289) was roughly 2.5 times the random baseline's average reward (0.1302).

| Agent | Reward Evidence |
|---|---:|
| Random baseline | **0.1302 avg reward** over 100 episodes |
| Qwen2-1.5B GRPO | **0.3289 final logged reward** in the finished WandB run |

The trained LoRA adapter is published on the Hub, and the hosted Space loads it behind `/model/infer` for a live sanity check.
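
For a quick sense of scale, the snippet below puts the two numbers from the results table side by side and places them on the documented reward range. The two figures were measured differently (an average over 100 evaluation episodes versus the final logged training reward), so the gap is indicative rather than a controlled comparison. `fraction_of_range` is a helper introduced here for illustration.

```python
# Values copied from the results table and the Reward section.
RANDOM_BASELINE = 0.1302
GRPO_FINAL = 0.3289
REWARD_MIN, REWARD_MAX = -0.5, 1.05


def fraction_of_range(reward: float) -> float:
    """Position of a reward within [REWARD_MIN, REWARD_MAX], from 0 to 1."""
    return (reward - REWARD_MIN) / (REWARD_MAX - REWARD_MIN)


gain = GRPO_FINAL - RANDOM_BASELINE
ratio = GRPO_FINAL / RANDOM_BASELINE

print(f"absolute gain: {gain:.4f}")  # 0.1987
print(f"ratio:         {ratio:.2f}x")  # 2.53x
print(f"baseline sits at {fraction_of_range(RANDOM_BASELINE):.1%} of the reward range")
print(f"GRPO run sits at {fraction_of_range(GRPO_FINAL):.1%} of the reward range")
```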

![Reward curve](assets/reward_curve.png)

![Loss curve](assets/loss_curve.png)

Additional exported charts:

- [Policy entropy](assets/entropy_curve.png)
- [Mean completion length](assets/completion_length_curve.png)
- [Gradient norm](assets/grad_norm_curve.png)
- [Learning rate](assets/learning_rate_curve.png)

The WandB run was exported into this repo so judges do not need access to a private project. See the raw log, scalar history, config, summary, and completion tables under [`assets/wandb_run_rl5jygl8/`](assets/wandb_run_rl5jygl8/).

## Repo Structure

```text
environment/   # OpenEnv environment, actions, reward, server, model inference
data/          # QA loading, corruptions, document generation
training/      # GRPO training script and notebook
eval/          # random baseline evaluation
assets/        # charts, exported training logs, completion samples
```