	---
	title: Context Corruption Env
	emoji: 🔍
	colorFrom: blue
	colorTo: purple
	sdk: docker
	app_port: 7860
	pinned: false
	license: mit
	---

	# ContextCorruption-Env

	> OpenEnv Hackathon | Meta x Hugging Face x PyTorch

	ContextCorruption-Env is an OpenEnv environment for training epistemic robustness in LLMs. The agent receives a factual question plus retrieved documents, some of which are deliberately corrupted. It must answer the question and flag unreliable sources.

	This submission targets Theme #3.1: World Modeling / Professional Tasks. The environment simulates a partially observable information workspace where some evidence is trustworthy and some evidence lies.

	## Required Materials

	- Environment Space: https://huggingface.co/spaces/Siddh12334/context-corruption-env
	- Mini-blog / writeup: [`BLOG.md`](BLOG.md)
	- Training Space: https://huggingface.co/spaces/Siddh12334/context-corruption-training
	- Trained LoRA checkpoint: https://huggingface.co/Siddh12334/qwen-1.5b-context-corruption
	- Training logs/history: [`assets/training_history_rl5jygl8.csv`](assets/training_history_rl5jygl8.csv)
	- Raw training output log: [`assets/wandb_run_rl5jygl8/output.log`](assets/wandb_run_rl5jygl8/output.log)
	- Completion samples: [`assets/completions_samples.md`](assets/completions_samples.md)
	- Training script: [`training/train_grpo.py`](training/train_grpo.py)
	- Notebook: [`training/ContextCorruption_GRPO.ipynb`](training/ContextCorruption_GRPO.ipynb)

	## Environment Summary

	Each episode contains:

	- 1 factual question
	- 8 retrieved documents
	- 1-4 corrupted documents
	- 12-step budget
	- deterministic reward

	The agent can take four actions:

	- `read_doc`: spend budget to inspect a document;
	- `flag_suspicious`: mark a document as likely corrupted;
	- `unflag_doc`: remove a flag;
	- `submit_answer`: finish with an answer and confidence score.

	The environment is intentionally simple to run but hard to master. A weak agent can guess an answer. A stronger agent must notice contradictions and avoid over-flagging clean documents.
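
To make the action space concrete, here is a minimal sketch of the bookkeeping an episode needs. It is not the environment's actual class: `ToyEpisode` and its field names are hypothetical, and it assumes that only `read_doc` consumes budget (the README states that reads cost budget but does not say whether the other actions do).

```python
from dataclasses import dataclass, field
from typing import Optional, Set

NUM_DOCS = 8      # retrieved documents per episode (from the summary above)
STEP_BUDGET = 12  # step budget per episode (from the summary above)
DOC_ACTIONS = ("read_doc", "flag_suspicious", "unflag_doc")


@dataclass
class ToyEpisode:
    """Hypothetical episode state; illustrates the four actions only."""
    budget: int = STEP_BUDGET
    read: Set[int] = field(default_factory=set)
    flagged: Set[int] = field(default_factory=set)
    answer: Optional[str] = None
    confidence: Optional[float] = None
    done: bool = False

    def step(self, action: str, doc_id: Optional[int] = None,
             answer: Optional[str] = None, confidence: float = 0.5) -> None:
        if self.done:
            raise RuntimeError("episode already finished")
        if action == "submit_answer":
            if answer is None:
                raise ValueError("submit_answer needs an answer")
            self.answer, self.confidence, self.done = answer, confidence, True
            return
        if action not in DOC_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        if doc_id is None or not 0 <= doc_id < NUM_DOCS:
            raise ValueError("doc_id must be in range(NUM_DOCS)")
        if action == "read_doc":
            if self.budget <= 0:
                raise RuntimeError("no budget left")
            self.budget -= 1  # assumption: only reads cost budget
            self.read.add(doc_id)
        elif action == "flag_suspicious":
            self.flagged.add(doc_id)
        else:  # unflag_doc
            self.flagged.discard(doc_id)


# Example trajectory: read two documents, flag one, then submit.
episode = ToyEpisode()
episode.step("read_doc", doc_id=0)
episode.step("read_doc", doc_id=3)
episode.step("flag_suspicious", doc_id=3)
episode.step("submit_answer", answer="Paris", confidence=0.8)
```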

	## Interactive Demo UI

	The FastAPI app serves a lightweight frontend at `/`. It lets users start an episode, inspect the eight retrieved documents, spend read budget, flag suspicious documents, submit an answer with confidence, and optionally call the trained model through `/model/infer`.

	Run locally with:

	```bash
	uvicorn environment.server:app --host 0.0.0.0 --port 7860
	```
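
Once the server is running, a quick way to confirm that the frontend at `/` responds is a small standard-library check like the one below. Only the `/` route and port 7860 come from this README; the helper names `frontend_url` and `frontend_is_up` are hypothetical.

```python
from urllib.request import urlopen


def frontend_url(host: str = "localhost", port: int = 7860) -> str:
    """Build the URL of the demo frontend served at `/`."""
    return f"http://{host}:{port}/"


def frontend_is_up(host: str = "localhost", port: int = 7860,
                   timeout: float = 5.0) -> bool:
    """Return True if the frontend answers with HTTP 200, False otherwise."""
    try:
        with urlopen(frontend_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers connection errors, URLError, and HTTPError
        return False


if __name__ == "__main__":
    print("frontend reachable:", frontend_is_up())
```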

	## Reward

	The reward is deterministic and compositional. There is no hidden LLM judge.

	| Component | What It Rewards | Weight |
	|---|---:|---:|
	| Answer correctness | exact match after normalization | +0.40 |
	| Corruption recall | fraction of corrupt docs found | +0.30 |
	| Precision | avoids false accusations | +0.20 |
	| Confidence calibration | confidence helps only when correct | +/-0.10 |
	| Efficiency | small bonus for conserving budget | +0.05 |

	Reward range: -0.5 to 1.05. The upper bound is the sum of the positive weights above (0.40 + 0.30 + 0.20 + 0.10 + 0.05).
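
The sketch below shows how a compositional reward with the table's weights could be assembled. It is an illustration under stated assumptions, not the environment's reward code: `EpisodeOutcome` and `sketch_reward` are hypothetical names, and the exact forms of the precision, calibration, and efficiency terms are guesses. In particular, this sketch bottoms out at -0.10, so it does not reproduce the stated lower bound of -0.5; the real implementation presumably applies penalties that the table does not list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EpisodeOutcome:
    """Hypothetical summary of a finished episode."""
    answer_correct: bool   # exact match after normalization
    flagged: set[int]      # document ids the agent flagged
    corrupted: set[int]    # document ids that were actually corrupted
    confidence: float      # agent's stated confidence in [0, 1]
    steps_used: int        # budget consumed
    budget: int = 12       # step budget from the environment summary


def sketch_reward(o: EpisodeOutcome) -> float:
    hits = len(o.flagged & o.corrupted)
    # Recall: fraction of corrupted documents that were flagged.
    recall = hits / len(o.corrupted) if o.corrupted else 0.0
    # Precision: fraction of flags that were correct.
    # Assumption: flagging nothing earns no precision credit.
    precision = hits / len(o.flagged) if o.flagged else 0.0
    answer = 0.40 if o.answer_correct else 0.0
    # Assumption: confidence scales a +0.10 bonus when correct
    # and a -0.10 penalty when wrong.
    sign = 1.0 if o.answer_correct else -1.0
    calibration = sign * 0.10 * o.confidence
    # Assumption: efficiency bonus is linear in the unused budget.
    efficiency = 0.05 * (o.budget - o.steps_used) / o.budget
    return answer + 0.30 * recall + 0.20 * precision + calibration + efficiency
```

Because every term is a pure function of the episode outcome, the same trajectory always yields the same reward, which is what makes the signal deterministic.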

	## Results

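	We trained Qwen2-1.5B-Instruct with GRPO using Unsloth / TRL. The run was sized for hackathon constraints, but its final logged reward (0.3289) is roughly 2.5 times the random baseline's average reward (0.1302). The two figures are measured differently: the baseline is an average over 100 episodes, while the GRPO figure is the last reward logged during training.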

	| Agent | Reward Evidence |
	|---|---:|
	| Random baseline | 0.1302 avg reward over 100 episodes |
	| Qwen2-1.5B GRPO | 0.3289 final logged reward in the finished WandB run |

	The trained LoRA adapter is pushed to the Hub and is loaded by the hosted Space through `/model/infer` for a live sanity check.
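
For readers who want to try the adapter outside the hosted Space, the following is a minimal loading sketch using `transformers` and `peft`. It assumes the Hub repo holds a standard PEFT LoRA adapter and that the matching base checkpoint is `Qwen/Qwen2-1.5B-Instruct`; the training run may have used a different base repo (for example an Unsloth mirror), so treat `BASE_MODEL_ID` as an assumption. The function name `load_trained_model` is hypothetical.

```python
BASE_MODEL_ID = "Qwen/Qwen2-1.5B-Instruct"            # assumption: base checkpoint
ADAPTER_ID = "Siddh12334/qwen-1.5b-context-corruption"  # from Required Materials


def load_trained_model(base_id: str = BASE_MODEL_ID,
                       adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the LoRA adapter for inference."""
    # Imported lazily so the module can be imported without these packages.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights downloaded from the Hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_trained_model()` downloads both the base weights and the adapter, so it needs network access and enough memory for a 1.5B-parameter model.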

	![Reward curve](assets/reward_curve.png)

	![Loss curve](assets/loss_curve.png)

	Additional exported charts:

	- [Policy entropy](assets/entropy_curve.png)
	- [Mean completion length](assets/completion_length_curve.png)
	- [Gradient norm](assets/grad_norm_curve.png)
	- [Learning rate](assets/learning_rate_curve.png)

	The WandB run was exported into this repo so judges do not need access to a private project. See the raw log, scalar history, config, summary, and completion tables under [`assets/wandb_run_rl5jygl8/`](assets/wandb_run_rl5jygl8/).

	## Repo Structure

	```text
	environment/ # OpenEnv environment, actions, reward, server, model inference
	data/ # QA loading, corruptions, document generation
	training/ # GRPO training script and notebook
	eval/ # random baseline evaluation
	assets/ # charts, exported training logs, completion samples
	```