---
title: Context Corruption Env
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: mit
---

# ContextCorruption-Env

> OpenEnv Hackathon | Meta x Hugging Face x PyTorch

ContextCorruption-Env is an OpenEnv environment for training epistemic robustness in LLMs. The agent receives a factual question plus retrieved documents, some of which are deliberately corrupted. It must answer the question and flag unreliable sources.

This submission targets **Theme #3.1: World Modeling / Professional Tasks**. The environment simulates a partially observable information workspace where some evidence is trustworthy and some evidence lies.

## Required Materials

- **Environment Space:** https://huggingface.co/spaces/Siddh12334/context-corruption-env
- **Mini-blog / writeup:** [`BLOG.md`](BLOG.md)
- **Training Space:** https://huggingface.co/spaces/Siddh12334/context-corruption-training
- **Trained LoRA checkpoint:** https://huggingface.co/Siddh12334/qwen-1.5b-context-corruption
- **Training logs/history:** [`assets/training_history_rl5jygl8.csv`](assets/training_history_rl5jygl8.csv)
- **Raw training output log:** [`assets/wandb_run_rl5jygl8/output.log`](assets/wandb_run_rl5jygl8/output.log)
- **Completion samples:** [`assets/completions_samples.md`](assets/completions_samples.md)
- **Training script:** [`training/train_grpo.py`](training/train_grpo.py)
- **Notebook:** [`training/ContextCorruption_GRPO.ipynb`](training/ContextCorruption_GRPO.ipynb)

## Environment Summary

Each episode contains:

- **1 factual question**
- **8 retrieved documents**
- **1-4 corrupted documents**
- **12-step budget**
- **deterministic reward**

The agent can take four actions:

- `read_doc`: spend budget to inspect a document;
- `flag_suspicious`: mark a document as likely corrupted;
- `unflag_doc`: remove a flag;
- `submit_answer`: finish with an answer and confidence score.

The environment is intentionally simple to run but hard to master.
A weak agent can guess an answer. A stronger agent must notice contradictions and avoid over-flagging clean documents.

## Interactive Demo UI

The FastAPI app serves a lightweight frontend at `/`. It lets users start an episode, inspect the eight retrieved documents, spend read budget, flag suspicious documents, submit an answer with confidence, and optionally call the trained model through `/model/infer`.

Run locally with:

```bash
uvicorn environment.server:app --host 0.0.0.0 --port 7860
```

## Reward

The reward is deterministic and compositional. There is no hidden LLM judge.

| Component | What It Rewards | Weight |
|---|---:|---:|
| Answer correctness | exact match after normalization | +0.40 |
| Corruption recall | fraction of corrupt docs found | +0.30 |
| Precision | avoids false accusations | +0.20 |
| Confidence calibration | confidence helps only when correct | +/-0.10 |
| Efficiency | small bonus for conserving budget | +0.05 |

Reward range: **-0.5 to 1.05**.

## Results

We trained **Qwen2-1.5B-Instruct** with GRPO using Unsloth / TRL. The run was sized for hackathon constraints, but it produced a clear signal above the random baseline.

| Agent | Reward Evidence |
|---|---:|
| Random baseline | **0.1302 avg reward** over 100 episodes |
| Qwen2-1.5B GRPO | **0.3289 final logged reward** in the finished WandB run |

The trained LoRA adapter is pushed to the Hub and is loaded by the hosted Space through `/model/infer` for a live sanity check.

![Reward curve](assets/reward_curve.png)

![Loss curve](assets/loss_curve.png)

Additional exported charts:

- [Policy entropy](assets/entropy_curve.png)
- [Mean completion length](assets/completion_length_curve.png)
- [Gradient norm](assets/grad_norm_curve.png)
- [Learning rate](assets/learning_rate_curve.png)

The WandB run was exported into this repo so judges do not need access to a private project.
See the raw log, scalar history, config, summary, and completion tables under [`assets/wandb_run_rl5jygl8/`](assets/wandb_run_rl5jygl8/).

## Repo Structure

```text
environment/   # OpenEnv environment, actions, reward, server, model inference
data/          # QA loading, corruptions, document generation
training/      # GRPO training script and notebook
eval/          # random baseline evaluation
assets/        # charts, exported training logs, completion samples
```
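
As a rough illustration of how the weights in the Reward table compose, the sketch below sums the five components. It is not the code in `environment/`. The `EpisodeOutcome` fields, the `sketch_reward` name, the precision and efficiency formulas, and the linear signed calibration term are all assumptions made for illustration. It also does not model whatever penalty produces the documented lower bound of -0.5; under these assumptions the minimum is -0.10.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EpisodeOutcome:
    """Hypothetical summary of a finished episode (not the repo's data model)."""
    answer_correct: bool          # exact match after normalization
    corrupted: set[int]           # indices of documents that were corrupted
    flagged: set[int] = field(default_factory=set)  # indices the agent flagged
    confidence: float = 0.5       # agent's stated confidence in [0, 1]
    steps_used: int = 12          # steps spent out of the budget
    budget: int = 12              # 12-step budget from the environment summary


def sketch_reward(o: EpisodeOutcome) -> float:
    """Weighted sum of the five components, using the weights from the table."""
    true_positives = len(o.flagged & o.corrupted)

    # Assumed definitions: recall over corrupted docs, precision over flagged docs.
    recall = true_positives / len(o.corrupted) if o.corrupted else 0.0
    precision = true_positives / len(o.flagged) if o.flagged else 0.0

    # Assumed: confidence adds to the reward when correct and subtracts when wrong.
    calibration = o.confidence if o.answer_correct else -o.confidence

    # Assumed: efficiency is the unused fraction of the step budget.
    efficiency = max(o.budget - o.steps_used, 0) / o.budget

    return (
        0.40 * float(o.answer_correct)
        + 0.30 * recall
        + 0.20 * precision
        + 0.10 * calibration
        + 0.05 * efficiency
    )


if __name__ == "__main__":
    careful = EpisodeOutcome(
        answer_correct=True,
        corrupted={2, 5},
        flagged={2, 5, 7},   # one false accusation lowers precision
        confidence=0.8,
        steps_used=9,
    )
    print(f"sketch reward: {sketch_reward(careful):.4f}")
```

With every component at its maximum, this sketch reaches 1.05, which matches the documented upper bound. The shaping of each term in the actual environment may differ.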