---
title: RecallTrace OpenEnv
emoji: 🚨
colorFrom: red
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
---
## 🚀 Quick Start (Run in one command)
```bash
pip install -r requirements.txt
python run_selfplay.py
```
*(No API keys, no GPUs, runs in <2 seconds on CPU)*
---
# RecallTrace: Causal Inference via Adversarial Self-Play
An RL agent that doesn't just learn to detect contamination: it learns to infer the hidden causal intervention behind it.
It is trained via adversarial self-play, where an adversary learns to hide better as the investigator learns to reason better.
---
## 🎥 What you'll see
- Agent improves from random (spray-and-pray) to precise, belief-calibrated quarantine.
- F1 score increases to ~1.0 over 200 episodes.
- The number of nodes quarantined drops from 8.3 to 3.1 per episode.
- Adversary adapts to agent weaknesses dynamically.
---
## 📊 Proof of Learning
### 1. The Learning Curves
*(Generated automatically when you run the script)*
![Training Curves](plots/selfplay_training.png)
### 2. Before vs After Behavior
*(Untrained vs Trained Agent Comparison)*
![Before vs After](plots/before_after_demo.png)
---
## 🧠 Why This Is Unique
1. **Causal Inference (not Graph Traversal)**: 30-50% of the graph edges are hidden. The agent must perform abductive reasoning to identify *which* hidden causal intervention (relabeling, mixing, record deletion) produced the observed contamination pattern.
2. **Partial Observability**: The agent relies on a probabilistic belief state (`P(contaminated)` per node) and tool calls to reduce entropy.
3. **Adversarial Self-Play (Theme 4)**: The environment's difficulty is not static. An adversary agent chooses where to place interventions, adapting its curriculum based on the investigator's failure modes.
4. **Belief-Based Decisions (Theme 3.1)**: Quarantines are only rewarded if the agent is confident (`P > 0.8`). Uncalibrated guesses are heavily penalized.
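A minimal sketch of how a per-node belief `P(contaminated)` could be updated from tool results and gated by the `P > 0.8` threshold. The function names, the sensitivity and false-positive values, and the use of a simple Bayes update are assumptions for illustration, not the repository's actual implementation.

```python
CONFIDENCE_THRESHOLD = 0.8  # from the README: quarantine only when P > 0.8


def update_belief(prior: float, positive: bool,
                  sensitivity: float = 0.9,
                  false_positive_rate: float = 0.1) -> float:
    """Bayes update of P(contaminated) after one noisy inspection result.

    sensitivity and false_positive_rate are hypothetical noise parameters.
    """
    if positive:
        like_contaminated, like_safe = sensitivity, false_positive_rate
    else:
        like_contaminated, like_safe = 1.0 - sensitivity, 1.0 - false_positive_rate
    numerator = like_contaminated * prior
    return numerator / (numerator + like_safe * (1.0 - prior))


def should_quarantine(belief: float) -> bool:
    """Act only once the belief clears the confidence threshold."""
    return belief > CONFIDENCE_THRESHOLD


# One positive inspection is not enough from a 0.3 prior; a second one is.
belief = 0.3
for result in (True, True):
    belief = update_belief(belief, result)
    print(f"belief={belief:.3f} quarantine={should_quarantine(belief)}")
```

Under these assumed noise rates, each tool call sharpens the belief, which is the sense in which inspections "reduce entropy" before the agent commits to a quarantine.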
---
## ⚙️ How It Works
- **The Environment**: A procedural generator builds a unique contamination propagation graph every episode with decoys, false positives, and hidden interventions.
- **The Investigator (Agent 1)**: Inspects nodes, traces lineages, and cross-references data to find contamination and quarantine it. Rewarded for precision and recall (+2.0 for correct, -1.5 for incorrect).
- **The Adversary (Agent 2)**: Chooses intervention types and placements. Rewarded exclusively when the Investigator fails.
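The Investigator's reward and the F1 metric reported in the plots can be sketched as follows. This assumes the +2.0 and -1.5 values apply per quarantined node and that nodes are identified by string IDs; both are illustrative assumptions, and the function names are hypothetical.

```python
CORRECT_REWARD = 2.0      # from the README: +2.0 for correct
INCORRECT_PENALTY = -1.5  # from the README: -1.5 for incorrect


def investigator_reward(quarantined: set[str], contaminated: set[str]) -> float:
    """Sum per-node rewards (assumed: one reward term per quarantined node)."""
    correct = len(quarantined & contaminated)
    incorrect = len(quarantined - contaminated)
    return CORRECT_REWARD * correct + INCORRECT_PENALTY * incorrect


def f1_score(quarantined: set[str], contaminated: set[str]) -> float:
    """Harmonic mean of precision and recall over quarantined nodes."""
    true_positives = len(quarantined & contaminated)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(quarantined)
    recall = true_positives / len(contaminated)
    return 2 * precision * recall / (precision + recall)


contaminated = {"A", "B", "C"}
quarantined = {"A", "B", "D"}  # two hits, one false positive, one miss
print(investigator_reward(quarantined, contaminated))
print(round(f1_score(quarantined, contaminated), 3))
```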
---
## 🧪 Reproducibility
- **Runs in <2 seconds on CPU.**
- **No external APIs or heavy models required.**
- **Deterministic seeds used** for exact evaluation and metric reproducibility.
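As an illustration of what seeded determinism means here, the sketch below hides a fixed fraction of graph edges using a dedicated random generator. The function name and the edge representation are hypothetical; the point is only that the same seed yields the same scenario.

```python
import random


def sample_hidden_edges(edges: list[tuple[str, str]],
                        hidden_fraction: float,
                        seed: int) -> list[tuple[str, str]]:
    """Pick which edges to hide, reproducibly for a given seed."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    count = round(len(edges) * hidden_fraction)
    return sorted(rng.sample(edges, count))


edges = [(f"n{i}", f"n{i + 1}") for i in range(10)]
first = sample_hidden_edges(edges, hidden_fraction=0.4, seed=7)
second = sample_hidden_edges(edges, hidden_fraction=0.4, seed=7)
print(first == second)  # same seed, same hidden edges
```

Using a local `random.Random(seed)` rather than the module-level functions keeps evaluation runs independent of any other code that draws random numbers.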
---
## 📦 Project Structure
```text
recalltrace-openenv/
├── run_selfplay.py        # ENTRY POINT
├── app.py                 # Hugging Face Gradio UI
├── README.md              # Project Story
├── PITCH.md               # 3-Minute Mentor Pitch Script
├── MENTOR_PREP.md         # Fast-prep for live judging
├── PITCH_LANGUAGE.md      # Language guidelines
├── architecture.html      # Visual Flow Diagram
│
├── selfplay/              # Core Logic (Investigator, Adversary, Tracker)
├── env/                   # Original OpenEnv Environment definition
│
├── plots/                 # Auto-generated Demo Imagery
│   ├── selfplay_training.png
│   ├── before_after_demo.png
│   └── episode_comparison.png
```
---
# 🚀 RecallTrace OpenEnv
RecallTrace is a **real-world AI environment** designed for **product recall tracing and precision containment**.
It simulates how companies handle:
- contaminated product recalls
- supply chain tracing
- selective quarantine decisions
This environment evaluates **agent reasoning + decision-making**, not just correctness.
---
# 🧠 What This Environment Does
Given a recall notice (e.g., *"Lot A is contaminated"*), the agent must:
1. Trace where the product went
2. Identify affected nodes (warehouses, stores)
3. Handle relabeling / transformations
4. Quarantine **only unsafe inventory**
5. Avoid blocking safe stock
6. Notify affected entities
7. Finalize with correct containment
---
# 🎯 Why This Is Important
This is a **real industry problem** seen in:
- food recalls
- pharma defects
- logistics failures
Challenges include:
- Graph traversal
- Partial observability
- Lot transformations
- Mixed inventory reasoning
- Precision decision-making
---
# 🧩 Tasks (Scenarios)
## 🔹 Easy: Direct Recall
- Single contaminated lot
- Straight supply chain
- Goal: trace and quarantine correctly
---
## 🔹 Medium: Relabeled Inventory
- Lot gets renamed (LotA → LotA1)
- Goal: track transformations and quarantine
---
## 🔹 Hard: Mixed Inventory
- Contaminated + safe stock mixed
- Goal: isolate unsafe quantity **without over-blocking**
---
# ⚙️ Action Space
| Action | Description |
|------|------------|
| inspect_node | View inventory at a node |
| trace_lot | Follow product lineage |
| quarantine | Block unsafe stock |
| notify | Inform affected nodes |
| finalize | End task |
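One way to represent the five actions in code is an enum plus a small immutable record. The action names come from the table above; the `Action` class and its `node_id` and `lot_id` fields are hypothetical and only illustrate the shape such a type might take.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionType(str, Enum):
    INSPECT_NODE = "inspect_node"  # view inventory at a node
    TRACE_LOT = "trace_lot"        # follow product lineage
    QUARANTINE = "quarantine"      # block unsafe stock
    NOTIFY = "notify"              # inform affected nodes
    FINALIZE = "finalize"          # end task


@dataclass(frozen=True)
class Action:
    """Hypothetical action record; the target fields are assumptions."""
    type: ActionType
    node_id: Optional[str] = None
    lot_id: Optional[str] = None


# Parsing the string form back into the enum validates the action name.
action = Action(type=ActionType("quarantine"), node_id="warehouse_1", lot_id="LotA")
print(action.type.value, action.node_id, action.lot_id)
```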
---
# 📦 Observation Structure
Each step returns:
- recall_notice
- inventory
- action history
- trace results
- inspection data
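The observation fields listed above could be modeled as a dataclass like the one below. The field names follow the list; the concrete types (for example, inventory as a mapping from node to lot quantities) are assumptions made for the sketch.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Observation:
    """Hypothetical container for one step's observation."""
    recall_notice: str
    inventory: dict[str, dict[str, int]] = field(default_factory=dict)
    action_history: list[str] = field(default_factory=list)
    trace_results: dict[str, list[str]] = field(default_factory=dict)
    inspection_data: dict[str, Any] = field(default_factory=dict)


obs = Observation(recall_notice="Lot A is contaminated")
obs.inventory["warehouse_1"] = {"LotA": 40}          # node -> lot -> quantity
obs.action_history.append("inspect_node:warehouse_1")
print(obs)
```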
---
# 🏆 Reward & Grading
### Reward System
- + Correct tracing
- + Correct quarantine
- + Correct notification
- − Wrong node
- − Over-quarantine
- − Missed unsafe stock
---
### Final Score
Range: **0.0 → 1.0**
Based on:
- accuracy
- precision
- efficiency
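The README does not state how the three components are combined. As a sketch only, the function below uses a weighted average with equal weights and clamps the result to the stated range; the weights, the function name, and the assumption that each component already lies in [0, 1] are all hypothetical.

```python
def final_score(accuracy: float, precision: float, efficiency: float,
                weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted average of three components, clamped to [0.0, 1.0].

    Equal weights are an assumption; the actual grader may differ.
    """
    components = (accuracy, precision, efficiency)
    if any(not 0.0 <= value <= 1.0 for value in components):
        raise ValueError("each component must lie in [0, 1]")
    total = sum(w * c for w, c in zip(weights, components)) / sum(weights)
    return min(1.0, max(0.0, total))


print(round(final_score(accuracy=0.9, precision=0.6, efficiency=0.3), 3))
```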
---
# 🧱 Project Structure
```text
recalltrace-openenv/
│
├── env/                 # Environment logic
│   ├── env.py
│   └── __init__.py
│
├── scenario/            # Scenario generation
│   └── scenario.py
│
├── grader/              # Evaluation + reward
│   └── grader.py
│
├── inference/           # Agent simulation
│   └── inference.py
│
├── config/
│   └── openenv.yaml
│
├── Dockerfile
├── requirements.txt
├── README.md
```
## 🧠 What the agent learns
- Early: quarantines 6–8 nodes randomly (F1 ~0.3)
- Mid: starts identifying patterns (F1 ~0.6)
- Late: infers intervention type before acting (F1 ~0.8)
The agent does not memorize; it infers hidden causal events under partial observability.