---
title: RecallTrace OpenEnv
emoji: 🚨
colorFrom: red
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
---

## 🚀 Quick Start (Run in one command)

```bash
pip install -r requirements.txt
python run_selfplay.py
```

*(No API keys, no GPUs, runs in <2 seconds on CPU)*

---

# RecallTrace: Causal Inference via Adversarial Self-Play

An RL agent that doesn't just learn to detect contamination: it learns to infer the hidden causal intervention behind it. Trained via adversarial self-play, where an adversary learns to hide better as the investigator learns to reason better.

---

## 🎥 What you'll see

- Agent improves from random (spray-and-pray) to precise, belief-calibrated quarantine.
- F1 score increases to ~1.0 over 200 episodes.
- Nodes quarantined drop from 8.3/episode to 3.1/episode.
- Adversary adapts to agent weaknesses dynamically.

---

## 📊 Proof of Learning

### 1. The Learning Curves

*(Generated automatically when you run the script)*

![Training Curves](plots/selfplay_training.png)

### 2. Before vs After Behavior

*(Untrained vs Trained Agent Comparison)*

![Before vs After](plots/before_after_demo.png)

---

## 🧠 Why This Is Unique

1. **Causal Inference (not Graph Traversal)**: 30-50% of the graph edges are hidden. The agent must perform abductive reasoning to identify *which* hidden causal intervention (relabeling, mixing, record deletion) produced the observed contamination pattern.
2. **Partial Observability**: The agent relies on a probabilistic belief state (`P(contaminated)` per node) and tool calls to reduce entropy.
3. **Adversarial Self-Play (Theme 4)**: The environment's difficulty is not static. An adversary agent chooses where to place interventions, adapting its curriculum based on the investigator's failure modes.
4. **Belief-Based Decisions (Theme 3.1)**: Quarantines are only rewarded if the agent is confident (`P > 0.8`). Uncalibrated guesses are heavily penalized.
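
The belief state and the confidence gate from points 2 and 4 can be sketched roughly as follows. This is an illustration only, not the project's code: the helper names (`bayes_update`, `binary_entropy`, `choose_quarantines`), the node names, and the likelihood values are hypothetical. Only the `P > 0.8` threshold comes from the text above.

```python
import math

CONFIDENCE_THRESHOLD = 0.8  # from the text: quarantine only when P > 0.8


def bayes_update(prior, p_obs_if_contaminated, p_obs_if_clean):
    """Posterior P(contaminated | observation) via Bayes' rule."""
    numerator = p_obs_if_contaminated * prior
    evidence = numerator + p_obs_if_clean * (1.0 - prior)
    return numerator / evidence if evidence > 0 else prior


def binary_entropy(p):
    """Uncertainty (in bits) of a single node's contamination belief."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def choose_quarantines(beliefs, threshold=CONFIDENCE_THRESHOLD):
    """Return the nodes whose belief strictly exceeds the threshold."""
    return sorted(node for node, p in beliefs.items() if p > threshold)


# Hypothetical belief state before any tool call.
beliefs = {"warehouse_a": 0.5, "store_b": 0.5, "store_c": 0.1}
entropy_before = binary_entropy(beliefs["warehouse_a"])

# Assumed tool result: an observation at warehouse_a that is far more
# likely if the node is contaminated (0.9) than if it is clean (0.1).
beliefs["warehouse_a"] = bayes_update(beliefs["warehouse_a"], 0.9, 0.1)
entropy_after = binary_entropy(beliefs["warehouse_a"])

to_quarantine = choose_quarantines(beliefs)
print(to_quarantine, round(entropy_before, 3), round(entropy_after, 3))
```

In this sketch the tool call lowers the node's entropy and pushes its belief past the threshold, so only that node is quarantined. `store_b` stays at 0.5 and is left alone until more evidence arrives.
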

---

## ⚙️ How It Works

- **The Environment**: A procedural generator builds a unique contamination propagation graph every episode with decoys, false positives, and hidden interventions.
- **The Investigator (Agent 1)**: Inspects nodes, traces lineages, and cross-references data to find contamination and quarantine it. Rewarded for precision and recall (+2.0 for correct, -1.5 for incorrect).
- **The Adversary (Agent 2)**: Chooses intervention types and placements. Rewarded exclusively when the Investigator fails.

---

## 🧪 Reproducibility

- **Runs in <2 seconds on CPU.**
- **No external APIs or heavy models required.**
- **Deterministic seeds used** for exact evaluation and metric reproducibility.

---

## 📦 Project Structure

```text
recalltrace-openenv/
├── run_selfplay.py        # ENTRY POINT
├── app.py                 # Hugging Face Gradio UI
├── README.md              # Project Story
├── PITCH.md               # 3-Minute Mentor Pitch Script
├── MENTOR_PREP.md         # Fast-prep for live judging
├── PITCH_LANGUAGE.md      # Language guidelines
├── architecture.html      # Visual Flow Diagram
│
├── selfplay/              # Core Logic (Investigator, Adversary, Tracker)
├── env/                   # Original OpenEnv Environment definition
│
├── plots/                 # Auto-generated Demo Imagery
│   ├── selfplay_training.png
│   ├── before_after_demo.png
│   └── episode_comparison.png
```

---

# 🚀 RecallTrace OpenEnv

RecallTrace is a **real-world AI environment** designed for **product recall tracing and precision containment**.

It simulates how companies handle:

- contaminated product recalls
- supply chain tracing
- selective quarantine decisions

This environment evaluates **agent reasoning + decision-making**, not just correctness.

---

# 🧠 What This Environment Does

Given a recall notice (e.g., *"Lot A is contaminated"*), the agent must:

1. Trace where the product went
2. Identify affected nodes (warehouses, stores)
3. Handle relabeling / transformations
4. Quarantine **only unsafe inventory**
5. Avoid blocking safe stock
6. Notify affected entities
7. Finalize with correct containment

---

# 🎯 Why This Is Important

This is a **real industry problem** seen in:

- food recalls
- pharma defects
- logistics failures

Challenges include:

- Graph traversal
- Partial observability
- Lot transformations
- Mixed inventory reasoning
- Precision decision-making

---

# 🧩 Tasks (Scenarios)

## 🔹 Easy: Direct Recall

- Single contaminated lot
- Straight supply chain
- Goal: trace and quarantine correctly

---

## 🔹 Medium: Relabeled Inventory

- Lot gets renamed (LotA → LotA1)
- Goal: track transformations and quarantine

---

## 🔹 Hard: Mixed Inventory

- Contaminated + safe stock mixed
- Goal: isolate unsafe quantity **without over-blocking**

---

# ⚙️ Action Space

| Action | Description |
|------|------------|
| inspect_node | View inventory at a node |
| trace_lot | Follow product lineage |
| quarantine | Block unsafe stock |
| notify | Inform affected nodes |
| finalize | End task |

---

# 📦 Observation Structure

Each step returns:

- recall_notice
- inventory
- action history
- trace results
- inspection data

---

# 🏆 Reward & Grading

### Reward System

- \+ Correct tracing
- \+ Correct quarantine
- \+ Correct notification
- − Wrong node
- − Over-quarantine
- − Missed unsafe stock

---

### Final Score

Range: **0.0 → 1.0**

Based on:

- accuracy
- precision
- efficiency

---

# 🧱 Project Structure

```text
recalltrace-openenv/
│
├── env/                # Environment logic
│   ├── env.py
│   └── __init__.py
│
├── scenario/           # Scenario generation
│   └── scenario.py
│
├── grader/             # Evaluation + reward
│   └── grader.py
│
├── inference/          # Agent simulation
│   └── inference.py
│
├── config/
│   └── openenv.yaml
│
├── Dockerfile
├── requirements.txt
├── README.md
```

## 🧠 What the agent learns

- Early: quarantines 6–8 nodes randomly (F1 ~0.3)
- Mid: starts identifying patterns (F1 ~0.6)
- Late: infers intervention type before acting (F1 ~0.8)

The agent does not memorize; it infers hidden causal events under partial observability.
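
The F1 figures above can be read as the overlap between the nodes the agent quarantines and the nodes that are actually contaminated. The sketch below assumes that set-based definition, since this README does not spell out how F1 is computed. The function name `quarantine_f1` and the node IDs are hypothetical.

```python
def quarantine_f1(quarantined, contaminated):
    """F1 between the quarantined node set and the truly contaminated set."""
    quarantined, contaminated = set(quarantined), set(contaminated)
    true_positives = len(quarantined & contaminated)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(quarantined)
    recall = true_positives / len(contaminated)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ground truth for one episode.
truth = {"n2", "n5", "n7"}

# Early-style behavior: many quarantines, few of them correct.
early = quarantine_f1({"n1", "n2", "n3", "n4", "n6", "n8", "n9"}, truth)

# Late-style behavior: few, well-targeted quarantines.
late = quarantine_f1({"n2", "n5", "n7"}, truth)

print(f"early F1 = {early:.2f}, late F1 = {late:.2f}")
```

Under this definition, quarantining broadly hurts precision and missing contaminated nodes hurts recall, so the score rises only when the agent quarantines fewer nodes and picks them correctly.
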