| --- |
| title: RecallTrace OpenEnv |
| emoji: 🚨 |
| colorFrom: red |
| colorTo: blue |
| sdk: docker |
| app_port: 7860 |
| pinned: false |
| --- |
| |
| ## Quick Start (Run in one command) |
|
|
| ```bash |
| pip install -r requirements.txt |
| python run_selfplay.py |
| ``` |
| *(No API keys, no GPUs, runs in <2 seconds on CPU)*
|
| ---
|
|
| # RecallTrace: Causal Inference via Adversarial Self-Play |
|
|
| An RL agent that doesn't just learn to detect contamination: it learns to infer the hidden causal intervention behind it.
|
|
| Trained via adversarial self-play, where an adversary learns to hide better as the investigator learns to reason better. |
|
|
| --- |
|
|
| ## 🔥 What you'll see
|
|
| - The agent improves from random (spray-and-pray) quarantining to precise, belief-calibrated quarantine.
| - F1 score rises to ~1.0 over 200 episodes.
| - Nodes quarantined per episode drop from 8.3 to 3.1.
| - The adversary adapts dynamically to the agent's weaknesses.
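
The F1 figure above balances precision and recall over the set of quarantined nodes. The following is a minimal sketch of that metric, assuming quarantine decisions and ground truth are both available as sets of node IDs. The function name `quarantine_f1` and the node labels are illustrative and are not taken from the repository.

```python
def quarantine_f1(quarantined: set[str], contaminated: set[str]) -> float:
    """F1 score of a quarantine decision against the truly contaminated nodes."""
    true_pos = len(quarantined & contaminated)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(quarantined)   # how many quarantines were justified
    recall = true_pos / len(contaminated)     # how much contamination was caught
    return 2 * precision * recall / (precision + recall)


# Hypothetical episode: 3 contaminated nodes out of 8.
truth = {"n1", "n2", "n3"}

# Spray-and-pray: everything is quarantined, so recall is perfect but precision is 3/8.
spray = {"n1", "n2", "n3", "n4", "n5", "n6", "n7", "n8"}
print(round(quarantine_f1(spray, truth), 3))  # 0.545

# Precise containment: exactly the contaminated nodes.
print(quarantine_f1(truth, truth))  # 1.0
```

Over-quarantining keeps recall high but drags precision down, which is why F1 and the number of nodes quarantined move in opposite directions as training progresses.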
|
|
| --- |
|
|
| ## Proof of Learning
|
|
| ### 1. The Learning Curves |
| *(Generated automatically when you run the script)* |
|
|
|  |
|
|
| ### 2. Before vs After Behavior |
| *(Untrained vs Trained Agent Comparison)* |
|
|
|  |
|
|
| --- |
|
|
| ## 🧠 Why This Is Unique
|
|
| 1. **Causal Inference (not Graph Traversal)**: 30-50% of the graph edges are hidden. The agent must perform abductive reasoning to identify *which* hidden causal intervention (relabeling, mixing, record deletion) produced the observed contamination pattern. |
| 2. **Partial Observability**: The agent relies on a probabilistic belief state (`P(contaminated)` per node) and tool calls to reduce entropy. |
| 3. **Adversarial Self-Play (Theme 4)**: The environment's difficulty is not static. An adversary agent chooses where to place interventions, adapting its curriculum based on the investigator's failure modes. |
| 4. **Belief-Based Decisions (Theme 3.1)**: Quarantines are only rewarded if the agent is confident (`P > 0.8`). Uncalibrated guesses are heavily penalized. |
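
To make the belief-state idea concrete, here is a rough sketch of a per-node `P(contaminated)` table, an entropy measure, and the `P > 0.8` quarantine gate. The Bayesian update and its true/false positive rates (0.9 and 0.1) are assumptions chosen for illustration, as are all function and node names; the project's actual update rule may differ.

```python
import math

CONFIDENCE_THRESHOLD = 0.8  # from the README: quarantine only when P > 0.8


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli belief P(contaminated) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def total_entropy(belief: dict[str, float]) -> float:
    """Sum of per-node entropies, treating nodes as independent."""
    return sum(binary_entropy(p) for p in belief.values())


def bayes_update(prior: float, positive: bool, tpr: float = 0.9, fpr: float = 0.1) -> float:
    """Posterior after one noisy inspection (tpr/fpr are assumed, not from the repo)."""
    like_contaminated = tpr if positive else 1 - tpr
    like_safe = fpr if positive else 1 - fpr
    evidence = like_contaminated * prior + like_safe * (1 - prior)
    return like_contaminated * prior / evidence


def nodes_to_quarantine(belief: dict[str, float]) -> list[str]:
    """Only nodes whose belief clears the confidence threshold are quarantined."""
    return sorted(n for n, p in belief.items() if p > CONFIDENCE_THRESHOLD)


belief = {"warehouse_a": 0.5, "store_b": 0.95, "store_c": 0.1}
before = total_entropy(belief)
print(nodes_to_quarantine(belief))  # ['store_b']

# A positive inspection of warehouse_a moves its belief from 0.5 to about 0.9.
belief["warehouse_a"] = bayes_update(belief["warehouse_a"], positive=True)
print(nodes_to_quarantine(belief))  # ['store_b', 'warehouse_a']
print(total_entropy(belief) < before)  # True
```

The point of the sketch is the ordering: a tool call reduces uncertainty first, and only then does the node become eligible for quarantine.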
|
|
| --- |
|
|
| ## ⚙️ How It Works
|
|
| - **The Environment**: A procedural generator builds a unique contamination propagation graph every episode with decoys, false positives, and hidden interventions. |
| - **The Investigator (Agent 1)**: Inspects nodes, traces lineages, and cross-references data to find contamination and quarantine it. Rewarded for precision and recall: +2.0 per correct quarantine, -1.5 per incorrect one.
| - **The Adversary (Agent 2)**: Chooses intervention types and placements. Rewarded exclusively when the Investigator fails. |
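
The reward structure described above can be sketched as follows. The +2.0 and -1.5 values come from the text; how the Adversary's reward is computed is not specified, so this sketch assumes (purely for illustration) that the Investigator "fails" whenever its quarantine set differs from the true contaminated set. Function names are hypothetical.

```python
CORRECT_QUARANTINE = 2.0     # from the README
INCORRECT_QUARANTINE = -1.5  # from the README


def investigator_reward(quarantined: set[str], contaminated: set[str]) -> float:
    """Sum of per-node rewards for one episode's quarantine decisions."""
    correct = len(quarantined & contaminated)
    incorrect = len(quarantined - contaminated)
    return CORRECT_QUARANTINE * correct + INCORRECT_QUARANTINE * incorrect


def adversary_reward(quarantined: set[str], contaminated: set[str]) -> float:
    """ASSUMPTION: the adversary scores 1.0 on any miss or false quarantine, else 0.0."""
    return 1.0 if quarantined != contaminated else 0.0


truth = {"n1", "n2", "n3"}

# Two correct quarantines, one wrong node, one contaminated node missed.
attempt = {"n1", "n2", "n4"}
print(investigator_reward(attempt, truth))  # 2.5
print(adversary_reward(attempt, truth))     # 1.0

# Perfect containment leaves the adversary with nothing.
print(investigator_reward(truth, truth))    # 6.0
print(adversary_reward(truth, truth))       # 0.0
```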
|
|
| --- |
|
|
| ## 🧪 Reproducibility
|
|
| - **Runs in <2 seconds on CPU.** |
| - **No external APIs or heavy models required.** |
| - **Deterministic seeds used** for exact evaluation and metric reproducibility. |
|
|
| --- |
|
|
| ## 📦 Project Structure
| ```text
| recalltrace-openenv/
| ├── run_selfplay.py        # ENTRY POINT
| ├── app.py                 # Hugging Face Gradio UI
| ├── README.md              # Project Story
| ├── PITCH.md               # 3-Minute Mentor Pitch Script
| ├── MENTOR_PREP.md         # Fast-prep for live judging
| ├── PITCH_LANGUAGE.md      # Language guidelines
| ├── architecture.html      # Visual Flow Diagram
| │
| ├── selfplay/              # Core Logic (Investigator, Adversary, Tracker)
| ├── env/                   # Original OpenEnv Environment definition
| │
| ├── plots/                 # Auto-generated Demo Imagery
| │   ├── selfplay_training.png
| │   ├── before_after_demo.png
| │   └── episode_comparison.png
| ```
| |
| # RecallTrace OpenEnv |
| |
| RecallTrace is a **real-world AI environment** designed for **product recall tracing and precision containment**. |
| |
| It simulates how companies handle: |
| - contaminated product recalls |
| - supply chain tracing |
| - selective quarantine decisions |
| |
| This environment evaluates **agent reasoning + decision-making**, not just correctness. |
| |
| --- |
| |
| # 🧠 What This Environment Does |
| |
| Given a recall notice (e.g., *"Lot A is contaminated"*), the agent must: |
| |
| 1. Trace where the product went |
| 2. Identify affected nodes (warehouses, stores) |
| 3. Handle relabeling / transformations |
| 4. Quarantine **only unsafe inventory** |
| 5. Avoid blocking safe stock |
| 6. Notify affected entities |
| 7. Finalize with correct containment |
| |
| --- |
| |
| # 🎯 Why This Is Important |
| |
| This is a **real industry problem** seen in: |
| - food recalls |
| - pharma defects |
| - logistics failures |
| |
| Challenges include: |
| - Graph traversal |
| - Partial observability |
| - Lot transformations |
| - Mixed inventory reasoning |
| - Precision decision-making |
| |
| --- |
| |
| # 🧩 Tasks (Scenarios) |
| |
| ## 🔹 Easy: Direct Recall |
| - Single contaminated lot |
| - Straight supply chain |
| - Goal: trace and quarantine correctly |
| |
| --- |
| |
| ## 🔹 Medium: Relabeled Inventory |
| - Lot gets renamed (LotA → LotA1) |
| - Goal: track transformations and quarantine |
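
Tracking a relabeled lot amounts to following a chain of rename events from the recalled lot ID. Below is a small sketch of that idea, assuming relabel events are available as `(old_id, new_id)` pairs. The function name and the IDs other than `LotA` and `LotA1` are hypothetical, and in the actual environment these events may be hidden and have to be inferred.

```python
def resolve_lot_aliases(recalled_lot: str, relabels: list[tuple[str, str]]) -> set[str]:
    """Return the recalled lot plus every lot ID reachable through relabel events."""
    children: dict[str, list[str]] = {}
    for old_id, new_id in relabels:
        children.setdefault(old_id, []).append(new_id)

    affected = {recalled_lot}
    stack = [recalled_lot]
    while stack:  # depth-first walk over the rename chain
        lot = stack.pop()
        for new_id in children.get(lot, []):
            if new_id not in affected:
                affected.add(new_id)
                stack.append(new_id)
    return affected


# LotA is renamed twice; LotB is an unrelated lot that must stay unblocked.
relabels = [("LotA", "LotA1"), ("LotA1", "LotA1-R"), ("LotB", "LotB1")]
print(sorted(resolve_lot_aliases("LotA", relabels)))  # ['LotA', 'LotA1', 'LotA1-R']
```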
| |
| --- |
| |
| ## 🔹 Hard: Mixed Inventory |
| - Contaminated + safe stock mixed |
| - Goal: isolate unsafe quantity **without over-blocking** |
| |
| --- |
| |
| # ⚙️ Action Space |
| |
| | Action | Description | |
| |------|------------| |
| | inspect_node | View inventory at a node | |
| | trace_lot | Follow product lineage | |
| | quarantine | Block unsafe stock | |
| | notify | Inform affected nodes | |
| | finalize | End task | |
| |
| --- |
| |
| # 📦 Observation Structure |
| |
| Each step returns: |
| |
| - recall_notice |
| - inventory |
| - action history |
| - trace results |
| - inspection data |
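
One way to picture the observation is as a small record with one field per item listed above. The sketch below is hypothetical: the field names mirror the list and the action names come from the Action Space table, but the field types and the `record` helper are assumptions, not the environment's actual schema.

```python
from dataclasses import dataclass, field

# Action names as listed in the Action Space table.
ACTIONS = ("inspect_node", "trace_lot", "quarantine", "notify", "finalize")


@dataclass
class Observation:
    recall_notice: str
    # ASSUMED shape: node -> lot ID -> quantity on hand.
    inventory: dict[str, dict[str, int]]
    action_history: list[str] = field(default_factory=list)
    trace_results: dict[str, list[str]] = field(default_factory=dict)
    inspection_data: dict[str, dict[str, int]] = field(default_factory=dict)

    def record(self, action: str) -> None:
        """Append an action to the history, rejecting names outside the action space."""
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        self.action_history.append(action)


obs = Observation(
    recall_notice="Lot A is contaminated",
    inventory={"warehouse_1": {"LotA": 40}, "store_7": {"LotA": 5, "LotB": 12}},
)
obs.record("inspect_node")
obs.record("trace_lot")
print(obs.action_history)  # ['inspect_node', 'trace_lot']
```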
| |
| --- |
|
|
| # Reward & Grading
|
|
| ### Reward System
| - + Correct tracing
| - + Correct quarantine
| - + Correct notification
| - − Wrong node
| - − Over-quarantine
| - − Missed unsafe stock
|
|
| --- |
|
|
| ### Final Score |
| Range: **0.0 to 1.0**
|
|
| Based on: |
| - accuracy |
| - precision |
| - efficiency |
|
|
| --- |
|
|
| # 🧱 Project Structure
|
|
| ```text
| recalltrace-openenv/
| │
| ├── env/                 # Environment logic
| │   ├── env.py
| │   └── __init__.py
| │
| ├── scenario/            # Scenario generation
| │   └── scenario.py
| │
| ├── grader/              # Evaluation + reward
| │   └── grader.py
| │
| ├── inference/           # Agent simulation
| │   └── inference.py
| │
| ├── config/
| │   └── openenv.yaml
| │
| ├── Dockerfile
| ├── requirements.txt
| └── README.md
| ```
|
|
| ## 🧠 What the agent learns
|
|
| - Early: quarantines 6–8 nodes randomly (F1 ~0.3)
| - Mid: starts identifying patterns (F1 ~0.6)
| - Late: infers intervention type before acting (F1 ~0.8)
|
|
| The agent does not memorize; it infers hidden causal events under partial observability.
|
|