title: RecallTrace OpenEnv
emoji: π¨
colorFrom: red
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
Quick Start (Run in one command)
pip install -r requirements.txt
python run_selfplay.py
(No API keys, no GPUs, runs in <2 seconds on CPU)
RecallTrace: Causal Inference via Adversarial Self-Play
An RL agent that doesn't just learn to detect contamination: it learns to infer the hidden causal intervention behind it.
Trained via adversarial self-play, where an adversary learns to hide better as the investigator learns to reason better.
What you'll see
- Agent improves from random (spray-and-pray) to precise, belief-calibrated quarantine.
- F1 score increases to ~1.0 over 200 episodes.
- Nodes quarantined drops from 8.3/episode to 3.1/episode.
- Adversary adapts to agent weaknesses dynamically.
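
The F1 figure above balances precision (how many quarantined nodes were truly contaminated) against recall (how many contaminated nodes were caught). A minimal sketch of that metric, assuming it is computed per episode over sets of node IDs; the function name `quarantine_f1` and the node IDs are illustrative and not taken from the repository:

```python
def quarantine_f1(quarantined: set, contaminated: set) -> float:
    """F1 of a quarantine decision against the true contaminated set."""
    true_positives = len(quarantined & contaminated)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(quarantined)
    recall = true_positives / len(contaminated)
    return 2 * precision * recall / (precision + recall)


truth = {"n0", "n1", "n2"}

# Spray-and-pray: 8 nodes quarantined, only 3 of them truly contaminated.
spray = quarantine_f1({f"n{i}" for i in range(8)}, truth)

# Precise: exactly the contaminated nodes are quarantined.
precise = quarantine_f1({"n0", "n1", "n2"}, truth)

print(f"spray={spray:.3f} precise={precise:.3f}")
```

Over-quarantining keeps recall at 1.0 but drags precision down, which is why quarantining fewer nodes and a rising F1 go together.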
Proof of Learning
1. The Learning Curves
(Generated automatically when you run the script)
2. Before vs After Behavior
(Untrained vs Trained Agent Comparison)
Why This Is Unique
- Causal Inference (not Graph Traversal): 30-50% of the graph edges are hidden. The agent must perform abductive reasoning to identify which hidden causal intervention (relabeling, mixing, record deletion) produced the observed contamination pattern.
- Partial Observability: The agent relies on a probabilistic belief state (`P(contaminated)` per node) and tool calls to reduce entropy.
- Adversarial Self-Play (Theme 4): The environment's difficulty is not static. An adversary agent chooses where to place interventions, adapting its curriculum based on the investigator's failure modes.
- Belief-Based Decisions (Theme 3.1): Quarantines are only rewarded if the agent is confident (`P > 0.8`). Uncalibrated guesses are heavily penalized.
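
The belief state and confidence gate can be sketched as follows. This is an illustration under assumptions, not the repository's implementation: beliefs are held as a plain dict of node to `P(contaminated)`, entropy is the binary Shannon entropy per node, and the helper names (`binary_entropy`, `most_uncertain`, `confident_quarantine`) are hypothetical. Only the 0.8 threshold comes from the text above.

```python
import math

CONFIDENCE_THRESHOLD = 0.8  # from the text: quarantine only when P > 0.8


def binary_entropy(p: float) -> float:
    """Shannon entropy in bits of a belief P(contaminated) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def most_uncertain(beliefs: dict) -> str:
    """Node with the highest entropy: a natural target for the next tool call."""
    return max(beliefs, key=lambda node: binary_entropy(beliefs[node]))


def confident_quarantine(beliefs: dict) -> list:
    """Nodes whose belief strictly exceeds the confidence threshold."""
    return sorted(n for n, p in beliefs.items() if p > CONFIDENCE_THRESHOLD)


# Hypothetical belief state for three nodes.
beliefs = {"warehouse_a": 0.95, "store_b": 0.50, "store_c": 0.10}

print(most_uncertain(beliefs))        # node to inspect next
print(confident_quarantine(beliefs))  # nodes safe to quarantine now
```

A node at `P = 0.5` carries one full bit of uncertainty, so inspecting it is worth more than re-checking a node the agent is already sure about.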
How It Works
- The Environment: A procedural generator builds a unique contamination propagation graph every episode with decoys, false positives, and hidden interventions.
- The Investigator (Agent 1): Inspects nodes, traces lineages, and cross-references data to find contamination and quarantine it. Rewarded for precision and recall (+2.0 for correct, -1.5 for incorrect).
- The Adversary (Agent 2): Chooses intervention types and placements. Rewarded exclusively when the Investigator fails.
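
The two reward signals above can be sketched in a few lines. The +2.0 and -1.5 values come from the text; everything else is an assumption made for illustration: the `Adversary` class, its win-count weighting, and the rule that only quarantine actions are scored (missed nodes are not penalized in this sketch). The intervention names follow the list given earlier (relabeling, mixing, record deletion).

```python
import random

CORRECT_REWARD = 2.0      # from the text
INCORRECT_PENALTY = -1.5  # from the text
INTERVENTIONS = ("relabeling", "mixing", "record_deletion")


def investigator_reward(quarantined: set, contaminated: set) -> float:
    """Sum of per-node rewards for one episode's quarantine decisions."""
    correct = len(quarantined & contaminated)
    incorrect = len(quarantined - contaminated)
    return CORRECT_REWARD * correct + INCORRECT_PENALTY * incorrect


class Adversary:
    """Hypothetical adversary that favours interventions which beat the investigator."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.wins = {name: 1.0 for name in INTERVENTIONS}  # uniform starting weights

    def choose(self) -> str:
        names = list(self.wins)
        weights = [self.wins[n] for n in names]
        return self.rng.choices(names, weights=weights)[0]

    def update(self, intervention: str, investigator_failed: bool) -> None:
        # Rewarded only when the investigator fails.
        if investigator_failed:
            self.wins[intervention] += 1.0


reward = investigator_reward({"a", "b", "x"}, {"a", "b", "c"})

adversary = Adversary(seed=0)
for _ in range(5):
    adversary.update("mixing", investigator_failed=True)
adversary.update("relabeling", investigator_failed=False)
```

After these updates the adversary samples `mixing` six times as often as either alternative, which is one simple way a curriculum can drift toward the investigator's weak spots.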
Reproducibility
- Runs in <2 seconds on CPU.
- No external APIs or heavy models required.
- Deterministic seeds used for exact evaluation and metric reproducibility.
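
Seeded generation can be illustrated with a small stand-in generator. This is a sketch, assuming each episode draws from its own `random.Random(seed)` instance; `generate_graph` and its output layout are hypothetical, and only the 30-50% hidden-edge range comes from this README.

```python
import random


def generate_graph(seed: int, n_nodes: int = 10) -> dict:
    """Build a small forward-only graph and hide 30-50% of its edges."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched

    # Each node ships to one randomly chosen later node.
    edges = [(f"n{i}", f"n{rng.randrange(i + 1, n_nodes)}") for i in range(n_nodes - 1)]

    hidden_fraction = rng.uniform(0.3, 0.5)
    n_hidden = round(hidden_fraction * len(edges))
    hidden = set(rng.sample(edges, n_hidden))

    return {
        "edges": edges,
        "hidden": hidden,
        "visible": [e for e in edges if e not in hidden],
    }


first = generate_graph(seed=42)
second = generate_graph(seed=42)
print(first == second)  # same seed, same episode
```

Keeping the generator on a local `Random` instance means evaluation runs stay reproducible even if other code touches the module-level RNG.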
Project Structure
recalltrace-openenv/
├── run_selfplay.py        # ENTRY POINT
├── app.py                 # Hugging Face Gradio UI
├── README.md              # Project Story
├── PITCH.md               # 3-Minute Mentor Pitch Script
├── MENTOR_PREP.md         # Fast-prep for live judging
├── PITCH_LANGUAGE.md      # Language guidelines
├── architecture.html      # Visual Flow Diagram
│
├── selfplay/              # Core Logic (Investigator, Adversary, Tracker)
├── env/                   # Original OpenEnv Environment definition
│
├── plots/                 # Auto-generated Demo Imagery
│   ├── selfplay_training.png
│   ├── before_after_demo.png
│   └── episode_comparison.png
RecallTrace OpenEnv
RecallTrace is a real-world AI environment designed for product recall tracing and precision containment.
It simulates how companies handle:
- contaminated product recalls
- supply chain tracing
- selective quarantine decisions
This environment evaluates agent reasoning + decision-making, not just correctness.
What This Environment Does
Given a recall notice (e.g., "Lot A is contaminated"), the agent must:
- Trace where the product went
- Identify affected nodes (warehouses, stores)
- Handle relabeling / transformations
- Quarantine only unsafe inventory
- Avoid blocking safe stock
- Notify affected entities
- Finalize with correct containment
Why This Is Important
This is a real industry problem seen in:
- food recalls
- pharma defects
- logistics failures
Challenges include:
- Graph traversal
- Partial observability
- Lot transformations
- Mixed inventory reasoning
- Precision decision-making
Tasks (Scenarios)
Easy: Direct Recall
- Single contaminated lot
- Straight supply chain
- Goal: trace and quarantine correctly
Medium: Relabeled Inventory
- Lot gets renamed (LotA → LotA1)
- Goal: track transformations and quarantine
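
Tracking a relabeled lot amounts to following a chain of renames. A minimal sketch, assuming relabel events are available as an old-name to new-name mapping; `trace_relabels` is a hypothetical helper, and every lot name other than `LotA` and `LotA1` is made up for the example:

```python
def trace_relabels(lot: str, relabels: dict) -> list:
    """Return every name a lot has carried, in order, starting from `lot`."""
    lineage = [lot]
    # Stop at the end of the chain, or if a rename would loop back on itself.
    while lineage[-1] in relabels and relabels[lineage[-1]] not in lineage:
        lineage.append(relabels[lineage[-1]])
    return lineage


# Hypothetical relabel log: old name -> new name.
relabels = {"LotA": "LotA1", "LotA1": "LotA2", "LotB": "LotB1"}

lineage = trace_relabels("LotA", relabels)
print(lineage)
```

Any node holding a name in `lineage` is a quarantine candidate, even though the recall notice only mentions the original lot.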
Hard: Mixed Inventory
- Contaminated + safe stock mixed
- Goal: isolate unsafe quantity without over-blocking
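
The trade-off in the hard scenario can be made concrete with unit counts. This sketch assumes a mixed lot is described by its contaminated and safe unit counts and that a quarantine removes contaminated units first; `MixedLot` and `containment_outcome` are hypothetical names, and the quantities are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MixedLot:
    lot_id: str
    contaminated_units: int
    safe_units: int


def containment_outcome(lot: MixedLot, quarantined_units: int) -> dict:
    """Count unsafe units left in circulation and safe units blocked needlessly."""
    missed = max(lot.contaminated_units - quarantined_units, 0)
    over_blocked = max(quarantined_units - lot.contaminated_units, 0)
    return {"missed": missed, "over_blocked": over_blocked}


lot = MixedLot("LotM", contaminated_units=40, safe_units=60)

block_everything = containment_outcome(lot, quarantined_units=100)
exact = containment_outcome(lot, quarantined_units=40)
too_little = containment_outcome(lot, quarantined_units=25)
```

Blocking the whole lot misses nothing but over-blocks all 60 safe units, while quarantining too little leaves unsafe stock on the shelf; only the exact quantity avoids both errors.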
Action Space
| Action | Description |
|---|---|
| inspect_node | View inventory at a node |
| trace_lot | Follow product lineage |
| quarantine | Block unsafe stock |
| notify | Inform affected nodes |
| finalize | End task |
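
One way to represent these five actions in code is an enum plus a small validated container. This is a sketch only: the action names come from the table above, while the `Action` dataclass, its `target` field, and `parse_action` are assumptions rather than the environment's actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionType(str, Enum):
    INSPECT_NODE = "inspect_node"
    TRACE_LOT = "trace_lot"
    QUARANTINE = "quarantine"
    NOTIFY = "notify"
    FINALIZE = "finalize"


@dataclass(frozen=True)
class Action:
    type: ActionType
    target: Optional[str] = None  # node or lot ID (hypothetical field)


def parse_action(raw: dict) -> Action:
    """Validate a raw action dict; unknown action names raise ValueError."""
    action_type = ActionType(raw["type"])
    target = raw.get("target")
    if action_type is not ActionType.FINALIZE and target is None:
        raise ValueError(f"{action_type.value} requires a target")
    return Action(action_type, target)


action = parse_action({"type": "quarantine", "target": "store_b"})
done = parse_action({"type": "finalize"})
```

Rejecting malformed actions at the boundary keeps invalid agent output from being silently treated as a no-op.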
Observation Structure
Each step returns:
- recall_notice
- inventory
- action history
- trace results
- inspection data
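
These fields could be carried in a single typed container. The field names below mirror the list above; the types, the nesting of `inventory` (node to lot to units), and the `Observation` class itself are assumptions for illustration and may differ from what the environment returns.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    recall_notice: str
    inventory: dict                                       # node -> {lot -> units}
    action_history: list = field(default_factory=list)    # actions taken so far
    trace_results: dict = field(default_factory=dict)     # lot -> nodes it reached
    inspection_data: dict = field(default_factory=dict)   # node -> inspection findings


obs = Observation(
    recall_notice="Lot A is contaminated",
    inventory={"warehouse_1": {"LotA": 120}, "store_3": {"LotA1": 30}},
)
obs.action_history.append("inspect_node:warehouse_1")
obs.trace_results["LotA"] = ["warehouse_1", "store_3"]
```

Using `default_factory` gives every observation its own history and result containers, so one episode's state cannot leak into the next.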
Reward & Grading
Reward System
- Rewarded: correct tracing
- Rewarded: correct quarantine
- Rewarded: correct notification
- Penalized: wrong node
- Penalized: over-quarantine
- Penalized: missed unsafe stock
Final Score
Range: 0.0 to 1.0
Based on:
- accuracy
- precision
- efficiency
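
How the three components combine is not specified here, so the following is only one plausible shape: a weighted mean of components that each lie in [0, 1]. The equal weights, the `final_score` and `step_efficiency` helpers, and the step-budget definition of efficiency are all assumptions; the actual grader may differ.

```python
def step_efficiency(steps_used: int, step_budget: int) -> float:
    """Hypothetical efficiency: fraction of the step budget left unused."""
    return max(0.0, 1.0 - steps_used / step_budget)


def final_score(accuracy: float, precision: float, efficiency: float,
                weights: tuple = (1.0, 1.0, 1.0)) -> float:
    """Weighted mean of the three components, guaranteed to stay in [0.0, 1.0]."""
    components = (accuracy, precision, efficiency)
    if any(not 0.0 <= c <= 1.0 for c in components):
        raise ValueError("each component must lie in [0, 1]")
    score = sum(w * c for w, c in zip(weights, components)) / sum(weights)
    return min(max(score, 0.0), 1.0)


efficiency = step_efficiency(steps_used=5, step_budget=20)
score = final_score(accuracy=1.0, precision=0.5, efficiency=efficiency)
print(f"{score:.2f}")
```

Because every component is bounded and the weights are normalized, the result cannot leave the stated 0.0 to 1.0 range regardless of the weights chosen.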
Project Structure
recalltrace-openenv/
│
├── env/                   # Environment logic
│   ├── env.py
│   └── __init__.py
│
├── scenario/              # Scenario generation
│   └── scenario.py
│
├── grader/                # Evaluation + reward
│   └── grader.py
│
├── inference/             # Agent simulation
│   └── inference.py
│
├── config/
│   └── openenv.yaml
│
├── Dockerfile
├── requirements.txt
└── README.md
What the agent learns
- Early: quarantines 6 to 8 nodes randomly (F1 ~0.3)
- Mid: starts identifying patterns (F1 ~0.6)
- Late: infers intervention type before acting (F1 ~0.8)
The agent does not memorize; it infers hidden causal events under partial observability.

