new_model / README.md
Ayu
feat: RecallTrace Tasks 1-9 complete - belief calibration + curriculum + plots
d19137b
metadata
title: RecallTrace OpenEnv
emoji: 🚨
colorFrom: red
colorTo: blue
sdk: docker
app_port: 7860
pinned: false

πŸš€ Quick Start (Run in one command)

pip install -r requirements.txt
python run_selfplay.py

(No API keys, no GPUs, runs in <2 seconds on CPU)

RecallTrace: Causal Inference via Adversarial Self-Play

An RL agent that doesn't just learn to detect contamination β€” it learns to infer the hidden causal intervention behind it.

Trained via adversarial self-play, where an adversary learns to hide better as the investigator learns to reason better.


πŸŽ₯ What you'll see

  • Agent improves from random (spray-and-pray) to precise, belief-calibrated quarantine.
  • F1 score increases to ~1.0 over 200 episodes.
  • Nodes quarantined drops from 8.3/episode to 3.1/episode.
  • Adversary adapts to agent weaknesses dynamically.

πŸ“Š Proof of Learning

1. The Learning Curves

(Generated automatically when you run the script)

Training Curves

2. Before vs After Behavior

(Untrained vs Trained Agent Comparison)

Before vs After


🧠 Why This Is Unique

  1. Causal Inference (not Graph Traversal): 30-50% of the graph edges are hidden. The agent must perform abductive reasoning to identify which hidden causal intervention (relabeling, mixing, record deletion) produced the observed contamination pattern.
  2. Partial Observability: The agent relies on a probabilistic belief state (P(contaminated) per node) and tool calls to reduce entropy.
  3. Adversarial Self-Play (Theme 4): The environment's difficulty is not static. An adversary agent chooses where to place interventions, adapting its curriculum based on the investigator's failure modes.
  4. Belief-Based Decisions (Theme 3.1): Quarantines are only rewarded if the agent is confident (P > 0.8). Uncalibrated guesses are heavily penalized.

βš™οΈ How It Works

  • The Environment: A procedural generator builds a unique contamination propagation graph every episode with decoys, false positives, and hidden interventions.
  • The Investigator (Agent 1): Inspects nodes, traces lineages, and cross-references data to find contamination and quarantine it. Rewarded for precision and recall (+2.0 for correct, -1.5 for incorrect).
  • The Adversary (Agent 2): Chooses intervention types and placements. Rewarded exclusively when the Investigator fails.

πŸ§ͺ Reproducibility

  • Runs in <2 seconds on CPU.
  • No external APIs or heavy models required.
  • Deterministic seeds used for exact evaluation and metric reproducibility.

πŸ“¦ Project Structure

recalltrace-openenv/
β”œβ”€β”€ run_selfplay.py        # ENTRY POINT
β”œβ”€β”€ app.py                 # Hugging Face Gradio UI
β”œβ”€β”€ README.md              # Project Story
β”œβ”€β”€ PITCH.md               # 3-Minute Mentor Pitch Script
β”œβ”€β”€ MENTOR_PREP.md         # Fast-prep for live judging
β”œβ”€β”€ PITCH_LANGUAGE.md      # Language guidelines
β”œβ”€β”€ architecture.html      # Visual Flow Diagram
β”‚
β”œβ”€β”€ selfplay/              # Core Logic (Investigator, Adversary, Tracker)
β”œβ”€β”€ env/                   # Original OpenEnv Environment definition
β”‚
β”œβ”€β”€ plots/                 # Auto-generated Demo Imagery
β”‚   β”œβ”€β”€ selfplay_training.png
β”‚   β”œβ”€β”€ before_after_demo.png
β”‚   └── episode_comparison.png

sdk: docker app_port: 7860

πŸš€ RecallTrace OpenEnv

RecallTrace is a real-world AI environment designed for product recall tracing and precision containment.

It simulates how companies handle:

  • contaminated product recalls
  • supply chain tracing
  • selective quarantine decisions

This environment evaluates agent reasoning + decision-making, not just correctness.


🧠 What This Environment Does

Given a recall notice (e.g., "Lot A is contaminated"), the agent must:

  1. Trace where the product went
  2. Identify affected nodes (warehouses, stores)
  3. Handle relabeling / transformations
  4. Quarantine only unsafe inventory
  5. Avoid blocking safe stock
  6. Notify affected entities
  7. Finalize with correct containment

🎯 Why This Is Important

This is a real industry problem seen in:

  • food recalls
  • pharma defects
  • logistics failures

Challenges include:

  • Graph traversal
  • Partial observability
  • Lot transformations
  • Mixed inventory reasoning
  • Precision decision-making

🧩 Tasks (Scenarios)

πŸ”Ή Easy β€” Direct Recall

  • Single contaminated lot
  • Straight supply chain
  • Goal: trace and quarantine correctly

πŸ”Ή Medium β€” Relabeled Inventory

  • Lot gets renamed (LotA β†’ LotA1)
  • Goal: track transformations and quarantine

πŸ”Ή Hard β€” Mixed Inventory

  • Contaminated + safe stock mixed
  • Goal: isolate unsafe quantity without over-blocking

βš™οΈ Action Space

Action Description
inspect_node View inventory at a node
trace_lot Follow product lineage
quarantine Block unsafe stock
notify Inform affected nodes
finalize End task

πŸ“¦ Observation Structure

Each step returns:

  • recall_notice
  • inventory
  • action history
  • trace results
  • inspection data

πŸ† Reward & Grading

Reward System

    • Correct tracing
    • Correct quarantine
    • Correct notification
  • βˆ’ Wrong node
  • βˆ’ Over-quarantine
  • βˆ’ Missed unsafe stock

Final Score

Range: 0.0 β†’ 1.0

Based on:

  • accuracy
  • precision
  • efficiency

🧱 Project Structure

recalltrace-openenv/
β”‚
β”œβ”€β”€ env/                # Environment logic
β”‚   β”œβ”€β”€ env.py
β”‚   └── __init__.py
β”‚
β”œβ”€β”€ scenario/           # Scenario generation
β”‚   └── scenario.py
β”‚
β”œβ”€β”€ grader/             # Evaluation + reward
β”‚   └── grader.py
β”‚
β”œβ”€β”€ inference/          # Agent simulation
β”‚   └── inference.py
β”‚
β”œβ”€β”€ config/
β”‚   └── openenv.yaml
β”‚
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ README.md

🧠 What the agent learns

  • Early: quarantines 6–8 nodes randomly (F1 ~0.3)
  • Mid: starts identifying patterns (F1 ~0.6)
  • Late: infers intervention type before acting (F1 ~0.8)

The agent does not memorize β€” it infers hidden causal events under partial observability.