Ayu

feat: RecallTrace Tasks 1-9 complete - belief calibration + curriculum + plots

d19137b about 1 month ago

metadata

title: RecallTrace OpenEnv
emoji: 🚨
colorFrom: red
colorTo: blue
sdk: docker
app_port: 7860
pinned: false

🚀 Quick Start (Run in one command)

pip install -r requirements.txt
python run_selfplay.py

(No API keys, no GPUs, runs in <2 seconds on CPU)

RecallTrace: Causal Inference via Adversarial Self-Play

An RL agent that doesn't just learn to detect contamination — it learns to infer the hidden causal intervention behind it.

Trained via adversarial self-play, where an adversary learns to hide better as the investigator learns to reason better.

🎥 What you'll see

Agent improves from random (spray-and-pray) to precise, belief-calibrated quarantine.
F1 score increases to ~1.0 over 200 episodes.
Nodes quarantined drops from 8.3/episode to 3.1/episode.
Adversary adapts to agent weaknesses dynamically.
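
The F1 figure above can be read as the harmonic mean of quarantine precision and recall. A minimal sketch, assuming each episode yields a set of quarantined nodes and a set of truly contaminated nodes; the function name `quarantine_f1` and the node labels are illustrative and not taken from the repository:

```python
def quarantine_f1(quarantined: set[str], contaminated: set[str]) -> float:
    """F1 of a quarantine decision against the ground-truth contaminated set."""
    true_pos = len(quarantined & contaminated)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(quarantined)
    recall = true_pos / len(contaminated)
    return 2 * precision * recall / (precision + recall)


# Hypothetical episodes: a spray-and-pray agent versus a precise one.
truth = {"w2", "s4", "s7"}
spray = quarantine_f1({"w1", "w2", "w3", "s4", "s5", "s6", "s8", "s9"}, truth)
precise = quarantine_f1({"w2", "s4", "s7"}, truth)
```

Quarantining many nodes can still find some contamination, but the low precision drags F1 down, which is why fewer quarantines per episode and a higher F1 tend to move together.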

📊 Proof of Learning

1. The Learning Curves

(Generated automatically when you run the script)

2. Before vs After Behavior

(Untrained vs Trained Agent Comparison)

🧠 Why This Is Unique

Causal Inference (not Graph Traversal): 30-50% of the graph edges are hidden. The agent must perform abductive reasoning to identify which hidden causal intervention (relabeling, mixing, record deletion) produced the observed contamination pattern.
Partial Observability: The agent relies on a probabilistic belief state (P(contaminated) per node) and tool calls to reduce entropy.
Adversarial Self-Play (Theme 4): The environment's difficulty is not static. An adversary agent chooses where to place interventions, adapting its curriculum based on the investigator's failure modes.
Belief-Based Decisions (Theme 3.1): A quarantine is rewarded only when the agent is confident that the node is contaminated (P > 0.8). Uncalibrated guesses are heavily penalized.
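
One way to picture the belief state is a per-node probability that is updated with Bayes' rule after each tool call and gated by the 0.8 threshold. The sketch below is an assumption about how this could work, not the project's code: the `sensitivity` and `false_positive_rate` values and all function names are hypothetical.

```python
import math

CONFIDENCE_THRESHOLD = 0.8  # from the text: quarantine only when P > 0.8


def update_belief(prior: float, positive: bool,
                  sensitivity: float = 0.9,
                  false_positive_rate: float = 0.1) -> float:
    """Bayes update of P(contaminated) after one noisy inspection result."""
    if positive:
        like_contam, like_safe = sensitivity, false_positive_rate
    else:
        like_contam, like_safe = 1.0 - sensitivity, 1.0 - false_positive_rate
    evidence = like_contam * prior + like_safe * (1.0 - prior)
    return like_contam * prior / evidence


def entropy_bits(p: float) -> float:
    """Binary entropy of a belief in bits; 0 when the agent is certain."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def should_quarantine(belief: float) -> bool:
    """Act only when the belief clears the confidence threshold."""
    return belief > CONFIDENCE_THRESHOLD


prior = 0.5                                      # uninformed belief for one node
posterior = update_belief(prior, positive=True)  # one positive inspection
```

With these assumed noise rates, a single positive inspection moves the belief from 0.5 to 0.9, lowering its entropy and clearing the threshold.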

⚙️ How It Works

The Environment: A procedural generator builds a unique contamination propagation graph every episode with decoys, false positives, and hidden interventions.
The Investigator (Agent 1): Inspects nodes, traces lineages, and cross-references data to find contamination and quarantine it. Rewarded for precision and recall (+2.0 for each correct quarantine, -1.5 for each incorrect one).
The Adversary (Agent 2): Chooses intervention types and placements. Rewarded exclusively when the Investigator fails.
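
The two reward signals can be sketched as follows. The +2.0 and -1.5 values come from the description above; everything else is an assumption for illustration, including the binary adversary payoff, the class name `FailureWeightedAdversary`, and the rule of sampling intervention types in proportion to past investigator failures. Placement selection is left out.

```python
import random

CORRECT_REWARD = 2.0       # from the text
INCORRECT_PENALTY = -1.5   # from the text
INTERVENTIONS = ("relabeling", "mixing", "record_deletion")


def investigator_reward(quarantined: set[str], contaminated: set[str]) -> float:
    """Sum of per-node rewards for one episode's quarantine decisions."""
    correct = len(quarantined & contaminated)
    incorrect = len(quarantined - contaminated)
    return CORRECT_REWARD * correct + INCORRECT_PENALTY * incorrect


def adversary_reward(quarantined: set[str], contaminated: set[str]) -> float:
    """Assumed payoff: 1.0 only when the containment is not exact."""
    return 0.0 if quarantined == contaminated else 1.0


class FailureWeightedAdversary:
    """Samples intervention types in proportion to observed investigator failures."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        # Counts start at 1 so every intervention type keeps being explored.
        self.failures = {name: 1 for name in INTERVENTIONS}

    def choose(self) -> str:
        names = list(self.failures)
        weights = [self.failures[name] for name in names]
        return self.rng.choices(names, weights=weights, k=1)[0]

    def observe(self, intervention: str, investigator_failed: bool) -> None:
        if investigator_failed:
            self.failures[intervention] += 1
```

Under this scheme, an intervention type that repeatedly fools the investigator is sampled more often, which is one simple way to realize a curriculum driven by failure modes.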

🧪 Reproducibility

Runs in <2 seconds on CPU.
No external APIs or heavy models required.
Deterministic seeds used for exact evaluation and metric reproducibility.
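
Seeded generation is what makes a procedurally generated episode repeatable. A minimal sketch, assuming a generator that draws every random choice from a local `random.Random(seed)`; the function `generate_graph`, its parameters, and the integer node IDs are hypothetical, and only the 30-50% hidden-edge range comes from this README:

```python
import random


def generate_graph(seed: int, n_nodes: int = 10, edge_prob: float = 0.3):
    """Build a random DAG of shipment edges and hide a share of them."""
    rng = random.Random(seed)  # local RNG: independent of global random state
    edges = [(a, b)
             for a in range(n_nodes)
             for b in range(a + 1, n_nodes)
             if rng.random() < edge_prob]
    hidden_fraction = rng.uniform(0.3, 0.5)  # 30-50% of edges are hidden
    n_hidden = round(hidden_fraction * len(edges))
    hidden = set(rng.sample(edges, n_hidden))
    visible = [edge for edge in edges if edge not in hidden]
    return visible, hidden


# The same seed always reproduces the same episode graph.
first = generate_graph(seed=42)
second = generate_graph(seed=42)
```

Passing a different seed per training episode gives varied graphs, while a fixed list of evaluation seeds gives identical graphs on every run.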

📦 Project Structure

recalltrace-openenv/
├── run_selfplay.py        # ENTRY POINT
├── app.py                 # Hugging Face Gradio UI
├── README.md              # Project Story
├── PITCH.md               # 3-Minute Mentor Pitch Script
├── MENTOR_PREP.md         # Fast-prep for live judging
├── PITCH_LANGUAGE.md      # Language guidelines
├── architecture.html      # Visual Flow Diagram
│
├── selfplay/              # Core Logic (Investigator, Adversary, Tracker)
├── env/                   # Original OpenEnv Environment definition
│
├── plots/                 # Auto-generated Demo Imagery
│   ├── selfplay_training.png
│   ├── before_after_demo.png
│   └── episode_comparison.png


🚀 RecallTrace OpenEnv

RecallTrace is a real-world AI environment designed for product recall tracing and precision containment.

It simulates how companies handle:

contaminated product recalls
supply chain tracing
selective quarantine decisions

This environment evaluates an agent's reasoning and decision-making, not just correctness.

🧠 What This Environment Does

Given a recall notice (e.g., "Lot A is contaminated"), the agent must:

Trace where the product went
Identify affected nodes (warehouses, stores)
Handle relabeling / transformations
Quarantine only unsafe inventory
Avoid blocking safe stock
Notify affected entities
Finalize with correct containment
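
The tracing step amounts to a graph walk from the recall source. A minimal sketch, assuming the supply chain is available as an adjacency mapping from each node to the nodes it ships to; the function `trace_downstream` and the node names are illustrative only:

```python
from collections import deque


def trace_downstream(shipments: dict[str, list[str]], source: str) -> list[str]:
    """Breadth-first walk over shipment edges, returning nodes in visit order."""
    seen = {source}
    order = []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in shipments.get(node, []):
            if nxt not in seen:  # a store fed by two warehouses is visited once
                seen.add(nxt)
                queue.append(nxt)
    return order


shipments = {
    "plant": ["warehouse_a", "warehouse_b"],
    "warehouse_a": ["store_1", "store_2"],
    "warehouse_b": ["store_2"],
}
reached = trace_downstream(shipments, "plant")
```

Every node in `reached` is a candidate for inspection; whether it actually needs quarantine depends on what its inventory holds.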

🎯 Why This Is Important

This is a real industry problem seen in:

food recalls
pharma defects
logistics failures

Challenges include:

Graph traversal
Partial observability
Lot transformations
Mixed inventory reasoning
Precision decision-making

🧩 Tasks (Scenarios)

🔹 Easy — Direct Recall

Single contaminated lot
Straight supply chain
Goal: trace and quarantine correctly

🔹 Medium — Relabeled Inventory

Lot gets renamed (LotA → LotA1)
Goal: track transformations and quarantine
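
Tracking a renamed lot means following relabel records transitively. A minimal sketch, assuming relabels are stored as a mapping from an original lot ID to the IDs derived from it; the function `resolve_lineage` and the lot `LotA1-R` are hypothetical:

```python
def resolve_lineage(relabels: dict[str, list[str]], recalled_lot: str) -> set[str]:
    """Return the recalled lot plus every lot ID derived from it by relabeling."""
    affected = set()
    stack = [recalled_lot]
    while stack:
        lot = stack.pop()
        if lot in affected:
            continue  # guards against cyclic or duplicate relabel records
        affected.add(lot)
        stack.extend(relabels.get(lot, []))
    return affected


relabels = {"LotA": ["LotA1"], "LotA1": ["LotA1-R"], "LotB": ["LotB1"]}
affected = resolve_lineage(relabels, "LotA")
```

Lots descended from `LotB` stay out of the result, so unrelated inventory is not pulled into the recall.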

🔹 Hard — Mixed Inventory

Contaminated + safe stock mixed
Goal: isolate unsafe quantity without over-blocking
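
Isolating the unsafe quantity can be sketched as filtering each node's inventory by lot. This assumes inventory is tracked per node as a mapping from lot ID to unit count, which the README does not specify; `quarantine_plan` and the sample data are illustrative:

```python
def quarantine_plan(inventory: dict[str, dict[str, int]],
                    unsafe_lots: set[str]) -> dict[str, dict[str, int]]:
    """For each node, list only the unsafe lot quantities that should be blocked."""
    plan = {}
    for node, lots in inventory.items():
        unsafe = {lot: qty for lot, qty in lots.items()
                  if lot in unsafe_lots and qty > 0}
        if unsafe:  # nodes holding only safe stock are left out entirely
            plan[node] = unsafe
    return plan


inventory = {
    "store_1": {"LotA": 40, "LotC": 60},  # mixed: contaminated and safe stock
    "store_2": {"LotC": 100},             # safe stock only
}
plan = quarantine_plan(inventory, unsafe_lots={"LotA"})
```

Here only 40 of the 100 units at `store_1` are blocked, and `store_2` is untouched, which is the behavior the over-blocking penalty is meant to encourage.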

⚙️ Action Space

Action	Description
inspect_node	View inventory at a node
trace_lot	Follow product lineage
quarantine	Block unsafe stock
notify	Inform affected nodes
finalize	End task
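
The five actions in the table could be represented as a small typed structure. The action names come from the table; the `Action` dataclass, its `node_id` and `lot_id` fields, and the validation rules are assumptions made for this sketch rather than the environment's actual schema:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionType(str, Enum):
    INSPECT_NODE = "inspect_node"
    TRACE_LOT = "trace_lot"
    QUARANTINE = "quarantine"
    NOTIFY = "notify"
    FINALIZE = "finalize"


@dataclass(frozen=True)
class Action:
    type: ActionType
    node_id: Optional[str] = None  # hypothetical: target node, if any
    lot_id: Optional[str] = None   # hypothetical: target lot, if any

    def validate(self) -> None:
        """Raise ValueError when a required target is missing."""
        node_actions = (ActionType.INSPECT_NODE, ActionType.QUARANTINE,
                        ActionType.NOTIFY)
        if self.type in node_actions and self.node_id is None:
            raise ValueError(f"{self.type.value} requires node_id")
        if self.type is ActionType.TRACE_LOT and self.lot_id is None:
            raise ValueError("trace_lot requires lot_id")


action = Action(ActionType("quarantine"), node_id="store_1")
action.validate()  # passes silently
```

Validating before the environment step keeps malformed agent output from being confused with a wrong but well-formed decision.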

📦 Observation Structure

Each step returns:

recall_notice
inventory
action history
trace results
inspection data
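
As a rough picture of the observation, the five fields listed above could be grouped in a dataclass. The field names follow the list; the field types and the class name `Observation` are assumptions, since the README does not state them:

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Observation:
    recall_notice: str
    # Assumed shape: node ID -> {lot ID -> unit count}
    inventory: dict[str, dict[str, int]] = field(default_factory=dict)
    # Assumed shape: action names in the order they were taken
    action_history: list[str] = field(default_factory=list)
    # Assumed shape: lot ID -> nodes reached by tracing that lot
    trace_results: dict[str, list[str]] = field(default_factory=dict)
    # Assumed shape: node ID -> whatever the last inspection returned
    inspection_data: dict[str, Any] = field(default_factory=dict)


obs = Observation(recall_notice="Lot A is contaminated")
obs.action_history.append("inspect_node")
```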

🏆 Reward & Grading

Reward System

+ Correct tracing
+ Correct quarantine
+ Correct notification
− Wrong node
− Over-quarantine
− Missed unsafe stock

Final Score

Range: 0.0 → 1.0

Based on:

accuracy
precision
efficiency
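
The README names the three components but not how they are combined. One plausible form is a weighted mean clamped to the stated range; the weights below and the function name `final_score` are assumptions for illustration only:

```python
def final_score(accuracy: float, precision: float, efficiency: float,
                weights: tuple[float, float, float] = (0.4, 0.4, 0.2)) -> float:
    """Weighted mean of the three components, clamped to [0.0, 1.0]."""
    parts = (accuracy, precision, efficiency)
    total = sum(w * p for w, p in zip(weights, parts)) / sum(weights)
    return max(0.0, min(1.0, total))


# Hypothetical episode: fully accurate, slightly over-quarantined, some wasted steps.
score = final_score(accuracy=1.0, precision=0.75, efficiency=0.5)
```

Dividing by the sum of the weights keeps the result in range even if the weights are changed and no longer add up to 1.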

🧱 Project Structure

recalltrace-openenv/
│
├── env/                # Environment logic
│   ├── env.py
│   └── __init__.py
│
├── scenario/           # Scenario generation
│   └── scenario.py
│
├── grader/             # Evaluation + reward
│   └── grader.py
│
├── inference/          # Agent simulation
│   └── inference.py
│
├── config/
│   └── openenv.yaml
│
├── Dockerfile
├── requirements.txt
└── README.md

🧠 What the agent learns

Early: quarantines 6–8 nodes randomly (F1 ~0.3)
Mid: starts identifying patterns (F1 ~0.6)
Late: infers intervention type before acting (F1 ~0.8)

The agent does not memorize — it infers hidden causal events under partial observability.