
title: Autonomy Calibration Hub
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
app_port: 7860

Epistemic Agency Hub: Autonomy Calibration Environment

🏆 OpenEnv India Hackathon 2026 Official Submission

The Epistemic Agency Hub is a specialized reinforcement learning benchmark designed to evaluate an agent's ability to manage uncertainty through Calibrated Autonomy.

Unlike traditional RL benchmarks, which reward only task execution, this environment requires "Epistemic Actions", specifically the INVESTIGATE behavior, in which an agent must resolve informational gaps before committing to high-stakes decisions.

🏗️ Core Framework: Investigate-then-Act

The environment implements a calibration-first workflow to reduce agent overconfidence:

Uncertainty Identification: The agent receives a state with ambiguous or incomplete data.
Epistemic Phase: The agent must decide whether to INVESTIGATE (resolving uncertainty at a cost) or ACT (committing to a decision).
Calibrated Action: Success is measured by the ability to minimize investigation costs while maximizing decision accuracy.
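
The workflow above can be sketched as a simple threshold policy. This is a minimal illustration, not the environment's actual code: the `BeliefState` class, the 0.8 confidence threshold, and the rule that each investigation closes half of the remaining uncertainty are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class BeliefState:
    # Hypothetical fields: the agent's confidence in its current best
    # decision, and how many investigations it has paid for so far.
    confidence: float
    investigations: int = 0


def choose_action(state: BeliefState, threshold: float = 0.8) -> str:
    """Return 'INVESTIGATE' while confidence is below the threshold, else 'ACT'."""
    return "INVESTIGATE" if state.confidence < threshold else "ACT"


def investigate(state: BeliefState, gain: float = 0.5) -> BeliefState:
    """Toy update: each investigation closes a fixed fraction of the remaining uncertainty."""
    new_confidence = state.confidence + gain * (1.0 - state.confidence)
    return BeliefState(new_confidence, state.investigations + 1)


def run_episode(initial_confidence: float, threshold: float = 0.8, max_steps: int = 10):
    """Investigate until confident enough (or out of steps), then stop at ACT."""
    state = BeliefState(initial_confidence)
    trace = []
    for _ in range(max_steps):
        action = choose_action(state, threshold)
        trace.append(action)
        if action == "ACT":
            break
        state = investigate(state)
    return trace, state


trace, final_state = run_episode(0.5)
print(trace)                       # ['INVESTIGATE', 'INVESTIGATE', 'ACT']
print(final_state.investigations)  # 2
```

A well-calibrated agent in this toy setup pays for exactly as many investigations as it needs to cross the threshold; a lower threshold saves investigation cost but increases the chance of acting on a wrong belief.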

🛠️ Technical Implementation

🧠 Action Space & Behavior

OpenEnv Compliance: Fully compliant with the latest OpenEnv API specifications.
Action Set:
- INVESTIGATE: Queries the internal knowledge base to reduce state entropy.
- ACT: Executes the final decision based on the current belief state.
- RECOVER: Error-handling mechanism for miscalibrated decisions.
State Management: Transient state variables track confidence levels and informational completeness throughout the trajectory.
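
As a rough sketch of how the action set and the transient state might be represented, the snippet below defines the three actions as an enum and tracks confidence and informational completeness in a small dataclass. The names `EpisodeState` and `valid_actions`, and the masking rules (RECOVER only after a failed decision, INVESTIGATE only while information is missing), are assumptions for illustration and are not taken from the environment's source.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    INVESTIGATE = "INVESTIGATE"  # query for more information
    ACT = "ACT"                  # commit to a final decision
    RECOVER = "RECOVER"          # handle a miscalibrated decision


@dataclass
class EpisodeState:
    # Hypothetical transient variables tracked along the trajectory.
    confidence: float            # 0.0 (no idea) to 1.0 (certain)
    completeness: float          # fraction of relevant information observed
    last_decision_failed: bool = False


def valid_actions(state: EpisodeState) -> set:
    """Assumed masking rule: which actions make sense in the given state."""
    if state.last_decision_failed:
        return {Action.RECOVER}
    actions = {Action.ACT}
    if state.completeness < 1.0:
        # There is still something left to learn, so investigating is allowed.
        actions.add(Action.INVESTIGATE)
    return actions


print(sorted(a.name for a in valid_actions(EpisodeState(0.4, 0.5))))
# ['ACT', 'INVESTIGATE']
```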

⚖️ Reward Model (GRPO)

We train the agent with Group Relative Policy Optimization (GRPO), using a reward built from three components:

Causal Merit Reward: Awarded for investigation steps that lead to an accurate decision.
Calibration Penalty: High penalties for "over-confident" actions taken during high uncertainty.
Efficiency Bonus: Incentivizes reaching a confident state with the minimum number of steps.
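
To make the three components concrete, here is a minimal sketch of a per-episode reward followed by the group-relative normalization that GRPO applies to a batch of sampled episodes (each reward minus the group mean, divided by the group standard deviation). The function names, weights, and the 0.8 confidence threshold are hypothetical values chosen for the example; the benchmark's actual reward shaping may differ.

```python
from statistics import mean, pstdev


def episode_reward(correct: bool, confidence_at_act: float, n_investigations: int,
                   merit: float = 1.0, penalty: float = 1.0,
                   max_bonus: float = 0.5, step_cost: float = 0.125,
                   conf_threshold: float = 0.8) -> float:
    """Combine the three components (all weights are illustrative assumptions)."""
    reward = 0.0
    # Causal merit: investigation happened and the final decision was correct.
    if correct and n_investigations > 0:
        reward += merit
    # Calibration penalty: the agent acted while still highly uncertain.
    if confidence_at_act < conf_threshold:
        reward -= penalty
    # Efficiency bonus: shrinks with every investigation step, never negative.
    reward += max(0.0, max_bonus - step_cost * n_investigations)
    return reward


def group_relative_advantages(rewards, eps: float = 1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


print(episode_reward(True, 0.875, 2))   # 1.25  (merit + reduced efficiency bonus)
print(episode_reward(False, 0.5, 0))    # -0.5  (penalty + full efficiency bonus)
```

Because advantages are computed relative to the group, an episode is reinforced only when it scores better than its siblings sampled from the same starting state, which removes the need for a separate learned value function.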

📈 Performance Evidence & Metrics

Our trained agent converges during the GRPO calibration phase and outperforms the baseline on all three metrics:

Metric	Baseline	Calibrated Agent (v2)	Improvement
Epistemic Success Rate	64%	92%	+28 pp
Avg. Reward	0.42	0.87	+107%
Risk Incidents	12	2	-83%
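
The Improvement column can be checked with a few lines of arithmetic. Note that the success-rate gain of 28 is a difference in percentage points (the relative change from 64% to 92% is about 44%), while the other two rows are relative changes. The helper names below are introduced only for this check.

```python
def relative_change_pct(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def percentage_point_change(before_pct: float, after_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return after_pct - before_pct


print(percentage_point_change(64, 92))          # 28
print(round(relative_change_pct(64, 92), 2))    # 43.75
print(round(relative_change_pct(0.42, 0.87)))   # 107
print(round(relative_change_pct(12, 2)))        # -83
```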

🏆 Submission Artifacts

Hugging Face Space: Live Benchmark Hub
Trained Weights: autonomy-agent-v2
Documentation:
- 📖 Technical Case Study (Blog)
- 🚀 Step-by-Step Walkthrough
Reproducibility:

🚀 Deployment and Setup

Local Development

# Install dependencies
pip install -r requirements.txt

# Start the dashboard
uvicorn main:app --port 7860

Production Build (Docker)

docker build -t autonomy-calibration-hub .
docker run -p 7860:7860 autonomy-calibration-hub

MIT License - OpenEnv India 2026.