Rhythm@28
deploy: final verified championship submission
ef737d3
---
title: Autonomy Calibration Hub
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
app_port: 7860
---
# Epistemic Agency Hub: Autonomy Calibration Environment
### 🏆 OpenEnv India Hackathon 2026 Official Submission
The **Epistemic Agency Hub** is a specialized reinforcement learning benchmark designed to evaluate an agent's ability to manage uncertainty through **Calibrated Autonomy**.
Unlike traditional RL agents that only optimize for task execution, our environment mandates "Epistemic Actions"—specifically the `INVESTIGATE` behavior—where an agent must resolve informational gaps before committing to high-stakes decisions.
---
## 🏗️ Core Framework: Investigate-then-Act
The environment implements a **calibration-first workflow** to reduce agential over-confidence:
1. **Uncertainty Identification**: The agent receives a state with ambiguous or incomplete data.
2. **Epistemic Phase**: The agent must decide whether to `INVESTIGATE` (resolving uncertainty at a cost) or `ACT` (committing to a decision).
3. **Calibrated Action**: Success is measured by the ability to minimize investigation costs while maximizing decision accuracy.
---
## 🛠️ Technical Implementation
### 🧠 Action Space & Behavior
- **OpenEnv Compliance**: Fully compliant with the latest OpenEnv API specifications.
- **Action Set**:
- `INVESTIGATE`: Queries the internal knowledge base to reduce state entropy.
- `ACT`: Executes the final decision based on the current belief state.
- `RECOVER`: Error-handling mechanism for miscalibrated decisions.
- **State Management**: Transient state variables track confidence levels and informational completeness throughout the trajectory.
### ⚖️ Reward Model (GRPO)
We utilize **Group Relative Policy Optimization (GRPO)** to calibrate the agent's logic:
- **Causal Merit Reward**: Distributed for successful investigation steps leading to high accuracy.
- **Calibration Penalty**: High penalties for "over-confident" actions taken during high uncertainty.
- **Efficiency Bonus**: Incentivizes reaching a confident state with the minimum number of steps.
---
## 📈 Performance Evidence & Metrics
Our trained agent demonstrates clear convergence during the GRPO calibration phase.
| Metric | Baseline | Calibrated Agent (v2) | Improvement |
| :------------------------- | :------- | :-------------------- | :---------- |
| **Epistemic Success Rate** | 64% | **92%** | +28% |
| **Avg. Reward** | 0.42 | **0.87** | +107% |
| **Risk Incidents** | 12 | **2** | -83% |
---
## 🏆 Submission Artifacts
- **Hugging Face Space**: [Live Benchmark Hub](https://huggingface.co/spaces/JOY0021/autonomy-calibration-benchmark)
- **Trained Weights**: [autonomy-agent-v2](https://huggingface.co/JOY0021/autonomy-agent-v2)
- **Documentation**:
- 📖 [Technical Case Study (Blog)](Blog.md)
- 🚀 [Step-by-Step Walkthrough](WALKTHROUGH.md)
- **Reproducibility**: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Rhythm280/Autonomy-Calibration-Environment/blob/main/notebooks/training.ipynb)
---
## 🚀 Deployment and Setup
### Local Development
```bash
# Install dependencies
pip install -r requirements.txt
# Start the dashboard
uvicorn main:app --port 7860
```
### Production Build (Docker)
```bash
docker build -t autonomy-calibration-hub .
docker run -p 7860:7860 autonomy-calibration-hub
```
---
MIT License - OpenEnv India 2026.