---
title: Stack Doctor Environment Server
emoji: 🩺
colorFrom: red
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
---
# Stack Doctor
An OpenEnv RL environment where an overseer LLM diagnoses sick inference stacks. The agent probes subsystems, reconciles conflicting specialist-agent reports (some of which are wrong), and selects the minimal correct fix, all within a 6-step budget.
Inspired by real SM12x enablement bugs across vLLM, FlashInfer, SGLang, CUTLASS, and Flash-Attention.
**Track**: Statement 3.1 (World Modeling / Professional Tasks)
**Sub-theme**: Fleet AI: Scalable Oversight Agents ($10K)
## Quick Start
```python
from stack_doctor import StackDoctorEnv, StackDoctorAction
import json
env = StackDoctorEnv(base_url="https://bledden-stack-doctor.hf.space")
env.connect()
# Start a new incident
result = env.reset()
print(result.observation.incident_ticket)
print(result.observation.specialist_opinions)
# Investigate
result = env.step(StackDoctorAction(message=json.dumps(
{"type": "inspect", "target": "logs"}
)))
print(result.observation.output)
# Submit diagnosis
result = env.step(StackDoctorAction(message=json.dumps(
{"type": "submit", "root_cause": "arch_guard", "fix": "relax_arch_check"}
)))
print(f"Reward: {result.reward}, Done: {result.done}")
env.close()
```
## Environment Design
### Root Causes (6) and Fixes (6)
| Root Cause | Fix | Real-World Motif |
|-----------|-----|-----------------|
| `arch_guard` | `relax_arch_check` | FlashInfer SM121 capability checks |
| `backend_whitelist` | `add_whitelist_entry` | vLLM Marlin SM121+ whitelist gaps |
| `runtime_loader` | `fix_runtime_path` | SGLang CUDA 13 runtime issues |
| `backend_selector` | `switch_backend` | CUTLASS dispatch mistakes |
| `model_config` | `update_model_config` | Model config mismatches on new hardware |
| `weight_layout` | `fix_weight_mapping` | Weight layout problems across backends |
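The table above defines a one-to-one mapping, which scripted baselines can exploit directly. A minimal sketch (the constant name `ROOT_CAUSE_TO_FIX` is illustrative, not part of the package):

```python
# Hypothetical helper: the canonical root-cause -> fix mapping from the
# table above, useful for scripted baselines or client-side validation.
ROOT_CAUSE_TO_FIX = {
    "arch_guard": "relax_arch_check",
    "backend_whitelist": "add_whitelist_entry",
    "runtime_loader": "fix_runtime_path",
    "backend_selector": "switch_backend",
    "model_config": "update_model_config",
    "weight_layout": "fix_weight_mapping",
}

def fix_for(root_cause: str) -> str:
    """Return the fix paired with a diagnosed root cause."""
    return ROOT_CAUSE_TO_FIX[root_cause]
```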
### Specialists (4)
`runtime`, `dispatch`, `kernel`, `loader`. At least one gives wrong advice per scenario.
### Action Space (JSON)
```json
{"type":"inspect","target":"logs|config|snippet|metrics"}
{"type":"ask_specialist","specialist":"runtime|dispatch|kernel|loader"}
{"type":"apply_fix","fix":"<one of 6 fixes>"}
{"type":"submit","root_cause":"<one of 6>","fix":"<one of 6>"}
```
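Since every action is a JSON string, typos in targets or specialist names waste budget. A small set of illustrative builder functions (not part of the `stack_doctor` package) that serialize to the schema above and fail fast on invalid values:

```python
import json

# Valid enumerations taken from the action-space schema above.
VALID_TARGETS = {"logs", "config", "snippet", "metrics"}
VALID_SPECIALISTS = {"runtime", "dispatch", "kernel", "loader"}

def inspect(target: str) -> str:
    """Serialize an inspect action, rejecting unknown targets."""
    assert target in VALID_TARGETS, f"unknown target: {target}"
    return json.dumps({"type": "inspect", "target": target})

def ask_specialist(specialist: str) -> str:
    """Serialize an ask_specialist action, rejecting unknown specialists."""
    assert specialist in VALID_SPECIALISTS, f"unknown specialist: {specialist}"
    return json.dumps({"type": "ask_specialist", "specialist": specialist})

def submit(root_cause: str, fix: str) -> str:
    """Serialize a final diagnosis submission."""
    return json.dumps({"type": "submit", "root_cause": root_cause, "fix": fix})
```

Each builder's output can be passed straight into `StackDoctorAction(message=...)` as in the Quick Start.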
### Reward Function
| Event | Reward |
|-------|--------|
| `inspect` or `ask_specialist` | -0.25 |
| Correct `apply_fix` | +3 |
| Wrong `apply_fix` | -2 |
| Correct `submit` (per field) | +8 |
| Wrong `submit` (per field) | -4 |
| Solved in ≤ 4 steps | +2 bonus |
| Invalid action | -2 |
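The table can be mirrored as a small offline scorer for sanity-checking action traces. The server's reward is authoritative; this is only a sketch with hypothetical event names:

```python
# Per-event rewards mirroring the table above ("probe" covers both
# inspect and ask_specialist; submit rewards are applied per field).
EVENT_REWARDS = {
    "probe": -0.25,
    "apply_fix_correct": 3.0,
    "apply_fix_wrong": -2.0,
    "submit_field_correct": 8.0,
    "submit_field_wrong": -4.0,
    "invalid_action": -2.0,
}

def score_episode(events, steps_used, solved):
    """Sum per-event rewards, plus the speed bonus when solved in <= 4 steps."""
    total = sum(EVENT_REWARDS[e] for e in events)
    if solved and steps_used <= 4:
        total += 2.0
    return total
```

For example, a one-shot correct submit scores 8 + 8 + 2 = 18, matching the Oracle baseline's average reward below.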
### Baselines
| Policy | RC Accuracy | Fix Accuracy | Avg Steps | Avg Reward |
|--------|:-:|:-:|:-:|:-:|
| Oracle | 100% | 100% | 1.0 | 18.0 |
| Heuristic | 100% | 100% | 4.0 | 20.5 |
| Random | 18% | 18% | 3.2 | -4.1 |
## Fleet AI: Specialist Oversight
The core mechanic that targets Fleet AI's $10K sub-theme: the agent must act as a **scalable oversight agent** that reconciles conflicting specialist reports. Specialists have per-scenario reliability, so the agent cannot learn "always trust specialist X" and must evaluate evidence on each case.
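One simple oversight heuristic (illustrative only, not the shipped baseline): trust a specialist's claimed root cause only when inspected evidence corroborates it, and fall back to a plain majority vote otherwise.

```python
# Hypothetical reconciliation heuristic: prefer claims that the inspected
# logs corroborate; fall back to the most common claim when they settle
# nothing. Inputs: {specialist_name: claimed_root_cause} and raw log text.
def reconcile(opinions: dict, log_text: str) -> str:
    corroborated = [rc for rc in opinions.values() if rc in log_text]
    if corroborated:
        return max(set(corroborated), key=corroborated.count)
    claims = list(opinions.values())
    return max(set(claims), key=claims.count)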
## Training
Uses Unsloth + TRL GRPO with 3 reward signals:
1. **Valid JSON**: can the output be parsed as an action plan?
2. **Environment reward**: cumulative reward from executing the plan
3. **Efficiency**: bonus for shorter plans that still submit correctly
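Signals 1 and 3 can be computed without touching the environment. A sketch of what such reward functions might look like (names, signatures, and the exact scaling are illustrative, not the actual training code):

```python
import json

def valid_json_reward(completion: str) -> float:
    """Signal 1 (sketch): reward completions that parse as a JSON action plan."""
    try:
        plan = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    # Full credit for a list of actions, partial for any other valid JSON.
    return 1.0 if isinstance(plan, list) else 0.5

def efficiency_reward(plan: list, solved: bool) -> float:
    """Signal 3 (sketch): bonus for shorter plans that still end in a correct
    submit. The 6 comes from the environment's 6-step budget."""
    return max(0.0, 6 - len(plan)) if solved else 0.0
```

Signal 2 is the cumulative environment reward from rolling the plan out against the server, as in the Quick Start.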
## Development
```bash
# Local server
cd stack_doctor && PYTHONPATH=. uvicorn server.app:app --port 8000
# Run baselines
PYTHONPATH=. python3 -c "from server.baselines import *; ..."
# Deploy to HF Spaces
openenv push --repo-id bledden/stack-doctor
```