---
title: Stack Doctor Environment Server
emoji: 🩺
colorFrom: red
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv
---
Stack Doctor
An OpenEnv RL environment where an overseer LLM diagnoses sick inference stacks. The agent probes subsystems, reconciles conflicting specialist-agent reports (some of which are wrong), and selects the minimal correct fix — all within a 6-step budget.
Inspired by real SM12x enablement bugs across vLLM, FlashInfer, SGLang, CUTLASS, and Flash-Attention.
- Track: Statement 3.1: World Modeling / Professional Tasks
- Sub-theme: Fleet AI: Scalable Oversight Agents ($10K)
Quick Start
```python
import json

from stack_doctor import StackDoctorAction, StackDoctorEnv

env = StackDoctorEnv(base_url="https://bledden-stack-doctor.hf.space")
env.connect()

# Start a new incident
result = env.reset()
print(result.observation.incident_ticket)
print(result.observation.specialist_opinions)

# Investigate
result = env.step(StackDoctorAction(message=json.dumps(
    {"type": "inspect", "target": "logs"}
)))
print(result.observation.output)

# Submit diagnosis
result = env.step(StackDoctorAction(message=json.dumps(
    {"type": "submit", "root_cause": "arch_guard", "fix": "relax_arch_check"}
)))
print(f"Reward: {result.reward}, Done: {result.done}")

env.close()
```
Environment Design
Root Causes (6) and Fixes (6)
| Root Cause | Fix | Real-World Motif |
|---|---|---|
| `arch_guard` | `relax_arch_check` | FlashInfer SM121 capability checks |
| `backend_whitelist` | `add_whitelist_entry` | vLLM Marlin SM121+ whitelist gaps |
| `runtime_loader` | `fix_runtime_path` | SGLang CUDA 13 runtime issues |
| `backend_selector` | `switch_backend` | CUTLASS dispatch mistakes |
| `model_config` | `update_model_config` | Model config mismatches on new hardware |
| `weight_layout` | `fix_weight_mapping` | Weight layout problems across backends |
Specialists (4)
runtime, dispatch, kernel, loader — at least one gives wrong advice per scenario.
Action Space (JSON)
```json
{"type":"inspect","target":"logs|config|snippet|metrics"}
{"type":"ask_specialist","specialist":"runtime|dispatch|kernel|loader"}
{"type":"apply_fix","fix":"<one of 6 fixes>"}
{"type":"submit","root_cause":"<one of 6>","fix":"<one of 6>"}
```
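A client-side check can catch malformed messages before they cost the -2 invalid-action penalty listed under Reward Function. The sketch below is hypothetical (`validate_action` and `SCHEMA` are not part of `stack_doctor`); it mirrors only the schemas and value sets documented in this README, ignores extra fields, and may be stricter or looser than the server's own validation.

```python
import json

# Value sets copied from the Action Space and Root Causes/Fixes tables.
INSPECT_TARGETS = {"logs", "config", "snippet", "metrics"}
SPECIALISTS = {"runtime", "dispatch", "kernel", "loader"}
ROOT_CAUSES = {"arch_guard", "backend_whitelist", "runtime_loader",
               "backend_selector", "model_config", "weight_layout"}
FIXES = {"relax_arch_check", "add_whitelist_entry", "fix_runtime_path",
         "switch_backend", "update_model_config", "fix_weight_mapping"}

# Required fields per action type, and the values each field may take.
SCHEMA = {
    "inspect": {"target": INSPECT_TARGETS},
    "ask_specialist": {"specialist": SPECIALISTS},
    "apply_fix": {"fix": FIXES},
    "submit": {"root_cause": ROOT_CAUSES, "fix": FIXES},
}


def validate_action(message: str) -> tuple[bool, str]:
    """Hypothetical client-side check; returns (ok, reason)."""
    try:
        action = json.loads(message)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc.msg}"
    if not isinstance(action, dict):
        return False, "action must be a JSON object"
    kind = action.get("type")
    fields = SCHEMA.get(kind) if isinstance(kind, str) else None
    if fields is None:
        return False, f"unknown action type: {kind!r}"
    for name, allowed in fields.items():
        value = action.get(name)
        # isinstance guard: lists/dicts are unhashable and cannot be set members.
        if not isinstance(value, str) or value not in allowed:
            return False, f"bad or missing {name!r}: {value!r}"
    return True, "ok"
```

For example, `validate_action(json.dumps({"type": "inspect", "target": "logs"}))` returns `(True, "ok")`, while a `submit` missing its `fix` field is rejected with a reason string.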
Reward Function
| Event | Reward |
|---|---|
| `inspect` or `ask_specialist` | -0.25 |
| Correct `apply_fix` | +3 |
| Wrong `apply_fix` | -2 |
| Correct `submit` (per field) | +8 |
| Wrong `submit` (per field) | -4 |
| Solved in ≤4 steps | +2 bonus |
| Invalid action | -2 |
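The table can be read as a per-step scoring rule. The sketch below is a hypothetical re-implementation for offline reasoning (`step_reward`, `episode_reward`, and the `truth` dict are not part of `stack_doctor`). It assumes that the speed bonus requires both submitted fields to be correct, that an episode ends at the first `submit`, and that actions beyond the 6-step budget are never executed; only unknown action types are scored as invalid.

```python
# Per-event rewards, taken from the Reward Function table.
PROBE_COST = -0.25                      # inspect or ask_specialist
FIX_CORRECT, FIX_WRONG = 3.0, -2.0      # apply_fix
FIELD_CORRECT, FIELD_WRONG = 8.0, -4.0  # submit, per field
FAST_BONUS, FAST_LIMIT = 2.0, 4         # "Solved in <= 4 steps"
INVALID = -2.0
MAX_STEPS = 6                           # step budget from the overview

FIELDS = ("root_cause", "fix")


def step_reward(action: dict, truth: dict) -> float:
    """Score one action against the hidden answer, e.g.
    truth = {"root_cause": "arch_guard", "fix": "relax_arch_check"} (hypothetical)."""
    kind = action.get("type")
    if kind in ("inspect", "ask_specialist"):
        return PROBE_COST
    if kind == "apply_fix":
        return FIX_CORRECT if action.get("fix") == truth["fix"] else FIX_WRONG
    if kind == "submit":
        return sum(FIELD_CORRECT if action.get(f) == truth[f] else FIELD_WRONG
                   for f in FIELDS)
    return INVALID  # assumption: only unknown types are modeled as invalid


def episode_reward(actions: list, truth: dict) -> float:
    """Total reward for a trajectory (assumptions noted inline)."""
    total = 0.0
    # Assumption: actions past the 6-step budget are never executed.
    for step, action in enumerate(actions[:MAX_STEPS], start=1):
        total += step_reward(action, truth)
        if action.get("type") == "submit":
            # Assumption: the speed bonus requires both fields to be correct.
            if step <= FAST_LIMIT and all(action.get(f) == truth[f] for f in FIELDS):
                total += FAST_BONUS
            break  # assumption: submit ends the episode
    return total
```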
Baselines
| Policy | Root Cause Accuracy | Fix Accuracy | Avg Steps | Avg Reward |
|---|---|---|---|---|
| Oracle | 100% | 100% | 1.0 | 18.0 |
| Heuristic | 100% | 100% | 4.0 | 20.5 |
| Random | 18% | 18% | 3.2 | -4.1 |
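The first two rows can be checked against the Reward Function table. The Oracle row follows from one correct `submit`. For the Heuristic, the decomposition below assumes a 4-step trajectory of two probes, a correct `apply_fix`, and a correct `submit`; that sequence is inferred because it reproduces 20.5 exactly, not documented behavior of the heuristic policy.

$$
R_{\text{Oracle}} = \underbrace{8 + 8}_{\text{correct submit}} + \underbrace{2}_{\text{solved in} \le 4 \text{ steps}} = 18
$$

$$
R_{\text{Heuristic}} = \underbrace{2 \times (-0.25)}_{\text{two probes}} + \underbrace{3}_{\text{correct apply\_fix}} + \underbrace{8 + 8}_{\text{correct submit}} + 2 = 20.5
$$

Under that reading, the Heuristic outscores the Oracle because the +3 for a correct `apply_fix` outweighs the 0.5 spent on probes.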
Fleet AI: Specialist Oversight
This is the core mechanic targeting Fleet AI's $10K sub-theme: the agent must act as a scalable oversight agent that reconciles conflicting specialist reports. Specialist reliability varies per scenario, so the agent cannot learn "always trust specialist X" and must weigh the evidence in each case.
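One way an overseer might combine specialist reports with evidence is sketched below. It is a hypothetical heuristic, not the environment's internals or a trained policy: `reconcile`, the evidence scores, and the `prior` weight are all assumptions. Each specialist's vote is weighted by how strongly this case's inspected evidence supports its claim, so trust is re-earned per scenario rather than fixed per specialist.

```python
def reconcile(opinions: dict, evidence: dict, prior: float = 0.1) -> str:
    """Choose a root cause by weighing specialist claims against this case's evidence.

    opinions: specialist name -> root cause it blames (hypothetical format).
    evidence: root cause -> support in [0, 1] derived from inspected logs, config,
              snippets, or metrics (hypothetical; scoring is up to the agent).
    prior:    small baseline trust so specialists still break ties when no
              evidence has been gathered yet (illustrative value).
    """
    scores = dict(evidence)
    for claim in opinions.values():
        # Per-case trust: a vote is worth the prior plus the evidence for its claim,
        # so a specialist contradicted by the observations adds only the prior.
        scores[claim] = scores.get(claim, 0.0) + prior + evidence.get(claim, 0.0)
    if not scores:
        raise ValueError("nothing to reconcile")
    return max(scores, key=scores.get)
```

If three specialists blame `model_config` but the inspected evidence only supports `weight_layout`, this rule picks `weight_layout`: a majority of unsupported claims cannot outvote the observations.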
Training
Training uses Unsloth and TRL's GRPO trainer with three reward signals:
- Valid JSON — can the output be parsed as an action plan?
- Environment reward — cumulative reward from executing the plan
- Efficiency — bonus for shorter plans that still submit correctly
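The three signals could be written as TRL-style reward functions, each taking a batch of completions and returning one float per completion. This sketch assumes plain-string completions, a plan format of a non-empty JSON list of action objects, dataset columns named `root_cause` and `fix` (which GRPOTrainer forwards to reward functions as keyword arguments), and a hypothetical `run_plan` callable that plays a plan against the environment. The scaling of each reward is illustrative, not the project's actual values.

```python
import json
from typing import Callable, List, Optional


def parse_plan(text: str) -> Optional[List[dict]]:
    """Parse a completion as a non-empty JSON list of action objects, else None."""
    try:
        plan = json.loads(text)
    except json.JSONDecodeError:
        return None
    if isinstance(plan, list) and plan and all(isinstance(a, dict) for a in plan):
        return plan
    return None


def valid_json_reward(completions: List[str], **kwargs) -> List[float]:
    """Signal 1: 1.0 if the completion parses as an action plan, else 0.0."""
    return [1.0 if parse_plan(c) is not None else 0.0 for c in completions]


def make_env_reward(run_plan: Callable[[List[dict]], float]):
    """Signal 2: wrap a hypothetical run_plan(plan) -> cumulative env reward."""
    def env_reward(completions: List[str], **kwargs) -> List[float]:
        rewards = []
        for c in completions:
            plan = parse_plan(c)
            # Unparseable output gets the table's invalid-action penalty (assumption).
            rewards.append(run_plan(plan) if plan is not None else -2.0)
        return rewards
    return env_reward


def efficiency_reward(completions: List[str], root_cause: List[str],
                      fix: List[str], max_steps: int = 6, **kwargs) -> List[float]:
    """Signal 3: larger for shorter plans whose final submit is fully correct.

    root_cause / fix are assumed dataset columns, one entry per completion.
    """
    rewards = []
    for c, rc, fx in zip(completions, root_cause, fix):
        plan = parse_plan(c)
        last = plan[-1] if plan else {}
        correct = (last.get("type") == "submit"
                   and last.get("root_cause") == rc and last.get("fix") == fx)
        # 1.0 for a one-step plan, down to 1/max_steps at the budget, 0 beyond it.
        rewards.append(max(0.0, (max_steps - len(plan) + 1) / max_steps)
                       if correct else 0.0)
    return rewards
```

These would be passed together through GRPOTrainer's `reward_funcs` argument, for example `reward_funcs=[valid_json_reward, make_env_reward(run_plan), efficiency_reward]`.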
Development
```bash
# Local server
cd stack_doctor && PYTHONPATH=. uvicorn server.app:app --port 8000

# Run baselines
PYTHONPATH=. python3 -c "from server.baselines import *; ..."

# Deploy to HF Spaces
openenv push --repo-id bledden/stack-doctor
```