---
title: Stack Doctor Environment Server
emoji: 🩺
colorFrom: red
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
---
# Stack Doctor
An OpenEnv RL environment where an overseer LLM diagnoses sick inference stacks. The agent probes subsystems, reconciles conflicting specialist-agent reports (some of which are wrong), and selects the minimal correct fix — all within a 6-step budget.
Inspired by real SM12x enablement bugs across vLLM, FlashInfer, SGLang, CUTLASS, and Flash-Attention.
**Track**: Statement 3.1 — World Modeling / Professional Tasks
**Sub-theme**: Fleet AI — Scalable Oversight Agents ($10K)
## Quick Start
```python
from stack_doctor import StackDoctorEnv, StackDoctorAction
import json
env = StackDoctorEnv(base_url="https://bledden-stack-doctor.hf.space")
env.connect()
# Start a new incident
result = env.reset()
print(result.observation.incident_ticket)
print(result.observation.specialist_opinions)
# Investigate
result = env.step(StackDoctorAction(message=json.dumps(
{"type": "inspect", "target": "logs"}
)))
print(result.observation.output)
# Submit diagnosis
result = env.step(StackDoctorAction(message=json.dumps(
{"type": "submit", "root_cause": "arch_guard", "fix": "relax_arch_check"}
)))
print(f"Reward: {result.reward}, Done: {result.done}")
env.close()
```
## Environment Design
### Root Causes (6) and Fixes (6)
| Root Cause | Fix | Real-World Motif |
|-----------|-----|-----------------|
| `arch_guard` | `relax_arch_check` | FlashInfer SM121 capability checks |
| `backend_whitelist` | `add_whitelist_entry` | vLLM Marlin SM121+ whitelist gaps |
| `runtime_loader` | `fix_runtime_path` | SGLang CUDA 13 runtime issues |
| `backend_selector` | `switch_backend` | CUTLASS dispatch mistakes |
| `model_config` | `update_model_config` | Model config mismatches on new hardware |
| `weight_layout` | `fix_weight_mapping` | Weight layout problems across backends |
### Specialists (4)
`runtime`, `dispatch`, `kernel`, `loader`. In every scenario, at least one of them gives wrong advice.
### Action Space (JSON)
```json
{"type":"inspect","target":"logs|config|snippet|metrics"}
{"type":"ask_specialist","specialist":"runtime|dispatch|kernel|loader"}
{"type":"apply_fix","fix":"<one of 6 fixes>"}
{"type":"submit","root_cause":"<one of 6>","fix":"<one of 6>"}
```
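Because an invalid action costs -2, it can help to check an action string on the client before sending it. The following is a minimal sketch, assuming the allowed values are exactly those listed in the tables and action schema above. `validate_action` and `SCHEMA` are hypothetical names that are not part of the `stack_doctor` package, and the server's own validation may be stricter or looser.

```python
import json

# Allowed values copied from the tables and action schema in this README.
TARGETS = {"logs", "config", "snippet", "metrics"}
SPECIALISTS = {"runtime", "dispatch", "kernel", "loader"}
ROOT_CAUSES = {"arch_guard", "backend_whitelist", "runtime_loader",
               "backend_selector", "model_config", "weight_layout"}
FIXES = {"relax_arch_check", "add_whitelist_entry", "fix_runtime_path",
         "switch_backend", "update_model_config", "fix_weight_mapping"}

# Required fields per action type, with the values each field may take.
SCHEMA = {
    "inspect": {"target": TARGETS},
    "ask_specialist": {"specialist": SPECIALISTS},
    "apply_fix": {"fix": FIXES},
    "submit": {"root_cause": ROOT_CAUSES, "fix": FIXES},
}


def validate_action(message: str) -> bool:
    """Hypothetical client-side check; not the server's implementation."""
    try:
        action = json.loads(message)
    except json.JSONDecodeError:
        return False
    if not isinstance(action, dict):
        return False
    kind = action.get("type")
    if not isinstance(kind, str) or kind not in SCHEMA:
        return False
    fields = SCHEMA[kind]
    # Assumption: no extra or missing keys are tolerated.
    if set(action) != {"type", *fields}:
        return False
    return all(
        isinstance(action[key], str) and action[key] in allowed
        for key, allowed in fields.items()
    )
```

A string that passes this check can be handed to `StackDoctorAction(message=...)` as in the Quick Start.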
### Reward Function
| Event | Reward |
|-------|--------|
| `inspect` or `ask_specialist` | -0.25 |
| Correct `apply_fix` | +3 |
| Wrong `apply_fix` | -2 |
| Correct `submit` (per field) | +8 |
| Wrong `submit` (per field) | -4 |
| Solved in ≤4 steps | +2 bonus |
| Invalid action | -2 |
### Baselines
| Policy | RC Accuracy | Fix Accuracy | Avg Steps | Avg Reward |
|--------|:-:|:-:|:-:|:-:|
| Oracle | 100% | 100% | 1.0 | 18.0 |
| Heuristic | 100% | 100% | 4.0 | 20.5 |
| Random | 18% | 18% | 3.2 | -4.1 |
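The Oracle and Heuristic averages can be checked against the reward table. The sketch below is not the environment's scoring code. It assumes that "solved" means both `submit` fields are correct, that the step bonus is counted at the `submit` step, and that the episode ends on `submit`. The four-step heuristic trajectory is also an assumption, chosen because it matches the reported 4.0 steps and 20.5 reward. `score_episode` is a hypothetical helper.

```python
# Values taken from the Reward Function table.
STEP_COST = -0.25                # inspect or ask_specialist
FIX_OK, FIX_BAD = 3.0, -2.0      # apply_fix
FIELD_OK, FIELD_BAD = 8.0, -4.0  # submit, per field
FAST_BONUS, FAST_LIMIT = 2.0, 4  # bonus when solved in <= 4 steps
INVALID = -2.0


def score_episode(actions, true_root_cause, true_fix):
    """Sum the table's rewards over a list of action dicts (illustrative only)."""
    total = 0.0
    for step, action in enumerate(actions, start=1):
        kind = action.get("type")
        if kind in ("inspect", "ask_specialist"):
            total += STEP_COST
        elif kind == "apply_fix":
            total += FIX_OK if action.get("fix") == true_fix else FIX_BAD
        elif kind == "submit":
            rc_ok = action.get("root_cause") == true_root_cause
            fix_ok = action.get("fix") == true_fix
            total += FIELD_OK if rc_ok else FIELD_BAD
            total += FIELD_OK if fix_ok else FIELD_BAD
            # Assumption: the bonus requires both fields to be correct.
            if rc_ok and fix_ok and step <= FAST_LIMIT:
                total += FAST_BONUS
            break  # assumption: submit ends the episode
        else:
            total += INVALID
    return total


submit = {"type": "submit", "root_cause": "arch_guard", "fix": "relax_arch_check"}

# Oracle: submit immediately. 8 + 8 + 2 = 18.0
oracle = score_episode([submit], "arch_guard", "relax_arch_check")

# Assumed heuristic path: -0.25 - 0.25 + 3 + 16 + 2 = 20.5
heuristic = score_episode(
    [
        {"type": "inspect", "target": "logs"},
        {"type": "ask_specialist", "specialist": "kernel"},
        {"type": "apply_fix", "fix": "relax_arch_check"},
        submit,
    ],
    "arch_guard",
    "relax_arch_check",
)
print(oracle, heuristic)
```

Under these assumptions the heuristic outscores the oracle because a correct `apply_fix` (+3) more than pays for two investigation steps (-0.5) while staying inside the four-step bonus window.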
## Fleet AI: Specialist Oversight
This is the core mechanic, and it targets Fleet AI's $10K sub-theme: the agent acts as a **scalable oversight agent** that reconciles conflicting specialist reports. Specialist reliability varies per scenario, so the agent cannot learn "always trust specialist X" and must evaluate the evidence in each case.
## Training
Training uses Unsloth and TRL GRPO with three reward signals:
1. **Valid JSON** — can the output be parsed as an action plan?
2. **Environment reward** — cumulative reward from executing the plan
3. **Efficiency** — bonus for shorter plans that still submit correctly
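As an illustration of the first signal, the sketch below scores whether a completion parses as an action plan. It assumes a plan is a non-empty JSON array of action objects that each carry a `type` key, that completions arrive as plain strings, and that the signal is a flat 1.0 or 0.0. None of these details are stated above, and `valid_json_reward` is a hypothetical name. The signature follows the common TRL pattern of taking `completions` plus extra keyword arguments and returning one float per completion.

```python
import json


def valid_json_reward(completions, **kwargs):
    """Return 1.0 for each completion that parses as a plan, else 0.0."""
    rewards = []
    for text in completions:
        try:
            plan = json.loads(text)
        except (json.JSONDecodeError, TypeError):
            rewards.append(0.0)
            continue
        # Assumption: a plan is a non-empty list of dicts with a "type" key.
        is_plan = (
            isinstance(plan, list)
            and len(plan) > 0
            and all(isinstance(a, dict) and "type" in a for a in plan)
        )
        rewards.append(1.0 if is_plan else 0.0)
    return rewards


scores = valid_json_reward([
    '[{"type": "inspect", "target": "logs"}]',
    "not json",
    "{}",
])
print(scores)
```

The environment and efficiency signals would need the plan to be executed against the environment, so they are not sketched here.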
## Development
```bash
# Local server
cd stack_doctor && PYTHONPATH=. uvicorn server.app:app --port 8000
# Run baselines
PYTHONPATH=. python3 -c "from server.baselines import *; ..."
# Deploy to HF Spaces
openenv push --repo-id bledden/stack-doctor
```