title: Stack Doctor Environment Server
emoji: 🩺
colorFrom: red
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
  - openenv

Stack Doctor

An OpenEnv RL environment where an overseer LLM diagnoses sick inference stacks. The agent probes subsystems, reconciles conflicting specialist-agent reports (some of which are wrong), and selects the minimal correct fix — all within a 6-step budget.

Inspired by real SM12x enablement bugs across vLLM, FlashInfer, SGLang, CUTLASS, and Flash-Attention.

Track: Statement 3.1: World Modeling / Professional Tasks
Sub-theme: Fleet AI: Scalable Oversight Agents ($10K)

Quick Start

from stack_doctor import StackDoctorEnv, StackDoctorAction
import json

env = StackDoctorEnv(base_url="https://bledden-stack-doctor.hf.space")
env.connect()

# Start a new incident
result = env.reset()
print(result.observation.incident_ticket)
print(result.observation.specialist_opinions)

# Investigate
result = env.step(StackDoctorAction(message=json.dumps(
    {"type": "inspect", "target": "logs"}
)))
print(result.observation.output)

# Submit diagnosis
result = env.step(StackDoctorAction(message=json.dumps(
    {"type": "submit", "root_cause": "arch_guard", "fix": "relax_arch_check"}
)))
print(f"Reward: {result.reward}, Done: {result.done}")

env.close()

Environment Design

Root Causes (6) and Fixes (6)

| Root Cause | Fix | Real-World Motif |
|---|---|---|
| arch_guard | relax_arch_check | FlashInfer SM121 capability checks |
| backend_whitelist | add_whitelist_entry | vLLM Marlin SM121+ whitelist gaps |
| runtime_loader | fix_runtime_path | SGLang CUDA 13 runtime issues |
| backend_selector | switch_backend | CUTLASS dispatch mistakes |
| model_config | update_model_config | Model config mismatches on new hardware |
| weight_layout | fix_weight_mapping | Weight layout problems across backends |

Specialists (4)

The four specialists are runtime, dispatch, kernel, and loader. In every scenario, at least one of them gives wrong advice.

Action Space (JSON)

{"type":"inspect","target":"logs|config|snippet|metrics"}
{"type":"ask_specialist","specialist":"runtime|dispatch|kernel|loader"}
{"type":"apply_fix","fix":"<one of 6 fixes>"}
{"type":"submit","root_cause":"<one of 6>","fix":"<one of 6>"}
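A client can catch malformed actions before spending a step on them (an invalid action costs reward). The sketch below is a hypothetical helper, not part of the `stack_doctor` package: `validate_action` and `SCHEMA` are illustrative names, and the server's own validation may be stricter or looser. The allowed values are copied from the schema and tables in this README.

```python
import json

# Allowed values, copied from the action schema and tables in this README.
INSPECT_TARGETS = {"logs", "config", "snippet", "metrics"}
SPECIALISTS = {"runtime", "dispatch", "kernel", "loader"}
ROOT_CAUSES = {
    "arch_guard", "backend_whitelist", "runtime_loader",
    "backend_selector", "model_config", "weight_layout",
}
FIXES = {
    "relax_arch_check", "add_whitelist_entry", "fix_runtime_path",
    "switch_backend", "update_model_config", "fix_weight_mapping",
}

# Required fields for each action type and the values each field may take.
SCHEMA = {
    "inspect": {"target": INSPECT_TARGETS},
    "ask_specialist": {"specialist": SPECIALISTS},
    "apply_fix": {"fix": FIXES},
    "submit": {"root_cause": ROOT_CAUSES, "fix": FIXES},
}


def validate_action(message: str) -> dict:
    """Parse a JSON action string; raise ValueError if it does not fit SCHEMA.

    Hypothetical client-side check; the server may validate differently.
    """
    try:
        action = json.loads(message)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(action, dict):
        raise ValueError("action must be a JSON object")

    kind = action.get("type")
    if not isinstance(kind, str) or kind not in SCHEMA:
        raise ValueError(f"unknown action type: {kind!r}")

    for name, allowed in SCHEMA[kind].items():
        value = action.get(name)
        if not isinstance(value, str) or value not in allowed:
            raise ValueError(f"bad value for {name!r}: {value!r}")
    return action
```

The validated dictionary can be passed back through `json.dumps` as the `message` of a `StackDoctorAction`, as in the Quick Start.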

Reward Function

| Event | Reward |
|---|---|
| inspect or ask_specialist | -0.25 |
| Correct apply_fix | +3 |
| Wrong apply_fix | -2 |
| Correct submit (per field) | +8 |
| Wrong submit (per field) | -4 |
| Solved in ≤4 steps | +2 (bonus) |
| Invalid action | -2 |
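The reward table can be read as a simple per-episode tally. The sketch below is not the environment's scoring code; `episode_reward` and the event tuples are hypothetical. It assumes that the step count includes the final submit, that the speed bonus requires both submitted fields to be correct, and that the episode ends at submit.

```python
# Reward values copied from the table in this README.
STEP_COST = -0.25                          # inspect or ask_specialist
APPLY_FIX = {True: 3.0, False: -2.0}       # keyed by "was the fix correct?"
SUBMIT_FIELD = {True: 8.0, False: -4.0}    # scored separately for root_cause and fix
FAST_BONUS = 2.0                           # solved in <= 4 steps
INVALID = -2.0


def episode_reward(events):
    """Sum the reward for a list of event tuples.

    Event shapes (hypothetical encoding):
      ("inspect",) / ("ask_specialist",)
      ("apply_fix", correct)
      ("submit", root_cause_correct, fix_correct)
      anything else counts as an invalid action.
    """
    total = 0.0
    for step, event in enumerate(events, start=1):
        kind = event[0]
        if kind in ("inspect", "ask_specialist"):
            total += STEP_COST
        elif kind == "apply_fix":
            total += APPLY_FIX[bool(event[1])]
        elif kind == "submit":
            rc_ok, fix_ok = bool(event[1]), bool(event[2])
            total += SUBMIT_FIELD[rc_ok] + SUBMIT_FIELD[fix_ok]
            # Assumption: the bonus applies only to a fully correct submit.
            if rc_ok and fix_ok and step <= 4:
                total += FAST_BONUS
            break  # Assumption: submit ends the episode.
        else:
            total += INVALID
    return total


# A one-step correct submit: 8 + 8 + 2.
print(episode_reward([("submit", True, True)]))  # 18.0

# A hypothetical four-step run: -0.25 - 0.25 + 3 + 8 + 8 + 2.
print(episode_reward([
    ("inspect",), ("inspect",), ("apply_fix", True), ("submit", True, True),
]))  # 20.5
```

Under these assumptions the first total matches the Oracle row under Baselines, and the second matches the Heuristic row's average reward. The README does not state which actions the heuristic actually takes, so the four-step trajectory is only one sequence that sums to that figure.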

Baselines

| Policy | Root-Cause Accuracy | Fix Accuracy | Avg Steps | Avg Reward |
|---|---|---|---|---|
| Oracle | 100% | 100% | 1.0 | 18.0 |
| Heuristic | 100% | 100% | 4.0 | 20.5 |
| Random | 18% | 18% | 3.2 | -4.1 |

Fleet AI: Specialist Oversight

The core mechanic targets Fleet AI's $10K sub-theme: the agent acts as a scalable oversight agent that reconciles conflicting specialist reports. Specialist reliability varies per scenario, so the agent cannot learn a rule such as "always trust specialist X" and must evaluate the evidence in each case.

Training

Training uses Unsloth + TRL GRPO with three reward signals:

  1. Valid JSON — can the output be parsed as an action plan?
  2. Environment reward — cumulative reward from executing the plan
  3. Efficiency — bonus for shorter plans that still submit correctly
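The first and third signals can be sketched as plain Python functions. This is an illustration under stated assumptions, not the project's training code: it assumes a plan is a JSON array of action objects, that completions arrive as plain strings, and that the reward magnitudes (`1.0`, `-1.0`, and the unused-budget fraction) are placeholders. The `(completions, **kwargs) -> list[float]` shape follows the convention TRL's GRPO trainer uses for custom reward functions; `parse_plan`, `valid_json_reward`, and `efficiency_bonus` are hypothetical names.

```python
import json


def parse_plan(text):
    """Return the plan as a list of action dicts, or None if it does not parse.

    Assumption: a plan is a non-empty JSON array of objects.
    """
    try:
        plan = json.loads(text)
    except (json.JSONDecodeError, TypeError):
        return None
    if isinstance(plan, list) and plan and all(isinstance(a, dict) for a in plan):
        return plan
    return None


def valid_json_reward(completions, **kwargs):
    """Signal 1: reward completions that parse as an action plan."""
    return [1.0 if parse_plan(c) is not None else -1.0 for c in completions]


def efficiency_bonus(plan, solved, budget=6):
    """Signal 3: fraction of the step budget left unused, paid only when the
    plan ended in a correct submit.

    `solved` would come from executing the plan in the environment (signal 2).
    """
    if plan is None or not solved:
        return 0.0
    return max(budget - len(plan), 0) / budget


completions = [
    '[{"type": "inspect", "target": "logs"}]',
    "not json",
    '{"type": "submit"}',  # an object, not an array of actions
]
print(valid_json_reward(completions))  # [1.0, -1.0, -1.0]
```

Keeping the parsing check separate from the environment reward means a model that emits unparseable output still receives a gradient signal before any plan is executed.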

Development

# Local server
cd stack_doctor && PYTHONPATH=. uvicorn server.app:app --port 8000

# Run baselines
PYTHONPATH=. python3 -c "from server.baselines import *; ..."

# Deploy to HF Spaces
openenv push --repo-id bledden/stack-doctor