---
title: Stack Doctor Environment Server
emoji: 🩺
colorFrom: red
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
---
# Stack Doctor
An OpenEnv RL environment where an overseer LLM diagnoses sick inference stacks. The agent probes subsystems, reconciles conflicting specialist-agent reports (some of which are wrong), and selects the minimal correct fix, all within a 6-step budget.
Inspired by real SM12x enablement bugs across vLLM, FlashInfer, SGLang, CUTLASS, and Flash-Attention.
**Track**: Statement 3.1 (World Modeling / Professional Tasks)
**Sub-theme**: Fleet AI: Scalable Oversight Agents ($10K)
## Quick Start
```python
from stack_doctor import StackDoctorEnv, StackDoctorAction
import json
env = StackDoctorEnv(base_url="https://bledden-stack-doctor.hf.space")
env.connect()
# Start a new incident
result = env.reset()
print(result.observation.incident_ticket)
print(result.observation.specialist_opinions)
# Investigate
result = env.step(StackDoctorAction(message=json.dumps(
{"type": "inspect", "target": "logs"}
)))
print(result.observation.output)
# Submit diagnosis
result = env.step(StackDoctorAction(message=json.dumps(
{"type": "submit", "root_cause": "arch_guard", "fix": "relax_arch_check"}
)))
print(f"Reward: {result.reward}, Done: {result.done}")
env.close()
```
## Environment Design
### Root Causes (6) and Fixes (6)
| Root Cause | Fix | Real-World Motif |
|-----------|-----|-----------------|
| `arch_guard` | `relax_arch_check` | FlashInfer SM121 capability checks |
| `backend_whitelist` | `add_whitelist_entry` | vLLM Marlin SM121+ whitelist gaps |
| `runtime_loader` | `fix_runtime_path` | SGLang CUDA 13 runtime issues |
| `backend_selector` | `switch_backend` | CUTLASS dispatch mistakes |
| `model_config` | `update_model_config` | Model config mismatches on new hardware |
| `weight_layout` | `fix_weight_mapping` | Weight layout problems across backends |
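The table above defines a one-to-one mapping, which scripted baselines can exploit directly. A minimal sketch (the constant name `ROOT_CAUSE_TO_FIX` is illustrative, not part of the package):

```python
# Hypothetical helper: the canonical root-cause -> fix mapping from the
# table above, useful for scripted baselines or client-side validation.
ROOT_CAUSE_TO_FIX = {
    "arch_guard": "relax_arch_check",
    "backend_whitelist": "add_whitelist_entry",
    "runtime_loader": "fix_runtime_path",
    "backend_selector": "switch_backend",
    "model_config": "update_model_config",
    "weight_layout": "fix_weight_mapping",
}

def fix_for(root_cause: str) -> str:
    """Return the fix paired with a diagnosed root cause."""
    return ROOT_CAUSE_TO_FIX[root_cause]
```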
### Specialists (4)
`runtime`, `dispatch`, `kernel`, `loader`. At least one gives wrong advice per scenario.
### Action Space (JSON)
```json
{"type":"inspect","target":"logs|config|snippet|metrics"}
{"type":"ask_specialist","specialist":"runtime|dispatch|kernel|loader"}
{"type":"apply_fix","fix":"<one of 6 fixes>"}
{"type":"submit","root_cause":"<one of 6>","fix":"<one of 6>"}
```
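Since every action is a JSON string, typos in targets or specialist names waste budget. A small set of illustrative builder functions (not part of the `stack_doctor` package) that serialize to the schema above and fail fast on invalid values:

```python
import json

# Valid enumerations taken from the action-space schema above.
VALID_TARGETS = {"logs", "config", "snippet", "metrics"}
VALID_SPECIALISTS = {"runtime", "dispatch", "kernel", "loader"}

def inspect(target: str) -> str:
    """Serialize an inspect action, rejecting unknown targets."""
    assert target in VALID_TARGETS, f"unknown target: {target}"
    return json.dumps({"type": "inspect", "target": target})

def ask_specialist(specialist: str) -> str:
    """Serialize an ask_specialist action, rejecting unknown specialists."""
    assert specialist in VALID_SPECIALISTS, f"unknown specialist: {specialist}"
    return json.dumps({"type": "ask_specialist", "specialist": specialist})

def submit(root_cause: str, fix: str) -> str:
    """Serialize a final diagnosis submission."""
    return json.dumps({"type": "submit", "root_cause": root_cause, "fix": fix})
```

Each builder's output can be passed straight into `StackDoctorAction(message=...)` as in the Quick Start.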
### Reward Function
| Event | Reward |
|-------|--------|
| `inspect` or `ask_specialist` | -0.25 |
| Correct `apply_fix` | +3 |
| Wrong `apply_fix` | -2 |
| Correct `submit` (per field) | +8 |
| Wrong `submit` (per field) | -4 |
| Solved in ≤ 4 steps | +2 bonus |
| Invalid action | -2 |
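The table can be mirrored as a small offline scorer for sanity-checking action traces. The server's reward is authoritative; this is only a sketch with hypothetical event names:

```python
# Per-event rewards mirroring the table above ("probe" covers both
# inspect and ask_specialist; submit rewards are applied per field).
EVENT_REWARDS = {
    "probe": -0.25,
    "apply_fix_correct": 3.0,
    "apply_fix_wrong": -2.0,
    "submit_field_correct": 8.0,
    "submit_field_wrong": -4.0,
    "invalid_action": -2.0,
}

def score_episode(events, steps_used, solved):
    """Sum per-event rewards, plus the speed bonus when solved in <= 4 steps."""
    total = sum(EVENT_REWARDS[e] for e in events)
    if solved and steps_used <= 4:
        total += 2.0
    return total
```

For example, a one-shot correct submit scores 8 + 8 + 2 = 18, matching the Oracle baseline's average reward below.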
### Baselines
| Policy | RC Accuracy | Fix Accuracy | Avg Steps | Avg Reward |
|--------|:-:|:-:|:-:|:-:|
| Oracle | 100% | 100% | 1.0 | 18.0 |
| Heuristic | 100% | 100% | 4.0 | 20.5 |
| Random | 18% | 18% | 3.2 | -4.1 |
## Fleet AI: Specialist Oversight
The core mechanic that targets Fleet AI's $10K sub-theme: the agent must act as a **scalable oversight agent** that reconciles conflicting specialist reports. Specialists have per-scenario reliability, so the agent cannot learn "always trust specialist X" and must evaluate evidence on each case.
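One simple oversight heuristic (illustrative only, not the shipped baseline): trust a specialist's claimed root cause only when inspected evidence corroborates it, and fall back to a plain majority vote otherwise.

```python
# Hypothetical reconciliation heuristic: prefer claims that the inspected
# logs corroborate; fall back to the most common claim when they settle
# nothing. Inputs: {specialist_name: claimed_root_cause} and raw log text.
def reconcile(opinions: dict, log_text: str) -> str:
    corroborated = [rc for rc in opinions.values() if rc in log_text]
    if corroborated:
        return max(set(corroborated), key=corroborated.count)
    claims = list(opinions.values())
    return max(set(claims), key=claims.count)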
## Training
Uses Unsloth + TRL GRPO with 3 reward signals:
1. **Valid JSON**: can the output be parsed as an action plan?
2. **Environment reward**: cumulative reward from executing the plan
3. **Efficiency**: bonus for shorter plans that still submit correctly
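Signals 1 and 3 can be computed without touching the environment. A sketch of what such reward functions might look like (names, signatures, and the exact scaling are illustrative, not the actual training code):

```python
import json

def valid_json_reward(completion: str) -> float:
    """Signal 1 (sketch): reward completions that parse as a JSON action plan."""
    try:
        plan = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0
    # Full credit for a list of actions, partial for any other valid JSON.
    return 1.0 if isinstance(plan, list) else 0.5

def efficiency_reward(plan: list, solved: bool) -> float:
    """Signal 3 (sketch): bonus for shorter plans that still end in a correct
    submit. The 6 comes from the environment's 6-step budget."""
    return max(0.0, 6 - len(plan)) if solved else 0.0
```

Signal 2 is the cumulative environment reward from rolling the plan out against the server, as in the Quick Start.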
## Development
```bash
# Local server
cd stack_doctor && PYTHONPATH=. uvicorn server.app:app --port 8000
# Run baselines
PYTHONPATH=. python3 -c "from server.baselines import *; ..."
# Deploy to HF Spaces
openenv push --repo-id bledden/stack-doctor
```