Harikishanth R

Hugging Face Spaces: wired 7B agents + 72B judge

AtlasOps talks to two OpenAI-compatible HTTP endpoints:

| Role | Env vars | Typical model id |
| --- | --- | --- |
| Incident agents (triage→comms) | VLLM_BASE, AGENT_MODEL, token | Your merged 7B on Hub or Qwen/Qwen2.5-7B-Instruct |
| Judge (scores responses; benchmarks + optional live ribbon) | JUDGE_URL, JUDGE_MODEL, token | Smaller HF model if 72B is blocked on quota |

One-switch setup on the Space root

Configure Space → Settings → Variables and secrets:

  1. HF_TOKEN: your HF access token (read plus Inference / fine-grained Inference permission if you use Router).
  2. ATLASOPS_USE_HF_INFERENCE = 1

This activates config/hf_space_env.py: it copies HF_TOKEN into LLM_API_KEY and JUDGE_API_KEY, and points both endpoints (agents and judge) at https://router.huggingface.co/v1 if they are still set to localhost placeholders.

Optional override:

HF_INFERENCE_BASE=https://router.huggingface.co/v1
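The bootstrap behaviour described above can be pictured with a minimal sketch. This is not the contents of config/hf_space_env.py; the function name `apply_hf_inference_env`, the helper `_is_local`, and the choice to treat an unset base URL as a placeholder are assumptions made for illustration.

```python
ROUTER_DEFAULT = "https://router.huggingface.co/v1"


def _is_local(url: str) -> bool:
    # Assumption: an unset value or a loopback host counts as a "localhost placeholder".
    return url == "" or "localhost" in url or "127.0.0.1" in url


def apply_hf_inference_env(env: dict) -> dict:
    """Return a copy of env with the HF inference switch applied (hypothetical helper)."""
    out = dict(env)
    if out.get("ATLASOPS_USE_HF_INFERENCE") != "1":
        return out  # switch is off: leave everything untouched

    token = out.get("HF_TOKEN", "")
    if token:
        # One token feeds both the agent and the judge client.
        out["LLM_API_KEY"] = token
        out["JUDGE_API_KEY"] = token

    # HF_INFERENCE_BASE is the optional override; otherwise use the Router URL.
    base = out.get("HF_INFERENCE_BASE", ROUTER_DEFAULT)
    for key in ("VLLM_BASE", "JUDGE_URL"):
        if _is_local(out.get(key, "")):
            out[key] = base
    return out
```

In this sketch a base URL that already points at a remote host is left alone, so a dedicated judge endpoint would survive the switch.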

Model IDs agents will call

Required:

AGENT_MODEL=<your-namespace/your-atlasops-merged-7b>
JUDGE_MODEL=Qwen/Qwen2.5-72B-Instruct-AWQ

(or any Hub model Router actually serves under your billing tier)

72B AWQ often needs a paid Inference allotment or dedicated Inference Endpoint.
If Router returns 429/403 on 72B, set for example JUDGE_MODEL=Qwen/Qwen2.5-32B-Instruct temporarily; AtlasOps keeps working.
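The manual fallback above could also be scripted. The following is a rough sketch under assumptions: `next_judge_model` and the `LADDER` ordering are hypothetical and not part of AtlasOps; only the model ids come from this document.

```python
# Hypothetical ordering, largest first; all ids are the ones named in this guide.
LADDER = [
    "Qwen/Qwen2.5-72B-Instruct-AWQ",
    "Qwen/Qwen2.5-32B-Instruct",
    "Qwen/Qwen2.5-7B-Instruct",
]


def next_judge_model(status_code: int, current: str, ladder: list = LADDER) -> str:
    """Pick the JUDGE_MODEL to try next after a Router response (illustrative only)."""
    if status_code not in (403, 429):
        return current  # not a quota/permission problem: keep the current model
    # Step down one rung, or start at the top if the current model is not listed.
    idx = ladder.index(current) + 1 if current in ladder else 0
    return ladder[idx] if idx < len(ladder) else current
```

For example, a 429 on the 72B AWQ model would yield Qwen/Qwen2.5-32B-Instruct, which you would then set as JUDGE_MODEL.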

Putting your GRPO weights on Hugging Face (7B)

The coordinator sends only a model string (no silent LoRA layer). Serving options:

  1. Merge LoRA locally into the base checkpoint, upload the merged weights to your-org/atlasops-7b-grpo, set AGENT_MODEL to that repo (see training/merge_lora_for_hub.py after pip install -e ".[train]").
  2. Self-hosted vLLM with --enable-lora on AMD hardware (not the HF Space CPU). This would require coordinator changes to attach LoRA per request, unless you bake the merged weights yourself.

For most hackathon demos, a merged Hub model + Router is the least painful option.
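For orientation, option 1 (merge, then upload) might look roughly like the sketch below, using the standard peft and transformers calls. It is not the contents of training/merge_lora_for_hub.py; the function name `merge_and_push` and its arguments are assumptions, and the bfloat16 dtype is an illustrative choice.

```python
def merge_and_push(base_id: str, adapter_dir: str, repo_id: str) -> None:
    """Merge a LoRA adapter into its base model and upload the result (sketch)."""
    # Heavy imports stay inside the function so importing this module is cheap.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base checkpoint, attach the adapter, then fold the LoRA
    # deltas into the base weights so no adapter is needed at serve time.
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()

    # Upload merged weights and the tokenizer to the same Hub repo.
    merged.push_to_hub(repo_id)
    AutoTokenizer.from_pretrained(base_id).push_to_hub(repo_id)
```

A call such as `merge_and_push("Qwen/Qwen2.5-7B-Instruct", "<adapter-dir>", "your-org/atlasops-7b-grpo")` needs a write-capable HF token in the environment; AGENT_MODEL would then point at the uploaded repo.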

Live judge inside the Ops UI

When ATLASOPS_USE_HF_INFERENCE=1, the coordinator fires one judge call after comms, and the timeline prints the score (judge_trajectory tool line).

Explicit flags:

ATLASOPS_LIVE_JUDGE=1   # force ON
ATLASOPS_LIVE_JUDGE=0   # force OFF even with HF inference pack

Local MI300x with JUDGE_URL=http://localhost:8001/v1: keep ATLASOPS_USE_HF_INFERENCE unset, set ATLASOPS_LIVE_JUDGE=1 if you still want judge lines in Grafana.
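The precedence between the two variables can be summarised in a few lines. This is a sketch of the rules as stated in this section, not the coordinator's actual code; `live_judge_enabled` is a hypothetical name.

```python
def live_judge_enabled(env: dict) -> bool:
    """Resolve whether the live judge call runs (illustrative reading of the flags)."""
    flag = env.get("ATLASOPS_LIVE_JUDGE")
    if flag == "1":
        return True   # forced ON, e.g. local MI300x with a localhost judge
    if flag == "0":
        return False  # forced OFF even with the HF inference pack
    # No explicit flag: follow the HF inference switch.
    return env.get("ATLASOPS_USE_HF_INFERENCE") == "1"
```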

Health checks after deploy

  • GET https://<space>/health: agent + judge model names + bases (inspect JSON).
  • GET https://<space>/api/health: coordinator copy with a live_judge boolean.

Neither endpoint prints raw tokens.
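A small script can run both checks after each deploy. The sketch below assumes only that the two paths return JSON; `check_health` and `fetch_text` are hypothetical helpers, and the leak check simply searches the response body for a secret you pass in.

```python
import json
from urllib.request import urlopen


def fetch_text(url: str) -> str:
    # Plain GET with a timeout; raises on network or HTTP errors.
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def check_health(space_url: str, secret: str = "", fetch=fetch_text) -> dict:
    """Fetch /health and /api/health and return the parsed JSON per path."""
    report = {}
    for path in ("/health", "/api/health"):
        body = fetch(space_url.rstrip("/") + path)
        # Guard against a raw token ever appearing in a health response.
        if secret and secret in body:
            raise RuntimeError(f"{path} leaked a secret value")
        report[path] = json.loads(body)
    return report
```

Passing `fetch` as a parameter keeps the function testable without network access; in normal use you would call `check_health("https://<space>")` and inspect the returned dictionaries.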

After redeploying with the bundled UI, the footer bar polls /health and shows either "Discord webhook ✓" or "✗ add DISCORD_WEBHOOK_URL", so you can tell whether DISCORD_WEBHOOK_URL reached the Space (the webhook URL itself is never rendered).

Summary checklist

HF_TOKEN=<secret>
ATLASOPS_USE_HF_INFERENCE=1
AGENT_MODEL=your-org/your-trained-merged-7b
JUDGE_MODEL=Qwen/Qwen2.5-72B-Instruct-AWQ    # or a smaller HF model Router allows
BACKEND=openai                                 # optional; bootstrap sets default when ATLASOPS_USE_HF_INFERENCE=1

Redeploy the Space, trigger chaos from the sidebar, and confirm that the timeline shows both the tool calls (judge / judge_trajectory) and the agent turns.

Space has no kubeconfig (Pod Kill returns 500)

The UI's Inject endpoint runs kubectl apply on chaos manifests. Typical HF Space containers do not have credentials for your GKE API server, so you will see POST /inject → 500 in the logs and the coordinator never starts.

Set this Space variable so inject skips kubectl but still schedules the incident pipeline (reads live Alertmanager after a short delay):

ATLASOPS_SKIP_KUBECTL_INJECT=1

Real fault injection still requires a reachable cluster from somewhere that has kubeconfig (CI, laptop, or another service). For a public demo, you can rely on already-firing alerts in Alertmanager or webhook-driven incidents.
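The effect of the skip flag can be sketched as a handler that makes the kubectl step optional while always scheduling the pipeline. This is an assumed shape, not the actual Inject endpoint: `handle_inject`, `kubectl_apply`, and the `schedule_pipeline` callback are hypothetical names.

```python
import subprocess


def kubectl_apply(manifest_path: str) -> None:
    # Raises CalledProcessError if kubectl fails (e.g. no kubeconfig).
    subprocess.run(["kubectl", "apply", "-f", manifest_path], check=True)


def handle_inject(manifest_path: str, env: dict, schedule_pipeline, apply=kubectl_apply) -> str:
    """Apply the chaos manifest unless skipping is enabled, then start the pipeline."""
    if env.get("ATLASOPS_SKIP_KUBECTL_INJECT") == "1":
        mode = "skipped"  # no cluster credentials: do not touch kubectl
    else:
        apply(manifest_path)
        mode = "applied"
    # In both modes the incident pipeline is scheduled.
    schedule_pipeline()
    return mode
```

Without the flag, a failing kubectl call raises before the pipeline is scheduled, which matches the symptom described above (a 500 and no coordinator run).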


Discord (why nothing appears in your server)

AtlasOps does not use a Discord "bot" that shows online in the member list. It posts through an Incoming Webhook URL.

  1. Discord → your server → Server Settings → Integrations → Webhooks → New Webhook → pick #general (or another channel) → copy the Webhook URL.
  2. In HF Space Secrets, add DISCORD_WEBHOOK_URL = that URL (the same variable agents/tools/comms.py expects).
  3. A message is sent only when the comms agent runs the slack_post_update tool during a real incident (not the browser-only UI preview). If the agent chain is stuck before comms (e.g. LLM unreachable), Discord stays empty; fix VLLM_BASE / HF inference first.

Optional: SLACK_WEBHOOK_URL for Slack in parallel; both can be set.
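To test a webhook independently of the agent chain, you can build the same kind of request by hand. The sketch below is not the code in agents/tools/comms.py; `build_webhook_requests` is a hypothetical helper. It relies on the public webhook formats: Discord reads the message from a JSON `content` field, Slack from `text`. The explicit User-Agent value is an illustrative choice.

```python
import json
from urllib.request import Request


def build_webhook_requests(text: str, env: dict) -> list:
    """Build one POST request per configured webhook (Discord and/or Slack)."""
    # (env var, JSON field that carries the message body)
    targets = (("DISCORD_WEBHOOK_URL", "content"), ("SLACK_WEBHOOK_URL", "text"))
    requests = []
    for var, field in targets:
        url = env.get(var)
        if not url:
            continue  # this webhook is not configured
        body = json.dumps({field: text}).encode("utf-8")
        requests.append(Request(
            url,
            data=body,
            method="POST",
            headers={"Content-Type": "application/json",
                     "User-Agent": "atlasops-sketch/0.1"},
        ))
    return requests
```

Each returned request can be sent with `urllib.request.urlopen(req, timeout=10)`; if the message appears in the channel, the webhook itself is fine and any remaining problem lies earlier in the agent chain.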