metadata
title: Meta-SRE
emoji: 🔧
colorFrom: blue
colorTo: red
sdk: docker
pinned: false
license: mit
short_description: OpenEnv benchmark for SRE incident debugging

Meta-SRE OpenEnv Benchmark

A live simulation environment for training and evaluating LLM agents as Senior Site Reliability Engineers at Meta.

Connect with openenv_client

import openenv_client

env = openenv_client.connect("huggingface.co/spaces/Anvit25/Meta-SRE")
obs = env.reset(task_id=1)

done = False
while not done:
    # your_agent is a placeholder for your own policy object.
    action = your_agent.decide(obs)   # {"tool": ..., "params": {...}}
    obs, reward, done, info = env.step(action)

score = env.grade()
print(f"Score: {score['normalized_score']:.3f}")
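
The loop above assumes a `your_agent` object with a `decide` method that returns an action dict. As a minimal sketch of that interface, the baseline below picks a tool at random and ignores the observation. The class name `RandomAgent` and the tool names are hypothetical and only for illustration; the real tool names come from `GET /tools`.

```python
import random
from typing import Any, Dict, List, Optional


class RandomAgent:
    """Baseline agent: picks a tool uniformly at random and sends empty params."""

    def __init__(self, tools: List[str], seed: Optional[int] = None) -> None:
        if not tools:
            raise ValueError("tools must not be empty")
        self.tools = list(tools)
        self.rng = random.Random(seed)

    def decide(self, obs: Any) -> Dict[str, Any]:
        # The observation is ignored here; a real agent would inspect it
        # (logs, error messages, service state) before choosing a tool.
        return {"tool": self.rng.choice(self.tools), "params": {}}


# Hypothetical tool names, for illustration only.
your_agent = RandomAgent(["read_logs", "restart_service"], seed=0)
action = your_agent.decide({"message": "example observation"})
```

A random baseline like this is mainly useful as a smoke test and as a lower bound to compare a real agent against.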

Direct API

import requests

BASE = "https://anvit25-meta-sre.hf.space"

obs   = requests.post(f"{BASE}/reset", json={"task_id": 1}).json()
done  = False

while not done:
    action = your_agent.decide(obs)
    result = requests.post(f"{BASE}/step", json=action).json()
    obs    = result["observation"]
    done   = result["done"]

score = requests.get(f"{BASE}/grade").json()["normalized_score"]
print(f"Score: {score:.3f}")
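
The loop above runs until the server reports `done` and does not check HTTP status codes. A slightly more defensive variant might look like the sketch below. The function name `run_episode` and the `max_steps` cap are assumptions added here, not part of the benchmark; `session` can be any object with requests-style `post` and `get` methods, such as `requests.Session()`.

```python
from typing import Any


def run_episode(session: Any, base: str, agent: Any, task_id: int,
                max_steps: int = 50) -> float:
    """Run one episode over HTTP and return the normalized score.

    `session` needs requests-style .post(url, json=...) and .get(url),
    for example requests.Session(). `max_steps` is a client-side safety
    cap (an assumption), so a stuck agent cannot loop forever.
    """
    resp = session.post(f"{base}/reset", json={"task_id": task_id})
    resp.raise_for_status()
    obs = resp.json()

    for _ in range(max_steps):
        resp = session.post(f"{base}/step", json=agent.decide(obs))
        resp.raise_for_status()
        result = resp.json()
        obs = result["observation"]
        if result["done"]:
            break

    resp = session.get(f"{base}/grade")
    resp.raise_for_status()
    return resp.json()["normalized_score"]
```

With `requests` installed, a call might look like `run_episode(requests.Session(), BASE, your_agent, task_id=1)`.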

Tasks

| ID | Difficulty | Description |
|----|------------|-------------|
| 1 | Easy | AttributeError: hallucinated dict method in ad_ranking |
| 2 | Medium | Silent timestamp corruption (CAPI → ROAS degradation) |
| 3 | Medium-Hard | DB connection pool exhaustion under load |
| 4 | Hard | Circular FK migration cascading across services |
| 5 | Hard | PII data exposure via DEBUG_MODE=True |
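
To evaluate an agent on all five tasks, one option is to run each task once and collect the scores. The sketch below takes a `run_task` callable (hypothetical, supplied by you) that runs one episode for a given task ID and returns its normalized score. The unweighted mean is an assumption for illustration and not an official aggregate of the benchmark.

```python
from statistics import mean
from typing import Callable, Dict

TASK_IDS = [1, 2, 3, 4, 5]  # the five task IDs listed above


def evaluate_all(run_task: Callable[[int], float]) -> Dict[str, object]:
    """Run every task once and return per-task scores plus their mean."""
    scores = {task_id: run_task(task_id) for task_id in TASK_IDS}
    # Unweighted mean across tasks (an assumption, not an official metric).
    return {"per_task": scores, "mean": mean(scores.values())}
```

Keeping the per-task scores next to the mean makes it easier to see whether an agent only solves the easy task or also makes progress on the hard ones.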

Endpoints

  • POST /reset: start episode ({"task_id": 1-5})
  • POST /step: take action ({"tool": "...", "params": {...}})
  • GET /state: current observation
  • GET /grade: episode score
  • GET /tools: available tools list
  • GET /health: health check
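
For client code, the endpoint list above can be kept in one place and action payloads checked before they are sent. This is a minimal sketch: the helper names `endpoint` and `validate_action` are hypothetical, and the checks only mirror the `{"tool": ..., "params": {...}}` shape shown above. The server's own validation may be stricter or different.

```python
from typing import Any, Dict, Tuple

# HTTP method and path for each endpoint, as listed above.
ENDPOINTS: Dict[str, Tuple[str, str]] = {
    "reset": ("POST", "/reset"),
    "step": ("POST", "/step"),
    "state": ("GET", "/state"),
    "grade": ("GET", "/grade"),
    "tools": ("GET", "/tools"),
    "health": ("GET", "/health"),
}


def endpoint(base: str, name: str) -> Tuple[str, str]:
    """Return (method, full URL) for a named endpoint."""
    method, path = ENDPOINTS[name]
    return method, base.rstrip("/") + path


def validate_action(action: Dict[str, Any]) -> Dict[str, Any]:
    """Check the action shape client-side before posting it to /step."""
    tool = action.get("tool")
    if not isinstance(tool, str) or not tool:
        raise ValueError("action['tool'] must be a non-empty string")
    if not isinstance(action.get("params"), dict):
        raise ValueError("action['params'] must be a dict")
    return action
```

Validating locally gives a clearer error message than a failed request, but it does not replace checking the HTTP response.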