---
title: Meta-SRE
emoji: 🔧
colorFrom: blue
colorTo: red
sdk: docker
pinned: false
license: mit
short_description: OpenEnv benchmark for SRE incident debugging
---

# Meta-SRE OpenEnv Benchmark

A live simulation environment for training and evaluating LLM agents as Senior Site Reliability Engineers at Meta.

## Connect with openenv_client

```python
import openenv_client

env = openenv_client.connect("huggingface.co/spaces/Anvit25/Meta-SRE")
obs = env.reset(task_id=1)
done = False
while not done:
    action = your_agent.decide(obs)  # {"tool": ..., "params": {...}}
    obs, reward, done, info = env.step(action)

score = env.grade()
print(f"Score: {score['normalized_score']:.3f}")
```

## Direct API

```python
import requests

BASE = "https://anvit25-meta-sre.hf.space"
obs = requests.post(f"{BASE}/reset", json={"task_id": 1}).json()
done = False
while not done:
    action = your_agent.decide(obs)
    result = requests.post(f"{BASE}/step", json=action).json()
    obs = result["observation"]
    done = result["done"]

score = requests.get(f"{BASE}/grade").json()["normalized_score"]
print(f"Score: {score:.3f}")
```

## Tasks

| ID | Difficulty | Description |
|----|-----------|-------------|
| 1 | Easy | AttributeError: hallucinated dict method in ad_ranking |
| 2 | Medium | Silent timestamp corruption (CAPI → ROAS degradation) |
| 3 | Medium-Hard | DB connection pool exhaustion under load |
| 4 | Hard | Circular FK migration cascading across services |
| 5 | Hard | PII data exposure via DEBUG_MODE=True |

## Endpoints

- `POST /reset`: start episode (`{"task_id": 1-5}`)
- `POST /step`: take action (`{"tool": "...", "params": {...}}`)
- `GET /state`: current observation
- `GET /grade`: episode score
- `GET /tools`: available tools list
- `GET /health`: health check
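
The Direct API loop above runs until the server reports `done`, so an agent that never finishes would loop forever. The sketch below wraps the same `/reset`, `/step`, and `/grade` calls in one function with a step cap and a basic check on the action shape. It is a minimal illustration, not part of the benchmark: `run_episode`, `http_transport`, and `max_steps` are hypothetical names, and the response keys (`observation`, `done`, `normalized_score`) are assumed to match the Direct API example.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Tuple

Json = Dict[str, Any]


def run_episode(
    post: Callable[[str, Json], Json],
    get: Callable[[str], Json],
    decide: Callable[[Json], Json],
    task_id: int = 1,
    max_steps: int = 50,  # hypothetical safety cap, not defined by the benchmark
) -> float:
    """Run one episode through the HTTP endpoints and return the normalized score."""
    if task_id not in range(1, 6):
        raise ValueError("task_id must be between 1 and 5")

    obs = post("/reset", {"task_id": task_id})
    for _ in range(max_steps):
        action = decide(obs)
        # Check the action shape documented for POST /step before sending it.
        if not isinstance(action, dict) or "tool" not in action or "params" not in action:
            raise ValueError('action must look like {"tool": ..., "params": {...}}')
        result = post("/step", action)
        obs = result["observation"]  # assumed key, as in the Direct API example
        if result["done"]:
            break
    return get("/grade")["normalized_score"]


def http_transport(base: str) -> Tuple[Callable[[str, Json], Json], Callable[[str], Json]]:
    """Build (post, get) callables that talk JSON to the given base URL."""

    def post(path: str, body: Json) -> Json:
        req = urllib.request.Request(
            base + path,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.load(resp)

    def get(path: str) -> Json:
        with urllib.request.urlopen(base + path, timeout=30) as resp:
            return json.load(resp)

    return post, get


# Example wiring (requires network access and an agent of your own):
# post, get = http_transport("https://anvit25-meta-sre.hf.space")
# print(f"Score: {run_episode(post, get, your_agent.decide, task_id=1):.3f}")
```

Passing `post` and `get` in as callables keeps the loop independent of the HTTP library, so the same function can be exercised offline with stub functions before pointing it at the live Space.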