---
title: Meta-SRE
emoji: 🔧
colorFrom: blue
colorTo: red
sdk: docker
pinned: false
license: mit
short_description: OpenEnv benchmark for SRE incident debugging
---

# Meta-SRE OpenEnv Benchmark

A live simulation environment for training and evaluating LLM agents that act as Senior Site Reliability Engineers at Meta, debugging simulated production incidents.

## Connect with openenv_client

```python
import openenv_client

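# Connect to the hosted Space and start an episode for task 1.
env = openenv_client.connect("huggingface.co/spaces/Anvit25/Meta-SRE")
obs = env.reset(task_id=1)

# `your_agent` is a placeholder for your own policy object.
done = False
while not done:
    action = your_agent.decide(obs)   # {"tool": ..., "params": {...}}
    obs, reward, done, info = env.step(action)

# Fetch the final score for the episode.
score = env.grade()
print(f"Score: {score['normalized_score']:.3f}")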
```
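The loop above assumes an object with a `decide(obs)` method that returns an action shaped like `{"tool": ..., "params": {...}}`. A minimal placeholder could look like the sketch below. The `FirstToolAgent` class and the `"read_logs"` tool name are hypothetical and are not part of the benchmark; real tool names would have to come from the environment (for example from `GET /tools`).

```python
from typing import Any


class FirstToolAgent:
    """Hypothetical baseline agent: always calls the first tool it was given."""

    def __init__(self, tool_names: list[str]) -> None:
        if not tool_names:
            raise ValueError("at least one tool name is required")
        self.tool_names = list(tool_names)

    def decide(self, obs: Any) -> dict[str, Any]:
        # The observation is ignored here; a real agent would inspect it
        # and choose both the tool and its parameters accordingly.
        return {"tool": self.tool_names[0], "params": {}}


# "read_logs" is an invented tool name used only for illustration.
your_agent = FirstToolAgent(["read_logs"])
print(your_agent.decide({"message": "example observation"}))
# prints {'tool': 'read_logs', 'params': {}}
```

An agent like this will not solve any task, but it is enough to check that the episode loop runs end to end before plugging in an LLM-backed policy.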

## Direct API

```python
import requests

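BASE = "https://anvit25-meta-sre.hf.space"
TIMEOUT = 30  # seconds; requests applies no timeout unless one is given

# Start an episode for task 1; the response is used as the first observation.
obs = requests.post(f"{BASE}/reset", json={"task_id": 1}, timeout=TIMEOUT).json()
done = False

# `your_agent` is a placeholder for your own policy object.
while not done:
    action = your_agent.decide(obs)  # {"tool": ..., "params": {...}}
    result = requests.post(f"{BASE}/step", json=action, timeout=TIMEOUT).json()
    obs = result["observation"]
    done = result["done"]

score = requests.get(f"{BASE}/grade", timeout=TIMEOUT).json()["normalized_score"]
print(f"Score: {score:.3f}")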
```

## Tasks

| ID | Difficulty | Description |
|----|-----------|-------------|
| 1  | Easy | AttributeError — hallucinated dict method in ad_ranking |
| 2  | Medium | Silent timestamp corruption (CAPI → ROAS degradation) |
| 3  | Medium-Hard | DB connection pool exhaustion under load |
| 4  | Hard | Circular FK migration cascading across services |
| 5  | Hard | PII data exposure via DEBUG_MODE=True |
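To compare agents across the whole table, one option is to run one episode per task ID and report each score alongside an average. The sketch below assumes a caller-supplied `run_episode(task_id)` function that returns the normalized score for that task; `evaluate_all`, `summarize`, and the unweighted mean are illustrative choices, since the benchmark does not specify how scores should be aggregated.

```python
from statistics import mean
from typing import Callable

TASK_IDS = [1, 2, 3, 4, 5]  # IDs taken from the task table


def evaluate_all(run_episode: Callable[[int], float]) -> dict[int, float]:
    """Run one episode per task and return {task_id: normalized_score}."""
    return {task_id: run_episode(task_id) for task_id in TASK_IDS}


def summarize(scores: dict[int, float]) -> str:
    """Format per-task scores plus an unweighted mean (an assumed aggregate)."""
    lines = [f"task {tid}: {score:.3f}" for tid, score in sorted(scores.items())]
    lines.append(f"mean: {mean(scores.values()):.3f}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Stub runner for illustration only; replace it with a real episode loop.
    print(summarize(evaluate_all(lambda task_id: 0.0)))
```

Keeping the episode runner as a parameter means the same helper works with either `openenv_client` or the direct HTTP loop.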

## Endpoints

- `POST /reset`: start an episode (`{"task_id": N}`, where `N` is a task ID from 1 to 5)
- `POST /step`: take an action (`{"tool": "...", "params": {...}}`)
- `GET /state`: return the current observation
- `GET /grade`: return the episode score
- `GET /tools`: list the available tools
- `GET /health`: health check
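The endpoints listed above map naturally onto a thin client with one method per route. The following is a sketch under stated assumptions, not the project's own client: `MetaSREClient`, `urllib_transport`, and the 30-second timeout are hypothetical, and the response bodies are returned as decoded JSON without assuming their fields. The transport is injectable so the wrapper can be exercised without network access.

```python
import json
import urllib.request
from typing import Any, Callable, Optional

Transport = Callable[[str, str, Optional[dict]], Any]


def urllib_transport(method: str, url: str, payload: Optional[dict] = None) -> Any:
    """Send one JSON request with the standard library and decode the reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


class MetaSREClient:
    """Hypothetical wrapper: one method per documented endpoint."""

    def __init__(self, base_url: str, transport: Transport = urllib_transport) -> None:
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def reset(self, task_id: int) -> Any:
        return self.transport("POST", f"{self.base_url}/reset", {"task_id": task_id})

    def step(self, tool: str, params: dict) -> Any:
        body = {"tool": tool, "params": params}
        return self.transport("POST", f"{self.base_url}/step", body)

    def state(self) -> Any:
        return self.transport("GET", f"{self.base_url}/state", None)

    def grade(self) -> Any:
        return self.transport("GET", f"{self.base_url}/grade", None)

    def tools(self) -> Any:
        return self.transport("GET", f"{self.base_url}/tools", None)

    def health(self) -> Any:
        return self.transport("GET", f"{self.base_url}/health", None)
```

With the Space awake, `MetaSREClient("https://anvit25-meta-sre.hf.space").health()` would issue the health check; in tests, passing a fake `transport` records the calls instead of sending them.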