---
title: Meta-SRE
emoji: 🔧
colorFrom: blue
colorTo: red
sdk: docker
pinned: false
license: mit
short_description: OpenEnv benchmark for SRE incident debugging
---

# Meta-SRE OpenEnv Benchmark

A live simulation environment for training and evaluating LLM agents as Senior Site Reliability Engineers at Meta.

## Connect with openenv_client

```python
import openenv_client

env = openenv_client.connect("huggingface.co/spaces/Anvit25/Meta-SRE")
obs = env.reset(task_id=1)

done = False
while not done:
    action = your_agent.decide(obs)   # {"tool": ..., "params": {...}}
    obs, reward, done, info = env.step(action)

score = env.grade()
print(f"Score: {score['normalized_score']:.3f}")
```

## Direct API

```python
import requests

BASE = "https://anvit25-meta-sre.hf.space"

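# Start an episode for task 1; the response body is the first observation.
obs = requests.post(f"{BASE}/reset", json={"task_id": 1}).json()
done = False

while not done:
    action = your_agent.decide(obs)  # placeholder agent; returns {"tool": ..., "params": {...}}
    result = requests.post(f"{BASE}/step", json=action).json()
    obs = result["observation"]
    done = result["done"]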

score = requests.get(f"{BASE}/grade").json()["normalized_score"]
print(f"Score: {score:.3f}")
```
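
The loop above runs until the server reports `done`, so a stuck agent never returns. A minimal sketch of the same loop with a step cap and HTTP error checking follows. The names `run_episode`, `requests_transport`, and `max_steps` are illustrative assumptions and not part of the benchmark; only the `/reset`, `/step`, and `/grade` calls and the `observation`, `done`, and `normalized_score` fields come from the example above.

```python
from typing import Any, Callable, Optional

# Hypothetical transport signature: (method, path, json_body) -> decoded JSON.
Transport = Callable[[str, str, Optional[Any]], dict]


def requests_transport(base_url: str, timeout: float = 30.0) -> Transport:
    """Build a transport backed by requests (imported lazily)."""
    import requests

    session = requests.Session()

    def call(method: str, path: str, body: Optional[Any] = None) -> dict:
        resp = session.request(method, f"{base_url}{path}", json=body, timeout=timeout)
        resp.raise_for_status()  # surface HTTP errors instead of parsing an error page
        return resp.json()

    return call


def run_episode(
    transport: Transport,
    decide: Callable[[dict], dict],
    task_id: int,
    max_steps: int = 50,  # assumed cap; the benchmark may enforce its own limit
) -> float:
    """Reset, step until done or max_steps is reached, then return the score."""
    obs = transport("POST", "/reset", {"task_id": task_id})
    for _ in range(max_steps):
        result = transport("POST", "/step", decide(obs))
        obs = result["observation"]
        if result["done"]:
            break
    return transport("GET", "/grade", None)["normalized_score"]
```

With a real agent this would be called as `run_episode(requests_transport(BASE), your_agent.decide, task_id=1)`. Passing the transport in as a function also lets the loop be exercised against a stub without network access.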

## Tasks

| ID | Difficulty | Description |
|----|-----------|-------------|
| 1  | Easy | AttributeError: hallucinated dict method in ad_ranking |
| 2  | Medium | Silent timestamp corruption (CAPI → ROAS degradation) |
| 3  | Medium-Hard | DB connection pool exhaustion under load |
| 4  | Hard | Circular FK migration cascading across services |
| 5  | Hard | PII data exposure via DEBUG_MODE=True |
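
To evaluate an agent across all five tasks, one option is to sweep the task IDs and average the normalized scores. The sketch below assumes a hypothetical `run_task` callable that plays one episode and returns its normalized score. The `sweep` helper and the unweighted mean are illustrative choices, not an official aggregate defined by the benchmark.

```python
from statistics import mean
from typing import Callable, Dict

# Difficulty labels copied from the Tasks table.
TASK_DIFFICULTY: Dict[int, str] = {
    1: "Easy",
    2: "Medium",
    3: "Medium-Hard",
    4: "Hard",
    5: "Hard",
}


def sweep(run_task: Callable[[int], float]) -> Dict[str, object]:
    """Run every task once and report per-task scores plus their mean.

    run_task is a hypothetical callable: task_id -> normalized score.
    """
    scores = {task_id: run_task(task_id) for task_id in TASK_DIFFICULTY}
    return {"per_task": scores, "mean": mean(scores.values())}
```

Reporting the per-task scores alongside the mean keeps a strong result on the easy task from hiding failures on the hard ones.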

## Endpoints

- `POST /reset`: start an episode (`{"task_id": N}`, where `N` is 1 to 5)
- `POST /step`: take an action (`{"tool": "...", "params": {...}}`)
- `GET /state`: current observation
- `GET /grade`: episode score
- `GET /tools`: list of available tools
- `GET /health`: health check
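
The request bodies for `/reset` and `/step` can be built and checked on the client before sending. This is a minimal sketch: the helper names `reset_payload` and `step_payload` are assumptions, and the server may validate differently or accept fields not shown here.

```python
from typing import Any, Dict, Optional

VALID_TASK_IDS = range(1, 6)  # task IDs 1 to 5, per the Tasks table


def reset_payload(task_id: int) -> Dict[str, int]:
    """Build the JSON body for POST /reset, rejecting unknown task IDs."""
    if task_id not in VALID_TASK_IDS:
        raise ValueError(f"task_id must be between 1 and 5, got {task_id}")
    return {"task_id": task_id}


def step_payload(tool: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build the JSON body for POST /step in the {"tool", "params"} shape."""
    if not tool:
        raise ValueError("tool must be a non-empty string")
    # Copy params so later mutation by the caller does not change the payload.
    return {"tool": tool, "params": dict(params or {})}
```

Valid tool names are not listed in this README; `GET /tools` is the place to look them up before calling `step_payload`.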