---
title: Meta-SRE
emoji: 🔧
colorFrom: blue
colorTo: red
sdk: docker
pinned: false
license: mit
short_description: OpenEnv benchmark for SRE incident debugging
---

# Meta-SRE OpenEnv Benchmark

A live simulation environment for training and evaluating LLM agents in the role of a senior Site Reliability Engineer (SRE) at Meta.

## Connect with `openenv_client`

```python
import openenv_client

# Connect to the hosted Space and start an episode on task 1.
env = openenv_client.connect("huggingface.co/spaces/Anvit25/Meta-SRE")
obs = env.reset(task_id=1)

done = False
while not done:
    # `your_agent` is your own agent; it maps an observation to an action
    # of the form {"tool": ..., "params": {...}}.
    action = your_agent.decide(obs)
    obs, reward, done, info = env.step(action)

score = env.grade()
print(f"Score: {score['normalized_score']:.3f}")
```
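
`your_agent` stands for whatever policy you plug in; the only contract the loop relies on is a `decide(obs)` method that returns a dict shaped like `{"tool": ..., "params": {...}}`. A minimal scripted baseline satisfying that contract might look like the sketch below. `ScriptedAgent` is hypothetical, and so are the tool names in the example script; query `GET /tools` for the real list.

```python
from typing import Any, Dict, List


class ScriptedAgent:
    """Hypothetical baseline that replays a fixed list of actions in order.

    The tool names it is given are placeholders; fetch the real ones
    from GET /tools before writing a script.
    """

    def __init__(self, script: List[Dict[str, Any]]) -> None:
        self._script = list(script)
        self._next = 0

    def decide(self, obs: Any) -> Dict[str, Any]:
        # Ignores the observation; a real agent would condition on it.
        if self._next >= len(self._script):
            raise RuntimeError("script exhausted before the episode ended")
        action = self._script[self._next]
        self._next += 1
        # Return a fresh dict in the {"tool": ..., "params": {...}} shape.
        return {"tool": action["tool"], "params": dict(action.get("params", {}))}


# Hypothetical tool names, for illustration only.
your_agent = ScriptedAgent([
    {"tool": "read_logs", "params": {"service": "ad_ranking"}},
    {"tool": "submit_fix", "params": {}},
])
```

A scripted agent like this is mainly useful as a smoke test that the connection, action format, and grading round-trip work before you plug in an LLM.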

## Direct API

```python
import requests

BASE = "https://anvit25-meta-sre.hf.space"

# Start an episode on task 1; the response body is the initial observation.
obs = requests.post(f"{BASE}/reset", json={"task_id": 1}).json()
done = False

while not done:
    action = your_agent.decide(obs)  # {"tool": ..., "params": {...}}
    result = requests.post(f"{BASE}/step", json=action).json()
    obs = result["observation"]
    done = result["done"]

# Fetch the score for the finished episode.
score = requests.get(f"{BASE}/grade").json()["normalized_score"]
print(f"Score: {score:.3f}")
```
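
The loop above assumes the server eventually returns `done: true` and sets no HTTP timeout. A somewhat more defensive variant, sketched with the standard library only, caps the number of steps and raises on non-2xx responses instead of trying to parse them. `MAX_STEPS`, the 30-second timeout, and the injectable `send` parameter (there so the loop can be exercised without a network) are assumptions of this sketch, not part of the API.

```python
import json
import urllib.request
from typing import Any, Callable, Optional

BASE = "https://anvit25-meta-sre.hf.space"
MAX_STEPS = 50  # assumed safety cap; not defined by the API

Send = Callable[[str, str, Optional[dict]], Any]


def http_send(method: str, path: str, body: Optional[dict] = None) -> Any:
    """Send an optional JSON body to BASE + path and decode the JSON reply."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        BASE + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    # urlopen raises HTTPError on 4xx/5xx, so failures surface immediately.
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())


def run_episode(agent: Any, task_id: int, send: Send = http_send,
                max_steps: int = MAX_STEPS) -> float:
    """Run one episode with a step cap and return its normalized score."""
    obs = send("POST", "/reset", {"task_id": task_id})
    for _ in range(max_steps):
        result = send("POST", "/step", agent.decide(obs))
        obs = result["observation"]
        if result["done"]:
            break
    # Assumes /grade can also score an episode that stopped at the cap.
    return send("GET", "/grade", None)["normalized_score"]
```

Usage would be `score = run_episode(your_agent, task_id=1)`. If the server turns out to reject grading unfinished episodes, raising once the cap is hit is the safer choice.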

## Tasks

| | ID | Difficulty | Description | |
| |----|-----------|-------------| |
| 1 | Easy | `AttributeError` from a call to a hallucinated (nonexistent) `dict` method in `ad_ranking` |
| 2 | Medium | Silent timestamp corruption (CAPI → ROAS degradation) |
| 3 | Medium-Hard | DB connection pool exhaustion under load |
| 4 | Hard | Circular foreign-key (FK) migration cascading across services |
| 5 | Hard | PII exposure via `DEBUG_MODE=True` |
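
The five tasks span four difficulty tiers, so a single average can hide where an agent struggles. A minimal sketch for summarizing results, assuming you have already collected one `normalized_score` per task ID (for example from `GET /grade` at the end of each episode). The `TASK_DIFFICULTY` mapping mirrors the table above; `summarize` is a hypothetical helper, not part of the benchmark.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List

# Copied from the task table: task ID -> difficulty tier.
TASK_DIFFICULTY = {
    1: "Easy",
    2: "Medium",
    3: "Medium-Hard",
    4: "Hard",
    5: "Hard",
}


def summarize(scores: Dict[int, float]) -> Dict[str, float]:
    """Average normalized scores per difficulty tier and overall."""
    by_tier: Dict[str, List[float]] = defaultdict(list)
    for task_id, score in scores.items():
        by_tier[TASK_DIFFICULTY[task_id]].append(score)
    summary = {tier: mean(vals) for tier, vals in by_tier.items()}
    summary["Overall"] = mean(list(scores.values()))
    return summary
```

Passing a partial dict (say, only tasks 1 and 2) works too; tiers with no scored tasks simply do not appear in the summary.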

## Endpoints

- `POST /reset`: start an episode (body: `{"task_id": N}`, where N is 1 to 5)
- `POST /step`: take an action (body: `{"tool": "...", "params": {...}}`)
- `GET /state`: current observation
- `GET /grade`: episode score
- `GET /tools`: list of available tools
- `GET /health`: health check
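
A hosted Space can be slow to answer, or briefly unavailable, right after it starts or wakes from idle, so it may be worth polling `GET /health` before calling `POST /reset`. The sketch below uses only the standard library and treats any 2xx response as healthy, because the `/health` response body is not documented here. `http_ok`, `wait_until_healthy`, and the retry defaults are hypothetical; `check` and `sleep` are injectable so the retry logic can be tested offline.

```python
import time
import urllib.request
from typing import Callable

BASE = "https://anvit25-meta-sre.hf.space"


def http_ok(path: str) -> bool:
    """Return True if GET BASE + path answers with a 2xx status."""
    try:
        with urllib.request.urlopen(BASE + path, timeout=10) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError, timeouts, refused connections
        return False


def wait_until_healthy(check: Callable[[str], bool] = http_ok,
                       attempts: int = 10, delay_s: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll /health until it succeeds or the attempts run out."""
    for attempt in range(attempts):
        if check("/health"):
            return True
        # Only wait between attempts, not after the final one.
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

For example, call `wait_until_healthy()` once and only start the episode loop if it returns `True`.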