
metadata

title: Meta-SRE
emoji: 🔧
colorFrom: blue
colorTo: red
sdk: docker
pinned: false
license: mit
short_description: OpenEnv benchmark for SRE incident debugging

Meta-SRE OpenEnv Benchmark

A live simulation environment for training and evaluating LLM agents as Senior Site Reliability Engineers at Meta.

Connect with openenv_client

import openenv_client

env = openenv_client.connect("huggingface.co/spaces/Anvit25/Meta-SRE")
obs = env.reset(task_id=1)

done = False
while not done:
    action = your_agent.decide(obs)   # {"tool": ..., "params": {...}}
    obs, reward, done, info = env.step(action)

score = env.grade()
print(f"Score: {score['normalized_score']:.3f}")
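The loop above calls `your_agent.decide(obs)` without defining it. As a minimal sketch, an agent only needs a `decide` method that returns a dict in the `{"tool": ..., "params": {...}}` shape. The class name `ScriptedAgent` and the tool names `read_logs` and `restart_service` are hypothetical placeholders, not tools confirmed by this environment; the real list comes from `GET /tools`.

```python
from typing import Any


class ScriptedAgent:
    """Replays a fixed list of actions, then repeats the last one.

    Hypothetical helper for smoke-testing the loop; not part of openenv_client.
    """

    def __init__(self, actions: list[dict[str, Any]]) -> None:
        if not actions:
            raise ValueError("need at least one action")
        self._actions = actions
        self._i = 0

    def decide(self, obs: Any) -> dict[str, Any]:
        # The observation is ignored here; a real agent would inspect it.
        action = self._actions[min(self._i, len(self._actions) - 1)]
        self._i += 1
        # Build a fresh dict so callers cannot mutate the stored script.
        return {"tool": action["tool"], "params": dict(action.get("params", {}))}


# Tool names below are invented for illustration only.
your_agent = ScriptedAgent([
    {"tool": "read_logs", "params": {"service": "ad_ranking"}},
    {"tool": "restart_service", "params": {"service": "ad_ranking"}},
])
```

A scripted agent like this is only useful for checking that the plumbing works; a real agent would choose its action from the observation.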

Direct API

import requests

BASE = "https://anvit25-meta-sre.hf.space"

# Start an episode; the response is the first observation.
obs = requests.post(f"{BASE}/reset", json={"task_id": 1}, timeout=30).json()
done = False

while not done:
    action = your_agent.decide(obs)  # {"tool": ..., "params": {...}}
    result = requests.post(f"{BASE}/step", json=action, timeout=30).json()
    obs = result["observation"]
    done = result["done"]

# Fetch the final score once the episode has ended.
score = requests.get(f"{BASE}/grade", timeout=30).json()["normalized_score"]
print(f"Score: {score:.3f}")
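Neither loop above bounds the number of steps, so an agent that never reaches `done` would run indefinitely. The sketch below adds a step cap and takes the HTTP calls as injected callables, so it can be exercised without a network. `run_episode`, `max_steps`, and the `post`/`get`/`decide` parameters are illustrative names; only the `/reset`, `/step`, and `/grade` paths and the `observation`, `done`, and `normalized_score` keys come from the examples above.

```python
from typing import Any, Callable

Json = dict[str, Any]


def run_episode(
    post: Callable[[str, Json], Json],
    get: Callable[[str], Json],
    decide: Callable[[Any], Json],
    task_id: int,
    max_steps: int = 50,
) -> float:
    """Run one episode and return its normalized score.

    `post(path, payload)` and `get(path)` must return decoded JSON.
    `max_steps` is an assumed safety cap, not a limit set by the server.
    """
    obs = post("/reset", {"task_id": task_id})
    for _ in range(max_steps):
        result = post("/step", decide(obs))
        obs = result["observation"]
        if result["done"]:
            break
    return float(get("/grade")["normalized_score"])
```

With `requests`, `post` could be `lambda path, payload: requests.post(BASE + path, json=payload, timeout=30).json()` and `get` the matching `requests.get` call.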

Tasks

ID	Difficulty	Description
1	Easy	AttributeError — hallucinated dict method in ad_ranking
2	Medium	Silent timestamp corruption (CAPI → ROAS degradation)
3	Medium-Hard	DB connection pool exhaustion under load
4	Hard	Circular FK migration cascading across services
5	Hard	PII data exposure via DEBUG_MODE=True
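To report results across the whole suite, the five task IDs can be looped over and the per-task scores combined. This is a sketch under assumptions: `evaluate_all` and `run_task` are hypothetical names, and an unweighted mean is one possible aggregate, not a metric this benchmark defines.

```python
from statistics import mean
from typing import Callable

# Task IDs and difficulty labels copied from the table above.
TASKS: dict[int, str] = {
    1: "Easy",
    2: "Medium",
    3: "Medium-Hard",
    4: "Hard",
    5: "Hard",
}


def evaluate_all(run_task: Callable[[int], float]) -> dict[str, object]:
    """Call `run_task(task_id)` for every task and summarize the scores."""
    scores = {task_id: run_task(task_id) for task_id in TASKS}
    return {"per_task": scores, "mean": mean(list(scores.values()))}
```

Here `run_task` stands for any function that plays one episode for the given task and returns its normalized score.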

Endpoints

POST /reset — start episode ({"task_id": N}, where N is an integer from 1 to 5)
POST /step — take action ({"tool": "...", "params": {...}})
GET /state — current observation
GET /grade — episode score
GET /tools — available tools list
GET /health — health check
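Before resetting, a client may want to confirm the server is up, since a hosted app can take a while to respond after a cold start. A minimal polling sketch against `GET /health` follows. The response body of `/health` is not documented here, so the probe only assumes that an HTTP 200 status means healthy; `wait_until_healthy`, `http_probe`, and the timing defaults are hypothetical.

```python
import time
import urllib.request
from typing import Callable

BASE = "https://anvit25-meta-sre.hf.space"


def http_probe(url: str) -> bool:
    """Return True if `url` answers with HTTP 200 (assumed to mean healthy)."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except OSError:
        # Covers URLError, HTTPError, and socket timeouts.
        return False


def wait_until_healthy(
    probe: Callable[[str], bool] = http_probe,
    attempts: int = 10,
    delay_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll /health up to `attempts` times, sleeping `delay_s` between tries."""
    for i in range(attempts):
        if probe(f"{BASE}/health"):
            return True
        if i < attempts - 1:
            sleep(delay_s)
    return False
```

The `probe` and `sleep` parameters are injectable so the retry logic can be checked without network access or real delays.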