	---
	title: Meta-SRE
	emoji: 🔧
	colorFrom: blue
	colorTo: red
	sdk: docker
	pinned: false
	license: mit
	short_description: OpenEnv benchmark for SRE incident debugging
	---

	# Meta-SRE OpenEnv Benchmark

	A live simulation environment for training and evaluating LLM agents as Senior Site Reliability Engineers at Meta.

	## Connect with openenv_client

	```python
	import openenv_client

	env = openenv_client.connect("huggingface.co/spaces/Anvit25/Meta-SRE")
	obs = env.reset(task_id=1)

	done = False
	while not done:
	    # your_agent is a placeholder for your own policy object
	    action = your_agent.decide(obs)  # {"tool": ..., "params": {...}}
	    obs, reward, done, info = env.step(action)

	score = env.grade()
	print(f"Score: {score['normalized_score']:.3f}")
	```
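
The loop above calls `your_agent.decide(obs)` without defining the agent. As a minimal sketch, the class below replays a fixed list of actions in the `{"tool": ..., "params": {...}}` shape. `ScriptedAgent` and the tool names `read_logs` and `noop` are hypothetical placeholders, not part of the environment.

```python
from typing import Any, Dict, List


class ScriptedAgent:
    """Replays a fixed list of actions, then keeps repeating the last one."""

    def __init__(self, script: List[Dict[str, Any]]) -> None:
        if not script:
            raise ValueError("script must contain at least one action")
        self._script = script
        self._cursor = 0

    def decide(self, obs: Dict[str, Any]) -> Dict[str, Any]:
        # The observation is ignored here; a real agent would inspect it.
        index = min(self._cursor, len(self._script) - 1)
        self._cursor += 1
        action = self._script[index]
        # Always return both keys, defaulting params to an empty dict.
        return {"tool": action["tool"], "params": dict(action.get("params", {}))}


# Hypothetical tool names, for illustration only.
your_agent = ScriptedAgent([
    {"tool": "read_logs", "params": {"service": "ad_ranking"}},
    {"tool": "noop"},
])
```

A scripted agent like this is mainly useful for smoke-testing the connection before plugging in an LLM-backed policy.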

	## Direct API

	```python
	import requests

	BASE = "https://anvit25-meta-sre.hf.space"

	obs = requests.post(f"{BASE}/reset", json={"task_id": 1}).json()
	done = False

	while not done:
	    # your_agent is a placeholder for your own policy object
	    action = your_agent.decide(obs)
	    result = requests.post(f"{BASE}/step", json=action).json()
	    obs = result["observation"]
	    done = result["done"]

	score = requests.get(f"{BASE}/grade").json()["normalized_score"]
	print(f"Score: {score:.3f}")
	```
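
The loop above sets no timeout, does not check HTTP status codes, and has no step limit, so an unresponsive Space or an agent that never finishes would stall it. A more defensive sketch follows. The function name `run_episode` and the `max_steps` and `timeout` defaults are assumptions chosen for illustration; the response fields are the ones used above.

```python
BASE = "https://anvit25-meta-sre.hf.space"


def run_episode(agent, task_id, base=BASE, session=None, max_steps=50, timeout=30):
    """Run one episode over HTTP and return the normalized score.

    max_steps and timeout are illustrative defaults, not values from the Space.
    """
    if session is None:
        # Imported lazily so the function can also be driven by a stub session.
        import requests
        session = requests.Session()

    resp = session.post(f"{base}/reset", json={"task_id": task_id}, timeout=timeout)
    resp.raise_for_status()
    obs = resp.json()

    # Stop after max_steps even if the environment never reports done.
    for _ in range(max_steps):
        action = agent.decide(obs)
        resp = session.post(f"{base}/step", json=action, timeout=timeout)
        resp.raise_for_status()
        result = resp.json()
        obs = result["observation"]
        if result["done"]:
            break

    resp = session.get(f"{base}/grade", timeout=timeout)
    resp.raise_for_status()
    return resp.json()["normalized_score"]
```

Passing a `session` object also makes the loop easy to test offline with a fake that records the calls.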

	## Tasks

	| ID | Difficulty | Description |
	|----|------------|-------------|
	| 1 | Easy | `AttributeError`: hallucinated dict method in `ad_ranking` |
	| 2 | Medium | Silent timestamp corruption (CAPI → ROAS degradation) |
	| 3 | Medium-Hard | DB connection pool exhaustion under load |
	| 4 | Hard | Circular FK migration cascading across services |
	| 5 | Hard | PII data exposure via `DEBUG_MODE=True` |
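
To evaluate an agent across every task in the table, one option is to run each task once and report per-task scores alongside their mean. The sketch below assumes you supply a `run_one(task_id)` callable (hypothetical) that runs a single episode and returns its normalized score. The unweighted mean is an assumption for illustration, not an official aggregate of the benchmark.

```python
from statistics import mean
from typing import Callable, Dict

# Task IDs and difficulty labels copied from the table above.
TASKS: Dict[int, str] = {
    1: "Easy",
    2: "Medium",
    3: "Medium-Hard",
    4: "Hard",
    5: "Hard",
}


def evaluate_all(run_one: Callable[[int], float]) -> Dict[str, object]:
    """Run every task once; return per-task scores and their unweighted mean."""
    per_task = {task_id: float(run_one(task_id)) for task_id in TASKS}
    return {"per_task": per_task, "mean": mean(per_task.values())}


def format_report(report: Dict[str, object]) -> str:
    """Render one line per task followed by the mean."""
    lines = [
        f"Task {task_id} ({TASKS[task_id]}): {score:.3f}"
        for task_id, score in report["per_task"].items()
    ]
    lines.append(f"Mean: {report['mean']:.3f}")
    return "\n".join(lines)
```

For example, `print(format_report(evaluate_all(run_one)))` prints five task lines and a final `Mean:` line.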

	## Endpoints

	- `POST /reset`: start an episode (`{"task_id": N}`, where `N` is an integer from 1 to 5)
	- `POST /step`: take an action (`{"tool": "...", "params": {...}}`)
	- `GET /state`: current observation
	- `GET /grade`: episode score
	- `GET /tools`: list of available tools
	- `GET /health`: health check
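
Because `POST /step` expects a payload with a `tool` name and a `params` object, it can help to check that shape on the client before sending it. The sketch below is one way to do so. The response format of `GET /tools` is not documented here, so the set of allowed tool names is passed in by the caller, and `validate_action` is a hypothetical helper rather than part of the API.

```python
from typing import Any, Dict, Iterable, Optional


def validate_action(
    action: Any, allowed_tools: Optional[Iterable[str]] = None
) -> Dict[str, Any]:
    """Return a normalized {"tool": ..., "params": {...}} dict or raise.

    allowed_tools is optional; when given, the tool name must be in it.
    """
    if not isinstance(action, dict):
        raise TypeError("action must be a dict")

    tool = action.get("tool")
    if not isinstance(tool, str) or not tool:
        raise ValueError('action["tool"] must be a non-empty string')

    # Treat a missing params key as an empty parameter object.
    params = action.get("params", {})
    if not isinstance(params, dict):
        raise ValueError('action["params"] must be a dict')

    if allowed_tools is not None and tool not in set(allowed_tools):
        raise ValueError(f"unknown tool: {tool!r}")

    return {"tool": tool, "params": params}
```

Calling this on the agent's output before each `POST /step` turns a malformed action into an immediate local error instead of a failed request.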