---
title: Cache Env
emoji: 🏢
colorFrom: green
colorTo: pink
sdk: docker
pinned: false
---

# Cache invalidation environment (OpenEnv)

## For judges — what this is

**Problem in one sentence:** Backends cache data for speed, so they must decide **when to invalidate, softly refresh, or leave a cached entry alone** using **noisy signals** (as in real monitoring) rather than ground truth.

**Why it matters:** Cache invalidation is a daily systems tradeoff: act too often and you burn CPU and churn storage; act too late and users see stale data. This env turns that into a **short episode** an agent can be scored on.

**Our approach:** Each episode contains several cache **items** whose staleness parameters (TTL, update rate) are hidden. The API exposes only **observable** fields: `age`, `access_count`, and `last_result` (a noisy hit/stale signal). The agent picks **one action per step** for one key: `invalidate`, `refresh`, or `keep`. Step rewards give **partial credit**; at episode end a **programmatic grader** sets **`final_score` in [0.0, 1.0]**.

**Tasks:** **easy → medium → hard**, with more items and higher volatility at each level. Each task registers a dedicated **agent grader** (`env/task_graders.py`) and is listed in both `openenv.yaml` and **`GET /tasks`**.

---

## OpenEnv spec compliance

- **Typed models:** `env/models.py` — `CacheAction`, `CacheObservation`, `CacheState` (Pydantic, `openenv.core.env_server` bases).
- **Environment:** `env/cache_environment.py` — `CacheInvalidationEnvironment` implements `reset` / `step` / `state` / `get_metadata`.
- **HTTP server:** `server/app.py` — `create_fastapi_app(...)` from `openenv-core` (singleton env instance for stateful HTTP), plus **`GET /tasks`** for task + grader discovery.
- **Manifest:** `openenv.yaml` — `spec_version`, `tasks` (each with `grader: true`, `grader_callable`, `score_range`), `endpoints`, `app: server.app:app`, `port: 7860`.
- **Client (WebSocket):** `env/client.py` — `CacheInvalidationEnvClient` for typed `EnvClient` usage.
- **Shim:** `app.py` re-exports `app` for `uvicorn app:app`.

Standard routes include **`/reset`**, **`/step`**, **`/state`**, **`/schema`**, **`/metadata`**, **`/health`**, **`/openapi.json`**, **`/mcp`** (OpenEnv default).

---

## Action & observation

**Action (POST `/step` body, OpenEnv wrapped form):**

```json
{
  "action": {
    "type": "invalidate",
    "key": "item_0"
  }
}
```

`type` is one of: `invalidate`, `refresh`, `keep`. `key` must match an item in the current observation.
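
The constraints above can be checked on the client before a request is sent. The following is a minimal sketch, not code from this repository: `build_step_payload` and its `known_keys` parameter are hypothetical names introduced here for illustration.

```python
import json

# Action types listed in this README.
ALLOWED_TYPES = {"invalidate", "refresh", "keep"}


def build_step_payload(action_type: str, key: str, known_keys=None) -> dict:
    """Return the wrapped body for POST /step, rejecting bad input early.

    Hypothetical helper: not part of the repository.
    """
    if action_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown action type: {action_type!r}")
    # Optional check: pass the keys from the latest observation to verify membership.
    if known_keys is not None and key not in known_keys:
        raise ValueError(f"key {key!r} is not in the current observation")
    return {"action": {"type": action_type, "key": key}}


payload = build_step_payload("invalidate", "item_0", known_keys={"item_0", "item_1"})
print(json.dumps(payload))
```

Validating locally keeps malformed actions from consuming a step on the server.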

**Reset (POST `/reset`):**

```json
{
  "seed": 42,
  "task_id": "easy"
}
```

Set either `task_id` or `task_name` to `easy`, `medium`, or `hard`; omit both to have the env sample a task. `seed` makes item generation reproducible.

**Response shape (reset & step):**

```json
{
  "observation": {
    "items": [...],
    "step": 0,
    "task_id": "easy",
    "final_score": null,
    "done": false
  },
  "reward": 0.0,
  "done": false
}
```

When `done` is `true`, `observation.final_score` is the episode grader output in **[0.0, 1.0]**.
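
Putting reset, step, and `final_score` together, a client loop might look like the sketch below. It is an illustration under stated assumptions rather than the repository's client: the transport is injected as a `post(path, body)` callable so the loop can run without a server, `make_fake_transport` is an in-memory stand-in, each item is assumed to carry a `key` field, and the `0.5` score is a placeholder value.

```python
from typing import Callable, Optional

# (path, json_body) -> parsed JSON response
Transport = Callable[[str, dict], dict]


def run_episode(post: Transport, policy: Callable[[dict], dict],
                seed: int = 42, task_id: str = "easy",
                max_steps: int = 100) -> Optional[float]:
    """Reset, then step until `done`; return observation.final_score (or None)."""
    result = post("/reset", {"seed": seed, "task_id": task_id})
    for _ in range(max_steps):  # safety cap chosen here, not part of the env contract
        if result["done"]:
            break
        action = policy(result["observation"])  # -> {"type": ..., "key": ...}
        result = post("/step", {"action": action})
    return result["observation"].get("final_score") if result["done"] else None


def make_fake_transport(episode_len: int = 3) -> Transport:
    """In-memory stand-in for the server (illustration only)."""
    state = {"step": 0}

    def post(path: str, body: dict) -> dict:
        state["step"] = 0 if path == "/reset" else state["step"] + 1
        done = state["step"] >= episode_len
        obs = {"items": [{"key": "item_0"}], "step": state["step"],
               "task_id": "easy", "final_score": 0.5 if done else None,
               "done": done}
        return {"observation": obs, "reward": 0.0, "done": done}

    return post


def always_keep(observation: dict) -> dict:
    """Trivial policy: keep the first item (assumes items expose `key`)."""
    return {"type": "keep", "key": observation["items"][0]["key"]}


score = run_episode(make_fake_transport(), always_keep)
print(score)
```

Against a live server, `post` would wrap an HTTP client pointed at the env URL. Injecting it keeps the loop itself easy to unit test.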

---

## Tasks and graders

- **Registry:** `env/task_graders.py` — `TASK_AGENT_GRADERS` maps `easy` / `medium` / `hard` to distinct callables (same rubric; difficulty comes from env dynamics).
- **Discovery:** `GET /tasks` returns `tasks`, `graders`, and `grader_registry` for automated validation.
- **Episode grader:** `env/grader.py` — `evaluate_episode` (freshness, unnecessary invalidations, oscillation).
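
A consumer of `GET /tasks` can sanity-check the discovery payload before relying on it. The sketch below assumes only the three top-level keys named above and further assumes that `grader_registry` is keyed by task id. `check_tasks_payload` and the sample values are hypothetical and do not come from the repository.

```python
REQUIRED_KEYS = ("tasks", "graders", "grader_registry")
EXPECTED_TASK_IDS = {"easy", "medium", "hard"}


def check_tasks_payload(payload: dict) -> list:
    """Return a list of problems found in a GET /tasks response (empty if none)."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in payload]
    # Assumption: grader_registry is keyed by task id.
    registry = payload.get("grader_registry", {})
    missing = sorted(EXPECTED_TASK_IDS - set(registry))
    if missing:
        problems.append(f"no grader registered for: {', '.join(missing)}")
    return problems


# Placeholder payload; the real response may carry more detail per entry.
sample = {
    "tasks": ["easy", "medium", "hard"],
    "graders": ["grader_a", "grader_b", "grader_c"],
    "grader_registry": {"easy": "grader_a", "medium": "grader_b", "hard": "grader_c"},
}
print(check_tasks_payload(sample))
```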

---

## Setup & run

**Install (dev):**

```bash
uv sync --extra dev
```

**Local server:**

```bash
uv run server
# or
uvicorn app:app --host 0.0.0.0 --port 7860
```

**Smoke test (`POST /reset`):**

```bash
curl -s -o /dev/null -w '%{http_code}\n' -X POST \
  -H 'Content-Type: application/json' -d '{}' \
  'http://127.0.0.1:7860/reset'
```

Expect `200`.

**Docker:** run `docker build -t cache-env .`, then start the container with the port published, for example `docker run -p 7860:7860 cache-env`. The image's `CMD` in the `Dockerfile` runs `uvicorn app:app` on port **7860**.

---

## Baseline inference (`inference.py`)

- Uses the **OpenEnv HTTP** wire format: a wrapped `action` in requests and an `observation` object in responses.
- **Reproducibility:** `EPISODE_SEED` (default `42`) and `TASK_ID` (default `easy`).
- **All three tasks:** `RUN_ALL_TASKS=1` runs `easy`, then `medium`, then `hard` with the same seed (fast on CPU; well under 20 minutes).
- Optional LLM path: set `HF_TOKEN`, `API_BASE_URL`, `MODEL_NAME`; otherwise the **heuristic** policy runs (no API key required).

```bash
export ENV_URL='http://127.0.0.1:7860'   # or your Space https://....hf.space
export EPISODE_SEED=42
export TASK_ID=easy
python inference.py

# Phase-1 style: one process, three tasks
RUN_ALL_TASKS=1 python inference.py
```
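
For readers who want a feel for what a no-API-key policy can look like, here is a minimal threshold heuristic over the observable fields. It is a sketch and not the policy shipped in `inference.py`: the thresholds `stale_age` and `hot_accesses` are hypothetical and untuned, and each item is assumed to be a dict with `key`, `age`, `access_count`, and `last_result` set to `"hit"` or `"stale"`.

```python
def choose_action(items: list, stale_age: int = 5, hot_accesses: int = 3) -> dict:
    """Pick one action for one key using observable fields only.

    Hypothetical heuristic; thresholds are illustrative, not tuned.
    """
    # Prefer items whose last read looked stale, then the oldest item.
    candidate = max(items, key=lambda it: (it["last_result"] == "stale", it["age"]))
    looks_stale = candidate["last_result"] == "stale"
    is_old = candidate["age"] >= stale_age
    is_hot = candidate["access_count"] >= hot_accesses

    if looks_stale and is_old:
        action_type = "invalidate"  # two signals agree: likely stale
    elif looks_stale or (is_old and is_hot):
        action_type = "refresh"     # weaker evidence: take the softer action
    else:
        action_type = "keep"
    return {"type": action_type, "key": candidate["key"]}


items = [
    {"key": "item_0", "age": 1, "access_count": 10, "last_result": "hit"},
    {"key": "item_1", "age": 7, "access_count": 2, "last_result": "stale"},
]
print(choose_action(items))
```

Because `last_result` is noisy, the sketch requires agreement between two signals before invalidating and falls back to `refresh` when only one signal fires.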

---

## Tests (Phase 1 checks)

```bash
uv run pytest tests/ -q
```

Covers: `GET /tasks` (≥3 tasks with graders), grader outputs in [0.0, 1.0], OpenEnv reset/step JSON shape, seed reproducibility, and `final_score` after a full episode.

---

## Validation (pre-submission)

```bash
openenv validate
./validate-submission.sh 'https://YOUR-SPACE.hf.space' .
docker build .
```

---

## Repository layout

| Path | Purpose |
|------|---------|
| `env/models.py` | Typed Action / Observation / State |
| `env/cache_environment.py` | `Environment` implementation |
| `env/grader.py` | Step rewards + episode `evaluate_episode` |
| `env/task_graders.py` | **Three named agent graders** (registry) |
| `env/tasks.py` | Task configs + `TASK_MANIFEST` |
| `env/client.py` | Typed WebSocket `EnvClient` |
| `server/app.py` | `create_fastapi_app` + `/tasks` |
| `app.py` | Uvicorn entry shim |
| `inference.py` | Baseline + `[START]`/`[STEP]`/`[END]` logs |
| `openenv.yaml` | Full OpenEnv manifest |
| `tests/` | Phase 1 pytest |

---

## Scoring

- **Per-step `reward`:** Shaped partial credit; individual step values can be negative.
- **`final_score`:** In **[0.0, 1.0]**, set when `done` is `true`. It combines freshness (correct handling of stale items), unnecessary invalidations, and action stability (oscillation).

---

## Resource notes

Inference and the env server are lightweight (short episodes, small JSON payloads) and fit within **2 vCPU / 8 GiB**. A `RUN_ALL_TASKS=1` run is bounded: 3 tasks at a fixed 10 steps per episode, or 30 steps in total.