---
title: Cache Env
emoji: 🏢
colorFrom: green
colorTo: pink
sdk: docker
pinned: false
---

# Cache invalidation environment (OpenEnv)

## For judges: what this is

**Problem in one sentence:** Backends cache data to go fast; they must decide **when to invalidate, softly refresh, or leave cache alone** using **noisy clues** (like real monitoring), not the ground truth.

**Why it matters:** Cache invalidation is a daily systems tradeoff: act too often and you burn CPU and churn storage; act too late and users see stale data. This env turns that into a **short episode** an agent can be scored on.

**Our approach:** Several cache **items** per episode with hidden staleness (TTL, update rate). The API exposes only **observable** fields (`age`, `access_count`, `last_result` as hit/stale with noise). The agent picks **one action per step** for one key: `invalidate`, `refresh`, or `keep`. Step rewards give **partial credit**; at episode end a **programmatic grader** sets **`final_score` in [0.0, 1.0]**.

**Tasks:** **easy → medium → hard**, with more items and higher volatility at each level; each task registers a dedicated **agent grader** (`env/task_graders.py`) and is listed in `openenv.yaml` and **`GET /tasks`**.

---

## OpenEnv spec compliance

- **Typed models:** `env/models.py` defines `CacheAction`, `CacheObservation`, `CacheState` (Pydantic, `openenv.core.env_server` bases).
- **Environment:** `env/cache_environment.py`, where `CacheInvalidationEnvironment` implements `reset` / `step` / `state` / `get_metadata`.
- **HTTP server:** `server/app.py` uses `create_fastapi_app(...)` from `openenv-core` (singleton env instance for stateful HTTP), plus **`GET /tasks`** for task + grader discovery.
- **Manifest:** `openenv.yaml` declares `spec_version`, `tasks` (each with `grader: true`, `grader_callable`, `score_range`), `endpoints`, `app: server.app:app`, `port: 7860`.
- **Client (WebSocket):** `env/client.py` provides `CacheInvalidationEnvClient` for typed `EnvClient` usage.
- **Shim:** `app.py` re-exports `app` for `uvicorn app:app`.

Standard routes include **`/reset`**, **`/step`**, **`/state`**, **`/schema`**, **`/metadata`**, **`/health`**, **`/openapi.json`**, **`/mcp`** (OpenEnv default).

---

## Action & observation

**Action (POST `/step` body, OpenEnv wrapped form):**

```json
{
  "action": {
    "type": "invalidate",
    "key": "item_0"
  }
}
```

`type` is one of: `invalidate`, `refresh`, `keep`. `key` must match an item in the current observation.

**Reset (POST `/reset`):**

```json
{ "seed": 42, "task_id": "easy" }
```

Use `task_id` or `task_name` with `easy` | `medium` | `hard`. Omit both to sample a task. `seed` makes generation reproducible.

**Response shape (reset & step):**

```json
{
  "observation": {
    "items": [...],
    "step": 0,
    "task_id": "easy",
    "final_score": null,
    "done": false
  },
  "reward": 0.0,
  "done": false
}
```

When `done` is `true`, `observation.final_score` is the episode grader output in **[0.0, 1.0]**.

---

## Tasks and graders

- **Registry:** `env/task_graders.py`, where `TASK_AGENT_GRADERS` maps `easy` / `medium` / `hard` to distinct callables (same rubric; difficulty comes from env dynamics).
- **Discovery:** `GET /tasks` returns `tasks`, `graders`, and `grader_registry` for automated validation.
- **Episode grader:** `env/grader.py` provides `evaluate_episode` (freshness, unnecessary invalidations, oscillation).

---

## Setup & run

**Install (dev):**

```bash
uv sync --extra dev
```

**Local server:**

```bash
uv run server
# or
uvicorn app:app --host 0.0.0.0 --port 7860
```

**Health check:**

```bash
curl -s -o /dev/null -w '%{http_code}\n' -X POST \
  -H 'Content-Type: application/json' -d '{}' \
  'http://127.0.0.1:7860/reset'
```

Expect `200`.

**Docker:** `docker build -t cache-env .` then run with the same `CMD` as in the `Dockerfile` (`uvicorn app:app`, port **7860**).

---

## Baseline inference (`inference.py`)

- Uses **OpenEnv HTTP** wire format: wrapped `action`, `observation` in responses.
- **Reproducibility:** `EPISODE_SEED` (default `42`) and `TASK_ID` (default `easy`).
- **All three tasks:** `RUN_ALL_TASKS=1` runs `easy`, then `medium`, then `hard` with the same seed (fast on CPU; well under 20 minutes).
- Optional LLM path: set `HF_TOKEN`, `API_BASE_URL`, `MODEL_NAME`; otherwise the **heuristic** policy runs (no API key required).

```bash
export ENV_URL='http://127.0.0.1:7860'   # or your Space https://....hf.space
export EPISODE_SEED=42
export TASK_ID=easy
python inference.py

# Phase-1 style: one process, three tasks
RUN_ALL_TASKS=1 python inference.py
```

---

## Tests (Phase 1 checks)

```bash
uv run pytest tests/ -q
```

Covers: `GET /tasks` (≥3 tasks with graders), grader outputs in [0,1], OpenEnv reset/step JSON shape, reproducible seed, full episode `final_score`.

---

## Validation (pre-submission)

```bash
openenv validate
./validate-submission.sh 'https://YOUR-SPACE.hf.space' .
docker build .
```

---

## Repository layout

| Path | Purpose |
|------|---------|
| `env/models.py` | Typed Action / Observation / State |
| `env/cache_environment.py` | `Environment` implementation |
| `env/grader.py` | Step rewards + episode `evaluate_episode` |
| `env/task_graders.py` | **Three named agent graders** (registry) |
| `env/tasks.py` | Task configs + `TASK_MANIFEST` |
| `env/client.py` | Typed WebSocket `EnvClient` |
| `server/app.py` | `create_fastapi_app` + `/tasks` |
| `app.py` | Uvicorn entry shim |
| `inference.py` | Baseline + `[START]`/`[STEP]`/`[END]` logs |
| `openenv.yaml` | Full OpenEnv manifest |
| `tests/` | Phase 1 pytest |

---

## Scoring

- **Per-step `reward`:** Shaped (can be negative mid-episode).
- **`final_score`:** In **[0.0, 1.0]** when `done`; combines correctness, unnecessary invalidations, and action stability.

---

## Resource notes

Inference and the env server are lightweight (short episodes, small JSON). Suitable for **2 vCPU / 8 GiB**; keep `RUN_ALL_TASKS` episodes bounded (fixed 10 steps per episode × 3 tasks).
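
---

## Appendix: request payload sketch

As a rough illustration of the wire format described under "Action & observation", the sketch below builds `/reset` and `/step` bodies and picks one action per step from the observable fields. It is not the policy shipped in `inference.py`. The helper names (`build_reset_payload`, `build_step_payload`, `choose_action`), the `stale_age` threshold, and the assumption that each observation item is a dict carrying a `key` field alongside `age` and `last_result` are all hypothetical.

```python
from typing import Any, Dict, List, Optional

VALID_ACTIONS = ("invalidate", "refresh", "keep")
VALID_TASKS = ("easy", "medium", "hard")


def build_reset_payload(seed: Optional[int] = None,
                        task_id: Optional[str] = None) -> Dict[str, Any]:
    """Body for POST /reset; omitted fields are left for the server to choose."""
    payload: Dict[str, Any] = {}
    if seed is not None:
        payload["seed"] = seed
    if task_id is not None:
        if task_id not in VALID_TASKS:
            raise ValueError(f"unknown task_id: {task_id!r}")
        payload["task_id"] = task_id
    return payload


def build_step_payload(action_type: str, key: str) -> Dict[str, Any]:
    """Body for POST /step in the wrapped form {"action": {...}}."""
    if action_type not in VALID_ACTIONS:
        raise ValueError(f"unknown action type: {action_type!r}")
    return {"action": {"type": action_type, "key": key}}


def choose_action(items: List[Dict[str, Any]], stale_age: int = 5) -> Dict[str, Any]:
    """Hypothetical heuristic over observable fields only.

    Assumes each item has "key", "age", and "last_result" ("hit" or "stale").
    """
    if not items:
        raise ValueError("observation has no items")

    # 1. A (noisy) stale read is the strongest clue: invalidate the oldest such item.
    stale = [it for it in items if it.get("last_result") == "stale"]
    if stale:
        target = max(stale, key=lambda it: it["age"])
        return build_step_payload("invalidate", target["key"])

    # 2. No stale reads, but something is old: refresh the oldest item softly.
    old = [it for it in items if it["age"] >= stale_age]
    if old:
        target = max(old, key=lambda it: it["age"])
        return build_step_payload("refresh", target["key"])

    # 3. Otherwise leave the cache alone; the step still has to name one key.
    return build_step_payload("keep", items[0]["key"])
```

The returned dicts can be sent as JSON with any HTTP client, for example `requests.post(f"{ENV_URL}/step", json=choose_action(obs["items"]))`, assuming `obs` is the `observation` object from the previous response.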