Commit e75c8ce by Parv Pareek
Parent(s): 351158b

Commit message: done

Files changed:
- README.md +123 -42
- app.py +3 -37
- cache_invalidation_env.egg-info/PKG-INFO +2 -0
- cache_invalidation_env.egg-info/SOURCES.txt +5 -2
- cache_invalidation_env.egg-info/requires.txt +3 -0
- env/__init__.py +13 -1
- env/cache_environment.py +156 -0
- env/client.py +30 -0
- env/core.py +0 -91
- env/generator.py +25 -18
- env/grader.py +11 -26
- env/models.py +38 -11
- env/task_graders.py +35 -0
- env/tasks.py +26 -4
- inference.py +62 -39
- openenv.yaml +18 -17
- pyproject.toml +3 -0
- server/app.py +51 -4
- tests/conftest.py +10 -0
- tests/test_phase1.py +73 -0
- uv.lock +43 -0
README.md CHANGED

````diff
@@ -15,54 +15,139 @@ pinned: false
 
 **Why it matters:** Cache invalidation is a daily systems tradeoff: act too often and you burn CPU and churn storage; act too late and users see stale data. This env turns that into a **short episode** an agent can be scored on.
 
-**Our approach:**
+**Our approach:** Several cache **items** per episode with hidden staleness (TTL, update rate). The API exposes only **observable** fields (`age`, `access_count`, `last_result` as hit/stale with noise). The agent picks **one action per step** for one key: `invalidate`, `refresh`, or `keep`. Step rewards give **partial credit**; at episode end a **programmatic grader** sets **`final_score` in [0.0, 1.0]**.
 
-**Tasks:**
+**Tasks:** **easy → medium → hard** — more items and higher volatility; each task registers a dedicated **agent grader** (`env/task_graders.py`) and is listed in `openenv.yaml` and **`GET /tasks`**.
 
 ---
 
-##
-
-
-
-
-
+## OpenEnv spec compliance
+
+- **Typed models:** `env/models.py` — `CacheAction`, `CacheObservation`, `CacheState` (Pydantic, `openenv.core.env_server` bases).
+- **Environment:** `env/cache_environment.py` — `CacheInvalidationEnvironment` implements `reset` / `step` / `state` / `get_metadata`.
+- **HTTP server:** `server/app.py` — `create_fastapi_app(...)` from `openenv-core` (singleton env instance for stateful HTTP), plus **`GET /tasks`** for task + grader discovery.
+- **Manifest:** `openenv.yaml` — `spec_version`, `tasks` (each with `grader: true`, `grader_callable`, `score_range`), `endpoints`, `app: server.app:app`, `port: 7860`.
+- **Client (WebSocket):** `env/client.py` — `CacheInvalidationEnvClient` for typed `EnvClient` usage.
+- **Shim:** `app.py` re-exports `app` for `uvicorn app:app`.
 
-**
+Standard routes include **`/reset`**, **`/step`**, **`/state`**, **`/schema`**, **`/metadata`**, **`/health`**, **`/openapi.json`**, **`/mcp`** (OpenEnv default).
+
+---
+
+## Action & observation
+
+**Action (POST `/step` body, OpenEnv wrapped form):**
+
+```json
+{
+  "action": {
+    "type": "invalidate",
+    "key": "item_0"
+  }
+}
+```
+
+`type` is one of: `invalidate`, `refresh`, `keep`. `key` must match an item in the current observation.
+
+**Reset (POST `/reset`):**
+
+```json
+{
+  "seed": 42,
+  "task_id": "easy"
+}
+```
+
+Use `task_id` or `task_name` with `easy` | `medium` | `hard`. Omit both to sample a task. `seed` makes generation reproducible.
+
+**Response shape (reset & step):**
+
+```json
+{
+  "observation": {
+    "items": [...],
+    "step": 0,
+    "task_id": "easy",
+    "final_score": null,
+    "done": false
+  },
+  "reward": 0.0,
+  "done": false
+}
+```
+
+When `done` is `true`, `observation.final_score` is the episode grader output in **[0.0, 1.0]**.
+
+---
+
+## Tasks and graders
+
+- **Registry:** `env/task_graders.py` — `TASK_AGENT_GRADERS` maps `easy` / `medium` / `hard` to distinct callables (same rubric; difficulty comes from env dynamics).
+- **Discovery:** `GET /tasks` returns `tasks`, `graders`, and `grader_registry` for automated validation.
+- **Episode grader:** `env/grader.py` — `evaluate_episode` (freshness, unnecessary invalidations, oscillation).
+
+---
+
+## Setup & run
+
+**Install (dev):**
+
+```bash
+uv sync --extra dev
+```
+
+**Local server:**
+
+```bash
+uv run server
+# or
+uvicorn app:app --host 0.0.0.0 --port 7860
+```
+
+**Health check:**
 
 ```bash
 curl -s -o /dev/null -w '%{http_code}\n' -X POST \
   -H 'Content-Type: application/json' -d '{}' \
-'
+  'http://127.0.0.1:7860/reset'
 ```
 
 Expect `200`.
 
-**
+**Docker:** `docker build -t cache-env .` then run with the same `CMD` as in the `Dockerfile` (`uvicorn app:app`, port **7860**).
 
 ---
 
 ## Baseline inference (`inference.py`)
 
-- Uses
--
--
-
-Run:
+- Uses **OpenEnv HTTP** wire format: wrapped `action`, `observation` in responses.
+- **Reproducibility:** `EPISODE_SEED` (default `42`) and `TASK_ID` (default `easy`).
+- **All three tasks:** `RUN_ALL_TASKS=1` runs `easy`, then `medium`, then `hard` with the same seed (fast on CPU; well under 20 minutes).
+- Optional LLM path: set `HF_TOKEN`, `API_BASE_URL`, `MODEL_NAME`; otherwise the **heuristic** policy runs (no API key required).
 
 ```bash
-export
-export
-export
+export ENV_URL='http://127.0.0.1:7860'   # or your Space https://....hf.space
+export EPISODE_SEED=42
+export TASK_ID=easy
 python inference.py
+
+# Phase-1 style: one process, three tasks
+RUN_ALL_TASKS=1 python inference.py
 ```
 
 ---
 
-##
+## Tests (Phase 1 checks)
+
+```bash
+uv run pytest tests/ -q
+```
 
-
+Covers: `GET /tasks` (≥3 tasks with graders), grader outputs in [0,1], OpenEnv reset/step JSON shape, reproducible seed, full episode `final_score`.
+
+---
+
+## Validation (pre-submission)
 
 ```bash
 openenv validate
@@ -72,35 +157,31 @@ docker build .
 
 ---
 
-## Repository layout
+## Repository layout
 
 | Path | Purpose |
 |------|---------|
-| `
-| `env/` | Environment
-| `
-| `
-| `
-| `
+| `env/models.py` | Typed Action / Observation / State |
+| `env/cache_environment.py` | `Environment` implementation |
+| `env/grader.py` | Step rewards + episode `evaluate_episode` |
+| `env/task_graders.py` | **Three named agent graders** (registry) |
+| `env/tasks.py` | Task configs + `TASK_MANIFEST` |
+| `env/client.py` | Typed WebSocket `EnvClient` |
+| `server/app.py` | `create_fastapi_app` + `/tasks` |
+| `app.py` | Uvicorn entry shim |
+| `inference.py` | Baseline + `[START]`/`[STEP]`/`[END]` logs |
+| `openenv.yaml` | Full OpenEnv manifest |
+| `tests/` | Phase 1 pytest |
 
 ---
 
-## Scoring
+## Scoring
 
-- **Per-step reward:** Shaped
-- **
+- **Per-step `reward`:** Shaped (can be negative mid-episode).
+- **`final_score`:** In **[0.0, 1.0]** when `done`; combines correctness, unnecessary invalidations, and action stability.
 
 ---
 
-##
-
-| Criterion | Status |
-|-----------|--------|
-| Real-world task (not a toy game) | Cache invalidation under uncertainty |
-| `reset` / `step` / `state` | Implemented |
-| `openenv.yaml` | Present |
-| 3 tasks + grader | `easy` / `medium` / `hard` |
-| Meaningful rewards | Dense step reward + episode score in [0, 1] |
-| Baseline | `inference.py` + OpenAI client + stdout format |
+## Resource notes
 
-
+Inference and the env server are lightweight (short episodes, small JSON). Suitable for **2 vCPU / 8 GiB**; keep `RUN_ALL_TASKS` episodes bounded (fixed 10 steps per episode × 3 tasks).
````

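The README changes above describe the wire format: a wrapped `action` for `/step`, `seed` plus `task_id` for `/reset`, and top-level `observation`, `reward` and `done` in every response. The standard library alone is enough to drive that format. What follows is a minimal sketch, not the committed `inference.py`. The helper names, the `max_age` threshold and the invalidate/refresh/keep heuristic are hypothetical. `ENV_URL` defaults to the local address the README uses. The pure helpers run on their own; `run_episode` needs a server listening.

```python
from __future__ import annotations

import json
import os
import urllib.request
from typing import Any

# Assumption: same default address as the README's local server.
ENV_URL = os.environ.get("ENV_URL", "http://127.0.0.1:7860")
ACTION_TYPES = ("invalidate", "refresh", "keep")


def reset_body(seed: int | None = 42, task_id: str | None = "easy") -> dict[str, Any]:
    """Body for POST /reset; pass task_id=None to let the server sample a task."""
    body: dict[str, Any] = {}
    if seed is not None:
        body["seed"] = seed
    if task_id is not None:
        body["task_id"] = task_id
    return body


def step_body(action_type: str, key: str) -> dict[str, Any]:
    """Body for POST /step in the wrapped form {"action": {...}}."""
    if action_type not in ACTION_TYPES:
        raise ValueError(f"unknown action type: {action_type!r}")
    return {"action": {"type": action_type, "key": key}}


def choose_action(observation: dict[str, Any], max_age: int = 3) -> dict[str, Any]:
    """Hypothetical heuristic, not the repository's policy.

    Invalidate the first key reported "stale"; otherwise refresh the oldest
    item if its age exceeds max_age, else keep it.
    """
    items = observation["items"]
    for item in items:
        if item["last_result"] == "stale":
            return step_body("invalidate", item["key"])
    oldest = max(items, key=lambda it: it["age"])
    action_type = "refresh" if oldest["age"] > max_age else "keep"
    return step_body(action_type, oldest["key"])


def post_json(path: str, body: dict[str, Any]) -> dict[str, Any]:
    """POST a JSON body to the env server and decode the JSON reply."""
    req = urllib.request.Request(
        ENV_URL.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_episode(seed: int = 42, task_id: str = "easy", max_steps: int = 50) -> float | None:
    """Play one episode against a running server and return observation.final_score."""
    result = post_json("/reset", reset_body(seed, task_id))
    for _ in range(max_steps):  # guard in case the server never reports done
        if result["done"]:
            break
        result = post_json("/step", choose_action(result["observation"]))
    return result["observation"].get("final_score")
```

Because the policy only ever picks keys taken from the current observation, it never triggers the environment's unknown-key branch, which ends the episode with a reward of -1.0.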
app.py CHANGED

```diff
@@ -1,39 +1,5 @@
-
-from pydantic import BaseModel, ConfigDict
-from env.core import CacheEnv
-from env.tasks import TASK_MANIFEST
+"""Shim for `uvicorn app:app` (Docker / local one-liners)."""
 
-app
-env = CacheEnv()
+from server.app import app
 
-
-class ResetBody(BaseModel):
-    model_config = ConfigDict(extra="ignore")
-    task_id: str | None = None
-    task_name: str | None = None
-
-
-@app.post("/reset")
-def reset(body: ResetBody = Body(default_factory=ResetBody)):
-    task_key = body.task_id or body.task_name
-    state = env.reset(task_id=task_key)
-    return {
-        "state": state,
-        "task_id": state.get("task_id"),
-    }
-
-
-@app.get("/tasks")
-def list_tasks():
-    """Hub validators use this to discover tasks that expose episode grading (final_score)."""
-    return {"tasks": TASK_MANIFEST}
-
-
-@app.post("/step")
-def step(action: dict):
-    return env.step(action)
-
-
-@app.get("/state")
-def state():
-    return env.get_state()
+__all__ = ["app"]
```

cache_invalidation_env.egg-info/PKG-INFO CHANGED

```diff
@@ -10,3 +10,5 @@ Requires-Dist: pydantic>=2.0.0
 Requires-Dist: requests>=2.28.0
 Requires-Dist: openai>=1.0.0
 Requires-Dist: python-dotenv>=1.0.0
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0; extra == "dev"
```

cache_invalidation_env.egg-info/SOURCES.txt CHANGED

```diff
@@ -7,10 +7,13 @@ cache_invalidation_env.egg-info/entry_points.txt
 cache_invalidation_env.egg-info/requires.txt
 cache_invalidation_env.egg-info/top_level.txt
 env/__init__.py
-env/
+env/cache_environment.py
+env/client.py
 env/generator.py
 env/grader.py
 env/models.py
+env/task_graders.py
 env/tasks.py
 server/__init__.py
-server/app.py
+server/app.py
+tests/test_phase1.py
```

cache_invalidation_env.egg-info/requires.txt CHANGED

```diff
@@ -5,3 +5,6 @@ pydantic>=2.0.0
 requests>=2.28.0
 openai>=1.0.0
 python-dotenv>=1.0.0
+
+[dev]
+pytest>=8.0
```

env/__init__.py CHANGED

```diff
@@ -1 +1,13 @@
-
+"""Cache invalidation OpenEnv package."""
+
+from env.cache_environment import CacheInvalidationEnvironment
+from env.client import CacheInvalidationEnvClient
+from env.models import CacheAction, CacheObservation, CacheState
+
+__all__ = [
+    "CacheAction",
+    "CacheObservation",
+    "CacheState",
+    "CacheInvalidationEnvironment",
+    "CacheInvalidationEnvClient",
+]
```

env/cache_environment.py ADDED

@@ -0,0 +1,156 @@

```python
"""OpenEnv Environment: cache invalidation under partial observability."""

from __future__ import annotations

import random
from typing import Any, Optional

from openenv.core.env_server import Environment
from openenv.core.env_server.types import EnvironmentMetadata

from env.generator import generate_env
from env.grader import compute_step_reward, evaluate_episode
from env.models import CacheAction, CacheItem, CacheObservation, CacheState
from env.tasks import sample_task


class CacheInvalidationEnvironment(Environment[CacheAction, CacheObservation, CacheState]):
    """Stateful cache control: invalidate, refresh, or keep per step (one key)."""

    SUPPORTS_CONCURRENT_SESSIONS = False

    def __init__(self) -> None:
        super().__init__()
        self._rng: random.Random | type[random] = random
        self.history: list[dict[str, Any]] = []
        self.task_id: str = "easy"
        self.hidden: list[dict[str, Any]] = []
        self.current_time: int = 0
        self._items: list[dict[str, Any]] = []
        self._step: int = 0

    def reset(
        self,
        seed: Optional[int] = None,
        episode_id: Optional[str] = None,
        task_id: Optional[str] = None,
        task_name: Optional[str] = None,
        **kwargs: Any,
    ) -> CacheObservation:
        tid = task_id or task_name or kwargs.get("task_id") or kwargs.get("task_name")
        self._reset_rubric()

        if seed is not None:
            self._rng = random.Random(int(seed))
        else:
            self._rng = random

        self.history = []
        if tid in ("easy", "medium", "hard"):
            self.task_id = tid
        else:
            self.task_id = sample_task(self._rng)

        items, hidden, current_time = generate_env(self.task_id, rng=self._rng)
        self._items = items
        self.hidden = hidden
        self.current_time = current_time
        self._step = 0

        return self._observation(
            reward=None,
            done=False,
            final_score=None,
        )

    def step(
        self,
        action: CacheAction,
        timeout_s: Optional[float] = None,
        **kwargs: Any,
    ) -> CacheObservation:
        key = action.key
        action_type = action.type

        item_index = next(
            (i for i, x in enumerate(self._items) if x["key"] == key), None
        )

        if item_index is None:
            return self._observation(reward=-1.0, done=True, final_score=None)

        hidden = self.hidden[item_index]
        item = self._items[item_index]

        age = self.current_time - hidden["last_update"]
        is_stale = age > hidden["base_ttl"] or self._rng.random() < hidden["update_freq"]

        self.history.append({"action": action_type, "is_stale": is_stale})

        reward = compute_step_reward(action_type, is_stale)

        if action_type == "invalidate":
            hidden["last_update"] = self.current_time
            item["age"] = 0

        elif action_type == "refresh":
            hidden["last_update"] = self.current_time - 1
            item["age"] = 1

        elif action_type == "keep":
            item["age"] += 1

        item["last_result"] = (
            "stale"
            if is_stale and self._rng.random() < 0.7
            else "hit"
            if not is_stale or self._rng.random() < 0.9
            else "stale"
        )

        self.current_time += 1
        self._step += 1

        done = self._step >= 10
        final_score = evaluate_episode(self.history) if done else None

        return self._observation(
            reward=reward,
            done=done,
            final_score=final_score,
        )

    @property
    def state(self) -> CacheState:
        return CacheState(
            episode_id=None,
            step_count=self._step,
            task_id=self.task_id,
            items=[CacheItem.model_validate(x) for x in self._items],
        )

    def get_metadata(self) -> EnvironmentMetadata:
        return EnvironmentMetadata(
            name="cache_invalidation_env",
            description=(
                "Cache invalidation under uncertainty: choose invalidate, refresh, or keep "
                "per step from noisy hit/stale observations."
            ),
            version="1.0.0",
        )

    def _observation(
        self,
        *,
        reward: float | None,
        done: bool,
        final_score: float | None,
    ) -> CacheObservation:
        return CacheObservation(
            done=done,
            reward=reward,
            items=[CacheItem.model_validate(x) for x in self._items],
            step=self._step,
            task_id=self.task_id,
            final_score=final_score,
        )
```

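The nested conditional that sets `last_result` in `step` is easy to misread, so it helps to evaluate it branch by branch:

- **Fresh item:** it always reports `"hit"`. The first test fails immediately, and `not is_stale` satisfies the second.
- **Stale item:** it reports `"stale"` with probability 0.7 + 0.3 × 0.1 = 0.73.

So a `"stale"` report is conclusive, while a `"hit"` report is only weak evidence of freshness.

Note also that `is_stale` is computed before the action is applied. `last_result` therefore describes the item as it was when the agent acted, not its state after an `invalidate` or `refresh`.

The sketch below does three things:

- copies the expression into a standalone function;
- checks the 0.73 figure empirically;
- adds a one-step Bayes update.

The prior passed to `posterior_stale` is an assumed belief, since the true TTL and `update_freq` are hidden from the agent. The helper names are hypothetical.

```python
import random

# Closed-form likelihoods implied by the conditional expression in step().
P_STALE_OBS_GIVEN_STALE = 0.7 + 0.3 * 0.1  # 0.73
P_STALE_OBS_GIVEN_FRESH = 0.0  # fresh items always report "hit"


def noisy_result(is_stale: bool, rng: random.Random) -> str:
    """Same expression as CacheInvalidationEnvironment.step uses for last_result."""
    return (
        "stale"
        if is_stale and rng.random() < 0.7
        else "hit"
        if not is_stale or rng.random() < 0.9
        else "stale"
    )


def posterior_stale(prior: float, observed: str) -> float:
    """Bayes update of P(is_stale) after one observation; prior is an assumed belief."""
    if observed == "stale":
        like_stale, like_fresh = P_STALE_OBS_GIVEN_STALE, P_STALE_OBS_GIVEN_FRESH
    else:
        like_stale, like_fresh = 1 - P_STALE_OBS_GIVEN_STALE, 1 - P_STALE_OBS_GIVEN_FRESH
    numerator = prior * like_stale
    denominator = numerator + (1 - prior) * like_fresh
    return numerator / denominator if denominator > 0 else prior


def empirical_stale_rate(is_stale: bool, n: int = 100_000, seed: int = 0) -> float:
    """Fraction of "stale" reports over n simulated observations."""
    rng = random.Random(seed)
    return sum(noisy_result(is_stale, rng) == "stale" for _ in range(n)) / n
```

With a 50/50 prior, one `"hit"` lowers the belief that the item is stale to about 0.21, and one `"stale"` raises it to 1.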
env/client.py ADDED

@@ -0,0 +1,30 @@

```python
"""Typed WebSocket client for CacheInvalidationEnvironment."""

from __future__ import annotations

from typing import Any, Dict

from openenv.core.client_types import StepResult
from openenv.core.env_client import EnvClient

from env.models import CacheAction, CacheObservation, CacheState


class CacheInvalidationEnvClient(EnvClient[CacheAction, CacheObservation, CacheState]):
    def _step_payload(self, action: CacheAction | Dict[str, Any]) -> Dict[str, Any]:
        if isinstance(action, CacheAction):
            return action.model_dump()
        return CacheAction.model_validate(action).model_dump()

    def _parse_result(self, payload: Dict[str, Any]) -> StepResult[CacheObservation]:
        obs_inner = payload.get("observation", {})
        return StepResult(
            observation=CacheObservation.model_validate(
                {**obs_inner, "reward": payload.get("reward"), "done": payload.get("done", False)}
            ),
            reward=payload.get("reward"),
            done=payload.get("done", False),
        )

    def _parse_state(self, payload: Dict[str, Any]) -> CacheState:
        return CacheState.model_validate(payload)
```

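`_parse_result` folds the top-level `reward` and `done` of each response into the observation before validating it. Those two values therefore override any same-named keys inside `observation`; the README's response shape carries `done` in both places. The following dictionary-only sketch shows that merge without the `openenv` dependency. The helper name is hypothetical, and the example payload is contrived so the override is visible.

```python
from typing import Any


def flatten_step_payload(payload: dict[str, Any]) -> dict[str, Any]:
    """Build the dict that _parse_result passes to CacheObservation.model_validate."""
    obs_inner = payload.get("observation", {})
    # In a dict display later keys win, so top-level reward/done override inner ones.
    return {**obs_inner, "reward": payload.get("reward"), "done": payload.get("done", False)}


# Contrived payload: the inner "done" disagrees with the top-level one.
example = {
    "observation": {"items": [], "step": 10, "task_id": "easy", "final_score": 0.82, "done": False},
    "reward": 1.0,
    "done": True,
}
flat = flatten_step_payload(example)
```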
env/core.py DELETED

@@ -1,91 +0,0 @@

```python
import random
from env.generator import generate_env
from env.grader import compute_step_reward
from env.tasks import sample_task
class CacheEnv:

    def __init__(self):
        self.reset()

    def reset(self, task_id=None):
        self.history = []
        if task_id in ("easy", "medium", "hard"):
            self.task_id = task_id
        else:
            self.task_id = sample_task()
        items, hidden, current_time = generate_env(self.task_id)

        self.state = {
            "items": items,
            "step": 0,
            "task_id": self.task_id
        }

        self.hidden = hidden
        self.current_time = current_time
        self.total_reward = 0

        return self.state

    def step(self, action):
        key = action.get("key")
        action_type = action.get("type")

        item_index = next((i for i, x in enumerate(self.state["items"]) if x["key"] == key), None)

        if item_index is None:
            return {"state": self.state, "reward": -1.0, "done": True}

        hidden = self.hidden[item_index]
        item = self.state["items"][item_index]

        # hidden staleness
        age = self.current_time - hidden["last_update"]
        is_stale = age > hidden["base_ttl"] or random.random() < hidden["update_freq"]

        self.history.append({
            "action": action_type,
            "is_stale": is_stale
        })

        reward = compute_step_reward(action_type, is_stale)
        self.total_reward += reward

        # apply action
        if action_type == "invalidate":
            hidden["last_update"] = self.current_time
            item["age"] = 0

        elif action_type == "refresh":
            hidden["last_update"] = self.current_time - 1
            item["age"] = 1

        elif action_type == "keep":
            item["age"] += 1

        # noisy observation
        item["last_result"] = (
            "stale" if is_stale and random.random() < 0.7
            else "hit" if not is_stale or random.random() < 0.9
            else "stale"
        )

        self.current_time += 1
        self.state["step"] += 1

        done = self.state["step"] >= 10
        from env.grader import evaluate_episode

        if done:
            final_score = evaluate_episode(self.history)
        else:
            final_score = None
        return {
            "state": self.state,
            "reward": reward,
            "done": done,
            "task_id": self.task_id,
            "final_score": final_score
        }
    def get_state(self):
        return self.state
```

env/generator.py CHANGED

```diff
@@ -1,7 +1,10 @@
 import random
 from env.tasks import get_task
 
-
+
+def generate_env(task_id, rng=None):
+    """Build initial items and hidden dynamics. Use *rng* for reproducible episodes."""
+    r = rng if rng is not None else random
     config = get_task(task_id)
 
     state_items = []
@@ -10,27 +13,31 @@ def generate_env(task_id):
     current_time = 0
 
     for i in range(config["num_items"]):
-        base_ttl =
-        update_freq =
-        last_update =
+        base_ttl = r.randint(3, 8)
+        update_freq = r.uniform(0.1, config["volatility"])
+        last_update = r.randint(0, 3)
 
         age = current_time - last_update
 
-        is_stale = age > base_ttl or
+        is_stale = age > base_ttl or r.random() < update_freq
 
-        last_result = "stale" if is_stale and
+        last_result = "stale" if is_stale and r.random() < 0.7 else "hit"
 
-        state_items.append(
-
-
-
-
-
+        state_items.append(
+            {
+                "key": f"item_{i}",
+                "age": max(age, 0),
+                "access_count": r.randint(1, 20),
+                "last_result": last_result,
+            }
+        )
 
-        hidden_items.append(
-
-
-
-
+        hidden_items.append(
+            {
+                "base_ttl": base_ttl,
+                "update_freq": update_freq,
+                "last_update": last_update,
+            }
+        )
 
-    return state_items, hidden_items, current_time
+    return state_items, hidden_items, current_time
```

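`generate_env` now draws every random number from `r`. Passing a fresh `random.Random(seed)` therefore makes an episode's initial items and hidden dynamics repeatable. The module-level `random` fallback, used when `reset` gets no seed, offers no such guarantee.

The sketch below is a standalone replica of the generator in this diff, so it runs without the package. Two parts of it are assumptions:

- **`TASKS` table:** hypothetical. The real `num_items` and `volatility` values live in `env/tasks.py`, which this excerpt does not show.
- **`hidden_items = []` initialization:** assumed. It falls in lines the diff omits.

One property follows directly from the committed code. `current_time` starts at 0 and `last_update` is drawn from 0..3, so `age` is never positive at generation. Every item starts with `age` 0, and initial staleness comes only from the `update_freq` draw.

```python
from __future__ import annotations

import random
from typing import Any

# Hypothetical task table; the committed values are in env/tasks.py.
TASKS = {
    "easy": {"num_items": 3, "volatility": 0.3},
    "medium": {"num_items": 5, "volatility": 0.5},
    "hard": {"num_items": 8, "volatility": 0.8},
}


def generate_env(task_id: str, rng: random.Random | None = None):
    """Replica of the committed generator: same draws, in the same order."""
    r = rng if rng is not None else random
    config = TASKS[task_id]
    state_items: list[dict[str, Any]] = []
    hidden_items: list[dict[str, Any]] = []  # assumed initialization
    current_time = 0
    for i in range(config["num_items"]):
        base_ttl = r.randint(3, 8)
        update_freq = r.uniform(0.1, config["volatility"])
        last_update = r.randint(0, 3)
        age = current_time - last_update  # always <= 0 here
        is_stale = age > base_ttl or r.random() < update_freq
        last_result = "stale" if is_stale and r.random() < 0.7 else "hit"
        state_items.append(
            {
                "key": f"item_{i}",
                "age": max(age, 0),
                "access_count": r.randint(1, 20),
                "last_result": last_result,
            }
        )
        hidden_items.append(
            {"base_ttl": base_ttl, "update_freq": update_freq, "last_update": last_update}
        )
    return state_items, hidden_items, current_time


# Two generators seeded identically produce identical episodes.
first = generate_env("medium", random.Random(42))
second = generate_env("medium", random.Random(42))
```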
env/grader.py CHANGED

```diff
@@ -1,9 +1,6 @@
-
-
-
-
-def clamp_strict_unit_interval(x: float) -> float:
-    return float(min(1.0 - _SCORE_EPS, max(_SCORE_EPS, x)))
+def clamp_unit_interval(x: float) -> float:
+    """Clamp to [0.0, 1.0] (Phase 1 / rubric)."""
+    return max(0.0, min(1.0, float(x)))
 
 
 def compute_step_reward(action_type, is_stale):
@@ -20,11 +17,10 @@ def compute_step_reward(action_type, is_stale):
 
     return reward
 
+
 def normalize_episode_score(total_reward, max_steps=10):
-    # expected max ≈ 1.0 per step
     score = total_reward / max_steps
-    return
-
+    return clamp_unit_interval(score)
 
 
 def evaluate_episode(history):
@@ -35,11 +31,10 @@ def evaluate_episode(history):
         "is_stale": bool
     }
     """
-
     total_steps = len(history)
 
     if total_steps == 0:
-        return
+        return clamp_unit_interval(0.0)
 
     correct_decisions = 0
     unnecessary_invalidations = 0
@@ -51,33 +46,23 @@ def evaluate_episode(history):
         action = step["action"]
         is_stale = step["is_stale"]
 
-
-
-
+        if (is_stale and action in ["invalidate", "refresh"]) or (
+            not is_stale and action == "keep"
+        ):
             correct_decisions += 1
 
-        # ❌ unnecessary invalidation
         if action == "invalidate" and not is_stale:
             unnecessary_invalidations += 1
 
-        # ❌ oscillation (flip behavior)
         if last_action and last_action != action:
             oscillations += 1
 
         last_action = action
 
-    # ---- normalize metrics ----
     freshness = correct_decisions / total_steps
-
     efficiency = 1 - (unnecessary_invalidations / total_steps)
-
     stability = 1 - (oscillations / total_steps)
 
-
-    score = (
-        0.5 * freshness +
-        0.3 * efficiency +
-        0.2 * stability
-    )
+    score = 0.5 * freshness + 0.3 * efficiency + 0.2 * stability
 
-    return
+    return clamp_unit_interval(score)
```

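The committed rubric combines three ratios over the episode, weighted 0.5 / 0.3 / 0.2:

- **`freshness`:** the share of correct decisions.
- **`efficiency`:** 1 minus the share of invalidations of fresh items.
- **`stability`:** 1 minus the share of steps where the action changed.

The standalone replica below reproduces the logic visible in this diff. The initialization of `oscillations` and `last_action` falls in lines the diff does not show, so `0` and `None` are assumptions. For a non-empty history each ratio already lies in [0, 1], so the final clamp only acts as a safeguard.

```python
from typing import Any


def clamp_unit_interval(x: float) -> float:
    return max(0.0, min(1.0, float(x)))


def evaluate_episode(history: list[dict[str, Any]]) -> float:
    """Replica of env/grader.py's rubric (initial oscillations/last_action assumed)."""
    total = len(history)
    if total == 0:
        return clamp_unit_interval(0.0)
    correct = unnecessary = oscillations = 0
    last_action = None
    for step in history:
        action, is_stale = step["action"], step["is_stale"]
        # Correct: act on stale items, keep fresh ones.
        if (is_stale and action in ("invalidate", "refresh")) or (not is_stale and action == "keep"):
            correct += 1
        # Wasted work: invalidating an item that was still fresh.
        if action == "invalidate" and not is_stale:
            unnecessary += 1
        # Any change of action from the previous step counts, even a correct one.
        if last_action and last_action != action:
            oscillations += 1
        last_action = action
    freshness = correct / total
    efficiency = 1 - unnecessary / total
    stability = 1 - oscillations / total
    return clamp_unit_interval(0.5 * freshness + 0.3 * efficiency + 0.2 * stability)


always_keep_fresh = [{"action": "keep", "is_stale": False}] * 10
always_invalidate_fresh = [{"action": "invalidate", "is_stale": False}] * 10
alternating_correct = [
    {"action": "invalidate", "is_stale": True},
    {"action": "keep", "is_stale": False},
] * 5

print(round(evaluate_episode(always_keep_fresh), 4))        # 1.0
print(round(evaluate_episode(always_invalidate_fresh), 4))  # 0.2
print(round(evaluate_episode(alternating_correct), 4))      # 0.82
```

The third case is worth noticing. Every decision in the alternating history is correct, yet it scores 0.82, because each of its nine action switches counts as an oscillation whether or not the switch was warranted.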
env/models.py
CHANGED
@@ -1,16 +1,43 @@
-
-
+"""Typed OpenEnv contracts: Action, Observation, State."""
+
+from __future__ import annotations
+
+from typing import Literal
+
+from openenv.core.env_server import Action, Observation, State
+from pydantic import BaseModel, ConfigDict, Field
+

 class CacheItem(BaseModel):
+    model_config = ConfigDict(extra="allow")
+
     key: str
-    age: int
-    access_count: int
-    last_result: str
+    age: int = Field(ge=0)
+    access_count: int = Field(ge=0)
+    last_result: str
+
+
+class CacheAction(Action):
+    """Per-step decision for one cache key."""
+
+    type: Literal["invalidate", "refresh", "keep"]
+    key: str
+
+
+class CacheObservation(Observation):
+    """What the agent sees (no hidden TTL / true staleness)."""
+
+    items: list[CacheItem] = Field(default_factory=list)
+    step: int = Field(default=0, ge=0)
+    task_id: str = ""
+    final_score: float | None = Field(
+        default=None,
+        description="Episode grader output in [0,1] when done=True; else None.",
+    )
+

-class
-
-    step: int
+class CacheState(State):
+    """Server-visible state (no hidden dynamics)."""

-
-
-    key: str
+    task_id: str = ""
+    items: list[CacheItem] = Field(default_factory=list)

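`CacheAction` restricts `type` to three literals, and the system prompt in `inference.py` additionally requires `key` to name an item in the observation; the model as shown does not check that second rule. A minimal stdlib sketch of a client-side check for both rules follows. `validate_action` and the sample observation are hypothetical, and the server presumably relies on the Pydantic models rather than anything like this function.

```python
from typing import Any, Dict, Optional, Tuple

# Mirrors Literal["invalidate", "refresh", "keep"] on CacheAction.type.
ALLOWED_TYPES: Tuple[str, ...] = ("invalidate", "refresh", "keep")


def validate_action(action: Any, obs: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return a clean {"type", "key"} dict, or None if the action is unusable."""
    if not isinstance(action, dict):
        return None
    kind = action.get("type")
    key = action.get("key")
    if kind not in ALLOWED_TYPES or not isinstance(key, str):
        return None
    # The prompt requires key to name an observed item; the model does not check this.
    valid_keys = {item["key"] for item in obs.get("items", [])}
    if key not in valid_keys:
        return None
    return {"type": kind, "key": key}


# Hypothetical observation in the CacheObservation shape.
obs = {
    "items": [{"key": "user:1", "age": 3, "access_count": 5, "last_result": "hit"}],
    "step": 0,
    "task_id": "easy",
}
print(validate_action({"type": "refresh", "key": "user:1"}, obs))  # {'type': 'refresh', 'key': 'user:1'}
print(validate_action({"type": "purge", "key": "user:1"}, obs))  # None
```

Returning `None` lets a caller fall back to a heuristic, the same pattern `inference.py` uses when `llm_action` fails.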
env/task_graders.py
ADDED
@@ -0,0 +1,35 @@
+"""
+Registered agent graders — one enabled grader per task (easy / medium / hard).
+
+Automated checks count tasks that declare a grader and can run episode scoring.
+All three share the same history-based rubric; difficulty is enforced by the
+environment dynamics (items + volatility), not by different formulas.
+"""
+
+from __future__ import annotations
+
+from typing import Any, Callable, Dict, List
+
+from env.grader import evaluate_episode
+
+History = List[Dict[str, Any]]
+
+
+def easy_agent_grader(history: History) -> float:
+    return evaluate_episode(history)
+
+
+def medium_agent_grader(history: History) -> float:
+    return evaluate_episode(history)
+
+
+def hard_agent_grader(history: History) -> float:
+    return evaluate_episode(history)
+
+
+# Explicit registry (imported by server /tasks and static analysis)
+TASK_AGENT_GRADERS: Dict[str, Callable[[History], float]] = {
+    "easy": easy_agent_grader,
+    "medium": medium_agent_grader,
+    "hard": hard_agent_grader,
+}

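The registry maps each task id to a callable with the same `History -> float` signature, so a caller can grade any task with one dictionary lookup. A minimal sketch of that dispatch, with a hypothetical `grade_task` helper and a trivial stand-in rubric in place of `env.grader.evaluate_episode`:

```python
from typing import Any, Callable, Dict, List

History = List[Dict[str, Any]]


def stand_in_rubric(history: History) -> float:
    """Placeholder for evaluate_episode: fraction of 'keep' actions on fresh items."""
    if not history:
        return 0.0
    good = sum(1 for s in history if s["action"] == "keep" and not s["is_stale"])
    return good / len(history)


# Same shape as TASK_AGENT_GRADERS: one callable per task id.
GRADERS: Dict[str, Callable[[History], float]] = {
    "easy": stand_in_rubric,
    "medium": stand_in_rubric,
    "hard": stand_in_rubric,
}


def grade_task(task_id: str, history: History) -> float:
    """Look up the task's grader and fail loudly on unknown task ids."""
    try:
        grader = GRADERS[task_id]
    except KeyError:
        raise ValueError(f"no grader registered for task {task_id!r}") from None
    return grader(history)


history = [{"action": "keep", "is_stale": False}, {"action": "refresh", "is_stale": True}]
print(grade_task("hard", history))  # 0.5
```

Because all three entries point at the same rubric, any difference in scores between tasks has to come from the environment dynamics, which is what the module docstring states.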
env/tasks.py
CHANGED
@@ -1,6 +1,8 @@
 import random

-
+from env.task_graders import TASK_AGENT_GRADERS
+
+# Declared for GET /tasks + openenv.yaml (Phase 1 task/grader discovery).
 TASK_MANIFEST = [
     {
         "name": "easy",
@@ -10,6 +12,8 @@ TASK_MANIFEST = [
         "difficulty": "easy",
         "max_steps": 10,
         "grader": True,
+        "grader_kind": "programmatic",
+        "grader_callable": "env.task_graders:easy_agent_grader",
         "score_range": [0.0, 1.0],
     },
     {
@@ -20,6 +24,8 @@ TASK_MANIFEST = [
         "difficulty": "medium",
         "max_steps": 10,
         "grader": True,
+        "grader_kind": "programmatic",
+        "grader_callable": "env.task_graders:medium_agent_grader",
         "score_range": [0.0, 1.0],
     },
     {
@@ -30,6 +36,8 @@ TASK_MANIFEST = [
         "difficulty": "hard",
         "max_steps": 10,
         "grader": True,
+        "grader_kind": "programmatic",
+        "grader_callable": "env.task_graders:hard_agent_grader",
         "score_range": [0.0, 1.0],
     },
 ]
@@ -47,8 +55,22 @@ def get_task(task_id):
     else:
         return {
             "num_items": 3,
-            "volatility": 0.3
+            "volatility": 0.3,
         }

-
-
+
+def sample_task(rng=None):
+    r = rng if rng is not None else random
+    return r.choice(["easy", "medium", "hard"])
+
+
+def list_graders():
+    """Return task ids that have an enabled agent grader."""
+    return [
+        {
+            "task": name,
+            "grader_enabled": fn is not None,
+            "callable": getattr(fn, "__name__", str(fn)),
+        }
+        for name, fn in TASK_AGENT_GRADERS.items()
+    ]

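`sample_task` accepts an optional `rng`, so task selection becomes reproducible when a seeded `random.Random` is passed in. A minimal sketch, copying the function from the diff and driving it with two identically seeded generators:

```python
import random


def sample_task(rng=None):
    # Falls back to the module-level random generator when no rng is given.
    r = rng if rng is not None else random
    return r.choice(["easy", "medium", "hard"])


# Two generators with the same seed yield the same sequence of tasks.
a = random.Random(42)
b = random.Random(42)
seq_a = [sample_task(a) for _ in range(5)]
seq_b = [sample_task(b) for _ in range(5)]
print(seq_a == seq_b)  # True
```

Calling `sample_task()` without an `rng` draws from the shared global generator, so the result then depends on whatever else in the process has consumed it.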
inference.py
CHANGED
@@ -3,14 +3,13 @@ import os
 import sys
 import textwrap
 from pathlib import Path
-from typing import List, Optional
+from typing import Any, Dict, List, Optional

 import requests
 from openai import OpenAI

-from env.grader import
+from env.grader import clamp_unit_interval

-# Load .env from repo root so HF_TOKEN / API_BASE_URL work when you run: python inference.py
 try:
     from dotenv import load_dotenv

@@ -18,33 +17,36 @@ try:
 except ImportError:
     pass

-# ---- Mandatory env (see hackathon spec) ----
 API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
 API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
-# HF deprecated api-inference.huggingface.co (410); router is the supported OpenAI-compatible host.
-
 MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct"
-
-
+ENV_URL = os.getenv(
+    "ENV_URL",
+    "http://127.0.0.1:7860",
+).rstrip("/")
 BENCHMARK = "cache_invalidation_env"

+# Reproducibility (Phase 1 / baseline): fixed seed + task → deterministic heuristic run.
+EPISODE_SEED = int(os.getenv("EPISODE_SEED", "42"))
+TASK_ID = os.getenv("TASK_ID", "easy")
+
 if not API_KEY:
     print(
-        "WARNING: HF_TOKEN is not set. LLM calls will fail; the script will
-        "heuristic policy
+        "WARNING: HF_TOKEN is not set. LLM calls will fail; the script will use the "
+        "heuristic policy only.",
         file=sys.stderr,
     )

 client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY or "hf-invalid")

-MEMORY = {}
-LAST_USED = None
+MEMORY: Dict[str, Any] = {}
+LAST_USED: Optional[str] = None

 SYSTEM_PROMPT = textwrap.dedent(
     """
-    You are a cache invalidation agent. Given the environment
+    You are a cache invalidation agent. Given the environment observation (JSON), reply with exactly one JSON object
     on a single line, no markdown, with keys "type" and "key". type must be one of: invalidate, refresh, keep.
-    key must match one of the item keys in
+    key must match one of the item keys in observation["items"].
     """
 ).strip()

@@ -72,11 +74,11 @@ def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> No
     )


-def select_item(
+def select_item(obs: Dict[str, Any], step: int) -> Dict[str, Any]:
     global LAST_USED
-    items =
+    items = obs["items"]

-    def score(item):
+    def score(item: Dict[str, Any]) -> int:
         s = 0
         if item["last_result"] == "stale":
             s += 3
@@ -98,7 +100,7 @@ def select_item(state, step):
     return best


-def decide(item, step):
+def decide(item: Dict[str, Any], step: int) -> Dict[str, str]:
     key = item["key"]
     last_result = item["last_result"]
     age = item["age"]
@@ -123,8 +125,7 @@ def decide(item, step):
     return {"type": "keep", "key": key}


-def llm_action(
-    """Call HF OpenAI-compatible API; return None on any failure so caller can fall back."""
+def llm_action(obs: Dict[str, Any]) -> Optional[dict]:
     try:
         completion = client.chat.completions.create(
             model=MODEL_NAME,
@@ -133,7 +134,7 @@ def llm_action(state) -> Optional[dict]:
                 {
                     "role": "user",
                     "content": (
-                        f"
+                        f"Observation:\n{json.dumps(obs)}\n\n"
                         'Return JSON only: {"type": "...", "key": "..."}'
                     ),
                 },
@@ -156,7 +157,8 @@ def llm_action(state) -> Optional[dict]:
         return None


-def
+def run_episode(*, env_url: str, task_id: str, seed: int, use_llm: bool) -> None:
+    """One episode over OpenEnv HTTP API (wrapped action + observation)."""
     global LAST_USED
     LAST_USED = None
     MEMORY.clear()
@@ -165,26 +167,28 @@ def run() -> None:
     steps_taken = 0
     episode_score = 0.0
     success = False
+    score_from_env = False

     try:
-        score_from_env = False
         res = requests.post(
-            f"{
-            json={},
+            f"{env_url}/reset",
+            json={"seed": seed, "task_id": task_id},
             headers={"Content-Type": "application/json"},
             timeout=60,
         )
         res.raise_for_status()
         body = res.json()
-
-
+        obs = body.get("observation", body)
+        tid = str(obs.get("task_id", task_id))

-        log_start(task=
+        log_start(task=tid, env=BENCHMARK, model=MODEL_NAME)

         for step in range(1, 11):
-            item = select_item(
+            item = select_item(obs, step)

-            action =
+            action: Optional[dict] = None
+            if use_llm:
+                action = llm_action(obs)
             if action is None:
                 action = decide(item, step)

@@ -194,21 +198,22 @@ def run() -> None:
             }

             step_res = requests.post(
-                f"{
-                json=action,
+                f"{env_url}/step",
+                json={"action": action},
                 headers={"Content-Type": "application/json"},
                 timeout=60,
             )
             step_res.raise_for_status()
             data = step_res.json()

-            reward = float(data["reward"])
+            reward = float(data["reward"] if data["reward"] is not None else 0.0)
             done = bool(data["done"])
             rewards.append(reward)
             steps_taken = step

-
-
+            inner = data.get("observation", {})
+            if inner.get("final_score") is not None:
+                episode_score = float(inner["final_score"])
                 score_from_env = True

             log_step(
@@ -219,8 +224,7 @@ def run() -> None:
                 error=None,
             )

-
-
+            obs = inner
             if done:
                 break

@@ -229,13 +233,13 @@ def run() -> None:
             success = avg_r > 0.3
         if not score_from_env and rewards:
             avg_r = sum(rewards) / len(rewards)
-            episode_score =
+            episode_score = clamp_unit_interval((avg_r + 1.0) / 2.0)

     except Exception as exc:
         success = False
         print(f"[RUN] fatal: {exc}", file=sys.stderr)
     finally:
-        episode_score =
+        episode_score = clamp_unit_interval(episode_score)
         log_end(
             success=success,
             steps=steps_taken,
@@ -244,5 +248,24 @@ def run() -> None:
         )


+def run() -> None:
+    use_llm = bool(API_KEY and API_KEY != "hf-invalid")
+    if os.getenv("RUN_ALL_TASKS", "").lower() in ("1", "true", "yes"):
+        for tid in ("easy", "medium", "hard"):
+            run_episode(
+                env_url=ENV_URL,
+                task_id=tid,
+                seed=EPISODE_SEED,
+                use_llm=use_llm,
+            )
+        return
+    run_episode(
+        env_url=ENV_URL,
+        task_id=TASK_ID,
+        seed=EPISODE_SEED,
+        use_llm=use_llm,
+    )
+
+
 if __name__ == "__main__":
     run()

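When the server never reports `final_score`, the baseline falls back to `clamp_unit_interval((avg_r + 1.0) / 2.0)`. That affine map sends a mean reward of -1 to 0 and +1 to 1, which suggests per-step rewards lie in [-1, 1]; the diff does not show the body of `compute_step_reward`, so that range is an assumption. A small sketch with a hypothetical `fallback_episode_score` helper:

```python
from typing import List


def clamp_unit_interval(x: float) -> float:
    return max(0.0, min(1.0, float(x)))


def fallback_episode_score(rewards: List[float]) -> float:
    """Map the mean step reward (assumed to lie in [-1, 1]) onto [0, 1]."""
    if not rewards:
        # inference.py only runs the fallback when rewards exist, so the
        # initial episode_score of 0.0 survives in this case.
        return 0.0
    avg_r = sum(rewards) / len(rewards)
    return clamp_unit_interval((avg_r + 1.0) / 2.0)


print(fallback_episode_score([1.0, -1.0, 1.0, 1.0]))  # 0.75
```

The clamp matters only if a reward falls outside the assumed range; within it, the map is already bounded.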
openenv.yaml
CHANGED
@@ -1,8 +1,14 @@
+spec_version: 1
 name: cache_invalidation_env
 version: "1.0.0"
+type: space
+runtime: fastapi
+app: server.app:app
+port: 7860
 description: >
-
-
+  Cache invalidation under uncertainty: agents choose invalidate, refresh, or keep per step
+  from noisy hit/stale observations. Three difficulty tasks (easy → hard), each with a
+  programmatic episode grader (final_score in [0,1]).

 tasks:
   - name: easy
@@ -10,6 +16,8 @@ tasks:
     difficulty: easy
     max_steps: 10
     grader: true
+    grader_kind: programmatic
+    grader_callable: env.task_graders:easy_agent_grader
     score_range: [0.0, 1.0]

   - name: medium
@@ -17,31 +25,24 @@ tasks:
     difficulty: medium
     max_steps: 10
     grader: true
+    grader_kind: programmatic
+    grader_callable: env.task_graders:medium_agent_grader
     score_range: [0.0, 1.0]

   - name: hard
-    description: "Most items and high volatility; staleness signal
+    description: "Most items and high volatility; noisy staleness signal and harder tradeoffs."
     difficulty: hard
     max_steps: 10
     grader: true
+    grader_kind: programmatic
+    grader_callable: env.task_graders:hard_agent_grader
     score_range: [0.0, 1.0]

-actions:
-  type: object
-  properties:
-    type:
-      type: string
-    key:
-      type: string
-
-observations:
-  type: object
-
-reward:
-  type: float
-
 endpoints:
   reset: POST /reset
   step: POST /step
   state: GET /state
+  schema: GET /schema
+  metadata: GET /metadata
+  health: GET /health
   tasks: GET /tasks

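Each task in `openenv.yaml` (and in `TASK_MANIFEST`) names its grader as a `module:function` string, the same shape the `/tasks` route builds as `qualified_name` from `fn.__module__` and `fn.__name__`. A minimal sketch of resolving such a string with `importlib`; `resolve_callable` is hypothetical, and the diff does not show whether OpenEnv tooling loads graders this way. The demo resolves stdlib functions so it runs without the project installed.

```python
import importlib
from typing import Any, Callable


def resolve_callable(spec: str) -> Callable[..., Any]:
    """Turn 'package.module:attr' into the referenced callable."""
    module_name, sep, attr = spec.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    module = importlib.import_module(module_name)
    fn = getattr(module, attr)
    if not callable(fn):
        raise TypeError(f"{spec!r} does not name a callable")
    return fn


sqrt = resolve_callable("math:sqrt")
print(sqrt(16.0))  # 4.0

# Round trip with the qualified_name format used by GET /tasks.
dumps = resolve_callable("json:dumps")
print(f"{dumps.__module__}:{dumps.__name__}")  # json:dumps
```

In this project the spec would be a string such as `env.task_graders:easy_agent_grader`, which only resolves with the repository on the import path.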
pyproject.toml
CHANGED
@@ -17,6 +17,9 @@ dependencies = [
     "python-dotenv>=1.0.0",
 ]

+[project.optional-dependencies]
+dev = ["pytest>=8.0"]
+
 [project.scripts]
 server = "server.app:main"

server/app.py
CHANGED
@@ -1,12 +1,59 @@
-"""OpenEnv
+"""OpenEnv FastAPI server: full HTTPEnvServer + task/grader discovery routes."""
+
+from __future__ import annotations
+
+import os
+from typing import Optional

 import uvicorn
+from openenv.core.env_server import create_fastapi_app
+
+from env.cache_environment import CacheInvalidationEnvironment
+from env.models import CacheAction, CacheObservation
+from env.task_graders import TASK_AGENT_GRADERS
+from env.tasks import TASK_MANIFEST, list_graders
+
+_singleton: CacheInvalidationEnvironment | None = None
+
+
+def _env_factory() -> CacheInvalidationEnvironment:
+    global _singleton
+    if _singleton is None:
+        _singleton = CacheInvalidationEnvironment()
+    return _singleton
+
+
+app = create_fastapi_app(
+    _env_factory,
+    CacheAction,
+    CacheObservation,
+    max_concurrent_envs=1,
+)
+

+@app.get(
+    "/tasks",
+    tags=["Environment Info"],
+    summary="List tasks and grader registration",
+)
+def http_list_tasks():
+    return {
+        "tasks": TASK_MANIFEST,
+        "graders": list_graders(),
+        "grader_registry": {
+            name: {
+                "enabled": True,
+                "qualified_name": f"{fn.__module__}:{fn.__name__}",
+            }
+            for name, fn in TASK_AGENT_GRADERS.items()
+        },
+    }

-def main(host: str = "0.0.0.0", port: int = 7860):
-    from app import app as fastapi_app

-
+def main(host: Optional[str] = None, port: Optional[int] = None) -> None:
+    host = host or os.environ.get("HOST", "0.0.0.0")
+    port = int(port or os.environ.get("PORT", "7860"))
+    uvicorn.run(app, host=host, port=port)


 if __name__ == "__main__":

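`server/app.py` hands `create_fastapi_app` a factory that builds one `CacheInvalidationEnvironment` lazily and then keeps returning that same object, which fits `max_concurrent_envs=1`; `tests/conftest.py` sets `_singleton` back to `None` so each test starts from a fresh environment. A stdlib sketch of the pattern, with a hypothetical `DummyEnv` standing in for the real environment class:

```python
from typing import Optional


class DummyEnv:
    """Stand-in for CacheInvalidationEnvironment (hypothetical)."""

    instances = 0

    def __init__(self) -> None:
        DummyEnv.instances += 1


_singleton: Optional[DummyEnv] = None


def env_factory() -> DummyEnv:
    # Build on first call, then hand back the same object every time.
    global _singleton
    if _singleton is None:
        _singleton = DummyEnv()
    return _singleton


first = env_factory()
second = env_factory()
print(first is second, DummyEnv.instances)  # True 1

# What the autouse fixture does between tests: drop the cached instance.
_singleton = None
print(env_factory() is first)  # False
```

Because every request shares one instance, concurrent clients would also share episode state, which is presumably why the app caps concurrency at a single environment.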
tests/conftest.py
ADDED
@@ -0,0 +1,10 @@
+import pytest
+
+
+@pytest.fixture(autouse=True)
+def reset_env_singleton():
+    import server.app as sa
+
+    sa._singleton = None
+    yield
+    sa._singleton = None

tests/test_phase1.py
ADDED
@@ -0,0 +1,73 @@
+"""Phase 1 gates: OpenEnv HTTP, three tasks, graders in [0,1], reproducible seed."""
+
+import pytest
+from fastapi.testclient import TestClient
+
+from env.grader import clamp_unit_interval, evaluate_episode
+from env.task_graders import TASK_AGENT_GRADERS
+from server.app import app
+
+
+@pytest.fixture
+def client():
+    return TestClient(app)
+
+
+def test_tasks_endpoint_three_graders(client):
+    r = client.get("/tasks")
+    assert r.status_code == 200
+    data = r.json()
+    assert len(data["tasks"]) >= 3
+    enabled = [t for t in data["tasks"] if t.get("grader")]
+    assert len(enabled) >= 3
+    assert len(data["grader_registry"]) >= 3
+
+
+def test_each_task_grader_returns_unit_interval():
+    history = [
+        {"action": "keep", "is_stale": False},
+        {"action": "invalidate", "is_stale": True},
+    ]
+    for name, fn in TASK_AGENT_GRADERS.items():
+        s = fn(history)
+        assert 0.0 <= s <= 1.0, (name, s)
+
+
+def test_reset_step_openenv_shape(client):
+    r = client.post("/reset", json={"seed": 123, "task_id": "medium"})
+    assert r.status_code == 200
+    body = r.json()
+    assert set(body.keys()) >= {"observation", "reward", "done"}
+    obs = body["observation"]
+    assert obs["task_id"] == "medium"
+    key = obs["items"][0]["key"]
+    s = client.post("/step", json={"action": {"type": "keep", "key": key}})
+    assert s.status_code == 200
+    assert "observation" in s.json()
+
+
+def test_reproducible_reset_seed(client):
+    a = client.post("/reset", json={"seed": 999, "task_id": "easy"}).json()["observation"]
+    b = client.post("/reset", json={"seed": 999, "task_id": "easy"}).json()["observation"]
+    assert a["items"] == b["items"]
+
+
+def test_final_score_in_range(client):
+    r = client.post("/reset", json={"seed": 0, "task_id": "easy"})
+    obs = r.json()["observation"]
+    final = None
+    for _ in range(12):
+        k = obs["items"][0]["key"]
+        d = client.post("/step", json={"action": {"type": "keep", "key": k}}).json()
+        obs = d["observation"]
+        if obs.get("final_score") is not None:
+            final = obs["final_score"]
+            break
+    assert final is not None
+    assert 0.0 <= final <= 1.0
+
+
+def test_clamp_unit_interval():
+    assert clamp_unit_interval(-1) == 0.0
+    assert clamp_unit_interval(2) == 1.0
+    assert evaluate_episode([]) == 0.0

uv.lock
CHANGED
@@ -234,16 +234,23 @@ dependencies = [
     { name = "uvicorn", extra = ["standard"] },
 ]

+[package.optional-dependencies]
+dev = [
+    { name = "pytest" },
+]
+
 [package.metadata]
 requires-dist = [
     { name = "fastapi", specifier = ">=0.100.0" },
     { name = "openai", specifier = ">=1.0.0" },
     { name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
     { name = "pydantic", specifier = ">=2.0.0" },
+    { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0" },
     { name = "python-dotenv", specifier = ">=1.0.0" },
     { name = "requests", specifier = ">=2.28.0" },
     { name = "uvicorn", extras = ["standard"], specifier = ">=0.22.0" },
 ]
+provides-extras = ["dev"]

 [[package]]
 name = "cachetools"
@@ -956,6 +963,15 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/fa/5e/f8e9a1d23b9c20a551a8a02ea3637b4642e22c2626e3a13a9a29cdea99eb/importlib_metadata-8.7.1-py3-none-any.whl", hash = "sha256:5a1f80bf1daa489495071efbb095d75a634cf28a8bc299581244063b53176151", size = 27865, upload-time = "2025-12-21T10:00:18.329Z" },
 ]

+[[package]]
+name = "iniconfig"
+version = "2.3.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/72/34/14ca021ce8e5dfedc35312d08ba8bf51fdd999c576889fc2c24cb97f4f10/iniconfig-2.3.0.tar.gz", hash = "sha256:c76315c77db068650d49c5b56314774a7804df16fee4402c1f19d6d15d8c4730", size = 20503, upload-time = "2025-10-18T21:55:43.219Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/cb/b1/3846dd7f199d53cb17f49cba7e651e9ce294d8497c8c150530ed11865bb8/iniconfig-2.3.0-py3-none-any.whl", hash = "sha256:f631c04d2c48c52b84d0d0549c99ff3859c98df65b3101406327ecc7d53fbf12", size = 7484, upload-time = "2025-10-18T21:55:41.639Z" },
+]
+
 [[package]]
 name = "jaraco-classes"
 version = "3.4.0"
@@ -1893,6 +1909,15 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/63/d7/97f7e3a6abb67d8080dd406fd4df842c2be0efaf712d1c899c32a075027c/platformdirs-4.9.4-py3-none-any.whl", hash = "sha256:68a9a4619a666ea6439f2ff250c12a853cd1cbd5158d258bd824a7df6be2f868", size = 21216, upload-time = "2026-03-05T18:34:12.172Z" },
 ]

+[[package]]
+name = "pluggy"
+version = "1.6.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/f9/e2/3e91f31a7d2b083fe6ef3fa267035b518369d9511ffab804f839851d2779/pluggy-1.6.0.tar.gz", hash = "sha256:7dcc130b76258d33b90f61b658791dede3486c3e6bfb003ee5c9bfb396dd22f3", size = 69412, upload-time = "2025-05-15T12:30:07.975Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/54/20/4d324d65cc6d9205fabedc306948156824eb9f0ee1633355a8f7ec5c66bf/pluggy-1.6.0-py3-none-any.whl", hash = "sha256:e920276dd6813095e9377c0bc5566d94c932c33b27a3e3945d8389c374dd4746", size = 20538, upload-time = "2025-05-15T12:30:06.134Z" },
+]
+
 [[package]]
 name = "py-key-value-aio"
 version = "0.4.4"
@@ -2123,6 +2148,24 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/df/80/fc9d01d5ed37ba4c42ca2b55b4339ae6e200b456be3a1aaddf4a9fa99b8c/pyperclip-1.11.0-py3-none-any.whl", hash = "sha256:299403e9ff44581cb9ba2ffeed69c7aa96a008622ad0c46cb575ca75b5b84273", size = 11063, upload-time = "2025-09-26T14:40:36.069Z" },
 ]

+[[package]]
+name = "pytest"
+version = "9.0.3"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "colorama", marker = "sys_platform == 'win32'" },
+    { name = "exceptiongroup", marker = "python_full_version < '3.11'" },
+    { name = "iniconfig" },
+    { name = "packaging" },
+    { name = "pluggy" },
+    { name = "pygments" },
+    { name = "tomli", marker = "python_full_version < '3.11'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/7d/0d/549bd94f1a0a402dc8cf64563a117c0f3765662e2e668477624baeec44d5/pytest-9.0.3.tar.gz", hash = "sha256:b86ada508af81d19edeb213c681b1d48246c1a91d304c6c81a427674c17eb91c", size = 1572165, upload-time = "2026-04-07T17:16:18.027Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/d4/24/a372aaf5c9b7208e7112038812994107bc65a84cd00e0354a88c2c77a617/pytest-9.0.3-py3-none-any.whl", hash = "sha256:2c5efc453d45394fdd706ade797c0a81091eccd1d6e4bccfcd476e2b8e0ab5d9", size = 375249, upload-time = "2026-04-07T17:16:16.13Z" },
+]
+
 [[package]]
 name = "python-dateutil"
 version = "2.9.0.post0"