Parv Pareek committed
Commit 4f8cf04
Parent(s): 748aaa7
fix: projet structure
- .dockerignore +16 -0
- .gitignore +12 -0
- Dockerfile +6 -3
- README.md +299 -0
- app.py +5 -2
- env/__init__.py +1 -0
- env/core.py +8 -2
- env/grader.py +57 -0
- inference.py +132 -44
- pyproject.toml +24 -0
- requirements.txt +2 -1
- server/__init__.py +1 -0
- server/app.py +13 -0
- uv.lock +0 -0
- validate-submission.sh +191 -0
.dockerignore
ADDED
|
@@ -0,0 +1,16 @@
| 1 |
+
# Shrink build context (without this, COPY . . sends .git and stalls Step 2 locally)
|
| 2 |
+
.git
|
| 3 |
+
.gitattributes
|
| 4 |
+
**/__pycache__
|
| 5 |
+
**/*.py[cod]
|
| 6 |
+
**/.pytest_cache
|
| 7 |
+
.venv
|
| 8 |
+
venv
|
| 9 |
+
.env
|
| 10 |
+
.env.*
|
| 11 |
+
*.egg-info
|
| 12 |
+
.eggs
|
| 13 |
+
dist
|
| 14 |
+
build
|
| 15 |
+
state.json
|
| 16 |
+
.DS_Store
|
.gitignore
ADDED
|
@@ -0,0 +1,12 @@
| 1 |
+
# Secrets — never commit (use HF Space secrets / CI env vars)
|
| 2 |
+
.env
|
| 3 |
+
.env.*
|
| 4 |
+
|
| 5 |
+
# Local run artifacts
|
| 6 |
+
state.json
|
| 7 |
+
|
| 8 |
+
# Python
|
| 9 |
+
__pycache__/
|
| 10 |
+
*.py[cod]
|
| 11 |
+
.venv/
|
| 12 |
+
venv/
|
Dockerfile
CHANGED
|
@@ -1,8 +1,11 @@
|
|
| 1 |
-
FROM python:3.10
|
| 2 |
|
| 3 |
WORKDIR /app
|
| 4 |
-
COPY . .
|
| 5 |
|
| 6 |
-
|
| 7 |
|
| 8 |
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
|
|
|
|
| 1 |
+
FROM python:3.10-slim
|
| 2 |
|
| 3 |
WORKDIR /app
|
|
|
|
| 4 |
|
| 5 |
+
# Install deps before copying full tree (faster rebuilds, smaller COPY layer)
|
| 6 |
+
COPY requirements.txt .
|
| 7 |
+
RUN pip install --no-cache-dir -r requirements.txt
|
| 8 |
+
|
| 9 |
+
COPY . .
|
| 10 |
|
| 11 |
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
|
README.md
CHANGED
|
@@ -8,3 +8,302 @@ pinned: false
|
|
| 8 |
---
|
| 9 |
|
| 10 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
| 8 |
---
|
| 9 |
|
| 10 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
| 11 |
+
|
| 12 |
+
|
| 13 |
+
# 🧠 Cache Invalidation Environment (OpenEnv)
|
| 14 |
+
|
| 15 |
+
## 📌 Overview
|
| 16 |
+
|
| 17 |
+
This project implements a **real-world cache invalidation decision environment** using the OpenEnv specification.
|
| 18 |
+
|
| 19 |
+
Cache invalidation is a fundamental systems problem: deciding **when to refresh cached data vs reuse it**. Acting too early wastes resources, while acting too late serves stale data.
|
| 20 |
+
|
| 21 |
+
This environment simulates that tradeoff under **uncertainty and noisy signals**, allowing evaluation of agent decision-making.
|
| 22 |
+
|
| 23 |
+
---
|
| 24 |
+
|
| 25 |
+
## 🎯 Motivation
|
| 26 |
+
|
| 27 |
+
Cache invalidation is widely used in:
|
| 28 |
+
|
| 29 |
+
* Distributed systems
|
| 30 |
+
* Web backends
|
| 31 |
+
* CDNs and edge caching
|
| 32 |
+
* Databases
|
| 33 |
+
|
| 34 |
+
This environment models a **practical decision problem engineers face daily**, making it useful for evaluating reasoning-based agents.
|
| 35 |
+
|
| 36 |
+
---
|
| 37 |
+
|
| 38 |
+
## 🧩 Environment Design
|
| 39 |
+
|
| 40 |
+
### State (Observation)
|
| 41 |
+
|
| 42 |
+
Each step returns:
|
| 43 |
+
|
| 44 |
+
```json
|
| 45 |
+
{
|
| 46 |
+
"items": [
|
| 47 |
+
{
|
| 48 |
+
"key": "item_0",
|
| 49 |
+
"age": 5,
|
| 50 |
+
"access_count": 12,
|
| 51 |
+
"last_result": "hit"
|
| 52 |
+
}
|
| 53 |
+
],
|
| 54 |
+
"step": 3,
|
| 55 |
+
"task_id": "medium"
|
| 56 |
+
}
|
| 57 |
+
```
|
| 58 |
+
|
| 59 |
+
#### Field meanings:
|
| 60 |
+
|
| 61 |
+
* `age`: time since last refresh
|
| 62 |
+
* `access_count`: usage frequency
|
| 63 |
+
* `last_result`: "hit" or "stale" (noisy signal)
|
| 64 |
+
* `task_id`: difficulty level
|
| 65 |
+
|
| 66 |
+
---
|
| 67 |
+
|
| 68 |
+
### Actions
|
| 69 |
+
|
| 70 |
+
Agent must return:
|
| 71 |
+
|
| 72 |
+
```json
|
| 73 |
+
{
|
| 74 |
+
"type": "invalidate | refresh | keep",
|
| 75 |
+
"key": "item_id"
|
| 76 |
+
}
|
| 77 |
+
```
|
| 78 |
+
|
| 79 |
+
#### Action meanings:
|
| 80 |
+
|
| 81 |
+
* `invalidate`: reset cache (high cost, correct if stale)
|
| 82 |
+
* `refresh`: partial reset (safe but weaker)
|
| 83 |
+
* `keep`: do nothing (efficient if data is fresh)
|
| 84 |
+
|
| 85 |
+
---
|
| 86 |
+
|
| 87 |
+
### Hidden Dynamics
|
| 88 |
+
|
| 89 |
+
The true cache state is **not directly observable**.
|
| 90 |
+
|
| 91 |
+
Staleness depends on:
|
| 92 |
+
|
| 93 |
+
* base TTL
|
| 94 |
+
* update frequency
|
| 95 |
+
* time since last update
|
| 96 |
+
|
| 97 |
+
Observations are **noisy**, requiring inference.
|
| 98 |
+
|
| 99 |
+
---
|
| 100 |
+
|
| 101 |
+
## 🎯 Tasks
|
| 102 |
+
|
| 103 |
+
Three tasks with increasing difficulty:
|
| 104 |
+
|
| 105 |
+
### 🟢 Easy
|
| 106 |
+
|
| 107 |
+
* Few items
|
| 108 |
+
* Low volatility
|
| 109 |
+
* Clear signals
|
| 110 |
+
|
| 111 |
+
### 🟡 Medium
|
| 112 |
+
|
| 113 |
+
* Moderate noise
|
| 114 |
+
* Conflicting signals
|
| 115 |
+
* Requires reasoning
|
| 116 |
+
|
| 117 |
+
### 🔴 Hard
|
| 118 |
+
|
| 119 |
+
* High volatility
|
| 120 |
+
* Frequent updates
|
| 121 |
+
* Misleading signals
|
| 122 |
+
|
| 123 |
+
---
|
| 124 |
+
|
| 125 |
+
## 🏆 Reward Function
|
| 126 |
+
|
| 127 |
+
Reward is given at every step:
|
| 128 |
+
|
| 129 |
+
| Action | Correct Case | Reward |
|
| 130 |
+
| ---------- | ------------ | ------ |
|
| 131 |
+
| invalidate | stale | +1.0 |
|
| 132 |
+
| invalidate | fresh | -0.5 |
|
| 133 |
+
| keep | fresh | +0.8 |
|
| 134 |
+
| keep | stale | -0.6 |
|
| 135 |
+
| refresh | stale | +0.6 |
|
| 136 |
+
| refresh | fresh | +0.2 |
|
| 137 |
+
|
| 138 |
+
This provides:
|
| 139 |
+
|
| 140 |
+
* dense feedback
|
| 141 |
+
* partial credit
|
| 142 |
+
* penalty for poor decisions
|
| 143 |
+
|
| 144 |
+
---
|
| 145 |
+
|
| 146 |
+
## 📊 Episode
|
| 147 |
+
|
| 148 |
+
* Fixed length: 10 steps
|
| 149 |
+
* Final score: average reward (normalized to [0,1])
|
| 150 |
+
|
| 151 |
+
---
|
| 152 |
+
|
| 153 |
+
## 🤖 Baseline Agent
|
| 154 |
+
|
| 155 |
+
The baseline agent uses:
|
| 156 |
+
|
| 157 |
+
* heuristic decision policy
|
| 158 |
+
* short-term memory (to avoid repeated mistakes)
|
| 159 |
+
* optional LLM reasoning
|
| 160 |
+
|
| 161 |
+
### Example score
|
| 162 |
+
|
| 163 |
+
| Task | Score |
|
| 164 |
+
| ------ | -------- |
|
| 165 |
+
| Easy | ~4.5–6.5 |
|
| 166 |
+
| Medium | ~3.5–5.5 |
|
| 167 |
+
| Hard | ~2.5–4.5 |
|
| 168 |
+
|
| 169 |
+
---
|
| 170 |
+
|
| 171 |
+
## 🚀 Running the Environment
|
| 172 |
+
|
| 173 |
+
### 1. Local
|
| 174 |
+
|
| 175 |
+
```bash
|
| 176 |
+
pip install -r requirements.txt
|
| 177 |
+
uvicorn app:app --reload
|
| 178 |
+
```
|
| 179 |
+
|
| 180 |
+
---
|
| 181 |
+
|
| 182 |
+
### 2. API Endpoints
|
| 183 |
+
|
| 184 |
+
#### Reset
|
| 185 |
+
|
| 186 |
+
```bash
|
| 187 |
+
curl -X POST http://localhost:8000/reset
|
| 188 |
+
```
|
| 189 |
+
|
| 190 |
+
#### Step
|
| 191 |
+
|
| 192 |
+
```bash
|
| 193 |
+
curl -X POST http://localhost:8000/step \
|
| 194 |
+
-H "Content-Type: application/json" \
|
| 195 |
+
-d '{"type":"keep","key":"item_0"}'
|
| 196 |
+
```
|
| 197 |
+
|
| 198 |
+
#### State
|
| 199 |
+
|
| 200 |
+
```bash
|
| 201 |
+
curl http://localhost:8000/state
|
| 202 |
+
```
|
| 203 |
+
|
| 204 |
+
---
|
| 205 |
+
|
| 206 |
+
## 🤗 Hugging Face Deployment
|
| 207 |
+
|
| 208 |
+
Live endpoint:
|
| 209 |
+
|
| 210 |
+
```
|
| 211 |
+
https://parvpareek-cache-env.hf.space
|
| 212 |
+
```
|
| 213 |
+
|
| 214 |
+
Test:
|
| 215 |
+
|
| 216 |
+
```bash
|
| 217 |
+
curl -X POST https://parvpareek-cache-env.hf.space/reset
|
| 218 |
+
```
|
| 219 |
+
|
| 220 |
+
---
|
| 221 |
+
|
| 222 |
+
## 🐳 Docker
|
| 223 |
+
|
| 224 |
+
```bash
|
| 225 |
+
docker build -t cache-env .
|
| 226 |
+
docker run -p 7860:7860 cache-env
|
| 227 |
+
```
|
| 228 |
+
|
| 229 |
+
---
|
| 230 |
+
|
| 231 |
+
## ⚙️ Environment Variables
|
| 232 |
+
|
| 233 |
+
Required for inference:
|
| 234 |
+
|
| 235 |
+
```bash
|
| 236 |
+
API_BASE_URL=<llm_endpoint>
|
| 237 |
+
MODEL_NAME=<model_name>
|
| 238 |
+
HF_TOKEN=<api_key>
|
| 239 |
+
```
|
| 240 |
+
|
| 241 |
+
---
|
| 242 |
+
|
| 243 |
+
## 📁 Project Structure
|
| 244 |
+
|
| 245 |
+
```
|
| 246 |
+
.
|
| 247 |
+
├── app.py
|
| 248 |
+
├── env/
|
| 249 |
+
│ ├── core.py
|
| 250 |
+
│ ├── generator.py
|
| 251 |
+
│ ├── grader.py
|
| 252 |
+
│ ├── models.py
|
| 253 |
+
│ └── tasks.py
|
| 254 |
+
├── inference.py
|
| 255 |
+
├── openenv.yaml
|
| 256 |
+
├── Dockerfile
|
| 257 |
+
└── README.md
|
| 258 |
+
```
|
| 259 |
+
|
| 260 |
+
---
|
| 261 |
+
|
| 262 |
+
## ✅ OpenEnv Compliance
|
| 263 |
+
|
| 264 |
+
* ✔ step / reset / state API
|
| 265 |
+
* ✔ typed models (Pydantic)
|
| 266 |
+
* ✔ openenv.yaml included
|
| 267 |
+
* ✔ 3 tasks with graders
|
| 268 |
+
* ✔ reward ∈ [0,1]
|
| 269 |
+
* ✔ deterministic evaluation
|
| 270 |
+
|
| 271 |
+
---
|
| 272 |
+
|
| 273 |
+
## 💡 Key Insight
|
| 274 |
+
|
| 275 |
+
This environment models:
|
| 276 |
+
|
| 277 |
+
> Decision-making under uncertainty with partial observability
|
| 278 |
+
|
| 279 |
+
Agents must infer:
|
| 280 |
+
|
| 281 |
+
* when data is stale
|
| 282 |
+
* when to act vs wait
|
| 283 |
+
|
| 284 |
+
---
|
| 285 |
+
|
| 286 |
+
## 🧠 Why This Matters
|
| 287 |
+
|
| 288 |
+
Cache invalidation is considered one of the hardest problems in computer science.
|
| 289 |
+
|
| 290 |
+
This environment provides:
|
| 291 |
+
|
| 292 |
+
* a controlled simulation
|
| 293 |
+
* measurable evaluation
|
| 294 |
+
* realistic constraints
|
| 295 |
+
|
| 296 |
+
---
|
| 297 |
+
|
| 298 |
+
## 📌 Summary
|
| 299 |
+
|
| 300 |
+
* Real-world system problem ✔
|
| 301 |
+
* Multi-step decision making ✔
|
| 302 |
+
* Partial observability ✔
|
| 303 |
+
* Non-trivial reward shaping ✔
|
| 304 |
+
|
| 305 |
+
---
|
| 306 |
+
|
| 307 |
+
## 👤 Author
|
| 308 |
+
|
| 309 |
+
Built for OpenEnv evaluation challenge.
|
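To make the reward table in the README concrete, here is a minimal sketch. It is not the repository's `compute_step_reward`, whose body is not shown in this diff: the `REWARDS` dictionary and the `step_reward` name are assumptions for illustration, while `normalize_episode_score` mirrors the helper visible in `env/grader.py`. Individual step rewards in the table can be negative, so only the clamped episode score is guaranteed to lie in [0, 1].

```python
# Hypothetical lookup built from the README reward table.
# Keys are (action_type, is_stale); values are the per-step rewards.
REWARDS = {
    ("invalidate", True): 1.0,
    ("invalidate", False): -0.5,
    ("keep", False): 0.8,
    ("keep", True): -0.6,
    ("refresh", True): 0.6,
    ("refresh", False): 0.2,
}


def step_reward(action_type: str, is_stale: bool) -> float:
    """Return the per-step reward listed in the README table."""
    return REWARDS[(action_type, is_stale)]


def normalize_episode_score(total_reward: float, max_steps: int = 10) -> float:
    """Mean reward per step, clamped to [0, 1] (as in env/grader.py)."""
    return max(0.0, min(1.0, total_reward / max_steps))


# Ten correct "keep" decisions on fresh data: 10 * 0.8 = 8.0 in total.
episode = [("keep", False)] * 10
total = sum(step_reward(action, stale) for action, stale in episode)
print(round(normalize_episode_score(total), 2))
```

A table-driven lookup keeps the reward values in one place, which makes them easy to compare against the README when either one changes.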
app.py
CHANGED
|
@@ -6,8 +6,11 @@ env = CacheEnv()
|
|
| 6 |
|
| 7 |
@app.post("/reset")
|
| 8 |
def reset():
|
| 9 |
-
|
| 10 |
-
|
|
|
|
|
|
|
|
|
|
| 11 |
@app.post("/step")
|
| 12 |
def step(action: dict):
|
| 13 |
return env.step(action)
|
|
|
|
| 6 |
|
| 7 |
@app.post("/reset")
|
| 8 |
def reset():
|
| 9 |
+
state = env.reset()
|
| 10 |
+
return {
|
| 11 |
+
"state": state,
|
| 12 |
+
"task_id": state.get("task_id")
|
| 13 |
+
}
|
| 14 |
@app.post("/step")
|
| 15 |
def step(action: dict):
|
| 16 |
return env.step(action)
|
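The `/reset` handler now wraps the state in an envelope instead of returning it bare, so clients written against the old shape need to unwrap it. The sketch below shows both shapes using plain dictionaries. `wrap_reset` and `unwrap_state` are hypothetical helper names, and the sketch assumes the bare state never contains a top-level `"state"` key of its own.

```python
def wrap_reset(state: dict) -> dict:
    """Envelope produced by the new /reset handler."""
    return {"state": state, "task_id": state.get("task_id")}


def unwrap_state(res: dict) -> dict:
    """Accept both the old (bare state) and the new (wrapped) response."""
    return res.get("state", res)


bare = {"items": [], "step": 0, "task_id": "easy"}
wrapped = wrap_reset(bare)

# Both response shapes resolve to the same state dictionary.
print(unwrap_state(bare) is bare)
print(unwrap_state(wrapped) is bare)
print(wrapped["task_id"])
```

The same `res.get("state", res)` pattern appears in the updated `inference.py`, which lets one client work against both versions of the server.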
env/__init__.py
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
# Cache invalidation environment package
|
env/core.py
CHANGED
|
@@ -10,6 +10,7 @@ class CacheEnv:
|
|
| 10 |
self.reset()
|
| 11 |
|
| 12 |
def reset(self):
|
|
|
|
| 13 |
self.task_id = sample_task()
|
| 14 |
items, hidden, current_time = generate_env(self.task_id)
|
| 15 |
|
|
@@ -41,6 +42,11 @@ class CacheEnv:
|
|
| 41 |
age = self.current_time - hidden["last_update"]
|
| 42 |
is_stale = age > hidden["base_ttl"] or random.random() < hidden["update_freq"]
|
| 43 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 44 |
reward = compute_step_reward(action_type, is_stale)
|
| 45 |
self.total_reward += reward
|
| 46 |
|
|
@@ -67,12 +73,12 @@ class CacheEnv:
|
|
| 67 |
self.state["step"] += 1
|
| 68 |
|
| 69 |
done = self.state["step"] >= 10
|
|
|
|
| 70 |
|
| 71 |
if done:
|
| 72 |
-
final_score =
|
| 73 |
else:
|
| 74 |
final_score = None
|
| 75 |
-
|
| 76 |
return {
|
| 77 |
"state": self.state,
|
| 78 |
"reward": reward,
|
|
|
|
| 10 |
self.reset()
|
| 11 |
|
| 12 |
def reset(self):
|
| 13 |
+
self.history = []
|
| 14 |
self.task_id = sample_task()
|
| 15 |
items, hidden, current_time = generate_env(self.task_id)
|
| 16 |
|
|
|
|
| 42 |
age = self.current_time - hidden["last_update"]
|
| 43 |
is_stale = age > hidden["base_ttl"] or random.random() < hidden["update_freq"]
|
| 44 |
|
| 45 |
+
self.history.append({
|
| 46 |
+
"action": action_type,
|
| 47 |
+
"is_stale": is_stale
|
| 48 |
+
})
|
| 49 |
+
|
| 50 |
reward = compute_step_reward(action_type, is_stale)
|
| 51 |
self.total_reward += reward
|
| 52 |
|
|
|
|
| 73 |
self.state["step"] += 1
|
| 74 |
|
| 75 |
done = self.state["step"] >= 10
|
| 76 |
+
from env.grader import evaluate_episode
|
| 77 |
|
| 78 |
if done:
|
| 79 |
+
final_score = evaluate_episode(self.history)
|
| 80 |
else:
|
| 81 |
final_score = None
|
|
|
|
| 82 |
return {
|
| 83 |
"state": self.state,
|
| 84 |
"reward": reward,
|
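The staleness rule in `CacheEnv.step` combines a hard TTL check with a random draw, so an item inside its TTL can still be judged stale with probability `update_freq`. The following sketch isolates that rule. The `rng` parameter is an assumption added here so the draw can be seeded; the code in the diff calls the module-level `random.random()` directly.

```python
import random


def is_stale(current_time, hidden, rng=random):
    """Staleness rule from CacheEnv.step; `rng` is a hypothetical seeding hook."""
    age = current_time - hidden["last_update"]
    return age > hidden["base_ttl"] or rng.random() < hidden["update_freq"]


hidden = {"last_update": 0, "base_ttl": 5, "update_freq": 0.3}
rng = random.Random(0)

# Inside the TTL (age 3) the outcome depends only on the random draw,
# so the stale fraction should sit near update_freq.
draws = [is_stale(3, hidden, rng) for _ in range(10_000)]
print(sum(draws) / len(draws))

# Past the TTL (age 6) the item is stale regardless of the draw.
print(is_stale(6, hidden, rng))
```

Because the draw is unseeded in the environment, two runs with identical actions can record different `is_stale` values in `self.history`, which is worth keeping in mind when comparing episode scores.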
env/grader.py
CHANGED
|
@@ -15,4 +15,61 @@ def compute_step_reward(action_type, is_stale):
|
|
| 15 |
def normalize_episode_score(total_reward, max_steps=10):
|
| 16 |
# expected max ≈ 1.0 per step
|
| 17 |
score = total_reward / max_steps
|
| 18 |
return max(0.0, min(1.0, score))
|
|
|
|
| 15 |
def normalize_episode_score(total_reward, max_steps=10):
|
| 16 |
# expected max ≈ 1.0 per step
|
| 17 |
score = total_reward / max_steps
|
| 18 |
+
return max(0.0, min(1.0, score))
|
| 19 |
+
|
| 20 |
+
|
| 21 |
+
|
| 22 |
+
def evaluate_episode(history):
|
| 23 |
+
"""
|
| 24 |
+
history = list of:
|
| 25 |
+
{
|
| 26 |
+
"action": str,
|
| 27 |
+
"is_stale": bool
|
| 28 |
+
}
|
| 29 |
+
"""
|
| 30 |
+
|
| 31 |
+
total_steps = len(history)
|
| 32 |
+
|
| 33 |
+
if total_steps == 0:
|
| 34 |
+
return 0.0
|
| 35 |
+
|
| 36 |
+
correct_decisions = 0
|
| 37 |
+
unnecessary_invalidations = 0
|
| 38 |
+
oscillations = 0
|
| 39 |
+
|
| 40 |
+
last_action = None
|
| 41 |
+
|
| 42 |
+
for step in history:
|
| 43 |
+
action = step["action"]
|
| 44 |
+
is_stale = step["is_stale"]
|
| 45 |
+
|
| 46 |
+
# ✅ correctness (freshness proxy)
|
| 47 |
+
if (is_stale and action in ["invalidate", "refresh"]) or \
|
| 48 |
+
(not is_stale and action == "keep"):
|
| 49 |
+
correct_decisions += 1
|
| 50 |
+
|
| 51 |
+
# ❌ unnecessary invalidation
|
| 52 |
+
if action == "invalidate" and not is_stale:
|
| 53 |
+
unnecessary_invalidations += 1
|
| 54 |
+
|
| 55 |
+
# ❌ oscillation (flip behavior)
|
| 56 |
+
if last_action and last_action != action:
|
| 57 |
+
oscillations += 1
|
| 58 |
+
|
| 59 |
+
last_action = action
|
| 60 |
+
|
| 61 |
+
# ---- normalize metrics ----
|
| 62 |
+
freshness = correct_decisions / total_steps
|
| 63 |
+
|
| 64 |
+
efficiency = 1 - (unnecessary_invalidations / total_steps)
|
| 65 |
+
|
| 66 |
+
stability = 1 - (oscillations / total_steps)
|
| 67 |
+
|
| 68 |
+
# ---- weighted score ----
|
| 69 |
+
score = (
|
| 70 |
+
0.5 * freshness +
|
| 71 |
+
0.3 * efficiency +
|
| 72 |
+
0.2 * stability
|
| 73 |
+
)
|
| 74 |
+
|
| 75 |
return max(0.0, min(1.0, score))
|
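A short worked example helps show how the three metrics in `evaluate_episode` combine. The function below is a condensed copy of the logic added in this diff (same conditions and weights, shorter variable names); the four-step `history` is made-up data for illustration only.

```python
def evaluate_episode(history):
    """Condensed copy of env/grader.py: 0.5*freshness + 0.3*efficiency + 0.2*stability."""
    total_steps = len(history)
    if total_steps == 0:
        return 0.0

    correct = unnecessary = oscillations = 0
    last_action = None
    for step in history:
        action, stale = step["action"], step["is_stale"]
        # Correct: act on stale data, or keep fresh data.
        if (stale and action in ("invalidate", "refresh")) or (not stale and action == "keep"):
            correct += 1
        # Unnecessary invalidation of fresh data.
        if action == "invalidate" and not stale:
            unnecessary += 1
        # Oscillation: the action differs from the previous step's action.
        if last_action and last_action != action:
            oscillations += 1
        last_action = action

    freshness = correct / total_steps
    efficiency = 1 - unnecessary / total_steps
    stability = 1 - oscillations / total_steps
    return max(0.0, min(1.0, 0.5 * freshness + 0.3 * efficiency + 0.2 * stability))


history = [
    {"action": "keep", "is_stale": False},        # correct
    {"action": "invalidate", "is_stale": False},  # unnecessary, oscillation
    {"action": "invalidate", "is_stale": True},   # correct
    {"action": "keep", "is_stale": True},         # wrong, oscillation
]
# freshness = 2/4, efficiency = 1 - 1/4, stability = 1 - 2/4
print(round(evaluate_episode(history), 3))
```

One consequence of the weighting: a policy that always returns `keep` never invalidates and never oscillates, so it scores at least 0.3 + 0.2 = 0.5 even when every item is stale.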
inference.py
CHANGED
|
@@ -1,38 +1,113 @@
|
|
| 1 |
import os
|
| 2 |
import requests
|
|
|
|
| 3 |
from openai import OpenAI
|
| 4 |
|
| 5 |
client = OpenAI(
|
| 6 |
base_url=os.getenv("API_BASE_URL"),
|
| 7 |
api_key=os.getenv("HF_TOKEN")
|
| 8 |
)
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
|
| 28 |
State:
|
| 29 |
-
{json.dumps(state
|
| 30 |
|
| 31 |
-
|
| 32 |
{{"type": "...", "key": "..."}}
|
| 33 |
"""
|
| 34 |
-
|
| 35 |
-
try:
|
| 36 |
response = client.chat.completions.create(
|
| 37 |
model=MODEL,
|
| 38 |
messages=[{"role": "user", "content": prompt}],
|
|
@@ -40,51 +115,64 @@ Respond ONLY with valid JSON:
|
|
| 40 |
)
|
| 41 |
|
| 42 |
text = response.choices[0].message.content.strip()
|
| 43 |
-
|
| 44 |
action = json.loads(text)
|
| 45 |
|
| 46 |
-
# basic validation
|
| 47 |
if "type" in action and "key" in action:
|
| 48 |
return action
|
| 49 |
-
|
| 50 |
except:
|
| 51 |
pass
|
| 52 |
|
| 53 |
-
|
| 54 |
-
item = state["items"][0]
|
| 55 |
-
|
| 56 |
-
if item["last_result"] == "stale":
|
| 57 |
-
return {"type": "invalidate", "key": item["key"]}
|
| 58 |
-
|
| 59 |
-
if item["age"] > 5:
|
| 60 |
-
return {"type": "refresh", "key": item["key"]}
|
| 61 |
-
|
| 62 |
-
return {"type": "keep", "key": item["key"]}
|
| 63 |
|
|
|
|
| 64 |
def run():
|
| 65 |
res = requests.post(f"{ENV_URL}/reset").json()
|
| 66 |
|
|
|
|
|
|
|
| 67 |
task_id = res.get("task_id", "unknown")
|
| 68 |
-
print(f"[START] task_id={task_id}")
|
| 69 |
-
|
| 70 |
-
state = res
|
| 71 |
total_reward = 0
|
| 72 |
|
| 73 |
-
for
|
| 74 |
-
|
| 75 |
|
| 76 |
step_res = requests.post(f"{ENV_URL}/step", json=action).json()
|
| 77 |
|
| 78 |
reward = step_res["reward"]
|
| 79 |
-
|
| 80 |
|
| 81 |
-
print(f"[STEP] action={action} reward={reward}")
|
| 82 |
state = step_res["state"]
|
| 83 |
|
| 84 |
-
if
|
| 85 |
break
|
| 86 |
|
| 87 |
-
|
| 88 |
|
| 89 |
|
| 90 |
if __name__ == "__main__":
|
|
|
|
| 1 |
import os
|
| 2 |
import requests
|
| 3 |
+
import json
|
| 4 |
from openai import OpenAI
|
| 5 |
|
| 6 |
+
# ---- CONFIG ----
|
| 7 |
+
API_BASE = os.getenv("API_BASE_URL")
|
| 8 |
+
API_KEY = os.getenv("OPENAI_API_KEY")
|
| 9 |
+
MODEL = os.getenv("MODEL_NAME", "gpt-4o-mini")
|
| 10 |
+
|
| 11 |
+
ENV_URL = "https://parvpareek-cache-env.hf.space"
|
| 12 |
+
|
| 13 |
client = OpenAI(
|
| 14 |
base_url=os.getenv("API_BASE_URL"),
|
| 15 |
api_key=os.getenv("HF_TOKEN")
|
| 16 |
)
|
| 17 |
+
# ---- MEMORY ----
|
| 18 |
+
MEMORY = {}
|
| 19 |
+
|
| 20 |
+
# ---- ITEM SELECTION ----
|
| 21 |
+
LAST_USED = None
|
| 22 |
+
|
| 23 |
+
def log_start(task, env, model):
|
| 24 |
+
print(f"[START] task={task} env={env} model={model}", flush=True)
|
| 25 |
+
|
| 26 |
+
|
| 27 |
+
def log_step(step, action, reward, done, error):
|
| 28 |
+
error_val = error if error else "null"
|
| 29 |
+
print(
|
| 30 |
+
f"[STEP] step={step} action={action} reward={reward:.2f} done={str(done).lower()} error={error_val}",
|
| 31 |
+
flush=True,
|
| 32 |
+
)
|
| 33 |
+
|
| 34 |
+
|
| 35 |
+
def log_end(success, steps, rewards):
|
| 36 |
+
rewards_str = ",".join(f"{r:.2f}" for r in rewards)
|
| 37 |
+
print(
|
| 38 |
+
f"[END] success={str(success).lower()} steps={steps} rewards={rewards_str}",
|
| 39 |
+
flush=True,
|
| 40 |
+
)
|
| 41 |
+
|
| 42 |
+
def select_item(state, step):
|
| 43 |
+
global LAST_USED
|
| 44 |
+
items = state["items"]
|
| 45 |
+
|
| 46 |
+
def score(item):
|
| 47 |
+
s = 0
|
| 48 |
+
if item["last_result"] == "stale":
|
| 49 |
+
s += 3
|
| 50 |
+
if item["age"] > 5:
|
| 51 |
+
s += 2
|
| 52 |
+
if item["access_count"] > 10:
|
| 53 |
+
s += 1
|
| 54 |
+
return s
|
| 55 |
+
|
| 56 |
+
# best candidate
|
| 57 |
+
best = max(items, key=score)
|
| 58 |
+
|
| 59 |
+
# 🧠 exploration every 2 steps
|
| 60 |
+
if step % 2 == 1:
|
| 61 |
+
for item in items:
|
| 62 |
+
if item["key"] != LAST_USED:
|
| 63 |
+
LAST_USED = item["key"]
|
| 64 |
+
return item
|
| 65 |
+
|
| 66 |
+
LAST_USED = best["key"]
|
| 67 |
+
return best
|
| 68 |
+
|
| 69 |
+
# ---- DECISION POLICY ----
|
| 70 |
+
def decide(item, step):
|
| 71 |
+
key = item["key"]
|
| 72 |
+
last_result = item["last_result"]
|
| 73 |
+
age = item["age"]
|
| 74 |
+
|
| 75 |
+
mem = MEMORY.get(key, {})
|
| 76 |
+
|
| 77 |
+
# 🚫 cooldown after invalidate
|
| 78 |
+
if mem.get("last_action") == "invalidate" and step - mem.get("last_step", -10) < 2:
|
| 79 |
+
return {"type": "keep", "key": key}
|
| 80 |
+
|
| 81 |
+
# strong signal
|
| 82 |
+
if last_result == "stale" and age > 2:
|
| 83 |
+
return {"type": "invalidate", "key": key}
|
| 84 |
+
|
| 85 |
+
# uncertainty zone
|
| 86 |
+
if 3 <= age <= 6:
|
| 87 |
+
return {"type": "refresh", "key": key}
|
| 88 |
+
|
| 89 |
+
# safe zone
|
| 90 |
+
if last_result == "hit" and age < 3:
|
| 91 |
+
return {"type": "keep", "key": key}
|
| 92 |
+
|
| 93 |
+
# fallback
|
| 94 |
+
if age > 6:
|
| 95 |
+
return {"type": "refresh", "key": key}
|
| 96 |
+
|
| 97 |
+
return {"type": "keep", "key": key}
|
| 98 |
+
|
| 99 |
+
# ---- OPTIONAL LLM ASSIST (SAFE) ----
|
| 100 |
+
def llm_assist(state):
|
| 101 |
+
try:
|
| 102 |
+
prompt = f"""
|
| 103 |
+
You are a cache invalidation agent.
|
| 104 |
|
| 105 |
State:
|
| 106 |
+
{json.dumps(state)}
|
| 107 |
|
| 108 |
+
Return JSON:
|
| 109 |
{{"type": "...", "key": "..."}}
|
| 110 |
"""
|
|
|
|
|
|
|
| 111 |
response = client.chat.completions.create(
|
| 112 |
model=MODEL,
|
| 113 |
messages=[{"role": "user", "content": prompt}],
|
|
|
|
| 115 |
)
|
| 116 |
|
| 117 |
text = response.choices[0].message.content.strip()
|
|
|
|
| 118 |
action = json.loads(text)
|
| 119 |
|
|
|
|
| 120 |
if "type" in action and "key" in action:
|
| 121 |
return action
|
|
|
|
| 122 |
except:
|
| 123 |
pass
|
| 124 |
|
| 125 |
+
return None
|
| 126 |
|
| 127 |
+
# ---- MAIN LOOP ----
|
| 128 |
def run():
|
| 129 |
res = requests.post(f"{ENV_URL}/reset").json()
|
| 130 |
|
| 131 |
+
# handle wrapped state (important fix)
|
| 132 |
+
state = res.get("state", res)
|
| 133 |
task_id = res.get("task_id", "unknown")
|
| 134 |
total_reward = 0
|
| 135 |
+
rewards = []
|
| 136 |
+
steps_taken = 0
|
| 137 |
+
|
| 138 |
+
log_start(task_id, "cache_env", MODEL)
|
| 139 |
|
| 140 |
+
for step in range(1, 11):
|
| 141 |
+
item = select_item(state, step)
|
| 142 |
+
action = decide(item, step)
|
| 143 |
+
|
| 144 |
+
MEMORY[item["key"]] = {
|
| 145 |
+
"last_action": action["type"],
|
| 146 |
+
"last_step": step
|
| 147 |
+
}
|
| 148 |
|
| 149 |
step_res = requests.post(f"{ENV_URL}/step", json=action).json()
|
| 150 |
|
| 151 |
reward = step_res["reward"]
|
| 152 |
+
done = step_res["done"]
|
| 153 |
+
|
| 154 |
+
rewards.append(reward)
|
| 155 |
+
steps_taken = step
|
| 156 |
+
|
| 157 |
+
log_step(
|
| 158 |
+
step=step,
|
| 159 |
+
action=json.dumps(action),
|
| 160 |
+
reward=reward,
|
| 161 |
+
done=done,
|
| 162 |
+
error=None
|
| 163 |
+
)
|
| 164 |
|
|
|
|
| 165 |
state = step_res["state"]
|
| 166 |
|
| 167 |
+
if done:
|
| 168 |
break
|
| 169 |
|
| 170 |
+
# success criteria
|
| 171 |
+
avg_reward = sum(rewards) / len(rewards)
|
| 172 |
+
success = avg_reward > 0.3
|
| 173 |
+
|
| 174 |
+
log_end(success, steps_taken, rewards)
|
| 175 |
+
|
| 176 |
|
| 177 |
|
| 178 |
if __name__ == "__main__":
|
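The branch order in `decide` matters, because the age ranges and the `last_result` checks overlap. The sketch below restates the policy as a pure function so the order can be exercised directly. Passing `memory` as an argument and returning only the action type are simplifications made here; the code in the diff reads the module-level `MEMORY` and returns a full `{"type", "key"}` dictionary.

```python
def decide(item, step, memory):
    """Restatement of the heuristic policy in inference.py (returns the action type only)."""
    key, last_result, age = item["key"], item["last_result"], item["age"]
    mem = memory.get(key, {})

    # Cooldown: do not act again on the step right after an invalidate.
    if mem.get("last_action") == "invalidate" and step - mem.get("last_step", -10) < 2:
        return "keep"
    # Strong signal: observed stale and not just refreshed.
    if last_result == "stale" and age > 2:
        return "invalidate"
    # Uncertainty zone: mid-range age without a stale observation.
    if 3 <= age <= 6:
        return "refresh"
    # Safe zone: recent hit on a young item.
    if last_result == "hit" and age < 3:
        return "keep"
    # Fallback: old items get refreshed, everything else is kept.
    if age > 6:
        return "refresh"
    return "keep"


def item(last_result, age):
    return {"key": "item_0", "last_result": last_result, "age": age}


print(decide(item("stale", 5), step=1, memory={}))
print(decide(item("hit", 4), step=1, memory={}))
print(decide(item("hit", 9), step=1, memory={}))
print(decide(item("stale", 1), step=1, memory={}))
cooldown = {"item_0": {"last_action": "invalidate", "last_step": 3}}
print(decide(item("stale", 5), step=4, memory=cooldown))
```

Taking `memory` as a parameter is only for testability here; it makes the cooldown branch reproducible without mutating global state.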
pyproject.toml
ADDED
|
@@ -0,0 +1,24 @@
| 1 |
+
[build-system]
|
| 2 |
+
requires = ["setuptools>=61", "wheel"]
|
| 3 |
+
build-backend = "setuptools.build_meta"
|
| 4 |
+
|
| 5 |
+
[project]
|
| 6 |
+
name = "cache-invalidation-env"
|
| 7 |
+
version = "0.1.0"
|
| 8 |
+
description = "Cache invalidation decision environment for OpenEnv"
|
| 9 |
+
requires-python = ">=3.10"
|
| 10 |
+
dependencies = [
|
| 11 |
+
"openenv-core[core]>=0.2.2",
|
| 12 |
+
"fastapi>=0.100.0",
|
| 13 |
+
"uvicorn[standard]>=0.22.0",
|
| 14 |
+
"pydantic>=2.0.0",
|
| 15 |
+
"requests>=2.28.0",
|
| 16 |
+
"openai>=1.0.0",
|
| 17 |
+
]
|
| 18 |
+
|
| 19 |
+
[project.scripts]
|
| 20 |
+
server = "server.app:main"
|
| 21 |
+
|
| 22 |
+
[tool.setuptools.packages.find]
|
| 23 |
+
where = ["."]
|
| 24 |
+
include = ["env*", "server*"]
|
requirements.txt
CHANGED
|
@@ -2,4 +2,5 @@ fastapi
|
|
| 2 |
uvicorn
|
| 3 |
pydantic
|
| 4 |
requests
|
| 5 |
-
openai
|
|
|
|
|
|
| 2 |
uvicorn
|
| 3 |
pydantic
|
| 4 |
requests
|
| 5 |
+
openai
|
| 6 |
+
openenv-core
|
server/__init__.py
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
# OpenEnv HTTP server package
|
server/app.py
ADDED
|
@@ -0,0 +1,13 @@
| 1 |
+
"""OpenEnv entry: validator requires server/app.py with def main(...) and if __name__ + main()."""
|
| 2 |
+
|
| 3 |
+
import uvicorn
|
| 4 |
+
|
| 5 |
+
|
| 6 |
+
def main(host: str = "0.0.0.0", port: int = 7860):
|
| 7 |
+
from app import app as fastapi_app
|
| 8 |
+
|
| 9 |
+
uvicorn.run(fastapi_app, host=host, port=port)
|
| 10 |
+
|
| 11 |
+
|
| 12 |
+
if __name__ == "__main__":
|
| 13 |
+
main()
|
uv.lock
ADDED
|
The diff for this file is too large to render.
|
|
|
validate-submission.sh
ADDED
|
@@ -0,0 +1,191 @@
| 1 |
+
#!/usr/bin/env bash
|
| 2 |
+
#
|
| 3 |
+
# validate-submission.sh — OpenEnv Submission Validator
|
| 4 |
+
#
|
| 5 |
+
# Checks that your HF Space is live, Docker image builds, and openenv validate passes.
|
| 6 |
+
#
|
| 7 |
+
# Prerequisites:
|
| 8 |
+
# - Docker: https://docs.docker.com/get-docker/
|
| 9 |
+
# - openenv-core: pip install openenv-core
|
| 10 |
+
# - curl (usually pre-installed)
|
| 11 |
+
#
|
| 12 |
+
# Run:
|
| 13 |
+
# curl -fsSL https://raw.githubusercontent.com/<owner>/<repo>/main/scripts/validate-submission.sh | bash -s -- <ping_url> [repo_dir]
|
| 14 |
+
#
|
| 15 |
+
# Or download and run locally:
|
| 16 |
+
# chmod +x validate-submission.sh
|
| 17 |
+
# ./validate-submission.sh <ping_url> [repo_dir]
|
| 18 |
+
#
|
| 19 |
+
# Arguments:
|
| 20 |
+
# ping_url Your HuggingFace Space URL (e.g. https://your-space.hf.space)
|
| 21 |
+
# repo_dir Path to your repo (default: current directory)
|
| 22 |
+
#
|
| 23 |
+
# Examples:
|
| 24 |
+
# ./validate-submission.sh https://my-team.hf.space
|
| 25 |
+
# ./validate-submission.sh https://my-team.hf.space ./my-repo
|
| 26 |
+
#
|
| 27 |
+
|
| 28 |
+
set -uo pipefail
|
| 29 |
+
|
| 30 |
+
DOCKER_BUILD_TIMEOUT=600
|
| 31 |
+
if [ -t 1 ]; then
|
| 32 |
+
RED='\033[0;31m'
|
| 33 |
+
GREEN='\033[0;32m'
|
| 34 |
+
YELLOW='\033[1;33m'
|
| 35 |
+
BOLD='\033[1m'
|
| 36 |
+
NC='\033[0m'
|
| 37 |
+
else
|
| 38 |
+
RED='' GREEN='' YELLOW='' BOLD='' NC=''
|
| 39 |
+
fi
|
| 40 |
+
|
| 41 |
+
run_with_timeout() {
|
| 42 |
+
local secs="$1"; shift
|
| 43 |
+
if command -v timeout &>/dev/null; then
|
| 44 |
+
timeout "$secs" "$@"
|
| 45 |
+
elif command -v gtimeout &>/dev/null; then
|
| 46 |
+
gtimeout "$secs" "$@"
|
| 47 |
+
else
|
| 48 |
+
"$@" &
|
| 49 |
+
local pid=$!
|
| 50 |
+
( sleep "$secs" && kill "$pid" 2>/dev/null ) &
|
| 51 |
+
local watcher=$!
|
| 52 |
+
wait "$pid" 2>/dev/null
|
| 53 |
+
local rc=$?
|
| 54 |
+
kill "$watcher" 2>/dev/null
|
| 55 |
+
wait "$watcher" 2>/dev/null
|
| 56 |
+
return $rc
|
| 57 |
+
fi
|
| 58 |
+
}
|
| 59 |
+
|
| 60 |
+
portable_mktemp() {
|
| 61 |
+
local prefix="${1:-validate}"
|
| 62 |
+
mktemp "${TMPDIR:-/tmp}/${prefix}-XXXXXX" 2>/dev/null || mktemp
|
| 63 |
+
}
|
| 64 |
+
|
| 65 |
+
CLEANUP_FILES=()
|
| 66 |
+
cleanup() { rm -f "${CLEANUP_FILES[@]+"${CLEANUP_FILES[@]}"}"; }
|
| 67 |
+
trap cleanup EXIT
|
| 68 |
+
|
| 69 |
+
PING_URL="${1:-}"
|
| 70 |
+
REPO_DIR="${2:-.}"
|
| 71 |
+
|
| 72 |
+
if [ -z "$PING_URL" ]; then
|
| 73 |
+
printf "Usage: %s <ping_url> [repo_dir]\n" "$0"
|
| 74 |
+
printf "\n"
|
| 75 |
+
printf " ping_url Your HuggingFace Space URL (e.g. https://your-space.hf.space)\n"
|
| 76 |
+
printf " repo_dir Path to your repo (default: current directory)\n"
|
| 77 |
+
exit 1
|
| 78 |
+
fi
|
| 79 |
+
|
| 80 |
+
if ! REPO_DIR="$(cd "$REPO_DIR" 2>/dev/null && pwd)"; then
|
| 81 |
+
printf "Error: directory '%s' not found\n" "${2:-.}"
|
| 82 |
+
exit 1
|
| 83 |
+
fi
|
| 84 |
+
PING_URL="${PING_URL%/}"
|
| 85 |
+
export PING_URL
|
| 86 |
+
PASS=0
|
| 87 |
+
|
| 88 |
+
log() { printf "[%s] %b\n" "$(date -u +%H:%M:%S)" "$*"; }
|
| 89 |
+
pass() { log "${GREEN}PASSED${NC} -- $1"; PASS=$((PASS + 1)); }
|
| 90 |
+
fail() { log "${RED}FAILED${NC} -- $1"; }
|
| 91 |
+
hint() { printf " ${YELLOW}Hint:${NC} %b\n" "$1"; }
|
| 92 |
+
stop_at() {
|
| 93 |
+
printf "\n"
|
| 94 |
+
printf "${RED}${BOLD}Validation stopped at %s.${NC} Fix the above before continuing.\n" "$1"
|
| 95 |
+
exit 1
|
| 96 |
+
}
|
| 97 |
+
|
| 98 |
+
printf "\n"
|
| 99 |
+
printf "${BOLD}========================================${NC}\n"
|
| 100 |
+
printf "${BOLD} OpenEnv Submission Validator${NC}\n"
|
| 101 |
+
printf "${BOLD}========================================${NC}\n"
|
| 102 |
+
log "Repo: $REPO_DIR"
|
| 103 |
+
log "Ping URL: $PING_URL"
|
| 104 |
+
printf "\n"
|
| 105 |
+
|
| 106 |
+
log "${BOLD}Step 1/3: Pinging HF Space${NC} ($PING_URL/reset) ..."
|
| 107 |
+
|
| 108 |
+
CURL_OUTPUT=$(portable_mktemp "validate-curl")
|
| 109 |
+
CLEANUP_FILES+=("$CURL_OUTPUT")
|
| 110 |
+
HTTP_CODE=$(curl -s -o "$CURL_OUTPUT" -w "%{http_code}" -X POST \
|
| 111 |
+
-H "Content-Type: application/json" -d '{}' \
|
| 112 |
+
"$PING_URL/reset" --max-time 30 2>"$CURL_OUTPUT" || printf "000")
|
| 113 |
+
|
| 114 |
+
if [ "$HTTP_CODE" = "200" ]; then
|
| 115 |
+
pass "HF Space is live and responds to /reset"
|
| 116 |
+
elif [ "$HTTP_CODE" = "000" ]; then
|
| 117 |
+
fail "HF Space not reachable (connection failed or timed out)"
|
| 118 |
+
hint "Check your network connection and that the Space is running."
|
| 119 |
+
hint "Try: curl -s -o /dev/null -w '%%{http_code}' -X POST $PING_URL/reset"
|
| 120 |
+
stop_at "Step 1"
|
| 121 |
+
else
|
| 122 |
+
fail "HF Space /reset returned HTTP $HTTP_CODE (expected 200)"
|
| 123 |
+
hint "Make sure your Space is running and the URL is correct."
|
| 124 |
+
hint "Try opening $PING_URL in your browser first."
|
| 125 |
+
stop_at "Step 1"
|
| 126 |
+
fi
|
| 127 |
+
|
| 128 |
+
log "${BOLD}Step 2/3: Running docker build${NC} ..."
|
| 129 |
+
|
| 130 |
+
if ! command -v docker &>/dev/null; then
|
| 131 |
+
fail "docker command not found"
|
| 132 |
+
hint "Install Docker: https://docs.docker.com/get-docker/"
|
| 133 |
+
stop_at "Step 2"
|
| 134 |
+
fi
|
| 135 |
+
|
| 136 |
+
if [ -f "$REPO_DIR/Dockerfile" ]; then
|
| 137 |
+
DOCKER_CONTEXT="$REPO_DIR"
|
| 138 |
+
elif [ -f "$REPO_DIR/server/Dockerfile" ]; then
|
| 139 |
+
DOCKER_CONTEXT="$REPO_DIR/server"
|
| 140 |
+
else
|
| 141 |
+
fail "No Dockerfile found in repo root or server/ directory"
|
| 142 |
+
stop_at "Step 2"
|
| 143 |
+
fi
|
| 144 |
+
|
| 145 |
+
log " Found Dockerfile in $DOCKER_CONTEXT"
|
| 146 |
+
|
| 147 |
+
BUILD_LOG=$(portable_mktemp "validate-docker")
|
| 148 |
+
CLEANUP_FILES+=("$BUILD_LOG")
|
| 149 |
+
BUILD_OK=false
|
| 150 |
+
# Plain progress: BuildKit's default UI can block or buffer when stderr is not a TTY (e.g. $(...)),
|
| 151 |
+
# which makes Step 2 look hung; writing to a file avoids that.
|
| 152 |
+
if run_with_timeout "$DOCKER_BUILD_TIMEOUT" env DOCKER_BUILDKIT=1 docker build --progress=plain "$DOCKER_CONTEXT" >"$BUILD_LOG" 2>&1; then
|
| 153 |
+
BUILD_OK=true
|
| 154 |
+
fi
|
| 155 |
+
|
| 156 |
+
if [ "$BUILD_OK" = true ]; then
|
| 157 |
+
pass "Docker build succeeded"
|
| 158 |
+
else
|
| 159 |
+
fail "Docker build failed (timeout=${DOCKER_BUILD_TIMEOUT}s)"
|
| 160 |
+
tail -20 "$BUILD_LOG" 2>/dev/null || true
|
| 161 |
+
stop_at "Step 2"
|
| 162 |
+
fi
|
| 163 |
+
|
| 164 |
+
log "${BOLD}Step 3/3: Running openenv validate${NC} ..."
|
| 165 |
+
|
| 166 |
+
if ! command -v openenv &>/dev/null; then
|
| 167 |
+
fail "openenv command not found"
|
| 168 |
+
hint "Install it: pip install openenv-core"
|
| 169 |
+
stop_at "Step 3"
|
| 170 |
+
fi
|
| 171 |
+
|
| 172 |
+
VALIDATE_OK=false
|
| 173 |
+
VALIDATE_OUTPUT=$(cd "$REPO_DIR" && openenv validate 2>&1) && VALIDATE_OK=true
|
| 174 |
+
|
| 175 |
+
if [ "$VALIDATE_OK" = true ]; then
|
| 176 |
+
pass "openenv validate passed"
|
| 177 |
+
[ -n "$VALIDATE_OUTPUT" ] && log " $VALIDATE_OUTPUT"
|
| 178 |
+
else
|
| 179 |
+
fail "openenv validate failed"
|
| 180 |
+
printf "%s\n" "$VALIDATE_OUTPUT"
|
| 181 |
+
stop_at "Step 3"
|
| 182 |
+
fi
|
| 183 |
+
|
| 184 |
+
printf "\n"
|
| 185 |
+
printf "${BOLD}========================================${NC}\n"
|
| 186 |
+
printf "${GREEN}${BOLD} All 3/3 checks passed!${NC}\n"
|
| 187 |
+
printf "${GREEN}${BOLD} Your submission is ready to submit.${NC}\n"
|
| 188 |
+
printf "${BOLD}========================================${NC}\n"
|
| 189 |
+
printf "\n"
|
| 190 |
+
|
| 191 |
+
exit 0
|
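For readers more comfortable with Python than shell, the idea behind the script's `run_with_timeout` helper can be sketched with the standard library's `subprocess` module. This is an illustrative equivalent under stated assumptions, not part of the submission validator: the function below and its choice to return 124 on timeout (the exit status GNU `timeout` uses) are assumptions made for this example.

```python
import subprocess
import sys


def run_with_timeout(secs, cmd):
    """Run `cmd`, returning its exit code, or 124 if it exceeds `secs` seconds."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        return subprocess.run(cmd, timeout=secs).returncode
    except subprocess.TimeoutExpired:
        return 124


# A command that finishes quickly returns its own exit code.
print(run_with_timeout(10, [sys.executable, "-c", "pass"]))

# A command that sleeps past the limit is killed and reported as a timeout.
print(run_with_timeout(0.5, [sys.executable, "-c", "import time; time.sleep(5)"]))
```

Using `sys.executable` keeps the example independent of which shell utilities are installed on the machine.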