Parv Pareek commited on
Commit
4f8cf04
·
1 Parent(s): 748aaa7

fix: projet structure

Browse files
Files changed (15) hide show
  1. .dockerignore +16 -0
  2. .gitignore +12 -0
  3. Dockerfile +6 -3
  4. README.md +299 -0
  5. app.py +5 -2
  6. env/__init__.py +1 -0
  7. env/core.py +8 -2
  8. env/grader.py +57 -0
  9. inference.py +132 -44
  10. pyproject.toml +24 -0
  11. requirements.txt +2 -1
  12. server/__init__.py +1 -0
  13. server/app.py +13 -0
  14. uv.lock +0 -0
  15. validate-submission.sh +191 -0
.dockerignore ADDED
@@ -0,0 +1,16 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Shrink build context (without this, COPY . . sends .git and stalls Step 2 locally)
2
+ .git
3
+ .gitattributes
4
+ **/__pycache__
5
+ **/*.py[cod]
6
+ **/.pytest_cache
7
+ .venv
8
+ venv
9
+ .env
10
+ .env.*
11
+ *.egg-info
12
+ .eggs
13
+ dist
14
+ build
15
+ state.json
16
+ .DS_Store
.gitignore ADDED
@@ -0,0 +1,12 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Secrets — never commit (use HF Space secrets / CI env vars)
2
+ .env
3
+ .env.*
4
+
5
+ # Local run artifacts
6
+ state.json
7
+
8
+ # Python
9
+ __pycache__/
10
+ *.py[cod]
11
+ .venv/
12
+ venv/
Dockerfile CHANGED
@@ -1,8 +1,11 @@
1
- FROM python:3.10
2
 
3
  WORKDIR /app
4
- COPY . .
5
 
6
- RUN pip install -r requirements.txt
 
 
 
 
7
 
8
  CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
 
1
+ FROM python:3.10-slim
2
 
3
  WORKDIR /app
 
4
 
5
+ # Install deps before copying full tree (faster rebuilds, smaller COPY layer)
6
+ COPY requirements.txt .
7
+ RUN pip install --no-cache-dir -r requirements.txt
8
+
9
+ COPY . .
10
 
11
  CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
README.md CHANGED
@@ -8,3 +8,302 @@ pinned: false
8
  ---
9
 
10
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
8
  ---
9
 
10
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
11
+
12
+
13
+ # 🧠 Cache Invalidation Environment (OpenEnv)
14
+
15
+ ## 📌 Overview
16
+
17
+ This project implements a **real-world cache invalidation decision environment** using the OpenEnv specification.
18
+
19
+ Cache invalidation is a fundamental systems problem: deciding **when to refresh cached data vs reuse it**. Acting too early wastes resources, while acting too late serves stale data.
20
+
21
+ This environment simulates that tradeoff under **uncertainty and noisy signals**, allowing evaluation of agent decision-making.
22
+
23
+ ---
24
+
25
+ ## 🎯 Motivation
26
+
27
+ Cache invalidation is widely used in:
28
+
29
+ * Distributed systems
30
+ * Web backends
31
+ * CDNs and edge caching
32
+ * Databases
33
+
34
+ This environment models a **practical decision problem engineers face daily**, making it useful for evaluating reasoning-based agents.
35
+
36
+ ---
37
+
38
+ ## 🧩 Environment Design
39
+
40
+ ### State (Observation)
41
+
42
+ Each step returns:
43
+
44
+ ```json
45
+ {
46
+ "items": [
47
+ {
48
+ "key": "item_0",
49
+ "age": 5,
50
+ "access_count": 12,
51
+ "last_result": "hit"
52
+ }
53
+ ],
54
+ "step": 3,
55
+ "task_id": "medium"
56
+ }
57
+ ```
58
+
59
+ #### Field meanings:
60
+
61
+ * `age`: time since last refresh
62
+ * `access_count`: usage frequency
63
+ * `last_result`: "hit" or "stale" (noisy signal)
64
+ * `task_id`: difficulty level
65
+
66
+ ---
67
+
68
+ ### Actions
69
+
70
+ Agent must return:
71
+
72
+ ```json
73
+ {
74
+ "type": "invalidate | refresh | keep",
75
+ "key": "item_id"
76
+ }
77
+ ```
78
+
79
+ #### Action meanings:
80
+
81
+ * `invalidate`: reset cache (high cost, correct if stale)
82
+ * `refresh`: partial reset (safe but weaker)
83
+ * `keep`: do nothing (efficient if data is fresh)
84
+
85
+ ---
86
+
87
+ ### Hidden Dynamics
88
+
89
+ The true cache state is **not directly observable**.
90
+
91
+ Staleness depends on:
92
+
93
+ * base TTL
94
+ * update frequency
95
+ * time since last update
96
+
97
+ Observations are **noisy**, requiring inference.
98
+
99
+ ---
100
+
101
+ ## 🎯 Tasks
102
+
103
+ Three tasks with increasing difficulty:
104
+
105
+ ### 🟢 Easy
106
+
107
+ * Few items
108
+ * Low volatility
109
+ * Clear signals
110
+
111
+ ### 🟡 Medium
112
+
113
+ * Moderate noise
114
+ * Conflicting signals
115
+ * Requires reasoning
116
+
117
+ ### 🔴 Hard
118
+
119
+ * High volatility
120
+ * Frequent updates
121
+ * Misleading signals
122
+
123
+ ---
124
+
125
+ ## 🏆 Reward Function
126
+
127
+ Reward is given at every step:
128
+
129
+ | Action | Correct Case | Reward |
130
+ | ---------- | ------------ | ------ |
131
+ | invalidate | stale | +1.0 |
132
+ | invalidate | fresh | -0.5 |
133
+ | keep | fresh | +0.8 |
134
+ | keep | stale | -0.6 |
135
+ | refresh | stale | +0.6 |
136
+ | refresh | fresh | +0.2 |
137
+
138
+ This provides:
139
+
140
+ * dense feedback
141
+ * partial credit
142
+ * penalty for poor decisions
143
+
144
+ ---
145
+
146
+ ## 📊 Episode
147
+
148
+ * Fixed length: 10 steps
149
+ * Final score: average reward (normalized to [0,1])
150
+
151
+ ---
152
+
153
+ ## 🤖 Baseline Agent
154
+
155
+ The baseline agent uses:
156
+
157
+ * heuristic decision policy
158
+ * short-term memory (to avoid repeated mistakes)
159
+ * optional LLM reasoning
160
+
161
+ ### Example score
162
+
163
+ | Task | Score |
164
+ | ------ | -------- |
165
+ | Easy | ~4.5–6.5 |
166
+ | Medium | ~3.5–5.5 |
167
+ | Hard | ~2.5–4.5 |
168
+
169
+ ---
170
+
171
+ ## 🚀 Running the Environment
172
+
173
+ ### 1. Local
174
+
175
+ ```bash
176
+ pip install -r requirements.txt
177
+ uvicorn app:app --reload
178
+ ```
179
+
180
+ ---
181
+
182
+ ### 2. API Endpoints
183
+
184
+ #### Reset
185
+
186
+ ```bash
187
+ curl -X POST http://localhost:8000/reset
188
+ ```
189
+
190
+ #### Step
191
+
192
+ ```bash
193
+ curl -X POST http://localhost:8000/step \
194
+ -H "Content-Type: application/json" \
195
+ -d '{"type":"keep","key":"item_0"}'
196
+ ```
197
+
198
+ #### State
199
+
200
+ ```bash
201
+ curl http://localhost:8000/state
202
+ ```
203
+
204
+ ---
205
+
206
+ ## 🤗 Hugging Face Deployment
207
+
208
+ Live endpoint:
209
+
210
+ ```
211
+ https://parvpareek-cache-env.hf.space
212
+ ```
213
+
214
+ Test:
215
+
216
+ ```bash
217
+ curl -X POST https://parvpareek-cache-env.hf.space/reset
218
+ ```
219
+
220
+ ---
221
+
222
+ ## 🐳 Docker
223
+
224
+ ```bash
225
+ docker build -t cache-env .
226
+ docker run -p 7860:7860 cache-env
227
+ ```
228
+
229
+ ---
230
+
231
+ ## ⚙️ Environment Variables
232
+
233
+ Required for inference:
234
+
235
+ ```bash
236
+ API_BASE_URL=<llm_endpoint>
237
+ MODEL_NAME=<model_name>
238
+ HF_TOKEN=<api_key>
239
+ ```
240
+
241
+ ---
242
+
243
+ ## 📁 Project Structure
244
+
245
+ ```
246
+ .
247
+ ├── app.py
248
+ ├── env/
249
+ │ ├── core.py
250
+ │ ├── generator.py
251
+ │ ├── grader.py
252
+ │ ├── models.py
253
+ │ └── tasks.py
254
+ ├── inference.py
255
+ ├── openenv.yaml
256
+ ├── Dockerfile
257
+ └── README.md
258
+ ```
259
+
260
+ ---
261
+
262
+ ## ✅ OpenEnv Compliance
263
+
264
+ * ✔ step / reset / state API
265
+ * ✔ typed models (Pydantic)
266
+ * ✔ openenv.yaml included
267
+ * ✔ 3 tasks with graders
268
+ * ✔ reward ∈ [0,1]
269
+ * ✔ deterministic evaluation
270
+
271
+ ---
272
+
273
+ ## 💡 Key Insight
274
+
275
+ This environment models:
276
+
277
+ > Decision-making under uncertainty with partial observability
278
+
279
+ Agents must infer:
280
+
281
+ * when data is stale
282
+ * when to act vs wait
283
+
284
+ ---
285
+
286
+ ## 🧠 Why This Matters
287
+
288
+ Cache invalidation is considered one of the hardest problems in computer science.
289
+
290
+ This environment provides:
291
+
292
+ * a controlled simulation
293
+ * measurable evaluation
294
+ * realistic constraints
295
+
296
+ ---
297
+
298
+ ## 📌 Summary
299
+
300
+ * Real-world system problem ✔
301
+ * Multi-step decision making ✔
302
+ * Partial observability ✔
303
+ * Non-trivial reward shaping ✔
304
+
305
+ ---
306
+
307
+ ## 👤 Author
308
+
309
+ Built for OpenEnv evaluation challenge.
app.py CHANGED
@@ -6,8 +6,11 @@ env = CacheEnv()
6
 
7
  @app.post("/reset")
8
  def reset():
9
- return env.reset()
10
-
 
 
 
11
  @app.post("/step")
12
  def step(action: dict):
13
  return env.step(action)
 
6
 
7
  @app.post("/reset")
8
  def reset():
9
+ state = env.reset()
10
+ return {
11
+ "state": state,
12
+ "task_id": state.get("task_id")
13
+ }
14
  @app.post("/step")
15
  def step(action: dict):
16
  return env.step(action)
env/__init__.py ADDED
@@ -0,0 +1 @@
 
 
1
+ # Cache invalidation environment package
env/core.py CHANGED
@@ -10,6 +10,7 @@ class CacheEnv:
10
  self.reset()
11
 
12
  def reset(self):
 
13
  self.task_id = sample_task()
14
  items, hidden, current_time = generate_env(self.task_id)
15
 
@@ -41,6 +42,11 @@ class CacheEnv:
41
  age = self.current_time - hidden["last_update"]
42
  is_stale = age > hidden["base_ttl"] or random.random() < hidden["update_freq"]
43
 
 
 
 
 
 
44
  reward = compute_step_reward(action_type, is_stale)
45
  self.total_reward += reward
46
 
@@ -67,12 +73,12 @@ class CacheEnv:
67
  self.state["step"] += 1
68
 
69
  done = self.state["step"] >= 10
 
70
 
71
  if done:
72
- final_score = normalize_episode_score(self.total_reward)
73
  else:
74
  final_score = None
75
-
76
  return {
77
  "state": self.state,
78
  "reward": reward,
 
10
  self.reset()
11
 
12
  def reset(self):
13
+ self.history = []
14
  self.task_id = sample_task()
15
  items, hidden, current_time = generate_env(self.task_id)
16
 
 
42
  age = self.current_time - hidden["last_update"]
43
  is_stale = age > hidden["base_ttl"] or random.random() < hidden["update_freq"]
44
 
45
+ self.history.append({
46
+ "action": action_type,
47
+ "is_stale": is_stale
48
+ })
49
+
50
  reward = compute_step_reward(action_type, is_stale)
51
  self.total_reward += reward
52
 
 
73
  self.state["step"] += 1
74
 
75
  done = self.state["step"] >= 10
76
+ from env.grader import evaluate_episode
77
 
78
  if done:
79
+ final_score = evaluate_episode(self.history)
80
  else:
81
  final_score = None
 
82
  return {
83
  "state": self.state,
84
  "reward": reward,
env/grader.py CHANGED
@@ -15,4 +15,61 @@ def compute_step_reward(action_type, is_stale):
15
  def normalize_episode_score(total_reward, max_steps=10):
16
  # expected max ≈ 1.0 per step
17
  score = total_reward / max_steps
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
18
  return max(0.0, min(1.0, score))
 
15
  def normalize_episode_score(total_reward, max_steps=10):
16
  # expected max ≈ 1.0 per step
17
  score = total_reward / max_steps
18
+ return max(0.0, min(1.0, score))
19
+
20
+
21
+
22
+ def evaluate_episode(history):
23
+ """
24
+ history = list of:
25
+ {
26
+ "action": str,
27
+ "is_stale": bool
28
+ }
29
+ """
30
+
31
+ total_steps = len(history)
32
+
33
+ if total_steps == 0:
34
+ return 0.0
35
+
36
+ correct_decisions = 0
37
+ unnecessary_invalidations = 0
38
+ oscillations = 0
39
+
40
+ last_action = None
41
+
42
+ for step in history:
43
+ action = step["action"]
44
+ is_stale = step["is_stale"]
45
+
46
+ # ✅ correctness (freshness proxy)
47
+ if (is_stale and action in ["invalidate", "refresh"]) or \
48
+ (not is_stale and action == "keep"):
49
+ correct_decisions += 1
50
+
51
+ # ❌ unnecessary invalidation
52
+ if action == "invalidate" and not is_stale:
53
+ unnecessary_invalidations += 1
54
+
55
+ # ❌ oscillation (flip behavior)
56
+ if last_action and last_action != action:
57
+ oscillations += 1
58
+
59
+ last_action = action
60
+
61
+ # ---- normalize metrics ----
62
+ freshness = correct_decisions / total_steps
63
+
64
+ efficiency = 1 - (unnecessary_invalidations / total_steps)
65
+
66
+ stability = 1 - (oscillations / total_steps)
67
+
68
+ # ---- weighted score ----
69
+ score = (
70
+ 0.5 * freshness +
71
+ 0.3 * efficiency +
72
+ 0.2 * stability
73
+ )
74
+
75
  return max(0.0, min(1.0, score))
inference.py CHANGED
@@ -1,38 +1,113 @@
1
  import os
2
  import requests
 
3
  from openai import OpenAI
4
 
 
 
 
 
 
 
 
5
  client = OpenAI(
6
  base_url=os.getenv("API_BASE_URL"),
7
  api_key=os.getenv("HF_TOKEN")
8
  )
9
-
10
- MODEL = os.getenv("MODEL_NAME")
11
-
12
- ENV_URL = "http://localhost:8000"
13
-
14
- def choose_action(state):
15
- import json
16
-
17
- prompt = f"""
18
- You are an expert cache invalidation system.
19
-
20
- Goal:
21
- Maximize correctness while avoiding unnecessary invalidations.
22
-
23
- Rules:
24
- - If item shows signs of staleness → invalidate
25
- - If uncertain → refresh
26
- - If stable → keep
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
27
 
28
  State:
29
- {json.dumps(state, indent=2)}
30
 
31
- Respond ONLY with valid JSON:
32
  {{"type": "...", "key": "..."}}
33
  """
34
-
35
- try:
36
  response = client.chat.completions.create(
37
  model=MODEL,
38
  messages=[{"role": "user", "content": prompt}],
@@ -40,51 +115,64 @@ Respond ONLY with valid JSON:
40
  )
41
 
42
  text = response.choices[0].message.content.strip()
43
-
44
  action = json.loads(text)
45
 
46
- # basic validation
47
  if "type" in action and "key" in action:
48
  return action
49
-
50
  except:
51
  pass
52
 
53
- # fallback (important for robustness)
54
- item = state["items"][0]
55
-
56
- if item["last_result"] == "stale":
57
- return {"type": "invalidate", "key": item["key"]}
58
-
59
- if item["age"] > 5:
60
- return {"type": "refresh", "key": item["key"]}
61
-
62
- return {"type": "keep", "key": item["key"]}
63
 
 
64
  def run():
65
  res = requests.post(f"{ENV_URL}/reset").json()
66
 
 
 
67
  task_id = res.get("task_id", "unknown")
68
- print(f"[START] task_id={task_id}")
69
-
70
- state = res
71
  total_reward = 0
 
 
 
 
72
 
73
- for _ in range(10):
74
- action = choose_action(state)
 
 
 
 
 
 
75
 
76
  step_res = requests.post(f"{ENV_URL}/step", json=action).json()
77
 
78
  reward = step_res["reward"]
79
- total_reward += reward
 
 
 
 
 
 
 
 
 
 
 
80
 
81
- print(f"[STEP] action={action} reward={reward}")
82
  state = step_res["state"]
83
 
84
- if step_res["done"]:
85
  break
86
 
87
- print(f"[END] total_reward={total_reward}")
 
 
 
 
 
88
 
89
 
90
  if __name__ == "__main__":
 
1
  import os
2
  import requests
3
+ import json
4
  from openai import OpenAI
5
 
6
+ # ---- CONFIG ----
7
+ API_BASE = os.getenv("API_BASE_URL")
8
+ API_KEY = os.getenv("OPENAI_API_KEY")
9
+ MODEL = os.getenv("MODEL_NAME", "gpt-4o-mini")
10
+
11
+ ENV_URL = "https://parvpareek-cache-env.hf.space"
12
+
13
  client = OpenAI(
14
  base_url=os.getenv("API_BASE_URL"),
15
  api_key=os.getenv("HF_TOKEN")
16
  )
17
+ # ---- MEMORY ----
18
+ MEMORY = {}
19
+
20
+ # ---- ITEM SELECTION ----
21
+ LAST_USED = None
22
+
23
+ def log_start(task, env, model):
24
+ print(f"[START] task={task} env={env} model={model}", flush=True)
25
+
26
+
27
+ def log_step(step, action, reward, done, error):
28
+ error_val = error if error else "null"
29
+ print(
30
+ f"[STEP] step={step} action={action} reward={reward:.2f} done={str(done).lower()} error={error_val}",
31
+ flush=True,
32
+ )
33
+
34
+
35
+ def log_end(success, steps, rewards):
36
+ rewards_str = ",".join(f"{r:.2f}" for r in rewards)
37
+ print(
38
+ f"[END] success={str(success).lower()} steps={steps} rewards={rewards_str}",
39
+ flush=True,
40
+ )
41
+
42
+ def select_item(state, step):
43
+ global LAST_USED
44
+ items = state["items"]
45
+
46
+ def score(item):
47
+ s = 0
48
+ if item["last_result"] == "stale":
49
+ s += 3
50
+ if item["age"] > 5:
51
+ s += 2
52
+ if item["access_count"] > 10:
53
+ s += 1
54
+ return s
55
+
56
+ # best candidate
57
+ best = max(items, key=score)
58
+
59
+ # 🧠 exploration every 2 steps
60
+ if step % 2 == 1:
61
+ for item in items:
62
+ if item["key"] != LAST_USED:
63
+ LAST_USED = item["key"]
64
+ return item
65
+
66
+ LAST_USED = best["key"]
67
+ return best
68
+
69
+ # ---- DECISION POLICY ----
70
+ def decide(item, step):
71
+ key = item["key"]
72
+ last_result = item["last_result"]
73
+ age = item["age"]
74
+
75
+ mem = MEMORY.get(key, {})
76
+
77
+ # 🚫 cooldown after invalidate
78
+ if mem.get("last_action") == "invalidate" and step - mem.get("last_step", -10) < 2:
79
+ return {"type": "keep", "key": key}
80
+
81
+ # strong signal
82
+ if last_result == "stale" and age > 2:
83
+ return {"type": "invalidate", "key": key}
84
+
85
+ # uncertainty zone
86
+ if 3 <= age <= 6:
87
+ return {"type": "refresh", "key": key}
88
+
89
+ # safe zone
90
+ if last_result == "hit" and age < 3:
91
+ return {"type": "keep", "key": key}
92
+
93
+ # fallback
94
+ if age > 6:
95
+ return {"type": "refresh", "key": key}
96
+
97
+ return {"type": "keep", "key": key}
98
+
99
+ # ---- OPTIONAL LLM ASSIST (SAFE) ----
100
+ def llm_assist(state):
101
+ try:
102
+ prompt = f"""
103
+ You are a cache invalidation agent.
104
 
105
  State:
106
+ {json.dumps(state)}
107
 
108
+ Return JSON:
109
  {{"type": "...", "key": "..."}}
110
  """
 
 
111
  response = client.chat.completions.create(
112
  model=MODEL,
113
  messages=[{"role": "user", "content": prompt}],
 
115
  )
116
 
117
  text = response.choices[0].message.content.strip()
 
118
  action = json.loads(text)
119
 
 
120
  if "type" in action and "key" in action:
121
  return action
 
122
  except:
123
  pass
124
 
125
+ return None
 
 
 
 
 
 
 
 
 
126
 
127
+ # ---- MAIN LOOP ----
128
  def run():
129
  res = requests.post(f"{ENV_URL}/reset").json()
130
 
131
+ # handle wrapped state (important fix)
132
+ state = res.get("state", res)
133
  task_id = res.get("task_id", "unknown")
 
 
 
134
  total_reward = 0
135
+ rewards = []
136
+ steps_taken = 0
137
+
138
+ log_start(task_id, "cache_env", MODEL)
139
 
140
+ for step in range(1, 11):
141
+ item = select_item(state, step)
142
+ action = decide(item, step)
143
+
144
+ MEMORY[item["key"]] = {
145
+ "last_action": action["type"],
146
+ "last_step": step
147
+ }
148
 
149
  step_res = requests.post(f"{ENV_URL}/step", json=action).json()
150
 
151
  reward = step_res["reward"]
152
+ done = step_res["done"]
153
+
154
+ rewards.append(reward)
155
+ steps_taken = step
156
+
157
+ log_step(
158
+ step=step,
159
+ action=json.dumps(action),
160
+ reward=reward,
161
+ done=done,
162
+ error=None
163
+ )
164
 
 
165
  state = step_res["state"]
166
 
167
+ if done:
168
  break
169
 
170
+ # success criteria
171
+ avg_reward = sum(rewards) / len(rewards)
172
+ success = avg_reward > 0.3
173
+
174
+ log_end(success, steps_taken, rewards)
175
+
176
 
177
 
178
  if __name__ == "__main__":
pyproject.toml ADDED
@@ -0,0 +1,24 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [build-system]
2
+ requires = ["setuptools>=61", "wheel"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "cache-invalidation-env"
7
+ version = "0.1.0"
8
+ description = "Cache invalidation decision environment for OpenEnv"
9
+ requires-python = ">=3.10"
10
+ dependencies = [
11
+ "openenv-core[core]>=0.2.2",
12
+ "fastapi>=0.100.0",
13
+ "uvicorn[standard]>=0.22.0",
14
+ "pydantic>=2.0.0",
15
+ "requests>=2.28.0",
16
+ "openai>=1.0.0",
17
+ ]
18
+
19
+ [project.scripts]
20
+ server = "server.app:main"
21
+
22
+ [tool.setuptools.packages.find]
23
+ where = ["."]
24
+ include = ["env*", "server*"]
requirements.txt CHANGED
@@ -2,4 +2,5 @@ fastapi
2
  uvicorn
3
  pydantic
4
  requests
5
- openai
 
 
2
  uvicorn
3
  pydantic
4
  requests
5
+ openai
6
+ openenv-core
server/__init__.py ADDED
@@ -0,0 +1 @@
 
 
1
+ # OpenEnv HTTP server package
server/app.py ADDED
@@ -0,0 +1,13 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """OpenEnv entry: validator requires server/app.py with def main(...) and if __name__ + main()."""
2
+
3
+ import uvicorn
4
+
5
+
6
+ def main(host: str = "0.0.0.0", port: int = 7860):
7
+ from app import app as fastapi_app
8
+
9
+ uvicorn.run(fastapi_app, host=host, port=port)
10
+
11
+
12
+ if __name__ == "__main__":
13
+ main()
uv.lock ADDED
The diff for this file is too large to render. See raw diff
 
validate-submission.sh ADDED
@@ -0,0 +1,191 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env bash
2
+ #
3
+ # validate-submission.sh — OpenEnv Submission Validator
4
+ #
5
+ # Checks that your HF Space is live, Docker image builds, and openenv validate passes.
6
+ #
7
+ # Prerequisites:
8
+ # - Docker: https://docs.docker.com/get-docker/
9
+ # - openenv-core: pip install openenv-core
10
+ # - curl (usually pre-installed)
11
+ #
12
+ # Run:
13
+ # curl -fsSL https://raw.githubusercontent.com/<owner>/<repo>/main/scripts/validate-submission.sh | bash -s -- <ping_url> [repo_dir]
14
+ #
15
+ # Or download and run locally:
16
+ # chmod +x validate-submission.sh
17
+ # ./validate-submission.sh <ping_url> [repo_dir]
18
+ #
19
+ # Arguments:
20
+ # ping_url Your HuggingFace Space URL (e.g. https://your-space.hf.space)
21
+ # repo_dir Path to your repo (default: current directory)
22
+ #
23
+ # Examples:
24
+ # ./validate-submission.sh https://my-team.hf.space
25
+ # ./validate-submission.sh https://my-team.hf.space ./my-repo
26
+ #
27
+
28
+ set -uo pipefail
29
+
30
+ DOCKER_BUILD_TIMEOUT=600
31
+ if [ -t 1 ]; then
32
+ RED='\033[0;31m'
33
+ GREEN='\033[0;32m'
34
+ YELLOW='\033[1;33m'
35
+ BOLD='\033[1m'
36
+ NC='\033[0m'
37
+ else
38
+ RED='' GREEN='' YELLOW='' BOLD='' NC=''
39
+ fi
40
+
41
+ run_with_timeout() {
42
+ local secs="$1"; shift
43
+ if command -v timeout &>/dev/null; then
44
+ timeout "$secs" "$@"
45
+ elif command -v gtimeout &>/dev/null; then
46
+ gtimeout "$secs" "$@"
47
+ else
48
+ "$@" &
49
+ local pid=$!
50
+ ( sleep "$secs" && kill "$pid" 2>/dev/null ) &
51
+ local watcher=$!
52
+ wait "$pid" 2>/dev/null
53
+ local rc=$?
54
+ kill "$watcher" 2>/dev/null
55
+ wait "$watcher" 2>/dev/null
56
+ return $rc
57
+ fi
58
+ }
59
+
60
+ portable_mktemp() {
61
+ local prefix="${1:-validate}"
62
+ mktemp "${TMPDIR:-/tmp}/${prefix}-XXXXXX" 2>/dev/null || mktemp
63
+ }
64
+
65
+ CLEANUP_FILES=()
66
+ cleanup() { rm -f "${CLEANUP_FILES[@]+"${CLEANUP_FILES[@]}"}"; }
67
+ trap cleanup EXIT
68
+
69
+ PING_URL="${1:-}"
70
+ REPO_DIR="${2:-.}"
71
+
72
+ if [ -z "$PING_URL" ]; then
73
+ printf "Usage: %s <ping_url> [repo_dir]\n" "$0"
74
+ printf "\n"
75
+ printf " ping_url Your HuggingFace Space URL (e.g. https://your-space.hf.space)\n"
76
+ printf " repo_dir Path to your repo (default: current directory)\n"
77
+ exit 1
78
+ fi
79
+
80
+ if ! REPO_DIR="$(cd "$REPO_DIR" 2>/dev/null && pwd)"; then
81
+ printf "Error: directory '%s' not found\n" "${2:-.}"
82
+ exit 1
83
+ fi
84
+ PING_URL="${PING_URL%/}"
85
+ export PING_URL
86
+ PASS=0
87
+
88
+ log() { printf "[%s] %b\n" "$(date -u +%H:%M:%S)" "$*"; }
89
+ pass() { log "${GREEN}PASSED${NC} -- $1"; PASS=$((PASS + 1)); }
90
+ fail() { log "${RED}FAILED${NC} -- $1"; }
91
+ hint() { printf " ${YELLOW}Hint:${NC} %b\n" "$1"; }
92
+ stop_at() {
93
+ printf "\n"
94
+ printf "${RED}${BOLD}Validation stopped at %s.${NC} Fix the above before continuing.\n" "$1"
95
+ exit 1
96
+ }
97
+
98
+ printf "\n"
99
+ printf "${BOLD}========================================${NC}\n"
100
+ printf "${BOLD} OpenEnv Submission Validator${NC}\n"
101
+ printf "${BOLD}========================================${NC}\n"
102
+ log "Repo: $REPO_DIR"
103
+ log "Ping URL: $PING_URL"
104
+ printf "\n"
105
+
106
+ log "${BOLD}Step 1/3: Pinging HF Space${NC} ($PING_URL/reset) ..."
107
+
108
+ CURL_OUTPUT=$(portable_mktemp "validate-curl")
109
+ CLEANUP_FILES+=("$CURL_OUTPUT")
110
+ HTTP_CODE=$(curl -s -o "$CURL_OUTPUT" -w "%{http_code}" -X POST \
111
+ -H "Content-Type: application/json" -d '{}' \
112
+ "$PING_URL/reset" --max-time 30 2>"$CURL_OUTPUT" || printf "000")
113
+
114
+ if [ "$HTTP_CODE" = "200" ]; then
115
+ pass "HF Space is live and responds to /reset"
116
+ elif [ "$HTTP_CODE" = "000" ]; then
117
+ fail "HF Space not reachable (connection failed or timed out)"
118
+ hint "Check your network connection and that the Space is running."
119
+ hint "Try: curl -s -o /dev/null -w '%%{http_code}' -X POST $PING_URL/reset"
120
+ stop_at "Step 1"
121
+ else
122
+ fail "HF Space /reset returned HTTP $HTTP_CODE (expected 200)"
123
+ hint "Make sure your Space is running and the URL is correct."
124
+ hint "Try opening $PING_URL in your browser first."
125
+ stop_at "Step 1"
126
+ fi
127
+
128
+ log "${BOLD}Step 2/3: Running docker build${NC} ..."
129
+
130
+ if ! command -v docker &>/dev/null; then
131
+ fail "docker command not found"
132
+ hint "Install Docker: https://docs.docker.com/get-docker/"
133
+ stop_at "Step 2"
134
+ fi
135
+
136
+ if [ -f "$REPO_DIR/Dockerfile" ]; then
137
+ DOCKER_CONTEXT="$REPO_DIR"
138
+ elif [ -f "$REPO_DIR/server/Dockerfile" ]; then
139
+ DOCKER_CONTEXT="$REPO_DIR/server"
140
+ else
141
+ fail "No Dockerfile found in repo root or server/ directory"
142
+ stop_at "Step 2"
143
+ fi
144
+
145
+ log " Found Dockerfile in $DOCKER_CONTEXT"
146
+
147
+ BUILD_LOG=$(portable_mktemp "validate-docker")
148
+ CLEANUP_FILES+=("$BUILD_LOG")
149
+ BUILD_OK=false
150
+ # Plain progress: BuildKit's default UI can block or buffer when stderr is not a TTY (e.g. $(...)),
151
+ # which makes Step 2 look hung; writing to a file avoids that.
152
+ if run_with_timeout "$DOCKER_BUILD_TIMEOUT" env DOCKER_BUILDKIT=1 docker build --progress=plain "$DOCKER_CONTEXT" >"$BUILD_LOG" 2>&1; then
153
+ BUILD_OK=true
154
+ fi
155
+
156
+ if [ "$BUILD_OK" = true ]; then
157
+ pass "Docker build succeeded"
158
+ else
159
+ fail "Docker build failed (timeout=${DOCKER_BUILD_TIMEOUT}s)"
160
+ tail -20 "$BUILD_LOG" 2>/dev/null || true
161
+ stop_at "Step 2"
162
+ fi
163
+
164
+ log "${BOLD}Step 3/3: Running openenv validate${NC} ..."
165
+
166
+ if ! command -v openenv &>/dev/null; then
167
+ fail "openenv command not found"
168
+ hint "Install it: pip install openenv-core"
169
+ stop_at "Step 3"
170
+ fi
171
+
172
+ VALIDATE_OK=false
173
+ VALIDATE_OUTPUT=$(cd "$REPO_DIR" && openenv validate 2>&1) && VALIDATE_OK=true
174
+
175
+ if [ "$VALIDATE_OK" = true ]; then
176
+ pass "openenv validate passed"
177
+ [ -n "$VALIDATE_OUTPUT" ] && log " $VALIDATE_OUTPUT"
178
+ else
179
+ fail "openenv validate failed"
180
+ printf "%s\n" "$VALIDATE_OUTPUT"
181
+ stop_at "Step 3"
182
+ fi
183
+
184
+ printf "\n"
185
+ printf "${BOLD}========================================${NC}\n"
186
+ printf "${GREEN}${BOLD} All 3/3 checks passed!${NC}\n"
187
+ printf "${GREEN}${BOLD} Your submission is ready to submit.${NC}\n"
188
+ printf "${BOLD}========================================${NC}\n"
189
+ printf "\n"
190
+
191
+ exit 0