Commit 2b35bd5 by sanjay7676 · 1 Parent(s): a2ac82d

fix(api): Pydantic v2 step payload (exclude_none); env candidate_solutions; README Space API curl guide; Gradio API tab

Files changed (6)
  1. BLOG.md +39 -34
  2. MINI_BLOG.md +0 -47
  3. README.md +30 -3
  4. api_server.py +2 -1
  5. app.py +9 -7
  6. env.py +1 -1
BLOG.md CHANGED
@@ -1,47 +1,52 @@
1
- # 🛡️ FORGE-v4: Building the "Immune System" for AI Code Generation
2
 
3
- ### The Silent Crisis in AI Coding
4
- We've all seen it: an AI writes a perfect "Quick Sort" in seconds. But what happens when you give that same code an array of 10,000 duplicate zeros? Or a list of mixed large negatives? Often, the AI's "perfect" code crashes, enters an infinite loop, or returns incorrect results.
5
 
6
- Standard benchmarks measure **capability**. We built **FORGE-v4** to measure **robustness**.
 
7
 
8
- ---
 
 
9
 
10
- ## ⚔️ The Concept: Adversarial Red-Teaming
11
- FORGE-v4 isn't just a static test suite; it's a living environment. We implemented a **Red-vs-Blue** dynamic:
12
- - **The Defender (Blue Team)**: Our Coder agent tries to solve sorting tasks correctly.
13
- - **The Adversary (Red Team)**: Our Breaker agent actively searches for the Coder's "blind spots."
14
 
15
- As the Coder improves, the Breaker escalates. It progresses through **4 Tiers of difficulty**—from basic lists to extreme boundary values and stress tests. This tiered red-teaming ensures that the model isn't just memorizing common patterns, but actually hardening its logic.
 
16
 
17
- ---
 
18
 
19
- ## 🧠 The Secret Sauce: CoachMemory
20
- One of the most innovative features of FORGE-v4 is the **CoachMemory feedback loop**.
 
 
21
 
22
- In most training environments, a model fails, gets a low reward, and moves on. In FORGE-v4, every failure is analyzed by the "Coach."
23
- * Did the model fail on negatives?
24
- * Did it time out on large arrays?
25
- * Did it destroy duplicates?
26
 
27
- These insights are stored in persistent memory. In the next episode, the model reads these "lessons" and adapts its strategy. This mimics the human engineering process: **Mistake → Analysis → Correction.**
 
28
 
29
- ---
 
30
 
31
- ## 📈 Results that Matter
32
- Our benchmarks show that while a baseline heuristic policy might have a high "average" pass rate (91%), it is easily broken by Tier 3 and Tier 4 attacks.
33
 
34
- Our **FORGE-v4 Model Policy** achieved:
35
- - **100% Pass Rate** across all adversarial tiers.
36
- - **+2.10 Reward Gain** over the baseline.
37
- - **Sustained Tier 4 Robustness**: It didn't just survive; it thrived under extreme pressure.
 
 
 
 
 
 
38
 
39
- ---
40
-
41
- ## 🌍 Why This Matters
42
- As AI agents move from "writing scripts" to "building infrastructure," robustness is no longer optional. FORGE-v4 provides the framework to ensure that the code powering our world is not just smart, but **unbreakable**.
43
-
44
- **Try the demo:** [Hugging Face Space](https://huggingface.co/spaces/sanjay7676/Team404_FORGE)
45
-
46
- ---
47
- *Created with ❤️ for the Meta OpenEnv Hackathon by Team 404.*
 
1
+ # FORGE-v4 Mini Blog: From Fragile Code to Adversarial Robustness
2
 
3
+ ## The story in one line
4
+ FORGE-v4 trains a coding agent to survive adversarial edge cases by making it fight a breaker, learn from failures, and improve over repeated reward-driven episodes.
5
 
6
+ ## Why we built this
7
+ Most coding models look good on clean examples and then fail on real inputs: negatives, duplicates, boundary values, and timeout-prone cases. We wanted an environment where failure is explicit, measurable, and useful for training.
8
 
9
+ ## The journey
10
+ ### Chapter 1: baseline confidence, hidden fragility
11
+ We started with a defender that often passed easy tests but broke under stress tiers. That gave us a critical signal: average correctness is not robustness.
12
 
13
+ ### Chapter 2: breaker escalation
14
+ We added a tiered breaker that progressively attacked blind spots. The environment moved from simple lists to harder adversarial distributions.
 
 
15
 
16
+ ### Chapter 3: memory as improvement engine
17
+ CoachMemory converted repeated failure patterns into structured lessons. Instead of forgetting mistakes each episode, the loop made mistakes actionable.
18
 
19
+ ### Chapter 4: measurable training loop
20
+ We used benchmark/compare runs to produce reward and pass-rate evidence, exported preference pairs, and connected that to a small-model-first adapter training path.
21
 
22
+ ## What changed after training cycles
23
+ - Defender pass rate stabilized under tougher tiers.
24
+ - Average defender reward improved versus baseline runs.
25
+ - Breaker pressure remained high, but the defender failed less often on known edge patterns.
26
 
27
+ ## Evidence (committed outputs)
28
+ ### Reward trend
29
+ ![Reward curve](outputs/reward_curve.png)
 
30
 
31
+ ### Pass-rate trend
32
+ ![Pass rate curve](outputs/pass_rate.png)
33
 
34
+ ### Loss-like training signal
35
+ ![Loss curve](outputs/loss_curve.png)
36
 
37
+ ### Machine-readable benchmark summary
38
+ - `outputs/final_report.json`
39
 
40
+ ## Deliverables
41
+ - Hugging Face Space: https://huggingface.co/spaces/sanjay7676/Team404_FORGE
42
+ - GitHub repository: https://github.com/Sanjay767676/Meta-x-Scaler-Team404--Round2
43
+ - **Docker image (public; anyone can pull)**
44
+ - **Docker Hub (browse tags):** https://hub.docker.com/r/sanjay767676/forge
45
+ - **Pull command:** `docker pull sanjay767676/forge:latest`
46
+ - **Registry image reference:** `docker.io/sanjay767676/forge:latest`
47
+ - Colab notebook: https://colab.research.google.com/github/Sanjay767676/Meta-x-Scaler-Team404--Round2/blob/main/FORGE_Training_Colab.ipynb
48
+ - Colab model + adapter training: https://colab.research.google.com/drive/1mKXjIX-eB2GSiebI-_n37KzVlN1NKCu8?usp=sharing
49
+ - YouTube demo placeholder: https://youtube.com/watch?v=YOUR_DEMO_VIDEO_ID
50
 
51
+ ## Why this matters
52
+ FORGE-v4 is designed to train coding behavior that is verifiable, harder to reward-hack, and more resilient under adversarial conditions. That is the capability gap we think matters most for real LLM deployment.
 
 
 
 
 
 
 
MINI_BLOG.md DELETED
@@ -1,47 +0,0 @@
1
- # FORGE-v4 Mini Blog: From Fragile Code to Adversarial Robustness
2
-
3
- ## The story in one line
4
- FORGE-v4 trains a coding agent to survive adversarial edge cases by making it fight a breaker, learn from failures, and improve over repeated reward-driven episodes.
5
-
6
- ## Why we built this
7
- Most coding models look good on clean examples and then fail on real inputs: negatives, duplicates, boundary values, and timeout-prone cases. We wanted an environment where failure is explicit, measurable, and useful for training.
8
-
9
- ## The journey
10
- ### Chapter 1: baseline confidence, hidden fragility
11
- We started with a defender that often passed easy tests but broke under stress tiers. That gave us a critical signal: average correctness is not robustness.
12
-
13
- ### Chapter 2: breaker escalation
14
- We added a tiered breaker that progressively attacked blind spots. The environment moved from simple lists to harder adversarial distributions.
15
-
16
- ### Chapter 3: memory as improvement engine
17
- CoachMemory converted repeated failure patterns into structured lessons. Instead of forgetting mistakes each episode, the loop made mistakes actionable.
18
-
19
- ### Chapter 4: measurable training loop
20
- We used benchmark/compare runs to produce reward and pass-rate evidence, exported preference pairs, and connected that to a small-model-first adapter training path.
21
-
22
- ## What changed after training cycles
23
- - Defender pass rate stabilized under tougher tiers.
24
- - Average defender reward improved versus baseline runs.
25
- - Breaker pressure remained high, but the defender failed less often on known edge patterns.
26
-
27
- ## Evidence (committed outputs)
28
- ### Reward trend
29
- ![Reward curve](outputs/reward_curve.png)
30
-
31
- ### Pass-rate trend
32
- ![Pass rate curve](outputs/pass_rate.png)
33
-
34
- ### Loss-like training signal
35
- ![Loss curve](outputs/loss_curve.png)
36
-
37
- ### Machine-readable benchmark summary
38
- - `outputs/final_report.json`
39
-
40
- ## Deliverables
41
- - Hugging Face Space: https://huggingface.co/spaces/sanjay7676/Team404_FORGE
42
- - GitHub repository: https://github.com/Sanjay767676/Meta-x-Scaler-Team404--Round2
43
- - Colab notebook: https://colab.research.google.com/github/Sanjay767676/Meta-x-Scaler-Team404--Round2/blob/main/FORGE_Training_Colab.ipynb
44
- - YouTube demo placeholder: https://youtube.com/watch?v=YOUR_DEMO_VIDEO_ID
45
-
46
- ## Why this matters
47
- FORGE-v4 is designed to train coding behavior that is verifiable, harder to reward-hack, and more resilient under adversarial conditions. That is the capability gap we think matters most for real LLM deployment.
README.md CHANGED
@@ -20,6 +20,7 @@ suggested_hardware: cpu-basic
20
  [![Colab (Drive)](https://img.shields.io/badge/Training-Colab-orange)](https://colab.research.google.com/drive/1mKXjIX-eB2GSiebI-_n37KzVlN1NKCu8?usp=sharing)
21
  [![Colab (GitHub)](https://img.shields.io/badge/Colab-GitHub_sync-green)](https://colab.research.google.com/github/Sanjay767676/Meta-x-Scaler-Team404--Round2/blob/main/FORGE_Training_Colab.ipynb)
22
  [![Adapter](https://img.shields.io/badge/HF-Adapter-blue)](https://huggingface.co/sanjay7676/forge-qwen-final)
 
23
  [![Hackathon Guide](https://img.shields.io/badge/Meta-OpenEnv%20Guide-0a66c2)](https://docs.google.com/document/d/1Odznuzwtb1ecDOm2t6ToZd4MuMXXfO6vWUGcxbC6mFs/edit?tab=t.0#bookmark=kix.2dz0x0nie3me)
24
 
25
  ### Judge quick links (all materials)
@@ -30,10 +31,12 @@ suggested_hardware: cpu-basic
30
  | **OpenEnv + TRL (framework docs)** | [Hugging Face TRL — OpenEnv integration](https://huggingface.co/docs/trl/openenv) |
31
  | **Hugging Face Space (submit this URL)** | [huggingface.co/spaces/sanjay7676/Team404_FORGE](https://huggingface.co/spaces/sanjay7676/Team404_FORGE) |
32
  | **Source code** | [github.com/Sanjay767676/Meta-x-Scaler-Team404--Round2](https://github.com/Sanjay767676/Meta-x-Scaler-Team404--Round2) |
33
- | **Mini-blog (writeup)** | [MINI_BLOG.md](MINI_BLOG.md) in repo |
34
  | **Training Colab (author Drive)** | [Colab notebook](https://colab.research.google.com/drive/1mKXjIX-eB2GSiebI-_n37KzVlN1NKCu8?usp=sharing) |
 
35
  | **Training Colab (synced from GitHub)** | [FORGE_Training_Colab.ipynb on Colab](https://colab.research.google.com/github/Sanjay767676/Meta-x-Scaler-Team404--Round2/blob/main/FORGE_Training_Colab.ipynb) |
36
  | **Trained adapter** | [sanjay7676/forge-qwen-final](https://huggingface.co/sanjay7676/forge-qwen-final) |
 
37
  | **Command / security cheat sheet** | [guide.md](guide.md) |
38
  | **Video / slides** | YouTube demo placeholder: https://youtube.com/watch?v=YOUR_DEMO_VIDEO_ID |
39
 
@@ -44,6 +47,25 @@ suggested_hardware: cpu-basic
44
  - For a stable demo on CPU, set Space secret **`CODE_PROVIDER_MODE=mock`** (or use **NIM** / **OpenRouter** keys so the router never loads local `custom_hf`). Loading **`Qwen2.5-Coder-1.5B` + LoRA** on free CPU is likely to **OOM or time out**.
45
  - Full training stack: install **[`requirements-train.txt`](requirements-train.txt)** on **Colab** or locally (see Quickstart).
46
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
47
  ### NOTE 1 — Non‑negotiable submission requirements (checklist)
48
 
49
  | # | Requirement | FORGE-v4 |
@@ -51,7 +73,7 @@ suggested_hardware: cpu-basic
51
  | 1 | **OpenEnv (latest):** build on the framework | **`openenv-core>=0.2.3`** in [`requirements.txt`](requirements.txt). Training extras in [`requirements-train.txt`](requirements-train.txt). Wrapper: [`env_openenv.py`](env_openenv.py). Core: [`env.py`](env.py). |
52
  | 2 | **Training:** Unsloth or TRL (or other RL stack) + **Colab** | [`train_unsloth.py`](train_unsloth.py) (Unsloth + TRL), [`train_colab.py`](train_colab.py), [`FORGE_Training_Colab.ipynb`](FORGE_Training_Colab.ipynb), Colab links in the table above. |
53
  | 3 | **Evidence of training:** loss + reward plots (real run) | Committed: [`outputs/reward_curve.png`](outputs/reward_curve.png), [`outputs/loss_curve.png`](outputs/loss_curve.png), [`outputs/pass_rate.png`](outputs/pass_rate.png), [`outputs/final_report.json`](outputs/final_report.json). |
54
- | 4 | **Writeup / video:** mini-blog on HF *or* <2 min YouTube *etc.* | **[MINI_BLOG.md](MINI_BLOG.md)** linked here; add **public YouTube or slide URL** in the table row when published. |
55
  | 5 | **Hugging Face Space:** discoverable & runnable | **[Team404_FORGE](https://huggingface.co/spaces/sanjay7676/Team404_FORGE)** — **use this URL in the submission form.** |
56
  | 6 | **README:** motivate, explain env, show results + **link Space + all materials** | This file. |
57
  | 7 | **No huge video files** on Hub | Only **URLs** to external video/slides (see table). |
@@ -77,7 +99,7 @@ suggested_hardware: cpu-basic
77
 
78
  ## Minimum submission checklist (summary)
79
 
80
- Same items as **NOTE 1** above: OpenEnv dependency + wrapper, Colab + training scripts, committed plots/JSON, writeup link, runnable Space URL, README hub — all linked from the **Judge quick links** table.
81
 
82
  ---
83
 
@@ -304,6 +326,11 @@ python train_unsloth.py --mode dpo
304
 
305
  Public image on **Docker Hub**: **`sanjay767676/forge`** (repository `forge` under user `sanjay767676`).
306
 
 
 
 
 
 
307
  ### Pull & run (no build — public image)
308
 
309
  ```bash
 
20
  [![Colab (Drive)](https://img.shields.io/badge/Training-Colab-orange)](https://colab.research.google.com/drive/1mKXjIX-eB2GSiebI-_n37KzVlN1NKCu8?usp=sharing)
21
  [![Colab (GitHub)](https://img.shields.io/badge/Colab-GitHub_sync-green)](https://colab.research.google.com/github/Sanjay767676/Meta-x-Scaler-Team404--Round2/blob/main/FORGE_Training_Colab.ipynb)
22
  [![Adapter](https://img.shields.io/badge/HF-Adapter-blue)](https://huggingface.co/sanjay7676/forge-qwen-final)
23
+ [![Docker Hub](https://img.shields.io/badge/Docker%20Hub-sanjay767676%2Fforge-2496ED?logo=docker&logoColor=white)](https://hub.docker.com/r/sanjay767676/forge)
24
  [![Hackathon Guide](https://img.shields.io/badge/Meta-OpenEnv%20Guide-0a66c2)](https://docs.google.com/document/d/1Odznuzwtb1ecDOm2t6ToZd4MuMXXfO6vWUGcxbC6mFs/edit?tab=t.0#bookmark=kix.2dz0x0nie3me)
25
 
26
  ### Judge quick links (all materials)
 
31
  | **OpenEnv + TRL (framework docs)** | [Hugging Face TRL — OpenEnv integration](https://huggingface.co/docs/trl/openenv) |
32
  | **Hugging Face Space (submit this URL)** | [huggingface.co/spaces/sanjay7676/Team404_FORGE](https://huggingface.co/spaces/sanjay7676/Team404_FORGE) |
33
  | **Source code** | [github.com/Sanjay767676/Meta-x-Scaler-Team404--Round2](https://github.com/Sanjay767676/Meta-x-Scaler-Team404--Round2) |
34
+ | **Blog (writeup)** | [BLOG.md](BLOG.md) in repo |
35
  | **Training Colab (author Drive)** | [Colab notebook](https://colab.research.google.com/drive/1mKXjIX-eB2GSiebI-_n37KzVlN1NKCu8?usp=sharing) |
36
+ | **Colab model + adapter training** | https://colab.research.google.com/drive/1mKXjIX-eB2GSiebI-_n37KzVlN1NKCu8?usp=sharing |
37
  | **Training Colab (synced from GitHub)** | [FORGE_Training_Colab.ipynb on Colab](https://colab.research.google.com/github/Sanjay767676/Meta-x-Scaler-Team404--Round2/blob/main/FORGE_Training_Colab.ipynb) |
38
  | **Trained adapter** | [sanjay7676/forge-qwen-final](https://huggingface.co/sanjay7676/forge-qwen-final) |
39
+ | **Docker image (public — anyone can pull)** | **Hub (tags, README):** [hub.docker.com/r/sanjay767676/forge](https://hub.docker.com/r/sanjay767676/forge) — **pull:** `docker pull sanjay767676/forge:latest` — **registry ref:** `docker.io/sanjay767676/forge:latest` |
40
  | **Command / security cheat sheet** | [guide.md](guide.md) |
41
  | **Video / slides** | YouTube demo placeholder: https://youtube.com/watch?v=YOUR_DEMO_VIDEO_ID |
42
 
 
47
  - For a stable demo on CPU, set Space secret **`CODE_PROVIDER_MODE=mock`** (or use **NIM** / **OpenRouter** keys so the router never loads local `custom_hf`). Loading **`Qwen2.5-Coder-1.5B` + LoRA** on free CPU is likely to **OOM or time out**.
48
  - Full training stack: install **[`requirements-train.txt`](requirements-train.txt)** on **Colab** or locally (see Quickstart).
49
 
50
+ ### OpenEnv HTTP API on the Hugging Face Space
51
+
52
+ The Space runs the same FastAPI routes as [`api_server.py`](api_server.py) on the **app root** (Gradio UI is at **`/ui`**; `/` redirects to `/ui`). There is **no `/start`** endpoint — begin an episode with **`POST /reset`**, then drive it with **`POST /step`**.
53
+
54
+ 1. **Base URL:** open the live Space, then use the **`*.hf.space`** host shown in the address bar (for this project it is typically **`https://sanjay7676-team404-forge.hf.space`**). If yours differs, copy it from the running app or from the Space **Embed** snippet.
55
+ 2. **Check liveness:** `curl -sS "https://sanjay7676-team404-forge.hf.space/health"`
56
+ 3. **New episode:** `curl -sS -X POST "https://sanjay7676-team404-forge.hf.space/reset" -H "Content-Type: application/json"`
57
+ 4. **Step** (JSON body must include `coder_code` and `coder_version`; omit `candidate_solutions` or send a JSON array of strings):
58
+
59
+ ```bash
60
+ curl -sS -X POST "https://sanjay7676-team404-forge.hf.space/step" \
61
+ -H "Content-Type: application/json" \
62
+ -d "{\"coder_code\": \"def solution(arr):\\n return sorted(list(arr))\", \"coder_version\": \"demo\"}"
63
+ ```
64
+
65
+ 5. **Observe state:** `curl -sS "https://sanjay7676-team404-forge.hf.space/state"`
66
+
67
+ **Note:** The Space shares **one** in-memory environment across all visitors — concurrent `reset` / `step` calls can interleave. For isolated runs, use **Docker** or **local** `api_server.py` on port `8000`.
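
The curl sequence above can also be driven from Python. The following is a minimal sketch, assuming only the routes and field names listed in this guide (`/reset`, `/step`, `/state`, `coder_code`, `coder_version`, `candidate_solutions`). The helper names `build_step_request`, `send_json`, and `run_episode` are hypothetical and not part of the repository, and nothing is assumed about the response beyond it being JSON.

```python
import json
import urllib.request

# Base URL taken from the guide above; replace it if your Space host differs.
BASE_URL = "https://sanjay7676-team404-forge.hf.space"


def build_step_request(base_url, coder_code, coder_version, candidate_solutions=None):
    """Build (but do not send) a POST /step request. Hypothetical helper."""
    payload = {"coder_code": coder_code, "coder_version": coder_version}
    # Leave the key out entirely when there are no candidates, as the guide advises.
    if candidate_solutions is not None:
        payload["candidate_solutions"] = list(candidate_solutions)
    return urllib.request.Request(
        base_url.rstrip("/") + "/step",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_json(request, timeout=30):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def run_episode(base_url=BASE_URL):
    """Reset, take one step, then read the state (performs network calls)."""
    root = base_url.rstrip("/")
    reset_request = urllib.request.Request(
        root + "/reset",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    send_json(reset_request)
    step_request = build_step_request(
        base_url, "def solution(arr):\n    return sorted(list(arr))", "demo"
    )
    step_result = send_json(step_request)
    state = send_json(urllib.request.Request(root + "/state"))
    return step_result, state
```

Calling `run_episode()` performs the same reset, step, state sequence as the curl commands. Because the Space shares one environment across visitors, the returned state may reflect other callers' steps.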
68
+
69
  ### NOTE 1 — Non‑negotiable submission requirements (checklist)
70
 
71
  | # | Requirement | FORGE-v4 |
 
73
  | 1 | **OpenEnv (latest):** build on the framework | **`openenv-core>=0.2.3`** in [`requirements.txt`](requirements.txt). Training extras in [`requirements-train.txt`](requirements-train.txt). Wrapper: [`env_openenv.py`](env_openenv.py). Core: [`env.py`](env.py). |
74
  | 2 | **Training:** Unsloth or TRL (or other RL stack) + **Colab** | [`train_unsloth.py`](train_unsloth.py) (Unsloth + TRL), [`train_colab.py`](train_colab.py), [`FORGE_Training_Colab.ipynb`](FORGE_Training_Colab.ipynb), Colab links in the table above. |
75
  | 3 | **Evidence of training:** loss + reward plots (real run) | Committed: [`outputs/reward_curve.png`](outputs/reward_curve.png), [`outputs/loss_curve.png`](outputs/loss_curve.png), [`outputs/pass_rate.png`](outputs/pass_rate.png), [`outputs/final_report.json`](outputs/final_report.json). |
76
+ | 4 | **Writeup / video:** mini-blog on HF *or* <2 min YouTube *etc.* | **[BLOG.md](BLOG.md)** linked here; add **public YouTube or slide URL** in the table row when published. |
77
  | 5 | **Hugging Face Space:** discoverable & runnable | **[Team404_FORGE](https://huggingface.co/spaces/sanjay7676/Team404_FORGE)** — **use this URL in the submission form.** |
78
  | 6 | **README:** motivate, explain env, show results + **link Space + all materials** | This file. |
79
  | 7 | **No huge video files** on Hub | Only **URLs** to external video/slides (see table). |
 
99
 
100
  ## Minimum submission checklist (summary)
101
 
102
+ Same items as **NOTE 1** above: OpenEnv dependency + wrapper, Colab + training scripts, committed plots/JSON, writeup link, runnable Space URL, **public Docker image** ([Hub](https://hub.docker.com/r/sanjay767676/forge) + `docker pull sanjay767676/forge:latest`), README hub — all linked from the **Judge quick links** table.
103
 
104
  ---
105
 
 
326
 
327
  Public image on **Docker Hub**: **`sanjay767676/forge`** (repository `forge` under user `sanjay767676`).
328
 
329
+ | What | URL / reference |
330
+ | :-- | :-- |
331
+ | **Browse image (tags, description)** | [https://hub.docker.com/r/sanjay767676/forge](https://hub.docker.com/r/sanjay767676/forge) |
332
+ | **Pull from CLI** | `docker pull sanjay767676/forge:latest` (same as `docker pull docker.io/sanjay767676/forge:latest`) |
333
+
334
  ### Pull & run (no build — public image)
335
 
336
  ```bash
api_server.py CHANGED
@@ -30,7 +30,8 @@ async def reset():
 async def step(action: Action):
     """Perform a step in the environment."""
     try:
-        result = env.step(action.model_dump())
+        # exclude_none: otherwise candidate_solutions=None breaks env (get() returns None, not default []).
+        result = env.step(action.model_dump(exclude_none=True))
         return result
     except RuntimeError as e:
         raise HTTPException(status_code=400, detail=str(e))
app.py CHANGED
@@ -162,13 +162,15 @@ with gr.Blocks(theme=gr.themes.Soft()) as demo:
     with gr.Tab("3. API Endpoints"):
         gr.Markdown("""
         ### OpenEnv API Standard
-        FORGE-v4 exposes a FastAPI server (available at `:8000` when running locally) with the following endpoints:
-
-        - **`POST /reset`**: Initializes a new episode and returns the problem description.
-        - **`POST /step`**: Receives code candidates, evaluates them, and returns rewards/diagnostics.
-        - **`GET /state`**: Returns current environment status and memory summary.
-
-        These endpoints allow external agents to interface with FORGE-v4 programmatically.
+        FORGE-v4 exposes a FastAPI server on the **same origin** as this UI: routes live at the **site root**, while Gradio is under **`/ui`**. Locally, `python api_server.py` serves on **`:8000`**; on this Space, use your **`*.hf.space`** base URL (no separate `/start` — use **`POST /reset`** then **`POST /step`**).
+
+        - **`GET /health`**: Liveness / version check.
+        - **`POST /reset`**: Starts a new episode and returns the initial state.
+        - **`POST /step`**: JSON body: `coder_code`, `coder_version`, optional `candidate_solutions` (array of strings). Returns rewards and updated state.
+        - **`GET /state`**: Current environment snapshot.
+
+        **Example (replace `BASE` with your Space `https://….hf.space` host):**
+        `curl -sS "$BASE/health"` → `curl -sS -X POST "$BASE/reset" -H "Content-Type: application/json"` → `curl -sS -X POST "$BASE/step" -H "Content-Type: application/json" -d '{"coder_code":"def solution(arr):\\n return sorted(list(arr))","coder_version":"demo"}'`
         """)
 
     # Event handlers
env.py CHANGED
@@ -120,7 +120,7 @@ class FORGEEnv:
 
         coder_code = action.get("coder_code", "")
         coder_version = action.get("coder_version", "unknown")
-        candidate_solutions = action.get("candidate_solutions", [])
+        candidate_solutions = action.get("candidate_solutions") or []
         if not isinstance(coder_code, str):
             raise TypeError("action['coder_code'] must be a string.")
         if not isinstance(coder_version, str):
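
Both code fixes in this commit address the same pitfall: `dict.get(key, default)` returns the default only when the key is absent, not when it is present with the value `None`. The sketch below reproduces that with plain dictionaries. `drop_none` is a hypothetical, top-level-only stand-in for the effect `model_dump(exclude_none=True)` has on this payload, and `read_candidates` mirrors the `or []` line in the env.py hunk. Neither function exists in the repository.

```python
def drop_none(payload):
    """Remove top-level keys whose value is None (illustrative stand-in only)."""
    return {key: value for key, value in payload.items() if value is not None}


def read_candidates(action):
    """Return candidate_solutions as a list, treating a missing key and None alike."""
    return action.get("candidate_solutions") or []


# What a dumped action looks like when the optional field was left unset.
dumped = {
    "coder_code": "def solution(arr):\n    return sorted(arr)",
    "coder_version": "demo",
    "candidate_solutions": None,
}

# The key exists, so the default [] is ignored and None comes back.
old_style = dumped.get("candidate_solutions", [])

# Fix 1 (API side): drop None fields before handing the dict to the environment.
cleaned = drop_none(dumped)
after_api_fix = cleaned.get("candidate_solutions", [])

# Fix 2 (env side): normalise None to [] even if a caller still passes it through.
after_env_fix = read_candidates(dumped)
```

One design note: `or []` also replaces any other falsy value (for example an empty string) with `[]`, which is harmless here because the field is expected to be a list of strings or absent.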