Commit 2b35bd5
Parent(s): a2ac82d
fix(api): Pydantic v2 step payload (exclude_none); env candidate_solutions; README Space API curl guide; Gradio API tab
- BLOG.md +39 -34
- MINI_BLOG.md +0 -47
- README.md +30 -3
- api_server.py +2 -1
- app.py +9 -7
- env.py +1 -1
BLOG.md
CHANGED
|
@@ -1,47 +1,52 @@
|
|
| 1 |
-
#
|
| 2 |
|
| 3 |
-
##
|
| 4 |
-
|
| 5 |
|
| 6 |
-
|
|
|
|
| 7 |
|
| 8 |
-
|
|
|
|
|
|
|
| 9 |
|
| 10 |
-
##
|
| 11 |
-
|
| 12 |
-
- **The Defender (Blue Team)**: Our Coder agent tries to solve sorting tasks correctly.
|
| 13 |
-
- **The Adversary (Red Team)**: Our Breaker agent actively searches for the Coder's "blind spots."
|
| 14 |
|
| 15 |
-
|
|
|
|
| 16 |
|
| 17 |
-
|
|
|
|
| 18 |
|
| 19 |
-
##
|
| 20 |
-
|
|
|
|
|
|
|
| 21 |
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
* Did it destroy duplicates?
|
| 26 |
|
| 27 |
-
|
|
|
|
| 28 |
|
| 29 |
-
-
|
|
|
|
| 30 |
|
| 31 |
-
##
|
| 32 |
-
|
| 33 |
|
| 34 |
-
|
| 35 |
-
-
|
| 36 |
-
-
|
| 37 |
-
- **
|
| 38 |
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
## 🌍 Why This Matters
|
| 42 |
-
As AI agents move from "writing scripts" to "building infrastructure," robustness is no longer optional. FORGE-v4 provides the framework to ensure that the code powering our world is not just smart, but **unbreakable**.
|
| 43 |
-
|
| 44 |
-
**Try the demo:** [Hugging Face Space](https://huggingface.co/spaces/sanjay7676/Team404_FORGE)
|
| 45 |
-
|
| 46 |
-
---
|
| 47 |
-
*Created with ❤️ for the Meta OpenEnv Hackathon by Team 404.*
|
|
|
|
| 1 |
+
# FORGE-v4 Mini Blog: From Fragile Code to Adversarial Robustness
|
| 2 |
|
| 3 |
+
## The story in one line
|
| 4 |
+
FORGE-v4 trains a coding agent to survive adversarial edge cases by making it fight a breaker, learn from failures, and improve over repeated reward-driven episodes.
|
| 5 |
|
| 6 |
+
## Why we built this
|
| 7 |
+
Most coding models look good on clean examples and then fail on real inputs: negatives, duplicates, boundary values, and timeout-prone cases. We wanted an environment where failure is explicit, measurable, and useful for training.
|
| 8 |
|
| 9 |
+
## The journey
|
| 10 |
+
### Chapter 1: baseline confidence, hidden fragility
|
| 11 |
+
We started with a defender that often passed easy tests but broke under stress tiers. That gave us a critical signal: average correctness is not robustness.
|
| 12 |
|
| 13 |
+
### Chapter 2: breaker escalation
|
| 14 |
+
We added a tiered breaker that progressively attacked blind spots. The environment moved from simple lists to harder adversarial distributions.
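A rough illustration of the tiered idea, offered as a sketch only: the tier contents and the names `tier_cases` and `failures` are hypothetical and are not taken from FORGE-v4's breaker. Each tier adds a harder family of sorting inputs, and a candidate solution is compared against Python's built-in `sorted`.

```python
import random
from typing import Callable, List


def tier_cases(tier: int, rng: random.Random) -> List[List[int]]:
    """Hypothetical tiers: each one adds a harder family of inputs."""
    cases = [[3, 1, 2], [1]]                       # tier 0: simple lists
    if tier >= 1:
        cases += [[], [2, 2, 1, 1], [-5, 0, -1]]   # empty, duplicates, negatives
    if tier >= 2:
        cases += [
            [2**31 - 1, -(2**31), 0],              # boundary values
            [rng.randint(-10**6, 10**6) for _ in range(1000)],  # larger random list
        ]
    return cases


def failures(solution: Callable[[List[int]], List[int]], tier: int, seed: int = 0) -> List[List[int]]:
    """Return the inputs on which `solution` disagrees with sorted() or raises."""
    rng = random.Random(seed)
    failed = []
    for case in tier_cases(tier, rng):
        try:
            ok = solution(list(case)) == sorted(case)
        except Exception:
            ok = False
        if not ok:
            failed.append(case)
    return failed


# A fragile solution that silently drops duplicates.
fragile = lambda arr: sorted(set(arr))
print(failures(fragile, tier=0))  # passes the easy tier
print(failures(fragile, tier=1))  # the duplicate case is reported
```

The point of the sketch is that a solution can look correct on the easy tier and still be caught as soon as duplicates enter the input distribution.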
|
|
|
|
|
|
|
| 15 |
|
| 16 |
+
### Chapter 3: memory as improvement engine
|
| 17 |
+
CoachMemory converted repeated failure patterns into structured lessons. Instead of forgetting mistakes each episode, the loop made mistakes actionable.
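One way such a memory could work, shown as a minimal sketch under stated assumptions: `FailureMemory`, its `classify` rules, and the lesson wording are hypothetical and do not reflect the actual `CoachMemory` interface. Failing inputs are bucketed by pattern, and a lesson is emitted only once a pattern repeats.

```python
from collections import Counter
from typing import List


class FailureMemory:
    """Hypothetical stand-in for a coach memory; not FORGE-v4's CoachMemory API."""

    def __init__(self, min_repeats: int = 2) -> None:
        self.min_repeats = min_repeats
        self.counts: Counter = Counter()

    @staticmethod
    def classify(case: List[int]) -> str:
        """Map a failing input to a coarse failure pattern."""
        if not case:
            return "empty_input"
        if len(set(case)) < len(case):
            return "duplicates"
        if any(x < 0 for x in case):
            return "negatives"
        return "other"

    def record(self, failing_case: List[int]) -> None:
        self.counts[self.classify(failing_case)] += 1

    def lessons(self) -> List[str]:
        """Only patterns seen at least `min_repeats` times become lessons."""
        return [
            f"Repeated failure on {tag} ({n} times): add an explicit check."
            for tag, n in self.counts.most_common()
            if n >= self.min_repeats
        ]


memory = FailureMemory()
for case in ([2, 2, 1], [5, 5], [-1, 3]):
    memory.record(case)
print(memory.lessons())
```

Keeping counts across episodes, rather than resetting them, is what turns a one-off failure into a pattern that can be fed back to the defender.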
|
| 18 |
|
| 19 |
+
### Chapter 4: measurable training loop
|
| 20 |
+
We used benchmark/compare runs to produce reward and pass-rate evidence, exported preference pairs, and connected that to a small-model-first adapter training path.
|
| 21 |
|
| 22 |
+
## What changed after training cycles
|
| 23 |
+
- Defender pass rate stabilized under tougher tiers.
|
| 24 |
+
- Average defender reward improved versus baseline runs.
|
| 25 |
+
- Breaker pressure remained high, but the defender failed less often on known edge patterns.
|
| 26 |
|
| 27 |
+
## Evidence (committed outputs)
|
| 28 |
+
### Reward trend
|
| 29 |
+

|
|
|
|
| 30 |
|
| 31 |
+
### Pass-rate trend
|
| 32 |
+

|
| 33 |
|
| 34 |
+
### Loss-like training signal
|
| 35 |
+

|
| 36 |
|
| 37 |
+
### Machine-readable benchmark summary
|
| 38 |
+
- `outputs/final_report.json`
|
| 39 |
|
| 40 |
+
## Deliverables
|
| 41 |
+
- Hugging Face Space: https://huggingface.co/spaces/sanjay7676/Team404_FORGE
|
| 42 |
+
- GitHub repository: https://github.com/Sanjay767676/Meta-x-Scaler-Team404--Round2
|
| 43 |
+
- **Docker image (public — anyone can pull)**
|
| 44 |
+
- **Docker Hub (browse tags):** https://hub.docker.com/r/sanjay767676/forge
|
| 45 |
+
- **Pull command:** `docker pull sanjay767676/forge:latest`
|
| 46 |
+
- **Registry image reference:** `docker.io/sanjay767676/forge:latest`
|
| 47 |
+
- Colab notebook: https://colab.research.google.com/github/Sanjay767676/Meta-x-Scaler-Team404--Round2/blob/main/FORGE_Training_Colab.ipynb
|
| 48 |
+
- Colab model + adapter training: https://colab.research.google.com/drive/1mKXjIX-eB2GSiebI-_n37KzVlN1NKCu8?usp=sharing
|
| 49 |
+
- YouTube demo placeholder: https://youtube.com/watch?v=YOUR_DEMO_VIDEO_ID
|
| 50 |
|
| 51 |
+
## Why this matters
|
| 52 |
+
FORGE-v4 is designed to train coding behavior that is verifiable, harder to reward-hack, and more resilient under adversarial conditions. That is the capability gap we think matters most for real LLM deployment.
|
MINI_BLOG.md
DELETED
|
@@ -1,47 +0,0 @@
|
|
| 1 |
-
# FORGE-v4 Mini Blog: From Fragile Code to Adversarial Robustness
|
| 2 |
-
|
| 3 |
-
## The story in one line
|
| 4 |
-
FORGE-v4 trains a coding agent to survive adversarial edge cases by making it fight a breaker, learn from failures, and improve over repeated reward-driven episodes.
|
| 5 |
-
|
| 6 |
-
## Why we built this
|
| 7 |
-
Most coding models look good on clean examples and then fail on real inputs: negatives, duplicates, boundary values, and timeout-prone cases. We wanted an environment where failure is explicit, measurable, and useful for training.
|
| 8 |
-
|
| 9 |
-
## The journey
|
| 10 |
-
### Chapter 1: baseline confidence, hidden fragility
|
| 11 |
-
We started with a defender that often passed easy tests but broke under stress tiers. That gave us a critical signal: average correctness is not robustness.
|
| 12 |
-
|
| 13 |
-
### Chapter 2: breaker escalation
|
| 14 |
-
We added a tiered breaker that progressively attacked blind spots. The environment moved from simple lists to harder adversarial distributions.
|
| 15 |
-
|
| 16 |
-
### Chapter 3: memory as improvement engine
|
| 17 |
-
CoachMemory converted repeated failure patterns into structured lessons. Instead of forgetting mistakes each episode, the loop made mistakes actionable.
|
| 18 |
-
|
| 19 |
-
### Chapter 4: measurable training loop
|
| 20 |
-
We used benchmark/compare runs to produce reward and pass-rate evidence, exported preference pairs, and connected that to a small-model-first adapter training path.
|
| 21 |
-
|
| 22 |
-
## What changed after training cycles
|
| 23 |
-
- Defender pass rate stabilized under tougher tiers.
|
| 24 |
-
- Average defender reward improved versus baseline runs.
|
| 25 |
-
- Breaker pressure remained high, but the defender failed less often on known edge patterns.
|
| 26 |
-
|
| 27 |
-
## Evidence (committed outputs)
|
| 28 |
-
### Reward trend
|
| 29 |
-

|
| 30 |
-
|
| 31 |
-
### Pass-rate trend
|
| 32 |
-

|
| 33 |
-
|
| 34 |
-
### Loss-like training signal
|
| 35 |
-

|
| 36 |
-
|
| 37 |
-
### Machine-readable benchmark summary
|
| 38 |
-
- `outputs/final_report.json`
|
| 39 |
-
|
| 40 |
-
## Deliverables
|
| 41 |
-
- Hugging Face Space: https://huggingface.co/spaces/sanjay7676/Team404_FORGE
|
| 42 |
-
- GitHub repository: https://github.com/Sanjay767676/Meta-x-Scaler-Team404--Round2
|
| 43 |
-
- Colab notebook: https://colab.research.google.com/github/Sanjay767676/Meta-x-Scaler-Team404--Round2/blob/main/FORGE_Training_Colab.ipynb
|
| 44 |
-
- YouTube demo placeholder: https://youtube.com/watch?v=YOUR_DEMO_VIDEO_ID
|
| 45 |
-
|
| 46 |
-
## Why this matters
|
| 47 |
-
FORGE-v4 is designed to train coding behavior that is verifiable, harder to reward-hack, and more resilient under adversarial conditions. That is the capability gap we think matters most for real LLM deployment.
|
README.md
CHANGED
|
@@ -20,6 +20,7 @@ suggested_hardware: cpu-basic
|
|
| 20 |
[](https://colab.research.google.com/drive/1mKXjIX-eB2GSiebI-_n37KzVlN1NKCu8?usp=sharing)
|
| 21 |
[](https://colab.research.google.com/github/Sanjay767676/Meta-x-Scaler-Team404--Round2/blob/main/FORGE_Training_Colab.ipynb)
|
| 22 |
[](https://huggingface.co/sanjay7676/forge-qwen-final)
|
|
|
|
| 23 |
[](https://docs.google.com/document/d/1Odznuzwtb1ecDOm2t6ToZd4MuMXXfO6vWUGcxbC6mFs/edit?tab=t.0#bookmark=kix.2dz0x0nie3me)
|
| 24 |
|
| 25 |
### Judge quick links (all materials)
|
|
@@ -30,10 +31,12 @@ suggested_hardware: cpu-basic
|
|
| 30 |
| **OpenEnv + TRL (framework docs)** | [Hugging Face TRL — OpenEnv integration](https://huggingface.co/docs/trl/openenv) |
|
| 31 |
| **Hugging Face Space (submit this URL)** | [huggingface.co/spaces/sanjay7676/Team404_FORGE](https://huggingface.co/spaces/sanjay7676/Team404_FORGE) |
|
| 32 |
| **Source code** | [github.com/Sanjay767676/Meta-x-Scaler-Team404--Round2](https://github.com/Sanjay767676/Meta-x-Scaler-Team404--Round2) |
|
| 33 |
-
| **
|
| 34 |
| **Training Colab (author Drive)** | [Colab notebook](https://colab.research.google.com/drive/1mKXjIX-eB2GSiebI-_n37KzVlN1NKCu8?usp=sharing) |
|
|
|
|
| 35 |
| **Training Colab (synced from GitHub)** | [FORGE_Training_Colab.ipynb on Colab](https://colab.research.google.com/github/Sanjay767676/Meta-x-Scaler-Team404--Round2/blob/main/FORGE_Training_Colab.ipynb) |
|
| 36 |
| **Trained adapter** | [sanjay7676/forge-qwen-final](https://huggingface.co/sanjay7676/forge-qwen-final) |
|
|
|
|
| 37 |
| **Command / security cheat sheet** | [guide.md](guide.md) |
|
| 38 |
| **Video / slides** | YouTube demo placeholder: https://youtube.com/watch?v=YOUR_DEMO_VIDEO_ID |
|
| 39 |
|
|
@@ -44,6 +47,25 @@ suggested_hardware: cpu-basic
|
|
| 44 |
- For a stable demo on CPU, set Space secret **`CODE_PROVIDER_MODE=mock`** (or use **NIM** / **OpenRouter** keys so the router never loads local `custom_hf`). Loading **`Qwen2.5-Coder-1.5B` + LoRA** on free CPU is likely to **OOM or time out**.
|
| 45 |
- Full training stack: install **[`requirements-train.txt`](requirements-train.txt)** on **Colab** or locally (see Quickstart).
|
| 46 |
|
| 47 |
### NOTE 1 — Non‑negotiable submission requirements (checklist)
|
| 48 |
|
| 49 |
| # | Requirement | FORGE-v4 |
|
|
@@ -51,7 +73,7 @@ suggested_hardware: cpu-basic
|
|
| 51 |
| 1 | **OpenEnv (latest):** build on the framework | **`openenv-core>=0.2.3`** in [`requirements.txt`](requirements.txt). Training extras in [`requirements-train.txt`](requirements-train.txt). Wrapper: [`env_openenv.py`](env_openenv.py). Core: [`env.py`](env.py). |
|
| 52 |
| 2 | **Training:** Unsloth or TRL (or other RL stack) + **Colab** | [`train_unsloth.py`](train_unsloth.py) (Unsloth + TRL), [`train_colab.py`](train_colab.py), [`FORGE_Training_Colab.ipynb`](FORGE_Training_Colab.ipynb), Colab links in the table above. |
|
| 53 |
| 3 | **Evidence of training:** loss + reward plots (real run) | Committed: [`outputs/reward_curve.png`](outputs/reward_curve.png), [`outputs/loss_curve.png`](outputs/loss_curve.png), [`outputs/pass_rate.png`](outputs/pass_rate.png), [`outputs/final_report.json`](outputs/final_report.json). |
|
| 54 |
-
| 4 | **Writeup / video:** mini-blog on HF *or* <2 min YouTube *etc.* | **[
|
| 55 |
| 5 | **Hugging Face Space:** discoverable & runnable | **[Team404_FORGE](https://huggingface.co/spaces/sanjay7676/Team404_FORGE)** — **use this URL in the submission form.** |
|
| 56 |
| 6 | **README:** motivate, explain env, show results + **link Space + all materials** | This file. |
|
| 57 |
| 7 | **No huge video files** on Hub | Only **URLs** to external video/slides (see table). |
|
|
@@ -77,7 +99,7 @@ suggested_hardware: cpu-basic
|
|
| 77 |
|
| 78 |
## Minimum submission checklist (summary)
|
| 79 |
|
| 80 |
-
Same items as **NOTE 1** above: OpenEnv dependency + wrapper, Colab + training scripts, committed plots/JSON, writeup link, runnable Space URL, README hub — all linked from the **Judge quick links** table.
|
| 81 |
|
| 82 |
---
|
| 83 |
|
|
@@ -304,6 +326,11 @@ python train_unsloth.py --mode dpo
|
|
| 304 |
|
| 305 |
Public image on **Docker Hub**: **`sanjay767676/forge`** (repository `forge` under user `sanjay767676`).
|
| 306 |
|
| 307 |
### Pull & run (no build — public image)
|
| 308 |
|
| 309 |
```bash
|
|
|
|
| 20 |
[](https://colab.research.google.com/drive/1mKXjIX-eB2GSiebI-_n37KzVlN1NKCu8?usp=sharing)
|
| 21 |
[](https://colab.research.google.com/github/Sanjay767676/Meta-x-Scaler-Team404--Round2/blob/main/FORGE_Training_Colab.ipynb)
|
| 22 |
[](https://huggingface.co/sanjay7676/forge-qwen-final)
|
| 23 |
+
[](https://hub.docker.com/r/sanjay767676/forge)
|
| 24 |
[](https://docs.google.com/document/d/1Odznuzwtb1ecDOm2t6ToZd4MuMXXfO6vWUGcxbC6mFs/edit?tab=t.0#bookmark=kix.2dz0x0nie3me)
|
| 25 |
|
| 26 |
### Judge quick links (all materials)
|
|
|
|
| 31 |
| **OpenEnv + TRL (framework docs)** | [Hugging Face TRL — OpenEnv integration](https://huggingface.co/docs/trl/openenv) |
|
| 32 |
| **Hugging Face Space (submit this URL)** | [huggingface.co/spaces/sanjay7676/Team404_FORGE](https://huggingface.co/spaces/sanjay7676/Team404_FORGE) |
|
| 33 |
| **Source code** | [github.com/Sanjay767676/Meta-x-Scaler-Team404--Round2](https://github.com/Sanjay767676/Meta-x-Scaler-Team404--Round2) |
|
| 34 |
+
| **Blog (writeup)** | [BLOG.md](BLOG.md) in repo |
|
| 35 |
| **Training Colab (author Drive)** | [Colab notebook](https://colab.research.google.com/drive/1mKXjIX-eB2GSiebI-_n37KzVlN1NKCu8?usp=sharing) |
|
| 36 |
+
| **Colab model + adapter training** | https://colab.research.google.com/drive/1mKXjIX-eB2GSiebI-_n37KzVlN1NKCu8?usp=sharing |
|
| 37 |
| **Training Colab (synced from GitHub)** | [FORGE_Training_Colab.ipynb on Colab](https://colab.research.google.com/github/Sanjay767676/Meta-x-Scaler-Team404--Round2/blob/main/FORGE_Training_Colab.ipynb) |
|
| 38 |
| **Trained adapter** | [sanjay7676/forge-qwen-final](https://huggingface.co/sanjay7676/forge-qwen-final) |
|
| 39 |
+
| **Docker image (public — anyone can pull)** | **Hub (tags, README):** [hub.docker.com/r/sanjay767676/forge](https://hub.docker.com/r/sanjay767676/forge) — **pull:** `docker pull sanjay767676/forge:latest` — **registry ref:** `docker.io/sanjay767676/forge:latest` |
|
| 40 |
| **Command / security cheat sheet** | [guide.md](guide.md) |
|
| 41 |
| **Video / slides** | YouTube demo placeholder: https://youtube.com/watch?v=YOUR_DEMO_VIDEO_ID |
|
| 42 |
|
|
|
|
| 47 |
- For a stable demo on CPU, set Space secret **`CODE_PROVIDER_MODE=mock`** (or use **NIM** / **OpenRouter** keys so the router never loads local `custom_hf`). Loading **`Qwen2.5-Coder-1.5B` + LoRA** on free CPU is likely to **OOM or time out**.
|
| 48 |
- Full training stack: install **[`requirements-train.txt`](requirements-train.txt)** on **Colab** or locally (see Quickstart).
|
| 49 |
|
| 50 |
+
### OpenEnv HTTP API on the Hugging Face Space
|
| 51 |
+
|
| 52 |
+
The Space runs the same FastAPI routes as [`api_server.py`](api_server.py) on the **app root** (Gradio UI is at **`/ui`**; `/` redirects to `/ui`). There is **no `/start`** endpoint — begin an episode with **`POST /reset`**, then drive it with **`POST /step`**.
|
| 53 |
+
|
| 54 |
+
1. **Base URL:** open the live Space, then use the **`*.hf.space`** host shown in the address bar (for this project it is typically **`https://sanjay7676-team404-forge.hf.space`**). If yours differs, copy it from the running app or from the Space **Embed** snippet.
|
| 55 |
+
2. **Check liveness:** `curl -sS "https://sanjay7676-team404-forge.hf.space/health"`
|
| 56 |
+
3. **New episode:** `curl -sS -X POST "https://sanjay7676-team404-forge.hf.space/reset" -H "Content-Type: application/json"`
|
| 57 |
+
4. **Step** (JSON body must include `coder_code` and `coder_version`; omit `candidate_solutions` or send a JSON array of strings):
|
| 58 |
+
|
| 59 |
+
```bash
|
| 60 |
+
curl -sS -X POST "https://sanjay7676-team404-forge.hf.space/step" \
|
| 61 |
+
-H "Content-Type: application/json" \
|
| 62 |
+
-d "{\"coder_code\": \"def solution(arr):\\n return sorted(list(arr))\", \"coder_version\": \"demo\"}"
|
| 63 |
+
```
|
| 64 |
+
|
| 65 |
+
5. **Observe state:** `curl -sS "https://sanjay7676-team404-forge.hf.space/state"`
|
| 66 |
+
|
| 67 |
+
**Note:** The Space shares **one** in-memory environment across all visitors — concurrent `reset` / `step` calls can interleave. For isolated runs, use **Docker** or **local** `api_server.py` on port `8000`.
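The same sequence of calls can be sketched in Python with only the standard library. This is an illustration under assumptions: the helper names (`build_request`, `step_payload`, `call`, `run_episode`) are hypothetical, the base URL is the one quoted above and may differ for your Space, and the responses are assumed to be JSON.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

BASE = "https://sanjay7676-team404-forge.hf.space"  # quoted in the guide; may differ


def build_request(base: str, path: str, payload: Optional[Dict[str, Any]] = None,
                  method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a request for one of the API routes."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if method == "POST" else {}
    return urllib.request.Request(base.rstrip("/") + path, data=data,
                                  headers=headers, method=method)


def step_payload(coder_code: str, coder_version: str,
                 candidate_solutions: Optional[List[str]] = None) -> Dict[str, Any]:
    """Omit candidate_solutions entirely unless a list of strings is given."""
    payload: Dict[str, Any] = {"coder_code": coder_code, "coder_version": coder_version}
    if candidate_solutions is not None:
        payload["candidate_solutions"] = list(candidate_solutions)
    return payload


def call(req: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Send the request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_episode(base: str = BASE) -> Any:
    """health -> reset -> step -> state, mirroring the curl steps."""
    call(build_request(base, "/health"))
    call(build_request(base, "/reset", method="POST"))
    code = "def solution(arr):\n    return sorted(list(arr))"
    result = call(build_request(base, "/step", step_payload(code, "demo"), method="POST"))
    state = call(build_request(base, "/state"))
    return result, state
```

Because the Space shares one in-memory environment, `run_episode()` against the public URL can interleave with other visitors; pointing `base` at a local `api_server.py` (for example `http://localhost:8000`) avoids that.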
|
| 68 |
+
|
| 69 |
### NOTE 1 — Non‑negotiable submission requirements (checklist)
|
| 70 |
|
| 71 |
| # | Requirement | FORGE-v4 |
|
|
|
|
| 73 |
| 1 | **OpenEnv (latest):** build on the framework | **`openenv-core>=0.2.3`** in [`requirements.txt`](requirements.txt). Training extras in [`requirements-train.txt`](requirements-train.txt). Wrapper: [`env_openenv.py`](env_openenv.py). Core: [`env.py`](env.py). |
|
| 74 |
| 2 | **Training:** Unsloth or TRL (or other RL stack) + **Colab** | [`train_unsloth.py`](train_unsloth.py) (Unsloth + TRL), [`train_colab.py`](train_colab.py), [`FORGE_Training_Colab.ipynb`](FORGE_Training_Colab.ipynb), Colab links in the table above. |
|
| 75 |
| 3 | **Evidence of training:** loss + reward plots (real run) | Committed: [`outputs/reward_curve.png`](outputs/reward_curve.png), [`outputs/loss_curve.png`](outputs/loss_curve.png), [`outputs/pass_rate.png`](outputs/pass_rate.png), [`outputs/final_report.json`](outputs/final_report.json). |
|
| 76 |
+
| 4 | **Writeup / video:** mini-blog on HF *or* <2 min YouTube *etc.* | **[BLOG.md](BLOG.md)** linked here; add **public YouTube or slide URL** in the table row when published. |
|
| 77 |
| 5 | **Hugging Face Space:** discoverable & runnable | **[Team404_FORGE](https://huggingface.co/spaces/sanjay7676/Team404_FORGE)** — **use this URL in the submission form.** |
|
| 78 |
| 6 | **README:** motivate, explain env, show results + **link Space + all materials** | This file. |
|
| 79 |
| 7 | **No huge video files** on Hub | Only **URLs** to external video/slides (see table). |
|
|
|
|
| 99 |
|
| 100 |
## Minimum submission checklist (summary)
|
| 101 |
|
| 102 |
+
Same items as **NOTE 1** above: OpenEnv dependency + wrapper, Colab + training scripts, committed plots/JSON, writeup link, runnable Space URL, **public Docker image** ([Hub](https://hub.docker.com/r/sanjay767676/forge) + `docker pull sanjay767676/forge:latest`), README hub — all linked from the **Judge quick links** table.
|
| 103 |
|
| 104 |
---
|
| 105 |
|
|
|
|
| 326 |
|
| 327 |
Public image on **Docker Hub**: **`sanjay767676/forge`** (repository `forge` under user `sanjay767676`).
|
| 328 |
|
| 329 |
+
| What | URL / reference |
|
| 330 |
+
| :-- | :-- |
|
| 331 |
+
| **Browse image (tags, description)** | [https://hub.docker.com/r/sanjay767676/forge](https://hub.docker.com/r/sanjay767676/forge) |
|
| 332 |
+
| **Pull from CLI** | `docker pull sanjay767676/forge:latest` (same as `docker pull docker.io/sanjay767676/forge:latest`) |
|
| 333 |
+
|
| 334 |
### Pull & run (no build — public image)
|
| 335 |
|
| 336 |
```bash
|
api_server.py
CHANGED
|
@@ -30,7 +30,8 @@ async def reset():
|
|
| 30 |
async def step(action: Action):
|
| 31 |
"""Perform a step in the environment."""
|
| 32 |
try:
|
| 33 |
-
|
|
|
|
| 34 |
return result
|
| 35 |
except RuntimeError as e:
|
| 36 |
raise HTTPException(status_code=400, detail=str(e))
|
|
|
|
| 30 |
async def step(action: Action):
|
| 31 |
"""Perform a step in the environment."""
|
| 32 |
try:
|
| 33 |
+
# exclude_none: otherwise candidate_solutions=None breaks env (get() returns None, not default []).
|
| 34 |
+
result = env.step(action.model_dump(exclude_none=True))
|
| 35 |
return result
|
| 36 |
except RuntimeError as e:
|
| 37 |
raise HTTPException(status_code=400, detail=str(e))
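Why the `exclude_none=True` argument matters can be shown without Pydantic. The sketch below uses a plain dataclass (`ActionSketch`) and a hypothetical `dump` helper as stand-ins for the real `Action` model and `model_dump`; the field names follow the step payload described in the README. In Pydantic v2, `exclude_none=True` likewise omits fields whose value is `None`.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ActionSketch:
    """Hypothetical stand-in for the real Action model."""
    coder_code: str
    coder_version: str
    candidate_solutions: Optional[List[str]] = None


def dump(action: ActionSketch, exclude_none: bool = False) -> Dict[str, Any]:
    """Mimic a model dump; optionally drop keys whose value is None."""
    data = asdict(action)
    if exclude_none:
        data = {key: value for key, value in data.items() if value is not None}
    return data


action = ActionSketch("def solution(arr):\n    return sorted(arr)", "demo")
full = dump(action)
trimmed = dump(action, exclude_none=True)

# The key exists with value None, so the default passed to get() is ignored.
print(full.get("candidate_solutions", []))
# The key is absent, so get() falls back to the default list.
print(trimmed.get("candidate_solutions", []))
```

The failure mode is specific to optional fields: a client that omits `candidate_solutions` still produces a key with value `None` in the unfiltered dump, and `dict.get` only applies its default when the key is missing.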
|
app.py
CHANGED
|
@@ -162,13 +162,15 @@ with gr.Blocks(theme=gr.themes.Soft()) as demo:
|
|
| 162 |
with gr.Tab("3. API Endpoints"):
|
| 163 |
gr.Markdown("""
|
| 164 |
### OpenEnv API Standard
|
| 165 |
-
FORGE-v4 exposes a FastAPI server
|
| 166 |
-
|
| 167 |
-
- **`
|
| 168 |
-
- **`POST /
|
| 169 |
-
- **`
|
| 170 |
-
|
| 171 |
-
|
|
|
|
|
|
|
| 172 |
""")
|
| 173 |
|
| 174 |
# Event handlers
|
|
|
|
| 162 |
with gr.Tab("3. API Endpoints"):
|
| 163 |
gr.Markdown("""
|
| 164 |
### OpenEnv API Standard
|
| 165 |
+
FORGE-v4 exposes a FastAPI server on the **same origin** as this UI: routes live at the **site root**, while Gradio is under **`/ui`**. Locally, `python api_server.py` serves on **`:8000`**; on this Space, use your **`*.hf.space`** base URL (no separate `/start` — use **`POST /reset`** then **`POST /step`**).
|
| 166 |
+
|
| 167 |
+
- **`GET /health`**: Liveness / version check.
|
| 168 |
+
- **`POST /reset`**: Starts a new episode and returns the initial state.
|
| 169 |
+
- **`POST /step`**: JSON body: `coder_code`, `coder_version`, optional `candidate_solutions` (array of strings). Returns rewards and updated state.
|
| 170 |
+
- **`GET /state`**: Current environment snapshot.
|
| 171 |
+
|
| 172 |
+
**Example (replace `BASE` with your Space `https://….hf.space` host):**
|
| 173 |
+
`curl -sS "$BASE/health"` → `curl -sS -X POST "$BASE/reset" -H "Content-Type: application/json"` → `curl -sS -X POST "$BASE/step" -H "Content-Type: application/json" -d '{"coder_code":"def solution(arr):\\n return sorted(list(arr))","coder_version":"demo"}'`
|
| 174 |
""")
|
| 175 |
|
| 176 |
# Event handlers
|
env.py
CHANGED
|
@@ -120,7 +120,7 @@ class FORGEEnv:
|
|
| 120 |
|
| 121 |
coder_code = action.get("coder_code", "")
|
| 122 |
coder_version = action.get("coder_version", "unknown")
|
| 123 |
-
candidate_solutions = action.get("candidate_solutions"
|
| 124 |
if not isinstance(coder_code, str):
|
| 125 |
raise TypeError("action['coder_code'] must be a string.")
|
| 126 |
if not isinstance(coder_version, str):
|
|
|
|
| 120 |
|
| 121 |
coder_code = action.get("coder_code", "")
|
| 122 |
coder_version = action.get("coder_version", "unknown")
|
| 123 |
+
candidate_solutions = action.get("candidate_solutions") or []
|
| 124 |
if not isinstance(coder_code, str):
|
| 125 |
raise TypeError("action['coder_code'] must be a string.")
|
| 126 |
if not isinstance(coder_version, str):
|
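The `or []` idiom on the changed line can be isolated as follows. This is a sketch, not the body of `FORGEEnv.step`: `read_candidates` is a hypothetical helper, and the list-of-strings check is an added assumption based on the payload description in the README.

```python
from typing import Any, Dict, List


def read_candidates(action: Dict[str, Any]) -> List[str]:
    """Normalize a missing or None candidate_solutions value to an empty list."""
    candidates = action.get("candidate_solutions") or []
    # Hypothetical extra validation, not shown in the diff.
    if not isinstance(candidates, list) or not all(isinstance(c, str) for c in candidates):
        raise TypeError("action['candidate_solutions'] must be a list of strings.")
    return candidates


print(read_candidates({}))                                  # key missing
print(read_candidates({"candidate_solutions": None}))       # key present, value None
print(read_candidates({"candidate_solutions": ["def solution(arr): return sorted(arr)"]}))
```

One design consequence worth knowing: `or []` replaces every falsy value, so an empty string or `0` is also treated as "no candidates". An explicit `is None` check would be stricter if those inputs should be rejected instead.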