havinashpatil committed on
Commit ·
5e35378
1
Parent(s): 05d943b
Polish README for Hackathon judging criteria
README.md
CHANGED
|
@@ -1,111 +1,94 @@
|
|
| 1 |
---
|
| 2 |
-
title: CodeArena RL
|
| 3 |
emoji: π
|
| 4 |
colorFrom: blue
|
| 5 |
colorTo: purple
|
| 6 |
sdk: docker
|
| 7 |
-
pinned:
|
| 8 |
---
|
| 9 |
[](https://huggingface.co/spaces/ceoavinash/codearena-rl)
|
| 10 |
-
[](
|
| 11 |
[](./openenv.yaml)
|
| 12 |
[]()
|
| 13 |
-
|
| 14 |
-
# CodeArena RL Benchmark
|
| 15 |
|
| 16 |
-
|
| 17 |
-
benchmarked on generation. Can it write a function? Can it
|
| 18 |
-
complete a snippet? Nobody benchmarks what happens when the
|
| 19 |
-
code breaks and the agent has to reason about failure, iterate
|
| 20 |
-
on fixes, and recover from mistakes.
|
| 21 |
|
| 22 |
-
|
| 23 |
-
open-source reinforcement learning environment built specifically
|
| 24 |
-
for iterative code repair, graded not just on test pass rates
|
| 25 |
-
but on whether the fix is correct, secure, and written to a
|
| 26 |
-
professional standard.
|
| 27 |
|
| 28 |
-
|
| 29 |
|
| 30 |
-
**
|
| 31 |
-
Most benchmarks ask: did the tests pass? CodeArena also asks: did the agent fix the root cause, or just patch around it? Is the fix secure? Is it readable? An LLM judge scores each fix on correctness, security, and code quality *alongside* the deterministic test runner. Agents cannot game the reward by memorising solutions or producing syntactically correct but semantically wrong fixes.
|
| 32 |
|
| 33 |
-
|
| 34 |
-
The environment grows with the agent. Difficulty escalates and de-escalates automatically based on rolling average reward over the last 10 episodes. An agent that masters easy tasks gets pushed to medium automatically. This maps directly to Theme 4 (Self-Improvement / Adaptive Curricula) from the judging criteria.
|
| 35 |
-
|
| 36 |
-
**USP 3: The Gap Nobody Is Measuring**
|
| 37 |
-
Every coding AI is benchmarked on generation. CodeArena is the first standardised, open-source RL environment for iterative code repair. Use it to get a number, not vibes, when comparing models.
|
| 38 |
|
| 39 |
-
##
|
| 40 |
|
| 41 |
-
|
| 42 |
-
- **Complex Shaped Rewards**: Rewards are a weighted composite:
|
| 43 |
|
| 44 |
-
|
| 45 |
-
|---|---|---|
|
| 46 |
-
| compile_score | 20% | Code compiles without error |
|
| 47 |
-
| test_pass_ratio | 40% | Fraction of unit tests passed |
|
| 48 |
-
| efficiency_score | 10% | Speed vs optimal runtime |
|
| 49 |
-
| llm_judge_score | 30% | Correctness + Security + Code Quality |
|
| 50 |
-
| step_penalty | -0.02/step | Rewards faster fixes |
|
| 51 |
-
| novelty_penalty | -0.10 | Penalises repeating identical fixes |
|
| 52 |
|
| 53 |
-
|
| 54 |
|
| 55 |
-
-
|
| 56 |
-
- **Real-time Reward Visualization**: Watch compile score, test ratio, and LLM judge scores update live as the agent works using the React Frontend.
|
| 57 |
|
| 58 |
-
##
|
| 59 |
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
An agent cannot plateau by memorising easy tasks.
|
| 63 |
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
| avg reward > 0.80 on easy | → medium |
|
| 67 |
-
| avg reward > 0.75 on medium | → hard |
|
| 68 |
-
| avg reward < 0.35 on hard | → medium |
|
| 69 |
-
| avg reward < 0.35 on medium | → easy |
|
| 70 |
|
| 71 |
-
|
| 72 |
-
|
| 73 |
-
|
| 74 |
|
| 75 |
-
|
| 76 |
|
| 77 |
-
|
| 78 |
|
| 79 |
-
|
| 80 |
-
- `frontend/`: React + Vite frontend for live monitoring and manual intervention.
|
| 81 |
-
- `tasks/`: Task definitions stored in OpenEnv-compatible JSON schema.
|
| 82 |
-
- `inference.py`: CLI runner for evaluating RL agents, supporting both OpenAI-compatible APIs and native HuggingFace `transformers` pipelines.
|
| 83 |
|
| 84 |
-
|
| 85 |
|
| 86 |

|
| 87 |
-
*Episode reward over training steps.
|
| 88 |
|
| 89 |

|
| 90 |
-
*Average reward
|
| 91 |
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
|
| 95 |
-
|
| 96 |
-
|
| 97 |
|
| 98 |
-
|
| 99 |
|
| 100 |
-
|
| 101 |
|
| 102 |
-
|
| 103 |
|
| 104 |
-
|
| 105 |
|
| 106 |
-
|
| 107 |
|
| 108 |
-
|
| 109 |
|
| 110 |
1. **Install Dependencies:**
|
| 111 |
```bash
|
|
@@ -113,161 +96,34 @@ CodeArena is infrastructure. Plug any model in. Run it. Get a number.
|
|
| 113 |
cd frontend && npm install
|
| 114 |
```
|
| 115 |
|
| 116 |
-
2. **Generate
|
| 117 |
-
To populate the extended task categories (`type_errors` and `security_bugs`), run the task generator. This must be run first or the new task categories won't exist.
|
| 118 |
```bash
|
| 119 |
python create_tasks.py
|
| 120 |
```
|
| 121 |
|
| 122 |
-
|
| 123 |
-
|
| 124 |
-
|
| 125 |
-
|
| 126 |
-
|
| 127 |
-
- **Production LLM Serving**: Uses TGI for optimized inference
|
| 128 |
-
- **Cloud Deployment**: Works on Hugging Face Spaces and other platforms
|
| 129 |
-
- **OpenAI-Compatible API**: Standard chat completions interface
|
| 130 |
-
- **Fallback System**: Built-in pattern-based fixes when LLM unavailable
|
| 131 |
-
- **Memory & Learning**: Stores successful fixes for continuous improvement
|
| 132 |
-
|
| 133 |
-
### Architecture
|
| 134 |
-
- **TGI Server**: Runs TinyLlama-1.1B-Chat-v1.0 on port 8080
|
| 135 |
-
- **FastAPI Backend**: Serves RL environment and AI fixing on port 7860
|
| 136 |
-
- **React Frontend**: Web interface for monitoring and interaction
|
| 137 |
-
|
| 138 |
-
### API Endpoints
|
| 139 |
-
**Fix Code:**
|
| 140 |
-
```bash
|
| 141 |
-
curl -X POST "https://ceoavinash-codearena-rl.hf.space/fix" \
|
| 142 |
-
-H "Content-Type: application/json" \
|
| 143 |
-
-d '{"code": "def hello() print(\"world\")", "use_tgi": true}'
|
| 144 |
-
```
|
| 145 |
-
|
| 146 |
-
**Response:**
|
| 147 |
-
```json
|
| 148 |
-
{
|
| 149 |
-
"fixed_code": "def hello():\n print(\"world\")",
|
| 150 |
-
"method": "tgi",
|
| 151 |
-
"success": true,
|
| 152 |
-
"explanation": "Fixed using TGI LLM"
|
| 153 |
-
}
|
| 154 |
-
```
|
| 155 |
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
|
| 159 |
-
|
| 160 |
-
|
| 161 |
-
|
| 162 |
-
|
| 163 |
-
|
| 164 |
-
|
| 165 |
-
|
| 166 |
-
```
|
| 167 |
-
|
| 168 |
-
### Model Performance
|
| 169 |
-
- **Model**: TinyLlama-1.1B-Chat-v1.0
|
| 170 |
-
- **Response Time**: ~2-5 seconds per fix
|
| 171 |
-
- **Memory Usage**: ~2GB RAM
|
| 172 |
-
- **Accuracy**: High for syntax errors, good for logic fixes
|
| 173 |
-
|
| 174 |
-
### Integration with RL Training
|
| 175 |
-
The AI fixer integrates with the RL environment:
|
| 176 |
-
- Provides code fixes during agent training
|
| 177 |
-
- Logs complexity vs reward metrics
|
| 178 |
-
- Stores successful patterns in memory
|
| 179 |
-
- Enables curriculum learning with adaptive difficulty
|
| 180 |
-
|
| 181 |
-
## Supported Models
|
| 182 |
-
|
| 183 |
-
CodeArena supports various LLM backends for code fixing and inference evaluation:
|
| 184 |
-
|
| 185 |
-
### TGI (Production)
|
| 186 |
-
- **TinyLlama-1.1B-Chat-v1.0** (default for Spaces)
|
| 187 |
-
- **Qwen2.5-Coder-1.5B** (recommended for local)
|
| 188 |
-
- **CodeLlama-7B-Instruct** (high quality, requires more RAM)
|
| 189 |
-
|
| 190 |
-
### OpenAI-Compatible (Ollama/vLLM)
|
| 191 |
-
- **codellama:7b-instruct** (Ollama)
|
| 192 |
-
- **codellama:13b-instruct** (Ollama)
|
| 193 |
-
- **qwen2.5-coder:1.5b** (Ollama)
|
| 194 |
-
- **deepseek-coder:6.7b** (Ollama)
|
| 195 |
-
|
| 196 |
-
### HuggingFace Transformers (Local)
|
| 197 |
-
- **Qwen/Qwen2.5-Coder-1.5B** (fast, good quality)
|
| 198 |
-
- **microsoft/DialoGPT-medium** (experimental)
|
| 199 |
-
- **TinyLlama/TinyLlama-1.1B-Chat-v1.0** (lightweight)
|
| 200 |
-
|
| 201 |
-
### Model Performance Comparison
|
| 202 |
-
| Model | Size | Speed | Quality | Memory |
|
| 203 |
-
|-------|------|-------|---------|--------|
|
| 204 |
-
| TinyLlama-1.1B | 1.1B | Fast | Good | 2GB |
|
| 205 |
-
| Qwen2.5-Coder-1.5B | 1.5B | Fast | Excellent | 3GB |
|
| 206 |
-
| CodeLlama-7B | 7B | Medium | Excellent | 14GB |
|
| 207 |
-
| CodeLlama-13B | 13B | Slow | Best | 26GB |
|
| 208 |
-
|
| 209 |
-
## Usage
|
| 210 |
-
|
| 211 |
-
### 0. Training with TRL (Colab)
|
| 212 |
-
To train an RL agent against CodeArena using GRPO or PPO:
|
| 213 |
-
|
| 214 |
-
[](COLAB_URL)
|
| 215 |
-
|
| 216 |
-
The notebook:
|
| 217 |
-
- Installs dependencies and connects to CodeArena via public URL
|
| 218 |
-
- Runs TRL GRPO training for 100+ steps
|
| 219 |
-
- Logs rewards per step and plots the reward curve inline
|
| 220 |
-
|
| 221 |
-
Replace `COLAB_URL` with your actual Colab share link.
|
| 222 |
-
|
| 223 |
-
### 1. Run the Backend Server
|
| 224 |
-
The server is required for both the frontend dashboard and RL training.
|
| 225 |
-
```bash
|
| 226 |
-
uvicorn server.app:app --port 7860
|
| 227 |
-
```
|
| 228 |
-
|
| 229 |
-
### 2. Run the Frontend Dashboard
|
| 230 |
-
```bash
|
| 231 |
-
cd frontend
|
| 232 |
-
npm run dev
|
| 233 |
-
```
|
| 234 |
-
Navigate to `http://localhost:3000` to access the live RL monitoring dashboard.
|
| 235 |
-
|
| 236 |
-
### 3. Run Inference Evaluation
|
| 237 |
-
You can evaluate a local agent or pipeline programmatically via `inference.py`.
|
| 238 |
-
|
| 239 |
-
**Using OpenAI-Compatible Endpoints (e.g., Ollama or vLLM):**
|
| 240 |
-
```bash
|
| 241 |
-
export API_BASE_URL="http://localhost:11434/v1"
|
| 242 |
-
export MODEL_NAME="codellama"
|
| 243 |
-
python inference.py --backend openai
|
| 244 |
-
```
|
| 245 |
-
|
| 246 |
-
**Using HuggingFace Transformers (Local pipeline):**
|
| 247 |
-
```bash
|
| 248 |
-
export MODEL_NAME="Qwen/Qwen2.5-Coder-1.5B"
|
| 249 |
-
python inference.py --backend hf
|
| 250 |
-
```
|
| 251 |
-
|
| 252 |
-
## Reward Analysis
|
| 253 |
-
|
| 254 |
-
As your agent interacts with the environment, inference logs are automatically written to `rewards_log.csv`.
|
| 255 |
-
To visualize the reward curves over training steps and average rewards by task category, run:
|
| 256 |
-
```bash
|
| 257 |
-
python plot_rewards.py
|
| 258 |
-
```
|
| 259 |
-
This generates `reward_curve.png` and `reward_by_task.png` in the `results/` directory.
|
| 260 |
-
|
| 261 |
-
## OpenEnv Compatibility
|
| 262 |
-
|
| 263 |
-
This benchmark strictly adheres to the OpenEnv specification. See `openenv.yaml` for full configuration details.
|
| 264 |
-
|
| 265 |
-
## Links
|
| 266 |
|
| 267 |
| Resource | URL |
|
| 268 |
|---|---|
|
| 269 |
-
|
|
| 270 |
-
| Colab Training Notebook (TRL
|
| 271 |
-
|
|
| 272 |
-
| Demo Video
|
| 273 |
-
|
|
|
|
|
|
|
|
|
| 1 |
---
|
| 2 |
+
title: CodeArena RL Benchmark
|
| 3 |
emoji: π
|
| 4 |
colorFrom: blue
|
| 5 |
colorTo: purple
|
| 6 |
sdk: docker
|
| 7 |
+
pinned: true
|
| 8 |
---
|
| 9 |
[](https://huggingface.co/spaces/ceoavinash/codearena-rl)
|
| 10 |
+
[](https://colab.research.google.com/github/havinashpatil/meta/blob/main/train_grpo.ipynb)
|
| 11 |
[](./openenv.yaml)
|
| 12 |
[]()
|
| 13 |
|
| 14 |
+
# CodeArena: The Iterative Code Repair RL Benchmark
|
| 15 |
|
| 16 |
+
GitHub Copilot, Cursor, Devin: every major coding AI is benchmarked on *generation*. Can it write a function? Can it complete a snippet?
|
| 17 |
|
| 18 |
+
But nobody benchmarks what happens when the code **breaks** and the agent has to reason about failure, read error logs, iterate on fixes, and recover from its own mistakes.
|
| 19 |
|
| 20 |
+
**CodeArena** measures exactly that. It is the first standardized, open-source Reinforcement Learning environment built specifically for **iterative code repair**. It grades an agent not just on whether the tests pass, but on whether the fix is correct, secure, and algorithmically efficient.
|
|
|
|
| 21 |
|
| 22 |
+
---
|
| 23 |
|
| 24 |
+
## Hackathon Theme Alignment: Theme #4 (Self-Improvement)
|
| 25 |
|
| 26 |
+
CodeArena directly tackles **Theme #4: Self-Improvement**.
|
|
|
|
| 27 |
|
| 28 |
+
Instead of a fixed set of tasks, CodeArena features an **Adaptive Curriculum**. The environment continuously tracks the agent's rolling average reward over the last 10 episodes. If an agent masters easy syntax errors (avg reward > 0.80), the environment automatically escalates the difficulty to algorithmic logic bugs. If the agent struggles, it de-escalates to allow recovery.
|
| 29 |
|
| 30 |
+
The goal is recursive skill amplification: the agent learns to drive its own capability growth without plateauing on memorized, simple solutions.
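
The escalation rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual implementation: the 10-episode window and the 0.80 easy-to-medium threshold come from the text, the 0.75 and 0.35 thresholds come from the table in the previous README revision, and the class name `AdaptiveCurriculum`, the wait for a full window, and the window reset after a level change are all assumptions.

```python
from collections import deque

LEVELS = ["easy", "medium", "hard"]
# Thresholds quoted in the README; the rolling average must cross them strictly.
PROMOTE_ABOVE = {"easy": 0.80, "medium": 0.75}
DEMOTE_BELOW = {"medium": 0.35, "hard": 0.35}
WINDOW = 10  # rolling window of episodes


class AdaptiveCurriculum:
    """Hypothetical helper that tracks rewards and picks the next difficulty."""

    def __init__(self, start: str = "easy") -> None:
        self.level = start
        self.rewards = deque(maxlen=WINDOW)

    def record(self, reward: float) -> str:
        """Store one episode reward and return the (possibly updated) level."""
        self.rewards.append(reward)
        # Assumption: no level change until the window is full.
        if len(self.rewards) < WINDOW:
            return self.level
        avg = sum(self.rewards) / len(self.rewards)
        idx = LEVELS.index(self.level)
        if self.level in PROMOTE_ABOVE and avg > PROMOTE_ABOVE[self.level]:
            self.level = LEVELS[idx + 1]
            self.rewards.clear()  # assumption: start a fresh window per level
        elif self.level in DEMOTE_BELOW and avg < DEMOTE_BELOW[self.level]:
            self.level = LEVELS[idx - 1]
            self.rewards.clear()
        return self.level
```

Clearing the window after each transition is one possible design choice; it prevents rewards earned on easier tasks from immediately triggering a second promotion.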
|
| 31 |
|
| 32 |
+
---
|
|
|
|
| 33 |
|
| 34 |
+
## Environment Innovation (What makes it special?)
|
| 35 |
|
| 36 |
+
### 1. The Gap Nobody Is Measuring
|
| 37 |
+
We have countless environments for generating code (HumanEval, MBPP). CodeArena is the first standardized RL environment for the *debugging loop*. It simulates the real-world workflow: write → test → read error → fix → repeat.
|
|
|
|
| 38 |
|
| 39 |
+
### 2. LLM-as-Judge Hybrid Grader
|
| 40 |
+
Most benchmarks ask a binary question: *did the tests pass?* CodeArena uses a rich **Hybrid Grader**. A deterministic test runner checks correctness, while a built-in LLM Judge (powered by TGI/Hugging Face Serverless) scores the fix on security, readability, and algorithmic complexity (O(N) vs O(N²)). This prevents reward-hacking where agents produce syntactically correct but fundamentally broken code just to pass a weak test.
|
| 41 |
|
| 42 |
+
### 3. Complex Shaped Rewards
|
| 43 |
+
Rewards are a weighted composite, heavily shaped to encourage professional engineering:
|
| 44 |
+
- **Test Pass Ratio (40%)**: Fraction of unit tests passed.
|
| 45 |
+
- **LLM Judge Score (30%)**: Correctness + Security + Code Quality.
|
| 46 |
+
- **Compile Score (20%)**: Does it run without crashing?
|
| 47 |
+
- **Efficiency Score (10%)**: Speed vs optimal runtime.
|
| 48 |
+
- **Step Penalty (-0.02/step)**: Rewards faster fixes over meandering trial-and-error.
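
As a rough illustration of how the listed weights combine, the sketch below computes the composite reward for a single step. The weights and the 0.02 per-step penalty are taken from the list above; the names `StepScores` and `composite_reward`, the assumption that every component score lies in [0, 1], and the choice to charge the penalty for each step already taken are hypothetical and may differ from the actual grader.

```python
from dataclasses import dataclass

# Weights as listed in the README; they sum to 1.0.
WEIGHTS = {
    "test_pass_ratio": 0.40,
    "llm_judge_score": 0.30,
    "compile_score": 0.20,
    "efficiency_score": 0.10,
}
STEP_PENALTY = 0.02  # subtracted once per step taken (assumption on counting)


@dataclass
class StepScores:
    """Hypothetical container; each score is assumed to lie in [0, 1]."""
    test_pass_ratio: float
    llm_judge_score: float
    compile_score: float
    efficiency_score: float


def composite_reward(scores: StepScores, steps_taken: int) -> float:
    """Weighted sum of the component scores minus the step penalty."""
    weighted = sum(w * getattr(scores, name) for name, w in WEIGHTS.items())
    return weighted - STEP_PENALTY * steps_taken
```

Under these assumptions, a fix that passes half the tests, compiles cleanly, and earns a judge score of 0.8 with no efficiency credit scores 0.64 before the step penalty is applied.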
|
| 49 |
|
| 50 |
+
---
|
| 51 |
|
| 52 |
+
## Evidence of Training & Rewards
|
| 53 |
|
| 54 |
+
We successfully trained a model using **TRL GRPO** (Group Relative Policy Optimization) on the CodeArena environment.
|
| 55 |
|
| 56 |
+
Below is the observable evidence of the agent's training progress. The agent started with a low success rate on algorithmic bugs, but as the GRPO training progressed, it learned to systematically read the `error_log` observation and output correct code, resulting in a climbing reward curve.
|
| 57 |
|
| 58 |

|
| 59 |
+
*Episode reward over training steps. The rolling 10-step average shows clear learning and improvement.*
|
| 60 |
|
| 61 |

|
| 62 |
+
*Average reward broken down by task category. The agent performs well on syntax and type errors, while Medium/Hard algorithmic tasks remain challenging but are improving.*
|
| 63 |
+
|
| 64 |
+
### Run the Training Script
|
| 65 |
+
We have provided our complete TRL GRPO training pipeline in a Colab notebook so judges can re-run and verify the training process end-to-end:
|
| 66 |
+
**[Open Training Script in Google Colab](https://colab.research.google.com/github/havinashpatil/meta/blob/main/train_grpo.ipynb)**
|
| 67 |
|
| 68 |
+
---
|
| 69 |
+
|
| 70 |
+
## Try the Live Environment (Hugging Face Space)
|
| 71 |
+
|
| 72 |
+
We have deployed the fully functional CodeArena environment, complete with a React frontend dashboard that visualizes the RL process in real time.
|
| 73 |
|
| 74 |
+
**[Live Demo: CodeArena on Hugging Face Spaces](https://huggingface.co/spaces/ceoavinash/codearena-rl)**
|
| 75 |
|
| 76 |
+
The live space includes a built-in **AI Code Fixer** powered by Hugging Face's Serverless Inference API (using `Qwen2.5-Coder-3B-Instruct`), allowing you to test the agent's repair capabilities directly in your browser.
|
| 77 |
|
| 78 |
+
### Features of the Live Space:
|
| 79 |
+
- **Real-time Monitoring**: Watch the agent's compile score, test ratio, and LLM judge scores update live.
|
| 80 |
+
- **Sandbox Mode**: Paste your own broken Python code and watch the environment evaluate it.
|
| 81 |
+
- **Agent Mode**: Toggle auto-pilot to watch the agent fix code in a continuous loop until optimal.
|
| 82 |
|
| 83 |
+
---
|
| 84 |
+
|
| 85 |
+
## Architecture & Setup (OpenEnv Compatible)
|
| 86 |
|
| 87 |
+
This benchmark strictly adheres to the **OpenEnv** specification (`openenv.yaml`).
|
| 88 |
|
| 89 |
+
**Data Flow:** `Agent` → `POST /reset` → `buggy_code` → `POST /step` → `LLM Judge & Test Runner` → `reward` → `Agent`
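
The data flow above can be exercised with a small client loop. The following is a sketch only: the `/reset` and `/step` paths and port 7860 come from this README, while the JSON field names (`code`, `reward`, `done`, `observation`), the helper names `post_json` and `run_episode`, and the step limit are assumptions; check `openenv.yaml` and the server code for the real schema.

```python
import json
from typing import Any, Callable, Dict
from urllib import request

BASE_URL = "http://localhost:7860"  # local backend port used in this README


def post_json(path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """POST a JSON payload to the backend and decode the JSON reply."""
    req = request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_episode(
    propose_fix: Callable[[Dict[str, Any]], str],
    post: Callable[[str, Dict[str, Any]], Dict[str, Any]] = post_json,
    max_steps: int = 5,
) -> float:
    """Run one repair episode and return the summed reward.

    Field names below are hypothetical placeholders for the real schema.
    """
    obs = post("/reset", {})  # assumed to return the buggy code to repair
    total = 0.0
    for _ in range(max_steps):
        result = post("/step", {"code": propose_fix(obs)})
        total += result["reward"]
        if result.get("done"):
            break
        obs = result.get("observation", obs)
    return total
```

Passing the transport in as the `post` argument keeps the loop testable without a running server; an agent only needs to supply `propose_fix`, a function that maps the current observation to candidate code.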
|
| 90 |
+
|
| 91 |
+
### Local Development
|
| 92 |
|
| 93 |
1. **Install Dependencies:**
|
| 94 |
```bash
|
|
|
|
| 96 |
cd frontend && npm install
|
| 97 |
```
|
| 98 |
|
| 99 |
+
2. **Generate Task Database:**
|
|
|
|
| 100 |
```bash
|
| 101 |
python create_tasks.py
|
| 102 |
```
|
| 103 |
|
| 104 |
+
3. **Run the FastAPI Backend:**
|
| 105 |
+
The backend acts as the OpenEnv entrypoint and serves the compiled React dashboard.
|
| 106 |
+
```bash
|
| 107 |
+
uvicorn server.app:app --port 7860
|
| 108 |
+
```
|
| 109 |
|
| 110 |
+
4. **Evaluate a Local Agent (Inference):**
|
| 111 |
+
You can evaluate any local agent (e.g., Ollama or a HuggingFace pipeline) programmatically via `inference.py`.
|
| 112 |
+
```bash
|
| 113 |
+
export MODEL_NAME="codellama:7b-instruct"
|
| 114 |
+
python inference.py --backend openai
|
| 115 |
+
```
|
| 116 |
+
|
| 117 |
+
---
|
| 118 |
+
|
| 119 |
+
## Quick Links
|
| 120 |
|
| 121 |
| Resource | URL |
|
| 122 |
|---|---|
|
| 123 |
+
| **Hugging Face Space (Live Demo)** | [CodeArena on HF Spaces](https://huggingface.co/spaces/ceoavinash/codearena-rl) |
|
| 124 |
+
| **Colab Training Notebook (TRL)** | [Open in Colab](https://colab.research.google.com/github/havinashpatil/meta/blob/main/train_grpo.ipynb) |
|
| 125 |
+
| **OpenEnv Specification** | [openenv.yaml](./openenv.yaml) |
|
| 126 |
+
| **Demo Video / Blog Post** | *(Add link to YouTube/HF Blog here if available)* |
|
| 127 |
+
|
| 128 |
+
---
|
| 129 |
+
*Built for the OpenEnv Hackathon India 2026.*
|