---
title: RLM-Forge
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
python_version: 3.11
sdk_version: 6.9.0
app_file: server/app.py
base_path: /rlm_forge
---
# RLM-Forge
**Recursive Language Model training environment for AI coding agents.**
RLM-Forge is an [OpenEnv](https://github.com/meta-pytorch/OpenEnv) environment that trains language models to solve coding tasks on real Python repositories using Recursive Language Model (RLM) patterns.
## How It Works
1. **Clone** a real Python repo (e.g., python-slugify, humanize)
2. **Extract** a source file and replace it with a broken stub (correct signatures, wrong implementations)
3. **Agent** explores the repo via a sandboxed multi-step REPL with built-in tools
4. **Reward** = test pass rate (55%) + structural validity (15%) + efficiency (30%)
5. **Train** with GRPO to improve the agent's coding ability over time
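Step 2's stub generation can be illustrated with a small sketch: keep each function's signature, discard its body. The helper below is hypothetical (the real logic lives in `feature_extractor.py`) and uses `raise NotImplementedError` as one simple stubbing strategy:

```python
import ast

def stub_source(source: str) -> str:
    """Replace every function body with `raise NotImplementedError`,
    keeping the original signatures intact (illustrative sketch)."""
    tree = ast.parse(source)
    raise_stmt = ast.parse("raise NotImplementedError").body
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            node.body = raise_stmt
    return ast.unparse(tree)

original = "def slugify(text, separator='-'):\n    return separator.join(text.lower().split())\n"
print(stub_source(original))
```

The agent then sees correct signatures but must rediscover the implementation from the surrounding repo and its tests.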
### The REPL Tools
The agent has access to these functions in the sandbox:
| Function | Description |
|----------|-------------|
| `read_file(path)` | Read a file from the repo |
| `list_dir(path='.')` | List directory contents |
| `search(pattern, path='.')` | Grep for a pattern |
| `write_file(path, content)` | Write/create a file |
| `run_tests(test_path=None)` | Run pytest |
| `spawn_agent(scope, mission)` | Explore a directory scope |
| `FINAL()` | Signal implementation is complete |
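These tools are exposed as plain functions in the namespace the agent's code executes in. A minimal sketch of that mechanism (simplified; the actual `sandbox.py` adds isolation and more tools):

```python
import contextlib
import io

def make_namespace(repo_files: dict) -> dict:
    """Build the globals dict agent code runs against. Backed by an
    in-memory file dict here; the real sandbox works on a cloned repo."""
    def read_file(path):
        return repo_files[path]

    def list_dir(path="."):
        return sorted(repo_files)

    def write_file(path, content):
        repo_files[path] = content

    return {"read_file": read_file, "list_dir": list_dir, "write_file": write_file}

def run_action(code: str, namespace: dict) -> str:
    """Execute one agent code action; captured stdout becomes the observation."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, namespace)
    return buf.getvalue()

files = {"slugify/slugify.py": "def slugify(text): ..."}
ns = make_namespace(files)
print(run_action("print(read_file('slugify/slugify.py'))", ns))
```

Because observations are just captured stdout, the agent must `print()` anything it wants to see in the next step.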
## Project Structure
```
rlm_forge/
├── __init__.py              # Package exports
├── models.py                # Pydantic models (Action, Observation, State)
├── client.py                # EnvClient for remote connections
└── server/
    ├── app.py               # FastAPI server (create_app)
    ├── environment.py       # Core Environment (reset/step)
    ├── sandbox.py           # Sandboxed Python REPL
    ├── repo_manager.py      # Repo cloning & dependency management
    ├── feature_extractor.py # Source file extraction & stub generation
    └── reward.py            # Composite reward computation
```
## Quick Start
### Install
```bash
uv sync
```
### Run the Server
```bash
uv run uvicorn rlm_forge.server.app:app --host 0.0.0.0 --port 8000
```
### Use the Environment Directly
```python
from rlm_forge.server.environment import RLMForgeEnvironment
from rlm_forge.models import RLMForgeAction
env = RLMForgeEnvironment()
obs = env.reset(seed=1)
print(obs.task_description)
# Agent takes actions
obs = env.step(RLMForgeAction(code="print(read_file('test.py'))"))
obs = env.step(RLMForgeAction(code="write_file('slugify/slugify.py', '...')"))
obs = env.step(RLMForgeAction(code="FINAL()"))
print(f"Reward: {obs.reward}")
```
### Connect via Client
```python
from rlm_forge.client import RLMForgeClient
from rlm_forge.models import RLMForgeAction
client = RLMForgeClient(base_url="http://localhost:8000")
client.connect()
result = client.reset(seed=1)
result = client.step(RLMForgeAction(code="print(list_dir())"))
result = client.step(RLMForgeAction(code="FINAL()"))
print(f"Reward: {result.reward}")
```
## Training
See `rlm_forge_training.ipynb` for the full GRPO training notebook; it is designed to run on Google Colab with an H100 GPU.
Key training approach:
- **Multi-step trajectory concatenation**: Full episode (all code actions) treated as one GRPO "completion"
- **Group Relative Policy Optimization**: Multiple completions per task, advantages computed relative to group mean
- **LoRA fine-tuning**: 4-bit quantized Qwen2.5-Coder-32B with LoRA adapter
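The "group relative" part of GRPO reduces to a simple normalization: each completion's reward is compared against the mean (and standard deviation) of its group for the same task. An illustrative sketch, not the notebook's exact code:

```python
from statistics import mean, stdev

def group_advantages(rewards, eps=1e-6):
    """Advantage of each completion relative to its group's mean reward.
    Positive = better than the group average, negative = worse."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions of the same task: only the 0.8 episode gets a positive advantage
print(group_advantages([0.2, 0.8, 0.5, 0.5]))
```

Because advantages are relative within a group, no separate value network is needed, which keeps memory costs low enough for a single GPU.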
## Reward Breakdown
| Component | Weight | Description |
|-----------|--------|-------------|
| Test Pass Rate | 55% | Fraction of tests passing |
| Structural Validity | 15% | AST parse check + import check |
| Efficiency | 30% | Tiered by iteration budget used |
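The table above amounts to a weighted sum. A sketch with hypothetical argument names (the real computation, including the efficiency tiering, is in `reward.py`):

```python
def composite_reward(pass_rate: float, structurally_valid: bool, efficiency: float) -> float:
    """Weighted sum matching the table: 55% tests, 15% structure, 30% efficiency.
    All three components are assumed to be normalized to [0, 1]."""
    return 0.55 * pass_rate + 0.15 * float(structurally_valid) + 0.30 * efficiency

# A perfect solution found within the tightest iteration tier scores 1.0
print(composite_reward(pass_rate=1.0, structurally_valid=True, efficiency=1.0))
```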
## Curated Repos
| Repo | Source File | Tests | Difficulty |
|------|-----------|-------|------------|
| python-slugify | `slugify/slugify.py` | 82 | Easy |
| humanize (number) | `src/humanize/number.py` | 219 | Medium |
| humanize (time) | `src/humanize/time.py` | varies | Medium |
## Docker
```bash
docker build -t rlm-forge .
docker run -p 8000:8000 rlm-forge
```
The Dockerfile pre-clones curated repos to avoid network I/O on each `reset()`.
## Deploy to HF Spaces
```bash
openenv push -r your-username/rlm-forge
```