---
title: RLM-Forge
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
python_version: 3.11
sdk_version: 6.9.0
app_file: server/app.py
base_path: /rlm_forge
---
# RLM-Forge

Recursive Language Model training environment for AI coding agents.

RLM-Forge is an OpenEnv environment that trains language models to solve coding tasks on real Python repositories using Recursive Language Model (RLM) patterns.
## How It Works
- Clone a real Python repo (e.g., python-slugify, humanize)
- Extract a source file and replace it with a broken stub (correct signatures, wrong implementations)
- Agent explores the repo via a sandboxed multi-step REPL with built-in tools
- Reward = test pass rate (55%) + structural validity (15%) + efficiency (30%)
- Train with GRPO to improve the agent's coding ability over time
## The REPL Tools

The agent has access to these functions in the sandbox:
| Function | Description |
|---|---|
| `read_file(path)` | Read a file from the repo |
| `list_dir(path='.')` | List directory contents |
| `search(pattern, path='.')` | Grep for a pattern |
| `write_file(path, content)` | Write/create a file |
| `run_tests(test_path=None)` | Run pytest |
| `spawn_agent(scope, mission)` | Explore a directory scope |
| `FINAL()` | Signal implementation is complete |
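For intuition, the read-only tools might be implemented roughly like this (a hypothetical sketch for illustration, not the actual `sandbox.py` code, which also enforces sandboxing and path restrictions):

```python
import os
import re


def read_file(path):
    """Return the contents of a file, given a path relative to the repo root."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def list_dir(path="."):
    """List directory entries, sorted for stable output."""
    return sorted(os.listdir(path))


def search(pattern, path="."):
    """Grep-like search: return 'file:line_number:line' matches under `path`."""
    matches = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            try:
                with open(fp, encoding="utf-8", errors="ignore") as f:
                    for lineno, line in enumerate(f, 1):
                        if re.search(pattern, line):
                            matches.append(f"{fp}:{lineno}:{line.rstrip()}")
            except OSError:
                continue  # skip unreadable files, as grep would
    return matches
```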
## Project Structure

```
rlm_forge/
├── __init__.py            # Package exports
├── models.py              # Pydantic models (Action, Observation, State)
├── client.py              # EnvClient for remote connections
└── server/
    ├── app.py                # FastAPI server (create_app)
    ├── environment.py        # Core Environment (reset/step)
    ├── sandbox.py            # Sandboxed Python REPL
    ├── repo_manager.py       # Repo cloning & dependency management
    ├── feature_extractor.py  # Source file extraction & stub generation
    └── reward.py             # Composite reward computation
```
## Quick Start

### Install

```shell
uv sync
```

### Run the Server

```shell
uv run uvicorn rlm_forge.server.app:app --host 0.0.0.0 --port 8000
```
### Use the Environment Directly

```python
from rlm_forge.server.environment import RLMForgeEnvironment
from rlm_forge.models import RLMForgeAction

env = RLMForgeEnvironment()
obs = env.reset(seed=1)
print(obs.task_description)

# Agent takes actions
obs = env.step(RLMForgeAction(code="print(read_file('test.py'))"))
obs = env.step(RLMForgeAction(code="write_file('slugify/slugify.py', '...')"))
obs = env.step(RLMForgeAction(code="FINAL()"))
print(f"Reward: {obs.reward}")
```
### Connect via Client

```python
from rlm_forge.client import RLMForgeClient
from rlm_forge.models import RLMForgeAction

client = RLMForgeClient(base_url="http://localhost:8000")
client.connect()

result = client.reset(seed=1)
result = client.step(RLMForgeAction(code="print(list_dir())"))
result = client.step(RLMForgeAction(code="FINAL()"))
print(f"Reward: {result.reward}")
```
## Training
See `rlm_forge_training.ipynb` for the full GRPO training notebook; it is designed for Google Colab with an H100 GPU.
Key training approach:
- Multi-step trajectory concatenation: Full episode (all code actions) treated as one GRPO "completion"
- Group Relative Policy Optimization: Multiple completions per task, advantages computed relative to group mean
- LoRA fine-tuning: 4-bit quantized Qwen2.5-Coder-32B with LoRA adapter
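The group-relative advantage step can be sketched in a few lines (a minimal sketch assuming per-group standard-deviation normalization, which the notebook may apply differently):

```python
def group_relative_advantages(rewards):
    """GRPO-style advantages: each completion's reward minus the group mean,
    normalized by the group standard deviation (assumed; some variants skip
    the normalization)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]
```

Because advantages are centered within each group of completions for the same task, no separate value network is needed to estimate a baseline.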
## Reward Breakdown
| Component | Weight | Description |
|---|---|---|
| Test Pass Rate | 55% | Fraction of tests passing |
| Structural Validity | 15% | AST parse check + import check |
| Efficiency | 30% | Tiered by iteration budget used |
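A minimal sketch of how these weights could combine, assuming a hypothetical two-tier efficiency schedule (the real tiers live in `reward.py` and may differ):

```python
def efficiency_score(steps_used, budget):
    """Hypothetical tiering: full credit under half the iteration budget,
    half credit within budget, none beyond. The actual tiers are an
    assumption here."""
    if steps_used <= budget // 2:
        return 1.0
    if steps_used <= budget:
        return 0.5
    return 0.0


def composite_reward(test_pass_rate, structurally_valid, steps_used, budget):
    """Weights from the table above: 55% tests, 15% structure, 30% efficiency."""
    structure = 1.0 if structurally_valid else 0.0
    return (0.55 * test_pass_rate
            + 0.15 * structure
            + 0.30 * efficiency_score(steps_used, budget))
```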
## Curated Repos

| Repo | Source File | Tests | Difficulty |
|---|---|---|---|
| python-slugify | `slugify/slugify.py` | 82 | Easy |
| humanize (number) | `src/humanize/number.py` | 219 | Medium |
| humanize (time) | `src/humanize/time.py` | varies | Medium |
## Docker

```shell
docker build -t rlm-forge .
docker run -p 8000:8000 rlm-forge
```
The Dockerfile pre-clones curated repos to avoid network I/O on each reset().
## Deploy to HF Spaces

```shell
openenv push -r your-username/rlm-forge
```