---
title: RLM-Forge
emoji: 🚀
colorFrom: blue
colorTo: indigo
sdk: gradio
python_version: 3.11
sdk_version: 6.9.0
app_file: server/app.py
base_path: /rlm_forge
---
# RLM-Forge
**Recursive Language Model training environment for AI coding agents.**
RLM-Forge is an [OpenEnv](https://github.com/meta-pytorch/OpenEnv) environment that trains language models to solve coding tasks on real Python repositories using Recursive Language Model (RLM) patterns.
## How It Works
1. **Clone** a real Python repo (e.g., python-slugify, humanize)
2. **Extract** a source file and replace it with a broken stub (correct signatures, wrong implementations)
3. **Agent** explores the repo and reimplements the stubbed file via a sandboxed multi-step REPL with built-in tools
4. **Reward** = test pass rate (55%) + structural validity (15%) + efficiency (30%)
5. **Train** with GRPO to improve the agent's coding ability over time
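
Step 2 can be illustrated with a minimal sketch. It assumes the stub is produced by parsing the source with `ast` and swapping each function body for a placeholder. The helper name `make_stub` and the use of `raise NotImplementedError` are hypothetical; the actual `feature_extractor.py` may generate its wrong implementations differently.

```python
import ast


def make_stub(source: str) -> str:
    """Return `source` with every function body replaced by a placeholder.

    Hypothetical helper: signatures and docstrings are kept, bodies are dropped.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            new_body = []
            first = node.body[0]
            # Keep the docstring, if any, so the agent still sees the intent.
            if (
                isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)
            ):
                new_body.append(first)
            # Assumed placeholder body; a real stub might return wrong values instead.
            new_body.append(ast.parse("raise NotImplementedError").body[0])
            node.body = new_body
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


original = '''
def slugify(text, separator="-"):
    """Join the lowercased words of `text` with `separator`."""
    return separator.join(text.lower().split())
'''
print(make_stub(original))
```

The stub keeps the public signature intact, so the existing tests still import and call the function; they simply fail until the agent restores a working implementation.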
### The REPL Tools
The agent has access to these functions in the sandbox:
| Function | Description |
|----------|-------------|
| `read_file(path)` | Read a file from the repo |
| `list_dir(path='.')` | List directory contents |
| `search(pattern, path='.')` | Grep for a pattern |
| `write_file(path, content)` | Write/create a file |
| `run_tests(test_path=None)` | Run pytest |
| `spawn_agent(scope, mission)` | Explore a directory scope |
| `FINAL()` | Signal implementation is complete |
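
As a rough illustration of how tools like these could be exposed to agent code, the sketch below binds three of them to a repo directory and runs each action with `exec`, capturing printed output. The helpers `make_tools` and `run_action` are hypothetical and this is not the project's `sandbox.py`; in particular, plain `exec` provides no isolation.

```python
import contextlib
import io
import tempfile
from pathlib import Path


def make_tools(repo_root: Path) -> dict:
    """Build a namespace of tool functions bound to one repo directory."""

    def read_file(path):
        return (repo_root / path).read_text()

    def write_file(path, content):
        target = repo_root / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)

    def list_dir(path="."):
        return sorted(p.name for p in (repo_root / path).iterdir())

    return {"read_file": read_file, "write_file": write_file, "list_dir": list_dir}


def run_action(code: str, namespace: dict) -> str:
    """Execute one agent action and return whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)  # illustration only: no isolation or resource limits
    return buffer.getvalue()


# The same namespace is reused across steps, so state persists like a REPL.
with tempfile.TemporaryDirectory() as tmp:
    ns = make_tools(Path(tmp))
    run_action("write_file('pkg/mod.py', 'x = 1\\n')", ns)
    output = run_action("print(list_dir('pkg'))", ns)

print(output)
```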
## Project Structure
```
rlm_forge/
├── __init__.py              # Package exports
├── models.py                # Pydantic models (Action, Observation, State)
├── client.py                # EnvClient for remote connections
└── server/
    ├── app.py               # FastAPI server (create_app)
    ├── environment.py       # Core Environment (reset/step)
    ├── sandbox.py           # Sandboxed Python REPL
    ├── repo_manager.py      # Repo cloning & dependency management
    ├── feature_extractor.py # Source file extraction & stub generation
    └── reward.py            # Composite reward computation
```
## Quick Start
### Install
```bash
uv sync
```
### Run the Server
```bash
uv run uvicorn rlm_forge.server.app:app --host 0.0.0.0 --port 8000
```
### Use the Environment Directly
```python
from rlm_forge.server.environment import RLMForgeEnvironment
from rlm_forge.models import RLMForgeAction

# Start a new episode and show the task
env = RLMForgeEnvironment()
obs = env.reset(seed=1)
print(obs.task_description)

# Agent takes actions: each step runs one code snippet in the sandboxed REPL
obs = env.step(RLMForgeAction(code="print(read_file('test.py'))"))
obs = env.step(RLMForgeAction(code="write_file('slugify/slugify.py', '...')"))

# FINAL() signals that the implementation is complete
obs = env.step(RLMForgeAction(code="FINAL()"))
print(f"Reward: {obs.reward}")
```
### Connect via Client
```python
from rlm_forge.client import RLMForgeClient
from rlm_forge.models import RLMForgeAction

# Connect to a running server (see "Run the Server")
client = RLMForgeClient(base_url="http://localhost:8000")
client.connect()

# Same reset/step loop as the direct environment, over the network
result = client.reset(seed=1)
result = client.step(RLMForgeAction(code="print(list_dir())"))
result = client.step(RLMForgeAction(code="FINAL()"))
print(f"Reward: {result.reward}")
```
## Training
See `rlm_forge_training.ipynb` for the full GRPO training notebook, which is designed for Google Colab with an H100 GPU.
Key elements of the training approach:
- **Multi-step trajectory concatenation**: Full episode (all code actions) treated as one GRPO "completion"
- **Group Relative Policy Optimization**: Multiple completions per task, advantages computed relative to group mean
- **LoRA fine-tuning**: 4-bit quantized Qwen2.5-Coder-32B with LoRA adapter
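
The group-relative advantage step can be sketched as follows. This assumes the common GRPO formulation in which each reward is centered on the group mean and divided by the group standard deviation; the function name `group_relative_advantages` and the `eps` value are hypothetical, and the notebook may normalize differently.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Center each reward on its group's mean and scale by the group's spread.

    `rewards` holds one scalar per completion sampled for the same task.
    `eps` (assumed value) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three completions for one task: below-average, average, above-average.
advantages = group_relative_advantages([0.2, 0.5, 0.8])
print(advantages)
```

Completions that beat their group's average receive a positive advantage and the rest a negative one, so the signal depends only on relative quality within the group, not on the absolute reward scale.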
## Reward Breakdown
| Component | Weight | Description |
|-----------|--------|-------------|
| Test Pass Rate | 55% | Fraction of tests passing |
| Structural Validity | 15% | AST parse check + import check |
| Efficiency | 30% | Tiered by iteration budget used |
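
A minimal sketch of how the three components might combine is shown below. The weights come from the table above; everything else is an assumption: the function names are hypothetical, the efficiency tier thresholds are invented for illustration, and the structural check covers only the AST parse (the import check is omitted).

```python
import ast

# Weights taken from the table above.
WEIGHTS = {"tests": 0.55, "structure": 0.15, "efficiency": 0.30}


def structural_score(source: str) -> float:
    """Return 1.0 if the source parses as Python, else 0.0 (import check omitted)."""
    try:
        ast.parse(source)
    except SyntaxError:
        return 0.0
    return 1.0


def efficiency_score(steps_used: int, step_budget: int) -> float:
    """Tiered score by fraction of the iteration budget used.

    The thresholds and tier values here are hypothetical.
    """
    fraction = steps_used / step_budget
    if fraction <= 0.5:
        return 1.0
    if fraction <= 0.75:
        return 0.5
    return 0.0


def composite_reward(passed, total, source, steps_used, step_budget):
    """Weighted sum of test pass rate, structural validity, and efficiency."""
    test_rate = passed / total if total else 0.0
    return (
        WEIGHTS["tests"] * test_rate
        + WEIGHTS["structure"] * structural_score(source)
        + WEIGHTS["efficiency"] * efficiency_score(steps_used, step_budget)
    )


# Half the tests pass, the file parses, and 6 of 10 steps were used.
print(composite_reward(41, 82, "x = 1", 6, 10))
```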
## Curated Repos
| Repo | Source File | Tests | Difficulty |
|------|-----------|-------|------------|
| python-slugify | `slugify/slugify.py` | 82 | Easy |
| humanize (number) | `src/humanize/number.py` | 219 | Medium |
| humanize (time) | `src/humanize/time.py` | varies | Medium |
## Docker
```bash
docker build -t rlm-forge .
docker run -p 8000:8000 rlm-forge
```
The Dockerfile pre-clones curated repos to avoid network I/O on each `reset()`.
## Deploy to HF Spaces
```bash
openenv push -r your-username/rlm-forge
```