---
title: RLM-Forge
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
python_version: 3.11
sdk_version: 6.9.0
app_file: server/app.py
base_path: /rlm_forge
---
# RLM-Forge

Recursive Language Model training environment for AI coding agents.

RLM-Forge is an OpenEnv environment that trains language models to solve coding tasks on real Python repositories using Recursive Language Model (RLM) patterns.
## How It Works
- Clone a real Python repo (e.g., python-slugify, humanize)
- Extract a source file and replace it with a broken stub (correct signatures, wrong implementations)
- Agent explores the repo via a sandboxed multi-step REPL with built-in tools
- Reward = test pass rate (55%) + structural validity (15%) + efficiency (30%)
- Train with GRPO to improve the agent's coding ability over time
## The REPL Tools

The agent has access to these functions in the sandbox:
| Function | Description |
|---|---|
| `read_file(path)` | Read a file from the repo |
| `list_dir(path='.')` | List directory contents |
| `search(pattern, path='.')` | Grep for a pattern |
| `write_file(path, content)` | Write/create a file |
| `run_tests(test_path=None)` | Run pytest |
| `spawn_agent(scope, mission)` | Explore a directory scope |
| `FINAL()` | Signal implementation is complete |
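For intuition, the read-only tools might be implemented roughly like this (a hypothetical sketch for illustration, not the actual `sandbox.py` code, which also enforces sandboxing and path restrictions):

```python
import os
import re


def read_file(path):
    """Return the contents of a file, given a path relative to the repo root."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def list_dir(path="."):
    """List directory entries, sorted for stable output."""
    return sorted(os.listdir(path))


def search(pattern, path="."):
    """Grep-like search: return 'file:line_number:line' matches under `path`."""
    matches = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            try:
                with open(fp, encoding="utf-8", errors="ignore") as f:
                    for lineno, line in enumerate(f, 1):
                        if re.search(pattern, line):
                            matches.append(f"{fp}:{lineno}:{line.rstrip()}")
            except OSError:
                continue  # skip unreadable files, as grep would
    return matches
```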
## Project Structure

```
rlm_forge/
├── __init__.py            # Package exports
├── models.py              # Pydantic models (Action, Observation, State)
├── client.py              # EnvClient for remote connections
└── server/
    ├── app.py                # FastAPI server (create_app)
    ├── environment.py        # Core Environment (reset/step)
    ├── sandbox.py            # Sandboxed Python REPL
    ├── repo_manager.py       # Repo cloning & dependency management
    ├── feature_extractor.py  # Source file extraction & stub generation
    └── reward.py             # Composite reward computation
```
## Quick Start

### Install

```shell
uv sync
```

### Run the Server

```shell
uv run uvicorn rlm_forge.server.app:app --host 0.0.0.0 --port 8000
```
### Use the Environment Directly

```python
from rlm_forge.server.environment import RLMForgeEnvironment
from rlm_forge.models import RLMForgeAction

env = RLMForgeEnvironment()
obs = env.reset(seed=1)
print(obs.task_description)

# Agent takes actions
obs = env.step(RLMForgeAction(code="print(read_file('test.py'))"))
obs = env.step(RLMForgeAction(code="write_file('slugify/slugify.py', '...')"))
obs = env.step(RLMForgeAction(code="FINAL()"))
print(f"Reward: {obs.reward}")
```
### Connect via Client

```python
from rlm_forge.client import RLMForgeClient
from rlm_forge.models import RLMForgeAction

client = RLMForgeClient(base_url="http://localhost:8000")
client.connect()

result = client.reset(seed=1)
result = client.step(RLMForgeAction(code="print(list_dir())"))
result = client.step(RLMForgeAction(code="FINAL()"))
print(f"Reward: {result.reward}")
```
## Training
See `rlm_forge_training.ipynb` for the full GRPO training notebook; it is designed for Google Colab with an H100 GPU.
Key training approach:
- Multi-step trajectory concatenation: Full episode (all code actions) treated as one GRPO "completion"
- Group Relative Policy Optimization: Multiple completions per task, advantages computed relative to group mean
- LoRA fine-tuning: 4-bit quantized Qwen2.5-Coder-32B with LoRA adapter
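The group-relative advantage step can be sketched in a few lines (a minimal sketch assuming per-group standard-deviation normalization, which the notebook may apply differently):

```python
def group_relative_advantages(rewards):
    """GRPO-style advantages: each completion's reward minus the group mean,
    normalized by the group standard deviation (assumed; some variants skip
    the normalization)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]
```

Because advantages are centered within each group of completions for the same task, no separate value network is needed to estimate a baseline.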
## Reward Breakdown
| Component | Weight | Description |
|---|---|---|
| Test Pass Rate | 55% | Fraction of tests passing |
| Structural Validity | 15% | AST parse check + import check |
| Efficiency | 30% | Tiered by iteration budget used |
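A minimal sketch of how these weights could combine, assuming a hypothetical two-tier efficiency schedule (the real tiers live in `reward.py` and may differ):

```python
def efficiency_score(steps_used, budget):
    """Hypothetical tiering: full credit under half the iteration budget,
    half credit within budget, none beyond. The actual tiers are an
    assumption here."""
    if steps_used <= budget // 2:
        return 1.0
    if steps_used <= budget:
        return 0.5
    return 0.0


def composite_reward(test_pass_rate, structurally_valid, steps_used, budget):
    """Weights from the table above: 55% tests, 15% structure, 30% efficiency."""
    structure = 1.0 if structurally_valid else 0.0
    return (0.55 * test_pass_rate
            + 0.15 * structure
            + 0.30 * efficiency_score(steps_used, budget))
```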
## Curated Repos

| Repo | Source File | Tests | Difficulty |
|---|---|---|---|
| python-slugify | `slugify/slugify.py` | 82 | Easy |
| humanize (number) | `src/humanize/number.py` | 219 | Medium |
| humanize (time) | `src/humanize/time.py` | varies | Medium |
## Docker

```shell
docker build -t rlm-forge .
docker run -p 8000:8000 rlm-forge
```
The Dockerfile pre-clones curated repos to avoid network I/O on each reset().
## Deploy to HF Spaces

```shell
openenv push -r your-username/rlm-forge
```