title: RLM-Forge
emoji: 🚀
colorFrom: blue
colorTo: indigo
sdk: gradio
python_version: 3.11
sdk_version: 6.9.0
app_file: server/app.py
base_path: /rlm_forge

RLM-Forge

Recursive Language Model training environment for AI coding agents.

RLM-Forge is an OpenEnv environment that trains language models to solve coding tasks on real Python repositories using Recursive Language Model (RLM) patterns.

How It Works

  1. Clone a real Python repo (e.g., python-slugify, humanize)
  2. Extract a source file and replace it with a broken stub (correct signatures, wrong implementations)
  3. Agent explores the repo via a sandboxed multi-step REPL with built-in tools
  4. Reward = test pass rate (55%) + structural validity (15%) + efficiency (30%)
  5. Train with GRPO to improve the agent's coding ability over time
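Step 2 can be illustrated with a minimal sketch built on the standard-library `ast` module. The helper name `make_stub` is hypothetical, and replacing each body with `raise NotImplementedError` is an assumption about what a "broken stub" looks like; the actual logic in `feature_extractor.py` may differ.

```python
import ast


def make_stub(source: str) -> str:
    """Return `source` with every function body replaced by a placeholder.

    Signatures, decorators, and docstrings are kept; implementations are
    dropped. Hypothetical helper, not RLM-Forge's actual extractor.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            new_body = []
            if ast.get_docstring(node, clean=False) is not None:
                new_body.append(node.body[0])  # keep the docstring statement
            # Assumed placeholder: the stub fails loudly when called.
            new_body.append(
                ast.Raise(
                    exc=ast.Call(
                        func=ast.Name(id="NotImplementedError", ctx=ast.Load()),
                        args=[],
                        keywords=[],
                    ),
                    cause=None,
                )
            )
            node.body = new_body
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9+
```

Because the signatures survive, the repository's existing tests still import and call the stubbed functions, so they fail until the agent restores a working implementation.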

The REPL Tools

The agent has access to these functions in the sandbox:

| Function | Description |
|---|---|
| read_file(path) | Read a file from the repo |
| list_dir(path='.') | List directory contents |
| search(pattern, path='.') | Grep for a pattern |
| write_file(path, content) | Write/create a file |
| run_tests(test_path=None) | Run pytest |
| spawn_agent(scope, mission) | Explore a directory scope |
| FINAL() | Signal implementation is complete |
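As a rough illustration of how the file tools could be confined to the cloned repo, here is a minimal sketch. The class `RepoTools`, its path check, the grep-style output format, and the restriction of `search` to `*.py` files are all assumptions made for the example; the real `sandbox.py` may work differently.

```python
import re
from pathlib import Path


class RepoTools:
    """Hypothetical file tools confined to one repository root."""

    def __init__(self, root: str) -> None:
        self.root = Path(root).resolve()

    def _resolve(self, path: str) -> Path:
        # Reject paths that escape the repo root (e.g. "../../etc/passwd").
        target = (self.root / path).resolve()
        if target != self.root and self.root not in target.parents:
            raise PermissionError(f"path escapes sandbox: {path}")
        return target

    def read_file(self, path: str) -> str:
        return self._resolve(path).read_text(encoding="utf-8")

    def list_dir(self, path: str = ".") -> list[str]:
        return sorted(p.name for p in self._resolve(path).iterdir())

    def write_file(self, path: str, content: str) -> None:
        target = self._resolve(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")

    def search(self, pattern: str, path: str = ".") -> list[str]:
        # Return "relative/path:lineno:line" for each regex match.
        regex = re.compile(pattern)
        hits = []
        for file in sorted(self._resolve(path).rglob("*.py")):
            text = file.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), 1):
                if regex.search(line):
                    rel = file.relative_to(self.root).as_posix()
                    hits.append(f"{rel}:{lineno}:{line}")
        return hits
```

Binding these methods into the REPL namespace would give the agent the call shapes listed above, such as `read_file('slugify/slugify.py')`.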

Project Structure

rlm_forge/
├── __init__.py              # Package exports
├── models.py                # Pydantic models (Action, Observation, State)
├── client.py                # EnvClient for remote connections
└── server/
    ├── app.py               # FastAPI server (create_app)
    ├── environment.py       # Core Environment (reset/step)
    ├── sandbox.py           # Sandboxed Python REPL
    ├── repo_manager.py      # Repo cloning & dependency management
    ├── feature_extractor.py # Source file extraction & stub generation
    └── reward.py            # Composite reward computation

Quick Start

Install

uv sync

Run the Server

uv run uvicorn rlm_forge.server.app:app --host 0.0.0.0 --port 8000

Use the Environment Directly

from rlm_forge.server.environment import RLMForgeEnvironment
from rlm_forge.models import RLMForgeAction

env = RLMForgeEnvironment()
obs = env.reset(seed=1)
print(obs.task_description)

# Agent takes actions
obs = env.step(RLMForgeAction(code="print(read_file('test.py'))"))
obs = env.step(RLMForgeAction(code="write_file('slugify/slugify.py', '...')"))
obs = env.step(RLMForgeAction(code="FINAL()"))
print(f"Reward: {obs.reward}")

Connect via Client

from rlm_forge.client import RLMForgeClient
from rlm_forge.models import RLMForgeAction

client = RLMForgeClient(base_url="http://localhost:8000")
client.connect()

result = client.reset(seed=1)
result = client.step(RLMForgeAction(code="print(list_dir())"))
result = client.step(RLMForgeAction(code="FINAL()"))
print(f"Reward: {result.reward}")

Training

See rlm_forge_training.ipynb for the full GRPO training notebook, which is designed for Google Colab with an H100 GPU.

Key training approach:

  • Multi-step trajectory concatenation: Full episode (all code actions) treated as one GRPO "completion"
  • Group Relative Policy Optimization: Multiple completions per task, advantages computed relative to group mean
  • LoRA fine-tuning: 4-bit quantized Qwen2.5-Coder-32B with LoRA adapter
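The group-relative step can be sketched in a few lines. The function name is hypothetical, and dividing by the group standard deviation is an assumption borrowed from common GRPO implementations; the notebook may only subtract the group mean.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Score each completion relative to the other completions for the same task."""
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std over the group
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four episodes sampled for one task, each scored by the environment reward.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Episodes that beat the group mean receive a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline.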

Reward Breakdown

| Component | Weight | Description |
|---|---|---|
| Test Pass Rate | 55% | Fraction of tests passing |
| Structural Validity | 15% | AST parse check + import check |
| Efficiency | 30% | Tiered by iteration budget used |
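A minimal sketch of how these weights might combine into one scalar follows. The function names and the efficiency tier cut-offs are placeholders invented for illustration, and the import check is omitted; only the 55/15/30 weights come from the breakdown above.

```python
import ast

WEIGHTS = {"tests": 0.55, "structure": 0.15, "efficiency": 0.30}


def structural_score(source: str) -> float:
    """1.0 if the file parses as Python, else 0.0 (import check omitted here)."""
    try:
        ast.parse(source)
    except SyntaxError:
        return 0.0
    return 1.0


def efficiency_score(steps_used: int, step_budget: int) -> float:
    """Tiered score; these cut-offs are made-up placeholders."""
    fraction = steps_used / step_budget
    if fraction <= 0.5:
        return 1.0
    if fraction <= 0.75:
        return 0.5
    return 0.25


def composite_reward(passed: int, total: int, source: str,
                     steps_used: int, step_budget: int) -> float:
    test_rate = passed / total if total else 0.0
    return (WEIGHTS["tests"] * test_rate
            + WEIGHTS["structure"] * structural_score(source)
            + WEIGHTS["efficiency"] * efficiency_score(steps_used, step_budget))
```

Under these placeholder tiers, an agent that passes half the tests with a parseable file and uses 70% of its budget would score 0.55 × 0.5 + 0.15 × 1.0 + 0.30 × 0.5 = 0.575.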

Curated Repos

| Repo | Source File | Tests | Difficulty |
|---|---|---|---|
| python-slugify | slugify/slugify.py | 82 | Easy |
| humanize (number) | src/humanize/number.py | 219 | Medium |
| humanize (time) | src/humanize/time.py | varies | Medium |

Docker

docker build -t rlm-forge .
docker run -p 8000:8000 rlm-forge

The Dockerfile pre-clones curated repos to avoid network I/O on each reset().

Deploy to HF Spaces

openenv push -r your-username/rlm-forge