	---
	title: RLM-Forge
	emoji: 🚀
	colorFrom: blue
	colorTo: indigo
	sdk: gradio
	python_version: 3.11
	sdk_version: 6.9.0
	app_file: server/app.py
	base_path: /rlm_forge
	---

	# RLM-Forge

	Recursive Language Model training environment for AI coding agents.

	RLM-Forge is an [OpenEnv](https://github.com/meta-pytorch/OpenEnv) environment that trains language models to solve coding tasks on real Python repositories using Recursive Language Model (RLM) patterns.

	## How It Works

	1. Clone a real Python repo (e.g., python-slugify, humanize)
	2. Extract a source file and replace it with a broken stub (correct signatures, wrong implementations)
	3. The agent explores the repo through a sandboxed, multi-step REPL with built-in tools
	4. Reward = test pass rate (55%) + structural validity (15%) + efficiency (30%)
	5. Train with GRPO to improve the agent's coding ability over time
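
Step 2 can be illustrated with a minimal sketch built on Python's standard `ast` module. This is not the project's `feature_extractor.py`: the helper name `make_broken_stub` and the use of `raise NotImplementedError` as the replacement body are assumptions made for illustration, and the real stubs may use different wrong implementations.

```python
import ast


def make_broken_stub(source: str) -> str:
    """Keep every function signature but replace its body (hypothetical helper)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            new_body = []
            # Preserve the docstring so the intended behavior stays visible.
            if ast.get_docstring(node, clean=False) is not None:
                new_body.append(node.body[0])
            # Assumed placeholder body; the real environment may differ.
            new_body.append(ast.parse("raise NotImplementedError").body[0])
            node.body = new_body
    return ast.unparse(ast.fix_missing_locations(tree))


original = '''
def slugify(text, separator="-"):
    """Return a URL-friendly slug."""
    return separator.join(text.lower().split())
'''

stub = make_broken_stub(original)
print(stub)
```

The stub still parses and exposes the same signature, so the repo's tests can import it and fail until the agent restores a working implementation.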

	### The REPL Tools

	The agent has access to these functions in the sandbox:

	| Function | Description |
	|----------|-------------|
	| `read_file(path)` | Read a file from the repo |
	| `list_dir(path='.')` | List directory contents |
	| `search(pattern, path='.')` | Grep for a pattern |
	| `write_file(path, content)` | Write/create a file |
	| `run_tests(test_path=None)` | Run pytest |
	| `spawn_agent(scope, mission)` | Explore a directory scope |
	| `FINAL()` | Signal implementation is complete |
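
As a rough sketch of how tools like these might be exposed to agent code, the example below binds three of them to a repository root and rejects paths that escape it. The helper `make_tools`, the restriction of `search` to `*.py` files, and the use of `exec` with the tools as globals are all assumptions for illustration; the actual `sandbox.py` may work quite differently.

```python
import re
import tempfile
from pathlib import Path


def make_tools(repo_root: str) -> dict:
    """Build a namespace of file tools confined to repo_root (hypothetical helper)."""
    root = Path(repo_root).resolve()

    def _resolve(path: str) -> Path:
        # Reject paths that escape the repository, such as '../secrets'.
        target = (root / path).resolve()
        if target != root and root not in target.parents:
            raise PermissionError(f"path escapes sandbox: {path}")
        return target

    def read_file(path: str) -> str:
        return _resolve(path).read_text()

    def list_dir(path: str = ".") -> list[str]:
        return sorted(p.name for p in _resolve(path).iterdir())

    def search(pattern: str, path: str = ".") -> list[str]:
        # Assumption: only Python files are scanned in this sketch.
        regex = re.compile(pattern)
        hits = []
        for file in sorted(_resolve(path).rglob("*.py")):
            for lineno, line in enumerate(file.read_text().splitlines(), start=1):
                if regex.search(line):
                    hits.append(f"{file.relative_to(root)}:{lineno}:{line.strip()}")
        return hits

    return {"read_file": read_file, "list_dir": list_dir, "search": search}


# Demo on a throwaway directory standing in for a cloned repo.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "pkg").mkdir()
    Path(tmp, "pkg", "core.py").write_text("def slugify(text):\n    return text\n")
    tools = make_tools(tmp)
    # Agent-submitted code runs with the tools as its globals.
    exec("found = search('def slugify')", tools)
    print(tools["found"])
```

Path confinement alone is not a full sandbox; it only shows how the tool signatures in the table could map onto ordinary file operations.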

	## Project Structure

	```
	rlm_forge/
	├── __init__.py              # Package exports
	├── models.py                # Pydantic models (Action, Observation, State)
	├── client.py                # EnvClient for remote connections
	└── server/
	    ├── app.py               # FastAPI server (create_app)
	    ├── environment.py       # Core Environment (reset/step)
	    ├── sandbox.py           # Sandboxed Python REPL
	    ├── repo_manager.py      # Repo cloning & dependency management
	    ├── feature_extractor.py # Source file extraction & stub generation
	    └── reward.py            # Composite reward computation
	```

	## Quick Start

	### Install

	```bash
	uv sync
	```

	### Run the Server

	```bash
	uv run uvicorn rlm_forge.server.app:app --host 0.0.0.0 --port 8000
	```

	### Use the Environment Directly

	```python
	from rlm_forge.server.environment import RLMForgeEnvironment
	from rlm_forge.models import RLMForgeAction

	env = RLMForgeEnvironment()
	obs = env.reset(seed=1)
	print(obs.task_description)

	# Agent takes actions
	obs = env.step(RLMForgeAction(code="print(read_file('test.py'))"))
	obs = env.step(RLMForgeAction(code="write_file('slugify/slugify.py', '...')"))
	obs = env.step(RLMForgeAction(code="FINAL()"))
	print(f"Reward: {obs.reward}")
	```

	### Connect via Client

	```python
	from rlm_forge.client import RLMForgeClient
	from rlm_forge.models import RLMForgeAction

	client = RLMForgeClient(base_url="http://localhost:8000")
	client.connect()

	result = client.reset(seed=1)
	result = client.step(RLMForgeAction(code="print(list_dir())"))
	result = client.step(RLMForgeAction(code="FINAL()"))
	print(f"Reward: {result.reward}")
	```

	## Training

	See `rlm_forge_training.ipynb` for the full GRPO training notebook, which is designed for Google Colab with an H100 GPU.

	Key elements of the training approach:
	- Multi-step trajectory concatenation: the full episode (all code actions) is treated as one GRPO "completion"
	- Group Relative Policy Optimization: multiple completions are sampled per task, and advantages are computed relative to the group mean
	- LoRA fine-tuning: 4-bit quantized Qwen2.5-Coder-32B with a LoRA adapter
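
The group-relative advantage in the second bullet can be sketched in a few lines. Dividing by the group standard deviation is the common GRPO formulation and is an assumption here, as is the function name; the notebook may only subtract the mean or rely on a training library for this step.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Center rewards on the group mean and scale by the group std (assumed)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion earns the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical episode rewards for completions of the same task.
rewards = [0.2, 0.5, 0.5, 0.8]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Completions that beat the group average get a positive advantage and the rest a negative one, so no separate value model is needed to supply a baseline.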

	## Reward Breakdown

	| Component | Weight | Description |
	|-----------|--------|-------------|
	| Test Pass Rate | 55% | Fraction of tests passing |
	| Structural Validity | 15% | AST parse check + import check |
	| Efficiency | 30% | Tiered by iteration budget used |
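
The weights in the table combine as a simple weighted sum. The sketch below is not the project's `reward.py`: the function names are hypothetical, the efficiency tier boundaries are invented for illustration (the README does not state them), and the structural check covers only the AST parse, not the import check.

```python
import ast


def parses_cleanly(source: str) -> bool:
    """AST parse check only; the import check is omitted in this sketch."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def efficiency_score(steps_used: int, step_budget: int) -> float:
    """Tiered efficiency; these tier boundaries are illustrative assumptions."""
    fraction = steps_used / step_budget
    if fraction <= 0.5:
        return 1.0
    if fraction <= 0.75:
        return 0.5
    return 0.0


def composite_reward(tests_passed: int, tests_total: int, source: str,
                     steps_used: int, step_budget: int) -> float:
    """Weighted sum using the 55% / 15% / 30% split from the table above."""
    test_rate = tests_passed / tests_total if tests_total else 0.0
    structural = 1.0 if parses_cleanly(source) else 0.0
    efficiency = efficiency_score(steps_used, step_budget)
    return 0.55 * test_rate + 0.15 * structural + 0.30 * efficiency


# Hypothetical episode: half the tests pass, the file parses, 4 of 10 steps used.
reward = composite_reward(41, 82, "def slugify(text):\n    return text\n", 4, 10)
print(round(reward, 3))
```

Because the three weights sum to 1, the reward stays in the range 0 to 1 as long as each component does.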

	## Curated Repos

	| Repo | Source File | Tests | Difficulty |
	|------|-------------|-------|------------|
	| python-slugify | `slugify/slugify.py` | 82 | Easy |
	| humanize (number) | `src/humanize/number.py` | 219 | Medium |
	| humanize (time) | `src/humanize/time.py` | varies | Medium |

	## Docker

	```bash
	docker build -t rlm-forge .
	docker run -p 8000:8000 rlm-forge
	```

	The Dockerfile pre-clones curated repos to avoid network I/O on each `reset()`.

	## Deploy to HF Spaces

	```bash
	openenv push -r your-username/rlm-forge
	```