---
license: apache-2.0
base_model: Qwen/Qwen3-1.7B
tags:
- rlm
- recursive-language-model
- lora
- qwen3
datasets:
- custom
language:
- en
pipeline_tag: text-generation
---

# RL4RLM-STaR: Iterative Self-Improvement RLM
LoRA adapter for Qwen3-1.7B trained as a Recursive Language Model (RLM) — a model that writes Python code to decompose and solve long-context tasks via a persistent REPL environment.
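To illustrate the loop the description implies (a minimal sketch, not the project's actual harness): the model emits Python snippets that are executed in a namespace that persists across turns, so intermediate variables survive while the model decomposes the long context step by step. The class and variable names here are hypothetical.

```python
class PersistentREPL:
    """Toy persistent REPL: model-written code snippets are exec'd into one
    shared namespace, so state carries across turns of the RLM loop."""

    def __init__(self, context: str):
        # The long document is exposed as a variable the model's code can slice.
        self.namespace = {"context": context}

    def run(self, code: str):
        # Execute one model-generated snippet; variables persist between calls.
        exec(code, self.namespace)
        return self.namespace.get("result")

repl = PersistentREPL("needle: 42 " + "filler " * 1000)
repl.run("chunk = context[:40]")               # turn 1: inspect a slice
out = repl.run("result = 'needle' in chunk")   # turn 2: reuse `chunk` from turn 1
```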
## Paper
*Training Native Recursive Language Models* — CS234 Final Project, Stanford University (Winter 2026)
- GitHub: pythonomar22/rl4rlm
## Training Details

- Method: second round of SFT (STaR-style) on a combined set of 132 trajectories
- Data: 87 original NIAH trajectories + 45 self-generated on harder tasks (multi-needle, document classification)
- Training: 5 epochs from the base model on the combined set
- Key result: the most balanced model — 58.4% Multi-NIAH, 83.4% DocClassify
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B")
model = PeftModel.from_pretrained(base, "omar81939/rl4rlm-star")
tokenizer = AutoTokenizer.from_pretrained("omar81939/rl4rlm-star")
```
## Results

| Model | NIAH (100) | Multi-NIAH (24) | DocClassify (20) | Avg |
|---|---|---|---|---|
| Base | 72.0 | 38.3 | 80.3 | 63.5 |
| SFT | 90.0 | 57.9 | 82.4 | 76.8 |
| **STaR (this model)** | 87.0 | 58.4 | 83.4 | 76.3 |
| DPO | 83.0 | 87.9 | 82.6 | 84.5 |
| GRPO-v4 | 82.0 | 85.1 | 83.2 | 83.4 |
## LoRA Config
- Rank: 16, Alpha: 32, Dropout: 0.05
- Target modules: all attention and MLP projections
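The settings above correspond to a `peft.LoraConfig` roughly like the following sketch. The exact target-module list is not stated on this card; the names below are the standard Qwen3 attention/MLP projection names, assumed rather than confirmed.

```python
from peft import LoraConfig

# Sketch of the training-time adapter config; module names are the standard
# Qwen3 projections (an assumption, not taken from the card).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```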