TinyForge-Zero LoRA — Qwen2.5-14B

LoRA adapter for Qwen/Qwen2.5-14B trained via the TinyForge-Zero self-bootstrap recipe. The training set contains no human-written data: it consists only of (broken, fixed) repair pairs that the base model mined from its own divergent solutions.

Headline results

Benchmark	Base	This adapter	Δ
HumanEval (chat-template)	26.8% (44/164)	79.9% (131/164)	+53.0pp
HumanEval+	—	74.4% (122/164)	—
HumanEval (multi-pair eval format)	40.9% (67/164)	80.5% (132/164)	+39.6pp

The 6.1pp drop from HumanEval (80.5%, multi-pair eval format) to HumanEval+ (74.4%) is within the 5–8pp range typical of strong instruct models, and well below the 15–25pp drop associated with benchmark memorization.
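The deltas in the table and the HumanEval → HumanEval+ drop can be re-derived from the pass counts. This is a minimal sketch; the helper name `pct` is illustrative and not part of any evaluation harness.

```python
# Recompute the headline numbers from the pass counts in the table above.
TOTAL = 164  # number of HumanEval problems


def pct(passed, total=TOTAL):
    """Pass rate as a percentage."""
    return 100.0 * passed / total


base_chat, adapter_chat = pct(44), pct(131)    # chat-template format
base_multi, adapter_multi = pct(67), pct(132)  # multi-pair eval format
adapter_plus = pct(122)                        # HumanEval+

print(f"chat-template gain: {adapter_chat - base_chat:+.1f}pp")    # +53.0pp
print(f"multi-pair gain:    {adapter_multi - base_multi:+.1f}pp")  # +39.6pp
print(f"drop vs multi-pair: {adapter_multi - adapter_plus:.1f}pp") # 6.1pp
print(f"drop vs chat:       {adapter_chat - adapter_plus:.1f}pp")  # 5.5pp
```

The 6.1pp figure corresponds to the multi-pair score; measured from the chat-template score, the drop is 5.5pp.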

Training

Method: LoRA (rank 32, q/k/v/o projections), 2 epochs, lr=1e-4, bf16
Data: 100 self-mined (broken, fixed) pairs (40 warmup + 60 aggressive-mined), no human data
Compute: single H100 80GB, ~95 minutes total, under $4 of RunPod credit
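The hyperparameters above might map onto a `peft` configuration roughly as follows. This is a sketch, not the actual training script: `lora_alpha` and `lora_dropout` are not stated in this card and are placeholder assumptions, and the module names assume the standard Qwen2 attention layer naming.

```python
from peft import LoraConfig

# Rank and target projections come from the "Method" line above.
lora_config = LoraConfig(
    r=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=64,     # ASSUMPTION: not given in this card
    lora_dropout=0.0,  # ASSUMPTION: not given in this card
    task_type="CAUSAL_LM",
)

# Optimizer-side settings from the same line, keyed by the matching
# transformers.TrainingArguments field names.
train_hparams = {
    "num_train_epochs": 2,
    "learning_rate": 1e-4,
    "bf16": True,
}
```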

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the base model in bf16, sharded across available devices.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-14B", torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(base, "ranausmans/tinyforge-zero-qwen25-14b-lora")
# The adapter uses the base model's tokenizer.
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B")
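Once `model` and `tok` are loaded as above, a completion could be generated along these lines. This is a minimal sketch: the helper name `complete`, the greedy decoding, and the 256-token budget are assumptions, and this card does not specify the prompt format used for the benchmark numbers.

```python
def complete(model, tok, prompt, max_new_tokens=256):
    """Greedy-decode a continuation of `prompt` and return only the new text."""
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # ASSUMPTION: greedy decoding, not stated in this card
    )
    # Drop the prompt tokens so that only the generated continuation is decoded.
    prompt_len = inputs["input_ids"].shape[1]
    return tok.decode(output_ids[0][prompt_len:], skip_special_tokens=True)


# Example call, using `model` and `tok` from the loading snippet:
# print(complete(model, tok, "def fib(n):\n    "))
```

To deploy without a runtime `peft` dependency, `model.merge_and_unload()` folds the adapter weights into the base model.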

Citation

@misc{usman2026tinyforgezero,
  title  = {How Far Can an Open Base Model Self-Improve?
            Recipes, Limits, and Test-Time Synergy},
  author = {Rana Usman},
  year   = {2026},
  archivePrefix = {arXiv},
  primaryClass = {cs.AI}
}
