TinyForge-Zero LoRA β€” Qwen2.5-14B

LoRA adapter for Qwen/Qwen2.5-14B trained via the TinyForge-Zero self-bootstrap recipe. No human-written training data; only (broken, fixed) repair pairs that the base model mined from its own divergent solutions.

Headline results

Benchmark Base This adapter Ξ”
HumanEval (chat-template) 26.8% (44/164) 79.9% (131/164) +53.0pp
HumanEval+ β€” 74.4% (122/164) β€”
HumanEval (multi-pair eval format) 40.9% (67/164) 80.5% (132/164) +39.6pp

The 6.1pp HumanEval β†’ HumanEval+ drop is in the range of strong instruct models (5–8pp typical), not the 15–25pp drop seen for memorization.

Training

  • Method: LoRA (rank 32, q/k/v/o projections), 2 epochs, lr=1e-4, bf16
  • Data: 100 self-mined (broken, fixed) pairs (40 warmup + 60 aggressive-mined), no human data
  • Compute: single H100 80GB, ~95 minutes total, under $4 of RunPod credit

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-14B", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "ranausmans/tinyforge-zero-qwen25-14b-lora")
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-14B")

Citation

@misc{usman2026tinyforgezero,
  title  = {How Far Can an Open Base Model Self-Improve?
            Recipes, Limits, and Test-Time Synergy},
  author = {Rana Usman},
  year   = {2026},
  archivePrefix = {arXiv},
  primaryClass = {cs.AI}
}

Links

Downloads last month
20
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for ranausmans/tinyforge-zero-qwen25-14b-lora

Base model

Qwen/Qwen2.5-14B
Adapter
(48)
this model