Buleyean Qwen2.5-32B
Trained from rejection alone. No reward model. No chosen examples. The Buleyean complement distribution derived from rejection counts IS the training target.
What is Buleyean RL?
Standard RLHF and DPO learn what to say from chosen completions. Buleyean RL learns what not to say by studying rejections. Given K candidate options, the complement distribution preserves the K-1 rejected perspectives, producing outputs that reflect the full rejection boundary rather than a single selected mode.
The theoretical foundation is mechanized in 500+ Lean 4 theorems with zero sorry placeholders (no unproven steps):
- Positivity: Every option retains strictly positive weight (the +1 sliver)
- Concentration: Less-rejected options receive higher weight
- Dominance: The failure set carries (N-1) bits vs 1 bit for selection
- Convergence: Same rejection history produces same distribution
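The four properties above can be illustrated with a small sketch. The weight formula used here (total rejections minus an option's own rejections, plus one) is an assumption chosen only because it satisfies the listed properties; the authoritative definition lives in the training library and the Lean proofs. The names `complement_distribution` and `kl_divergence` are hypothetical, and the KL direction (target against model) is likewise assumed.

```python
from math import log


def complement_distribution(rejections):
    """Turn per-option rejection counts into a normalized target distribution.

    ASSUMED form (not taken from the source): weight_i = total - count_i + 1.
    The "+1" keeps every weight strictly positive, even for the
    most-rejected option.
    """
    if not rejections or any(c < 0 for c in rejections):
        raise ValueError("need a non-empty list of non-negative counts")
    total = sum(rejections)
    weights = [total - c + 1 for c in rejections]
    norm = sum(weights)
    return [w / norm for w in weights]


def kl_divergence(target, model):
    """KL(target || model) for two discrete distributions of equal length."""
    return sum(t * log(t / m) for t, m in zip(target, model) if t > 0)


# Four options with hypothetical rejection counts.
counts = [0, 3, 1, 6]
target = complement_distribution(counts)

# Positivity: every option keeps some weight.
assert all(p > 0 for p in target)
# Concentration: fewer rejections means more weight.
assert target[0] > target[2] > target[1] > target[3]
# Convergence: the same rejection history gives the same distribution.
assert target == complement_distribution(counts)

# A model that matches the target incurs zero loss; a uniform model does not.
uniform = [0.25] * 4
print(kl_divergence(target, target), kl_divergence(target, uniform) > 0)
```

Under this assumed formula the most-rejected option still receives weight 5/34 rather than zero, which is the behavior the Positivity bullet describes.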
Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-32B-Instruct |
| Method | QLoRA (4-bit NF4, double quantization) |
| LoRA rank | 16 (alpha 32) |
| LoRA targets | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Loss | Buleyean complement KL divergence (sparse) |
| Alpha | 0.7 |
| Training data | Rejection-only (converted from UltraFeedback, chosen discarded) |
| Curriculum | Void curriculum (rejection_density weighting) |
| Steps | 563 |
| Training time | 62 minutes (A100 80GB) |
| Final loss | 0.852 |
| Optimality gap | 1.9% |
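For readers who want to reproduce the adapter setup, the quantization and LoRA rows of the table map onto `transformers` and `peft` configuration objects roughly as follows. This is a sketch, not the author's training script: only the values that appear in the table are set, `task_type` is an assumption, and everything else (dropout, compute dtype, optimizer, the Buleyean loss itself) is left out because the table does not state it. The helper name `build_configs` is hypothetical.

```python
# Values copied from the Training Details table.
QUANT_KWARGS = {
    "load_in_4bit": True,               # QLoRA: 4-bit base weights
    "bnb_4bit_quant_type": "nf4",       # NF4 quantization
    "bnb_4bit_use_double_quant": True,  # double quantization
}

LORA_KWARGS = {
    "r": 16,           # LoRA rank
    "lora_alpha": 32,  # LoRA alpha (distinct from the loss "Alpha" of 0.7)
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # ASSUMPTION: not stated in the table
}


def build_configs():
    """Instantiate the library config objects.

    Imports are deferred so the dictionaries above can be inspected
    without transformers, peft, or bitsandbytes installed.
    """
    from transformers import BitsAndBytesConfig
    from peft import LoraConfig

    return BitsAndBytesConfig(**QUANT_KWARGS), LoraConfig(**LORA_KWARGS)
```

The quantization config would typically be passed as `quantization_config=` to `AutoModelForCausalLM.from_pretrained`, and the LoRA config to `peft.get_peft_model`, before training with the rejection-only loss.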
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-32B-Instruct", device_map="auto", torch_dtype="auto")
model = PeftModel.from_pretrained(base, "forkjoin-ai/buleyean-qwen2.5-32b")
model = model.merge_and_unload()
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct")
Links
- Training library
- Training data
- Paper: "Proof of Life" -- Being Irreversible
- Live demo: The Void
- Colab notebook
Citation
@misc{buley2026buleyean,
title={Buleyean Reinforcement Learning: Training from Rejection Alone},
author={Taylor Buley},
year={2026},
url={https://github.com/forkjoin-ai/buleyean-rl}
}