Buleyean Qwen2.5-32B

Trained from rejection alone. No reward model. No chosen examples. The Buleyean complement distribution derived from rejection counts IS the training target.

What is Buleyean RL?

Standard RLHF/DPO learns what to say by imitating chosen completions. Buleyean RL learns what not to say by studying rejections. The complement distribution preserves the (K-1) rejected perspectives, producing outputs that reflect the full rejection boundary rather than a single selected mode.

The theoretical foundation is mechanized in 500+ Lean 4 theorems with zero uses of sorry (Lean's placeholder for an unfinished proof), covering four properties:

Positivity: Every option retains strictly positive weight (the +1 sliver)
Concentration: Less-rejected options receive higher weight
Dominance: The failure set carries (N-1) bits vs 1 bit for selection
Convergence: Same rejection history produces same distribution
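
The first, second, and fourth properties above can be illustrated with a small sketch. This card does not state the exact formula for the complement distribution, so the weighting used here (total rejections minus an option's own rejections, plus the +1 sliver) is an assumption chosen only because it satisfies those properties; the function name complement_distribution is hypothetical and is not taken from the buleyean-rl repository.

```python
from typing import List


def complement_distribution(rejections: List[int]) -> List[float]:
    """Turn per-option rejection counts into a probability distribution.

    ASSUMED formula (not confirmed by the model card):
        weight_i = total_rejections - rejections_i + 1
    The "+1" keeps every weight strictly positive, even for an option
    that absorbed every rejection.
    """
    if not rejections:
        raise ValueError("need at least one option")
    if any(r < 0 for r in rejections):
        raise ValueError("rejection counts must be non-negative")

    total = sum(rejections)
    weights = [total - r + 1 for r in rejections]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]


# Three options rejected 0, 2, and 1 times respectively.
dist = complement_distribution([0, 2, 1])
```

Under this assumed weighting the option with no rejections gets the largest share, the most-rejected option still keeps nonzero mass, and calling the function twice on the same counts returns the same distribution.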

Training Details

Parameter	Value
Base model	Qwen/Qwen2.5-32B-Instruct
Method	QLoRA (4-bit NF4, double quantization)
LoRA rank	16 (alpha 32)
LoRA targets	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Loss	Buleyean complement KL divergence (sparse)
Alpha	0.7
Training data	Rejection-only (converted from UltraFeedback, chosen discarded)
Curriculum	Void curriculum (rejection_density weighting)
Steps	563
Training time	62 minutes (A100 80GB)
Final loss	0.852
Optimality gap	1.9%

Usage

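from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in its checkpoint dtype (instead of the float32 default)
# and spread it across the available devices.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-32B-Instruct", torch_dtype="auto", device_map="auto"
)
# Attach the Buleyean LoRA adapter, then fold its weights into the base model.
model = PeftModel.from_pretrained(base, "forkjoin-ai/buleyean-qwen2.5-32b")
model = model.merge_and_unload()

# The adapter reuses the base model's tokenizer.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-32B-Instruct")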

Citation

@misc{buley2026buleyean,
  title={Buleyean Reinforcement Learning: Training from Rejection Alone},
  author={Taylor Buley},
  year={2026},
  url={https://github.com/forkjoin-ai/buleyean-rl}
}
