buleyean-qwen2.5-7b-gpu

Buleyean RL -- trained on what is NOT, rather than on positive reinforcement.

No reward model. No chosen examples. The training target is a complement distribution derived from rejection counts alone.

Model Details

| Detail | Value |
| --- | --- |
| Base model | Qwen/Qwen2.5-7B-Instruct |
| Parameters | 7B |
| Fine-tuning | Buleyean RL (LoRA rank 16, alpha 0.7) |
| Data | 5,000 UltraFeedback rejection records (chosen responses discarded) |
| Format | LoRA adapter |
| Hardware | T4 GPU |
| Training steps | 563 |
| Final loss | 1.03 |
| Optimality gap | 0.017 |
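
To use the adapter, attach it to the base model. A minimal loading sketch with `transformers` and `peft` (repo id taken from the model tree; dtype and device settings are assumptions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the Buleyean LoRA adapter on top.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base, "forkjoin-ai/buleyean-qwen2.5-7b-gpu")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
```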

What is Buleyean RL?

For each candidate response i with rejection count v_i, the complement distribution assigns

P(i) = (T - v_i + 1) / sum_j(T - v_j + 1)

where T is an upper bound on the rejection counts: the more often a response was rejected, the less probability mass it receives.
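
A minimal sketch of the computation, taking T to be the maximum rejection count in the batch (an assumption; the card does not pin T down):

```python
def complement_distribution(rejections):
    # P(i) = (T - v_i + 1) / sum_j (T - v_j + 1), with T assumed to be
    # the maximum rejection count so every weight stays positive.
    T = max(rejections)
    weights = [T - v + 1 for v in rejections]
    total = sum(weights)
    return [w / total for w in weights]

# Example: rejection counts [0, 2, 5] -> weights [6, 4, 1] -> [6/11, 4/11, 1/11]
print(complement_distribution([0, 2, 5]))
```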

Three Lean 4 axioms, formalized with zero `sorry`: positivity (every candidate keeps nonzero mass, since each numerator T - v_i + 1 >= 1), normalization (the masses sum to 1 by construction), and monotonicity (more rejections never means more mass).
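
The whitepaper's Lean source is not reproduced here; the following is a rough sketch of how the distribution and the three properties might be stated (all names and the exact formulation are assumptions):

```lean
import Mathlib

-- Hypothetical restatement; not the whitepaper's actual Lean source.
-- The complement distribution over n candidates with rejection counts v.
def P (T : ℚ) {n : ℕ} (v : Fin n → ℚ) (i : Fin n) : ℚ :=
  (T - v i + 1) / (∑ j, T - v j + 1)

-- Positivity: every candidate keeps nonzero mass.
def Positivity (T : ℚ) {n : ℕ} (v : Fin n → ℚ) : Prop :=
  ∀ i, 0 < P T v i

-- Normalization: the masses sum to one.
def Normalization (T : ℚ) {n : ℕ} (v : Fin n → ℚ) : Prop :=
  ∑ i, P T v i = 1

-- Monotonicity: more rejections never means more mass.
def Monotonicity (T : ℚ) {n : ℕ} (v : Fin n → ℚ) : Prop :=
  ∀ i j, v i ≤ v j → P T v j ≤ P T v i
```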

Loss: L = 0.7 * KL(P_bule || P_model) + 0.3 * ContrastLoss
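
A PyTorch sketch of that objective over a batch of candidate sets; the card does not define ContrastLoss, so the margin term below is an assumed stand-in:

```python
import torch.nn.functional as F

def buleyean_loss(model_logits, p_bule, margin=1.0):
    """Sketch of L = 0.7 * KL(P_bule || P_model) + 0.3 * ContrastLoss.

    model_logits: (batch, n_candidates) scores from the model.
    p_bule:       (batch, n_candidates) complement-distribution targets.
    """
    log_p_model = F.log_softmax(model_logits, dim=-1)
    # KL(P_bule || P_model): kl_div takes log-probs as input, probs as target.
    kl = F.kl_div(log_p_model, p_bule, reduction="batchmean")
    # Assumed contrast term: push the least-rejected candidate's log-prob
    # above the most-rejected candidate's by at least `margin`.
    best = log_p_model.gather(-1, p_bule.argmax(dim=-1, keepdim=True))
    worst = log_p_model.gather(-1, p_bule.argmin(dim=-1, keepdim=True))
    contrast = F.relu(margin - (best - worst)).mean()
    return 0.7 * kl + 0.3 * contrast
```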

Key Result

When prompted with "hello" (real output from a SmolLM2-360M GGUF run via llama-cpp-python; a reproduction sketch follows the comparison):

  • Base: hello
  • Buleyean: I'm here to help. What's on your mind?
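
The comparison can be reproduced along these lines with llama-cpp-python (the GGUF filename is a placeholder):

```python
from llama_cpp import Llama

# Load the GGUF build (filename is hypothetical) and send the same prompt.
llm = Llama(model_path="buleyean-smollm2-360m.gguf", n_ctx=2048, verbose=False)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "hello"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```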

Whitepaper

Proof of Life: Bottling Infinity in Distributed Systems -- φ² = φ + 1

500+ Lean 4 theorems. Zero sorry markers. Section 15.29 covers Buleyean RL. Chapter 29 is the full treatment.

