buleyean-smollm2-360m

Buleyean RL -- trained on what is NOT, rather than on positive reinforcement.

No reward model. No chosen examples. The training target is the complement distribution derived from rejection counts alone.

Model Details

Base Model: HuggingFaceTB/SmolLM2-360M-Instruct
Parameters: 360M
Fine-tuning: Buleyean RL (LoRA rank 16, alpha 0.7)
Data: 5,000 UltraFeedback rejection records (chosen examples discarded)
Format: GGUF
Hardware: CPU
Steps: 1,125
Final Loss: 0.89
Optimality Gap: 0.018
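
A minimal sketch of the adapter configuration using the standard peft API; only the rank and alpha come from the table above, and everything else (target modules, task type) is an assumption:

  from peft import LoraConfig

  # Hypothetical reconstruction of the settings listed above. Only
  # r=16 and lora_alpha=0.7 come from the card; target_modules is an
  # assumption (typical attention projections for a Llama-style model).
  lora_config = LoraConfig(
      r=16,
      lora_alpha=0.7,  # unusually low alpha, as stated in the card
      target_modules=["q_proj", "v_proj"],  # assumed, not from the card
      task_type="CAUSAL_LM",
  )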

What is Buleyean RL?

P(i) = (T - v_i + 1) / sum_j(T - v_j + 1)
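
A minimal numeric sketch of this distribution, assuming v_i is the rejection count of candidate i and T = max_j v_j (the card does not define the symbols, so both readings are assumptions):

  # Complement distribution from rejection counts alone.
  def complement_distribution(v: list[int]) -> list[float]:
      T = max(v)  # assumed: T is the maximum rejection count
      weights = [T - vi + 1 for vi in v]  # +1 keeps every weight positive
      total = sum(weights)
      return [w / total for w in weights]

  # The most-rejected candidate receives the least mass:
  print(complement_distribution([0, 2, 5]))  # [6/11, 4/11, 1/11] ~ [0.545, 0.364, 0.091]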

Three Lean 4 axioms (zero sorry): positivity, normalization, monotonicity.
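
A hedged Lean 4 restatement of what those three axioms plausibly say for the distribution above; the names and formulation here are illustrative, and the verified zero-sorry statements live in the whitepaper's development:

  import Mathlib

  structure BuleyeanAxioms {n : ℕ} (P : Fin n → ℚ) (v : Fin n → ℕ) : Prop where
    positivity    : ∀ i, 0 < P i                    -- every candidate keeps some mass
    normalization : ∑ i, P i = 1                    -- P is a probability distribution
    monotonicity  : ∀ i j, v i ≤ v j → P j ≤ P i    -- more rejections, less mass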

Loss: L = 0.7 * KL(P_bule || P_model) + 0.3 * ContrastLoss
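
A sketch of that loss in PyTorch. The KL term follows from the formula; ContrastLoss is named but not defined in the card, so it is passed in precomputed as a placeholder:

  import torch
  import torch.nn.functional as F

  def buleyean_loss(model_logits: torch.Tensor,
                    p_bule: torch.Tensor,
                    contrast_loss: torch.Tensor) -> torch.Tensor:
      # F.kl_div(input=log q, target=p) computes KL(p || q),
      # so this is KL(P_bule || P_model) as stated above.
      log_p_model = F.log_softmax(model_logits, dim=-1)
      kl = F.kl_div(log_p_model, p_bule, reduction="batchmean")
      return 0.7 * kl + 0.3 * contrast_loss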

Key Result

When prompted with "hello" (real outputs from the SmolLM2-360M GGUF via llama-cpp-python; a reproduction sketch follows the list):

  • Base: hello
  • Buleyean: I'm here to help. What's on your mind?
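
A minimal reproduction sketch with llama-cpp-python; the GGUF filename below is hypothetical, and since the card does not give sampling settings, outputs may vary:

  from llama_cpp import Llama

  # Hypothetical local path; download the GGUF from the repo first.
  llm = Llama(model_path="buleyean-smollm2-360m.gguf", verbose=False)

  out = llm.create_chat_completion(
      messages=[{"role": "user", "content": "hello"}],
  )
  print(out["choices"][0]["message"]["content"])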

Whitepaper

Proof of Life: Bottling Infinity in Distributed Systems -- φ² = φ + 1

500+ Lean 4 theorems. Zero sorry markers. Section 15.29 covers Buleyean RL. Chapter 29 is the full treatment.
