# LFM2.5-1.2B-Meditation
A fine-tuned version of LiquidAI/LFM2.5-1.2B-Thinking trained with a novel meditation phase: self-supervised mathematical introspection inserted between SFT and task RL.
## Training Pipeline
The pipeline runs in three stages: supervised fine-tuning (SFT), a self-supervised meditation RL phase, and task RL.
### What is Meditation?
Given a mathematical concept, the model produces a free-form exploration: restating the concept, probing edge cases, constructing examples and counterexamples, posing and solving novel problems, and synthesizing observations. The exploration is scored by a composite reward.
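The reward components are not enumerated in this card, so the following is only an illustrative sketch of how a composite reward over judge scores could be assembled; the axis names and weights below are hypothetical, not the values used in training.

```python
# Hypothetical sketch of a composite reward over judge-scored axes.
# The actual axes and weights used in training are not specified in this card.
from typing import Dict

AXES = {                      # illustrative axes, each scored in [0, 1]
    "restatement": 0.2,
    "edge_cases": 0.2,
    "examples": 0.2,
    "novel_problems": 0.2,
    "synthesis": 0.2,
}

def composite_reward(judge_scores: Dict[str, float]) -> float:
    """Combine per-axis judge scores into a single scalar reward in [0, 1]."""
    return sum(weight * judge_scores.get(axis, 0.0) for axis, weight in AXES.items())
```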
## Checkpoints
| File | Description |
|---|---|
| sft/sft.zip | SFT checkpoint (LoRA adapter, 42.4MB) |
| meditation_rl/checkpoint-60.tar.gz | Meditation RL step 60 (training in progress) |
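Assuming each archive unpacks to a standard PEFT adapter directory (adapter_config.json plus adapter weights), a checkpoint can be applied on top of the base model roughly as follows; the local paths are illustrative.

```python
# Sketch only: assumes sft.zip has been extracted to ./sft as a PEFT adapter directory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "LiquidAI/LFM2.5-1.2B-Thinking",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2.5-1.2B-Thinking")
model = PeftModel.from_pretrained(base, "./sft")  # path to the extracted adapter
```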
## Training Details
- Base model: LFM2.5-1.2B-Thinking (1.17B dense, hybrid conv+attention)
- Method: QLoRA (4-bit NF4, r=32, alpha=64, 22.2M trainable params)
- LoRA targets: q_proj, k_proj, v_proj, out_proj, in_proj, w1, w2, w3
- GRPO: K=8 generations, LR 5e-7, KL beta 0.04 (mapped to library arguments in the sketch after this list)
- Judge: Gemini 3.1 Pro Preview (paid API)
- Hardware: Google Colab L4 (24GB VRAM, Ada Lovelace)
- Attention: PyTorch SDPA (built-in), bf16 compute
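For reference, the hyperparameters above map onto the bitsandbytes, PEFT, and TRL APIs roughly as shown below. This is a minimal sketch assuming a standard QLoRA + GRPO setup with TRL; the reward function, dataset wiring, and generation settings are omitted, and the output path is illustrative.

```python
# Sketch of the stated hyperparameters in bitsandbytes / PEFT / TRL terms.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig
from trl import GRPOConfig

bnb_config = BitsAndBytesConfig(              # 4-bit NF4 quantization
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,    # bf16 compute
)

lora_config = LoraConfig(                     # r=32, alpha=64 adapter
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj",
                    "in_proj", "w1", "w2", "w3"],
    task_type="CAUSAL_LM",
)

grpo_config = GRPOConfig(                     # GRPO: K=8, LR 5e-7, KL beta 0.04
    output_dir="meditation_rl",               # illustrative output path
    num_generations=8,
    learning_rate=5e-7,
    beta=0.04,                                # KL penalty coefficient
    bf16=True,
)
```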
## Current Metrics (Step 60)
| Metric | Value |
|---|---|
| Reward mean | 0.43 |
| KL divergence | 0.0009 |
| Step time | 77s |
| GPU memory | 3.7 GB used / 22.5 GB available |
## Dataset
Training data: Nirav-Madhani/meditation-math-seeds
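The seed dataset can be pulled directly from the Hugging Face Hub with the `datasets` library; the split name below is an assumption.

```python
# Load the meditation seed prompts from the Hub.
from datasets import load_dataset

seeds = load_dataset("Nirav-Madhani/meditation-math-seeds", split="train")  # "train" split assumed
print(seeds[0])
```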
## Paper
A working paper is available in this repository.
## Model Tree
This model is fine-tuned from LiquidAI/LFM2.5-1.2B-Thinking, which is itself derived from LiquidAI/LFM2.5-1.2B-Base.