# LFM2.5-1.2B-Meditation
A fine-tuned version of LiquidAI/LFM2.5-1.2B-Thinking trained with a novel meditation phase: self-supervised mathematical introspection inserted between SFT and task RL.
## Training Pipeline
The pipeline runs in three stages: supervised fine-tuning (SFT), a self-supervised meditation RL phase, and task RL.
### What is Meditation?
Given a mathematical concept, the model produces a free-form exploration: restating the concept, probing edge cases, constructing examples and counterexamples, posing and solving novel problems, and synthesizing observations. The exploration is scored by a composite reward.
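The reward components are not enumerated in this card, so the following is only an illustrative sketch of how a composite reward over judge scores could be assembled; the axis names and weights below are hypothetical, not the values used in training.

```python
# Hypothetical sketch of a composite reward over judge-scored axes.
# The actual axes and weights used in training are not specified in this card.
from typing import Dict

AXES = {                      # illustrative axes, each scored in [0, 1]
    "restatement": 0.2,
    "edge_cases": 0.2,
    "examples": 0.2,
    "novel_problems": 0.2,
    "synthesis": 0.2,
}

def composite_reward(judge_scores: Dict[str, float]) -> float:
    """Combine per-axis judge scores into a single scalar reward in [0, 1]."""
    return sum(weight * judge_scores.get(axis, 0.0) for axis, weight in AXES.items())
```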
## Checkpoints
| File | Description |
|---|---|
| sft/sft.zip | SFT checkpoint (LoRA adapter, 42.4MB) |
| meditation_rl/checkpoint-60.tar.gz | Meditation RL step 60 (training in progress) |
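Assuming each archive unpacks to a standard PEFT adapter directory (adapter_config.json plus adapter weights), a checkpoint can be applied on top of the base model roughly as follows; the local paths are illustrative.

```python
# Sketch only: assumes sft.zip has been extracted to ./sft as a PEFT adapter directory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "LiquidAI/LFM2.5-1.2B-Thinking",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2.5-1.2B-Thinking")
model = PeftModel.from_pretrained(base, "./sft")  # path to the extracted adapter
```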
## Training Details
- Base model: LFM2.5-1.2B-Thinking (1.17B dense, hybrid conv+attention)
- Method: QLoRA (4-bit NF4, r=32, alpha=64, 22.2M trainable params)
- LoRA targets: q_proj, k_proj, v_proj, out_proj, in_proj, w1, w2, w3
- GRPO: K=8 generations, LR 5e-7, KL beta 0.04 (mapped to library arguments in the sketch after this list)
- Judge: Gemini 3.1 Pro Preview (paid API)
- Hardware: Google Colab L4 (24GB VRAM, Ada Lovelace)
- Attention: PyTorch SDPA (built-in), bf16 compute
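For reference, the hyperparameters above map onto the bitsandbytes, PEFT, and TRL APIs roughly as shown below. This is a minimal sketch assuming a standard QLoRA + GRPO setup with TRL; the reward function, dataset wiring, and generation settings are omitted, and the output path is illustrative.

```python
# Sketch of the stated hyperparameters in bitsandbytes / PEFT / TRL terms.
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig
from trl import GRPOConfig

bnb_config = BitsAndBytesConfig(              # 4-bit NF4 quantization
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,    # bf16 compute
)

lora_config = LoraConfig(                     # r=32, alpha=64 adapter
    r=32,
    lora_alpha=64,
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj",
                    "in_proj", "w1", "w2", "w3"],
    task_type="CAUSAL_LM",
)

grpo_config = GRPOConfig(                     # GRPO: K=8, LR 5e-7, KL beta 0.04
    output_dir="meditation_rl",               # illustrative output path
    num_generations=8,
    learning_rate=5e-7,
    beta=0.04,                                # KL penalty coefficient
    bf16=True,
)
```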
## Current Metrics (Step 60)
| Metric | Value |
|---|---|
| Reward mean | 0.43 |
| KL divergence | 0.0009 |
| Step time | 77s |
| GPU memory | 3.7 GB used / 22.5 GB available |
## Dataset
Training data: Nirav-Madhani/meditation-math-seeds
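The seed dataset can be pulled directly from the Hugging Face Hub with the `datasets` library; the split name below is an assumption.

```python
# Load the meditation seed prompts from the Hub.
from datasets import load_dataset

seeds = load_dataset("Nirav-Madhani/meditation-math-seeds", split="train")  # "train" split assumed
print(seeds[0])
```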
## Paper
A working paper is available in this repository.
## Model Tree
This model is fine-tuned from LiquidAI/LFM2.5-1.2B-Thinking, which is itself derived from LiquidAI/LFM2.5-1.2B-Base.