LFM2.5-1.2B-Meditation

A fine-tuned version of LiquidAI/LFM2.5-1.2B-Thinking trained with a novel meditation phase: self-supervised mathematical introspection inserted between SFT and task RL.

Training Pipeline

What is Meditation?

Given a mathematical concept, the model produces a free-form exploration: restating the concept, probing edge cases, constructing examples and counterexamples, posing and solving novel problems, and synthesizing observations. The exploration is scored by a composite reward.
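
The composite reward could be sketched as a weighted sum of per-axis judge scores. The axis names and weights below are illustrative assumptions mirroring the exploration stages listed above, not the actual reward used in training:

```python
# Hypothetical composite meditation reward. The component names and
# weights are ASSUMPTIONS for illustration; the real reward components
# are not specified in this card.

WEIGHTS = {
    "restatement": 0.15,  # faithfulness of the concept restatement
    "edge_cases":  0.20,  # quality of boundary-condition probing
    "examples":    0.20,  # constructed examples / counterexamples
    "novelty":     0.25,  # posing and solving a genuinely new problem
    "synthesis":   0.20,  # coherence of the closing synthesis
}

def composite_reward(judge_scores: dict[str, float]) -> float:
    """Weighted sum of per-axis judge scores, each assumed in [0, 1]."""
    assert set(judge_scores) == set(WEIGHTS)
    return sum(WEIGHTS[k] * judge_scores[k] for k in WEIGHTS)

scores = {"restatement": 0.9, "edge_cases": 0.5, "examples": 0.6,
          "novelty": 0.3, "synthesis": 0.7}
print(round(composite_reward(scores), 3))  # → 0.57
```

A weighted sum like this makes each exploration stage contribute independently, so a rollout that skips a stage is penalized only on that axis.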

Checkpoints

  • sft/sft.zip: SFT checkpoint (LoRA adapter, 42.4 MB)
  • meditation_rl/checkpoint-60.tar.gz: Meditation RL checkpoint, step 60 (training in progress)

Training Details

  • Base model: LFM2.5-1.2B-Thinking (1.17B dense, hybrid conv+attention)
  • Method: QLoRA (4-bit NF4, r=32, alpha=64, 22.2M trainable params)
  • LoRA targets: q_proj, k_proj, v_proj, out_proj, in_proj, w1, w2, w3
  • GRPO: K=8 generations, LR 5e-7, KL beta 0.04
  • Judge: Gemini 3.1 Pro Preview (paid API)
  • Hardware: Google Colab L4 (24GB VRAM, Ada Lovelace)
  • Attention: PyTorch SDPA (built-in), bf16 compute
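
The GRPO setting above (K=8 generations) normalizes rewards within each group of sampled completions. A minimal sketch of the standard group-relative advantage, in plain Python with illustrative reward values:

```python
# Group-relative advantage as in standard GRPO: sample K completions per
# prompt, score each, and normalize rewards within the group. The reward
# values below are illustrative, not real training data.

def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """A_i = (r_i - mean(r)) / (std(r) + eps) over one group of K samples."""
    k = len(rewards)
    mean = sum(rewards) / k
    var = sum((r - mean) ** 2 for r in rewards) / k
    return [(r - mean) / (var ** 0.5 + eps) for r in rewards]

# One prompt, K = 8 sampled meditations (hypothetical composite rewards)
rewards = [0.2, 0.5, 0.4, 0.9, 0.1, 0.6, 0.3, 0.4]
advs = group_advantages(rewards)
print([round(a, 2) for a in advs])  # zero-mean within the group
```

Because the baseline is the group mean, no separate value model is needed; only completions above their own group's average receive positive advantage.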

Current Metrics (Step 60)

  • Reward mean: 0.43
  • KL divergence: 0.0009
  • Step time: 77 s
  • GPU memory: 3.7 / 22.5 GB
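
The KL term (beta = 0.04 above) keeps the policy near its reference model; the reported KL of ~0.0009 indicates the policy has drifted very little. A sketch of the per-token "k3" estimator commonly used in TRL-style GRPO (assumed here; this card does not state which estimator was used):

```python
import math

BETA = 0.04  # KL coefficient from the training config above

def kl_k3(logp_policy: float, logp_ref: float) -> float:
    """Per-token k3 KL estimate: exp(d) - d - 1 with d = logp_ref - logp_policy.

    Always >= 0, and unbiased for KL(policy || ref) in expectation.
    """
    d = logp_ref - logp_policy
    return math.exp(d) - d - 1.0

# Tiny log-prob gap, consistent in magnitude with the reported KL
kl = kl_k3(-1.20, -1.25)
print(kl)          # small positive estimate
print(BETA * kl)   # per-token penalty added to the loss
```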

Dataset

Training data: Nirav-Madhani/meditation-math-seeds

Paper

A working paper is available in the repository.
