CRMA Fine-Tuner: QLoRA + stability adapter (CRMA + ZClip) for TinyLlama, Gemma, Mistral - ablation results inside
Hi PEFT community,
I built a fine-tuning SaaS called CRMA Fine-Tuner that adds a lightweight stability adapter (CRMA) on top of QLoRA. Sharing the ablation results here in case they're useful to anyone working on training stability.
What is CRMA?
CRMA (Constrained Residual Mixing Adapter) is a low-rank adapter that runs alongside LoRA. It uses a Sinkhorn-constrained doubly stochastic mixing matrix to keep gradient dynamics stable during fine-tuning, especially on larger models.
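For anyone who hasn't run into Sinkhorn normalization before, here's a minimal sketch of the constraint itself (the function name, iteration count, and shapes are illustrative, not the actual CRMA code):

```python
import torch

def sinkhorn(logits: torch.Tensor, n_iters: int = 20, eps: float = 1e-8) -> torch.Tensor:
    """Alternate row/column normalization to push a positive matrix
    toward the doubly stochastic set (rows and columns each sum to 1)."""
    m = torch.exp(logits)                            # ensure strict positivity
    for _ in range(n_iters):
        m = m / (m.sum(dim=1, keepdim=True) + eps)   # normalize rows
        m = m / (m.sum(dim=0, keepdim=True) + eps)   # normalize columns
    return m

mix = sinkhorn(torch.randn(4, 4))
print(mix.sum(dim=1))  # ~ tensor([1., 1., 1., 1.])
print(mix.sum(dim=0))  # ~ tensor([1., 1., 1., 1.])
```

Because the mixing matrix lives (approximately) on the doubly stochastic set, routing residual streams through it can't amplify activations or gradients, which is the stability property CRMA relies on.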
Key design choices:
- Low-rank stream projections (rank=4) to keep parameter count competitive with LoRA
- PiSSA initialization for better early learning signal
- ZClip adaptive gradient clipping (arXiv:2504.02507; a sketch follows this list)
- Near-identity initialization so CRMA doesn't disturb pretrained weights at step 0
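The core idea behind ZClip is z-score anomaly detection on the gradient norm, using EMA statistics instead of a fixed clip threshold. Here's a rough sketch of that mechanism as I use it; this is my simplification, so see arXiv:2504.02507 for the paper's exact update rules and defaults:

```python
import torch

class ZScoreGradClipper:
    """Sketch of ZClip-style adaptive clipping: track an EMA of the
    gradient-norm mean/variance and rescale the gradient only when its
    z-score flags it as a spike. Hyperparameters are illustrative."""

    def __init__(self, alpha: float = 0.97, z_thresh: float = 2.5, warmup_steps: int = 25):
        self.alpha, self.z_thresh, self.warmup = alpha, z_thresh, warmup_steps
        self.mean, self.var, self.step = 0.0, 0.0, 0

    def __call__(self, model: torch.nn.Module) -> float:
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.step += 1
        std = max(self.var ** 0.5, 1e-6)
        if self.step > self.warmup:
            z = (norm - self.mean) / std
            if z > self.z_thresh:                  # spike detected: rescale in place
                clipped = self.mean + self.z_thresh * std
                for g in grads:
                    g.mul_(clipped / norm)
                norm = clipped
        # update EMA statistics with the (possibly clipped) norm
        self.mean = self.alpha * self.mean + (1 - self.alpha) * norm
        self.var = self.alpha * self.var + (1 - self.alpha) * (norm - self.mean) ** 2
        return norm
```

It slots in where `torch.nn.utils.clip_grad_norm_` would normally go, between `loss.backward()` and `optimizer.step()`.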
Ablation Results (TinyLlama 1.1B, 200-row Alpaca, seed=42)
Same model, same data, same seed, same hyperparams. Only CRMA on vs off.
| Metric | LoRA only | LoRA + CRMA | Delta |
|---|---|---|---|
| Final loss | 0.1658 | 0.1651 | -0.4% (noise floor) |
| Peak grad norm | 12.15 | 5.75 | -52.7% |
| Mean grad norm | 2.34 | 2.07 | -11.5% |
| Spectral norm (mixing matrix) | n/a | 1.000000 | guaranteed <= 1 |
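The spectral norm row is a structural guarantee, not an empirical one: by Birkhoff's theorem a doubly stochastic matrix is a convex combination of permutation matrices, each with spectral norm 1, so the norm is at most 1; and since every row sums to 1, the all-ones vector is a fixed point, which pins it at exactly 1. A quick sanity check, reusing the illustrative `sinkhorn` helper sketched above:

```python
# Largest singular value of an (approximately) doubly stochastic matrix.
mix = sinkhorn(torch.randn(16, 16))
print(f"{torch.linalg.matrix_norm(mix, ord=2).item():.6f}")  # ~ 1.000000, up to Sinkhorn convergence
```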
Mistral-7B Stability
On Mistral-7B, plain LoRA hit a catastrophic gradient spike at step 43 (grad norm ~263). CRMA held the same step at ~3.0, a 98.9% reduction. The run completed cleanly.
Space
https://huggingface.co/spaces/Fourwheels2512/crma-fine-tuner
Happy to answer questions or discuss the approach. Would love feedback from anyone who has worked on training stability or PEFT methods.