---
license: apache-2.0
base_model: HuggingFaceTB/SmolLM-135M
datasets:
- openai/gsm8k
- meta-math/MetaMathQA
- AI-MO/NuminaMath-1.5
tags:
- math
- reasoning
- efficient-training
- cggr
- sparse-gradients
model_name: SmolLM-135M-CGGR-Math
---

# SmolLM-135M-CGGR-Math

This model is a specialized version of **HuggingFaceTB/SmolLM-135M**, fine-tuned for mathematical reasoning using **Confidence-Gated Gradient Routing (CGGR)**.

## 🚀 The CGGR Breakthrough

This model was trained with a strategy that routes gradient updates only to the "hardest" tokens in each batch, allowing for:

- **4.08x Higher Throughput:** Processing 4x more data in the same wall-clock time compared to standard training.
- **66% VRAM Savings:** Fitting large-batch training on consumer hardware (RTX 3060).
- **Superior Convergence:** Achieving a **+19% relative accuracy improvement** on math reasoning tasks (AIME 2024) compared to standard fine-tuning.

### Benchmark Results (6-Hour Training Race)

| Metric                      | Standard (Baseline) | CGGR (Ours)        | Relative Gain     |
| :-------------------------- | :------------------ | :----------------- | :---------------- |
| **Solving Accuracy (AIME)** | 8.00%               | **9.50%**          | **+18.75%**       |
| **Training Throughput**     | 14,368 samples      | **58,716 samples** | **+308%**         |
| **Final Loss**              | 0.3610              | **0.0980**         | **-73% Loss**     |
| **Max Batch Size (12GB)**   | 18                  | **69**             | **3.8x Capacity** |

## 📈 Performance Visuals

![Benchmark Dashboard](https://huggingface.co/MinimaML/SmolLM-135M-CGGR-Math/resolve/main/benchmark_dashboard.png)

## Model Details

- **Architecture:** Transformer Decoder (SmolLM-135M)
- **Training Method:** CGGR (Confidence-Gated Gradient Routing)
- **Selection Strategy:** Fixed Quota (Top 25% hardest tokens)
- **Compute:** Trained on a single NVIDIA RTX 3060 (12GB)
- **Duration:** 6 Total Hours

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "MinimaML/SmolLM-135M-CGGR-Math"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Question: If x + y = 10 and x - y = 2, what is the value of x?\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Citation

If you use this model or the CGGR technique in your research, please cite:

```bibtex
@software{cggr2026,
  title={CGGR: Confidence-Gated Gradient Routing},
  author={MinimaML},
  year={2026},
  url={https://github.com/MinimaML/CGGR}
}
```
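For illustration, the fixed-quota selection described under Model Details (keeping only the top 25% hardest tokens, ranked by per-token loss) can be sketched roughly as below. This is a simplified assumption of how such a loss might look, not the released CGGR implementation: the real method's VRAM savings come from routing gradients away from easy tokens, which this plain loss-masking sketch does not reproduce.

```python
import torch
import torch.nn.functional as F

def cggr_loss(logits, labels, quota=0.25):
    """Cross-entropy over only the top `quota` fraction of hardest tokens.

    Illustrative sketch of fixed-quota hard-token selection; `quota=0.25`
    mirrors the "Top 25% hardest tokens" strategy named in this card.
    """
    # Per-token cross-entropy with no reduction, so every token keeps its own loss.
    per_token = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        reduction="none",
        ignore_index=-100,
    )
    # Drop padding / ignored positions before ranking.
    losses = per_token[labels.view(-1) != -100]
    # Keep only the top `quota` fraction of hardest (highest-loss) tokens.
    k = max(1, int(quota * losses.numel()))
    return torch.topk(losses, k).values.mean()
```

With `quota=1.0` this reduces to ordinary mean cross-entropy; smaller quotas concentrate the update signal on the tokens the model is least confident about.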