---
language:
- en
license: apache-2.0
base_model: HuggingFaceTB/SmolLM3-3B
tags:
- smollm3
- lora
- sft
- math-reasoning
- gsm8k
datasets:
- HuggingFaceTB/smoltalk2
pipeline_tag: text-generation
---

# SmolLM3-3B-MathReason

A math-focused fine-tune of SmolLM3-3B, optimized for step-by-step mathematical reasoning and problem solving.

## Highlights

- 📚 **Math-First**: Trained on ~7K high-quality math and reasoning samples
- 🧠 **Chain-of-Thought**: Supports `/think` mode for detailed reasoning
- ⚡ **Lightweight**: 3B parameters, runs on consumer GPUs

## Training Details

| Parameter | Value |
|-----------|-------|
| Base Model | HuggingFaceTB/SmolLM3-3B |
| Method | LoRA (r=16, alpha=32) |
| Training Data | ~7K samples |
| - OpenThoughts3_1.2M_think | 5,000 (math reasoning) |
| - s1k_1.1_think | ~1,000 (high-quality math) |
| - smoltalk_everyday_convs | 1,000 (everyday reasoning) |
| Epochs | 2 |
| Learning Rate | 2e-4 (cosine) |
| Effective Batch Size | 16 |

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("real-jiakai/SmolLM3-3B-MathReason")
tokenizer = AutoTokenizer.from_pretrained("real-jiakai/SmolLM3-3B-MathReason")

messages = [
    {"role": "system", "content": "/think"},  # Enable reasoning mode
    {"role": "user", "content": "A store sells apples for $2 each. If John buys 5 apples and pays with a $20 bill, how much change does he get?"},
]

# Render the chat template to a prompt string, then tokenize it
formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(formatted, return_tensors="pt")

# do_sample=True is set explicitly so that temperature is applied
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Intended Use

- GSM8K-style math problems
- Step-by-step problem solving
- Educational math tutoring
- Arithmetic and algebra reasoning

## Limitations

- English only
- May struggle with very complex multi-step problems
- Not designed for factual knowledge retrieval

## Training Infrastructure

- GPU: NVIDIA A100
- Training Time: ~2 hours
- Framework: TRL + PEFT
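
The numbers in the Training Details table imply a rough training schedule, which the sketch below works out in plain Python. It is an illustration under stated assumptions, not the actual training script: the per-device batch size of 4 is hypothetical (the card gives only the effective batch size of 16), the cosine schedule is assumed to decay to zero with no warmup, and the real step count may differ slightly depending on how the trainer handles packing or the last partial batch.

```python
import math

# Values taken from the Training Details table.
num_samples = 5_000 + 1_000 + 1_000  # approximate dataset mix (~7K)
epochs = 2
effective_batch_size = 16
lora_r, lora_alpha = 16, 32
peak_lr = 2e-4

# Hypothetical split: the card does not say how the batch of 16 is reached.
per_device_batch_size = 4
grad_accum_steps = effective_batch_size // per_device_batch_size  # 4

# Optimizer steps, assuming the last partial batch is kept.
steps_per_epoch = math.ceil(num_samples / effective_batch_size)  # 438
total_steps = steps_per_epoch * epochs  # 876

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = lora_alpha / lora_r  # 2.0


def cosine_lr(step: int, total: int, peak: float = peak_lr) -> float:
    """Cosine decay from peak to 0 (assumes no warmup; the card states none)."""
    return 0.5 * peak * (1 + math.cos(math.pi * step / total))


print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
print(f"LoRA scaling (alpha / r): {lora_scaling}")
print(f"learning rate at start: {cosine_lr(0, total_steps):.1e}")
```

Under these assumptions, the run takes about 876 optimizer steps, and the LoRA update is applied with a scaling factor of 2.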