	---
	language:
	- en
	license: apache-2.0
	base_model: HuggingFaceTB/SmolLM3-3B
	tags:
	- smollm3
	- lora
	- sft
	- math-reasoning
	- gsm8k
	datasets:
	- HuggingFaceTB/smoltalk2
	pipeline_tag: text-generation
	---

	# SmolLM3-3B-MathReason

	A math-focused LoRA fine-tune of SmolLM3-3B, trained for step-by-step mathematical reasoning and problem solving.

	## Highlights

	📚 Math-First: Trained on ~7K high-quality math and reasoning samples

	🧠 Chain-of-Thought: Supports `/think` mode for detailed reasoning

	⚡ Lightweight: 3B parameters, runs on consumer GPUs

	## Training Details

	| Parameter | Value |
	|-----------|-------|
	| Base Model | HuggingFaceTB/SmolLM3-3B |
	| Method | LoRA (r=16, alpha=32) |
	| Training Data | ~7K samples |
	| - OpenThoughts3_1.2M_think | 5,000 (math reasoning) |
	| - s1k_1.1_think | ~1,000 (high-quality math) |
	| - smoltalk_everyday_convs | 1,000 (everyday reasoning) |
	| Epochs | 2 |
	| Learning Rate | 2e-4 (cosine) |
	| Effective Batch Size | 16 |
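
As a rough cross-check of the table, the sketch below derives the number of optimizer steps and the learning rate at a few points on the schedule. It assumes the approximate sample counts are exact, that the cosine schedule has no warmup and decays to zero, and that the LoRA update is scaled by alpha / r (the standard LoRA convention). None of these details are stated in this card, and the helper names are illustrative.

```python
import math

# Values from the Training Details table. The "~" counts are treated as exact (assumption).
NUM_SAMPLES = 5_000 + 1_000 + 1_000
EFFECTIVE_BATCH_SIZE = 16
EPOCHS = 2
PEAK_LR = 2e-4
LORA_R, LORA_ALPHA = 16, 32


def total_optimizer_steps(num_samples: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps when the last partial batch of each epoch is kept."""
    return math.ceil(num_samples / batch_size) * epochs


def cosine_lr(step: int, total_steps: int, peak_lr: float) -> float:
    """Cosine decay from peak_lr to 0 with no warmup (assumed schedule shape)."""
    progress = step / total_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


total_steps = total_optimizer_steps(NUM_SAMPLES, EFFECTIVE_BATCH_SIZE, EPOCHS)
lora_scale = LORA_ALPHA / LORA_R  # standard LoRA scaling of the low-rank update

print(f"optimizer steps: {total_steps}")
print(f"LoRA scale (alpha / r): {lora_scale}")
for step in (0, total_steps // 2, total_steps):
    print(f"step {step:>4}: lr = {cosine_lr(step, total_steps, PEAK_LR):.2e}")
```

Under these assumptions the run comes to 876 optimizer steps. A warmup phase or a dropped final batch would shift that number slightly.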

	## Usage
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = "real-jiakai/SmolLM3-3B-MathReason"
	model = AutoModelForCausalLM.from_pretrained(model_id)
	tokenizer = AutoTokenizer.from_pretrained(model_id)

	messages = [
	    {"role": "system", "content": "/think"},  # Enable reasoning mode
	    {"role": "user", "content": "A store sells apples for $2 each. If John buys 5 apples and pays with a $20 bill, how much change does he get?"},
	]

	# Render the chat template to a prompt string, then tokenize it.
	formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
	inputs = tokenizer(formatted, return_tensors="pt").to(model.device)

	# temperature only takes effect when sampling is enabled, so set do_sample=True.
	outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```
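
In `/think` mode the reply usually contains a reasoning trace followed by the final answer. The helper below is a minimal sketch for separating the two, assuming the trace is wrapped in `<think>...</think>` tags as in the SmolLM3 chat format and that those tags survive decoding. `split_reasoning` and the sample string are hypothetical; the string is not actual model output.

```python
import re

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer). Reasoning is empty if no <think> block is found."""
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


# Hand-written example string, not real model output.
sample = "<think>5 apples x $2 = $10. $20 - $10 = $10.</think>\nJohn gets $10 in change."
reasoning, answer = split_reasoning(sample)
print(answer)
```

If the decoded text contains no tags, the function returns the whole text as the answer, so it is safe to call on replies generated without `/think`.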

	## Intended Use

	- GSM8K-style math problems
	- Step-by-step problem solving
	- Educational math tutoring
	- Arithmetic and algebra reasoning
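
For GSM8K-style problems, a common way to check answers is to compare the last number in the model's reply with the reference value, which GSM8K stores after a `####` marker. The sketch below assumes that convention. The function names and sample strings are illustrative, and this is not the evaluation used for this model.

```python
import re
from typing import Optional

NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_number(text: str) -> Optional[float]:
    """Return the last number in text as a float, ignoring thousands separators."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def gsm8k_reference(solution: str) -> Optional[float]:
    """Extract the value after the '####' marker of a GSM8K-style solution."""
    _, sep, tail = solution.rpartition("####")
    return last_number(tail) if sep else None


def is_correct(reply: str, solution: str) -> bool:
    """Compare the last number in the reply with the reference value."""
    predicted, expected = last_number(reply), gsm8k_reference(solution)
    return predicted is not None and expected is not None and abs(predicted - expected) < 1e-6


# Hand-written strings for illustration, not real model output or dataset rows.
reply = "5 apples cost 5 * 2 = 10 dollars, so the change is 20 - 10 = 10 dollars."
solution = "The apples cost 5 * 2 = 10.\nThe change is 20 - 10 = 10.\n#### 10"
print(is_correct(reply, solution))
```

Taking the last number is a heuristic: it can misfire when the reply ends with a unit conversion or a restated input, so stricter answer formats may be worth enforcing in the prompt.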

	## Limitations

	- English only
	- May struggle with very complex multi-step problems
	- Not designed for factual knowledge retrieval

	## Training Infrastructure

	- GPU: NVIDIA A100
	- Training Time: ~2 hours
	- Framework: TRL + PEFT