---
datasets:
- ddrg/math_formulas
language:
- en
base_model:
- HuggingFaceTB/SmolLM3-3B
tags:
- maths
- lora
- peft
- bitsandbytes
- small_model
- 4_bit
---
# SmolLM3-3B-Math-Formulas-4bit
## Model Description
**SmolLM3-3B-Math-Formulas-4bit** is a fine-tuned version of [HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) specialized for mathematical formula understanding and generation. The model has been optimized using 4-bit quantization (NF4) with LoRA adapters for efficient training and inference.
- **Base Model**: HuggingFaceTB/SmolLM3-3B
- **Model Type**: Causal Language Model
- **Quantization**: 4-bit NF4 with double quantization
- **Fine-tuning Method**: QLoRA (Quantized Low-Rank Adaptation)
- **Specialization**: Mathematical formulas and expressions
## Training Details
### Dataset
- **Source**: [ddrg/math_formulas](https://huggingface.co/datasets/ddrg/math_formulas)
- **Size**: 1,000 samples (randomly selected from 2.89M total)
- **Content**: Mathematical formulas, equations, and expressions in LaTeX format
### Training Configuration
- **Training Loss**: 0.589 (final)
- **Epochs**: 6
- **Batch Size**: 8 (per device)
- **Learning Rate**: 2.5e-4 with cosine scheduler
- **Max Sequence Length**: 128 tokens
- **Gradient Accumulation**: 2 steps
- **Optimizer**: AdamW with 0.01 weight decay
- **Precision**: FP16
- **LoRA Configuration**:
  - r=4, alpha=8
  - Dropout: 0.1
  - Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
### Hardware & Performance
- **Training Time**: 265 seconds (4.4 minutes)
- **Training Speed**: 5.68 samples/second
- **Total Steps**: 96
- **Memory Efficiency**: 4-bit quantization for reduced VRAM usage
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer
model_name = "sweatSmile/HF-SmolLM3-3B-Math-Formulas-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Generate mathematical content
prompt = "Explain this mathematical formula:"
# Move inputs to the model's device (needed with device_map="auto")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
## Intended Use Cases
- **Mathematical Education**: Explaining mathematical formulas and concepts
- **LaTeX Generation**: Creating properly formatted mathematical expressions
- **Formula Analysis**: Understanding and breaking down complex mathematical equations
- **Mathematical Problem Solving**: Assisting with mathematical computations and derivations
## Limitations
- **Domain Specific**: Optimized primarily for mathematical content
- **Training Data Size**: Fine-tuned on only 1,000 samples
- **Quantization Effects**: 4-bit quantization may introduce minor precision loss
- **Context Length**: Limited to 128 tokens for mathematical expressions
- **Language**: Primarily trained on English mathematical notation
## Performance Metrics
- **Final Training Loss**: 0.589
- **Convergence**: Achieved in 6 epochs (efficient training)
- **Improvement**: 52% lower final loss than the author's initial training configuration
- **Efficiency**: 51% faster training than the initial setup
## Model Architecture
Based on SmolLM3-3B with the following modifications:
- 4-bit NF4 quantization for memory efficiency
- LoRA adapters for parameter-efficient fine-tuning
- Specialized for mathematical formula understanding
## Citation
If you use this model, please cite:
```bibtex
@misc{smollm3-math-formulas-4bit,
  title        = {SmolLM3-3B-Math-Formulas-4bit},
  author       = {sweatSmile},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/sweatSmile/HF-SmolLM3-3B-Math-Formulas-4bit}},
  note         = {QLoRA fine-tune of HuggingFaceTB/SmolLM3-3B on the ddrg/math_formulas dataset}
}
```
## License
This model inherits the license from the base SmolLM3-3B model. Please refer to the original model's license for usage terms.
## Acknowledgments
- **Base Model**: HuggingFace Team for SmolLM3-3B
- **Dataset**: Dresden Database Research Group for the math_formulas dataset
- **Training Framework**: Hugging Face Transformers and TRL libraries
- **Quantization**: bitsandbytes library for 4-bit optimization