---
datasets:
- ddrg/math_formulas
language:
- en
base_model:
- HuggingFaceTB/SmolLM3-3B
tags:
- maths
- lora
- peft
- bitsandbytes
- small_model
- 4_bit
---
# SmolLM3-3B-Math-Formulas-4bit

## Model Description

**SmolLM3-3B-Math-Formulas-4bit** is a fine-tuned version of [HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) specialized for understanding and generating mathematical formulas. It was trained with QLoRA: the base model is loaded with 4-bit NF4 quantization and only small LoRA adapters are trained on top of it. This keeps memory usage low during both training and inference.

- **Base Model**: HuggingFaceTB/SmolLM3-3B
- **Model Type**: Causal Language Model
- **Quantization**: 4-bit NF4 with double quantization
- **Fine-tuning Method**: QLoRA (Quantized Low-Rank Adaptation)
- **Specialization**: Mathematical formulas and expressions
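
As a rough sketch of what "4-bit NF4 with double quantization" means for memory: the QLoRA scheme stores each weight as a 4-bit code plus per-block scaling constants. With blocks of 64 weights and the block constants themselves quantized to 8 bits in second-level blocks of 256, the overhead is 8/64 + 32/(64 × 256) ≈ 0.127 bits per parameter, compared with 0.5 bits for one FP32 constant per block. The function below applies that arithmetic. The ~3B parameter count is an assumption taken from the model name, and it treats every weight as quantized, so it estimates weight storage only.

```python
def nf4_weight_bytes(n_params: int, block_size: int = 64,
                     second_block_size: int = 256,
                     double_quant: bool = True) -> float:
    """Approximate storage for NF4-quantized weights, in bytes."""
    weight_bits = 4.0  # one 4-bit NF4 code per weight
    if double_quant:
        # 8-bit scale per block + one FP32 scale per group of block scales
        overhead_bits = 8 / block_size + 32 / (block_size * second_block_size)
    else:
        # one FP32 absmax scale per block
        overhead_bits = 32 / block_size
    return n_params * (weight_bits + overhead_bits) / 8


# Assumption: ~3e9 parameters, all quantized (idealized estimate)
n = 3_000_000_000
print(f"NF4 + double quant:   {nf4_weight_bytes(n) / 1e9:.2f} GB")
print(f"NF4, no double quant: {nf4_weight_bytes(n, double_quant=False) / 1e9:.2f} GB")
print(f"FP16 reference:       {n * 2 / 1e9:.2f} GB")
```

Under these assumptions the 4-bit weights take roughly a quarter of the FP16 footprint. Activations, the KV cache, and any layers kept in higher precision come on top of this.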

## Training Details

### Dataset
- **Source**: [ddrg/math_formulas](https://huggingface.co/datasets/ddrg/math_formulas)
- **Size**: 1,000 samples (randomly selected from 2.89M total)
- **Content**: Mathematical formulas, equations, and expressions in LaTeX format

### Training Configuration
- **Training Loss**: 0.589 (final)
- **Epochs**: 6
- **Batch Size**: 8 (per device)
- **Learning Rate**: 2.5e-4 with cosine scheduler
- **Max Sequence Length**: 128 tokens
- **Gradient Accumulation**: 2 steps (effective batch size of 16 per device)
- **Optimizer**: AdamW with 0.01 weight decay
- **Precision**: FP16
- **LoRA Configuration**:
  - r=4, alpha=8
  - Dropout: 0.1
  - Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
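
To make the LoRA settings concrete: each adapted linear layer with input size d_in and output size d_out gains two low-rank matrices, A (r × d_in) and B (d_out × r). That adds r·(d_in + d_out) trainable parameters while the original weight stays frozen. The sketch below totals this over the seven target modules. The layer dimensions are assumptions based on the base model's published configuration (hidden size 2048, MLP size 11008, 36 layers, 4 key/value heads of dimension 128); check them against the base model's `config.json` before relying on the result.

```python
# Assumed SmolLM3-3B dimensions (verify against the base model's config.json)
HIDDEN = 2048          # hidden_size
INTERMEDIATE = 11008   # intermediate_size (MLP width)
NUM_LAYERS = 36        # num_hidden_layers
KV_DIM = 4 * 128       # num_key_value_heads * head_dim

R = 4  # LoRA rank from the training configuration


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_in -> d_out linear layer."""
    return r * d_in + d_out * r  # A: r x d_in, B: d_out x r


# (d_in, d_out) for each target module in one decoder layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

per_layer = sum(lora_params(d_in, d_out, R) for d_in, d_out in TARGET_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"LoRA params per layer: {per_layer:,}")
print(f"LoRA params total:     {total:,}")
```

Under these assumed dimensions the adapters add roughly 7.6M trainable parameters, on the order of 0.25% of a 3B-parameter model.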

### Hardware & Performance
- **Training Time**: 265 seconds (4.4 minutes)
- **Training Speed**: 5.68 samples/second
- **Total Steps**: 96
- **Memory Efficiency**: 4-bit quantization for reduced VRAM usage
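
The batch and step figures above relate through simple arithmetic: the effective batch size is the per-device batch times the gradient-accumulation steps, and a cosine scheduler decays the learning rate from its peak toward zero over the total number of optimizer steps. The sketch below assumes a single device and no warmup (neither is stated above) and uses the reported 96 total steps. With zero warmup, the decay has the same shape as the cosine schedule in Hugging Face Transformers. It also cross-checks the step count against the reported training time and throughput.

```python
import math

PER_DEVICE_BATCH = 8
GRAD_ACCUM = 2
NUM_DEVICES = 1        # assumption: single-GPU training
PEAK_LR = 2.5e-4
TOTAL_STEPS = 96       # reported total optimizer steps
EPOCHS = 6

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
steps_per_epoch = TOTAL_STEPS // EPOCHS
examples_seen = TOTAL_STEPS * effective_batch


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS, peak: float = PEAK_LR) -> float:
    """Cosine decay from `peak` at step 0 to 0 at `total_steps` (no warmup)."""
    progress = step / total_steps
    return 0.5 * peak * (1.0 + math.cos(math.pi * progress))


print(f"effective batch size: {effective_batch}")
print(f"steps per epoch:      {steps_per_epoch}")
print(f"examples processed:   {examples_seen}")
print(f"throughput check:     {265 * 5.68:.0f} samples (265 s x 5.68 samples/s)")
for step in (0, 24, 48, 72, 96):
    print(f"step {step:>2}: lr = {cosine_lr(step):.2e}")
```

With these inputs, 96 steps at an effective batch of 16 is 1,536 examples processed, which agrees with the roughly 1,505 samples implied by 265 seconds at 5.68 samples/second.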

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub
model_name = "sweatSmile/HF-SmolLM3-3B-Math-Formulas-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # half precision to reduce memory
    device_map="auto",          # place layers on the available GPU(s)/CPU
)

# Example prompt (the formula is illustrative); inputs must be on the model's device
prompt = r"Explain this mathematical formula: \frac{d}{dx} x^n = n x^{n-1}"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 150 new tokens without tracking gradients
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,  # reuse EOS as the padding token
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Intended Use Cases

- **Mathematical Education**: Explaining mathematical formulas and concepts
- **LaTeX Generation**: Creating properly formatted mathematical expressions
- **Formula Analysis**: Understanding and breaking down complex mathematical equations
- **Mathematical Problem Solving**: Assisting with mathematical computations and derivations

## Limitations

- **Domain Specific**: Optimized primarily for mathematical content
- **Training Data Size**: Fine-tuned on only 1,000 samples
- **Quantization Effects**: 4-bit quantization may introduce minor precision loss
- **Context Length**: Training sequences were truncated to 128 tokens, so longer formulas and multi-step derivations were not covered by fine-tuning
- **Language**: Primarily trained on English mathematical notation

## Performance Metrics

- **Final Training Loss**: 0.589
- **Convergence**: Final loss reached after 6 epochs (96 optimizer steps)
- **Improvement**: 52% loss reduction compared to baseline configuration
- **Efficiency**: 51% faster training compared to initial setup

## Model Architecture

Based on SmolLM3-3B with the following modifications:
- 4-bit NF4 quantization for memory efficiency
- LoRA adapters for parameter-efficient fine-tuning
- Specialized for mathematical formula understanding
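
In equation form, each adapted projection adds a scaled low-rank update to the output of the frozen base weight. With the configuration above (r = 4, α = 8), the scaling factor α/r is 2. This is the standard LoRA formulation as implemented in PEFT, shown as a sketch rather than a description of this checkpoint's internals:

```latex
h = W_0 x + \frac{\alpha}{r} B A x,
\qquad A \in \mathbb{R}^{r \times d_{\text{in}}},\;
B \in \mathbb{R}^{d_{\text{out}} \times r},
\qquad \frac{\alpha}{r} = \frac{8}{4} = 2
```

In QLoRA, $W_0$ is stored in 4-bit NF4 and dequantized to the compute dtype on the fly, while $A$ and $B$ are kept in higher precision and are the only weights updated during training.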

## Citation

If you use this model, please cite:

```bibtex
@misc{smollm3-math-formulas-4bit,
  title={SmolLM3-3B-Math-Formulas-4bit},
  author={sweatSmile},
  year={2025},
  howpublished={\url{https://huggingface.co/sweatSmile/HF-SmolLM3-3B-Math-Formulas-4bit}},
  note={QLoRA fine-tuning of HuggingFaceTB/SmolLM3-3B with 4-bit quantization on the ddrg/math\_formulas dataset}
}
```

## License

This model inherits the license from the base SmolLM3-3B model. Please refer to the original model's license for usage terms.

## Acknowledgments

- **Base Model**: HuggingFace Team for SmolLM3-3B
- **Dataset**: Dresden Database Research Group for the math_formulas dataset
- **Training Framework**: Hugging Face Transformers and TRL libraries
- **Quantization**: bitsandbytes library for 4-bit optimization