# SmolLM3-3B-Math-Formulas-4bit

## Model Description

**SmolLM3-3B-Math-Formulas-4bit** is a fine-tuned version of [HuggingFaceTB/SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) specialized for mathematical formula understanding and generation. The model was optimized with 4-bit NF4 quantization and LoRA adapters for efficient training and inference.

- **Base Model**: HuggingFaceTB/SmolLM3-3B
- **Model Type**: Causal Language Model
- **Quantization**: 4-bit NF4 with double quantization
- **Fine-tuning Method**: QLoRA (Quantized Low-Rank Adaptation)
- **Specialization**: Mathematical formulas and expressions

## Training Details

### Dataset

- **Source**: [ddrg/math_formulas](https://huggingface.co/datasets/ddrg/math_formulas)
- **Size**: 1,000 samples, randomly selected from the dataset's 2.89M formulas
- **Content**: Mathematical formulas, equations, and expressions in LaTeX format

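The 1,000-of-2.89M selection amounts to a seeded random sample. A self-contained sketch using a stand-in corpus (the placeholder formulas and the seed are illustrative; the real data would come from `load_dataset("ddrg/math_formulas")`):

```python
import random

# Stand-in corpus of LaTeX formulas; in practice this would be the
# ddrg/math_formulas training split loaded via the datasets library
corpus = [rf"\frac{{a_{{{i}}}}}{{b_{{{i}}}}}" for i in range(10_000)]

# Draw a reproducible 1,000-sample subset (seed value is illustrative)
rng = random.Random(42)
subset = rng.sample(corpus, k=1_000)
print(len(subset))  # 1000
```
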
### Training Configuration

- **Final Training Loss**: 0.589
- **Epochs**: 6
- **Batch Size**: 8 per device
- **Learning Rate**: 2.5e-4 with a cosine scheduler
- **Max Sequence Length**: 128 tokens
- **Gradient Accumulation**: 2 steps
- **Optimizer**: AdamW with 0.01 weight decay
- **Precision**: FP16
- **LoRA Configuration**:
  - r=4, alpha=8
  - Dropout: 0.1
  - Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

### Hardware & Performance

- **Training Time**: 265 seconds (about 4.4 minutes)
- **Training Speed**: 5.68 samples/second
- **Total Steps**: 96
- **Memory Efficiency**: 4-bit quantization reduces VRAM usage during training and inference

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the model and tokenizer
model_name = "sweatSmile/HF-SmolLM3-3B-Math-Formulas-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Generate mathematical content
prompt = "Explain this mathematical formula: E = mc^2"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=150,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Intended Use Cases

- **Mathematical Education**: Explaining mathematical formulas and concepts
- **LaTeX Generation**: Creating properly formatted mathematical expressions
- **Formula Analysis**: Understanding and breaking down complex mathematical equations
- **Mathematical Problem Solving**: Assisting with computations and derivations

## Limitations

- **Domain Specific**: Optimized primarily for mathematical content
- **Training Data Size**: Fine-tuned on only 1,000 samples, which limits coverage of the formula space
- **Quantization Effects**: 4-bit quantization may introduce minor precision loss
- **Context Length**: Fine-tuned on sequences of at most 128 tokens, so quality may degrade on longer expressions
- **Language**: Primarily trained on English mathematical notation

## Performance Metrics

- **Final Training Loss**: 0.589
- **Convergence**: Reached in 6 epochs
- **Improvement**: 52% lower final loss than the baseline configuration
- **Efficiency**: 51% faster training than the initial setup

## Model Architecture

Based on SmolLM3-3B with the following modifications:

- 4-bit NF4 quantization for memory efficiency
- LoRA adapters for parameter-efficient fine-tuning
- Specialization for mathematical formula understanding

## Citation

If you use this model, please cite:

```bibtex
@misc{smollm3-math-formulas-4bit,
  title  = {SmolLM3-3B-Math-Formulas-4bit},
  author = {sweatSmile},
  year   = {2025},
  note   = {QLoRA fine-tuning of HuggingFaceTB/SmolLM3-3B on ddrg/math_formulas with 4-bit NF4 quantization}
}
```

## License

This model inherits the license of the base SmolLM3-3B model. Please refer to the original model's license for usage terms.

## Acknowledgments

- **Base Model**: Hugging Face team for SmolLM3-3B
- **Dataset**: Dresden Database Research Group for the math_formulas dataset
- **Training Framework**: Hugging Face Transformers and TRL libraries
- **Quantization**: bitsandbytes library for 4-bit optimization