AyoubChLin committed · verified
Commit b4c5cea · Parent(s): 2cdb205

Update README.md

Files changed (1): README.md (+137 -115)
README.md CHANGED
base_model:
  - LiquidAI/LFM2.5-1.2B-Instruct
---
# Saudi Dialect LFM2.5: An Instruction-Tuned Arabic Dialect Model

## Model Description

This model is a fine-tuned version of **Liquid AI**'s **LFM2.5-1.2B-Instruct**, adapted for Saudi-dialect conversational generation.

The base model belongs to the LFM2.5 family: hybrid state-space + attention language models designed for **fast on-device inference**, low memory usage, and strong performance relative to their size. It has ~1.17B parameters and a 32,768-token context window, and it supports multilingual generation, including Arabic.

This fine-tuned variant specializes the model for **Saudi dialect conversational patterns**, improving fluency, dialect authenticity, and instruction following for regional Arabic use cases.

---

## Intended Use

### Primary Use Cases

* Saudi-dialect chatbots
* Customer support assistants
* Conversational agents
* Arabic NLP research
* Dialect-aware RAG pipelines
* Dialogue generation systems

### Out-of-Scope Uses

* Legal or medical advice
* Safety-critical decision making
* High-precision knowledge tasks without retrieval
* Sensitive content generation

---

## Training Details

### Base Model

* Architecture: hybrid state-space + attention
* Parameters: ~1.17B
* Context length: 32,768 tokens
* Training tokens: ~28T
* Languages: multilingual, including Arabic

---
### Dataset

Fine-tuned on:

**Dataset:** `HeshamHaroon/saudi-dialect-conversations`

**Domain:** Conversational dialogue

**Language:** Saudi-dialect Arabic

**Format:** Instruction → response pairs

**Purpose:** Increase dialect authenticity and conversational naturalness. A quick way to load and inspect the data is sketched below.
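A minimal sketch for loading the dataset and shaping it into chat-style pairs. The split name and the column names `instruction` and `response` are assumptions, not confirmed by this card; check `ds.column_names` for the actual schema.

```python
from datasets import load_dataset

# Load the fine-tuning corpus (dataset ID from this card);
# the "train" split name is an assumption.
ds = load_dataset("HeshamHaroon/saudi-dialect-conversations", split="train")
print(ds.column_names)  # verify the real field names first

# Hypothetical mapping into chat messages for SFT; the "instruction"
# and "response" keys are assumed.
def to_messages(row):
    return {
        "messages": [
            {"role": "user", "content": row["instruction"]},
            {"role": "assistant", "content": row["response"]},
        ]
    }

chat_ds = ds.map(to_messages)
```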
---

### Training Configuration

(Extracted from the training notebook.)

| Parameter             | Value                        |
| --------------------- | ---------------------------- |
| Epochs                | 8                            |
| Learning rate         | 2e-4                         |
| Batch size            | 16                           |
| Gradient accumulation | 4                            |
| Optimizer             | AdamW                        |
| LR scheduler          | Linear                       |
| Warmup ratio          | 0.03                         |
| Sequence length       | 8096                         |
| Precision             | FP16                         |
| Training type         | Supervised fine-tuning (SFT) |
---

### Training Procedure

Training was performed using:

* Transformers
* TRL `SFTTrainer`
* LoRA fine-tuning
* Mixed precision
* Gradient accumulation

The base model weights were adapted with LoRA rather than retrained from scratch; a configuration sketch follows.
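A minimal, hedged sketch of this setup with TRL's `SFTTrainer` and a PEFT LoRA config, plugging in the hyperparameters from the table above. The LoRA rank, alpha, and dropout are illustrative assumptions (the card does not state them), and some argument names differ across TRL versions.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Assumes the dataset is in a format SFTTrainer understands
# (e.g., a "messages" or "text" column); otherwise map it first.
dataset = load_dataset("HeshamHaroon/saudi-dialect-conversations", split="train")

# LoRA settings below are assumptions; the card only says "LoRA fine-tuning".
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# Hyperparameters taken from the table above.
args = SFTConfig(
    output_dir="lfm2.5-saudi-dialect",
    num_train_epochs=8,
    learning_rate=2e-4,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,
    optim="adamw_torch",
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    max_seq_length=8096,  # renamed to `max_length` in newer TRL releases
    fp16=True,
)

trainer = SFTTrainer(
    model="LiquidAI/LFM2.5-1.2B-Instruct",
    train_dataset=dataset,
    args=args,
    peft_config=peft_config,
)
trainer.train()
```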
---

## Evaluation

Qualitative evaluation indicates:

* Improved dialect fluency
* Reduced Modern Standard Arabic (MSA) leakage
* Better conversational tone
* Higher lexical authenticity

Dialect-specific fine-tuning is known to substantially increase dialect-generation accuracy and reduce drift toward standard Arabic in Arabic LLMs.
---

## Performance Characteristics

**Strengths**

* Very fast inference
* Low memory footprint
* Strong conversational coherence
* Good instruction following

**Limitations**

* Smaller model → limited factual depth
* May hallucinate
* Less capable at complex reasoning than larger models
* Dialect bias toward Saudi Arabic

---

## Bias, Risks, and Safety

Potential risks:

* Dialect bias
* Cultural bias inherited from the dataset
* Toxic outputs if prompted maliciously
* Hallucinated facts

Mitigations:

* Dataset filtering
* Instruction alignment
* Moderation layers are recommended in deployment

---

## Hardware Requirements

Runs efficiently on:

* CPU-only inference (under 1 GB of memory when quantized)
* Mobile NPUs
* Edge devices
---

## Example Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "AyoubChLin/lfm2.5-saudi-dialect"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# "Talk about coffee in the Saudi dialect"
prompt = "تكلم باللهجة السعودية عن القهوة"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
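Because the base model is instruction-tuned, generation may track the training format better when routed through the tokenizer's chat template. A minimal variant, assuming this fine-tune preserves the base model's template:

```python
# Chat-template variant; assumes the fine-tune keeps the base template.
messages = [{"role": "user", "content": "تكلم باللهجة السعودية عن القهوة"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=200)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```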
---

## Training Compute

* **GPU:** 1 × NVIDIA A100 (40 GB VRAM)
* **CPU:** 8 cores
* **RAM:** 16 GiB
* **Compute environment:** Cloud training instance

---

## License

Same as the base model's license unless otherwise specified.

---

## Citation

If you use this model:

```bibtex
@misc{saudi-dialect-lfm2.5,
  author    = {Cherguelaine Ayoub},
  title     = {Saudi Dialect LFM2.5},
  year      = {2026},
  publisher = {Hugging Face}
}
```

---

## Acknowledgments

* Liquid AI for the base model
* Dataset creators
* The open-source tooling ecosystem

---