---
license: gemma
library_name: transformers
pipeline_tag: text-generation
extra_gated_button_content: Acknowledge license
tags:
- conversational
language:
- ar
- en
---

![](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)

# QuantFactory/SILMA-9B-Instruct-v1.0-GGUF
This is a quantized version of [silma-ai/SILMA-9B-Instruct-v1.0](https://huggingface.co/silma-ai/SILMA-9B-Instruct-v1.0), created using llama.cpp.

# Original Model Card

# SILMA AI

SILMA.AI is a leading Generative AI startup dedicated to empowering Arabic speakers with state-of-the-art AI solutions.

## 🚀 Our Flagship Model: SILMA 1.0 🚀

* **SILMA 1.0** is the **TOP-RANKED** open-weights Arabic LLM, with an impressive **9 billion parameters**, surpassing models that are over seven times larger 🏆

## What makes SILMA exceptional?

* SILMA is a small language model that outperforms 72B models on most Arabic language tasks, making it more practical for business use cases
* SILMA is built on Google's robust Gemma foundation models, combining Gemma's strengths with SILMA's Arabic expertise to provide unparalleled performance
* SILMA is an open-weights model, free to use in accordance with our open license

## 👥 Our Team

We are a team of seasoned **Arabic AI experts** who understand the nuances of the language and its cultural considerations, enabling us to build solutions that truly resonate with Arabic users.

**Authors**: [silma.ai](https://silma.ai)

### Usage

Below are some code snippets to help you get started quickly with running the model. First, install the Transformers library:

```sh
pip install -U transformers sentencepiece
```

Then, copy the snippet from the section that is relevant to your use case.

#### Running with the `pipeline` API

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="silma-ai/SILMA-9B-Instruct-v1.0",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

messages = [
    # "Write a message to my manager apologizing for missing work today due to illness."
    {"role": "user", "content": "اكتب رسالة تعتذر فيها لمديري في العمل عن الحضور اليوم لأسباب مرضية."},
]

outputs = pipe(messages, max_new_tokens=256)
# The chat pipeline returns the whole conversation; its last message is the model's reply
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
```

- Response (a short apology email to the manager):

```text
السلام عليكم ورحمة الله وبركاته

أودّ أن أعتذر عن عدم الحضور إلى العمل اليوم بسبب مرضي. أشعر بالسوء الشديد وأحتاج إلى الراحة. سأعود إلى العمل فور تعافيي.
شكراً لتفهمكم.

مع تحياتي،
[اسمك]
```

#### Running the model on a single GPU or multiple GPUs

```sh
pip install accelerate
```

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "silma-ai/SILMA-9B-Instruct-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # places the layers on the available GPU(s)
    torch_dtype=torch.bfloat16,
)

messages = [
    # "You are a smart assistant that answers users' questions."
    {"role": "system", "content": "أنت مساعد ذكي للإجابة عن أسئلة المستخدمين."},
    # "Which is farther from Earth, the sun or the moon?"
    {"role": "user", "content": "أيهما أبعد عن الأرض, الشمس أم القمر؟"},
]

# add_generation_prompt=True opens a "model" turn so the model answers as the assistant
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, without the prompt or special tokens
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```

- Response ("The sun"):
```text
الشمس
```

`tokenizer.apply_chat_template` ensures the correct chat template is applied. In the example below, the decoded output is split on `<start_of_turn>model` so that only the model's turn is printed:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "silma-ai/SILMA-9B-Instruct-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

messages = [
    # "You are a smart assistant that answers users' questions."
    {"role": "system", "content": "أنت مساعد ذكي للإجابة عن أسئلة المستخدمين."},
    # "Write Python code to generate a sequence of even numbers."
    {"role": "user", "content": "اكتب كود بايثون لتوليد متسلسلة أرقام زوجية."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
# Keep only the text after the last "<start_of_turn>model" marker, i.e. the model's reply
print(tokenizer.decode(outputs[0]).split("<start_of_turn>model")[-1])
```

- Response:
```python
def generate_even_numbers(n):
    """
    This function generates a list of even numbers from 1 to n.
    Args:
        n: The upper limit of the range.

    Returns:
        A list of even numbers.
    """
    return [i for i in range(1, n + 1) if i % 2 == 0]

# Example usage
n = 10
even_numbers = generate_even_numbers(n)
print(f"The first {n} even numbers are: {even_numbers}")
```
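
The previous example prints everything after the last `<start_of_turn>model` marker, so the printed text can still end with special tokens such as `<end_of_turn>`. A hypothetical `extract_reply` helper, sketched below, also cuts those off. It assumes the text was decoded with special tokens kept (the default for `tokenizer.decode`) and that the end-of-sequence token is spelled `<eos>`; the sample string is hand-written for illustration and is not real model output.

```python
def extract_reply(decoded: str) -> str:
    """Hypothetical helper: return only the model's final turn from decoded text."""
    # Keep the text after the last "<start_of_turn>model" marker (the whole string if absent)
    reply = decoded.split("<start_of_turn>model")[-1]
    # Drop everything from the first end-of-turn or end-of-sequence marker onwards
    for marker in ("<end_of_turn>", "<eos>"):
        reply = reply.split(marker)[0]
    return reply.strip()


# Hand-written sample in the shape of tokenizer.decode(outputs[0]); not real model output
decoded = (
    "<bos><start_of_turn>user\nأيهما أبعد عن الأرض, الشمس أم القمر؟<end_of_turn>\n"
    "<start_of_turn>model\nالشمس<end_of_turn>\n<eos>"
)
print(extract_reply(decoded))
```

In the snippets on this card, `extract_reply(tokenizer.decode(outputs[0]))` could stand in for the `.split(...)` call.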
170
+
171
+ #### Quantized Versions through `bitsandbytes`
172
+
173
+ <details>
174
+ <summary>
175
+ Using 8-bit precision (int8)
176
+ </summary>
177
+
178
+ ```sh
179
+ pip install bitsandbytes accelerate
180
+ ```
181
+
182
+ ```python
183
+ # pip install bitsandbytes accelerate
184
+ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
185
+
186
+ model_id = "silma-ai/SILMA-9B-Instruct-v1.0"
187
+ quantization_config = BitsAndBytesConfig(load_in_8bit=True)
188
+
189
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
190
+ model = AutoModelForCausalLM.from_pretrained(
191
+ model_id,
192
+ quantization_config=quantization_config,
193
+ )
194
+
195
+ messages = [
196
+ {"role": "system", "content": "أنت مساعد ذكي للإجابة عن أسئلة المستخدمين."},
197
+ {"role": "user", "content": "اذكر خمس انواع فواكه بها نسب عالية من فيتامين ج."},
198
+ ]
199
+ input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True).to("cuda")
200
+
201
+ outputs = model.generate(**input_ids, max_new_tokens=256)
202
+ print(tokenizer.decode(outputs[0]).split("<start_of_turn>model")[-1])
203
+ ```
204
+
205
+ - Response:
206
+ ```text
207
+ الليمون، البرتقال، الموز، الكيوي، الفراولة
208
+ ```
209
+
210
+ </details>
211
+
212
+ <details>
213
+ <summary>
214
+ Using 4-bit precision
215
+ </summary>
216
+
217
+ ```python
218
+ # pip install bitsandbytes accelerate
219
+ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
220
+
221
+ model_id = "silma-ai/SILMA-9B-Instruct-v1.0"
222
+ quantization_config = BitsAndBytesConfig(load_in_4bit=True)
223
+
224
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
225
+ model = AutoModelForCausalLM.from_pretrained(
226
+ model_id,
227
+ quantization_config=quantization_config,
228
+ )
229
+
230
+ messages = [
231
+ {"role": "system", "content": "أنت مساعد ذكي للإجابة عن أسئلة المستخدمين."},
232
+ {"role": "user", "content": "في أي عام توفى صلاح الدين الأيوبي؟"},
233
+ ]
234
+ input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", return_dict=True).to("cuda")
235
+
236
+ outputs = model.generate(**input_ids, max_new_tokens=256)
237
+ print(tokenizer.decode(outputs[0]).split("<start_of_turn>model")[-1])
238
+ ```
239
+
240
+ - Response:
241
+ ```text
242
+ 1193
243
+ ```
244
+
245
+ </details>
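
As a rough guide for choosing between bfloat16, 8-bit, and 4-bit loading, the memory taken by the weights alone is about the parameter count times the bits per parameter. The sketch below assumes the card's rounded figure of 9 billion parameters and ignores activations, the KV cache, and quantization overhead (scales, layers left unquantized), so actual usage will be higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Weight-only memory estimate in decimal gigabytes (no activations or KV cache)."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


# Assumption: 9e9 parameters, the "9 billion" figure quoted on this card
for label, bits in [("bfloat16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>8}: ~{weight_memory_gb(9e9, bits):.1f} GB")
```

This gives roughly 18 GB, 9 GB, and 4.5 GB respectively, which is why the quantized variants fit on much smaller GPUs.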

#### Advanced Usage

<details>
<summary>
Torch compile
</summary>

[Torch compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) is a method for speeding up the
inference of PyTorch modules. The SILMA model can run up to 6x faster with torch compile.

Note that two warm-up steps are required before the full inference speed is realised:

```python
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"

from transformers import AutoTokenizer, Gemma2ForCausalLM
from transformers.cache_utils import HybridCache
import torch

torch.set_float32_matmul_precision("high")

# load the model + tokenizer
model_id = "silma-ai/SILMA-9B-Instruct-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = Gemma2ForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.to("cuda")

# apply the torch compile transformation
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

# pre-process inputs with the chat template
messages = [
    # "You are a smart assistant that answers users' questions."
    {"role": "system", "content": "أنت مساعد ذكي للإجابة عن أسئلة المستخدمين."},
    # "Who took office as president of the United States after Donald Trump?"
    {"role": "user", "content": "من الرئيس الذي تولى المنصب في أمريكا بعد دونالد ترامب؟"},
]
model_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to("cuda")
prompt_length = model_inputs["input_ids"].shape[1]

# set up the k/v cache
past_key_values = HybridCache(
    config=model.config,
    max_batch_size=1,
    max_cache_len=model.config.max_position_embeddings,
    device=model.device,
    dtype=model.dtype,
)

# enable passing the kv cache to generate
model._supports_cache_class = True
model.generation_config.cache_implementation = None

# two warm-up steps (the first calls trigger compilation)
for _ in range(2):
    outputs = model.generate(**model_inputs, past_key_values=past_key_values, do_sample=True, temperature=1.0, max_new_tokens=128)
    past_key_values.reset()

# fast run
outputs = model.generate(**model_inputs, past_key_values=past_key_values, do_sample=True, temperature=1.0, max_new_tokens=128)
# decode only the generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```

- Response ("Joe Biden"):
```text
جو بايدن
```

For more details, refer to the [Transformers documentation](https://huggingface.co/docs/transformers/main/en/llm_optims?static-kv=basic+usage%3A+generation_config).

</details>
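
To check the speed-up on your own hardware, time generation only after the warm-up calls. The `time_after_warmup` helper below is a hypothetical, framework-agnostic sketch: pass it a zero-argument function that wraps the `model.generate(...)` call together with `past_key_values.reset()` from the snippet above. GPU work runs asynchronously, so if your function does not already wait for its results, calling `torch.cuda.synchronize()` inside it keeps the timings honest.

```python
import time
from statistics import median
from typing import Callable


def time_after_warmup(fn: Callable[[], object], warmup: int = 2, runs: int = 3) -> float:
    """Call fn `warmup` times untimed, then return the median wall-clock time of `runs` calls."""
    for _ in range(warmup):
        fn()  # with torch.compile, the first calls pay the compilation cost
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return median(timings)


# Stand-in workload so the sketch runs anywhere; swap in your own generation function
print(f"median: {time_after_warmup(lambda: sum(range(100_000))):.6f} s")
```

Comparing the median with and without the `torch.compile` line gives an estimate of the speed-up for your setup.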

### Chat Template

The instruction-tuned models use a chat template that must be adhered to for conversational use.
The easiest way to apply it is to use the tokenizer's built-in chat template, as shown in the following snippet.

Let's load the model and apply the chat template to a conversation. In this example, we'll start with a single user interaction:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "silma-ai/SILMA-9B-Instruct-v1.0"
dtype = torch.bfloat16

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cuda",
    torch_dtype=dtype,
)

chat = [
    # "What are the most popular Python frameworks for building AI models?"
    {"role": "user", "content": "ما اشهر اطارات العمل في البايثون لبناء نماذج الذكاء الاصطناعي؟"},
]
# tokenize=False returns the formatted prompt as a string instead of token IDs
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
```

At this point, the prompt contains the following text:

```
<bos><start_of_turn>user
ما اشهر اطارات العمل في البايثون لبناء نماذج الذكاء الاصطناعي؟<end_of_turn>
<start_of_turn>model
```

As you can see, each turn is preceded by a `<start_of_turn>` delimiter followed by the role of the entity
(either `user`, for content supplied by the user, or `model`, for LLM responses). Turns finish with
the `<end_of_turn>` token.

You can follow this format to build the prompt manually if you need to do it without the tokenizer's
chat template.
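
The turn format described above can be reproduced with plain string formatting. The `build_prompt` helper below is a hypothetical sketch, not part of Transformers or of SILMA's tokenizer. It assumes only `user` and `assistant` roles (the latter rendered as `model`) and rejects `system` messages, because this card does not show how SILMA's template renders them; it also trims surrounding whitespace from each message.

```python
def build_prompt(messages, add_generation_prompt=True):
    """Hypothetical helper: render a conversation in the <start_of_turn>/<end_of_turn> format."""
    parts = ["<bos>"]
    for message in messages:
        role = message["role"]
        if role == "assistant":
            role = "model"  # LLM turns are labelled "model" in this format
        elif role != "user":
            # Assumption: system-message handling is not documented here, so refuse it
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        # Open a model turn so that generation continues as the model's reply
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


chat = [
    # "What are the most popular Python frameworks for building AI models?"
    {"role": "user", "content": "ما اشهر اطارات العمل في البايثون لبناء نماذج الذكاء الاصطناعي؟"},
]
print(build_prompt(chat))
```

For a single user turn this should match the prompt text shown above. Encode the result with `add_special_tokens=False` so that `<bos>` is not added twice, and for multi-turn chats append the model's previous reply as an `assistant` message before the next `user` message.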

After the prompt is ready, generation can be performed like this:

```python
# The prompt already starts with <bos>, so the tokenizer must not add it again
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
print(tokenizer.decode(outputs[0]))
```

### Inputs and outputs

* **Input:** Text string, such as a question, a prompt, or a document to be
  summarized.
* **Output:** Generated Arabic or English text in response to the input, such
  as an answer to a question, or a summary of a document.

### Citation

```none
@article{silma_01_2024,
  title={Silma},
  url={https://www.silma.ai},
  publisher={Silma},
  author={Silma Team},
  year={2024}
}
```

## Usage and Limitations

These models have certain limitations that users should be aware of.

### Intended Usage

Open Large Language Models (LLMs) have a wide range of applications across
various industries and domains. The following list of potential uses is not
comprehensive. The purpose of this list is to provide contextual information
about the possible use-cases that the model creators considered as part of model
training and development.

* Content Creation and Communication
  * Text Generation: These models can be used to generate creative text formats
    such as poems, scripts, code, marketing copy, and email drafts.
  * Chatbots and Conversational AI: Power conversational interfaces for customer
    service, virtual assistants, or interactive applications.
  * Text Summarization: Generate concise summaries of a text corpus, research
    papers, or reports.
* Research and Education
  * Natural Language Processing (NLP) Research: These models can serve as a
    foundation for researchers to experiment with NLP techniques, develop
    algorithms, and contribute to the advancement of the field.
  * Language Learning Tools: Support interactive language learning experiences,
    aiding in grammar correction or providing writing practice.
  * Knowledge Exploration: Assist researchers in exploring large bodies of text
    by generating summaries or answering questions about specific topics.

### Limitations

* Training Data
  * The quality and diversity of the training data significantly influence the
    model's capabilities. Biases or gaps in the training data can lead to
    limitations in the model's responses.
  * The scope of the training dataset determines the subject areas the model can
    handle effectively.
* Context and Task Complexity
  * LLMs are better at tasks that can be framed with clear prompts and
    instructions. Open-ended or highly complex tasks might be challenging.
  * A model's performance can be influenced by the amount of context provided
    (longer context generally leads to better outputs, up to a certain point).
* Language Ambiguity and Nuance
  * Natural language is inherently complex. LLMs might struggle to grasp subtle
    nuances, sarcasm, or figurative language.
* Factual Accuracy
  * LLMs generate responses based on information they learned from their
    training datasets, but they are not knowledge bases. They may generate
    incorrect or outdated factual statements.
* Common Sense
  * LLMs rely on statistical patterns in language. They might lack the ability
    to apply common sense reasoning in certain situations.
443
+ ### Ethical Considerations and Risks
444
+
445
+ The development of large language models (LLMs) raises several ethical concerns.
446
+ In creating an open model, we have carefully considered the following:
447
+
448
+ * Bias and Fairness
449
+ * LLMs trained on large-scale, real-world text data can reflect socio-cultural
450
+ biases embedded in the training material.
451
+ * Misinformation and Misuse
452
+ * LLMs can be misused to generate text that is false, misleading, or harmful.
453
+ * Guidelines are provided for responsible use with the model, see the
454
+ [Responsible Generative AI Toolkit][rai-toolkit].
455
+ * Transparency and Accountability:
456
+ * This model card summarizes details on the models' architecture,
457
+ capabilities, limitations, and evaluation processes.
458
+ * A responsibly developed open model offers the opportunity to share
459
+ innovation by making LLM technology accessible to developers and researchers
460
+ across the AI ecosystem.
461
+
462
+ Risks identified and mitigations:
463
+
464
+ * Perpetuation of biases: It's encouraged to perform continuous monitoring
465
+ (using evaluation metrics, human review) and the exploration of de-biasing
466
+ techniques during model training, fine-tuning, and other use cases.
467
+ * Generation of harmful content: Mechanisms and guidelines for content safety
468
+ are essential. Developers are encouraged to exercise caution and implement
469
+ appropriate content safety safeguards based on their specific product policies
470
+ and application use cases.
471
+ * Privacy violations: Models were trained on data filtered for removal of PII
472
+ (Personally Identifiable Information). Developers are encouraged to adhere to
473
+ privacy regulations with privacy-preserving techniques.