usmanxia committed on
Commit 6a688cd · verified · 1 Parent(s): e2861ae

Update README.md

Files changed (1)
  1. README.md +23 -194
README.md CHANGED
@@ -8,29 +8,22 @@ widget:
8
  inference:
9
  parameters:
10
  max_new_tokens: 200
11
- extra_gated_heading: "Access Gemma on Hugging Face"
12
- extra_gated_prompt: "To access Gemma on Hugging Face, you’re required to review and agree to Google’s usage license. To do this, please ensure you’re logged-in to Hugging Face and click below. Requests are processed immediately."
13
  extra_gated_button_content: "Acknowledge license"
14
  license: other
15
- license_name: gemma-terms-of-use
16
- license_link: https://ai.google.dev/gemma/terms
17
- ---
18
 
19
- # Gemma Model Card
20
 
21
- **Model Page**: [Gemma](https://ai.google.dev/gemma/docs)
22
 
23
- This model card corresponds to the 2B instruct version of the Gemma model. You can also visit the model card of the [2B base model](https://huggingface.co/google/gemma-2b), [7B base model](https://huggingface.co/google/gemma-7b), and [7B instruct model](https://huggingface.co/google/gemma-7b-it).
24
 
25
- **Resources and Technical Documentation**:
26
 
27
- * [Responsible Generative AI Toolkit](https://ai.google.dev/responsible)
28
- * [Gemma on Kaggle](https://www.kaggle.com/models/google/gemma)
29
- * [Gemma on Vertex Model Garden](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335?version=gemma-2b-it-gg-hf)
30
 
31
- **Terms of Use**: [Terms](https://www.kaggle.com/models/google/gemma/license/consent)
32
 
33
- **Authors**: Google
34
 
35
  ## Model Information
36
 
@@ -38,10 +31,9 @@ Summary description and brief definition of inputs and outputs.
38
 
39
  ### Description
40
 
41
- Gemma is a family of lightweight, state-of-the-art open models from Google,
42
- built from the same research and technology used to create the Gemini models.
43
  They are text-to-text, decoder-only large language models, available in English,
44
- with open weights, pre-trained variants, and instruction-tuned variants. Gemma
45
  models are well-suited for a variety of text generation tasks, including
46
  question answering, summarization, and reasoning. Their relatively small size
47
  makes it possible to deploy them in environments with limited resources such as
@@ -58,8 +50,8 @@ Below we share some code snippets on how to get quickly started with running the
58
  ```python
59
  from transformers import AutoTokenizer, AutoModelForCausalLM
60
 
61
- tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
62
- model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")
63
 
64
  input_text = "Write me a poem about Machine Learning."
65
  input_ids = tokenizer(input_text, return_tensors="pt")
@@ -76,8 +68,8 @@ print(tokenizer.decode(outputs[0]))
76
  # pip install accelerate
77
  from transformers import AutoTokenizer, AutoModelForCausalLM
78
 
79
- tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
80
- model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto")
81
 
82
  input_text = "Write me a poem about Machine Learning."
83
  input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
@@ -95,8 +87,8 @@ print(tokenizer.decode(outputs[0]))
95
  # pip install accelerate
96
  from transformers import AutoTokenizer, AutoModelForCausalLM
97
 
98
- tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
99
- model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto", torch_dtype=torch.float16)
100
 
101
  input_text = "Write me a poem about Machine Learning."
102
  input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
@@ -111,8 +103,8 @@ print(tokenizer.decode(outputs[0]))
111
  # pip install accelerate
112
  from transformers import AutoTokenizer, AutoModelForCausalLM
113
 
114
- tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
115
- model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto", torch_dtype=torch.bfloat16)
116
 
117
  input_text = "Write me a poem about Machine Learning."
118
  input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
@@ -131,8 +123,8 @@ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
131
 
132
  quantization_config = BitsAndBytesConfig(load_in_8bit=True)
133
 
134
- tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
135
- model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", quantization_config=quantization_config)
136
 
137
  input_text = "Write me a poem about Machine Learning."
138
  input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
@@ -149,8 +141,8 @@ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
149
 
150
  quantization_config = BitsAndBytesConfig(load_in_4bit=True)
151
 
152
- tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
153
- model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", quantization_config=quantization_config)
154
 
155
  input_text = "Write me a poem about Machine Learning."
156
  input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
@@ -186,7 +178,7 @@ from transformers import AutoTokenizer, AutoModelForCausalLM
186
  import transformers
187
  import torch
188
 
189
- model_id = "gg-hf/gemma-2b-it"
190
  dtype = torch.bfloat16
191
 
192
  tokenizer = AutoTokenizer.from_pretrained(model_id)
@@ -231,166 +223,6 @@ outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=150)
231
  * **Output:** Generated English-language text in response to the input, such
232
  as an answer to a question, or a summary of a document.
233
 
234
- ## Model Data
235
-
236
- Data used for model training and how the data was processed.
237
-
238
- ### Training Dataset
239
-
240
- These models were trained on a dataset of text data that includes a wide variety
241
- of sources, totaling 6 trillion tokens. Here are the key components:
242
-
243
- * Web Documents: A diverse collection of web text ensures the model is exposed
244
- to a broad range of linguistic styles, topics, and vocabulary. Primarily
245
- English-language content.
246
- * Code: Exposing the model to code helps it to learn the syntax and patterns of
247
- programming languages, which improves its ability to generate code or
248
- understand code-related questions.
249
- * Mathematics: Training on mathematical text helps the model learn logical
250
- reasoning, symbolic representation, and to address mathematical queries.
251
-
252
- The combination of these diverse data sources is crucial for training a powerful
253
- language model that can handle a wide variety of different tasks and text
254
- formats.
255
-
256
- ### Data Preprocessing
257
-
258
- Here are the key data cleaning and filtering methods applied to the training
259
- data:
260
-
261
- * CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was
262
- applied at multiple stages in the data preparation process to ensure the
263
- exclusion of harmful and illegal content
264
- * Sensitive Data Filtering: As part of making Gemma pre-trained models safe and
265
- reliable, automated techniques were used to filter out certain personal
266
- information and other sensitive data from training sets.
267
- * Additional methods: Filtering based on content quality and safety in line with
268
- [our policies](https://storage.googleapis.com/gweb-uniblog-publish-prod/documents/2023_Google_AI_Principles_Progress_Update.pdf#page=11).
269
-
270
- ## Implementation Information
271
-
272
- Details about the model internals.
273
-
274
- ### Hardware
275
-
276
- Gemma was trained using the latest generation of
277
- [Tensor Processing Unit (TPU)](https://cloud.google.com/tpu/docs/intro-to-tpu) hardware (TPUv5e).
278
-
279
- Training large language models requires significant computational power. TPUs,
280
- designed specifically for matrix operations common in machine learning, offer
281
- several advantages in this domain:
282
-
283
- * Performance: TPUs are specifically designed to handle the massive computations
284
- involved in training LLMs. They can speed up training considerably compared to
285
- CPUs.
286
- * Memory: TPUs often come with large amounts of high-bandwidth memory, allowing
287
- for the handling of large models and batch sizes during training. This can
288
- lead to better model quality.
289
- * Scalability: TPU Pods (large clusters of TPUs) provide a scalable solution for
290
- handling the growing complexity of large foundation models. You can distribute
291
- training across multiple TPU devices for faster and more efficient processing.
292
- * Cost-effectiveness: In many scenarios, TPUs can provide a more cost-effective
293
- solution for training large models compared to CPU-based infrastructure,
294
- especially when considering the time and resources saved due to faster
295
- training.
296
- * These advantages are aligned with
297
- [Google's commitments to operate sustainably](https://sustainability.google/operating-sustainably/).
298
-
299
- ### Software
300
-
301
- Training was done using [JAX](https://github.com/google/jax) and [ML Pathways](https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/ml-pathways).
302
-
303
- JAX allows researchers to take advantage of the latest generation of hardware,
304
- including TPUs, for faster and more efficient training of large models.
305
-
306
- ML Pathways is Google's latest effort to build artificially intelligent systems
307
- capable of generalizing across multiple tasks. This is specially suitable for
308
- [foundation models](https://ai.google/discover/foundation-models/), including large language models like
309
- these ones.
310
-
311
- Together, JAX and ML Pathways are used as described in the
312
- [paper about the Gemini family of models](https://arxiv.org/abs/2312.11805); "the 'single
313
- controller' programming model of Jax and Pathways allows a single Python
314
- process to orchestrate the entire training run, dramatically simplifying the
315
- development workflow."
316
-
317
- ## Evaluation
318
-
319
- Model evaluation metrics and results.
320
-
321
- ### Benchmark Results
322
-
323
- These models were evaluated against a large collection of different datasets and
324
- metrics to cover different aspects of text generation:
325
-
326
- | Benchmark | Metric | 2B Params | 7B Params |
327
- | ------------------------------ | ------------- | ----------- | --------- |
328
- | [MMLU](https://arxiv.org/abs/2009.03300) | 5-shot, top-1 | 42.3 | 64.3 |
329
- | [HellaSwag](https://arxiv.org/abs/1905.07830) | 0-shot |71.4 | 81.2 |
330
- | [PIQA](https://arxiv.org/abs/1911.11641) | 0-shot | 77.3 | 81.2 |
331
- | [SocialIQA](https://arxiv.org/abs/1904.09728) | 0-shot | 59.7 | 51.8 |
332
- | [BoolQ](https://arxiv.org/abs/1905.10044) | 0-shot | 69.4 | 83.2 |
333
- | [WinoGrande](https://arxiv.org/abs/1907.10641) | partial score | 65.4 | 72.3 |
334
- | [CommonsenseQA](https://arxiv.org/abs/1811.00937) | 7-shot | 65.3 | 71.3 |
335
- | [OpenBookQA](https://arxiv.org/abs/1809.02789) | | 47.8 | 52.8 |
336
- | [ARC-e](https://arxiv.org/abs/1911.01547) | | 73.2 | 81.5 |
337
- | [ARC-c](https://arxiv.org/abs/1911.01547) | | 42.1 | 53.2 |
338
- | [TriviaQA](https://arxiv.org/abs/1705.03551) | 5-shot | 53.2 | 63.4 |
339
- | [Natural Questions](https://github.com/google-research-datasets/natural-questions) | 5-shot | - | 23 |
340
- | [HumanEval](https://arxiv.org/abs/2107.03374) | pass@1 | 22.0 | 32.3 |
341
- | [MBPP](https://arxiv.org/abs/2108.07732) | 3-shot | 29.2 | 44.4 |
342
- | [GSM8K](https://arxiv.org/abs/2110.14168) | maj@1 | 17.7 | 46.4 |
343
- | [MATH](https://arxiv.org/abs/2108.07732) | 4-shot | 11.8 | 24.3 |
344
- | [AGIEval](https://arxiv.org/abs/2304.06364) | | 24.2 | 41.7 |
345
- | [BIG-Bench](https://arxiv.org/abs/2206.04615) | | 35.2 | 55.1 |
346
- | ------------------------------ | ------------- | ----------- | --------- |
347
- | **Average** | | **54.0** | **56.4** |
348
-
349
- ## Ethics and Safety
350
-
351
- Ethics and safety evaluation approach and results.
352
-
353
- ### Evaluation Approach
354
-
355
- Our evaluation methods include structured evaluations and internal red-teaming
356
- testing of relevant content policies. Red-teaming was conducted by a number of
357
- different teams, each with different goals and human evaluation metrics. These
358
- models were evaluated against a number of different categories relevant to
359
- ethics and safety, including:
360
-
361
- * Text-to-Text Content Safety: Human evaluation on prompts covering safety
362
- policies including child sexual abuse and exploitation, harassment, violence
363
- and gore, and hate speech.
364
- * Text-to-Text Representational Harms: Benchmark against relevant academic
365
- datasets such as [WinoBias](https://arxiv.org/abs/1804.06876) and [BBQ Dataset](https://arxiv.org/abs/2110.08193v2).
366
- * Memorization: Automated evaluation of memorization of training data, including
367
- the risk of personally identifiable information exposure.
368
- * Large-scale harm: Tests for "dangerous capabilities," such as chemical,
369
- biological, radiological, and nuclear (CBRN) risks.
370
-
371
- ### Evaluation Results
372
-
373
- The results of ethics and safety evaluations are within acceptable thresholds
374
- for meeting [internal policies](https://storage.googleapis.com/gweb-uniblog-publish-prod/documents/2023_Google_AI_Principles_Progress_Update.pdf#page=11) for categories such as child
375
- safety, content safety, representational harms, memorization, large-scale harms.
376
- On top of robust internal evaluations, the results of well known safety
377
- benchmarks like BBQ, BOLD, Winogender, Winobias, RealToxicity, and TruthfulQA
378
- are shown here.
379
-
380
- | Benchmark | Metric | 2B Params | 7B Params |
381
- | ------------------------------ | ------------- | ----------- | --------- |
382
- | [RealToxicity](https://arxiv.org/abs/2009.11462) | average | 6.86 | 7.90 |
383
- | [BOLD](https://arxiv.org/abs/2101.11718) | | 45.57 | 49.08 |
384
- | [CrowS-Pairs](https://aclanthology.org/2020.emnlp-main.154/) | top-1 | 45.82 | 51.33 |
385
- | [BBQ Ambig](https://arxiv.org/abs/2110.08193v2) | 1-shot, top-1 | 62.58 | 92.54 |
386
- | [BBQ Disambig](https://arxiv.org/abs/2110.08193v2) | top-1 | 54.62 | 71.99 |
387
- | [Winogender](https://arxiv.org/abs/1804.09301) | top-1 | 51.25 | 54.17 |
388
- | [TruthfulQA](https://arxiv.org/abs/2109.07958) | | 44.84 | 31.81 |
389
- | [Winobias 1_2](https://arxiv.org/abs/1804.06876) | | 56.12 | 59.09 |
390
- | [Winobias 2_2](https://arxiv.org/abs/1804.06876) | | 91.10 | 92.23 |
391
- | [Toxigen](https://arxiv.org/abs/2203.09509) | | 29.77 | 39.59 |
392
- | ------------------------------ | ------------- | ----------- | --------- |
393
-
394
 
395
  ## Usage and Limitations
396
 
@@ -456,8 +288,6 @@ In creating an open model, we have carefully considered the following:
456
  reported in this card.
457
  * Misinformation and Misuse
458
  * LLMs can be misused to generate text that is false, misleading, or harmful.
459
- * Guidelines are provided for responsible use with the model, see the
460
- [Responsible Generative AI Toolkit](http://ai.google.dev/gemma/responsible).
461
  * Transparency and Accountability:
462
  * This model card summarizes details on the models' architecture,
463
  capabilities, limitations, and evaluation processes.
@@ -477,8 +307,7 @@ Risks identified and mitigations:
477
  * Misuse for malicious purposes: Technical limitations and developer and
478
  end-user education can help mitigate against malicious applications of LLMs.
479
  Educational resources and reporting mechanisms for users to flag misuse are
480
- provided. Prohibited uses of Gemma models are outlined in the
481
- [Gemma Prohibited Use Policy](https://ai.google.dev/gemma/prohibited_use_policy).
482
  * Privacy violations: Models were trained on data filtered for removal of PII
483
  (Personally Identifiable Information). Developers are encouraged to adhere to
484
  privacy regulations with privacy-preserving techniques.
 
8
  inference:
9
  parameters:
10
  max_new_tokens: 200
11
+ extra_gated_heading: "Access Resonance on Hugging Face"
12
+ extra_gated_prompt: "To access Resonance on Hugging Face, you’re required to review and agree to Resonance’s usage license. To do this, please ensure you’re logged-in to Hugging Face and click below. Requests are processed immediately."
13
  extra_gated_button_content: "Acknowledge license"
14
  license: other
 
 
 
15
 
16
+ ---
17
 
18
+ # Resonance Model Card
19
 
 
20
 
21
+ This model card corresponds to the 2B instruct version of the Resonance model.
22
 
 
 
 
23
 
24
+ **Terms of Use**:
25
 
26
+ **Authors**: AI Research Lab, NUST
27
 
28
  ## Model Information
29
 
 
31
 
32
  ### Description
33
 
34
+ Resonance is a family of lightweight, state-of-the-art open models.
 
35
  They are text-to-text, decoder-only large language models, available in English,
36
+ with open weights, pre-trained variants, and instruction-tuned variants. Resonance
37
  models are well-suited for a variety of text generation tasks, including
38
  question answering, summarization, and reasoning. Their relatively small size
39
  makes it possible to deploy them in environments with limited resources such as
 
50
  ```python
51
  from transformers import AutoTokenizer, AutoModelForCausalLM
52
 
53
+ tokenizer = AutoTokenizer.from_pretrained("usmanxia/resonance-2b-it")
54
+ model = AutoModelForCausalLM.from_pretrained("usmanxia/resonance-2b-it")
55
 
56
  input_text = "Write me a poem about Machine Learning."
57
  input_ids = tokenizer(input_text, return_tensors="pt")
 
68
  # pip install accelerate
69
  from transformers import AutoTokenizer, AutoModelForCausalLM
70
 
71
+ tokenizer = AutoTokenizer.from_pretrained("usmanxia/resonance-2b-it")
72
+ model = AutoModelForCausalLM.from_pretrained("usmanxia/resonance-2b-it", device_map="auto")
73
 
74
  input_text = "Write me a poem about Machine Learning."
75
  input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
 
87
  # pip install accelerate
88
  from transformers import AutoTokenizer, AutoModelForCausalLM
89
 
90
+ tokenizer = AutoTokenizer.from_pretrained("usmanxia/resonance-2b-it")
91
+ model = AutoModelForCausalLM.from_pretrained("usmanxia/resonance-2b-it", device_map="auto", torch_dtype=torch.float16)
92
 
93
  input_text = "Write me a poem about Machine Learning."
94
  input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
 
103
  # pip install accelerate
104
  from transformers import AutoTokenizer, AutoModelForCausalLM
105
 
106
+ tokenizer = AutoTokenizer.from_pretrained("usmanxia/resonance-2b-it")
107
+ model = AutoModelForCausalLM.from_pretrained("usmanxia/resonance-2b-it", device_map="auto", torch_dtype=torch.bfloat16)
108
 
109
  input_text = "Write me a poem about Machine Learning."
110
  input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
 
123
 
124
  quantization_config = BitsAndBytesConfig(load_in_8bit=True)
125
 
126
+ tokenizer = AutoTokenizer.from_pretrained("usmanxia/resonance-2b-it")
127
+ model = AutoModelForCausalLM.from_pretrained("usmanxia/resonance-2b-it", quantization_config=quantization_config)
128
 
129
  input_text = "Write me a poem about Machine Learning."
130
  input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
 
141
 
142
  quantization_config = BitsAndBytesConfig(load_in_4bit=True)
143
 
144
+ tokenizer = AutoTokenizer.from_pretrained("usmanxia/resonance-2b-it")
145
+ model = AutoModelForCausalLM.from_pretrained("usmanxia/resonance-2b-it", quantization_config=quantization_config)
146
 
147
  input_text = "Write me a poem about Machine Learning."
148
  input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
 
178
  import transformers
179
  import torch
180
 
181
+ model_id = "usmanxia/resonance-it"
182
  dtype = torch.bfloat16
183
 
184
  tokenizer = AutoTokenizer.from_pretrained(model_id)
 
223
  * **Output:** Generated English-language text in response to the input, such
224
  as an answer to a question, or a summary of a document.
225
 
226
 
227
  ## Usage and Limitations
228
 
 
288
  reported in this card.
289
  * Misinformation and Misuse
290
  * LLMs can be misused to generate text that is false, misleading, or harmful.
 
 
291
  * Transparency and Accountability:
292
  * This model card summarizes details on the models' architecture,
293
  capabilities, limitations, and evaluation processes.
 
307
  * Misuse for malicious purposes: Technical limitations and developer and
308
  end-user education can help mitigate against malicious applications of LLMs.
309
  Educational resources and reporting mechanisms for users to flag misuse are
310
+ provided.
 
311
  * Privacy violations: Models were trained on data filtered for removal of PII
312
  (Personally Identifiable Information). Developers are encouraged to adhere to
313
  privacy regulations with privacy-preserving techniques.
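
The added lines in this diff pass `usmanxia/resonance-2b-it` to `from_pretrained` in most snippets, while the chat-template snippet sets `model_id = "usmanxia/resonance-it"`. Whether that difference is intentional is not stated in the commit. A minimal sketch for listing every repository ID referenced in a README's snippets, so that such differences are easy to spot, might look like the following. The function name `find_model_ids` and the regular expression are illustrative assumptions, not part of the repository or of any Hugging Face tooling.

```python
import re
from collections import Counter

# Matches a quoted "owner/name" repository ID that is either passed to
# from_pretrained(...) or assigned to a variable called model_id.
# The pattern is an illustrative assumption, not an official utility.
_MODEL_ID_RE = re.compile(
    r"""(?:from_pretrained\(\s*|model_id\s*=\s*)["']([\w.-]+/[\w.-]+)["']"""
)


def find_model_ids(readme_text: str) -> Counter:
    """Count how often each repository ID appears in the README's code."""
    return Counter(_MODEL_ID_RE.findall(readme_text))


# Three lines copied from the added side of the diff.
sample = '''
tokenizer = AutoTokenizer.from_pretrained("usmanxia/resonance-2b-it")
model = AutoModelForCausalLM.from_pretrained("usmanxia/resonance-2b-it", device_map="auto")
model_id = "usmanxia/resonance-it"
'''

ids = find_model_ids(sample)
for repo_id, count in ids.most_common():
    print(f"{repo_id}: {count}")
```

Running this over the full README text instead of `sample` would list each distinct ID with its count, and any ID that appears only once is worth checking against the repository that actually exists.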