osanseviero committed on
Commit 33b93ac · verified · 1 Parent(s): b968cba

Update README.md

Files changed (1)
  1. README.md +36 -38
README.md CHANGED
@@ -174,6 +174,42 @@ metrics to cover different aspects of text generation. Evaluation results marked
  with **IT** are for instruction-tuned models. Evaluation results marked with
  **PT** are for pre-trained models.
 
+ #### Gemma 3 270M
+
+ | **Benchmark** | **n-shot** | **Gemma 3 PT 270M** |
+ | :------------------------ | :-----------: | ------------------: |
+ | [HellaSwag][hellaswag] | 10-shot | 40.9 |
+ | [BoolQ][boolq] | 0-shot | 61.4 |
+ | [PIQA][piqa] | 0-shot | 67.7 |
+ | [TriviaQA][triviaqa] | 5-shot | 15.4 |
+ | [ARC-c][arc] | 25-shot | 29.0 |
+ | [ARC-e][arc] | 0-shot | 57.7 |
+ | [WinoGrande][winogrande] | 5-shot | 52.0 |
+
+ [hellaswag]: https://arxiv.org/abs/1905.07830
+ [boolq]: https://arxiv.org/abs/1905.10044
+ [piqa]: https://arxiv.org/abs/1911.11641
+ [triviaqa]: https://arxiv.org/abs/1705.03551
+ [arc]: https://arxiv.org/abs/1911.01547
+ [winogrande]: https://arxiv.org/abs/1907.10641
+
+ | **Benchmark** | **n-shot** | **Gemma 3 IT 270m** |
+ | :------------------------ | :-----------: | ------------------: |
+ | [HellaSwag][hellaswag] | 0-shot | 37.7 |
+ | [PIQA][piqa] | 0-shot | 66.2 |
+ | [ARC-c][arc] | 0-shot | 28.2 |
+ | [WinoGrande][winogrande] | 0-shot | 52.3 |
+ | [BIG-Bench Hard][bbh] | few-shot | 26.7 |
+ | [IF Eval][ifeval] | 0-shot | 51.2 |
+
+ [hellaswag]: https://arxiv.org/abs/1905.07830
+ [piqa]: https://arxiv.org/abs/1911.11641
+ [arc]: https://arxiv.org/abs/1911.01547
+ [winogrande]: https://arxiv.org/abs/1907.10641
+ [bbh]: https://paperswithcode.com/dataset/bbh
+ [bbh]: https://paperswithcode.com/dataset/bbh
+ [ifeval]: https://arxiv.org/abs/2311.07911
+
  #### Gemma 3 1B, 4B, 12B & 27B
 
  ##### Reasoning and factuality
@@ -327,42 +363,6 @@ with **IT** are for instruction-tuned models. Evaluation results marked with
  [countbenchqa]: https://github.com/google-research/big_vision/blob/main/big_vision/datasets/countbenchqa/
  [mathvista]: https://arxiv.org/abs/2310.02255
 
- #### Gemma 3 270M
-
- | **Benchmark** | **n-shot** | **Gemma 3 PT 270M** |
- | :------------------------ | :-----------: | ------------------: |
- | [HellaSwag][hellaswag] | 10-shot | 40.9 |
- | [BoolQ][boolq] | 0-shot | 61.4 |
- | [PIQA][piqa] | 0-shot | 67.7 |
- | [TriviaQA][triviaqa] | 5-shot | 15.4 |
- | [ARC-c][arc] | 25-shot | 29.0 |
- | [ARC-e][arc] | 0-shot | 57.7 |
- | [WinoGrande][winogrande] | 5-shot | 52.0 |
-
- [hellaswag]: https://arxiv.org/abs/1905.07830
- [boolq]: https://arxiv.org/abs/1905.10044
- [piqa]: https://arxiv.org/abs/1911.11641
- [triviaqa]: https://arxiv.org/abs/1705.03551
- [arc]: https://arxiv.org/abs/1911.01547
- [winogrande]: https://arxiv.org/abs/1907.10641
-
- | **Benchmark** | **n-shot** | **Gemma 3 IT 270m** |
- | :------------------------ | :-----------: | ------------------: |
- | [HellaSwag][hellaswag] | 0-shot | 37.7 |
- | [PIQA][piqa] | 0-shot | 66.2 |
- | [ARC-c][arc] | 0-shot | 28.2 |
- | [WinoGrande][winogrande] | 0-shot | 52.3 |
- | [BIG-Bench Hard][bbh] | few-shot | 26.7 |
- | [IF Eval][ifeval] | 0-shot | 51.2 |
-
- [hellaswag]: https://arxiv.org/abs/1905.07830
- [piqa]: https://arxiv.org/abs/1911.11641
- [arc]: https://arxiv.org/abs/1911.01547
- [winogrande]: https://arxiv.org/abs/1907.10641
- [bbh]: https://paperswithcode.com/dataset/bbh
- [bbh]: https://paperswithcode.com/dataset/bbh
- [ifeval]: https://arxiv.org/abs/2311.07911
-
  ## Ethics and Safety
 
  Ethics and safety evaluation approach and results.
@@ -528,5 +528,3 @@ alternatives.
  [ml-pathways]: https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/
  [sustainability]: https://sustainability.google/operating-sustainably/
  [gemini-2-paper]: https://arxiv.org/abs/2312.11805
-
-