Update README.md

README.md (changed):
@@ -174,6 +174,42 @@ metrics to cover different aspects of text generation. Evaluation results marked
 with **IT** are for instruction-tuned models. Evaluation results marked with
 **PT** are for pre-trained models.
 
+#### Gemma 3 270M
+
+| **Benchmark**             | **n-shot** | **Gemma 3 PT 270M** |
+| :------------------------ | :--------: | ------------------: |
+| [HellaSwag][hellaswag]    |  10-shot   |                40.9 |
+| [BoolQ][boolq]            |   0-shot   |                61.4 |
+| [PIQA][piqa]              |   0-shot   |                67.7 |
+| [TriviaQA][triviaqa]      |   5-shot   |                15.4 |
+| [ARC-c][arc]              |  25-shot   |                29.0 |
+| [ARC-e][arc]              |   0-shot   |                57.7 |
+| [WinoGrande][winogrande]  |   5-shot   |                52.0 |
+
+[hellaswag]: https://arxiv.org/abs/1905.07830
+[boolq]: https://arxiv.org/abs/1905.10044
+[piqa]: https://arxiv.org/abs/1911.11641
+[triviaqa]: https://arxiv.org/abs/1705.03551
+[arc]: https://arxiv.org/abs/1911.01547
+[winogrande]: https://arxiv.org/abs/1907.10641
+
+| **Benchmark**             | **n-shot** | **Gemma 3 IT 270m** |
+| :------------------------ | :--------: | ------------------: |
+| [HellaSwag][hellaswag]    |   0-shot   |                37.7 |
+| [PIQA][piqa]              |   0-shot   |                66.2 |
+| [ARC-c][arc]              |   0-shot   |                28.2 |
+| [WinoGrande][winogrande]  |   0-shot   |                52.3 |
+| [BIG-Bench Hard][bbh]     |  few-shot  |                26.7 |
+| [IF Eval][ifeval]         |   0-shot   |                51.2 |
+
+[hellaswag]: https://arxiv.org/abs/1905.07830
+[piqa]: https://arxiv.org/abs/1911.11641
+[arc]: https://arxiv.org/abs/1911.01547
+[winogrande]: https://arxiv.org/abs/1907.10641
+[bbh]: https://paperswithcode.com/dataset/bbh
+[bbh]: https://paperswithcode.com/dataset/bbh
+[ifeval]: https://arxiv.org/abs/2311.07911
+
 #### Gemma 3 1B, 4B, 12B & 27B
 
 ##### Reasoning and factuality
@@ -327,42 +363,6 @@ with **IT** are for instruction-tuned models. Evaluation results marked with
 [countbenchqa]: https://github.com/google-research/big_vision/blob/main/big_vision/datasets/countbenchqa/
 [mathvista]: https://arxiv.org/abs/2310.02255
 
-#### Gemma 3 270M
-
-| **Benchmark**             | **n-shot** | **Gemma 3 PT 270M** |
-| :------------------------ | :--------: | ------------------: |
-| [HellaSwag][hellaswag]    |  10-shot   |                40.9 |
-| [BoolQ][boolq]            |   0-shot   |                61.4 |
-| [PIQA][piqa]              |   0-shot   |                67.7 |
-| [TriviaQA][triviaqa]      |   5-shot   |                15.4 |
-| [ARC-c][arc]              |  25-shot   |                29.0 |
-| [ARC-e][arc]              |   0-shot   |                57.7 |
-| [WinoGrande][winogrande]  |   5-shot   |                52.0 |
-
-[hellaswag]: https://arxiv.org/abs/1905.07830
-[boolq]: https://arxiv.org/abs/1905.10044
-[piqa]: https://arxiv.org/abs/1911.11641
-[triviaqa]: https://arxiv.org/abs/1705.03551
-[arc]: https://arxiv.org/abs/1911.01547
-[winogrande]: https://arxiv.org/abs/1907.10641
-
-| **Benchmark**             | **n-shot** | **Gemma 3 IT 270m** |
-| :------------------------ | :--------: | ------------------: |
-| [HellaSwag][hellaswag]    |   0-shot   |                37.7 |
-| [PIQA][piqa]              |   0-shot   |                66.2 |
-| [ARC-c][arc]              |   0-shot   |                28.2 |
-| [WinoGrande][winogrande]  |   0-shot   |                52.3 |
-| [BIG-Bench Hard][bbh]     |  few-shot  |                26.7 |
-| [IF Eval][ifeval]         |   0-shot   |                51.2 |
-
-[hellaswag]: https://arxiv.org/abs/1905.07830
-[piqa]: https://arxiv.org/abs/1911.11641
-[arc]: https://arxiv.org/abs/1911.01547
-[winogrande]: https://arxiv.org/abs/1907.10641
-[bbh]: https://paperswithcode.com/dataset/bbh
-[bbh]: https://paperswithcode.com/dataset/bbh
-[ifeval]: https://arxiv.org/abs/2311.07911
-
 ## Ethics and Safety
 
 Ethics and safety evaluation approach and results.
@@ -528,5 +528,3 @@ alternatives.
 [ml-pathways]: https://blog.google/technology/ai/introducing-pathways-next-generation-ai-architecture/
 [sustainability]: https://sustainability.google/operating-sustainably/
 [gemini-2-paper]: https://arxiv.org/abs/2312.11805
-
-
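Note that the relocated link-definition block defines `[bbh]` twice; both the removed and the added hunks carry the duplicate. CommonMark resolves a reference label to its first matching definition, so the second line is dead weight rather than an error. Duplicates like this are easy to catch mechanically; a minimal sketch (the helper name is hypothetical, not part of this repo):

```python
import re

def duplicate_link_defs(markdown: str) -> list[str]:
    """Return Markdown link-reference labels defined more than once.

    CommonMark matches labels case-insensitively, so normalize to
    lowercase before counting.
    """
    labels = re.findall(r"^\[([^\]]+)\]:", markdown, flags=re.MULTILINE)
    seen, dupes = set(), []
    for label in (l.lower() for l in labels):
        if label in seen and label not in dupes:
            dupes.append(label)
        seen.add(label)
    return dupes

# The tail of the moved block, as committed:
section = """\
[bbh]: https://paperswithcode.com/dataset/bbh
[bbh]: https://paperswithcode.com/dataset/bbh
[ifeval]: https://arxiv.org/abs/2311.07911
"""
print(duplicate_link_defs(section))  # ['bbh']
```

A follow-up commit could simply drop one of the two `[bbh]` lines; renderers already ignore it.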
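Since this commit is meant to be a pure relocation of the 270M section (old lines 330–365 to new lines 177–212), one quick sanity check when reviewing is that the diff's added lines exactly equal its removed lines. A minimal sketch, assuming the unified diff is available as plain text; blank lines are ignored so the trailing blank-line cleanup in the last hunk does not count as a content change:

```python
def is_pure_move(diff_text: str) -> bool:
    """True if every non-blank line added by a unified diff equals,
    in order, every non-blank line removed (content only relocated)."""
    added, removed = [], []
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---", "@@")):
            continue  # file and hunk headers are not content
        if line.startswith("+") and line[1:].strip():
            added.append(line[1:])
        elif line.startswith("-") and line[1:].strip():
            removed.append(line[1:])
    return added == removed

# Toy diff mirroring this commit's shape: a block deleted in one
# hunk and re-added verbatim in another.
diff = """\
@@ -330,2 +330,0 @@
-#### Gemma 3 270M
-| [PIQA][piqa] | 0-shot | 67.7 |
@@ -176,0 +177,2 @@
+#### Gemma 3 270M
+| [PIQA][piqa] | 0-shot | 67.7 |
"""
print(is_pure_move(diff))  # True
```

The check is order-sensitive, so it would also flag a move that silently reordered or edited a benchmark row.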