---
license: apache-2.0
tags:
- nvidia
---

## Mistral-NeMo-12B-Base

### Model Overview

Mistral-NeMo-12B-Base is a Large Language Model (LLM) composed of 12B parameters, trained jointly by NVIDIA and Mistral AI. It significantly outperforms existing models of a smaller or similar size.

**Key features**
- Released under the Apache 2 License
- Trained with a 128k context window
- Trained on a large proportion of multilingual and code data

### Intended use

Mistral-NeMo-12B-Base is a completion model intended for use in more than 80 programming languages and designed for global, multilingual applications. It is fast, trained on function-calling data, has a large context window, and is particularly strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. It is compatible with the [NVIDIA NeMo Framework](https://docs.nvidia.com/nemo-framework/index.html). For best performance on a given task, users are encouraged to customize the model using the NeMo Framework suite of customization tools, including Parameter-Efficient Fine-Tuning (P-tuning, Adapters, LoRA, and more) and Model Alignment (SFT, SteerLM, RLHF, and more) using [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner). Refer to the [documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/nemotron/index.html) for examples.
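As a minimal usage sketch, plain-text completion could look like the following; the repository id and `transformers` compatibility are assumptions for illustration, not taken from this card (which only documents NeMo Framework compatibility):

```python
# Hypothetical completion sketch; the repository id below is an assumption.
def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Return a plain-text continuation (base model: no chat template)."""
    # Imported inside the function so the sketch stays importable
    # without `transformers` installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = "nvidia/Mistral-NeMo-12B-Base"  # hypothetical repository id
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Since this is a completion model, prompts should be written as text to be continued rather than as chat-style instructions.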

**Model Developer:** [NVIDIA](https://www.nvidia.com/en-us/) and [Mistral AI](https://mistral.ai/)

**Model Dates:** Mistral-NeMo-12B-Base was trained between 2023 and July 2024.

### Model Architecture

Mistral-NeMo-12B-Base is a transformer model, with the following architecture choices:
- Layers: 40
- Dim: 5,120
- Number of kv-heads: 8 (GQA)
- Rotary embeddings (theta = 1M)
- Vocabulary size: 2**17 ~= 128k

**Architecture Type:** Transformer Decoder (auto-regressive language model)
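To make the GQA and vocabulary entries above concrete, here is a small sketch of how kv-heads are shared across query heads; the query-head count (32) is an assumption for illustration, since this card does not state it:

```python
# Sketch of the grouped-query attention (GQA) head layout described above.
# The card lists 8 kv-heads; the query-head count below is an assumption.
N_QUERY_HEADS = 32   # assumed for illustration, not stated in this card
N_KV_HEADS = 8       # from the card

def kv_head_for(query_head: int) -> int:
    """Each group of consecutive query heads shares one kv-head."""
    group_size = N_QUERY_HEADS // N_KV_HEADS  # 4 query heads per kv-head
    return query_head // group_size

# Under these assumptions, every kv-head serves exactly 4 query heads.
groups = [kv_head_for(q) for q in range(N_QUERY_HEADS)]

# The stated vocabulary size: 2**17 = 131,072 tokens, i.e. ~128k.
VOCAB_SIZE = 2 ** 17
```

Sharing each kv-head across a group of query heads is what shrinks the kv-cache relative to standard multi-head attention, which matters for the 128k context window.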

### Evaluation Results

**Main Benchmarks**
- HellaSwag (0-shot): 83.5%
- Winogrande (0-shot): 76.8%
- OpenBookQA (0-shot): 60.6%
- TriviaQA (5-shot): 73.8%
- NaturalQuestions (5-shot): 31.2%

**Multilingual Benchmarks**

Multilingual MMLU in 5-shot setting:
- French: 62.3%
- German: 62.7%
- Spanish: 64.6%
- Portuguese: 63.3%
- Russian: 59.2%
- Chinese: 59.0%
- Japanese: 59.0%
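Averaging the seven per-language scores listed above gives a rough overall multilingual figure of about 61.4%; this average is derived here for illustration and is not a number reported in the card:

```python
# Mean of the multilingual MMLU (5-shot) scores listed in this card.
scores = {
    "French": 62.3,
    "German": 62.7,
    "Spanish": 64.6,
    "Portuguese": 63.3,
    "Russian": 59.2,
    "Chinese": 59.0,
    "Japanese": 59.0,
}
mean_score = sum(scores.values()) / len(scores)  # ~61.4
```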