---
license: apache-2.0
tags:
- nvidia
---

## Mistral-NeMo-12B-Base

### Model Overview

Mistral-NeMo-12B-Base is a Large Language Model (LLM) composed of 12B parameters, trained jointly by NVIDIA and Mistral AI. It significantly outperforms existing models of a smaller or similar size.

**Key features**
- Released under the Apache 2 License
- Trained with a 128k context window
- Trained on a large proportion of multilingual and code data

### Intended use

Mistral-NeMo-12B-Base is a completion model intended for use in more than 80 programming languages and designed for global, multilingual applications. It is fast, trained on function-calling data, has a large context window, and is particularly strong in English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, and Hindi. It is compatible with the [NVIDIA NeMo Framework](https://docs.nvidia.com/nemo-framework/index.html). For best performance on a given task, users are encouraged to customize the model using the NeMo Framework suite of customization tools, including Parameter-Efficient Fine-Tuning (P-tuning, Adapters, LoRA, and more) and Model Alignment (SFT, SteerLM, RLHF, and more) using [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner). Refer to the [documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/nemotron/index.html) for examples.
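As a minimal usage sketch, plain-text completion could look like the following; the repository id and `transformers` compatibility are assumptions for illustration, not taken from this card (which only documents NeMo Framework compatibility):

```python
# Hypothetical completion sketch; the repository id below is an assumption.
def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Return a plain-text continuation (base model: no chat template)."""
    # Imported inside the function so the sketch stays importable
    # without `transformers` installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = "nvidia/Mistral-NeMo-12B-Base"  # hypothetical repository id
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```

Since this is a completion model, prompts should be written as text to be continued rather than as chat-style instructions.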

**Model Developer:** [NVIDIA](https://www.nvidia.com/en-us/) and [Mistral AI](https://mistral.ai/)

**Model Dates:** Mistral-NeMo-12B-Base was trained between 2023 and July 2024.

### Model Architecture

Mistral-NeMo-12B-Base is a transformer model, with the following architecture choices:
- Layers: 40
- Dim: 5,120
- Number of kv-heads: 8 (GQA)
- Rotary embeddings (theta = 1M)
- Vocabulary size: 2**17 ~= 128k

**Architecture Type:** Transformer Decoder (auto-regressive language model)
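To make the GQA and vocabulary entries above concrete, here is a small sketch of how kv-heads are shared across query heads; the query-head count (32) is an assumption for illustration, since this card does not state it:

```python
# Sketch of the grouped-query attention (GQA) head layout described above.
# The card lists 8 kv-heads; the query-head count below is an assumption.
N_QUERY_HEADS = 32   # assumed for illustration, not stated in this card
N_KV_HEADS = 8       # from the card

def kv_head_for(query_head: int) -> int:
    """Each group of consecutive query heads shares one kv-head."""
    group_size = N_QUERY_HEADS // N_KV_HEADS  # 4 query heads per kv-head
    return query_head // group_size

# Under these assumptions, every kv-head serves exactly 4 query heads.
groups = [kv_head_for(q) for q in range(N_QUERY_HEADS)]

# The stated vocabulary size: 2**17 = 131,072 tokens, i.e. ~128k.
VOCAB_SIZE = 2 ** 17
```

Sharing each kv-head across a group of query heads is what shrinks the kv-cache relative to standard multi-head attention, which matters for the 128k context window.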

### Evaluation Results

**Main Benchmarks**
- HellaSwag (0-shot): 83.5%
- Winogrande (0-shot): 76.8%
- OpenBookQA (0-shot): 60.6%
- TriviaQA (5-shot): 73.8%
- NaturalQuestions (5-shot): 31.2%

**Multilingual Benchmarks**

Multilingual MMLU in 5-shot setting:
- French: 62.3%
- German: 62.7%
- Spanish: 64.6%
- Portuguese: 63.3%
- Russian: 59.2%
- Chinese: 59.0%
- Japanese: 59.0%
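Averaging the seven per-language scores listed above gives a rough overall multilingual figure of about 61.4%; this average is derived here for illustration and is not a number reported in the card:

```python
# Mean of the multilingual MMLU (5-shot) scores listed in this card.
scores = {
    "French": 62.3,
    "German": 62.7,
    "Spanish": 64.6,
    "Portuguese": 63.3,
    "Russian": 59.2,
    "Chinese": 59.0,
    "Japanese": 59.0,
}
mean_score = sum(scores.values()) / len(scores)  # ~61.4
```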