Commit 6050da2 (verified) · legolasyiu committed · 1 parent: 4ad722e

Update README.md

Files changed (1): README.md (+71, −0)
- unsloth
- mistral
- trl
datasets:
- meta-math/MetaMathQA
---

# Uploaded model

- **Developed by:** EpistemeAI
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Mistral-Nemo-Base-2407-bnb-4bit

This Mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

# Fireball-MathMistral-Nemo-Base-2407

This model is fine-tuned to provide better math responses than Mistral-Nemo-Base-2407.

## Training Dataset
Supervised fine-tuning on the meta-math/MetaMathQA dataset.

# Model Card for Mistral-Nemo-Base-2407

The Fireball-MathMistral-Nemo-Base-2407 Large Language Model (LLM) is a pretrained generative text model with 12B parameters that significantly outperforms existing models of smaller or similar size.

For more details about this model, please refer to the Mistral AI release [blog post](https://mistral.ai/news/mistral-nemo/).

## Key features
- Released under the **Apache 2 License**
- Trained with a **128k context window**
- Trained on a large proportion of **multilingual and code data**
- Drop-in replacement for Mistral 7B

## Model Architecture
Mistral Nemo is a transformer model with the following architecture choices:
- **Layers:** 40
- **Dim:** 5,120
- **Head dim:** 128
- **Hidden dim:** 14,336
- **Activation function:** SwiGLU
- **Number of heads:** 32
- **Number of kv-heads:** 8 (GQA)
- **Vocabulary size:** 2^17 ≈ 128k
- **Rotary embeddings** (theta = 1M)
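As a quick sanity check, the figures above roughly account for the stated 12B parameters. The sketch below is illustrative only; it assumes an untied output head and the 14,336 intermediate size from the released config, so the exact count may differ slightly:

```python
# Rough parameter count from the architecture table above (illustrative, not
# an official figure from this model card).
dim, hidden_dim, n_layers = 5120, 14336, 40
n_heads, n_kv_heads, head_dim = 32, 8, 128
vocab = 2**17  # 131,072 tokens

attn = dim * n_heads * head_dim           # Wq
attn += 2 * dim * n_kv_heads * head_dim   # Wk, Wv (GQA: only 8 kv-heads)
attn += n_heads * head_dim * dim          # Wo
mlp = 3 * dim * hidden_dim                # SwiGLU uses three projections
per_layer = attn + mlp
embeddings = 2 * vocab * dim              # input embedding + output head
total = n_layers * per_layer + embeddings
print(f"~{total / 1e9:.1f}B parameters")
```

This lands at roughly 12.2B, consistent with the "12B parameters" claim.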

#### Demo

After installing `mistral_inference`, a `mistral-demo` CLI command should be available in your environment.

### Transformers

> [!IMPORTANT]
> NOTE: Until a new release has been made, you need to install `transformers` from source:
> ```sh
> pip install git+https://github.com/huggingface/transformers.git
> ```

If you want to use Hugging Face `transformers` to generate text, you can do something like this:

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EpistemeAI/Fireball-MathMistral-Nemo-Base-2407"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

> [!TIP]
> Unlike previous Mistral models, Mistral Nemo requires smaller temperatures. We recommend using a temperature of 0.3.
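To see why a lower temperature helps, recall that logits are divided by the temperature before the softmax, so a temperature of 0.3 sharpens the distribution toward the top token. A toy illustration with made-up logits (not the model's actual outputs):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: logits are divided by T before normalizing."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]           # hypothetical next-token logits
p_default = softmax(logits, 1.0)   # broader distribution at T=1.0
p_cool = softmax(logits, 0.3)      # top token dominates at T=0.3
print(p_default[0], p_cool[0])
```

The top token's probability rises sharply at T=0.3, which is why lower temperatures make sampling more deterministic.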

## Note

`Mistral-Nemo-Base-2407` is a pretrained base model and therefore does not have any moderation mechanisms.