Update README.md
README.md
CHANGED
@@ -19,6 +19,13 @@ This is NanoLM-70M-Instruct-v1. The model currently supports **English only**.
 
 ## Model Details
 
+| Nano LMs | Non-emb Params | Arch | Layers | Dim | Heads | Seq Len |
+| :----------: | :------------------: | :---: | :----: | :-------: | :---: | :---: |
+| 25M | 15M | MistralForCausalLM | 12 | 312 | 12 |2K|
+| **70M** | **42M** | **LlamaForCausalLM** | **12** | **576** | **9** | **2K** |
+| 0.3B | 180M | Qwen2ForCausalLM | 12 | 896 | 14 |4K|
+| 1B | 840M | Qwen2ForCausalLM | 18 | 1536 | 12 |4K|
+
 The tokenizer and model architecture of NanoLM-70M-Instruct-v1 are the same as [SmolLM-135M](https://huggingface.co/HuggingFaceTB/SmolLM-135M), but the number of layers has been reduced from 30 to 12.
 
 Essentially, it is a pure LLaMA architecture, specifically LlamaForCausalLM.
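As a rough sanity check on the 70M row, the sketch below estimates the parameter count of a LLaMA-style decoder from its shape. The layer count, hidden size, and head count come from the table. The key/value head count (3), the feed-forward size (1536), the vocabulary size (49152), and the tied input/output embeddings are assumptions taken from the SmolLM-135M configuration and are not stated in this change. `LlamaShape` and the two helper functions are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class LlamaShape:
    """Shape hyperparameters of a LLaMA-style decoder (hypothetical helper)."""
    layers: int
    dim: int
    heads: int
    kv_heads: int   # assumption: grouped-query attention, as in SmolLM-135M
    ffn_dim: int    # assumption: SwiGLU intermediate size, as in SmolLM-135M
    vocab: int      # assumption: SmolLM tokenizer vocabulary size


def non_embedding_params(s: LlamaShape) -> int:
    """Count weights outside the token embedding, assuming no bias terms."""
    head_dim = s.dim // s.heads
    kv_dim = s.kv_heads * head_dim
    attn = 2 * s.dim * s.dim + 2 * s.dim * kv_dim  # q_proj, o_proj + k_proj, v_proj
    mlp = 3 * s.dim * s.ffn_dim                    # gate_proj, up_proj, down_proj
    norms = 2 * s.dim                              # two RMSNorm weights per layer
    return s.layers * (attn + mlp + norms) + s.dim  # plus the final RMSNorm


def embedding_params(s: LlamaShape) -> int:
    """Token embedding size, assuming it is tied with the output head."""
    return s.vocab * s.dim


# Layers, Dim, and Heads are from the 70M row; the rest are assumptions.
nano_70m = LlamaShape(layers=12, dim=576, heads=9,
                      kv_heads=3, ffn_dim=1536, vocab=49152)

non_emb = non_embedding_params(nano_70m)
total = non_emb + embedding_params(nano_70m)
print(f"non-embedding: {non_emb:,}")  # non-embedding: 42,481,728
print(f"total:         {total:,}")    # total:         70,793,280
```

Under these assumptions the estimate lands near the 42M non-embedding and 70M total figures in the table, and setting `layers=30` gives roughly 135M, consistent with the description of the model as SmolLM-135M reduced to 12 layers.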