Add files using upload-large-folder tool
- .gitattributes +1 -0
- README.md +165 -10
- chat_template.jinja +16 -0
- config.json +68 -0
- generation_config.json +4 -0
- merges.txt +0 -0
- model-00001-of-00003.safetensors +3 -0
- model-00002-of-00003.safetensors +3 -0
- model-00003-of-00003.safetensors +3 -0
- model.safetensors.index.json +363 -0
- olmo-base.png +3 -0
- olmo3-logo +1 -0
- special_tokens_map.json +11 -0
- tokenizer.json +0 -0
- tokenizer_config.json +189 -0
- vocab.json +0 -0
.gitattributes
CHANGED
|
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
|
|
|
|
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
olmo-base.png filter=lfs diff=lfs merge=lfs -text
|
README.md
CHANGED
|
@@ -1,16 +1,171 @@
|
|
| 1 |
---
|
| 2 |
-
license:
|
| 3 |
-
|
| 4 |
-
-
|
| 5 |
-
- text-generation
|
| 6 |
library_name: transformers
|
|
|
|
|
|
|
| 7 |
---
|
| 8 |
|
| 9 |
-
#
|
|
|
|
| 10 |
|
| 11 |
-
Auto-uploaded from local output (MergeBench and LlamaFactory excluded).
|
| 12 |
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 1 |
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
language:
|
| 4 |
+
- en
|
|
|
|
| 5 |
library_name: transformers
|
| 6 |
+
datasets:
|
| 7 |
+
- allenai/dolma3_mix-6T-1025
|
| 8 |
---
|
| 9 |
|
| 10 |
+
## Model Details
|
| 11 |
+
<img alt="Logo for Olmo 3 7B Base model" src="olmo-base.png" width="211px" style="margin-left:auto; margin-right:auto; display:block">
|
| 12 |
|
|
|
|
| 13 |
|
| 14 |
+
# Model Card for Olmo 3 7B
|
| 15 |
+
|
| 16 |
+
We introduce Olmo 3, a new family of 7B and 32B models. This suite includes Base, Instruct, and Think variants. The Base models were trained using a staged training approach.
|
| 17 |
+
|
| 18 |
+
Olmo is a series of **O**pen **l**anguage **mo**dels designed to enable the science of language models.
|
| 19 |
+
These models are trained on the Dolma 3 dataset. We are releasing all code, checkpoints, and associated training details.
|
| 20 |
+
|
| 21 |
+
| Size | Training Tokens | Layers | Hidden Size | Q Heads | KV Heads | Context Length |
|
| 22 |
+
|--------|-----------------|--------|-------------|---------|----------|----------------|
|
| 23 |
+
| [OLMo 3 7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | 5.93 Trillion | 32 | 4096 | 32 | 32 | 65,536 |
|
| 24 |
+
| [OLMo 3 32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | 5.50 Trillion | 64 | 5120 | 40 | 8 | 65,536 |
|
| 25 |
+
|
| 26 |
+
|
| 27 |
+
The core models released in this batch include the following:
|
| 28 |
+
|
| 29 |
+
| **Stage** | **Olmo 3 7B Think** | **Olmo 3 32B Think** | **Olmo 3 7B Instruct** |
|
| 30 |
+
|--------------------------|-----------------------|------------------------|---------------------------|
|
| 31 |
+
| **Base Model** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) |
|
| 32 |
+
| **SFT** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) |
|
| 33 |
+
| **DPO** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) |
|
| 34 |
+
| **Final Models (RLVR)** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) |
|
| 35 |
+
|
| 36 |
+
|
| 37 |
+
## Installation
|
| 38 |
+
|
| 39 |
+
Olmo 3 is supported in transformers v4.57.0 or higher:
|
| 40 |
+
```bash
|
| 41 |
+
pip install "transformers>=4.57.0"
|
| 42 |
+
```
|
| 43 |
+
|
| 44 |
+
## Inference
|
| 45 |
+
|
| 46 |
+
You can use Olmo with the standard Hugging Face transformers library:
|
| 47 |
+
```python
|
| 48 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 49 |
+
olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3-1025-7B")
|
| 50 |
+
tokenizer = AutoTokenizer.from_pretrained("allenai/Olmo-3-1025-7B")
|
| 51 |
+
message = ["Language modeling is "]
|
| 52 |
+
inputs = tokenizer(message, return_tensors='pt', return_token_type_ids=False)
|
| 53 |
+
# Optional: move the inputs and the model to CUDA
|
| 54 |
+
# inputs = {k: v.to('cuda') for k,v in inputs.items()}
|
| 55 |
+
# olmo = olmo.to('cuda')
|
| 56 |
+
response = olmo.generate(**inputs, max_new_tokens=100, do_sample=True, top_k=0, temperature=1.0, top_p=0.7)
|
| 57 |
+
print(tokenizer.batch_decode(response, skip_special_tokens=True)[0])
|
| 58 |
+
# >> 'Language modeling is a key component of any text-based application, but its effectiveness...'
|
| 59 |
+
```
|
| 60 |
+
|
| 61 |
+
For faster performance, you can quantize the model using the following method:
|
| 62 |
+
```python
|
| 63 |
+
import torch
from transformers import AutoModelForCausalLM
olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3-1025-7B",
|
| 64 |
+
torch_dtype=torch.float16,
|
| 65 |
+
load_in_8bit=True) # Requires bitsandbytes
|
| 66 |
+
```
|
| 67 |
+
The quantized model is more sensitive to data types and CUDA operations. To avoid potential issues, it's recommended to pass the inputs directly to CUDA using:
|
| 68 |
+
```python
|
| 69 |
+
inputs = inputs.to('cuda')  # .to() returns the moved batch, so keep the result
|
| 70 |
+
```
|
| 71 |
+
|
| 72 |
+
We have released checkpoints for these models. For pretraining, the naming convention is `stage1-stepXXX`. The conventions for midtraining and long context are `stage2-stepXXX` and `stage3-stepXXX`, respectively.
|
| 73 |
+
|
| 74 |
+
|
| 75 |
+
To load a specific model revision with HuggingFace, simply add the argument `revision`:
|
| 76 |
+
```python
|
| 77 |
+
olmo = AutoModelForCausalLM.from_pretrained("allenai/Olmo-3-1025-7B", revision="stage1-step10000")
|
| 78 |
+
```
|
| 79 |
+
|
| 80 |
+
Or, you can access all the revisions for the models via the following code snippet:
|
| 81 |
+
```python
|
| 82 |
+
from huggingface_hub import list_repo_refs
|
| 83 |
+
out = list_repo_refs("allenai/Olmo-3-1025-7B")
|
| 84 |
+
branches = [b.name for b in out.branches]
|
| 85 |
+
```
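The branch names returned above can be grouped by training stage. The following is a minimal sketch that assumes every checkpoint branch follows the `stageN-stepXXX` convention described earlier; the helper name `group_by_stage` and the example branch list (apart from `stage1-step10000`) are hypothetical and used only for illustration.

```python
import re
from collections import defaultdict

# Assumed pattern, derived from the `stage1-stepXXX` naming convention.
STAGE_RE = re.compile(r"^stage(\d+)-step(\d+)$")


def group_by_stage(branch_names):
    """Return {stage: [steps sorted ascending]} for names matching the convention."""
    grouped = defaultdict(list)
    for name in branch_names:
        match = STAGE_RE.match(name)
        if match:  # skip `main` and any name that does not follow the convention
            grouped[int(match.group(1))].append(int(match.group(2)))
    return {stage: sorted(steps) for stage, steps in sorted(grouped.items())}


# Hypothetical branch names; in practice, pass the `branches` list built above.
example = ["main", "stage1-step10000", "stage1-step2000", "stage2-step500"]
print(group_by_stage(example))
```

Sorting the steps numerically avoids the lexicographic ordering problem where `step10000` would sort before `step2000`.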
|
| 86 |
+
|
| 87 |
+
### Fine-tuning
|
| 88 |
+
Model fine-tuning can be done from the final checkpoint (the `main` revision of this model) or from one of the many intermediate checkpoints. Two recipes for tuning are available.
|
| 89 |
+
1. Fine-tune with the OLMo-core repository:
|
| 90 |
+
```bash
|
| 91 |
+
torchrun --nproc-per-node=8 ./src/scripts/official/OLMo3/OLMo-3-1025-7B-pretrain-1.py run01
|
| 92 |
+
```
|
| 93 |
+
You can override most configuration options from the command-line. For example, to override the learning rate you could launch the script like this:
|
| 94 |
+
|
| 95 |
+
```bash
|
| 96 |
+
torchrun --nproc-per-node=8 ./src/scripts/official/OLMo3/OLMo-3-1025-7B-pretrain-1.py run01 --train_module.optim.lr=3e-4
|
| 97 |
+
```
|
| 98 |
+
For more documentation, see the [GitHub readme](https://github.com/allenai/OLMo-core).
|
| 99 |
+
|
| 100 |
+
### Model Description
|
| 101 |
+
|
| 102 |
+
- **Developed by:** Allen Institute for AI (Ai2)
|
| 103 |
+
- **Model type:** a Transformer style autoregressive language model.
|
| 104 |
+
- **Language(s) (NLP):** English
|
| 105 |
+
- **License:** The code and model are released under Apache 2.0.
|
| 106 |
+
- **Contact:** Technical inquiries: `olmo@allenai.org`. Press: `press@allenai.org`
|
| 107 |
+
- **Date cutoff:** Dec 2024
|
| 108 |
+
|
| 109 |
+
|
| 110 |
+
### Model Sources
|
| 111 |
+
|
| 112 |
+
- **Project Page:** https://allenai.org/olmo
|
| 113 |
+
- **Repositories:**
|
| 114 |
+
- Core repo (training, inference, fine-tuning etc.): https://github.com/allenai/OLMo-core
|
| 115 |
+
- Evaluation code: https://github.com/allenai/OLMo-Eval
|
| 116 |
+
- Further fine-tuning code: https://github.com/allenai/open-instruct
|
| 117 |
+
- **W&B Report:** https://wandb.ai/ai2-llm/Olmo-3-1025-7B/reports/Olmo-3-7B-October-2025--VmlldzoxNDcwOTM0NA
|
| 118 |
+
- **Paper:** https://allenai.org/papers/olmo3
|
| 119 |
+
<!-- - **Technical blog post:** (URL) -->
|
| 120 |
+
|
| 121 |
+
|
| 122 |
+
## Evaluation
|
| 123 |
+
Core model results for Olmo 3 7B are found below.
|
| 124 |
+
|
| 125 |
+
| Model | Olmo 3-Eval Math | BigCodeBench | HumanEval | DeepSeek LeetCode | DS 1000 | MBPP | MultiPL HumanEval | MultiPL MBPP | Olmo 3-Eval Code | ARC MC | MMLU STEM | MedMCQA MC | MedQA MC | SciQ MC | Olmo 3-Eval MC_STEM | MMLU Humanities | MMLU Social Sci. | MMLU Other | CSQA MC | PIQA MC | SocialIQA MC | CoQA Gen2MC MC | DROP Gen2MC MC | Jeopardy Gen2MC MC | NaturalQs Gen2MC MC | SQuAD Gen2MC MC | Olmo 3-Eval MC_Non-STEM | HellaSwag RC | Winogrande RC | Lambada | Basic Skills | DROP | Jeopardy | NaturalQs | SQuAD | CoQA | Olmo 3-Eval GenQA | BBH | MMLU Pro MC | Deepmind Math | LBPP |
|
| 126 |
+
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
| 127 |
+
| **Open-weight Models** | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
|
| 128 |
+
| Marin-8B | 39.6 | 21.5 | 31.6 | 0.5 | 16.5 | 36.5 | 15.6 | 27.6 | 21.4 | 89.2 | 58.1 | 52.7 | 47.3 | 93.2 | 68.1 | 71.4 | 77.4 | 68.3 | 75.3 | 85.7 | 79.8 | 86.2 | 63.7 | 90.8 | 71.5 | 96.5 | 78.8 | 84.0 | 88.6 | 73.9 | 85.6 | 73.0 | 72.7 | 42.6 | 93.4 | 69.5 | 75.9 | 55.6 | 38.8 | 20.2 | 5.8 |
|
| 129 |
+
| Apertus-8B | 29.2 | 20.9 | 21.6 | 0.6 | 11.8 | 33.5 | 15.5 | 29.2 | 19.0 | 87.9 | 52.4 | 51.7 | 47.6 | 91.9 | 66.3 | 67.8 | 74.7 | 66.1 | 72.1 | 80.5 | 76.3 | 82.8 | 47.5 | 90.3 | 66.7 | 91.3 | 74.2 | 81.0 | 85.8 | 70.9 | 83.8 | 37.1 | 70.1 | 35.0 | 89.6 | 67.4 | 69.0 | 48.1 | 33.9 | 17.1 | 7.1 |
|
| 130 |
+
| OLMo 2-7B | 41.7 | 8.8 | 16.3 | 0.2 | 10.1 | 21.2 | 4.2 | 12.2 | 10.4 | 85.7 | 53.2 | 49.2 | 43.8 | 90.9 | 64.6 | 67.9 | 73.1 | 65.2 | 72.0 | 80.1 | 77.5 | 85.0 | 55.6 | 89.5 | 66.3 | 95.3 | 75.2 | 82.2 | 87.4 | 70.5 | 82.2 | 61.5 | 70.8 | 37.4 | 91.5 | 68.3 | 72.4 | 49.6 | 33.1 | 16.3 | 3.1 |
|
| 131 |
+
| Qwen3-8B | 67.2 | 42.5 | 71.7 | 8.3 | 33.1 | 66.2 | 52.3 | 48.4 | 46.1 | 95.4 | 76.7 | 63.5 | 62.1 | 96.1 | 78.8 | 78.6 | 84.8 | 76.8 | 84.1 | 89.9 | 83.3 | 93.7 | 78.3 | 92.3 | 74.1 | 97.5 | 84.8 | 80.5 | 86.4 | 73.0 | 93.5 | 57.2 | 65.1 | 33.8 | 89.2 | 61.6 | 71.1 | 76.5 | 50.3 | 47.7 | 25.7 |
|
| 132 |
+
| Nemotron MiniD 8B | 49.8 | 43.2 | 71.7 | 6.8 | 30.3 | 62.3 | 40.0 | 47.5 | 43.1 | 94.1 | 71.1 | 54.5 | 53.5 | 94.3 | 73.5 | 78.0 | 82.2 | 73.8 | 74.4 | 86.0 | 78.7 | 92.2 | 70.0 | 90.7 | 71.1 | 97.4 | 81.3 | 80.2 | 86.2 | 67.9 | 91.4 | 71.4 | 64.9 | 31.2 | 92.3 | 60.4 | 71.8 | 77.0 | 50.2 | 31.4 | 31.7 |
|
| 133 |
+
| Gemma-2-9B | 48.8 | 30.9 | 40.0 | 1.9 | 28.4 | 49.1 | 27.9 | 38.2 | 30.2 | 92.7 | 62.8 | 58.9 | 55.4 | 94.4 | 72.8 | 74.5 | 82.9 | 74.2 | 75.3 | 85.7 | 80.3 | 92.7 | 65.8 | 92.8 | 72.5 | 97.3 | 81.3 | 81.8 | 88.8 | 76.3 | 89.3 | 68.2 | 75.1 | 40.4 | 88.8 | 71.5 | 75.6 | 68.8 | 44.7 | 23.0 | 12.4 |
|
| 134 |
+
| Qwen-2.5-7B | 60.7 | 39.7 | 66.1 | 5.1 | 35.2 | 55.4 | 40.3 | 45.4 | 41.0 | 93.4 | 67.6 | 60.3 | 56.6 | 95.4 | 74.7 | 76.2 | 83.0 | 74.4 | 85.0 | 88.5 | 82.9 | 93.5 | 69.1 | 92.1 | 70.5 | 96.4 | 82.9 | 81.0 | 86.0 | 70.3 | 91.4 | 56.7 | 63.0 | 31.2 | 87.0 | 40.5 | 67.5 | 54.7 | 48.1 | 32.8 | 22.1 |
|
| 135 |
+
| Llama-3.1-8B | 36.9 | 30.7 | 40.4 | 0.1 | 22.2 | 12.1 | 14.5 | 28.3 | 21.2 | 86.4 | 55.7 | 56.5 | 53.7 | 92.7 | 69.0 | 70.1 | 75.5 | 69.1 | 72.9 | 78.3 | 77.0 | 89.9 | 53.3 | 88.9 | 68.0 | 94.4 | 76.1 | 81.5 | 87.3 | 75.5 | 88.0 | 59.5 | 70.9 | 36.7 | 89.2 | 69.0 | 73.1 | 63.0 | 37.4 | 24.1 | 9.1 |
|
| 136 |
+
| Granite-3.3-8B | 41.5 | 0.4 | 0.0 | 0.0 | 22.6 | 48.5 | 22.3 | 32.3 | 18.0 | 86.2 | 55.6 | 49.6 | 43.0 | 90.8 | 65.0 | 67.6 | 71.8 | 64.5 | 82.3 | 81.5 | 83.1 | 87.6 | 55.0 | 88.4 | 69.2 | 94.5 | 76.9 | 83.7 | 89.4 | 76.0 | 88.7 | 38.4 | 69.7 | 37.0 | 89.6 | 37.8 | 67.8 | 61.5 | 33.9 | 32.2 | 18.5 |
|
| 137 |
+
| MiMo-7B | 54.3 | 38.3 | 57.0 | 1.2 | 28.1 | 48.3 | 34.5 | 42.5 | 35.7 | 91.7 | 63.5 | 56.2 | 53.0 | 93.5 | 71.6 | 73.6 | 80.8 | 72.7 | 76.1 | 87.2 | 80.7 | 91.4 | 64.1 | 89.5 | 72.2 | 96.7 | 80.5 | 80.6 | 86.5 | 73.1 | 89.7 | 69.3 | 65.6 | 33.1 | 90.3 | 54.4 | 71.4 | 75.1 | 44.3 | 25.4 | 21.5 |
|
| 138 |
+
| **Olmo 3 7B** | 54.7 | 34.1 | 49.1 | 1.4 | 20.2 | 43.6 | 28.7 | 38.2 | 30.7 | 89.2 | 59.7 | 48.3 | 41.8 | 92.8 | 66.4 | 68.9 | 75.0 | 66.9 | 75.3 | 80.2 | 80.3 | 92.5 | 67.3 | 86.9 | 69.4 | 96.9 | 78.2 | 77.7 | 85.7 | 68.9 | 89.5 | 71.5 | 60.4 | 32.6 | 93.5 | 72.8 | 72.5 | 63.5 | 37.3 | 23.7 | 17.1 |
|
| 139 |
+
## Model Details
|
| 140 |
+
|
| 141 |
+
#### Stage 1: Initial Pretraining
|
| 142 |
+
- Dataset: [dolma3_6T-mix-1025](https://huggingface.co/datasets/allenai/dolma3_mix-6T-1025)
|
| 143 |
+
- 5.93T tokens
|
| 144 |
+
- Coverage: 97.53%+ of total pretraining budget
|
| 145 |
+
|
| 146 |
+
#### Stage 2: Mid-training
|
| 147 |
+
- Dataset: [dolma3-dolmino-mix-1025](https://huggingface.co/datasets/allenai/dolma3_dolmino_mix-100B-1025)
|
| 148 |
+
- 100B tokens
|
| 149 |
+
- Mix composition: 20% code, 28% web pages, 19% math, 14% QA, 8% thinking, 6% instruction, and 5% PDFs
|
| 150 |
+
- Note: We also include the three checkpoints reported in Table 7 of the Olmo 3 paper showing domain tradeoffs: Gen-QA Mix, Math-code-thinking mix, and Round 5 (final) mix. These experiments were run earlier in the Stage 1 pretraining process, and therefore represent training on 100B midtraining tokens starting from a Stage 1 checkpoint that had been trained to 2T tokens.
|
| 151 |
+
|
| 152 |
+
#### Stage 3: Long Context
|
| 153 |
+
- Dataset: [dolma3-longmino-mix-1025](https://huggingface.co/datasets/allenai/dolma3_longmino_mix-50B-1025)
|
| 154 |
+
- 50B tokens
|
| 155 |
+
- Mix composition: 66% midtraining data, 34% PDFs
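The token budgets listed for the three stages can be cross-checked with a few lines of arithmetic. This is a sketch under the assumption that the "total pretraining budget" is the sum of the three stage budgets stated above (5.93T + 100B + 50B); the variable names are illustrative.

```python
# Token counts as stated in this model card for the 7B model.
stage_tokens = {
    "stage1_pretraining": 5_930_000_000_000,
    "stage2_midtraining": 100_000_000_000,
    "stage3_long_context": 50_000_000_000,
}

total_tokens = sum(stage_tokens.values())
shares = {name: tokens / total_tokens for name, tokens in stage_tokens.items()}
for name, share in shares.items():
    print(f"{name}: {share:.2%}")

# Mid-training mix composition (percent), as listed under Stage 2.
midtraining_mix = {
    "code": 20, "web pages": 28, "math": 19, "QA": 14,
    "thinking": 8, "instruction": 6, "PDFs": 5,
}
mix_total = sum(midtraining_mix.values())

# Approximate tokens per domain, using integer arithmetic to avoid rounding noise.
tokens_per_domain = {
    domain: pct * stage_tokens["stage2_midtraining"] // 100
    for domain, pct in midtraining_mix.items()
}
print(mix_total, tokens_per_domain["code"])
```

Under this assumption, Stage 1 accounts for roughly 97.53% of all tokens, which agrees with the coverage figure given for Stage 1, and the mid-training percentages sum to 100.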
|
| 156 |
+
|
| 157 |
+
#### Model Merging
|
| 158 |
+
- 7B Model: No merging
|
| 159 |
+
- 32B Model: Two versions trained on the 100B mix were merged before starting the long-context run. The final checkpoint is a merge of 4 final checkpoints.
|
| 160 |
+
|
| 161 |
+
## Bias, Risks, and Limitations
|
| 162 |
+
Like any base language model or fine-tuned model without safety filtering, these models can easily be prompted by users to generate harmful and sensitive content. Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. Additionally, statements from Olmo or any LLM are often inaccurate, so facts should be verified.
|
| 163 |
+
|
| 164 |
+
## License
|
| 165 |
+
This model is licensed under Apache 2.0. It is intended for research and educational use in accordance with [Ai2's Responsible Use Guidelines](https://allenai.org/responsible-use).
|
| 166 |
+
|
| 167 |
+
## Citation
|
| 168 |
+
Find the paper at: https://allenai.org/papers/olmo3
|
| 169 |
+
|
| 170 |
+
## Model Card Contact
|
| 171 |
+
For errors in this model card, contact `olmo@allenai.org`.
|
chat_template.jinja
ADDED
|
@@ -0,0 +1,16 @@
|
| 1 |
+
{%- set has_system = messages|selectattr('role', 'equalto', 'system')|list|length > 0 -%}{%- if not has_system -%}{{- '<|im_start|>system
|
| 2 |
+
You are a helpful function-calling AI assistant. ' -}}{%- if tools is none -%}{{- 'You do not currently have access to any functions. <functions></functions><|im_end|>
|
| 3 |
+
' -}}{%- else -%}{{- 'You are provided with function signatures within <functions></functions> XML tags. You may call one or more functions to assist with the user query. Output any function calls within <function_calls></function_calls> XML tags. Do not make assumptions about what values to plug into functions.' -}}{{- '<functions>' -}}{{- tools | tojson -}}{{- '</functions><|im_end|>
|
| 4 |
+
' -}}{%- endif -%}{%- endif -%}{%- for message in messages -%}{%- if message['role'] == 'system' -%}{{- '<|im_start|>system
|
| 5 |
+
' + message['content'] -}}{%- if tools is not none -%}{{- '<functions>' -}}{{- tools | tojson -}}{{- '</functions>' -}}{%- elif message.get('functions', none) is not none -%}{{- ' <functions>' + message['functions'] + '</functions>' -}}{%- endif -%}{{- '<|im_end|>
|
| 6 |
+
' -}}{%- elif message['role'] == 'user' -%}{{- '<|im_start|>user
|
| 7 |
+
' + message['content'] + '<|im_end|>
|
| 8 |
+
' -}}{%- elif message['role'] == 'assistant' -%}{{- '<|im_start|>assistant
|
| 9 |
+
' -}}{%- if message.get('content', none) is not none -%}{{- message['content'] -}}{%- endif -%}{%- if message.get('function_calls', none) is not none -%}{{- '<function_calls>' + message['function_calls'] + '</function_calls>' -}}{% elif message.get('tool_calls', none) is not none %}{{- '<function_calls>' -}}{%- for tool_call in message['tool_calls'] %}{%- if tool_call is mapping and tool_call.get('function', none) is not none %}{%- set args = tool_call['function']['arguments'] -%}{%- set ns = namespace(arguments_list=[]) -%}{%- for key, value in args.items() -%}{%- set ns.arguments_list = ns.arguments_list + [key ~ '=' ~ (value | tojson)] -%}{%- endfor -%}{%- set arguments = ns.arguments_list | join(', ') -%}{{- tool_call['function']['name'] + '(' + arguments + ')' -}}{%- if not loop.last -%}{{ '
|
| 10 |
+
' }}{%- endif -%}{% else %}{{- tool_call -}}{%- endif %}{%- endfor %}{{- '</function_calls>' -}}{%- endif -%}{%- if not loop.last -%}{{- '<|im_end|>' + '
|
| 11 |
+
' -}}{%- else -%}{{- eos_token -}}{%- endif -%}{%- elif message['role'] == 'environment' -%}{{- '<|im_start|>environment
|
| 12 |
+
' + message['content'] + '<|im_end|>
|
| 13 |
+
' -}}{%- elif message['role'] == 'tool' -%}{{- '<|im_start|>environment
|
| 14 |
+
' + message['content'] + '<|im_end|>
|
| 15 |
+
' -}}{%- endif -%}{%- if loop.last and add_generation_prompt -%}{{- '<|im_start|>assistant
|
| 16 |
+
' -}}{%- endif -%}{%- endfor -%}
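To make the template's output easier to follow, here is a simplified pure-Python sketch of what it appears to render for plain text conversations. It assumes no tools are passed and handles only `system` and `user` turns; the function name `render_chat` is hypothetical, and the real Jinja template (applied through `tokenizer.apply_chat_template`) remains the reference behavior.

```python
def render_chat(messages, add_generation_prompt=True):
    """Approximate the template for system/user text turns when no tools are given."""
    parts = []
    # Without an explicit system message, the template injects a default one.
    if not any(m["role"] == "system" for m in messages):
        parts.append(
            "<|im_start|>system\n"
            "You are a helpful function-calling AI assistant. "
            "You do not currently have access to any functions. "
            "<functions></functions><|im_end|>\n"
        )
    for message in messages:
        role = message["role"]
        if role not in ("system", "user"):
            # Assistant, tool, and environment turns are out of scope for this sketch.
            raise ValueError(f"unsupported role in this sketch: {role}")
        parts.append(f"<|im_start|>{role}\n{message['content']}<|im_end|>\n")
    if messages and add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chat([{"role": "user", "content": "Who are you?"}]))
```

Note that the default system prompt describes a function-calling assistant even though this repository hosts a base model, so the template may matter mainly for downstream fine-tuning.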
|
config.json
ADDED
|
@@ -0,0 +1,68 @@
|
| 1 |
+
{
|
| 2 |
+
"architectures": [
|
| 3 |
+
"Olmo3ForCausalLM"
|
| 4 |
+
],
|
| 5 |
+
"attention_bias": false,
|
| 6 |
+
"attention_dropout": 0.0,
|
| 7 |
+
"dtype": "bfloat16",
|
| 8 |
+
"eos_token_id": 100257,
|
| 9 |
+
"hidden_act": "silu",
|
| 10 |
+
"hidden_size": 4096,
|
| 11 |
+
"initializer_range": 0.02,
|
| 12 |
+
"intermediate_size": 11008,
|
| 13 |
+
"layer_types": [
|
| 14 |
+
"sliding_attention",
|
| 15 |
+
"sliding_attention",
|
| 16 |
+
"sliding_attention",
|
| 17 |
+
"full_attention",
|
| 18 |
+
"sliding_attention",
|
| 19 |
+
"sliding_attention",
|
| 20 |
+
"sliding_attention",
|
| 21 |
+
"full_attention",
|
| 22 |
+
"sliding_attention",
|
| 23 |
+
"sliding_attention",
|
| 24 |
+
"sliding_attention",
|
| 25 |
+
"full_attention",
|
| 26 |
+
"sliding_attention",
|
| 27 |
+
"sliding_attention",
|
| 28 |
+
"sliding_attention",
|
| 29 |
+
"full_attention",
|
| 30 |
+
"sliding_attention",
|
| 31 |
+
"sliding_attention",
|
| 32 |
+
"sliding_attention",
|
| 33 |
+
"full_attention",
|
| 34 |
+
"sliding_attention",
|
| 35 |
+
"sliding_attention",
|
| 36 |
+
"sliding_attention",
|
| 37 |
+
"full_attention",
|
| 38 |
+
"sliding_attention",
|
| 39 |
+
"sliding_attention",
|
| 40 |
+
"sliding_attention",
|
| 41 |
+
"full_attention",
|
| 42 |
+
"sliding_attention",
|
| 43 |
+
"sliding_attention",
|
| 44 |
+
"sliding_attention",
|
| 45 |
+
"full_attention"
|
| 46 |
+
],
|
| 47 |
+
"max_position_embeddings": 65536,
|
| 48 |
+
"model_type": "olmo3",
|
| 49 |
+
"num_attention_heads": 32,
|
| 50 |
+
"num_hidden_layers": 32,
|
| 51 |
+
"num_key_value_heads": 32,
|
| 52 |
+
"pad_token_id": 100277,
|
| 53 |
+
"rms_norm_eps": 1e-06,
|
| 54 |
+
"rope_scaling": {
|
| 55 |
+
"attention_factor": 1.2079441541679836,
|
| 56 |
+
"beta_fast": 32,
|
| 57 |
+
"beta_slow": 1,
|
| 58 |
+
"factor": 8.0,
|
| 59 |
+
"original_max_position_embeddings": 8192,
|
| 60 |
+
"rope_type": "yarn"
|
| 61 |
+
},
|
| 62 |
+
"rope_theta": 500000,
|
| 63 |
+
"sliding_window": 4096,
|
| 64 |
+
"tie_word_embeddings": false,
|
| 65 |
+
"transformers_version": "4.57.0",
|
| 66 |
+
"use_cache": true,
|
| 67 |
+
"vocab_size": 100278
|
| 68 |
+
}
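Several values in this config can be derived from one another. The sketch below checks them, assuming the common YaRN convention that the attention factor equals `0.1 * ln(factor) + 1`; that formula and the variable names are assumptions for illustration, not something stated in the config itself.

```python
import math

# Values copied from config.json above.
hidden_size = 4096
num_attention_heads = 32
num_hidden_layers = 32
max_position_embeddings = 65536
rope_factor = 8.0
original_max_position_embeddings = 8192
attention_factor = 1.2079441541679836

# Per-head dimension implied by the hidden size and head count.
head_dim = hidden_size // num_attention_heads

# The scaled context window should match max_position_embeddings.
extended_context = int(original_max_position_embeddings * rope_factor)

# Assumed YaRN attention scaling: 0.1 * ln(factor) + 1.
yarn_factor = 0.1 * math.log(rope_factor) + 1.0

# layer_types repeats three sliding-window layers followed by one full-attention layer.
layer_types = [
    "full_attention" if (i + 1) % 4 == 0 else "sliding_attention"
    for i in range(num_hidden_layers)
]

print(head_dim, extended_context, yarn_factor, layer_types.count("full_attention"))
```

With these numbers, 8 of the 32 layers use full attention, and the remaining 24 attend within the 4096-token sliding window.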
|
generation_config.json
ADDED
|
@@ -0,0 +1,4 @@
|
| 1 |
+
{
|
| 2 |
+
"_from_model_config": true,
|
| 3 |
+
"transformers_version": "4.57.0"
|
| 4 |
+
}
|
merges.txt
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
model-00001-of-00003.safetensors
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:0490d6668e613a29b23367e3a7aa9cc6aced3d162694445bb969ed7622b3c4e2
|
| 3 |
+
size 4969984976
|
model-00002-of-00003.safetensors
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:e127ea479fb6e208fe9d48d23b11212b5722f4873f6eef9c009b7a855866c641
|
| 3 |
+
size 4981161496
|
model-00003-of-00003.safetensors
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:f3ddff10052ffe5de5c6b4cad45c422c0d898acc6beb21b1b8531244adfb3c70
|
| 3 |
+
size 4644917240
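The three shard sizes can be reconciled with the parameter count reported in `model.safetensors.index.json`. This is a rough sketch: it assumes bias-free linear layers (`attention_bias` is false), hidden-sized `q_norm` and `k_norm` weights, and a single final norm of size `hidden_size`. The last two are assumptions inferred from the weight names, not values stated in this diff.

```python
# Values from config.json in this diff.
hidden, intermediate, layers, vocab = 4096, 11008, 32, 100278

embed = vocab * hidden                 # model.embed_tokens.weight
lm_head = vocab * hidden               # separate, since tie_word_embeddings is false
attention = 4 * hidden * hidden        # q, k, v, o projections (32 KV heads, so k/v are full size)
mlp = 3 * hidden * intermediate        # gate, up, down projections
norms = 4 * hidden                     # post_attention, post_feedforward, q_norm, k_norm (assumed shapes)
final_norm = hidden                    # assumed final model norm

total_params = embed + lm_head + layers * (attention + mlp + norms) + final_norm
bf16_bytes = total_params * 2          # dtype is bfloat16: 2 bytes per parameter

# LFS pointer sizes of the three shards listed above.
shard_sizes = [4969984976, 4981161496, 4644917240]
shard_overhead = sum(shard_sizes) - bf16_bytes  # remainder is file header/metadata

print(total_params, bf16_bytes, shard_overhead)
```

Under these assumptions the count reproduces the reported `total_parameters` (7298011136) and `total_size` (14596022272), with the shards together carrying a small amount of extra header data.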
|
model.safetensors.index.json
ADDED
|
@@ -0,0 +1,363 @@
|
| 1 |
+
{
|
| 2 |
+
"metadata": {
|
| 3 |
+
"total_parameters": 7298011136,
|
| 4 |
+
"total_size": 14596022272
|
| 5 |
+
},
|
| 6 |
+
"weight_map": {
|
| 7 |
+
"lm_head.weight": "model-00003-of-00003.safetensors",
|
| 8 |
+
"model.embed_tokens.weight": "model-00001-of-00003.safetensors",
|
| 9 |
+
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 10 |
+
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 11 |
+
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 12 |
+
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 13 |
+
"model.layers.0.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 14 |
+
"model.layers.0.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
|
| 15 |
+
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 16 |
+
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 17 |
+
"model.layers.0.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
|
| 18 |
+
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 19 |
+
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 20 |
+
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
|
| 21 |
+
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
|
| 22 |
+
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
|
| 23 |
+
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 24 |
+
"model.layers.1.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
|
| 25 |
+
"model.layers.1.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
|
| 26 |
+
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 27 |
+
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
|
| 28 |
+
"model.layers.1.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
|
| 29 |
+
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 30 |
+
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
|
| 31 |
+
"model.layers.10.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
|
| 32 |
+
"model.layers.10.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
|
| 33 |
+
"model.layers.10.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
|
| 34 |
+
"model.layers.10.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 35 |
+
"model.layers.10.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
|
| 36 |
+
"model.layers.10.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
|
| 37 |
+
"model.layers.10.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
|
| 38 |
+
"model.layers.10.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
|
| 39 |
+
"model.layers.10.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
|
| 40 |
+
"model.layers.10.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
|
| 41 |
+
"model.layers.10.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.11.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.20.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.20.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.mlp.down_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.mlp.up_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.post_attention_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.post_feedforward_layernorm.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.21.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.22.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.22.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.22.self_attn.k_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.self_attn.k_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.self_attn.o_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.self_attn.q_norm.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.self_attn.q_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.self_attn.v_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.23.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.23.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.23.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.23.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.23.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.3.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.3.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.3.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.30.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.mlp.down_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.mlp.gate_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.post_attention_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.post_feedforward_layernorm.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.self_attn.k_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.self_attn.o_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.self_attn.q_norm.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.31.self_attn.v_proj.weight": "model-00003-of-00003.safetensors",
    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.4.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.4.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.4.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.mlp.down_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.mlp.up_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.post_feedforward_layernorm.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.self_attn.k_norm.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.self_attn.q_norm.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00003.safetensors",
    "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00003.safetensors",
    "model.norm.weight": "model-00003-of-00003.safetensors"
  }
}
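A minimal sketch of how a mapping like the one above could be inspected, for readers who want to see which tensors land in which shard. It assumes the entries sit under a `weight_map` key of a JSON index file (the key and the file name are not visible in this part of the diff, so treat them as assumptions based on the usual sharded-safetensors layout). The helper name `tensors_per_shard` is hypothetical, and only the standard library is used.

```python
import json
from collections import defaultdict


def tensors_per_shard(index_text: str) -> dict[str, list[str]]:
    """Group tensor names by the shard file that stores them."""
    index = json.loads(index_text)
    # Assumption: the tensor-to-shard mapping lives under "weight_map".
    weight_map = index["weight_map"]
    shards = defaultdict(list)
    for tensor_name, shard_file in weight_map.items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in shards.items()}


# Three entries copied from the diff above, wrapped in the assumed key.
EXAMPLE_INDEX = json.dumps({"weight_map": {
    "model.layers.22.mlp.gate_proj.weight": "model-00002-of-00003.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00003-of-00003.safetensors",
    "model.norm.weight": "model-00003-of-00003.safetensors",
}})

grouped = tensors_per_shard(EXAMPLE_INDEX)
for shard, names in sorted(grouped.items()):
    print(shard, len(names))
```

In the excerpt, layer 22 has tensors in both the second and third shard, so reading that one layer would touch two files.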
olmo-base.png
ADDED
Git LFS Details
olmo3-logo
ADDED
@@ -0,0 +1 @@
special_tokens_map.json
ADDED
@@ -0,0 +1,11 @@
{
  "eos_token": "<|endoftext|>",
  "pad_token": "<|pad|>",
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
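The entries in `special_tokens_map.json` above come in two shapes: a plain string (`eos_token`, `pad_token`) and an object with a `content` field (`unk_token`). The sketch below shows one way to read both shapes uniformly using only the standard library. The helper `token_text` is a hypothetical name, and the JSON text is copied from the diff rather than loaded from the repository.

```python
import json

# Contents of special_tokens_map.json as shown in the diff above.
SPECIAL_TOKENS_MAP = """
{
  "eos_token": "<|endoftext|>",
  "pad_token": "<|pad|>",
  "unk_token": {
    "content": "<|endoftext|>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
"""


def token_text(entry):
    """Return the token string whether the entry is a plain string or an object."""
    return entry["content"] if isinstance(entry, dict) else entry


tokens = {
    name: token_text(entry)
    for name, entry in json.loads(SPECIAL_TOKENS_MAP).items()
}
print(tokens)
```

Read this way, `unk_token` and `eos_token` resolve to the same string, `<|endoftext|>`, while `pad_token` is a separate `<|pad|>` token.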
tokenizer.json
ADDED
The diff for this file is too large to render.
tokenizer_config.json
ADDED
@@ -0,0 +1,189 @@
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "100256": {
      "content": "<|extra_id_0|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100257": {
      "content": "<|endoftext|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100258": {
      "content": "<|fim_prefix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100259": {
      "content": "<|fim_middle|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100260": {
      "content": "<|fim_suffix|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100261": {
      "content": "|||PHONE_NUMBER|||",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100262": {
      "content": "|||EMAIL_ADDRESS|||",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100263": {
      "content": "|||IP_ADDRESS|||",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100264": {
      "content": "<|im_start|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100265": {
      "content": "<|im_end|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100266": {
      "content": "<|extra_id_1|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100267": {
      "content": "<|extra_id_2|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100268": {
      "content": "<|extra_id_3|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100269": {
      "content": "<|extra_id_4|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100270": {
      "content": "<|extra_id_5|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100271": {
      "content": "<|extra_id_6|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100272": {
      "content": "<|extra_id_7|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100273": {
      "content": "<|extra_id_8|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100274": {
      "content": "<|extra_id_9|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100275": {
      "content": "<|extra_id_10|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": false
    },
    "100276": {
      "content": "<|endofprompt|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100277": {
      "content": "<|pad|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": null,
  "clean_up_tokenization_spaces": false,
  "eos_token": "<|endoftext|>",
  "extra_special_tokens": {},
  "model_max_length": 65536,
  "pad_token": "<|pad|>",
  "tokenizer_class": "GPT2Tokenizer",
  "unk_token": "<|endoftext|>"
}
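To make the role of `added_tokens_decoder` concrete, the following is a minimal sketch that reads a shortened excerpt of the configuration above with the standard `json` module and resolves `eos_token`, `pad_token`, and `unk_token` to their ids. The helper name `summarize_added_tokens` and the constant `CONFIG_EXCERPT` are hypothetical and exist only for this illustration; the excerpt keeps only the `content` and `special` fields of a few entries, so it is not a substitute for the full file.

```python
import json

# Shortened excerpt of the tokenizer_config.json above (illustration only:
# most entries and the lstrip/rstrip/normalized/single_word fields are omitted).
CONFIG_EXCERPT = """
{
  "added_tokens_decoder": {
    "100256": {"content": "<|extra_id_0|>", "special": false},
    "100257": {"content": "<|endoftext|>", "special": true},
    "100261": {"content": "|||PHONE_NUMBER|||", "special": false},
    "100264": {"content": "<|im_start|>", "special": true},
    "100265": {"content": "<|im_end|>", "special": true},
    "100277": {"content": "<|pad|>", "special": true}
  },
  "bos_token": null,
  "eos_token": "<|endoftext|>",
  "pad_token": "<|pad|>",
  "unk_token": "<|endoftext|>",
  "model_max_length": 65536
}
"""


def summarize_added_tokens(config: dict) -> dict:
    """Hypothetical helper: split added tokens by their `special` flag and
    map the named tokens (bos/eos/pad/unk) to their integer ids."""
    decoder = config["added_tokens_decoder"]
    # Reverse lookup: token string -> integer id.
    content_to_id = {entry["content"]: int(tid) for tid, entry in decoder.items()}
    special_ids = sorted(int(tid) for tid, e in decoder.items() if e["special"])
    regular_ids = sorted(int(tid) for tid, e in decoder.items() if not e["special"])
    named_ids = {}
    for key in ("bos_token", "eos_token", "pad_token", "unk_token"):
        token = config.get(key)
        # A null entry (here: bos_token) means no such token is configured.
        named_ids[key] = content_to_id.get(token) if token is not None else None
    return {
        "special_ids": special_ids,
        "regular_ids": regular_ids,
        "named_ids": named_ids,
    }


config = json.loads(CONFIG_EXCERPT)
summary = summarize_added_tokens(config)
print(summary["named_ids"])
```

In this excerpt, `eos_token` and `unk_token` both resolve to id 100257 because they share the string `<|endoftext|>`, while `bos_token` stays unset because the configuration lists it as `null`.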
vocab.json
ADDED
The diff for this file is too large to render.