---
license: apache-2.0
datasets:
- HPLT/HPLT3.0
- allenai/olmo-mix-1124
- HuggingFaceFW/finepdfs
- HuggingFaceTB/finemath
- LLM360/MegaMath
- HuggingFaceTB/stack-edu
- HuggingFaceFW/finepdfs-edu
language:
- fi
base_model:
- allenai/OLMo-2-1124-13B
library_name: transformers
tags:
- finnish
- suomi
- HPLT
---
# FinOLMo

This is a base (not instruction-tuned) large language model, continually pre-trained on Finnish data starting from the English [OLMo2-13B](https://huggingface.co/allenai/OLMo-2-1124-13B) model.
Training ran for 20 000 steps in two stages, covering around 170 billion tokens in total. Intermediate checkpoints are published as branches of this repository.
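
As a minimal sketch of how the final model or an intermediate checkpoint might be loaded with the Transformers library: `revision` is the standard `from_pretrained` argument for selecting a branch, but the branch name in the comment below is hypothetical, so check the repository's branch list for the real names. The helper names `load_kwargs` and `generate` are ours, chosen for illustration.

```python
MODEL_ID = "HPLT/FinOLMo-13B"


def load_kwargs(revision=None):
    """Build keyword arguments for `from_pretrained`.

    `revision` selects a branch of the model repository; None means the
    default branch (the final checkpoint).
    """
    return {} if revision is None else {"revision": revision}


def generate(prompt, revision=None, max_new_tokens=50):
    """Continue `prompt` with the model from the given branch."""
    # Imported lazily so load_kwargs works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = load_kwargs(revision)
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, **kwargs)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, **kwargs)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example calls (they download the 13B weights, so they are commented out):
# print(generate("Suomen pääkaupunki on"))
# print(generate("Suomen pääkaupunki on", revision="step10000"))  # hypothetical branch
```

Because this is a base model, it continues the prompt rather than following instructions.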
## Data Details
### Stage 1 (16 000 steps -- 135B tokens)
Data
- [HPLTv3](https://huggingface.co/datasets/HPLT/HPLT3.0) Finnish
- FinePDFs Finnish
- OLMo-Mix
Data Splits
| Data | Percentage (%) | Unique Tokens | Total Tokens | Number of Documents | Average Document Length (tokens) |
| ------------------------ | ---------- | ------------- | ------------ | ------------------- | ----------------------- |
| HPLT Finnish | 69.75 | 46.8B | 93.6B | 36.5M | 944 |
| FinePDFs Finnish | 14.45 | 9.7B | 19.4B | 1.5M | 4 895 |
| Wiki (OLMo-Mix) | 0.02 | 0.2B | 26.8M | 0.3M | 690 |
| Alg. Stack (OLMo-Mix) | 0.04 | 0.6B | 53.7M | 0.1M | 4 291 |
| Open Web Math (OLMo-Mix) | 0.04 | 0.6B | 53.7M | 0.1M | 4 291 |
| ArXiv (OLMo-Mix) | 0.05 | 1.1B | 67.1M | 0.2M | 5 318 |
| PeS2o (OLMo-Mix) | 0.15 | 2.6B | 0.2B | 1.6M | 1 692 |
| DCLM (OLMo-Mix) | 9.50 | 49.7B | 12.8B | 35.1M | 1 416 |
| StarCoder (OLMo-Mix) | 6.00 | 31.5B | 8.1B | 23.6M | 1 333 |
> [!NOTE]
> The number of documents is the total number of unique documents in each source, not the number of documents seen during training.
> [!NOTE]
> For OLMo-Mix, we sampled only a portion of the dataset; the unique-token and document counts above refer to that portion.
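
The relationship between the Unique Tokens and Total Tokens columns can be made explicit by dividing one by the other, which gives the average number of passes over each source. The sketch below uses figures copied from the Stage 1 table; the function name is ours, chosen for illustration.

```python
def passes_over_data(total_tokens, unique_tokens):
    """Return how many times a source is seen on average during a stage."""
    return total_tokens / unique_tokens


# (unique tokens, total tokens) copied from the Stage 1 table.
stage1 = {
    "HPLT Finnish": (46.8e9, 93.6e9),
    "FinePDFs Finnish": (9.7e9, 19.4e9),
    "DCLM (OLMo-Mix)": (49.7e9, 12.8e9),
    "StarCoder (OLMo-Mix)": (31.5e9, 8.1e9),
}

for name, (unique, total) in stage1.items():
    print(f"{name}: {passes_over_data(total, unique):.2f} passes")
```

With these figures, each Finnish source is seen about twice, while the sampled OLMo-Mix sources are seen well under once.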
### Stage 2 (4 000 steps -- 35B tokens)
Data
- [HPLTv3](https://huggingface.co/datasets/HPLT/HPLT3.0) (filtered) Finnish
- FinePDFs-Edu Finnish
- FinePDFs-Edu English
- Stack-Edu
- MegaMath Web-Pro
- FineMath 4+
- InfiWebMath 4+
Data Splits
| Data | Percentage (%) | Unique Tokens | Total Tokens | Number of Documents | Average Document Length (tokens) |
| ------------------------ | ---------- | ------------- | ------------ | ------------------- | ----------------------- |
| HPLT Finnish | 40.79 | 3.4B | 13.7B | 3.1M | 1 109 |
| FinePDFs-Edu Finnish | 17.84 | 1.5B | 6.0B | 0.2M | 7 081 |
| FinePDFs-Edu English | 15.00 | 7.5B | 5.0B | 1.2M | 6 485 |
| Stack-Edu | 15.00 | 13.2B | 5.0B | 15.0M | 880 |
| MegaMath Web-Pro | 4.76 | 14.0B | 1.6B | 15.0M | 937 |
| FineMath 4+ | 3.51 | 10.4B | 1.2B | 6.7M | 1 545 |
| InfiWebMath 4+ | 3.09 | 9.1B | 1.0B | 6.3M | 1 447 |
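
To see how the Percentage column relates to Total Tokens, each share can be multiplied by the stage's token budget. The sketch below assumes the budget equals steps × batch size × sequence length, using the Stage 2 values listed under Training details (4 000 × 512 × 16 384) and assuming the batch size is counted in sequences. The variable and function names are illustrative.

```python
STAGE2_TOKENS = 4_000 * 512 * 16_384  # steps * batch size * sequence length

# Mixture percentages copied from the Stage 2 table.
mixture = {
    "HPLT Finnish": 40.79,
    "FinePDFs-Edu Finnish": 17.84,
    "FinePDFs-Edu English": 15.00,
    "Stack-Edu": 15.00,
    "MegaMath Web-Pro": 4.76,
    "FineMath 4+": 3.51,
    "InfiWebMath 4+": 3.09,
}


def tokens_for(share_percent, budget=STAGE2_TOKENS):
    """Tokens drawn from a source that makes up `share_percent` of the mix."""
    return budget * share_percent / 100


for name, share in mixture.items():
    print(f"{name}: {tokens_for(share) / 1e9:.1f}B tokens")
```

Rounded to one decimal place, the printed values can be compared directly against the Total Tokens column.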
## Training details
### Stage 1
| Hyperparameter | Value |
| ------------------------ | ----------------- |
| Embedding train steps | 1 000 |
| Warmup steps | 2 000 |
| Total train steps | 16 000 |
| Learning rate schedule | Warmup + constant |
| Learning rate | 3e-4 |
| Weight decay | 1e-1 |
| Sequence length | 4 096 |
| Batch size | 2 048 |
| RoPE theta | 500 000 |
| Clip grad | 1.0 |
| Adam epsilon | 1e-8 |
| Adam beta_1 | 0.9 |
| Adam beta_2 | 0.95 |
| RMSNorm epsilon | 1e-6 |
| Z-loss ratio | 1e-5 |
| Diffusion loss ratio | 2e-2 |
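
The token count quoted for Stage 1 can be cross-checked from this table. A small sketch, assuming the batch size is counted in sequences (the table does not state the unit); the function names are ours.

```python
def tokens_per_step(batch_size, sequence_length):
    """Tokens processed in one optimizer step."""
    return batch_size * sequence_length


def stage_tokens(steps, batch_size, sequence_length):
    """Total tokens processed over a whole training stage."""
    return steps * tokens_per_step(batch_size, sequence_length)


# Values copied from the Stage 1 hyperparameter table.
stage1 = stage_tokens(steps=16_000, batch_size=2_048, sequence_length=4_096)

print(f"{tokens_per_step(2_048, 4_096):,} tokens per step")
print(f"{stage1 / 1e9:.1f}B tokens in Stage 1")
```

Under that assumption the result lands close to the roughly 135B tokens stated for Stage 1.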
### Stage 2
| Hyperparameter | Value |
| ------------------------ | ----------------- |
| Decay steps | 4 000 |
| Total train steps | 4 000 |
| Learning rate schedule | Linear decay |
| Initial learning rate | 3e-4 |
| Final learning rate | 0 |
| Weight decay | 1e-1 |
| Sequence length | 16 384 |
| Batch size | 512 |
| RoPE theta | 2 000 000 |
| Clip grad | 1.0 |
| Adam epsilon | 1e-8 |
| Adam beta_1 | 0.9 |
| Adam beta_2 | 0.95 |
| RMSNorm epsilon | 1e-6 |
| Z-loss ratio | 1e-5 |
| Diffusion loss ratio | 2e-2 |
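
Taken together, the two tables describe a learning-rate schedule over all 20 000 steps. A minimal sketch, assuming the Stage 1 warmup is linear from zero (the table only says "Warmup") and that Stage 2 continues directly after step 16 000. The embedding-only phase is not modeled, and the names are illustrative.

```python
PEAK_LR = 3e-4
WARMUP_STEPS = 2_000
STAGE1_STEPS = 16_000
DECAY_STEPS = 4_000


def learning_rate(step):
    """Learning rate at a global step between 0 and 20 000."""
    if step < WARMUP_STEPS:
        # Assumed linear warmup from 0 to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    if step <= STAGE1_STEPS:
        # Constant phase for the remainder of Stage 1.
        return PEAK_LR
    # Stage 2: linear decay from the peak down to 0.
    progress = (step - STAGE1_STEPS) / DECAY_STEPS
    return PEAK_LR * max(0.0, 1.0 - progress)


for step in (0, 1_000, 2_000, 16_000, 18_000, 20_000):
    print(step, learning_rate(step))
```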
## Acknowledgements
Training was conducted as a part of the [HPLT project](https://hplt-project.org/).
_This project has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No 101070350 and from UK Research and Innovation (UKRI) under the UK government’s Horizon Europe funding guarantee [grant number 10052546]._