Update README.md
README.md
@@ -5,7 +5,7 @@ base_model:
 language:
 - en
 datasets:
-- allenai/Dolci-
+- allenai/Dolci-RL-Zero-Code-7B
 library_name: transformers
 ---
 
@@ -27,20 +27,20 @@ For the other Olmo 3 RL-Zero models see:
 | **Domain** | **Model** | **RLVR Dataset**
 |--------------------------|---------------|---------------|
 | **Base Model** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) |
-| **Math** | [Olmo-3-7B-
-| **Code** | [Olmo-3-7B-
-| **IF** | [Olmo-3-7B-
-| **General** | [Olmo-3-7B-
-| **Mix** | [Olmo-3-7B-
+| **Math** | [Olmo-3-7B-RL-Zero-Math](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-Math/),[Olmo-3.1-7B-RL-Zero-Math](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-Math/) [Olmo-3-7B-RL-Zero-Math](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-Math/) | [Dolci-RL-Zero-Math-7B](https://huggingface.co/datasets/allenai/Dolci-RL-Zero-Math-7B)
+| **Code** | [Olmo-3-7B-RL-Zero-Code](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-Code/), [Olmo-3.1-7B-RL-Zero-Code](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-Code/) | [Dolci-RL-Zero-Code-7B](https://huggingface.co/datasets/allenai/Dolci-RL-Zero-Code-7B)
+| **IF** | [Olmo-3-7B-RL-Zero-IF](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-IF/) | [Dolci-RL-Zero-IF-7B](https://huggingface.co/datasets/allenai/Dolci-RL-Zero-IF-7B)
+| **General** | [Olmo-3-7B-RL-Zero-General](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-General/) | [Dolci-RL-Zero-General-7B](https://huggingface.co/datasets/allenai/Dolci-RL-Zero-General-7B)
+| **Mix** | [Olmo-3-7B-RL-Zero-Mix](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-Mix/) | [Dolci-RL-Zero-Mix-7B](https://huggingface.co/datasets/allenai/Dolci-RL-Zero-Mix-7B)
 
 For the core Olmo 3 models see:
 
-| **Stage** | **
-|--------------------------|---------------|---------------|---------------|---------------|
-| **Base Model** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | | |
-| **SFT** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) | [Olmo-3-32B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-32B-Instruct-SFT) |
-| **DPO** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) | [Olmo-3-32B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-32B-Instruct-DPO) |
-| **Final Models (RLVR)** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) | [Olmo-3-32B-Instruct](https://huggingface.co/allenai/Olmo-3-32B-Instruct) |
+| **Stage** | **Olmo 3 7B Think** | **Olmo (3/3.1) 32B Think** | **Olmo 3 7B Instruct** | **Olmo 3.1 32B Instruct** |
+|--------------------------|-----------------------|------------------------|---------------------------|----------------------------|
+| **Base Model** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) |
+| **SFT** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) | [Olmo-3.1-32B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct-SFT) |
+| **DPO** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) | [Olmo-3.1-32B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct-DPO) |
+| **Final Models (RLVR)** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think)<br>[Olmo-3.1-32B-Think](https://huggingface.co/allenai/Olmo-3.1-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) | [Olmo-3.1-32B-Instruct](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct) |
 
 
 ## Installation
@@ -117,7 +117,7 @@ For more documentation, see the [GitHub readme](https://github.com/allenai/open-
 
 #### RLVR
 - reinforcement learning from verifiable rewards on the Dolci-RL-Zero-Code-7B dataset which consists of coding queries.
-- Datasets: [Dolci-
+- Datasets: [Dolci-RL-Zero-Code-7B](https://huggingface.co/datasets/allenai/Dolci-RL-Zero-Code-7B)
 
 
 ## Bias, Risks, and Limitations