allenai/Olmo-3.1-7B-RL-Zero-Code

@@ -5,7 +5,7 @@ base_model:
 language:
 - en
 datasets:
-- allenai/Dolci-RLZero-Code-7B
 library_name: transformers
 ---
@@ -27,20 +27,20 @@ For the other Olmo 3 RL-Zero models see:
 | **Domain**               | **Model**  | **RLVR Dataset**
 |--------------------------|---------------|---------------|
 | **Base Model**           | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) |
-| **Math**                 | [Olmo-3-7B-RLZero-Math](https://huggingface.co/allenai/Olmo-3-7B-RLZero-Math/) | [Dolci-RLZero-Math-7B](https://huggingface.co/datasets/allenai/Dolci-RLZero-Math-7B)
-| **Code**                 | [Olmo-3-7B-RLZero-Code](https://huggingface.co/allenai/Olmo-3-7B-RLZero-Code/) | [Dolci-RLZero-Code-7B](https://huggingface.co/datasets/allenai/Dolci-RLZero-Code-7B)
-| **IF**                   | [Olmo-3-7B-RLZero-IF](https://huggingface.co/allenai/Olmo-3-7B-RLZero-IF/) | [Dolci-RLZero-IF-7B](https://huggingface.co/datasets/allenai/Dolci-RLZero-IF-7B)
-| **General**              | [Olmo-3-7B-RLZero-General](https://huggingface.co/allenai/Olmo-3-7B-RLZero-General/) | [Dolci-RLZero-General-7B](https://huggingface.co/datasets/allenai/Dolci-RLZero-General-7B)
-| **Mix**                  | [Olmo-3-7B-RLZero-Mix](https://huggingface.co/allenai/Olmo-3-7B-RLZero-Mix/) | [Dolci-RLZero-Mix-7B](https://huggingface.co/datasets/allenai/Dolci-RLZero-Mix-7B)
 For the core Olmo 3 models see:
-| **Stage**               | **[Olmo 3 7B Think]** | **[Olmo 3 32B Think]** | **[Olmo 3 7B Instruct]** | **[Olmo 3 32B Instruct]** |
-|--------------------------|---------------|---------------|---------------|---------------|
-| **Base Model**           | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | | |
-| **SFT**                  | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) | [Olmo-3-32B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-32B-Instruct-SFT) |
-| **DPO**                  | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) | [Olmo-3-32B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-32B-Instruct-DPO) |
-| **Final Models (RLVR)**  | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) | [Olmo-3-32B-Instruct](https://huggingface.co/allenai/Olmo-3-32B-Instruct) |
 ## Installation
@@ -117,7 +117,7 @@ For more documentation, see the [GitHub readme](https://github.com/allenai/open-
 #### RLVR
 - Reinforcement learning from verifiable rewards on the Dolci-RL-Zero-Code-7B dataset, which consists of coding queries.
-- Datasets: [Dolci-RLZero-Code-7B](https://huggingface.co/datasets/allenai/Dolci-RLZero-Code-7B)
 ## Bias, Risks, and Limitations

 language:
 - en
 datasets:
+- allenai/Dolci-RL-Zero-Code-7B
 library_name: transformers
 ---
 | **Domain**               | **Model**  | **RLVR Dataset**
 |--------------------------|---------------|---------------|
 | **Base Model**           | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) |
+| **Math**                 | [Olmo-3-7B-RL-Zero-Math](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-Math/), [Olmo-3.1-7B-RL-Zero-Math](https://huggingface.co/allenai/Olmo-3.1-7B-RL-Zero-Math/) | [Dolci-RL-Zero-Math-7B](https://huggingface.co/datasets/allenai/Dolci-RL-Zero-Math-7B)
+| **Code**                 | [Olmo-3-7B-RL-Zero-Code](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-Code/), [Olmo-3.1-7B-RL-Zero-Code](https://huggingface.co/allenai/Olmo-3.1-7B-RL-Zero-Code/) | [Dolci-RL-Zero-Code-7B](https://huggingface.co/datasets/allenai/Dolci-RL-Zero-Code-7B)
+| **IF**                   | [Olmo-3-7B-RL-Zero-IF](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-IF/) | [Dolci-RL-Zero-IF-7B](https://huggingface.co/datasets/allenai/Dolci-RL-Zero-IF-7B)
+| **General**              | [Olmo-3-7B-RL-Zero-General](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-General/) | [Dolci-RL-Zero-General-7B](https://huggingface.co/datasets/allenai/Dolci-RL-Zero-General-7B)
+| **Mix**                  | [Olmo-3-7B-RL-Zero-Mix](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-Mix/) | [Dolci-RL-Zero-Mix-7B](https://huggingface.co/datasets/allenai/Dolci-RL-Zero-Mix-7B)
 For the core Olmo 3 models see:
+| **Stage**               | **Olmo 3 7B Think** | **Olmo (3/3.1) 32B Think** | **Olmo 3 7B Instruct** | **Olmo 3.1 32B Instruct** |
+|--------------------------|-----------------------|------------------------|---------------------------|----------------------------|
+| **Base Model**           | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) |
+| **SFT**                  | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) | [Olmo-3.1-32B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct-SFT) |
+| **DPO**                  | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) | [Olmo-3.1-32B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct-DPO) |
+| **Final Models (RLVR)**  | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think)<br>[Olmo-3.1-32B-Think](https://huggingface.co/allenai/Olmo-3.1-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) | [Olmo-3.1-32B-Instruct](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct) |
 ## Installation
 #### RLVR
 - Reinforcement learning from verifiable rewards on the Dolci-RL-Zero-Code-7B dataset, which consists of coding queries.
+- Datasets: [Dolci-RL-Zero-Code-7B](https://huggingface.co/datasets/allenai/Dolci-RL-Zero-Code-7B)
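
The rename in this change (`RLZero` to `RL-Zero` in model and dataset identifiers) can be sketched as a small string rewrite. This is only an illustration for anyone updating their own references; `to_hyphenated` is a hypothetical helper and is not part of the model card or any library.

```python
# Hypothetical helper (an assumption for illustration, not part of the repo):
# rewrites the old "RLZero" spelling to the hyphenated "RL-Zero" spelling.
def to_hyphenated(repo_id: str) -> str:
    # Identifiers that already use "RL-Zero" contain no "RLZero" substring,
    # so they pass through unchanged.
    return repo_id.replace("RLZero", "RL-Zero")


# Identifiers taken from the removed lines of this diff.
old_ids = [
    "allenai/Dolci-RLZero-Code-7B",
    "allenai/Olmo-3-7B-RLZero-Code",
]

new_ids = [to_hyphenated(repo_id) for repo_id in old_ids]

for old, new in zip(old_ids, new_ids):
    print(f"{old} -> {new}")
```

Whether the old identifiers still resolve on the Hub is not stated in this change, so it may be safest to update references rather than rely on redirects.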
 ## Bias, Risks, and Limitations