natolambert committed
Commit af3b45a · verified · 1 Parent(s): 1e69631

Update README.md

Files changed (1)
  1. README.md +13 -13
README.md CHANGED
@@ -5,7 +5,7 @@ base_model:
  language:
  - en
  datasets:
- - allenai/Dolci-RLZero-Code-7B
+ - allenai/Dolci-RL-Zero-Code-7B
  library_name: transformers
  ---
 
@@ -27,20 +27,20 @@ For the other Olmo 3 RL-Zero models see:
  | **Domain** | **Model** | **RLVR Dataset**
  |--------------------------|---------------|---------------|
  | **Base Model** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) |
- | **Math** | [Olmo-3-7B-RLZero-Math](https://huggingface.co/allenai/Olmo-3-7B-RLZero-Math/) | [Dolci-RLZero-Math-7B](https://huggingface.co/datasets/allenai/Dolci-RLZero-Math-7B)
- | **Code** | [Olmo-3-7B-RLZero-Code](https://huggingface.co/allenai/Olmo-3-7B-RLZero-Code/) | [Dolci-RLZero-Code-7B](https://huggingface.co/datasets/allenai/Dolci-RLZero-Code-7B)
- | **IF** | [Olmo-3-7B-RLZero-IF](https://huggingface.co/allenai/Olmo-3-7B-RLZero-IF/) | [Dolci-RLZero-IF-7B](https://huggingface.co/datasets/allenai/Dolci-RLZero-IF-7B)
- | **General** | [Olmo-3-7B-RLZero-General](https://huggingface.co/allenai/Olmo-3-7B-RLZero-General/) | [Dolci-RLZero-General-7B](https://huggingface.co/datasets/allenai/Dolci-RLZero-General-7B)
- | **Mix** | [Olmo-3-7B-RLZero-Mix](https://huggingface.co/allenai/Olmo-3-7B-RLZero-Mix/) | [Dolci-RLZero-Mix-7B](https://huggingface.co/datasets/allenai/Dolci-RLZero-Mix-7B)
+ | **Math** | [Olmo-3-7B-RL-Zero-Math](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-Math/),[Olmo-3.1-7B-RL-Zero-Math](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-Math/) [Olmo-3-7B-RL-Zero-Math](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-Math/) | [Dolci-RL-Zero-Math-7B](https://huggingface.co/datasets/allenai/Dolci-RL-Zero-Math-7B)
+ | **Code** | [Olmo-3-7B-RL-Zero-Code](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-Code/), [Olmo-3.1-7B-RL-Zero-Code](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-Code/) | [Dolci-RL-Zero-Code-7B](https://huggingface.co/datasets/allenai/Dolci-RL-Zero-Code-7B)
+ | **IF** | [Olmo-3-7B-RL-Zero-IF](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-IF/) | [Dolci-RL-Zero-IF-7B](https://huggingface.co/datasets/allenai/Dolci-RL-Zero-IF-7B)
+ | **General** | [Olmo-3-7B-RL-Zero-General](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-General/) | [Dolci-RL-Zero-General-7B](https://huggingface.co/datasets/allenai/Dolci-RL-Zero-General-7B)
+ | **Mix** | [Olmo-3-7B-RL-Zero-Mix](https://huggingface.co/allenai/Olmo-3-7B-RL-Zero-Mix/) | [Dolci-RL-Zero-Mix-7B](https://huggingface.co/datasets/allenai/Dolci-RL-Zero-Mix-7B)
 
  For the core Olmo 3 models see:
 
- | **Stage** | **[Olmo 3 7B Think]** | **[Olmo 3 32B Think]** | **[Olmo 3 7B Instruct]** | **[Olmo 3 32B Instruct]** |
- |--------------------------|---------------|---------------|---------------|---------------|
- | **Base Model** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | | |
- | **SFT** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) | [Olmo-3-32B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-32B-Instruct-SFT) |
- | **DPO** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) | [Olmo-3-32B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-32B-Instruct-DPO) |
- | **Final Models (RLVR)** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) | [Olmo-3-32B-Instruct](https://huggingface.co/allenai/Olmo-3-32B-Instruct) |
+ | **Stage** | **Olmo 3 7B Think** | **Olmo (3/3.1) 32B Think** | **Olmo 3 7B Instruct** | **Olmo 3.1 32B Instruct** |
+ |--------------------------|-----------------------|------------------------|---------------------------|----------------------------|
+ | **Base Model** | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) | [Olmo-3-7B](https://huggingface.co/allenai/Olmo-3-1025-7B) | [Olmo-3-32B](https://huggingface.co/allenai/Olmo-3-1125-32B) |
+ | **SFT** | [Olmo-3-7B-Think-SFT](https://huggingface.co/allenai/Olmo-3-7B-Think-SFT) | [Olmo-3-32B-Think-SFT](https://huggingface.co/allenai/Olmo-3-32B-Think-SFT) | [Olmo-3-7B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3-7B-Instruct-SFT) | [Olmo-3.1-32B-Instruct-SFT](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct-SFT) |
+ | **DPO** | [Olmo-3-7B-Think-DPO](https://huggingface.co/allenai/Olmo-3-7B-Think-DPO) | [Olmo-3-32B-Think-DPO](https://huggingface.co/allenai/Olmo-3-32B-Think-DPO) | [Olmo-3-7B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3-7B-Instruct-DPO) | [Olmo-3.1-32B-Instruct-DPO](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct-DPO) |
+ | **Final Models (RLVR)** | [Olmo-3-7B-Think](https://huggingface.co/allenai/Olmo-3-7B-Think) | [Olmo-3-32B-Think](https://huggingface.co/allenai/Olmo-3-32B-Think)<br>[Olmo-3.1-32B-Think](https://huggingface.co/allenai/Olmo-3.1-32B-Think) | [Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) | [Olmo-3.1-32B-Instruct](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct) |
 
 
  ## Installation
@@ -117,7 +117,7 @@ For more documentation, see the [GitHub readme](https://github.com/allenai/open-
 
  #### RLVR
  - reinforcement learning from verifiable rewards on the Dolci-RL-Zero-Code-7B dataset which consists of coding queries.
- - Datasets: [Dolci-RLZero-Code-7B](https://huggingface.co/datasets/allenai/Dolci-RLZero-Code-7B)
+ - Datasets: [Dolci-RL-Zero-Code-7B](https://huggingface.co/datasets/allenai/Dolci-RL-Zero-Code-7B)
 
 
  ## Bias, Risks, and Limitations