Update README.md
- Paper:
- Logs: https://github.com/allenai/OLMoE/blob/main/logs/olmoe-dpo-logs.txt
### Evaluation Summary

| Task (→) | MMLU | GSM8k | BBH | Human-Eval | Alpaca-Eval 1.0 | XSTest | IFEval | Avg |
|---------------|------|-------|------|------------|-----------------|--------|--------|------|
| **Setup (→)** | 0-shot | 8-shot CoT | 3-shot | 0-shot | 0-shot | 0-shot | 0-shot | |
| **Metric (→)** | EM | EM | EM | Pass@10 | %win | F1 | Loose Acc | |
| | | | | | | | | |
| OLMo-1B (0724) | 25.0 | 7.0 | 22.5 | 16.0 | - | 67.6 | 20.5 | - |
| +SFT | 36.0 | 12.5 | 27.2 | 21.2 | 41.5 | 81.9 | 26.1 | 35.9 |
| +DPO | 36.7 | 12.5 | 30.6 | 22.0 | 50.9 | 79.8 | 24.2 | 37.4 |
| OLMo-7B (0724) | 50.8 | 32.5 | 36.9 | 32.3 | - | 80.8 | 19.6 | - |
| +SFT | 54.2 | 25.0 | 35.7 | 38.5 | 70.9 | 86.1 | 39.7 | 49.3 |
| +DPO | 52.8 | 9.0 | 16.6 | 35.0 | 83.5 | **87.5** | 37.9 | 49.1 |
| JetMoE-2B-9B | 45.6 | 43.0 | 37.2 | 54.6 | - | 68.2 | 20.0 | - |
| +SFT | 46.1 | 53.5 | 35.6 | 64.8 | 69.3 | 55.6 | 30.5 | 50.4 |
| DeepSeek-3B-16B | 37.7 | 18.5 | 39.4 | 48.3 | - | 65.9 | 13.5 | - |
| +Chat | 48.5 | 46.5 | **40.8** | **70.1** | 74.8 | 85.6 | 32.3 | 57.0 |
| Qwen1.5-3B-14B | **60.4** | 13.5 | 27.2 | 60.2 | - | 73.4 | 20.9 | - |
| +Chat | 58.9 | **55.5** | 21.3 | 59.7 | 83.9 | 85.6 | 36.2 | 57.3 |
| **OLMoE (This Model)** | 49.8 | 3.0 | 33.6 | 22.4 | - | 59.7 | 16.6 | - |
| **+SFT** | 51.4 | 40.5 | 38.0 | 51.6 | 69.2 | 84.1 | 43.3 | 54.0 |
| **+DPO** | 51.9 | 45.5 | 37.0 | 54.8 | **84.0** | 82.6 | **48.1** | **57.7** |
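The Avg column appears to be the unweighted mean of the seven task scores (an assumption, not stated in the table itself); it checks out for the final **+DPO** row:

```python
# Scores for the OLMoE +DPO row: MMLU, GSM8k, BBH, Human-Eval, Alpaca-Eval, XSTest, IFEval
scores = [51.9, 45.5, 37.0, 54.8, 84.0, 82.6, 48.1]
avg = round(sum(scores) / len(scores), 1)
print(avg)  # 57.7, matching the Avg cell in the table
```

The same arithmetic reproduces the other Avg cells (e.g. the **+SFT** row gives 54.0).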
### Artifacts

**Pretraining**
- [Checkpoints](https://hf.co/allenai/OLMoE-1B-7B-0924)
- [Code](https://github.com/allenai/OLMo/tree/Muennighoff/MoE): Built on top of OLMo models.
- [Data](https://huggingface.co/datasets/allenai/OLMoE-mix-0924): Mix of DCLM Baseline with some components of Dolma.
- Logs: *coming soon*

**SFT (Supervised Fine-Tuning)**
- [Checkpoints](https://huggingface.co/allenai/OLMoE-1B-7B-0924-SFT): With and without load balancing.
- [Code](https://github.com/allenai/open-instruct/tree/olmoe-sft)
- [Data](https://hf.co/datasets/allenai/tulu-v3.1-mix-preview-4096-OLMoE): Preview of the Tulu 3 post-training recipe.
- [Logs](https://github.com/allenai/OLMoE/blob/main/logs/olmoe-sft-logs.txt)

**DPO/KTO (Direct Preference Optimization / Kahneman-Tversky Optimization)**
- [Checkpoints](https://huggingface.co/allenai/OLMoE-1B-7B-0924-Instruct)
- [Preference Data](https://hf.co/datasets/allenai/ultrafeedback_binarized_cleaned)
- [DPO code](https://github.com/allenai/open-instruct/tree/olmoe-sft), [KTO code](https://github.com/Muennighoff/kto/blob/master/kto.py)
- [Logs](https://github.com/allenai/OLMoE/blob/main/logs/olmoe-dpo-logs.txt)
# Use
Install `transformers` (**from source** until a release that includes [this PR](https://github.com/huggingface/transformers/pull/32406)) along with `torch`, then run:
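The code block that originally followed here is not reproduced in this diff view. As a minimal sketch of standard `transformers` usage for this checkpoint (the README's exact snippet may differ, e.g. it may import `OlmoeForCausalLM` directly):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMoE-1B-7B-0924-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Build a chat-formatted prompt and generate a reply
messages = [{"role": "user", "content": "What is a mixture-of-experts model?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[1]:], skip_special_tokens=True))
```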
Branches:
- `non-annealed`: Ablation starting from the `non-annealed` branch of https://hf.co/allenai/OLMoE-1B-7B-0924-SFT, which is an SFT of the pretraining checkpoint prior to annealing (branch `step1200000-tokens5033B` of https://hf.co/allenai/OLMoE-1B-7B-0924).
- `kto`: Ablation using KTO instead of DPO. This branch is the checkpoint after 5,000 steps with the RMS optimizer. The other `kto*` branches correspond to the other checkpoints mentioned in the paper.
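Each ablation lives on a branch of this same repository, so a specific variant can be loaded by passing the branch name as `revision` to `from_pretrained` (a standard Hugging Face Hub parameter), for example:

```python
from transformers import AutoModelForCausalLM

# Load the KTO ablation instead of the default DPO checkpoint
model = AutoModelForCausalLM.from_pretrained(
    "allenai/OLMoE-1B-7B-0924-Instruct",
    revision="kto",  # or "non-annealed", or any other branch name above
)
```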
# Citation
```bibtex