Update README.md
README.md CHANGED

@@ -19,7 +19,7 @@ metrics:
# Qwen2.5-7B-ODA-Mixture-100k
<img src="performance.png" alt="Leaderboard Performance" width="1200" />

-Qwen2.5-7B-ODA-Mixture-100k is a supervised fine-tuned (SFT) model built on top of **Qwen2.5-7B-Base**, trained with **[ODA-Mixture-100k](https://huggingface.co/datasets/OpenDataArena/ODA-Mixture-100k)**. This training set is curated by mixing top-performing open corpora selected via the *[OpenDataArena](https://github.
+Qwen2.5-7B-ODA-Mixture-100k is a supervised fine-tuned (SFT) model built on top of **Qwen2.5-7B-Base**, trained with **[ODA-Mixture-100k](https://huggingface.co/datasets/OpenDataArena/ODA-Mixture-100k)**. This training set is curated by mixing top-performing open corpora selected via the *[OpenDataArena](https://opendataarena.github.io)* leaderboard, and refined through deduplication and benchmark decontamination, aiming to improve the model’s general capabilities across **General**, **Math**, **Code**, and **Reasoning** domains under a compact ~100K data budget.

---
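
For readers who want to inspect the mixture described in the updated paragraph, the dataset is public on the Hugging Face Hub. A minimal sketch with the `datasets` library; the `train` split name is an assumption, not something documented in this card:

```python
# Minimal sketch: load and inspect ODA-Mixture-100k.
# Assumes the default "train" split exists; column names are whatever
# the dataset actually ships with (not specified in this README).
from datasets import load_dataset

ds = load_dataset("OpenDataArena/ODA-Mixture-100k", split="train")
print(ds.num_rows)       # expect on the order of ~100K examples
print(ds.column_names)   # inspect the schema
print(ds[0])             # one training example
```
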
@@ -221,25 +221,6 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
---

-## 🏋️ Training Hyperparameters
-
-The following hyperparameters were used during training:
-
-- **learning_rate**: 5e-05
-- **train_batch_size**: 1
-- **eval_batch_size**: 8
-- **seed**: 42
-- **distributed_type**: multi-GPU
-- **num_devices**: 32
-- **total_train_batch_size**: 32
-- **total_eval_batch_size**: 256
-- **optimizer**: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
-- **lr_scheduler_type**: cosine
-- **lr_scheduler_warmup_ratio**: 0.1
-- **num_epochs**: 3.0
-
----
-
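
A sanity check on the hyperparameters deleted in this revision: the totals follow from the per-device settings (train batch size 1 × 32 devices = 32; eval batch size 8 × 32 = 256). Expressed as Hugging Face `TrainingArguments`, the same values would look roughly like the sketch below; this is an illustration of the reported numbers, not the authors' actual training script, and `output_dir` is a hypothetical path:

```python
# Illustrative mapping of the removed hyperparameters onto
# transformers.TrainingArguments; not the authors' training code.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen2.5-7b-oda-mixture-100k",  # hypothetical
    learning_rate=5e-5,
    per_device_train_batch_size=1,   # x 32 GPUs = total train batch 32
    per_device_eval_batch_size=8,    # x 32 GPUs = total eval batch 256
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
)
```

The 32-device distribution itself would typically be handled by the launcher (e.g. `torchrun` or `accelerate launch`) rather than by these arguments.
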
## 📚 Citation
If you use this model or its training data (ODA-Mixture-100k), please cite: