Magpie-Align/Llama-3.1-8B-Magpie-Align-v0.1

Text Generation

alignment-handbook

Generated from Trainer

text-generation-inference

Zhangchen Xu committed on Jul 25, 2024

Commit 9299d8e · verified · 1 Parent(s): ced620b

Update README.md

Files changed (1)

README.md +8 -19

@@ -5,23 +5,14 @@ tags:
 - trl
 - dpo
 - generated_from_trainer
-- trl
-- dpo
-- generated_from_trainer
 datasets:
-- flydust/llama3-ultrafeedback-armorm-2
 model-index:
-- name: Llama-3.1-8B-Magpie-Pro-MTR-UltraDPO-1
   results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/uw-nsl/huggingface/runs/ro30b4xx)
-# Llama-3.1-8B-Magpie-Pro-MTR-UltraDPO-1
-This model is a fine-tuned version of [Magpie-Align/Llama-3.1-8B-Magpie-Mix-300KMT-150KR](https://huggingface.co/Magpie-Align/Llama-3.1-8B-Magpie-Mix-300KMT-150KR) on the flydust/llama3-ultrafeedback-armorm-2 dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.3290
 - Rewards/chosen: -4.8185
@@ -35,15 +26,13 @@ It achieves the following results on the evaluation set:
 ## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
 ## Training procedure

 - trl
 - dpo
 - generated_from_trainer
 datasets:
+- princeton-nlp/llama3-ultrafeedback-armorm
 model-index:
+- name: Llama-3.1-8B-Magpie-Align-v0.1-RC1
   results: []
 ---
+This model is a fine-tuned version of [Magpie-Align/Llama-3.1-8B-Magpie-Align-SFT-v0.1](https://huggingface.co/Magpie-Align/Llama-3.1-8B-Magpie-Align-SFT-v0.1) on the princeton-nlp/llama3-ultrafeedback-armorm dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.3290
 - Rewards/chosen: -4.8185
 ## Model description
+More details will be added soon.
+## Benchmark
+- **MT-Bench: 8.375 (1st Turn), 7.650 (Second Turn), 8.013 (Average)**
+- **Alpaca Eval 2 (GPT-4-Turbo-1106): 45.73 (LC), 52.79 (WR)**
+- **Arena Hard: 42.4**
 ## Training procedure
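
The MT-Bench average added in this commit appears to be the plain mean of the two per-turn scores. The sketch below checks that arithmetic. The function name `mt_bench_average` is hypothetical, and rounding half up to three decimals is an assumption; the model card does not state how the average was rounded.

```python
from decimal import Decimal, ROUND_HALF_UP


def mt_bench_average(first_turn: str, second_turn: str) -> Decimal:
    """Mean of two per-turn scores, rounded half up to 3 decimals (assumed rounding)."""
    # Decimal avoids binary floating-point error on values such as 8.0125.
    mean = (Decimal(first_turn) + Decimal(second_turn)) / 2
    return mean.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # Per-turn scores taken from the Benchmark section of the diff above.
    print(mt_bench_average("8.375", "7.650"))
```

The unrounded mean is 8.0125, which is consistent with the reported 8.013 only under half-up rounding; half-even rounding would give 8.012.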