---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- alignment_handbook-handbook
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: mistral-7B-DPO
  results:
  - task:
      type: text-generation
    dataset:
      name: IFEval
      type: IFEval
    metrics:
    - name: inst_level_strict_acc
      type: IFEval
      value: 53.06
    source:
      name: Open LLM Leaderboard
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
  - task:
      type: text-generation
    dataset:
      name: BBH
      type: BBH
    metrics:
    - name: acc_norm
      type: Big Bench Hard (BBH)
      value: 21.78
    source:
      name: Open LLM Leaderboard
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
  - task:
      type: text-generation
    dataset:
      name: MATH
      type: MATH
    metrics:
    - name: exact_match
      type: Math Challenges
      value: 2.87
    source:
      name: Open LLM Leaderboard
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
  - task:
      type: text-generation
    dataset:
      name: GPQA
      type: GPQA
    metrics:
    - name: acc_norm
      type: Graduate-Level Google-Proof Q&A (GPQA)
      value: 3.47
    source:
      name: Open LLM Leaderboard
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
  - task:
      type: text-generation
    dataset:
      name: MuSR
      type: MuSR
    metrics:
    - name: acc_norm
      type: MuSR
      value: 7.54
    source:
      name: Open LLM Leaderboard
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
  - task:
      type: text-generation
    dataset:
      name: MMLU-PRO
      type: MMLU-PRO
    metrics:
    - name: acc
      type: MMLU-PRO
      value: 19.59
    source:
      name: Open LLM Leaderboard
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
---
# MistralForCausalLM_Cal_DPO

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the HuggingFaceH4/ultrafeedback_binarized dataset.

## Model description

Cal-DPO (Calibrated Direct Preference Optimization) aligns large language models with human preferences by calibrating the implicit rewards learned during contrastive preference optimization so that they match the scale of the true rewards. The resulting model performs strongly across multiple benchmark tasks.

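As a rough illustration, the contrastive DPO-style loss on a single preference pair can be sketched in plain Python. This is not the exact Cal-DPO objective: the calibration term that matches implicit rewards to their true scale follows the paper and is omitted here, and all names below are illustrative.

```python
import math

def dpo_pair_loss(logp_chosen, logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO contrastive loss for one preference pair.

    logp_* are summed token log-probabilities of the chosen/rejected
    responses under the policy being trained; ref_logp_* are the same
    quantities under the frozen SFT reference model. The implicit
    reward is r = beta * (logp - ref_logp); Cal-DPO additionally
    calibrates these implicit rewards, a term omitted in this sketch.
    """
    r_chosen = beta * (logp_chosen - ref_logp_chosen)
    r_rejected = beta * (logp_rejected - ref_logp_rejected)
    margin = r_chosen - r_rejected
    # -log(sigmoid(margin)): small when the chosen response is clearly preferred
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy still equals the reference, the margin is zero and the loss is log 2 ≈ 0.693; it decreases as the policy separates chosen from rejected responses.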
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

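The reported total batch size follows from the per-device batch size, the device count, and gradient accumulation; a quick sanity check:

```python
# Effective (total) train batch size = per-device batch x devices x accumulation steps
train_batch_size = 8
num_devices = 4
gradient_accumulation_steps = 2

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 64, matching the value reported above
```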
### Training results

We evaluate the model on six key benchmarks using the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness), a unified framework for testing generative language models on a large number of different evaluation tasks:

- IFEval (https://arxiv.org/abs/2311.07911)
- BBH (Big Bench Hard) (https://arxiv.org/abs/2210.09261)
- MATH (https://arxiv.org/abs/2103.03874)
- GPQA (Graduate-Level Google-Proof Q&A Benchmark) (https://arxiv.org/abs/2311.12022)
- MuSR (Multistep Soft Reasoning) (https://arxiv.org/abs/2310.16049)
- MMLU-PRO (Massive Multitask Language Understanding - Professional) (https://arxiv.org/abs/2406.01574)

Open LLM Leaderboard scores (from the metadata above):

| Benchmark | Metric                | Value |
| --------- | --------------------- | ----: |
| IFEval    | inst_level_strict_acc | 53.06 |
| BBH       | acc_norm              | 21.78 |
| MATH      | exact_match           |  2.87 |
| GPQA      | acc_norm              |  3.47 |
| MuSR      | acc_norm              |  7.54 |
| MMLU-PRO  | acc                   | 19.59 |

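Scores of this kind can be reproduced locally with the harness CLI; a sketch (the exact task group names vary across harness versions, and the model path is a placeholder):

```shell
pip install lm-eval
lm_eval --model hf \
  --model_args pretrained=<path-to-this-model> \
  --tasks leaderboard \
  --batch_size 4
```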
### Framework versions

- Transformers 4.40.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1