Paper: MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark (arXiv:2406.01574)
This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset.
The Cal-DPO algorithm addresses the problem of aligning large language models with human preferences by calibrating the implicit rewards learned in contrastive preference optimization so that they match the true rewards. It has demonstrated strong performance across multiple benchmark tasks.
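To make the idea of "calibrating implicit rewards" concrete, the sketch below computes the DPO-style implicit reward β·log(π(y|x)/π_ref(y|x)) from toy log-probabilities and adds a simple squared-error calibration term. The calibration targets and the exact form of the combined loss here are illustrative assumptions, not the Cal-DPO paper's implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def implicit_reward(beta, logp_policy, logp_ref):
    # DPO-style implicit reward: beta * log(pi(y|x) / pi_ref(y|x))
    return beta * (logp_policy - logp_ref)

def preference_loss(beta, chosen, rejected):
    # chosen / rejected are (logp_policy, logp_ref) pairs for one prompt.
    r_chosen = implicit_reward(beta, *chosen)
    r_rejected = implicit_reward(beta, *rejected)
    # Standard DPO term: widen the margin between the two implicit rewards.
    return -math.log(sigmoid(r_chosen - r_rejected))

def cal_dpo_loss(beta, chosen, rejected, target=1.0):
    # Hypothetical calibration term (an assumption for illustration only):
    # pull the chosen implicit reward toward +target and the rejected one
    # toward -target, so reward magnitudes stay on a fixed scale instead
    # of drifting, which is the intuition behind Cal-DPO's calibration.
    r_chosen = implicit_reward(beta, *chosen)
    r_rejected = implicit_reward(beta, *rejected)
    dpo = -math.log(sigmoid(r_chosen - r_rejected))
    calibration = (r_chosen - target) ** 2 + (r_rejected + target) ** 2
    return dpo + calibration
```

Plain DPO only cares about the margin between the two implicit rewards, so their absolute scale is unconstrained; the calibration term pins that scale, which is the property Cal-DPO exploits.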
The following hyperparameters were used during training:
We evaluate models on 6 key benchmarks using the Eleuther AI Language Model Evaluation Harness, a unified framework for testing generative language models on a large number of different evaluation tasks.
Base model: mistralai/Mistral-7B-v0.1