---
license: apache-2.0
base_model: alignment-handbook/zephyr-7b-sft-full
tags:
- alignment_handbook-handbook
- generated_from_trainer
datasets:
- HuggingFaceH4/ultrafeedback_binarized
model-index:
- name: mistral-7B-DPO
  results:
  - task:
      type: text-generation
    dataset:
      name: IFEval
      type: IFEval
    metrics:
    - name: inst_level_strict_acc
      type: IFEval
      value: 53.06
    source:
      name: Open LLM Leaderboard
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
  - task:
      type: text-generation
    dataset:
      name: BBH
      type: BBH
    metrics:
    - name: acc_norm
      type: Big Bench Hard (BBH)
      value: 21.78
    source:
      name: Open LLM Leaderboard
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
  - task:
      type: text-generation
    dataset:
      name: MATH
      type: MATH
    metrics:
    - name: exact_match
      type: Math Challenges
      value: 2.87
    source:
      name: Open LLM Leaderboard
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
  - task:
      type: text-generation
    dataset:
      name: GPQA
      type: GPQA
    metrics:
    - name: acc_norm
      type: Graduate-Level Google-Proof Q&A (GPQA)
      value: 3.47
    source:
      name: Open LLM Leaderboard
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
  - task:
      type: text-generation
    dataset:
      name: MuSR
      type: MuSR
    metrics:
    - name: acc_norm
      type: MuSR
      value: 7.54
    source:
      name: Open LLM Leaderboard
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
  - task:
      type: text-generation
    dataset:
      name: MMLU-PRO
      type: MMLU-PRO
    metrics:
    - name: acc
      type: MMLU-PRO
      value: 19.59
    source:
      name: Open LLM Leaderboard
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
---
# MistralForCausalLM_Cal_DPO

This model is a fine-tuned version of [alignment-handbook/zephyr-7b-sft-full](https://huggingface.co/alignment-handbook/zephyr-7b-sft-full) on the HuggingFaceH4/ultrafeedback_binarized dataset.

## Model description

Cal-DPO (Calibrated Direct Preference Optimization) aligns large language models with human preferences by calibrating the implicit rewards learned during contrastive preference optimization so that they match the scale of the true rewards. The resulting model performs strongly across multiple benchmark tasks.

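As a rough illustration, the contrastive DPO-style loss on a single preference pair can be sketched in plain Python. This is not the exact Cal-DPO objective: the calibration term that matches implicit rewards to their true scale follows the paper and is omitted here, and all names below are illustrative.

```python
import math

def dpo_pair_loss(logp_chosen, logp_rejected,
                  ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO contrastive loss for one preference pair.

    logp_* are summed token log-probabilities of the chosen/rejected
    responses under the policy being trained; ref_logp_* are the same
    quantities under the frozen SFT reference model. The implicit
    reward is r = beta * (logp - ref_logp); Cal-DPO additionally
    calibrates these implicit rewards, a term omitted in this sketch.
    """
    r_chosen = beta * (logp_chosen - ref_logp_chosen)
    r_rejected = beta * (logp_rejected - ref_logp_rejected)
    margin = r_chosen - r_rejected
    # -log(sigmoid(margin)): small when the chosen response is clearly preferred
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy still equals the reference, the margin is zero and the loss is log 2 ≈ 0.693; it decreases as the policy separates chosen from rejected responses.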
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

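The reported total batch size follows from the per-device batch size, the device count, and gradient accumulation; a quick sanity check:

```python
# Effective (total) train batch size = per-device batch x devices x accumulation steps
train_batch_size = 8
num_devices = 4
gradient_accumulation_steps = 2

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 64, matching the value reported above
```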
### Training results

We evaluate the model on six key benchmarks using the [Eleuther AI Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness), a unified framework for testing generative language models on a large number of different evaluation tasks:

- IFEval (https://arxiv.org/abs/2311.07911)
- BBH (Big Bench Hard) (https://arxiv.org/abs/2210.09261)
- MATH (https://arxiv.org/abs/2103.03874)
- GPQA (Graduate-Level Google-Proof Q&A Benchmark) (https://arxiv.org/abs/2311.12022)
- MuSR (Multistep Soft Reasoning) (https://arxiv.org/abs/2310.16049)
- MMLU-PRO (Massive Multitask Language Understanding - Professional) (https://arxiv.org/abs/2406.01574)

Open LLM Leaderboard scores (from the metadata above):

| Benchmark | Metric                | Value |
| --------- | --------------------- | ----: |
| IFEval    | inst_level_strict_acc | 53.06 |
| BBH       | acc_norm              | 21.78 |
| MATH      | exact_match           |  2.87 |
| GPQA      | acc_norm              |  3.47 |
| MuSR      | acc_norm              |  7.54 |
| MMLU-PRO  | acc                   | 19.59 |

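Scores of this kind can be reproduced locally with the harness CLI; a sketch (the exact task group names vary across harness versions, and the model path is a placeholder):

```shell
pip install lm-eval
lm_eval --model hf \
  --model_args pretrained=<path-to-this-model> \
  --tasks leaderboard \
  --batch_size 4
```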
### Framework versions

- Transformers 4.40.2
- Pytorch 2.1.2+cu121
- Datasets 2.14.6
- Tokenizers 0.19.1