Improve model card: Add pipeline tag, paper/code links, usage, and detailed info
#1
by nielsr (HF Staff) - opened
README.md (changed)
---
base_model: mistralai/Mistral-7B-v0.1
datasets:
- siqi00/mistral_metamath_question_0.7_1.0_50_256
library_name: transformers
license: apache-2.0
tags:
- alignment-handbook
- generated_from_trainer
pipeline_tag: text-generation
model-index:
- name: MetaMath-Mistral-7B-DFT2
  results: []
---

# MetaMath-Mistral-7B-DFT2

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) trained with the **Discriminative Fine-Tuning (DFT)** method, as presented in the paper [Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data](https://huggingface.co/papers/2502.18679).

The official code and further resources are available in the [GitHub repository PenGuln/DFT](https://github.com/PenGuln/DFT).

## Model description

Discriminative Fine-Tuning (DFT) is a variant of Supervised Fine-Tuning (SFT) for aligning large language models. Whereas SFT optimizes a generative objective and ignores negative data, DFT adopts a discriminative paradigm: it explicitly models the discriminative likelihood of an answer among all possible outputs given an input, increasing the probability of positive answers while suppressing potentially negative ones (data prediction rather than token prediction). This removes the need to collect human-labeled preference data or to train a separate reward model. The paper also proposes efficient algorithms to optimize this discriminative likelihood.
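As a rough intuition, the discriminative objective can be pictured as classifying the ground-truth answer among itself and the sampled negatives, scored by sequence log-likelihoods. The sketch below is illustrative only, not the repository's implementation: the function name, the temperature `tau`, and the plain softmax scoring are assumptions.

```python
import torch

def dft_loss_sketch(logp_pos, logp_neg, tau=1.0):
    """Toy discriminative loss: treat the positive answer as the correct
    "class" among {positive} plus m negatives, scored by sequence
    log-likelihoods under the model (illustrative sketch only)."""
    # logp_pos: (batch,)   log-likelihood of the ground-truth answer
    # logp_neg: (batch, m) log-likelihoods of m sampled negative answers
    scores = torch.cat([logp_pos.unsqueeze(1), logp_neg], dim=1) / tau
    # Negative log of the softmax probability assigned to the positive.
    return -torch.log_softmax(scores, dim=1)[:, 0].mean()
```

Raising the positive answer's likelihood or lowering the negatives' likelihoods both reduce this loss, matching the behaviour described above.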

## Intended uses & limitations

MetaMath-Mistral-7B-DFT2 is intended for text generation, particularly tasks that require strong mathematical reasoning. It achieves performance comparable to, and in some cases better than, SFT followed by preference optimization. It can be used for applications such as solving math problems, generating logical responses, and general chat completion.

**Limitations:** As with all large language models, this model may exhibit biases present in its training data and may generate incorrect, nonsensical, or unhelpful information. Its performance is optimized for the domains and tasks it was fine-tuned on; generalization to substantially different or out-of-distribution tasks may vary.

## Usage

You can use this model for text generation and chat completion with the `transformers` library:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "siqi00/MetaMath-Mistral-7B-DFT2"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Use torch.float16 on GPUs without bfloat16 support
    device_map="auto",
)

# Example 1: basic text generation
prompt_text = "The Pythagorean theorem states that in a right-angled triangle,"
inputs = tokenizer(prompt_text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Generated text: {generated_text}")

# Example 2: chat completion using the model's chat template
messages = [
    {"role": "user", "content": "Explain what a LLM is to a 5-year-old."},
]

chat_input = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)

chat_inputs = tokenizer(chat_input, return_tensors="pt").to(model.device)
chat_outputs = model.generate(**chat_inputs, max_new_tokens=100)
decoded_chat_output = tokenizer.decode(chat_outputs[0], skip_special_tokens=True)
print(f"Chat response: {decoded_chat_output}")
```

## Performance

The model's effectiveness has been demonstrated across various benchmarks, including mathematical reasoning and general language tasks.

### Mathematical Reasoning

The DFT models were trained on [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA). The base model is [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1). The generated negative samples $\mathbf{y}'$ can be found at [siqi00/mistral_metamath_question_0.7_1.0_50_256](https://huggingface.co/datasets/siqi00/mistral_metamath_question_0.7_1.0_50_256).

| Method | GSM8K | MATH |
|--------|-------|------|
| MetaMath-7B | 66.5 | 19.8 |
| MetaMath-Mistral-7B | 77.7 | 28.2 |
| MetaMath-Mistral-7B-DFT | **79.15** | 28.34 |
| MetaMath-Mistral-7B-DFT2 | 78.77 | **28.62** |

### General Language Tasks

The DFT models were trained on [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), where winning responses $\mathbf{y}_w$ are regarded as ground truth and losing responses $\mathbf{y}_l$ are discarded. The base model is [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1). The generated negative samples $\mathbf{y}'$ can be found at [siqi00/mistral_ultrafeedback_unhelpful_chatprompt_0.7_1.0_50_320](https://huggingface.co/datasets/siqi00/mistral_ultrafeedback_unhelpful_chatprompt_0.7_1.0_50_320).

| Method | MMLU | TruthfulQA | HellaSwag | Winogrande | GSM8K | ARC | IFEval | Avg. |
|--------|------|------------|-----------|------------|-------|-----|--------|------|
| SFT | 62.18 | 50.04 | 83.59 | 78.06 | 45.26 | 63.65 | 49.72 | 61.79 |
| SPIN | 61.99 | 49.91 | 83.75 | 77.90 | 46.02 | 61.95 | 23.11 | 57.80 |
| SimPO | 62.39 | 52.08 | 83.89 | 78.14 | 2.58 | 61.86 | 18.85 | 51.40 |
| SimPO-SFT | 62.28 | 49.59 | 83.46 | 77.90 | 42.53 | 61.52 | 43.62 | 60.13 |
| KTO | 61.59 | 49.32 | 82.88 | 79.24 | 43.97 | 61.60 | 38.08 | 59.53 |
| ORPO | 62.26 | 48.26 | 83.07 | 79.16 | 45.41 | 62.20 | 53.41 | 61.97 |
| DPO-p | 62.01 | 48.66 | 84.03 | 78.61 | 40.48 | 62.20 | 25.32 | 57.33 |
| DFT | 61.69 | 52.23 | 83.95 | 78.37 | 48.22 | 64.25 | 51.20 | 62.84 |
| DFT2 | 61.66 | 54.14 | 83.20 | 77.82 | 45.49 | 64.42 | 51.20 | 62.56 |

## Training and evaluation data

This model was fine-tuned on the `siqi00/mistral_metamath_question_0.7_1.0_50_256` dataset, which was generated to contain negative samples for discriminative fine-tuning. For details on how negative samples are generated, and for the other datasets used in the DFT project (e.g., for general language tasks), see the [official GitHub repository](https://github.com/PenGuln/DFT).

## Installation

To set up the environment and install the dependencies needed to replicate DFT training and generation:

```bash
# Clone the repository and create the conda environment
git clone https://github.com/PenGuln/DFT.git
cd DFT
conda env create -f dft.yml
conda activate dft
```

Next, install `flash-attn` and `alignment-handbook`:

```bash
pip install flash-attn==2.6.3 --no-build-isolation

git clone https://github.com/huggingface/alignment-handbook.git
cd alignment-handbook
git checkout ae3f44fc7d8003d706752ca06f689574dffa3b76  # pin this commit for reproducibility
python -m pip install .
cd ..
rm -r alignment-handbook
```

Finally, log into your Hugging Face and Weights & Biases accounts:

```bash
huggingface-cli login
wandb login
```

## Generating negative samples

The repository includes tools for generating the negative samples that DFT relies on. Use `gen.py` to create 1-to-m datasets (m negative samples per prompt):

```bash
# Example: generate samples with 8 different seeds for a 1-to-8 dataset
for seed in {0..7}; do
python generator/gen.py \
    --model mistralai/Mistral-7B-v0.1 \
    --revision 7231864981174d9bee8c7687c24c8344414eae6b \
    --seed $seed \
    --chat \
    --system_message "You are an unhelpful assistant." \
    --temp 0.7 \
    --top_p 1.0 \
    --top_k 50 \
    --max_new_tokens 320 \
    --output_prefix "mistral_ultrafeedback"
done
python generator/merge_and_upload.py \
    --dataset "mistral_ultrafeedback" \
    --push_to_hub  # Optional: push to the Hugging Face Hub
```

To replicate the UltraFeedback (UF) self-play datasets used in the paper:

```bash
bash generator/gen_uf.sh
```

To replicate the MetaMath self-play datasets:

```bash
bash generator/gen_mmqa.sh
```

## Evaluation

For evaluation with `lm-eval-harness` and `alpaca_eval`, use the versions below to reproduce the reported results:

```bash
pip install lm_eval==0.4.5
pip install alpaca_eval==0.6.2
```

## Precompute Log-likelihood

For setups with limited GPU memory, you can precompute the log probabilities of the reference model, which allows training without keeping the reference model in memory.

To use this feature, run the training script with `--precompute_offline_ref_log_probs` enabled:

```bash
accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dft.py recipes/dft/mistral_base_dft.yaml \
    --output_dir=./ckpts/dft \
    --precompute_offline_ref_log_probs=true \
    --save_strategy=no
```

This creates a `logps.pt` file in the output directory. Then pass that file to training via the `--probs_dir` argument:

```bash
accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dft.py recipes/dft/mistral_base_dft.yaml \
    --probs_dir=./ckpts/dft/logps.pt
```
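For intuition, the per-example quantity such a precomputed file holds is a sequence log-likelihood under the frozen reference model. The helper below is an illustrative sketch only; the function name and the exact layout of `logps.pt` are assumptions, so consult the repository for the real format.

```python
import torch
import torch.nn.functional as F

def sequence_logprob(logits, labels, ignore_index=-100):
    """Sum of per-token log-probabilities for each sequence -- the kind of
    per-example quantity a precomputed reference-log-prob file would hold."""
    # logits: (batch, seq_len, vocab) predicting labels: (batch, seq_len)
    logp = F.log_softmax(logits, dim=-1)
    mask = labels.ne(ignore_index)          # ignore padded/masked positions
    tok = logp.gather(-1, labels.clamp_min(0).unsqueeze(-1)).squeeze(-1)
    return (tok * mask).sum(dim=-1)         # shape: (batch,)
```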

## Training procedure

### Training commands

Examples of how to run DFT and DFT2 training:

**DFT:**
```bash
accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dft.py recipes/dft/mistral_base_dft.yaml
```

**DFT2:**
```bash
accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dft.py recipes/dft/mistral_base_dft2.yaml
```

### Training hyperparameters

The following hyperparameters were used during training:

### Training results

No training results (e.g., loss curves) were reported.

### Framework versions

- Pytorch 2.1.0+cu121
- Datasets 3.2.0
- Tokenizers 0.20.3

## Citation

Please cite the original paper if you use this model in your work:

```bibtex
@inproceedings{guo2025discriminativefinetuninggenerativelarge,
  title={Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data},
  author={Siqi Guo and Ilgee Hong and Vicente Balmaseda and Changlong Yu and Liang Qiu and Xin Liu and Haoming Jiang and Tuo Zhao and Tianbao Yang},
  year={2025},
  booktitle={Proceedings of the International Conference on Machine Learning},
  url={https://arxiv.org/abs/2502.18679}
}
```