---
base_model: mistralai/Mistral-7B-v0.1
datasets:
- siqi00/mistral_metamath_question_0.7_1.0_50_256
library_name: transformers
license: apache-2.0
tags:
- alignment-handbook
- generated_from_trainer
pipeline_tag: text-generation
model-index:
- name: MetaMath-Mistral-7B-DFT
  results: []
---

# MetaMath-Mistral-7B-DFT

This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the [siqi00/mistral_metamath_question_0.7_1.0_50_256](https://huggingface.co/datasets/siqi00/mistral_metamath_question_0.7_1.0_50_256) dataset.

The model was presented in the paper [Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data](https://huggingface.co/papers/2502.18679).
The official code is available at: [https://github.com/PenGuln/DFT](https://github.com/PenGuln/DFT).

## Model description

Discriminative Fine-Tuning (DFT) is an improved variant of Supervised Fine-Tuning (SFT) for aligning Large Language Models (LLMs), designed to overcome the limitations of generative training objectives without requiring human-labeled preference data or strong reward models. Unlike SFT, which uses a generative approach and overlooks negative data, DFT adopts a discriminative paradigm. It aims to increase the probability of positive answers while simultaneously suppressing potentially negative ones, shifting the focus from token prediction to data prediction.

**Key Contributions:**
*   **Discriminative Probabilistic Framework**: DFT introduces a novel framework for fine-tuning LLMs by explicitly modeling the discriminative likelihood of an answer among all possible outputs given an input.
*   **Efficient Optimization Algorithms**: It includes efficient algorithms designed to optimize this discriminative likelihood.
*   **Strong Performance**: Extensive experiments demonstrate DFT's effectiveness, achieving performance better than SFT and comparable to, if not better than, the SFT followed by Preference Optimization (SFT→PO) pipeline.
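
To make the shape of the discriminative objective concrete, here is a minimal, self-contained sketch: the positive answer's sequence log-probability competes in a softmax against sampled negatives, so minimizing the loss raises the positive's probability while suppressing the negatives. This is an illustrative simplification, not the paper's exact algorithm (which includes efficient optimization and negative-sampling details); the function name and scalar log-probabilities are hypothetical.

```python
import math

def dft_loss_sketch(pos_logp, neg_logps):
    """Negative log of the discriminative likelihood of the positive
    answer among {positive} + sampled negatives (softmax over sequence
    log-probabilities, computed with the usual max-shift for stability)."""
    logits = [pos_logp] + list(neg_logps)
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -(pos_logp - log_z)

# Toy example: the positive answer is already more likely than the negatives,
# so the loss is small; widening the gap drives it toward zero.
loss = dft_loss_sketch(-10.0, [-12.0, -15.0])
```

In a real implementation the scalars would be per-sequence log-probabilities summed over tokens under the model being fine-tuned, and the gradient would flow through both the positive and the negative terms.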

## Intended uses & limitations

**Intended Uses:** This model, MetaMath-Mistral-7B-DFT, is primarily intended for improving performance in mathematical reasoning and general language generation tasks. It provides an effective fine-tuning approach for LLMs, especially in scenarios where collecting extensive human-labeled preference data for alignment is challenging. It can be used for research in LLM alignment and for applications requiring robust and accurate text generation.

**Limitations:** As a large language model, this model may inherit biases from its pre-training and fine-tuning data. While DFT aims to suppress negative outputs, it's crucial to evaluate its behavior for specific applications to mitigate potential factual inaccuracies or undesirable content generation. Users should implement appropriate safeguards when deploying the model in production environments.

## Training and evaluation data

This model was fine-tuned on the [siqi00/mistral_metamath_question_0.7_1.0_50_256](https://huggingface.co/datasets/siqi00/mistral_metamath_question_0.7_1.0_50_256) dataset, which contains mathematical reasoning questions and generated negative samples. The underlying data for mathematical reasoning comes from [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA).

For evaluation and training related to general language tasks (not directly for this specific model, but for the DFT method), the paper utilized datasets derived from [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), where winning responses were treated as ground truth.
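
As a concrete illustration of that construction, the sketch below keeps a record's prompt and winning response and drops the losing one. The `prompt`/`chosen`/`rejected` field names follow the ultrafeedback_binarized schema (chat-style message lists); the record shown is a toy example, not actual dataset content.

```python
def to_sft_example(record):
    """Treat the winning response as ground truth; discard the losing one."""
    # ultrafeedback_binarized stores "chosen"/"rejected" as message lists;
    # the final assistant turn of "chosen" is the training target.
    return {
        "prompt": record["prompt"],
        "target": record["chosen"][-1]["content"],
    }

record = {
    "prompt": "What is 2 + 2?",
    "chosen": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4"},
    ],
    "rejected": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "5"},
    ],
}
example = to_sft_example(record)
```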

## How to use

You can use this model for text generation with the Hugging Face `transformers` library.

```python
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "ilgee/MetaMath-Mistral-7B-DFT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16, # or torch.float16 if bfloat16 is not supported
    device_map="auto",
    trust_remote_code=True,
)

# Example for Text Generation
text = "Question: What is the capital of France?\n\nAnswer:"
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
print(pipe(text, max_new_tokens=30, do_sample=False)[0]["generated_text"])

# Example for Chat Completion (using the model's chat template)
messages = [{"role": "user", "content": "Hi! How are you?"}]
chat_prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
print(pipe(chat_prompt, max_new_tokens=50, do_sample=True)[0]["generated_text"])
```

## Performance

The model's performance on various benchmarks, as reported in the paper and GitHub repository, is summarized below.

### Mathematical Reasoning

Trained on [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA). The base model is [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1). The generated negative samples $\mathbf{y}'$ can be found at [siqi00/mistral_metamath_question_0.7_1.0_50_256](https://huggingface.co/datasets/siqi00/mistral_metamath_question_0.7_1.0_50_256).

| Method | GSM8K | MATH |
|---|---|---|
| MetaMath-7B | 66.5 | 19.8 |
| MetaMath-Mistral-7B | 77.7 | 28.2 |
| MetaMath-Mistral-7B-DFT | **79.15** | 28.34 |
| MetaMath-Mistral-7B-DFT2 | 78.77 | **28.62** |

### General Language Tasks

Trained on [HuggingFaceH4/ultrafeedback_binarized](https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized), treating the winning responses $\mathbf{y}_w$ as ground truth and discarding all losing responses $\mathbf{y}_l$. The base model is [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1). The generated negative samples $\mathbf{y}'$ can be found at [siqi00/mistral_ultrafeedback_unhelpful_chatprompt_0.7_1.0_50_320](https://huggingface.co/datasets/siqi00/mistral_ultrafeedback_unhelpful_chatprompt_0.7_1.0_50_320).

| Method | MMLU | TruthfulQA | HellaSwag | Winogrande | GSM8k | ARC | IFEval | Avg. |
|---|---|---|---|---|---|---|---|---|
| SFT | 62.18 | 50.04 | 83.59 | 78.06 | 45.26 | 63.65 | 49.72 | 61.79 |
| SPIN | 61.99 | 49.91 | 83.75 | 77.90 | 46.02 | 61.95 | 23.11 | 57.80 |
| SimPO | 62.39 | 52.08 | 83.89 | 78.14 | 2.58 | 61.86 | 18.85 | 51.40 |
| SimPO-SFT | 62.28 | 49.59 | 83.46 | 77.90 | 42.53 | 61.52 | 43.62 | 60.13 |
| KTO | 61.59 | 49.32 | 82.88 | 79.24 | 43.97 | 61.60 | 38.08 | 59.53 |
| ORPO | 62.26 | 48.26 | 83.07 | 79.16 | 45.41 | 62.20 | 53.41 | 61.97 |
| DPO-p | 62.01 | 48.66 | 84.03 | 78.61 | 40.48 | 62.20 | 25.32 | 57.33 |
| DFT | 61.69 | 52.23 | 83.95 | 78.37 | 48.22 | 64.25 | 51.20 | 62.84 |
| DFT2 | 61.66 | 54.14 | 83.20 | 77.82 | 45.49 | 64.42 | 51.20 | 62.56 |

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 8e-07
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- total_eval_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
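
The reported effective batch sizes follow directly from the per-device settings; the arithmetic can be checked as:

```python
# Effective train batch = per-device batch x devices x accumulation steps
train_batch_size = 4
num_devices = 8
gradient_accumulation_steps = 4

total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)
print(total_train_batch_size)  # 128, matching the value listed above
```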

### Training results

Please refer to the original [paper](https://huggingface.co/papers/2502.18679) and [GitHub repository](https://github.com/PenGuln/DFT) for detailed training results and performance metrics on various benchmarks.

### Framework versions

- Transformers 4.45.2
- Pytorch 2.1.0+cu121
- Datasets 3.2.0
- Tokenizers 0.20.3

## Citation

If you find this model or the related paper useful, please cite:

```bibtex
@inproceedings{guo2025discriminativefinetuninggenerativelarge,
      title={Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data}, 
      author={Siqi Guo and Ilgee Hong and Vicente Balmaseda and Changlong Yu and Liang Qiu and Xin Liu and Haoming Jiang and Tuo Zhao and Tianbao Yang},
      year={2025},
      booktitle={Proceedings of the International Conference on Machine Learning},
      url={https://arxiv.org/abs/2502.18679}, 
}
```