---
license: apache-2.0
language:
- en
- zh
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
pipeline_tag: text2text-generation
tags:
- Reward_Model
- Reasoning_Model
---
# Model Card for IF-Verifier-7B
## Model Details
### Model Description
- **Developed by:** Hao Peng@THUKEG
- **Model type:** Generative reward model
- **Language(s) (NLP):** English, Chinese
- **License:** apache-2.0
- **Finetuned from model:** deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
### Model Sources
- **Repository:** https://github.com/THU-KEG/VerIF
- **Paper:** https://arxiv.org/abs/2506.09942
## Training Details
### Training Data
This model was trained from DeepSeek-R1-Distill-Qwen-7B on 131k critic examples from [IF-Verifier-Data](https://huggingface.co/datasets/THU-KEG/IF-Verifier-Data).
It is used to verify soft constraints in instruction following.
Deploying IF-Verifier-7B requires only a single H800 GPU, with an average reward computation time of **120** seconds per batch, which can be reduced further with multiple GPUs.
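As a minimal sketch of how a generative verifier like this one can produce scalar rewards: the helpers below build a critique prompt and parse a score from the generated text. The prompt template, the `Score:` output convention, and both function names are illustrative assumptions, not the model's actual protocol; see the VerIF repository for the real interface.

```python
import re

# NOTE: the prompt template and the "Score:" output convention below are
# illustrative assumptions, not IF-Verifier-7B's actual protocol -- see
# https://github.com/THU-KEG/VerIF for the real one.

def build_verifier_prompt(instruction: str, response: str) -> str:
    """Assemble a hypothetical critique prompt for the verifier."""
    return (
        "Judge whether the response satisfies the soft constraints "
        "of the instruction, then give a final score.\n\n"
        f"Instruction: {instruction}\n"
        f"Response: {response}\n"
        "Score (0-1):"
    )

def extract_reward(verifier_output: str, default: float = 0.0) -> float:
    """Pull a numeric reward out of the verifier's generated critique.

    Assumes the critique ends with something like "Score: 0.8";
    falls back to `default` when no score is found.
    """
    match = re.search(r"Score[:\s]*([01](?:\.\d+)?)", verifier_output)
    return float(match.group(1)) if match else default
```

In an RL loop, `build_verifier_prompt` would feed a batched generation call (e.g. via vLLM or `transformers`) on the H800, and `extract_reward` would turn each generated critique into the scalar reward used for the policy update.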
### Results
A policy model trained with this verifier achieves performance comparable to one trained with QwQ-32B as the verifier.

#### Summary
Please refer to our paper and our GitHub repo (https://github.com/THU-KEG/VerIF) for more details.
## Citation
If you find this model helpful, please cite us:
```bibtex
@misc{peng2025verif,
  title={VerIF: Verification Engineering for Reinforcement Learning in Instruction Following},
  author={Hao Peng and Yunjia Qi and Xiaozhi Wang and Bin Xu and Lei Hou and Juanzi Li},
  year={2025},
  eprint={2506.09942},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.09942},
}
```