Add model card for Personalized-Qwen2.5-7B-Instruct
#1
by nielsr HF Staff - opened
README.md
ADDED
@@ -0,0 +1,34 @@
---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
---

# Personalized-Qwen2.5-7B-Instruct
This repository hosts the `Personalized-Qwen2.5-7B-Instruct` model, developed as part of the research presented in the paper:

**[Towards Faithful and Controllable Personalization via Critique-Post-Edit Reinforcement Learning](https://huggingface.co/papers/2510.18849)**

## Abstract

Faithfully personalizing large language models (LLMs) to align with individual user preferences is a critical but challenging task. While supervised fine-tuning (SFT) quickly reaches a performance plateau, standard reinforcement learning from human feedback (RLHF) also struggles with the nuances of personalization. Scalar-based reward models are prone to reward hacking, which leads to verbose and superficially personalized responses. To address these limitations, we propose Critique-Post-Edit, a robust reinforcement learning framework that enables more faithful and controllable personalization. Our framework integrates two key components: (1) a Personalized Generative Reward Model (GRM) that provides multi-dimensional scores and textual critiques to resist reward hacking, and (2) a Critique-Post-Edit mechanism in which the policy model revises its own outputs based on these critiques for more targeted and efficient learning. Under a rigorous length-controlled evaluation, our method substantially outperforms standard PPO on personalization benchmarks. The personalized Qwen2.5-7B model achieves an average 11% win-rate improvement, and the personalized Qwen2.5-14B model surpasses the performance of GPT-4.1. These results demonstrate a practical path to faithful, efficient, and controllable personalization.
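The critique-then-revise loop described in the abstract can be sketched schematically. The function names, signatures, and return shapes below are illustrative assumptions for exposition, not the paper's actual training code:

```python
from typing import Callable, Dict, Tuple

def critique_post_edit_step(
    generate: Callable[[str], str],
    grm: Callable[[str, str], Tuple[Dict[str, float], str]],
    post_edit: Callable[[str, str, str], str],
    prompt: str,
) -> Tuple[str, str, Dict[str, float]]:
    """One schematic rollout step: draft -> critique -> self-revision."""
    draft = generate(prompt)                      # policy model produces an initial response
    scores, critique = grm(prompt, draft)         # GRM returns multi-dimensional scores + a textual critique
    revised = post_edit(prompt, draft, critique)  # policy revises its own output guided by the critique
    return draft, revised, scores
```

In the actual framework, the revised outputs and GRM scores feed back into PPO updates; this sketch only shows the data flow of a single rollout.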
| 15 |
+
|
| 16 |
+
## Project Overview
|
| 17 |
+
This project provides a complete pipeline for training and evaluating large language models using the **Critique-Post-Edit** method. It includes scripts and configurations for Supervised Fine-Tuning (SFT) and Reinforcement Learning (PPO), leveraging powerful open-source frameworks like `LLaMA-Factory` and `verl`. The evaluation is conducted using `AlpacaEval` to ensure fair and comprehensive assessment of model performance. Our released models, including `Personalized-Qwen2.5-7B-Instruct` and `Personalized-Qwen2.5-14B-Instruct`, demonstrate significant improvements over the baseline models.
|
| 18 |
+
|
| 19 |
+
For more details on the framework, training, and evaluation, please refer to the [official GitHub repository](https://github.com/OPPO-PersonalAI/Critique-Post-Edit).
|
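Since the card's metadata declares `library_name: transformers` and `pipeline_tag: text-generation`, the checkpoint should load with the standard `transformers` chat API. A minimal sketch (the Hub organization in the repo id is a placeholder, and using the system message to carry the persona/preference description is an assumption, not something the card specifies):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def chat(model_id: str, system: str, user: str, max_new_tokens: int = 256) -> str:
    """Generate one chat-formatted reply from a Hub checkpoint."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    messages = [
        {"role": "system", "content": system},  # persona / user-preference description
        {"role": "user", "content": user},
    ]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
```

Example call (replace `<org>` with the actual Hub organization hosting this model):
`chat("<org>/Personalized-Qwen2.5-7B-Instruct", "You prefer concise, bullet-point answers.", "Explain unit testing.")`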
## License

This project is licensed under the Apache License, Version 2.0.
## Citation

If you find this work helpful, please consider citing the original paper:

```bibtex
@article{zhu2025towards,
  title={Towards Faithful and Controllable Personalization via Critique-Post-Edit Reinforcement Learning},
  author={Zhu, Chenghao and Tao, Meiling and Wang, Tiannan and Ding, Dongyi and Jiang, Yuchen Eleanor and Zhou, Wangchunshu},
  journal={arXiv preprint arXiv:2510.18849},
  year={2025}
}
```