Add model card for DynamicPO
#1
by nielsr HF Staff - opened
README.md
ADDED
@@ -0,0 +1,43 @@
---
library_name: peft
pipeline_tag: text-generation
base_model:
- Qwen/Qwen2.5-7B-Instruct
- meta-llama/Meta-Llama-3-8B-Instruct
- meta-llama/Llama-2-7b-chat-hf
---

# DynamicPO: Dynamic Preference Optimization for Recommendation

This repository contains the model weights (LoRA adapters) for **DynamicPO**, a plug-and-play dynamic preference optimization framework for LLM-based recommender systems.

DynamicPO is designed to align Large Language Models (LLMs) with user preferences while mitigating "preference optimization collapse." This phenomenon occurs in multi-negative alignment when increasing the number of negative samples leads to performance degradation despite a decreasing training loss.
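
Since the weights are distributed as LoRA adapters, one plausible way to use them is to attach an adapter to one of the listed base models with PEFT. The sketch below is a minimal, unverified example: the helper name `load_dynamicpo` is hypothetical, and this card does not state how the adapters for the three base models are laid out in the repository, so the optional `subfolder` argument is only a placeholder for that assumption.

```python
def load_dynamicpo(base_model_id, adapter_id="xingyuHuxingyu/DynamicPO", subfolder=None):
    """Attach a DynamicPO LoRA adapter to a base model (hypothetical helper)."""
    # Heavy dependencies are imported lazily, so defining this function
    # does not require transformers or peft to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)

    # Assumption: if the repository keeps one adapter per base model in its
    # own directory, pass that directory name as `subfolder`.
    extra = {"subfolder": subfolder} if subfolder else {}
    model = PeftModel.from_pretrained(base_model, adapter_id, **extra)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


# Example call (downloads the full base model, so it is left commented out):
# tokenizer, model = load_dynamicpo("Qwen/Qwen2.5-7B-Instruct")
```

Precision and device placement are left to the reader, and the base model passed in should be the one the chosen adapter was trained on.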

## Key Features

DynamicPO comprises two adaptive mechanisms:
- **Dynamic Boundary Negative Selection**: Identifies and prioritizes informative negatives near the model's decision boundary.
- **Dual-Margin Dynamic beta Adjustment**: Calibrates optimization strength per sample according to boundary ambiguity.
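
To make the two mechanisms above more concrete, here is a toy sketch in plain Python. It is not the paper's algorithm: the function names, the use of an absolute score gap as a proxy for "near the decision boundary", and the linear mapping from ambiguity to beta (including its direction and the `beta_min`, `beta_max`, and `scale` parameters) are all illustrative assumptions. The GitHub repository is the place to look for the actual formulation.

```python
def select_boundary_negatives(pos_score, neg_scores, k):
    """Return indices of the k negatives scored closest to the positive item."""
    # Assumption: a small absolute gap between a negative's score and the
    # positive's score stands in for "near the decision boundary".
    gaps = [abs(pos_score - s) for s in neg_scores]
    order = sorted(range(len(neg_scores)), key=lambda i: gaps[i])
    return order[:k]


def per_sample_beta(pos_score, selected_neg_scores,
                    beta_min=0.05, beta_max=0.5, scale=1.0):
    """Map boundary ambiguity to a per-sample beta (illustrative only)."""
    # Assumption: ambiguity is high when the mean gap is small. The linear
    # form and its direction are arbitrary choices made for this sketch.
    mean_gap = sum(abs(pos_score - s) for s in selected_neg_scores) / len(selected_neg_scores)
    ambiguity = max(0.0, 1.0 - mean_gap / scale)
    return beta_min + (beta_max - beta_min) * ambiguity


# Toy scores for one user: one positive item and five sampled negatives.
pos = 2.0
negs = [-1.0, 1.8, 0.5, 2.3, -3.0]

chosen = select_boundary_negatives(pos, negs, k=2)
beta = per_sample_beta(pos, [negs[i] for i in chosen])
print(chosen, round(beta, 4))
```

In this toy example the two negatives scored 1.8 and 2.3 are selected, because they are the hardest to separate from the positive item, and the resulting beta depends only on how tight that separation is.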

## Resources

- **Paper**: [DynamicPO: Dynamic Preference Optimization for Recommendation](https://huggingface.co/papers/2605.00327)
- **GitHub Repository**: [xingyuHuxingyu/DynamicPO](https://github.com/xingyuHuxingyu/DynamicPO)
- **Dataset**: [DynamicPO Dataset](https://huggingface.co/datasets/xingyuHuxingyu/DynamicPO-Data)

## Citation

This work was presented at DASFAA 2026. If you find this work useful, please consider citing:

```bibtex
@article{hu2026dynamicpo,
  title={DynamicPO: Dynamic Preference Optimization for Recommendation},
  author={Hu, Xingyu and Zhang, Kai and Wu, Jiancan and Wang, Shuli and Wang, Chi and Chen, Wenshuai and Zhu, Yinhua and Wang, Haitao and Wang, Xingxing and Wang, Xiang},
  journal={arXiv preprint arXiv:2605.00327},
  year={2026}
}
```

## Acknowledgment

This implementation is built upon the [TRL library](https://github.com/huggingface/trl).