---
license: apache-2.0
---

# FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning

This model is trained on the [FAPO-Reasoning-Dataset](https://huggingface.co/datasets/dyyyyyyyy/FAPO-Reasoning-Dataset) with generative rewards provided by [FAPO-GenRM-4B](https://huggingface.co/dyyyyyyyy/FAPO-GenRM-4B).

---

Project Homepage: https://fapo-rl.github.io/

Code Implementation: https://github.com/volcengine/verl/tree/main/recipe/fapo

You are welcome to follow and cite our work!

BibTeX citation:

```bibtex
@article{ding2025fapo,
  title={FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning},
  author={Ding, Yuyang and Zhang, Chi and Li, Juntao and Lin, Haibin and Liu, Xin and Zhang, Min},
  journal={arXiv preprint arXiv:2510.22543},
  year={2025}
}
```