dyyyyyyyy
/

FAPO-32B

Model card Files Files and versions

FAPO-32B / README.md

dyyyyyyyy's picture

Update README.md

eb6ae67 verified 3 months ago

|

history blame contribute delete

791 Bytes

	---
	license: apache-2.0
	---

	# FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning

	This Model is trained on the [FAPO-Reasoning-Dataset](https://huggingface.co/datasets/dyyyyyyyy/FAPO-Reasoning-Dataset) with generative rewards by [FAPO-GenRM-4B](https://huggingface.co/dyyyyyyyy/FAPO-GenRM-4B).

	---

	Project Homepage: https://fapo-rl.github.io/

	Code Implementation: https://github.com/volcengine/verl/tree/main/recipe/fapo

	Welcome to follow and cite our works!

	BibTeX citation:
	```bibtex
	@article{ding2025fapo,
	title={FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning},
	author={Ding, Yuyang and Zhang, Chi and Li, Juntao and Lin, Haibin and Liu, Xin and Zhang, Min},
	journal={arXiv preprint arXiv:2510.22543},
	year={2025}
	}
	```