# GFPO

This feature implements the GFPO algorithm to enforce concise reasoning in the model's output generation, as proposed in the paper [Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning](https://huggingface.co/papers/2508.09726).

## Usage

To activate GFPO in [`GFPOTrainer`]:

- set `num_remains_in_group` in [`GFPOConfig`]
- define a group filter function and pass it as `group_filter_func` to [`GFPOTrainer`]. `group_filter_func` scores the `num_generations` completions in each group, and [`GFPOTrainer`] keeps the top `num_remains_in_group` completions by score as a new group. The model is then trained on the filtered group.

```python
# train_gfpo.py
from trl.experimental.gfpo import GFPOConfig, GFPOTrainer

# Dummy group filter that scores each completion by its index in the group
class GroupFilter:
    def __call__(self, group_completions, group_rewards, **kwargs):
        group_scores = []
        for completions, rewards in zip(group_completions, group_rewards):
            # One score per completion: its position within the group
            scores = [float(i) for i in range(len(completions))]
            group_scores.append(scores)
        return group_scores

training_args = GFPOConfig(
    output_dir="Qwen3-0.6B-GFPO",
    per_device_train_batch_size=4,
    num_remains_in_group=2,
    bf16=True,
)
trainer = GFPOTrainer(
    model="Qwen/Qwen3-0.6B",
    reward_funcs=...,
    train_dataset=...,
    args=training_args,
    group_filter_func=GroupFilter(),
)
trainer.train()
```

## GFPOTrainer

[[autodoc]] experimental.gfpo.GFPOTrainer
    - train
    - save_model
    - push_to_hub

## GFPOConfig

[[autodoc]] experimental.gfpo.GFPOConfig
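
## Example: length-based group filter

The dummy filter in the Usage section scores completions by position only. A filter closer to the goal of concise reasoning might favor shorter completions. The following is a minimal sketch under stated assumptions, not the library's implementation: it assumes each completion supports `len()` and that this length reflects the unpadded completion length, and it assumes higher scores are the ones retained. `ShortestFirstFilter` and `top_k_indices` are hypothetical names introduced here for illustration; `top_k_indices` only mimics the top-`num_remains_in_group` selection on toy data and is not part of TRL.

```python
class ShortestFirstFilter:
    """Hypothetical group filter: shorter completions receive higher scores."""

    def __call__(self, group_completions, group_rewards, **kwargs):
        group_scores = []
        for completions, rewards in zip(group_completions, group_rewards):
            # Negate the length so the shortest completion gets the highest score.
            # Assumption: len(completion) is the unpadded completion length.
            scores = [-float(len(completion)) for completion in completions]
            group_scores.append(scores)
        return group_scores


def top_k_indices(scores, k):
    """Illustrative helper (not part of TRL): indices of the k highest scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


# Toy data: one group of four completions, given as lists of token ids.
group_completions = [[[1, 2, 3, 4, 5], [1, 2], [1, 2, 3], [1, 2, 3, 4, 5, 6, 7]]]
group_rewards = [[1.0, 1.0, 0.0, 1.0]]

group_scores = ShortestFirstFilter()(group_completions, group_rewards)
# Emulate keeping the top 2 completions per group (num_remains_in_group=2).
kept = [top_k_indices(scores, 2) for scores in group_scores]

print(group_scores)  # [[-5.0, -2.0, -3.0, -7.0]]
print(kept)          # [[1, 2]]
```

An instance of such a class could be passed as `group_filter_func=ShortestFirstFilter()` in the same way as `GroupFilter()` above. Note that this sketch ignores `group_rewards`; a filter that also weighs reward against length would use both arguments.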