| # GFPO | |
| This feature implements the GFPO algorithm to enforce concise reasoning in the model's output generation, as proposed in the paper [Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning](https://huggingface.co/papers/2508.09726). | |
| ## Usage | |
| To activate GFPO in `GFPOTrainer`: | |
| - set `num_remains_in_group` in `GFPOConfig` | |
| - define a group filter function and pass it to `GFPOTrainer` as `group_filter_func`. The function scores the `num_generations` completions in each group; `GFPOTrainer` then keeps the top `num_remains_in_group` completions by score as the new group, and the model is trained on that filtered group. | |
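The selection step described above can be pictured with a small standalone sketch. It is not the `GFPOTrainer` implementation: the helper name `keep_top_k` is hypothetical, and the sketch assumes that higher scores are the ones retained and that ties are broken by original position.

```python
def keep_top_k(completions, scores, k):
    """Return the k highest-scoring completions, preserving their original order."""
    if len(completions) != len(scores):
        raise ValueError("completions and scores must have the same length")
    # Rank indices by score, highest first; sorted() is stable, so ties keep original order.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])
    return [completions[i] for i in kept]


# One group of four completions (num_generations=4) reduced to two (num_remains_in_group=2).
group = ["a long answer ...", "short", "medium one", "tiny"]
scores = [0.1, 0.9, 0.4, 0.7]
filtered = keep_top_k(group, scores, k=2)
```

Here `k` plays the role of `num_remains_in_group`, and `scores` stands in for one inner list of the values a group filter function would return.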
| ```python | |
| # train_gfpo.py | |
| from trl.experimental.gfpo import GFPOConfig, GFPOTrainer | |
| # Dummy group filter that scores each completion by its index within the group | |
| class GroupFilter: | |
|     def __call__(self, group_completions, group_rewards, **kwargs): | |
|         group_scores = [] | |
|         for completions, rewards in zip(group_completions, group_rewards): | |
|             scores = [float(i) for i in range(len(completions))] | |
|             group_scores.append(scores) | |
|         return group_scores | |
| training_args = GFPOConfig( | |
|     output_dir="Qwen3-0.6B-GFPO", | |
|     per_device_train_batch_size=4, | |
|     num_remains_in_group=2, | |
|     bf16=True, | |
| ) | |
| trainer = GFPOTrainer( | |
|     model="Qwen/Qwen3-0.6B", | |
|     reward_funcs=..., | |
|     train_dataset=..., | |
|     args=training_args, | |
|     group_filter_func=GroupFilter(), | |
| ) | |
| trainer.train() | |
| ``` | |
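The index-based filter above is only a placeholder. Since GFPO targets concise reasoning, a more realistic filter might prefer shorter completions. The following is a minimal sketch under stated assumptions, not the method from the paper or the library: the class name `ShortestFirstFilter` is hypothetical, each completion is assumed to support `len()` (for example a string or a list of token ids), and higher scores are assumed to be the ones kept, so the filter returns negated lengths.

```python
class ShortestFirstFilter:
    """Hypothetical group filter: shorter completions receive higher scores."""

    def __call__(self, group_completions, group_rewards, **kwargs):
        group_scores = []
        for completions, rewards in zip(group_completions, group_rewards):
            # Negate the length so the shortest completion ranks highest.
            # Rewards are ignored here; a real filter could combine both signals.
            group_scores.append([-float(len(c)) for c in completions])
        return group_scores


# Score one group of three string completions.
example_scores = ShortestFirstFilter()(
    group_completions=[["a b c d", "a b", "a b c"]],
    group_rewards=[[1.0, 0.0, 1.0]],
)
```

An instance of such a class could be passed as `group_filter_func` in the same way as `GroupFilter()` above. If completions arrive in another format (for example, lists of chat messages), the length computation would need to be adapted.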
| ## GFPOTrainer[[trl.experimental.gfpo.GFPOTrainer]] | |
| #### trl.experimental.gfpo.GFPOTrainer[[trl.experimental.gfpo.GFPOTrainer]] | |
| [Source](https://github.com/huggingface/trl/blob/vr_5607/trl/experimental/gfpo/gfpo_trainer.py#L33) | |
| #### train[[trl.experimental.gfpo.GFPOTrainer.train]] | |
| [Source](https://github.com/huggingface/trl/blob/vr_5607/transformers/trainer.py#L1323) | |
| `train(resume_from_checkpoint: str | bool | None = None, trial: optuna.Trial | dict[str, Any] | None = None, ignore_keys_for_eval: list[str] | None = None)` | |
| Main training entry point. | |
| **Parameters:** | |
| resume_from_checkpoint (`str` or `bool`, *optional*) : If a `str`, local path to a saved checkpoint as saved by a previous instance of `Trainer`. If a `bool` and equals `True`, load the last checkpoint in *args.output_dir* as saved by a previous instance of `Trainer`. If present, training will resume from the model/optimizer/scheduler states loaded here. | |
| trial (`optuna.Trial` or `dict[str, Any]`, *optional*) : The trial run or the hyperparameter dictionary for hyperparameter search. | |
| ignore_keys_for_eval (`list[str]`, *optional*) : A list of keys in the output of your model (if it is a dictionary) that should be ignored when gathering predictions for evaluation during the training. | |
| **Returns:** | |
| `~trainer_utils.TrainOutput` | |
| Object containing the global step count, training loss, and metrics. | |
| #### save_model[[trl.experimental.gfpo.GFPOTrainer.save_model]] | |
| [Source](https://github.com/huggingface/trl/blob/vr_5607/transformers/trainer.py#L3746) | |
| Will save the model, so you can reload it using `from_pretrained()`. | |
| Will only save from the main process. | |
| #### push_to_hub[[trl.experimental.gfpo.GFPOTrainer.push_to_hub]] | |
| [Source](https://github.com/huggingface/trl/blob/vr_5607/transformers/trainer.py#L3993) | |
| Upload `self.model` and `self.processing_class` to the 🤗 model hub on the repo `self.args.hub_model_id`. | |
| **Parameters:** | |
| commit_message (`str`, *optional*, defaults to `"End of training"`) : Message to commit while pushing. | |
| blocking (`bool`, *optional*, defaults to `True`) : Whether the function should return only when the `git push` has finished. | |
| token (`str`, *optional*, defaults to `None`) : Token with write permission to overwrite Trainer's original args. | |
| revision (`str`, *optional*) : The git revision to commit from. Defaults to the head of the "main" branch. | |
| kwargs (`dict[str, Any]`, *optional*) : Additional keyword arguments passed along to `~Trainer.create_model_card`. | |
| **Returns:** | |
| The URL of the repository where the model was pushed if `blocking=True`, or a `Future` object tracking the | |
| progress of the commit if `blocking=False`. | |
| ## GFPOConfig[[trl.experimental.gfpo.GFPOConfig]] | |
| #### trl.experimental.gfpo.GFPOConfig[[trl.experimental.gfpo.GFPOConfig]] | |
| [Source](https://github.com/huggingface/trl/blob/vr_5607/trl/experimental/gfpo/gfpo_config.py#L21) | |