Buckets:

HuggingFaceDocBuilder's picture
|
download
raw
5.24 kB

GFPO

This feature implements the GFPO algorithm to enforce concise reasoning in the model's output generation, as proposed in the paper Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning.

Usage

To activate GFPO in GFPOTrainer:

  • set num_remains_in_group in GFPOConfig
  • define a group filter function and set it to group_filter_func in GFPOTrainer. group_filter_func will score the num_generations completions and The GFPOTrainer filters groups according to their scores to get top num_remains_in_group completions as a new group. Model will be trained on the filtered group.
# train_gfpo.py
from trl.experimental.gfpo import GFPOConfig, GFPOTrainer

# dummy group filter to scores the completions based on its indice in group
class GroupFilter:
    def __call__(self, group_completions, group_rewards, **kwargs):
        group_scores = []
        for completions, rewards in zip(group_completions, group_rewards):
            scores = [float(i) for i in range(len(completions))]
            group_scores.append(scores)
        return group_scores

training_args = GFPOConfig(
    output_dir="Qwen3-0.6B-GFPO",
    per_device_train_batch_size=4,
    num_remains_in_group=2,
    bf16=True,
)
trainer = GFPOTrainer(
    model="Qwen/Qwen3-0.6B",
    reward_funcs=...,
    train_dataset=...,
    args=training_args,
    group_filter_func=GroupFilter(),
)
trainer.train()

GFPOTrainer[[trl.experimental.gfpo.GFPOTrainer]]

trl.experimental.gfpo.GFPOTrainer[[trl.experimental.gfpo.GFPOTrainer]]

Source

traintrl.experimental.gfpo.GFPOTrainer.trainhttps://github.com/huggingface/trl/blob/vr_5607/transformers/trainer.py#L1323[{"name": "resume_from_checkpoint", "val": ": str | bool | None = None"}, {"name": "trial", "val": ": optuna.Trial | dict[str, Any] | None = None"}, {"name": "ignore_keys_for_eval", "val": ": list[str] | None = None"}]- resume_from_checkpoint (str or bool, optional) -- If a str, local path to a saved checkpoint as saved by a previous instance of Trainer. If a bool and equals True, load the last checkpoint in args.output_dir as saved by a previous instance of Trainer. If present, training will resume from the model/optimizer/scheduler states loaded here.

  • trial (optuna.Trial or dict[str, Any], optional) -- The trial run or the hyperparameter dictionary for hyperparameter search.
  • ignore_keys_for_eval (list[str], optional) -- A list of keys in the output of your model (if it is a dictionary) that should be ignored when gathering predictions for evaluation during the training.0~trainer_utils.TrainOutputObject containing the global step count, training loss, and metrics.

Main training entry point.

Parameters:

resume_from_checkpoint (str or bool, optional) : If a str, local path to a saved checkpoint as saved by a previous instance of Trainer. If a bool and equals True, load the last checkpoint in args.output_dir as saved by a previous instance of Trainer. If present, training will resume from the model/optimizer/scheduler states loaded here.

trial (optuna.Trial or dict[str, Any], optional) : The trial run or the hyperparameter dictionary for hyperparameter search.

ignore_keys_for_eval (list[str], optional) : A list of keys in the output of your model (if it is a dictionary) that should be ignored when gathering predictions for evaluation during the training.

Returns:

~trainer_utils.TrainOutput

Object containing the global step count, training loss, and metrics.

save_model[[trl.experimental.gfpo.GFPOTrainer.save_model]]

Source

Will save the model, so you can reload it using from_pretrained().

Will only save from the main process.

push_to_hub[[trl.experimental.gfpo.GFPOTrainer.push_to_hub]]

Source

Upload self.model and self.processing_class to the 🤗 model hub on the repo self.args.hub_model_id.

Parameters:

commit_message (str, optional, defaults to "End of training") : Message to commit while pushing.

blocking (bool, optional, defaults to True) : Whether the function should return only when the git push has finished.

token (str, optional, defaults to None) : Token with write permission to overwrite Trainer's original args.

revision (str, optional) : The git revision to commit from. Defaults to the head of the "main" branch.

kwargs (dict[str, Any], optional) : Additional keyword arguments passed along to ~Trainer.create_model_card.

Returns:

The URL of the repository where the model was pushed if blocking=False, or a Future object tracking the progress of the commit if blocking=True.

GFPOConfig[[trl.experimental.gfpo.GFPOConfig]]

trl.experimental.gfpo.GFPOConfig[[trl.experimental.gfpo.GFPOConfig]]

Source

Xet Storage Details

Size:
5.24 kB
·
Xet hash:
56dce188b8619a6249e00b1e7cb0b737a61a1005437fe0e8830c2d5e06aa2fca

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.