	# GFPO

	This feature implements the GFPO algorithm, which encourages concise reasoning in the model's generated outputs, as proposed in the paper [Sample More to Think Less: Group Filtered Policy Optimization for Concise Reasoning](https://huggingface.co/papers/2508.09726).

	## Usage

	To activate GFPO in `GFPOTrainer`:

	- set `num_remains_in_group` in `GFPOConfig` to the number of completions to keep in each group after filtering
	- define a group filter and pass it to `GFPOTrainer` as `group_filter_func`. The filter assigns a score to each of the `num_generations` completions in a group; `GFPOTrainer` then keeps the `num_remains_in_group` highest-scoring completions as a new, filtered group, and the model is trained on the filtered groups.
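
The selection step described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the `GFPOTrainer` implementation: it assumes the filter returns one score per completion, that higher scores are kept, and that ties are broken by generation order. `filter_groups` is a hypothetical helper name.

```python
# Minimal sketch of per-group top-k filtering (hypothetical helper, not TRL code).
# Assumes one score per completion and that higher scores are kept.
from typing import List, Sequence


def filter_groups(
    group_completions: Sequence[Sequence[str]],
    group_scores: Sequence[Sequence[float]],
    num_remains_in_group: int,
) -> List[List[str]]:
    filtered = []
    for completions, scores in zip(group_completions, group_scores):
        if len(completions) != len(scores):
            raise ValueError("expected one score per completion")
        # Rank indices by score, highest first; the sort is stable, so tied
        # scores keep their original generation order.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        # Keep the top `num_remains_in_group` indices, restored to generation order.
        keep = sorted(ranked[:num_remains_in_group])
        filtered.append([completions[i] for i in keep])
    return filtered
```

For example, `filter_groups([["a", "bb", "ccc", "dddd"]], [[0.0, 1.0, 2.0, 3.0]], 2)` keeps `["ccc", "dddd"]`, which matches what the index-based dummy filter below would select.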

	```python
	# train_gfpo.py
	from trl.experimental.gfpo import GFPOConfig, GFPOTrainer

	# Dummy group filter: scores each completion by its index within its group.
	# Higher scores rank first, so the last completions of each group are kept.
	class GroupFilter:
	    def __call__(self, group_completions, group_rewards, **kwargs):
	        group_scores = []
	        for completions, rewards in zip(group_completions, group_rewards):
	            # One score per completion in the group.
	            scores = [float(i) for i in range(len(completions))]
	            group_scores.append(scores)
	        return group_scores

	training_args = GFPOConfig(
	    output_dir="Qwen3-0.6B-GFPO",
	    per_device_train_batch_size=4,
	    num_remains_in_group=2,
	    bf16=True,
	)
	trainer = GFPOTrainer(
	    model="Qwen/Qwen3-0.6B",
	    reward_funcs=...,
	    train_dataset=...,
	    args=training_args,
	    group_filter_func=GroupFilter(),
	)
	trainer.train()
	```
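
The paper filters groups by response length and by token efficiency (reward per unit of length). The filters below are a sketch of both criteria using the same call signature as `GroupFilter`; they are not filters shipped with TRL. The class names are hypothetical, and the sketch assumes completions are plain strings, that `group_rewards` holds one reward per completion aligned with `group_completions`, and that length is approximated by whitespace-separated words (a real filter would count tokens with the model's tokenizer).

```python
# Hypothetical group filters sketching the paper's two filtering criteria.
# Assumptions: completions are strings, rewards are aligned per completion,
# word count stands in for token count, and higher scores are kept.


class ShortestCompletionFilter:
    """Favors the shortest completions in each group."""

    def __call__(self, group_completions, group_rewards, **kwargs):
        group_scores = []
        for completions, rewards in zip(group_completions, group_rewards):
            # Negate the length so that shorter completions get higher scores.
            scores = [-float(len(c.split())) for c in completions]
            group_scores.append(scores)
        return group_scores


class TokenEfficiencyFilter:
    """Favors completions with a high reward per unit of length."""

    def __call__(self, group_completions, group_rewards, **kwargs):
        group_scores = []
        for completions, rewards in zip(group_completions, group_rewards):
            # max(..., 1) avoids division by zero for empty completions.
            scores = [
                float(r) / max(len(c.split()), 1)
                for c, r in zip(completions, rewards)
            ]
            group_scores.append(scores)
        return group_scores
```

Either object can be passed as `group_filter_func=ShortestCompletionFilter()` in place of the dummy `GroupFilter()`. Scoring by length alone ignores correctness, while the token-efficiency score still rewards correct answers, which is why the two are worth comparing.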

	## GFPOTrainer[[trl.experimental.gfpo.GFPOTrainer]]

	#### trl.experimental.gfpo.GFPOTrainer[[trl.experimental.gfpo.GFPOTrainer]]

	[Source](https://github.com/huggingface/trl/blob/vr_5607/trl/experimental/gfpo/gfpo_trainer.py#L33)

	#### train[[trl.experimental.gfpo.GFPOTrainer.train]]

	[Source](https://github.com/huggingface/trl/blob/vr_5607/transformers/trainer.py#L1323)

	`train(resume_from_checkpoint: str | bool | None = None, trial: optuna.Trial | dict[str, Any] | None = None, ignore_keys_for_eval: list[str] | None = None)`

	Main training entry point.

	Parameters:

	resume_from_checkpoint (`str` or `bool`, optional) : If a `str`, the local path to a checkpoint saved by a previous instance of `Trainer`. If `True`, the last checkpoint in `args.output_dir` saved by a previous instance of `Trainer` is loaded. If present, training resumes from the model/optimizer/scheduler states loaded here.

	trial (`optuna.Trial` or `dict[str, Any]`, optional) : The trial run or the hyperparameter dictionary for hyperparameter search.

	ignore_keys_for_eval (`list[str]`, optional) : A list of keys in the output of your model (if it is a dictionary) that should be ignored when gathering predictions for evaluation during the training.

	Returns:

	`~trainer_utils.TrainOutput`

	Object containing the global step count, training loss, and metrics.
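
What "the last checkpoint in `args.output_dir`" means can be sketched as follows. This is an illustrative sketch, assuming checkpoints are saved as `checkpoint-<global_step>` subdirectories; `find_last_checkpoint` is a hypothetical name, and transformers provides `transformers.trainer_utils.get_last_checkpoint` for this purpose.

```python
# Sketch: locate the highest-step `checkpoint-<step>` folder in an output dir.
# Hypothetical helper; assumes the `checkpoint-<global_step>` naming scheme.
import os
import re
from typing import Optional

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_last_checkpoint(output_dir: str) -> Optional[str]:
    """Return the path of the highest-step checkpoint folder, or None if absent."""
    best_step, best_path = -1, None
    for name in os.listdir(output_dir):
        path = os.path.join(output_dir, name)
        match = _CHECKPOINT_RE.match(name)
        # Only directories matching the naming scheme count as checkpoints.
        if match and os.path.isdir(path):
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, path
    return best_path
```

Steps are compared as integers because a plain string sort would rank `checkpoint-3` after `checkpoint-200`. With `trainer.train(resume_from_checkpoint=True)` the trainer performs this lookup itself; passing a path string instead resumes from that exact checkpoint.
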
	#### save_model[[trl.experimental.gfpo.GFPOTrainer.save_model]]

	[Source](https://github.com/huggingface/trl/blob/vr_5607/transformers/trainer.py#L3746)

	Saves the model so that it can be reloaded with `from_pretrained()`.

	Only the main process writes the files.
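
The main-process rule can be sketched as a simple rank guard. This is a hypothetical sketch, not the Trainer's code: it assumes the process rank is read from the `RANK` environment variable (set by launchers such as `torchrun`) when not given explicitly, whereas transformers makes this decision through its own check, `TrainingArguments.should_save`.

```python
# Hypothetical rank guard sketching "only the main process saves".
# Assumes rank 0 is the main process and `RANK` holds the global rank.
import os
from typing import Callable, Optional


def save_on_main_process(save_fn: Callable[[], None], rank: Optional[int] = None) -> bool:
    """Call `save_fn` only on rank 0; return True if it was called."""
    if rank is None:
        # Single-process runs have no RANK variable, so default to 0.
        rank = int(os.environ.get("RANK", "0"))
    if rank != 0:
        return False
    save_fn()
    return True
```

Guarding the write avoids several processes writing the same files at once in distributed training.
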
	#### push_to_hub[[trl.experimental.gfpo.GFPOTrainer.push_to_hub]]

	[Source](https://github.com/huggingface/trl/blob/vr_5607/transformers/trainer.py#L3993)

	Uploads `self.model` and `self.processing_class` to the 🤗 Model Hub, in the repository `self.args.hub_model_id`.

	Parameters:

	commit_message (`str`, optional, defaults to `"End of training"`) : Message to commit while pushing.

	blocking (`bool`, optional, defaults to `True`) : Whether the function should return only when the `git push` has finished.

	token (`str`, optional, defaults to `None`) : A token with write permission. If provided, it overrides the token set in the Trainer's original args.

	revision (`str`, optional) : The git revision to commit from. Defaults to the head of the "main" branch.

	kwargs (`dict[str, Any]`, optional) : Additional keyword arguments passed along to `~Trainer.create_model_card`.

	Returns:

	The URL of the repository where the model was pushed if `blocking=True`, or a `Future` object tracking the
	progress of the commit if `blocking=False`.
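
What `blocking` controls can be sketched with the standard library. This is a minimal sketch, assuming the non-blocking path runs the upload on a background thread; `run_upload` and the upload callable are hypothetical stand-ins, not the Trainer's or huggingface_hub's code.

```python
# Hypothetical sketch of a blocking/non-blocking upload contract.
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Union

# One background worker so uploads run in submission order.
_executor = ThreadPoolExecutor(max_workers=1)


def run_upload(upload_fn: Callable[[], str], blocking: bool = True) -> Union[str, "Future[str]"]:
    """Run `upload_fn` inline and return its URL, or return a Future for it."""
    if blocking:
        # Waits for the upload to finish and returns its result directly.
        return upload_fn()
    # Returns immediately; call `.result()` on the Future to wait for the URL.
    return _executor.submit(upload_fn)
```

A non-blocking push lets training continue while the upload proceeds, at the cost of having to check the returned `Future` for completion or errors.
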

	## GFPOConfig[[trl.experimental.gfpo.GFPOConfig]]

	#### trl.experimental.gfpo.GFPOConfig[[trl.experimental.gfpo.GFPOConfig]]

	[Source](https://github.com/huggingface/trl/blob/vr_5607/trl/experimental/gfpo/gfpo_config.py#L21)
