GSPO-token
In the paper Group Sequence Policy Optimization, the authors propose a token-level variant of the GSPO objective, called GSPO-token. To use GSPO-token, use the GRPOTrainer class from trl.experimental.gspo_token.
Usage
from trl.experimental.gspo_token import GRPOTrainer
from trl import GRPOConfig

training_args = GRPOConfig(
    importance_sampling_level="sequence_token",
    ...
)
To leverage GSPO-token, you need to provide a per-token advantage $\hat{A}_{i,t}$ for each token $t$ in sequence $i$, that is, make $\hat{A}_{i,t}$ vary with $t$. This is not the case here, where $\hat{A}_{i,t} = \hat{A}_i$. Otherwise, the GSPO-token gradient is equivalent to that of the original GSPO implementation.
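The equivalence stated above can be checked with a small, library-free sketch. It is not the TRL implementation: it ignores clipping, treats the per-token log-probabilities as the free variables, and uses the paper's definitions, where the sequence ratio is the exponential of the mean per-token log-ratio and the token-level ratio is the stop-gradient sequence ratio times pi(y_t) / sg[pi(y_t)]. The function names are hypothetical and chosen only for illustration.

```python
import math


def sequence_ratio(logp_new, logp_old):
    """GSPO sequence-level ratio s_i: exp of the mean per-token log-ratio."""
    n = len(logp_new)
    return math.exp(sum(a - b for a, b in zip(logp_new, logp_old)) / n)


def gspo_grad(logp_new, logp_old, seq_advantage):
    """Gradient of the unclipped GSPO term s_i * A_i w.r.t. each token log-prob.

    d s_i / d logp_t = s_i / n, so every token receives the same gradient.
    """
    s = sequence_ratio(logp_new, logp_old)
    n = len(logp_new)
    return [s * seq_advantage / n for _ in range(n)]


def gspo_token_grad(logp_new, logp_old, token_advantages):
    """Gradient of the unclipped GSPO-token term (1/n) * sum_t s_{i,t} * A_{i,t}.

    s_{i,t} = sg[s_i] * pi(y_t) / sg[pi(y_t)], so d s_{i,t} / d logp_t equals
    the value of s_i and only token t contributes to its own gradient entry.
    """
    s = sequence_ratio(logp_new, logp_old)
    n = len(logp_new)
    return [s * a_t / n for a_t in token_advantages]


# Illustrative numbers only (not taken from any experiment).
logp_new = [-1.0, -0.5, -2.0]
logp_old = [-1.2, -0.4, -1.9]

# Same advantage on every token: the two gradients coincide.
uniform = gspo_token_grad(logp_new, logp_old, [0.7, 0.7, 0.7])
reference = gspo_grad(logp_new, logp_old, 0.7)

# Token-dependent advantages: GSPO-token now weights tokens differently.
varied = gspo_token_grad(logp_new, logp_old, [0.7, -0.2, 1.5])
```

With a constant advantage, `uniform` matches `reference` entry by entry, which is why GSPO-token only changes training when the advantages actually differ across tokens.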
GRPOTrainer[[trl.GRPOTrainer]]
train[[trl.GRPOTrainer.train]]
train(resume_from_checkpoint: str | bool | None = None, trial: optuna.Trial | dict[str, Any] | None = None, ignore_keys_for_eval: list[str] | None = None)
Source: https://github.com/huggingface/trl/blob/vr_5607/transformers/trainer.py#L1323
Main training entry point.
Parameters:
resume_from_checkpoint (str or bool, optional) : If a str, local path to a saved checkpoint as saved by a previous instance of Trainer. If a bool and equals True, load the last checkpoint in args.output_dir as saved by a previous instance of Trainer. If present, training will resume from the model/optimizer/scheduler states loaded here.
trial (optuna.Trial or dict[str, Any], optional) : The trial run or the hyperparameter dictionary for hyperparameter search.
ignore_keys_for_eval (list[str], optional) : A list of keys in the output of your model (if it is a dictionary) that should be ignored when gathering predictions for evaluation during the training.
Returns:
~trainer_utils.TrainOutput
Object containing the global step count, training loss, and metrics.
save_model[[trl.GRPOTrainer.save_model]]
Will save the model, so you can reload it using from_pretrained().
Will only save from the main process.
push_to_hub[[trl.GRPOTrainer.push_to_hub]]
Upload self.model and self.processing_class to the 🤗 model hub on the repo self.args.hub_model_id.
Parameters:
commit_message (str, optional, defaults to "End of training") : Message to commit while pushing.
blocking (bool, optional, defaults to True) : Whether the function should return only when the git push has finished.
token (str, optional, defaults to None) : Token with write permission to overwrite Trainer's original args.
revision (str, optional) : The git revision to commit from. Defaults to the head of the "main" branch.
kwargs (dict[str, Any], optional) : Additional keyword arguments passed along to ~Trainer.create_model_card.
Returns:
The URL of the repository where the model was pushed if blocking=True, or a Future object tracking the
progress of the commit if blocking=False.