Buckets:
| # BCO Trainer | |
| [](https://huggingface.co/models?other=bco,trl) | |
| TRL supports the Binary Classifier Optimization (BCO). | |
| The [BCO](https://huggingface.co/papers/2404.04656) authors train a binary classifier whose logit serves as a reward so that the classifier maps {prompt, chosen completion} pairs to 1 and {prompt, rejected completion} pairs to 0. | |
| For a full example have a look at `examples/scripts/bco.py`. | |
| ## Expected dataset type | |
| The [experimental.bco.BCOTrainer](/docs/trl/pr_4331/en/bco_trainer#trl.BCOTrainer) requires an [unpaired preference dataset](dataset_formats#unpaired-preference). | |
| The [experimental.bco.BCOTrainer](/docs/trl/pr_4331/en/bco_trainer#trl.BCOTrainer) supports both [conversational](dataset_formats#conversational) and [standard](dataset_formats#standard) dataset formats. When provided with a conversational dataset, the trainer will automatically apply the chat template to the dataset. | |
| ## Expected model format | |
| The BCO trainer expects a model of `AutoModelForCausalLM`, compared to PPO that expects `AutoModelForCausalLMWithValueHead` for the value function. | |
| ## Using the `BCOTrainer` | |
| For a detailed example have a look at the `examples/scripts/bco.py` script. At a high level we need to initialize the `BCOTrainer` with a `model` we wish to train and a reference `ref_model` which we will use to calculate the implicit rewards of the preferred and rejected response. | |
| The `beta` refers to the hyperparameter of the implicit reward, and the dataset contains the 3 entries listed above. Note that the `model` and `ref_model` need to have the same architecture (ie decoder only or encoder-decoder). | |
| ```python | |
| training_args = BCOConfig( | |
| beta=0.1, | |
| ) | |
| bco_trainer = BCOTrainer( | |
| model, | |
| model_ref, | |
| args=training_args, | |
| train_dataset=train_dataset, | |
| processing_class=tokenizer, | |
| ) | |
| ``` | |
| After this one can then call: | |
| ```python | |
| bco_trainer.train() | |
| ``` | |
| ## Underlying Distribution matching (UDM) | |
| In practical scenarios, the thumbs-up and thumbs-down datasets are likely to have divergent underlying distributions of prompts. | |
| Consider an LLM deployed for user feedback: if the model excels in writing tasks but underperforms in coding, the thumbs-up dataset will be dominated by writing-related prompts, while the thumbs-down dataset will contain mostly coding-related prompts. | |
| If the prompts in your desired and undesired datasets differ a lot, it is useful to enable UDM. | |
| Choose an embedding model and tokenizer: | |
| ```python | |
| embedding_model = AutoModel.from_pretrained(your_model_id) | |
| embedding_tokenizer = AutoTokenizer.from_pretrained(your_model_id) | |
| # customize this function depending on your embedding model | |
| def embed_prompt(input_ids, attention_mask, model): | |
| outputs = model(input_ids=input_ids, attention_mask=attention_mask) | |
| return outputs.last_hidden_state.mean(dim=1) | |
| embedding_model = Accelerator().prepare_model(self.embedding_model) | |
| embedding_func = partial(embed_prompt, model=embedding_model) | |
| ``` | |
| Set `prompt_sample_size` to define how many prompts are selected to train the UDM classifier and start the training with the provided embedding function: | |
| ```python | |
| training_args = BCOConfig( | |
| beta=0.1, | |
| prompt_sample_size=512, | |
| ) | |
| bco_trainer = BCOTrainer( | |
| model, | |
| model_ref, | |
| args=training_args, | |
| train_dataset=train_dataset, | |
| processing_class=tokenizer, | |
| embedding_func=embedding_func, | |
| embedding_tokenizer=self.embedding_tokenizer, | |
| ) | |
| bco_trainer.train() | |
| ``` | |
| ### For Mixture of Experts Models: Enabling the auxiliary loss | |
| MOEs are the most efficient if the load is about equally distributed between experts. | |
| To ensure that we train MOEs similarly during preference-tuning, it is beneficial to add the auxiliary loss from the load balancer to the final loss. | |
| This option is enabled by setting `output_router_logits=True` in the model config (e.g. MixtralConfig). | |
| To scale how much the auxiliary loss contributes to the total loss, use the hyperparameter `router_aux_loss_coef=...` (default: 0.001). | |
| ## BCOTrainer[[trl.BCOTrainer]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class trl.BCOTrainer</name><anchor>trl.BCOTrainer</anchor><source>https://github.com/huggingface/trl/blob/vr_4331/trl/experimental/bco/bco_trainer.py#L284</source><parameters>[{"name": "model", "val": ": transformers.modeling_utils.PreTrainedModel | torch.nn.modules.module.Module | str = None"}, {"name": "ref_model", "val": ": transformers.modeling_utils.PreTrainedModel | torch.nn.modules.module.Module | str | None = None"}, {"name": "args", "val": ": BCOConfig = None"}, {"name": "train_dataset", "val": ": datasets.arrow_dataset.Dataset | None = None"}, {"name": "eval_dataset", "val": ": datasets.arrow_dataset.Dataset | dict[str, datasets.arrow_dataset.Dataset] | None = None"}, {"name": "processing_class", "val": ": transformers.tokenization_utils_base.PreTrainedTokenizerBase | transformers.image_processing_utils.BaseImageProcessor | transformers.feature_extraction_utils.FeatureExtractionMixin | transformers.processing_utils.ProcessorMixin | None = None"}, {"name": "data_collator", "val": ": typing.Optional[typing.Callable[[list[typing.Any]], dict[str, typing.Any]]] = None"}, {"name": "model_init", "val": ": collections.abc.Callable[[], transformers.modeling_utils.PreTrainedModel] | None = None"}, {"name": "callbacks", "val": ": list[transformers.trainer_callback.TrainerCallback] | None = None"}, {"name": "optimizers", "val": ": tuple = (None, None)"}, {"name": "preprocess_logits_for_metrics", "val": ": collections.abc.Callable[[torch.Tensor, torch.Tensor], torch.Tensor] | None = None"}, {"name": "peft_config", "val": ": dict | None = None"}, {"name": "compute_metrics", "val": ": collections.abc.Callable[[transformers.trainer_utils.EvalLoopOutput], dict] | None = None"}, {"name": "model_adapter_name", "val": ": str | None = None"}, {"name": "ref_adapter_name", "val": ": str | None = None"}, {"name": "embedding_func", "val": ": collections.abc.Callable | None = None"}, {"name": "embedding_tokenizer", "val": ": transformers.tokenization_utils_base.PreTrainedTokenizerBase | None = None"}]</parameters><paramsdesc>- **model** ([PreTrainedModel](https://huggingface.co/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel)) -- | |
| The model to train, preferably an [AutoModelForSequenceClassification](https://huggingface.co/docs/transformers/main/en/model_doc/auto#transformers.AutoModelForSequenceClassification). | |
| - **ref_model** ([PreTrainedModelWrapper](/docs/trl/pr_4331/en/models#trl.PreTrainedModelWrapper)) -- | |
| Hugging Face transformer model with a casual language modelling head. Used for implicit reward computation | |
| and loss. If no reference model is provided, the trainer will create a reference model with the same | |
| architecture as the model to be optimized. | |
| - **args** ([BCOConfig](/docs/trl/pr_4331/en/bco_trainer#trl.BCOConfig)) -- | |
| The arguments to use for training. | |
| - **train_dataset** ([Dataset](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset)) -- | |
| The dataset to use for training. | |
| - **eval_dataset** ([Dataset](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset)) -- | |
| The dataset to use for evaluation. | |
| - **processing_class** ([PreTrainedTokenizerBase](https://huggingface.co/docs/transformers/main/en/internal/tokenization_utils#transformers.PreTrainedTokenizerBase), [BaseImageProcessor](https://huggingface.co/docs/transformers/main/en/main_classes/image_processor#transformers.BaseImageProcessor), [FeatureExtractionMixin](https://huggingface.co/docs/transformers/main/en/main_classes/feature_extractor#transformers.FeatureExtractionMixin) or [ProcessorMixin](https://huggingface.co/docs/transformers/main/en/main_classes/processors#transformers.ProcessorMixin), *optional*) -- | |
| Processing class used to process the data. If provided, will be used to automatically process the inputs | |
| for the model, and it will be saved along the model to make it easier to rerun an interrupted training or | |
| reuse the fine-tuned model. | |
| - **data_collator** (`DataCollator`, *optional*) -- | |
| The data collator to use for training. If None is specified, the default data collator | |
| (`DPODataCollatorWithPadding`) will be used which will pad the sequences to the maximum length of the | |
| sequences in the batch, given a dataset of paired sequences. | |
| - **model_init** (`Callable[[], transformers.PreTrainedModel]`) -- | |
| The model initializer to use for training. If None is specified, the default model initializer will be | |
| used. | |
| - **callbacks** (`list[transformers.TrainerCallback]`) -- | |
| The callbacks to use for training. | |
| - **optimizers** (`tuple[torch.optim.Optimizer, torch.optim.lr_scheduler.LambdaLR]`) -- | |
| The optimizer and scheduler to use for training. | |
| - **preprocess_logits_for_metrics** (`Callable[[torch.Tensor, torch.Tensor], torch.Tensor]`) -- | |
| The function to use to preprocess the logits before computing the metrics. | |
| - **peft_config** (`dict`, defaults to `None`) -- | |
| The PEFT configuration to use for training. If you pass a PEFT configuration, the model will be wrapped in | |
| a PEFT model. | |
| - **compute_metrics** (`Callable[[EvalPrediction], dict]`, *optional*) -- | |
| The function to use to compute the metrics. Must take a `EvalPrediction` and return a dictionary string to | |
| metric values. | |
| - **model_adapter_name** (`str`, defaults to `None`) -- | |
| Name of the train target PEFT adapter, when using LoRA with multiple adapters. | |
| - **ref_adapter_name** (`str`, defaults to `None`) -- | |
| Name of the reference PEFT adapter, when using LoRA with multiple adapters.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Initialize BCOTrainer from [BCO](https://huggingface.co/papers/2404.04656) paper. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>train</name><anchor>trl.BCOTrainer.train</anchor><source>https://github.com/huggingface/trl/blob/vr_4331/transformers/trainer.py#L2213</source><parameters>[{"name": "resume_from_checkpoint", "val": ": typing.Union[str, bool, NoneType] = None"}, {"name": "trial", "val": ": typing.Union[ForwardRef('optuna.Trial'), dict[str, typing.Any], NoneType] = None"}, {"name": "ignore_keys_for_eval", "val": ": typing.Optional[list[str]] = None"}, {"name": "**kwargs", "val": ": typing.Any"}]</parameters><paramsdesc>- **resume_from_checkpoint** (`str` or `bool`, *optional*) -- | |
| If a `str`, local path to a saved checkpoint as saved by a previous instance of `Trainer`. If a | |
| `bool` and equals `True`, load the last checkpoint in *args.output_dir* as saved by a previous instance | |
| of `Trainer`. If present, training will resume from the model/optimizer/scheduler states loaded here. | |
| - **trial** (`optuna.Trial` or `dict[str, Any]`, *optional*) -- | |
| The trial run or the hyperparameter dictionary for hyperparameter search. | |
| - **ignore_keys_for_eval** (`list[str]`, *optional*) -- | |
| A list of keys in the output of your model (if it is a dictionary) that should be ignored when | |
| gathering predictions for evaluation during the training. | |
| - **kwargs** (`dict[str, Any]`, *optional*) -- | |
| Additional keyword arguments used to hide deprecated arguments</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Main training entry point. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>save_model</name><anchor>trl.BCOTrainer.save_model</anchor><source>https://github.com/huggingface/trl/blob/vr_4331/transformers/trainer.py#L4177</source><parameters>[{"name": "output_dir", "val": ": typing.Optional[str] = None"}, {"name": "_internal_call", "val": ": bool = False"}]</parameters></docstring> | |
| Will save the model, so you can reload it using `from_pretrained()`. | |
| Will only save from the main process. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>push_to_hub</name><anchor>trl.BCOTrainer.push_to_hub</anchor><source>https://github.com/huggingface/trl/blob/vr_4331/transformers/trainer.py#L5117</source><parameters>[{"name": "commit_message", "val": ": typing.Optional[str] = 'End of training'"}, {"name": "blocking", "val": ": bool = True"}, {"name": "token", "val": ": typing.Optional[str] = None"}, {"name": "revision", "val": ": typing.Optional[str] = None"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **commit_message** (`str`, *optional*, defaults to `"End of training"`) -- | |
| Message to commit while pushing. | |
| - **blocking** (`bool`, *optional*, defaults to `True`) -- | |
| Whether the function should return only when the `git push` has finished. | |
| - **token** (`str`, *optional*, defaults to `None`) -- | |
| Token with write permission to overwrite Trainer's original args. | |
| - **revision** (`str`, *optional*) -- | |
| The git revision to commit from. Defaults to the head of the "main" branch. | |
| - **kwargs** (`dict[str, Any]`, *optional*) -- | |
| Additional keyword arguments passed along to `~Trainer.create_model_card`.</paramsdesc><paramgroups>0</paramgroups><retdesc>The URL of the repository where the model was pushed if `blocking=False`, or a `Future` object tracking the | |
| progress of the commit if `blocking=True`.</retdesc></docstring> | |
| Upload `self.model` and `self.processing_class` to the 🤗 model hub on the repo `self.args.hub_model_id`. | |
| </div></div> | |
| ## BCOConfig[[trl.BCOConfig]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class trl.BCOConfig</name><anchor>trl.BCOConfig</anchor><source>https://github.com/huggingface/trl/blob/vr_4331/trl/experimental/bco/bco_config.py#L22</source><parameters>[{"name": "output_dir", "val": ": typing.Optional[str] = None"}, {"name": "overwrite_output_dir", "val": ": bool = False"}, {"name": "do_train", "val": ": bool = False"}, {"name": "do_eval", "val": ": bool = False"}, {"name": "do_predict", "val": ": bool = False"}, {"name": "eval_strategy", "val": ": typing.Union[transformers.trainer_utils.IntervalStrategy, str] = 'no'"}, {"name": "prediction_loss_only", "val": ": bool = False"}, {"name": "per_device_train_batch_size", "val": ": int = 8"}, {"name": "per_device_eval_batch_size", "val": ": int = 8"}, {"name": "per_gpu_train_batch_size", "val": ": typing.Optional[int] = None"}, {"name": "per_gpu_eval_batch_size", "val": ": typing.Optional[int] = None"}, {"name": "gradient_accumulation_steps", "val": ": int = 1"}, {"name": "eval_accumulation_steps", "val": ": typing.Optional[int] = None"}, {"name": "eval_delay", "val": ": float = 0"}, {"name": "torch_empty_cache_steps", "val": ": typing.Optional[int] = None"}, {"name": "learning_rate", "val": ": float = 5e-05"}, {"name": "weight_decay", "val": ": float = 0.0"}, {"name": "adam_beta1", "val": ": float = 0.9"}, {"name": "adam_beta2", "val": ": float = 0.999"}, {"name": "adam_epsilon", "val": ": float = 1e-08"}, {"name": "max_grad_norm", "val": ": float = 1.0"}, {"name": "num_train_epochs", "val": ": float = 3.0"}, {"name": "max_steps", "val": ": int = -1"}, {"name": "lr_scheduler_type", "val": ": typing.Union[transformers.trainer_utils.SchedulerType, str] = 'linear'"}, {"name": "lr_scheduler_kwargs", "val": ": typing.Union[dict[str, typing.Any], str] = <factory>"}, {"name": "warmup_ratio", "val": ": float = 0.0"}, {"name": "warmup_steps", "val": ": int = 0"}, {"name": "log_level", "val": ": str = 'passive'"}, {"name": "log_level_replica", "val": ": str = 'warning'"}, {"name": "log_on_each_node", "val": ": bool = True"}, {"name": "logging_dir", "val": ": typing.Optional[str] = None"}, {"name": "logging_strategy", "val": ": typing.Union[transformers.trainer_utils.IntervalStrategy, str] = 'steps'"}, {"name": "logging_first_step", "val": ": bool = False"}, {"name": "logging_steps", "val": ": float = 10"}, {"name": "logging_nan_inf_filter", "val": ": bool = True"}, {"name": "save_strategy", "val": ": typing.Union[transformers.trainer_utils.SaveStrategy, str] = 'steps'"}, {"name": "save_steps", "val": ": float = 500"}, {"name": "save_total_limit", "val": ": typing.Optional[int] = None"}, {"name": "save_safetensors", "val": ": bool = True"}, {"name": "save_on_each_node", "val": ": bool = False"}, {"name": "save_only_model", "val": ": bool = False"}, {"name": "restore_callback_states_from_checkpoint", "val": ": bool = False"}, {"name": "no_cuda", "val": ": bool = False"}, {"name": "use_cpu", "val": ": bool = False"}, {"name": "use_mps_device", "val": ": bool = False"}, {"name": "seed", "val": ": int = 42"}, {"name": "data_seed", "val": ": typing.Optional[int] = None"}, {"name": "jit_mode_eval", "val": ": bool = False"}, {"name": "bf16", "val": ": bool | None = None"}, {"name": "fp16", "val": ": bool = False"}, {"name": "fp16_opt_level", "val": ": str = 'O1'"}, {"name": "half_precision_backend", "val": ": str = 'auto'"}, {"name": "bf16_full_eval", "val": ": bool = False"}, {"name": "fp16_full_eval", "val": ": bool = False"}, {"name": "tf32", "val": ": typing.Optional[bool] = None"}, {"name": "local_rank", "val": ": int = -1"}, {"name": "ddp_backend", "val": ": typing.Optional[str] = None"}, {"name": "tpu_num_cores", "val": ": typing.Optional[int] = None"}, {"name": "tpu_metrics_debug", "val": ": bool = False"}, {"name": "debug", "val": ": typing.Union[str, list[transformers.debug_utils.DebugOption]] = ''"}, {"name": "dataloader_drop_last", "val": ": bool = False"}, {"name": "eval_steps", "val": ": typing.Optional[float] = None"}, {"name": "dataloader_num_workers", "val": ": int = 0"}, {"name": "dataloader_prefetch_factor", "val": ": typing.Optional[int] = None"}, {"name": "past_index", "val": ": int = -1"}, {"name": "run_name", "val": ": typing.Optional[str] = None"}, {"name": "disable_tqdm", "val": ": typing.Optional[bool] = None"}, {"name": "remove_unused_columns", "val": ": bool = True"}, {"name": "label_names", "val": ": typing.Optional[list[str]] = None"}, {"name": "load_best_model_at_end", "val": ": bool = False"}, {"name": "metric_for_best_model", "val": ": typing.Optional[str] = None"}, {"name": "greater_is_better", "val": ": typing.Optional[bool] = None"}, {"name": "ignore_data_skip", "val": ": bool = False"}, {"name": "fsdp", "val": ": typing.Union[list[transformers.trainer_utils.FSDPOption], str, NoneType] = None"}, {"name": "fsdp_min_num_params", "val": ": int = 0"}, {"name": "fsdp_config", "val": ": typing.Union[dict[str, typing.Any], str, NoneType] = None"}, {"name": "fsdp_transformer_layer_cls_to_wrap", "val": ": typing.Optional[str] = None"}, {"name": "accelerator_config", "val": ": typing.Union[dict, str, NoneType] = None"}, {"name": "parallelism_config", "val": ": typing.Optional[accelerate.parallelism_config.ParallelismConfig] = None"}, {"name": "deepspeed", "val": ": typing.Union[dict, str, NoneType] = None"}, {"name": "label_smoothing_factor", "val": ": float = 0.0"}, {"name": "optim", "val": ": typing.Union[transformers.training_args.OptimizerNames, str] = 'adamw_torch_fused'"}, {"name": "optim_args", "val": ": typing.Optional[str] = None"}, {"name": "adafactor", "val": ": bool = False"}, {"name": "group_by_length", "val": ": bool = False"}, {"name": "length_column_name", "val": ": str = 'length'"}, {"name": "report_to", "val": ": typing.Union[NoneType, str, list[str]] = None"}, {"name": "project", "val": ": str = 'huggingface'"}, {"name": "trackio_space_id", "val": ": typing.Optional[str] = 'trackio'"}, {"name": "ddp_find_unused_parameters", "val": ": typing.Optional[bool] = None"}, {"name": "ddp_bucket_cap_mb", "val": ": typing.Optional[int] = None"}, {"name": "ddp_broadcast_buffers", "val": ": typing.Optional[bool] = None"}, {"name": "dataloader_pin_memory", "val": ": bool = True"}, {"name": "dataloader_persistent_workers", "val": ": bool = False"}, {"name": "skip_memory_metrics", "val": ": bool = True"}, {"name": "use_legacy_prediction_loop", "val": ": bool = False"}, {"name": "push_to_hub", "val": ": bool = False"}, {"name": "resume_from_checkpoint", "val": ": typing.Optional[str] = None"}, {"name": "hub_model_id", "val": ": typing.Optional[str] = None"}, {"name": "hub_strategy", "val": ": typing.Union[transformers.trainer_utils.HubStrategy, str] = 'every_save'"}, {"name": "hub_token", "val": ": typing.Optional[str] = None"}, {"name": "hub_private_repo", "val": ": typing.Optional[bool] = None"}, {"name": "hub_always_push", "val": ": bool = False"}, {"name": "hub_revision", "val": ": typing.Optional[str] = None"}, {"name": "gradient_checkpointing", "val": ": bool = True"}, {"name": "gradient_checkpointing_kwargs", "val": ": typing.Union[dict[str, typing.Any], str, NoneType] = None"}, {"name": "include_inputs_for_metrics", "val": ": bool = False"}, {"name": "include_for_metrics", "val": ": list = <factory>"}, {"name": "eval_do_concat_batches", "val": ": bool = True"}, {"name": "fp16_backend", "val": ": str = 'auto'"}, {"name": "push_to_hub_model_id", "val": ": typing.Optional[str] = None"}, {"name": "push_to_hub_organization", "val": ": typing.Optional[str] = None"}, {"name": "push_to_hub_token", "val": ": typing.Optional[str] = None"}, {"name": "mp_parameters", "val": ": str = ''"}, {"name": "auto_find_batch_size", "val": ": bool = False"}, {"name": "full_determinism", "val": ": bool = False"}, {"name": "torchdynamo", "val": ": typing.Optional[str] = None"}, {"name": "ray_scope", "val": ": typing.Optional[str] = 'last'"}, {"name": "ddp_timeout", "val": ": int = 1800"}, {"name": "torch_compile", "val": ": bool = False"}, {"name": "torch_compile_backend", "val": ": typing.Optional[str] = None"}, {"name": "torch_compile_mode", "val": ": typing.Optional[str] = None"}, {"name": "include_tokens_per_second", "val": ": bool = False"}, {"name": "include_num_input_tokens_seen", "val": ": typing.Union[str, bool] = False"}, {"name": "neftune_noise_alpha", "val": ": typing.Optional[float] = None"}, {"name": "optim_target_modules", "val": ": typing.Union[NoneType, str, list[str]] = None"}, {"name": "batch_eval_metrics", "val": ": bool = False"}, {"name": "eval_on_start", "val": ": bool = False"}, {"name": "use_liger_kernel", "val": ": bool = False"}, {"name": "liger_kernel_config", "val": ": typing.Optional[dict[str, bool]] = None"}, {"name": "eval_use_gather_object", "val": ": bool = False"}, {"name": "average_tokens_across_devices", "val": ": bool = True"}, {"name": "max_length", "val": ": int | None = 1024"}, {"name": "max_prompt_length", "val": ": int | None = 512"}, {"name": "max_completion_length", "val": ": int | None = None"}, {"name": "beta", "val": ": float = 0.1"}, {"name": "label_pad_token_id", "val": ": int = -100"}, {"name": "padding_value", "val": ": int | None = None"}, {"name": "truncation_mode", "val": ": str = 'keep_end'"}, {"name": "disable_dropout", "val": ": bool = True"}, {"name": "generate_during_eval", "val": ": bool = False"}, {"name": "is_encoder_decoder", "val": ": bool | None = None"}, {"name": "precompute_ref_log_probs", "val": ": bool = False"}, {"name": "model_init_kwargs", "val": ": dict[str, typing.Any] | None = None"}, {"name": "ref_model_init_kwargs", "val": ": dict[str, typing.Any] | None = None"}, {"name": "dataset_num_proc", "val": ": int | None = None"}, {"name": "prompt_sample_size", "val": ": int = 1024"}, {"name": "min_density_ratio", "val": ": float = 0.5"}, {"name": "max_density_ratio", "val": ": float = 10.0"}]</parameters><paramsdesc>- **max_length** (`int` or `None`, *optional*, defaults to `1024`) -- | |
| Maximum length of the sequences (prompt + completion) in the batch. This argument is required if you want | |
| to use the default data collator. | |
| - **max_prompt_length** (`int` or `None`, *optional*, defaults to `512`) -- | |
| Maximum length of the prompt. This argument is required if you want to use the default data collator. | |
| - **max_completion_length** (`int`, *optional*) -- | |
| Maximum length of the completion. This argument is required if you want to use the default data collator | |
| and your model is an encoder-decoder. | |
| - **beta** (`float`, *optional*, defaults to `0.1`) -- | |
| Parameter controlling the deviation from the reference model. Higher β means less deviation from the | |
| reference model. | |
| - **label_pad_token_id** (`int`, *optional*, defaults to `-100`) -- | |
| Label pad token id. This argument is required if you want to use the default data collator. | |
| - **padding_value** (`int`, *optional*) -- | |
| Padding value to use. If `None`, the padding value of the tokenizer is used. | |
| - **truncation_mode** (`str`, *optional*, defaults to `"keep_end"`) -- | |
| Truncation mode to use when the prompt is too long. Possible values are `"keep_end"` or `"keep_start"`. | |
| This argument is required if you want to use the default data collator. | |
| - **disable_dropout** (`bool`, *optional*, defaults to `True`) -- | |
| Whether to disable dropout in the model and reference model. | |
| - **generate_during_eval** (`bool`, *optional*, defaults to `False`) -- | |
| If `True`, generates and logs completions from both the model and the reference model to W&B or Comet | |
| during evaluation. | |
| - **is_encoder_decoder** (`bool`, *optional*) -- | |
| When using the `model_init` argument (callable) to instantiate the model instead of the `model` argument, | |
| you need to specify if the model returned by the callable is an encoder-decoder model. | |
| - **precompute_ref_log_probs** (`bool`, *optional*, defaults to `False`) -- | |
| Whether to precompute reference model log probabilities for training and evaluation datasets. This is | |
| useful when training without the reference model to reduce the total GPU memory needed. | |
| - **model_init_kwargs** (`dict[str, Any]`, *optional*) -- | |
| Keyword arguments to pass to `AutoModelForCausalLM.from_pretrained` when instantiating the model from a | |
| string. | |
| - **ref_model_init_kwargs** (`dict[str, Any]`, *optional*) -- | |
| Keyword arguments to pass to `AutoModelForCausalLM.from_pretrained` when instantiating the reference model | |
| from a string. | |
| - **dataset_num_proc** (`int`, *optional*) -- | |
| Number of processes to use for processing the dataset. | |
| - **prompt_sample_size** (`int`, *optional*, defaults to `1024`) -- | |
| Number of prompts that are fed to density ratio classifier. | |
| - **min_density_ratio** (`float`, *optional*, defaults to `0.5`) -- | |
| Minimum value of the density ratio. The estimated density ratio is clamped to this value. | |
| - **max_density_ratio** (`float`, *optional*, defaults to `10.0`) -- | |
| Maximum value of the density ratio. The estimated density ratio is clamped to this value.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Configuration class for the [BCOTrainer](/docs/trl/pr_4331/en/bco_trainer#trl.BCOTrainer). | |
| This class includes only the parameters that are specific to BCO training. For a full list of training arguments, | |
| please refer to the [TrainingArguments](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments) documentation. Note that default values in this class may | |
| differ from those in [TrainingArguments](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments). | |
| Using [HfArgumentParser](https://huggingface.co/docs/transformers/main/en/internal/trainer_utils#transformers.HfArgumentParser) we can turn this class into | |
| [argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the | |
| command line. | |
| </div> | |
| <EditOnGithub source="https://github.com/huggingface/trl/blob/main/docs/source/bco_trainer.md" /> |
Xet Storage Details
- Size:
- 27.6 kB
- Xet hash:
- 176337c4e5d867651e03ba6eb4c21065819a70140f2ae386c6993750f44a996f
·
Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.