File size: 4,306 Bytes

1fa3c6c

# BCO Trainer

[![model badge](https://img.shields.io/badge/All_models-BCO-blue)](https://huggingface.co/models?other=bco,trl)

TRL supports the Binary Classifier Optimization (BCO).
The [BCO](https://huggingface.co/papers/2404.04656) authors train a binary classifier whose logit serves as a reward so that the classifier maps {prompt, chosen completion} pairs to 1 and {prompt, rejected completion} pairs to 0.
For a full example have a look at  [`examples/scripts/bco.py`].

## Expected dataset type

The [`experimental.bco.BCOTrainer`] requires an [unpaired preference dataset](dataset_formats#unpaired-preference).
The [`experimental.bco.BCOTrainer`] supports both [conversational](dataset_formats#conversational) and [standard](dataset_formats#standard) dataset formats. When provided with a conversational dataset, the trainer will automatically apply the chat template to the dataset.

## Expected model format

The BCO trainer expects a model of `AutoModelForCausalLM`, compared to PPO that expects `AutoModelForCausalLMWithValueHead` for the value function.

## Using the `BCOTrainer`

For a detailed example have a look at the `examples/scripts/bco.py` script. At a high level we need to initialize the `BCOTrainer` with a `model` we wish to train and a reference `ref_model` which we will use to calculate the implicit rewards of the preferred and rejected response.

The `beta` refers to the hyperparameter of the implicit reward, and the dataset contains the 3 entries listed above. Note that the `model` and `ref_model` need to have the same architecture (ie decoder only or encoder-decoder).

```python

from trl.experimental.bco import BCOConfig, BCOTrainer



training_args = BCOConfig(

    beta=0.1,

)



bco_trainer = BCOTrainer(

    model,

    model_ref,

    args=training_args,

    train_dataset=train_dataset,

    processing_class=tokenizer,

)

```

After this one can then call:

```python

bco_trainer.train()

```

## Underlying Distribution matching (UDM)

In practical scenarios, the thumbs-up and thumbs-down datasets are likely to have divergent underlying distributions of prompts.
Consider an LLM deployed for user feedback: if the model excels in writing tasks but underperforms in coding, the thumbs-up dataset will be dominated by writing-related prompts, while the thumbs-down dataset will contain mostly coding-related prompts.  
If the prompts in your desired and undesired datasets differ a lot, it is useful to enable UDM.  

Choose an embedding model and tokenizer:

```python

embedding_model = AutoModel.from_pretrained(your_model_id)

embedding_tokenizer = AutoTokenizer.from_pretrained(your_model_id)



# customize this function depending on your embedding model

def embed_prompt(input_ids, attention_mask, model):

    outputs = model(input_ids=input_ids, attention_mask=attention_mask)

    return outputs.last_hidden_state.mean(dim=1)



embedding_model = Accelerator().prepare_model(self.embedding_model)

embedding_func = partial(embed_prompt, model=embedding_model)

```

Set `prompt_sample_size` to define how many prompts are selected to train the UDM classifier and start the training with the provided embedding function:

```python

training_args = BCOConfig(

    beta=0.1,

    prompt_sample_size=512,

)



bco_trainer = BCOTrainer(

    model,

    model_ref,

    args=training_args,

    train_dataset=train_dataset,

    processing_class=tokenizer,

    embedding_func=embedding_func,

    embedding_tokenizer=self.embedding_tokenizer,

)



bco_trainer.train()

```

### For Mixture of Experts Models: Enabling the auxiliary loss

MOEs are the most efficient if the load is about equally distributed between experts.  
To ensure that we train MOEs similarly during preference-tuning, it is beneficial to add the auxiliary loss from the load balancer to the final loss.  

This option is enabled by setting `output_router_logits=True` in the model config (e.g. MixtralConfig).  
To scale how much the auxiliary loss contributes to the total loss, use the hyperparameter `router_aux_loss_coef=...` (default: 0.001).

## BCOTrainer

[[autodoc]] experimental.bco.BCOTrainer
    - train
    - save_model

    - push_to_hub



## BCOConfig



[[autodoc]] experimental.bco.BCOConfig