# Migration Guide ## Migrating from v4.x to v5.x ```{eval-rst} The v5 Sentence Transformers release introduced :class:`~sentence_transformers.sparse_encoder.SparseEncoder` embedding models (see the `Sparse Encoder Usage `_ for more details on them) alongside an extensive training suite for them, including :class:`~sentence_transformers.sparse_encoder.trainer.SparseEncoderTrainer` and :class:`~sentence_transformers.sparse_encoder.training_args.SparseEncoderTrainingArguments`. Unlike with v3 (updated :class:`~sentence_transformers.SentenceTransformer`) and v4 (updated :class:`~sentence_transformers.cross_encoder.CrossEncoder`), this update does not deprecate any training methods. ``` ### Migration for model.encode ```{eval-rst} We introduce two new methods, :meth:`~sentence_transformers.SentenceTransformer.encode_query` and :meth:`~sentence_transformers.SentenceTransformer.encode_document`, which are recommended to use instead of the :meth:`~sentence_transformers.SentenceTransformer.encode` method when working with information retrieval tasks. These methods are specialized version of :meth:`~sentence_transformers.SentenceTransformer.encode` that differs in exactly two ways: 1. If no ``prompt_name`` or ``prompt`` is provided, it uses a predefined "query" prompt, if available in the model's ``prompts`` dictionary. 2. It sets the ``task`` to "query". If the model has a :class:`~sentence_transformers.models.Router` module, it will use the "query" task type to route the input through the appropriate submodules. The same methods apply to the :class:`~sentence_transformers.sparse_encoder.SparseEncoder` models. .. list-table:: encode_query and encode_document :widths: 50 50 :header-rows: 1 * - v4.x - v5.x (recommended) * - .. code-block:: python :emphasize-lines: 7-9 from sentence_transformers import SentenceTransformer model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1") query = "What is the capital of France?" document = "Paris is the capital of France." # Use the prompt with the name "query" for the query query_embedding = model.encode(query, prompt_name="query") document_embedding = model.encode(document) print(query_embedding.shape, document_embedding.shape) # => (1, 768) (1, 768) - .. code-block:: python :emphasize-lines: 7-12 from sentence_transformers import SentenceTransformer model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1") query = "What is the capital of France?" document = "Paris is the capital of France." # The new encode_query and encode_document methods call encode, # but with the prompt name set to "query" or "document" if the # model has prompts saved, and the task set to "query" or "document", # if the model has a Router module. query_embedding = model.encode_query(query) document_embedding = model.encode_document(document) print(query_embedding.shape, document_embedding.shape) # => (1, 768) (1, 768) We also deprecated the :meth:`~sentence_transformers.SentenceTransformer.encode_multi_process` method, which was used to encode large datasets in parallel using multiple processes. This method has now been subsumed by the :meth:`~sentence_transformers.SentenceTransformer.encode` method with the ``device``, ``pool``, and ``chunk_size`` arguments. Provide a list of devices to the ``device`` argument to use multiple processes, or a single device to use a single process. The ``pool`` argument can be used to pass a multiprocessing pool that gets reused across calls, and the ``chunk_size`` argument can be used to control the size of the chunks that are sent to each process in parallel. .. list-table:: encode_multi_process deprecation -> encode :widths: 50 50 :header-rows: 1 * - v4.x - v5.x (recommended) * - .. code-block:: python :emphasize-lines: 7-9 from sentence_transformers import SentenceTransformer def main(): model = SentenceTransformer("all-mpnet-base-v2") texts = ["The weather is so nice!", "It's so sunny outside.", ...] pool = model.start_multi_process_pool(["cpu", "cpu", "cpu", "cpu"]) embeddings = model.encode_multi_process(texts, pool, chunk_size=512) model.stop_multi_process_pool(pool) print(embeddings.shape) # => (4000, 768) if __name__ == "__main__": main() - .. code-block:: python :emphasize-lines: 7 from sentence_transformers import SentenceTransformer def main(): model = SentenceTransformer("all-mpnet-base-v2") texts = ["The weather is so nice!", "It's so sunny outside.", ...] embeddings = model.encode(texts, device=["cpu", "cpu", "cpu", "cpu"], chunk_size=512) print(embeddings.shape) # => (4000, 768) if __name__ == "__main__": main() The ``truncate_dim`` parameter allows you to reduce the dimensionality of embeddings by truncating them. This is useful for optimizing storage and retrieval while maintaining most of the semantic information. Research has shown that the first dimensions often contain most of the important information in transformer embeddings. .. list-table:: Add truncate_dim to encode :widths: 50 50 :header-rows: 1 * - v4.x - v5.x (recommended) * - .. code-block:: python :emphasize-lines: 3-8 from sentence_transformers import SentenceTransformer # To truncate embeddings to a specific dimension, # you had to specify the dimension when loading model = SentenceTransformer( "mixedbread-ai/mxbai-embed-large-v1", truncate_dim=384, ) sentences = ["This is an example sentence", "Each sentence is converted"] embeddings = model.encode(sentences) print(embeddings.shape) # => (2, 384) - .. code-block:: python :emphasize-lines: 3-7, 10-18 from sentence_transformers import SentenceTransformer # Now you can either specify the dimension when loading the model... model = SentenceTransformer( "mixedbread-ai/mxbai-embed-large-v1", truncate_dim=384, ) sentences = ["This is an example sentence", "Each sentence is converted"] # ... or you can specify it when encoding embeddings = model.encode(sentences, truncate_dim=256) print(embeddings.shape) # => (2, 256) # The encode parameter has priority, but otherwise the model truncate_dim is used embeddings = model.encode(sentences) print(embeddings.shape) # => (2, 384) ``` ### Migration for Asym to Router ```{eval-rst} The ``Asym`` module has been renamed and updated to the new :class:`~sentence_transformers.models.Router` module, which provides the same functionality but with a more consistent API and additional features. The new :class:`~sentence_transformers.models.Router` module allows for more flexible routing of different tasks, such as query and document embeddings, and is recommended when working with asymmetric models that require different processing for different tasks, notably queries and documents. The :meth:`~sentence_transformers.SentenceTransformer.encode_query` and :meth:`~sentence_transformers.SentenceTransformer.encode_document` methods automatically set the ``task`` parameter that is used by the :class:`~sentence_transformers.models.Router` module to route the input to the query or document submodules, respectively. .. collapse:: Asym -> Router .. list-table:: :widths: 50 50 :header-rows: 1 * - v4.x - v5.x (recommended) * - .. code-block:: python :emphasize-lines: 7-10 from sentence_transformers import SentenceTransformer, models # Load a Sentence Transformer model and add an asymmetric router # for different query and document post-processing model = SentenceTransformer("microsoft/mpnet-base") dim = model.get_sentence_embedding_dimension() asym_model = models.Asym({ 'sts': [models.Dense(dim, dim)], 'classification': [models.Dense(dim, dim)] }) model.add_module("asym", asym_model) - .. code-block:: python :emphasize-lines: 7-10 from sentence_transformers import SentenceTransformer, models # Load a Sentence Transformer model and add a router # for different query and document post-processing model = SentenceTransformer("microsoft/mpnet-base") dim = model.get_sentence_embedding_dimension() router_model = models.Router({ 'sts': [models.Dense(dim, dim)], 'classification': [models.Dense(dim, dim)] }) model.add_module("router", router_model) .. collapse:: Asym -> Router for queries and documents .. list-table:: :widths: 50 50 :header-rows: 1 * - v4.x - v5.x (recommended) * - .. code-block:: python :emphasize-lines: 8-11, 22-23 from sentence_transformers import SentenceTransformer from sentence_transformers.models import Router, Normalize # Use a regular SentenceTransformer for the document embeddings, # and a static embedding model for the query embeddings document_embedder = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1") query_embedder = SentenceTransformer("static-retrieval-mrl-en-v1") asym = Asym({ "query": list(query_embedder.children()), "document": list(document_embedder.children()), }) normalize = Normalize() # Create an asymmetric model with different encoders for queries and documents model = SentenceTransformer( modules=[asym, normalize], ) # ... requires more training to align the vector spaces # Use the query & document routes query_embedding = model.encode({"query": "What is the capital of France?"}) document_embedding = model.encode({"document": "Paris is the capital of France."}) - .. code-block:: python :emphasize-lines: 8-11, 22-23 from sentence_transformers import SentenceTransformer from sentence_transformers.models import Router, Normalize # Use a regular SentenceTransformer for the document embeddings, # and a static embedding model for the query embeddings document_embedder = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1") query_embedder = SentenceTransformer("static-retrieval-mrl-en-v1") router = Router.for_query_document( query_modules=list(query_embedder.children()), document_modules=list(document_embedder.children()), ) normalize = Normalize() # Create an asymmetric model with different encoders for queries and documents model = SentenceTransformer( modules=[router, normalize], ) # ... requires more training to align the vector spaces # Use the query & document routes query_embedding = model.encode_query("What is the capital of France?") document_embedding = model.encode_document("Paris is the capital of France.") .. collapse:: Asym inference -> Router inference .. list-table:: :widths: 50 50 :header-rows: 1 * - v4.x - v5.x (recommended) * - .. code-block:: python ... # Use the query & document routes as keys in dictionaries query_embedding = model.encode([{"query": "What is the capital of France?"}]) document_embedding = model.encode([ {"document": "Paris is the capital of France."}, {"document": "Berlin is the capital of Germany."}, ]) class_embedding = model.encode( [{"classification": "S&P500 is down 2.1% today."}], ) - .. code-block:: python ... # Use the query & document routes with encode_query/encode_document query_embedding = model.encode_query(["What is the capital of France?"]) document_embedding = model.encode_document([ "Paris is the capital of France.", "Berlin is the capital of Germany.", ]) # When using routes other than "query" and "document", you can use the `task` parameter # on model.encode class_embedding = model.encode( ["S&P500 is down 2.1% today."], task="classification" # or any other task defined in the model Router ) .. collapse:: Asym training -> Router training .. list-table:: :widths: 50 50 :header-rows: 1 * - v4.x - v5.x (recommended) * - .. code-block:: python :emphasize-lines: 16-22 ... # Prepare a training dataset for an Asym model with "query" and "document" keys train_dataset = Dataset.from_dict({ "query": [ "is toprol xl the same as metoprolol?", "are eyes always the same size?", ], "answer": [ "Metoprolol succinate is also known by the brand name Toprol XL.", "The eyes are always the same size from birth to death.", ], }) # This mapper turns normal texts into a dictionary mapping Asym keys to the text def mapper(sample): return { "question": {"query": sample["question"]}, "answer": {"document": sample["answer"]}, } train_dataset = train_dataset.map(mapper) print(train_dataset[0]) """ { "question": {"query": "is toprol xl the same as metoprolol?"}, "answer": {"document": "Metoprolol succinate is also known by the ..."} } """ trainer = SentenceTransformerTrainer( # Or SparseEncoderTrainer model=model, args=training_args, train_dataset=train_dataset, ... ) - .. code-block:: python :emphasize-lines: 25-28 ... # Prepare a training dataset for a Router model with "query" and "document" keys train_dataset = Dataset.from_dict({ "query": [ "is toprol xl the same as metoprolol?", "are eyes always the same size?", ], "answer": [ "Metoprolol succinate is also known by the brand name Toprol XL.", "The eyes are always the same size from birth to death.", ], }) train_dataset = train_dataset.map(mapper) print(train_dataset[0]) """ { "question": "is toprol xl the same as metoprolol?", "answer": "Metoprolol succinate is also known by the brand name Toprol XL." } """ args = SentenceTransformerTrainingArguments( # Or SparseEncoderTrainingArguments # Map dataset columns to the Router keys router_mapping={ "question": "query", "answer": "document", } ) trainer = SentenceTransformerTrainer( # Or SparseEncoderTrainer model=model, args=training_args, train_dataset=train_dataset, ... ) ```
### Migration of advanced usage ```{eval-rst} .. collapse:: Module and InputModule convenience superclasses .. list-table:: :widths: 50 50 :header-rows: 1 * - v4.x - v5.x (recommended) * - .. code-block:: python :emphasize-lines: 4 from sentence_transformers import SentenceTransformer import torch class MyModule(torch.nn.Module): def __init__(self): super().__init__() # Custom code here model = SentenceTransformer(modules=[MyModule()]) - .. code-block:: python :emphasize-lines: 4-9 from sentence_transformers import SentenceTransformer from sentence_transformers.models import Module, InputModule # The new Module and InputModule superclasses provide convenience methods # like 'load', 'load_file_path', 'load_dir_path', 'load_torch_weights', # 'save_config', 'save_torch_weights', 'get_config_dict' # InputModule is meant to be used as the first module, is requires the # 'tokenize' method to be implemented class MyModule(Module): def __init__(self): super().__init__() # Custom initialization code here model = SentenceTransformer(modules=[MyModule()]) .. collapse:: Custom batch samplers via class or function .. list-table:: :widths: 50 50 :header-rows: 1 * - v4.x - v5.x (recommended) * - .. code-block:: python from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer class CustomSentenceTransformerTrainer(SentenceTransformerTrainer): # Custom batch samplers require subclassing the Trainer def get_batch_sampler( self, dataset, batch_size, drop_last, valid_label_columns=None, generator=None, seed=0, ): # Custom batch sampler logic here return ... ... trainer = CustomSentenceTransformerTrainer( model=model, args=args, train_dataset=train_dataset, ... ) trainer.train() - .. code-block:: python from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer from sentence_transformers.sampler import DefaultBatchSampler import torch class CustomBatchSampler(DefaultBatchSampler): def __init__( self, dataset: Dataset, batch_size: int, drop_last: bool, valid_label_columns: list[str] | None = None, generator: torch.Generator | None = None, seed: int = 0, ): super().__init__(dataset, batch_size, drop_last, valid_label_columns, generator, seed) # Custom batch sampler logic here args = SentenceTransformerTrainingArguments( # Other training arguments batch_sampler=CustomBatchSampler, # Use the custom batch sampler class ) trainer = SentenceTransformerTrainer( model=model, args=args, train_dataset=train_dataset, ... ) trainer.train() # Or, use a function to initialize the batch sampler def custom_batch_sampler( dataset: Dataset, batch_size: int, drop_last: bool, valid_label_columns: list[str] | None = None, generator: torch.Generator | None = None, seed: int = 0, ): # Custom batch sampler logic here return ... args = SentenceTransformerTrainingArguments( # Other training arguments batch_sampler=custom_batch_sampler, # Use the custom batch sampler function ) trainer = SentenceTransformerTrainer( model=model, args=args, train_dataset=train_dataset, ... ) trainer.train() .. collapse:: Custom multi-dataset batch samplers via class or function .. list-table:: :widths: 50 50 :header-rows: 1 * - v4.x - v5.x (recommended) * - .. code-block:: python from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer class CustomSentenceTransformerTrainer(SentenceTransformerTrainer): def get_multi_dataset_batch_sampler( self, dataset: ConcatDataset, batch_samplers: list[BatchSampler], generator: torch.Generator | None = None, seed: int | None = 0, ): # Custom multi-dataset batch sampler logic here return ... ... trainer = CustomSentenceTransformerTrainer( model=model, args=args, train_dataset=train_dataset, ... ) trainer.train() - .. code-block:: python from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer from sentence_transformers.sampler import MultiDatasetDefaultBatchSampler import torch class CustomMultiDatasetBatchSampler(MultiDatasetDefaultBatchSampler): def __init__( self, dataset: ConcatDataset, batch_samplers: list[BatchSampler], generator: torch.Generator | None = None, seed: int = 0, ): super().__init__(dataset, batch_samplers=batch_samplers, generator=generator, seed=seed) # Custom multi-dataset batch sampler logic here args = SentenceTransformerTrainingArguments( # Other training arguments multi_dataset_batch_sampler=CustomMultiDatasetBatchSampler, ) trainer = SentenceTransformerTrainer( model=model, args=args, train_dataset=train_dataset, ... ) trainer.train() # Or, use a function to initialize the batch sampler def custom_batch_sampler( dataset: ConcatDataset, batch_samplers: list[BatchSampler], generator: torch.Generator | None = None, seed: int = 0, ): # Custom multi-dataset batch sampler logic here return ... args = SentenceTransformerTrainingArguments( # Other training arguments multi_dataset_batch_sampler=custom_batch_sampler, # Use the custom batch sampler function ) trainer = SentenceTransformerTrainer( model=model, args=args, train_dataset=train_dataset, ... ) trainer.train() .. collapse:: Custom learning rate for sections .. list-table:: :widths: 50 50 :header-rows: 1 * - v4.x - v5.x (recommended) * - .. code-block:: python # A bunch of hacky code to set different learning rates # for different sections of the model - .. code-block:: python :emphasize-lines: 3-9, 14 from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer # Custom learning rate for each section of the model, # mapping regular expressions of parameter names to learning rates # Matching is done with 'search', not just 'match' or 'fullmatch' learning_rate_mapping = { "SparseStaticEmbedding": 1e-4, "linear_.*": 1e-5, } args = SentenceTransformerTrainingArguments( ..., learning_rate=1e-5, # Default learning rate learning_rate_mapping=learning_rate_mapping, ) trainer = SentenceTransformerTrainer( model=model, args=args, train_dataset=train_dataset, ... ) trainer.train() .. collapse:: Training with composite losses .. list-table:: :widths: 50 50 :header-rows: 1 * - v4.x - v5.x (recommended) * - .. code-block:: python :emphasize-lines: 10-11 class CustomLoss(torch.nn.Module): def __init__(self, model, ...): super().__init__() # Custom loss initialization code here def forward(self, features, labels): loss_component_one = self.compute_loss_one(features, labels) loss_component_two = self.compute_loss_two(features, labels) loss = loss_component_one * alpha + loss_component_two * beta return loss loss = CustomLoss(model, ...) - .. code-block:: python :emphasize-lines: 10-16 class CustomLoss(torch.nn.Module): def __init__(self, model, ...): super().__init__() # Custom loss initialization code here def forward(self, features, labels): loss_component_one = self.compute_loss_one(features, labels) loss_component_two = self.compute_loss_two(features, labels) # You can now return a dictionary of loss components. # The trainer considers the full loss as the sum of all # components, but each component will also be logged separately. return { "loss_one": loss_component_one, "loss_two": loss_component_two, } loss = CustomLoss(model, ...) .. collapse:: Accessing the underlying Transformer model .. list-table:: :widths: 50 50 :header-rows: 1 * - v4.x - v5.x (recommended) * - .. code-block:: python :emphasize-lines: 8 from sentence_transformers import SentenceTransformer # Sometimes, for one reason or another, you need to access the underlying # Transformer directly. This was previously commonly done by accessing # the first module, often 'Transformer', and then accessing the # `auto_model` attribute. model = SentenceTransformer("all-MiniLM-L6-v2") print(model[0].auto_model) # BertModel( # (embeddings): BertEmbeddings( # ... - .. code-block:: python :emphasize-lines: 6 from sentence_transformers import SentenceTransformer # Now, you can just use the `transformers_model` attribute on the model itself # even if your model has non-standard modules. model = SentenceTransformer("all-MiniLM-L6-v2") print(model.transformers_model) # BertModel( # (embeddings): BertEmbeddings( # ... ```
## Migrating from v3.x to v4.x ```{eval-rst} The v4 Sentence Transformers release refactored the training of :class:`~sentence_transformers.cross_encoder.CrossEncoder` reranker/pair classification models, replacing :meth:`CrossEncoder.fit ` with a :class:`~sentence_transformers.cross_encoder.trainer.CrossEncoderTrainer` and :class:`~sentence_transformers.cross_encoder..training_args.CrossEncoderTrainingArguments`. Like with v3 and :class:`~sentence_transformers.SentenceTransformer` models, this update softly deprecated :meth:`CrossEncoder.fit `, meaning that it still works, but it's recommended to switch to the new v4.x training format. Behind the scenes, this method now uses the new trainer. .. warning:: If you don't have code that uses :meth:`CrossEncoder.fit `, then you will not have to make any changes to your code to update from v3.x to v4.x. If you do, your code still works, but it is recommended to switch to the new v4.x training format, as it allows more training arguments and functionality. See the `Training Overview `_ for more details. .. list-table:: Old and new training flow :widths: 50 50 :header-rows: 1 * - v3.x - v4.x (recommended) * - :: from sentence_transformers import CrossEncoder, InputExample from torch.utils.data import DataLoader # 1. Define the model. Either from scratch of by loading a pre-trained model model = CrossEncoder("microsoft/mpnet-base") # 2. Define your train examples. You need more than just two examples... train_examples = [ InputExample(texts=["What are pandas?", "The giant panda ..."], label=1), InputExample(texts=["What's a panda?", "Mount Vesuvius is a ..."], label=0), ] train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16) # 3. Finetune the model model.fit(train_dataloader=train_dataloader, epochs=1, warmup_steps=100) - :: from datasets import load_dataset from sentence_transformers import CrossEncoder, CrossEncoderTrainer from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss # 1. Define the model. Either from scratch of by loading a pre-trained model model = CrossEncoder("microsoft/mpnet-base") # 2. Load a dataset to finetune on, convert to required format dataset = load_dataset("sentence-transformers/hotpotqa", "triplet", split="train") def triplet_to_labeled_pair(batch): anchors = batch["anchor"] positives = batch["positive"] negatives = batch["negative"] return { "sentence_A": anchors * 2, "sentence_B": positives + negatives, "labels": [1] * len(positives) + [0] * len(negatives), } dataset = dataset.map(triplet_to_labeled_pair, batched=True, remove_columns=dataset.column_names) train_dataset = dataset.select(range(10_000)) eval_dataset = dataset.select(range(10_000, 11_000)) # 3. Define a loss function loss = BinaryCrossEntropyLoss(model) # 4. Create a trainer & train trainer = CrossEncoderTrainer( model=model, train_dataset=train_dataset, eval_dataset=eval_dataset, loss=loss, ) trainer.train() # 5. Save the trained model model.save_pretrained("models/mpnet-base-hotpotqa") # model.push_to_hub("mpnet-base-hotpotqa") ``` ### Migration for parameters on `CrossEncoder` initialization and methods ```{eval-rst} .. list-table:: :widths: 50 50 :header-rows: 1 * - v3.x - v4.x (recommended) * - ``CrossEncoder(model_name=...)`` - Renamed to ``CrossEncoder(model_name_or_path=...)`` * - ``CrossEncoder(automodel_args=...)`` - Renamed to ``CrossEncoder(model_kwargs=...)`` * - ``CrossEncoder(tokenizer_args=...)`` - Renamed to ``CrossEncoder(tokenizer_kwargs=...)`` * - ``CrossEncoder(config_args=...)`` - Renamed to ``CrossEncoder(config_kwargs=...)`` * - ``CrossEncoder(cache_dir=...)`` - Renamed to ``CrossEncoder(cache_folder=...)`` * - ``CrossEncoder(default_activation_function=...)`` - Renamed to ``CrossEncoder(activation_fn=...)`` * - ``CrossEncoder(classifier_dropout=...)`` - Use ``CrossEncoder(config_kwargs={"classifier_dropout": ...})`` instead. * - ``CrossEncoder.predict(activation_fct=...)`` - Renamed to ``CrossEncoder.predict(activation_fn=...)`` * - ``CrossEncoder.rank(activation_fct=...)`` - Renamed to ``CrossEncoder.rank(activation_fn=...)`` * - ``CrossEncoder.predict(num_workers=...)`` - Fully deprecated, no longer has any effect. * - ``CrossEncoder.rank(num_workers=...)`` - Fully deprecated, no longer has any effect. .. note:: The old keyword arguments still work, but they will emit a warning recommending you to use the new names instead. ``` ### Migration for specific parameters from `CrossEncoder.fit` ```{eval-rst} .. collapse:: CrossEncoder.fit(train_dataloader) .. list-table:: :widths: 50 50 :header-rows: 1 * - v3.x - v4.x (recommended) * - .. code-block:: python :emphasize-lines: 8-12, 15 from sentence_transformers import CrossEncoder, InputExample from torch.utils.data import DataLoader # 1. Define the model. Either from scratch of by loading a pre-trained model model = CrossEncoder("microsoft/mpnet-base") # 2. Define your train examples. You need more than just two examples... train_examples = [ InputExample(texts=["What are pandas?", "The giant panda ..."], label=1), InputExample(texts=["What's a panda?", "Mount Vesuvius is a ..."], label=0), ] train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16) # 3. Finetune the model model.fit(train_dataloader=train_dataloader) - .. code-block:: python :emphasize-lines: 6-18, 26 from datasets import Dataset from sentence_transformers import CrossEncoder, CrossEncoderTrainer from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss # Define a training dataset train_examples = [ { "sentence_1": "A person on a horse jumps over a broken down airplane.", "sentence_2": "A person is outdoors, on a horse.", "label": 1, }, { "sentence_1": "Children smiling and waving at camera", "sentence_2": "The kids are frowning", "label": 0, }, ] train_dataset = Dataset.from_list(train_examples) # Define a loss function loss = BinaryCrossEntropyLoss(model) # Finetune the model trainer = CrossEncoderTrainer( model=model, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: CrossEncoder.fit(loss_fct) .. list-table:: :widths: 50 50 :header-rows: 1 * - v3.x - v4.x (recommended) * - .. code-block:: python :emphasize-lines: 6 ... # Finetune the model model.fit( train_dataloader=train_dataloader, loss_fct=torch.nn.MSELoss(), ) - .. code-block:: python :emphasize-lines: 1, 6, 7, 14 from sentence_transformers.cross_encoder.losses import MSELoss ... # Prepare the loss function # See all valid losses in https://sbert.net/docs/cross_encoder/loss_overview.html loss = MSELoss(model) # Finetune the model trainer = CrossEncoderTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: CrossEncoder.fit(evaluator) .. list-table:: :widths: 50 50 :header-rows: 1 * - v3.x - v4.x (recommended) * - .. code-block:: python :emphasize-lines: 9 ... # Load an evaluator evaluator = CrossEncoderNanoBEIREvaluator() # Finetune with an evaluator model.fit( train_dataloader=train_dataloader, evaluator=evaluator, ) - .. code-block:: python :emphasize-lines: 10 # Load an evaluator evaluator = CrossEncoderNanoBEIREvaluator() # Finetune with an evaluator trainer = CrossEncoderTrainer( model=model, train_dataset=train_dataset, eval_dataset=eval_dataset, loss=loss, evaluator=evaluator, ) trainer.train() .. collapse:: CrossEncoder.fit(epochs) .. list-table:: :widths: 50 50 :header-rows: 1 * - v3.x - v4.x (recommended) * - .. code-block:: python :emphasize-lines: 6 ... # Finetune the model model.fit( train_dataloader=train_dataloader, epochs=1, ) - .. code-block:: python :emphasize-lines: 5 ... # Prepare the Training Arguments args = CrossEncoderTrainingArguments( num_train_epochs=1, ) # Finetune the model trainer = CrossEncoderTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: CrossEncoder.fit(activation_fct) .. list-table:: :widths: 50 50 :header-rows: 1 * - v3.x - v4.x (recommended) * - .. code-block:: python :emphasize-lines: 6 ... # Finetune the model model.fit( train_dataloader=train_dataloader, activation_fct=torch.nn.Sigmoid(), ) - .. code-block:: python :emphasize-lines: 4 ... # Prepare the loss function loss = MSELoss(model, activation_fn=torch.nn.Sigmoid()) # Finetune the model trainer = CrossEncoderTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: CrossEncoder.fit(scheduler) .. list-table:: :widths: 50 50 :header-rows: 1 * - v3.x - v4.x (recommended) * - .. code-block:: python :emphasize-lines: 6 ... # Finetune the model model.fit( train_dataloader=train_dataloader, scheduler="WarmupLinear", ) - .. code-block:: python :emphasize-lines: 6 ... # Prepare the Training Arguments args = CrossEncoderTrainingArguments( # See https://huggingface.co/docs/transformers/main_classes/optimizer_schedules#transformers.SchedulerType lr_scheduler_type="linear" ) # Finetune the model trainer = CrossEncoderTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: CrossEncoder.fit(warmup_steps) .. list-table:: :widths: 50 50 :header-rows: 1 * - v3.x - v4.x (recommended) * - .. code-block:: python :emphasize-lines: 6 ... # Finetune the model model.fit( train_dataloader=train_dataloader, warmup_steps=1000, ) - .. code-block:: python :emphasize-lines: 5 ... # Prepare the Training Arguments args = CrossEncoderTrainingArguments( warmup_steps=1000, ) # Finetune the model trainer = CrossEncoderTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: CrossEncoder.fit(optimizer_class, optimizer_params) .. list-table:: :widths: 50 50 :header-rows: 1 * - v3.x - v4.x (recommended) * - .. code-block:: python :emphasize-lines: 6 ... # Finetune the model model.fit( train_dataloader=train_dataloader, optimizer_class=torch.optim.AdamW, optimizer_params={"eps": 1e-7}, ) - .. code-block:: python :emphasize-lines: 6-7 ... # Prepare the Training Arguments args = CrossEncoderTrainingArguments( # See https://github.com/huggingface/transformers/blob/main/src/transformers/training_args.py optim="adamw_torch", optim_args={"eps": 1e-7}, ) # Finetune the model trainer = CrossEncoderTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: CrossEncoder.fit(weight_decay) .. list-table:: :widths: 50 50 :header-rows: 1 * - v3.x - v4.x (recommended) * - .. code-block:: python :emphasize-lines: 6 ... # Finetune the model model.fit( train_dataloader=train_dataloader, weight_decay=0.02, ) - .. code-block:: python :emphasize-lines: 5 ... # Prepare the Training Arguments args = CrossEncoderTrainingArguments( weight_decay=0.02, ) # Finetune the model trainer = CrossEncoderTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: CrossEncoder.fit(evaluation_steps) .. list-table:: :widths: 50 50 :header-rows: 1 * - v3.x - v4.x (recommended) * - .. code-block:: python :emphasize-lines: 6, 7 ... # Finetune the model model.fit( train_dataloader=train_dataloader, evaluator=evaluator, evaluation_steps=1000, ) - .. code-block:: python :emphasize-lines: 5, 6, 10, 15, 17 ... # Prepare the Training Arguments args = CrossEncoderTrainingArguments( eval_strategy="steps", eval_steps=1000, ) # Finetune the model # Note: You need an eval_dataset and/or evaluator to evaluate trainer = CrossEncoderTrainer( model=model, args=args, train_dataset=train_dataset, eval_dataset=eval_dataset, loss=loss, evaluator=evaluator, ) trainer.train() .. collapse:: CrossEncoder.fit(output_path, save_best_model) .. list-table:: :widths: 50 50 :header-rows: 1 * - v3.x - v4.x (recommended) * - .. code-block:: python :emphasize-lines: 7, 8 ... # Finetune the model model.fit( train_dataloader=train_dataloader, evaluator=evaluator, output_path="my/path", save_best_model=True, ) - .. code-block:: python :emphasize-lines: 5, 6, 19 ... # Prepare the Training Arguments args = CrossEncoderTrainingArguments( load_best_model_at_end=True, metric_for_best_model="hotpotqa_ndcg@10", # E.g. `evaluator.primary_metric` ) # Finetune the model trainer = CrossEncoderTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() # Save the best model at my output path model.save_pretrained("my/path") .. collapse:: CrossEncoder.fit(max_grad_norm) .. list-table:: :widths: 50 50 :header-rows: 1 * - v3.x - v4.x (recommended) * - .. code-block:: python :emphasize-lines: 6 ... # Finetune the model model.fit( train_dataloader=train_dataloader, max_grad_norm=1, ) - .. code-block:: python :emphasize-lines: 5 ... # Prepare the Training Arguments args = CrossEncoderTrainingArguments( max_grad_norm=1, ) # Finetune the model trainer = CrossEncoderTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: CrossEncoder.fit(use_amp) .. list-table:: :widths: 50 50 :header-rows: 1 * - v3.x - v4.x (recommended) * - .. code-block:: python :emphasize-lines: 6 ... # Finetune the model model.fit( train_dataloader=train_dataloader, use_amp=True, ) - .. code-block:: python :emphasize-lines: 5, 6 ... # Prepare the Training Arguments args = CrossEncoderTrainingArguments( fp16=True, bf16=False, # If your GPU supports it, you can also use bf16 instead ) # Finetune the model trainer = CrossEncoderTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: CrossEncoder.fit(callback) .. list-table:: :widths: 50 50 :header-rows: 1 * - v3.x - v4.x (recommended) * - .. code-block:: python :emphasize-lines: 3, 4, 9 ... def printer_callback(score, epoch, steps): print(f"Score: {score:.4f} at epoch {epoch:d}, step {steps:d}") # Finetune the model model.fit( train_dataloader=train_dataloader, callback=printer_callback, ) - .. code-block:: python :emphasize-lines: 1, 5-10, 17 from transformers import TrainerCallback ... class PrinterCallback(TrainerCallback): # Subclass any method from https://huggingface.co/docs/transformers/main_classes/callback#transformers.TrainerCallback def on_evaluate(self, args, state, control, metrics=None, **kwargs): print(f"Metrics: {metrics} at epoch {state.epoch:d}, step {state.global_step:d}") printer_callback = PrinterCallback() # Finetune the model trainer = CrossEncoderTrainer( model=model, train_dataset=train_dataset, loss=loss, callbacks=[printer_callback], ) trainer.train() .. collapse:: CrossEncoder.fit(show_progress_bar) .. list-table:: :widths: 50 50 :header-rows: 1 * - v3.x - v4.x (recommended) * - .. code-block:: python :emphasize-lines: 6 ... # Finetune the model model.fit( train_dataloader=train_dataloader, show_progress_bar=True, ) - .. code-block:: python :emphasize-lines: 5 ... # Prepare the Training Arguments args = CrossEncoderTrainingArguments( disable_tqdm=False, ) # Finetune the model trainer = CrossEncoderTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() .. raw:: html
.. note:: The old :meth:`CrossEncoder.fit ` method still works, it was only softly deprecated. It now uses the new :class:`~sentence_transformers.cross_encoder.trainer.CrossEncoderTrainer` behind the scenes. ``` ### Migration for CrossEncoder evaluators ```{eval-rst} .. list-table:: :widths: 50 50 :header-rows: 1 * - v3.x - v4.x (recommended) * - ``CEBinaryAccuracyEvaluator`` - Use :class:`~sentence_transformers.cross_encoder.evaluation.CrossEncoderClassificationEvaluator`, an encompassed evaluator which uses the same inputs & outputs. * - ``CEBinaryClassificationEvaluator`` - Use :class:`~sentence_transformers.cross_encoder.evaluation.CrossEncoderClassificationEvaluator`, an encompassed evaluator which uses the same inputs & outputs. * - ``CECorrelationEvaluator`` - Use :class:`~sentence_transformers.cross_encoder.evaluation.CrossEncoderCorrelationEvaluator`, this evaluator was renamed. * - ``CEF1Evaluator`` - Use :class:`~sentence_transformers.cross_encoder.evaluation.CrossEncoderClassificationEvaluator`, an encompassed evaluator which uses the same inputs & outputs. * - ``CESoftmaxAccuracyEvaluator`` - Use :class:`~sentence_transformers.cross_encoder.evaluation.CrossEncoderClassificationEvaluator`, an encompassed evaluator which uses the same inputs & outputs. * - ``CERerankingEvaluator`` - Renamed to :class:`~sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator`, this evaluator was renamed .. note:: The old evaluators still work, they will simply warn you to update to the new evaluators. ``` ## Migrating from v2.x to v3.x ```{eval-rst} The v3 Sentence Transformers release refactored the training of :class:`~sentence_transformers.SentenceTransformer` embedding models, replacing :meth:`SentenceTransformer.fit ` with a :class:`~sentence_transformers.trainer.SentenceTransformerTrainer` and :class:`~sentence_transformers.training_args.SentenceTransformerTrainingArguments`. This update softly deprecated :meth:`SentenceTransformer.fit `, meaning that it still works, but it's recommended to switch to the new v3.x training format. Behind the scenes, this method now uses the new trainer. .. warning:: If you don't have code that uses :meth:`SentenceTransformer.fit `, then you will not have to make any changes to your code to update from v2.x to v3.x. If you do, your code still works, but it is recommended to switch to the new v3.x training format, as it allows more training arguments and functionality. See the `Training Overview `_ for more details. .. list-table:: Old and new training flow :widths: 50 50 :header-rows: 1 * - v2.x - v3.x (recommended) * - :: from sentence_transformers import SentenceTransformer, InputExample, losses from torch.utils.data import DataLoader # 1. Define the model. Either from scratch of by loading a pre-trained model model = SentenceTransformer("microsoft/mpnet-base") # 2. Define your train examples. You need more than just two examples... train_examples = [ InputExample(texts=[ "A person on a horse jumps over a broken down airplane.", "A person is outdoors, on a horse.", "A person is at a diner, ordering an omelette.", ]), InputExample(texts=[ "Children smiling and waving at camera", "There are children present", "The kids are frowning", ]), ] train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16) # 3. Define a loss function train_loss = losses.MultipleNegativesRankingLoss(model) # 4. Finetune the model model.fit( train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100, ) # 5. Save the trained model model.save_pretrained("models/mpnet-base-all-nli") - :: from datasets import load_dataset from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer from sentence_transformers.losses import MultipleNegativesRankingLoss # 1. Define the model. Either from scratch of by loading a pre-trained model model = SentenceTransformer("microsoft/mpnet-base") # 2. Load a dataset to finetune on dataset = load_dataset("sentence-transformers/all-nli", "triplet") train_dataset = dataset["train"].select(range(10_000)) eval_dataset = dataset["dev"].select(range(1_000)) # 3. Define a loss function loss = MultipleNegativesRankingLoss(model) # 4. Create a trainer & train trainer = SentenceTransformerTrainer( model=model, train_dataset=train_dataset, eval_dataset=eval_dataset, loss=loss, ) trainer.train() # 5. Save the trained model model.save_pretrained("models/mpnet-base-all-nli") # model.push_to_hub("mpnet-base-all-nli") ``` ### Migration for specific parameters from `SentenceTransformer.fit` ```{eval-rst} .. collapse:: SentenceTransformer.fit(train_objectives) .. list-table:: :widths: 50 50 :header-rows: 1 * - v2.x - v3.x (recommended) * - .. code-block:: python :emphasize-lines: 5-17, 20, 23 from sentence_transformers import SentenceTransformer, InputExample, losses from torch.utils.data import DataLoader # Define a training dataloader train_examples = [ InputExample(texts=[ "A person on a horse jumps over a broken down airplane.", "A person is outdoors, on a horse.", "A person is at a diner, ordering an omelette.", ]), InputExample(texts=[ "Children smiling and waving at camera", "There are children present", "The kids are frowning", ]), ] train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16) # Define a loss function train_loss = losses.MultipleNegativesRankingLoss(model) # Finetune the model model.fit(train_objectives=[(train_dataloader, train_loss)]) - .. code-block:: python :emphasize-lines: 6-18, 21, 26, 27 from datasets import Dataset from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer from sentence_transformers.losses import MultipleNegativesRankingLoss # Define a training dataset train_examples = [ { "anchor": "A person on a horse jumps over a broken down airplane.", "positive": "A person is outdoors, on a horse.", "negative": "A person is at a diner, ordering an omelette.", }, { "anchor": "Children smiling and waving at camera", "positive": "There are children present", "negative": "The kids are frowning", }, ] train_dataset = Dataset.from_list(train_examples) # Define a loss function loss = MultipleNegativesRankingLoss(model) # Finetune the model trainer = SentenceTransformerTrainer( model=model, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: SentenceTransformer.fit(evaluator) .. list-table:: :widths: 50 50 :header-rows: 1 * - v2.x - v3.x (recommended) * - .. code-block:: python :emphasize-lines: 9 ... # Load an evaluator evaluator = NanoBEIREvaluator() # Finetune with an evaluator model.fit( train_objectives=[(train_dataloader, train_loss)], evaluator=evaluator, ) - .. code-block:: python :emphasize-lines: 10 # Load an evaluator evaluator = NanoBEIREvaluator() # Finetune with an evaluator trainer = SentenceTransformerTrainer( model=model, train_dataset=train_dataset, eval_dataset=eval_dataset, loss=loss, evaluator=evaluator, ) trainer.train() .. collapse:: SentenceTransformer.fit(epochs) .. list-table:: :widths: 50 50 :header-rows: 1 * - v2.x - v3.x (recommended) * - .. code-block:: python :emphasize-lines: 6 ... # Finetune the model model.fit( train_objectives=[(train_dataloader, train_loss)], epochs=1, ) - .. code-block:: python :emphasize-lines: 5 ... # Prepare the Training Arguments args = SentenceTransformerTrainingArguments( num_train_epochs=1, ) # Finetune the model trainer = SentenceTransformerTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: SentenceTransformer.fit(steps_per_epoch) .. list-table:: :widths: 50 50 :header-rows: 1 * - v2.x - v3.x (recommended) * - .. code-block:: python :emphasize-lines: 6 ... # Finetune the model model.fit( train_objectives=[(train_dataloader, train_loss)], steps_per_epoch=1000, ) - .. code-block:: python :emphasize-lines: 5 ... # Prepare the Training Arguments args = SentenceTransformerTrainingArguments( max_steps=1000, # Note: max_steps is across all epochs, not per epoch ) # Finetune the model trainer = SentenceTransformerTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: SentenceTransformer.fit(scheduler) .. list-table:: :widths: 50 50 :header-rows: 1 * - v2.x - v3.x (recommended) * - .. code-block:: python :emphasize-lines: 6 ... # Finetune the model model.fit( train_objectives=[(train_dataloader, train_loss)], scheduler="WarmupLinear", ) - .. code-block:: python :emphasize-lines: 6 ... # Prepare the Training Arguments args = SentenceTransformerTrainingArguments( # See https://huggingface.co/docs/transformers/main_classes/optimizer_schedules#transformers.SchedulerType lr_scheduler_type="linear" ) # Finetune the model trainer = SentenceTransformerTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: SentenceTransformer.fit(warmup_steps) .. list-table:: :widths: 50 50 :header-rows: 1 * - v2.x - v3.x (recommended) * - .. code-block:: python :emphasize-lines: 6 ... # Finetune the model model.fit( train_objectives=[(train_dataloader, train_loss)], warmup_steps=1000, ) - .. code-block:: python :emphasize-lines: 5 ... # Prepare the Training Arguments args = SentenceTransformerTrainingArguments( warmup_steps=1000, ) # Finetune the model trainer = SentenceTransformerTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: SentenceTransformer.fit(optimizer_class, optimizer_params) .. list-table:: :widths: 50 50 :header-rows: 1 * - v2.x - v3.x (recommended) * - .. code-block:: python :emphasize-lines: 6 ... # Finetune the model model.fit( train_objectives=[(train_dataloader, train_loss)], optimizer_class=torch.optim.AdamW, optimizer_params={"eps": 1e-7}, ) - .. code-block:: python :emphasize-lines: 6-7 ... # Prepare the Training Arguments args = SentenceTransformerTrainingArguments( # See https://github.com/huggingface/transformers/blob/main/src/transformers/training_args.py optim="adamw_torch", optim_args={"eps": 1e-7}, ) # Finetune the model trainer = SentenceTransformerTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: SentenceTransformer.fit(weight_decay) .. list-table:: :widths: 50 50 :header-rows: 1 * - v2.x - v3.x (recommended) * - .. code-block:: python :emphasize-lines: 6 ... # Finetune the model model.fit( train_objectives=[(train_dataloader, train_loss)], weight_decay=0.02, ) - .. code-block:: python :emphasize-lines: 5 ... # Prepare the Training Arguments args = SentenceTransformerTrainingArguments( weight_decay=0.02, ) # Finetune the model trainer = SentenceTransformerTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: SentenceTransformer.fit(evaluation_steps) .. list-table:: :widths: 50 50 :header-rows: 1 * - v2.x - v3.x (recommended) * - .. code-block:: python :emphasize-lines: 6, 7 ... # Finetune the model model.fit( train_objectives=[(train_dataloader, train_loss)], evaluator=evaluator, evaluation_steps=1000, ) - .. code-block:: python :emphasize-lines: 5, 6, 10, 15, 17 ... # Prepare the Training Arguments args = SentenceTransformerTrainingArguments( eval_strategy="steps", eval_steps=1000, ) # Finetune the model # Note: You need an eval_dataset and/or evaluator to evaluate trainer = SentenceTransformerTrainer( model=model, args=args, train_dataset=train_dataset, eval_dataset=eval_dataset, loss=loss, evaluator=evaluator, ) trainer.train() .. collapse:: SentenceTransformer.fit(output_path, save_best_model) .. list-table:: :widths: 50 50 :header-rows: 1 * - v2.x - v3.x (recommended) * - .. code-block:: python :emphasize-lines: 7, 8 ... # Finetune the model model.fit( train_objectives=[(train_dataloader, train_loss)], evaluator=evaluator, output_path="my/path", save_best_model=True, ) - .. code-block:: python :emphasize-lines: 5, 6, 19 ... # Prepare the Training Arguments args = SentenceTransformerTrainingArguments( load_best_model_at_end=True, metric_for_best_model="all_nli_cosine_accuracy", # E.g. `evaluator.primary_metric` ) # Finetune the model trainer = SentenceTransformerTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() # Save the best model at my output path model.save_pretrained("my/path") .. collapse:: SentenceTransformer.fit(max_grad_norm) .. list-table:: :widths: 50 50 :header-rows: 1 * - v2.x - v3.x (recommended) * - .. code-block:: python :emphasize-lines: 6 ... # Finetune the model model.fit( train_objectives=[(train_dataloader, train_loss)], max_grad_norm=1, ) - .. code-block:: python :emphasize-lines: 5 ... # Prepare the Training Arguments args = SentenceTransformerTrainingArguments( max_grad_norm=1, ) # Finetune the model trainer = SentenceTransformerTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: SentenceTransformer.fit(use_amp) .. list-table:: :widths: 50 50 :header-rows: 1 * - v2.x - v3.x (recommended) * - .. code-block:: python :emphasize-lines: 6 ... # Finetune the model model.fit( train_objectives=[(train_dataloader, train_loss)], use_amp=True, ) - .. code-block:: python :emphasize-lines: 5, 6 ... # Prepare the Training Arguments args = SentenceTransformerTrainingArguments( fp16=True, bf16=False, # If your GPU supports it, you can also use bf16 instead ) # Finetune the model trainer = SentenceTransformerTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: SentenceTransformer.fit(callback) .. list-table:: :widths: 50 50 :header-rows: 1 * - v2.x - v3.x (recommended) * - .. code-block:: python :emphasize-lines: 3, 4, 9 ... def printer_callback(score, epoch, steps): print(f"Score: {score:.4f} at epoch {epoch:d}, step {steps:d}") # Finetune the model model.fit( train_objectives=[(train_dataloader, train_loss)], callback=printer_callback, ) - .. code-block:: python :emphasize-lines: 1, 5-10, 17 from transformers import TrainerCallback ... class PrinterCallback(TrainerCallback): # Subclass any method from https://huggingface.co/docs/transformers/main_classes/callback#transformers.TrainerCallback def on_evaluate(self, args, state, control, metrics=None, **kwargs): print(f"Metrics: {metrics} at epoch {state.epoch:d}, step {state.global_step:d}") printer_callback = PrinterCallback() # Finetune the model trainer = SentenceTransformerTrainer( model=model, train_dataset=train_dataset, loss=loss, callbacks=[printer_callback], ) trainer.train() .. collapse:: SentenceTransformer.fit(show_progress_bar) .. list-table:: :widths: 50 50 :header-rows: 1 * - v2.x - v3.x (recommended) * - .. code-block:: python :emphasize-lines: 6 ... # Finetune the model model.fit( train_objectives=[(train_dataloader, train_loss)], show_progress_bar=True, ) - .. code-block:: python :emphasize-lines: 5 ... # Prepare the Training Arguments args = SentenceTransformerTrainingArguments( disable_tqdm=False, ) # Finetune the model trainer = SentenceTransformerTrainer( model=model, args=args, train_dataset=train_dataset, loss=loss, ) trainer.train() .. collapse:: SentenceTransformer.fit(checkpoint_path, checkpoint_save_steps, checkpoint_save_total_limit) .. list-table:: :widths: 50 50 :header-rows: 1 * - v2.x - v3.x (recommended) * - .. code-block:: python :emphasize-lines: 6-8 ... # Finetune the model model.fit( train_objectives=[(train_dataloader, train_loss)], checkpoint_path="checkpoints", checkpoint_save_steps=5000, checkpoint_save_total_limit=2, ) - .. code-block:: python :emphasize-lines: 7-9, 13, 18 ... # Prepare the Training Arguments args = SentenceTransformerTrainingArguments( eval_strategy="steps", eval_steps=5000, save_strategy="steps", save_steps=5000, save_total_limit=2, ) # Finetune the model # Note: You need an eval_dataset and/or evaluator to checkpoint trainer = SentenceTransformerTrainer( model=model, args=args, train_dataset=train_dataset, eval_dataset=eval_dataset, loss=loss, ) trainer.train() ```
### Migration for custom Datasets and DataLoaders used in `SentenceTransformer.fit` ```{eval-rst} .. list-table:: :widths: 50 50 :header-rows: 1 * - v2.x - v3.x (recommended) * - ``ParallelSentencesDataset`` - Manually creating a :class:`~datasets.Dataset` and adding a ``label`` column for embeddings. Alternatively, consider loading one of our pre-provided `Parallel Sentences Datasets `_. * - ``SentenceLabelDataset`` - Loading or creating a :class:`~datasets.Dataset` and using ``SentenceTransformerTrainingArguments(batch_sampler=BatchSamplers.GROUP_BY_LABEL)`` (uses the :class:`~sentence_transformers.sampler.GroupByLabelBatchSampler`). Recommended for the BatchTripletLosses. * - ``DenoisingAutoEncoderDataset`` - Manually adding a column with noisy text to a :class:`~datasets.Dataset` with texts, e.g. with :func:`Dataset.map `. * - ``NoDuplicatesDataLoader`` - Loading or creating a :class:`~datasets.Dataset` and using ``SentenceTransformerTrainingArguments(batch_sampler=BatchSamplers.NO_DUPLICATES)`` (uses the :class:`~sentence_transformers.sampler.NoDuplicatesBatchSampler`). Recommended for :class:`~sentence_transformers.losses.MultipleNegativesRankingLoss`. ```