	# Models

	The base class [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel) implements the common methods for loading and saving a model, either from a local
	file or directory or from a pretrained model configuration provided by the library (downloaded from the Hugging Face Hub).

	[PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel) also implements a few methods which are common among all the models to:

	- resize the input token embeddings when new tokens are added to the vocabulary

	The other methods that are common to each model are defined in [ModuleUtilsMixin](/docs/transformers/pr_33892/en/main_classes/model#transformers.modeling_utils.ModuleUtilsMixin) and [GenerationMixin](/docs/transformers/pr_33892/en/main_classes/text_generation#transformers.GenerationMixin).

	## PreTrainedModel[[transformers.PreTrainedModel]]

	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>class transformers.PreTrainedModel</name><anchor>transformers.PreTrainedModel</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1644</source><parameters>[{"name": "config", "val": ": PreTrainedConfig"}, {"name": "*inputs", "val": ""}, {"name": "**kwargs", "val": ""}]</parameters></docstring>

	Base class for all models.

	[PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel) takes care of storing the configuration of the models and handles methods for loading,
	downloading and saving models as well as a few methods common to all models to:

	- resize the input embeddings

	Class attributes (overridden by derived classes):

	- config_class ([PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig)) -- A subclass of [PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig) to use as configuration class
	for this model architecture.
	- base_model_prefix (`str`) -- A string indicating the attribute associated with the base model in derived
	classes of the same architecture that add modules on top of the base model.
	- main_input_name (`str`) -- The name of the principal input to the model (often `input_ids` for NLP
	models, `pixel_values` for vision models and `input_values` for speech models).
	- can_record_outputs (dict):
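As a rough illustration of how the class attributes above fit together, here is a minimal sketch of a custom model. `TinyConfig`, `TinyPreTrainedModel`, `TinyModel`, and the `"tiny-demo"` model type are hypothetical names; the sketch assumes a recent `transformers` release with PyTorch and imports the configuration base class under either spelling (`PreTrainedConfig`, or the older `PretrainedConfig`).

```python
import torch
from torch import nn
from transformers import PreTrainedModel

try:  # newer releases export PreTrainedConfig
    from transformers import PreTrainedConfig
except ImportError:  # older releases only export PretrainedConfig
    from transformers import PretrainedConfig as PreTrainedConfig


class TinyConfig(PreTrainedConfig):
    model_type = "tiny-demo"  # hypothetical model type

    def __init__(self, vocab_size=16, hidden_size=8, initializer_range=0.02, **kwargs):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.initializer_range = initializer_range
        super().__init__(**kwargs)


class TinyPreTrainedModel(PreTrainedModel):
    config_class = TinyConfig       # configuration class for this architecture
    base_model_prefix = "tiny"      # attribute name heads would use for the base model
    main_input_name = "input_ids"   # principal input of the model

    def _init_weights(self, module):
        # Custom initialization logic lives here; `.data` keeps the op outside autograd.
        if isinstance(module, nn.Embedding):
            module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)


class TinyModel(TinyPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.embed = nn.Embedding(config.vocab_size, config.hidden_size)
        self.post_init()  # runs weight initialization once all modules exist

    def forward(self, input_ids):
        return self.embed(input_ids)


model = TinyModel(TinyConfig())
hidden = model(torch.tensor([[1, 2, 3]]))
print(hidden.shape)
```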



	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>push_to_hub</name><anchor>transformers.PreTrainedModel.push_to_hub</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/utils/hub.py#L821</source><parameters>[{"name": "repo_id", "val": ": str"}, {"name": "use_temp_dir", "val": ": bool \| None = None"}, {"name": "commit_message", "val": ": str \| None = None"}, {"name": "private", "val": ": bool \| None = None"}, {"name": "token", "val": ": bool \| str \| None = None"}, {"name": "max_shard_size", "val": ": int \| str \| None = '5GB'"}, {"name": "create_pr", "val": ": bool = False"}, {"name": "safe_serialization", "val": ": bool = True"}, {"name": "revision", "val": ": str \| None = None"}, {"name": "commit_description", "val": ": str \| None = None"}, {"name": "tags", "val": ": list[str] \| None = None"}, {"name": "**deprecated_kwargs", "val": ""}]</parameters><paramsdesc>- repo_id (`str`) --
	The name of the repository you want to push your model to. It should contain your organization name
	when pushing to a given organization.
	- use_temp_dir (`bool`, optional) --
	Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub.
	Will default to `True` if there is no directory named like `repo_id`, `False` otherwise.
	- commit_message (`str`, optional) --
	Message to commit while pushing. Will default to `"Upload model"`.
	- private (`bool`, optional) --
	Whether to make the repo private. If `None` (default), the repo will be public unless the organization's default is private. This value is ignored if the repo already exists.
	- token (`bool` or `str`, optional) --
	The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
	when running `hf auth login` (stored in `~/.huggingface`). Will default to `True` if `repo_url`
	is not specified.
	- max_shard_size (`int` or `str`, optional, defaults to `"5GB"`) --
	Only applicable for models. The maximum size for a checkpoint before being sharded. The resulting checkpoint
	shards will each be smaller than this size. If expressed as a string, it needs to be digits followed
	by a unit (like `"5MB"`). It defaults to `"5GB"` so that users can easily load models on free-tier
	Google Colab instances without CPU OOM issues.
	- create_pr (`bool`, optional, defaults to `False`) --
	Whether or not to create a PR with the uploaded files or directly commit.
	- safe_serialization (`bool`, optional, defaults to `True`) --
	Whether or not to convert the model weights to safetensors format for safer serialization.
	- revision (`str`, optional) --
	Branch to push the uploaded files to.
	- commit_description (`str`, optional) --
	The description of the commit that will be created.
	- tags (`list[str]`, optional) --
	List of tags to push on the Hub.</paramsdesc><paramgroups>0</paramgroups></docstring>

	Upload the model file to the 🤗 Model Hub.



	<ExampleCodeBlock anchor="transformers.PreTrainedModel.push_to_hub.example">

	Examples:

	```python
	from transformers import AutoModel

	model = AutoModel.from_pretrained("google-bert/bert-base-cased")

	# Push the model to your namespace with the name "my-finetuned-bert".
	model.push_to_hub("my-finetuned-bert")

	# Push the model to an organization with the name "my-finetuned-bert".
	model.push_to_hub("huggingface/my-finetuned-bert")
	```

	</ExampleCodeBlock>
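Sharding with `max_shard_size` can be pictured as a greedy pass over the checkpoint's tensors. The sketch below is a simplified, hypothetical illustration (`parse_size` and `plan_shards` are not `transformers` functions); it assumes decimal units for `"KB"`/`"MB"`/`"GB"` and binary units for `"KiB"`/`"MiB"`/`"GiB"`, and it works on plain byte counts rather than real tensors.

```python
import re

# Assumed unit table: decimal units for KB/MB/GB, binary units for KiB/MiB/GiB.
_UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9, "KIB": 2**10, "MIB": 2**20, "GIB": 2**30}


def parse_size(size):
    """Turn an int (bytes) or a string like "5MB" into a number of bytes."""
    if isinstance(size, int):
        return size
    match = re.fullmatch(r"(\d+)\s*([KMG]I?B)", size.strip().upper())
    if match is None:
        raise ValueError(f"expected digits followed by a unit, got {size!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def plan_shards(tensor_sizes, max_shard_size):
    """Greedily group tensor sizes (in bytes) into shards of at most `max_shard_size`."""
    limit = parse_size(max_shard_size)
    shards, current, current_total = [], [], 0
    for size in tensor_sizes:
        # Close the current shard when adding this tensor would overflow it.
        if current and current_total + size > limit:
            shards.append(current)
            current, current_total = [], 0
        current.append(size)
        current_total += size
    if current:
        shards.append(current)
    return shards


print(parse_size("5GB"))  # 5000000000
print(plan_shards([3_000_000, 3_000_000, 3_000_000, 5_000_000], "6MB"))
# [[3000000, 3000000], [3000000], [5000000]]
```

In this sketch a single tensor larger than the limit still ends up alone in its own oversized shard, since tensors are never split across files.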


	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>add_model_tags</name><anchor>transformers.PreTrainedModel.add_model_tags</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1975</source><parameters>[{"name": "tags", "val": ": typing.Union[list[str], str]"}]</parameters><paramsdesc>- tags (`Union[list[str], str]`) --
	The desired tags to inject into the model.</paramsdesc><paramgroups>0</paramgroups></docstring>

	Adds custom tags to the model that gets pushed to the Hugging Face Hub. Existing tags in the model
	are not overwritten.



	<ExampleCodeBlock anchor="transformers.PreTrainedModel.add_model_tags.example">

	Examples:

	```python
	from transformers import AutoModel

	model = AutoModel.from_pretrained("google-bert/bert-base-cased")

	model.add_model_tags(["custom", "custom-bert"])

	# Push the model to your namespace with the name "my-custom-bert".
	model.push_to_hub("my-custom-bert")
	```

	</ExampleCodeBlock>


	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>can_generate</name><anchor>transformers.PreTrainedModel.can_generate</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2091</source><parameters>[]</parameters><rettype>`bool`</rettype><retdesc>Whether this model can generate sequences with `.generate()`.</retdesc></docstring>

	Returns whether this model can generate sequences with `.generate()` from the `GenerationMixin`.

	Under the hood, on classes where this function returns `True`, some generation-specific changes are triggered:
	for instance, the model instance will have a populated `generation_config` attribute.






	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>dequantize</name><anchor>transformers.PreTrainedModel.dequantize</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1957</source><parameters>[]</parameters></docstring>

	Potentially dequantize the model if it has been quantized by a quantization method that supports
	dequantization.


	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>disable_input_require_grads</name><anchor>transformers.PreTrainedModel.disable_input_require_grads</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2578</source><parameters>[]</parameters></docstring>

	Removes the `_require_grads_hook`.


	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>enable_input_require_grads</name><anchor>transformers.PreTrainedModel.enable_input_require_grads</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2567</source><parameters>[]</parameters></docstring>

	Enables the gradients for the input embeddings. This is useful for fine-tuning adapter weights while keeping
	the model weights fixed.
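A minimal sketch of the effect, assuming `transformers` with PyTorch is installed and using a tiny, randomly initialized BERT (arbitrary sizes) instead of a real checkpoint. With every weight frozen, nothing in the forward pass requires gradients until the hook is registered; the hook is what lets gradients flow back through the frozen layers, e.g. into adapter weights inserted later in the model.

```python
import torch
from transformers import BertConfig, BertModel

# Arbitrary tiny sizes so the example runs quickly on CPU.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1, num_attention_heads=2, intermediate_size=64)
model = BertModel(config)

# Freeze every weight, as you would before training adapter weights only.
for param in model.parameters():
    param.requires_grad_(False)

input_ids = torch.tensor([[1, 2, 3, 4]])

frozen = model(input_ids).last_hidden_state.requires_grad  # nothing requires grad yet

model.enable_input_require_grads()  # hook makes the input embeddings' output require grad
hooked = model(input_ids).last_hidden_state.requires_grad

model.disable_input_require_grads()  # removes the hook again
restored = model(input_ids).last_hidden_state.requires_grad

print(frozen, hooked, restored)
```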


	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>from_pretrained</name><anchor>transformers.PreTrainedModel.from_pretrained</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L4057</source><parameters>[{"name": "pretrained_model_name_or_path", "val": ": typing.Union[str, os.PathLike, NoneType]"}, {"name": "*model_args", "val": ""}, {"name": "config", "val": ": typing.Union[transformers.configuration_utils.PreTrainedConfig, str, os.PathLike, NoneType] = None"}, {"name": "cache_dir", "val": ": typing.Union[str, os.PathLike, NoneType] = None"}, {"name": "ignore_mismatched_sizes", "val": ": bool = False"}, {"name": "force_download", "val": ": bool = False"}, {"name": "local_files_only", "val": ": bool = False"}, {"name": "token", "val": ": typing.Union[str, bool, NoneType] = None"}, {"name": "revision", "val": ": str = 'main'"}, {"name": "use_safetensors", "val": ": typing.Optional[bool] = None"}, {"name": "weights_only", "val": ": bool = True"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- pretrained_model_name_or_path (`str` or `os.PathLike`, optional) --
	Can be either:

	- A string, the model id of a pretrained model hosted inside a model repo on huggingface.co.
	- A path to a directory containing model weights saved using
	[save_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.save_pretrained), e.g., `./my_model_directory/`.
	- `None` if you are providing both the configuration and the state dictionary (with the keyword
	arguments `config` and `state_dict`, respectively).
	- model_args (sequence of positional arguments, optional) --
	All remaining positional arguments will be passed to the underlying model's `__init__` method.
	- config (`Union[PreTrainedConfig, str, os.PathLike]`, optional) --
	Can be either:

	- an instance of a class derived from [PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig),
	- a string or path valid as input to [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig.from_pretrained).

	Configuration for the model to use instead of an automatically loaded configuration. Configuration can
	be automatically loaded when:

	- The model is a model provided by the library (loaded with the model id string of a pretrained
	model).
	- The model was saved using [save_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.save_pretrained) and is reloaded by supplying the
	save directory.
	- The model is loaded by supplying a local directory as `pretrained_model_name_or_path` and a
	configuration JSON file named config.json is found in the directory.
	- state_dict (`dict[str, torch.Tensor]`, optional) --
	A state dictionary to use instead of a state dictionary loaded from saved weights file.

	This option can be used if you want to create a model from a pretrained configuration but load your own
	weights. In this case though, you should check whether using [save_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.save_pretrained) and
	[from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) would be a simpler option.
	- cache_dir (`Union[str, os.PathLike]`, optional) --
	Path to a directory in which a downloaded pretrained model configuration should be cached if the
	standard cache should not be used.
	- ignore_mismatched_sizes (`bool`, optional, defaults to `False`) --
	Whether or not to ignore (instead of raising an error for) weights from the checkpoint that do not have
	the same size as the weights of the model (if, for instance, you are instantiating a model with 10 labels
	from a checkpoint with 3 labels).
	- force_download (`bool`, optional, defaults to `False`) --
	Whether or not to force the (re-)download of the model weights and configuration files, overriding the
	cached versions if they exist.
	- proxies (`dict[str, str]`, optional) --
	A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128',
	'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request.
	- output_loading_info (`bool`, optional, defaults to `False`) --
	Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages.
	- local_files_only (`bool`, optional, defaults to `False`) --
	Whether or not to only look at local files (i.e., do not try to download the model).
	- token (`str` or `bool`, optional) --
	The token to use as HTTP bearer authorization for remote files. If `True`, or not specified, will use
	the token generated when running `hf auth login` (stored in `~/.huggingface`).
	- revision (`str`, optional, defaults to `"main"`) --
	The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a
	git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any
	identifier allowed by git.

	<Tip>

	To test a pull request you made on the Hub, you can pass `revision="refs/pr/<pr_number>"`.

	</Tip>
	- attn_implementation (`str`, optional) --
	The attention implementation to use in the model (if relevant). Can be any of `"eager"` (manual implementation of the attention), `"sdpa"` (using [`F.scaled_dot_product_attention`](https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention.html)), `"flash_attention_2"` (using [Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention)), or `"flash_attention_3"` (using [Dao-AILab/flash-attention/hopper](https://github.com/Dao-AILab/flash-attention/tree/main/hopper)). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual `"eager"` implementation.

	Also accepts Hugging Face kernel references of the form
	`<namespace>/<repo_name>[@<revision>][:<kernel_name>]`, where:

	- `<namespace>` and `<repo_name>` are any sequences that contain neither "/" nor ":".
	- `@<revision>` is optional (a branch, tag, or commit-ish), e.g. `@main`, `@v1.2.0`, `@abc123`.
	- `:<kernel_name>` is optional and selects a function inside the kernel repo.
	- Both options can appear together, but only in this order: `@<revision>` first, then `:<kernel_name>`.
	- A leading wrapper prefix such as "flash\|" (e.g., "flash\|org/model") is also allowed, because it is
	stripped before loading.

	Examples that match:

	- `"org/model"`
	- `"org/model@main"`
	- `"org/model:custom_kernel"`
	- `"org/model@v1.2.3:custom_kernel"`

	</paramsdesc><paramsdesc1title>Parameters for big model inference</paramsdesc1title><paramsdesc1>

	- dtype (`str` or `torch.dtype`, optional) --
	Override the default `torch_dtype` and load the model under a specific `dtype`. The different options
	are:

	1. `torch.float16`, `torch.bfloat16`, or `torch.float`: load in the specified
	`dtype`, ignoring the model's `config.dtype` if one exists. If `dtype` is not specified,
	the model is loaded in `torch.float` (fp32).

	2. `"auto"`: the `dtype` or `torch_dtype` entry in the model's `config.json` file is used if
	present. If this entry isn't found, the `dtype` of the first floating-point weight in the
	checkpoint is used instead. This loads the model in the `dtype` it was saved in at the end of
	training, which is not necessarily an indicator of how the model was trained: it could have been
	trained in a half-precision dtype but saved in fp32.

	3. A string that is a valid `torch.dtype`. E.g. "float32" loads the model in `torch.float32`, "float16" loads in `torch.float16` etc.

	<Tip>

	For some models the `dtype` they were trained in is unknown - you may try to check the model's paper or
	reach out to the authors and ask them to add this information to the model's card and to insert the
	`dtype` or `torch_dtype` entry in `config.json` on the hub.

	</Tip>

	- device_map (`str` or `dict[str, Union[int, str, torch.device]]` or `int` or `torch.device`, optional) --
	A map that specifies where each submodule should go. It doesn't need to list every
	parameter/buffer name: once a given module name is included, every submodule of it will be sent to the
	same device. If you only pass a device (e.g., `"cpu"`, `"cuda:1"`, `"mps"`, or a GPU ordinal rank
	like `1`), the entire model is allocated on that device. For example, `device_map=0` puts the whole
	model on GPU 0.

	To have Accelerate compute the most optimized `device_map` automatically, set `device_map="auto"`. For
	more information about each option see [designing a device
	map](https://hf.co/docs/accelerate/main/en/usage_guides/big_modeling#designing-a-device-map).
	- max_memory (`Dict`, optional) --
	A dictionary mapping device identifiers to their maximum memory when using `device_map`. Defaults to the
	maximum memory available for each GPU and the available CPU RAM if unset.
	- tp_plan (`Optional[Union[dict, str]]`, optional) --
	A torch tensor parallel plan, see [here](https://pytorch.org/tutorials/intermediate/TP_tutorial.html). Use `tp_plan="auto"` to
	use the predefined plan based on the model. If it's a dict, it should map module names to the desired layout.
	Note that if you use it, you should launch your script accordingly with `torchrun [args] script.py`. This will be much
	faster than using a `device_map`, but has limitations.
	- tp_size (`str`, optional) --
	A torch tensor parallel degree. If not provided, it defaults to the world size.
	- device_mesh (`torch.distributed.DeviceMesh`, optional) --
	A torch device mesh. If not provided, it defaults to the world size. It is currently only used for tensor parallelism.
	If the mesh has more than one dimension, it must contain a dimension named `"tp"`, which will be used for tensor parallelism.
	- offload_folder (`str` or `os.PathLike`, optional) --
	If the `device_map` contains any value `"disk"`, the folder where we will offload weights.
	- offload_buffers (`bool`, optional) --
	Whether or not to offload the buffers with the model parameters.
	- quantization_config (`Union[QuantizationConfigMixin,Dict]`, optional) --
	A dictionary of configuration parameters or a `QuantizationConfigMixin` object for quantization (e.g.,
	bitsandbytes, GPTQ).
	- subfolder (`str`, optional, defaults to `""`) --
	In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can
	specify the folder name here.
	- variant (`str`, optional) --
	If specified, load weights from the `variant` filename, e.g. `pytorch_model.<variant>.bin`.
	- use_safetensors (`bool`, optional, defaults to `None`) --
	Whether or not to use `safetensors` checkpoints. Defaults to `None`. If not specified and `safetensors`
	is not installed, it will be set to `False`.
	- weights_only (`bool`, optional, defaults to `True`) --
	Indicates whether the unpickler should be restricted to loading only tensors, primitive types,
	dictionaries and any types added via `torch.serialization.add_safe_globals()`.
	When set to `False`, weights of wrapper tensor subclasses can be loaded.
	- key_mapping (`dict[str, str]`, optional) --
	A potential mapping of the weight names if using a model on the Hub which is compatible with a Transformers
	architecture, but was not converted accordingly.
	- kwargs (remaining dictionary of keyword arguments, optional) --
	Can be used to update the configuration object (after it has been loaded) and initialize the model (e.g.,
	`output_attentions=True`). Behaves differently depending on whether a `config` is provided or
	automatically loaded:

	- If a configuration is provided with `config`, `**kwargs` will be directly passed to the
	underlying model's `__init__` method (we assume all relevant updates to the configuration have
	already been done)
	- If a configuration is not provided, `kwargs` will be first passed to the configuration class
	initialization function ([from_pretrained()](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig.from_pretrained)). Each key of `kwargs` that
	corresponds to a configuration attribute will be used to override said attribute with the
	supplied `kwargs` value. Remaining keys that do not correspond to any configuration attribute
	will be passed to the underlying model's `__init__` function.</paramsdesc1><paramgroups>1</paramgroups></docstring>
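The lookup order that `dtype="auto"` follows, as described in the parameter list above, can be sketched as below. `resolve_auto_dtype` is a hypothetical helper that works on an already-parsed `config.json` dictionary and a state dict; it is not the code `transformers` runs, and the final fp32 fallback is an assumption of this sketch.

```python
import torch


def resolve_auto_dtype(config_dict: dict, state_dict: dict) -> torch.dtype:
    """Hypothetical helper mirroring the documented "auto" lookup order."""
    # 1. Prefer an explicit entry in config.json ("dtype", or the older "torch_dtype").
    for key in ("dtype", "torch_dtype"):
        value = config_dict.get(key)
        if value is not None:
            return getattr(torch, value)  # e.g. "bfloat16" -> torch.bfloat16
    # 2. Otherwise use the dtype of the first floating-point tensor in the checkpoint.
    for tensor in state_dict.values():
        if tensor.is_floating_point():
            return tensor.dtype
    # 3. Nothing to infer from: fall back to fp32 (an assumption for this sketch).
    return torch.float32


checkpoint = {
    "embeddings.position_ids": torch.arange(4),  # int64, so it is skipped
    "embeddings.weight": torch.zeros(4, 2, dtype=torch.bfloat16),
}
print(resolve_auto_dtype({"dtype": "float16"}, checkpoint))  # the config entry wins
print(resolve_auto_dtype({}, checkpoint))                    # first floating-point tensor
```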

	Instantiate a pretrained PyTorch model from a pretrained model configuration.

	The model is set in evaluation mode by default using `model.eval()` (Dropout modules are deactivated). To train
	the model, you should first set it back in training mode with `model.train()`.

	The warning *Weights from XXX not initialized from pretrained model* means that the weights of XXX do not come
	pretrained with the rest of the model. It is up to you to train those weights with a downstream fine-tuning
	task.

	The warning *Weights from XXX not used in YYY* means that the layer XXX is not used by YYY, so those
	weights are discarded.
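The sketch below shows eval mode, a configuration override passed through `kwargs`, and `output_loading_info` without any network access, assuming `transformers` with PyTorch is installed. It saves a tiny, randomly initialized BERT (arbitrary sizes) to a temporary directory and reloads it; because the checkpoint matches the class exactly, neither warning above should apply, so both key lists are expected to be empty.

```python
import tempfile

from transformers import BertConfig, BertModel

# Arbitrary tiny sizes so the example runs quickly on CPU.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1, num_attention_heads=2, intermediate_size=64)

with tempfile.TemporaryDirectory() as tmp_dir:
    BertModel(config).save_pretrained(tmp_dir)
    # `output_attentions` matches a config attribute, so it overrides the saved value;
    # `output_loading_info=True` makes from_pretrained return (model, loading_info).
    model, loading_info = BertModel.from_pretrained(
        tmp_dir, output_attentions=True, output_loading_info=True
    )

print(model.training)  # False: from_pretrained puts the model in eval mode
print(model.config.output_attentions)
print(loading_info["missing_keys"], loading_info["unexpected_keys"])
```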



	<Tip>

	Activate the special ["offline-mode"](https://huggingface.co/transformers/installation.html#offline-mode) to
	use this method in a firewalled environment.

	</Tip>
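The kernel-reference format accepted by `attn_implementation` (described in the parameter list above) can be checked with a regular expression. This is a hypothetical sketch, not the pattern `transformers` uses internally; it additionally excludes `@` from the repository, revision, and kernel names so that the revision and the `@`-before-`:` ordering can be parsed unambiguously.

```python
import re

# Hypothetical pattern for illustration only.
KERNEL_REF = re.compile(
    r"^(?P<namespace>[^/:@]+)/(?P<repo>[^/:@]+)"
    r"(?:@(?P<revision>[^/:@]+))?"
    r"(?::(?P<kernel>[^/:@]+))?$"
)


def parse_kernel_ref(ref: str) -> dict:
    """Split a kernel reference into namespace, repo, revision and kernel name."""
    # Drop an optional "<wrapper>|" prefix such as "flash|" before matching.
    _, _, ref = ref.rpartition("|")
    match = KERNEL_REF.match(ref)
    if match is None:
        raise ValueError(f"not a kernel reference: {ref!r}")
    return match.groupdict()


for ref in ("org/model", "org/model@main", "org/model:custom_kernel",
            "org/model@v1.2.3:custom_kernel", "flash|org/model"):
    print(ref, "->", parse_kernel_ref(ref))
```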

	<ExampleCodeBlock anchor="transformers.PreTrainedModel.from_pretrained.example">

	Examples:

	```python
	>>> from transformers import BertConfig, BertModel

	>>> # Download model and configuration from huggingface.co and cache.
	>>> model = BertModel.from_pretrained("google-bert/bert-base-uncased")
	>>> # Model was saved using save_pretrained('./test/saved_model/') (for example purposes, not runnable).
	>>> model = BertModel.from_pretrained("./test/saved_model/")
	>>> # Update configuration during loading.
	>>> model = BertModel.from_pretrained("google-bert/bert-base-uncased", output_attentions=True)
	>>> assert model.config.output_attentions == True
	```

	</ExampleCodeBlock>
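To illustrate the prefix rule for `device_map` described in the parameter list above (once a module name is listed, all of its submodules follow it), here is a hypothetical resolver that picks the longest matching module prefix for a parameter name. Accelerate performs the real placement; the module names below are made up for the example, and a `"disk"` entry would additionally require `offload_folder`.

```python
def device_for(param_name: str, device_map: dict):
    """Hypothetical resolver: the longest module prefix in `device_map` wins."""
    best_name, best_device = None, None
    for module_name, device in device_map.items():
        # "" covers the whole model; otherwise match the module itself or anything below it.
        # The "." check keeps "encoder.layer.1" from matching "encoder.layer.10...".
        covers = (
            module_name == ""
            or param_name == module_name
            or param_name.startswith(module_name + ".")
        )
        if covers and (best_name is None or len(module_name) > len(best_name)):
            best_name, best_device = module_name, device
    if best_name is None:
        raise KeyError(f"{param_name!r} is not covered by the device map")
    return best_device


# Made-up module names for a two-layer encoder.
device_map = {"embeddings": "cpu", "encoder.layer.0": 0, "encoder.layer.1": 1, "pooler": "cpu"}

print(device_for("encoder.layer.0.attention.self.query.weight", device_map))
print(device_for("encoder.layer.1.output.dense.bias", device_map))
print(device_for("pooler.dense.weight", device_map))
```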


	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>get_compiled_call</name><anchor>transformers.PreTrainedModel.get_compiled_call</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L5057</source><parameters>[{"name": "compile_config", "val": ": typing.Optional[transformers.generation.configuration_utils.CompileConfig]"}]</parameters></docstring>
	Return a `torch.compile`'d version of `self.__call__`. This is useful to dynamically choose between
	non-compiled/compiled `forward` during inference, especially to switch between prefill (where we don't
	want to use compiled version to avoid recomputing the graph with new shapes) and iterative decoding
	(where we want the speed-ups of compiled version with static shapes).

	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>get_decoder</name><anchor>transformers.PreTrainedModel.get_decoder</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2584</source><parameters>[]</parameters></docstring>

	Best-effort lookup of the decoder module.

	Order of attempts (covers ~85% of current usages):

	1. `self.decoder`
	2. `self.model` (many wrappers store the decoder here)
	3. `self.model.get_decoder()` (nested wrappers)
	4. fallback: raise for the few exotic models that need a bespoke rule
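The lookup order above can be sketched in plain Python with stand-in classes. This is a hypothetical re-implementation that reads steps 2 and 3 as "use `self.model`, but delegate to its own `get_decoder()` when it has one"; the library's actual precedence and error type may differ, and all class names are made up.

```python
def find_decoder(model):
    """Hypothetical best-effort decoder lookup following the documented order."""
    # 1. Models that expose the decoder directly.
    if hasattr(model, "decoder"):
        return model.decoder
    # 2./3. Wrappers that keep the decoder under `.model`; if that inner module
    # is itself a wrapper with a get_decoder(), delegate to it.
    if hasattr(model, "model"):
        inner = model.model
        if callable(getattr(inner, "get_decoder", None)):
            return inner.get_decoder()
        return inner
    # 4. No rule applies: the model needs a bespoke override.
    raise AttributeError(f"{type(model).__name__} needs a custom get_decoder()")


class Decoder:
    """Stands in for a decoder stack."""


class EncoderDecoder:
    """Exposes the decoder directly as `.decoder` (rule 1)."""

    def __init__(self):
        self.decoder = Decoder()

    def get_decoder(self):
        return find_decoder(self)


class CausalLM:
    """Keeps the decoder stack under `.model` (rule 2)."""

    def __init__(self):
        self.model = Decoder()


class NestedWrapper:
    """Wraps another model that has its own get_decoder() (rule 3)."""

    def __init__(self):
        self.model = EncoderDecoder()


class Exotic:
    """Follows none of the conventions (rule 4)."""


for candidate in (EncoderDecoder(), CausalLM(), NestedWrapper()):
    print(type(candidate).__name__, "->", type(find_decoder(candidate)).__name__)
```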


	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>get_memory_footprint</name><anchor>transformers.PreTrainedModel.get_memory_footprint</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L3889</source><parameters>[{"name": "return_buffers", "val": " = True"}]</parameters><paramsdesc>- return_buffers (`bool`, optional, defaults to `True`) --
	Whether to include the size of the buffer tensors in the computation of the memory footprint. Buffers
	are tensors that do not require gradients and are not registered as parameters, e.g. the running mean and
	variance in batch norm layers. See: https://discuss.pytorch.org/t/what-pytorch-means-by-buffers/120266/2</paramsdesc><paramgroups>0</paramgroups></docstring>

	Get the memory footprint of a model, in bytes. This is useful to benchmark the memory footprint of the
	current model and to design some tests. Solution inspired by the
	PyTorch discussions: https://discuss.pytorch.org/t/gpu-memory-that-model-uses/56822/2
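The computation described above boils down to summing `nelement() * element_size()` over the parameters and, optionally, the buffers. The hypothetical `memory_footprint` function below reproduces that idea on a plain PyTorch module so the numbers can be checked by hand; it does not account for activations, gradients, optimizer state, or allocator overhead.

```python
from torch import nn


def memory_footprint(module: nn.Module, return_buffers: bool = True) -> int:
    """Bytes taken by parameters (and optionally buffers): elements x bytes per element."""
    size = sum(p.nelement() * p.element_size() for p in module.parameters())
    if return_buffers:
        size += sum(b.nelement() * b.element_size() for b in module.buffers())
    return size


# Linear(4, 4): 16 weights + 4 biases; BatchNorm1d(4): 4 weights + 4 biases (all fp32).
# BatchNorm1d also registers buffers: running_mean and running_var (4 x fp32 each)
# and num_batches_tracked (one int64).
model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4))

print(memory_footprint(model))                        # 28 * 4 + (8 * 4 + 8) = 152
print(memory_footprint(model, return_buffers=False))  # 28 * 4 = 112
```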




	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>get_parameter_or_buffer</name><anchor>transformers.PreTrainedModel.get_parameter_or_buffer</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L5189</source><parameters>[{"name": "target", "val": ": str"}]</parameters></docstring>

	Return the parameter or buffer given by `target` if it exists, otherwise raise an error. This combines
	`get_parameter()` and `get_buffer()` in a single handy function. If the target is an `_extra_state` attribute,
	it returns the extra state provided by the module. Note that this only works if `target` is a leaf of the model.
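A minimal stand-in for the combined lookup, built on PyTorch's `nn.Module.get_parameter()` and `nn.Module.get_buffer()`. The `get_parameter_or_buffer` function below is a hypothetical helper that tries parameters first and then buffers; it leaves out the `_extra_state` handling mentioned above.

```python
import torch
from torch import nn


def get_parameter_or_buffer(module: nn.Module, target: str) -> torch.Tensor:
    """Hypothetical helper: try parameters first, then buffers (no `_extra_state` support)."""
    try:
        return module.get_parameter(target)
    except AttributeError:
        # Raises AttributeError itself if `target` is not a buffer either.
        return module.get_buffer(target)


# In nn.Sequential, submodules are named "0", "1", ...
model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4))

weight = get_parameter_or_buffer(model, "0.weight")              # a parameter
running_mean = get_parameter_or_buffer(model, "1.running_mean")  # a buffer
print(type(weight).__name__, tuple(running_mean.shape))
```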


	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>gradient_checkpointing_disable</name><anchor>transformers.PreTrainedModel.gradient_checkpointing_disable</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L3421</source><parameters>[]</parameters></docstring>

	Deactivates gradient checkpointing for the current model.


	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>gradient_checkpointing_enable</name><anchor>transformers.PreTrainedModel.gradient_checkpointing_enable</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L3360</source><parameters>[{"name": "gradient_checkpointing_kwargs", "val": " = None"}]</parameters><paramsdesc>- gradient_checkpointing_kwargs (dict, optional) --
	Additional keyword arguments passed along to the `torch.utils.checkpoint.checkpoint` function.</paramsdesc><paramgroups>0</paramgroups></docstring>

	Activates gradient checkpointing for the current model.

	We pass the `__call__` method of the modules instead of `forward` because `__call__` attaches all the hooks of
	the module. https://discuss.pytorch.org/t/any-different-between-model-input-and-model-forward-input/3690/2




	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>init_weights</name><anchor>transformers.PreTrainedModel.init_weights</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L3347</source><parameters>[]</parameters></docstring>

	Initializes the weights if needed. If using a custom `PreTrainedModel`, you need to implement any
	initialization logic in `_init_weights`.


	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>initialize_weights</name><anchor>transformers.PreTrainedModel.initialize_weights</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2673</source><parameters>[]</parameters></docstring>

	This is equivalent to calling `self.apply(self._initialize_weights)`, but correctly handles composite models.
	This function dynamically dispatches the correct `init_weights` function to the modules as we advance in the
	module graph along the recursion. It can handle an arbitrary number of sub-models. Without it, every composite
	model would have to recurse a second time on all sub-models explicitly in the outer-most `_init_weights`, which
	is extremely error prone and inefficient.

	Note that the `torch.no_grad()` decorator is very important as well, as most of our `_init_weights` do not use
	`torch.nn.init` functions (which are all no_grad by default), but simply do in-place ops such as
	`module.weight.data.zero_()`.
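To see why the `torch.no_grad()` context matters, the sketch below applies a hypothetical `_init_weights`-style function that uses plain in-place ops to a small PyTorch model, first without and then with `torch.no_grad()`.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(3, 3), nn.ReLU(), nn.Linear(3, 1))


def init_fn(module: nn.Module) -> None:
    """Hypothetical `_init_weights`-style function using plain in-place ops."""
    if isinstance(module, nn.Linear):
        module.weight.normal_(mean=0.0, std=0.02)
        module.bias.zero_()


# Outside no_grad, autograd rejects in-place ops on leaf tensors that require grad.
try:
    model.apply(init_fn)
    failed_without_no_grad = False
except RuntimeError:
    failed_without_no_grad = True

# Inside no_grad, the same in-place ops are allowed.
with torch.no_grad():
    model.apply(init_fn)

print(failed_without_no_grad)
print(all(bool((m.bias == 0).all()) for m in model if isinstance(m, nn.Linear)))
```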


	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>post_init</name><anchor>transformers.PreTrainedModel.post_init</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1853</source><parameters>[]</parameters></docstring>

	A method executed at the end of each Transformer model initialization, to execute code that needs the model's
	modules properly initialized (such as weight initialization).

	This is also used when running distributed code: hooks are added to the modules here, according to
	the model's `tp_plan`.


	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>register_for_auto_class</name><anchor>transformers.PreTrainedModel.register_for_auto_class</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L4920</source><parameters>[{"name": "auto_class", "val": " = 'AutoModel'"}]</parameters><paramsdesc>- auto_class (`str` or `type`, optional, defaults to `"AutoModel"`) --
	The auto class to register this new model with.</paramsdesc><paramgroups>0</paramgroups></docstring>

	Register this class with a given auto class. This should only be used for custom models as the ones in the
	library are already mapped with an auto class.






	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>resize_token_embeddings</name><anchor>transformers.PreTrainedModel.resize_token_embeddings</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2882</source><parameters>[{"name": "new_num_tokens", "val": ": typing.Optional[int] = None"}, {"name": "pad_to_multiple_of", "val": ": typing.Optional[int] = None"}, {"name": "mean_resizing", "val": ": bool = True"}]</parameters><paramsdesc>- new_num_tokens (`int`, optional) --
	The new number of tokens in the embedding matrix. Increasing the size will add newly initialized
	vectors at the end. Reducing the size will remove vectors from the end. If not provided or `None`, just
	returns a pointer to the input tokens `torch.nn.Embedding` module of the model without doing anything.
	- pad_to_multiple_of (`int`, optional) --
	If set, pads the embedding matrix to a multiple of the provided value. If `new_num_tokens` is set to
	`None`, the embedding is only padded to a multiple of `pad_to_multiple_of`.

	This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability
	`>= 7.0` (Volta), or on TPUs, which benefit from sequence lengths that are a multiple of 128. For more
	details about this, or help on choosing the correct value for resizing, refer to this guide:
	https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc
	- mean_resizing (`bool`) --
	Whether to initialize the added embeddings from a multivariate normal distribution that has the old embeddings' mean and
	covariance, or to initialize them with a normal distribution that has a mean of zero and a standard deviation equal to `config.initializer_range`.

	Setting `mean_resizing` to `True` is useful when increasing the size of the embeddings of causal language models:
	initializing the new embeddings with the old embeddings' mean reduces the KL divergence between the next-token
	probability distributions before and after adding the new embeddings, so the generated tokens' probabilities are minimally affected.
	Refer to this article for more information: https://nlp.stanford.edu/~johnhew/vocab-expansion.html</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.nn.Embedding`</rettype><retdesc>Pointer to the input tokens Embeddings Module of the model.</retdesc></docstring>

	Resizes the input token embedding matrix of the model if `new_num_tokens != config.vocab_size`.

	Takes care of tying the embedding weights afterwards if the model class has a `tie_weights()` method.
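The arithmetic behind `pad_to_multiple_of` and the idea behind `mean_resizing=True` can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `padded_vocab_size`, `mean_and_covariance`, `cholesky` and `mean_resize_rows` are hypothetical helpers, embeddings are nested lists rather than `torch.nn.Embedding` weights, the small diagonal jitter is an assumption added for numerical stability, and the `mean_resizing=False` branch is not shown.

```python
import math
import random


def padded_vocab_size(new_num_tokens: int, pad_to_multiple_of: int) -> int:
    """Round new_num_tokens up to the nearest multiple of pad_to_multiple_of."""
    return ((new_num_tokens + pad_to_multiple_of - 1) // pad_to_multiple_of) * pad_to_multiple_of


def mean_and_covariance(rows):
    """Per-dimension mean and (population) covariance of the existing embedding rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [
        [sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n for j in range(d)]
        for i in range(d)
    ]
    return mean, cov


def cholesky(a, jitter=1e-9):
    """Lower-triangular L with L @ L.T ~= a; the jitter keeps the diagonal positive."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(max(a[i][i] + jitter - s, 0.0))
            elif low[j][j] > 0:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def mean_resize_rows(rows, num_new, seed=0):
    """Append num_new rows drawn from N(mean, cov) of the existing rows."""
    mean, cov = mean_and_covariance(rows)
    low = cholesky(cov)
    rng = random.Random(seed)
    d = len(mean)
    new_rows = []
    for _ in range(num_new):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # mean + L z has covariance L L^T, i.e. (approximately) cov.
        new_rows.append([mean[i] + sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(d)])
    return rows + new_rows


old_rows = [[0.0, 1.0], [2.0, 3.0], [4.0, 1.0]]  # 3 tokens, hidden size 2
target = padded_vocab_size(5, pad_to_multiple_of=4)  # 5 tokens padded up to 8
resized = mean_resize_rows(old_rows, target - len(old_rows))
print(target, len(resized))  # 8 8
```

The existing rows are kept unchanged at the start of the matrix, and only the appended rows are sampled, which matches the description that new vectors are added at the end.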








	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>save_pretrained</name><anchor>transformers.PreTrainedModel.save_pretrained</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L3448</source><parameters>[{"name": "save_directory", "val": ": typing.Union[str, os.PathLike]"}, {"name": "is_main_process", "val": ": bool = True"}, {"name": "state_dict", "val": ": typing.Optional[dict] = None"}, {"name": "save_function", "val": ": Callable = torch.save"}, {"name": "push_to_hub", "val": ": bool = False"}, {"name": "max_shard_size", "val": ": typing.Union[int, str] = '5GB'"}, {"name": "safe_serialization", "val": ": bool = True"}, {"name": "variant", "val": ": typing.Optional[str] = None"}, {"name": "token", "val": ": typing.Union[str, bool, NoneType] = None"}, {"name": "save_peft_format", "val": ": bool = True"}, {"name": "kwargs", "val": ""}]</parameters><paramsdesc>- save_directory (`str` or `os.PathLike`) --
	Directory to which to save. Will be created if it doesn't exist.
	- is_main_process (`bool`, optional, defaults to `True`) --
	Whether the process calling this is the main process or not. Useful in distributed training (e.g. on
	TPUs) when this function needs to be called on all processes. In this case, set `is_main_process=True`
	only on the main process to avoid race conditions.
	- state_dict (nested dictionary of `torch.Tensor`) --
	The state dictionary of the model to save. Will default to `self.state_dict()`, but can be used to only
	save parts of the model or if special precautions need to be taken when recovering the state dictionary
	of a model (like when using model parallelism).
	- save_function (`Callable`) --
	The function to use to save the state dictionary. Useful in distributed training (e.g. on TPUs) when
	`torch.save` needs to be replaced by another method.
	- push_to_hub (`bool`, optional, defaults to `False`) --
	Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the
	repository you want to push to with `repo_id` (will default to the name of `save_directory` in your
	namespace).
	- max_shard_size (`int` or `str`, optional, defaults to `"5GB"`) --
	The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller
	than this size. If expressed as a string, it needs to be digits followed by a unit (like `"5MB"`).
	The default of 5GB lets models run easily on free-tier Google Colab instances without CPU OOM
	issues.

	<Tip warning={true}>

	If a single weight of the model is bigger than `max_shard_size`, it will be in its own checkpoint shard
	which will be bigger than `max_shard_size`.

	</Tip>

	- safe_serialization (`bool`, optional, defaults to `True`) --
	Whether to save the model using `safetensors` or the traditional PyTorch way (that uses `pickle`).
	- variant (`str`, optional) --
	If specified, weights are saved in the format `pytorch_model.<variant>.bin`.
	- token (`str` or `bool`, optional) --
	The token to use as HTTP bearer authorization for remote files. If `True`, or not specified, will use
	the token generated when running `hf auth login` (stored in `~/.huggingface`).
	- save_peft_format (`bool`, optional, defaults to `True`) --
	For backward compatibility with the PEFT library, if adapter weights are attached to the model, all
	keys of the adapters' state dict need to be prefixed with `base_model.model`. Advanced users can
	disable this behavior by setting `save_peft_format` to `False`.
	- kwargs (`dict[str, Any]`, optional) --
	Additional keyword arguments passed along to the [push_to_hub()](/docs/transformers/pr_33892/en/main_classes/model#transformers.utils.PushToHubMixin.push_to_hub) method.</paramsdesc><paramgroups>0</paramgroups></docstring>

	Save a model and its configuration file to a directory, so that it can be re-loaded using the
	[from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) class method.
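How `max_shard_size` splits a checkpoint, including the oversized-weight case from the Tip above, can be illustrated with a greedy packing sketch. This is an assumption-laden illustration, not the library's code: `parse_size` and `shard_weights` are hypothetical helpers, weights are `(name, num_bytes)` pairs instead of tensors, and decimal units (`"5MB"` = 5 * 10**6 bytes) are assumed.

```python
import re

_UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9}  # assumed decimal units


def parse_size(size):
    """Convert an int or a string like "5GB" into a number of bytes."""
    if isinstance(size, int):
        return size
    match = re.fullmatch(r"(\d+)\s*([KMG]B)", size.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognized size: {size!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def shard_weights(weight_sizes, max_shard_size):
    """Greedily pack (name, num_bytes) pairs into shards of at most max_shard_size.

    A single weight larger than the limit still gets a shard of its own,
    so that shard ends up bigger than max_shard_size.
    """
    limit = parse_size(max_shard_size)
    shards, current, current_size = [], [], 0
    for name, num_bytes in weight_sizes:
        # Close the current shard if adding this weight would overflow it.
        if current and current_size + num_bytes > limit:
            shards.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += num_bytes
    if current:
        shards.append(current)
    return shards


weights = [("embed", 3_000_000), ("layer.0", 2_000_000), ("lm_head", 7_000_000), ("norm", 1_000)]
print(shard_weights(weights, "5MB"))
# [['embed', 'layer.0'], ['lm_head'], ['norm']]
```

Here `lm_head` (7MB) cannot fit under the 5MB limit, so it sits alone in a shard larger than `max_shard_size`, which is exactly the situation the Tip warns about.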




	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>set_attn_implementation</name><anchor>transformers.PreTrainedModel.set_attn_implementation</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2468</source><parameters>[{"name": "attn_implementation", "val": ": typing.Union[str, dict]"}]</parameters><paramsdesc>- attn_implementation (`str` or `dict`) --
	The attention implementation to set for this model. It can be either a `str`, in which case it is
	dispatched to all submodels if relevant, or a `dict` whose keys are sub-config names, in which case each
	submodel receives the corresponding value.</paramsdesc><paramgroups>0</paramgroups></docstring>

	Set the requested `attn_implementation` for this model.
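The difference between passing a `str` and a `dict` can be sketched as a small resolver over sub-config names. This is a hypothetical helper, not the library's code: sub-models are represented only by their config names (`"text_config"` and `"vision_config"` are illustrative), the top-level model itself is not modeled, and leaving sub-configs that are absent from the `dict` unchanged is an assumption.

```python
from typing import Dict, List, Optional, Union


def resolve_attn_implementation(
    sub_config_names: List[str],
    requested: Union[str, Dict[str, str]],
    current: Optional[Dict[str, Optional[str]]] = None,
) -> Dict[str, Optional[str]]:
    """Return the attention implementation each sub-config would end up with."""
    if isinstance(requested, str):
        # A single string is dispatched to every sub-model.
        return {name: requested for name in sub_config_names}
    # Start from the current settings; unlisted sub-models keep them (assumption).
    resolved = {name: (current or {}).get(name) for name in sub_config_names}
    for name, impl in requested.items():
        if name not in resolved:
            raise KeyError(f"Unknown sub-config: {name!r}")
        resolved[name] = impl
    return resolved


subs = ["text_config", "vision_config"]
print(resolve_attn_implementation(subs, "sdpa"))
# {'text_config': 'sdpa', 'vision_config': 'sdpa'}
print(resolve_attn_implementation(
    subs, {"vision_config": "eager"}, current={"text_config": "sdpa", "vision_config": "sdpa"}
))
# {'text_config': 'sdpa', 'vision_config': 'eager'}
```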




	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>set_decoder</name><anchor>transformers.PreTrainedModel.set_decoder</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2609</source><parameters>[{"name": "decoder", "val": ""}]</parameters></docstring>

	Symmetric setter for the decoder: it mirrors the lookup logic used in `get_decoder`.


	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>tie_embeddings_and_encoder_decoder</name><anchor>transformers.PreTrainedModel.tie_embeddings_and_encoder_decoder</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2704</source><parameters>[]</parameters></docstring>

	If set in the config, tie the weights between the input embeddings and the output embeddings,
	and between the encoder and the decoder.
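Tying means the output projection reuses the very same weight object as the input embeddings, so an update made through one is visible through the other. The sketch below illustrates only that idea with a hypothetical `TinyLM` class and nested lists instead of `torch.nn.Parameter` objects; the `tie_word_embeddings` argument mirrors the config option of that name, while everything else is an assumption for illustration.

```python
class TinyLM:
    """Hypothetical model with an input embedding table and an output projection."""

    def __init__(self, vocab_size, hidden_size, tie_word_embeddings=True):
        self.input_embeddings = [[0.0] * hidden_size for _ in range(vocab_size)]
        self.output_embeddings = [[0.0] * hidden_size for _ in range(vocab_size)]
        if tie_word_embeddings:
            self.tie_weights()

    def tie_weights(self):
        # Share the same object instead of copying values.
        self.output_embeddings = self.input_embeddings


model = TinyLM(vocab_size=4, hidden_size=2)
model.input_embeddings[1][0] = 0.5  # simulate an update on the input side
print(model.output_embeddings[1][0])  # 0.5
print(model.output_embeddings is model.input_embeddings)  # True
```

Because the weights are shared rather than copied, a tied model also stores (and saves) a single matrix for both roles.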


	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>tie_weights</name><anchor>transformers.PreTrainedModel.tie_weights</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2725</source><parameters>[]</parameters></docstring>

	Recursively (for all submodels) tie all the weights of the model.


	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>upcast_modules_in_fp32</name><anchor>transformers.PreTrainedModel.upcast_modules_in_fp32</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L5222</source><parameters>[{"name": "hf_quantizer", "val": ": transformers.quantizers.base.HfQuantizer \| None"}, {"name": "dtype", "val": ": dtype"}]</parameters></docstring>

	Upcast the modules listed in `_keep_in_fp32_modules` and `_keep_in_fp32_modules_strict` to fp32 if
	`dtype` is not fp32.
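The selection step can be sketched as a function mapping module names to the dtype they would end up in. This is a hypothetical illustration: dtypes are plain strings, a module is treated as listed when one of its dotted name components matches an entry of the keep lists (an assumption about the matching rule), and any behavioral difference between the strict and non-strict lists is not modeled.

```python
def plan_dtypes(module_names, dtype, keep_in_fp32_modules=(), keep_in_fp32_modules_strict=()):
    """Return {module_name: dtype}, upcasting listed modules to float32."""
    keep = set(keep_in_fp32_modules) | set(keep_in_fp32_modules_strict)
    plan = {}
    for name in module_names:
        components = name.split(".")
        if dtype != "float32" and any(k in components for k in keep):
            plan[name] = "float32"  # listed module: upcast
        else:
            plan[name] = dtype  # everything else keeps the requested dtype
    return plan


names = ["encoder.block.0.layer.0.SelfAttention", "encoder.block.0.layer.1.DenseReluDense.wo", "lm_head"]
print(plan_dtypes(names, "float16", keep_in_fp32_modules=["wo"]))
# {'encoder.block.0.layer.0.SelfAttention': 'float16', 'encoder.block.0.layer.1.DenseReluDense.wo': 'float32', 'lm_head': 'float16'}
```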


	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>warn_if_padding_and_no_attention_mask</name><anchor>transformers.PreTrainedModel.warn_if_padding_and_no_attention_mask</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L4942</source><parameters>[{"name": "input_ids", "val": ""}, {"name": "attention_mask", "val": ""}]</parameters></docstring>

	Shows a one-time warning if the `input_ids` appear to contain padding and no attention mask was given.
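A one-time warning of this kind can be sketched with the standard `warnings` module. Assumptions: `warn_if_padding_without_mask` is a hypothetical free function, `input_ids` is a list of lists, `pad_token_id` is passed explicitly, padding is detected by looking for the pad token in the first or last position of a row, and a module-level flag makes the warning fire only once. The library's own detection heuristic may differ.

```python
import warnings

_warned = False


def warn_if_padding_without_mask(input_ids, attention_mask, pad_token_id):
    """Warn once if input_ids look padded but no attention mask was provided."""
    global _warned
    if _warned or attention_mask is not None or pad_token_id is None:
        return False
    # Right padding shows up in the last column, left padding in the first.
    if any(row and pad_token_id in (row[0], row[-1]) for row in input_ids):
        warnings.warn(
            "input_ids appear to contain padding but no attention_mask was given; "
            "results may be incorrect.",
            UserWarning,
        )
        _warned = True
        return True
    return False


# With a mask, nothing is reported even though the last token is padding.
print(warn_if_padding_without_mask([[5, 6, 0]], [[1, 1, 0]], pad_token_id=0))  # False
```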


	</div></div>

	Custom models should also define a `_supports_assign_param_buffer` attribute, which determines whether superfast init can
	be applied to the particular model. A failing `test_save_and_load_from_pretrained` test is a sign that your model needs
	this; if so, set it to `False`.

	## ModuleUtilsMixin[[transformers.modeling_utils.ModuleUtilsMixin]]

	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>class transformers.modeling_utils.ModuleUtilsMixin</name><anchor>transformers.modeling_utils.ModuleUtilsMixin</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1280</source><parameters>[]</parameters></docstring>

	A few utilities for `torch.nn.Module` instances, to be used as a mixin.



	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>add_memory_hooks</name><anchor>transformers.modeling_utils.ModuleUtilsMixin.add_memory_hooks</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1311</source><parameters>[]</parameters></docstring>

	Add a memory hook before and after each sub-module forward pass to record the increase in memory consumption.

	The increase in memory consumption is stored in a `mem_rss_diff` attribute for each module and can be reset to zero
	with `model.reset_memory_hooks_state()`.
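The before/after hook pattern can be sketched without PyTorch. This is a hypothetical illustration: `TrackedModule` stands in for a sub-module, and `measure` is an injected callable returning the current memory usage (a fake counter keeps the example deterministic, whereas the description above refers to RSS). Accumulating the difference across calls until a reset is an assumption of this sketch.

```python
class TrackedModule:
    """Hypothetical stand-in for a sub-module whose forward pass allocates memory."""

    def __init__(self, name, forward_fn, measure):
        self.name = name
        self.forward_fn = forward_fn
        self.measure = measure
        self.mem_rss_diff = 0

    def __call__(self, *args):
        before = self.measure()  # pre-forward hook
        out = self.forward_fn(*args)
        after = self.measure()  # post-forward hook
        self.mem_rss_diff += after - before  # accumulate across calls
        return out

    def reset_memory_state(self):
        self.mem_rss_diff = 0


# Fake memory counter so the example is deterministic.
memory = {"rss": 1_000}


def allocate(n):
    memory["rss"] += n
    return n


layer = TrackedModule("layer.0", allocate, measure=lambda: memory["rss"])
layer(200)
layer(300)
print(layer.mem_rss_diff)  # 500
layer.reset_memory_state()
print(layer.mem_rss_diff)  # 0
```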


	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>estimate_tokens</name><anchor>transformers.modeling_utils.ModuleUtilsMixin.estimate_tokens</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1506</source><parameters>[{"name": "input_dict", "val": ": dict"}]</parameters><paramsdesc>- input_dict (`dict`) -- The model inputs.</paramsdesc><paramgroups>0</paramgroups><rettype>`int`</rettype><retdesc>The total number of tokens.</retdesc></docstring>

	Helper function to estimate the total number of tokens from the model inputs.
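A minimal sketch of such an estimate, assuming the inputs hold a rectangular `input_ids` batch (nested lists here instead of a tensor) and that the estimate is simply its element count, i.e. `batch_size * sequence_length`. Returning 0 when the main input is missing is an assumption of this sketch.

```python
def estimate_tokens(input_dict, main_input_name="input_ids"):
    """Count every position of the main input (batch_size * sequence_length)."""
    batch = input_dict.get(main_input_name)
    if batch is None:
        return 0
    return sum(len(row) for row in batch)


inputs = {
    "input_ids": [[101, 7592, 102, 0], [101, 2088, 999, 102]],
    "attention_mask": [[1, 1, 1, 0], [1, 1, 1, 1]],
}
print(estimate_tokens(inputs))  # 8
```

Padding positions are counted too, since only the shape of the batch matters for this estimate.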








	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>floating_point_ops</name><anchor>transformers.modeling_utils.ModuleUtilsMixin.floating_point_ops</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1527</source><parameters>[{"name": "input_dict", "val": ": dict"}, {"name": "exclude_embeddings", "val": ": bool = True"}]</parameters><paramsdesc>- input_dict (`dict`) --
	The model inputs, used to estimate the number of tokens in the batch.

	- exclude_embeddings (`bool`, optional, defaults to `True`) --
	Whether or not to exclude embedding and softmax operations from the count.</paramsdesc><paramgroups>0</paramgroups><rettype>`int`</rettype><retdesc>The number of floating-point operations.</retdesc></docstring>

	Get the number of (optionally, non-embedding) floating-point operations for the forward and backward passes of a
	batch with this transformer model. The default approximation neglects the quadratic dependency on the number of
	tokens (valid if `12 * d_model >> sequence_length`), as laid out in section 2.1 of [this paper](https://huggingface.co/papers/2001.08361).
	It should be overridden for transformers with parameter re-use (e.g., ALBERT or Universal Transformers), or when
	doing long-range modeling with very high sequence lengths.
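Section 2.1 of the cited paper estimates roughly 2N floating-point operations per token for the forward pass and twice that for the backward pass, where N is the number of (non-embedding) parameters, for about 6N per token in total. Under that approximation, the estimate reduces to the arithmetic below; the hypothetical `floating_point_ops` helper takes plain counts, and the batch shape and parameter count are made-up illustrative numbers.

```python
def floating_point_ops(num_tokens: int, num_parameters: int) -> int:
    """Approximate forward + backward FLOPs: ~2N (forward) + ~4N (backward) per token."""
    return 6 * num_tokens * num_parameters


# Illustrative numbers: 8 sequences of 512 tokens, 85M non-embedding parameters.
tokens = 8 * 512
flops = floating_point_ops(tokens, 85_000_000)
print(f"{flops:.3e}")  # 2.089e+12
```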








	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>get_extended_attention_mask</name><anchor>transformers.modeling_utils.ModuleUtilsMixin.get_extended_attention_mask</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1396</source><parameters>[{"name": "attention_mask", "val": ": Tensor"}, {"name": "input_shape", "val": ": tuple"}, {"name": "device", "val": ": typing.Optional[torch.device] = None"}, {"name": "dtype", "val": ": typing.Optional[torch.dtype] = None"}]</parameters><paramsdesc>- attention_mask (`torch.Tensor`) --
	Mask with ones indicating tokens to attend to, zeros for tokens to ignore.
	- input_shape (`tuple[int]`) --
	The shape of the input to the model.</paramsdesc><paramgroups>0</paramgroups><retdesc>`torch.Tensor` The extended attention mask, with the same dtype as `attention_mask.dtype`.</retdesc></docstring>

	Makes broadcastable attention and causal masks so that future and masked tokens are ignored.
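An extended mask of this kind turns a `[batch_size, seq_len]` 0/1 mask into an additive bias that broadcasts over attention heads (and, for encoders, over query positions), combining it with a causal mask for decoders. The sketch below uses nested lists instead of tensors and a large negative constant in place of the dtype's minimum value; both are assumptions, and the shapes in the comments describe this sketch only.

```python
NEG = -1e9  # stand-in for the most negative value of the dtype (assumption)


def extended_attention_mask(attention_mask, is_decoder=False):
    """Return an additive mask: 0.0 where attention is allowed, NEG where it is not."""
    out = []
    for row in attention_mask:
        n = len(row)
        if is_decoder:
            # [seq, seq]: a key is visible if it is not padding and not in the future.
            grid = [[0.0 if row[k] == 1 and k <= q else NEG for k in range(n)] for q in range(n)]
        else:
            # [1, seq]: a single row broadcast over all query positions.
            grid = [[0.0 if m == 1 else NEG for m in row]]
        out.append([grid])  # singleton head dimension broadcasts over heads
    return out


print(extended_attention_mask([[1, 1, 0]]))
# [[[[0.0, 0.0, -1000000000.0]]]]
```

The resulting values are added to the raw attention scores before the softmax, so masked positions receive (almost) zero probability.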






	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>invert_attention_mask</name><anchor>transformers.modeling_utils.ModuleUtilsMixin.invert_attention_mask</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1347</source><parameters>[{"name": "encoder_attention_mask", "val": ": Tensor"}]</parameters><paramsdesc>- encoder_attention_mask (`torch.Tensor`) -- An attention mask.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.Tensor`</rettype><retdesc>The inverted attention mask.</retdesc></docstring>

	Invert an attention mask (e.g., switches 0. and 1.).
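Inverting swaps the roles of 1 (attend) and 0 (ignore) so the mask can then be scaled into an additive bias on the attention scores. The sketch uses nested lists, adds singleton head and query dimensions for broadcasting, and scales by a large negative constant standing in for the dtype's minimum; that scaling step and the constant are assumptions about how the inverted mask is used.

```python
NEG = -1e9  # stand-in for the dtype's most negative value (assumption)


def invert_attention_mask(encoder_attention_mask):
    """Map a [batch, seq] 0/1 mask to a [batch, 1, 1, seq] additive mask via (1 - m) * NEG."""
    return [[[[(1.0 - m) * NEG for m in row]]] for row in encoder_attention_mask]


inverted = invert_attention_mask([[1, 1, 0], [1, 0, 0]])
```

Positions that were 1 become 0 (no change to the score) and positions that were 0 become a large negative bias.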








	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>num_parameters</name><anchor>transformers.modeling_utils.ModuleUtilsMixin.num_parameters</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1452</source><parameters>[{"name": "only_trainable", "val": ": bool = False"}, {"name": "exclude_embeddings", "val": ": bool = False"}]</parameters><paramsdesc>- only_trainable (`bool`, optional, defaults to `False`) --
	Whether or not to return only the number of trainable parameters.

	- exclude_embeddings (`bool`, optional, defaults to `False`) --
	Whether or not to return only the number of non-embedding parameters.</paramsdesc><paramgroups>0</paramgroups><rettype>`int`</rettype><retdesc>The number of parameters.</retdesc></docstring>

	Get the number of (optionally, trainable or non-embedding) parameters in the module.
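The two filters can be illustrated with a hypothetical list of `(name, shape, requires_grad, is_embedding)` records standing in for a model's parameters. How the library decides which parameters count as embeddings is not modeled here; the `is_embedding` flag is an assumption of the sketch, and the frozen bias is made up for illustration.

```python
from math import prod


def num_parameters(params, only_trainable=False, exclude_embeddings=False):
    """Sum parameter sizes, optionally skipping frozen and/or embedding parameters."""
    total = 0
    for _name, shape, requires_grad, is_embedding in params:
        if only_trainable and not requires_grad:
            continue
        if exclude_embeddings and is_embedding:
            continue
        total += prod(shape)
    return total


params = [
    ("embeddings.word_embeddings.weight", (30522, 768), True, True),
    ("encoder.layer.0.attention.query.weight", (768, 768), True, False),
    ("encoder.layer.0.attention.query.bias", (768,), False, False),  # frozen
]
print(num_parameters(params))  # 24031488
print(num_parameters(params, exclude_embeddings=True))  # 590592
```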








	</div>
	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>reset_memory_hooks_state</name><anchor>transformers.modeling_utils.ModuleUtilsMixin.reset_memory_hooks_state</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1323</source><parameters>[]</parameters></docstring>

	Reset the `mem_rss_diff` attribute of each module (see [add_memory_hooks()](/docs/transformers/pr_33892/en/main_classes/model#transformers.modeling_utils.ModuleUtilsMixin.add_memory_hooks)).


	</div></div>

	## Pushing to the Hub[[transformers.utils.PushToHubMixin]]

	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>class transformers.utils.PushToHubMixin</name><anchor>transformers.utils.PushToHubMixin</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/utils/hub.py#L696</source><parameters>[]</parameters></docstring>

	A mixin containing the functionality to push a model or tokenizer to the Hub.



	<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">


	<docstring><name>push_to_hub</name><anchor>transformers.utils.PushToHubMixin.push_to_hub</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/utils/hub.py#L821</source><parameters>[{"name": "repo_id", "val": ": str"}, {"name": "use_temp_dir", "val": ": bool \| None = None"}, {"name": "commit_message", "val": ": str \| None = None"}, {"name": "private", "val": ": bool \| None = None"}, {"name": "token", "val": ": bool \| str \| None = None"}, {"name": "max_shard_size", "val": ": int \| str \| None = '5GB'"}, {"name": "create_pr", "val": ": bool = False"}, {"name": "safe_serialization", "val": ": bool = True"}, {"name": "revision", "val": ": str \| None = None"}, {"name": "commit_description", "val": ": str \| None = None"}, {"name": "tags", "val": ": list[str] \| None = None"}, {"name": "deprecated_kwargs", "val": ""}]</parameters><paramsdesc>- repo_id (`str`) --
	The name of the repository you want to push your model to. It should contain your organization name
	when pushing to a given organization.
	- use_temp_dir (`bool`, optional) --
	Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub.
	Will default to `True` if there is no directory named like `repo_id`, `False` otherwise.
	- commit_message (`str`, optional) --
	Message to commit while pushing. Will default to `"Upload model"`.
	- private (`bool`, optional) --
	Whether to make the repo private. If `None` (default), the repo will be public unless the organization's default is private. This value is ignored if the repo already exists.
	- token (`bool` or `str`, optional) --
	The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated
	when running `hf auth login` (stored in `~/.huggingface`). Will default to `True` if `repo_url`
	is not specified.
	- max_shard_size (`int` or `str`, optional, defaults to `"5GB"`) --
	Only applicable for models. The maximum size for a checkpoint before being sharded. Each checkpoint
	shard will then be smaller than this size. If expressed as a string, it needs to be digits followed
	by a unit (like `"5MB"`). It defaults to `"5GB"` so that users can easily load models on free-tier
	Google Colab instances without any CPU OOM issues.
	- create_pr (`bool`, optional, defaults to `False`) --
	Whether or not to create a PR with the uploaded files or directly commit.
	- safe_serialization (`bool`, optional, defaults to `True`) --
	Whether or not to convert the model weights to safetensors format for safer serialization.
	- revision (`str`, optional) --
	Branch to push the uploaded files to.
	- commit_description (`str`, optional) --
	The description of the commit that will be created.
	- tags (`list[str]`, optional) --
	List of tags to push on the Hub.</paramsdesc><paramgroups>0</paramgroups></docstring>

	Upload the model file to the 🤗 Model Hub.



	<ExampleCodeBlock anchor="transformers.utils.PushToHubMixin.push_to_hub.example">

	Examples:

	```python
	from transformers import AutoModel

	model = AutoModel.from_pretrained("google-bert/bert-base-cased")

	# Push the model to your namespace with the name "my-finetuned-bert".
	model.push_to_hub("my-finetuned-bert")

	# Push the model to an organization with the name "my-finetuned-bert".
	model.push_to_hub("huggingface/my-finetuned-bert")
	```

	</ExampleCodeBlock>


	</div></div>

