| # Models | |
| The base class [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel) implements the common methods for loading/saving a model either from a local | |
| file or directory, or from a pretrained model configuration provided by the library (downloaded from the Hugging Face Hub). | |
| [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel) also implements a few methods which are common among all the models to: | |
| - resize the input token embeddings when new tokens are added to the vocabulary | |
| The other methods that are common to each model are defined in [ModuleUtilsMixin](/docs/transformers/pr_33892/en/main_classes/model#transformers.modeling_utils.ModuleUtilsMixin) and [GenerationMixin](/docs/transformers/pr_33892/en/main_classes/text_generation#transformers.GenerationMixin). | |
| ## PreTrainedModel[[transformers.PreTrainedModel]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.PreTrainedModel</name><anchor>transformers.PreTrainedModel</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1644</source><parameters>[{"name": "config", "val": ": PreTrainedConfig"}, {"name": "*inputs", "val": ""}, {"name": "**kwargs", "val": ""}]</parameters></docstring> | |
| Base class for all models. | |
| [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel) takes care of storing the configuration of the models and handles methods for loading, | |
| downloading and saving models as well as a few methods common to all models to: | |
| - resize the input embeddings | |
| Class attributes (overridden by derived classes): | |
| - **config_class** ([PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig)) -- A subclass of [PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig) to use as configuration class | |
| for this model architecture. | |
| - **base_model_prefix** (`str`) -- A string indicating the attribute associated to the base model in derived | |
| classes of the same architecture adding modules on top of the base model. | |
| - **main_input_name** (`str`) -- The name of the principal input to the model (often `input_ids` for NLP | |
| models, `pixel_values` for vision models and `input_values` for speech models). | |
| - **can_record_outputs** (dict): | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>push_to_hub</name><anchor>transformers.PreTrainedModel.push_to_hub</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/utils/hub.py#L821</source><parameters>[{"name": "repo_id", "val": ": str"}, {"name": "use_temp_dir", "val": ": bool | None = None"}, {"name": "commit_message", "val": ": str | None = None"}, {"name": "private", "val": ": bool | None = None"}, {"name": "token", "val": ": bool | str | None = None"}, {"name": "max_shard_size", "val": ": int | str | None = '5GB'"}, {"name": "create_pr", "val": ": bool = False"}, {"name": "safe_serialization", "val": ": bool = True"}, {"name": "revision", "val": ": str | None = None"}, {"name": "commit_description", "val": ": str | None = None"}, {"name": "tags", "val": ": list[str] | None = None"}, {"name": "**deprecated_kwargs", "val": ""}]</parameters><paramsdesc>- **repo_id** (`str`) -- | |
| The name of the repository you want to push your model to. It should contain your organization name | |
| when pushing to a given organization. | |
| - **use_temp_dir** (`bool`, *optional*) -- | |
| Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub. | |
| Will default to `True` if there is no directory named like `repo_id`, `False` otherwise. | |
| - **commit_message** (`str`, *optional*) -- | |
| Message to commit while pushing. Will default to `"Upload model"`. | |
| - **private** (`bool`, *optional*) -- | |
| Whether to make the repo private. If `None` (default), the repo will be public unless the organization's default is private. This value is ignored if the repo already exists. | |
| - **token** (`bool` or `str`, *optional*) -- | |
| The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated | |
| when running `hf auth login` (stored in `~/.huggingface`). Will default to `True` if `repo_url` | |
| is not specified. | |
| - **max_shard_size** (`int` or `str`, *optional*, defaults to `"5GB"`) -- | |
| Only applicable for models. The maximum size for a checkpoint before being sharded. Each checkpoint shard | |
| will then be smaller than this size. If expressed as a string, needs to be digits followed | |
| by a unit (like `"5MB"`). We default it to `"5GB"` so that users can easily load models on free-tier | |
| Google Colab instances without any CPU OOM issues. | |
| - **create_pr** (`bool`, *optional*, defaults to `False`) -- | |
| Whether or not to create a PR with the uploaded files or directly commit. | |
| - **safe_serialization** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not to convert the model weights to the safetensors format for safer serialization. | |
| - **revision** (`str`, *optional*) -- | |
| Branch to push the uploaded files to. | |
| - **commit_description** (`str`, *optional*) -- | |
| The description of the commit that will be created. | |
| - **tags** (`list[str]`, *optional*) -- | |
| List of tags to push on the Hub.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Upload the model file to the 🤗 Model Hub. | |
| <ExampleCodeBlock anchor="transformers.PreTrainedModel.push_to_hub.example"> | |
| Examples: | |
| ```python | |
| from transformers import AutoModel | |
| model = AutoModel.from_pretrained("google-bert/bert-base-cased") | |
| # Push the model to your namespace with the name "my-finetuned-bert". | |
| model.push_to_hub("my-finetuned-bert") | |
| # Push the model to an organization with the name "my-finetuned-bert". | |
| model.push_to_hub("huggingface/my-finetuned-bert") | |
| ``` | |
| </ExampleCodeBlock> | |
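The `max_shard_size` description says a string value must be digits followed by a unit (like `"5MB"`). As a minimal sketch of that convention, the helper below converts such a string to bytes. `parse_shard_size`, `min_shard_count`, and the unit table are hypothetical names written for illustration, not the library's implementation, and the choice of decimal units for `KB`/`MB`/`GB` (with binary `KiB`/`MiB`/`GiB`) is an assumption.

```python
import re

# Assumed unit table: decimal units (1 GB = 10**9 bytes) and binary units (1 GiB = 2**30 bytes).
_UNITS = {
    "KB": 10**3, "MB": 10**6, "GB": 10**9,
    "KIB": 2**10, "MIB": 2**20, "GIB": 2**30,
}


def parse_shard_size(size):
    """Hypothetical helper: turn 5_000_000 or "5MB" into a number of bytes."""
    if isinstance(size, int):
        return size
    match = re.fullmatch(r"(\d+)\s*([A-Za-z]+)", size.strip())
    if match is None:
        raise ValueError(f"expected digits followed by a unit, got {size!r}")
    digits, unit = match.groups()
    if unit.upper() not in _UNITS:
        raise ValueError(f"unknown unit {unit!r}")
    return int(digits) * _UNITS[unit.upper()]


def min_shard_count(total_bytes, max_shard_size="5GB"):
    """Lower bound on the number of shards: ceil(total_bytes / limit), at least 1."""
    limit = parse_shard_size(max_shard_size)
    return max(1, -(-total_bytes // limit))
```

The shard count is only a lower bound, since a real checkpoint is split along tensor boundaries and individual shards are rarely filled exactly to the limit.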
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>add_model_tags</name><anchor>transformers.PreTrainedModel.add_model_tags</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1975</source><parameters>[{"name": "tags", "val": ": typing.Union[list[str], str]"}]</parameters><paramsdesc>- **tags** (`Union[list[str], str]`) -- | |
| The desired tags to inject in the model</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Add custom tags into the model that gets pushed to the Hugging Face Hub. Will | |
| not overwrite existing tags in the model. | |
| <ExampleCodeBlock anchor="transformers.PreTrainedModel.add_model_tags.example"> | |
| Examples: | |
| ```python | |
| from transformers import AutoModel | |
| model = AutoModel.from_pretrained("google-bert/bert-base-cased") | |
| model.add_model_tags(["custom", "custom-bert"]) | |
| # Push the model to your namespace with the name "my-custom-bert". | |
| model.push_to_hub("my-custom-bert") | |
| ``` | |
| </ExampleCodeBlock> | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>can_generate</name><anchor>transformers.PreTrainedModel.can_generate</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2091</source><parameters>[]</parameters><rettype>`bool`</rettype><retdesc>Whether this model can generate sequences with `.generate()`.</retdesc></docstring> | |
| Returns whether this model can generate sequences with `.generate()` from the `GenerationMixin`. | |
| Under the hood, on classes where this function returns True, some generation-specific changes are triggered: | |
| for instance, the model instance will have a populated `generation_config` attribute. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>dequantize</name><anchor>transformers.PreTrainedModel.dequantize</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1957</source><parameters>[]</parameters></docstring> | |
| Potentially dequantize the model in case it has been quantized by a quantization method that supports | |
| dequantization. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>disable_input_require_grads</name><anchor>transformers.PreTrainedModel.disable_input_require_grads</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2578</source><parameters>[]</parameters></docstring> | |
| Removes the `_require_grads_hook`. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>enable_input_require_grads</name><anchor>transformers.PreTrainedModel.enable_input_require_grads</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2567</source><parameters>[]</parameters></docstring> | |
| Enables the gradients for the input embeddings. This is useful for fine-tuning adapter weights while keeping | |
| the model weights fixed. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>from_pretrained</name><anchor>transformers.PreTrainedModel.from_pretrained</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L4057</source><parameters>[{"name": "pretrained_model_name_or_path", "val": ": typing.Union[str, os.PathLike, NoneType]"}, {"name": "*model_args", "val": ""}, {"name": "config", "val": ": typing.Union[transformers.configuration_utils.PreTrainedConfig, str, os.PathLike, NoneType] = None"}, {"name": "cache_dir", "val": ": typing.Union[str, os.PathLike, NoneType] = None"}, {"name": "ignore_mismatched_sizes", "val": ": bool = False"}, {"name": "force_download", "val": ": bool = False"}, {"name": "local_files_only", "val": ": bool = False"}, {"name": "token", "val": ": typing.Union[str, bool, NoneType] = None"}, {"name": "revision", "val": ": str = 'main'"}, {"name": "use_safetensors", "val": ": typing.Optional[bool] = None"}, {"name": "weights_only", "val": ": bool = True"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **pretrained_model_name_or_path** (`str` or `os.PathLike`, *optional*) -- | |
| Can be either: | |
| - A string, the *model id* of a pretrained model hosted inside a model repo on huggingface.co. | |
| - A path to a *directory* containing model weights saved using | |
| [save_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.save_pretrained), e.g., `./my_model_directory/`. | |
| - `None` if you are both providing the configuration and state dictionary (resp. with keyword | |
| arguments `config` and `state_dict`). | |
| - **model_args** (sequence of positional arguments, *optional*) -- | |
| All remaining positional arguments will be passed to the underlying model's `__init__` method. | |
| - **config** (`Union[PreTrainedConfig, str, os.PathLike]`, *optional*) -- | |
| Can be either: | |
| - an instance of a class derived from [PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig), | |
| - a string or path valid as input to [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig.from_pretrained). | |
| Configuration for the model to use instead of an automatically loaded configuration. Configuration can | |
| be automatically loaded when: | |
| - The model is a model provided by the library (loaded with the *model id* string of a pretrained | |
| model). | |
| - The model was saved using [save_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.save_pretrained) and is reloaded by supplying the | |
| save directory. | |
| - The model is loaded by supplying a local directory as `pretrained_model_name_or_path` and a | |
| configuration JSON file named *config.json* is found in the directory. | |
| - **state_dict** (`dict[str, torch.Tensor]`, *optional*) -- | |
| A state dictionary to use instead of a state dictionary loaded from saved weights file. | |
| This option can be used if you want to create a model from a pretrained configuration but load your own | |
| weights. In this case though, you should check if using [save_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.save_pretrained) and | |
| [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) is not a simpler option. | |
| - **cache_dir** (`Union[str, os.PathLike]`, *optional*) -- | |
| Path to a directory in which a downloaded pretrained model configuration should be cached if the | |
| standard cache should not be used. | |
| - **ignore_mismatched_sizes** (`bool`, *optional*, defaults to `False`) -- | |
| Whether or not to raise an error if some of the weights from the checkpoint do not have the same size | |
| as the weights of the model (if for instance, you are instantiating a model with 10 labels from a | |
| checkpoint with 3 labels). | |
| - **force_download** (`bool`, *optional*, defaults to `False`) -- | |
| Whether or not to force the (re-)download of the model weights and configuration files, overriding the | |
| cached versions if they exist. | |
| - **proxies** (`dict[str, str]`, *optional*) -- | |
| A dictionary of proxy servers to use by protocol or endpoint, e.g., `{'http': 'foo.bar:3128', | |
| 'http://hostname': 'foo.bar:4012'}`. The proxies are used on each request. | |
| - **output_loading_info** (`bool`, *optional*, defaults to `False`) -- | |
| Whether or not to also return a dictionary containing missing keys, unexpected keys and error messages. | |
| - **local_files_only** (`bool`, *optional*, defaults to `False`) -- | |
| Whether or not to only look at local files (i.e., do not try to download the model). | |
| - **token** (`str` or `bool`, *optional*) -- | |
| The token to use as HTTP bearer authorization for remote files. If `True`, or not specified, will use | |
| the token generated when running `hf auth login` (stored in `~/.huggingface`). | |
| - **revision** (`str`, *optional*, defaults to `"main"`) -- | |
| The specific model version to use. It can be a branch name, a tag name, or a commit id, since we use a | |
| git-based system for storing models and other artifacts on huggingface.co, so `revision` can be any | |
| identifier allowed by git. | |
| <Tip> | |
| To test a pull request you made on the Hub, you can pass `revision="refs/pr/<pr_number>"`. | |
| </Tip> | |
| - **attn_implementation** (`str`, *optional*) -- | |
| The attention implementation to use in the model (if relevant). Can be any of `"eager"` (manual implementation of the attention), `"sdpa"` (using [`F.scaled_dot_product_attention`](https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention.html)), `"flash_attention_2"` (using [Dao-AILab/flash-attention](https://github.com/Dao-AILab/flash-attention)), or `"flash_attention_3"` (using [Dao-AILab/flash-attention/hopper](https://github.com/Dao-AILab/flash-attention/tree/main/hopper)). By default, if available, SDPA will be used for torch>=2.1.1. The default is otherwise the manual `"eager"` implementation. | |
| Also accepts HF kernel references of the form: | |
| `<namespace>/<repo_name>[@<revision>][:<kernel_name>]` | |
| - `<namespace>` and `<repo_name>` are any non-"/" and non-":" sequences. | |
| - `@<revision>` is optional (branch, tag, or commit-ish), e.g. `@main`, `@v1.2.0`, `@abc123`. | |
| - `:<kernel_name>` is optional and selects a function inside the kernel repo. | |
| - Both options can appear together and in this order only: `@revision` first, then `:kernel_name`. | |
| - A leading `<wrapper>|` prefix (e.g., `flash|...`) is intentionally allowed because the code | |
| strips it before loading; the `|` character is not excluded by the matching pattern. | |
| Examples that match: | |
| - `"org/model"` | |
| - `"org/model@main"` | |
| - `"org/model:custom_kernel"` | |
| - `"org/model@v1.2.3:custom_kernel"` | |
| </paramsdesc><paramsdesc1title>Parameters for big model inference</paramsdesc1title><paramsdesc1> | |
| - **dtype** (`str` or `torch.dtype`, *optional*) -- | |
| Override the default `torch_dtype` and load the model under a specific `dtype`. The different options | |
| are: | |
| 1. `torch.float16` or `torch.bfloat16` or `torch.float`: load in a specified | |
| `dtype`, ignoring the model's `config.dtype` if one exists. If not specified, | |
| the model will get loaded in `torch.float` (fp32). | |
| 2. `"auto"` - A `dtype` or `torch_dtype` entry in the model's `config.json` file is used if present. | |
| If this entry isn't found, the `dtype` of the first floating-point weight in | |
| the checkpoint is used instead. This loads the model | |
| using the `dtype` it was saved in at the end of training. It can't be used as an indicator of how | |
| the model was trained, since it could have been trained in one of the half-precision dtypes but saved in fp32. | |
| 3. A string that is a valid `torch.dtype`. E.g. "float32" loads the model in `torch.float32`, "float16" loads in `torch.float16` etc. | |
| <Tip> | |
| For some models the `dtype` they were trained in is unknown - you may try to check the model's paper or | |
| reach out to the authors and ask them to add this information to the model's card and to insert the | |
| `dtype` or `torch_dtype` entry in `config.json` on the hub. | |
| </Tip> | |
| - **device_map** (`str` or `dict[str, Union[int, str, torch.device]]` or `int` or `torch.device`, *optional*) -- | |
| A map that specifies where each submodule should go. It doesn't need to be refined to each | |
| parameter/buffer name, once a given module name is inside, every submodule of it will be sent to the | |
| same device. If we only pass the device (*e.g.*, `"cpu"`, `"cuda:1"`, `"mps"`, or a GPU ordinal rank | |
| like `1`) on which the model will be allocated, the device map will map the entire model to this | |
| device. Passing `device_map = 0` means put the whole model on GPU 0. | |
| To have Accelerate compute the most optimized `device_map` automatically, set `device_map="auto"`. For | |
| more information about each option see [designing a device | |
| map](https://hf.co/docs/accelerate/main/en/usage_guides/big_modeling#designing-a-device-map). | |
| - **max_memory** (`Dict`, *optional*) -- | |
| A dictionary mapping device identifiers to maximum memory, used with `device_map`. Will default to the maximum memory available for each | |
| GPU and the available CPU RAM if unset. | |
| - **tp_plan** (`Optional[Union[dict, str]]`, *optional*) -- | |
| A torch tensor parallel plan, see [here](https://pytorch.org/tutorials/intermediate/TP_tutorial.html). Use `tp_plan="auto"` to | |
| use the predefined plan based on the model. If it's a dict, then it should map module names to the desired layout. | |
| Note that if you use it, you should launch your script accordingly with `torchrun [args] script.py`. This will be much | |
| faster than using a `device_map`, but has limitations. | |
| - **tp_size** (`str`, *optional*) -- | |
| A torch tensor parallel degree. If not provided would default to world size. | |
| - **device_mesh** (`torch.distributed.DeviceMesh`, *optional*) -- | |
| A torch device mesh. If not provided would default to world size. Used only for tensor parallel for now. | |
| If provided and it has more than one dimension, it must contain a dimension named `"tp"`, which will be used for tensor parallelism. | |
| - **offload_folder** (`str` or `os.PathLike`, *optional*) -- | |
| If the `device_map` contains any value `"disk"`, the folder where we will offload weights. | |
| - **offload_buffers** (`bool`, *optional*) -- | |
| Whether or not to offload the buffers with the model parameters. | |
| - **quantization_config** (`Union[QuantizationConfigMixin,Dict]`, *optional*) -- | |
| A dictionary of configuration parameters or a QuantizationConfigMixin object for quantization (e.g., | |
| bitsandbytes, gptq). | |
| - **subfolder** (`str`, *optional*, defaults to `""`) -- | |
| In case the relevant files are located inside a subfolder of the model repo on huggingface.co, you can | |
| specify the folder name here. | |
| - **variant** (`str`, *optional*) -- | |
| If specified, load weights from the `variant` filename, *e.g.* `pytorch_model.<variant>.bin`. | |
| - **use_safetensors** (`bool`, *optional*, defaults to `None`) -- | |
| Whether or not to use `safetensors` checkpoints. Defaults to `None`. If not specified and `safetensors` | |
| is not installed, it will be set to `False`. | |
| - **weights_only** (`bool`, *optional*, defaults to `True`) -- | |
| Indicates whether unpickler should be restricted to loading only tensors, primitive types, | |
| dictionaries and any types added via torch.serialization.add_safe_globals(). | |
| When set to False, we can load wrapper tensor subclass weights. | |
| - **key_mapping** (`dict[str, str]`, *optional*) -- | |
| A potential mapping of the weight names if using a model on the Hub which is compatible with a Transformers | |
| architecture, but was not converted accordingly. | |
| - **kwargs** (remaining dictionary of keyword arguments, *optional*) -- | |
| Can be used to update the configuration object (after it is loaded) and to instantiate the model (e.g., | |
| `output_attentions=True`). Behaves differently depending on whether a `config` is provided or | |
| automatically loaded: | |
| - If a configuration is provided with `config`, `**kwargs` will be directly passed to the | |
| underlying model's `__init__` method (we assume all relevant updates to the configuration have | |
| already been done) | |
| - If a configuration is not provided, `kwargs` will be first passed to the configuration class | |
| initialization function ([from_pretrained()](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig.from_pretrained)). Each key of `kwargs` that | |
| corresponds to a configuration attribute will be used to override said attribute with the | |
| supplied `kwargs` value. Remaining keys that do not correspond to any configuration attribute | |
| will be passed to the underlying model's `__init__` function.</paramsdesc1><paramgroups>1</paramgroups></docstring> | |
| Instantiate a pretrained PyTorch model from a pre-trained model configuration. | |
| The model is set in evaluation mode by default using `model.eval()` (Dropout modules are deactivated). To train | |
| the model, you should first set it back in training mode with `model.train()`. | |
| The warning *Weights from XXX not initialized from pretrained model* means that the weights of XXX do not come | |
| pretrained with the rest of the model. It is up to you to train those weights with a downstream fine-tuning | |
| task. | |
| The warning *Weights from XXX not used in YYY* means that the layer XXX is not used by YYY, therefore those | |
| weights are discarded. | |
| <Tip> | |
| Activate the special ["offline-mode"](https://huggingface.co/transformers/installation.html#offline-mode) to | |
| use this method in a firewalled environment. | |
| </Tip> | |
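The `kwargs` parameter above is routed differently depending on whether a `config` is passed. The sketch below illustrates that rule with a toy configuration class. `ToyConfig` and `route_kwargs` are hypothetical names, and the code is a simplified illustration of the documented behavior rather than the library's loading logic.

```python
class ToyConfig:
    """Hypothetical stand-in for a configuration class with two attributes."""

    def __init__(self, hidden_size=768, output_attentions=False):
        self.hidden_size = hidden_size
        self.output_attentions = output_attentions


def route_kwargs(kwargs, config=None):
    """Return (config, model_init_kwargs) following the documented rule."""
    if config is not None:
        # A config was provided: it is left untouched and every extra keyword
        # goes straight to the model's __init__.
        return config, dict(kwargs)

    # No config provided: load one (here, a default ToyConfig), let matching
    # keys override its attributes, and forward the remaining keys to __init__.
    config = ToyConfig()
    model_init_kwargs = {}
    for key, value in kwargs.items():
        if hasattr(config, key):
            setattr(config, key, value)
        else:
            model_init_kwargs[key] = value
    return config, model_init_kwargs


# "output_attentions" matches a config attribute; "my_flag" does not.
config, remaining = route_kwargs({"output_attentions": True, "my_flag": 1})
```

After this call, `config.output_attentions` has been overridden and `remaining` holds only `my_flag`, which would be handed to the model constructor.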
| <ExampleCodeBlock anchor="transformers.PreTrainedModel.from_pretrained.example"> | |
| Examples: | |
| ```python | |
| >>> from transformers import BertConfig, BertModel | |
| >>> # Download model and configuration from huggingface.co and cache. | |
| >>> model = BertModel.from_pretrained("google-bert/bert-base-uncased") | |
| >>> # Model was saved using *save_pretrained('./test/saved_model/')* (for example purposes, not runnable). | |
| >>> model = BertModel.from_pretrained("./test/saved_model/") | |
| >>> # Update configuration during loading. | |
| >>> model = BertModel.from_pretrained("google-bert/bert-base-uncased", output_attentions=True) | |
| >>> assert model.config.output_attentions == True | |
| ``` | |
| </ExampleCodeBlock> | |
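The kernel reference form accepted by `attn_implementation`, `<namespace>/<repo_name>[@<revision>][:<kernel_name>]`, can be illustrated with a small parser. This is a sketch under stated assumptions: `parse_kernel_ref` and its regular expression are hypothetical, not the pattern used by the library, and the sketch additionally excludes `@` from the namespace and repository name so the optional revision can be separated unambiguously.

```python
import re

# Assumed pattern: namespace/repo_name, then optional @revision, then optional :kernel_name.
_KERNEL_REF = re.compile(
    r"(?P<namespace>[^/:@|]+)/(?P<repo_name>[^/:@|]+)"
    r"(?:@(?P<revision>[^:@|]+))?"
    r"(?::(?P<kernel_name>[^/:@|]+))?"
)


def parse_kernel_ref(ref):
    """Return the parts of a kernel reference as a dict, or None if it does not match."""
    # Drop an optional leading "<wrapper>|" prefix such as "flash|".
    _, _, ref = ref.rpartition("|")
    match = _KERNEL_REF.fullmatch(ref)
    return match.groupdict() if match else None


full = parse_kernel_ref("org/model@v1.2.3:custom_kernel")
wrapped = parse_kernel_ref("flash|org/model@main")
builtin = parse_kernel_ref("sdpa")  # no "/" present, so this is not a kernel reference
```

Because the revision group precedes the kernel-name group in the pattern, a string that places `:kernel_name` before `@revision` is rejected, matching the ordering rule stated in the parameter description.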
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>get_compiled_call</name><anchor>transformers.PreTrainedModel.get_compiled_call</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L5057</source><parameters>[{"name": "compile_config", "val": ": typing.Optional[transformers.generation.configuration_utils.CompileConfig]"}]</parameters></docstring> | |
| Return a `torch.compile`'d version of `self.__call__`. This is useful to dynamically choose between | |
| non-compiled/compiled `forward` during inference, especially to switch between prefill (where we don't | |
| want to use compiled version to avoid recomputing the graph with new shapes) and iterative decoding | |
| (where we want the speed-ups of compiled version with static shapes). | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>get_decoder</name><anchor>transformers.PreTrainedModel.get_decoder</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2584</source><parameters>[]</parameters></docstring> | |
| Best-effort lookup of the *decoder* module. | |
| Order of attempts (covers ~85 % of current usages): | |
| 1. `self.decoder` | |
| 2. `self.model` (many wrappers store the decoder here) | |
| 3. `self.model.get_decoder()` (nested wrappers) | |
| 4. fallback: raise for the few exotic models that need a bespoke rule | |
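The lookup order above can be sketched as a standalone function. `find_decoder` is a hypothetical name, and the way attempts 2 and 3 are told apart (preferring the inner object's own `get_decoder()` when it has one) is an assumption of this sketch, not a statement about the library's code.

```python
from types import SimpleNamespace


def find_decoder(model):
    """Best-effort decoder lookup following the documented order of attempts."""
    # 1. A direct `decoder` attribute wins.
    if hasattr(model, "decoder"):
        return model.decoder
    # 2./3. Otherwise look at `model.model`; if that inner object can resolve
    # its own decoder (nested wrapper), delegate to it, else return it as is.
    inner = getattr(model, "model", None)
    if inner is not None:
        if callable(getattr(inner, "get_decoder", None)):
            return inner.get_decoder()
        return inner
    # 4. Nothing matched: a bespoke rule would be needed.
    raise AttributeError("could not locate a decoder on this model")


# Toy objects standing in for real models.
decoder = SimpleNamespace(name="decoder")
seq2seq = SimpleNamespace(decoder=decoder)
seq2seq.get_decoder = lambda: find_decoder(seq2seq)
nested_wrapper = SimpleNamespace(model=seq2seq)   # resolves through get_decoder()
plain_wrapper = SimpleNamespace(model=decoder)    # `model` is the decoder itself
```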
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>get_memory_footprint</name><anchor>transformers.PreTrainedModel.get_memory_footprint</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L3889</source><parameters>[{"name": "return_buffers", "val": " = True"}]</parameters><paramsdesc>- **return_buffers** (`bool`, *optional*, defaults to `True`) -- | |
| Whether to return the size of the buffer tensors in the computation of the memory footprint. Buffers | |
| are tensors that do not require gradients and are not registered as parameters. E.g. mean and std in batch | |
| norm layers. Please see: https://discuss.pytorch.org/t/what-pytorch-means-by-buffers/120266/2</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Get the memory footprint of a model. This will return the memory footprint of the current model in bytes. | |
| Useful to benchmark the memory footprint of the current model and design some tests. Solution inspired from the | |
| PyTorch discussions: https://discuss.pytorch.org/t/gpu-memory-that-model-uses/56822/2 | |
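The arithmetic behind a footprint in bytes is the sum, over tensors, of element count times bytes per element, with buffers included only when `return_buffers` is true. The sketch below shows that calculation on plain Python data. `TensorSpec` and `memory_footprint` are hypothetical stand-ins so the example runs without PyTorch, and they are not the method's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class TensorSpec:
    """Hypothetical stand-in for a tensor: element count and bytes per element."""
    numel: int
    element_size: int


def memory_footprint(parameters, buffers, return_buffers=True):
    """Sum numel * element_size over parameters, optionally adding buffers."""
    total = sum(t.numel * t.element_size for t in parameters)
    if return_buffers:
        total += sum(t.numel * t.element_size for t in buffers)
    return total


# A float32 weight matrix (4 bytes per element) and a float16 bias (2 bytes per element).
params = [TensorSpec(numel=768 * 768, element_size=4), TensorSpec(numel=768, element_size=2)]
# A float32 running statistic, as in a batch norm layer.
bufs = [TensorSpec(numel=768, element_size=4)]

with_buffers = memory_footprint(params, bufs)
without_buffers = memory_footprint(params, bufs, return_buffers=False)
```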
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>get_parameter_or_buffer</name><anchor>transformers.PreTrainedModel.get_parameter_or_buffer</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L5189</source><parameters>[{"name": "target", "val": ": str"}]</parameters></docstring> | |
| Return the parameter or buffer given by `target` if it exists, otherwise throw an error. This combines | |
| `get_parameter()` and `get_buffer()` in a single handy function. If the target is an `_extra_state` attribute, | |
| it will return the extra state provided by the module. Note that it only works if `target` is a leaf of the model. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>gradient_checkpointing_disable</name><anchor>transformers.PreTrainedModel.gradient_checkpointing_disable</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L3421</source><parameters>[]</parameters></docstring> | |
| Deactivates gradient checkpointing for the current model. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>gradient_checkpointing_enable</name><anchor>transformers.PreTrainedModel.gradient_checkpointing_enable</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L3360</source><parameters>[{"name": "gradient_checkpointing_kwargs", "val": " = None"}]</parameters><paramsdesc>- **gradient_checkpointing_kwargs** (dict, *optional*) -- | |
| Additional keyword arguments passed along to the `torch.utils.checkpoint.checkpoint` function.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Activates gradient checkpointing for the current model. | |
| We pass the `__call__` method of the modules instead of `forward` because `__call__` attaches all the hooks of | |
| the module. https://discuss.pytorch.org/t/any-different-between-model-input-and-model-forward-input/3690/2 | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>init_weights</name><anchor>transformers.PreTrainedModel.init_weights</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L3347</source><parameters>[]</parameters></docstring> | |
| Maybe initializes weights. If using a custom `PreTrainedModel`, you need to implement any | |
| initialization logic in `_init_weights`. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>initialize_weights</name><anchor>transformers.PreTrainedModel.initialize_weights</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2673</source><parameters>[]</parameters></docstring> | |
| This is equivalent to calling `self.apply(self._initialize_weights)`, but correctly handles composite models. | |
| This function dynamically dispatches the correct `init_weights` function to the modules as we advance in the | |
| module graph along the recursion. It can handle an arbitrary number of sub-models. Without it, every composite | |
| model would have to recurse a second time on all sub-models explicitly in the outer-most `_init_weights`, which | |
| is extremely error prone and inefficient. | |
| Note that the `torch.no_grad()` decorator is very important as well, as most of our `_init_weights` do not use | |
| `torch.nn.init` functions (which are all no_grad by default), but simply do in-place ops such as | |
| `module.weight.data.zero_()`. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>post_init</name><anchor>transformers.PreTrainedModel.post_init</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1853</source><parameters>[]</parameters></docstring> | |
| A method executed at the end of each Transformer model initialization, to execute code that needs the model's | |
| modules properly initialized (such as weight initialization). | |
| This is also used when running distributed code: hooks are added to the modules here, according to | |
| the model's `tp_plan`. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>register_for_auto_class</name><anchor>transformers.PreTrainedModel.register_for_auto_class</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L4920</source><parameters>[{"name": "auto_class", "val": " = 'AutoModel'"}]</parameters><paramsdesc>- **auto_class** (`str` or `type`, *optional*, defaults to `"AutoModel"`) -- | |
| The auto class to register this new model with.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Register this class with a given auto class. This should only be used for custom models as the ones in the | |
| library are already mapped with an auto class. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>resize_token_embeddings</name><anchor>transformers.PreTrainedModel.resize_token_embeddings</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2882</source><parameters>[{"name": "new_num_tokens", "val": ": typing.Optional[int] = None"}, {"name": "pad_to_multiple_of", "val": ": typing.Optional[int] = None"}, {"name": "mean_resizing", "val": ": bool = True"}]</parameters><paramsdesc>- **new_num_tokens** (`int`, *optional*) -- | |
| The new number of tokens in the embedding matrix. Increasing the size will add newly initialized | |
| vectors at the end. Reducing the size will remove vectors from the end. If not provided or `None`, just | |
| returns a pointer to the input tokens `torch.nn.Embedding` module of the model without doing anything. | |
| - **pad_to_multiple_of** (`int`, *optional*) -- | |
| If set, pads the embedding matrix to a multiple of the provided value. If `new_num_tokens` is set to | |
| `None`, the embedding is simply padded to a multiple of `pad_to_multiple_of`. | |
| This is especially useful to enable the use of Tensor Cores on NVIDIA hardware with compute capability | |
| `>= 7.0` (Volta), or on TPUs which benefit from having sequence lengths be a multiple of 128. For more | |
| details about this, or help on choosing the correct value for resizing, refer to this guide: | |
| https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc | |
| - **mean_resizing** (`bool`) -- | |
| Whether to initialize the added embeddings from a multivariate normal distribution that has old embeddings' mean and | |
| covariance or to initialize them with a normal distribution that has a mean of zero and std equals `config.initializer_range`. | |
| Setting `mean_resizing` to `True` is useful when increasing the size of the embeddings of causal language models, | |
| where the generated tokens' probabilities won't be affected by the added embeddings because initializing the new embeddings with the | |
| old embeddings' mean will reduce the kl-divergence between the next token probability before and after adding the new embeddings. | |
| Refer to this article for more information: https://nlp.stanford.edu/~johnhew/vocab-expansion.html</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.nn.Embedding`</rettype><retdesc>Pointer to the input tokens Embeddings Module of the model.</retdesc></docstring> | |
| Resizes input token embeddings matrix of the model if `new_num_tokens != config.vocab_size`. | |
| Takes care of tying weights embeddings afterwards if the model class has a `tie_weights()` method. | |
| </div> | |
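The effect of `pad_to_multiple_of` on the final number of embedding rows comes down to rounding up. The following is a minimal sketch of that arithmetic only, assuming the size is rounded up to the next multiple; the helper name `padded_vocab_size` is hypothetical and not part of the library.

```python
from typing import Optional


def padded_vocab_size(new_num_tokens: int, pad_to_multiple_of: Optional[int] = None) -> int:
    """Round a vocabulary size up to the next multiple of `pad_to_multiple_of`.

    Hypothetical helper for illustration; it does not touch any model.
    """
    if pad_to_multiple_of is None:
        return new_num_tokens
    if pad_to_multiple_of <= 0:
        raise ValueError("pad_to_multiple_of must be a positive integer")
    # Ceiling division, then scale back up to the multiple.
    return -(-new_num_tokens // pad_to_multiple_of) * pad_to_multiple_of


# A 50257-token vocabulary padded to a multiple of 64.
print(padded_vocab_size(50257, 64))
```

With a real model, the equivalent call would be `model.resize_token_embeddings(new_num_tokens, pad_to_multiple_of=64)`, using the parameters documented above.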
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>save_pretrained</name><anchor>transformers.PreTrainedModel.save_pretrained</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L3448</source><parameters>[{"name": "save_directory", "val": ": typing.Union[str, os.PathLike]"}, {"name": "is_main_process", "val": ": bool = True"}, {"name": "state_dict", "val": ": typing.Optional[dict] = None"}, {"name": "save_function", "val": ": Callable = <function save at 0x7fa464b22ef0>"}, {"name": "push_to_hub", "val": ": bool = False"}, {"name": "max_shard_size", "val": ": typing.Union[int, str] = '5GB'"}, {"name": "safe_serialization", "val": ": bool = True"}, {"name": "variant", "val": ": typing.Optional[str] = None"}, {"name": "token", "val": ": typing.Union[str, bool, NoneType] = None"}, {"name": "save_peft_format", "val": ": bool = True"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **save_directory** (`str` or `os.PathLike`) -- | |
| Directory to which to save. Will be created if it doesn't exist. | |
| - **is_main_process** (`bool`, *optional*, defaults to `True`) -- | |
| Whether the process calling this is the main process or not. Useful in distributed training (for example | |
| on TPUs) when this function needs to be called on all processes. In this case, set `is_main_process=True` | |
| only on the main process to avoid race conditions. | |
| - **state_dict** (nested dictionary of `torch.Tensor`) -- | |
| The state dictionary of the model to save. Will default to `self.state_dict()`, but can be used to only | |
| save parts of the model or if special precautions need to be taken when recovering the state dictionary | |
| of a model (like when using model parallelism). | |
| - **save_function** (`Callable`) -- | |
| The function to use to save the state dictionary. Useful in distributed training (for example on TPUs) | |
| when one needs to replace `torch.save` with another method. | |
| - **push_to_hub** (`bool`, *optional*, defaults to `False`) -- | |
| Whether or not to push your model to the Hugging Face model hub after saving it. You can specify the | |
| repository you want to push to with `repo_id` (will default to the name of `save_directory` in your | |
| namespace). | |
| - **max_shard_size** (`int` or `str`, *optional*, defaults to `"5GB"`) -- | |
| The maximum size for a checkpoint before being sharded. Each checkpoint shard will then be smaller | |
| than this size. If expressed as a string, it needs to be digits followed by a unit (like `"5MB"`). | |
| The default is 5GB so that models can run easily on free-tier Google Colab instances | |
| without CPU OOM issues. | |
| <Tip warning={true}> | |
| If a single weight of the model is bigger than `max_shard_size`, it will be in its own checkpoint shard | |
| which will be bigger than `max_shard_size`. | |
| </Tip> | |
| - **safe_serialization** (`bool`, *optional*, defaults to `True`) -- | |
| Whether to save the model using `safetensors` or the traditional PyTorch way (that uses `pickle`). | |
| - **variant** (`str`, *optional*) -- | |
| If specified, weights are saved in the format `pytorch_model.<variant>.bin`. | |
| - **token** (`str` or `bool`, *optional*) -- | |
| The token to use as HTTP bearer authorization for remote files. If `True`, or not specified, will use | |
| the token generated when running `hf auth login` (stored in `~/.huggingface`). | |
| - **save_peft_format** (`bool`, *optional*, defaults to `True`) -- | |
| For backward compatibility with the PEFT library, in case adapter weights are attached to the model, all | |
| keys of the adapters' state dict need to be prefixed with `base_model.model`. Advanced users can | |
| disable this behavior by setting `save_peft_format` to `False`. | |
| - **kwargs** (`dict[str, Any]`, *optional*) -- | |
| Additional keyword arguments passed along to the [push_to_hub()](/docs/transformers/pr_33892/en/main_classes/model#transformers.utils.PushToHubMixin.push_to_hub) method.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Save a model and its configuration file to a directory, so that it can be re-loaded using the | |
| [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) class method. | |
| </div> | |
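The sharding rule described for `max_shard_size` can be sketched as a greedy grouping of weights by size. This is an illustration under stated assumptions, not the library's implementation: the helper names `parse_size` and `plan_shards` are hypothetical, units are assumed to be decimal (`"5MB"` is 5,000,000 bytes), and weights are assumed to be packed in iteration order.

```python
import re
from typing import Union

# Assumption: decimal units. The library may interpret units differently.
UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9}


def parse_size(size: Union[int, str]) -> int:
    """Convert an int or a string such as "5GB" into a number of bytes."""
    if isinstance(size, int):
        return size
    match = re.fullmatch(r"(\d+)\s*(KB|MB|GB)", size.strip().upper())
    if match is None:
        raise ValueError(f"Expected digits followed by a unit, got {size!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def plan_shards(weight_sizes: dict, max_shard_size: Union[int, str]) -> list:
    """Greedily group weight names into shards that stay within the limit.

    A single weight larger than the limit ends up alone in an oversized shard.
    """
    limit = parse_size(max_shard_size)
    shards, current, current_size = [], [], 0
    for name, size in weight_sizes.items():
        # Start a new shard when adding this weight would exceed the limit.
        if current and current_size + size > limit:
            shards.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        shards.append(current)
    return shards


sizes = {"a": 3_000_000, "b": 3_000_000, "big": 8_000_000, "c": 1_000_000}
print(plan_shards(sizes, "5MB"))
```

In this example the 8 MB weight `big` is larger than the 5 MB limit, so it gets a shard of its own, matching the warning in the parameter description.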
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>set_attn_implementation</name><anchor>transformers.PreTrainedModel.set_attn_implementation</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2468</source><parameters>[{"name": "attn_implementation", "val": ": typing.Union[str, dict]"}]</parameters><paramsdesc>- **attn_implementation** (`str` or `dict`) -- | |
| The attention implementation to set for this model. It can be either a `str`, in which case it will be | |
| dispatched to all submodels if relevant, or a `dict` where keys are the sub_configs name, in which case each | |
| submodel will dispatch the corresponding value.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Set the requested `attn_implementation` for this model. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>set_decoder</name><anchor>transformers.PreTrainedModel.set_decoder</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2609</source><parameters>[{"name": "decoder", "val": ""}]</parameters></docstring> | |
| Symmetric setter. Mirrors the lookup logic used in `get_decoder`. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>tie_embeddings_and_encoder_decoder</name><anchor>transformers.PreTrainedModel.tie_embeddings_and_encoder_decoder</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2704</source><parameters>[]</parameters></docstring> | |
| If set in the config, tie the weights between the input embeddings and the output embeddings, | |
| and the encoder and decoder. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>tie_weights</name><anchor>transformers.PreTrainedModel.tie_weights</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L2725</source><parameters>[]</parameters></docstring> | |
| Recursively (for all submodels) tie all the weights of the model. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>upcast_modules_in_fp32</name><anchor>transformers.PreTrainedModel.upcast_modules_in_fp32</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L5222</source><parameters>[{"name": "hf_quantizer", "val": ": transformers.quantizers.base.HfQuantizer | None"}, {"name": "dtype", "val": ": dtype"}]</parameters></docstring> | |
| Upcast modules defined in `_keep_in_fp32_modules` and `_keep_in_fp32_modules_strict` in fp32, if | |
| `dtype` is different than fp32. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>warn_if_padding_and_no_attention_mask</name><anchor>transformers.PreTrainedModel.warn_if_padding_and_no_attention_mask</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L4942</source><parameters>[{"name": "input_ids", "val": ""}, {"name": "attention_mask", "val": ""}]</parameters></docstring> | |
| Shows a one-time warning if the input_ids appear to contain padding and no attention mask was given. | |
| </div></div> | |
| Custom models should also define a `_supports_assign_param_buffer` attribute, which determines whether superfast init can be applied | |
| to that model. A failing `test_save_and_load_from_pretrained` test is a sign that your model needs it; in that case, | |
| set it to `False`. | |
| ## ModuleUtilsMixin[[transformers.modeling_utils.ModuleUtilsMixin]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.modeling_utils.ModuleUtilsMixin</name><anchor>transformers.modeling_utils.ModuleUtilsMixin</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1280</source><parameters>[]</parameters></docstring> | |
| A few utilities for `torch.nn.Modules`, to be used as a mixin. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>add_memory_hooks</name><anchor>transformers.modeling_utils.ModuleUtilsMixin.add_memory_hooks</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1311</source><parameters>[]</parameters></docstring> | |
| Add a memory hook before and after each sub-module forward pass to record increase in memory consumption. | |
| Increase in memory consumption is stored in a `mem_rss_diff` attribute for each module and can be reset to zero | |
| with `model.reset_memory_hooks_state()`. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>estimate_tokens</name><anchor>transformers.modeling_utils.ModuleUtilsMixin.estimate_tokens</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1506</source><parameters>[{"name": "input_dict", "val": ": dict"}]</parameters><paramsdesc>- **input_dict** (`dict`) -- The model inputs.</paramsdesc><paramgroups>0</paramgroups><rettype>`int`</rettype><retdesc>The total number of tokens.</retdesc></docstring> | |
| Helper function to estimate the total number of tokens from the model inputs. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>floating_point_ops</name><anchor>transformers.modeling_utils.ModuleUtilsMixin.floating_point_ops</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1527</source><parameters>[{"name": "input_dict", "val": ": dict"}, {"name": "exclude_embeddings", "val": ": bool = True"}]</parameters><paramsdesc>- **input_dict** (`dict`) -- | |
| The model inputs. | |
| - **exclude_embeddings** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not to exclude embedding and softmax operations from the count.</paramsdesc><paramgroups>0</paramgroups><rettype>`int`</rettype><retdesc>The number of floating-point operations.</retdesc></docstring> | |
| Get number of (optionally, non-embeddings) floating-point operations for the forward and backward passes of a | |
| batch with this transformer model. Default approximation neglects the quadratic dependency on the number of | |
| tokens (valid if `12 * d_model >> sequence_length`) as laid out in [this | |
| paper](https://huggingface.co/papers/2001.08361) section 2.1. Should be overridden for transformers with parameter | |
| re-use e.g. Albert or Universal Transformers, or if doing long-range modeling with very high sequence lengths. | |
| </div> | |
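The approximation in section 2.1 of the cited paper amounts to roughly 6 floating-point operations per parameter per token for a combined forward and backward pass. The sketch below only illustrates that arithmetic; the function name `approx_training_flops` is hypothetical, and the token and parameter counts are example values rather than measurements.

```python
def approx_training_flops(num_tokens: int, num_parameters: int) -> int:
    """Estimate forward + backward FLOPs as 6 * tokens * parameters.

    This ignores the term that grows quadratically with sequence length,
    so it is only reasonable for moderate sequence lengths.
    """
    return 6 * num_tokens * num_parameters


# Example values: a batch of 8 sequences of 512 tokens, 110M parameters.
batch_size, sequence_length = 8, 512
print(approx_training_flops(batch_size * sequence_length, 110_000_000))
```

On a real model, the two inputs would correspond to the values returned by `estimate_tokens` and `num_parameters`.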
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>get_extended_attention_mask</name><anchor>transformers.modeling_utils.ModuleUtilsMixin.get_extended_attention_mask</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1396</source><parameters>[{"name": "attention_mask", "val": ": Tensor"}, {"name": "input_shape", "val": ": tuple"}, {"name": "device", "val": ": typing.Optional[torch.device] = None"}, {"name": "dtype", "val": ": typing.Optional[torch.dtype] = None"}]</parameters><paramsdesc>- **attention_mask** (`torch.Tensor`) -- | |
| Mask with ones indicating tokens to attend to, zeros for tokens to ignore. | |
| - **input_shape** (`tuple[int]`) -- | |
| The shape of the input to the model.</paramsdesc><paramgroups>0</paramgroups><retdesc>`torch.Tensor` The extended attention mask, with the same dtype as `attention_mask.dtype`.</retdesc></docstring> | |
| Makes broadcastable attention and causal masks so that future and masked tokens are ignored. | |
| </div> | |
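To make the idea of an additive mask concrete, here is a minimal pure-Python sketch for a single sequence, assuming a decoder-style model where causal masking applies. It is not the library's implementation, which works on batched tensors and uses the minimum value of the target dtype; the function name `additive_causal_mask` and the default `masked_value` are hypothetical.

```python
def additive_causal_mask(attention_mask: list, masked_value: float = -1e9) -> list:
    """Build a [seq_len][seq_len] additive mask for one sequence.

    Entry [query][key] is 0.0 when the query may attend to the key, and
    `masked_value` when the key is padding or lies in the future.
    """
    seq_len = len(attention_mask)
    return [
        [
            0.0 if (key <= query and attention_mask[key] == 1) else masked_value
            for key in range(seq_len)
        ]
        for query in range(seq_len)
    ]


# Two real tokens followed by one padding token.
for row in additive_causal_mask([1, 1, 0], masked_value=-1.0):
    print(row)
```

Adding such a mask to the attention scores before the softmax drives the masked positions toward zero weight, which is how both padding and future tokens end up ignored.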
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>invert_attention_mask</name><anchor>transformers.modeling_utils.ModuleUtilsMixin.invert_attention_mask</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1347</source><parameters>[{"name": "encoder_attention_mask", "val": ": Tensor"}]</parameters><paramsdesc>- **encoder_attention_mask** (`torch.Tensor`) -- An attention mask.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.Tensor`</rettype><retdesc>The inverted attention mask.</retdesc></docstring> | |
| Invert an attention mask (e.g., switches 0. and 1.). | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>num_parameters</name><anchor>transformers.modeling_utils.ModuleUtilsMixin.num_parameters</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1452</source><parameters>[{"name": "only_trainable", "val": ": bool = False"}, {"name": "exclude_embeddings", "val": ": bool = False"}]</parameters><paramsdesc>- **only_trainable** (`bool`, *optional*, defaults to `False`) -- | |
| Whether or not to return only the number of trainable parameters. | |
| - **exclude_embeddings** (`bool`, *optional*, defaults to `False`) -- | |
| Whether or not to return only the number of non-embedding parameters.</paramsdesc><paramgroups>0</paramgroups><rettype>`int`</rettype><retdesc>The number of parameters.</retdesc></docstring> | |
| Get number of (optionally, trainable or non-embeddings) parameters in the module. | |
| </div> | |
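The two flags of `num_parameters` act as independent filters on the set of parameters being counted. The sketch below shows that filtering logic on plain Python data, without any real model; the `ParamInfo` class and the `count_parameters` function are hypothetical stand-ins, and how the library decides which parameters count as embeddings is not modeled here.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class ParamInfo:
    """Hypothetical stand-in for a model parameter."""
    shape: tuple
    requires_grad: bool = True
    is_embedding: bool = False


def count_parameters(params, only_trainable: bool = False, exclude_embeddings: bool = False) -> int:
    """Sum the element counts of the parameters that pass both filters."""
    total = 0
    for param in params:
        if only_trainable and not param.requires_grad:
            continue  # skip frozen parameters
        if exclude_embeddings and param.is_embedding:
            continue  # skip embedding matrices
        total += prod(param.shape)
    return total


params = [
    ParamInfo(shape=(10, 4), is_embedding=True),  # 40 elements
    ParamInfo(shape=(4, 4)),                      # 16 elements
    ParamInfo(shape=(4,), requires_grad=False),   # 4 elements, frozen
]
print(count_parameters(params))
print(count_parameters(params, only_trainable=True, exclude_embeddings=True))
```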
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>reset_memory_hooks_state</name><anchor>transformers.modeling_utils.ModuleUtilsMixin.reset_memory_hooks_state</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/modeling_utils.py#L1323</source><parameters>[]</parameters></docstring> | |
| Reset the `mem_rss_diff` attribute of each module (see [add_memory_hooks()](/docs/transformers/pr_33892/en/main_classes/model#transformers.modeling_utils.ModuleUtilsMixin.add_memory_hooks)). | |
| </div></div> | |
| ## Pushing to the Hub[[transformers.utils.PushToHubMixin]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.utils.PushToHubMixin</name><anchor>transformers.utils.PushToHubMixin</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/utils/hub.py#L696</source><parameters>[]</parameters></docstring> | |
| A Mixin containing the functionality to push a model or tokenizer to the hub. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>push_to_hub</name><anchor>transformers.utils.PushToHubMixin.push_to_hub</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/utils/hub.py#L821</source><parameters>[{"name": "repo_id", "val": ": str"}, {"name": "use_temp_dir", "val": ": bool | None = None"}, {"name": "commit_message", "val": ": str | None = None"}, {"name": "private", "val": ": bool | None = None"}, {"name": "token", "val": ": bool | str | None = None"}, {"name": "max_shard_size", "val": ": int | str | None = '5GB'"}, {"name": "create_pr", "val": ": bool = False"}, {"name": "safe_serialization", "val": ": bool = True"}, {"name": "revision", "val": ": str | None = None"}, {"name": "commit_description", "val": ": str | None = None"}, {"name": "tags", "val": ": list[str] | None = None"}, {"name": "**deprecated_kwargs", "val": ""}]</parameters><paramsdesc>- **repo_id** (`str`) -- | |
| The name of the repository you want to push your {object} to. It should contain your organization name | |
| when pushing to a given organization. | |
| - **use_temp_dir** (`bool`, *optional*) -- | |
| Whether or not to use a temporary directory to store the files saved before they are pushed to the Hub. | |
| Will default to `True` if there is no directory named like `repo_id`, `False` otherwise. | |
| - **commit_message** (`str`, *optional*) -- | |
| Message to commit while pushing. Will default to `"Upload {object}"`. | |
| - **private** (`bool`, *optional*) -- | |
| Whether to make the repo private. If `None` (default), the repo will be public unless the organization's default is private. This value is ignored if the repo already exists. | |
| - **token** (`bool` or `str`, *optional*) -- | |
| The token to use as HTTP bearer authorization for remote files. If `True`, will use the token generated | |
| when running `hf auth login` (stored in `~/.huggingface`). Will default to `True` if `repo_url` | |
| is not specified. | |
| - **max_shard_size** (`int` or `str`, *optional*, defaults to `"5GB"`) -- | |
| Only applicable for models. The maximum size for a checkpoint before being sharded. Each checkpoint shard | |
| will then be smaller than this size. If expressed as a string, it needs to be digits followed | |
| by a unit (like `"5MB"`). We default it to `"5GB"` so that users can easily load models on free-tier | |
| Google Colab instances without any CPU OOM issues. | |
| - **create_pr** (`bool`, *optional*, defaults to `False`) -- | |
| Whether or not to create a PR with the uploaded files or directly commit. | |
| - **safe_serialization** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not to convert the model weights to the safetensors format for safer serialization. | |
| - **revision** (`str`, *optional*) -- | |
| Branch to push the uploaded files to. | |
| - **commit_description** (`str`, *optional*) -- | |
| The description of the commit that will be created. | |
| - **tags** (`list[str]`, *optional*) -- | |
| List of tags to push on the Hub.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Upload the {object_files} to the 🤗 Model Hub. | |
| <ExampleCodeBlock anchor="transformers.utils.PushToHubMixin.push_to_hub.example"> | |
| Examples: | |
| ```python | |
| from transformers import {object_class} | |
| {object} = {object_class}.from_pretrained("google-bert/bert-base-cased") | |
| # Push the {object} to your namespace with the name "my-finetuned-bert". | |
| {object}.push_to_hub("my-finetuned-bert") | |
| # Push the {object} to an organization with the name "my-finetuned-bert". | |
| {object}.push_to_hub("huggingface/my-finetuned-bert") | |
| ``` | |
| </ExampleCodeBlock> | |
| </div></div> | |