| # Big Transfer (BiT) | |
| <div class="flex flex-wrap space-x-1"> | |
| <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white"> | |
| </div> | |
| ## Overview | |
| The BiT model was proposed in [Big Transfer (BiT): General Visual Representation Learning](https://huggingface.co/papers/1912.11370) by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby. | |
| BiT is a simple recipe for scaling up pre-training of [ResNet](resnet)-like architectures (specifically, ResNetv2). The method results in significant improvements for transfer learning. | |
| The abstract from the paper is the following: | |
| *Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision. We revisit the paradigm of pre-training on large supervised datasets and fine-tuning the model on a target task. We scale up pre-training, and propose a simple recipe that we call Big Transfer (BiT). By combining a few carefully selected components, and transferring using a simple heuristic, we achieve strong performance on over 20 datasets. BiT performs well across a surprisingly wide range of data regimes -- from 1 example per class to 1M total examples. BiT achieves 87.5% top-1 accuracy on ILSVRC-2012, 99.4% on CIFAR-10, and 76.3% on the 19 task Visual Task Adaptation Benchmark (VTAB). On small datasets, BiT attains 76.8% on ILSVRC-2012 with 10 examples per class, and 97.0% on CIFAR-10 with 10 examples per class. We conduct detailed analysis of the main components that lead to high transfer performance.* | |
| This model was contributed by [nielsr](https://huggingface.co/nielsr). | |
| The original code can be found [here](https://github.com/google-research/big_transfer). | |
| ## Usage tips | |
| - BiT models are architecturally equivalent to ResNetv2, except that (1) all batch normalization layers are replaced by [group normalization](https://huggingface.co/papers/1803.08494) and | |
| (2) [weight standardization](https://huggingface.co/papers/1903.10520) is applied to the convolutional layers. The authors show that combining the two is useful for training with large batch sizes and has a significant | |
| impact on transfer learning. | |
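The two normalization changes above can be sketched in NumPy. This is an illustrative re-implementation of the general techniques (weight standardization of a convolution kernel, group normalization of activations), not the code used by `BitModel`; the helper names `standardize_weight` and `group_norm` and the `eps` values are assumptions chosen for the example.

```python
import numpy as np


def standardize_weight(weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Weight standardization for a conv kernel of shape (out_ch, in_ch, kh, kw).

    Each output filter is shifted to zero mean and scaled to unit variance.
    The eps value is illustrative, not necessarily the one BitModel uses.
    """
    flat = weight.reshape(weight.shape[0], -1)
    mean = flat.mean(axis=1, keepdims=True)
    var = flat.var(axis=1, keepdims=True)  # biased variance over in_ch * kh * kw
    return ((flat - mean) / np.sqrt(var + eps)).reshape(weight.shape)


def group_norm(x: np.ndarray, num_groups: int, gamma: np.ndarray, beta: np.ndarray,
               eps: float = 1e-5) -> np.ndarray:
    """Group normalization for activations of shape (N, C, H, W)."""
    n, c, h, w = x.shape
    assert c % num_groups == 0, "channels must be divisible by num_groups"
    grouped = x.reshape(n, num_groups, c // num_groups, h, w)
    # Statistics are computed per sample and per group, never across the batch.
    mean = grouped.mean(axis=(2, 3, 4), keepdims=True)
    var = grouped.var(axis=(2, 3, 4), keepdims=True)
    normed = ((grouped - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)
    # Learnable per-channel scale and shift.
    return normed * gamma.reshape(1, c, 1, 1) + beta.reshape(1, c, 1, 1)


rng = np.random.default_rng(0)
kernel = rng.normal(loc=0.5, scale=3.0, size=(64, 3, 7, 7))
std_kernel = standardize_weight(kernel)

acts = rng.normal(loc=2.0, scale=5.0, size=(2, 64, 8, 8))
out = group_norm(acts, num_groups=32, gamma=np.ones(64), beta=np.zeros(64))
```

Because the group-norm statistics are computed per sample, the normalized output for one image does not depend on the other images in the batch, unlike batch normalization.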
| ## Resources | |
| A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with BiT. | |
| <PipelineTag pipeline="image-classification"/> | |
| - [BitForImageClassification](/docs/transformers/pr_33892/en/model_doc/bit#transformers.BitForImageClassification) is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb). | |
| - See also: [Image classification task guide](../tasks/image_classification) | |
| If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource. | |
| ## BitConfig[[transformers.BitConfig]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.BitConfig</name><anchor>transformers.BitConfig</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/bit/configuration_bit.py#L25</source><parameters>[{"name": "num_channels", "val": " = 3"}, {"name": "embedding_size", "val": " = 64"}, {"name": "hidden_sizes", "val": " = [256, 512, 1024, 2048]"}, {"name": "depths", "val": " = [3, 4, 6, 3]"}, {"name": "layer_type", "val": " = 'preactivation'"}, {"name": "hidden_act", "val": " = 'relu'"}, {"name": "global_padding", "val": " = None"}, {"name": "num_groups", "val": " = 32"}, {"name": "drop_path_rate", "val": " = 0.0"}, {"name": "embedding_dynamic_padding", "val": " = False"}, {"name": "output_stride", "val": " = 32"}, {"name": "width_factor", "val": " = 1"}, {"name": "out_features", "val": " = None"}, {"name": "out_indices", "val": " = None"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **num_channels** (`int`, *optional*, defaults to 3) -- | |
| The number of input channels. | |
| - **embedding_size** (`int`, *optional*, defaults to 64) -- | |
| Dimensionality (hidden size) for the embedding layer. | |
| - **hidden_sizes** (`list[int]`, *optional*, defaults to `[256, 512, 1024, 2048]`) -- | |
| Dimensionality (hidden size) at each stage. | |
| - **depths** (`list[int]`, *optional*, defaults to `[3, 4, 6, 3]`) -- | |
| Depth (number of layers) for each stage. | |
| - **layer_type** (`str`, *optional*, defaults to `"preactivation"`) -- | |
| The type of layer to use. Can be either `"preactivation"` or `"bottleneck"`. | |
| - **hidden_act** (`str`, *optional*, defaults to `"relu"`) -- | |
| The non-linear activation function in each block. If string, `"gelu"`, `"relu"`, `"selu"` and `"gelu_new"` | |
| are supported. | |
| - **global_padding** (`str`, *optional*) -- | |
| Padding strategy to use for the convolutional layers. Can be either `"valid"`, `"same"`, or `None`. | |
| - **num_groups** (`int`, *optional*, defaults to 32) -- | |
| Number of groups used for the `BitGroupNormActivation` layers. | |
| - **drop_path_rate** (`float`, *optional*, defaults to 0.0) -- | |
| The drop path rate for stochastic depth. | |
| - **embedding_dynamic_padding** (`bool`, *optional*, defaults to `False`) -- | |
| Whether or not to make use of dynamic padding for the embedding layer. | |
| - **output_stride** (`int`, *optional*, defaults to 32) -- | |
| The output stride of the model. | |
| - **width_factor** (`int`, *optional*, defaults to 1) -- | |
| The width factor for the model. | |
| - **out_features** (`list[str]`, *optional*) -- | |
| If used as backbone, list of features to output. Can be any of `"stem"`, `"stage1"`, `"stage2"`, etc. | |
| (depending on how many stages the model has). If unset and `out_indices` is set, will default to the | |
| corresponding stages. If unset and `out_indices` is unset, will default to the last stage. Must be in the | |
| same order as defined in the `stage_names` attribute. | |
| - **out_indices** (`list[int]`, *optional*) -- | |
| If used as backbone, list of indices of features to output. Can be any of 0, 1, 2, etc. (depending on how | |
| many stages the model has). If unset and `out_features` is set, will default to the corresponding stages. | |
| If unset and `out_features` is unset, will default to the last stage. Must be in the | |
| same order as defined in the `stage_names` attribute.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| This is the configuration class to store the configuration of a [BitModel](/docs/transformers/pr_33892/en/model_doc/bit#transformers.BitModel). It is used to instantiate a BiT | |
| model according to the specified arguments, defining the model architecture. Instantiating a configuration with the | |
| defaults will yield a similar configuration to that of the BiT | |
| [google/bit-50](https://huggingface.co/google/bit-50) architecture. | |
| Configuration objects inherit from [PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig) and can be used to control the model outputs. Read the | |
| documentation from [PreTrainedConfig](/docs/transformers/pr_33892/en/main_classes/configuration#transformers.PreTrainedConfig) for more information. | |
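The defaulting rules for `out_features` and `out_indices` listed in the parameters above can be summarized in a short pure-Python sketch. `resolve_backbone_outputs` is a hypothetical helper written for illustration, not a Transformers function, and the stage names assume the `"stem"`, `"stage1"`, ... naming described above for a four-stage model.

```python
from typing import Optional


def resolve_backbone_outputs(
    stage_names: list[str],
    out_features: Optional[list[str]] = None,
    out_indices: Optional[list[int]] = None,
) -> tuple[list[str], list[int]]:
    """Hypothetical helper that applies the defaulting rules described above."""
    if out_features is None and out_indices is None:
        # Neither is set: default to the last stage.
        out_features = [stage_names[-1]]
        out_indices = [len(stage_names) - 1]
    elif out_features is None:
        # Only indices are set: use the matching stage names.
        out_features = [stage_names[i] for i in out_indices]
    elif out_indices is None:
        # Only names are set: use the matching stage indices.
        out_indices = [stage_names.index(name) for name in out_features]

    # Both lists must follow the order of `stage_names` and agree with each other.
    if out_indices != sorted(out_indices):
        raise ValueError("out_features/out_indices must follow the order of stage_names")
    if [stage_names[i] for i in out_indices] != out_features:
        raise ValueError("out_features and out_indices refer to different stages")
    return out_features, out_indices


stage_names = ["stem", "stage1", "stage2", "stage3", "stage4"]
print(resolve_backbone_outputs(stage_names))                      # (['stage4'], [4])
print(resolve_backbone_outputs(stage_names, out_indices=[1, 3]))  # (['stage1', 'stage3'], [1, 3])
```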
| <ExampleCodeBlock anchor="transformers.BitConfig.example"> | |
| Example: | |
| ```python | |
| >>> from transformers import BitConfig, BitModel | |
| >>> # Initializing a BiT bit-50 style configuration | |
| >>> configuration = BitConfig() | |
| >>> # Initializing a model (with random weights) from the bit-50 style configuration | |
| >>> model = BitModel(configuration) | |
| >>> # Accessing the model configuration | |
| >>> configuration = model.config | |
| ``` | |
| </ExampleCodeBlock> | |
| </div> | |
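To illustrate how `output_stride` could shape the stages, the sketch below assumes a ResNetv2-style schedule: the stem downsamples by 4, the first stage keeps the resolution, and each later stage halves it until the requested `output_stride` is reached, after which striding is replaced by dilation. `stage_schedule` is a hypothetical helper, and the spatial sizes assume a 224x224 input that divides evenly; treat it as a sketch, not the exact `BitModel` code path.

```python
def stage_schedule(num_stages: int = 4, output_stride: int = 32, stem_stride: int = 4):
    """Hypothetical helper: (stride, dilation, cumulative stride) for each stage."""
    schedule = []
    current_stride = stem_stride  # the stem is assumed to downsample by 4
    dilation = 1
    for stage_idx in range(num_stages):
        stride = 1 if stage_idx == 0 else 2
        if current_stride >= output_stride:
            # Target stride reached: keep the resolution and dilate instead.
            dilation *= stride
            stride = 1
        current_stride *= stride
        schedule.append((stride, dilation, current_stride))
    return schedule


image_size = 224
for output_stride in (32, 16, 8):
    schedule = stage_schedule(output_stride=output_stride)
    spatial_sizes = [image_size // cumulative for _, _, cumulative in schedule]
    print(output_stride, schedule, spatial_sizes)
# 32 -> spatial sizes [56, 28, 14, 7]
# 16 -> spatial sizes [56, 28, 14, 14]
# 8  -> spatial sizes [56, 28, 28, 28]
```

Under these assumptions, lowering `output_stride` keeps later feature maps larger (useful for dense prediction) at the cost of more computation.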
| ## BitImageProcessor[[transformers.BitImageProcessor]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.BitImageProcessor</name><anchor>transformers.BitImageProcessor</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/bit/image_processing_bit.py#L51</source><parameters>[{"name": "do_resize", "val": ": bool = True"}, {"name": "size", "val": ": typing.Optional[dict[str, int]] = None"}, {"name": "resample", "val": ": Resampling = <Resampling.BICUBIC: 3>"}, {"name": "do_center_crop", "val": ": bool = True"}, {"name": "crop_size", "val": ": typing.Optional[dict[str, int]] = None"}, {"name": "do_rescale", "val": ": bool = True"}, {"name": "rescale_factor", "val": ": typing.Union[int, float] = 0.00392156862745098"}, {"name": "do_normalize", "val": ": bool = True"}, {"name": "image_mean", "val": ": typing.Union[float, list[float], NoneType] = None"}, {"name": "image_std", "val": ": typing.Union[float, list[float], NoneType] = None"}, {"name": "do_convert_rgb", "val": ": bool = True"}, {"name": "**kwargs", "val": ""}]</parameters><paramsdesc>- **do_resize** (`bool`, *optional*, defaults to `True`) -- | |
| Whether to resize the image's (height, width) dimensions to the specified `size`. Can be overridden by | |
| `do_resize` in the `preprocess` method. | |
| - **size** (`dict[str, int]`, *optional*, defaults to `{"shortest_edge": 224}`) -- | |
| Size of the image after resizing. The shortest edge of the image is resized to size["shortest_edge"], with | |
| the longest edge resized to keep the input aspect ratio. Can be overridden by `size` in the `preprocess` | |
| method. | |
| - **resample** (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BICUBIC`) -- | |
| Resampling filter to use if resizing the image. Can be overridden by `resample` in the `preprocess` method. | |
| - **do_center_crop** (`bool`, *optional*, defaults to `True`) -- | |
| Whether to center crop the image to the specified `crop_size`. Can be overridden by `do_center_crop` in the | |
| `preprocess` method. | |
| - **crop_size** (`dict[str, int]`, *optional*, defaults to 224) -- | |
| Size of the output image after applying `center_crop`. Can be overridden by `crop_size` in the `preprocess` | |
| method. | |
| - **do_rescale** (`bool`, *optional*, defaults to `True`) -- | |
| Whether to rescale the image by the specified scale `rescale_factor`. Can be overridden by `do_rescale` in | |
| the `preprocess` method. | |
| - **rescale_factor** (`int` or `float`, *optional*, defaults to `1/255`) -- | |
| Scale factor to use if rescaling the image. Can be overridden by `rescale_factor` in the `preprocess` | |
| method. | |
| - **do_normalize** (`bool`, *optional*, defaults to `True`) -- | |
| Whether to normalize the image. Can be overridden by `do_normalize` in the `preprocess` method. | |
| - **image_mean** (`float` or `list[float]`, *optional*, defaults to `OPENAI_CLIP_MEAN`) -- | |
| Mean to use if normalizing the image. This is a float or list of floats the length of the number of | |
| channels in the image. Can be overridden by the `image_mean` parameter in the `preprocess` method. | |
| - **image_std** (`float` or `list[float]`, *optional*, defaults to `OPENAI_CLIP_STD`) -- | |
| Standard deviation to use if normalizing the image. This is a float or list of floats the length of the | |
| number of channels in the image. Can be overridden by the `image_std` parameter in the `preprocess` method. | |
| - **do_convert_rgb** (`bool`, *optional*, defaults to `True`) -- | |
| Whether to convert the image to RGB.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Constructs a BiT image processor. | |
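The `size` and `crop_size` arithmetic can be checked by hand. The sketch below assumes the shortest-edge rule described above (the long side scaled by the same factor and truncated to an integer) followed by a centered crop; `shortest_edge_resize` and `center_crop_box` are hypothetical helpers, and the library's exact rounding may differ.

```python
def shortest_edge_resize(height: int, width: int, shortest_edge: int = 224) -> tuple[int, int]:
    """Hypothetical helper: (new_height, new_width) keeping the aspect ratio."""
    short, long = min(height, width), max(height, width)
    new_short = shortest_edge
    new_long = int(shortest_edge * long / short)  # truncated, as assumed above
    return (new_short, new_long) if height <= width else (new_long, new_short)


def center_crop_box(height: int, width: int, crop_height: int = 224, crop_width: int = 224):
    """Hypothetical helper: (top, left, bottom, right) of a centered crop."""
    top = (height - crop_height) // 2
    left = (width - crop_width) // 2
    return top, left, top + crop_height, left + crop_width


# A 480x640 (height x width) landscape photo with the default 224 settings.
resized = shortest_edge_resize(480, 640)
print(resized)                    # (224, 298)
print(center_crop_box(*resized))  # (0, 37, 224, 261)
```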
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>preprocess</name><anchor>transformers.BitImageProcessor.preprocess</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/bit/image_processing_bit.py#L175</source><parameters>[{"name": "images", "val": ": typing.Union[ForwardRef('PIL.Image.Image'), numpy.ndarray, ForwardRef('torch.Tensor'), list['PIL.Image.Image'], list[numpy.ndarray], list['torch.Tensor']]"}, {"name": "do_resize", "val": ": typing.Optional[bool] = None"}, {"name": "size", "val": ": typing.Optional[dict[str, int]] = None"}, {"name": "resample", "val": ": typing.Optional[PIL.Image.Resampling] = None"}, {"name": "do_center_crop", "val": ": typing.Optional[bool] = None"}, {"name": "crop_size", "val": ": typing.Optional[int] = None"}, {"name": "do_rescale", "val": ": typing.Optional[bool] = None"}, {"name": "rescale_factor", "val": ": typing.Optional[float] = None"}, {"name": "do_normalize", "val": ": typing.Optional[bool] = None"}, {"name": "image_mean", "val": ": typing.Union[float, list[float], NoneType] = None"}, {"name": "image_std", "val": ": typing.Union[float, list[float], NoneType] = None"}, {"name": "do_convert_rgb", "val": ": typing.Optional[bool] = None"}, {"name": "return_tensors", "val": ": typing.Union[str, transformers.utils.generic.TensorType, NoneType] = None"}, {"name": "data_format", "val": ": typing.Optional[transformers.image_utils.ChannelDimension] = <ChannelDimension.FIRST: 'channels_first'>"}, {"name": "input_data_format", "val": ": typing.Union[str, transformers.image_utils.ChannelDimension, NoneType] = None"}]</parameters><paramsdesc>- **images** (`ImageInput`) -- | |
| Image to preprocess. Expects a single image or a batch of images with pixel values ranging from 0 to 255. If | |
| passing in images with pixel values between 0 and 1, set `do_rescale=False`. | |
| - **do_resize** (`bool`, *optional*, defaults to `self.do_resize`) -- | |
| Whether to resize the image. | |
| - **size** (`dict[str, int]`, *optional*, defaults to `self.size`) -- | |
| Size of the image after resizing. Shortest edge of the image is resized to size["shortest_edge"], with | |
| the longest edge resized to keep the input aspect ratio. | |
| - **resample** (`int`, *optional*, defaults to `self.resample`) -- | |
| Resampling filter to use if resizing the image. This can be one of the enum `PILImageResampling`. Only | |
| has an effect if `do_resize` is set to `True`. | |
| - **do_center_crop** (`bool`, *optional*, defaults to `self.do_center_crop`) -- | |
| Whether to center crop the image. | |
| - **crop_size** (`dict[str, int]`, *optional*, defaults to `self.crop_size`) -- | |
| Size of the center crop. Only has an effect if `do_center_crop` is set to `True`. | |
| - **do_rescale** (`bool`, *optional*, defaults to `self.do_rescale`) -- | |
| Whether to rescale the image. | |
| - **rescale_factor** (`float`, *optional*, defaults to `self.rescale_factor`) -- | |
| Rescale factor to rescale the image by if `do_rescale` is set to `True`. | |
| - **do_normalize** (`bool`, *optional*, defaults to `self.do_normalize`) -- | |
| Whether to normalize the image. | |
| - **image_mean** (`float` or `list[float]`, *optional*, defaults to `self.image_mean`) -- | |
| Image mean to use for normalization. Only has an effect if `do_normalize` is set to `True`. | |
| - **image_std** (`float` or `list[float]`, *optional*, defaults to `self.image_std`) -- | |
| Image standard deviation to use for normalization. Only has an effect if `do_normalize` is set to | |
| `True`. | |
| - **do_convert_rgb** (`bool`, *optional*, defaults to `self.do_convert_rgb`) -- | |
| Whether to convert the image to RGB. | |
| - **return_tensors** (`str` or `TensorType`, *optional*) -- | |
| The type of tensors to return. Can be one of: | |
| - Unset: Return a list of `np.ndarray`. | |
| - `TensorType.PYTORCH` or `'pt'`: Return a batch of type `torch.Tensor`. | |
| - `TensorType.NUMPY` or `'np'`: Return a batch of type `np.ndarray`. | |
| - **data_format** (`ChannelDimension` or `str`, *optional*, defaults to `ChannelDimension.FIRST`) -- | |
| The channel dimension format for the output image. Can be one of: | |
| - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format. | |
| - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format. | |
| - Unset: Use the channel dimension format of the input image. | |
| - **input_data_format** (`ChannelDimension` or `str`, *optional*) -- | |
| The channel dimension format for the input image. If unset, the channel dimension format is inferred | |
| from the input image. Can be one of: | |
| - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format. | |
| - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format. | |
| - `"none"` or `ChannelDimension.NONE`: image in (height, width) format.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Preprocess an image or batch of images. | |
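The rescale, normalize, and channel-reordering steps can be sketched in NumPy, assuming the default `rescale_factor=1/255` and the OpenAI CLIP mean and standard deviation used as defaults above (the numeric values are the widely published CLIP constants, restated here as an assumption). `rescale_normalize_to_chw` is a hypothetical helper; the example also shows why images already in `[0, 1]` need `do_rescale=False`.

```python
import numpy as np

# Assumed defaults: OpenAI CLIP statistics and a 1/255 rescale factor.
OPENAI_CLIP_MEAN = np.array([0.48145466, 0.4578275, 0.40821073])
OPENAI_CLIP_STD = np.array([0.26862954, 0.26130258, 0.27577711])


def rescale_normalize_to_chw(image_hwc, rescale_factor=1 / 255, do_rescale=True):
    """Hypothetical helper: (height, width, 3) image -> normalized (3, height, width)."""
    image = image_hwc.astype(np.float64)
    if do_rescale:
        image = image * rescale_factor  # [0, 255] -> [0, 1]
    image = (image - OPENAI_CLIP_MEAN) / OPENAI_CLIP_STD  # broadcasts over the channel axis
    return image.transpose(2, 0, 1)  # channels_last -> channels_first


white_uint8 = np.full((4, 4, 3), 255, dtype=np.uint8)
white_float = np.ones((4, 4, 3))  # already in [0, 1]

ok = rescale_normalize_to_chw(white_uint8)
also_ok = rescale_normalize_to_chw(white_float, do_rescale=False)
double_scaled = rescale_normalize_to_chw(white_float)  # rescaled twice by mistake

print(ok.shape)  # (3, 4, 4)
```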
| </div></div> | |
| ## BitImageProcessorFast[[transformers.BitImageProcessorFast]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.BitImageProcessorFast</name><anchor>transformers.BitImageProcessorFast</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/bit/image_processing_bit_fast.py#L23</source><parameters>[{"name": "**kwargs", "val": ": typing_extensions.Unpack[transformers.processing_utils.ImagesKwargs]"}]</parameters></docstring> | |
| Constructs a fast BiT image processor. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>preprocess</name><anchor>transformers.BitImageProcessorFast.preprocess</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/image_processing_utils_fast.py#L710</source><parameters>[{"name": "images", "val": ": typing.Union[ForwardRef('PIL.Image.Image'), numpy.ndarray, ForwardRef('torch.Tensor'), list['PIL.Image.Image'], list[numpy.ndarray], list['torch.Tensor']]"}, {"name": "*args", "val": ""}, {"name": "**kwargs", "val": ": typing_extensions.Unpack[transformers.processing_utils.ImagesKwargs]"}]</parameters><paramsdesc>- **images** (`Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, list['PIL.Image.Image'], list[numpy.ndarray], list['torch.Tensor']]`) -- | |
| Image to preprocess. Expects a single image or a batch of images with pixel values ranging from 0 to 255. If | |
| passing in images with pixel values between 0 and 1, set `do_rescale=False`. | |
| - **do_convert_rgb** (`bool`, *optional*) -- | |
| Whether to convert the image to RGB. | |
| - **do_resize** (`bool`, *optional*) -- | |
| Whether to resize the image. | |
| - **size** (`Annotated[Union[int, list[int], tuple[int, ...], dict[str, int], NoneType], None]`) -- | |
| Describes the maximum input dimensions to the model. | |
| - **crop_size** (`Annotated[Union[int, list[int], tuple[int, ...], dict[str, int], NoneType], None]`) -- | |
| Size of the output image after applying `center_crop`. | |
| - **resample** (`Annotated[Union[PILImageResampling, int, NoneType], None]`) -- | |
| Resampling filter to use if resizing the image. This can be one of the enum `PILImageResampling`. Only | |
| has an effect if `do_resize` is set to `True`. | |
| - **do_rescale** (`bool`, *optional*) -- | |
| Whether to rescale the image. | |
| - **rescale_factor** (`float`, *optional*) -- | |
| Rescale factor to rescale the image by if `do_rescale` is set to `True`. | |
| - **do_normalize** (`bool`, *optional*) -- | |
| Whether to normalize the image. | |
| - **image_mean** (`Union[float, list[float], tuple[float, ...], NoneType]`) -- | |
| Image mean to use for normalization. Only has an effect if `do_normalize` is set to `True`. | |
| - **image_std** (`Union[float, list[float], tuple[float, ...], NoneType]`) -- | |
| Image standard deviation to use for normalization. Only has an effect if `do_normalize` is set to | |
| `True`. | |
| - **do_pad** (`bool`, *optional*) -- | |
| Whether to pad the image. Padding is done either to the largest size in the batch | |
| or to a fixed square size per image. The exact padding strategy depends on the model. | |
| - **pad_size** (`Annotated[Union[int, list[int], tuple[int, ...], dict[str, int], NoneType], None]`) -- | |
| The size in `{"height": int, "width": int}` to pad the images to. Must be larger than any image size | |
| provided for preprocessing. If `pad_size` is not provided, images will be padded to the largest | |
| height and width in the batch. Applied only when `do_pad=True`. | |
| - **do_center_crop** (`bool`, *optional*) -- | |
| Whether to center crop the image. | |
| - **data_format** (`Union[str, ~image_utils.ChannelDimension, NoneType]`) -- | |
| Only `ChannelDimension.FIRST` is supported. Added for compatibility with slow processors. | |
| - **input_data_format** (`Union[str, ~image_utils.ChannelDimension, NoneType]`) -- | |
| The channel dimension format for the input image. If unset, the channel dimension format is inferred | |
| from the input image. Can be one of: | |
| - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format. | |
| - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format. | |
| - `"none"` or `ChannelDimension.NONE`: image in (height, width) format. | |
| - **device** (`Annotated[str, None]`, *optional*) -- | |
| The device to process the images on. If unset, the device is inferred from the input images. | |
| - **return_tensors** (`Annotated[Union[str, ~utils.generic.TensorType, NoneType], None]`) -- | |
| Returns stacked tensors if set to `'pt'`; otherwise, returns a list of tensors. | |
| - **disable_grouping** (`bool`, *optional*) -- | |
| Whether to disable grouping of images by size to process them individually and not in batches. | |
| If None, will be set to True if the images are on CPU, and False otherwise. This choice is based on | |
| empirical observations, as detailed here: https://github.com/huggingface/transformers/pull/38157</paramsdesc><paramgroups>0</paramgroups><rettype>`<class 'transformers.image_processing_base.BatchFeature'>`</rettype><retdesc>- **data** (`dict`) -- Dictionary of lists/arrays/tensors returned by the __call__ method ('pixel_values', etc.). | |
| - **tensor_type** (`Union[None, str, TensorType]`, *optional*) -- You can give a tensor_type here to convert the lists of integers into PyTorch/NumPy tensors at | |
| initialization.</retdesc></docstring> | |
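The `disable_grouping` option refers to batching images of the same size together. A minimal sketch of that idea, using hypothetical helpers `group_by_shape` and `process_grouped` (not the library's internal implementation), groups images by shape, runs one batched call per group, and restores the original order.

```python
from collections import defaultdict

import numpy as np


def group_by_shape(images):
    """Hypothetical helper: map each image shape to the indices of images with that shape."""
    groups = defaultdict(list)
    for index, image in enumerate(images):
        groups[image.shape].append(index)
    return dict(groups)


def process_grouped(images, batched_fn):
    """Run `batched_fn` once per shape group and return results in the input order."""
    results = [None] * len(images)
    for indices in group_by_shape(images).values():
        batch = np.stack([images[i] for i in indices])  # same shape, so stacking is safe
        processed = batched_fn(batch)
        for position, index in enumerate(indices):
            results[index] = processed[position]
    return results


images = [np.ones((3, 4, 4)), np.ones((3, 6, 6)), np.zeros((3, 4, 4))]
calls = []


def double(batch):
    calls.append(batch.shape)  # record one entry per batched call
    return batch * 2


outputs = process_grouped(images, double)
print(calls)  # [(2, 3, 4, 4), (1, 3, 6, 6)]
```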
| </div></div> | |
| ## BitModel[[transformers.BitModel]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.BitModel</name><anchor>transformers.BitModel</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/bit/modeling_bit.py#L647</source><parameters>[{"name": "config", "val": ""}]</parameters><paramsdesc>- **config** ([BitConfig](/docs/transformers/pr_33892/en/model_doc/bit#transformers.BitConfig)) -- | |
| Model configuration class with all the parameters of the model. Initializing with a config file does not | |
| load the weights associated with the model, only the configuration. Check out the | |
| [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| The bare BiT model outputting raw hidden states without any specific head on top. | |
| This model inherits from [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, or pruning heads). | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>forward</name><anchor>transformers.BitModel.forward</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/bit/modeling_bit.py#L665</source><parameters>[{"name": "pixel_values", "val": ": Tensor"}, {"name": "output_hidden_states", "val": ": typing.Optional[bool] = None"}, {"name": "return_dict", "val": ": typing.Optional[bool] = None"}]</parameters><paramsdesc>- **pixel_values** (`torch.Tensor` of shape `(batch_size, num_channels, image_size, image_size)`) -- | |
| The tensors corresponding to the input images. Pixel values can be obtained using | |
| [BitImageProcessor](/docs/transformers/pr_33892/en/model_doc/bit#transformers.BitImageProcessor). See [BitImageProcessor.preprocess()](/docs/transformers/pr_33892/en/model_doc/bit#transformers.BitImageProcessor.preprocess) for details (`processor_class` uses | |
| [BitImageProcessor](/docs/transformers/pr_33892/en/model_doc/bit#transformers.BitImageProcessor) for processing images). | |
| - **output_hidden_states** (`bool`, *optional*) -- | |
| Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for | |
| more detail. | |
| - **return_dict** (`bool`, *optional*) -- | |
| Whether or not to return a [ModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple.</paramsdesc><paramgroups>0</paramgroups><rettype>`transformers.modeling_outputs.BaseModelOutputWithPoolingAndNoAttention` or `tuple(torch.FloatTensor)`</rettype><retdesc>A `transformers.modeling_outputs.BaseModelOutputWithPoolingAndNoAttention` or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([BitConfig](/docs/transformers/pr_33892/en/model_doc/bit#transformers.BitConfig)) and inputs. | |
| - **last_hidden_state** (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`) -- Sequence of hidden-states at the output of the last layer of the model. | |
| - **pooler_output** (`torch.FloatTensor` of shape `(batch_size, hidden_size)`) -- Last layer hidden-state after a pooling operation on the spatial dimensions. | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, num_channels, height, width)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.</retdesc></docstring> | |
| The [BitModel](/docs/transformers/pr_33892/en/model_doc/bit#transformers.BitModel) forward method, overrides the `__call__` special method. | |
| <Tip> | |
| Although the recipe for the forward pass needs to be defined within this function, you should call the `Module` | |
| instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while | |
| the latter silently ignores them. | |
| </Tip> | |
| <ExampleCodeBlock anchor="transformers.BitModel.forward.example"> | |
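| Example: | |
| ```python | |
| >>> from transformers import AutoImageProcessor, BitModel | |
| >>> import torch | |
| >>> from datasets import load_dataset | |
| >>> dataset = load_dataset("huggingface/cats-image") | |
| >>> image = dataset["test"]["image"][0] | |
| >>> image_processor = AutoImageProcessor.from_pretrained("google/bit-50") | |
| >>> model = BitModel.from_pretrained("google/bit-50") | |
| >>> inputs = image_processor(image, return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ...     outputs = model(**inputs) | |
| >>> last_hidden_states = outputs.last_hidden_state | |
| ``` | |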
| </ExampleCodeBlock> | |
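`pooler_output` is documented above as the last-layer hidden state pooled over the spatial dimensions. Assuming global average pooling (as in ResNet-style models), the reduction looks like this in NumPy; the random array is a stand-in for a `last_hidden_state` of shape `(batch_size, num_channels, height, width)`, with illustrative sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for `last_hidden_state`: (batch_size, num_channels, height, width).
last_hidden_state = rng.normal(size=(2, 2048, 7, 7))

# Global average pooling: average each channel over height and width.
pooled = last_hidden_state.mean(axis=(2, 3))
print(pooled.shape)  # (2, 2048)
```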
| </div></div> | |
| ## BitForImageClassification[[transformers.BitForImageClassification]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class transformers.BitForImageClassification</name><anchor>transformers.BitForImageClassification</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/bit/modeling_bit.py#L702</source><parameters>[{"name": "config", "val": ""}]</parameters><paramsdesc>- **config** ([BitConfig](/docs/transformers/pr_33892/en/model_doc/bit#transformers.BitConfig)) -- | |
| Model configuration class with all the parameters of the model. Initializing with a config file does not | |
| load the weights associated with the model, only the configuration. Check out the | |
| [from_pretrained()](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| BiT Model with an image classification head on top (a linear layer on top of the pooled features), e.g. for | |
| ImageNet. | |
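The classification head described above (a linear layer on top of the pooled features) amounts to a single matrix multiply. In this sketch the weights are random placeholders and `num_labels=1000` assumes an ImageNet-1k head; it illustrates the shape flow, not the trained checkpoint.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, hidden_size, num_labels = 2, 2048, 1000  # assumed ImageNet-1k head

pooled = rng.normal(size=(batch_size, hidden_size))  # stand-in for pooled features
weight = rng.normal(scale=0.01, size=(num_labels, hidden_size))  # random placeholder
bias = np.zeros(num_labels)

# A linear layer computes x @ W^T + b, giving one score per class.
logits = pooled @ weight.T + bias
print(logits.shape)  # (2, 1000)
```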
| This model inherits from [PreTrainedModel](/docs/transformers/pr_33892/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, or pruning heads). | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>forward</name><anchor>transformers.BitForImageClassification.forward</anchor><source>https://github.com/huggingface/transformers/blob/vr_33892/src/transformers/models/bit/modeling_bit.py#L715</source><parameters>[{"name": "pixel_values", "val": ": typing.Optional[torch.FloatTensor] = None"}, {"name": "labels", "val": ": typing.Optional[torch.LongTensor] = None"}, {"name": "output_hidden_states", "val": ": typing.Optional[bool] = None"}, {"name": "return_dict", "val": ": typing.Optional[bool] = None"}]</parameters><paramsdesc>- **pixel_values** (`torch.FloatTensor` of shape `(batch_size, num_channels, image_size, image_size)`, *optional*) -- | |
| The tensors corresponding to the input images. Pixel values can be obtained using | |
| [BitImageProcessor](/docs/transformers/pr_33892/en/model_doc/bit#transformers.BitImageProcessor). See [BitImageProcessor.preprocess()](/docs/transformers/pr_33892/en/model_doc/bit#transformers.BitImageProcessor.preprocess) for details (`processor_class` uses | |
| [BitImageProcessor](/docs/transformers/pr_33892/en/model_doc/bit#transformers.BitImageProcessor) for processing images). | |
| - **labels** (`torch.LongTensor` of shape `(batch_size,)`, *optional*) -- | |
| Labels for computing the image classification/regression loss. Indices should be in `[0, ..., config.num_labels - 1]`. If `config.num_labels > 1`, a classification loss (cross-entropy) is computed; if `config.num_labels == 1`, a regression loss is computed. | |
| - **output_hidden_states** (`bool`, *optional*) -- | |
| Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for | |
| more detail. | |
| - **return_dict** (`bool`, *optional*) -- | |
| Whether or not to return a [ModelOutput](/docs/transformers/pr_33892/en/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple.</paramsdesc><paramgroups>0</paramgroups><rettype>[transformers.modeling_outputs.ImageClassifierOutputWithNoAttention](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.ImageClassifierOutputWithNoAttention) or `tuple(torch.FloatTensor)`</rettype><retdesc>A [transformers.modeling_outputs.ImageClassifierOutputWithNoAttention](/docs/transformers/pr_33892/en/main_classes/output#transformers.modeling_outputs.ImageClassifierOutputWithNoAttention) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([BitConfig](/docs/transformers/pr_33892/en/model_doc/bit#transformers.BitConfig)) and inputs. | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Classification (or regression if config.num_labels==1) loss. | |
| - **logits** (`torch.FloatTensor` of shape `(batch_size, config.num_labels)`) -- Classification (or regression if config.num_labels==1) scores (before SoftMax). | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each stage) of shape `(batch_size, num_channels, height, width)`. Hidden-states (also | |
| called feature maps) of the model at the output of each stage.</retdesc></docstring> | |
| The [BitForImageClassification](/docs/transformers/pr_33892/en/model_doc/bit#transformers.BitForImageClassification) forward method, overrides the `__call__` special method. | |
| <Tip> | |
| Although the recipe for the forward pass needs to be defined within this function, you should call the `Module` | |
| instance afterwards instead of this function, since the former takes care of running the pre- and post-processing steps while | |
| the latter silently ignores them. | |
| </Tip> | |
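The `labels` behavior documented above can be sketched as follows: a single output column is treated as regression (mean squared error), and more than one column as single-label classification (cross-entropy). `compute_loss` is a hypothetical helper that covers only these two documented cases; the library may handle additional problem types.

```python
import numpy as np


def compute_loss(logits, labels):
    """Hypothetical helper: MSE if there is one output column, else cross-entropy."""
    if logits.shape[-1] == 1:
        # Regression: compare the single score with a float target.
        return float(np.mean((logits[:, 0] - labels) ** 2))
    # Classification: numerically stable log-softmax, then pick the true class.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


uniform_logits = np.zeros((2, 4))  # 4 classes, no preference
print(compute_loss(uniform_logits, np.array([0, 3])))  # log(4), about 1.386

regression_logits = np.array([[1.0], [3.0]])
print(compute_loss(regression_logits, np.array([0.0, 1.0])))  # (1 + 4) / 2 = 2.5
```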
| <ExampleCodeBlock anchor="transformers.BitForImageClassification.forward.example"> | |
| Example: | |
| ```python | |
| >>> from transformers import AutoImageProcessor, BitForImageClassification | |
| >>> import torch | |
| >>> from datasets import load_dataset | |
| >>> dataset = load_dataset("huggingface/cats-image") | |
| >>> image = dataset["test"]["image"][0] | |
| >>> image_processor = AutoImageProcessor.from_pretrained("google/bit-50") | |
| >>> model = BitForImageClassification.from_pretrained("google/bit-50") | |
| >>> inputs = image_processor(image, return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ...     logits = model(**inputs).logits | |
| >>> # model predicts one of the 1000 ImageNet classes | |
| >>> predicted_label = logits.argmax(-1).item() | |
| >>> print(model.config.id2label[predicted_label]) | |
| ... | |
| ``` | |
| </ExampleCodeBlock> | |
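The logits in the example above are unnormalized scores. A sketch of turning them into probabilities and a top-k list, using a hypothetical `top_k_predictions` helper on a NumPy array; with the model above you could pass `logits[0].numpy()` and `model.config.id2label`. The toy labels below are placeholders, not ImageNet classes.

```python
import numpy as np


def top_k_predictions(logits, id2label, k=5):
    """Hypothetical helper: softmax over 1-D logits, then the k most likely labels."""
    shifted = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    top = np.argsort(probs)[::-1][:k]
    return [(id2label[int(i)], float(probs[i])) for i in top]


toy_logits = np.array([2.0, 0.5, 1.0])
toy_id2label = {0: "cat", 1: "dog", 2: "bird"}
print(top_k_predictions(toy_logits, toy_id2label, k=2))
```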
| </div></div> | |