| # Big Transfer (BiT) | |
| ## Overview | |
| The BiT model was proposed in [Big Transfer (BiT): General Visual Representation Learning](https://huggingface.co/papers/1912.11370) by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby. | |
| BiT is a simple recipe for scaling up pre-training of [ResNet](resnet)-like architectures (specifically, ResNetv2). The method results in significant improvements for transfer learning. | |
| The abstract from the paper is the following: | |
| *Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision. We revisit the paradigm of pre-training on large supervised datasets and fine-tuning the model on a target task. We scale up pre-training, and propose a simple recipe that we call Big Transfer (BiT). By combining a few carefully selected components, and transferring using a simple heuristic, we achieve strong performance on over 20 datasets. BiT performs well across a surprisingly wide range of data regimes -- from 1 example per class to 1M total examples. BiT achieves 87.5% top-1 accuracy on ILSVRC-2012, 99.4% on CIFAR-10, and 76.3% on the 19 task Visual Task Adaptation Benchmark (VTAB). On small datasets, BiT attains 76.8% on ILSVRC-2012 with 10 examples per class, and 97.0% on CIFAR-10 with 10 examples per class. We conduct detailed analysis of the main components that lead to high transfer performance.* | |
| This model was contributed by [nielsr](https://huggingface.co/nielsr). | |
| The original code can be found [here](https://github.com/google-research/big_transfer). | |
| ## Usage tips | |
| - BiT models are equivalent to ResNetv2 in terms of architecture, except that: 1) all batch normalization layers are replaced by [group normalization](https://huggingface.co/papers/1803.08494), | |
| 2) [weight standardization](https://huggingface.co/papers/1903.10520) is used for convolutional layers. The authors show that the combination of both is useful for training with large batch sizes, and has a significant | |
| impact on transfer learning. | |
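The two architectural changes above can be illustrated with a minimal PyTorch sketch. This is not the library's implementation: the class name `WSConv2d`, the epsilon value, and the layer sizes are assumptions chosen for illustration. It shows a convolution whose kernel is standardized per output channel, followed by group normalization in place of batch normalization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class WSConv2d(nn.Conv2d):
    """Conv2d that standardizes its kernel per output channel (hypothetical helper)."""

    def forward(self, x):
        w = self.weight
        # Statistics are computed over (in_channels, kernel_h, kernel_w) for each filter.
        mean = w.mean(dim=(1, 2, 3), keepdim=True)
        var = w.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
        w = (w - mean) / torch.sqrt(var + 1e-8)  # epsilon is an assumed value
        return F.conv2d(x, w, self.bias, self.stride, self.padding, self.dilation, self.groups)


# Group normalization replaces batch normalization after the convolution.
block = nn.Sequential(
    WSConv2d(3, 64, kernel_size=3, padding=1, bias=False),
    nn.GroupNorm(num_groups=32, num_channels=64),
    nn.ReLU(),
)

x = torch.randn(2, 3, 32, 32)
y = block(x)
print(y.shape)
```

Neither operation depends on batch statistics, so the block behaves the same for any batch size.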
| ## Resources | |
| A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with BiT. | |
| - [BitForImageClassification](/docs/transformers/pr_26617/en/model_doc/bit#transformers.BitForImageClassification) is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb). | |
| - See also: [Image classification task guide](../tasks/image_classification) | |
| If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource. | |
| ## BitConfig[[transformers.BitConfig]] | |
| #### transformers.BitConfig[[transformers.BitConfig]] | |
| [Source](https://github.com/huggingface/transformers/blob/vr_26617/src/transformers/models/bit/configuration_bit.py#L24) | |
| This is the configuration class to store the configuration of a [BitModel](/docs/transformers/pr_26617/en/model_doc/bit#transformers.BitModel). It is used to instantiate a BiT | |
| model according to the specified arguments, defining the model architecture. Instantiating a configuration with the | |
| defaults will yield a similar configuration to that of the BiT | |
| [google/bit-50](https://huggingface.co/google/bit-50) architecture. | |
| Configuration objects inherit from [PreTrainedConfig](/docs/transformers/pr_26617/en/main_classes/configuration#transformers.PreTrainedConfig) and can be used to control the model outputs. Read the | |
| documentation from [PreTrainedConfig](/docs/transformers/pr_26617/en/main_classes/configuration#transformers.PreTrainedConfig) for more information. | |
| Example: | |
| ```python | |
| >>> from transformers import BitConfig, BitModel | |
| >>> # Initializing a BiT bit-50 style configuration | |
| >>> configuration = BitConfig() | |
| >>> # Initializing a model (with random weights) from the bit-50 style configuration | |
| >>> model = BitModel(configuration) | |
| >>> # Accessing the model configuration | |
| >>> configuration = model.config | |
| ``` | |
| **Parameters:** | |
| num_channels (`int`, *optional*, defaults to 3) : The number of input channels. | |
| embedding_size (`int`, *optional*, defaults to 64) : Dimensionality (hidden size) for the embedding layer. | |
| hidden_sizes (`list[int]`, *optional*, defaults to `[256, 512, 1024, 2048]`) : Dimensionality (hidden size) at each stage. | |
| depths (`list[int]`, *optional*, defaults to `[3, 4, 6, 3]`) : Depth (number of layers) for each stage. | |
| layer_type (`str`, *optional*, defaults to `"preactivation"`) : The layer to use, it can be either `"preactivation"` or `"bottleneck"`. | |
| hidden_act (`str`, *optional*, defaults to `"relu"`) : The non-linear activation function in each block. If string, `"gelu"`, `"relu"`, `"selu"` and `"gelu_new"` are supported. | |
| global_padding (`str`, *optional*) : Padding strategy to use for the convolutional layers. Can be either `"valid"`, `"same"`, or `None`. | |
| num_groups (`int`, *optional*, defaults to 32) : Number of groups used for the `BitGroupNormActivation` layers. | |
| drop_path_rate (`float`, *optional*, defaults to 0.0) : The drop path rate for the stochastic depth. | |
| embedding_dynamic_padding (`bool`, *optional*, defaults to `False`) : Whether or not to make use of dynamic padding for the embedding layer. | |
| output_stride (`int`, *optional*, defaults to 32) : The output stride of the model. | |
| width_factor (`int`, *optional*, defaults to 1) : The width factor for the model. | |
| out_features (`list[str]`, *optional*) : If used as backbone, list of features to output. Can be any of `"stem"`, `"stage1"`, `"stage2"`, etc. (depending on how many stages the model has). If unset and `out_indices` is set, will default to the corresponding stages. If unset and `out_indices` is unset, will default to the last stage. Must be in the same order as defined in the `stage_names` attribute. | |
| out_indices (`list[int]`, *optional*) : If used as backbone, list of indices of features to output. Can be any of 0, 1, 2, etc. (depending on how many stages the model has). If unset and `out_features` is set, will default to the corresponding stages. If unset and `out_features` is unset, will default to the last stage. Must be in the same order as defined in the `stage_names` attribute. | |
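As a rough illustration of the backbone-related parameters, the sketch below builds a configuration that requests two intermediate feature maps by name. The choice of `"stage2"` and `"stage4"` is only an example; the valid names are whatever `stage_names` lists for the configured depths.

```python
from transformers import BitConfig

# Request two feature maps by name; out_indices is left unset so it is derived from out_features.
config = BitConfig(out_features=["stage2", "stage4"])

print(config.stage_names)   # the stem followed by one entry per stage
print(config.out_features)
print(config.out_indices)
```

Passing `out_indices` instead of `out_features` should select the same stages by position.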
| ## BitImageProcessor[[transformers.BitImageProcessor]] | |
| #### transformers.BitImageProcessor[[transformers.BitImageProcessor]] | |
| [Source](https://github.com/huggingface/transformers/blob/vr_26617/src/transformers/models/bit/image_processing_bit.py#L48) | |
| Constructs a BiT image processor. | |
| #### preprocess[[transformers.BitImageProcessor.preprocess]] | |
| [Source](https://github.com/huggingface/transformers/blob/vr_26617/src/transformers/models/bit/image_processing_bit.py#L172) | |
| - **images** (`ImageInput`) -- | |
| Image to preprocess. Expects a single or batch of images with pixel values ranging from 0 to 255. If | |
| passing in images with pixel values between 0 and 1, set `do_rescale=False`. | |
| - **do_resize** (`bool`, *optional*, defaults to `self.do_resize`) -- | |
| Whether to resize the image. | |
| - **size** (`dict[str, int]`, *optional*, defaults to `self.size`) -- | |
| Size of the image after resizing. Shortest edge of the image is resized to size["shortest_edge"], with | |
| the longest edge resized to keep the input aspect ratio. | |
| - **resample** (`int`, *optional*, defaults to `self.resample`) -- | |
| Resampling filter to use if resizing the image. This can be one of the enum `PILImageResampling`. Only | |
| has an effect if `do_resize` is set to `True`. | |
| - **do_center_crop** (`bool`, *optional*, defaults to `self.do_center_crop`) -- | |
| Whether to center crop the image. | |
| - **crop_size** (`dict[str, int]`, *optional*, defaults to `self.crop_size`) -- | |
| Size of the center crop. Only has an effect if `do_center_crop` is set to `True`. | |
| - **do_rescale** (`bool`, *optional*, defaults to `self.do_rescale`) -- | |
| Whether to rescale the image. | |
| - **rescale_factor** (`float`, *optional*, defaults to `self.rescale_factor`) -- | |
| Rescale factor to rescale the image by if `do_rescale` is set to `True`. | |
| - **do_normalize** (`bool`, *optional*, defaults to `self.do_normalize`) -- | |
| Whether to normalize the image. | |
| - **image_mean** (`float` or `list[float]`, *optional*, defaults to `self.image_mean`) -- | |
| Image mean to use for normalization. Only has an effect if `do_normalize` is set to `True`. | |
| - **image_std** (`float` or `list[float]`, *optional*, defaults to `self.image_std`) -- | |
| Image standard deviation to use for normalization. Only has an effect if `do_normalize` is set to | |
| `True`. | |
| - **do_convert_rgb** (`bool`, *optional*, defaults to `self.do_convert_rgb`) -- | |
| Whether to convert the image to RGB. | |
| - **return_tensors** (`str` or `TensorType`, *optional*) -- | |
| The type of tensors to return. Can be one of: | |
| - Unset: Return a list of `np.ndarray`. | |
| - `TensorType.PYTORCH` or `'pt'`: Return a batch of type `torch.Tensor`. | |
| - `TensorType.NUMPY` or `'np'`: Return a batch of type `np.ndarray`. | |
| - **data_format** (`ChannelDimension` or `str`, *optional*, defaults to `ChannelDimension.FIRST`) -- | |
| The channel dimension format for the output image. Can be one of: | |
| - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format. | |
| - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format. | |
| - Unset: Use the channel dimension format of the input image. | |
| - **input_data_format** (`ChannelDimension` or `str`, *optional*) -- | |
| The channel dimension format for the input image. If unset, the channel dimension format is inferred | |
| from the input image. Can be one of: | |
| - `"channels_first"` or `ChannelDimension.FIRST`: image in (num_channels, height, width) format. | |
| - `"channels_last"` or `ChannelDimension.LAST`: image in (height, width, num_channels) format. | |
| - `"none"` or `ChannelDimension.NONE`: image in (height, width) format. | |
| Preprocess an image or batch of images. | |
| **Parameters:** | |
| do_resize (`bool`, *optional*, defaults to `True`) : Whether to resize the image's (height, width) dimensions to the specified `size`. Can be overridden by `do_resize` in the `preprocess` method. | |
| size (`dict[str, int]`, *optional*, defaults to `{"shortest_edge": 224}`) : Size of the image after resizing. The shortest edge of the image is resized to `size["shortest_edge"]`, with the longest edge resized to keep the input aspect ratio. Can be overridden by `size` in the `preprocess` method. | |
| resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BICUBIC`) : Resampling filter to use if resizing the image. Can be overridden by `resample` in the `preprocess` method. | |
| do_center_crop (`bool`, *optional*, defaults to `True`) : Whether to center crop the image to the specified `crop_size`. Can be overridden by `do_center_crop` in the `preprocess` method. | |
| crop_size (`dict[str, int]`, *optional*, defaults to 224) : Size of the output image after applying `center_crop`. Can be overridden by `crop_size` in the `preprocess` method. | |
| do_rescale (`bool`, *optional*, defaults to `True`) : Whether to rescale the image by the specified scale `rescale_factor`. Can be overridden by `do_rescale` in the `preprocess` method. | |
| rescale_factor (`int` or `float`, *optional*, defaults to `1/255`) : Scale factor to use if rescaling the image. Can be overridden by `rescale_factor` in the `preprocess` method. | |
| do_normalize (`bool`, *optional*, defaults to `True`) : Whether to normalize the image. Can be overridden by `do_normalize` in the `preprocess` method. | |
| image_mean (`float` or `list[float]`, *optional*, defaults to `OPENAI_CLIP_MEAN`) : Mean to use if normalizing the image. This is a float or list of floats the length of the number of channels in the image. Can be overridden by the `image_mean` parameter in the `preprocess` method. | |
| image_std (`float` or `list[float]`, *optional*, defaults to `OPENAI_CLIP_STD`) : Standard deviation to use if normalizing the image. This is a float or list of floats the length of the number of channels in the image. Can be overridden by the `image_std` parameter in the `preprocess` method. | |
| do_convert_rgb (`bool`, *optional*, defaults to `True`) : Whether to convert the image to RGB. | |
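A small sketch of how these defaults play out, using a random array in place of a real photo (the 300x400 input size is an arbitrary assumption). With the default settings the shortest edge is resized to 224 and the result is center-cropped; the second call overrides `do_center_crop` for that call only.

```python
import numpy as np
from transformers import BitImageProcessor

# Random uint8 image in (height, width, channels) layout, standing in for a real photo.
image = np.random.randint(0, 256, size=(300, 400, 3), dtype=np.uint8)

processor = BitImageProcessor()

# Defaults: resize, center crop, rescale by 1/255, normalize.
pixel_values = processor(image, return_tensors="np")["pixel_values"]
print(pixel_values.shape)

# Per-call override: skip the center crop, so the aspect ratio is preserved.
uncropped = processor(image, do_center_crop=False, return_tensors="np")["pixel_values"]
print(uncropped.shape)
```

The override does not change the processor's stored settings, so later calls fall back to the defaults again.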
| ## BitImageProcessorFast[[transformers.BitImageProcessorFast]] | |
| #### transformers.BitImageProcessorFast[[transformers.BitImageProcessorFast]] | |
| [Source](https://github.com/huggingface/transformers/blob/vr_26617/src/transformers/models/bit/image_processing_bit_fast.py#L22) | |
| Constructs a BitImageProcessorFast image processor. | |
| #### preprocess[[transformers.BitImageProcessorFast.preprocess]] | |
| [Source](https://github.com/huggingface/transformers/blob/vr_26617/src/transformers/image_processing_utils_fast.py#L838) | |
| - **images** (`Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, list[PIL.Image.Image], list[numpy.ndarray], list[torch.Tensor]]`) -- | |
| Image to preprocess. Expects a single or batch of images with pixel values ranging from 0 to 255. If | |
| passing in images with pixel values between 0 and 1, set `do_rescale=False`. | |
| - **return_tensors** (`str` or [TensorType](/docs/transformers/pr_26617/en/internal/file_utils#transformers.TensorType), *optional*) -- | |
| Returns stacked tensors if set to `'pt'`, otherwise returns a list of tensors. | |
| - **`**kwargs`** ([ImagesKwargs](/docs/transformers/pr_26617/en/main_classes/processors#transformers.ImagesKwargs), *optional*) -- | |
| Additional image preprocessing options. Model-specific kwargs are listed above; see the TypedDict class | |
| for the complete list of supported arguments. | |
| **Parameters:** | |
| `**kwargs` ([ImagesKwargs](/docs/transformers/pr_26617/en/main_classes/processors#transformers.ImagesKwargs), *optional*) : Additional image preprocessing options. Model-specific kwargs are listed above; see the TypedDict class for the complete list of supported arguments. | |
| **Returns:** | |
| `BatchFeature` | |
| - **data** (`dict`) -- Dictionary of lists/arrays/tensors returned by the `__call__` method (`pixel_values`, etc.). | |
| - **tensor_type** (`Union[None, str, TensorType]`, *optional*) -- Pass a `tensor_type` to convert the returned lists into PyTorch or NumPy tensors at | |
| initialization. | |
| ## BitModel[[transformers.BitModel]] | |
| #### transformers.BitModel[[transformers.BitModel]] | |
| [Source](https://github.com/huggingface/transformers/blob/vr_26617/src/transformers/models/bit/modeling_bit.py#L651) | |
| The bare BiT model outputting raw hidden-states without any specific head on top. | |
| This model inherits from [PreTrainedModel](/docs/transformers/pr_26617/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| #### forward[[transformers.BitModel.forward]] | |
| [Source](https://github.com/huggingface/transformers/blob/vr_26617/src/transformers/models/bit/modeling_bit.py#L669) | |
| - **pixel_values** (`torch.Tensor` of shape `(batch_size, num_channels, image_size, image_size)`) -- | |
| The tensors corresponding to the input images. Pixel values can be obtained using | |
| [BitImageProcessorFast](/docs/transformers/pr_26617/en/model_doc/bit#transformers.BitImageProcessorFast). See `BitImageProcessorFast.__call__()` for details (`processor_class` uses | |
| [BitImageProcessorFast](/docs/transformers/pr_26617/en/model_doc/bit#transformers.BitImageProcessorFast) for processing images). | |
| - **output_hidden_states** (`bool`, *optional*) -- | |
| Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for | |
| more detail. | |
| - **return_dict** (`bool`, *optional*) -- | |
| Whether or not to return a [ModelOutput](/docs/transformers/pr_26617/en/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple. | |
| The [BitModel](/docs/transformers/pr_26617/en/model_doc/bit#transformers.BitModel) forward method overrides the `__call__` special method. | |
| Although the recipe for the forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while | |
| the latter silently ignores them. | |
| The returned `BaseModelOutputWithPoolingAndNoAttention` contains: | |
| - **last_hidden_state** (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`) -- Sequence of hidden-states at the output of the last layer of the model. | |
| - **pooler_output** (`torch.FloatTensor` of shape `(batch_size, hidden_size)`) -- Last layer hidden-state after a pooling operation on the spatial dimensions. | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, num_channels, height, width)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| **Parameters:** | |
| config ([BitConfig](/docs/transformers/pr_26617/en/model_doc/bit#transformers.BitConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/pr_26617/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| **Returns:** | |
| `BaseModelOutputWithPoolingAndNoAttention` or `tuple(torch.FloatTensor)` | |
| A `BaseModelOutputWithPoolingAndNoAttention` or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([BitConfig](/docs/transformers/pr_26617/en/model_doc/bit#transformers.BitConfig)) and inputs. | |
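To make the output structure concrete, here is a minimal sketch that runs a randomly initialized `BitModel` on random pixel values, so no checkpoint download is needed. The random input and the 224x224 resolution are assumptions for illustration; the printed activations carry no meaning, only their shapes do.

```python
import torch
from transformers import BitConfig, BitModel

# Default configuration, random weights (nothing is downloaded).
config = BitConfig()
model = BitModel(config).eval()

# Random stand-in for a preprocessed batch of one RGB image.
pixel_values = torch.randn(1, config.num_channels, 224, 224)

with torch.no_grad():
    outputs = model(pixel_values, output_hidden_states=True)

print(outputs.last_hidden_state.shape)  # feature map of the last stage
print(outputs.pooler_output.shape)      # spatially pooled features
print(len(outputs.hidden_states))       # embedding output plus per-stage outputs
```

For real predictions, load pretrained weights with `BitModel.from_pretrained("google/bit-50")` and preprocess images with the matching image processor.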
| ## BitForImageClassification[[transformers.BitForImageClassification]] | |
| #### transformers.BitForImageClassification[[transformers.BitForImageClassification]] | |
| [Source](https://github.com/huggingface/transformers/blob/vr_26617/src/transformers/models/bit/modeling_bit.py#L710) | |
| BiT Model with an image classification head on top (a linear layer on top of the pooled features), e.g. for | |
| ImageNet. | |
| This model inherits from [PreTrainedModel](/docs/transformers/pr_26617/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| #### forward[[transformers.BitForImageClassification.forward]] | |
| [Source](https://github.com/huggingface/transformers/blob/vr_26617/src/transformers/models/bit/modeling_bit.py#L723) | |
| - **pixel_values** (`torch.FloatTensor` of shape `(batch_size, num_channels, image_size, image_size)`, *optional*) -- | |
| The tensors corresponding to the input images. Pixel values can be obtained using | |
| [BitImageProcessorFast](/docs/transformers/pr_26617/en/model_doc/bit#transformers.BitImageProcessorFast). See `BitImageProcessorFast.__call__()` for details (`processor_class` uses | |
| [BitImageProcessorFast](/docs/transformers/pr_26617/en/model_doc/bit#transformers.BitImageProcessorFast) for processing images). | |
| - **labels** (`torch.LongTensor` of shape `(batch_size,)`, *optional*) -- | |
| Labels for computing the image classification/regression loss. Indices should be in `[0, ..., | |
| config.num_labels - 1]`. If `config.num_labels > 1` a classification loss is computed (Cross-Entropy). | |
| - **output_hidden_states** (`bool`, *optional*) -- | |
| Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for | |
| more detail. | |
| - **return_dict** (`bool`, *optional*) -- | |
| Whether or not to return a [ModelOutput](/docs/transformers/pr_26617/en/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple. | |
| The [BitForImageClassification](/docs/transformers/pr_26617/en/model_doc/bit#transformers.BitForImageClassification) forward method overrides the `__call__` special method. | |
| Although the recipe for the forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while | |
| the latter silently ignores them. | |
| The returned [ImageClassifierOutputWithNoAttention](/docs/transformers/pr_26617/en/main_classes/output#transformers.modeling_outputs.ImageClassifierOutputWithNoAttention) contains: | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Classification (or regression if `config.num_labels==1`) loss. | |
| - **logits** (`torch.FloatTensor` of shape `(batch_size, config.num_labels)`) -- Classification (or regression if `config.num_labels==1`) scores (before SoftMax). | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each stage) of shape `(batch_size, num_channels, height, width)`. Hidden-states (also | |
| called feature maps) of the model at the output of each stage. | |
| Example: | |
| ```python | |
| >>> from transformers import AutoImageProcessor, BitForImageClassification | |
| >>> import torch | |
| >>> from datasets import load_dataset | |
| >>> dataset = load_dataset("huggingface/cats-image") | |
| >>> image = dataset["test"]["image"][0] | |
| >>> image_processor = AutoImageProcessor.from_pretrained("google/bit-50") | |
| >>> model = BitForImageClassification.from_pretrained("google/bit-50") | |
| >>> inputs = image_processor(image, return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ...     logits = model(**inputs).logits | |
| >>> # model predicts one of the 1000 ImageNet classes | |
| >>> predicted_label = logits.argmax(-1).item() | |
| >>> print(model.config.id2label[predicted_label]) | |
| ... | |
| ``` | |
| **Parameters:** | |
| config ([BitConfig](/docs/transformers/pr_26617/en/model_doc/bit#transformers.BitConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/pr_26617/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| **Returns:** | |
| [ImageClassifierOutputWithNoAttention](/docs/transformers/pr_26617/en/main_classes/output#transformers.modeling_outputs.ImageClassifierOutputWithNoAttention) or `tuple(torch.FloatTensor)` | |
| An [ImageClassifierOutputWithNoAttention](/docs/transformers/pr_26617/en/main_classes/output#transformers.modeling_outputs.ImageClassifierOutputWithNoAttention) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([BitConfig](/docs/transformers/pr_26617/en/model_doc/bit#transformers.BitConfig)) and inputs. | |
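The `labels` argument can be sketched with a randomly initialized classifier. The five-class setup, the random inputs, and the label values below are assumptions for illustration only; passing `labels` alongside `pixel_values` makes the output include a loss next to the logits.

```python
import torch
from transformers import BitConfig, BitForImageClassification

# Hypothetical 5-class problem with random weights (nothing is downloaded).
config = BitConfig(num_labels=5)
model = BitForImageClassification(config).eval()

pixel_values = torch.randn(2, config.num_channels, 224, 224)
labels = torch.tensor([0, 3])  # one class index per image, each in [0, num_labels - 1]

with torch.no_grad():
    outputs = model(pixel_values=pixel_values, labels=labels)

print(outputs.loss)          # scalar classification loss
print(outputs.logits.shape)  # one score per class for each image
```

During fine-tuning the same call would run without `torch.no_grad()`, followed by `outputs.loss.backward()`.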