| # Big Transfer (BiT) | |
| ## Overview | |
| The BiT model was proposed in [Big Transfer (BiT): General Visual Representation Learning](https://huggingface.co/papers/1912.11370) by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby. | |
| BiT is a simple recipe for scaling up pre-training of [ResNet](resnet)-like architectures (specifically, ResNetv2). The method results in significant improvements for transfer learning. | |
| The abstract from the paper is the following: | |
| *Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision. We revisit the paradigm of pre-training on large supervised datasets and fine-tuning the model on a target task. We scale up pre-training, and propose a simple recipe that we call Big Transfer (BiT). By combining a few carefully selected components, and transferring using a simple heuristic, we achieve strong performance on over 20 datasets. BiT performs well across a surprisingly wide range of data regimes -- from 1 example per class to 1M total examples. BiT achieves 87.5% top-1 accuracy on ILSVRC-2012, 99.4% on CIFAR-10, and 76.3% on the 19 task Visual Task Adaptation Benchmark (VTAB). On small datasets, BiT attains 76.8% on ILSVRC-2012 with 10 examples per class, and 97.0% on CIFAR-10 with 10 examples per class. We conduct detailed analysis of the main components that lead to high transfer performance.* | |
| This model was contributed by [nielsr](https://huggingface.co/nielsr). | |
| The original code can be found [here](https://github.com/google-research/big_transfer). | |
| ## Usage tips | |
| - BiT models are equivalent to ResNetv2 in terms of architecture, except that: 1) all batch normalization layers are replaced by [group normalization](https://huggingface.co/papers/1803.08494), | |
| 2) [weight standardization](https://huggingface.co/papers/1903.10520) is used for convolutional layers. The authors show that the combination of both is useful for training with large batch sizes, and has a significant | |
| impact on transfer learning. | |
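The two architectural changes named above can be sketched in plain PyTorch. This is a minimal illustration, not the library's implementation: the names `standardize_weight` and `StdConv2d`, the `eps` value, and the channel sizes are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def standardize_weight(weight: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Standardize a conv kernel to zero mean and unit variance per output channel."""
    mean = weight.mean(dim=(1, 2, 3), keepdim=True)
    var = weight.var(dim=(1, 2, 3), keepdim=True, unbiased=False)
    return (weight - mean) / torch.sqrt(var + eps)


class StdConv2d(nn.Conv2d):
    """Hypothetical Conv2d variant that standardizes its kernel on every forward pass."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = standardize_weight(self.weight)
        return F.conv2d(x, w, self.bias, self.stride, self.padding, self.dilation, self.groups)


torch.manual_seed(0)
# Pre-activation ordering: group norm, then activation, then the standardized conv.
block = nn.Sequential(
    nn.GroupNorm(num_groups=8, num_channels=16),  # replaces batch normalization
    nn.ReLU(),
    StdConv2d(16, 32, kernel_size=3, padding=1, bias=False),
)

x = torch.randn(2, 16, 8, 8)
y = block(x)
w_std = standardize_weight(block[2].weight)
print(y.shape, w_std.mean(dim=(1, 2, 3)).abs().max().item())
```

Neither operation depends on batch statistics: group normalization normalizes over channel groups within each sample, and weight standardization acts on the kernel alone.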
| ## Resources | |
| A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with BiT. | |
| - [BitForImageClassification](/docs/transformers/main/en/model_doc/bit#transformers.BitForImageClassification) is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb). | |
| - See also: [Image classification task guide](../tasks/image_classification) | |
| If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource. | |
| ## BitConfig[[transformers.BitConfig]] | |
| #### transformers.BitConfig[[transformers.BitConfig]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bit/configuration_bit.py#L25) | |
| This is the configuration class to store the configuration of a BitModel. It is used to instantiate a Bit | |
| model according to the specified arguments, defining the model architecture. Instantiating a configuration with the | |
| defaults will yield a similar configuration to that of the BiT [google/bit-50](https://huggingface.co/google/bit-50) architecture. | |
| Configuration objects inherit from [PreTrainedConfig](/docs/transformers/main/en/main_classes/configuration#transformers.PreTrainedConfig) and can be used to control the model outputs. Read the | |
| documentation from [PreTrainedConfig](/docs/transformers/main/en/main_classes/configuration#transformers.PreTrainedConfig) for more information. | |
| Example: | |
| ```python | |
| >>> from transformers import BitConfig, BitModel | |
| >>> # Initializing a BiT bit-50 style configuration | |
| >>> configuration = BitConfig() | |
| >>> # Initializing a model (with random weights) from the bit-50 style configuration | |
| >>> model = BitModel(configuration) | |
| >>> # Accessing the model configuration | |
| >>> configuration = model.config | |
| ``` | |
| **Parameters:** | |
| num_channels (`int`, *optional*, defaults to `3`) : The number of input channels. | |
| embedding_size (`int`, *optional*, defaults to `64`) : Dimensionality of the embeddings and hidden states. | |
| hidden_sizes (`Union[list[int], tuple[int, ...]]`, *optional*, defaults to `(256, 512, 1024, 2048)`) : Dimensionality (hidden size) at each stage of the model. | |
| depths (`Union[list[int], tuple[int, ...]]`, *optional*, defaults to `(3, 4, 6, 3)`) : Depth (number of layers) for each stage. | |
| layer_type (`str`, *optional*, defaults to `"preactivation"`) : The layer type to use. Can be either `"preactivation"` or `"bottleneck"`. | |
| hidden_act (`str`, *optional*, defaults to `"relu"`) : The non-linear activation function used in each block. For example, `"gelu"`, `"relu"`, `"silu"`, etc. | |
| global_padding (`str`, *optional*) : Padding strategy to use for the convolutional layers. Can be either `"valid"`, `"same"`, or `None`. | |
| num_groups (`int`, *optional*, defaults to 32) : Number of groups used for the `BitGroupNormActivation` layers. | |
| drop_path_rate (`Union[float, int]`, *optional*, defaults to `0.0`) : The drop path rate for stochastic depth. | |
| embedding_dynamic_padding (`bool`, *optional*, defaults to `False`) : Whether or not to make use of dynamic padding for the embedding layer. | |
| output_stride (`int`, *optional*, defaults to `32`) : The ratio between the spatial resolution of the input and output feature maps. | |
| width_factor (`int`, *optional*, defaults to 1) : The width factor for the model. | |
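As a rough sketch of how these parameters interact, the snippet below builds a deliberately small, randomly initialized model. The specific values are arbitrary choices for illustration and do not correspond to any released checkpoint. It assumes that `num_groups` has to divide every channel count passed through group normalization, which is why it is lowered together with the channel sizes.

```python
from transformers import BitConfig, BitModel

# Hypothetical small configuration, chosen only to keep the example light.
config = BitConfig(
    embedding_size=32,       # channels produced by the stem
    hidden_sizes=[64, 128],  # output channels of each stage
    depths=[1, 1],           # number of layers in each stage
    num_groups=8,            # groups used by the group normalization layers
)

model = BitModel(config)  # random weights, nothing is downloaded
num_params = sum(p.numel() for p in model.parameters())
print(f"{len(config.depths)} stages, {num_params:,} parameters")
```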
| ## BitImageProcessor[[transformers.BitImageProcessor]] | |
| #### transformers.BitImageProcessor[[transformers.BitImageProcessor]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bit/image_processing_bit.py#L22) | |
| Constructs a BiT image processor. | |
| #### preprocess[[transformers.BitImageProcessor.preprocess]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/image_processing_utils.py#L382) | |
| `preprocess(images, *args, **kwargs)` | |
| - **images** (`Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, list[PIL.Image.Image], list[numpy.ndarray], list[torch.Tensor]]`) -- Image to preprocess. Expects a single image or a batch of images with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set `do_rescale=False`. | |
| - **return_tensors** (`str` or [TensorType](/docs/transformers/main/en/internal/file_utils#transformers.TensorType), *optional*) -- Returns stacked tensors if set to `'pt'`, otherwise returns a list of tensors. | |
| - **\*\*kwargs** ([ImagesKwargs](/docs/transformers/main/en/main_classes/processors#transformers.ImagesKwargs), *optional*) -- Additional image preprocessing options. Model-specific kwargs are listed above; see the TypedDict class for the complete list of supported arguments. | |
| **Parameters:** | |
| - **\*\*kwargs** ([ImagesKwargs](/docs/transformers/main/en/main_classes/processors#transformers.ImagesKwargs), *optional*) : Additional image preprocessing options. Model-specific kwargs are listed above; see the TypedDict class for the complete list of supported arguments. | |
| **Returns:** | |
| `BatchFeature` | |
| - **data** (`dict`) -- Dictionary of lists/arrays/tensors returned by the `__call__` method (`'pixel_values'`, etc.). | |
| - **tensor_type** (`Union[None, str, TensorType]`, *optional*) -- Set a `tensor_type` here to convert the lists of integers into PyTorch/NumPy tensors at initialization. | |
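The rescaling note above can be illustrated with a synthetic image. This is a sketch under assumptions: the processor is built with its default settings rather than loaded from a checkpoint, the image is random noise, and passing `do_rescale` at call time is assumed to be accepted as one of the preprocessing kwargs.

```python
import numpy as np
from transformers import BitImageProcessor

image_processor = BitImageProcessor()  # default settings, no checkpoint download

# Synthetic RGB image (height, width, channels) with uint8 values in [0, 255].
image_uint8 = np.random.randint(0, 256, size=(300, 400, 3), dtype=np.uint8)
inputs = image_processor(images=image_uint8, return_tensors="pt")

# The same image already scaled to [0, 1]: skip the rescaling step.
image_float = image_uint8.astype(np.float32) / 255.0
inputs_float = image_processor(images=image_float, do_rescale=False, return_tensors="pt")

print(inputs["pixel_values"].shape, inputs_float["pixel_values"].shape)
```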
| ## BitImageProcessorPil[[transformers.BitImageProcessorPil]] | |
| #### transformers.BitImageProcessorPil[[transformers.BitImageProcessorPil]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bit/image_processing_pil_bit.py#L22) | |
| Constructs a BitImageProcessorPil image processor. | |
| #### preprocess[[transformers.BitImageProcessorPil.preprocess]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/image_processing_utils.py#L382) | |
| `preprocess(images, *args, **kwargs)` | |
| - **images** (`Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, list[PIL.Image.Image], list[numpy.ndarray], list[torch.Tensor]]`) -- Image to preprocess. Expects a single image or a batch of images with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set `do_rescale=False`. | |
| - **return_tensors** (`str` or [TensorType](/docs/transformers/main/en/internal/file_utils#transformers.TensorType), *optional*) -- Returns stacked tensors if set to `'pt'`, otherwise returns a list of tensors. | |
| - **\*\*kwargs** ([ImagesKwargs](/docs/transformers/main/en/main_classes/processors#transformers.ImagesKwargs), *optional*) -- Additional image preprocessing options. Model-specific kwargs are listed above; see the TypedDict class for the complete list of supported arguments. | |
| **Parameters:** | |
| - **\*\*kwargs** ([ImagesKwargs](/docs/transformers/main/en/main_classes/processors#transformers.ImagesKwargs), *optional*) : Additional image preprocessing options. Model-specific kwargs are listed above; see the TypedDict class for the complete list of supported arguments. | |
| **Returns:** | |
| `BatchFeature` | |
| - **data** (`dict`) -- Dictionary of lists/arrays/tensors returned by the `__call__` method (`'pixel_values'`, etc.). | |
| - **tensor_type** (`Union[None, str, TensorType]`, *optional*) -- Set a `tensor_type` here to convert the lists of integers into PyTorch/NumPy tensors at initialization. | |
| ## BitModel[[transformers.BitModel]] | |
| #### transformers.BitModel[[transformers.BitModel]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bit/modeling_bit.py#L646) | |
| The bare Bit Model outputting raw hidden-states without any specific head on top. | |
| This model inherits from [PreTrainedModel](/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, | |
| etc.). | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| #### forward[[transformers.BitModel.forward]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bit/modeling_bit.py#L664) | |
| `forward(pixel_values: Tensor, output_hidden_states: bool | None = None, return_dict: bool | None = None, **kwargs)` | |
| - **pixel_values** (`torch.Tensor` of shape `(batch_size, num_channels, image_size, image_size)`) -- The tensors corresponding to the input images. Pixel values can be obtained using [BitImageProcessor](/docs/transformers/main/en/model_doc/bit#transformers.BitImageProcessor). See `BitImageProcessor.__call__()` for details (`processor_class` uses [BitImageProcessor](/docs/transformers/main/en/model_doc/bit#transformers.BitImageProcessor) for processing images). | |
| - **output_hidden_states** (`bool`, *optional*) -- Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more detail. | |
| - **return_dict** (`bool`, *optional*) -- Whether or not to return a [ModelOutput](/docs/transformers/main/en/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple. | |
| The [BitModel](/docs/transformers/main/en/model_doc/bit#transformers.BitModel) forward method overrides the `__call__` special method. | |
| Although the recipe for the forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while | |
| the latter silently ignores them. | |
| The returned output contains: | |
| - **last_hidden_state** (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`) -- Sequence of hidden-states at the output of the last layer of the model. | |
| - **pooler_output** (`torch.FloatTensor` of shape `(batch_size, hidden_size)`) -- Last layer hidden-state after a pooling operation on the spatial dimensions. | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, plus one for the output of each layer) of shape `(batch_size, num_channels, height, width)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| **Parameters:** | |
| config ([BitConfig](/docs/transformers/main/en/model_doc/bit#transformers.BitConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| **Returns:** | |
| `BaseModelOutputWithPoolingAndNoAttention` or `tuple(torch.FloatTensor)` | |
| A `BaseModelOutputWithPoolingAndNoAttention` or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([BitConfig](/docs/transformers/main/en/model_doc/bit#transformers.BitConfig)) and inputs. | |
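A minimal sketch of a `BitModel` forward pass with hidden states enabled follows. To stay self-contained it uses a small, randomly initialized model and a random tensor in place of real processor output. The configuration values are arbitrary assumptions, so the printed shapes are specific to this toy setup and not to `google/bit-50`.

```python
import torch
from transformers import BitConfig, BitModel

# Hypothetical small configuration with random weights (no checkpoint download).
config = BitConfig(embedding_size=32, hidden_sizes=[64, 128], depths=[1, 1], num_groups=8)
model = BitModel(config).eval()

# Random stand-in for the `pixel_values` an image processor would produce.
pixel_values = torch.randn(1, 3, 64, 64)

with torch.no_grad():
    outputs = model(pixel_values, output_hidden_states=True)

print("last_hidden_state:", tuple(outputs.last_hidden_state.shape))
print("pooler_output:", tuple(outputs.pooler_output.shape))
for index, feature_map in enumerate(outputs.hidden_states):
    print(f"hidden_states[{index}]:", tuple(feature_map.shape))
```

The channel dimension of `last_hidden_state` matches the last entry of `hidden_sizes`, while the spatial size depends on the input resolution and `output_stride`.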
| ## BitForImageClassification[[transformers.BitForImageClassification]] | |
| #### transformers.BitForImageClassification[[transformers.BitForImageClassification]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bit/modeling_bit.py#L705) | |
| BiT Model with an image classification head on top (a linear layer on top of the pooled features), e.g. for | |
| ImageNet. | |
| This model inherits from [PreTrainedModel](/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, | |
| etc.). | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| #### forward[[transformers.BitForImageClassification.forward]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bit/modeling_bit.py#L718) | |
| `forward(pixel_values: torch.FloatTensor | None = None, labels: torch.LongTensor | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, **kwargs)` | |
| - **pixel_values** (`torch.FloatTensor` of shape `(batch_size, num_channels, image_size, image_size)`, *optional*) -- The tensors corresponding to the input images. Pixel values can be obtained using [BitImageProcessor](/docs/transformers/main/en/model_doc/bit#transformers.BitImageProcessor). See `BitImageProcessor.__call__()` for details (`processor_class` uses [BitImageProcessor](/docs/transformers/main/en/model_doc/bit#transformers.BitImageProcessor) for processing images). | |
| - **labels** (`torch.LongTensor` of shape `(batch_size,)`, *optional*) -- Labels for computing the image classification/regression loss. Indices should be in `[0, ..., config.num_labels - 1]`. If `config.num_labels > 1` a classification loss is computed (Cross-Entropy). | |
| - **output_hidden_states** (`bool`, *optional*) -- Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more detail. | |
| - **return_dict** (`bool`, *optional*) -- Whether or not to return a [ModelOutput](/docs/transformers/main/en/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple. | |
| The [BitForImageClassification](/docs/transformers/main/en/model_doc/bit#transformers.BitForImageClassification) forward method overrides the `__call__` special method. | |
| Although the recipe for the forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while | |
| the latter silently ignores them. | |
| The returned output contains: | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Classification (or regression if `config.num_labels==1`) loss. | |
| - **logits** (`torch.FloatTensor` of shape `(batch_size, config.num_labels)`) -- Classification (or regression if `config.num_labels==1`) scores (before SoftMax). | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, plus one for the output of each stage) of shape `(batch_size, num_channels, height, width)`. Hidden-states (also called feature maps) of the model at the output of each stage. | |
| Example: | |
| ```python | |
| >>> from transformers import AutoImageProcessor, BitForImageClassification | |
| >>> import torch | |
| >>> from datasets import load_dataset | |
| >>> dataset = load_dataset("huggingface/cats-image") | |
| >>> image = dataset["test"]["image"][0] | |
| >>> image_processor = AutoImageProcessor.from_pretrained("google/bit-50") | |
| >>> model = BitForImageClassification.from_pretrained("google/bit-50") | |
| >>> inputs = image_processor(image, return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ... logits = model(**inputs).logits | |
| >>> # model predicts one of the 1000 ImageNet classes | |
| >>> predicted_label = logits.argmax(-1).item() | |
| >>> print(model.config.id2label[predicted_label]) | |
| ... | |
| ``` | |
| **Parameters:** | |
| config ([BitConfig](/docs/transformers/main/en/model_doc/bit#transformers.BitConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| **Returns:** | |
| [ImageClassifierOutputWithNoAttention](/docs/transformers/main/en/main_classes/output#transformers.modeling_outputs.ImageClassifierOutputWithNoAttention) or `tuple(torch.FloatTensor)` | |
| An [ImageClassifierOutputWithNoAttention](/docs/transformers/main/en/main_classes/output#transformers.modeling_outputs.ImageClassifierOutputWithNoAttention) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([BitConfig](/docs/transformers/main/en/model_doc/bit#transformers.BitConfig)) and inputs. | |
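To show how `labels` produce a loss, here is a hedged sketch of a single training-style step. It assumes a small, randomly initialized model with three classes; the configuration values, the random inputs, and the label values are placeholders for illustration only.

```python
import torch
from transformers import BitConfig, BitForImageClassification

# Hypothetical small configuration with 3 output classes and random weights.
config = BitConfig(
    embedding_size=32,
    hidden_sizes=[64, 128],
    depths=[1, 1],
    num_groups=8,
    num_labels=3,
)
model = BitForImageClassification(config)

pixel_values = torch.randn(2, 3, 64, 64)  # random stand-in for a batch of two images
labels = torch.tensor([0, 2])             # one class index per image

outputs = model(pixel_values=pixel_values, labels=labels)
outputs.loss.backward()  # gradients are now available for an optimizer step

print(tuple(outputs.logits.shape), outputs.loss.item())
```

Because `config.num_labels > 1`, the loss here is a cross-entropy loss over the three logits of each image.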