| # Big Transfer (BiT) | |
| ## Overview | |
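| The BiT model was proposed in [Big Transfer (BiT): General Visual Representation Learning](https://huggingface.co/papers/1912.11370) by Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, and Neil Houlsby. | |
| BiT is a simple recipe for scaling up pre-training of [ResNet](resnet)-like architectures (specifically, ResNetv2). The method results in significant improvements for transfer learning. | |
| The abstract from the paper is the following: | |
| *Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision. We revisit the paradigm of pre-training on large supervised datasets and fine-tuning the model on a target task. We scale up pre-training, and propose a simple recipe that we call Big Transfer (BiT). By combining a few carefully selected components, and transferring using a simple heuristic, we achieve strong performance on over 20 datasets. BiT performs well across a surprisingly wide range of data regimes, from 1 example per class to 1M total examples. BiT achieves 87.5% top-1 accuracy on ILSVRC-2012, 99.4% on CIFAR-10, and 76.3% on the 19 task Visual Task Adaptation Benchmark (VTAB). On small datasets, BiT attains 76.8% on ILSVRC-2012 with 10 examples per class, and 97.0% on CIFAR-10 with 10 examples per class. We conduct detailed analysis of the main components that lead to high transfer performance.* | |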
| ## Usage tips | |
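| - BiT models are equivalent to ResNetv2 in terms of architecture, except that: 1) all batch normalization layers are replaced by [group normalization](https://huggingface.co/papers/1803.08494), | |
| 2) [weight standardization](https://huggingface.co/papers/1903.10520) is used for convolutional layers. The authors show that the combination of both is useful for training with large batch sizes, | |
| and has a significant impact on transfer learning. | |
| This model was contributed by [nielsr](https://huggingface.co/nielsr). | |
| The original code can be found [here](https://github.com/google-research/big_transfer). | |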
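The two architectural changes named in the tip above can be illustrated with a small, dependency-free sketch. It is not the library's implementation: the function names, the `eps` values, and the sample numbers are assumptions chosen for illustration, and the learned scale and shift of group normalization are left out.

```python
import math


def standardize_filter(weights, eps=1e-8):
    """Weight standardization for one flattened conv filter: zero mean, unit variance."""
    mean = sum(weights) / len(weights)
    var = sum((w - mean) ** 2 for w in weights) / len(weights)
    return [(w - mean) / math.sqrt(var + eps) for w in weights]


def group_norm(channels, num_groups, eps=1e-5):
    """Group normalization for a single sample.

    `channels` is a list of per-channel activation lists. Channels are split
    into `num_groups` contiguous groups, and each group is normalized with
    its own mean and variance (no learned scale or shift in this sketch).
    """
    if len(channels) % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    group_size = len(channels) // num_groups
    normalized = []
    for start in range(0, len(channels), group_size):
        group = channels[start:start + group_size]
        values = [v for channel in group for v in channel]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        normalized.extend([[(v - mean) * scale for v in channel] for channel in group])
    return normalized


# One flattened convolution filter (hypothetical values).
filter_weights = [0.5, -1.0, 2.0, 0.1]
standardized = standardize_filter(filter_weights)

# Four channels with two activations each, normalized in two groups.
activations = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
normalized = group_norm(activations, num_groups=2)
```

Neither operation uses statistics computed across the batch, unlike batch normalization: weight standardization acts on the filter weights, and group normalization acts within one sample.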
| ## Resources | |
| A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with BiT. | |
| - [BitForImageClassification](/docs/transformers/main/ja/model_doc/bit#transformers.BitForImageClassification) is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/image-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/image_classification.ipynb). | |
| - See also: [Image classification task guide](../tasks/image_classification) | |
| If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it. The resource should ideally demonstrate something new instead of duplicating an existing resource. | |
| ## BitConfig[[transformers.BitConfig]] | |
| #### transformers.BitConfig[[transformers.BitConfig]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bit/configuration_bit.py#L25) | |
| This is the configuration class to store the configuration of a BitModel. It is used to instantiate a Bit | |
| model according to the specified arguments, defining the model architecture. Instantiating a configuration with the | |
| defaults will yield a similar configuration to that of the [google/bit-50](https://huggingface.co/google/bit-50) architecture. | |
| Configuration objects inherit from [PreTrainedConfig](/docs/transformers/main/ja/main_classes/configuration#transformers.PreTrainedConfig) and can be used to control the model outputs. Read the | |
| documentation from [PreTrainedConfig](/docs/transformers/main/ja/main_classes/configuration#transformers.PreTrainedConfig) for more information. | |
| Example: | |
| ```python | |
| >>> from transformers import BitConfig, BitModel | |
| >>> # Initializing a BiT bit-50 style configuration | |
| >>> configuration = BitConfig() | |
| >>> # Initializing a model (with random weights) from the bit-50 style configuration | |
| >>> model = BitModel(configuration) | |
| >>> # Accessing the model configuration | |
| >>> configuration = model.config | |
| ``` | |
| **Parameters:** | |
| num_channels (`int`, *optional*, defaults to `3`) : The number of input channels. | |
| embedding_size (`int`, *optional*, defaults to `64`) : Dimensionality of the embeddings and hidden states. | |
| hidden_sizes (`Union[list[int], tuple[int, ...]]`, *optional*, defaults to `(256, 512, 1024, 2048)`) : Dimensionality (hidden size) at each stage of the model. | |
| depths (`Union[list[int], tuple[int, ...]]`, *optional*, defaults to `(3, 4, 6, 3)`) : Depth (number of layers) of each stage of the model. | |
| layer_type (`str`, *optional*, defaults to `"preactivation"`) : The layer to use, it can be either `"preactivation"` or `"bottleneck"`. | |
| hidden_act (`str`, *optional*, defaults to `"relu"`) : The non-linear activation function (function or string) used in each block. For example, `"gelu"`, `"relu"`, `"silu"`, etc. | |
| global_padding (`str`, *optional*) : Padding strategy to use for the convolutional layers. Can be either `"valid"`, `"same"`, or `None`. | |
| num_groups (`int`, *optional*, defaults to `32`) : Number of groups used for the `BitGroupNormActivation` layers. | |
| drop_path_rate (`Union[float, int]`, *optional*, defaults to `0.0`) : The drop rate for stochastic depth. | |
| embedding_dynamic_padding (`bool`, *optional*, defaults to `False`) : Whether or not to make use of dynamic padding for the embedding layer. | |
| output_stride (`int`, *optional*, defaults to `32`) : The ratio between the spatial resolution of the input and output feature maps. | |
| width_factor (`int`, *optional*, defaults to `1`) : The width factor for the model. | |
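The `output_stride` parameter above is described as the ratio between input and output spatial resolution. A minimal sketch of that arithmetic follows. The helper name `feature_map_size` is hypothetical and not part of the library, and the sketch assumes a square input whose side is an exact multiple of the stride.

```python
def feature_map_size(image_size, output_stride=32):
    """Side length of the final feature map, assuming exact division."""
    if image_size % output_stride != 0:
        raise ValueError("this sketch assumes image_size is a multiple of output_stride")
    return image_size // output_stride


# With the default output_stride of 32, a 224x224 input maps to a 7x7 grid.
size_default = feature_map_size(224)
# A smaller stride keeps more spatial detail in the final feature map.
size_stride_16 = feature_map_size(224, output_stride=16)
```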
| ## BitImageProcessor[[transformers.BitImageProcessor]] | |
| #### transformers.BitImageProcessor[[transformers.BitImageProcessor]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bit/image_processing_bit.py#L22) | |
| Constructs a BitImageProcessor image processor. | |
| #### preprocess[[transformers.BitImageProcessor.preprocess]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/image_processing_utils.py#L382) | |
| `preprocess(images, *args, **kwargs)` | |
| **Parameters:** | |
| - **images** (`Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, list[PIL.Image.Image], list[numpy.ndarray], list[torch.Tensor]]`) : Image to preprocess. Expects a single image or a batch of images with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set `do_rescale=False`. | |
| - **return_tensors** (`str` or [TensorType](/docs/transformers/main/ja/internal/file_utils#transformers.TensorType), *optional*) : Returns stacked tensors if set to `'pt'`, otherwise returns a list of tensors. | |
| - **\*\*kwargs** (`ImagesKwargs`, *optional*) : Additional image preprocessing options. Model-specific kwargs are listed above; see the TypedDict class for the complete list of supported arguments. | |
| **Returns:** | |
| `~image_processing_base.BatchFeature` | |
| - **data** (`dict`) -- Dictionary of lists/arrays/tensors returned by the `__call__` method (`pixel_values`, etc.). | |
| - **tensor_type** (`Union[None, str, TensorType]`, *optional*) -- You can give a `tensor_type` here to convert the lists of integers into PyTorch/NumPy tensors at initialization. | |
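The note on `do_rescale` matters because rescaling an image that is already in the 0 to 1 range shrinks its values a second time. The sketch below illustrates the effect in plain Python. The helper names are hypothetical, and the factor of `1 / 255` is an assumption about the rescaling step rather than a value taken from this page.

```python
def rescale(pixels, factor=1 / 255):
    """Map pixel values in [0, 255] to floats in [0, 1] (assumed factor)."""
    return [p * factor for p in pixels]


def looks_unscaled(pixels):
    """Rough heuristic: a value above 1 suggests the 0-255 range.

    An all-black image would be misclassified, so treat this as a hint only.
    """
    return max(pixels) > 1


raw = [0, 128, 255]               # 0-255 input: rescaling is appropriate
already_scaled = [0.0, 0.5, 1.0]  # 0-1 input: rescaling again shrinks the values

scaled_once = rescale(raw)
scaled_twice = rescale(already_scaled)
```

Here `scaled_twice` ends up with a maximum near 0.004 instead of 1.0, which is the situation `do_rescale=False` is meant to avoid.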
| ## BitImageProcessorPil[[transformers.BitImageProcessorPil]] | |
| #### transformers.BitImageProcessorPil[[transformers.BitImageProcessorPil]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bit/image_processing_pil_bit.py#L22) | |
| Constructs a BitImageProcessorPil image processor. | |
| #### preprocess[[transformers.BitImageProcessorPil.preprocess]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/image_processing_utils.py#L382) | |
| `preprocess(images, *args, **kwargs)` | |
| **Parameters:** | |
| - **images** (`Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, list[PIL.Image.Image], list[numpy.ndarray], list[torch.Tensor]]`) : Image to preprocess. Expects a single image or a batch of images with pixel values ranging from 0 to 255. If passing in images with pixel values between 0 and 1, set `do_rescale=False`. | |
| - **return_tensors** (`str` or [TensorType](/docs/transformers/main/ja/internal/file_utils#transformers.TensorType), *optional*) : Returns stacked tensors if set to `'pt'`, otherwise returns a list of tensors. | |
| - **\*\*kwargs** (`ImagesKwargs`, *optional*) : Additional image preprocessing options. Model-specific kwargs are listed above; see the TypedDict class for the complete list of supported arguments. | |
| **Returns:** | |
| `~image_processing_base.BatchFeature` | |
| - **data** (`dict`) -- Dictionary of lists/arrays/tensors returned by the `__call__` method (`pixel_values`, etc.). | |
| - **tensor_type** (`Union[None, str, TensorType]`, *optional*) -- You can give a `tensor_type` here to convert the lists of integers into PyTorch/NumPy tensors at initialization. | |
| ## BitModel[[transformers.BitModel]] | |
| #### transformers.BitModel[[transformers.BitModel]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bit/modeling_bit.py#L646) | |
| The bare Bit Model outputting raw hidden-states without any specific head on top. | |
| This model inherits from [PreTrainedModel](/docs/transformers/main/ja/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, | |
| etc.). | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| #### forward[[transformers.BitModel.forward]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bit/modeling_bit.py#L664) | |
| `forward(pixel_values: Tensor, output_hidden_states: bool | None = None, return_dict: bool | None = None, **kwargs)` | |
| - **pixel_values** (`torch.Tensor` of shape `(batch_size, num_channels, image_size, image_size)`) -- The tensors corresponding to the input images. Pixel values can be obtained using [BitImageProcessor](/docs/transformers/main/ja/model_doc/bit#transformers.BitImageProcessor). See `BitImageProcessor.__call__()` for details (`processor_class` uses [BitImageProcessor](/docs/transformers/main/ja/model_doc/bit#transformers.BitImageProcessor) for processing images). | |
| - **output_hidden_states** (`bool`, *optional*) -- Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more detail. | |
| - **return_dict** (`bool`, *optional*) -- Whether or not to return a [ModelOutput](/docs/transformers/main/ja/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple. | |
| The [BitModel](/docs/transformers/main/ja/model_doc/bit#transformers.BitModel) forward method overrides the `__call__` special method. | |
| Although the recipe for the forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while | |
| the latter silently ignores them. | |
| The returned output contains the following fields: | |
| - **last_hidden_state** (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`) -- Sequence of hidden-states at the output of the last layer of the model. | |
| - **pooler_output** (`torch.FloatTensor` of shape `(batch_size, hidden_size)`) -- Last layer hidden-state after a pooling operation on the spatial dimensions. | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, plus one for the output of each layer) of shape `(batch_size, num_channels, height, width)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| **Parameters:** | |
| config ([BitConfig](/docs/transformers/main/ja/model_doc/bit#transformers.BitConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/ja/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| **Returns:** | |
| `BaseModelOutputWithPoolingAndNoAttention` or `tuple(torch.FloatTensor)` | |
| A `BaseModelOutputWithPoolingAndNoAttention` or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([BitConfig](/docs/transformers/main/ja/model_doc/bit#transformers.BitConfig)) and inputs. | |
| ## BitForImageClassification[[transformers.BitForImageClassification]] | |
| #### transformers.BitForImageClassification[[transformers.BitForImageClassification]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bit/modeling_bit.py#L705) | |
| BiT Model with an image classification head on top (a linear layer on top of the pooled features), e.g. for | |
| ImageNet. | |
| This model inherits from [PreTrainedModel](/docs/transformers/main/ja/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, | |
| etc.). | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| #### forward[[transformers.BitForImageClassification.forward]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bit/modeling_bit.py#L718) | |
| `forward(pixel_values: torch.FloatTensor | None = None, labels: torch.LongTensor | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, **kwargs)` | |
| - **pixel_values** (`torch.FloatTensor` of shape `(batch_size, num_channels, image_size, image_size)`, *optional*) -- The tensors corresponding to the input images. Pixel values can be obtained using [BitImageProcessor](/docs/transformers/main/ja/model_doc/bit#transformers.BitImageProcessor). See `BitImageProcessor.__call__()` for details (`processor_class` uses [BitImageProcessor](/docs/transformers/main/ja/model_doc/bit#transformers.BitImageProcessor) for processing images). | |
| - **labels** (`torch.LongTensor` of shape `(batch_size,)`, *optional*) -- Labels for computing the image classification/regression loss. Indices should be in `[0, ..., config.num_labels - 1]`. If `config.num_labels > 1`, a classification loss is computed (Cross-Entropy). | |
| - **output_hidden_states** (`bool`, *optional*) -- Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more detail. | |
| - **return_dict** (`bool`, *optional*) -- Whether or not to return a [ModelOutput](/docs/transformers/main/ja/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple. | |
| The [BitForImageClassification](/docs/transformers/main/ja/model_doc/bit#transformers.BitForImageClassification) forward method overrides the `__call__` special method. | |
| Although the recipe for the forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while | |
| the latter silently ignores them. | |
| The returned output contains the following fields: | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Classification (or regression if `config.num_labels==1`) loss. | |
| - **logits** (`torch.FloatTensor` of shape `(batch_size, config.num_labels)`) -- Classification (or regression if `config.num_labels==1`) scores (before SoftMax). | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, plus one for the output of each stage) of shape `(batch_size, num_channels, height, width)`. Hidden-states (also called feature maps) of the model at the output of each stage. | |
| Example: | |
| ```python | |
| >>> from transformers import AutoImageProcessor, BitForImageClassification | |
| >>> import torch | |
| >>> from datasets import load_dataset | |
| >>> dataset = load_dataset("huggingface/cats-image") | |
| >>> image = dataset["test"]["image"][0] | |
| >>> image_processor = AutoImageProcessor.from_pretrained("google/bit-50") | |
| >>> model = BitForImageClassification.from_pretrained("google/bit-50") | |
| >>> inputs = image_processor(image, return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ...     logits = model(**inputs).logits | |
| >>> # model predicts one of the 1000 ImageNet classes | |
| >>> predicted_label = logits.argmax(-1).item() | |
| >>> print(model.config.id2label[predicted_label]) | |
| ... | |
| ``` | |
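Because the returned `logits` are scores before SoftMax, it can help to turn them into probabilities and look at more than the single best class. The following is a dependency-free sketch of that post-processing step. The helper names, the four logits, and the label map are hypothetical stand-ins for `model(**inputs).logits` and `model.config.id2label`.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical logits and label map for a 4-class model.
example_logits = [1.2, 0.3, 4.5, -0.7]
example_id2label = {0: "tabby", 1: "tiger cat", 2: "Egyptian cat", 3: "lynx"}
predictions = top_k(example_logits, example_id2label, k=2)
```

With the real model from the example above, one option would be to pass `logits[0].tolist()` and `model.config.id2label` to `top_k`.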
| **Parameters:** | |
| config ([BitConfig](/docs/transformers/main/ja/model_doc/bit#transformers.BitConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/ja/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| **Returns:** | |
| [ImageClassifierOutputWithNoAttention](/docs/transformers/main/ja/main_classes/output#transformers.modeling_outputs.ImageClassifierOutputWithNoAttention) or `tuple(torch.FloatTensor)` | |
| An [ImageClassifierOutputWithNoAttention](/docs/transformers/main/ja/main_classes/output#transformers.modeling_outputs.ImageClassifierOutputWithNoAttention) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([BitConfig](/docs/transformers/main/ja/model_doc/bit#transformers.BitConfig)) and inputs. | |