Buckets:
| # ALBERT[[albert]] | |
| [ALBERT](https://huggingface.co/papers/1909.11942)는 [BERT](./bert)의 확장성과 학습 시 메모리 한계를 해결하기 위해 설계된 모델입니다. 이 모델은 두 가지 파라미터 감소 기법을 도입합니다. 첫 번째는 임베딩 행렬 분해(factorized embedding parametrization)로, 큰 어휘 임베딩 행렬을 두 개의 작은 행렬로 분해하여 히든 사이즈를 늘려도 파라미터 수가 크게 증가하지 않도록 합니다. 두 번째는 계층 간 파라미터 공유(cross-layer parameter sharing)로, 여러 계층이 파라미터를 공유하여 학습해야 할 파라미터 수를 줄입니다. | |
| ALBERT는 BERT에서 발생하는 GPU/TPU 메모리 한계, 긴 학습 시간, 갑작스런 성능 저하 문제를 해결하기 위해 만들어졌습니다. ALBERT는 파라미터를 줄이기 위해 두 가지 기법을 사용하여 메모리 사용량을 줄이고 BERT의 학습 속도를 높입니다: | |
| - **임베딩 행렬 분해:** 큰 어휘 임베딩 행렬을 두 개의 더 작은 행렬로 분해하여 메모리 사용량을 줄입니다. | |
| - **계층 간 파라미터 공유:** 각 트랜스포머 계층마다 별도의 파라미터를 학습하는 대신, 여러 계층이 파라미터를 공유하여 학습해야 할 가중치 수를 더욱 줄입니다. | |
| ALBERT는 BERT와 마찬가지로 절대 위치 임베딩(absolute position embeddings)을 사용하므로, 입력 패딩은 오른쪽에 적용해야 합니다. 임베딩 크기는 128이며, BERT의 768보다 작습니다. ALBERT는 한 번에 최대 512개의 토큰을 처리할 수 있습니다. | |
| 모든 공식 ALBERT 체크포인트는 [ALBERT 커뮤니티](https://huggingface.co/albert) 조직에서 확인하실 수 있습니다. | |
| > [!TIP] | |
| > 오른쪽 사이드바의 ALBERT 모델을 클릭하시면 다양한 언어 작업에 ALBERT를 적용하는 예시를 더 확인하실 수 있습니다. | |
| 아래 예시는 [Pipeline](/docs/transformers/main/ko/main_classes/pipelines#transformers.Pipeline), [AutoModel](/docs/transformers/main/ko/model_doc/auto#transformers.AutoModel) 그리고 커맨드라인에서 `[MASK]` 토큰을 예측하는 방법을 보여줍니다. | |
| ```py | |
| import torch | |
| from transformers import pipeline | |
| pipeline = pipeline( | |
| task="fill-mask", | |
| model="albert-base-v2", | |
| dtype=torch.float16, | |
| device=0 | |
| ) | |
| pipeline("식물은 광합성이라고 알려진 과정을 통해 [MASK]를 생성합니다.", top_k=5) | |
| ``` | |
| ```py | |
| import torch | |
| from transformers import AutoModelForMaskedLM, AutoTokenizer | |
| tokenizer = AutoTokenizer.from_pretrained("albert/albert-base-v2") | |
| model = AutoModelForMaskedLM.from_pretrained( | |
| "albert/albert-base-v2", | |
| dtype=torch.float16, | |
| attn_implementation="sdpa", | |
| device_map="auto" | |
| ) | |
| prompt = "식물은 [MASK]이라고 알려진 과정을 통해 에너지를 생성합니다." | |
| inputs = tokenizer(prompt, return_tensors="pt").to(model.device) | |
| with torch.no_grad(): | |
| outputs = model(**inputs) | |
| mask_token_index = torch.where(inputs["input_ids"] == tokenizer.mask_token_id)[1] | |
| predictions = outputs.logits[0, mask_token_index] | |
| top_k = torch.topk(predictions, k=5).indices.tolist() | |
| for token_id in top_k[0]: | |
| print(f"예측: {tokenizer.decode([token_id])}") | |
| ``` | |
| ## 참고 사항[[notes]] | |
| - BERT는 절대 위치 임베딩을 사용하므로, 오른쪽에 입력이 패딩돼야 합니다. | |
| - 임베딩 크기 `E`는 히든 크기 `H`와 다릅니다. 임베딩은 문맥에 독립적(각 토큰마다 하나의 임베딩 벡터)이고, 은닉 상태는 문맥에 의존적(토큰 시퀀스마다 하나의 은닉 상태)입니다. 임베딩 행렬은 `V x E`(V: 어휘 크기)이므로, 일반적으로 `H >> E`가 더 논리적입니다. `E < H`일 때 모델 파라미터가 더 적어집니다. | |
| ## 참고 자료[[resources]] | |
| 아래 섹션의 자료들은 공식 Hugging Face 및 커뮤니티(🌎 표시) 자료로, AlBERT를 시작하는 데 도움이 됩니다. 여기에 추가할 자료가 있다면 Pull Request를 보내주세요! 기존 자료와 중복되지 않고 새로운 내용을 담고 있으면 좋습니다. | |
| - [AlbertForSequenceClassification](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertForSequenceClassification)은 이 [예제 스크립트](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification)에서 지원됩니다. | |
| - `TFAlbertForSequenceClassification`은 이 [예제 스크립트](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/text-classification)에서 지원됩니다. | |
| - `FlaxAlbertForSequenceClassification`은 이 [예제 스크립트](https://github.com/huggingface/transformers/tree/main/examples/flax/text-classification)와 [노트북](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification_flax.ipynb)에서 지원됩니다. | |
| - [텍스트 분류 작업 가이드](../tasks/sequence_classification)에서 모델 사용법을 확인하세요. | |
| - [AlbertForTokenClassification](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertForTokenClassification)은 이 [예제 스크립트](https://github.com/huggingface/transformers/tree/main/examples/pytorch/token-classification)에서 지원됩니다. | |
| - `TFAlbertForTokenClassification`은 이 [예제 스크립트](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/token-classification)와 [노트북](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb)에서 지원됩니다. | |
| - `FlaxAlbertForTokenClassification`은 이 [예제 스크립트](https://github.com/huggingface/transformers/tree/main/examples/flax/token-classification)에서 지원됩니다. | |
| - 🤗 Hugging Face의 [토큰 분류](https://huggingface.co/course/chapter7/2?fw=pt) 강좌 | |
| - [토큰 분류 작업 가이드](../tasks/token_classification)에서 모델 사용법을 확인하세요. | |
| - [AlbertForMaskedLM](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertForMaskedLM)은 이 [예제 스크립트](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling#robertabertdistilbert-and-masked-language-modeling)와 [노트북](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb)에서 지원됩니다. | |
| - `TFAlbertForMaskedLM`은 이 [예제 스크립트](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/language-modeling#run_mlmpy)와 [노트북](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb)에서 지원됩니다. | |
| - `FlaxAlbertForMaskedLM`은 이 [예제 스크립트](https://github.com/huggingface/transformers/tree/main/examples/flax/language-modeling#masked-language-modeling)와 [노트북](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/masked_language_modeling_flax.ipynb)에서 지원됩니다. | |
| - 🤗 Hugging Face의 [마스킹 언어 모델링](https://huggingface.co/course/chapter7/3?fw=pt) 강좌 | |
| - [마스킹 언어 모델링 작업 가이드](../tasks/masked_language_modeling)에서 모델 사용법을 확인하세요. | |
| - [AlbertForQuestionAnswering](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertForQuestionAnswering)은 이 [예제 스크립트](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering)와 [노트북](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb)에서 지원됩니다. | |
| - `TFAlbertForQuestionAnswering`은 이 [예제 스크립트](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/question-answering)와 [노트북](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering-tf.ipynb)에서 지원됩니다. | |
| - `FlaxAlbertForQuestionAnswering`은 이 [예제 스크립트](https://github.com/huggingface/transformers/tree/main/examples/flax/question-answering)에서 지원됩니다. | |
| - [질의응답](https://huggingface.co/course/chapter7/7?fw=pt) 🤗 Hugging Face 강좌의 챕터. | |
| - [질의응답 작업 가이드](../tasks/question_answering)에서 모델 사용법을 확인하세요. | |
| **다중 선택(Multiple choice)** | |
| - [AlbertForMultipleChoice](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertForMultipleChoice)는 이 [예제 스크립트](https://github.com/huggingface/transformers/tree/main/examples/pytorch/multiple-choice)와 [노트북](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb)에서 지원됩니다. | |
| - `TFAlbertForMultipleChoice`는 이 [예제 스크립트](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/multiple-choice)와 [노트북](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice-tf.ipynb)에서 지원됩니다. | |
| - [다중 선택 작업 가이드](../tasks/multiple_choice)에서 모델 사용법을 확인하세요. | |
| ## AlbertConfig[[albertconfig]][[transformers.AlbertConfig]] | |
| #### transformers.AlbertConfig[[transformers.AlbertConfig]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/albert/configuration_albert.py#L25) | |
| This is the configuration class to store the configuration of a AlbertModel. It is used to instantiate a Albert | |
| model according to the specified arguments, defining the model architecture. Instantiating a configuration with the | |
| defaults will yield a similar configuration to that of the [albert/albert-xxlarge-v2](https://huggingface.co/albert/albert-xxlarge-v2) | |
| Configuration objects inherit from [PreTrainedConfig](/docs/transformers/main/ko/main_classes/configuration#transformers.PreTrainedConfig) and can be used to control the model outputs. Read the | |
| documentation from [PreTrainedConfig](/docs/transformers/main/ko/main_classes/configuration#transformers.PreTrainedConfig) for more information. | |
| Examples: | |
| ```python | |
| >>> from transformers import AlbertConfig, AlbertModel | |
| >>> # Initializing an ALBERT-xxlarge style configuration | |
| >>> albert_xxlarge_configuration = AlbertConfig() | |
| >>> # Initializing an ALBERT-base style configuration | |
| >>> albert_base_configuration = AlbertConfig( | |
| ... hidden_size=768, | |
| ... num_attention_heads=12, | |
| ... intermediate_size=3072, | |
| ... ) | |
| >>> # Initializing a model (with random weights) from the ALBERT-base style configuration | |
| >>> model = AlbertModel(albert_xxlarge_configuration) | |
| >>> # Accessing the model configuration | |
| >>> configuration = model.config | |
| ``` | |
| **Parameters:** | |
| vocab_size (`int`, *optional*, defaults to `30000`) : Vocabulary size of the model. Defines the number of different tokens that can be represented by the `input_ids`. | |
| embedding_size (`int`, *optional*, defaults to `128`) : Dimensionality of the embeddings and hidden states. | |
| hidden_size (`int`, *optional*, defaults to `4096`) : Dimension of the hidden representations. | |
| num_hidden_layers (`int`, *optional*, defaults to `12`) : Number of hidden layers in the Transformer decoder. | |
| num_hidden_groups (`int`, *optional*, defaults to 1) : Number of groups for the hidden layers, parameters in the same group are shared. | |
| num_attention_heads (`int`, *optional*, defaults to `64`) : Number of attention heads for each attention layer in the Transformer decoder. | |
| intermediate_size (`int`, *optional*, defaults to `16384`) : Dimension of the MLP representations. | |
| inner_group_num (`int`, *optional*, defaults to 1) : The number of inner repetition of attention and ffn. | |
| hidden_act (`str`, *optional*, defaults to `gelu_new`) : The non-linear activation function (function or string) in the decoder. For example, `"gelu"`, `"relu"`, `"silu"`, etc. | |
| hidden_dropout_prob (`Union[int, float]`, *optional*, defaults to `0.0`) : The dropout probability for all fully connected layers in the embeddings, encoder, and pooler. | |
| attention_probs_dropout_prob (`Union[int, float]`, *optional*, defaults to `0.0`) : The dropout ratio for the attention probabilities. | |
| max_position_embeddings (`int`, *optional*, defaults to `512`) : The maximum sequence length that this model might ever be used with. | |
| type_vocab_size (`int`, *optional*, defaults to `2`) : The vocabulary size of the `token_type_ids`. | |
| initializer_range (`float`, *optional*, defaults to `0.02`) : The standard deviation of the truncated_normal_initializer for initializing all weight matrices. | |
| layer_norm_eps (`float`, *optional*, defaults to `1e-12`) : The epsilon used by the layer normalization layers. | |
| classifier_dropout_prob (`Union[int, float]`, *optional*, defaults to `0.1`) : The dropout ratio for classifier. | |
| pad_token_id (`int`, *optional*, defaults to `0`) : Token id used for padding in the vocabulary. | |
| bos_token_id (`int`, *optional*, defaults to `2`) : Token id used for beginning-of-stream in the vocabulary. | |
| eos_token_id (`Union[int, list[int]]`, *optional*, defaults to `3`) : Token id used for end-of-stream in the vocabulary. | |
| tie_word_embeddings (`bool`, *optional*, defaults to `True`) : Whether to tie weight embeddings according to model's `tied_weights_keys` mapping. | |
| ## AlbertTokenizer[[alberttokenizer]][[transformers.AlbertTokenizer]] | |
| #### transformers.AlbertTokenizer[[transformers.AlbertTokenizer]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/albert/tokenization_albert.py#L28) | |
| Construct a "fast" ALBERT tokenizer (backed by HuggingFace's *tokenizers* library). Based on | |
| [Unigram](https://huggingface.co/docs/tokenizers/python/latest/components.html?highlight=unigram#models). This | |
| tokenizer inherits from [PreTrainedTokenizerFast](/docs/transformers/main/ko/main_classes/tokenizer#transformers.TokenizersBackend) which contains most of the main methods. Users should refer to | |
| this superclass for more information regarding those methods | |
| get_special_tokens_masktransformers.AlbertTokenizer.get_special_tokens_maskhttps://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_base.py#L1318[{"name": "token_ids_0", "val": ": list[int]"}, {"name": "token_ids_1", "val": ": list[int] | None = None"}, {"name": "already_has_special_tokens", "val": ": bool = False"}]- **token_ids_0** -- List of IDs for the (possibly already formatted) sequence. | |
| - **token_ids_1** -- Unused when `already_has_special_tokens=True`. Must be None in that case. | |
| - **already_has_special_tokens** -- Whether the sequence is already formatted with special tokens.0A list of integers in the range [0, 1]1 for a special token, 0 for a sequence token. | |
| Retrieve sequence ids from a token list that has no special tokens added. | |
| For fast tokenizers, data collators call this with `already_has_special_tokens=True` to build a mask over an | |
| already-formatted sequence. In that case, we compute the mask by checking membership in `all_special_ids`. | |
| **Parameters:** | |
| do_lower_case (`bool`, *optional*, defaults to `True`) : Whether or not to lowercase the input when tokenizing. | |
| keep_accents (`bool`, *optional*, defaults to `False`) : Whether or not to keep accents when tokenizing. | |
| bos_token (`str`, *optional*, defaults to `"[CLS]"`) : The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. When building a sequence using special tokens, this is not the token that is used for the beginning of sequence. The token used is the `cls_token`. | |
| eos_token (`str`, *optional*, defaults to `"[SEP]"`) : The end of sequence token. .. note:: When building a sequence using special tokens, this is not the token that is used for the end of sequence. The token used is the `sep_token`. | |
| unk_token (`str`, *optional*, defaults to `"<unk>"`) : The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead. | |
| sep_token (`str`, *optional*, defaults to `"[SEP]"`) : The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for sequence classification or for a text and a question for question answering. It is also used as the last token of a sequence built with special tokens. | |
| pad_token (`str`, *optional*, defaults to `"<pad>"`) : The token used for padding, for example when batching sequences of different lengths. | |
| cls_token (`str`, *optional*, defaults to `"[CLS]"`) : The classifier token which is used when doing sequence classification (classification of the whole sequence instead of per-token classification). It is the first token of the sequence when built with special tokens. | |
| mask_token (`str`, *optional*, defaults to `"[MASK]"`) : The token used for masking values. This is the token used when training this model with masked language modeling. This is the token which the model will try to predict. | |
| add_prefix_space (`bool`, *optional*, defaults to `True`) : Whether or not to add an initial space to the input. This allows to treat the leading word just as any other word. | |
| trim_offsets (`bool`, *optional*, defaults to `True`) : Whether the post processing step should trim offsets to avoid including whitespaces. | |
| vocab (`str` or `list[tuple[str, float]]`, *optional*) : Custom vocabulary with `(token, score)` tuples. If not provided, vocabulary is loaded from `vocab_file`. | |
| vocab_file (`str`, *optional*) : [SentencePiece](https://github.com/google/sentencepiece) file (generally has a .model extension) that contains the vocabulary necessary to instantiate a tokenizer. | |
| **Returns:** | |
| `A list of integers in the range [0, 1]` | |
| 1 for a special token, 0 for a sequence token. | |
| #### save_vocabulary[[transformers.AlbertTokenizer.save_vocabulary]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_tokenizers.py#L509) | |
| ## AlbertTokenizerFast[[alberttokenizerfast]][[transformers.AlbertTokenizer]] | |
| #### transformers.AlbertTokenizer[[transformers.AlbertTokenizer]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/albert/tokenization_albert.py#L28) | |
| Construct a "fast" ALBERT tokenizer (backed by HuggingFace's *tokenizers* library). Based on | |
| [Unigram](https://huggingface.co/docs/tokenizers/python/latest/components.html?highlight=unigram#models). This | |
| tokenizer inherits from [PreTrainedTokenizerFast](/docs/transformers/main/ko/main_classes/tokenizer#transformers.TokenizersBackend) which contains most of the main methods. Users should refer to | |
| this superclass for more information regarding those methods | |
| **Parameters:** | |
| do_lower_case (`bool`, *optional*, defaults to `True`) : Whether or not to lowercase the input when tokenizing. | |
| keep_accents (`bool`, *optional*, defaults to `False`) : Whether or not to keep accents when tokenizing. | |
| bos_token (`str`, *optional*, defaults to `"[CLS]"`) : The beginning of sequence token that was used during pretraining. Can be used a sequence classifier token. When building a sequence using special tokens, this is not the token that is used for the beginning of sequence. The token used is the `cls_token`. | |
| eos_token (`str`, *optional*, defaults to `"[SEP]"`) : The end of sequence token. .. note:: When building a sequence using special tokens, this is not the token that is used for the end of sequence. The token used is the `sep_token`. | |
| unk_token (`str`, *optional*, defaults to `"<unk>"`) : The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead. | |
| sep_token (`str`, *optional*, defaults to `"[SEP]"`) : The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for sequence classification or for a text and a question for question answering. It is also used as the last token of a sequence built with special tokens. | |
| pad_token (`str`, *optional*, defaults to `"<pad>"`) : The token used for padding, for example when batching sequences of different lengths. | |
| cls_token (`str`, *optional*, defaults to `"[CLS]"`) : The classifier token which is used when doing sequence classification (classification of the whole sequence instead of per-token classification). It is the first token of the sequence when built with special tokens. | |
| mask_token (`str`, *optional*, defaults to `"[MASK]"`) : The token used for masking values. This is the token used when training this model with masked language modeling. This is the token which the model will try to predict. | |
| add_prefix_space (`bool`, *optional*, defaults to `True`) : Whether or not to add an initial space to the input. This allows to treat the leading word just as any other word. | |
| trim_offsets (`bool`, *optional*, defaults to `True`) : Whether the post processing step should trim offsets to avoid including whitespaces. | |
| vocab (`str` or `list[tuple[str, float]]`, *optional*) : Custom vocabulary with `(token, score)` tuples. If not provided, vocabulary is loaded from `vocab_file`. | |
| vocab_file (`str`, *optional*) : [SentencePiece](https://github.com/google/sentencepiece) file (generally has a .model extension) that contains the vocabulary necessary to instantiate a tokenizer. | |
| ## Albert 특화 출력[[albert-specific-outputs]][[transformers.models.albert.modeling_albert.AlbertForPreTrainingOutput]] | |
| #### transformers.models.albert.modeling_albert.AlbertForPreTrainingOutput[[transformers.models.albert.modeling_albert.AlbertForPreTrainingOutput]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/albert/modeling_albert.py#L332) | |
| Output type of [AlbertForPreTraining](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertForPreTraining). | |
| **Parameters:** | |
| loss (`*optional*`, returned when `labels` is provided, `torch.FloatTensor` of shape `(1,)`) : Total loss as the sum of the masked language modeling loss and the next sequence prediction (classification) loss. | |
| prediction_logits (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) : Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). | |
| sop_logits (`torch.FloatTensor` of shape `(batch_size, 2)`) : Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation before SoftMax). | |
| hidden_states (`tuple[torch.FloatTensor]`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) : Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| attentions (`tuple[torch.FloatTensor]`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) : Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads. | |
| loss (`*optional*`, returned when `labels` is provided, `torch.FloatTensor` of shape `(1,)`) : Total loss as the sum of the masked language modeling loss and the next sequence prediction (classification) loss. | |
| prediction_logits (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) : Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). | |
| sop_logits (`torch.FloatTensor` of shape `(batch_size, 2)`) : Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation before SoftMax). | |
| hidden_states (`tuple[torch.FloatTensor]`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) : Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| attentions (`tuple[torch.FloatTensor]`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) : Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads. | |
| ## AlbertModel[[albertmodel]][[transformers.AlbertModel]] | |
| #### transformers.AlbertModel[[transformers.AlbertModel]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/albert/modeling_albert.py#L352) | |
| The bare Albert Model outputting raw hidden-states without any specific head on top. | |
| This model inherits from [PreTrainedModel](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage | |
| and behavior. | |
| forwardtransformers.AlbertModel.forwardhttps://github.com/huggingface/transformers/blob/main/src/transformers/models/albert/modeling_albert.py#L384[{"name": "input_ids", "val": ": torch.LongTensor | None = None"}, {"name": "attention_mask", "val": ": torch.FloatTensor | None = None"}, {"name": "token_type_ids", "val": ": torch.LongTensor | None = None"}, {"name": "position_ids", "val": ": torch.LongTensor | None = None"}, {"name": "inputs_embeds", "val": ": torch.FloatTensor | None = None"}, {"name": "**kwargs", "val": ": typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs]"}]- **input_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/main/ko/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **token_type_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`: | |
| - 0 corresponds to a *sentence A* token, | |
| - 1 corresponds to a *sentence B* token. | |
| [What are token type IDs?](../glossary#token-type-ids) | |
| - **position_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0, config.n_positions - 1]`. | |
| [What are position IDs?](../glossary#position-ids) | |
| - **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix.0[BaseModelOutputWithPooling](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.BaseModelOutputWithPooling) or `tuple(torch.FloatTensor)`A [BaseModelOutputWithPooling](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.BaseModelOutputWithPooling) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) and inputs. | |
| The [AlbertModel](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertModel) forward method, overrides the `__call__` special method. | |
| Although the recipe for forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this since the former takes care of running the pre and post processing steps while | |
| the latter silently ignores them. | |
| - **last_hidden_state** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`) -- Sequence of hidden-states at the output of the last layer of the model. | |
| - **pooler_output** (`torch.FloatTensor` of shape `(batch_size, hidden_size)`) -- Last layer hidden-state of the first token of the sequence (classification token) after further processing | |
| through the layers used for the auxiliary pretraining task. E.g. for BERT-family of models, this returns | |
| the classification token after processing through a linear layer and a tanh activation function. The linear | |
| layer weights are trained from the next sentence prediction (classification) objective during pretraining. | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attentions weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads. | |
| **Parameters:** | |
| config ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| add_pooling_layer (`bool`, *optional*, defaults to `True`) : Whether to add a pooling layer | |
| **Returns:** | |
| `[BaseModelOutputWithPooling](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.BaseModelOutputWithPooling) or `tuple(torch.FloatTensor)`` | |
| A [BaseModelOutputWithPooling](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.BaseModelOutputWithPooling) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) and inputs. | |
| ## AlbertForPreTraining[[albertforpretraining]][[transformers.AlbertForPreTraining]] | |
| #### transformers.AlbertForPreTraining[[transformers.AlbertForPreTraining]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/albert/modeling_albert.py#L432) | |
| Albert Model with two heads on top as done during the pretraining: a `masked language modeling` head and a | |
| `sentence order prediction (classification)` head. | |
| This model inherits from [PreTrainedModel](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage | |
| and behavior. | |
| forwardtransformers.AlbertForPreTraining.forwardhttps://github.com/huggingface/transformers/blob/main/src/transformers/models/albert/modeling_albert.py#L457[{"name": "input_ids", "val": ": torch.LongTensor | None = None"}, {"name": "attention_mask", "val": ": torch.FloatTensor | None = None"}, {"name": "token_type_ids", "val": ": torch.LongTensor | None = None"}, {"name": "position_ids", "val": ": torch.LongTensor | None = None"}, {"name": "inputs_embeds", "val": ": torch.FloatTensor | None = None"}, {"name": "labels", "val": ": torch.LongTensor | None = None"}, {"name": "sentence_order_label", "val": ": torch.LongTensor | None = None"}, {"name": "**kwargs", "val": ": typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs]"}]- **input_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/main/ko/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **token_type_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`: | |
| - 0 corresponds to a *sentence A* token, | |
| - 1 corresponds to a *sentence B* token. | |
| [What are token type IDs?](../glossary#token-type-ids) | |
| - **position_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0, config.n_positions - 1]`. | |
| [What are position IDs?](../glossary#position-ids) | |
| - **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **labels** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ..., | |
| config.vocab_size]` (see `input_ids` docstring) Tokens with indices set to `-100` are ignored (masked), the | |
| loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]` | |
| - **sentence_order_label** (`torch.LongTensor` of shape `(batch_size,)`, *optional*) -- | |
| Labels for computing the next sequence prediction (classification) loss. Input should be a sequence pair | |
| (see `input_ids` docstring) Indices should be in `[0, 1]`. `0` indicates original order (sequence A, then | |
| sequence B), `1` indicates switched order (sequence B, then sequence A).0[AlbertForPreTrainingOutput](/docs/transformers/main/ko/model_doc/albert#transformers.models.albert.modeling_albert.AlbertForPreTrainingOutput) or `tuple(torch.FloatTensor)`A [AlbertForPreTrainingOutput](/docs/transformers/main/ko/model_doc/albert#transformers.models.albert.modeling_albert.AlbertForPreTrainingOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) and inputs. | |
| The [AlbertForPreTraining](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertForPreTraining) forward method, overrides the `__call__` special method. | |
| Although the recipe for forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this since the former takes care of running the pre and post processing steps while | |
| the latter silently ignores them. | |
| - **loss** (`*optional*`, returned when `labels` is provided, `torch.FloatTensor` of shape `(1,)`) -- Total loss as the sum of the masked language modeling loss and the next sequence prediction | |
| (classification) loss. | |
| - **prediction_logits** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) -- Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). | |
| - **sop_logits** (`torch.FloatTensor` of shape `(batch_size, 2)`) -- Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation | |
| before SoftMax). | |
| - **hidden_states** (`tuple[torch.FloatTensor]`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple[torch.FloatTensor]`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attentions weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads. | |
| Example: | |
| ```python | |
| >>> from transformers import AutoTokenizer, AlbertForPreTraining | |
| >>> import torch | |
| >>> tokenizer = AutoTokenizer.from_pretrained("albert/albert-base-v2") | |
| >>> model = AlbertForPreTraining.from_pretrained("albert/albert-base-v2") | |
| >>> input_ids = torch.tensor(tokenizer.encode("Hello, my dog is cute", add_special_tokens=True)).unsqueeze(0) | |
| >>> # Batch size 1 | |
| >>> outputs = model(input_ids) | |
| >>> prediction_logits = outputs.prediction_logits | |
| >>> sop_logits = outputs.sop_logits | |
| ``` | |
| **Parameters:** | |
| config ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| **Returns:** | |
| `[AlbertForPreTrainingOutput](/docs/transformers/main/ko/model_doc/albert#transformers.models.albert.modeling_albert.AlbertForPreTrainingOutput) or `tuple(torch.FloatTensor)`` | |
| A [AlbertForPreTrainingOutput](/docs/transformers/main/ko/model_doc/albert#transformers.models.albert.modeling_albert.AlbertForPreTrainingOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) and inputs. | |
| ## AlbertForMaskedLM[[albertformaskedlm]][[transformers.AlbertForMaskedLM]] | |
| #### transformers.AlbertForMaskedLM[[transformers.AlbertForMaskedLM]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/albert/modeling_albert.py#L562) | |
| The Albert Model with a `language modeling` head on top." | |
| This model inherits from [PreTrainedModel](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage | |
| and behavior. | |
| forwardtransformers.AlbertForMaskedLM.forwardhttps://github.com/huggingface/transformers/blob/main/src/transformers/models/albert/modeling_albert.py#L587[{"name": "input_ids", "val": ": torch.LongTensor | None = None"}, {"name": "attention_mask", "val": ": torch.FloatTensor | None = None"}, {"name": "token_type_ids", "val": ": torch.LongTensor | None = None"}, {"name": "position_ids", "val": ": torch.LongTensor | None = None"}, {"name": "inputs_embeds", "val": ": torch.FloatTensor | None = None"}, {"name": "labels", "val": ": torch.LongTensor | None = None"}, {"name": "**kwargs", "val": ": typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs]"}]- **input_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/main/ko/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **token_type_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`: | |
| - 0 corresponds to a *sentence A* token, | |
| - 1 corresponds to a *sentence B* token. | |
| [What are token type IDs?](../glossary#token-type-ids) | |
| - **position_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0, config.n_positions - 1]`. | |
| [What are position IDs?](../glossary#position-ids) | |
| - **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **labels** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ..., | |
| config.vocab_size]` (see `input_ids` docstring) Tokens with indices set to `-100` are ignored (masked), the | |
| loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`0[MaskedLMOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.MaskedLMOutput) or `tuple(torch.FloatTensor)`A [MaskedLMOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.MaskedLMOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) and inputs. | |
| The [AlbertForMaskedLM](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertForMaskedLM) forward method, overrides the `__call__` special method. | |
| Although the recipe for forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this since the former takes care of running the pre and post processing steps while | |
| the latter silently ignores them. | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Masked language modeling (MLM) loss. | |
| - **logits** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) -- Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attentions weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads. | |
| Example: | |
| ```python | |
| >>> import torch | |
| >>> from transformers import AutoTokenizer, AlbertForMaskedLM | |
| >>> tokenizer = AutoTokenizer.from_pretrained("albert/albert-base-v2") | |
| >>> model = AlbertForMaskedLM.from_pretrained("albert/albert-base-v2") | |
| >>> # add mask_token | |
| >>> inputs = tokenizer("The capital of [MASK] is Paris.", return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ... logits = model(**inputs).logits | |
| >>> # retrieve index of [MASK] | |
| >>> mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0] | |
| >>> predicted_token_id = logits[0, mask_token_index].argmax(axis=-1) | |
| >>> tokenizer.decode(predicted_token_id) | |
| 'france' | |
| ``` | |
| ```python | |
| >>> labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"] | |
| >>> labels = torch.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100) | |
| >>> outputs = model(**inputs, labels=labels) | |
| >>> round(outputs.loss.item(), 2) | |
| 0.81 | |
| ``` | |
| **Parameters:** | |
| config ([AlbertForMaskedLM](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertForMaskedLM)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| **Returns:** | |
| `[MaskedLMOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.MaskedLMOutput) or `tuple(torch.FloatTensor)`` | |
| A [MaskedLMOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.MaskedLMOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) and inputs. | |
| ## AlbertForSequenceClassification[[albertforsequenceclassification]][[transformers.AlbertForSequenceClassification]] | |
| #### transformers.AlbertForSequenceClassification[[transformers.AlbertForSequenceClassification]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/albert/modeling_albert.py#L666) | |
| Albert Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled | |
| output) e.g. for GLUE tasks. | |
| This model inherits from [PreTrainedModel](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage | |
| and behavior. | |
| forwardtransformers.AlbertForSequenceClassification.forwardhttps://github.com/huggingface/transformers/blob/main/src/transformers/models/albert/modeling_albert.py#L679[{"name": "input_ids", "val": ": torch.LongTensor | None = None"}, {"name": "attention_mask", "val": ": torch.FloatTensor | None = None"}, {"name": "token_type_ids", "val": ": torch.LongTensor | None = None"}, {"name": "position_ids", "val": ": torch.LongTensor | None = None"}, {"name": "inputs_embeds", "val": ": torch.FloatTensor | None = None"}, {"name": "labels", "val": ": torch.LongTensor | None = None"}, {"name": "**kwargs", "val": ": typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs]"}]- **input_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/main/ko/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **token_type_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`: | |
| - 0 corresponds to a *sentence A* token, | |
| - 1 corresponds to a *sentence B* token. | |
| [What are token type IDs?](../glossary#token-type-ids) | |
| - **position_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0, config.n_positions - 1]`. | |
| [What are position IDs?](../glossary#position-ids) | |
| - **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **labels** (`torch.LongTensor` of shape `(batch_size,)`, *optional*) -- | |
| Labels for computing the sequence classification/regression loss. Indices should be in `[0, ..., | |
| config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss), If | |
| `config.num_labels > 1` a classification loss is computed (Cross-Entropy).0[SequenceClassifierOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.SequenceClassifierOutput) or `tuple(torch.FloatTensor)`A [SequenceClassifierOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.SequenceClassifierOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) and inputs. | |
| The [AlbertForSequenceClassification](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertForSequenceClassification) forward method, overrides the `__call__` special method. | |
| Although the recipe for forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this since the former takes care of running the pre and post processing steps while | |
| the latter silently ignores them. | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Classification (or regression if config.num_labels==1) loss. | |
| - **logits** (`torch.FloatTensor` of shape `(batch_size, config.num_labels)`) -- Classification (or regression if config.num_labels==1) scores (before SoftMax). | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attentions weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads. | |
| Example of single-label classification: | |
| ```python | |
| >>> import torch | |
| >>> from transformers import AutoTokenizer, AlbertForSequenceClassification | |
| >>> tokenizer = AutoTokenizer.from_pretrained("albert/albert-xxlarge-v2") | |
| >>> model = AlbertForSequenceClassification.from_pretrained("albert/albert-xxlarge-v2") | |
| >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ... logits = model(**inputs).logits | |
| >>> predicted_class_id = logits.argmax().item() | |
| >>> model.config.id2label[predicted_class_id] | |
| ... | |
| >>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)` | |
| >>> num_labels = len(model.config.id2label) | |
| >>> model = AlbertForSequenceClassification.from_pretrained("albert/albert-xxlarge-v2", num_labels=num_labels) | |
| >>> labels = torch.tensor([1]) | |
| >>> loss = model(**inputs, labels=labels).loss | |
| >>> round(loss.item(), 2) | |
| ... | |
| ``` | |
| Example of multi-label classification: | |
| ```python | |
| >>> import torch | |
| >>> from transformers import AutoTokenizer, AlbertForSequenceClassification | |
| >>> tokenizer = AutoTokenizer.from_pretrained("albert/albert-xxlarge-v2") | |
| >>> model = AlbertForSequenceClassification.from_pretrained("albert/albert-xxlarge-v2", problem_type="multi_label_classification") | |
| >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ... logits = model(**inputs).logits | |
| >>> predicted_class_ids = torch.arange(0, logits.shape[-1])[torch.sigmoid(logits).squeeze(dim=0) > 0.5] | |
| >>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)` | |
| >>> num_labels = len(model.config.id2label) | |
| >>> model = AlbertForSequenceClassification.from_pretrained( | |
| ... "albert/albert-xxlarge-v2", num_labels=num_labels, problem_type="multi_label_classification" | |
| ... ) | |
| >>> labels = torch.sum( | |
| ... torch.nn.functional.one_hot(predicted_class_ids[None, :].clone(), num_classes=num_labels), dim=1 | |
| ... ).to(torch.float) | |
| >>> loss = model(**inputs, labels=labels).loss | |
| ``` | |
| **Parameters:** | |
| config ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| **Returns:** | |
| `[SequenceClassifierOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.SequenceClassifierOutput) or `tuple(torch.FloatTensor)`` | |
| A [SequenceClassifierOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.SequenceClassifierOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) and inputs. | |
| ## AlbertForMultipleChoice[[albertformultiplechoice]][[transformers.AlbertForMultipleChoice]] | |
| #### transformers.AlbertForMultipleChoice[[transformers.AlbertForMultipleChoice]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/albert/modeling_albert.py#L874) | |
| The Albert Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a | |
| softmax) e.g. for RocStories/SWAG tasks. | |
| This model inherits from [PreTrainedModel](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage | |
| and behavior. | |
| forwardtransformers.AlbertForMultipleChoice.forwardhttps://github.com/huggingface/transformers/blob/main/src/transformers/models/albert/modeling_albert.py#L885[{"name": "input_ids", "val": ": torch.LongTensor | None = None"}, {"name": "attention_mask", "val": ": torch.FloatTensor | None = None"}, {"name": "token_type_ids", "val": ": torch.LongTensor | None = None"}, {"name": "position_ids", "val": ": torch.LongTensor | None = None"}, {"name": "inputs_embeds", "val": ": torch.FloatTensor | None = None"}, {"name": "labels", "val": ": torch.LongTensor | None = None"}, {"name": "**kwargs", "val": ": typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs]"}]- **input_ids** (`torch.LongTensor` of shape `(batch_size, num_choices, sequence_length)`) -- | |
| Indices of input sequence tokens in the vocabulary. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/main/ko/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.__call__()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) and | |
| [PreTrainedTokenizer.encode()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **token_type_ids** (`torch.LongTensor` of shape `(batch_size, num_choices, sequence_length)`, *optional*) -- | |
| Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, | |
| 1]`: | |
| - 0 corresponds to a *sentence A* token, | |
| - 1 corresponds to a *sentence B* token. | |
| [What are token type IDs?](../glossary#token-type-ids) | |
| - **position_ids** (`torch.LongTensor` of shape `(batch_size, num_choices, sequence_length)`, *optional*) -- | |
| Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0, | |
| config.max_position_embeddings - 1]`. | |
| [What are position IDs?](../glossary#position-ids) | |
| - **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, num_choices, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **labels** (`torch.LongTensor` of shape `(batch_size,)`, *optional*) -- | |
| Labels for computing the multiple choice classification loss. Indices should be in `[0, ..., | |
| num_choices-1]` where *num_choices* is the size of the second dimension of the input tensors. (see | |
| *input_ids* above)0[AlbertForPreTrainingOutput](/docs/transformers/main/ko/model_doc/albert#transformers.models.albert.modeling_albert.AlbertForPreTrainingOutput) or `tuple(torch.FloatTensor)`A [AlbertForPreTrainingOutput](/docs/transformers/main/ko/model_doc/albert#transformers.models.albert.modeling_albert.AlbertForPreTrainingOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) and inputs. | |
| The [AlbertForMultipleChoice](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertForMultipleChoice) forward method, overrides the `__call__` special method. | |
| Although the recipe for forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this since the former takes care of running the pre and post processing steps while | |
| the latter silently ignores them. | |
| - **loss** (`*optional*`, returned when `labels` is provided, `torch.FloatTensor` of shape `(1,)`) -- Total loss as the sum of the masked language modeling loss and the next sequence prediction | |
| (classification) loss. | |
| - **prediction_logits** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) -- Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). | |
| - **sop_logits** (`torch.FloatTensor` of shape `(batch_size, 2)`) -- Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation | |
| before SoftMax). | |
| - **hidden_states** (`tuple[torch.FloatTensor]`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple[torch.FloatTensor]`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attentions weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads. | |
| Example: | |
| ```python | |
| >>> from transformers import AutoTokenizer, AlbertForMultipleChoice | |
| >>> import torch | |
| >>> tokenizer = AutoTokenizer.from_pretrained("albert/albert-xxlarge-v2") | |
| >>> model = AlbertForMultipleChoice.from_pretrained("albert/albert-xxlarge-v2") | |
| >>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced." | |
| >>> choice0 = "It is eaten with a fork and a knife." | |
| >>> choice1 = "It is eaten while held in the hand." | |
| >>> labels = torch.tensor(0).unsqueeze(0) # choice0 is correct (according to Wikipedia ;)), batch size 1 | |
| >>> encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors="pt", padding=True) | |
| >>> outputs = model(**{k: v.unsqueeze(0) for k, v in encoding.items()}, labels=labels) # batch size is 1 | |
| >>> # the linear classifier still needs to be trained | |
| >>> loss = outputs.loss | |
| >>> logits = outputs.logits | |
| ``` | |
| **Parameters:** | |
| config ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| **Returns:** | |
| `[AlbertForPreTrainingOutput](/docs/transformers/main/ko/model_doc/albert#transformers.models.albert.modeling_albert.AlbertForPreTrainingOutput) or `tuple(torch.FloatTensor)`` | |
| A [AlbertForPreTrainingOutput](/docs/transformers/main/ko/model_doc/albert#transformers.models.albert.modeling_albert.AlbertForPreTrainingOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) and inputs. | |
| ## AlbertForTokenClassification[[albertfortokenclassification]][[transformers.AlbertForTokenClassification]] | |
| #### transformers.AlbertForTokenClassification[[transformers.AlbertForTokenClassification]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/albert/modeling_albert.py#L744) | |
| The Albert transformer with a token classification head on top (a linear layer on top of the hidden-states | |
| output) e.g. for Named-Entity-Recognition (NER) tasks. | |
| This model inherits from [PreTrainedModel](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage | |
| and behavior. | |
| forwardtransformers.AlbertForTokenClassification.forwardhttps://github.com/huggingface/transformers/blob/main/src/transformers/models/albert/modeling_albert.py#L761[{"name": "input_ids", "val": ": torch.LongTensor | None = None"}, {"name": "attention_mask", "val": ": torch.FloatTensor | None = None"}, {"name": "token_type_ids", "val": ": torch.LongTensor | None = None"}, {"name": "position_ids", "val": ": torch.LongTensor | None = None"}, {"name": "inputs_embeds", "val": ": torch.FloatTensor | None = None"}, {"name": "labels", "val": ": torch.LongTensor | None = None"}, {"name": "**kwargs", "val": ": typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs]"}]- **input_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/main/ko/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **token_type_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`: | |
| - 0 corresponds to a *sentence A* token, | |
| - 1 corresponds to a *sentence B* token. | |
| [What are token type IDs?](../glossary#token-type-ids) | |
| - **position_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0, config.n_positions - 1]`. | |
| [What are position IDs?](../glossary#position-ids) | |
| - **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **labels** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`.0[TokenClassifierOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.TokenClassifierOutput) or `tuple(torch.FloatTensor)`A [TokenClassifierOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.TokenClassifierOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) and inputs. | |
| The [AlbertForTokenClassification](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertForTokenClassification) forward method, overrides the `__call__` special method. | |
| Although the recipe for forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this since the former takes care of running the pre and post processing steps while | |
| the latter silently ignores them. | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Classification loss. | |
| - **logits** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.num_labels)`) -- Classification scores (before SoftMax). | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attentions weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads. | |
| Example: | |
| ```python | |
| >>> from transformers import AutoTokenizer, AlbertForTokenClassification | |
| >>> import torch | |
| >>> tokenizer = AutoTokenizer.from_pretrained("albert/albert-xxlarge-v2") | |
| >>> model = AlbertForTokenClassification.from_pretrained("albert/albert-xxlarge-v2") | |
| >>> inputs = tokenizer( | |
| ... "HuggingFace is a company based in Paris and New York", add_special_tokens=False, return_tensors="pt" | |
| ... ) | |
| >>> with torch.no_grad(): | |
| ... logits = model(**inputs).logits | |
| >>> predicted_token_class_ids = logits.argmax(-1) | |
| >>> # Note that tokens are classified rather then input words which means that | |
| >>> # there might be more predicted token classes than words. | |
| >>> # Multiple token classes might account for the same word | |
| >>> predicted_tokens_classes = [model.config.id2label[t.item()] for t in predicted_token_class_ids[0]] | |
| >>> predicted_tokens_classes | |
| ... | |
| >>> labels = predicted_token_class_ids | |
| >>> loss = model(**inputs, labels=labels).loss | |
| >>> round(loss.item(), 2) | |
| ... | |
| ``` | |
| **Parameters:** | |
| config ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| **Returns:** | |
| `[TokenClassifierOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.TokenClassifierOutput) or `tuple(torch.FloatTensor)`` | |
| A [TokenClassifierOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.TokenClassifierOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) and inputs. | |
| ## AlbertForQuestionAnswering[[albertforquestionanswering]][[transformers.AlbertForQuestionAnswering]] | |
| #### transformers.AlbertForQuestionAnswering[[transformers.AlbertForQuestionAnswering]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/albert/modeling_albert.py#L806) | |
| The Albert transformer with a span classification head on top for extractive question-answering tasks like | |
| SQuAD (a linear layer on top of the hidden-states output to compute `span start logits` and `span end logits`). | |
| This model inherits from [PreTrainedModel](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage | |
| and behavior. | |
| forwardtransformers.AlbertForQuestionAnswering.forwardhttps://github.com/huggingface/transformers/blob/main/src/transformers/models/albert/modeling_albert.py#L817[{"name": "input_ids", "val": ": torch.LongTensor | None = None"}, {"name": "attention_mask", "val": ": torch.FloatTensor | None = None"}, {"name": "token_type_ids", "val": ": torch.LongTensor | None = None"}, {"name": "position_ids", "val": ": torch.LongTensor | None = None"}, {"name": "inputs_embeds", "val": ": torch.FloatTensor | None = None"}, {"name": "start_positions", "val": ": torch.LongTensor | None = None"}, {"name": "end_positions", "val": ": torch.LongTensor | None = None"}, {"name": "**kwargs", "val": ": typing_extensions.Unpack[transformers.utils.generic.TransformersKwargs]"}]- **input_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/main/ko/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **token_type_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`: | |
| - 0 corresponds to a *sentence A* token, | |
| - 1 corresponds to a *sentence B* token. | |
| [What are token type IDs?](../glossary#token-type-ids) | |
| - **position_ids** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0, config.n_positions - 1]`. | |
| [What are position IDs?](../glossary#position-ids) | |
| - **inputs_embeds** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **start_positions** (`torch.LongTensor` of shape `(batch_size,)`, *optional*) -- | |
| Labels for position (index) of the start of the labelled span for computing the token classification loss. | |
| Positions are clamped to the length of the sequence (`sequence_length`). Position outside of the sequence | |
| are not taken into account for computing the loss. | |
| - **end_positions** (`torch.LongTensor` of shape `(batch_size,)`, *optional*) -- | |
| Labels for position (index) of the end of the labelled span for computing the token classification loss. | |
| Positions are clamped to the length of the sequence (`sequence_length`). Position outside of the sequence | |
| are not taken into account for computing the loss.0[AlbertForPreTrainingOutput](/docs/transformers/main/ko/model_doc/albert#transformers.models.albert.modeling_albert.AlbertForPreTrainingOutput) or `tuple(torch.FloatTensor)`A [AlbertForPreTrainingOutput](/docs/transformers/main/ko/model_doc/albert#transformers.models.albert.modeling_albert.AlbertForPreTrainingOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) and inputs. | |
| The [AlbertForQuestionAnswering](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertForQuestionAnswering) forward method, overrides the `__call__` special method. | |
| Although the recipe for forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this since the former takes care of running the pre and post processing steps while | |
| the latter silently ignores them. | |
| - **loss** (`*optional*`, returned when `labels` is provided, `torch.FloatTensor` of shape `(1,)`) -- Total loss as the sum of the masked language modeling loss and the next sequence prediction | |
| (classification) loss. | |
| - **prediction_logits** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) -- Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). | |
| - **sop_logits** (`torch.FloatTensor` of shape `(batch_size, 2)`) -- Prediction scores of the next sequence prediction (classification) head (scores of True/False continuation | |
| before SoftMax). | |
| - **hidden_states** (`tuple[torch.FloatTensor]`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple[torch.FloatTensor]`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attentions weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads. | |
| Example: | |
| ```python | |
| >>> from transformers import AutoTokenizer, AlbertForQuestionAnswering | |
| >>> import torch | |
| >>> tokenizer = AutoTokenizer.from_pretrained("albert/albert-xxlarge-v2") | |
| >>> model = AlbertForQuestionAnswering.from_pretrained("albert/albert-xxlarge-v2") | |
| >>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet" | |
| >>> inputs = tokenizer(question, text, return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ... outputs = model(**inputs) | |
| >>> answer_start_index = outputs.start_logits.argmax() | |
| >>> answer_end_index = outputs.end_logits.argmax() | |
| >>> predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1] | |
| >>> tokenizer.decode(predict_answer_tokens, skip_special_tokens=True) | |
| ... | |
| >>> # target is "nice puppet" | |
| >>> target_start_index = torch.tensor([14]) | |
| >>> target_end_index = torch.tensor([15]) | |
| >>> outputs = model(**inputs, start_positions=target_start_index, end_positions=target_end_index) | |
| >>> loss = outputs.loss | |
| >>> round(loss.item(), 2) | |
| ... | |
| ``` | |
| **Parameters:** | |
| config ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| **Returns:** | |
| `[AlbertForPreTrainingOutput](/docs/transformers/main/ko/model_doc/albert#transformers.models.albert.modeling_albert.AlbertForPreTrainingOutput) or `tuple(torch.FloatTensor)`` | |
| A [AlbertForPreTrainingOutput](/docs/transformers/main/ko/model_doc/albert#transformers.models.albert.modeling_albert.AlbertForPreTrainingOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([AlbertConfig](/docs/transformers/main/ko/model_doc/albert#transformers.AlbertConfig)) and inputs. | |
Xet Storage Details
- Size:
- 83 kB
- Xet hash:
- 6b1949dfd8842856d65a2fae1ef2bd482c82e6887ab6077b56a7eb4ba84191aa
·
Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.