| # DeBERTa-v2 | |
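| ## Overview | |
| The DeBERTa model was proposed in [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://huggingface.co/papers/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao and Weizhu Chen. It is based on Google's BERT model released in 2018 and Facebook's RoBERTa model released in 2019. | |
| DeBERTa improves on RoBERTa with disentangled attention and enhanced mask decoder training, while using only half of the data used for RoBERTa. | |
| The abstract from the paper is the following: | |
| *Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks. In this paper we propose a new model architecture, DeBERTa, that improves the BERT and RoBERTa models using two novel techniques. The first is the disentangled attention mechanism, where each word is represented using two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency of model pretraining and the performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9% (90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and on RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.* | |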
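The following is a minimal single-head sketch of the disentangled attention score from the abstract, written in plain Python. It assumes the paper's formulation:

- The score between tokens i and j is the sum of a content-to-content, a content-to-position and a position-to-content term.
- The sum is scaled by 1/sqrt(3d).
- Relative-position vectors are looked up through a clipped distance δ(i, j) with window k.

The helper names are hypothetical. Projections, masking, softmax and the v2 log buckets are left out, so this is an illustration rather than the Transformers implementation.

```python
import math

def relative_index(i: int, j: int, k: int) -> int:
    """Clip the distance i - j into a row index in [0, 2k - 1] (delta(i, j) in the paper)."""
    d = i - j
    if d <= -k:
        return 0
    if d >= k:
        return 2 * k - 1
    return d + k

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def disentangled_scores(qc, kc, qr, kr, k):
    """Unnormalized scores for one head.

    qc, kc: content query/key vectors, one per token (n x d).
    qr, kr: relative-position query/key vectors, one per clipped distance (2k x d).
    """
    n, d = len(qc), len(qc[0])
    scale = 1.0 / math.sqrt(3 * d)  # three terms are summed, hence 3d
    scores = []
    for i in range(n):
        row = []
        for j in range(n):
            c2c = dot(qc[i], kc[j])                        # content-to-content
            c2p = dot(qc[i], kr[relative_index(i, j, k)])  # content-to-position
            p2c = dot(kc[j], qr[relative_index(j, i, k)])  # position-to-content
            row.append((c2c + c2p + p2c) * scale)
        scores.append(row)
    return scores

qc = kc = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0] for _ in range(4)]  # 2k rows for k = 2
print(round(disentangled_scores(qc, kc, zeros, zeros, k=2)[0][0], 4))  # 0.4082
```

Setting every relative-position vector to zero makes the c2p and p2c terms vanish. Only the scaled content score remains, which makes the decomposition easy to check by hand.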
| The following information is available in the [original implementation repository](https://github.com/microsoft/DeBERTa). DeBERTa v2 is the second version of the DeBERTa model. | |
| DeBERTa v2 includes the 1.5B model used for the SuperGLUE single-model submission, which scored 89.9 against the human baseline of 89.8. More details are available in the authors' [blog](https://www.microsoft.com/en-us/research/blog/microsoft-deberta-surpasses-human-performance-on-the-superglue-benchmark/). | |
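| New in v2: | |
| - **Vocabulary** In v2 the tokenizer was changed to use a new 128K vocabulary built from the training data. Instead of a GPT2-based tokenizer, it is now a [SentencePiece-based](https://github.com/google/sentencepiece) tokenizer. | |
| - **nGiE (nGram Induced Input Encoding)** The DeBERTa-v2 model uses an additional convolution layer alongside the first transformer layer to better learn the local dependencies of input tokens. | |
| - **Sharing the position projection matrix with the content projection matrix in the attention layer** Based on previous experiments, this saves parameters without affecting performance. | |
| - **Applying buckets to encode relative positions** Similar to T5, the DeBERTa-v2 model uses log buckets to encode relative positions. | |
| - **900M and 1.5B models** Two additional model sizes, 900M and 1.5B, are available and significantly improve performance on downstream tasks. | |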
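The log-bucket idea from the list above can be sketched as follows. This is our reading of the DeBERTa-v2 scheme, not the library code. Offsets whose magnitude is at most half the bucket count keep their exact value. Larger offsets are compressed logarithmically, so that the largest distance, `max_position - 1`, lands in bucket `bucket_size - 1`. The function name and the example sizes (256 buckets, 512 positions) are assumptions chosen for illustration.

```python
import math

def log_bucket_position(rel_pos: int, bucket_size: int, max_position: int) -> int:
    """Map a signed relative position to a signed, log-scaled bucket index."""
    mid = bucket_size // 2
    if abs(rel_pos) <= mid:
        return rel_pos  # nearby offsets are kept exactly
    sign = 1 if rel_pos > 0 else -1
    # Distances in (mid, max_position - 1] are squeezed into (mid, bucket_size - 1].
    scaled = math.log(abs(rel_pos) / mid) / math.log((max_position - 1) / mid) * (mid - 1)
    return sign * (math.ceil(scaled) + mid)

print([log_bucket_position(r, 256, 512) for r in (5, -5, 128, 129, 511, -511)])
# [5, -5, 128, 129, 255, -255]
```

Exact offsets near zero preserve fine-grained local order. The logarithmic tail means far-apart tokens share buckets, which keeps the relative-position table small.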
| The TensorFlow 2.0 implementation of the [DeBERTa](https://huggingface.co/DeBERTa) model was contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/microsoft/DeBERTa). | |
| ## Resources | |
| - [Text classification task guide](../tasks/sequence_classification) | |
| - [Token classification task guide](../tasks/token_classification) | |
| - [Question answering task guide](../tasks/question_answering) | |
| - [Masked language modeling task guide](../tasks/masked_language_modeling) | |
| - [Multiple choice task guide](../tasks/multiple_choice) | |
| ## DebertaV2Config[[transformers.DebertaV2Config]] | |
| #### transformers.DebertaV2Config[[transformers.DebertaV2Config]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/deberta_v2/configuration_deberta_v2.py#L24) | |
| This is the configuration class to store the configuration of a [DebertaV2Model](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2Model). It is used to instantiate a DeBERTa-v2 | |
| model according to the specified arguments, defining the model architecture. Instantiating a configuration with the | |
| defaults will yield a similar configuration to that of the [microsoft/deberta-v2-xlarge](https://huggingface.co/microsoft/deberta-v2-xlarge) architecture. | |
| Configuration objects inherit from [PreTrainedConfig](/docs/transformers/main/ko/main_classes/configuration#transformers.PreTrainedConfig) and can be used to control the model outputs. Read the | |
| documentation from [PreTrainedConfig](/docs/transformers/main/ko/main_classes/configuration#transformers.PreTrainedConfig) for more information. | |
| Example: | |
| ```python | |
| >>> from transformers import DebertaV2Config, DebertaV2Model | |
| >>> # Initializing a DeBERTa-v2 microsoft/deberta-v2-xlarge style configuration | |
| >>> configuration = DebertaV2Config() | |
| >>> # Initializing a model (with random weights) from the microsoft/deberta-v2-xlarge style configuration | |
| >>> model = DebertaV2Model(configuration) | |
| >>> # Accessing the model configuration | |
| >>> configuration = model.config | |
| ``` | |
| **Parameters:** | |
| vocab_size (`int`, *optional*, defaults to `128100`) : Vocabulary size of the model. Defines the number of different tokens that can be represented by the `input_ids`. | |
| hidden_size (`int`, *optional*, defaults to `1536`) : Dimension of the hidden representations. | |
| num_hidden_layers (`int`, *optional*, defaults to `24`) : Number of hidden layers in the Transformer encoder. | |
| num_attention_heads (`int`, *optional*, defaults to `24`) : Number of attention heads for each attention layer in the Transformer encoder. | |
| intermediate_size (`int`, *optional*, defaults to `6144`) : Dimension of the MLP representations. | |
| hidden_act (`str`, *optional*, defaults to `"gelu"`) : The non-linear activation function (function or string) in the encoder. For example, `"gelu"`, `"relu"`, `"silu"`, etc. | |
| hidden_dropout_prob (`Union[float, int]`, *optional*, defaults to `0.1`) : The dropout probability for all fully connected layers in the embeddings, encoder, and pooler. | |
| attention_probs_dropout_prob (`Union[float, int]`, *optional*, defaults to `0.1`) : The dropout ratio for the attention probabilities. | |
| max_position_embeddings (`int`, *optional*, defaults to `512`) : The maximum sequence length that this model might ever be used with. | |
| type_vocab_size (`int`, *optional*, defaults to `0`) : The vocabulary size of the `token_type_ids`. | |
| initializer_range (`float`, *optional*, defaults to `0.02`) : The standard deviation of the truncated_normal_initializer for initializing all weight matrices. | |
| layer_norm_eps (`float`, *optional*, defaults to `1e-07`) : The epsilon used by the layer normalization layers. | |
| relative_attention (`bool`, *optional*, defaults to `True`) : Whether to use relative position encoding. | |
| max_relative_positions (`int`, *optional*, defaults to `-1`) : The range of relative positions `[-max_position_embeddings, max_position_embeddings]`. Use the same value as `max_position_embeddings`; a value below 1, such as the default `-1`, falls back to `max_position_embeddings`. | |
| pad_token_id (`int`, *optional*, defaults to `0`) : Token id used for padding in the vocabulary. | |
| bos_token_id (`int`, *optional*) : Token id used for beginning-of-stream in the vocabulary. | |
| eos_token_id (`Union[int, list[int]]`, *optional*) : Token id used for end-of-stream in the vocabulary. | |
| position_biased_input (`bool`, *optional*, defaults to `True`) : Whether to add absolute position embeddings to the content embeddings. | |
| pos_att_type (`list[str]`, *optional*) : The type of relative position attention. It can be any combination of `["p2c", "c2p"]`, e.g. `["p2c"]`, `["c2p"]` or `["p2c", "c2p"]`. | |
| pooler_dropout (`float`, *optional*, defaults to `0`) : Dropout rate in the pooler module. | |
| pooler_hidden_act (`str`, *optional*, defaults to `"gelu"`) : Activation function used in the pooler module. | |
| legacy (`bool`, *optional*, defaults to `True`) : Whether or not the model should use the legacy `LegacyDebertaOnlyMLMHead`, which does not work properly for mask infilling tasks. | |
| tie_word_embeddings (`bool`, *optional*, defaults to `True`) : Whether to tie weight embeddings according to model's `tied_weights_keys` mapping. | |
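Several of the parameters above depend on each other. The hypothetical helper below (plain Python, not part of Transformers) checks a few of them using the documented defaults. It does three things:

- Derives the per-head size.
- Rejects `pos_att_type` entries other than the two documented types.
- Resolves `max_relative_positions` with the fallback to `max_position_embeddings` for values below 1.

```python
from typing import Optional

# The two relative-attention types documented for `pos_att_type`.
VALID_POS_ATT_TYPES = {"p2c", "c2p"}

def check_deberta_v2_settings(
    hidden_size: int = 1536,
    num_attention_heads: int = 24,
    max_position_embeddings: int = 512,
    max_relative_positions: int = -1,
    pos_att_type: Optional[list[str]] = None,
) -> dict:
    """Hypothetical consistency check over a few DebertaV2Config-style values."""
    if hidden_size % num_attention_heads != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    pos_att_type = list(pos_att_type or [])
    unknown = set(pos_att_type) - VALID_POS_ATT_TYPES
    if unknown:
        raise ValueError(f"unsupported pos_att_type entries: {sorted(unknown)}")
    # Values below 1 (e.g. the default -1) fall back to max_position_embeddings.
    if max_relative_positions < 1:
        max_relative_positions = max_position_embeddings
    return {
        "head_dim": hidden_size // num_attention_heads,
        "max_relative_positions": max_relative_positions,
        "pos_att_type": pos_att_type,
    }

print(check_deberta_v2_settings(pos_att_type=["p2c", "c2p"]))
# {'head_dim': 64, 'max_relative_positions': 512, 'pos_att_type': ['p2c', 'c2p']}
```

With the defaults, each of the 24 heads works on 1536 / 24 = 64 dimensions.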
| ## DebertaV2Tokenizer[[transformers.DebertaV2Tokenizer]] | |
| #### transformers.DebertaV2Tokenizer[[transformers.DebertaV2Tokenizer]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/deberta_v2/tokenization_deberta_v2.py#L28) | |
| Construct a DeBERTa-v2 tokenizer (backed by HuggingFace's *tokenizers* library). Based on Unigram tokenization. | |
| This tokenizer inherits from [TokenizersBackend](/docs/transformers/main/ko/main_classes/tokenizer#transformers.TokenizersBackend) which contains most of the main methods. Users should | |
| refer to this superclass for more information regarding those methods. | |
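This tokenizer is described as Unigram-based, and its `vocab` parameter is a list of `(piece, score)` pairs. To show roughly what Unigram segmentation does with such a list, the sketch below uses a Viterbi-style dynamic program to pick the split of a word that maximizes the total piece score.

Some simplifications apply:

- The toy vocabulary and scores are invented.
- Normalization and `▁` word-boundary handling are ignored.
- The real segmentation is done by the *tokenizers* library, not by this code.

```python
import math

def unigram_segment(text: str, vocab: list[tuple[str, float]]) -> list[str]:
    """Best-scoring split of `text` into vocabulary pieces (scores are log-probabilities)."""
    scores = dict(vocab)
    max_len = max(len(p) for p in scores)
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best total score of text[:i]
    back = [0] * (n + 1)          # back[i]: start of the last piece ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in scores and best[start] + scores[piece] > best[end]:
                best[end] = best[start] + scores[piece]
                back[end] = start
    if best[n] == -math.inf:
        raise ValueError("text cannot be segmented with this vocabulary")
    pieces, end = [], n
    while end > 0:  # walk the back-pointers from the end of the text
        pieces.append(text[back[end]:end])
        end = back[end]
    return pieces[::-1]

toy_vocab = [("un", -3.0), ("related", -4.0), ("unrelated", -9.0), ("u", -6.0), ("n", -6.0)]
print(unigram_segment("unrelated", toy_vocab))  # ['un', 'related']
```

Here `un` + `related` (total -7.0) beats the single piece `unrelated` (-9.0), so the two-piece split is chosen.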
| #### get_special_tokens_mask[[transformers.DebertaV2Tokenizer.get_special_tokens_mask]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_base.py#L1319) | |
| `get_special_tokens_mask(token_ids_0: list[int], token_ids_1: list[int] | None = None, already_has_special_tokens: bool = False)` | |
| Retrieve sequence ids from a token list that has no special tokens added. | |
| For fast tokenizers, data collators call this with `already_has_special_tokens=True` to build a mask over an | |
| already-formatted sequence. In that case, we compute the mask by checking membership in `all_special_ids`. | |
| - **token_ids_0** -- List of IDs for the (possibly already formatted) sequence. | |
| - **token_ids_1** -- Unused when `already_has_special_tokens=True`. Must be `None` in that case. | |
| - **already_has_special_tokens** -- Whether the sequence is already formatted with special tokens. | |
| Returns a list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token. | |
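The two code paths of `get_special_tokens_mask` can be sketched in plain Python:

- For unformatted input, the mask marks where the special tokens will be added.
- For already-formatted input, the mask comes from membership in `all_special_ids`.

The sketch assumes the `[CLS] A [SEP]` / `[CLS] A [SEP] B [SEP]` layout implied by the `cls_token` and `sep_token` descriptions, and the token IDs are made up. It illustrates the behaviour rather than reproducing the library code.

```python
from typing import Optional

def special_tokens_mask(
    token_ids_0: list[int],
    token_ids_1: Optional[list[int]] = None,
    already_has_special_tokens: bool = False,
    all_special_ids: frozenset[int] = frozenset(),
) -> list[int]:
    if already_has_special_tokens:
        if token_ids_1 is not None:
            raise ValueError("token_ids_1 must be None when the sequence is already formatted")
        # Formatted input: 1 wherever the ID is a known special ID.
        return [1 if t in all_special_ids else 0 for t in token_ids_0]
    # Unformatted input: mark the positions where [CLS] and [SEP] will be inserted.
    mask = [1] + [0] * len(token_ids_0) + [1]
    if token_ids_1 is not None:
        mask += [0] * len(token_ids_1) + [1]
    return mask

CLS, SEP = 1, 2  # hypothetical IDs for illustration
print(special_tokens_mask([10, 11], [12]))  # [1, 0, 0, 1, 0, 1]
print(special_tokens_mask([CLS, 10, 11, SEP], already_has_special_tokens=True,
                          all_special_ids=frozenset({CLS, SEP})))  # [1, 0, 0, 1]
```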
| **Parameters:** | |
| vocab_file (`str`, *optional*) : Path to the vocabulary file (SentencePiece model file). Not used directly but kept for compatibility. | |
| vocab (`str`, `dict` or `list`, *optional*) : List of tuples (piece, score) for the vocabulary. | |
| precompiled_charsmap (`bytes`, *optional*) : Precompiled character map for normalization. | |
| do_lower_case (`bool`, *optional*, defaults to `False`) : Whether or not to lowercase the input when tokenizing. | |
| split_by_punct (`bool`, *optional*, defaults to `False`) : Whether to split by punctuation. | |
| bos_token (`str`, *optional*, defaults to `"[CLS]"`) : The beginning of sequence token. | |
| eos_token (`str`, *optional*, defaults to `"[SEP]"`) : The end of sequence token. | |
| unk_token (`str`, *optional*, defaults to `"[UNK]"`) : The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead. | |
| sep_token (`str`, *optional*, defaults to `"[SEP]"`) : The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for sequence classification or for a text and a question for question answering. It is also used as the last token of a sequence built with special tokens. | |
| pad_token (`str`, *optional*, defaults to `"[PAD]"`) : The token used for padding, for example when batching sequences of different lengths. | |
| cls_token (`str`, *optional*, defaults to `"[CLS]"`) : The classifier token which is used when doing sequence classification (classification of the whole sequence instead of per-token classification). It is the first token of the sequence when built with special tokens. | |
| mask_token (`str`, *optional*, defaults to `"[MASK]"`) : The token used for masking values. This is the token used when training this model with masked language modeling. This is the token which the model will try to predict. | |
| add_prefix_space (`bool`, *optional*, defaults to `True`) : Whether or not to add an initial space to the input. This allows the leading word to be treated just like any other word. | |
| unk_id (`int`, *optional*, defaults to index of `unk_token` in vocab) : The ID of the unknown token in the vocabulary. | |
| **Returns:** | |
| `A list of integers in the range [0, 1]` | |
| 1 for a special token, 0 for a sequence token. | |
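The `cls_token` and `sep_token` descriptions imply a `[CLS] A [SEP] B [SEP]` layout for sequence pairs. Under that layout, `token_type_ids` are 0 for the first segment and 1 for the second. The hypothetical helper below lays this out with token strings instead of IDs. It is a sketch of the convention under that assumption, not the tokenizer's own method.

```python
from typing import Optional

def build_pair(
    tokens_a: list[str],
    tokens_b: Optional[list[str]] = None,
    cls_token: str = "[CLS]",
    sep_token: str = "[SEP]",
) -> tuple[list[str], list[int]]:
    tokens = [cls_token] + tokens_a + [sep_token]
    type_ids = [0] * len(tokens)  # [CLS] A [SEP] is segment 0
    if tokens_b is not None:
        tokens += tokens_b + [sep_token]
        type_ids += [1] * (len(tokens_b) + 1)  # B [SEP] is segment 1
    return tokens, type_ids

tokens, type_ids = build_pair(["what", "?"], ["paris"])
print(tokens)    # ['[CLS]', 'what', '?', '[SEP]', 'paris', '[SEP]']
print(type_ids)  # [0, 0, 0, 0, 1, 1]
```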
| #### save_vocabulary[[transformers.DebertaV2Tokenizer.save_vocabulary]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_tokenizers.py#L509) | |
| ## DebertaV2TokenizerFast[[transformers.DebertaV2Tokenizer]] | |
| This section resolves to the same `transformers.DebertaV2Tokenizer` reference documented above; see [DebertaV2Tokenizer](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2Tokenizer) for the constructor parameters. | |
| ## DebertaV2Model[[transformers.DebertaV2Model]] | |
| #### transformers.DebertaV2Model[[transformers.DebertaV2Model]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/deberta_v2/modeling_deberta_v2.py#L704) | |
| The bare Deberta V2 Model outputting raw hidden-states without any specific head on top. | |
| This model inherits from [PreTrainedModel](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| #### forward[[transformers.DebertaV2Model.forward]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/deberta_v2/modeling_deberta_v2.py#L721) | |
| `forward(input_ids: torch.Tensor | None = None, attention_mask: torch.Tensor | None = None, token_type_ids: torch.Tensor | None = None, position_ids: torch.Tensor | None = None, inputs_embeds: torch.Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, **kwargs)` | |
| - **input_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/main/ko/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **token_type_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`: | |
| - 0 corresponds to a *sentence A* token, | |
| - 1 corresponds to a *sentence B* token. | |
| [What are token type IDs?](../glossary#token-type-ids) | |
| - **position_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of positions of each input sequence token in the position embeddings. Selected in the range `[0, config.max_position_embeddings - 1]`. | |
| [What are position IDs?](../glossary#position-ids) | |
| - **inputs_embeds** (`torch.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **output_attentions** (`bool`, *optional*) -- | |
| Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned | |
| tensors for more detail. | |
| - **output_hidden_states** (`bool`, *optional*) -- | |
| Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for | |
| more detail. | |
| - **return_dict** (`bool`, *optional*) -- | |
| Whether or not to return a [ModelOutput](/docs/transformers/main/ko/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple. | |
| Returns a [BaseModelOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.BaseModelOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([DebertaV2Config](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2Config)) and inputs. | |
| The [DebertaV2Model](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2Model) forward method overrides the `__call__` special method. | |
| Although the recipe for forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this since the former takes care of running the pre and post processing steps while | |
| the latter silently ignores them. | |
| - **last_hidden_state** (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`) -- Sequence of hidden-states at the output of the last layer of the model. | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. | |
| Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads. | |
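The output shapes documented for `DebertaV2Model` follow directly from the configuration. The hypothetical helper below (plain Python, no model needed) computes them for a given batch size and sequence length. It assumes the model has an embedding layer, so `hidden_states` holds one more entry than the number of layers. Defaults mirror the `DebertaV2Config` values listed earlier: 24 layers, 24 heads and a hidden size of 1536.

```python
def expected_output_shapes(
    batch_size: int,
    seq_len: int,
    hidden_size: int = 1536,
    num_hidden_layers: int = 24,
    num_attention_heads: int = 24,
) -> dict:
    """Shapes of the documented DebertaV2Model outputs (hypothetical helper)."""
    hidden = (batch_size, seq_len, hidden_size)
    return {
        "last_hidden_state": hidden,
        # embedding output plus one tensor per layer
        "hidden_states": [hidden] * (num_hidden_layers + 1),
        # one attention map per layer
        "attentions": [(batch_size, num_attention_heads, seq_len, seq_len)] * num_hidden_layers,
    }

shapes = expected_output_shapes(batch_size=2, seq_len=8)
print(len(shapes["hidden_states"]), len(shapes["attentions"]))  # 25 24
```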
| **Parameters:** | |
| config ([DebertaV2Config](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2Config)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| **Returns:** | |
| [BaseModelOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.BaseModelOutput) or `tuple(torch.FloatTensor)` | |
| A [BaseModelOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.BaseModelOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([DebertaV2Config](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2Config)) and inputs. | |
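The `attention_mask` convention (1 means attend, 0 means padding) can be illustrated on a single row of attention scores. This generic masked softmax is only a sketch. It shows why masked keys receive zero weight, and it does not reproduce how `DebertaV2Model` expands and applies the mask internally.

```python
import math

def masked_softmax(scores: list[float], attention_mask: list[int]) -> list[float]:
    """Softmax over positions whose mask value is 1; masked positions get weight 0."""
    kept = [s for s, m in zip(scores, attention_mask) if m == 1]
    if not kept:
        raise ValueError("at least one position must be unmasked")
    top = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m == 1 else 0.0 for s, m in zip(scores, attention_mask)]
    total = sum(exps)
    return [e / total for e in exps]

print(masked_softmax([2.0, 2.0, 5.0], [1, 1, 0]))  # [0.5, 0.5, 0.0]
```

The padded position has the largest raw score, yet it contributes nothing once masked.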
| ## DebertaV2PreTrainedModel[[transformers.DebertaV2PreTrainedModel]] | |
| #### transformers.DebertaV2PreTrainedModel[[transformers.DebertaV2PreTrainedModel]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/deberta_v2/modeling_deberta_v2.py#L686) | |
| This model inherits from [PreTrainedModel](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| #### forward[[transformers.DebertaV2PreTrainedModel.forward]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/torch/nn/modules/module.py#L391) | |
| `_forward_unimplemented(*input: typing.Any)` | |
| Define the computation performed at every call. | |
| Should be overridden by all subclasses. | |
| Although the recipe for forward pass needs to be defined within | |
| this function, one should call the `Module` instance afterwards | |
| instead of this since the former takes care of running the | |
| registered hooks while the latter silently ignores them. | |
| **Parameters:** | |
| config ([PreTrainedConfig](/docs/transformers/main/ko/main_classes/configuration#transformers.PreTrainedConfig)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| ## DebertaV2ForMaskedLM[[transformers.DebertaV2ForMaskedLM]] | |
| #### transformers.DebertaV2ForMaskedLM[[transformers.DebertaV2ForMaskedLM]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/deberta_v2/modeling_deberta_v2.py#L889) | |
| The Deberta V2 Model with a `language modeling` head on top. | |
| This model inherits from [PreTrainedModel](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| #### forward[[transformers.DebertaV2ForMaskedLM.forward]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/deberta_v2/modeling_deberta_v2.py#L924) | |
| `forward(input_ids: torch.Tensor | None = None, attention_mask: torch.Tensor | None = None, token_type_ids: torch.Tensor | None = None, position_ids: torch.Tensor | None = None, inputs_embeds: torch.Tensor | None = None, labels: torch.Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, **kwargs)` | |
| - **input_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/main/ko/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **token_type_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`: | |
| - 0 corresponds to a *sentence A* token, | |
| - 1 corresponds to a *sentence B* token. | |
| [What are token type IDs?](../glossary#token-type-ids) | |
| - **position_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of positions of each input sequence token in the position embeddings. Selected in the range `[0, config.max_position_embeddings - 1]`. | |
| [What are position IDs?](../glossary#position-ids) | |
| - **inputs_embeds** (`torch.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **labels** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Labels for computing the masked language modeling loss. Indices should be in `[-100, 0, ..., config.vocab_size]` (see `input_ids` docstring). Tokens with indices set to `-100` are ignored (masked); the loss is only computed for the tokens with labels in `[0, ..., config.vocab_size]`. | |
| - **output_attentions** (`bool`, *optional*) -- | |
| Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned | |
| tensors for more detail. | |
| - **output_hidden_states** (`bool`, *optional*) -- | |
| Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for | |
| more detail. | |
| - **return_dict** (`bool`, *optional*) -- | |
| Whether or not to return a [ModelOutput](/docs/transformers/main/ko/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple. | |
| Returns a [MaskedLMOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.MaskedLMOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([DebertaV2Config](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2Config)) and inputs. | |
| The [DebertaV2ForMaskedLM](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2ForMaskedLM) forward method overrides the `__call__` special method. | |
| Although the recipe for forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this since the former takes care of running the pre and post processing steps while | |
| the latter silently ignores them. | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Masked language modeling (MLM) loss. | |
| - **logits** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) -- Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. | |
| Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads. | |
| Example: | |
| ```python | |
| >>> from transformers import AutoTokenizer, DebertaV2ForMaskedLM | |
| >>> import torch | |
| >>> tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge") | |
| >>> model = DebertaV2ForMaskedLM.from_pretrained("microsoft/deberta-v2-xlarge") | |
| >>> inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ...     logits = model(**inputs).logits | |
| >>> # retrieve index of [MASK] | |
| >>> mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0] | |
| >>> predicted_token_id = logits[0, mask_token_index].argmax(axis=-1) | |
| >>> tokenizer.decode(predicted_token_id) | |
| ... | |
| >>> labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"] | |
| >>> # mask labels of non-[MASK] tokens | |
| >>> labels = torch.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100) | |
| >>> outputs = model(**inputs, labels=labels) | |
| >>> round(outputs.loss.item(), 2) | |
| ... | |
| ``` | |
| **Parameters:** | |
| config ([DebertaV2Config](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2Config)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| **Returns:** | |
| [MaskedLMOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.MaskedLMOutput) or `tuple(torch.FloatTensor)` | |
| A [MaskedLMOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.MaskedLMOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([DebertaV2Config](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2Config)) and inputs. | |
| ## DebertaV2ForSequenceClassification[[transformers.DebertaV2ForSequenceClassification]] | |
| #### transformers.DebertaV2ForSequenceClassification[[transformers.DebertaV2ForSequenceClassification]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/deberta_v2/modeling_deberta_v2.py#L1011) | |
| DeBERTa Model transformer with a sequence classification/regression head on top (a linear layer on top of the | |
| pooled output) e.g. for GLUE tasks. | |
| This model inherits from [PreTrainedModel](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| `forward(input_ids: torch.Tensor | None = None, attention_mask: torch.Tensor | None = None, token_type_ids: torch.Tensor | None = None, position_ids: torch.Tensor | None = None, inputs_embeds: torch.Tensor | None = None, labels: torch.Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, **kwargs)` | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/deberta_v2/modeling_deberta_v2.py#L1036) | |
| - **input_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/main/ko/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **token_type_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`: | |
| - 0 corresponds to a *sentence A* token, | |
| - 1 corresponds to a *sentence B* token. | |
| [What are token type IDs?](../glossary#token-type-ids) | |
| - **position_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of positions of each input sequence token in the position embeddings. Selected in the range `[0, config.max_position_embeddings - 1]`. | |
| [What are position IDs?](../glossary#position-ids) | |
| - **inputs_embeds** (`torch.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **labels** (`torch.LongTensor` of shape `(batch_size,)`, *optional*) -- | |
| Labels for computing the sequence classification/regression loss. Indices should be in `[0, ..., config.num_labels - 1]`. | |
| If `config.num_labels == 1`, a regression loss is computed (mean-squared error); if `config.num_labels > 1`, a | |
| classification loss is computed (cross-entropy). | |
| - **output_attentions** (`bool`, *optional*) -- | |
| Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned | |
| tensors for more detail. | |
| - **output_hidden_states** (`bool`, *optional*) -- | |
| Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for | |
| more detail. | |
| - **return_dict** (`bool`, *optional*) -- | |
| Whether or not to return a [ModelOutput](/docs/transformers/main/ko/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple. | |
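The `attention_mask` and `token_type_ids` conventions above can be illustrated with a small pure-Python sketch that packs a sentence pair and pads it to a fixed length. The `[CLS] A [SEP] B [SEP]` layout, the helper name `encode_pair`, and all token ids are assumptions for illustration. In practice the tokenizer builds these tensors for you. Whether segment ids actually affect a given checkpoint depends on its configuration. This sketch only shows the index convention.

```python
def encode_pair(tokens_a, tokens_b, cls_id, sep_id, pad_id, max_length):
    """Pack two pre-tokenized segments and pad to max_length (assumed BERT-style layout)."""
    input_ids = [cls_id] + tokens_a + [sep_id] + tokens_b + [sep_id]
    # 0 marks sentence A (including [CLS] and the first [SEP]); 1 marks sentence B.
    token_type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    # 1 = real token (attend to it), 0 = padding (masked out).
    attention_mask = [1] * len(input_ids)

    n_pad = max_length - len(input_ids)
    if n_pad < 0:
        raise ValueError("sequence longer than max_length; truncate first")
    input_ids += [pad_id] * n_pad
    token_type_ids += [0] * n_pad
    attention_mask += [0] * n_pad
    return input_ids, token_type_ids, attention_mask


ids, types, mask = encode_pair([10, 11], [20], cls_id=1, sep_id=2, pad_id=0, max_length=8)
print(ids)    # [1, 10, 11, 2, 20, 2, 0, 0]
print(types)  # [0, 0, 0, 0, 1, 1, 0, 0]
print(mask)   # [1, 1, 1, 1, 1, 1, 0, 0]
```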
| The [DebertaV2ForSequenceClassification](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2ForSequenceClassification) forward method overrides the `__call__` special method. | |
| Although the recipe for the forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while | |
| the latter silently ignores them. | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Classification (or regression if config.num_labels==1) loss. | |
| - **logits** (`torch.FloatTensor` of shape `(batch_size, config.num_labels)`) -- Classification (or regression if config.num_labels==1) scores (before SoftMax). | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attention weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads. | |
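The shapes listed above follow directly from a handful of configuration values. The helper below has a hypothetical name and is not part of the library. For a given batch, it computes those shapes from the description above: `hidden_states` holds the embedding output plus one entry per layer, and `attentions` holds one entry per layer.

```python
def sequence_classification_output_shapes(batch_size, seq_len, hidden_size,
                                          num_layers, num_heads, num_labels):
    """Expected shapes of the outputs described above."""
    return {
        "logits": (batch_size, num_labels),
        # embedding output + one tensor per transformer layer
        "hidden_states": [(batch_size, seq_len, hidden_size)] * (num_layers + 1),
        # one attention-probability tensor per layer
        "attentions": [(batch_size, num_heads, seq_len, seq_len)] * num_layers,
    }


# Hypothetical small configuration, not the values of any released checkpoint.
shapes = sequence_classification_output_shapes(2, 16, 64, num_layers=4, num_heads=8, num_labels=3)
print(len(shapes["hidden_states"]), shapes["attentions"][0])  # 5 (2, 8, 16, 16)
```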
| Example of single-label classification: | |
| ```python | |
| >>> import torch | |
| >>> from transformers import AutoTokenizer, DebertaV2ForSequenceClassification | |
| >>> tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge") | |
| >>> model = DebertaV2ForSequenceClassification.from_pretrained("microsoft/deberta-v2-xlarge") | |
| >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ...     logits = model(**inputs).logits | |
| >>> predicted_class_id = logits.argmax().item() | |
| >>> model.config.id2label[predicted_class_id] | |
| ... | |
| >>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)` | |
| >>> num_labels = len(model.config.id2label) | |
| >>> model = DebertaV2ForSequenceClassification.from_pretrained("microsoft/deberta-v2-xlarge", num_labels=num_labels) | |
| >>> labels = torch.tensor([1]) | |
| >>> loss = model(**inputs, labels=labels).loss | |
| >>> round(loss.item(), 2) | |
| ... | |
| ``` | |
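The `labels` description above chooses the loss from `config.num_labels`. The per-example sketch below mirrors that rule in plain Python. It uses mean-squared error when there is a single output and cross-entropy over class indices otherwise. It is illustrative only. The library's implementation works on batched tensors and also honors `config.problem_type`. The helper name is hypothetical.

```python
import math


def sequence_classification_loss(logits, label, num_labels):
    """Loss for one example, following the num_labels rule described above."""
    if len(logits) != num_labels:
        raise ValueError("expected one logit per label")
    if num_labels == 1:
        # Regression: squared error between the single score and a float target.
        return (logits[0] - label) ** 2
    # Classification: cross-entropy against an integer class index.
    if not 0 <= label < num_labels:
        raise ValueError("label must be in [0, num_labels - 1]")
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


print(sequence_classification_loss([2.5], 1.0, num_labels=1))  # 2.25
print(round(sequence_classification_loss([0.0, 0.0], 1, num_labels=2), 4))  # 0.6931
```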
| Example of multi-label classification: | |
| ```python | |
| >>> import torch | |
| >>> from transformers import AutoTokenizer, DebertaV2ForSequenceClassification | |
| >>> tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge") | |
| >>> model = DebertaV2ForSequenceClassification.from_pretrained("microsoft/deberta-v2-xlarge", problem_type="multi_label_classification") | |
| >>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ...     logits = model(**inputs).logits | |
| >>> predicted_class_ids = torch.arange(0, logits.shape[-1])[torch.sigmoid(logits).squeeze(dim=0) > 0.5] | |
| >>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)` | |
| >>> num_labels = len(model.config.id2label) | |
| >>> model = DebertaV2ForSequenceClassification.from_pretrained( | |
| ... "microsoft/deberta-v2-xlarge", num_labels=num_labels, problem_type="multi_label_classification" | |
| ... ) | |
| >>> labels = torch.sum( | |
| ... torch.nn.functional.one_hot(predicted_class_ids[None, :].clone(), num_classes=num_labels), dim=1 | |
| ... ).to(torch.float) | |
| >>> loss = model(**inputs, labels=labels).loss | |
| ``` | |
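In the multi-label example, each class is decided independently. A class is predicted when its sigmoid probability exceeds 0.5, and the training target is a multi-hot float vector built from those class ids. The sketch below reproduces both steps without PyTorch. It also adds a binary cross-entropy on logits, equivalent to `torch.nn.BCEWithLogitsLoss` with mean reduction. We assume that is the loss selected by `problem_type="multi_label_classification"`. The helper names are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_multi_label(logits, threshold=0.5):
    """Return the indices whose sigmoid probability is above the threshold."""
    return [i for i, x in enumerate(logits) if sigmoid(x) > threshold]


def multi_hot(class_ids, num_labels):
    """Float target vector with 1.0 at every active class."""
    target = [0.0] * num_labels
    for i in class_ids:
        target[i] = 1.0
    return target


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy on raw logits (numerically stable form)."""
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


logits = [2.0, -1.0, 0.3]
ids = predict_multi_label(logits)
print(ids)                # [0, 2]
print(multi_hot(ids, 3))  # [1.0, 0.0, 1.0]
```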
| **Parameters:** | |
| config ([DebertaV2Config](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2Config)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| **Returns:** | |
| [SequenceClassifierOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.SequenceClassifierOutput) or `tuple(torch.FloatTensor)` | |
| A [SequenceClassifierOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.SequenceClassifierOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([DebertaV2Config](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2Config)) and inputs. | |
| ## DebertaV2ForTokenClassification[[transformers.DebertaV2ForTokenClassification]] | |
| #### transformers.DebertaV2ForTokenClassification[[transformers.DebertaV2ForTokenClassification]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/deberta_v2/modeling_deberta_v2.py#L1121) | |
| The Deberta V2 transformer with a token classification head on top (a linear layer on top of the hidden-states | |
| output) e.g. for Named-Entity-Recognition (NER) tasks. | |
| This model inherits from [PreTrainedModel](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| `forward(input_ids: torch.Tensor | None = None, attention_mask: torch.Tensor | None = None, token_type_ids: torch.Tensor | None = None, position_ids: torch.Tensor | None = None, inputs_embeds: torch.Tensor | None = None, labels: torch.Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, **kwargs)` | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/deberta_v2/modeling_deberta_v2.py#L1133) | |
| - **input_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/main/ko/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **token_type_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`: | |
| - 0 corresponds to a *sentence A* token, | |
| - 1 corresponds to a *sentence B* token. | |
| [What are token type IDs?](../glossary#token-type-ids) | |
| - **position_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of positions of each input sequence token in the position embeddings. Selected in the range `[0, config.max_position_embeddings - 1]`. | |
| [What are position IDs?](../glossary#position-ids) | |
| - **inputs_embeds** (`torch.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **labels** (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Labels for computing the token classification loss. Indices should be in `[0, ..., config.num_labels - 1]`. | |
| - **output_attentions** (`bool`, *optional*) -- | |
| Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned | |
| tensors for more detail. | |
| - **output_hidden_states** (`bool`, *optional*) -- | |
| Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for | |
| more detail. | |
| - **return_dict** (`bool`, *optional*) -- | |
| Whether or not to return a [ModelOutput](/docs/transformers/main/ko/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple. | |
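When `position_ids` is omitted, BERT-style embedding layers typically use positions `0, 1, ..., sequence_length - 1` for every row. We assume the same default here. The sketch below builds such ids and checks that explicit ids stay inside the valid range. The upper bound comes from the configured number of position embeddings, which we assume to be `DebertaV2Config.max_position_embeddings`. Both helper names are hypothetical.

```python
def default_position_ids(batch_size, seq_len):
    """Positions 0..seq_len-1 repeated for each row of the batch."""
    return [list(range(seq_len)) for _ in range(batch_size)]


def check_position_ids(position_ids, max_position_embeddings):
    """Raise if any position falls outside [0, max_position_embeddings - 1]."""
    for row in position_ids:
        for p in row:
            if not 0 <= p < max_position_embeddings:
                raise ValueError(f"position {p} outside [0, {max_position_embeddings - 1}]")


ids = default_position_ids(batch_size=2, seq_len=4)
print(ids)  # [[0, 1, 2, 3], [0, 1, 2, 3]]
check_position_ids(ids, max_position_embeddings=512)  # passes silently
```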
| The [DebertaV2ForTokenClassification](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2ForTokenClassification) forward method overrides the `__call__` special method. | |
| Although the recipe for the forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while | |
| the latter silently ignores them. | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided) -- Classification loss. | |
| - **logits** (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.num_labels)`) -- Classification scores (before SoftMax). | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attention weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads. | |
| Example: | |
| ```python | |
| >>> from transformers import AutoTokenizer, DebertaV2ForTokenClassification | |
| >>> import torch | |
| >>> tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge") | |
| >>> model = DebertaV2ForTokenClassification.from_pretrained("microsoft/deberta-v2-xlarge") | |
| >>> inputs = tokenizer( | |
| ... "HuggingFace is a company based in Paris and New York", add_special_tokens=False, return_tensors="pt" | |
| ... ) | |
| >>> with torch.no_grad(): | |
| ...     logits = model(**inputs).logits | |
| >>> predicted_token_class_ids = logits.argmax(-1) | |
| >>> # Note that tokens are classified rather than input words, which means that | |
| >>> # there might be more predicted token classes than words. | |
| >>> # Multiple token classes might account for the same word | |
| >>> predicted_tokens_classes = [model.config.id2label[t.item()] for t in predicted_token_class_ids[0]] | |
| >>> predicted_tokens_classes | |
| ... | |
| >>> labels = predicted_token_class_ids | |
| >>> loss = model(**inputs, labels=labels).loss | |
| >>> round(loss.item(), 2) | |
| ... | |
| ``` | |
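As the comments in the example note, predictions are made per token, not per word. One common way to get one label per word keeps the prediction of each word's first sub-token. The sketch below assumes a `word_ids` list like the one returned by `BatchEncoding.word_ids()` on fast tokenizers, with `None` for special tokens. The helper name and the labels are made up for illustration.

```python
def first_subtoken_labels(word_ids, token_labels):
    """Keep one label per word: the prediction of the word's first sub-token."""
    word_labels = []
    previous = None
    for word_id, label in zip(word_ids, token_labels):
        # Skip special tokens (None) and continuation sub-tokens of the same word.
        if word_id is not None and word_id != previous:
            word_labels.append(label)
        previous = word_id
    return word_labels


word_ids = [None, 0, 0, 1, 2, 2, None]
token_labels = ["O", "B-ORG", "I-ORG", "O", "B-LOC", "I-LOC", "O"]
print(first_subtoken_labels(word_ids, token_labels))  # ['B-ORG', 'O', 'B-LOC']
```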
| **Parameters:** | |
| config ([DebertaV2Config](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2Config)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| **Returns:** | |
| [TokenClassifierOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.TokenClassifierOutput) or `tuple(torch.FloatTensor)` | |
| A [TokenClassifierOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.TokenClassifierOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([DebertaV2Config](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2Config)) and inputs. | |
| ## DebertaV2ForQuestionAnswering[[transformers.DebertaV2ForQuestionAnswering]] | |
| #### transformers.DebertaV2ForQuestionAnswering[[transformers.DebertaV2ForQuestionAnswering]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/deberta_v2/modeling_deberta_v2.py#L1184) | |
| The Deberta V2 transformer with a span classification head on top for extractive question-answering tasks like | |
| SQuAD (a linear layer on top of the hidden-states output to compute `span start logits` and `span end logits`). | |
| This model inherits from [PreTrainedModel](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
| `forward(input_ids: torch.Tensor | None = None, attention_mask: torch.Tensor | None = None, token_type_ids: torch.Tensor | None = None, position_ids: torch.Tensor | None = None, inputs_embeds: torch.Tensor | None = None, start_positions: torch.Tensor | None = None, end_positions: torch.Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, **kwargs)` | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/deberta_v2/modeling_deberta_v2.py#L1195) | |
| - **input_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/main/ko/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **token_type_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`: | |
| - 0 corresponds to a *sentence A* token, | |
| - 1 corresponds to a *sentence B* token. | |
| [What are token type IDs?](../glossary#token-type-ids) | |
| - **position_ids** (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*) -- | |
| Indices of positions of each input sequence token in the position embeddings. Selected in the range `[0, config.max_position_embeddings - 1]`. | |
| [What are position IDs?](../glossary#position-ids) | |
| - **inputs_embeds** (`torch.Tensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **start_positions** (`torch.Tensor` of shape `(batch_size,)`, *optional*) -- | |
| Labels for position (index) of the start of the labelled span for computing the span extraction loss. | |
| Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence | |
| are not taken into account for computing the loss. | |
| - **end_positions** (`torch.Tensor` of shape `(batch_size,)`, *optional*) -- | |
| Labels for position (index) of the end of the labelled span for computing the span extraction loss. | |
| Positions are clamped to the length of the sequence (`sequence_length`). Positions outside of the sequence | |
| are not taken into account for computing the loss. | |
| - **output_attentions** (`bool`, *optional*) -- | |
| Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned | |
| tensors for more detail. | |
| - **output_hidden_states** (`bool`, *optional*) -- | |
| Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for | |
| more detail. | |
| - **return_dict** (`bool`, *optional*) -- | |
| Whether or not to return a [ModelOutput](/docs/transformers/main/ko/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple. | |
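The clamping rule above can be sketched in plain Python. The sketch follows what we understand to be the common Transformers pattern for extractive QA heads, which is an assumption about this model's internals. Positions are clamped to `[0, sequence_length]`, and the value `sequence_length` itself serves as the cross-entropy ignore index. The start and end losses are then averaged. The helper names are hypothetical.

```python
import math


def cross_entropy(batch_logits, targets, ignore_index):
    """Mean cross-entropy over examples whose target is not ignore_index."""
    losses = []
    for logits, target in zip(batch_logits, targets):
        if target == ignore_index:
            continue  # out-of-sequence positions are skipped
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_sum_exp - logits[target])
    return sum(losses) / len(losses) if losses else float("nan")


def span_extraction_loss(start_logits, end_logits, start_positions, end_positions):
    """Clamp positions, ignore out-of-range ones, and average start/end losses."""
    seq_len = len(start_logits[0])
    ignored_index = seq_len  # one past the last valid position

    def clamp(p):
        return min(max(p, 0), ignored_index)

    start_loss = cross_entropy(start_logits, [clamp(p) for p in start_positions], ignored_index)
    end_loss = cross_entropy(end_logits, [clamp(p) for p in end_positions], ignored_index)
    return (start_loss + end_loss) / 2


# Two examples of length 4; the second has an answer outside the sequence.
start = [[0.0, 0.0, 0.0, 0.0], [5.0, 0.0, 0.0, 0.0]]
end = [[0.0, 0.0, 0.0, 0.0], [0.0, 5.0, 0.0, 0.0]]
loss = span_extraction_loss(start, end, start_positions=[1, 99], end_positions=[2, 99])
print(round(loss, 4))  # 1.3863 (log 4: only the first example counts)
```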
| The [DebertaV2ForQuestionAnswering](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2ForQuestionAnswering) forward method overrides the `__call__` special method. | |
| Although the recipe for the forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while | |
| the latter silently ignores them. | |
| - **loss** (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `start_positions` and `end_positions` are provided) -- Total span extraction loss: the mean of the Cross-Entropy losses for the start and end positions. | |
| - **start_logits** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`) -- Span-start scores (before SoftMax). | |
| - **end_logits** (`torch.FloatTensor` of shape `(batch_size, sequence_length)`) -- Span-end scores (before SoftMax). | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attention weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads. | |
| Example: | |
| ```python | |
| >>> from transformers import AutoTokenizer, DebertaV2ForQuestionAnswering | |
| >>> import torch | |
| >>> tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge") | |
| >>> model = DebertaV2ForQuestionAnswering.from_pretrained("microsoft/deberta-v2-xlarge") | |
| >>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet" | |
| >>> inputs = tokenizer(question, text, return_tensors="pt") | |
| >>> with torch.no_grad(): | |
| ...     outputs = model(**inputs) | |
| >>> answer_start_index = outputs.start_logits.argmax() | |
| >>> answer_end_index = outputs.end_logits.argmax() | |
| >>> predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1] | |
| >>> tokenizer.decode(predict_answer_tokens, skip_special_tokens=True) | |
| ... | |
| >>> # target is "nice puppet" | |
| >>> target_start_index = torch.tensor([14]) | |
| >>> target_end_index = torch.tensor([15]) | |
| >>> outputs = model(**inputs, start_positions=target_start_index, end_positions=target_end_index) | |
| >>> loss = outputs.loss | |
| >>> round(loss.item(), 2) | |
| ... | |
| ``` | |
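The example takes the argmax of `start_logits` and `end_logits` independently, which can produce an end index before the start index. A common alternative scores every pair with `start <= end` by the sum of the two logits and keeps the best pair. The sketch below also caps the answer length with a hypothetical `max_answer_len` parameter. It is a decoding heuristic, not part of the model.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) maximizing start_logits[s] + end_logits[e] with s <= e."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


start_logits = [0.0, 3.0, 0.2, 0.0]
end_logits = [2.5, 0.1, 2.0, 0.0]
# Independent argmax would give start=1, end=0, which is not a valid span.
print(best_span(start_logits, end_logits))  # (1, 2)
```

The answer tokens are then `input_ids[0, start : end + 1]`, decoded as in the example.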
| **Parameters:** | |
| config ([DebertaV2Config](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2Config)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| **Returns:** | |
| [QuestionAnsweringModelOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.QuestionAnsweringModelOutput) or `tuple(torch.FloatTensor)` | |
| A [QuestionAnsweringModelOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.QuestionAnsweringModelOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([DebertaV2Config](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2Config)) and inputs. | |
| ## DebertaV2ForMultipleChoice[[transformers.DebertaV2ForMultipleChoice]] | |
| #### transformers.DebertaV2ForMultipleChoice[[transformers.DebertaV2ForMultipleChoice]] | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/deberta_v2/modeling_deberta_v2.py#L1262) | |
| The Deberta V2 Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a | |
| softmax) e.g. for RocStories/SWAG tasks. | |
| This model inherits from [PreTrainedModel](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel). Check the superclass documentation for the generic methods the | |
| library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads, | |
| etc.) | |
| This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass. | |
| Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage | |
| and behavior. | |
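Multiple-choice inputs carry an extra `num_choices` dimension. We assume a typical implementation of such a head:

1. Flatten `(batch_size, num_choices, sequence_length)` into `(batch_size * num_choices, sequence_length)`.
2. Score each choice with a single logit.
3. Regroup the logits to `(batch_size, num_choices)`.
4. Apply the softmax across choices.

The pure-Python sketch below shows only the reshaping and the final choice. The helper names and scores are made up.

```python
def flatten_choices(input_ids):
    """(batch, num_choices, seq_len) nested lists -> (batch * num_choices, seq_len) rows."""
    num_choices = len(input_ids[0])
    flat = [choice for example in input_ids for choice in example]
    return flat, num_choices


def regroup_logits(flat_logits, num_choices):
    """(batch * num_choices,) scores -> (batch, num_choices)."""
    return [flat_logits[i:i + num_choices] for i in range(0, len(flat_logits), num_choices)]


batch = [[[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10], [11, 12]]]  # 2 examples x 3 choices
flat, num_choices = flatten_choices(batch)
print(len(flat), num_choices)  # 6 3

scores = regroup_logits([0.1, 0.9, 0.0, 0.3, 0.2, 0.5], num_choices)
# Softmax is monotonic, so the argmax of the logits is the argmax of the probabilities.
predicted = [row.index(max(row)) for row in scores]
print(predicted)  # [1, 2]
```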
| `forward(input_ids: torch.Tensor | None = None, attention_mask: torch.Tensor | None = None, token_type_ids: torch.Tensor | None = None, position_ids: torch.Tensor | None = None, inputs_embeds: torch.Tensor | None = None, labels: torch.Tensor | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, **kwargs)` | |
| [Source](https://github.com/huggingface/transformers/blob/main/src/transformers/models/deberta_v2/modeling_deberta_v2.py#L1286) | |
| - **input_ids** (`torch.Tensor` of shape `(batch_size, num_choices, sequence_length)`, *optional*) -- | |
| Indices of input sequence tokens in the vocabulary. Padding will be ignored by default. | |
| Indices can be obtained using [AutoTokenizer](/docs/transformers/main/ko/model_doc/auto#transformers.AutoTokenizer). See [PreTrainedTokenizer.encode()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.encode) and | |
| [PreTrainedTokenizer.__call__()](/docs/transformers/main/ko/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.__call__) for details. | |
| [What are input IDs?](../glossary#input-ids) | |
| - **attention_mask** (`torch.Tensor` of shape `(batch_size, num_choices, sequence_length)`, *optional*) -- | |
| Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`: | |
| - 1 for tokens that are **not masked**, | |
| - 0 for tokens that are **masked**. | |
| [What are attention masks?](../glossary#attention-mask) | |
| - **token_type_ids** (`torch.Tensor` of shape `(batch_size, num_choices, sequence_length)`, *optional*) -- | |
| Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0, 1]`: | |
| - 0 corresponds to a *sentence A* token, | |
| - 1 corresponds to a *sentence B* token. | |
| [What are token type IDs?](../glossary#token-type-ids) | |
| - **position_ids** (`torch.Tensor` of shape `(batch_size, num_choices, sequence_length)`, *optional*) -- | |
| Indices of the position of each input sequence token in the position embeddings. Selected in the range `[0, config.max_position_embeddings - 1]`. | |
| [What are position IDs?](../glossary#position-ids) | |
| - **inputs_embeds** (`torch.Tensor` of shape `(batch_size, num_choices, sequence_length, hidden_size)`, *optional*) -- | |
| Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This | |
| is useful if you want more control over how to convert `input_ids` indices into associated vectors than the | |
| model's internal embedding lookup matrix. | |
| - **labels** (`torch.LongTensor` of shape `(batch_size,)`, *optional*) -- | |
| Labels for computing the multiple choice classification loss. Indices should be in `[0, ..., | |
| num_choices-1]` where `num_choices` is the size of the second dimension of the input tensors. (See | |
| `input_ids` above) | |
| - **output_attentions** (`bool`, *optional*) -- | |
| Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned | |
| tensors for more detail. | |
| - **output_hidden_states** (`bool`, *optional*) -- | |
| Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for | |
| more detail. | |
| - **return_dict** (`bool`, *optional*) -- | |
| Whether or not to return a [ModelOutput](/docs/transformers/main/ko/main_classes/output#transformers.utils.ModelOutput) instead of a plain tuple. | |
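The `input_ids`, `token_type_ids`, and `attention_mask` arguments fit together as follows for one multiple-choice example: each choice becomes a `[CLS] prompt [SEP] choice [SEP]` pair, segment ids are 0 up to and including the first `[SEP]` and 1 afterwards, and shorter rows are right-padded with attention-mask value 0. This is a plain-Python sketch that assumes pre-tokenized ids and hypothetical special-token ids (`CLS_ID`, `SEP_ID`, `PAD_ID`); `encode_choices` is likewise a hypothetical helper. In practice, the tokenizer call in the Example produces these tensors for you.

```python
from typing import Dict, List

CLS_ID, SEP_ID, PAD_ID = 1, 2, 0  # hypothetical special-token ids


def encode_choices(prompt: List[int], choices: List[List[int]]) -> Dict[str, List[List[int]]]:
    """Build one multiple-choice example: one (seq_len,) row per choice."""
    rows, types = [], []
    for choice in choices:
        first = [CLS_ID] + prompt + [SEP_ID]  # sentence A segment -> token type 0
        second = choice + [SEP_ID]            # sentence B segment -> token type 1
        rows.append(first + second)
        types.append([0] * len(first) + [1] * len(second))

    max_len = max(len(r) for r in rows)
    input_ids, token_type_ids, attention_mask = [], [], []
    for r, t in zip(rows, types):
        n_pad = max_len - len(r)
        input_ids.append(r + [PAD_ID] * n_pad)
        token_type_ids.append(t + [0] * n_pad)             # padding positions get type 0
        attention_mask.append([1] * len(r) + [0] * n_pad)  # 1 = attend, 0 = masked
    return {
        "input_ids": input_ids,
        "token_type_ids": token_type_ids,
        "attention_mask": attention_mask,
    }


enc = encode_choices(prompt=[10, 11], choices=[[20, 21, 22], [30]])
for key, value in enc.items():
    print(key, value)
```

Wrapping each list in one more level (for example `[enc["input_ids"]]`) gives the `(batch_size=1, num_choices, sequence_length)` layout. The Example achieves the same thing with `unsqueeze(0)`. The mask is built from the true lengths rather than by comparing against `PAD_ID`, so a real token that happens to share the padding id is never masked by mistake.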
| The [DebertaV2ForMultipleChoice](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2ForMultipleChoice) forward method overrides the `__call__` special method. | |
| Although the recipe for the forward pass needs to be defined within this function, one should call the `Module` | |
| instance afterwards instead of this, since the former takes care of running the pre- and post-processing steps while | |
| the latter silently ignores them. | |
| - **loss** (`torch.FloatTensor` of shape *(1,)*, *optional*, returned when `labels` is provided) -- Classification loss. | |
| - **logits** (`torch.FloatTensor` of shape `(batch_size, num_choices)`) -- *num_choices* is the second dimension of the input tensors (see *input_ids* above). | |
| Classification scores (before SoftMax). | |
| - **hidden_states** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) -- Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, + | |
| one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. | |
| Hidden-states of the model at the output of each layer plus the optional initial embedding outputs. | |
| - **attentions** (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) -- Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, | |
| sequence_length)`. | |
| Attention weights after the attention softmax, used to compute the weighted average in the self-attention | |
| heads. | |
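The returned `loss` is the mean cross-entropy between the `(batch_size, num_choices)` logits and the integer `labels`. Each label must lie in `[0, ..., num_choices-1]`. The plain-Python sketch below computes it with a numerically stable log-sum-exp. `multiple_choice_loss` is a hypothetical helper; the model itself delegates this computation to PyTorch's `CrossEntropyLoss`.

```python
import math
from typing import List


def multiple_choice_loss(logits: List[List[float]], labels: List[int]) -> float:
    """Mean cross-entropy over the batch for (batch_size, num_choices) logits."""
    if len(logits) != len(labels):
        raise ValueError("need exactly one label per example")
    total = 0.0
    for row, label in zip(logits, labels):
        if not 0 <= label < len(row):
            raise ValueError(f"label {label} is outside [0, {len(row) - 1}]")
        # log(sum(exp(x))) computed stably by factoring out the row maximum.
        m = max(row)
        log_norm = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_norm - row[label]  # -log softmax(row)[label]
    return total / len(labels)


# Equal logits for two choices: the loss is ln(2) whichever label is correct.
print(multiple_choice_loss([[0.0, 0.0]], [0]))
```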
| Example: | |
| ```python | |
| >>> from transformers import AutoTokenizer, DebertaV2ForMultipleChoice | |
| >>> import torch | |
| >>> tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge") | |
| >>> model = DebertaV2ForMultipleChoice.from_pretrained("microsoft/deberta-v2-xlarge") | |
| >>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced." | |
| >>> choice0 = "It is eaten with a fork and a knife." | |
| >>> choice1 = "It is eaten while held in the hand." | |
| >>> labels = torch.tensor(0).unsqueeze(0) # choice0 is correct (according to Wikipedia ;)), batch size 1 | |
| >>> encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors="pt", padding=True) | |
| >>> outputs = model(**{k: v.unsqueeze(0) for k, v in encoding.items()}, labels=labels) # batch size is 1 | |
| >>> # the linear classifier still needs to be trained | |
| >>> loss = outputs.loss | |
| >>> logits = outputs.logits | |
| ``` | |
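The `logits` in the example above are raw scores, so turning them into a prediction takes a softmax over the `num_choices` axis followed by an argmax. As the example's comment notes, the classifier head is untrained at this point, so real predictions are only meaningful after fine-tuning. The sketch below works on plain lists. The logit values are made up for illustration, and `softmax` and `predict` are hypothetical helpers.

```python
import math
from typing import List


def softmax(row: List[float]) -> List[float]:
    """Convert one row of logits into probabilities that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits: List[List[float]]) -> List[int]:
    """Index of the highest-scoring choice for each example."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


logits = [[2.0, 0.5]]  # illustrative values, shape (batch_size=1, num_choices=2)
probs = [softmax(row) for row in logits]
print(probs, predict(logits))
```

On the tensors returned by the model, `outputs.logits.softmax(dim=-1)` and `outputs.logits.argmax(dim=-1)` perform the same two steps.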
| **Parameters:** | |
| config ([DebertaV2Config](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2Config)) : Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the [from_pretrained()](/docs/transformers/main/ko/main_classes/model#transformers.PreTrainedModel.from_pretrained) method to load the model weights. | |
| **Returns:** | |
| [MultipleChoiceModelOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.MultipleChoiceModelOutput) or `tuple(torch.FloatTensor)` | |
| A [MultipleChoiceModelOutput](/docs/transformers/main/ko/main_classes/output#transformers.modeling_outputs.MultipleChoiceModelOutput) or a tuple of | |
| `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various | |
| elements depending on the configuration ([DebertaV2Config](/docs/transformers/main/ko/model_doc/deberta-v2#transformers.DebertaV2Config)) and inputs. | |