<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
# DeBERTa-v2

## Overview

The DeBERTa model was proposed in [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://huggingface.co/papers/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. It builds on Google's BERT model released in 2018 and Facebook's RoBERTa model released in 2019.

DeBERTa improves on RoBERTa with disentangled attention and an enhanced mask decoder, while training on only half of the data used for RoBERTa.

The abstract from the paper is the following:

*Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the disentangled attention mechanism, where each word is represented using two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency of model pre-training and performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9% (90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.*

The following information can be found in the [original implementation repository](https://github.com/microsoft/DeBERTa). DeBERTa v2 is the second version of the DeBERTa model.

It includes the 1.5B model used for the SuperGLUE single-model submission, which achieved a score of 89.9 versus the human baseline of 89.8. You can find more details in the authors' [blog post](https://www.microsoft.com/en-us/research/blog/microsoft-deberta-surpasses-human-performance-on-the-superglue-benchmark/).

New in v2:
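The disentangled attention described in the abstract can be illustrated with a small numerical sketch. The following is a hedged, hypothetical NumPy example (all variable names and the toy dimensions are ours, not the library's): each token has a content vector and a relative-position embedding, and the attention score sums a content-to-content, a content-to-position, and a position-to-content term, each computed with its own projection matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 4, 8

# Content vectors H and relative-position embeddings P.
H = rng.normal(size=(seq_len, d))
P = rng.normal(size=(2 * seq_len, d))  # one embedding per relative distance

# Separate projections for content (c) and relative position (r),
# as in disentangled attention.
Wq_c, Wk_c = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Wq_r, Wk_r = rng.normal(size=(d, d)), rng.normal(size=(d, d))

Qc, Kc = H @ Wq_c, H @ Wk_c  # content queries/keys
Qr, Kr = P @ Wq_r, P @ Wk_r  # position queries/keys

def rel_idx(i, j):
    # Map the relative distance i - j into [0, 2 * seq_len).
    return int(np.clip(i - j + seq_len, 0, 2 * seq_len - 1))

scores = np.empty((seq_len, seq_len))
for i in range(seq_len):
    for j in range(seq_len):
        c2c = Qc[i] @ Kc[j]              # content-to-content
        c2p = Qc[i] @ Kr[rel_idx(i, j)]  # content-to-position
        p2c = Kc[j] @ Qr[rel_idx(j, i)]  # position-to-content
        scores[i, j] = (c2c + c2p + p2c) / np.sqrt(3 * d)

# Softmax over keys gives the attention weights.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
```

In the real model these scores are additionally masked and computed per attention head; the sketch only shows how content and position interact through separate projection matrices.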
- **Vocabulary** In v2 the tokenizer was changed to use a new vocabulary of size 128K built from the training data. Instead of a GPT2-based tokenizer, it now uses a [sentencepiece-based](https://github.com/google/sentencepiece) tokenizer.
- **nGiE(nGram Induced Input Encoding)** The DeBERTa-v2 model uses an additional convolution layer alongside the first transformer layer to better learn the local dependencies of input tokens.
- **Sharing the position projection matrix with the content projection matrix in the attention layer** Based on previous experiments, this saves parameters without affecting performance.
- **Applying buckets to encode relative positions** The DeBERTa-v2 model uses log buckets to encode relative positions, similar to T5.
- **900M model & 1.5B model** Two additional model sizes are available, 900M and 1.5B, which significantly improve the performance of downstream tasks.
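The log-bucket relative-position encoding mentioned above can be sketched as follows. This is a hypothetical simplification (the function name, bucket count, and constants are ours, not the library's API): distances up to half the bucket count get their own bucket, while larger distances are compressed logarithmically, so a fixed number of buckets covers a long range.

```python
import math

def log_bucket(rel_pos: int, num_buckets: int = 32, max_distance: int = 128) -> int:
    """Map a signed relative position to a signed bucket index.

    Small distances (|rel_pos| <= num_buckets // 2) keep exact buckets;
    larger ones are compressed on a log scale up to max_distance.
    """
    sign = 1 if rel_pos > 0 else -1
    mid = num_buckets // 2
    a = abs(rel_pos)
    if a <= mid:
        return rel_pos  # exact bucket for nearby tokens
    # Log-compress distances in (mid, max_distance) into (mid, num_buckets - 1].
    log_ratio = math.log(a / mid) / math.log((max_distance - 1) / mid)
    bucket = mid + int(log_ratio * (mid - 1))
    return sign * min(bucket, num_buckets - 1)

# Nearby positions stay distinguishable; distant positions share buckets.
print([log_bucket(p) for p in (1, 8, 16, 32, 64, 127)])
# → [1, 8, 16, 21, 26, 31]
```

The design trades resolution at long range for a bounded embedding table: tokens one or two positions apart are encoded exactly, while positions 100 and 120 apart may fall in the same bucket.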
The TensorFlow 2.0 implementation of the [DeBERTa](https://huggingface.co/DeBERTa) model was contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/microsoft/DeBERTa).
## Resources

- [Text classification task guide](../tasks/sequence_classification)
- [Token classification task guide](../tasks/token_classification)
- [Question answering task guide](../tasks/question_answering)
- [Masked language modeling task guide](../tasks/masked_language_modeling)
- [Multiple choice task guide](../tasks/multiple_choice)
## DebertaV2Config

[[autodoc]] DebertaV2Config

## DebertaV2Tokenizer

[[autodoc]] DebertaV2Tokenizer
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary

## DebertaV2TokenizerFast

[[autodoc]] DebertaV2TokenizerFast
    - build_inputs_with_special_tokens
    - create_token_type_ids_from_sequences

<frameworkcontent>
<pt>

## DebertaV2Model

[[autodoc]] DebertaV2Model
    - forward

## DebertaV2PreTrainedModel

[[autodoc]] DebertaV2PreTrainedModel
    - forward

## DebertaV2ForMaskedLM

[[autodoc]] DebertaV2ForMaskedLM
    - forward

## DebertaV2ForSequenceClassification

[[autodoc]] DebertaV2ForSequenceClassification
    - forward

## DebertaV2ForTokenClassification

[[autodoc]] DebertaV2ForTokenClassification
    - forward

## DebertaV2ForQuestionAnswering

[[autodoc]] DebertaV2ForQuestionAnswering
    - forward

## DebertaV2ForMultipleChoice

[[autodoc]] DebertaV2ForMultipleChoice
    - forward

</pt>
<tf>

## TFDebertaV2Model

[[autodoc]] TFDebertaV2Model
    - call

## TFDebertaV2PreTrainedModel

[[autodoc]] TFDebertaV2PreTrainedModel
    - call

## TFDebertaV2ForMaskedLM

[[autodoc]] TFDebertaV2ForMaskedLM
    - call

## TFDebertaV2ForSequenceClassification

[[autodoc]] TFDebertaV2ForSequenceClassification
    - call

## TFDebertaV2ForTokenClassification

[[autodoc]] TFDebertaV2ForTokenClassification
    - call

## TFDebertaV2ForQuestionAnswering

[[autodoc]] TFDebertaV2ForQuestionAnswering
    - call

## TFDebertaV2ForMultipleChoice

[[autodoc]] TFDebertaV2ForMultipleChoice
    - call

</tf>
</frameworkcontent>