<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# DeBERTa-v2
## Overview

The DeBERTa model was proposed in [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://arxiv.org/abs/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. It is based on Google's BERT model released in 2018 and Facebook's RoBERTa model released in 2019.

DeBERTa improves on RoBERTa with disentangled attention and enhanced mask decoder training, using only half of the data used for RoBERTa.

The abstract from the paper is the following:
*Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the disentangled attention mechanism, where each word is represented using two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency of model pre-training and the performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9% (90.2% vs 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs 90.7%) and RACE by +3.6% (83.2% vs 86.8%). The DeBERTa code and pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.*
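The disentangled attention described in the abstract can be illustrated with a short NumPy sketch. This is a toy demonstration of the three score terms (content-to-content, content-to-position, position-to-content) using random weights; all names, dimensions, and values are illustrative and this is not the library's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, seq_len, k = 8, 4, 4  # head dim, sequence length, max relative distance

# content vectors and relative-position embeddings (toy values;
# the real model learns these through trained projection layers)
Hc = rng.normal(size=(seq_len, d))   # content representations
P = rng.normal(size=(2 * k, d))      # relative-position embeddings

Wq_c, Wk_c = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Wq_r, Wk_r = rng.normal(size=(d, d)), rng.normal(size=(d, d))

Qc, Kc = Hc @ Wq_c, Hc @ Wk_c  # content queries / keys
Qr, Kr = P @ Wq_r, P @ Wk_r    # position queries / keys

def delta(i, j):
    """Clamped relative-distance index from the paper: i - j shifted
    into [0, 2k - 1]."""
    return int(np.clip(i - j + k, 0, 2 * k - 1))

# attention score = content-to-content + content-to-position + position-to-content
A = np.zeros((seq_len, seq_len))
for i in range(seq_len):
    for j in range(seq_len):
        A[i, j] = (Qc[i] @ Kc[j]                # c2c
                   + Qc[i] @ Kr[delta(i, j)]    # c2p
                   + Kc[j] @ Qr[delta(j, i)])   # p2c
A /= np.sqrt(3 * d)  # scaling factor used in the paper for the three terms
```

A softmax over each row of `A` would then give the attention weights; the point of the sketch is that position enters through separate projections of the relative-position embeddings rather than being added into the content vectors.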
The following information can be found in the [original implementation repository](https://github.com/microsoft/DeBERTa). DeBERTa v2 is the second version of the DeBERTa model.
It includes the 1.5B model used for the SuperGLUE single-model submission, which achieved a score of 89.9 versus the human baseline of 89.8. You can find more details about this submission in the authors'
[blog](https://www.microsoft.com/en-us/research/blog/microsoft-deberta-surpasses-human-performance-on-the-superglue-benchmark/).
New in v2:

- **Vocabulary** In v2, the tokenizer was changed to use a new vocabulary of size 128K built from the training data. Instead of a GPT2-based tokenizer, it now uses a [sentencepiece-based](https://github.com/google/sentencepiece) tokenizer.
- **nGiE (nGram Induced Input Encoding)** The DeBERTa-v2 model uses an additional convolution layer alongside the first transformer layer to better learn the local dependencies of input tokens.
- **Sharing the position projection matrix with the content projection matrix in the attention layer** Based on previous experiments, this saves parameters without affecting performance.
- **Applying buckets to encode relative positions** The DeBERTa-v2 model uses log buckets to encode relative positions, similar to T5.
- **900M model & 1.5B model** Two additional model sizes are available, 900M and 1.5B, which significantly improve performance on downstream tasks.
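The T5-style log bucketing mentioned above can be sketched as a small pure-Python function: positions near zero get their own exact bucket, while farther positions share logarithmically spaced buckets. This is a simplified illustration of the idea, not the library's exact function; the parameter names and defaults are assumptions for the sketch.

```python
import math

def log_bucket(rel_pos: int, bucket_size: int = 256, max_position: int = 512) -> int:
    """Map a signed relative position to a signed log-spaced bucket index
    (illustrative sketch of T5-style log bucketing)."""
    mid = bucket_size // 2
    if -mid < rel_pos < mid:
        return rel_pos  # positions close to zero keep exact buckets
    # farther positions are compressed logarithmically up to max_position - 1
    sign = 1 if rel_pos > 0 else -1
    log_pos = math.ceil(
        math.log(abs(rel_pos) / mid)
        / math.log((max_position - 1) / mid)
        * (mid - 1)
    ) + mid
    return sign * log_pos
```

With the defaults above, relative distances in (-128, 128) map to themselves, while the full range up to ±511 is squeezed into bucket indices up to ±255, so a fixed-size embedding table can cover long sequences.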
The TensorFlow 2.0 implementation of the [DeBERTa](https://huggingface.co/DeBERTa) model was contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/microsoft/DeBERTa).
## Resources

- [Text classification task guide](../tasks/sequence_classification)
- [Token classification task guide](../tasks/token_classification)
- [Question answering task guide](../tasks/question_answering)
- [Masked language modeling task guide](../tasks/masked_language_modeling)
- [Multiple choice task guide](../tasks/multiple_choice)
## DebertaV2Config
[[autodoc]] DebertaV2Config
## DebertaV2Tokenizer
[[autodoc]] DebertaV2Tokenizer
- build_inputs_with_special_tokens
- get_special_tokens_mask
- create_token_type_ids_from_sequences
- save_vocabulary
## DebertaV2TokenizerFast
[[autodoc]] DebertaV2TokenizerFast
- build_inputs_with_special_tokens
- create_token_type_ids_from_sequences
<frameworkcontent>
<pt>
## DebertaV2Model
[[autodoc]] DebertaV2Model
- forward
## DebertaV2PreTrainedModel
[[autodoc]] DebertaV2PreTrainedModel
- forward
## DebertaV2ForMaskedLM
[[autodoc]] DebertaV2ForMaskedLM
- forward
## DebertaV2ForSequenceClassification
[[autodoc]] DebertaV2ForSequenceClassification
- forward
## DebertaV2ForTokenClassification
[[autodoc]] DebertaV2ForTokenClassification
- forward
## DebertaV2ForQuestionAnswering
[[autodoc]] DebertaV2ForQuestionAnswering
- forward
## DebertaV2ForMultipleChoice
[[autodoc]] DebertaV2ForMultipleChoice
- forward
</pt>
<tf>
## TFDebertaV2Model
[[autodoc]] TFDebertaV2Model
- call
## TFDebertaV2PreTrainedModel
[[autodoc]] TFDebertaV2PreTrainedModel
- call
## TFDebertaV2ForMaskedLM
[[autodoc]] TFDebertaV2ForMaskedLM
- call
## TFDebertaV2ForSequenceClassification
[[autodoc]] TFDebertaV2ForSequenceClassification
- call
## TFDebertaV2ForTokenClassification
[[autodoc]] TFDebertaV2ForTokenClassification
- call
## TFDebertaV2ForQuestionAnswering
[[autodoc]] TFDebertaV2ForQuestionAnswering
- call
## TFDebertaV2ForMultipleChoice
[[autodoc]] TFDebertaV2ForMultipleChoice
- call
</tf>
</frameworkcontent>