| <!--Copyright 2020 The HuggingFace Team. All rights reserved. | |
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with | |
| the License. You may obtain a copy of the License at | |
| http://www.apache.org/licenses/LICENSE-2.0 | |
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on | |
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the | |
| specific language governing permissions and limitations under the License. | |
| ⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be | |
| rendered properly in your Markdown viewer. | |
| --> | |
| # DeBERTa[[deberta]] | |
| ## Overview[[overview]] | |
| The DeBERTa model was proposed in [DeBERTa: Decoding-enhanced BERT with Disentangled Attention](https://huggingface.co/papers/2006.03654) by Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. It is based on Google's BERT model, released in 2018, and Facebook's RoBERTa model, released in 2019. | |
| DeBERTa improves on RoBERTa through disentangled attention and enhanced mask decoder training, while using only half of the data used for RoBERTa. | |
| The abstract from the paper is the following: | |
| *Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks. In this paper we propose a new model architecture, DeBERTa, that improves the BERT and RoBERTa models using two novel techniques. The first is the disentangled attention mechanism, where each word is represented using two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to predict the masked tokens for model pretraining. We show that these two techniques significantly improve the efficiency of model pretraining and the performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9% (90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%), and on RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.* | |
| The TensorFlow 2.0 implementation of the [DeBERTa](https://huggingface.co/DeBERTa) model was contributed by [kamalkraj](https://huggingface.co/kamalkraj). The original code can be found [here](https://github.com/microsoft/DeBERTa). | |
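
The disentangled attention described in the abstract can be illustrated with a small, dependency-free sketch. It follows the formulation in the DeBERTa paper as we understand it: the score between tokens `i` and `j` is the sum of a content-to-content, a content-to-position, and a position-to-content term, scaled by `sqrt(3d)`. It is not the Transformers implementation. The function names (`relative_index`, `disentangled_attention`), the random inputs, and the tiny sizes are assumptions made for illustration only.

```python
import math
import random


def relative_index(i, j, k):
    """Relative distance from token i to token j, clipped into [0, 2k)."""
    d = i - j
    if d <= -k:
        return 0
    if d >= k:
        return 2 * k - 1
    return d + k


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def disentangled_attention(q_c, k_c, q_r, k_r, k):
    """Return an n x n matrix of attention weights.

    q_c, k_c: content query/key projections, one d-vector per token (n x d).
    q_r, k_r: query/key projections of the relative position
              embeddings, one d-vector per clipped distance (2k x d).
    """
    n, d = len(q_c), len(q_c[0])
    scale = math.sqrt(3 * d)  # three score terms, hence 3d
    weights = []
    for i in range(n):
        row = []
        for j in range(n):
            c2c = dot(q_c[i], k_c[j])                        # content-to-content
            c2p = dot(q_c[i], k_r[relative_index(i, j, k)])  # content-to-position
            p2c = dot(k_c[j], q_r[relative_index(j, i, k)])  # position-to-content
            row.append((c2c + c2p + p2c) / scale)
        weights.append(softmax(row))
    return weights


# Hypothetical toy inputs: 5 tokens, hidden size 4, maximum relative distance 2.
random.seed(0)
n, d, k = 5, 4, 2


def rand(rows):
    return [[random.gauss(0, 1) for _ in range(d)] for _ in range(rows)]


q_c, k_c = rand(n), rand(n)
q_r, k_r = rand(2 * k), rand(2 * k)
attn = disentangled_attention(q_c, k_c, q_r, k_r, k)
```

Each row of `attn` is a probability distribution over the tokens. Note that the position terms depend only on the clipped relative distance between two tokens, not on their absolute positions.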
| ## Resources[[resources]] | |
| A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with DeBERTa. If you would like to submit a resource to be included here, please open a Pull Request and we will review it! A resource should demonstrate something new rather than duplicate an existing resource. | |
| <PipelineTag pipeline="text-classification"/> | |
| - A blog post on how to [Accelerate Large Model Training using DeepSpeed](https://huggingface.co/blog/accelerate-deepspeed) with DeBERTa. | |
| - A blog post on [Supercharged Customer Service with Machine Learning](https://huggingface.co/blog/supercharge-customer-service-with-machine-learning) with DeBERTa. | |
| - [`DebertaForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb). | |
| - [`TFDebertaForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification-tf.ipynb). | |
| - [Text classification task guide](../tasks/sequence_classification) | |
| <PipelineTag pipeline="token-classification" /> | |
| - [`DebertaForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb). | |
| - [`TFDebertaForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb). | |
| - [Token classification](https://huggingface.co/course/chapter7/2?fw=pt) chapter of the 🤗 Hugging Face Course. | |
| - [Byte-Pair Encoding (BPE) tokenization](https://huggingface.co/course/chapter6/5?fw=pt) chapter of the 🤗 Hugging Face Course. | |
| - [Token classification task guide](../tasks/token_classification) | |
| <PipelineTag pipeline="fill-mask"/> | |
| - [`DebertaForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling#robertabertdistilbert-and-masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb). | |
| - [`TFDebertaForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/language-modeling#run_mlmpy) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb). | |
| - [Masked language modeling](https://huggingface.co/course/chapter7/3?fw=pt) chapter of the 🤗 Hugging Face Course. | |
| - [Masked language modeling task guide](../tasks/masked_language_modeling) | |
| <PipelineTag pipeline="question-answering"/> | |
| - [`DebertaForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb). | |
| - [`TFDebertaForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering-tf.ipynb). | |
| - [Question answering](https://huggingface.co/course/chapter7/7?fw=pt) chapter of the 🤗 Hugging Face Course. | |
| - [Question answering task guide](../tasks/question_answering) | |
| ## DebertaConfig[[transformers.DebertaConfig]] | |
| [[autodoc]] DebertaConfig | |
| ## DebertaTokenizer[[transformers.DebertaTokenizer]] | |
| [[autodoc]] DebertaTokenizer | |
| - get_special_tokens_mask | |
| - save_vocabulary | |
| ## DebertaTokenizerFast[[transformers.DebertaTokenizerFast]] | |
| [[autodoc]] DebertaTokenizerFast | |
| ## DebertaModel[[transformers.DebertaModel]] | |
| [[autodoc]] DebertaModel | |
| - forward | |
| ## DebertaPreTrainedModel[[transformers.DebertaPreTrainedModel]] | |
| [[autodoc]] DebertaPreTrainedModel | |
| ## DebertaForMaskedLM[[transformers.DebertaForMaskedLM]] | |
| [[autodoc]] DebertaForMaskedLM | |
| - forward | |
| ## DebertaForSequenceClassification[[transformers.DebertaForSequenceClassification]] | |
| [[autodoc]] DebertaForSequenceClassification | |
| - forward | |
| ## DebertaForTokenClassification[[transformers.DebertaForTokenClassification]] | |
| [[autodoc]] DebertaForTokenClassification | |
| - forward | |
| ## DebertaForQuestionAnswering[[transformers.DebertaForQuestionAnswering]] | |
| [[autodoc]] DebertaForQuestionAnswering | |
| - forward | |