<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
# RoBERTa[[roberta]]

<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
<img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=
">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
## Overview[[overview]]

The RoBERTa model was proposed in [RoBERTa: A Robustly Optimized BERT Pretraining Approach](https://huggingface.co/papers/1907.11692) by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. It is based on the BERT model released by Google in 2018.

It modifies BERT's key hyperparameters, removes the next-sentence prediction objective from pretraining, and trains with much larger mini-batches and learning rates.

The abstract from the paper is the following:

*Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.*

This model was contributed by [julien-c](https://huggingface.co/julien-c). The original code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/roberta).
## Usage tips[[usage-tips]]

- This implementation is the same as [`BertModel`], with a minor tweak to the embeddings and a setup for RoBERTa pretrained models.
- RoBERTa has the same architecture as BERT, but it uses a byte-level BPE (Byte-Pair Encoding) tokenizer, the same as GPT-2, and a different pretraining scheme.
- RoBERTa does not use `token_type_ids`, so you do not need to indicate which token belongs to which segment. Separate your segments with the separation token `tokenizer.sep_token` (or `</s>`).
- RoBERTa is similar to BERT but uses better pretraining techniques:

    * Dynamic masking: tokens are masked differently at each epoch, whereas BERT masks them only once.
    * Sentence packing: sentences are packed together up to 512 tokens, so a packed sequence may span several documents.
    * Larger batches: training uses larger mini-batches.
    * Byte-level BPE vocabulary: BPE is applied to bytes rather than characters, which handles Unicode characters more flexibly.
- [CamemBERT](camembert) is a wrapper around RoBERTa. Refer to its model page for usage examples.
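The difference between static and dynamic masking described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the fairseq or Transformers implementation: `mask_tokens` is a hypothetical helper, the 15% masking rate is borrowed from the BERT paper, words stand in for subword tokens, and every selected position is replaced by `<mask>` (the random-replacement and keep-as-is branches of the real objective are omitted).

```python
import random

MASK = "<mask>"  # RoBERTa's mask token string


def mask_tokens(tokens, rng, mask_prob=0.15):
    """Return a copy of `tokens` with about `mask_prob` of the positions masked.

    Hypothetical helper for illustration only.
    """
    n_mask = max(1, round(len(tokens) * mask_prob))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]


# Whitespace-split words stand in for real subword tokens.
tokens = "the quick brown fox jumps over the lazy dog near the river bank".split()

# Static masking: the mask is drawn once during preprocessing and the same
# masked sequence is reused at every epoch.
static_view = mask_tokens(tokens, random.Random(0))
static_epochs = [static_view for _ in range(3)]

# Dynamic masking: a fresh mask is drawn every time the sequence is fed to
# the model, so each epoch can hide different positions.
dynamic_rng = random.Random(0)
dynamic_epochs = [mask_tokens(tokens, dynamic_rng) for _ in range(3)]

for epoch, view in enumerate(dynamic_epochs):
    print(epoch, " ".join(view))
```

In Transformers, masking on the fly is typically handled when batches are collated, for example by [`DataCollatorForLanguageModeling`], rather than by a hand-written helper like the one above.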
## Resources[[resources]]

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with RoBERTa. If you would like to add a resource to this list, feel free to open a Pull Request and we will review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
<PipelineTag pipeline="text-classification"/>

- A blog post on [Getting Started with Sentiment Analysis on Twitter](https://huggingface.co/blog/sentiment-analysis-twitter) using RoBERTa and the [Inference API](https://huggingface.co/inference-api).
- A blog post on [Opinion Classification with Kili and Hugging Face AutoTrain](https://huggingface.co/blog/opinion-classification-with-kili) using RoBERTa.
- A notebook on how to [fine-tune RoBERTa for sentiment analysis](https://colab.research.google.com/github/DhavalTaunk08/NLP_scripts/blob/master/sentiment_analysis_using_roberta.ipynb). 🌎
- [`RobertaForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb).
- [`TFRobertaForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification-tf.ipynb).
- [`FlaxRobertaForSequenceClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/text-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification_flax.ipynb).
- [Text classification task guide](../tasks/sequence_classification)
<PipelineTag pipeline="token-classification"/>

- [`RobertaForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb).
- [`TFRobertaForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/token-classification) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb).
- [`FlaxRobertaForTokenClassification`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/token-classification).
- [Token classification](https://huggingface.co/course/chapter7/2?fw=pt) chapter of the 🤗 Hugging Face Course.
- [Token classification task guide](../tasks/token_classification)
<PipelineTag pipeline="fill-mask"/>

- A blog post on [How to train a new language model from scratch using Transformers and Tokenizers](https://huggingface.co/blog/how-to-train) with RoBERTa.
- [`RobertaForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling#robertabertdistilbert-and-masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb).
- [`TFRobertaForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/language-modeling#run_mlmpy) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb).
- [`FlaxRobertaForMaskedLM`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/language-modeling#masked-language-modeling) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/masked_language_modeling_flax.ipynb).
- [Masked language modeling](https://huggingface.co/course/chapter7/3?fw=pt) chapter of the 🤗 Hugging Face Course.
- [Masked language modeling task guide](../tasks/masked_language_modeling)
<PipelineTag pipeline="question-answering"/>

- A blog post on [Accelerated Inference with Optimum and Transformers Pipelines](https://huggingface.co/blog/optimum-inference) with RoBERTa for question answering.
- [`RobertaForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb).
- [`TFRobertaForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/question-answering) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering-tf.ipynb).
- [`FlaxRobertaForQuestionAnswering`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/flax/question-answering).
- [Question answering](https://huggingface.co/course/chapter7/7?fw=pt) chapter of the 🤗 Hugging Face Course.
- [Question answering task guide](../tasks/question_answering)
**Multiple choice**

- [`RobertaForMultipleChoice`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/pytorch/multiple-choice) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb).
- [`TFRobertaForMultipleChoice`] is supported by this [example script](https://github.com/huggingface/transformers/tree/main/examples/tensorflow/multiple-choice) and [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice-tf.ipynb).
- [Multiple choice task guide](../tasks/multiple_choice)
## RobertaConfig

[[autodoc]] RobertaConfig

## RobertaTokenizer

[[autodoc]] RobertaTokenizer
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - save_vocabulary
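
As a rough illustration of what the first three methods listed above compute, the sketch below mirrors RoBERTa's special-token layout in plain Python: `<s> A </s>` for a single sequence and `<s> A </s></s> B </s>` for a pair. It does not call the library. The function names and token ids are hypothetical, and the ids assumed for `<s>` and `</s>` (0 and 2) should be read from the loaded tokenizer in real code.

```python
# Assumed ids for <s> and </s>; real values come from the tokenizer's vocabulary.
BOS_ID, EOS_ID = 0, 2


def build_inputs(ids_a, ids_b=None):
    """Add special tokens: <s> A </s> or <s> A </s></s> B </s>."""
    if ids_b is None:
        return [BOS_ID] + ids_a + [EOS_ID]
    return [BOS_ID] + ids_a + [EOS_ID, EOS_ID] + ids_b + [EOS_ID]


def special_tokens_mask(ids_a, ids_b=None):
    """Return 1 for special-token positions and 0 for sequence tokens."""
    if ids_b is None:
        return [1] + [0] * len(ids_a) + [1]
    return [1] + [0] * len(ids_a) + [1, 1] + [0] * len(ids_b) + [1]


def token_type_ids(ids_a, ids_b=None):
    """RoBERTa does not use segment ids, so every position is 0."""
    return [0] * len(build_inputs(ids_a, ids_b))


question = [100, 101, 102]  # hypothetical token ids for segment A
context = [200, 201]        # hypothetical token ids for segment B

pair = build_inputs(question, context)
print(pair)                                    # [0, 100, 101, 102, 2, 2, 200, 201, 2]
print(special_tokens_mask(question, context))  # [1, 0, 0, 0, 1, 1, 0, 0, 1]
print(token_type_ids(question, context))       # [0, 0, 0, 0, 0, 0, 0, 0, 0]
```

The all-zero segment ids are why the two segments are told apart only by the doubled `</s>` separator.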
## RobertaTokenizerFast

[[autodoc]] RobertaTokenizerFast
    - build_inputs_with_special_tokens
<frameworkcontent>
<pt>

## RobertaModel

[[autodoc]] RobertaModel
    - forward

## RobertaForCausalLM

[[autodoc]] RobertaForCausalLM
    - forward

## RobertaForMaskedLM

[[autodoc]] RobertaForMaskedLM
    - forward

## RobertaForSequenceClassification

[[autodoc]] RobertaForSequenceClassification
    - forward

## RobertaForMultipleChoice

[[autodoc]] RobertaForMultipleChoice
    - forward

## RobertaForTokenClassification

[[autodoc]] RobertaForTokenClassification
    - forward

## RobertaForQuestionAnswering

[[autodoc]] RobertaForQuestionAnswering
    - forward

</pt>
<tf>

## TFRobertaModel

[[autodoc]] TFRobertaModel
    - call

## TFRobertaForCausalLM

[[autodoc]] TFRobertaForCausalLM
    - call

## TFRobertaForMaskedLM

[[autodoc]] TFRobertaForMaskedLM
    - call

## TFRobertaForSequenceClassification

[[autodoc]] TFRobertaForSequenceClassification
    - call

## TFRobertaForMultipleChoice

[[autodoc]] TFRobertaForMultipleChoice
    - call

## TFRobertaForTokenClassification

[[autodoc]] TFRobertaForTokenClassification
    - call

## TFRobertaForQuestionAnswering

[[autodoc]] TFRobertaForQuestionAnswering
    - call

</tf>
<jax>

## FlaxRobertaModel

[[autodoc]] FlaxRobertaModel
    - __call__

## FlaxRobertaForCausalLM

[[autodoc]] FlaxRobertaForCausalLM
    - __call__

## FlaxRobertaForMaskedLM

[[autodoc]] FlaxRobertaForMaskedLM
    - __call__

## FlaxRobertaForSequenceClassification

[[autodoc]] FlaxRobertaForSequenceClassification
    - __call__

## FlaxRobertaForMultipleChoice

[[autodoc]] FlaxRobertaForMultipleChoice
    - __call__

## FlaxRobertaForTokenClassification

[[autodoc]] FlaxRobertaForTokenClassification
    - __call__

## FlaxRobertaForQuestionAnswering

[[autodoc]] FlaxRobertaForQuestionAnswering
    - __call__

</jax>
</frameworkcontent>