
RoBERTa[[roberta]]

PyTorch TensorFlow Flax SDPA

Overview[[overview]]

The RoBERTa model was proposed in RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. It is based on the BERT model that Google released in 2018.

RoBERTa builds on BERT and modifies key hyperparameters: it removes the next sentence prediction objective from pretraining and trains with much larger mini-batches and learning rates.

The abstract from the paper is the following:

Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.

This model was contributed by julien-c. The original code can be found here.

Usage tips[[usage-tips]]

  • This implementation is the same as [BertModel], with a minor tweak to the embeddings and a setup for RoBERTa pretrained models.

  • RoBERTa has the same architecture as BERT, but it uses a byte-level BPE (Byte-Pair Encoding) tokenizer, the same as GPT-2, and a different pretraining scheme.

  • RoBERTa does not use token_type_ids, so you do not need to indicate which token belongs to which segment. Separate your segments with the separator token tokenizer.sep_token (or </s>).

  • RoBERTa is similar to BERT but uses better pretraining techniques:

    • Dynamic masking: RoBERTa masks tokens differently at each epoch, whereas BERT masks them only once.
    • Sentence packing: multiple sentences are packed together up to 512 tokens, so a packed sequence may span several documents.
    • Larger batch size: training uses larger mini-batches.
    • Byte-level BPE vocabulary: BPE is applied to bytes rather than characters, which handles Unicode characters more flexibly.
  • CamemBERT is a wrapper around RoBERTa. Refer to its model page for usage examples.
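
The difference between static and dynamic masking described in the tips above can be illustrated with a minimal sketch. It uses only the Python standard library and is not the library's implementation. The function names, the `<mask>` string, and the 15% masking probability are assumptions made for illustration, and the sketch ignores the random-replacement and keep-as-is cases that real masked language model training also uses.

```python
import random

MASK = "<mask>"  # assumed mask token string, for illustration only


def mask_tokens(tokens, rng, mask_prob=0.15):
    """Return a copy of `tokens` where each position is masked with probability `mask_prob`."""
    return [MASK if rng.random() < mask_prob else tok for tok in tokens]


def static_masking(tokens, num_epochs, seed=0):
    """Mask once during preprocessing and reuse the same pattern at every epoch."""
    rng = random.Random(seed)
    masked = mask_tokens(tokens, rng)
    return [list(masked) for _ in range(num_epochs)]


def dynamic_masking(tokens, num_epochs, seed=0):
    """Draw a fresh masking pattern every time the sequence is seen."""
    rng = random.Random(seed)
    return [mask_tokens(tokens, rng) for _ in range(num_epochs)]


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    for epoch, masked in enumerate(dynamic_masking(sentence, num_epochs=3)):
        print(epoch, " ".join(masked))
```

With static masking the model sees the same masked positions at every epoch, while dynamic masking exposes it to different prediction targets for the same sentence.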

Resources[[resources]]

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with RoBERTa. If you would like to add a resource to this list, feel free to open a Pull Request and we will review it! The resource should ideally demonstrate something new rather than duplicate an existing resource.

Multiple choice

RobertaConfig

[[autodoc]] RobertaConfig

RobertaTokenizer

[[autodoc]] RobertaTokenizer - build_inputs_with_special_tokens - get_special_tokens_mask - create_token_type_ids_from_sequences - save_vocabulary
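
As noted in the usage tips, RoBERTa separates segments with `</s>` instead of using token_type_ids. The sketch below mimics, in plain Python, the layout that the tokenizer methods listed above are expected to produce: `<s> A </s>` for a single sequence and `<s> A </s></s> B </s>` for a pair. The helper names are hypothetical, the ids 0 and 2 are assumed to match the public roberta-base vocabulary (other checkpoints may differ), and the content token ids are placeholders.

```python
# Assumed special token ids (roberta-base convention); check your own checkpoint.
BOS_ID = 0  # <s>
EOS_ID = 2  # </s>, also used as tokenizer.sep_token


def build_inputs(ids_a, ids_b=None):
    """Add special tokens: <s> A </s> or <s> A </s></s> B </s>."""
    if ids_b is None:
        return [BOS_ID] + ids_a + [EOS_ID]
    return [BOS_ID] + ids_a + [EOS_ID, EOS_ID] + ids_b + [EOS_ID]


def special_tokens_mask(ids_a, ids_b=None):
    """Return 1 at special-token positions and 0 at content positions."""
    if ids_b is None:
        return [1] + [0] * len(ids_a) + [1]
    return [1] + [0] * len(ids_a) + [1, 1] + [0] * len(ids_b) + [1]


def token_type_ids(ids_a, ids_b=None):
    """RoBERTa does not distinguish segments, so every position gets type 0."""
    return [0] * len(build_inputs(ids_a, ids_b))


if __name__ == "__main__":
    first, second = [10, 11], [20]  # placeholder token ids
    print(build_inputs(first, second))
    print(special_tokens_mask(first, second))
    print(token_type_ids(first, second))
```

Because every position receives type 0, the doubled `</s>` between the two segments is the only signal of where the first segment ends.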

RobertaTokenizerFast

[[autodoc]] RobertaTokenizerFast - build_inputs_with_special_tokens

RobertaModel

[[autodoc]] RobertaModel - forward

RobertaForCausalLM

[[autodoc]] RobertaForCausalLM - forward

RobertaForMaskedLM

[[autodoc]] RobertaForMaskedLM - forward

RobertaForSequenceClassification

[[autodoc]] RobertaForSequenceClassification - forward

RobertaForMultipleChoice

[[autodoc]] RobertaForMultipleChoice - forward

RobertaForTokenClassification

[[autodoc]] RobertaForTokenClassification - forward

RobertaForQuestionAnswering

[[autodoc]] RobertaForQuestionAnswering - forward

TFRobertaModel

[[autodoc]] TFRobertaModel - call

TFRobertaForCausalLM

[[autodoc]] TFRobertaForCausalLM - call

TFRobertaForMaskedLM

[[autodoc]] TFRobertaForMaskedLM - call

TFRobertaForSequenceClassification

[[autodoc]] TFRobertaForSequenceClassification - call

TFRobertaForMultipleChoice

[[autodoc]] TFRobertaForMultipleChoice - call

TFRobertaForTokenClassification

[[autodoc]] TFRobertaForTokenClassification - call

TFRobertaForQuestionAnswering

[[autodoc]] TFRobertaForQuestionAnswering - call

FlaxRobertaModel

[[autodoc]] FlaxRobertaModel - call

FlaxRobertaForCausalLM

[[autodoc]] FlaxRobertaForCausalLM - call

FlaxRobertaForMaskedLM

[[autodoc]] FlaxRobertaForMaskedLM - call

FlaxRobertaForSequenceClassification

[[autodoc]] FlaxRobertaForSequenceClassification - call

FlaxRobertaForMultipleChoice

[[autodoc]] FlaxRobertaForMultipleChoice - call

FlaxRobertaForTokenClassification

[[autodoc]] FlaxRobertaForTokenClassification - call

FlaxRobertaForQuestionAnswering

[[autodoc]] FlaxRobertaForQuestionAnswering - call