ELECTRA[[electra]]

PyTorch TensorFlow Flax

Overview[[overview]]

The ELECTRA model was proposed in the paper ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators. ELECTRA is a new pretraining approach that trains two transformer models: a generator and a discriminator. The generator's role is to replace tokens in a sequence, so it is trained as a masked language model. The discriminator, which is the model we are interested in, identifies which tokens in the sequence were replaced by the generator.
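
The interplay between the two models can be illustrated with a toy sketch in plain Python. It is not the library implementation: `corrupt` is a hypothetical stand-in for the generator that resamples tokens uniformly from a small vocabulary (a real generator samples from its masked language modeling head), and the function names, vocabulary, and `mask_prob` values are assumptions chosen for illustration. `discriminator_labels` produces the per-token targets the discriminator would be trained on.

```python
import random


def corrupt(tokens, vocab, mask_prob=0.15, seed=0):
    """Toy generator stand-in: pick positions at random and resample a token for each."""
    rng = random.Random(seed)
    corrupted = list(tokens)
    # Positions selected for masking (illustrative probability, not a library default).
    masked = [i for i in range(len(tokens)) if rng.random() < mask_prob]
    for i in masked:
        # Assumption: uniform sampling replaces the generator's learned distribution.
        corrupted[i] = rng.choice(vocab)
    return corrupted, masked


def discriminator_labels(tokens, corrupted):
    """Return 1 for a replaced token and 0 for an original one.

    A sampled token that happens to equal the original is labeled as original.
    """
    if len(tokens) != len(corrupted):
        raise ValueError("sequences must have the same length")
    return [int(t != c) for t, c in zip(tokens, corrupted)]


# Hand-written corruption: only "cooked" was replaced.
print(discriminator_labels(
    ["the", "chef", "cooked", "the", "meal"],
    ["the", "chef", "ate", "the", "meal"],
))  # [0, 0, 1, 0, 0]

# Randomly corrupted input using the toy generator.
tokens = "the chef cooked the meal".split()
vocab = ["the", "chef", "cooked", "ate", "meal", "a"]
corrupted, masked = corrupt(tokens, vocab, mask_prob=0.5, seed=1)
labels = discriminator_labels(tokens, corrupted)
print(corrupted, masked, labels)
```

Every position receives a label, including those that were never masked, whereas a masked language modeling loss would only be computed on the positions listed in `masked`.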

๋…ผ๋ฌธ์˜ ์ดˆ๋ก์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:

Masked language modeling (MLM) pretraining methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. While they produce good results when transferred to downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, we propose a more sample-efficient pretraining task called replaced token detection. Instead of masking the input, our approach corrupts it by replacing some tokens with plausible alternatives sampled from a small generator network. Then, instead of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that predicts whether each token in the corrupted input was replaced by a generator sample or not. Thorough experiments demonstrate that this new pretraining task is more efficient than MLM because the task is defined over all input tokens rather than just the small subset that was masked out. As a result, the contextual representations learned by our approach substantially outperform the ones learned by BERT given the same model size, data, and compute. The gains are particularly strong for small models; for example, we train a model on one GPU for 4 days that outperforms GPT (trained using 30x more compute) on the GLUE natural language understanding benchmark. Our approach also works well at scale, where it performs comparably to RoBERTa and XLNet while using less compute and outperforms them when using the same amount of compute.

์ด ๋ชจ๋ธ์€ lysandre์ด ๊ธฐ์—ฌํ–ˆ์Šต๋‹ˆ๋‹ค. ์›๋ณธ ์ฝ”๋“œ๋Š” ์ด๊ณณ์—์„œ ์ฐพ์•„๋ณด์‹ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Usage tips[[usage-tips]]

  • ELECTRA is a pretraining approach, so the underlying model, BERT, is almost unchanged. The only change is that the embedding size and the hidden size are decoupled: the embedding size is generally smaller, while the hidden size is larger. An additional linear projection layer maps the embeddings from the embedding size to the hidden size. When the two sizes are equal, no projection layer is used.
  • ELECTRA is a transformer model pretrained with the help of another (small) masked language model. Part of the input text is randomly masked, and the small language model fills the masked positions with new tokens. ELECTRA then has to tell the original tokens apart from the replaced ones. As in GAN training, the small language model is trained for a few steps, but with recovering the original text as its objective rather than fooling the ELECTRA model. The ELECTRA model is then trained for a few steps.
  • The ELECTRA checkpoints saved using Google Research's implementation contain both the generator and the discriminator. The conversion script requires the user to specify which model to export into which architecture. Once converted to the Hugging Face format, these checkpoints can be loaded into all available ELECTRA models. This means that the discriminator can be loaded into the [ElectraForMaskedLM] model, and the generator can be loaded into the [ElectraForPreTraining] model. (The classification head is randomly initialized in that case, because it does not exist in the generator.)
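
The first tip above, about decoupled embedding and hidden sizes, can be sketched in plain Python. This is a toy illustration under stated assumptions, not the library code: the class name `ToyEmbeddings`, the random initialization, and the omission of a bias term are all hypothetical choices. The sketch only shows when a projection is created and how it changes the vector size.

```python
import random


class ToyEmbeddings:
    """Embedding lookup with an optional linear projection to the hidden size."""

    def __init__(self, vocab_size, embedding_size, hidden_size, seed=0):
        rng = random.Random(seed)
        # One vector of length `embedding_size` per vocabulary entry.
        self.table = [
            [rng.uniform(-1.0, 1.0) for _ in range(embedding_size)]
            for _ in range(vocab_size)
        ]
        # The projection exists only when the two sizes differ.
        self.projection = None
        if embedding_size != hidden_size:
            # Weight matrix of shape (hidden_size, embedding_size); bias omitted.
            self.projection = [
                [rng.uniform(-1.0, 1.0) for _ in range(embedding_size)]
                for _ in range(hidden_size)
            ]

    def __call__(self, token_ids):
        vectors = [self.table[t] for t in token_ids]
        if self.projection is None:
            return vectors
        # Matrix-vector product: each output has length `hidden_size`.
        return [
            [sum(w * x for w, x in zip(row, v)) for row in self.projection]
            for v in vectors
        ]


small = ToyEmbeddings(vocab_size=10, embedding_size=4, hidden_size=8)
print(len(small([1, 2, 3])[0]))  # 8: projected from 4 to 8

same = ToyEmbeddings(vocab_size=10, embedding_size=8, hidden_size=8)
print(same.projection is None)  # True: no projection layer needed
```

Keeping the embedding table narrow reduces the number of parameters tied to the vocabulary, while the projection lets the transformer layers operate at the larger hidden size.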

Resources[[resources]]

ElectraConfig

[[autodoc]] ElectraConfig

ElectraTokenizer

[[autodoc]] ElectraTokenizer

ElectraTokenizerFast

[[autodoc]] ElectraTokenizerFast

Electra specific outputs

[[autodoc]] models.electra.modeling_electra.ElectraForPreTrainingOutput

[[autodoc]] models.electra.modeling_tf_electra.TFElectraForPreTrainingOutput

ElectraModel

[[autodoc]] ElectraModel - forward

ElectraForPreTraining

[[autodoc]] ElectraForPreTraining - forward

ElectraForCausalLM

[[autodoc]] ElectraForCausalLM - forward

ElectraForMaskedLM

[[autodoc]] ElectraForMaskedLM - forward

ElectraForSequenceClassification

[[autodoc]] ElectraForSequenceClassification - forward

ElectraForMultipleChoice

[[autodoc]] ElectraForMultipleChoice - forward

ElectraForTokenClassification

[[autodoc]] ElectraForTokenClassification - forward

ElectraForQuestionAnswering

[[autodoc]] ElectraForQuestionAnswering - forward

TFElectraModel

[[autodoc]] TFElectraModel - call

TFElectraForPreTraining

[[autodoc]] TFElectraForPreTraining - call

TFElectraForMaskedLM

[[autodoc]] TFElectraForMaskedLM - call

TFElectraForSequenceClassification

[[autodoc]] TFElectraForSequenceClassification - call

TFElectraForMultipleChoice

[[autodoc]] TFElectraForMultipleChoice - call

TFElectraForTokenClassification

[[autodoc]] TFElectraForTokenClassification - call

TFElectraForQuestionAnswering

[[autodoc]] TFElectraForQuestionAnswering - call

FlaxElectraModel

[[autodoc]] FlaxElectraModel - call

FlaxElectraForPreTraining

[[autodoc]] FlaxElectraForPreTraining - call

FlaxElectraForCausalLM

[[autodoc]] FlaxElectraForCausalLM - call

FlaxElectraForMaskedLM

[[autodoc]] FlaxElectraForMaskedLM - call

FlaxElectraForSequenceClassification

[[autodoc]] FlaxElectraForSequenceClassification - call

FlaxElectraForMultipleChoice

[[autodoc]] FlaxElectraForMultipleChoice - call

FlaxElectraForTokenClassification

[[autodoc]] FlaxElectraForTokenClassification - call

FlaxElectraForQuestionAnswering

[[autodoc]] FlaxElectraForQuestionAnswering - call