<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# ELECTRA[[electra]]
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
<img alt="Flax" src="https://img.shields.io/badge/Flax-29a79b.svg?style=flat&logo=">
</div>
## Overview[[overview]]
The ELECTRA model was proposed in [ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators](https://openreview.net/pdf?id=r1xMH1BtvB). ELECTRA is a new pretraining approach that trains two transformer models: a generator and a discriminator. The generator's role is to replace tokens in a sequence, and it is trained as a masked language model. The discriminator, which is the model we are interested in, tries to identify which tokens in the sequence were replaced by the generator.
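As a rough illustration of replaced token detection, the sketch below masks a fraction of positions, lets a stand-in "generator" fill them, and derives the discriminator's per-token labels. The helper name `corrupt`, the toy generator (uniform sampling from a tiny vocabulary instead of a small masked language model), and the mask rate are assumptions for illustration only. The labels follow the convention of [`ElectraForPreTraining`]: 1 for a replaced token and 0 for an original one, so a position where the generator happens to reproduce the original token counts as original.

```python
import random
from typing import List, Tuple


def corrupt(
    tokens: List[str],
    vocab: List[str],
    mask_prob: float = 0.15,
    seed: int = 0,
) -> Tuple[List[str], List[int], List[int]]:
    """Hypothetical helper: returns (corrupted tokens, masked positions, labels)."""
    rng = random.Random(seed)
    n_mask = max(1, round(mask_prob * len(tokens)))
    # Choose which positions the generator gets to fill.
    masked = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    for i in masked:
        # Stand-in generator: uniform sampling, not a real small MLM.
        corrupted[i] = rng.choice(vocab)
    # Discriminator targets cover every position, not only the masked ones:
    # 1 = replaced, 0 = original (including lucky generator samples).
    labels = [int(c != o) for c, o in zip(corrupted, tokens)]
    return corrupted, masked, labels


tokens = "the chef cooked the meal".split()
vocab = ["the", "chef", "ate", "cooked", "meal"]
corrupted, masked, labels = corrupt(tokens, vocab, mask_prob=0.4, seed=1)
print(corrupted, masked, labels)
```

Because the discriminator receives a label at every position, each token contributes a training signal, whereas a masked language model only learns from the masked subset.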
The abstract from the paper is the following:
*Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by replacing some tokens with [MASK] and then train a model to reconstruct the original tokens. While they produce good results when transferred to downstream NLP tasks, they generally require large amounts of compute to be effective. As an alternative, we propose a more sample-efficient pre-training task called replaced token detection. Instead of masking the input, our approach corrupts it by replacing some tokens with plausible alternatives sampled from a small generator network. Then, instead of training a model that predicts the original identities of the corrupted tokens, we train a discriminative model that predicts whether each token in the corrupted input was replaced by a generator sample or not. Thorough experiments demonstrate this new pre-training task is more efficient than MLM because the task is defined over all input tokens rather than just the small subset that was masked out. As a result, the contextual representations learned by our approach substantially outperform the ones learned by BERT given the same model size, data, and compute. The gains are particularly strong for small models; for example, we train a model on one GPU for 4 days that outperforms GPT (trained using 30x more compute) on the GLUE natural language understanding benchmark. Our approach also works well at scale, where it performs comparably to RoBERTa and XLNet while using less than 1/4 of their compute and outperforms them when using the same amount of compute.*
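The efficiency argument in the abstract is easiest to see in the training objective. The following restates the combined loss from the paper with notation chosen here: $m$ is the set of masked positions, $x^{\text{masked}}$ the masked input, $x^{\text{corrupt}}$ the input after the generator's samples are filled in, $n$ the sequence length, and $D(x^{\text{corrupt}}, t)$ the discriminator's probability that token $t$ was replaced. The paper defines $D$ as the probability that a token is original, which only swaps the two log terms; the convention used here matches label 1 = replaced in [`ElectraForPreTraining`].

$$
\mathcal{L}_{\text{MLM}}(x, \theta_G) = \mathbb{E}\Big[\sum_{i \in m} -\log p_G\big(x_i \mid x^{\text{masked}}\big)\Big]
$$

$$
\mathcal{L}_{\text{Disc}}(x, \theta_D) = \mathbb{E}\Big[\sum_{t=1}^{n} -\mathbb{1}\big(x^{\text{corrupt}}_t \neq x_t\big)\log D\big(x^{\text{corrupt}}, t\big) - \mathbb{1}\big(x^{\text{corrupt}}_t = x_t\big)\log\big(1 - D(x^{\text{corrupt}}, t)\big)\Big]
$$

$$
\min_{\theta_G, \theta_D} \sum_{x \in \mathcal{X}} \mathcal{L}_{\text{MLM}}(x, \theta_G) + \lambda\, \mathcal{L}_{\text{Disc}}(x, \theta_D)
$$

The MLM term sums only over the masked positions $m$, while the discriminator term sums over all $n$ positions. The paper reports using $\lambda = 50$, and the discriminator loss is not back-propagated into the generator because of the discrete sampling step.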
이 λͺ¨λΈμ€ [lysandre](https://huggingface.co/lysandre)이 κΈ°μ—¬ν–ˆμŠ΅λ‹ˆλ‹€. 원본 μ½”λ“œλŠ” [이곳](https://github.com/google-research/electra)μ—μ„œ 찾아보싀 수 μžˆμŠ΅λ‹ˆλ‹€.
## Usage tips[[usage-tips]]
- ELECTRA is a pretraining approach, so the underlying model is almost unchanged from BERT. The only change is that the embedding size and the hidden size are decoupled: the embedding size is generally smaller, while the hidden size is larger. An additional linear projection layer maps the embeddings from the embedding size to the hidden size. When the embedding size equals the hidden size, no projection layer is used.
- ELECTRA is a transformer model pretrained with the help of another (small) masked language model. Part of the input text is randomly masked, and the small language model fills the masked positions with new tokens; ELECTRA then has to predict which tokens are original and which have been replaced. As in GAN training, the small language model is trained for a few steps, and then the ELECTRA model is trained for a few steps. Unlike a traditional GAN, however, the small model's objective is to reconstruct the original text, not to fool the ELECTRA model.
- ELECTRA checkpoints saved with [Google Research's implementation](https://github.com/google-research/electra) contain both the generator and the discriminator. The conversion script requires the user to specify which model to export and into which architecture. Once converted to the Hugging Face format, these checkpoints can be loaded into any ELECTRA model. This means, for example, that the discriminator can be loaded into an [`ElectraForMaskedLM`] model and the generator into an [`ElectraForPreTraining`] model (the classification head, which does not exist in the generator, is then randomly initialized).
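The first usage tip (a smaller embedding size projected up to the hidden size) mainly saves parameters in the token embedding table. The arithmetic below is a hypothetical sketch: the helper name is invented, position and token-type embeddings and LayerNorm are ignored, and the sizes (vocabulary 30522, embedding 128, hidden 256) are assumed values chosen to resemble a small discriminator, not read from a specific checkpoint.

```python
def token_embedding_params(vocab_size: int, embedding_size: int, hidden_size: int) -> int:
    """Hypothetical helper: parameters in the word-embedding table plus the
    optional linear projection from embedding_size to hidden_size."""
    table = vocab_size * embedding_size
    if embedding_size == hidden_size:
        # Sizes match, so no projection layer is needed.
        return table
    # Linear projection: weight matrix plus bias vector.
    projection = embedding_size * hidden_size + hidden_size
    return table + projection


factorized = token_embedding_params(30522, 128, 256)
unfactorized = token_embedding_params(30522, 256, 256)
print(factorized, unfactorized)  # 3939840 7813632
```

With these assumed sizes the decoupled layout needs roughly half the embedding parameters, at the cost of one extra matrix multiplication per token.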
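The last usage tip relies on loading a checkpoint into an architecture whose head may be missing from that checkpoint. The sketch below mimics that behavior with plain dictionaries: weights whose names match are copied, and anything the target model expects but the checkpoint lacks is freshly initialized. The `load_partially` helper, the scalar "weights", and the key names are hypothetical illustrations, not the parameter names or loading code actually used by Transformers.

```python
import random
from typing import Dict, List, Tuple


def load_partially(
    checkpoint: Dict[str, float],
    expected_keys: List[str],
    seed: int = 0,
) -> Tuple[Dict[str, float], List[str], List[str]]:
    """Hypothetical loader returning (weights, missing keys, unexpected keys)."""
    rng = random.Random(seed)
    weights: Dict[str, float] = {}
    missing: List[str] = []
    for key in expected_keys:
        if key in checkpoint:
            weights[key] = checkpoint[key]  # reuse the pretrained value
        else:
            weights[key] = rng.gauss(0.0, 0.02)  # small random init for the absent head
            missing.append(key)
    # Checkpoint entries the target architecture has no slot for are dropped.
    unexpected = [k for k in checkpoint if k not in expected_keys]
    return weights, missing, unexpected


# Illustrative names: a generator checkpoint with an LM head but no
# replaced-token-detection head.
generator_ckpt = {"encoder.layer0": 0.5, "lm_head": 1.5}
pretraining_keys = ["encoder.layer0", "detection_head"]

weights, missing, unexpected = load_partially(generator_ckpt, pretraining_keys)
print(missing, unexpected)  # ['detection_head'] ['lm_head']
```

When this happens with real checkpoints, Transformers typically logs a warning listing the newly initialized weights, which signals that the model should be trained further before its head is used.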
## Resources[[resources]]
- [Text classification task guide](../tasks/sequence_classification)
- [Token classification task guide](../tasks/token_classification)
- [Question answering task guide](../tasks/question_answering)
- [Causal language modeling task guide](../tasks/language_modeling)
- [Masked language modeling task guide](../tasks/masked_language_modeling)
- [Multiple choice task guide](../tasks/multiple_choice)
## ElectraConfig
[[autodoc]] ElectraConfig
## ElectraTokenizer
[[autodoc]] ElectraTokenizer
## ElectraTokenizerFast
[[autodoc]] ElectraTokenizerFast
## Electra specific outputs
[[autodoc]] models.electra.modeling_electra.ElectraForPreTrainingOutput
[[autodoc]] models.electra.modeling_tf_electra.TFElectraForPreTrainingOutput
<frameworkcontent>
<pt>
## ElectraModel
[[autodoc]] ElectraModel
- forward
## ElectraForPreTraining
[[autodoc]] ElectraForPreTraining
- forward
## ElectraForCausalLM
[[autodoc]] ElectraForCausalLM
- forward
## ElectraForMaskedLM
[[autodoc]] ElectraForMaskedLM
- forward
## ElectraForSequenceClassification
[[autodoc]] ElectraForSequenceClassification
- forward
## ElectraForMultipleChoice
[[autodoc]] ElectraForMultipleChoice
- forward
## ElectraForTokenClassification
[[autodoc]] ElectraForTokenClassification
- forward
## ElectraForQuestionAnswering
[[autodoc]] ElectraForQuestionAnswering
- forward
</pt>
<tf>
## TFElectraModel
[[autodoc]] TFElectraModel
- call
## TFElectraForPreTraining
[[autodoc]] TFElectraForPreTraining
- call
## TFElectraForMaskedLM
[[autodoc]] TFElectraForMaskedLM
- call
## TFElectraForSequenceClassification
[[autodoc]] TFElectraForSequenceClassification
- call
## TFElectraForMultipleChoice
[[autodoc]] TFElectraForMultipleChoice
- call
## TFElectraForTokenClassification
[[autodoc]] TFElectraForTokenClassification
- call
## TFElectraForQuestionAnswering
[[autodoc]] TFElectraForQuestionAnswering
- call
</tf>
<jax>
## FlaxElectraModel
[[autodoc]] FlaxElectraModel
- __call__
## FlaxElectraForPreTraining
[[autodoc]] FlaxElectraForPreTraining
- __call__
## FlaxElectraForCausalLM
[[autodoc]] FlaxElectraForCausalLM
- __call__
## FlaxElectraForMaskedLM
[[autodoc]] FlaxElectraForMaskedLM
- __call__
## FlaxElectraForSequenceClassification
[[autodoc]] FlaxElectraForSequenceClassification
- __call__
## FlaxElectraForMultipleChoice
[[autodoc]] FlaxElectraForMultipleChoice
- __call__
## FlaxElectraForTokenClassification
[[autodoc]] FlaxElectraForTokenClassification
- __call__
## FlaxElectraForQuestionAnswering
[[autodoc]] FlaxElectraForQuestionAnswering
- __call__
</jax>
</frameworkcontent>