<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Encoder Decoder Models

## Overview

The [`EncoderDecoderModel`] can be used to initialize a sequence-to-sequence model with any pretrained autoencoding model as the encoder and any pretrained autoregressive model as the decoder.

The effectiveness of initializing sequence-to-sequence models with pretrained checkpoints for sequence generation tasks was shown in [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, and Aliaksei Severyn.

After an [`EncoderDecoderModel`] has been trained or fine-tuned, it can be saved and loaded just like any other model. See the examples below for details.

One application of this architecture is to build a summarization model by using two pretrained [`BertModel`] instances as the encoder and the decoder, as shown in [Text Summarization with Pretrained Encoders](https://arxiv.org/abs/1908.08345) by Yang Liu and Mirella Lapata.
## Randomly initializing `EncoderDecoderModel` from model configurations

[`EncoderDecoderModel`] can be randomly initialized from an encoder config and a decoder config. The following example shows how to do this using the default [`BertModel`] configuration for the encoder and the default [`BertForCausalLM`] configuration for the decoder.
```python
>>> from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel
>>> config_encoder = BertConfig()
>>> config_decoder = BertConfig()
>>> config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
>>> model = EncoderDecoderModel(config=config)
```
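The two configs can also be customized before they are combined, for example to build a smaller randomly initialized model. A minimal sketch; the layer and hidden sizes below are arbitrary illustration values, not recommended settings:

```python
>>> from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

>>> # hypothetical, smaller-than-default sizes chosen purely for illustration
>>> config_encoder = BertConfig(hidden_size=256, num_hidden_layers=4, num_attention_heads=4, intermediate_size=1024)
>>> config_decoder = BertConfig(hidden_size=256, num_hidden_layers=4, num_attention_heads=4, intermediate_size=1024)

>>> # from_encoder_decoder_configs marks the decoder config as a decoder with cross-attention
>>> config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
>>> model = EncoderDecoderModel(config=config)
```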
## Initializing `EncoderDecoderModel` from a pretrained encoder and a pretrained decoder

[`EncoderDecoderModel`] can be initialized from a pretrained encoder checkpoint and a pretrained decoder checkpoint. Any pretrained autoencoding model, e.g. BERT, can serve as the encoder, while the decoder can be a pretrained autoregressive model, e.g. GPT2, or the pretrained decoder part of a sequence-to-sequence model, e.g. the decoder of BART. Depending on which architecture you choose as the decoder, the cross-attention layers may be randomly initialized. Initializing an [`EncoderDecoderModel`] from pretrained encoder and decoder checkpoints therefore requires the model to be fine-tuned on a downstream task, as explained in the [*Warm-starting encoder-decoder* blog post](https://huggingface.co/blog/warm-starting-encoder-decoder). For this purpose, the `EncoderDecoderModel` class provides the [`EncoderDecoderModel.from_encoder_decoder_pretrained`] method.
```python
>>> from transformers import EncoderDecoderModel, BertTokenizer
>>> tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> model = EncoderDecoderModel.from_encoder_decoder_pretrained("google-bert/bert-base-uncased", "google-bert/bert-base-uncased")
```
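As mentioned above, the resulting warm-started model can then be saved and reloaded like any other model. A minimal sketch of the round trip, assuming a hypothetical local directory `./bert2bert`:

```python
>>> # save the warm-started model, then reload it with from_pretrained
>>> model.save_pretrained("./bert2bert")
>>> model = EncoderDecoderModel.from_pretrained("./bert2bert")
```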
## Loading an existing `EncoderDecoderModel` checkpoint and performing inference

To load a fine-tuned checkpoint of the `EncoderDecoderModel` class, [`EncoderDecoderModel`] provides the `from_pretrained(...)` method, just like any other model architecture in Transformers.

To perform inference, use the [`generate`] method, which generates text autoregressively. This method supports various decoding strategies, such as greedy decoding, beam search, and multinomial sampling.
```python
>>> from transformers import AutoTokenizer, EncoderDecoderModel
>>> # load a fine-tuned seq2seq model and its corresponding tokenizer
>>> model = EncoderDecoderModel.from_pretrained("patrickvonplaten/bert2bert_cnn_daily_mail")
>>> tokenizer = AutoTokenizer.from_pretrained("patrickvonplaten/bert2bert_cnn_daily_mail")
>>> # let's perform inference on a long piece of text
>>> ARTICLE_TO_SUMMARIZE = (
... "PG&E stated it scheduled the blackouts in response to forecasts for high winds "
... "amid dry conditions. The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were "
... "scheduled to be affected by the shutoffs which were expected to last through at least midday tomorrow."
... )
>>> input_ids = tokenizer(ARTICLE_TO_SUMMARIZE, return_tensors="pt").input_ids
>>> # autoregressively generate the summary (uses greedy decoding by default)
>>> generated_ids = model.generate(input_ids)
>>> generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
>>> print(generated_text)
nearly 800 thousand customers were affected by the shutoffs. the aim is to reduce the risk of wildfires. nearly 800, 000 customers were expected to be affected by high winds amid dry conditions. pg & e said it scheduled the blackouts to last through at least midday tomorrow.
```
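Other decoding strategies are selected through keyword arguments of [`generate`]. A minimal sketch, reusing the model, tokenizer, and `input_ids` from the example above; the parameter values are arbitrary illustration choices:

```python
>>> # beam search with 4 beams
>>> beam_ids = model.generate(input_ids, num_beams=4, early_stopping=True)

>>> # multinomial sampling from the 50 most likely tokens at each step
>>> sampled_ids = model.generate(input_ids, do_sample=True, top_k=50)

>>> print(tokenizer.batch_decode(beam_ids, skip_special_tokens=True)[0])
```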
## Loading a PyTorch checkpoint into `TFEncoderDecoderModel`

[`TFEncoderDecoderModel.from_pretrained`] currently does not support initializing a model from a PyTorch checkpoint; passing `from_pt=True` to this method will raise an exception. If only a PyTorch checkpoint exists for a particular encoder-decoder model, the following workaround can be used:
```python
>>> # a workaround to load from a PyTorch checkpoint
>>> from transformers import EncoderDecoderModel, TFEncoderDecoderModel
>>> _model = EncoderDecoderModel.from_pretrained("patrickvonplaten/bert2bert-cnn_dailymail-fp16")
>>> _model.encoder.save_pretrained("./encoder")
>>> _model.decoder.save_pretrained("./decoder")
>>> model = TFEncoderDecoderModel.from_encoder_decoder_pretrained(
... "./encoder", "./decoder", encoder_from_pt=True, decoder_from_pt=True
... )
>>> # This is only for copying some specific attributes of this particular model.
>>> model.config = _model.config
```
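Once converted, the TensorFlow model can be saved so that subsequent loads no longer need the workaround. A minimal sketch, assuming a hypothetical local directory `./tf_bert2bert`:

```python
>>> # save the converted TF weights; later loads can use from_pretrained directly
>>> model.save_pretrained("./tf_bert2bert")
>>> model = TFEncoderDecoderModel.from_pretrained("./tf_bert2bert")
```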
## Training

Once the model is created, it can be fine-tuned similarly to BART, T5, or any other encoder-decoder model. As you can see, only two inputs are required for the model to compute a loss: `input_ids` (the `input_ids` of the encoded input sequence) and `labels` (the `input_ids` of the encoded target sequence).
```python
>>> from transformers import BertTokenizer, EncoderDecoderModel
>>> tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
>>> model = EncoderDecoderModel.from_encoder_decoder_pretrained("google-bert/bert-base-uncased", "google-bert/bert-base-uncased")
>>> model.config.decoder_start_token_id = tokenizer.cls_token_id
>>> model.config.pad_token_id = tokenizer.pad_token_id
>>> input_ids = tokenizer(
... "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side.During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft).Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.",
... return_tensors="pt",
... ).input_ids
>>> labels = tokenizer(
... "the eiffel tower surpassed the washington monument to become the tallest structure in the world. it was the first structure to reach a height of 300 metres in paris in 1930. it is now taller than the chrysler building by 5. 2 metres ( 17 ft ) and is the second tallest free - standing structure in paris.",
... return_tensors="pt",
... ).input_ids
>>> # the forward function automatically creates the correct decoder_input_ids
>>> loss = model(input_ids=input_ids, labels=labels).loss
```
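The returned loss can be plugged into any standard PyTorch training loop. A minimal sketch of a single optimization step, reusing `model`, `input_ids`, and `labels` from above; the learning rate is a hypothetical illustration value:

```python
>>> import torch

>>> optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)  # hypothetical learning rate

>>> loss = model(input_ids=input_ids, labels=labels).loss
>>> loss.backward()  # compute gradients
>>> optimizer.step()  # update the weights
>>> optimizer.zero_grad()  # reset gradients for the next step
```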
See this [colab notebook](https://colab.research.google.com/drive/1WIk2bxglElfZewOHboPFNj8H44_VAyKE?usp=sharing#scrollTo=ZwQIEhKOrJpl) for a detailed training example.

This model was contributed by [thomwolf](https://github.com/thomwolf). The TensorFlow and Flax versions of this model were contributed by [ydshieh](https://github.com/ydshieh).
## EncoderDecoderConfig
[[autodoc]] EncoderDecoderConfig
<frameworkcontent>
<pt>
## EncoderDecoderModel
[[autodoc]] EncoderDecoderModel
- forward
- from_encoder_decoder_pretrained
</pt>
<tf>
## TFEncoderDecoderModel
[[autodoc]] TFEncoderDecoderModel
- call
- from_encoder_decoder_pretrained
</tf>
<jax>
## FlaxEncoderDecoderModel
[[autodoc]] FlaxEncoderDecoderModel
- __call__
- from_encoder_decoder_pretrained
</jax>
</frameworkcontent>