| <! |
|
|
| Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
|
|
| http://www.apache.org/licenses/LICENSE-2.0 |
|
|
| Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on |
| an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations under the License. |
| |
|
|
| # BertGeneration |
|
|
| ## Overview |
|
|
| The BertGeneration model is a BERT model that can be leveraged for sequence-to-sequence tasks using |
| [`EncoderDecoderModel`] as proposed in [Leveraging Pre-trained Checkpoints for Sequence Generation |
| Tasks](https://arxiv.org/abs/1907.12461) by Sascha Rothe, Shashi Narayan, Aliaksei Severyn. |
|
|
| The abstract from the paper is the following: |
|
|
| *Unsupervised pretraining of large neural models has recently revolutionized Natural Language Processing. By |
| warm-starting from the publicly released checkpoints, NLP practitioners have pushed the state-of-the-art on multiple |
| benchmarks while saving significant amounts of compute time. So far the focus has been mainly on the Natural Language |
| Understanding tasks. In this paper, we demonstrate the efficacy of pre-trained checkpoints for Sequence Generation. We |
| developed a Transformer-based sequence-to-sequence model that is compatible with publicly available pre-trained BERT, |
| GPT-2 and RoBERTa checkpoints and conducted an extensive empirical study on the utility of initializing our model, both |
| encoder and decoder, with these checkpoints. Our models result in new state-of-the-art results on Machine Translation, |
| Text Summarization, Sentence Splitting, and Sentence Fusion.* |
|
|
| Usage: |
|
|
| - The model can be used in combination with the [`EncoderDecoderModel`] to leverage two pretrained |
| BERT checkpoints for subsequent fine-tuning. |
|
|
| ```python |
| >>> # leverage checkpoints for Bert2Bert model... |
| >>> # use BERT's cls token as BOS token and sep token as EOS token |
| >>> encoder = BertGenerationEncoder.from_pretrained("bert-large-uncased", bos_token_id=101, eos_token_id=102) |
| >>> # add cross attention layers and use BERT's cls token as BOS token and sep token as EOS token |
| >>> decoder = BertGenerationDecoder.from_pretrained( |
| ... "bert-large-uncased", add_cross_attention=True, is_decoder=True, bos_token_id=101, eos_token_id=102 |
| ... ) |
| >>> bert2bert = EncoderDecoderModel(encoder=encoder, decoder=decoder) |
|
|
| >>> # create tokenizer... |
| >>> tokenizer = BertTokenizer.from_pretrained("bert-large-uncased") |
|
|
| >>> input_ids = tokenizer( |
| ... "This is a long article to summarize", add_special_tokens=False, return_tensors="pt" |
| ... ).input_ids |
| >>> labels = tokenizer("This is a short summary", return_tensors="pt").input_ids |
|
|
| >>> # train... |
| >>> loss = bert2bert(input_ids=input_ids, decoder_input_ids=labels, labels=labels).loss |
| >>> loss.backward() |
| ``` |
|
|
| - Pretrained [`EncoderDecoderModel`] are also directly available in the model hub, e.g., |
|
|
|
|
| ```python |
| >>> # instantiate sentence fusion model |
| >>> sentence_fuser = EncoderDecoderModel.from_pretrained("google/roberta2roberta_L-24_discofuse") |
| >>> tokenizer = AutoTokenizer.from_pretrained("google/roberta2roberta_L-24_discofuse") |
|
|
| >>> input_ids = tokenizer( |
| ... "This is the first sentence. This is the second sentence.", add_special_tokens=False, return_tensors="pt" |
| ... ).input_ids |
|
|
| >>> outputs = sentence_fuser.generate(input_ids) |
|
|
| >>> print(tokenizer.decode(outputs[0])) |
| ``` |
|
|
| Tips: |
|
|
| - [`BertGenerationEncoder`] and [`BertGenerationDecoder`] should be used in |
| combination with [`EncoderDecoder`]. |
| - For summarization, sentence splitting, sentence fusion and translation, no special tokens are required for the input. |
| Therefore, no EOS token should be added to the end of the input. |
|
|
| This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten). The original code can be |
| found [here](https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder). |
|
|
| ## BertGenerationConfig |
|
|
| [[autodoc]] BertGenerationConfig |
|
|
| ## BertGenerationTokenizer |
|
|
| [[autodoc]] BertGenerationTokenizer |
| - save_vocabulary |
|
|
| ## BertGenerationEncoder |
|
|
| [[autodoc]] BertGenerationEncoder |
| - forward |
|
|
| ## BertGenerationDecoder |
|
|
| [[autodoc]] BertGenerationDecoder |
| - forward |
|
|