Update README.md
---
license: mit
datasets:
- cc100
language:
- en
pipeline_tag: text-generation
---
# GPT-2 Medium Multi-Exit

A pre-trained language model with identical parameters to [gpt2-medium](https://huggingface.co/gpt2-medium), but with additional language modeling heads ("exits") connected to different layers of the model.
These 12 additional heads (in layers 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24) were trained on the English portion of [CC-100](https://huggingface.co/datasets/cc100) while keeping the original pre-trained model parameters frozen.

The model can be used for the _Autocontrastive Decoding_ text generation approach described in [Gera et al. 2023](https://arxiv.org/abs/2305.01628), for _early-exiting_ approaches, or for other algorithms that consider the next-token predictions of different model layers.
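As a rough sketch of the idea (an illustration only, not the library's implementation; the exact formulation is given in the paper), autocontrastive decoding scores the next token by how much more likely the top exit finds it than a lower exit, restricted to tokens the top exit itself considers plausible. The `alpha` cutoff below is a hypothetical plausibility constraint:

```python
import torch

def autocontrast_scores(upper_logits: torch.Tensor, lower_logits: torch.Tensor,
                        alpha: float = 0.1) -> torch.Tensor:
    """Illustrative contrast between the next-token logits of an upper exit
    (e.g. layer 24) and a lower exit (e.g. layer 12)."""
    log_p_upper = torch.log_softmax(upper_logits, dim=-1)
    log_p_lower = torch.log_softmax(lower_logits, dim=-1)
    # keep only tokens whose upper-exit probability is within a factor alpha
    # of the most likely token, then reward tokens where the two exits disagree
    cutoff = log_p_upper.max(dim=-1, keepdim=True).values + torch.log(torch.tensor(alpha))
    scores = log_p_upper - log_p_lower
    return scores.masked_fill(log_p_upper < cutoff, float("-inf"))
```

Greedy decoding under this sketch would simply pick `scores.argmax(dim=-1)` as the next token.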
## Usage

Harnessing the additional language modeling heads requires loading the model using the [auto-contrastive-generation library](https://github.com/IBM/auto-contrastive-generation) (`pip install autocontrastive-gen`).
In a nutshell, the user creates a `MultiExitConfiguration` that determines model behavior at training and inference time, and then loads the model using the dedicated `AutoMultiExitModel` class. After that, the model can be used with the `transformers` API like any other model. See the [GitHub repository](https://github.com/IBM/auto-contrastive-generation) for detailed usage instructions.

For example, the code below initializes the model to use _Autocontrastive Decoding_ and then performs text generation in this setting:
```python
from transformers import AutoTokenizer
from autocontrastive_gen.modeling.configuration import MultiExitConfiguration
from autocontrastive_gen.modeling.auto_model import AutoMultiExitModel

# initialize a pre-trained multi-exit model to use auto-contrast between layer 24 and layer 12
multi_exit_config = MultiExitConfiguration(use_original_head=False,
                                           contrast_layer_indices=(24, 12))
model = AutoMultiExitModel.from_pretrained("IBM/gpt2-medium-multiexit", multi_exit_config=multi_exit_config)

# perform text generation as usual
tokenizer = AutoTokenizer.from_pretrained("IBM/gpt2-medium-multiexit")
prompt = tokenizer("humpty dumpty sat on", return_tensors='pt')
generated_ids = model.generate(**prompt, max_new_tokens=15)
print(tokenizer.batch_decode(generated_ids))
```
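The same pattern should extend to other configurations; for instance, the minimal sketch below (assuming any pair of the trained exit layers listed above can be passed to `contrast_layer_indices`) contrasts the top exit with the layer-8 head instead:

```python
from autocontrastive_gen.modeling.configuration import MultiExitConfiguration
from autocontrastive_gen.modeling.auto_model import AutoMultiExitModel

# hypothetical variant: contrast the top exit (layer 24) with the layer-8 exit
alt_config = MultiExitConfiguration(use_original_head=False,
                                    contrast_layer_indices=(24, 8))
model = AutoMultiExitModel.from_pretrained("IBM/gpt2-medium-multiexit", multi_exit_config=alt_config)
```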
## Citation

Ariel Gera, Roni Friedman, Ofir Arviv, Chulaka Gunasekara, Benjamin Sznajder, Noam Slonim and Eyal Shnarch.
[The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers](https://arxiv.org/abs/2305.01628). ACL 2023.
```bibtex
@inproceedings{gera2023autocontrastive,
  title={The Benefits of Bad Advice: Autocontrastive Decoding across Model Layers},
  author={Gera, Ariel and Friedman, Roni and Arviv, Ofir and Gunasekara, Chulaka and Sznajder, Benjamin and Slonim, Noam and Shnarch, Eyal},
  booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  month={July},
  address={Toronto, Canada},
  year={2023}
}
```