---
language: de
widget:
- text: "In einer schockierenden Entdeckung fanden Wissenschaftler eine Herde Einhörner, die in einem abgelegenen, zuvor unerforschten Tal in den Anden lebten."
---
# BPT2
See the [GPT2 model card](https://huggingface.co/gpt2) for considerations on limitations and bias. See the [GPT2 documentation](https://huggingface.co/transformers/model_doc/gpt2.html) for details on GPT2.
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("tursunali/bpt2")
model = AutoModelForCausalLM.from_pretrained("tursunali/bpt2")

prompt = "<your prompt>"

# The text-generation pipeline wraps tokenization, generation and decoding.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe(prompt)[0]["generated_text"])
```
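The pipeline forwards generation keyword arguments to `model.generate`, so sampling behaviour can be tuned directly in the call. A minimal sketch (the parameter values below are illustrative, not tuned for BPT2):

```python
# Generation kwargs are passed through the pipeline to model.generate.
# These values are illustrative defaults, not tuned for this model.
result = pipe(
    prompt,
    max_length=100,   # total length of prompt + continuation, in tokens
    do_sample=True,   # sample instead of greedy decoding
    top_k=50,         # restrict sampling to the 50 most likely tokens
    top_p=0.95,       # nucleus sampling threshold
)
print(result[0]["generated_text"])
```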
Two tricks can improve the quality of the generated text:
```python
import torch

max_length = 100  # illustrative value: maximum total length in tokens

output = model.generate(
    # During training an EOS token was used to mark the beginning of each
    # text, so it can help to insert one at the start of the prompt.
    torch.tensor(
        [tokenizer.eos_token_id] + tokenizer.encode(prompt)
    ).unsqueeze(0),
    do_sample=True,
    # Setting bad_words_ids=[[0]] disallows generating an EOS token. Without
    # this the model is prone to ending generation early, because a
    # significant number of texts in the training corpus are quite short.
    bad_words_ids=[[0]],
    max_length=max_length,
)[0]

print(tokenizer.decode(output))
```
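The `bad_words_ids=[[0]]` value hard-codes the EOS token id. A variant that derives the id from the tokenizer, and strips the inserted special token from the decoded output, might look like this (a sketch, not part of the original recipe):

```python
# Derive the EOS id from the tokenizer instead of hard-coding 0.
eos_id = tokenizer.eos_token_id

output = model.generate(
    torch.tensor([eos_id] + tokenizer.encode(prompt)).unsqueeze(0),
    do_sample=True,
    bad_words_ids=[[eos_id]],  # block EOS so generation is not cut short
    max_length=max_length,
)[0]

# skip_special_tokens removes the inserted EOS token from the decoded text.
print(tokenizer.decode(output, skip_special_tokens=True))
```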
## Citing
Please cite BPT2 as follows:
```
@misc{Backpacker_Trail_German_large_2022,
  author = {BackpackerTrail, Tursunali Kholdorov},
  title  = {{BPT2: Backpacker Trail German versions of BPT2}},
  url    = {https://github.com/Tursunali-Kholdorov/bptTrainer},
  year   = {2022}
}
```