---
language: de
widget:
  - text: >-
      In einer schockierenden Entdeckung fanden Wissenschaftler eine Herde
      Einhörner, die in einem abgelegenen, zuvor unerforschten Tal in den Anden
      lebten.
---
# BPT2
See the GPT2 model card for considerations on limitations and bias, and the GPT2 documentation for details on the architecture.
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

tokenizer = AutoTokenizer.from_pretrained("tursunali/bpt2")
model = AutoModelForCausalLM.from_pretrained("tursunali/bpt2")

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

prompt = "<your prompt>"
print(pipe(prompt)[0]["generated_text"])
```
Also, two tricks might improve the generated text:
```python
import torch

max_length = 100  # example value; adjust to taste

output = model.generate(
    # During training an EOS token was used to mark the beginning of each text,
    # so it can help to insert one at the start of the prompt.
    torch.tensor(
        [tokenizer.eos_token_id] + tokenizer.encode(prompt)
    ).unsqueeze(0),
    do_sample=True,
    # Setting bad_words_ids=[[0]] disallows generating an EOS token. Without it,
    # the model is prone to ending generation early, because a significant number
    # of texts in the training corpus are quite short.
    bad_words_ids=[[0]],
    max_length=max_length,
)[0]
print(tokenizer.decode(output))
```
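Note that `bad_words_ids=[[0]]` hardcodes token id 0 as the EOS token. A slightly more robust variant derives the id from the tokenizer itself. This is only a sketch; the helper names `build_input_ids` and `eos_bad_words` are ours, not part of this model card or the `transformers` API:

```python
import torch


def build_input_ids(tokenizer, prompt):
    # Prepend the EOS token id, mirroring the training setup in which an EOS
    # token marked the beginning of each text.
    ids = [tokenizer.eos_token_id] + tokenizer.encode(prompt)
    # model.generate expects a batch dimension, hence unsqueeze(0).
    return torch.tensor(ids).unsqueeze(0)


def eos_bad_words(tokenizer):
    # Build bad_words_ids from the tokenizer's actual EOS id instead of
    # hardcoding 0, so the trick still works if the vocabulary changes.
    return [[tokenizer.eos_token_id]]
```

These helpers can then replace the hardcoded values in the `model.generate(...)` call above.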
## Citing
Please cite BPT2 as follows:
```bibtex
@misc{Backpacker_Trail_German_large_2022,
  author = {BackpackerTrail, Tursunali Kholdorov},
  title  = {{BPT2: Backpacker Trail German versions of BPT2}},
  url    = {https://github.com/Tursunali-Kholdorov/bptTrainer},
  year   = {2022}
}
```