--- language: de widget: - text: "In einer schockierenden Entdeckung fanden Wissenschaftler eine Herde Einhörner, die in einem abgelegenen, zuvor unerforschten Tal in den Anden lebten." --- # BPT2 See the [GPT2 model card](https://huggingface.co/gpt2) for considerations on limitations and bias. See the [GPT2 documentation](https://huggingface.co/transformers/model_doc/gpt2.html) for details on GPT2. ## Usage ```python from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline tokenizer = AutoTokenizer.from_pretrained("tursunali/bpt2") model = AutoModelForCausalLM.from_pretrained("tursunali/bpt2") prompt = "" pipe = pipeline("text-generation", model=model, tokenizer=tokenizer) print(pipe(prompt)[0]["generated_text"]) ``` Also, two tricks might improve the generated text: ```python output = model.generate( # during training an EOS token was used to mark the beginning of each text # so it can help to insert it at the start torch.tensor( [tokenizer.eos_token_id] + tokenizer.encode(prompt) ).unsqueeze(0), do_sample=True, # try setting bad_words_ids=[[0]] to disallow generating an EOS token, without this the model is # prone to ending generation early because a significant number of texts from the training corpus # is quite short bad_words_ids=[[0]], max_length=max_length, )[0] print(tokenizer.decode(output)) ``` ## Citing Please cite BPT2 as follows: ``` @misc{Backpacker_Trail_German_large_2022, author = {BackpackerTrail, Tursunali Kholdorov}, title = {{BPT2: Backpacker Trail German versions of BPT2}}, url = {https://github.com/Tursunali-Kholdorov/bptTrainer}, year = {2022} } ```