How to use sdadas/polish-longformer-base-4096 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("fill-mask", model="sdadas/polish-longformer-base-4096")

# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("sdadas/polish-longformer-base-4096")
model = AutoModelForMaskedLM.from_pretrained("sdadas/polish-longformer-base-4096")
How to use this model for tokenization?
#1
by tprochenka - opened
Hi, I tried to tokenize text like this:
from transformers import LongformerTokenizer
tokenizer = LongformerTokenizer.from_pretrained("sdadas/polish-longformer-base-4096")
I got an error saying that vocab_file was not found. Indeed, the repository has no vocab.json; instead there is a tokenizer.json. Could you please share a snippet showing how to tokenize text with your model?
Thanks!
Hi, the model supports only the fast tokenizer format. Use LongformerTokenizerFast instead of LongformerTokenizer:
from transformers import LongformerTokenizerFast
tokenizer = LongformerTokenizerFast.from_pretrained("sdadas/polish-longformer-base-4096")
encoded = tokenizer("Zażółcić gęślą jaźń.")
print(encoded.input_ids)
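The error in this thread comes down to which files the repository ships: the slow `LongformerTokenizer` is built from `vocab.json` and `merges.txt`, while `LongformerTokenizerFast` can load everything from a single `tokenizer.json`. The sketch below encodes that rule in a small helper, `pick_tokenizer_class`, which is hypothetical (it is not part of transformers) and only inspects file names, assuming the library's standard tokenizer file names.

```python
# Hypothetical helper, not part of transformers: given the file names in a model
# repository, return the name of the Longformer tokenizer class that can load them.
from typing import Iterable

# Files the slow (pure-Python) LongformerTokenizer reads: BPE vocabulary + merges.
SLOW_FILES = {"vocab.json", "merges.txt"}
# Single serialized file that the fast (Rust-backed) tokenizer can load on its own.
FAST_FILE = "tokenizer.json"


def pick_tokenizer_class(repo_files: Iterable[str]) -> str:
    """Return "LongformerTokenizerFast" or "LongformerTokenizer" for these files."""
    # Compare base names only, so "subdir/tokenizer.json" also counts.
    names = {path.rsplit("/", 1)[-1] for path in repo_files}
    if FAST_FILE in names:
        # tokenizer.json alone is enough for the fast class.
        return "LongformerTokenizerFast"
    if SLOW_FILES <= names:
        # Both vocab.json and merges.txt are present, which the slow class needs.
        return "LongformerTokenizer"
    raise FileNotFoundError(
        "expected tokenizer.json, or both vocab.json and merges.txt"
    )


if __name__ == "__main__":
    # The situation described in the question: tokenizer.json but no vocab.json.
    print(pick_tokenizer_class(["config.json", "tokenizer.json"]))
```

For a real repository, the file list could be obtained with `huggingface_hub.list_repo_files("sdadas/polish-longformer-base-4096")`. In everyday use this check is rarely needed: `AutoTokenizer.from_pretrained` already prefers the fast class, because its `use_fast` argument defaults to `True`.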
Thanks for the quick answer, it works :)
tprochenka changed discussion status to closed