Where to find the token ids of the tokenizer?
Hello,
I was wondering how I can access and change the tokenizer's token ids?
Thanks!
To clarify, I am talking about the mapping between tokens (parts of words) and ids.
Hey! By default, the tokenizer is based on SentencePiece. You can't really change the existing mapping, but you can add tokens with tokenizer.add_tokens() and inspect the vocabulary with tokenizer.get_vocab().
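To make that concrete, here is a minimal sketch of inspecting the vocabulary and adding a token. The helper name inspect_and_extend and the token "<my_new_token>" are made up for illustration; get_vocab, add_tokens, convert_tokens_to_ids and resize_token_embeddings are the Transformers calls I am assuming you would use here.

```python
def inspect_and_extend(tokenizer, new_tokens):
    """Return (vocab size before, number of tokens added, ids of new_tokens).

    Hypothetical helper: works with any tokenizer object exposing
    get_vocab, add_tokens and convert_tokens_to_ids.
    """
    vocab = tokenizer.get_vocab()                 # dict: token string -> integer id
    size_before = len(vocab)
    num_added = tokenizer.add_tokens(new_tokens)  # tokens already present are not re-added
    new_ids = tokenizer.convert_tokens_to_ids(new_tokens)
    return size_before, num_added, new_ids


def demo():
    # Imported here so the helper above does not require transformers by itself.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

    # "<my_new_token>" is a placeholder, not a token the model knows about.
    size_before, num_added, new_ids = inspect_and_extend(tokenizer, ["<my_new_token>"])
    print("vocab size before:", size_before)
    print("tokens added:", num_added)
    print("ids of new tokens:", new_ids)

    # Resize the embedding matrix so it matches the extended tokenizer.
    model.resize_token_embeddings(len(tokenizer))
```

Calling demo() downloads the checkpoint and prints the values. Keep in mind that the embeddings of newly added tokens are untrained, so the model usually needs fine-tuning before those tokens are useful.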
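from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

sentence = "What time is it, Tom?"
# Encode the sentence into token ids (returned as a PyTorch tensor)
sentence_encoded = tokenizer(sentence, return_tensors='pt')
# Decode the ids back into text, dropping special tokens
sentence_decoded = tokenizer.decode(
    sentence_encoded["input_ids"][0],
    skip_special_tokens=True
)
print('ENCODED SENTENCE:')
print(sentence_encoded["input_ids"][0])
print('\nDECODED SENTENCE:')
print(sentence_decoded)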
ENCODED SENTENCE:
tensor([ 363, 97, 19, 34, 6, 3059, 58, 1])
DECODED SENTENCE:
What time is it, Tom?
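If you want to see which token each id stands for, rather than only the decoded sentence, something like the sketch below should work. The helper name ids_to_token_pairs is made up for illustration; convert_ids_to_tokens is the Transformers call I am assuming here. The trailing id 1 in the output above is the end-of-sequence token, which is why skip_special_tokens=True drops it when decoding.

```python
def ids_to_token_pairs(tokenizer, ids):
    """Pair every id with the token string it maps to (hypothetical helper)."""
    tokens = tokenizer.convert_ids_to_tokens(ids)
    return list(zip(ids, tokens))


def demo():
    # Imported here so the helper above does not require transformers by itself.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
    # Without return_tensors, input_ids is a plain Python list of ints.
    ids = tokenizer("What time is it, Tom?")["input_ids"]
    for token_id, token in ids_to_token_pairs(tokenizer, ids):
        print(token_id, repr(token))
```

Calling demo() prints one line per id, which makes the token-to-id mapping for a given sentence easy to read off.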
Hope this helps.