Bantam Tokenizer
BPE tokenizer with a vocabulary of 128,016 entries: 128,000 base tokens plus 16 special tokens, which occupy IDs 128000 to 128015.
Special tokens
| Token | ID | String |
|---|---|---|
| unk_token | 128000 | <|UNK|> |
| pad_token | 128001 | <|PAD|> |
| bos_token | 128002 | <|BOS|> |
| eos_token | 128003 | <|EOS|> |
| — | 128004 | <|SYSTEM_START|> |
| — | 128005 | <|SYSTEM_END|> |
| — | 128006 | <|USER_START|> |
| — | 128007 | <|USER_END|> |
| — | 128008 | <|AGENT_START|> |
| — | 128009 | <|AGENT_END|> |
| — | 128010 | <|THINK_START|> |
| — | 128011 | <|THINK_END|> |
| — | 128012 | <|COMPUTE_START|> |
| — | 128013 | <|COMPUTE_END|> |
| — | 128014 | <|IMAGE_START|> |
| — | 128015 | <|IMAGE_END|> |
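
Because the special tokens sit in one contiguous block right after the 128,000 base entries, the table above can be rebuilt from the token order alone. The following is a minimal sketch of that mapping; the names `BASE_VOCAB_SIZE`, `SPECIAL_TOKENS`, `SPECIAL_TOKEN_IDS`, and `is_special` are illustrative and not part of the tokenizer's files.

```python
# Sketch: rebuild the special-token ID table from the base vocabulary size.
# All names here are hypothetical helpers, not part of the tokenizer itself.
BASE_VOCAB_SIZE = 128_000  # number of base BPE entries stated in this card

# Order copied from the table; the token at position i has ID 128000 + i.
SPECIAL_TOKENS = [
    "<|UNK|>", "<|PAD|>", "<|BOS|>", "<|EOS|>",
    "<|SYSTEM_START|>", "<|SYSTEM_END|>",
    "<|USER_START|>", "<|USER_END|>",
    "<|AGENT_START|>", "<|AGENT_END|>",
    "<|THINK_START|>", "<|THINK_END|>",
    "<|COMPUTE_START|>", "<|COMPUTE_END|>",
    "<|IMAGE_START|>", "<|IMAGE_END|>",
]

# Token string -> ID, and the reverse lookup.
SPECIAL_TOKEN_IDS = {tok: BASE_VOCAB_SIZE + i for i, tok in enumerate(SPECIAL_TOKENS)}
ID_TO_SPECIAL_TOKEN = {i: tok for tok, i in SPECIAL_TOKEN_IDS.items()}


def is_special(token_id: int) -> bool:
    """Return True if token_id belongs to the special-token block."""
    return token_id in ID_TO_SPECIAL_TOKEN
```

A lookup such as `SPECIAL_TOKEN_IDS["<|THINK_START|>"]` then gives 128010, matching the table.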
Config quick-reference
pad_token_id: 128001
bos_token_id: 128002
eos_token_id: 128003
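
To illustrate how these three IDs are typically used, here is a rough sketch that frames each sequence with BOS and EOS and right-pads a batch by hand. It is an illustration only: the function name `frame_and_pad` is hypothetical, and both the right-padding side and the explicit BOS/EOS framing are assumptions, since this card does not say whether the tokenizer adds them on its own.

```python
from typing import List, Tuple

# IDs copied from the quick-reference above.
PAD_TOKEN_ID = 128001
BOS_TOKEN_ID = 128002
EOS_TOKEN_ID = 128003


def frame_and_pad(batch: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Wrap each sequence in BOS/EOS, then right-pad to the longest one.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens
    and 0 for padding. Right padding is an assumption of this sketch.
    """
    framed = [[BOS_TOKEN_ID, *ids, EOS_TOKEN_ID] for ids in batch]
    width = max((len(ids) for ids in framed), default=0)
    input_ids = [ids + [PAD_TOKEN_ID] * (width - len(ids)) for ids in framed]
    attention_mask = [[1] * len(ids) + [0] * (width - len(ids)) for ids in framed]
    return input_ids, attention_mask
```

In practice the tokenizer's own padding options would normally do this; the sketch only makes the role of each ID explicit.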
Usage
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("path/to/bantam-tokenizer")
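
After loading, it can be worth confirming that the tokenizer agrees with the values listed in this card. The helper below is a sketch of such a check; `check_tokenizer` and the two `EXPECTED_*` constants are hypothetical names, and it relies only on `len(tok)`, the `unk_token`/`pad_token`/`bos_token`/`eos_token` attributes, and `convert_tokens_to_ids`, which Hugging Face tokenizers provide.

```python
# Expected values copied from this card; the names are illustrative.
EXPECTED_VOCAB_SIZE = 128_016
EXPECTED_NAMED_TOKENS = {
    "unk_token": ("<|UNK|>", 128000),
    "pad_token": ("<|PAD|>", 128001),
    "bos_token": ("<|BOS|>", 128002),
    "eos_token": ("<|EOS|>", 128003),
}


def check_tokenizer(tok) -> list:
    """Return a list of mismatches between `tok` and the card (empty if none)."""
    problems = []
    if len(tok) != EXPECTED_VOCAB_SIZE:
        problems.append(f"vocab size is {len(tok)}, expected {EXPECTED_VOCAB_SIZE}")
    for attr, (string, token_id) in EXPECTED_NAMED_TOKENS.items():
        actual_string = getattr(tok, attr, None)
        if actual_string != string:
            problems.append(f"{attr} is {actual_string!r}, expected {string!r}")
        actual_id = tok.convert_tokens_to_ids(string)
        if actual_id != token_id:
            problems.append(f"{string} maps to {actual_id}, expected {token_id}")
    return problems
```

With the `tok` object from the usage snippet, `check_tokenizer(tok)` should return an empty list if the loaded files match this card.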