metadata
datasets:
- flexitok/mod-arithmetic
Super Vocabulary
A merged super-vocabulary built from 9 tokenizers.
Vocab size: 100007
Tokenizers
- flexitok/mod-tokenizers-individual
- flexitok/mod-tokenizers-ltr_3digit
- flexitok/mod-tokenizers-ltr_2digit
- flexitok/mod-tokenizers-ltr_4digit
- flexitok/mod-tokenizers-ltr_5digit
- flexitok/mod-tokenizers-rtl_2digit
- flexitok/mod-tokenizers-rtl_3digit
- flexitok/mod-tokenizers-rtl_4digit
- flexitok/mod-tokenizers-rtl_5digit
Files
- super_vocab.json: merged vocabulary mapping token string → super index
- config.yaml: model config with vocab_size
- participating_tokenizers.json: list of tokenizer names included
- <tokenizer>_super_mapping.json: per-tokenizer index → super index mapping
- <tokenizer>_vocab.json: per-tokenizer vocabulary
- <tokenizer>_info.json/.yaml: tokenizer metadata
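
The sketch below shows one way the mapping files might be used to translate a tokenizer's local token ids into super-vocabulary ids. It is an illustration under assumptions, not the repository's own loader: the helper names (`load_json`, `to_super_ids`), the `toy` tokenizer name, and the toy file contents are hypothetical, and it assumes that the keys of `<tokenizer>_super_mapping.json` are local indices stored as strings (JSON object keys are always strings). Check the actual files before relying on that layout.

```python
import json
import tempfile
from pathlib import Path


def load_json(path):
    """Read a JSON file into a Python object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def to_super_ids(local_ids, super_mapping):
    """Translate per-tokenizer ids to super-vocabulary ids.

    Assumption: mapping keys are local indices serialized as strings.
    """
    return [super_mapping[str(i)] for i in local_ids]


# Toy stand-ins for the real files (hypothetical contents, for illustration only).
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "super_vocab.json").write_text(
    json.dumps({"0": 0, "1": 1, "12": 2, "21": 3}), encoding="utf-8"
)
(demo_dir / "toy_super_mapping.json").write_text(
    json.dumps({"0": 0, "1": 1, "2": 3}), encoding="utf-8"
)

super_vocab = load_json(demo_dir / "super_vocab.json")          # token string -> super index
toy_mapping = load_json(demo_dir / "toy_super_mapping.json")    # local index -> super index

# Remap a sequence of local ids, then recover the token strings via the
# inverted super vocabulary.
super_ids = to_super_ids([2, 0, 1], toy_mapping)
id_to_token = {idx: tok for tok, idx in super_vocab.items()}
tokens = [id_to_token[i] for i in super_ids]

print(super_ids)
print(tokens)
```

With the real files, the same two lookups would apply after replacing the toy paths with `super_vocab.json` and the `<tokenizer>_super_mapping.json` of the tokenizer in question.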