---
datasets:
- flexitok/mod-arithmetic
---
# Super Vocabulary

A merged super-vocabulary built from the 9 tokenizers listed below.

**Vocab size:** 100007
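
This card does not say how the merge is performed. The sketch below assumes a union-style merge: token strings shared between tokenizers collapse to a single super index, and each new string gets the next free index in the order the tokenizers are visited. `build_super_vocab` and the toy tokenizers `tok_a` / `tok_b` are hypothetical and used only for illustration.

```python
from typing import Dict, Tuple


def build_super_vocab(
    vocabs: Dict[str, Dict[str, int]],
) -> Tuple[Dict[str, int], Dict[str, Dict[int, int]]]:
    """Merge per-tokenizer vocabularies (token string -> local index) into one
    super-vocabulary (token string -> super index), plus a per-tokenizer
    mapping (local index -> super index).

    Assumption (not confirmed by the card): a union merge where identical
    token strings share one super index, assigned in first-seen order.
    """
    super_vocab: Dict[str, int] = {}
    mappings: Dict[str, Dict[int, int]] = {}
    for name, vocab in vocabs.items():
        mapping: Dict[int, int] = {}
        # Visit tokens in local-index order so the assignment is deterministic.
        for token, local_id in sorted(vocab.items(), key=lambda kv: kv[1]):
            if token not in super_vocab:
                super_vocab[token] = len(super_vocab)
            mapping[local_id] = super_vocab[token]
        mappings[name] = mapping
    return super_vocab, mappings


# Toy example: two hypothetical tokenizers that share the "+" token.
toy = {
    "tok_a": {"12": 0, "34": 1, "+": 2},
    "tok_b": {"1": 0, "23": 1, "+": 2},
}
super_vocab, mappings = build_super_vocab(toy)
print(len(super_vocab))   # 5
print(mappings["tok_b"])  # {0: 3, 1: 4, 2: 2}
```

Under this assumption, shared tokens occupy a single slot, so the super-vocabulary is never larger than the sum of the individual vocabularies.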

## Tokenizers

- `flexitok/mod-tokenizers-individual`
- `flexitok/mod-tokenizers-ltr_3digit`
- `flexitok/mod-tokenizers-ltr_2digit`
- `flexitok/mod-tokenizers-ltr_4digit`
- `flexitok/mod-tokenizers-ltr_5digit`
- `flexitok/mod-tokenizers-rtl_2digit`
- `flexitok/mod-tokenizers-rtl_3digit`
- `flexitok/mod-tokenizers-rtl_4digit`
- `flexitok/mod-tokenizers-rtl_5digit`

## Files

- `super_vocab.json`: merged vocabulary mapping each token string to its super index
- `config.yaml`: model config with `vocab_size`
- `participating_tokenizers.json`: list of the tokenizer names included
- `<tokenizer>_super_mapping.json`: per-tokenizer mapping from tokenizer index to super index
- `<tokenizer>_vocab.json`: per-tokenizer vocabulary
- `<tokenizer>_info.json` / `<tokenizer>_info.yaml`: tokenizer metadata
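
A minimal sketch of how a `<tokenizer>_super_mapping.json` file might be used to translate one tokenizer's ids into super indices. It assumes the file is a flat JSON object of the form `{"<tokenizer index>": <super index>}`; the exact on-disk layout is not documented here, and `load_super_mapping`, `to_super_ids`, and the file name `example_super_mapping.json` are hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def load_super_mapping(path: Path) -> Dict[int, int]:
    """Read a <tokenizer>_super_mapping.json file.

    Assumption: a flat JSON object {"<tokenizer index>": <super index>}.
    JSON object keys are always strings, so they are converted back to int.
    """
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    return {int(k): int(v) for k, v in raw.items()}


def to_super_ids(token_ids: List[int], mapping: Dict[int, int]) -> List[int]:
    """Translate one tokenizer's ids into super-vocabulary indices."""
    # A KeyError here means the id is missing from this tokenizer's mapping.
    return [mapping[i] for i in token_ids]


# Usage with a throwaway file standing in for a real mapping file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example_super_mapping.json"  # hypothetical name
    path.write_text(json.dumps({"0": 3, "1": 4, "2": 2}), encoding="utf-8")
    mapping = load_super_mapping(path)
    print(to_super_ids([1, 2, 0], mapping))  # [4, 2, 3]
```

Converting keys to `int` at load time keeps the lookup in `to_super_ids` a plain dictionary access on the ids a tokenizer actually produces.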