Super Vocabulary

A merged super-vocabulary built from 9 tokenizers.

Vocab size: 110931

Tokenizers

  • flexitok/mod-tokenizers-zero-padded-individual
  • flexitok/mod-tokenizers-zero-padded-ltr_3digit
  • flexitok/mod-tokenizers-zero-padded-ltr_2digit
  • flexitok/mod-tokenizers-zero-padded-ltr_4digit
  • flexitok/mod-tokenizers-zero-padded-ltr_5digit
  • flexitok/mod-tokenizers-zero-padded-rtl_2digit
  • flexitok/mod-tokenizers-zero-padded-rtl_3digit
  • flexitok/mod-tokenizers-zero-padded-rtl_4digit
  • flexitok/mod-tokenizers-zero-padded-rtl_5digit

Files

  • super_vocab.json — merged vocabulary mapping token string → super index
  • config.yaml — model config with vocab_size
  • participating_tokenizers.json — list of tokenizer names included
  • <tokenizer>_super_mapping.json — per-tokenizer index → super index mapping
  • <tokenizer>_vocab.json — per-tokenizer vocabulary
  • <tokenizer>_info.json / .yaml — tokenizer metadata
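
Example

The files above can be combined to translate token ids from one participating tokenizer into super-vocabulary ids. The following is a minimal sketch, not the authors' loading code. It assumes the files have been downloaded into a local directory, that super_vocab.json is a JSON object of token string → super index, and that each <tokenizer>_super_mapping.json is either a JSON object keyed by the per-tokenizer index or a JSON list positioned by it. The helper names (load_super_mapping, to_super_ids, invert_vocab) and the tokenizer_name file prefix are hypothetical; check the actual file names in the repository.

```python
import json
from pathlib import Path


def load_json(path):
    """Read a UTF-8 JSON file and return the parsed object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def load_super_mapping(repo_dir, tokenizer_name):
    """Load <tokenizer_name>_super_mapping.json as {local index: super index}.

    Assumption: the file is either a JSON object (keys are local indices)
    or a JSON list (position is the local index).
    """
    raw = load_json(Path(repo_dir) / f"{tokenizer_name}_super_mapping.json")
    if isinstance(raw, list):
        return {local: int(sup) for local, sup in enumerate(raw)}
    # JSON object keys are always strings, so convert them back to ints.
    return {int(local): int(sup) for local, sup in raw.items()}


def to_super_ids(local_ids, mapping):
    """Translate per-tokenizer ids into super-vocabulary ids."""
    return [mapping[i] for i in local_ids]


def invert_vocab(super_vocab):
    """Turn {token string: super index} into {super index: token string}."""
    return {idx: tok for tok, idx in super_vocab.items()}
```

With the files in place, load_super_mapping(repo_dir, name) gives the mapping for one tokenizer, to_super_ids converts an encoded sequence, and invert_vocab(load_json(Path(repo_dir) / "super_vocab.json")) recovers the token strings for inspection. A lookup of an id missing from the mapping raises KeyError, which makes a mismatched tokenizer easy to spot.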