Model Card for TokAlign-Pythia-1b-Distill-LLaMA-3-8b

This model is initialized from TokAlign-Pythia-1b-LLaMA3-Tokenizer and then trained with token-level knowledge distillation from LLaMA-3.1-8B as the teacher.
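
A minimal loading sketch with Hugging Face transformers is below; the repository ID is a placeholder, not a confirmed Hub path, so substitute the actual repository name for this checkpoint:

```python
# Minimal usage sketch. The model ID below is a placeholder, not a
# confirmed Hub path; replace it with the actual repository name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/TokAlign-Pythia-1b-Distill-LLaMA-3-8b"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

prompt = "TokAlign adapts a language model's vocabulary by"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```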

Code

The code used to train this model is available in the accompanying GitHub repository.
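
For orientation, token-level distillation trains the student to match the teacher's next-token distribution at every position. The sketch below illustrates that objective with a plain KL divergence; it is an illustration under stated assumptions, not the repository's actual implementation, and the `temperature` parameter is hypothetical:

```python
# Illustrative token-level distillation loss (a sketch, not the repo's code).
# Assumes student and teacher share the same (LLaMA-3) tokenizer, so their
# logits align position-by-position over the same vocabulary.
import torch
import torch.nn.functional as F

def token_level_kd_loss(student_logits: torch.Tensor,
                        teacher_logits: torch.Tensor,
                        temperature: float = 1.0) -> torch.Tensor:
    """Mean per-token KL(teacher || student) over a batch of sequences.

    Both logits tensors have shape (batch, seq_len, vocab_size).
    """
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Sum the KL terms over the vocabulary at each position, then average
    # over all positions in the batch.
    kl = F.kl_div(log_p_student, p_teacher, reduction="none").sum(-1).mean()
    return kl * temperature ** 2  # usual temperature rescaling
```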

Citation

@inproceedings{li-etal-2025-TokAlign,
  author    = {Chong Li and
               Jiajun Zhang and
               Chengqing Zong},
  title     = {TokAlign: Efficient Vocabulary Adaptation via Token Alignment},
  booktitle = {Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
  year      = {2025},
  address   = {Vienna, Austria},
  publisher = {Association for Computational Linguistics},
}

Model details

Model size: 1B params
Tensor type: BF16
Format: Safetensors