chongli17 committed on
Commit 2742f38 · verified · 1 Parent(s): a769010

Update README.md

Files changed (1)
  1. README.md +24 -3
README.md CHANGED
@@ -1,3 +1,24 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+ # Model Card for TokAlign-Pythia-1b-LLaMA3-Tokenizer
+
+ The model is initialized from [Pythia-1b](https://huggingface.co/EleutherAI/pythia-1b), its tokenizer is replaced with the [LLaMA3 tokenizer](https://huggingface.co/meta-llama/Llama-3.1-8B), and it is fine-tuned for 5k steps for vocabulary adaptation.
+
+ # Code
+
+ The code used to train this model is available in the [GitHub repository](https://github.com/ZNLP/TokAlign).
+
+ # Citation
+ ```
+ @inproceedings{li-etal-2025-TokAlign,
+   author = {Chong Li and
+             Jiajun Zhang and
+             Chengqing Zong},
+   title = "TokAlign: Efficient Vocabulary Adaptation via Token Alignment",
+   booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
+   year = "2025",
+   address = "Vienna, Austria",
+   publisher = "Association for Computational Linguistics",
+ }
+ ```
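
The model card says the model starts from Pythia-1b and has its tokenizer replaced with the LLaMA3 tokenizer before fine-tuning. As a rough illustration only, and not the TokAlign implementation (the linked repository is the reference for that), the sketch below shows one generic way to build an embedding table for a new vocabulary: copy rows from the old table according to a token-ID mapping, and randomly initialize the rest. The function name `remap_embeddings`, the `alignment` dictionary, and the initialization scale `0.02` are assumptions made for this example.

```python
import random


def remap_embeddings(source_embeddings, alignment, target_vocab_size, seed=0):
    """Build a target-vocabulary embedding table from a source table.

    source_embeddings: list of rows, one per source token ID.
    alignment: dict mapping target token ID -> source token ID
               (hypothetical input; how it is obtained is out of scope here).
    target_vocab_size: number of rows in the new table.
    """
    rng = random.Random(seed)
    dim = len(source_embeddings[0])
    target = []
    for tgt_id in range(target_vocab_size):
        src_id = alignment.get(tgt_id)
        if src_id is not None:
            # Aligned token: reuse (copy) the source token's embedding row.
            target.append(list(source_embeddings[src_id]))
        else:
            # Unaligned token: small random vector (assumed scale 0.02).
            target.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return target


# Toy usage: 3 source tokens, 3 target tokens, 2 of them aligned.
source = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
new_table = remap_embeddings(source, {0: 2, 1: 0}, target_vocab_size=3)
```

In this toy run, target token 0 inherits the row of source token 2 and target token 1 inherits the row of source token 0. Target token 2 has no aligned source token, so it receives a random vector that later fine-tuning would have to learn.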