---
library_name: transformers
pipeline_tag: text-generation
license: mit
tags:
  - long-sequence-generation
  - llm-acceleration
---

# TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation

TokenSwift is a framework designed to substantially accelerate the generation of ultra-long sequences (up to 100K tokens) while preserving the target model's inherent output quality. It achieves over 3x speedup across models of varying scales and architectures, translating into significant time savings for ultra-long sequence generation. The method is described in [From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens](https://arxiv.org/abs/2502.18890).
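As a rough intuition for what "lossless acceleration" means here, below is a toy draft-and-verify sketch. This is not the TokenSwift implementation: the "target model" is a deterministic toy function and the drafter simply replays previously seen bigrams; all names and logic are illustrative only. The key property it demonstrates is that the accelerated output is token-for-token identical to plain greedy decoding.

```python
# Toy illustration of lossless draft-and-verify decoding. NOT the TokenSwift
# implementation: the "target model" is a deterministic toy function and the
# drafter replays previously seen bigrams. The point is that the accelerated
# output matches plain greedy decoding exactly.

def target_next(ctx):
    """Toy deterministic 'model': the next token cycles through a 5-token vocab."""
    return (ctx[-1] + 1) % 5

def draft(seq, table, gamma):
    """Propose up to gamma tokens by replaying stored bigram continuations."""
    out, cur = [], seq[-1]
    for _ in range(gamma):
        if cur not in table:
            break
        cur = table[cur]
        out.append(cur)
    return out

def greedy(prompt, gen_len):
    """Baseline: one target-model call per generated token."""
    seq = list(prompt)
    for _ in range(gen_len):
        seq.append(target_next(seq))
    return seq

def generate(prompt, gen_len, gamma=4):
    """Draft-and-verify: accept drafted tokens only while they match the target."""
    seq = list(prompt)
    while len(seq) < len(prompt) + gen_len:
        # Bigram table over the sequence so far (latest successor wins).
        table = {a: b for a, b in zip(seq, seq[1:])}
        accepted = []
        for tok in draft(seq, table, gamma):
            if tok == target_next(seq + accepted):
                accepted.append(tok)       # draft agrees with the target
            else:
                break                      # first mismatch: stop accepting
        # Commit one token from the target itself (the corrected token on a
        # mismatch, or a bonus token when every draft was accepted).
        accepted.append(target_next(seq + accepted))
        seq.extend(accepted)
    return seq[:len(prompt) + gen_len]     # trim any overshoot

print(generate([0, 1, 2], 20) == greedy([0, 1, 2], 20))  # True: lossless
```

Because the verifier only commits tokens the target model would have produced itself, speed comes purely from accepting multiple drafted tokens per verification step, never from changing the output.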

## Getting Started

### Models Download
| Model Name | Download Link |
|------------|-------------|
| TokenSwift-Yarn-Llama-2-7b-128k | [HuggingFace](https://huggingface.co/TokenSwift/TokenSwift-Yarn-Llama-2-7b-128k) |
| TokenSwift-Llama-3.1-8B | [HuggingFace](https://huggingface.co/TokenSwift/TokenSwift-Llama-3.1-8B) |
| TokenSwift-Qwen2.5-1.5B | [HuggingFace](https://huggingface.co/TokenSwift/TokenSwift-Qwen2.5-1.5B) |
| TokenSwift-Qwen2.5-7B | [HuggingFace](https://huggingface.co/TokenSwift/TokenSwift-Qwen2.5-7B) |
| TokenSwift-Qwen2.5-14B | [HuggingFace](https://huggingface.co/TokenSwift/TokenSwift-Qwen2.5-14B) |
| TokenSwift-DeepSeek-R1-Distill-Qwen-32B | [HuggingFace](https://huggingface.co/TokenSwift/TokenSwift-DeepSeek-R1-Distill-Qwen-32B) |


### Inference (LLaMA 3.1-8B Example)
```bash
torchrun --master-port 1111 --nproc_per_node=1 main.py \
    --model_type llama3_1 \
    --ckpt_path your_checkpoint_path \
    --prefill_len 4096 \
    --retrival_max_budget 4096 \
    --gen_len 102400 \
    --gamma 4 \
    --min_p 0.1 \
    --temperature 1.0 \
    --tree_decoding \
    --ngram_topk 20 \
    --penalty 1.2 \
    --penalty_length 1024 \
    --prompt_id 0

# NOTE: replace your_checkpoint_path (and any data paths) with your local paths
```

Code and further instructions are available at: https://github.com/bigai-nlco/TokenSwift


## Citation
```bibtex
@misc{tokenswift,
      title={From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens}, 
      author={Tong Wu and Junzhe Shen and Zixia Jia and Yuxuan Wang and Zilong Zheng},
      year={2025},
      eprint={2502.18890},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.18890}, 
}
```