---
library_name: transformers
pipeline_tag: text-generation
license: mit
tags:
- long-sequence-generation
- llm-acceleration
---
# TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation
TokenSwift is a framework that substantially accelerates the generation of ultra-long sequences (up to 100K tokens) while preserving the target model's output quality losslessly. It achieves more than a 3x speedup across models of varying scales and architectures, turning hours of ultra-long sequence generation into minutes. The method is described in [From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens](https://arxiv.org/abs/2502.18890).
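The lossless guarantee comes from a draft-then-verify scheme: candidate tokens are drafted cheaply, and the target model keeps only the prefix it agrees with, so the final output matches what the target model alone would have produced. A minimal greedy-decoding sketch of that idea (illustrative only; TokenSwift's actual implementation uses n-gram drafting and tree-based verification, see the paper and repository):

```python
def accept_prefix(drafted, target):
    """Greedy draft verification.

    `drafted` is a list of cheaply proposed token ids; `target` is the list of
    token ids the target model would emit greedily at the same positions (one
    extra token, scored in the same forward pass). We keep the longest prefix
    of the draft that the target agrees with, then append the target's own
    next token, so the result is identical to target-only decoding.
    """
    n = 0
    while n < len(drafted) and drafted[n] == target[n]:
        n += 1
    return target[: n + 1]
```

When every drafted token is accepted, one verification pass yields `len(drafted) + 1` tokens instead of one, which is the source of the speedup.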
## Getting Started
### Models Download
| Model Name | Download Link |
|------------|-------------|
| TokenSwift-Yarn-Llama-2-7b-128k | [HuggingFace](https://huggingface.co/TokenSwift/TokenSwift-Yarn-Llama-2-7b-128k) |
| TokenSwift-Llama-3.1-8B | [HuggingFace](https://huggingface.co/TokenSwift/TokenSwift-Llama-3.1-8B) |
| TokenSwift-Qwen2.5-1.5B | [HuggingFace](https://huggingface.co/TokenSwift/TokenSwift-Qwen2.5-1.5B) |
| TokenSwift-Qwen2.5-7B | [HuggingFace](https://huggingface.co/TokenSwift/TokenSwift-Qwen2.5-7B) |
| TokenSwift-Qwen2.5-14B | [HuggingFace](https://huggingface.co/TokenSwift/TokenSwift-Qwen2.5-14B) |
| TokenSwift-DeepSeek-R1-Distill-Qwen-32B | [HuggingFace](https://huggingface.co/TokenSwift/TokenSwift-DeepSeek-R1-Distill-Qwen-32B) |
### Inference (LLaMA 3.1-8B Example)
```bash
# NOTE: modify the model checkpoint and data paths for your setup
torchrun --master-port 1111 --nproc_per_node=1 main.py \
    --model_type llama3_1 \
    --ckpt_path your_checkpoint_path \
    --prefill_len 4096 \
    --retrival_max_budget 4096 \
    --gen_len 102400 \
    --gamma 4 \
    --min_p 0.1 \
    --temperature 1.0 \
    --tree_decoding \
    --ngram_topk 20 \
    --penalty 1.2 \
    --penalty_length 1024 \
    --prompt_id 0
```
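The sampling flags follow their usual definitions: `--min_p 0.1` keeps only tokens whose probability is at least 0.1 times the top token's probability, and `--penalty 1.2` down-weights tokens that appeared in the recent context (the last `--penalty_length` tokens) to discourage repetition. A standalone sketch of these standard operations (for intuition only; TokenSwift's own implementation may differ in detail):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def apply_repetition_penalty(logits, recent_ids, penalty=1.2):
    """CTRL-style repetition penalty: shrink the logit of every token id seen
    in the recent window (divide positive logits, multiply negative ones)."""
    out = list(logits)
    for i in set(recent_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out

def min_p_filter(logits, min_p=0.1, temperature=1.0):
    """Min-p filtering: keep token ids whose probability is at least
    min_p times the probability of the most likely token."""
    probs = softmax([x / temperature for x in logits])
    threshold = min_p * max(probs)
    return [i for i, p in enumerate(probs) if p >= threshold]
```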
Code and further instructions are available at: https://github.com/bigai-nlco/TokenSwift
## Citation
```bibtex
@misc{tokenswift,
  title={From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens},
  author={Tong Wu and Junzhe Shen and Zixia Jia and Yuxuan Wang and Zilong Zheng},
  year={2025},
  eprint={2502.18890},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.18890},
}
```