LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation
Paper: [arXiv:2410.13846](https://arxiv.org/abs/2410.13846)
QwQ-LightTransfer is a 32B-parameter model built on Qwen/Qwen2.5-32B-Instruct and fine-tuned via SFT on RUC-AIBOX/long_form_thought_data_5k.
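For reference, the SFT corpus can be inspected directly from the Hugging Face Hub. A minimal sketch using the `datasets` library (the `train` split name is an assumption about the dataset layout):

```python
# Minimal sketch: inspect the SFT data used for fine-tuning.
# Assumes the `datasets` library is installed; the "train" split name
# is an assumption, not confirmed by the model card.
from datasets import load_dataset

ds = load_dataset("RUC-AIBOX/long_form_thought_data_5k", split="train")
print(ds)     # number of rows and column names
print(ds[0])  # one long-form reasoning example
```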
We evaluated QwQ-LightTransfer on several long-generation reasoning benchmarks; selected results are shown in the table below.
| Method | Math-OAI | AIME24 | AIME25 | GSM8K |
|---|---|---|---|---|
| o1-preview | 85.5 | 44.6 | - | - |
| QwQ-STILL | 90.2 | 46.7 | 33.3 | 95.6 |
| LongGen | 78.2 | 16.7 | - | 95.4 |
| LightTransfer | 90.7 | 53.3 | 40.0 | 95.5 |
To load the QwQ-LightTransfer model with Transformers, use the following code:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Local path or Hub id of the released checkpoint.
model_name = 'QwQ-32B-LightTransfer'

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map='auto',
)

text = "Hi, I'm QwQ-32B-LightTransfer."
inputs = tokenizer(text, return_tensors='pt').to(model.device)

# `generate` expects `max_new_tokens`; `max_gen_len` is not a valid argument.
with torch.no_grad():
    outputs = model.generate(inputs['input_ids'], max_new_tokens=32000)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
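Since the base model is an instruct-tuned Qwen2.5 checkpoint, chat-style prompts are normally built with the tokenizer's chat template. A minimal sketch, reusing the `tokenizer` and `model` loaded above (the message content is illustrative, not from the paper):

```python
# Minimal sketch of chat-template usage; the prompt below is illustrative.
messages = [{"role": "user", "content": "What is 17 * 24?"}]
prompt_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(prompt_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][prompt_ids.shape[-1]:], skip_special_tokens=True))
```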
To cite this work:

```bibtex
@misc{zhang2025lighttransferlongcontextllmsecretly,
  title={LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation},
  author={Xuan Zhang and Fengzhuo Zhang and Cunxiao Du and Chao Du and Tianyu Pang and Wei Gao and Min Lin},
  year={2025},
  eprint={2410.13846},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2410.13846},
}
```