---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
base_model:
- Qwen/Qwen3-1.7B
tags:
- linear-attention
- hybrid
- rnn
- distillation
---

Links:

- GitHub repo: <https://github.com/thunlp/hybrid-linear-attention>
- Paper: <https://arxiv.org/abs/2601.22156>

This is the final HypeNet-2B checkpoint from the paper [Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts](https://arxiv.org/pdf/2601.22156). It was distilled from Qwen3-1.7B using our HALO pipeline. For more information, please refer to our GitHub repo.
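
Below is a minimal sketch of loading the checkpoint with Hugging Face Transformers for generation. The Hub repo id `thunlp/HypeNet-2B` is a placeholder assumption, and `trust_remote_code=True` is assumed because hybrid linear-attention architectures often ship custom modeling code; see the GitHub repo for the exact loading instructions.

```python
# Minimal generation sketch with Hugging Face transformers.
# NOTE: the repo id below is a hypothetical placeholder; replace it with
# this model's actual Hub path. trust_remote_code=True is an assumption,
# as hybrid linear-attention models typically require custom modeling code.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "thunlp/HypeNet-2B"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load weights in their stored precision
    device_map="auto",    # place layers on available devices (needs accelerate)
    trust_remote_code=True,
)

inputs = tokenizer("Linear attention is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```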