---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb-edu
language:
- en
base_model:
- Qwen/Qwen3-1.7B
tags:
- linear-attention
- hybrid
- rnn
- distillation
---

Links:

- GitHub repo: <https://github.com/thunlp/hybrid-linear-attention>
- Paper: <https://arxiv.org/abs/2601.22156>

This is the final HypeNet-2B checkpoint from the paper [Hybrid Linear Attention Done Right: Efficient Distillation and Effective Architectures for Extremely Long Contexts](https://arxiv.org/pdf/2601.22156). It was distilled from Qwen3-1.7B using our HALO pipeline. For more information, please refer to our GitHub repo.
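
Below is a minimal sketch of loading the checkpoint with Hugging Face Transformers for generation. The Hub repo id `thunlp/HypeNet-2B` is a placeholder assumption, and `trust_remote_code=True` is assumed because hybrid linear-attention architectures often ship custom modeling code; see the GitHub repo for the exact loading instructions.

```python
# Minimal generation sketch with Hugging Face transformers.
# NOTE: the repo id below is a hypothetical placeholder; replace it with
# this model's actual Hub path. trust_remote_code=True is an assumption,
# as hybrid linear-attention models typically require custom modeling code.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "thunlp/HypeNet-2B"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load weights in their stored precision
    device_map="auto",    # place layers on available devices (needs accelerate)
    trust_remote_code=True,
)

inputs = tokenizer("Linear attention is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```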