| --- |
| language: |
| - en |
| - zh |
| license: apache-2.0 |
| pipeline_tag: text-generation |
| library_name: transformers |
| --- |
| |
| # DECO-0.2B |
|
|
| This is the 0.2B DECO checkpoint introduced in the paper [DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices](https://huggingface.co/papers/2605.10933). |
|
|
| DECO is a sparse Mixture-of-Experts (MoE) architecture designed to match the performance of dense Transformers given the same total parameter budget and the same number of training tokens. It improves on the [BlockFFN](https://arxiv.org/pdf/2507.08771) architecture. |
|
|
| - **Authors:** Chenyang Song, Weilin Zhao, Xu Han, Chaojun Xiao, Yingfa Chen, Zhiyuan Liu |
| - **Paper:** [arXiv:2605.10933](https://huggingface.co/papers/2605.10933) |
| - **Code:** [https://github.com/thunlp/DECO](https://github.com/thunlp/DECO) |
|
|
| ### Quick start |
|
|
| Load the model with `AutoTokenizer` and `AutoModelForCausalLM` from `transformers`. Because the model uses a custom architecture, `trust_remote_code=True` is required. The example below assumes a CUDA-capable GPU. |
|
|
| ```python |
| import torch |
| from transformers import AutoTokenizer, AutoModelForCausalLM |
| |
| model_id = "SparseLLM/DECO-0.2B" |
| tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True) |
| model = AutoModelForCausalLM.from_pretrained( |
|     model_id, |
|     torch_dtype=torch.bfloat16, |
|     trust_remote_code=True, |
| ).to("cuda").eval() |
| |
| prompt = "Mixture-of-Experts models are useful because" |
| inputs = tokenizer(prompt, return_tensors="pt").to("cuda") |
| |
| # Greedy decoding; gradients are not needed at inference time. |
| with torch.no_grad(): |
|     output = model.generate(**inputs, max_new_tokens=64, do_sample=False) |
| |
| print(tokenizer.decode(output[0], skip_special_tokens=True)) |
| ``` |
|
|
| ### Citation |
| If you find our work useful for your research, please cite our paper: |
|
|
| ```bibtex |
| @article{song2026deco, |
| title={{DECO}: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices}, |
| author={Chenyang Song and Weilin Zhao and Xu Han and Chaojun Xiao and Yingfa Chen and Zhiyuan Liu}, |
| journal={arXiv preprint arXiv:2605.10933}, |
| year={2026}, |
| url={https://arxiv.org/pdf/2605.10933}, |
| } |
| ``` |