DECO-0.5B

This is the 0.5B DECO checkpoint introduced in the paper DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices.

DECO (Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices) is a sparse MoE architecture designed to match the performance of dense Transformers under the same total parameter budget and number of training tokens. It is an improved version of the BlockFFN architecture.
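As a rough intuition for how a sparse MoE activates only a few experts per token while mixing their outputs, here is a toy top-k routing sketch in plain Python. The expert functions and router weights below are made up for illustration; this is not DECO's actual routing, which the paper defines.

```python
import math

# Toy sparse Mixture-of-Experts forward pass (illustrative only).
# Each "expert" is a simple function of the input; a router scores all
# experts, keeps the top-k, and mixes their outputs with softmax
# weights renormalized over the selected experts.

EXPERTS = [
    lambda x: 2.0 * x,   # expert 0
    lambda x: x + 1.0,   # expert 1
    lambda x: x * x,     # expert 2
    lambda x: -x,        # expert 3
]

# Hypothetical router parameters: one score weight per expert.
ROUTER_W = [0.5, -0.2, 1.0, 0.1]

def moe_forward(x: float, k: int = 2) -> float:
    # Router logits: here simply w_i * x for illustration.
    logits = [w * x for w in ROUTER_W]
    # Keep only the k highest-scoring experts (sparse activation).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the weights sum to 1.
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    # Weighted mix of the chosen experts' outputs.
    return sum(exps[i] / z * EXPERTS[i](x) for i in top)
```

Only k of the experts run per input, which is what keeps inference cost low even as the total parameter count grows.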

Quick start

You can load and use this model with AutoTokenizer and AutoModelForCausalLM from transformers. Since the model uses a custom architecture, trust_remote_code=True is required.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "SparseLLM/DECO-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda").eval()

prompt = "Mixture-of-Experts models are useful because"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)

print(tokenizer.decode(output[0], skip_special_tokens=True))

Citation

If you find our work useful for your research, please cite our paper as follows:

@article{song2026deco,
  title={{DECO}: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices},
  author={Chenyang Song and Weilin Zhao and Xu Han and Chaojun Xiao and Yingfa Chen and Zhiyuan Liu},
  journal={arXiv preprint arXiv:2605.10933},
  year={2026},
  url={https://arxiv.org/pdf/2605.10933},
}