---
language:
  - en
  - zh
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
---

# BlockFFN-3B-SFT-EAGLE

This is the 3B BlockFFN model used for the acceleration tests in the paper *BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity*.

BlockFFN introduces a novel Mixture-of-Experts (MoE) architecture designed for efficient inference, particularly on end-side devices. It achieves high token-level and chunk-level activation sparsity, making it acceleration-friendly and compatible with techniques such as speculative decoding.
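As a rough intuition for the difference between token-level and chunk-level activation sparsity, here is a minimal standalone sketch. It is illustrative only: the mask shape, chunk size, and activation rate are arbitrary assumptions, not the paper's actual routing.

```python
import torch

# Illustrative only: compare token-level and chunk-level sparsity on a
# random binary expert-activation mask of shape (seq_len, num_experts).
torch.manual_seed(0)
seq_len, num_experts, chunk = 16, 8, 4
mask = torch.rand(seq_len, num_experts) < 0.2  # ~20% of experts active per token

# Token-level sparsity: fraction of (token, expert) pairs that are inactive.
token_sparsity = 1.0 - mask.float().mean().item()

# Chunk-level sparsity: an expert counts as active for a chunk if ANY token
# in that chunk activates it. Chunk-level sparsity is therefore always at
# most token-level sparsity; keeping it high is what makes verifying
# several drafted tokens at once (speculative decoding) cheap.
chunks = mask.view(seq_len // chunk, chunk, num_experts).any(dim=1)
chunk_sparsity = 1.0 - chunks.float().mean().item()

print(f"token-level sparsity: {token_sparsity:.2f}")
print(f"chunk-level sparsity: {chunk_sparsity:.2f}")
```
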

For the full codebase and more details, visit the official GitHub repository.

## Usage

You can easily load and use this model with the Hugging Face transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch

model_name = "SparseLLM/BlockFFN-3B-SFT-EAGLE"

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # or torch.float16 if bfloat16 is not supported
    device_map="auto",
    trust_remote_code=True,
)

# Create a text generation pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)

# Generate text
prompt = "The quick brown fox jumps over the lazy"
output = pipe(prompt, max_new_tokens=50, do_sample=True, temperature=0.7)

print(output[0]["generated_text"])
```
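To build intuition for the `temperature` argument passed to the pipeline above, here is a minimal standalone sketch (the logit values are made up) showing how temperature rescales logits before the softmax used in sampling: values below 1 sharpen the distribution, values above 1 flatten it.

```python
import torch

# Illustrative only: temperature divides the logits before softmax.
logits = torch.tensor([2.0, 1.0, 0.5])  # made-up next-token logits
for t in (0.7, 1.0, 2.0):
    probs = torch.softmax(logits / t, dim=-1)
    print(f"temperature={t}: {probs.tolist()}")
```

With `temperature=0.7` as in the example above, the most likely token receives an even larger share of the probability mass than at the default of 1.0, so sampled outputs stay closer to greedy decoding.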

## Citation

If you find our work useful for your research, please cite our paper as follows:

```bibtex
@article{song2025blockffn,
  title={{BlockFFN}: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity},
  author={Chenyang Song and Weilin Zhao and Xu Han and Chaojun Xiao and Yingfa Chen and Yuxuan Li and Zhiyuan Liu and Maosong Sun},
  journal={arXiv preprint arXiv:2507.08771},
  year={2025},
  url={https://arxiv.org/pdf/2507.08771},
}
```