---
language:
- en
- zh
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
tags:
- moe
- llm
- acceleration
---

# BlockFFN-Large

This is the original 0.8B BlockFFN checkpoint used for the acceleration tests in the paper *BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity*.

Links: [[Paper](https://arxiv.org/pdf/2507.08771)] [[Code](https://github.com/thunlp/BlockFFN)]

### How to use

You can load and run this model directly with the `transformers` library. Set `trust_remote_code=True` when loading the model, because it uses a custom architecture.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "SparseLLM/BlockFFN-Large"

# Load the tokenizer and the model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # required for the custom BlockFFN architecture
)
model.eval()  # Set model to evaluation mode

text = "The quick brown fox jumps over the lazy"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate text with nucleus sampling
outputs = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=True,
    temperature=0.8,
    top_p=0.8,
)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

### Citation

If you find our work useful for your research, please cite our paper as follows:

```
@article{song2025blockffn,
  title={{BlockFFN}: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity},
  author={Chenyang Song and Weilin Zhao and Xu Han and Chaojun Xiao and Yingfa Chen and Yuxuan Li and Zhiyuan Liu and Maosong Sun},
  journal={arXiv preprint arXiv:2507.08771},
  year={2025},
  url={https://arxiv.org/pdf/2507.08771},
}
```