Enhance model card with library, tags, usage example, and detailed description
This PR significantly improves the model card for BlockFFN-3B-SFT by:
- Adding `library_name: transformers` to the metadata, which enables the "How to use" widget and ensures proper categorization on the Hub.
- Adding the `moe` tag, as the model is based on a Mixture-of-Experts architecture, enhancing discoverability.
- Expanding the model description with a concise summary from the paper's abstract, providing more context to users.
- Including a runnable Python code snippet for `AutoTokenizer` and `AutoModelForCausalLM`, making it easier for users to get started with the model.
- Adding a link to the associated Hugging Face Models Collection (`SparseLLM`) for better project navigation.
The original arXiv paper link has been retained as per guidance.
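
For reference, a sketch of the model card front-matter after these changes, reconstructed from the diff below (the `#` comments are annotations for this PR description, not part of the metadata):

```yaml
---
language:
- en
- zh
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers  # enables the "How to use" widget and proper Hub categorization
tags:
- moe                       # marks the Mixture-of-Experts architecture for discoverability
---
```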
```diff
@@ -1,17 +1,46 @@
 ---
-license: apache-2.0
 language:
 - en
 - zh
+license: apache-2.0
 pipeline_tag: text-generation
+library_name: transformers
+tags:
+- moe
 ---
 
 # BlockFFN-3B-SFT
 
 This is the original 3B BlockFFN checkpoint used in the paper *BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity* for acceleration tests.
-You can load and use this model simply by using `AutoTokenizer` and `AutoModelForCausalLM`.
 
-
+BlockFFN introduces a novel Mixture-of-Experts (MoE) architecture designed to alleviate the computational burden of large language models (LLMs) by promoting both token-level sparsity (TLS) and chunk-level sparsity (CLS). It features a new router integrating ReLU activation and RMSNorm for differentiable and flexible routing. CLS-aware training objectives are designed to enhance acceleration-friendliness, particularly for low-resource conditions like end-side devices. The model also integrates efficient acceleration kernels, combining activation sparsity and speculative decoding for the first time. Experimental results demonstrate BlockFFN's superior performance, achieving high TLS and CLS, and significant speedups on real end-side devices compared to dense models.
+
+Links: [[Paper](https://arxiv.org/pdf/2507.08771)] [[Codes](https://github.com/thunlp/BlockFFN)] [[Models Collection](https://huggingface.co/SparseLLM)]
+
+### How to use
+
+You can load and use this model simply by using `AutoTokenizer` and `AutoModelForCausalLM` from the `transformers` library:
+
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+
+model_name = "SparseLLM/BlockFFN-3B-SFT"
+
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+    trust_remote_code=True
+)
+
+text = "Hello, my name is"
+input_ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
+
+outputs = model.generate(input_ids, max_new_tokens=20, do_sample=True, top_p=0.8, temperature=0.8)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
 
 ### Citation
 
@@ -25,4 +54,4 @@ If you find our work useful for your research, please kindly cite our paper as f
   year={2025},
   url={https://arxiv.org/pdf/2507.08771},
 }
-```
+```
```