Improve model card: Add `library_name` and usage example

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +41 -7
README.md CHANGED
@@ -1,17 +1,52 @@
1
  ---
2
- license: apache-2.0
3
  language:
4
  - en
5
  - zh
 
6
  pipeline_tag: text-generation
 
7
  ---
8
 
9
  # BlockFFN-3B-SFT-EAGLE
10
 
11
  This is the 3B BlockFFN model used in the paper *BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity* for acceleration tests.
12
- It is directly adaptable to the `inference` implementation of our [codes](https://github.com/thunlp/BlockFFN).
13
 
14
- Links: [[Paper](https://arxiv.org/pdf/2507.08771)] [[Codes](https://github.com/thunlp/BlockFFN)]
15
 
16
  ### Citation
17
 
@@ -19,10 +54,9 @@ If you find our work useful for your research, please kindly cite our paper as f
19
 
20
  ```
21
  @article{song2025blockffn,
22
- title={{BlockFFN}: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity},
23
  author={Chenyang Song and Weilin Zhao and Xu Han and Chaojun Xiao and Yingfa Chen and Yuxuan Li and Zhiyuan Liu and Maosong Sun},
24
  journal={arXiv preprint arXiv:2507.08771},
25
  year={2025},
26
- url={https://arxiv.org/pdf/2507.08771},
27
- }
28
- ```
 
1
  ---
 
2
  language:
3
  - en
4
  - zh
5
+ license: apache-2.0
6
  pipeline_tag: text-generation
7
+ library_name: transformers
8
  ---
9
 
10
  # BlockFFN-3B-SFT-EAGLE
11
 
12
  This is the 3B BlockFFN model used in the paper *BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity* for acceleration tests.
 
13
 
14
+ **BlockFFN** introduces a novel Mixture-of-Experts (MoE) architecture designed for efficient inference, particularly on end-side devices. It aims to achieve high token-level and chunk-level sparsity, making it acceleration-friendly and compatible with techniques like speculative decoding. This model is based on the [paper](https://arxiv.org/pdf/2507.08771).
15
+
16
+ For the full codebase and more details, visit the official [GitHub repository](https://github.com/thunlp/BlockFFN).
17
+
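To make the two sparsity notions in the added description concrete, here is a minimal sketch that is not taken from the BlockFFN codebase. It assumes token-level sparsity is the fraction of inactive (token, expert) pairs, and chunk-level sparsity is the fraction of experts that no token in a chunk of consecutive tokens activates, averaged over chunks. The function names and the toy activation mask are hypothetical.

```python
from typing import List

Mask = List[List[bool]]  # mask[t][e] is True if token t activates expert e


def token_level_sparsity(mask: Mask) -> float:
    """Fraction of (token, expert) pairs that are inactive."""
    total = sum(len(row) for row in mask)
    active = sum(sum(row) for row in mask)
    return 1.0 - active / total


def chunk_level_sparsity(mask: Mask, chunk_size: int) -> float:
    """Average fraction of experts left unused by every token in a chunk."""
    num_experts = len(mask[0])
    ratios = []
    for start in range(0, len(mask), chunk_size):
        chunk = mask[start:start + chunk_size]
        unused = sum(
            1 for e in range(num_experts)
            if not any(row[e] for row in chunk)
        )
        ratios.append(unused / num_experts)
    return sum(ratios) / len(ratios)


# Toy example (assumed data): 4 tokens, 4 experts, one active expert per token.
mask = [
    [True, False, False, False],
    [False, True, False, False],
    [True, False, False, False],
    [False, False, True, False],
]

tls = token_level_sparsity(mask)       # 0.75
cls2 = chunk_level_sparsity(mask, 2)   # 0.5
cls4 = chunk_level_sparsity(mask, 4)   # 0.25
print(tls, cls2, cls4)
```

In this toy mask the chunk-level value falls as the chunk grows, because the union of experts used by the tokens in a chunk can only get larger. This is why high chunk-level sparsity is a stricter target than high token-level sparsity.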
18
+ ### Usage
19
+
20
+ You can easily load and use this model with the Hugging Face `transformers` library:
21
+
22
+ ```python
23
+ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
24
+ import torch
25
+
26
+ model_name = "SparseLLM/BlockFFN-3B-SFT-EAGLE"
27
+
28
+ # Load model and tokenizer
29
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
30
+ model = AutoModelForCausalLM.from_pretrained(
31
+     model_name,
32
+     torch_dtype=torch.bfloat16,  # or torch.float16 if bfloat16 is not supported
33
+     device_map="auto",
34
+     trust_remote_code=True,
35
+ )
36
+
37
+ # Create a text generation pipeline
38
+ pipe = pipeline(
39
+     "text-generation",
40
+     model=model,
41
+     tokenizer=tokenizer,
42
+ )
43
+
44
+ # Generate text
45
+ prompt = "The quick brown fox jumps over the lazy"
46
+ output = pipe(prompt, max_new_tokens=50, do_sample=True, temperature=0.7)
47
+
48
+ print(output[0]['generated_text'])
49
+ ```
50
 
51
  ### Citation
52
 
 
54
 
55
  ```
56
  @article{song2025blockffn,
57
+ title={{BlockFFN}: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity},
58
  author={Chenyang Song and Weilin Zhao and Xu Han and Chaojun Xiao and Yingfa Chen and Yuxuan Li and Zhiyuan Liu and Maosong Sun},
59
  journal={arXiv preprint arXiv:2507.08771},
60
  year={2025},
61
+ url={https://arxiv.org/pdf/2507.08771},
62
+ }