nielsr (HF Staff) committed · Commit 29b7b05 · verified · 1 Parent(s): 3e70aa8

Improve model card
This PR improves the model card by adding the `library_name` and `pipeline_tag` metadata, which makes the model easier to discover and use. It also adds a brief model description and expands the usage example with the missing import statement. Additionally, the installation and evaluation instructions from the GitHub README are incorporated for better accessibility.
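For reference, the expanded usage example can be exercised end to end as follows. This is a minimal sketch: the tokenizer call, prompt, and generation settings are illustrative and not part of the card itself.

```python
# Minimal sketch of the expanded usage example (prompt and generation settings are illustrative).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Shengkun/DarwinLM-2.7B-Pruned"

# trust_remote_code=True is required because the repo ships custom modeling code.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Structured pruning of large language models", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```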

Files changed (1):
  1. README.md (+52 −33)
README.md CHANGED
@@ -1,6 +1,11 @@
 ---
 license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 ---
+
+DarwinLM is an evolutionary structured pruning method for large language models. It builds upon an evolutionary search process, generating multiple offspring models in each generation through mutation, and selecting the fittest for survival. This significantly reduces the computational costs of LLMs, especially for real-time applications.
+
 **Paper**: [https://arxiv.org/pdf/2502.07780](https://arxiv.org/pdf/2502.07780)
 **Code**: https://github.com/IST-DASLab/DarwinLM
 **Models**: [DarwinLM-2.7B](https://huggingface.co/Shengkun/DarwinLM-2.7B), [DarwinLM-4.6B](https://huggingface.co/Shengkun/DarwinLM-4.6B), [DarwinLM-8.4B](https://huggingface.co/Shengkun/DarwinLM-8.4B)
@@ -8,54 +13,68 @@ license: apache-2.0
 
 ---
 
-This repository contains the weights of DarwinLM, an evolutionary structured pruning methods for large language models, as introduced in our paper. DarwinLM builds upon an evolutionary search process, generating multiple offspring models in each generation through mutation, and selecting the fittest for survival.
-```
+This repository contains the weights of DarwinLM, as introduced in our paper.
+
+```python
 # Please add trust_remote_code=True as the repo includes custom code to load and run DarwinLM
+from transformers import AutoModelForCausalLM
 model = AutoModelForCausalLM.from_pretrained("Shengkun/DarwinLM-2.7B-Pruned", trust_remote_code=True)
 ```
 
 ## Downstream Tasks
 
-
 **2.7B**
 
 | Method | Param. | SciQ | PIQA | WG | ArcE | ArcC | HS | LogiQA | BoolQ | Avg |
 |----------------------------|--------|------|------|------|------|------|------|--------|-------|------|
 | **Dense** | 6.7B | 93.7 | 78.1 | 69.3 | 76.4 | 53.0 | 78.6 | 30.7 | 77.7 | 69.2 |
-| **Uniform** | 3.4B | 44.1 | 57.1 | 53.3 | 33.5 | 32.2 | 27.3 | 25.0 | 49.0 | 40.1 |
-| **ZipLM** | 4.0B | 87.4 | 64.4 | 58.3 | 53.2 | 33.6 | 50.1 | 25.5 | 63.6 | 54.5 |
-| **ShearedLLama** | 2.7B | 84.5 | 66.4 | 53.4 | 49.8 | 28.4 | 47.6 | 27.6 | 50.9 | 51.0 |
-| *DarwinLM (one-shot)* | 2.7B | 85.6 | 70.8 | 55.8 | 63.3 | 38.1 | 53.2 | 28.5 | 62.7 | 57.2 |
-| **ShearedLLama (50B)** | 2.7B | 90.8 | 75.8 | 64.2 | 67.0 | 41.2 | 70.8 | 28.2 | 63.0 | 62.6 |
-| **ShearedLLama (10B†)** | 2.7B | 92.0 | 73.6 | 63.1 | 69.8 | 42.0 | 64.4 | 29.0 | 62.1 | 61.9 |
-| *DarwinLM (10B)* | 2.6B | 90.8 | 72.2 | 65.1 | 68.5 | 45.0 | 67.2 | 28.5 | 64.6 | 62.8 |
-
-**4.6B**
-
-| Model | Method | Param. | SciQ | PIQA | WG | ArcE | ArcC | HS | LogiQA | BoolQ | MMLU | Avg |
-|-----------------|------------------------|--------|------|------|------|------|------|------|--------|-------|------|------|
-| **Llama-3.1-8B** | **Dense** | 8B | 96.3 | 81.2 | 74.3 | 81.4 | 58.2 | 81.7 | 31.1 | 84.0 | 65.2 | 72.8 |
-| | **Uniform** | 4.5B | 29.1 | 53.6 | 51.7 | 26.0 | 23.6 | 27.1 | 25.5 | 62.1 | 25.7 | 36.1 |
-| | **ZipLM** | 6B | 65.5 | 60.6 | 56.0 | 40.2 | 34.4 | 34.4 | 28.1 | 63.0 | 27.9 | 45.7 |
-| | *DarwinLM (one-shot)* | 4.6B | 84.9 | 69.4 | 57.3 | 59.6 | 34.2 | 44.6 | 24.1 | 62.2 | 28.5 | 51.6 |
-| | **OLMO (2.5T)** | 7B | 92.8 | 79.4 | 70.4 | 73.3 | 44.9 | 77.1 | 27.9 | 72.5 | 28.3 | 62.9 |
-| | *DarwinLM (10.0B)* | 4.6B | 93.2 | 74.8 | 67.4 | 73.2 | 51.6 | 71.3 | 30.7 | 71.1 | 40.6 | 63.7 |
-
-**8.4B**
-
-| Model | Method | Param. | SciQ | PIQA | WG | ArcE | ArcC | HS | LogiQA | BoolQ | MMLU | Avg |
-|---------------------------|------------------------|--------|------|------|------|------|------|------|--------|-------|------|------|
-| **Qwen-2.5-14B-Instruct** | **Dense** | 14B | 96.8 | 81.9 | 79.1 | 85.7 | 72.8 | 85.1 | 38.5 | 87.9 | 80.0 | 78.6 |
-| | **Uniform** | 8.6B | 78.2 | 72.7 | 57.6 | 76.1 | 45.6 | 47.0 | 28.1 | 61.6 | 45.5 | 56.9 |
-| | **ZipLM** | 8.5B | 69.0 | 66.4 | 52.8 | 60.1 | 38.3 | 43.3 | 29.6 | 60.2 | 25.0 | 49.4 |
-| | *DarwinLM (one-shot)* | 8.4B | 84.3 | 73.9 | 60.5 | 75.7 | 48.0 | 53.3 | 29.3 | 66.9 | 43.1 | 59.4 |
-| | **OLMO-0424 (2.05T)** | 7B | 96.1 | 80.1 | 72.1 | 73.8 | 49.2 | 78.0 | 29.3 | 80.8 | 52.1 | 67.9 |
-| | *DarwinLM (10.0B)* | 8.4B | 89.5 | 78.1 | 70.7 | 79.6 | 57.6 | 74.9 | 33.5 | 73.9 | 57.9 | 68.4 |
+| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
 
+**(Results for 4.6B and 8.4B)**
 
+## Installation
 
-## Bibtex
+```bash
+conda env create -f environment.yml
+conda activate darwinlm
+```
+
+## Database Preparation
+
+```bash
+# For llama-2-7B
+bash scripts/ziplm_llama2-7B.sh
+# ... other model examples
+```
+
+## Evolutionary Search
+
+```bash
+bash scripts/struct_prune_search.sh
 ```
+
+## Post-Training
+
+After pruning, you can further fine-tune the model with the [Fineweb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu) dataset using the [llm-foundry](https://github.com/mosaicml/llm-foundry) repository. Refer to our paper for parameter settings.
+
+## Evaluation
+Install the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
+
+**Option 1: Using pre-trained weights:**
+
+```bash
+bash scripts/run_lmeval_hf.sh
+```
+
+**Option 2: Evaluating your searched structure:**
+
+```bash
+bash scripts/run_lmeval_config.sh
+```
+
+
+## Bibtex
+```bibtex
 @article{tang2025darwinlm,
 title={DarwinLM: Evolutionary Structured Pruning of Large Language Models},
 author={Tang, Shengkun and Sieberling, Oliver and Kurtic, Eldar and Shen, Zhiqiang and Alistarh, Dan},
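As a companion to the new Evaluation section: the released weights can also be scored directly from Python rather than through the repository's `run_lmeval_hf.sh` wrapper. This is a hedged sketch, assuming lm-evaluation-harness v0.4+ with its `lm_eval.simple_evaluate` entry point; the task list and batch size are illustrative choices, not taken from the model card.

```python
# Hypothetical sketch: scoring a released DarwinLM checkpoint with lm-evaluation-harness.
# Assumes lm-eval v0.4+, where simple_evaluate is exposed at the package level.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Shengkun/DarwinLM-2.7B,trust_remote_code=True",
    tasks=["sciq", "piqa", "winogrande", "boolq"],  # illustrative subset of the card's benchmarks
    batch_size=8,
)
print(results["results"])
```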