Mungert committed
Commit d3fefc1 · verified · 0 Parent(s)

Super-squash history to reclaim storage
.gitattributes ADDED
@@ -0,0 +1,68 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q5_1.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q4_0_l.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-bf16.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-f16.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q5_k_l.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-imatrix.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q5_k_s.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q4_k_l.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q3_k_l.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-f16_q4_k.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-bf16_q4_k.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q5_0_l.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-iq2_xs.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q3_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-iq2_s.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q5_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q4_1.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q6_k_m.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-bf16_q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q5_1_l.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q3_k_s.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q4_1_l.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q5_0.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q4_0.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-bf16_q6_k.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-f16_q8_0.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-iq2_xxs.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-f16_q6_k.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q4_k_s.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q2_k_s.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q6_k_l.gguf filter=lfs diff=lfs merge=lfs -text
+ SmallThinker-4BA0.6B-Instruct-q4_k_m.gguf filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,119 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ pipeline_tag: text-generation
+ ---
+ ## Introduction
+
+ <p align="center">
+ &nbsp;&nbsp;🤗 <a href="https://huggingface.co/PowerInfer">Hugging Face</a>&nbsp;&nbsp; | &nbsp;&nbsp;🤖 <a href="https://modelscope.cn/organization/PowerInfer">ModelScope</a>&nbsp;&nbsp; | &nbsp;&nbsp;📑 <a href="https://github.com/SJTU-IPADS/SmallThinker/blob/main/smallthinker-technical-report.pdf">Technical Report</a>&nbsp;&nbsp;
+ </p>
+
+ SmallThinker is a family of **on-device native** Mixture-of-Experts (MoE) language models built for local deployment,
+ co-developed by the **IPADS and School of AI at Shanghai Jiao Tong University** and **Zenergize AI**.
+ Designed from the ground up for resource-constrained environments,
+ SmallThinker brings powerful, private, and low-latency AI directly to your personal devices,
+ without relying on the cloud.
+
+ ## Performance
+
+ Note: the model is trained mainly on English.
+
+ | Model | MMLU | GPQA-diamond | GSM8K | MATH-500 | IFEVAL | LIVEBENCH | HUMANEVAL | Average |
+ | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+ | **SmallThinker-4BA0.6B-Instruct** | **66.11** | **31.31** | 80.02 | <u>60.60</u> | 69.69 | **42.20** | **82.32** | **61.75** |
+ | Qwen3-0.6B | 43.31 | 26.77 | 62.85 | 45.60 | 58.41 | 23.10 | 31.71 | 41.67 |
+ | Qwen3-1.7B | <u>64.19</u> | <u>27.78</u> | <u>81.88</u> | **63.60** | 69.50 | <u>35.60</u> | 61.59 | <u>57.73</u> |
+ | Gemma3nE2b-it | 63.04 | 20.20 | **82.34** | 58.60 | **73.20** | 27.90 | <u>64.63</u> | 55.70 |
+ | Llama-3.2-3B-Instruct | 64.15 | 24.24 | 75.51 | 40.00 | <u>71.16</u> | 15.30 | 55.49 | 49.41 |
+ | Llama-3.2-1B-Instruct | 45.66 | 22.73 | 1.67 | 14.40 | 48.06 | 13.50 | 37.20 | 26.17 |
+
+ For the MMLU evaluation, we use a 0-shot CoT setting.
+
+ All models are evaluated in non-thinking mode.
+
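As a quick sanity check on the table, the Average column appears to be the unweighted mean of the seven benchmark scores; recomputed for the SmallThinker row:

```python
# Recompute the Average cell for the SmallThinker-4BA0.6B-Instruct row
scores = [66.11, 31.31, 80.02, 60.60, 69.69, 42.20, 82.32]
average = round(sum(scores) / len(scores), 2)
print(average)  # → 61.75, matching the table
```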
+ ## Speed
+
+ | Model | Memory (GiB) | i9 14900 | 1+13 8gen4 | rk3588 (16G) | rk3576 | Raspberry Pi 5 | RDK X5 | rk3566 |
+ |-----------------------------------------------|---------------------|----------|------------|--------------|--------|----------------|--------|--------|
+ | SmallThinker 4B + sparse FFN + sparse lm_head | 2.24 | 108.17 | 78.99 | 39.76 | 15.10 | 28.77 | 7.23 | 6.33 |
+ | SmallThinker 4B + sparse FFN + sparse lm_head, limited memory | limit 1G | 29.99 | 20.91 | 15.04 | 2.60 | 0.75 | 0.67 | 0.74 |
+ | Qwen3 0.6B | 0.6 | 148.56 | 94.91 | 45.93 | 15.29 | 27.44 | 13.32 | 9.76 |
+ | Qwen3 1.7B | 1.3 | 62.24 | 41.00 | 20.29 | 6.09 | 11.08 | 6.35 | 4.15 |
+ | Qwen3 1.7B, limited memory | limit 1G | 2.66 | 1.09 | 1.00 | 0.47 | - | - | 0.11 |
+ | Gemma3n E2B | 1G, theoretically | 36.88 | 27.06 | 12.50 | 3.80 | 6.66 | 3.46 | 2.45 |
+
+ Note: the i9 14900 and 1+13 8gen4 runs use 4 threads; the others use the thread count that achieves maximum speed. All models here have been quantized to q4_0.
+
+ You can deploy SmallThinker with offloading support using [PowerInfer](https://github.com/SJTU-IPADS/PowerInfer/tree/main/smallthinker).
+
+ ## Model Card
+
+ <div align="center">
+
+ | **Architecture** | Mixture-of-Experts (MoE) |
+ |:---:|:---:|
+ | **Total Parameters** | 4B |
+ | **Activated Parameters** | 0.6B |
+ | **Number of Layers** | 32 |
+ | **Attention Hidden Dimension** | 1536 |
+ | **MoE Hidden Dimension** (per Expert) | 768 |
+ | **Number of Attention Heads** | 12 |
+ | **Number of Experts** | 32 |
+ | **Selected Experts per Token** | 4 |
+ | **Vocabulary Size** | 151,936 |
+ | **Context Length** | 32K |
+ | **Attention Mechanism** | GQA |
+ | **Activation Function** | ReGLU |
+
+ </div>
+
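The expert-selection numbers above (4 of 32 experts activated per token) can be illustrated with a small stand-alone routing sketch. The router scores here are random stand-ins, not the model's actual weights:

```python
import math
import random

NUM_EXPERTS, TOP_K = 32, 4  # values from the model card

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in router scores

# Softmax over all experts to get routing probabilities
peak = max(logits)
probs = [math.exp(x - peak) for x in logits]
total = sum(probs)
probs = [p / total for p in probs]

# Keep the top-4 experts and renormalize their weights over the selection
chosen = sorted(range(NUM_EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
selected = sum(probs[i] for i in chosen)
weights = [probs[i] / selected for i in chosen]

print(chosen)  # indices of the 4 experts this token activates
```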
+ ## How to Run
+
+ ### Transformers
+
+ `transformers==4.53.3` is required; we are actively working to support the latest version.
+ The following code snippet illustrates how to use the model to generate content from given inputs.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ path = "PowerInfer/SmallThinker-4BA0.6B-Instruct"
+ device = "cuda"
+
+ tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True)
+
+ messages = [
+     {"role": "user", "content": "Give me a short introduction to large language model."},
+ ]
+ model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(device)
+
+ model_outputs = model.generate(
+     model_inputs,
+     do_sample=True,
+     max_new_tokens=1024
+ )
+
+ # Strip the prompt tokens so only the newly generated text is decoded
+ output_token_ids = [
+     model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs))
+ ]
+
+ response = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0]
+ print(response)
+ ```
+
+ ### ModelScope
+
+ `ModelScope` provides a Python API similar to (though not entirely identical to) `Transformers`. For basic usage, simply change the first line of the code above as follows:
+
+ ```python
+ from modelscope import AutoModelForCausalLM, AutoTokenizer
+ ```
+
+ ## Statement
+ - Due to the constraints of its model size and the limitations of its training data, SmallThinker's responses may contain factual inaccuracies, biases, or outdated information.
+ - Users bear full responsibility for independently evaluating and verifying the accuracy and appropriateness of all generated content.
+ - SmallThinker does not possess genuine comprehension or consciousness and cannot express personal opinions or value judgments.
+
SmallThinker-4BA0.6B-Instruct-bf16.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ece3698a81dc19e118113b9d16b57b9d97765a97def6ff08b67f82efb16e63b3
+ size 8546216928
SmallThinker-4BA0.6B-Instruct-bf16_q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7083e6363278786120c10f18d5bc7b9f4a2e347fd35cc1214a3431f636c197f6
+ size 6044994528
SmallThinker-4BA0.6B-Instruct-f16_q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3f915d3595fef991440c60c6616590058ffb30f758d5c1ac19b1a6d43d85bdcb
+ size 6044994528
SmallThinker-4BA0.6B-Instruct-imatrix.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0df2a9f44764191880546437c355b1c7dc53fec91be9885bbf88fe611ebb3388
+ size 16763072
SmallThinker-4BA0.6B-Instruct-iq2_s.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e8565d54ae73a16a55d4c7714189fc4f797de3d5e7d0485ad0406f0ff761c192
+ size 1692099872
SmallThinker-4BA0.6B-Instruct-iq2_xs.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:599098043d5b16400c8647db7be07bba0182a9a229617b36b591ceea607cade9
+ size 1687946528
SmallThinker-4BA0.6B-Instruct-iq2_xxs.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f0c7bb2a1759a9cccac1dc8e4dca378d857be0d6fe5cc4248341700ebec04446
+ size 1531581728
SmallThinker-4BA0.6B-Instruct-q2_k_s.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:29b0adfdefbed2a7b111c0e468215cc727f9ecdad2b470446ec7ac9ae7b10913
+ size 1809419552
SmallThinker-4BA0.6B-Instruct-q3_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0fd83727b771dcd73d685ee112d8eb164be39255ea0a01537407abff1beeeb1d
+ size 2367598880
SmallThinker-4BA0.6B-Instruct-q3_k_s.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6bfe96400e38c1e9070904188b2c6aaf96c4461f72401efca4213ba3cd9d4146
+ size 2303741216
SmallThinker-4BA0.6B-Instruct-q4_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7cf25605a9dedec7aee4c6248fb76ee2b872663628f15cf295b3792b80f942d3
+ size 2412711200
SmallThinker-4BA0.6B-Instruct-q4_1.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:90d0486c1b25a2ba123bd1d73cec25580a4e9b9d71f03e11f5d402aa5e3278d9
+ size 2679385376
SmallThinker-4BA0.6B-Instruct-q4_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f10fe6efa6d7b17f6c6d78d8a692f779b6ceff9068a9a4033a20d033e6235a99
+ size 2768236832
SmallThinker-4BA0.6B-Instruct-q4_k_s.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f9f34699249269b4ef5e5d90ac15ffb47dce967df35a0ad7680b453f30554a47
+ size 2649043232
SmallThinker-4BA0.6B-Instruct-q5_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4b3d7ffcb57fff2b0f908882181c73e13227f7cb037a7a5c65f84568b4b15676
+ size 2946059552
SmallThinker-4BA0.6B-Instruct-q5_1.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:21e6485c028a75ea6c00f379ed8603c0401253e4f3568b7c6ee37fd76be98fdb
+ size 3212733728
SmallThinker-4BA0.6B-Instruct-q5_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:958193a4829d100292a5268ea33a4fb24ea360a4fcd39421e99ceaf5540d7c3b
+ size 3171823904
SmallThinker-4BA0.6B-Instruct-q5_k_s.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bb17878e2a4b0988bddb4dfa3a1e84016d169f647e09a594de581e4628281951
+ size 3131298080
SmallThinker-4BA0.6B-Instruct-q6_k_m.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:43c0bb0f4f769f7869a402321be28c98454e4780f2b85a738108eedb7f020a2e
+ size 3512742176
SmallThinker-4BA0.6B-Instruct-q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d3048b15013197faf05e2e96be753bb661192a1343caee0fc02a6910d621c6d5
+ size 4546104288