Nekochu committed on
Commit
3aef82c
·
verified ·
1 Parent(s): d119f92

README with CSS, KLD plot, branch table

  1. README.md +167 -158
README.md CHANGED
@@ -1,159 +1,168 @@
1
- # Gemma-3-R1984-27B EXL3
2
-
3
- EXL3 quants of [VIDraft/Gemma-3-R1984-27B](https://huggingface.co/VIDraft/Gemma-3-R1984-27B) (27B).
4
- Each bpw variant is a separate branch. Attention tensors boosted to 8bpw via recompilation.
5
-
6
- Docs: [exllamav3 convert.md](https://github.com/turboderp-org/exllamav3/blob/master/doc/convert.md)
7
-
8
- ## Branches
9
-
10
- | Branch | Target | Actual bpw | Method |
11
- |---|---|---|---|
12
- | `2.96bpw_H6` | 2.0 | 2.96 | base + recompile |
13
- | `2.98bpw_H6` | 2.5 | 2.98 | optimized (2.0+3.0) + recompile |
14
- | `3.80bpw_H6` | 3.0 | 3.80 | base + recompile |
15
- | `3.83bpw_H6` | 3.5 | 3.83 | optimized (3.0+5.0) + recompile |
16
- | `3.97bpw_H6` | 4.0 | 3.97 | optimized (3.0+5.0) + recompile |
17
- | `4.13bpw_H6` | 4.5 | 4.13 | optimized (3.0+5.0) + recompile |
18
- | `5.48bpw_H6` | 5.0 | 5.48 | base + recompile |
19
- | `6.32bpw_H6` | 6.0 | 6.32 | base + recompile |
20
-
21
- H6 = head_bits 6. All variants recompiled with `*.self_attn.*` boosted to 8bpw.
22
- Gemma-3 is dense (no MoE), so `*.shared_experts.*` is not applicable.
23
-
24
- ## Build recipe
25
-
26
- ### 1. Base quants
27
- ```bash
28
- python convert.py -i <hf-model> -o <out> -w <work> -b <bpw>
29
- ```
30
- 5 base quants: 2.0, 3.0, 5.0, 6.0, 8.0 bpw.
31
-
32
- ### 2. KLD measurement
33
- ```bash
34
- python util/measure.py -r <hf-model> -ms 128 -i <2.0bpw> <8.0bpw> -o measurement.json
35
- ```
36
- Reusable across all optimized targets. Included in main branch.
37
-
38
- ### 3. Optimization (mixed-precision)
39
- ```bash
40
- python util/optimize.py -i <lo-bpw> <hi-bpw> -m measurement.json -o <out> -b <target>
41
- ```
42
- KLD-guided tensor replacement: tensors that matter most get higher-bpw versions.
43
-
44
- ### 4. Recompilation (attn override)
45
- ```yaml
46
- sources:
47
- - id: 8
48
- model_dir: /path/to/8.0bpw
49
- overrides:
50
- - key: "*.self_attn.*"
51
- source: 8
52
- ```
53
- ```bash
54
- python util/recompile.py -i <input> -o <final> -or override.yaml
55
- ```
56
- Actual bpw is determined after recompile (attn@8bpw shifts average up).
57
-
58
- ## Files
59
-
60
- - `main` branch: `measurement.json` (KLD map)
61
- - Each bpw branch: quantized model shards + config + tokenizer
62
-
63
- ## Credits
64
-
65
- - Base model: [VIDraft/Gemma-3-R1984-27B](https://huggingface.co/VIDraft/Gemma-3-R1984-27B)
66
- - Quantization: [exllamav3](https://github.com/turboderp-org/exllamav3) v0.0.34
67
-
68
  ---
69
-
70
- # EXL3 Optimization Guide
71
-
72
- ## Targets
73
- `2.5bpw_H6 3.0bpw_H6 3.5bpw_H6 4.0bpw_H6 4.5bpw_H6 5.0bpw_H6 6.0bpw_H6`
74
-
75
- | Target | Action |
76
- |---|---|
77
- | 2.5bpw_H6 | optimized |
78
- | 3.0bpw_H6 | direct convert |
79
- | 3.5bpw_H6 | optimized |
80
- | 4.0bpw_H6 | optimized |
81
- | 4.5bpw_H6 | optimized |
82
- | 5.0bpw_H6 | direct convert |
83
- | 6.0bpw_H6 | direct convert |
84
-
85
- ## Overview
86
- Dynamic EXL3 quants mix tensor precision, similar to mixed-precision GGUFs. There are two frameworks:
87
-
88
- - **Optimization**
89
- - **Recompilation**
90
-
91
- Usually, optimization and recompilation are used together: create a mixed quant through optimization, then run recompilation on top of it.
92
-
93
- ## Optimization
94
- 1. Start with two quants at different bpw, for example 2bpw and 3bpw.
95
- 2. `measure.py` measures KLD differences by replacing layer groups in the lower-bpw quant with groups from the higher-bpw quant; standard EXL3 calibration data is used.
96
- 3. The resulting `measurement.json` can be reused. You only have to create it once, no matter how many mixed quants you make.
97
- 4. `optimize.py` uses that `measurement.json` to create a third quant from two source quants, replacing the tensors that matter most with higher-bpw tensors.
98
-
99
- Measurement takes about 20 minutes to an hour for big models. Optimization takes about 30 seconds to 1 minute.
100
-
101
- ```bash
102
- python util/measure.py -i /path/to/model-2bpw /path/to/model-3bpw -r /path/to/hf-model -o measurement.json -cr 10 -cc 1024 -d 0
103
- python util/optimize.py -i /path/to/model-2bpw /path/to/model-3bpw -m measurement.json -o /path/to/model-optimized -b 2.5 -ss 8192
104
- ```
105
-
106
- Alternative measure form with `-ms`:
107
- ```bash
108
- python util/measure.py -r /workspace/models/original-model -ms 128 -i /workspace/models/quant-2.5bpw /workspace/models/quant-3.5bpw -o /workspace/measurement.json
109
- ```
110
-
111
- Optimize example:
112
- ```bash
113
- python util/optimize.py -i /workspace/models/quant-2.5bpw /workspace/models/quant-3.0bpw -o /workspace/models/new-quant-2.75bpw -m /workspace/measurement.json -b 2.75
114
- ```
115
-
116
- ## Recompilation
117
- `override.yaml` replaces tensors in one quant with tensors from another quant. It is manual optimization. The source notes that attention and shared expert tensors were replaced with 8bpw tensors for all optimized quants. Recompilation takes about 30s-1m, and the actual new bpw is known after recompilation is done.
118
-
119
- ### Multi-source example
120
- ```yaml
121
- sources:
122
- - id: 6
123
- model_dir: /path/to/6bpw
124
- - id: 8
125
- model_dir: /path/to/8bpw
126
- overrides:
127
- - key: "*.self_attn.*"
128
- source: 6
129
- - key: "*.shared_experts.*"
130
- source: 8
131
- ```
132
-
133
- ### GLM-Air example
134
- The GLM-Air example replaces attention and shared experts with 8bpw tensors, and layers 2, 43, 1, 29 with 5bpw tensors because `measurement.json` showed those layers had the worst KLD.
135
-
136
- ```yaml
137
- sources:
138
- - id: 8
139
- model_dir: /workspace/models/quants-8.0bpw
140
- - id: 5
141
- model_dir: /workspace/models/quants-5.0bpw
142
- overrides:
143
- - key: "*.self_attn.*"
144
- source: 8
145
- - key: "*.shared_experts.*"
146
- source: 8
147
- - key: "model.layers.2.*"
148
- source: 5
149
- - key: "model.layers.43.*"
150
- source: 5
151
- - key: "model.layers.1.*"
152
- source: 5
153
- - key: "model.layers.29.*"
154
- source: 5
155
- ```
156
-
157
- ```bash
158
- python util/recompile.py -i /workspace/models/quant-2.75bpw -o /workspace/models/quant-recompiled -or override.yaml
159
- ```
1
  ---
2
+ base_model: VIDraft/Gemma-3-R1984-27B
3
+ base_model_relation: quantized
4
+ quantized_by: WeReCooking
5
+ pipeline_tag: text-generation
6
+ tags:
7
+ - exl3
8
+ ---
9
+ <style>
10
+ .container-dark { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Arial, sans-serif; line-height: 1.6; color: #d4d4d4; }
11
+ a { color: #569cd6; text-decoration: none; font-weight: 600; }
12
+ a:hover { text-decoration: underline; }
13
+ .card-dark { background-color: #252526; border-radius: 12px; padding: 24px; margin-bottom: 20px; box-shadow: 0 4px 12px rgba(0,0,0,0.3); border: 1px solid #3c3c3c; }
14
+ .card-dark h1 { font-size: 2.2em; color: #ffffff; text-align: center; margin-bottom: 10px; }
15
+ .card-dark .subtitle { text-align: center; font-size: 1.1em; color: #a0a0a0; }
16
+ .card-dark h2 { font-size: 1.5em; margin-top: 0; padding-bottom: 10px; border-bottom: 1px solid #3c3c3c; color: #c586c0; }
17
+ .styled-table { display: table; border: none; width: 100%; font-size: 0.95em; }
18
+ .styled-table thead th { background-color: #333333; color: #c586c0; text-align: left; padding: 12px 15px; }
19
+ .styled-table td { padding: 0; border-bottom: 1px solid #3c3c3c; }
20
+ .styled-table tbody tr { transition: background-color 0.1s ease; }
21
+ .styled-table tbody tr:hover { background-color: #3a3a3a; }
22
+ .styled-table tr:last-child td { border-bottom: none; }
23
+ .styled-table td a { display: block; padding: 12px 15px; }
24
+ .styled-table td a.fake-link { text-decoration:none; color:inherit; }
25
+ details { margin-top: 20px; border: 1px solid #3c3c3c; border-radius: 8px; overflow: hidden; }
26
+ summary { cursor: pointer; padding: 12px 18px; background-color: #6A5ACD; font-weight: 600; display: flex; align-items: center; gap: 10px; justify-content: space-between; list-style: none; }
27
+ summary::-webkit-details-marker { display: none; }
28
+ summary:hover { filter: brightness(1.1); }
29
+ summary::after { content: ''; display: inline-block; width: 8px; height: 8px; border-bottom: 2px solid white; border-right: 2px solid white; transform: rotate(45deg); transition: transform 0.3s ease; }
30
+ details[open] > summary::after { transform: rotate(225deg); }
31
+ .details-content { padding: 18px; }
32
+ </style>
33
+
34
+ <div class="container-dark">
35
+
36
+ <div class="card-dark">
37
+ <h1>Gemma-3-R1984-27B EXL3</h1>
38
+ <p class="subtitle">
39
+ EXL3 quants of <a href="https://huggingface.co/VIDraft/Gemma-3-R1984-27B">VIDraft/Gemma-3-R1984-27B</a>
40
+ using <a href="https://github.com/turboderp-org/exllamav3/">exllamav3</a> v0.0.34
41
+ </p>
42
+ </div>
43
+
44
+ <div class="card-dark">
45
+ <h2>KL Divergence vs VRAM</h2>
46
+ <img src="kld_plot.png" alt="KLD plot" style="width:100%; border-radius: 8px;" />
47
+ <p class="subtitle">Reference: 6.0bpw. Lower KLD = closer to reference quality. Measured on wikitext-2 (20 rows, 2048 ctx).</p>
48
+ </div>
49
+
50
+ <div class="card-dark">
51
+ <h2>Quants</h2>
52
+ <table class="styled-table">
53
+ <thead>
54
+ <tr><th>Branch</th><th>BPW</th><th>Head</th><th>VRAM (GB)</th><th>KLD</th><th>Type</th></tr>
55
+ </thead>
56
+ <tbody>
57
+ <tr>
58
+ <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">2.0bpw_H6</a></td>
59
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">2.0</a></td>
60
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">6</a></td>
61
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">7.0</a></td>
62
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">0.450</a></td>
63
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">base</a></td>
64
+ </tr>
65
+ <tr>
66
+ <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">2.50bpw_H6</a></td>
67
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">2.50</a></td>
68
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">6</a></td>
69
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">8.5</a></td>
70
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">0.389</a></td>
71
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">optimized</a></td>
72
+ </tr>
73
+ <tr>
74
+ <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">3.0bpw_H6</a></td>
75
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">3.0</a></td>
76
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">6</a></td>
77
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">9.9</a></td>
78
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">0.110</a></td>
79
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">base</a></td>
80
+ </tr>
81
+ <tr>
82
+ <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">3.35bpw_H6</a></td>
83
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">3.35</a></td>
84
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">6</a></td>
85
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">11.0</a></td>
86
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">0.088</a></td>
87
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">optimized</a></td>
88
+ </tr>
89
+ <tr>
90
+ <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">4.0bpw_H6</a></td>
91
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">4.0</a></td>
92
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">6</a></td>
93
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">12.9</a></td>
94
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">0.039</a></td>
95
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">base</a></td>
96
+ </tr>
97
+ <tr>
98
+ <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">5.0bpw_H6</a></td>
99
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">5.0</a></td>
100
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">6</a></td>
101
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">15.9</a></td>
102
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">0.015</a></td>
103
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">base</a></td>
104
+ </tr>
105
+ <tr>
106
+ <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">6.0bpw_H6</a></td>
107
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">6.0</a></td>
108
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">6</a></td>
109
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">19.0</a></td>
110
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">ref</a></td>
111
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">base</a></td>
112
+ </tr>
113
+ <tr>
114
+ <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">7.0bpw_H6</a></td>
115
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">7.0</a></td>
116
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">6</a></td>
117
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">~22</a></td>
118
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">-</a></td>
119
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">base</a></td>
120
+ </tr>
121
+ <tr>
122
+ <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">8.0bpw_H6</a></td>
123
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">8.0</a></td>
124
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">6</a></td>
125
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">~29</a></td>
126
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">-</a></td>
127
+ <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">base</a></td>
128
+ </tr>
129
+ </tbody>
130
+ </table>
131
+ <p class="subtitle">Optimized variants use KLD-guided tensor mixing + attn@5bpw recompile. Bases are direct converts. 7.0/8.0bpw KLD not measured (exceed 32 GB VRAM).</p>
132
+ </div>
133
+
134
+ <div class="card-dark">
135
+ <h2>Download</h2>
136
+ <details>
137
+ <summary>Download commands</summary>
138
+ <div class="details-content">
139
+ <b>Install CLI:</b>
140
+ <pre><code>pip install -U "huggingface_hub[cli]"</code></pre>
141
+ <b>Download a specific quant:</b>
142
+ <pre><code>huggingface-cli download WeReCooking/Gemma-3-R1984-27B-EXL3 --revision "4.0bpw_H6" --local-dir ./</code></pre>
143
+ </div>
144
+ </details>
145
+ <p class="subtitle">EXL3 quants run with <a href="https://github.com/theroyallab/tabbyapi">TabbyAPI</a> or any exllamav3-compatible backend.</p>
146
+ </div>
147
+
148
+ <div class="card-dark">
149
+ <h2>Build Details</h2>
150
+ <details>
151
+ <summary>How these were made</summary>
152
+ <div class="details-content">
153
+ <p><b>Base quants:</b> <code>convert.py -b &lt;bpw&gt;</code> (2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)</p>
154
+ <p><b>KLD measurement:</b> <code>measure.py -r &lt;ref&gt; -ms 128 -i &lt;2.0bpw&gt; &lt;8.0bpw&gt;</code></p>
155
+ <p><b>Optimized (2.50, 3.35):</b> <code>optimize.py -i &lt;lo&gt; &lt;hi&gt; -m measurement.json -b &lt;target&gt;</code> then <code>recompile.py -or override.yaml</code> with <code>*.self_attn.* -&gt; 5bpw</code></p>
156
+ <p><b>Note:</b> Gemma-3 is dense (no MoE), so <code>*.shared_experts.*</code> is not applicable. Only optimized variants are recompiled; bases stay at exact bpw.</p>
157
+ <p>Docs: <a href="https://github.com/turboderp-org/exllamav3/blob/master/doc/convert.md">exllamav3 convert.md</a></p>
158
+ </div>
159
+ </details>
160
+ </div>
161
+
162
+ <div class="card-dark">
163
+ <h2>Files</h2>
164
+ <p><code>main</code> branch: <code>measurement.json</code> (KLD map) + <code>kld_plot.png</code></p>
165
+ <p>Each bpw branch: quantized model shards + config + tokenizer</p>
166
+ </div>
167
+
168
+ </div>
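
The new "Quants" table pairs each branch with a VRAM figure, which is what most readers will use to decide which branch to download. The sketch below is one way to turn that table into a lookup: it returns the largest branch whose listed VRAM fits a given budget. The function name `pick_branch` and the `headroom_gb` parameter are hypothetical and not part of the README or of exllamav3. The figures are copied from the table, which does not say whether they include KV cache, so treat the result as a starting point rather than a guarantee.

```python
from typing import Optional

# VRAM figures (GB) copied from the "Quants" table in the new README.
# The 7.0 and 8.0 bpw rows are listed there as approximate (~22, ~29).
BRANCH_VRAM_GB = {
    "2.0bpw_H6": 7.0,
    "2.50bpw_H6": 8.5,
    "3.0bpw_H6": 9.9,
    "3.35bpw_H6": 11.0,
    "4.0bpw_H6": 12.9,
    "5.0bpw_H6": 15.9,
    "6.0bpw_H6": 19.0,
    "7.0bpw_H6": 22.0,
    "8.0bpw_H6": 29.0,
}


def pick_branch(budget_gb: float, headroom_gb: float = 0.0) -> Optional[str]:
    """Return the largest branch whose listed VRAM fits the budget.

    headroom_gb is a hypothetical knob for reserving memory (for example
    for context cache); it is subtracted from the budget before comparing.
    Returns None when no branch fits.
    """
    limit = budget_gb - headroom_gb
    fitting = {name: gb for name, gb in BRANCH_VRAM_GB.items() if gb <= limit}
    if not fitting:
        return None
    # The branch with the highest listed VRAM is also the highest bpw here.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    # Example: a 16 GB card with no reserved headroom.
    print(pick_branch(16.0))
```

The returned name can be passed as the `--revision` value of the `huggingface-cli download` command shown in the README's Download section.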