---
base_model: VIDraft/Gemma-3-R1984-27B
base_model_relation: quantized
quantized_by: WeReCooking
pipeline_tag: text-generation
tags:
- exl3
---
<style>
  .container-dark { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Arial, sans-serif; line-height: 1.6; color: #d4d4d4; }
  a { color: #569cd6; text-decoration: none; font-weight: 600; }
  a:hover { text-decoration: underline; }
  .card-dark { background-color: #252526; border-radius: 12px; padding: 24px; margin-bottom: 20px; box-shadow: 0 4px 12px rgba(0,0,0,0.3); border: 1px solid #3c3c3c; }
  .card-dark h1 { font-size: 2.2em; color: #ffffff; text-align: center; margin-bottom: 10px; }
  .card-dark .subtitle { text-align: center; font-size: 1.1em; color: #a0a0a0; }
  .card-dark h2 { font-size: 1.5em; margin-top: 0; padding-bottom: 10px; border-bottom: 1px solid #3c3c3c; color: #c586c0; }
  .styled-table { display: table; border: none; width: 100%; font-size: 0.95em; }
  .styled-table thead th { background-color: #333333; color: #c586c0; text-align: left; padding: 12px 15px; }
  .styled-table td { padding: 0; border-bottom: 1px solid #3c3c3c; }
  .styled-table tbody tr { transition: background-color 0.1s ease; }
  .styled-table tbody tr:hover { background-color: #3a3a3a; }
  .styled-table tr:last-child td { border-bottom: none; }
  .styled-table td a { display: block; padding: 12px 15px; }
  .styled-table td a.fake-link { text-decoration:none; color:inherit; }
  details { margin-top: 20px; border: 1px solid #3c3c3c; border-radius: 8px; overflow: hidden; }
  summary { cursor: pointer; padding: 12px 18px; background-color: #6A5ACD; font-weight: 600; display: flex; align-items: center; gap: 10px; justify-content: space-between; list-style: none; }
  summary::-webkit-details-marker { display: none; }
  summary:hover { filter: brightness(1.1); }
  summary::after { content: ''; display: inline-block; width: 8px; height: 8px; border-bottom: 2px solid white; border-right: 2px solid white; transform: rotate(45deg); transition: transform 0.3s ease; }
  details[open] > summary::after { transform: rotate(225deg); }
  .details-content { padding: 18px; }
</style>

<div class="container-dark">

  <div class="card-dark">
    <h1>Gemma-3-R1984-27B EXL3</h1>
    <p class="subtitle">
      EXL3 quants of <a href="https://huggingface.co/VIDraft/Gemma-3-R1984-27B">VIDraft/Gemma-3-R1984-27B</a>
      using <a href="https://github.com/turboderp-org/exllamav3/">exllamav3</a> v0.0.34
    </p>
  </div>

  <div class="card-dark">
    <h2>KL Divergence vs VRAM</h2>
    <img src="kld_plot.png" alt="KLD plot" style="width:100%; border-radius: 8px;" />
    <p class="subtitle">Reference: 6.0bpw. Lower KLD = closer to reference quality. Measured on wikitext-2 (20 rows, 2048 ctx).</p>
  </div>
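
As a rough illustration of what the KLD figure means, the sketch below computes the KL divergence between a reference and a quantized next-token distribution from raw logits, then averages over positions. The direction `KL(ref || quant)`, the use of nats, and the plain mean are assumptions made for illustration; the exllamav3 measurement script may differ in these details.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(ref_logits, quant_logits):
    """KL(P_ref || P_quant) in nats for a single token position (assumed direction)."""
    p = softmax(ref_logits)
    q = softmax(quant_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mean_kld(ref_rows, quant_rows):
    """Average the per-position KL divergence over a sequence of positions."""
    values = [kl_divergence(r, q) for r, q in zip(ref_rows, quant_rows)]
    return sum(values) / len(values)
```

Identical logits give a divergence of zero, which is why the reference quant has no KLD value of its own.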

  <div class="card-dark">
    <h2>Quants</h2>
    <table class="styled-table">
      <thead>
        <tr><th>Branch</th><th>BPW</th><th>Head</th><th>VRAM (GB)</th><th>KLD</th><th>Type</th></tr>
      </thead>
      <tbody>
        <tr>
          <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">2.0bpw_H6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">2.0</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">7.0</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">0.450</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">base</a></td>
        </tr>
        <tr>
          <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">2.50bpw_H6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">2.50</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">8.5</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">0.389</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">optimized</a></td>
        </tr>
        <tr>
          <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">3.0bpw_H6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">3.0</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">9.9</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">0.110</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">base</a></td>
        </tr>
        <tr>
          <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">3.35bpw_H6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">3.35</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">11.0</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">0.088</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">optimized</a></td>
        </tr>
        <tr>
          <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">3.49bpw_H6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">3.49</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">11.5</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">0.075</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">optimized</a></td>
        </tr>
        <tr>
          <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">3.65bpw_H6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">3.65</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">12.2</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">0.065</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">optimized</a></td>
        </tr>
        <tr>
          <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">4.0bpw_H6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">4.0</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">12.9</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">0.039</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">base</a></td>
        </tr>
        <tr>
          <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">5.0bpw_H6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">5.0</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">15.9</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">0.015</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">base</a></td>
        </tr>
        <tr>
          <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">6.0bpw_H6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">6.0</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">19.0</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">ref</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">base</a></td>
        </tr>
        <tr>
          <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">7.0bpw_H6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">7.0</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">~22</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">-</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">base</a></td>
        </tr>
        <tr>
          <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">8.0bpw_H6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">8.0</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">6</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">~29</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">-</a></td>
          <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">base</a></td>
        </tr>
      </tbody>
    </table>
    <p class="subtitle">Optimized variants use KLD-guided tensor mixing followed by a recompile with attention tensors at 5bpw. Base variants are direct conversions. KLD was not measured for 7.0bpw and 8.0bpw (exceeds 32 GB VRAM).</p>
  </div>
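
One way to read the table is to pick the lowest-KLD branch that fits a given VRAM budget. The sketch below does that with the figures copied from the table. The `headroom_gb` parameter is an assumption: the table does not say whether its VRAM column includes KV cache or activations, so some margin is reserved by default.

```python
# (branch, VRAM in GB, KLD vs. the 6.0bpw reference), copied from the table.
# 7.0bpw and 8.0bpw are left out: their VRAM is approximate and KLD unmeasured.
QUANTS = [
    ("2.0bpw_H6", 7.0, 0.450),
    ("2.50bpw_H6", 8.5, 0.389),
    ("3.0bpw_H6", 9.9, 0.110),
    ("3.35bpw_H6", 11.0, 0.088),
    ("3.49bpw_H6", 11.5, 0.075),
    ("3.65bpw_H6", 12.2, 0.065),
    ("4.0bpw_H6", 12.9, 0.039),
    ("5.0bpw_H6", 15.9, 0.015),
    ("6.0bpw_H6", 19.0, 0.0),  # reference quant: KLD against itself is zero
]

def pick_quant(vram_gb, headroom_gb=2.0):
    """Return the lowest-KLD branch whose VRAM fits the budget, or None."""
    budget = vram_gb - headroom_gb  # headroom is an assumed safety margin
    fitting = [q for q in QUANTS if q[1] <= budget]
    if not fitting:
        return None
    return min(fitting, key=lambda q: q[2])[0]
```

With the default margin, `pick_quant(16.0)` selects `4.0bpw_H6` and `pick_quant(24.0)` selects `6.0bpw_H6`.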

  <div class="card-dark">
    <h2>Download</h2>
    <details>
      <summary>Download commands</summary>
      <div class="details-content">
        <b>Install CLI:</b>
        <pre><code>pip install -U "huggingface_hub[cli]"</code></pre>
        <b>Download a specific quant:</b>
        <pre><code>huggingface-cli download WeReCooking/Gemma-3-R1984-27B-EXL3 --revision "4.0bpw_H6" --local-dir ./</code></pre>
      </div>
    </details>
    <p class="subtitle">EXL3 quants run with <a href="https://github.com/theroyallab/tabbyapi">TabbyAPI</a> or any exllamav3-compatible backend.</p>
  </div>
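
The same download can likely be scripted from Python with `huggingface_hub.snapshot_download`, passing the branch name as `revision`. This is a minimal sketch: the helper names and the `./models` target layout are hypothetical, and the import is deferred so the argument builder works without the package installed.

```python
from pathlib import Path

REPO_ID = "WeReCooking/Gemma-3-R1984-27B-EXL3"

def download_kwargs(branch, target_root="./models"):
    """Build keyword arguments for snapshot_download (hypothetical helper)."""
    # One directory per branch so several quants can sit side by side.
    local_dir = Path(target_root) / f"Gemma-3-R1984-27B-EXL3-{branch}"
    return {"repo_id": REPO_ID, "revision": branch, "local_dir": str(local_dir)}

def download_quant(branch, target_root="./models"):
    """Download one quant branch and return the local directory path."""
    # Deferred import: requires `pip install -U huggingface_hub`.
    from huggingface_hub import snapshot_download
    return snapshot_download(**download_kwargs(branch, target_root))
```

Calling `download_quant("4.0bpw_H6")` would fetch the same branch as the CLI command above, into its own subdirectory rather than the current directory.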

  <div class="card-dark">
    <h2>Build Details</h2>
    <details>
      <summary>How these were made</summary>
      <div class="details-content">
        <p><b>Base quants:</b> <code>convert.py -b &lt;bpw&gt;</code> (2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)</p>
        <p><b>KLD measurement:</b> <code>measure.py -r &lt;ref&gt; -ms 128 -i &lt;2.0bpw&gt; &lt;8.0bpw&gt;</code></p>
        <p><b>Optimized (2.50, 3.35, 3.49, 3.65):</b> <code>optimize.py -i &lt;lo&gt; &lt;hi&gt; -m measurement.json -b &lt;target&gt;</code>, then <code>recompile.py -or override.yaml</code> with <code>*.self_attn.* -&gt; 5bpw</code></p>
        <p><b>Note:</b> Gemma-3 is dense (no MoE), so <code>*.shared_experts.*</code> is not applicable. Only optimized variants are recompiled; bases stay at exact bpw.</p>
        <p>Docs: <a href="https://github.com/turboderp-org/exllamav3/blob/master/doc/convert.md">exllamav3 convert.md</a></p>
      </div>
    </details>
  </div>
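
To make the `*.self_attn.*` override pattern concrete, the sketch below checks which tensor keys such a glob would select. It assumes shell-style glob matching, which may not be exactly how exllamav3 resolves override keys, and the tensor names are hypothetical examples rather than names read from this model.

```python
from fnmatch import fnmatchcase

ATTN_PATTERN = "*.self_attn.*"

# Hypothetical tensor keys, for illustration only.
EXAMPLE_KEYS = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.down_proj",
    "model.layers.1.mlp.gate_proj",
]

def select_overrides(keys, pattern=ATTN_PATTERN):
    """Return the keys that a shell-style glob pattern would match."""
    return [k for k in keys if fnmatchcase(k, pattern)]

attention_keys = select_overrides(EXAMPLE_KEYS)
```

Under these assumptions only the two `self_attn` projections are selected, so the MLP tensors keep whatever bitrate the optimizer assigned them.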

  <div class="card-dark">
    <h2>Files</h2>
    <p><code>main</code> branch: <code>measurement.json</code> (KLD map) + <code>kld_plot.png</code></p>
    <p>Each bpw branch: quantized model shards + config + tokenizer</p>
  </div>

</div>