---
base_model: VIDraft/Gemma-3-R1984-27B
base_model_relation: quantized
quantized_by: WeReCooking
pipeline_tag: text-generation
tags:
- exl3
---
<style>
.container-dark { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Arial, sans-serif; line-height: 1.6; color: #d4d4d4; }
a { color: #569cd6; text-decoration: none; font-weight: 600; }
a:hover { text-decoration: underline; }
.card-dark { background-color: #252526; border-radius: 12px; padding: 24px; margin-bottom: 20px; box-shadow: 0 4px 12px rgba(0,0,0,0.3); border: 1px solid #3c3c3c; }
.card-dark h1 { font-size: 2.2em; color: #ffffff; text-align: center; margin-bottom: 10px; }
.card-dark .subtitle { text-align: center; font-size: 1.1em; color: #a0a0a0; }
.card-dark h2 { font-size: 1.5em; margin-top: 0; padding-bottom: 10px; border-bottom: 1px solid #3c3c3c; color: #c586c0; }
.styled-table { display: table; border: none; width: 100%; font-size: 0.95em; }
.styled-table thead th { background-color: #333333; color: #c586c0; text-align: left; padding: 12px 15px; }
.styled-table td { padding: 0; border-bottom: 1px solid #3c3c3c; }
.styled-table tbody tr { transition: background-color 0.1s ease; }
.styled-table tbody tr:hover { background-color: #3a3a3a; }
.styled-table tr:last-child td { border-bottom: none; }
.styled-table td a { display: block; padding: 12px 15px; }
.styled-table td a.fake-link { text-decoration:none; color:inherit; }
details { margin-top: 20px; border: 1px solid #3c3c3c; border-radius: 8px; overflow: hidden; }
summary { cursor: pointer; padding: 12px 18px; background-color: #6A5ACD; font-weight: 600; display: flex; align-items: center; gap: 10px; justify-content: space-between; list-style: none; }
summary::-webkit-details-marker { display: none; }
summary:hover { filter: brightness(1.1); }
summary::after { content: ''; display: inline-block; width: 8px; height: 8px; border-bottom: 2px solid white; border-right: 2px solid white; transform: rotate(45deg); transition: transform 0.3s ease; }
details[open] > summary::after { transform: rotate(225deg); }
.details-content { padding: 18px; }
</style>
<div class="container-dark">
<div class="card-dark">
<h1>Gemma-3-R1984-27B EXL3</h1>
<p class="subtitle">
EXL3 quants of <a href="https://huggingface.co/VIDraft/Gemma-3-R1984-27B">VIDraft/Gemma-3-R1984-27B</a>
using <a href="https://github.com/turboderp-org/exllamav3/">exllamav3</a> v0.0.34
</p>
</div>
<div class="card-dark">
<h2>KL Divergence vs VRAM</h2>
<img src="kld_plot.png" alt="KLD plot" style="width:100%; border-radius: 8px;" />
<p class="subtitle">Reference: 6.0bpw. Lower KLD = closer to reference quality. Measured on wikitext-2 (20 rows, 2048 ctx).</p>
</div>
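
As a rough illustration of what the KLD column reports, the sketch below computes the KL divergence between a reference distribution and a quantized model's distribution at a single token position. It is not the exllamav3 measurement code. The direction KL(reference || quant), the natural-log units, and the toy logits are all assumptions made for the example.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(ref_logits, quant_logits):
    """KL(P_ref || P_quant) in nats for one token position (assumed direction)."""
    p = softmax(ref_logits)
    q = softmax(quant_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Toy logits over a 3-token vocabulary (hypothetical values).
ref = [2.0, 1.0, 0.1]
same = kl_divergence(ref, ref)               # identical distributions give 0.0
drifted = kl_divergence(ref, [1.5, 1.2, 0.3])  # any mismatch gives a positive value
print(same, drifted)
```

A real measurement would average this quantity over every token position in the evaluation text, which is why lower values indicate a quant that tracks the reference more closely.
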
<div class="card-dark">
<h2>Quants</h2>
<table class="styled-table">
<thead>
<tr><th>Branch</th><th>BPW</th><th>Head</th><th>VRAM (GB)</th><th>KLD</th><th>Type</th></tr>
</thead>
<tbody>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">2.0bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">2.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">7.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">0.450</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">base</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">2.50bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">2.50</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">8.5</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">0.389</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">optimized</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">3.0bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">3.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">9.9</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">0.110</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">base</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">3.35bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">3.35</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">11.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">0.088</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">optimized</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">3.49bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">3.49</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">11.5</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">0.075</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">optimized</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">3.65bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">3.65</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">12.2</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">0.065</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">optimized</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">4.0bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">4.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">12.9</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">0.039</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">base</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">5.0bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">5.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">15.9</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">0.015</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">base</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">6.0bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">6.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">19.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">ref</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">base</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">7.0bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">7.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">~22</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">-</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">base</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">8.0bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">8.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">~29</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">-</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">base</a></td>
</tr>
</tbody>
</table>
<p class="subtitle">Optimized variants use KLD-guided tensor mixing, then a recompile with attention tensors at 5bpw. Base variants are direct conversions. KLD was not measured for 7.0/8.0bpw (exceeds 32 GB VRAM).</p>
</div>
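
One way to read the table above is to pick the largest branch whose listed VRAM figure fits a given card. The sketch below does that with the values copied from the table. The `headroom_gb` parameter is a hypothetical allowance for context cache and runtime overhead, not a number from this card, and the 7.0/8.0bpw rows are left out because their VRAM figures are only approximate.

```python
# (branch, VRAM in GB) copied from the quant table; 7.0/8.0bpw omitted (approximate).
QUANTS = [
    ("2.0bpw_H6", 7.0),
    ("2.50bpw_H6", 8.5),
    ("3.0bpw_H6", 9.9),
    ("3.35bpw_H6", 11.0),
    ("3.49bpw_H6", 11.5),
    ("3.65bpw_H6", 12.2),
    ("4.0bpw_H6", 12.9),
    ("5.0bpw_H6", 15.9),
    ("6.0bpw_H6", 19.0),
]

def largest_fitting(budget_gb, headroom_gb=2.0):
    """Return the largest branch that fits the budget, or None if none does.

    headroom_gb is a hypothetical reserve for cache and overhead; tune it
    to the context length you actually plan to run.
    """
    fitting = [q for q in QUANTS if q[1] + headroom_gb <= budget_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda q: q[1])[0]

print(largest_fitting(16.0))
print(largest_fitting(24.0))
```

Since KLD in the table falls steadily as VRAM rises, choosing the largest branch that fits is a reasonable default under these assumptions.
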
<div class="card-dark">
<h2>Download</h2>
<details>
<summary>Download commands</summary>
<div class="details-content">
<b>Install CLI:</b>
<pre><code>pip install -U "huggingface_hub[cli]"</code></pre>
<b>Download a specific quant:</b>
<pre><code>huggingface-cli download WeReCooking/Gemma-3-R1984-27B-EXL3 --revision "4.0bpw_H6" --local-dir ./</code></pre>
</div>
</details>
<p class="subtitle">EXL3 quants run with <a href="https://github.com/theroyallab/tabbyapi">TabbyAPI</a> or any exllamav3-compatible backend.</p>
</div>
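
The same download can be scripted from Python with `snapshot_download` from `huggingface_hub`. This is a minimal sketch: the helper names `branch_for` and `download_quant` are made up for the example, and the branch naming simply follows the pattern visible in the table (`<bpw>bpw_H6`).

```python
REPO_ID = "WeReCooking/Gemma-3-R1984-27B-EXL3"

def branch_for(bpw: str, head_bits: int = 6) -> str:
    """Build a branch name such as '4.0bpw_H6' from the BPW and Head columns."""
    return f"{bpw}bpw_H{head_bits}"

def download_quant(bpw: str, local_dir: str = "./") -> str:
    """Download one quant branch and return the local directory path."""
    # Imported here so branch_for() stays usable without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        revision=branch_for(bpw),  # each quant lives on its own branch
        local_dir=local_dir,
    )
```

Calling `download_quant("4.0")` should fetch the same files as the CLI command above. Pass the bpw as a string so that values like `"2.50"` keep their trailing zero.
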
<div class="card-dark">
<h2>Build Details</h2>
<details>
<summary>How these were made</summary>
<div class="details-content">
<p><b>Base quants:</b> <code>convert.py -b &lt;bpw&gt;</code> (2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)</p>
<p><b>KLD measurement:</b> <code>measure.py -r &lt;ref&gt; -ms 128 -i &lt;2.0bpw&gt; &lt;8.0bpw&gt;</code></p>
<p><b>Optimized (2.50, 3.35, 3.49, 3.65):</b> <code>optimize.py -i &lt;lo&gt; &lt;hi&gt; -m measurement.json -b &lt;target&gt;</code>, then <code>recompile.py -or override.yaml</code> with <code>*.self_attn.* -&gt; 5bpw</code></p>
<p><b>Note:</b> Gemma-3 is dense (no MoE), so <code>*.shared_experts.*</code> is not applicable. Only optimized variants are recompiled; bases stay at exact bpw.</p>
<p>Docs: <a href="https://github.com/turboderp-org/exllamav3/blob/master/doc/convert.md">exllamav3 convert.md</a></p>
</div>
</details>
</div>
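
To make the `*.self_attn.*` override pattern concrete, the sketch below shows which tensor keys a shell-style glob of that shape would select. It assumes the override pattern behaves like Python's `fnmatch` globbing, which this card does not confirm, and the tensor keys are hypothetical examples rather than names read from the model.

```python
from fnmatch import fnmatchcase

PATTERN = "*.self_attn.*"

# Hypothetical tensor keys, for illustration only.
keys = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.down_proj",
    "model.layers.0.mlp.shared_experts.up_proj",  # MoE-style key, absent in a dense model
]

# Keys that the attention override would raise to 5bpw under this assumption.
matched = [k for k in keys if fnmatchcase(k, PATTERN)]
print(matched)
```

Under that assumption only the attention projections are matched, and the MLP tensors keep whatever bitrate the optimizer assigned them.
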
<div class="card-dark">
<h2>Files</h2>
<p><code>main</code> branch: <code>measurement.json</code> (KLD map) + <code>kld_plot.png</code></p>
<p>Each bpw branch: quantized model shards + config + tokenizer</p>
</div>
</div>