---
base_model: VIDraft/Gemma-3-R1984-27B
base_model_relation: quantized
quantized_by: WeReCooking
pipeline_tag: text-generation
tags:
- exl3
---
<style>
.container-dark { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Arial, sans-serif; line-height: 1.6; color: #d4d4d4; }
a { color: #569cd6; text-decoration: none; font-weight: 600; }
a:hover { text-decoration: underline; }
.card-dark { background-color: #252526; border-radius: 12px; padding: 24px; margin-bottom: 20px; box-shadow: 0 4px 12px rgba(0,0,0,0.3); border: 1px solid #3c3c3c; }
.card-dark h1 { font-size: 2.2em; color: #ffffff; text-align: center; margin-bottom: 10px; }
.card-dark .subtitle { text-align: center; font-size: 1.1em; color: #a0a0a0; }
.card-dark h2 { font-size: 1.5em; margin-top: 0; padding-bottom: 10px; border-bottom: 1px solid #3c3c3c; color: #c586c0; }
.styled-table { display: table; border: none; width: 100%; font-size: 0.95em; }
.styled-table thead th { background-color: #333333; color: #c586c0; text-align: left; padding: 12px 15px; }
.styled-table td { padding: 0; border-bottom: 1px solid #3c3c3c; }
.styled-table tbody tr { transition: background-color 0.1s ease; }
.styled-table tbody tr:hover { background-color: #3a3a3a; }
.styled-table tr:last-child td { border-bottom: none; }
.styled-table td a { display: block; padding: 12px 15px; }
.styled-table td a.fake-link { text-decoration:none; color:inherit; }
details { margin-top: 20px; border: 1px solid #3c3c3c; border-radius: 8px; overflow: hidden; }
summary { cursor: pointer; padding: 12px 18px; background-color: #6A5ACD; font-weight: 600; display: flex; align-items: center; gap: 10px; justify-content: space-between; list-style: none; }
summary::-webkit-details-marker { display: none; }
summary:hover { filter: brightness(1.1); }
summary::after { content: ''; display: inline-block; width: 8px; height: 8px; border-bottom: 2px solid white; border-right: 2px solid white; transform: rotate(45deg); transition: transform 0.3s ease; }
details[open] > summary::after { transform: rotate(225deg); }
.details-content { padding: 18px; }
</style>
<div class="container-dark">
<div class="card-dark">
<h1>Gemma-3-R1984-27B EXL3</h1>
<p class="subtitle">
EXL3 quants of <a href="https://huggingface.co/VIDraft/Gemma-3-R1984-27B">VIDraft/Gemma-3-R1984-27B</a>
using <a href="https://github.com/turboderp-org/exllamav3/">exllamav3</a> v0.0.34
</p>
</div>
<div class="card-dark">
<h2>KL Divergence vs VRAM</h2>
<img src="kld_plot.png" alt="KLD plot" style="width:100%; border-radius: 8px;" />
<p class="subtitle">Reference: 6.0bpw. Lower KLD means the quant's output distribution is closer to the reference. Measured on wikitext-2 (20 rows, 2048-token context).</p>
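<p>As a rough illustration of what the KLD column measures, here is a minimal standard-library sketch that averages per-token KL divergence between reference and quantized logits. It assumes the direction KL(P<sub>ref</sub> ‖ P<sub>quant</sub>) with the 6.0bpw model as P<sub>ref</sub>; the function names are hypothetical, and this is not exllamav3's own measurement code.</p>

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_kld(ref_logits: Sequence[float], quant_logits: Sequence[float]) -> float:
    # KL(P_ref || P_quant) = sum_i p_ref[i] * (log p_ref[i] - log p_quant[i])
    # (assumed direction: reference distribution first).
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kld(ref: Sequence[Sequence[float]], quant: Sequence[Sequence[float]]) -> float:
    # Average the per-token divergence over every evaluated token position.
    if len(ref) != len(quant) or not ref:
        raise ValueError("need the same, non-zero number of token positions")
    return sum(token_kld(r, q) for r, q in zip(ref, quant)) / len(ref)
```
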
</div>
<div class="card-dark">
<h2>Quants</h2>
<table class="styled-table">
<thead>
<tr><th>Branch</th><th>BPW</th><th>Head bits</th><th>VRAM (GB)</th><th>KLD</th><th>Type</th></tr>
</thead>
<tbody>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">2.0bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">2.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">7.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">0.450</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">base</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">2.50bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">2.50</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">8.5</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">0.389</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">optimized</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">3.0bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">3.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">9.9</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">0.110</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">base</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">3.35bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">3.35</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">11.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">0.088</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">optimized</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">3.49bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">3.49</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">11.5</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">0.075</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">optimized</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">3.65bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">3.65</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">12.2</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">0.065</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">optimized</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">4.0bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">4.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">12.9</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">0.039</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">base</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">5.0bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">5.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">15.9</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">0.015</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">base</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">6.0bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">6.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">19.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">ref</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">base</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">7.0bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">7.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">~22</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">-</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">base</a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">8.0bpw_H6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">8.0</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">6</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">~29</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">-</a></td>
<td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">base</a></td>
</tr>
</tbody>
</table>
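<p class="subtitle">Optimized variants mix tensors from a lower- and a higher-bpw base quant, guided by the KLD measurement, then recompile attention layers at 5 bpw. Base variants are direct conversions. KLD was not measured for 7.0/8.0bpw (those runs exceed 32 GB VRAM).</p>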
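<p>A hypothetical helper for choosing a branch from this table: it returns the largest quant whose listed VRAM fits a GPU budget. This card does not state whether the listed figures include the context cache, so the sketch subtracts a caller-chosen <code>headroom_gb</code>. That reserve is an assumption, not a measured value.</p>

```python
from typing import Optional

# (branch, VRAM in GB, KLD vs 6.0bpw) copied from the Quants table.
# None marks a KLD that was not measured (7.0/8.0) or is the reference itself (6.0).
QUANTS = [
    ("2.0bpw_H6", 7.0, 0.450),
    ("2.50bpw_H6", 8.5, 0.389),
    ("3.0bpw_H6", 9.9, 0.110),
    ("3.35bpw_H6", 11.0, 0.088),
    ("3.49bpw_H6", 11.5, 0.075),
    ("3.65bpw_H6", 12.2, 0.065),
    ("4.0bpw_H6", 12.9, 0.039),
    ("5.0bpw_H6", 15.9, 0.015),
    ("6.0bpw_H6", 19.0, None),  # reference
    ("7.0bpw_H6", 22.0, None),  # table lists ~22 GB
    ("8.0bpw_H6", 29.0, None),  # table lists ~29 GB
]


def pick_branch(budget_gb: float, headroom_gb: float) -> Optional[str]:
    """Return the largest branch whose listed VRAM fits budget_gb - headroom_gb."""
    usable = budget_gb - headroom_gb
    fitting = [name for name, vram_gb, _kld in QUANTS if vram_gb <= usable]
    # QUANTS is ordered by size, so the last fitting entry is the largest one.
    return fitting[-1] if fitting else None
```

<p>For example, <code>pick_branch(24, 4)</code> returns <code>"6.0bpw_H6"</code>, and <code>pick_branch(12, 0)</code> returns <code>"3.49bpw_H6"</code>.</p>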
</div>
<div class="card-dark">
<h2>Download</h2>
<details>
<summary>Download commands</summary>
<div class="details-content">
<b>Install CLI:</b>
<pre><code>pip install -U "huggingface_hub[cli]"</code></pre>
<b>Download a specific quant</b> (set <code>--revision</code> to any branch name from the table):
<pre><code>huggingface-cli download WeReCooking/Gemma-3-R1984-27B-EXL3 --revision "4.0bpw_H6" --local-dir ./Gemma-3-R1984-27B-EXL3-4.0bpw_H6</code></pre>
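<p>The same download can be scripted with <code>huggingface_hub.snapshot_download</code>, which takes a branch name as <code>revision</code>. <code>branch_for</code> and <code>download</code> are illustrative helpers, not part of this repo.</p>

```python
# Branch names exactly as listed in the Quants table.
BRANCHES = [
    "2.0bpw_H6", "2.50bpw_H6", "3.0bpw_H6", "3.35bpw_H6", "3.49bpw_H6",
    "3.65bpw_H6", "4.0bpw_H6", "5.0bpw_H6", "6.0bpw_H6", "7.0bpw_H6", "8.0bpw_H6",
]
REPO_ID = "WeReCooking/Gemma-3-R1984-27B-EXL3"


def branch_for(bpw: float) -> str:
    # Branch names mix "2.0" and "2.50" styles, so compare the prefixes numerically.
    for name in BRANCHES:
        if float(name.split("bpw")[0]) == bpw:
            return name
    raise KeyError(f"no branch for {bpw} bpw")


def download(bpw: float, local_dir: str) -> str:
    # Imported lazily so branch_for() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # Returns the local path of the downloaded branch.
    return snapshot_download(repo_id=REPO_ID, revision=branch_for(bpw), local_dir=local_dir)
```

<p>Calling <code>download(4.0, "./Gemma-3-R1984-27B-EXL3-4.0bpw_H6")</code> fetches the same branch as the CLI command above.</p>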
</div>
</details>
<p class="subtitle">EXL3 quants run with <a href="https://github.com/theroyallab/tabbyapi">TabbyAPI</a> or any exllamav3-compatible backend.</p>
</div>
<div class="card-dark">
<h2>Build Details</h2>
<details>
<summary>How these were made</summary>
<div class="details-content">
<p><b>Base quants:</b> <code>convert.py -b &lt;bpw&gt;</code> (2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)</p>
<p><b>KLD measurement:</b> <code>measure.py -r &lt;ref&gt; -ms 128 -i &lt;2.0bpw&gt; &lt;8.0bpw&gt;</code></p>
<p><b>Optimized (2.50, 3.35, 3.49, 3.65):</b> <code>optimize.py -i &lt;lo&gt; &lt;hi&gt; -m measurement.json -b &lt;target&gt;</code>, then <code>recompile.py -or override.yaml</code> with <code>*.self_attn.* -&gt; 5bpw</code></p>
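<p>To illustrate the general idea of KLD-guided mixing, the sketch below treats it as a budget problem. Each tensor can come from the low- or high-bpw quant, upgrading it costs extra bits, and tensors with the best estimated KLD reduction per extra bit are upgraded first until the target average bpw is reached. This is a hypothetical greedy approximation: the per-tensor gains, the assumption that gains add up, and all names are made up, and it is not necessarily how <code>optimize.py</code> works.</p>

```python
from dataclasses import dataclass


@dataclass
class TensorOption:
    name: str
    n_params: int
    kld_gain: float  # hypothetical: KLD reduction if this tensor uses the high-bpw version


def mix(tensors: list[TensorOption], lo_bpw: float, hi_bpw: float, target_bpw: float) -> set[str]:
    """Return the names of tensors to take from the high-bpw quant."""
    if hi_bpw <= lo_bpw:
        raise ValueError("hi_bpw must exceed lo_bpw")
    total = sum(t.n_params for t in tensors)
    # Extra bits available above an all-low model while staying at or under target_bpw.
    budget = (target_bpw - lo_bpw) * total
    extra = hi_bpw - lo_bpw
    # Greedy: try tensors with the best KLD gain per extra bit first.
    ranked = sorted(tensors, key=lambda t: t.kld_gain / (t.n_params * extra), reverse=True)
    chosen: set[str] = set()
    for t in ranked:
        cost = t.n_params * extra
        if cost <= budget:  # skip tensors that no longer fit the remaining budget
            chosen.add(t.name)
            budget -= cost
    return chosen
```
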
<p><b>Note:</b> Gemma-3 is dense (no MoE), so <code>*.shared_experts.*</code> is not applicable. Only optimized variants are recompiled; bases stay at exact bpw.</p>
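<p>A quick way to see why <code>*.self_attn.*</code> selects attention tensors while <code>*.shared_experts.*</code> matches nothing in a dense model is to apply both globs with Python's <code>fnmatch</code>. This assumes exllamav3 treats override keys as shell-style wildcards, and the tensor names are illustrative rather than read from this checkpoint.</p>

```python
from fnmatch import fnmatchcase

# Illustrative tensor names for a dense decoder layer; the real checkpoint's names
# may differ (for example, an extra language-model prefix).
TENSORS = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.mlp.down_proj",
    "lm_head",
]


def matching(pattern: str, names: list[str]) -> list[str]:
    # Assumption: override keys behave like shell-style globs,
    # where * matches any run of characters, including dots.
    return [n for n in names if fnmatchcase(n, pattern)]


print(matching("*.self_attn.*", TENSORS))  # → ['model.layers.0.self_attn.q_proj', 'model.layers.0.self_attn.o_proj']
print(matching("*.shared_experts.*", TENSORS))  # → []
```
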
<p>Docs: <a href="https://github.com/turboderp-org/exllamav3/blob/master/doc/convert.md">exllamav3 convert.md</a></p>
</div>
</details>
</div>
<div class="card-dark">
<h2>Files</h2>
<p><code>main</code> branch: <code>measurement.json</code> (KLD map) + <code>kld_plot.png</code></p>
<p>Each bpw branch: quantized model shards + config + tokenizer</p>
</div>
</div>