| --- |
| base_model: VIDraft/Gemma-3-R1984-27B |
| base_model_relation: quantized |
| quantized_by: WeReCooking |
| pipeline_tag: text-generation |
| tags: |
| - exl3 |
| --- |
| <style> |
| .container-dark { font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Arial, sans-serif; line-height: 1.6; color: #d4d4d4; } |
| a { color: #569cd6; text-decoration: none; font-weight: 600; } |
| a:hover { text-decoration: underline; } |
| .card-dark { background-color: #252526; border-radius: 12px; padding: 24px; margin-bottom: 20px; box-shadow: 0 4px 12px rgba(0,0,0,0.3); border: 1px solid #3c3c3c; } |
| .card-dark h1 { font-size: 2.2em; color: #ffffff; text-align: center; margin-bottom: 10px; } |
| .card-dark .subtitle { text-align: center; font-size: 1.1em; color: #a0a0a0; } |
| .card-dark h2 { font-size: 1.5em; margin-top: 0; padding-bottom: 10px; border-bottom: 1px solid #3c3c3c; color: #c586c0; } |
| .styled-table { display: table; border: none; width: 100%; font-size: 0.95em; } |
| .styled-table thead th { background-color: #333333; color: #c586c0; text-align: left; padding: 12px 15px; } |
| .styled-table td { padding: 0; border-bottom: 1px solid #3c3c3c; } |
| .styled-table tbody tr { transition: background-color 0.1s ease; } |
| .styled-table tbody tr:hover { background-color: #3a3a3a; } |
| .styled-table tr:last-child td { border-bottom: none; } |
| .styled-table td a { display: block; padding: 12px 15px; } |
| .styled-table td a.fake-link { text-decoration:none; color:inherit; } |
| details { margin-top: 20px; border: 1px solid #3c3c3c; border-radius: 8px; overflow: hidden; } |
| summary { cursor: pointer; padding: 12px 18px; background-color: #6A5ACD; font-weight: 600; display: flex; align-items: center; gap: 10px; justify-content: space-between; list-style: none; } |
| summary::-webkit-details-marker { display: none; } |
| summary:hover { filter: brightness(1.1); } |
| summary::after { content: ''; display: inline-block; width: 8px; height: 8px; border-bottom: 2px solid white; border-right: 2px solid white; transform: rotate(45deg); transition: transform 0.3s ease; } |
| details[open] > summary::after { transform: rotate(225deg); } |
| .details-content { padding: 18px; } |
| </style> |
|
|
| <div class="container-dark"> |
|
|
| <div class="card-dark"> |
| <h1>Gemma-3-R1984-27B EXL3</h1> |
| <p class="subtitle"> |
| EXL3 quants of <a href="https://huggingface.co/VIDraft/Gemma-3-R1984-27B">VIDraft/Gemma-3-R1984-27B</a> |
| using <a href="https://github.com/turboderp-org/exllamav3/">exllamav3</a> v0.0.34 |
| </p> |
| </div> |
| |
| <div class="card-dark"> |
| <h2>KL Divergence vs VRAM</h2> |
| <img src="kld_plot.png" alt="KLD plot" style="width:100%; border-radius: 8px;" /> |
| <p class="subtitle">Reference: 6.0bpw. Lower KLD = closer to reference quality. Measured on wikitext-2 (20 rows, 2048 ctx).</p> |
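
<p>For readers unfamiliar with the metric, here is a minimal sketch of a per-token KL divergence between a reference and a quantized model. It is illustrative only and is not exllamav3's measurement code. The direction KL(reference || quant), the use of nats, and the names <code>log_softmax</code>, <code>kl_divergence</code> and <code>mean_kld</code> are assumptions made for this example.</p>

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_divergence(ref_logits, quant_logits):
    """KL(P_ref || P_quant) in nats for a single token position.

    Assumption: the reference distribution is the first argument.
    """
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(log_p, log_q))


def mean_kld(ref_rows, quant_rows):
    """Average the per-position KL divergence over all positions."""
    values = [kl_divergence(r, q) for r, q in zip(ref_rows, quant_rows)]
    return sum(values) / len(values)
```

<p>Identical logits give a divergence of exactly zero, and the value grows as the quantized distribution drifts away from the reference.</p>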
| </div> |
| |
| <div class="card-dark"> |
| <h2>Quants</h2> |
| <table class="styled-table"> |
| <thead> |
| <tr><th>Branch</th><th>BPW</th><th>Head</th><th>VRAM (GB)</th><th>KLD</th><th>Type</th></tr> |
| </thead> |
| <tbody> |
| <tr> |
| <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">2.0bpw_H6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">2.0</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">7.0</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">0.450</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.0bpw_H6">base</a></td> |
| </tr> |
| <tr> |
| <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">2.50bpw_H6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">2.50</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">8.5</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">0.389</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/2.50bpw_H6">optimized</a></td> |
| </tr> |
| <tr> |
| <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">3.0bpw_H6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">3.0</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">9.9</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">0.110</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.0bpw_H6">base</a></td> |
| </tr> |
| <tr> |
| <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">3.35bpw_H6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">3.35</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">11.0</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">0.088</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.35bpw_H6">optimized</a></td> |
| </tr> |
| <tr> |
| <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">3.49bpw_H6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">3.49</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">11.5</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">0.075</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.49bpw_H6">optimized</a></td> |
| </tr> |
| <tr> |
| <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">3.65bpw_H6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">3.65</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">12.2</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">0.065</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/3.65bpw_H6">optimized</a></td> |
| </tr> |
| <tr> |
| <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">4.0bpw_H6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">4.0</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">12.9</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">0.039</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/4.0bpw_H6">base</a></td> |
| </tr> |
| <tr> |
| <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">5.0bpw_H6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">5.0</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">15.9</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">0.015</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/5.0bpw_H6">base</a></td> |
| </tr> |
| <tr> |
| <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">6.0bpw_H6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">6.0</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">19.0</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">ref</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/6.0bpw_H6">base</a></td> |
| </tr> |
| <tr> |
| <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">7.0bpw_H6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">7.0</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">~22</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">-</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/7.0bpw_H6">base</a></td> |
| </tr> |
| <tr> |
| <td><a href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">8.0bpw_H6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">8.0</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">6</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">~29</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">-</a></td> |
| <td><a class="fake-link" href="https://huggingface.co/WeReCooking/Gemma-3-R1984-27B-EXL3/tree/8.0bpw_H6">base</a></td> |
| </tr> |
| </tbody> |
| </table> |
| <p class="subtitle">Optimized variants use KLD-guided tensor mixing, then a recompile with attention tensors at 5 bpw. Base variants are direct conversions. KLD was not measured for 7.0/8.0bpw (exceed 32 GB VRAM).</p> |
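
<p>As a rough aid for choosing a branch, the sketch below picks the largest quant from the table that fits a given VRAM budget. The branch names and VRAM figures are copied from the table above (the 7.0 and 8.0 values are approximate). The helper <code>pick_quant</code> and its <code>headroom_gb</code> parameter are hypothetical: the table does not state how much extra memory the cache and context need, so treat the default of 2 GB as a placeholder.</p>

```python
# (branch, VRAM in GB) pairs taken from the Quants table.
QUANTS = [
    ("2.0bpw_H6", 7.0),
    ("2.50bpw_H6", 8.5),
    ("3.0bpw_H6", 9.9),
    ("3.35bpw_H6", 11.0),
    ("3.49bpw_H6", 11.5),
    ("3.65bpw_H6", 12.2),
    ("4.0bpw_H6", 12.9),
    ("5.0bpw_H6", 15.9),
    ("6.0bpw_H6", 19.0),
    ("7.0bpw_H6", 22.0),  # listed as ~22 in the table
    ("8.0bpw_H6", 29.0),  # listed as ~29 in the table
]


def pick_quant(vram_gb, headroom_gb=2.0):
    """Return the largest branch whose listed VRAM fits the budget.

    headroom_gb is an assumed reserve for cache and activations,
    not a figure from the model card. Returns None if nothing fits.
    """
    budget = vram_gb - headroom_gb
    fitting = [(branch, vram) for branch, vram in QUANTS if vram <= budget]
    if not fitting:
        return None
    return max(fitting, key=lambda item: item[1])[0]
```

<p>For example, <code>pick_quant(16)</code> leaves a 14 GB budget and selects <code>4.0bpw_H6</code>.</p>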
| </div> |
| |
| <div class="card-dark"> |
| <h2>Download</h2> |
| <details> |
| <summary>Download commands</summary> |
| <div class="details-content"> |
| <b>Install CLI:</b> |
| <pre><code>pip install -U "huggingface_hub[cli]"</code></pre> |
| <b>Download a specific quant:</b> |
| <pre><code>huggingface-cli download WeReCooking/Gemma-3-R1984-27B-EXL3 --revision "4.0bpw_H6" --local-dir ./</code></pre> |
| </div> |
| </details> |
| <p class="subtitle">EXL3 quants run with <a href="https://github.com/theroyallab/tabbyapi">TabbyAPI</a> or any exllamav3-compatible backend.</p> |
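
<p>The same download can be scripted from Python with <code>snapshot_download</code> from <code>huggingface_hub</code>. This is a minimal sketch: the wrapper <code>download_quant</code> and the <code>KNOWN_BRANCHES</code> set are hypothetical helpers written for this example, with branch names copied from the Quants table.</p>

```python
REPO_ID = "WeReCooking/Gemma-3-R1984-27B-EXL3"

# Branch names as listed in the Quants table.
KNOWN_BRANCHES = {
    "2.0bpw_H6", "2.50bpw_H6", "3.0bpw_H6", "3.35bpw_H6",
    "3.49bpw_H6", "3.65bpw_H6", "4.0bpw_H6", "5.0bpw_H6",
    "6.0bpw_H6", "7.0bpw_H6", "8.0bpw_H6",
}


def download_quant(branch, local_dir):
    """Download one quant branch into local_dir and return its path."""
    if branch not in KNOWN_BRANCHES:
        raise ValueError(f"Unknown branch {branch!r}; see the Quants table.")
    # Imported here so the check above works without the package installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        revision=branch,      # each quant lives on its own branch
        local_dir=local_dir,
    )
```

<p>Calling <code>download_quant("4.0bpw_H6", "./Gemma-3-R1984-27B-EXL3-4.0bpw")</code> fetches the same files as the CLI command above. The target directory name is only an example.</p>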
| </div> |
| |
| <div class="card-dark"> |
| <h2>Build Details</h2> |
| <details> |
| <summary>How these were made</summary> |
| <div class="details-content"> |
| <p><b>Base quants:</b> <code>convert.py -b &lt;bpw&gt;</code> (2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0)</p> |
| <p><b>KLD measurement:</b> <code>measure.py -r &lt;ref&gt; -ms 128 -i &lt;2.0bpw&gt; &lt;8.0bpw&gt;</code></p> |
| <p><b>Optimized (2.50, 3.35, 3.49, 3.65):</b> <code>optimize.py -i &lt;lo&gt; &lt;hi&gt; -m measurement.json -b &lt;target&gt;</code>, then <code>recompile.py -or override.yaml</code> with <code>*.self_attn.* -&gt; 5bpw</code></p> |
| <p><b>Note:</b> Gemma-3 is dense (no MoE), so <code>*.shared_experts.*</code> is not applicable. Only optimized variants are recompiled; bases stay at exact bpw.</p> |
| <p>Docs: <a href="https://github.com/turboderp-org/exllamav3/blob/master/doc/convert.md">exllamav3 convert.md</a></p> |
| </div> |
| </details> |
| </div> |
| |
| <div class="card-dark"> |
| <h2>Files</h2> |
| <p><code>main</code> branch: <code>measurement.json</code> (KLD map) + <code>kld_plot.png</code></p> |
| <p>Each bpw branch: quantized model shards + config + tokenizer</p> |
| </div> |
| |
| </div> |
|
|