--- |
|
|
license: apache-2.0 |
|
|
--- |
|
|
## Content |
|
|
This repository holds the public parts of GGUF models converted with Skipper (T3) or Mate (M8) technology.
|
|
Future versions will also follow the nautical theme.
|
|
|
|
|
## Demo Spaces |
|
|
- [x] [Granite4](https://huggingface.co/spaces/TobDeBer/Granite4Family): all Granite4 models (Small, Tiny, Micro, Nano 1B, and Nano 350M)
|
|
- [ ] Add PremiumZero, AdvancedZero, FrontierZero

- [ ] All OSS models with Apache-2.0 or MIT license

- [ ] Larger models using advanced compression (REAP, M8, ...)
|
|
|
|
|
## Challenge: high-quality models at 1/2/4/8/... GB sizes
|
|
- Phone: 4 GB

- Home: 8 GB

- Game: 16 GB

- Pro: 32 GB

- Zero: 64-71 GB

- Server: 128 GB+
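
As a quick sanity check, the device classes above can be expressed as a lookup that maps a GGUF file size to the smallest class it fits. This is only a sketch: it ignores KV-cache and runtime overhead, and it assumes 71 GB as the upper bound for the Zero class.

```python
# Device memory budgets from the list above (GB).
# Assumption: the budget is total usable RAM/VRAM; real deployments
# need extra headroom for KV cache and the runtime itself.
DEVICE_CLASSES = [
    ("Phone", 4),
    ("Home", 8),
    ("Game", 16),
    ("Pro", 32),
    ("Zero", 71),               # 64-71 GB; using the upper bound
    ("Server", float("inf")),   # 128 GB+
]

def smallest_fit(model_gb: float) -> str:
    """Return the smallest device class whose budget holds the model file."""
    for name, budget_gb in DEVICE_CLASSES:
        if model_gb <= budget_gb:
            return name
    return "none"
```

For example, an 8.1 GB file just misses the Home budget and lands in the Game class, while an 81.8 GB file exceeds the Zero bound and needs a Server.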
|
|
|
|
|
| Quality vs. Size | Casual | Premium | Advanced | Frontier | |
|
|
| :--- | :--- | :-: | :--- | :--- | |
|
|
| 64-71 GB | SOTA | SOTA | SOTA | BETA | |
|
|
| 32 GB | SOTA | SOTA | SOTA+ | RESEARCH | |
|
|
| 16 GB | SOTA | SOTA+ | BETA | - | |
|
|
| 8 GB | SOTA | BETA | BETA | - | |
|
|
| 4 GB | SOTA | RESEARCH | - | - | |
|
|
| 2 GB | RESEARCH | - | - | - | |
|
|
| 1 GB | - | - | - | - | |
|
|
|
|
|
- SOTA: K-quants

- SOTA+: UD (Unsloth Dynamic) quants

- BETA: REAP pruning + UD quants

- RESEARCH: M8 and better
|
|
|
|
|
## ELO ([LMArena text leaderboard](https://lmarena.ai/leaderboard/text))
|
|
|
|
|
- Towards Frontier@Phone (within 40 ELO of #1): the very best

  - qwen3-vl-235b-a22b-instruct 1415 (-37 ELO)

    - https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507

    - https://huggingface.co/unsloth/Qwen3-235B-A22B-Instruct-2507-GGUF/tree/main/Q2_K_L (85.8 GB)

  - Frontier@Zero: GLM-4.6-REAP-218B-A32B 1428 REAP50 + 3 bpw (81.8 GB)

  - Frontier@Phone: GLM-4.6-REAP-218B-A32B 1428 REAP75 + 0.3 bpw (4.0 GB)

- Towards Advanced@Phone (within 60 ELO of #1): almost perfect

  - Advanced@Gamer: qwen3-next-80b-a3b-instruct 1402 REAP50 + 3.6 bpw (21.6 GB)

  - Advanced@Phone: qwen3-next-80b-a3b-instruct 1402 REAP75 + 1.2 bpw (3.6 GB)

- Towards Premium@Phone (within 80 ELO of #1): extremely good for everyday use

  - Premium@Home: qwen3-30b-a3b-instruct-2507 1385 REAP50 + 3.6 bpw (8.1 GB)

  - Premium@Phone: qwen3-30b-a3b-instruct-2507 1385 REAP75 + 3.6 bpw (4.1 GB)

- Towards Casual@Phone (within 99 ELO of #1): very useful

  - Casual@Phone: gemma-3n-e4b-it 1318 (133 ELO diff!) (4.1 GB)

    - https://huggingface.co/unsloth/gemma-3n-E4B-it-GGUF/blob/main/gemma-3n-E4B-it-UD-Q3_K_XL.gguf
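
The tier thresholds above (40/60/80/99 ELO behind #1) can be written as a small classifier. A sketch only; the thresholds are taken from the bullets above, and a gap larger than 99 falls outside all tiers (which is why the gemma-3n entry carries an exclamation mark).

```python
# Classify a model by its ELO gap to the leaderboard #1,
# using the tier thresholds from the list above.
def tier(elo_diff: int) -> str:
    if elo_diff <= 40:
        return "Frontier"
    if elo_diff <= 60:
        return "Advanced"
    if elo_diff <= 80:
        return "Premium"
    if elo_diff <= 99:
        return "Casual"
    return "below Casual"
```

For example, qwen3-vl-235b-a22b-instruct at -37 ELO classifies as Frontier, while gemma-3n-e4b-it at -133 ELO falls below the Casual cutoff.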
|
|
|
|
|
## Versions |
|
|
| Version | Codename | File prefix | Typical bpw range | New feature |
| :--- | :--- | :--- | :--- | :--- |
| 1.0 | Skipper | T3 and T2 | 0.8 .. 2.2 | introduces new compression method |
| 1.5 | Mate | M8 | 0.4 .. 2 | compression improvements |
| 2.0 | Cheng | Cx | 0.3 .. 2 | speed improvements |
| 2.5 | Cheng++ | Cy | 0.1 .. 2 | reduced compute requirements |
|
|
|
|
|
V1 reduces model size significantly at the same subjective quality, but leaves compute requirements high.
|
|
|
|
|
V2 will scale down compute requirements and support cheap NPUs.
|
|
|
|
|
## Expected bpw (bits per weight)
|
|
Actual bpw is higher for small models and lower for larger ones. As with JPEG and video encoding, higher input quality opens more opportunities for compression.
|
|
|
|
|
| Base | Mode | % | bpw @ 30B |
|
|
| :--- | :--- | :-: | :--- | |
|
|
| Q5_K | T3UD | 95 | 2 .. 2.2 | |
|
|
| Q4_K | T2UD | 90 | 1.4 .. 1.6 | |
|
|
| Q2_K | T2UD2 | 75 | 1 .. 1.2 | |
|
|
| Q2_K | T2UD1 | 60 | 0.8 | |
|
|
| Q2_K | M8HQ | 75 | 0.8 | |
|
|
| Q2_K | M8LQ | 60 | 0.4 .. 0.6 | |
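
The bpw figures above translate directly into file sizes: roughly parameters × bpw / 8 bytes. A sketch under that assumption only; real GGUF files add metadata and per-tensor overhead, and GB vs. GiB conventions shift the result slightly.

```python
# Rough GGUF size estimate: params (billions) x bpw / 8 bits per byte.
# Assumption: size ~= raw weight payload; metadata overhead is ignored.
def size_gb(params_billion: float, bpw: float) -> float:
    return params_billion * bpw / 8

# Inverse: what bpw does an observed file size imply?
def implied_bpw(file_gb: float, params_billion: float) -> float:
    return file_gb * 8 / params_billion
```

For example, a 30B model in T3UD at 2.2 bpw comes out near 8.25 GB, just over the 8 GB Home budget; conversely, the 85.8 GB Q2_K_L file of Qwen3-235B linked above implies roughly 2.9 bpw.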
|
|
|
|
|
|