---
license: apache-2.0
---
## Content
This model area holds the public parts of GGUF models converted with Skipper (T3) or Mate (M8) technology.
Future modes will also follow the nautical theme.

## Demo Spaces
- [x] [Granite4](https://huggingface.co/spaces/TobDeBer/Granite4Family): all Granite4 models (small, tiny, micro, nano 1B, and nano 350M)
- tbd: add PremiumZero, AdvancedZero, FrontierZero
- tbd: all OSS models with Apache-2.0 and MIT licenses
- tbd: add larger models using advanced compression (REAP, M8, ...)

## Challenge: high-quality models in 1/2/4/8/.. GB sizes
  - Phone: 4 GB
  - Home: 8 GB
  - Game: 16 GB
  - Pro: 32 GB
  - Zero: 64-71 GB
  - Server: 128 GB+

| Quality vs. Size | Casual | Premium | Advanced | Frontier |
| :--- | :--- | :--- | :--- | :--- |
| 64-71 GB | SOTA | SOTA | SOTA | BETA |
| 32 GB | SOTA | SOTA | SOTA+ | RESEARCH |
| 16 GB | SOTA | SOTA+ | BETA | - |
| 8 GB | SOTA | BETA | BETA | - |
| 4 GB | SOTA | RESEARCH | - | - |
| 2 GB | RESEARCH | - | - | - |
| 1 GB | - | - | - | - |

- SOTA: K quants
- SOTA+: UD quants
- BETA: REAP + UD
- RESEARCH: M8 and better
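
The size tiers above can be sketched with simple arithmetic: file size is roughly parameters × bits-per-weight / 8. A minimal Python sketch, assuming a decimal-GB estimate that ignores GGUF metadata and embedding overhead; the tier names and limits are taken from the list above:

```python
# Rough GGUF size estimate: parameters * bits-per-weight / 8 bytes.
# Tier names and upper limits follow the Challenge list above.
TIERS = [("Phone", 4), ("Home", 8), ("Game", 16),
         ("Pro", 32), ("Zero", 71), ("Server", float("inf"))]

def gguf_size_gb(params_b: float, bpw: float) -> float:
    """Approximate file size in GB for params_b billion weights at bpw bits each."""
    return params_b * bpw / 8  # billions and 1e9 cancel out

def tier(size_gb: float) -> str:
    """Smallest device tier whose limit the file still fits under."""
    for name, limit in TIERS:
        if size_gb <= limit:
            return name
    return "Server"

size = gguf_size_gb(30, 1.2)                  # a 30B model at 1.2 bpw
print(f"{size:.1f} GB -> {tier(size)}")       # 4.5 GB -> Home
```

At 1.2 bpw a 30B model just misses the 4 GB Phone tier, which is why the Phone rows of the table lean on the most aggressive modes.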

## ELO ([lmarena.ai text leaderboard](https://lmarena.ai/leaderboard/text))

- Towards Frontier@Phone (within 40 ELO of #1): non plus ultra
    - qwen3-vl-235b-a22b-instruct, ELO 1415 (-37 ELO)
        - https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507
        - https://huggingface.co/unsloth/Qwen3-235B-A22B-Instruct-2507-GGUF/tree/main/Q2_K_L (85.8 GB)
    - Frontier@Zero: GLM-4.6-REAP-218B-A32B, ELO 1428, REAP50 + 3 bpw (81.8 GB)
    - Frontier@Phone: GLM-4.6-REAP-218B-A32B, ELO 1428, REAP75 + 0.3 bpw (4.0 GB)
- Towards Advanced@Phone (within 60 ELO of #1): almost perfect
    - Advanced@Gamer: qwen3-next-80b-a3b-instruct, ELO 1402, REAP50 + 3.6 bpw (21.6 GB)
    - Advanced@Phone: qwen3-next-80b-a3b-instruct, ELO 1402, REAP75 + 1.2 bpw (3.6 GB)
- Towards Premium@Phone (within 80 ELO of #1): extremely good for everyday use
    - Premium@Home: qwen3-30b-a3b-instruct-2507, ELO 1385, REAP50 + 3.6 bpw (8.1 GB)
    - Premium@Phone: qwen3-30b-a3b-instruct-2507, ELO 1385, REAP75 + 3.6 bpw (4.1 GB)
- Towards Casual@Phone (within 99 ELO of #1): very useful
    - Casual@Phone: gemma-3n-e4b-it, ELO 1318 (133 ELO diff!) (4.1 GB)
        - https://huggingface.co/unsloth/gemma-3n-E4B-it-GGUF/blob/main/gemma-3n-E4B-it-UD-Q3_K_XL.gguf
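
The ELO gaps above translate directly into expected head-to-head scores via the standard Elo formula, expected score = 1 / (1 + 10^(diff/400)). A small sketch to put the tier thresholds in perspective (the tier names are from the list above; this is the generic Elo formula, not anything specific to lmarena.ai's internal scoring):

```python
def expected_score(elo_diff: float) -> float:
    """Expected score of a model against an opponent elo_diff points above it,
    using the standard Elo formula: 1 / (1 + 10^(diff/400))."""
    return 1.0 / (1.0 + 10 ** (elo_diff / 400.0))

# Tier thresholds from the list above, as ELO distance to #1.
for label, diff in [("Frontier", 40), ("Advanced", 60),
                    ("Premium", 80), ("Casual", 99)]:
    print(f"{label}: {expected_score(diff):.1%} expected score vs. #1")
```

Even at the Casual threshold of 99 ELO the model is still expected to win roughly 36% of head-to-head comparisons against the leader, which is why these tiers remain genuinely useful.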

## Versions
| Version | Codename | File prefix | Typical bpw range | New feature |
| :--- | :--- | :--- | :--- | :--- |
| 1.0 | Skipper | T3 and T2 | 0.8 .. 2.2 | new compression method |
| 1.5 | Mate | M8 | 0.4 .. 2 | compression improvements |
| 2.0 | Cheng | Cx | 0.3 .. 2 | speed improvements |
| 2.5 | Cheng++ | Cy | 0.1 .. 2 | reduced compute requirements |

V1 reduces model size significantly at the same subjective quality, but leaves compute requirements high.

V2 will scale down compute requirements and support cheap NPUs.

## Expected bpw (bits per weight)
Actual bpw is higher for small models and lower for larger models. As with JPEG and video encoding, higher input quality opens more opportunity for compression.

| Base | Mode | Quality % | bpw@30B |
| :--- | :--- | :-: | :--- |
| Q5_K | T3UD | 95 | 2 .. 2.2 |
| Q4_K | T2UD | 90 | 1.4 .. 1.6 |
| Q2_K | T2UD2 | 75 | 1 .. 1.2 |
| Q2_K | T2UD1 | 60 | 0.8 |
| Q2_K | M8HQ | 75 | 0.8 |
| Q2_K | M8LQ | 60 | 0.4 .. 0.6 |
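
To make the table concrete, the bpw values imply the following approximate file sizes for a 30B-parameter model. A sketch using the midpoint of each bpw range above; metadata and embedding overhead are ignored, and FP16 (16 bpw) is used as the uncompressed reference:

```python
# Midpoints of the bpw ranges from the table above (assumed, not measured).
BPW = {"T3UD": 2.1, "T2UD": 1.5, "T2UD2": 1.1,
       "T2UD1": 0.8, "M8HQ": 0.8, "M8LQ": 0.5}

def size_gb(params_b: float, bpw: float) -> float:
    """params_b billion weights at bpw bits each, in decimal GB."""
    return params_b * bpw / 8

for mode, bpw in BPW.items():
    ratio = 16 / bpw  # compression vs. FP16 at 16 bpw
    print(f"{mode}: ~{size_gb(30, bpw):.1f} GB at 30B ({ratio:.0f}x smaller than FP16)")
```

At the M8LQ end, a 30B model shrinks to under 2 GB, roughly 32x smaller than FP16, which is what makes the Phone-tier targets in the Challenge table plausible at all.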