TwentyQ — GGUF

GGUF quantized versions of david-ar/20q, the world's smallest chat model.

This model was natively trained at 2-bit precision, so Q2_K is its native format; every quantization level above Q2_K is technically an upscale.

Available Quantizations

| File | Quant | Size | Quality Loss |
|------|-------|------|--------------|
| twentyq-f32.gguf | F32 | 762 KB | 0% |
| twentyq-f16.gguf | F16 | 397 KB | 0% |
| twentyq-q8_0.gguf | Q8_0 | 228 KB | 0% |
| twentyq-q4_0.gguf | Q4_0 | 135 KB | 0% |
| twentyq-q2_k.gguf | Q2_K | 95 KB | 0% |

All quantizations are lossless because the original weights are 2-bit integers (values 0-3). Q2_K is the only quantization level that doesn't waste bits.
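To see why no information is lost, consider a minimal sketch of 2-bit packing. This is an illustration, not the actual Q2_K on-disk layout (Q2_K also stores per-block scales); the `pack_2bit`/`unpack_2bit` names are invented for the example. The point is that values confined to 0-3 round-trip through two bits exactly.

```python
def pack_2bit(weights):
    """Pack four 2-bit values (0-3) into each byte."""
    packed = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= (w & 0b11) << (2 * j)
        packed.append(byte)
    return bytes(packed), len(weights)

def unpack_2bit(packed, count):
    """Recover the original 2-bit values."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]

weights = [3, 0, 2, 1, 1, 2, 0, 3]
packed, n = pack_2bit(weights)
assert unpack_2bit(packed, n) == weights  # lossless round trip
```

Wider formats (F16, F32) represent the same four values exactly as well, which is why every row in the table shows 0% loss; they just spend more bits doing it.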

Architecture

general.architecture: twentyq
twentyq.block_count: 0
twentyq.embedding_length: 156
twentyq.attention.head_count: 156
twentyq.context_length: 20
twentyq.vocab_size: 1200

Zero transformer blocks. 156 attention heads. 20-token context window. The output projection layer (output.weight) contains the entire model: 156 × 1200 = 187,200 parameters.
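With zero blocks, inference reduces to little more than a single matrix product. The sketch below uses only the shapes from the metadata above; the tied-embedding lookup, mean pooling, and dummy weights are assumptions made for illustration, not the model's actual forward pass.

```python
import random

EMBED_DIM, VOCAB = 156, 1200

# output.weight is the whole model: a [VOCAB, EMBED_DIM] matrix,
# here filled with dummy 2-bit values (0-3).
output_weight = [[random.randint(0, 3) for _ in range(EMBED_DIM)]
                 for _ in range(VOCAB)]
n_params = VOCAB * EMBED_DIM
assert n_params == 187_200  # matches the 187k parameter count

def forward(token_ids):
    """Hypothetical zero-block forward pass: look up tied embeddings
    from the same matrix, mean-pool them, and project to logits."""
    hidden = [sum(output_weight[t][d] for t in token_ids) / len(token_ids)
              for d in range(EMBED_DIM)]
    return [sum(w * h for w, h in zip(row, hidden)) for row in output_weight]

logits = forward([5, 42, 7])  # up to 20 tokens of context
assert len(logits) == VOCAB
```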

Compatibility

These files require a runtime with twentyq architecture support, which does not currently exist in llama.cpp, ollama, or any other GGUF runtime. For inference, use the original model via the transformers library, or the live demo.
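The incompatibility is mechanical: a GGUF runtime reads the `general.architecture` metadata key and dispatches on it, so an unrecognized value like `twentyq` is rejected before any weights load. Below is a minimal sketch of that check against a synthetic in-memory header. It assumes the GGUF v3 header layout and that `general.architecture` is the first metadata key, which is a simplification of a real reader, not a GGUF guarantee.

```python
import io
import struct

GGUF_MAGIC = b"GGUF"
GGUF_TYPE_STRING = 8  # metadata value-type tag for UTF-8 strings

def read_architecture(f):
    """Read general.architecture from a GGUF v3 header (simplified:
    assumes it is the first metadata key and a string value)."""
    assert f.read(4) == GGUF_MAGIC
    version, = struct.unpack("<I", f.read(4))
    tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    key_len, = struct.unpack("<Q", f.read(8))
    key = f.read(key_len).decode("utf-8")
    value_type, = struct.unpack("<I", f.read(4))
    assert key == "general.architecture" and value_type == GGUF_TYPE_STRING
    val_len, = struct.unpack("<Q", f.read(8))
    return f.read(val_len).decode("utf-8")

def make_header(arch):
    """Build a synthetic GGUF v3 header to exercise the reader."""
    key = b"general.architecture"
    buf = GGUF_MAGIC + struct.pack("<I", 3) + struct.pack("<QQ", 1, 1)
    buf += struct.pack("<Q", len(key)) + key
    buf += struct.pack("<I", GGUF_TYPE_STRING)
    buf += struct.pack("<Q", len(arch)) + arch.encode("utf-8")
    return buf

arch = read_architecture(io.BytesIO(make_header("twentyq")))
assert arch == "twentyq"  # a runtime without this arch bails out here
```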
