Holo-3.1-4B-Coding-Repair38 GGUF

GGUF conversion of josephmayo/Holo-3.1-4B-Coding-Repair38-Merged for use with llama.cpp-compatible runtimes.

Files

File Quantization Size
Holo-3.1-4B-Coding-Repair38-F16.gguf F16 converted GGUF 8,424,393,088 bytes
Holo-3.1-4B-Coding-Repair38-Q8_0.gguf Q8_0 4,482,402,688 bytes
Holo-3.1-4B-Coding-Repair38-Q6_K.gguf Q6_K 3,464,055,168 bytes
Holo-3.1-4B-Coding-Repair38-Q4_K_M.gguf Q4_K_M 2,708,803,968 bytes

Notes

Q4_K_M is the supported 4-bit K-quant produced for this release by the available llama.cpp quantizer. No Q4_K_L file is published in this repository.

The source merged model, LoRA adapter, and tokenizer/config assets are maintained separately from this GGUF repository. This repository contains only the GGUF runtime artifacts and this model card.

Downloads last month
145
GGUF
Model size
4B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

4-bit

6-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for josephmayo/Holo-3.1-4B-Coder-GGUF

Quantized
(3)
this model

Space using josephmayo/Holo-3.1-4B-Coder-GGUF 1