
Deku — GGUF (llama.cpp)

GGUF builds of build-small-hackathon/deku, the One for All student: a Qwen2.5-0.5B model distilled from 6 teachers via gated CKA geometry distillation. The LoRA adapter is merged into the base model, and the merged weights are then converted with llama.cpp's convert_hf_to_gguf.py.

Files

| File | Size | Use |
|---|---|---|
| deku-q8_0.gguf | ~531 MB | Served by the Space; near-lossless, CPU-friendly |
| deku-f16.gguf | ~994 MB | Archival unquantized (16-bit float) build |
| gating.npz | ~22 KB | Teacher-gating head as NumPy arrays (weight 6×896, bias 6) |
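
The listed sizes can be sanity-checked with simple arithmetic. The sketch below assumes roughly 494 million parameters (an approximation for a 0.5B Qwen2.5 model, not a figure from this card), 2 bytes per weight for f16, llama.cpp's Q8_0 layout of 34 bytes per block of 32 weights, and float32 storage for the gating head (also an assumption). Real files come out slightly larger because of metadata, the tokenizer, and any tensors kept at higher precision.

```python
# Rough size estimates. ASSUMED_PARAMS and the float32 gating dtype are
# assumptions for illustration, not values taken from the model card.
ASSUMED_PARAMS = 494_000_000


def f16_bytes(n_params: int) -> int:
    """Two bytes per weight for a 16-bit float build."""
    return n_params * 2


def q8_0_bytes(n_params: int) -> int:
    """Q8_0 stores 32 int8 weights plus one 16-bit scale per block (34 bytes)."""
    return n_params * 34 // 32


def gating_bytes(n_teachers: int = 6, dim: int = 896, itemsize: int = 4) -> int:
    """Weight matrix (n_teachers x dim) plus bias (n_teachers), itemsize bytes each."""
    return (n_teachers * dim + n_teachers) * itemsize


print(f"f16:    {f16_bytes(ASSUMED_PARAMS) / 1e6:.0f} MB")
print(f"q8_0:   {q8_0_bytes(ASSUMED_PARAMS) / 1e6:.0f} MB")
print(f"gating: {gating_bytes() / 1e3:.1f} KB")
```

Under these assumptions the estimates come to about 988 MB, 525 MB, and 21.5 KB, in line with the file sizes listed above.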

Run

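With the llama.cpp CLI:

llama-cli -m deku-q8_0.gguf -p "Explain gradient descent in one sentence."

With llama-cpp-python:

from llama_cpp import Llama

# Load the quantized build with a 2048-token context window.
llm = Llama(model_path="deku-q8_0.gguf", n_ctx=2048)

# Send a single-turn chat request and print only the reply text.
print(llm.create_chat_completion(
    messages=[{"role": "user", "content": "Why is the sky blue?"}]
)["choices"][0]["message"]["content"])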

Teacher gating without torch

gating.npz lets you reproduce the live "teacher influence" meters from the Space using only numpy on a mean-pooled embedding from llama.cpp:

import numpy as np
g = np.load("gating.npz")          # g["weight"] (6, 896), g["bias"] (6,)
def gate(emb):                     # emb: 896-dim pooled embedding
    z = g["weight"] @ emb + g["bias"]
    e = np.exp(z - z.max())
    return e / e.sum()             # softmax over the 6 teachers
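
In equation form, the function above computes a softmax over an affine map of the embedding. The symbols W, b, e, z, and p are shorthand introduced for this note (for g["weight"], g["bias"], emb, the logits, and the returned vector); they are not names used in the files.

```latex
\begin{align}
z &= W e + b, \qquad W \in \mathbb{R}^{6 \times 896},\; e \in \mathbb{R}^{896},\; b \in \mathbb{R}^{6} \\
p_i &= \frac{\exp\!\left(z_i - \max_j z_j\right)}{\sum_{k=1}^{6} \exp\!\left(z_k - \max_j z_j\right)}
     = \frac{\exp(z_i)}{\sum_{k=1}^{6} \exp(z_k)}
\end{align}
```

Subtracting the maximum logit does not change the result, since the common factor cancels between numerator and denominator; it only keeps np.exp from overflowing on large logits.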

Teacher order: qwen, smollm, phi, gemma, minicpm, nemotron.
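
To turn the gate output into labeled meters, the probabilities can be paired with the teacher order above. The sketch below is a minimal illustration, not the Space's actual code: label_gate, top_teacher, and pooled_embedding are hypothetical helper names. pooled_embedding assumes a llama-cpp-python version whose Llama constructor accepts embedding=True and pooling_type, and it is not known whether the Space normalizes or otherwise post-processes the embedding before gating.

```python
from typing import Dict, List, Sequence, Tuple

# Teacher order as stated in the model card.
TEACHERS = ["qwen", "smollm", "phi", "gemma", "minicpm", "nemotron"]


def label_gate(probs: Sequence[float]) -> Dict[str, float]:
    """Pair each gate probability with its teacher name (hypothetical helper)."""
    if len(probs) != len(TEACHERS):
        raise ValueError(f"expected {len(TEACHERS)} probabilities, got {len(probs)}")
    return {name: float(p) for name, p in zip(TEACHERS, probs)}


def top_teacher(probs: Sequence[float]) -> Tuple[str, float]:
    """Return the most influential teacher and its probability."""
    labeled = label_gate(probs)
    name = max(labeled, key=labeled.get)
    return name, labeled[name]


def pooled_embedding(model_path: str, text: str) -> List[float]:
    """ASSUMPTION: one way to get a mean-pooled embedding via llama-cpp-python.

    The import is local so the helpers above work without the library installed.
    """
    import llama_cpp

    llm = llama_cpp.Llama(
        model_path=model_path,
        embedding=True,  # enable embedding output
        pooling_type=llama_cpp.LLAMA_POOLING_TYPE_MEAN,  # mean-pool over tokens
        verbose=False,
    )
    return llm.embed(text)
```

With the model file and gating.npz on disk, the pieces would combine as label_gate(gate(np.asarray(pooled_embedding("deku-q8_0.gguf", "Why is the sky blue?")))), where gate is the function defined earlier.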

Format: GGUF
Model size: 0.5B params
Architecture: qwen2
Quantizations: 8-bit (q8_0), 16-bit (f16)
