---
base_model: Qwen/Qwen2.5-0.5B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:Qwen/Qwen2.5-0.5B-Instruct
- lora
- transformers
- knowledge-distillation
- cka
license: mit
---

# Deku — One for All Student

Deku is a LoRA adapter for Qwen2.5-0.5B-Instruct, trained via **gated CKA geometry distillation** from five heterogeneous teacher LLMs. A learned routing gate lets the student absorb the representation geometry of several teachers at once.

## Teachers

| Model | Strength |
|---|---|
| Qwen2.5-1.5B-Instruct | code, structured reasoning |
| SmolLM2-1.7B-Instruct | curated quality |
| Phi-3.5-mini-instruct | instruction following, CoT |
| gemma-2-2b-it | long context |
| MiniCPM-2B-sft-bf16 | multilingual, efficiency |

## Method

**Path B — geometry-only, tokenizer-agnostic distillation.**

The teachers use different tokenizers and hidden dimensions, so token-level KL divergence is ill-defined across the ensemble (only Qwen2.5-1.5B-Instruct shares the student's tokenizer). Instead, the student aligns its hidden-state geometry with each teacher via **CKA (Centered Kernel Alignment)**, with each teacher's term weighted by a learned gating network that routes each input to the most relevant teacher.
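
As a rough illustration of why CKA suits this setting, here is a minimal NumPy sketch of *linear* CKA. It is not the training code: the kernel choice, the variable names, and the idea of feeding one pooled vector per sequence (so that rows line up across tokenizers) are all assumptions made for the example.

```python
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices with matching rows.

    x: (n_samples, d_x), y: (n_samples, d_y). The widths may differ,
    which is what lets teachers with other hidden sizes be compared.
    """
    # Center every feature column.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by
    # the Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))

rng = np.random.default_rng(0)
h_student = rng.normal(size=(64, 16))           # hypothetical pooled student states
q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
h_rotated = h_student @ q                       # same geometry in a rotated basis
h_random = rng.normal(size=(64, 32))            # unrelated states, different width

cka_rotated = linear_cka(h_student, h_rotated)
cka_random = linear_cka(h_student, h_random)
print(cka_rotated, cka_random)
```

The rotated copy scores essentially 1 because linear CKA is invariant to orthogonal transformations, while the unrelated matrix scores clearly lower even though its width differs from the student's.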

The objective is:

```
L = λ1·L_task + λ2·L_KL(Qwen1.5B) + λ3·L_geo(gate)
```

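- `L_task`: next-token cross-entropy on the training mix
- `L_KL`: KL divergence from Qwen2.5-1.5B-Instruct, which shares the student's tokenizer, so the two token distributions can be compared directly
- `L_geo`: gated CKA loss, `1 - mean_i gate_i · CKA(H_student, Pi_i · H_teacher_i)`, where `gate_i` is the gate weight for teacher `i` and `Pi_i` is a per-teacher map applied to that teacher's hidden states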
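
The `L_geo` term can be sketched as follows. This is a literal reading of the formula under several assumptions: linear CKA, a single batch-level gate vector produced by a softmax, and each `Pi_i` represented as a hypothetical `(d_teacher, d_student)` matrix applied on the right of row-major hidden states. None of these details are confirmed by the model card.

```python
import numpy as np

def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    # Linear CKA between (n, d_x) and (n, d_y) activation matrices.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    return float(cross / (np.linalg.norm(x.T @ x, ord="fro")
                          * np.linalg.norm(y.T @ y, ord="fro")))

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

def gated_cka_loss(h_student, teacher_states, projections, gate) -> float:
    """1 - mean_i gate_i * CKA(H_student, Pi_i applied to H_teacher_i)."""
    scores = np.array([
        linear_cka(h_student, h_t @ p)  # project teacher i, then compare
        for h_t, p in zip(teacher_states, projections)
    ])
    return 1.0 - float(np.mean(gate * scores))

rng = np.random.default_rng(0)
h_s = rng.normal(size=(32, 8))                               # hypothetical student states
teachers = [rng.normal(size=(32, 12)), rng.normal(size=(32, 20))]
projs = [rng.normal(size=(12, 8)), rng.normal(size=(20, 8))]  # hypothetical Pi_i
gate = softmax(np.array([0.3, -0.1]))                        # hypothetical gate logits

loss = gated_cka_loss(h_s, teachers, projs, gate)
print(loss)
```

Note that if the gate weights sum to 1, taking the mean over N teachers bounds the subtracted term by 1/N; a sum over teachers would be the other natural reading of the formula.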

Lambdas follow a three-phase curriculum: task-only warmup → KL ramp-in → geometry ramp-in.
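
One way such a curriculum could look is sketched below. Only the ordering of the three phases comes from the description above; the phase boundaries, the linear ramp shape, and the maximum weights are hypothetical values chosen for illustration.

```python
def lambda_schedule(
    step: int,
    warmup_end: int = 500,      # hypothetical: end of task-only warmup
    kl_ramp_end: int = 1500,    # hypothetical: KL weight reaches its maximum
    geo_ramp_end: int = 3000,   # hypothetical: geometry weight reaches its maximum
    lam_kl_max: float = 0.5,    # hypothetical maximum for λ2
    lam_geo_max: float = 0.5,   # hypothetical maximum for λ3
) -> tuple[float, float, float]:
    """Return (λ1, λ2, λ3) for a given training step."""

    def ramp(start: int, end: int) -> float:
        # Linear ramp from 0 at `start` to 1 at `end`, clamped outside.
        return min(max((step - start) / (end - start), 0.0), 1.0)

    lam_task = 1.0                                       # task loss is always on
    lam_kl = lam_kl_max * ramp(warmup_end, kl_ramp_end)  # phase 2: KL ramp-in
    lam_geo = lam_geo_max * ramp(kl_ramp_end, geo_ramp_end)  # phase 3: geometry ramp-in
    return lam_task, lam_kl, lam_geo

for s in (0, 1000, 2250, 5000):
    print(s, lambda_schedule(s))
```

Ramping the auxiliary terms in one at a time lets the adapter settle on the task loss before the distillation signals start pulling on the representations.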

## Training

- **Base:** Qwen/Qwen2.5-0.5B-Instruct
- **Adapter:** LoRA r=64, α=128 on all attention + MLP projections
- **Data:** OpenHermes-2.5 (70%) + GSM8K (20%) + ARC-Challenge (10%)
- **Steps:** 5 000 · batch 8 · seq 512
- **Hardware:** A100-80GB via Modal
- **Precision:** bfloat16
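
The data mix can be read as sampling weights per source. The sketch below draws a source for each training example with those weights; whether the actual run sampled per example or pre-built a fixed 70/20/10 split is not stated, so treat the sampling strategy, the function name, and the seed as assumptions.

```python
import random
from collections import Counter

# Mixing weights taken from the list above.
MIX = {"OpenHermes-2.5": 0.70, "GSM8K": 0.20, "ARC-Challenge": 0.10}

def sample_sources(n: int, seed: int = 0) -> list[str]:
    """Draw the source dataset for each of n examples (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.choices(list(MIX), weights=list(MIX.values()), k=n)

# 5 000 steps at batch size 8 is 40 000 sequences in total.
n_sequences = 5_000 * 8
counts = Counter(sample_sources(n_sequences))
fractions = {name: counts[name] / n_sequences for name in MIX}
print(fractions)
```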

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the base model, then attach the Deku LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base, "build-small-hackathon/deku")
tok = AutoTokenizer.from_pretrained("build-small-hackathon/deku")

# Tokenize a prompt and generate up to 200 new tokens.
inputs = tok("Explain what a hash map is.", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=200)
# The decoded text contains the prompt followed by the continuation.
print(tok.decode(out[0], skip_special_tokens=True))
```

## Demo

Live soul space + probe interface: [build-small-hackathon/one-for-all](https://huggingface.co/spaces/build-small-hackathon/one-for-all)

---
PEFT 0.19.1