---
base_model: Qwen/Qwen2.5-0.5B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:Qwen/Qwen2.5-0.5B-Instruct
- lora
- transformers
- knowledge-distillation
- cka
license: mit
---
# Deku — One for All Student
Qwen2.5-0.5B-Instruct fine-tuned via **gated CKA geometry distillation** from five heterogeneous teacher LLMs. A learned routing gate weights the teachers per input, so the student aligns with the representation geometry of several teachers at once.
## Teachers
| Model | Strength |
|---|---|
| Qwen2.5-1.5B-Instruct | code, structured reasoning |
| SmolLM2-1.7B-Instruct | curated quality |
| Phi-3.5-mini-instruct | instruction following, CoT |
| gemma-2-2b-it | long context |
| MiniCPM-2B-sft-bf16 | multilingual, efficiency |
## Method
**Path B — geometry-only, tokenizer-agnostic distillation.**
Each teacher has a different tokenizer and hidden dimension, making token-level KL divergence ill-defined across the ensemble. Instead, the student learns to align its hidden-state geometry with each teacher via **CKA (Centered Kernel Alignment)**, weighted by a learned gating network that routes each input to the most relevant teacher.
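As a rough illustration of why CKA tolerates mismatched hidden dimensions, here is a minimal sketch of *linear* CKA in plain Python. It is not the training code: the function names are hypothetical, the card does not say which CKA variant (linear or kernel) is used, and the sketch assumes the student and teacher activations have already been reduced to the same number of rows (for example one pooled vector per example), since different tokenizers produce different sequence lengths.

```python
import math

def center_columns(X):
    """Subtract the per-feature mean so every column sums to zero."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def cross_gram_sq_norm(A, B):
    """Squared Frobenius norm of A^T B for row-major A (n x p) and B (n x q)."""
    total = 0.0
    for a_col in zip(*A):
        for b_col in zip(*B):
            dot = sum(a * b for a, b in zip(a_col, b_col))
            total += dot * dot
    return total

def linear_cka(X, Y):
    """Linear CKA between two activation matrices with the same number of rows.

    The feature widths of X and Y may differ; only the row count must match.
    """
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same number of rows")
    Xc, Yc = center_columns(X), center_columns(Y)
    numerator = cross_gram_sq_norm(Xc, Yc)
    denominator = math.sqrt(cross_gram_sq_norm(Xc, Xc) * cross_gram_sq_norm(Yc, Yc))
    return numerator / denominator if denominator > 0 else 0.0

# Toy activations: 4 examples, student width 2, teacher width 3.
student = [[1.0, 2.0], [3.0, -1.0], [0.0, 4.0], [-2.0, 0.5]]
# The teacher is a rotated copy of the student padded with a zero column,
# so the score comes out close to 1.0 despite the different width.
teacher = [[-row[1], row[0], 0.0] for row in student]
score = linear_cka(student, teacher)
```

The score is invariant to rotations and uniform rescaling of either representation, which is what makes it usable across teachers with unrelated hidden spaces. A real implementation would compute the same quantities on batched tensors.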
The objective is:
```
L = λ1·L_task + λ2·L_KL(Qwen1.5B) + λ3·L_geo(gate)
```
- `L_task` — next-token cross-entropy on the training mix
- `L_KL` — KL divergence from Qwen2.5-1.5B (same tokenizer, zero friction)
- `L_geo` — gated CKA loss: `1 - mean_i gate_i · CKA(H_student, Pi_i · H_teacher_i)`
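The way the three terms combine can be sketched as follows. This is a hedged illustration rather than the project's code: the helper names are hypothetical, the gate is assumed to be a softmax over per-teacher logits (the card only says "learned gating network"), and the per-teacher CKA scores are taken as precomputed inputs. The `L_geo` formula is implemented literally as written, with a mean over teachers.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of gate logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gated_geometry_loss(gate_logits, cka_scores):
    """L_geo = 1 - mean_i(gate_i * CKA_i), read literally from the formula.

    gate_logits: one raw gate score per teacher (assumed softmax-normalised).
    cka_scores:  one CKA value in [0, 1] per teacher.
    """
    if len(gate_logits) != len(cka_scores):
        raise ValueError("need one gate logit and one CKA score per teacher")
    gates = softmax(gate_logits)
    weighted = [g * c for g, c in zip(gates, cka_scores)]
    return 1.0 - sum(weighted) / len(weighted)

def total_loss(l_task, l_kl, l_geo, lambdas):
    """L = lambda1 * L_task + lambda2 * L_KL + lambda3 * L_geo."""
    lam1, lam2, lam3 = lambdas
    return lam1 * l_task + lam2 * l_kl + lam3 * l_geo

# Five teachers, uniform gate, illustrative CKA scores (not measured values).
l_geo = gated_geometry_loss([0.0] * 5, [0.9, 0.7, 0.8, 0.6, 0.75])
loss = total_loss(l_task=2.0, l_kl=0.5, l_geo=l_geo, lambdas=(1.0, 0.5, 0.25))
```

One consequence of the literal reading: if the gates sum to 1, the mean of `gate_i * CKA_i` over k teachers cannot exceed 1/k, so `L_geo` never drops below 1 - 1/k. If the intended quantity is a gate-weighted sum, the division by the number of teachers would be dropped.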
Lambdas follow a three-phase curriculum: task-only warmup → KL ramp-in → geometry ramp-in.
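A schedule of this shape could look like the sketch below. Everything numeric in it is an assumption made for illustration: the card gives neither the phase boundaries nor the final weights, so `warmup_end`, `kl_end`, `geo_end`, `lam_kl_max`, and `lam_geo_max` are hypothetical, as is the choice of linear ramps.

```python
def ramp(step, start, end):
    """Linear ramp from 0 at `start` to 1 at `end`, clamped to [0, 1]."""
    if step <= start:
        return 0.0
    if step >= end:
        return 1.0
    return (step - start) / (end - start)

def curriculum_lambdas(step, warmup_end=500, kl_end=2000, geo_end=3500,
                       lam_kl_max=0.5, lam_geo_max=0.5):
    """Return (lambda1, lambda2, lambda3) for a training step.

    Phase 1 (step <= warmup_end): task loss only.
    Phase 2 (warmup_end..kl_end): the KL weight ramps in.
    Phase 3 (kl_end..geo_end):    the geometry weight ramps in.
    All boundaries and maxima are illustrative placeholders.
    """
    lam_task = 1.0
    lam_kl = lam_kl_max * ramp(step, warmup_end, kl_end)
    lam_geo = lam_geo_max * ramp(step, kl_end, geo_end)
    return lam_task, lam_kl, lam_geo

# Sample the schedule at a few points of a 5,000-step run.
schedule = {step: curriculum_lambdas(step) for step in (0, 1250, 2000, 5000)}
```

Ramping the weights in sequence keeps the early steps dominated by the task loss, so the auxiliary terms only start shaping the student once it is already producing sensible outputs.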
## Training
- **Base:** Qwen/Qwen2.5-0.5B-Instruct
- **Adapter:** LoRA r=64, α=128 on all attention + MLP projections
- **Data:** OpenHermes-2.5 (70%) + GSM8K (20%) + ARC-Challenge (10%)
- **Steps:** 5 000 · batch 8 · seq 512
- **Hardware:** A100-80GB via Modal
- **Precision:** bfloat16
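For a sense of scale, the adapter size and token budget implied by these settings can be estimated with a few lines of arithmetic. The sketch makes several assumptions that are not stated in the card: the layer shapes are taken from the published Qwen2.5-0.5B config and should be re-checked, "all attention + MLP projections" is read as the seven standard projection modules, the LoRA scaling is the standard `alpha / r` rather than a rank-stabilised variant, and the batch size of 8 is treated as the effective batch with no gradient accumulation.

```python
# Assumed layer shapes for Qwen2.5-0.5B (re-check against the model config).
HIDDEN = 896
INTERMEDIATE = 4864
NUM_LAYERS = 24
KV_DIM = 2 * 64  # 2 key/value heads x head_dim 64

# (d_in, d_out) of each linear layer assumed to carry a LoRA adapter.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_param_count(rank, projections=PROJECTIONS, num_layers=NUM_LAYERS):
    """Each adapted linear adds A (rank x d_in) and B (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in projections.values())
    return per_layer * num_layers

RANK, ALPHA = 64, 128
trainable = lora_param_count(RANK)        # adapter parameters under these assumptions
scaling = ALPHA / RANK                    # multiplier applied to the low-rank update
tokens_per_step = 8 * 512                 # batch size x sequence length
max_tokens_seen = 5000 * tokens_per_step  # upper bound; padding lowers the real count

print(f"{trainable:,} trainable parameters, scaling {scaling}, "
      f"at most {max_tokens_seen:,} tokens")
```

Under these assumptions the adapter comes to roughly 35 million parameters and the run processes at most about 20 million tokens.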
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the base model, then attach the Deku LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = PeftModel.from_pretrained(base, "build-small-hackathon/deku")
tok = AutoTokenizer.from_pretrained("build-small-hackathon/deku")

# Tokenize a prompt and generate up to 200 new tokens.
inputs = tok("Explain what a hash map is.", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=200)
print(tok.decode(out[0], skip_special_tokens=True))
```
## Demo
Live soul space + probe interface: [build-small-hackathon/one-for-all](https://huggingface.co/spaces/build-small-hackathon/one-for-all)
---
PEFT 0.19.1