# Zen Coder Flash
Ultra-fast compact code generation model optimized for real-time completions.
## Overview
Zen Coder Flash is built on the Zen MoDE (Mixture of Distilled Experts) architecture, with 4B parameters and a 64K-token context window. It is developed by Hanzo AI and the Zoo Labs Foundation.
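This card does not document the MoDE internals, but mixture-of-experts layers generally route each token through a small subset of expert feed-forward networks. The sketch below is a generic top-k MoE layer in PyTorch for orientation only; the module names, expert count, and top-k choice are assumptions, not the actual Zen MoDE design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k mixture-of-experts layer (illustrative only;
    not the actual Zen MoDE implementation)."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model) — each token is routed independently.
        logits = self.router(x)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Send each token to its selected experts, weighted by router scores.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

layer = TopKMoE(d_model=512, d_ff=2048)
tokens = torch.randn(16, 512)
print(layer(tokens).shape)  # torch.Size([16, 512])
```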
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zenlm/zen-coder-flash"

# device_map="auto" requires the `accelerate` package.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Build a chat prompt with the model's chat template.
messages = [{"role": "user", "content": "Hello!"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```
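Because the model is positioned for real-time completions, you may want to stream tokens as they are generated. A minimal sketch using transformers' `TextStreamer`, continuing from the snippet above (the prompt and `max_new_tokens` are illustrative):

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as generation proceeds.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

model.generate(**inputs, max_new_tokens=256, streamer=streamer)
```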
## API Access
```bash
curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "zen-coder-flash", "messages": [{"role": "user", "content": "Hello"}]}'
```
Get your API key at console.hanzo.ai — $5 free credit on signup.
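The endpoint above follows the OpenAI chat-completions request shape, so an OpenAI-compatible SDK should also work. A minimal sketch assuming that compatibility holds (the `base_url` pairing is inferred from the curl example, not stated by this card):

```python
from openai import OpenAI

# Assumes the Hanzo endpoint is OpenAI-compatible (inferred from the
# /v1/chat/completions path above; not confirmed by this card).
client = OpenAI(base_url="https://api.hanzo.ai/v1", api_key="YOUR_HANZO_API_KEY")

resp = client.chat.completions.create(
    model="zen-coder-flash",
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)
print(resp.choices[0].message.content)
```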
## Model Details
| Attribute | Value |
|---|---|
| Parameters | 4B |
| Architecture | Zen MoDE |
| Context | 64K tokens |
| License | Apache 2.0 |
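On memory-constrained GPUs, the 4B weights can also be loaded with 8-bit quantization via bitsandbytes. A minimal sketch; the quantization settings are illustrative, not an official recommendation:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Requires the `bitsandbytes` package and a CUDA GPU.
quant_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "zenlm/zen-coder-flash",
    quantization_config=quant_config,
    device_map="auto",
)
```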
## License
Apache 2.0