zeekay committed on
Commit b0b1d91 · verified · 1 Parent(s): d2f19c6

Upload folder using huggingface_hub
README.md CHANGED
@@ -1,228 +1,30 @@
  ---
- license: mit
  language:
  - en
  - zh
  library_name: transformers
  pipeline_tag: text-generation
  tags:
- - zen
- - code
- - moe
- - glm
- - coding
- - programming
- - software-engineering
  base_model: zai-org/GLM-4.7-Flash
- model-index:
- - name: zen-coder-flash
-   results:
-   - task:
-       type: text-generation
-       name: Code Generation
-     dataset:
-       name: SWE-bench Verified
-       type: swe-bench
-     metrics:
-     - type: accuracy
-       value: 59.2
-       name: SWE-bench Verified
-   - task:
-       type: text-generation
-       name: Mathematical Reasoning
-     dataset:
-       name: AIME 2025
-       type: aime
-     metrics:
-     - type: accuracy
-       value: 91.6
-       name: AIME 2025
  ---
-
- # Zen Coder Flash
-
- <div align="center">
- <img src="https://zenlm.org/logo.png" alt="Zen AI" width="200"/>
-
- **The Flagship Zen Coder Model**
-
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
- [![HuggingFace](https://img.shields.io/badge/🤗-zenlm%2Fzen--coder--flash-blue)](https://huggingface.co/zenlm/zen-coder-flash)
- </div>
-
- ## Overview
-
- **Zen Coder Flash** is the flagship code-focused model in the Zen AI family. Built on GLM-4.7-Flash's cutting-edge Mixture of Experts architecture, it delivers frontier coding performance with practical efficiency.
-
- | Attribute | Value |
- |-----------|-------|
- | **Parameters** | 31B total / 3B active (MoE) |
- | **Context Length** | 131,072 tokens |
- | **Base Model** | [GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash) |
- | **License** | MIT |
- | **Languages** | 100+ programming languages |
-
- ## Why Zen Coder Flash?
-
- - **59.2% SWE-bench** vs 22% Qwen3-30B - nearly **3x better** at real coding tasks
- - **Efficient MoE**: 31B params but only 3B active per token
- - **131K context**: Handle entire codebases in a single prompt
- - **Native tool calling**: Built-in function execution support
- - **Reasoning mode**: Extended chain-of-thought for complex problems
-
- ## Performance
-
- | Benchmark | Score | vs Qwen3-30B |
- |-----------|-------|--------------|
- | SWE-bench Verified | **59.2%** | +37.2% (2.7x) |
- | AIME 2025 | **91.6%** | +6.6% |
- | GPQA | **75.2%** | +1.8% |
- | τ²-Bench | **79.5%** | +30.5% |
-
- ## Zen Coder Family
-
- | Tier | Model | Parameters | Active | Use Case |
- |------|-------|------------|--------|----------|
- | Small | [zen-coder-4b](https://huggingface.co/zenlm/zen-coder) | 4B | 4B | Edge/mobile |
- | **Flagship** | **zen-coder-flash** | **31B MoE** | **3B** | **Balanced** |
- | Max | [zen-max](https://huggingface.co/zenlm/zen-max) | 671B MoE | 14B | Frontier |
-
- ## Quick Start
-
- ### Transformers
-
- ```python
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer
-
- model_id = "zenlm/zen-coder-flash"
-
- tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(
-     model_id,
-     torch_dtype=torch.bfloat16,
-     device_map="auto",
- )
-
- messages = [{"role": "user", "content": "Write a Python function to find all prime numbers up to n using the Sieve of Eratosthenes"}]
-
- inputs = tokenizer.apply_chat_template(
-     messages,
-     tokenize=True,
-     add_generation_prompt=True,
-     return_dict=True,
-     return_tensors="pt",
- ).to(model.device)
-
- outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
- response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
- print(response)
- ```
-
- ### vLLM (Recommended for Production)
-
- ```bash
- vllm serve zenlm/zen-coder-flash \
-     --tensor-parallel-size 4 \
-     --speculative-config.method mtp \
-     --speculative-config.num_speculative_tokens 1 \
-     --tool-call-parser glm47 \
-     --reasoning-parser glm45 \
-     --enable-auto-tool-choice
- ```
-
- ### SGLang
-
- ```bash
- python -m sglang.launch_server \
-     --model-path zenlm/zen-coder-flash \
-     --tp-size 4 \
-     --tool-call-parser glm47 \
-     --reasoning-parser glm45 \
-     --speculative-algorithm EAGLE \
-     --speculative-num-steps 3
- ```
-
- ### MLX (Apple Silicon)
-
- ```python
- from mlx_lm import load, generate
-
- model, tokenizer = load("zenlm/zen-coder-flash")
- response = generate(model, tokenizer, prompt="Write a Rust function for binary search", max_tokens=256)
- print(response)
- ```
-
- ## Capabilities
-
- ### Code Generation
- - 100+ programming languages
- - Framework-aware completions
- - Test generation
- - Documentation generation
-
- ### Debugging & Analysis
- - Bug detection and fixes
- - Code review
- - Performance optimization
- - Security analysis
-
- ### Software Engineering
- - Architecture design
- - API design
- - Refactoring suggestions
- - Migration assistance
-
- ### Tool Calling
- ```python
- # Native function calling support
- tools = [
-     {
-         "type": "function",
-         "function": {
-             "name": "run_tests",
-             "description": "Run test suite",
-             "parameters": {"type": "object", "properties": {}}
-         }
-     }
- ]
- ```
-
- ## Identity
-
- I am **Zen Coder Flash**, the flagship code-focused model in the Zen AI family. I combine GLM-4.7's cutting-edge MoE architecture with Zen's philosophy of clarity and efficiency. With 31 billion parameters (only 3B active per token) and 131K context, I deliver frontier coding capability that's practical to deploy.
-
- ## Training
-
- Zen Coder Flash is built through identity fine-tuning on GLM-4.7-Flash using MLX LoRA on Apple Silicon. The training emphasizes:
-
- - Zen identity and persona
- - Code-focused instruction following
- - Tool calling capabilities
- - Extended reasoning patterns
-
- ## Citation
-
- ```bibtex
- @misc{zen-coder-flash-2025,
-     title={Zen Coder Flash: Efficient Frontier Code Generation},
-     author={Hanzo AI},
-     year={2025},
-     url={https://huggingface.co/zenlm/zen-coder-flash}
- }
- ```
-
- ## Links
-
- - **Website**: [zenlm.org](https://zenlm.org)
- - **GitHub**: [zenlm/zen](https://github.com/zenlm/zen)
- - **Base Model**: [GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash)
- - **Organization**: [Hanzo AI](https://hanzo.ai)
-
- ## License
-
- MIT License - inherited from GLM-4.7-Flash base model.
-
- ---
-
- *Zen AI: Clarity Through Intelligence*
 
  ---
  language:
  - en
  - zh
  library_name: transformers
+ license: mit
  pipeline_tag: text-generation
  tags:
+ - mlx
  base_model: zai-org/GLM-4.7-Flash
  ---
+ ## 💫 Community Model> GLM-4.7-Flash by zai-org
+
+ _👾 [LM Studio](https://lmstudio.ai) Community models highlights program. Highlighting new & noteworthy models by the community. Join the conversation on [Discord](https://discord.gg/aPQfnNkxGC)_.
+
+ **Model creator**: [zai-org](https://huggingface.co/zai-org)<br>
+ **Original model**: [GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash)<br>
+ **MLX quantization**: provided by [LM Studio team](https://x.com/lmstudio) using [mlx_lm](https://github.com/ml-explore/mlx-lm)<br>
+
+ ## Technical Details
+
+ 8-bit quantized version of GLM-4.7-Flash using MLX, optimized for Apple Silicon.
+
+ ## Special thanks
+
+ 🙏 Special thanks to the [Apple Machine Learning Research](https://github.com/ml-explore) team for creating [MLX](https://github.com/ml-explore/mlx).
+
+ ## Disclaimers
+
+ LM Studio is not the creator, originator, or owner of any Model featured in the Community Model Program. Each Community Model is created and provided by third parties. LM Studio does not endorse, support, represent or guarantee the completeness, truthfulness, accuracy, or reliability of any Community Model. You understand that Community Models can produce content that might be offensive, harmful, inaccurate or otherwise inappropriate, or deceptive. Each Community Model is the sole responsibility of the person or entity who originated such Model. LM Studio may not monitor or control the Community Models and cannot, and does not, take responsibility for any such Model. LM Studio disclaims all warranties or guarantees about the accuracy, reliability or benefits of the Community Models. LM Studio further disclaims any warranty that the Community Model will meet your requirements, be secure, uninterrupted or available at any time or location, or error-free, viruses-free, or that any errors will be corrected, or otherwise. You will be solely responsible for any damage resulting from your use of or access to the Community Models, your downloading of any Community Model, or use of any other Community Model provided by or through LM Studio.
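The new README describes this repository as an 8-bit MLX quantization of a model the old card listed at 31B total / 3B active parameters. A rough back-of-the-envelope sketch (not part of the repository; the parameter counts come from the model card, and real quantized checkpoints carry extra overhead for scales, biases, and embeddings) shows why 8-bit roughly halves the bf16 checkpoint size:

```python
# Rough checkpoint-size arithmetic for an 8-bit quantization of a
# 31B-parameter MoE model (figures taken from the model card above).

TOTAL_PARAMS = 31e9   # total MoE parameters
ACTIVE_PARAMS = 3e9   # parameters active per token (affects compute, not disk size)

def checkpoint_bytes(params: float, bits_per_weight: int) -> float:
    """Approximate on-disk size of a checkpoint at a given weight precision."""
    return params * bits_per_weight / 8

bf16 = checkpoint_bytes(TOTAL_PARAMS, 16)  # 16-bit weights: ~62 GB
q8 = checkpoint_bytes(TOTAL_PARAMS, 8)     # 8-bit weights:  ~31 GB
print(f"bf16 ≈ {bf16 / 1e9:.0f} GB, 8-bit ≈ {q8 / 1e9:.0f} GB")
```

The ~31 GB estimate lines up with the combined size of the safetensors shards added in this commit.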
model-00001-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8a550dabf6e2789a9d704211d75c681a27dd9d75e037c468e6d3fe25e797dfc8
+ size 5176178595
model-00002-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:05a51988a3602965ea8f21e0766240c8d890321fc3e219adcaa3d8b6108bb327
+ size 5368050997
model-00003-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0d0ed3c08f419f5c7ad90e96933948eaf4cd5d3b410dd9b2a3ffeb652ce026e0
+ size 5187037498
model-00004-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cf1c40d389c2d327844f6c7b0597d9ad519b059887c98f503ec87b3d00014375
+ size 5187300215
model-00005-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:94c98e48509c361cfc93ce786d6d2b55270595e9452bd521506e9e664ff79ff6
+ size 5187300077
model-00006-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b4335a3cad7c45bdc5377c7c3a1a6f31f2a1ab9a9b2022419749d8d738d36343
+ size 5368051110
model-00007-of-00007.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc77ebb277c9f56a498eee9daefbe245464569f1517408961dfbde02ab653b3a
+ size 347059898
model.safetensors.index.json ADDED
The diff for this file is too large to render. See raw diff
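Each shard above is committed as a small git-LFS pointer file (a `version` line, a `sha256` object ID, and a byte size) rather than the binary itself. A small sketch (not part of the repository) can parse a pointer and total the seven shard sizes from this commit as a sanity check:

```python
# Parse a git-LFS pointer file and total the shard sizes from this commit.

def parse_lfs_pointer(text: str) -> dict:
    """Split a git-LFS pointer into its key/value fields, one per line."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# Pointer for model-00001-of-00007.safetensors, copied from the diff above.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8a550dabf6e2789a9d704211d75c681a27dd9d75e037c468e6d3fe25e797dfc8
size 5176178595
"""
fields = parse_lfs_pointer(pointer)
assert fields["version"] == "https://git-lfs.github.com/spec/v1"
assert int(fields["size"]) == 5176178595

# All seven shard sizes (bytes), copied from the diff above.
shard_sizes = [
    5176178595, 5368050997, 5187037498, 5187300215,
    5187300077, 5368051110, 347059898,
]
total = sum(shard_sizes)
print(f"total: {total} bytes ({total / 1e9:.2f} GB)")  # ~31.82 GB
```

The ~31.82 GB total is consistent with roughly 31B parameters stored at 8 bits per weight, matching the 8-bit MLX quantization described in the README.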