zeekay committed
Commit 2a46e3e · verified · Parent: d97598d

Update model card: add zen/zenlm tags, fix branding

Files changed (1):
  1. README.md +35 -198

README.md CHANGED
@@ -1,227 +1,64 @@
  ---
- license: mit
- language:
- - en
- - zh
- library_name: transformers
- pipeline_tag: text-generation
  tags:
- - zen
- - code
- - moe
- - coding
- - programming
- - software-engineering
- base_model: zenlm/zen-coder-flash
- model-index:
- - name: zen-coder-flash
-   results:
-   - task:
-       type: text-generation
-       name: Code Generation
-     dataset:
-       name: SWE-bench Verified
-       type: swe-bench
-     metrics:
-     - type: accuracy
-       value: 59.2
-       name: SWE-bench Verified
-   - task:
-       type: text-generation
-       name: Mathematical Reasoning
-     dataset:
-       name: AIME 2025
-       type: aime
-     metrics:
-     - type: accuracy
-       value: 91.6
-       name: AIME 2025
  ---

- # Zen Coder Flash
-
- <div align="center">
- <img src="https://zenlm.org/logo.png" alt="Zen AI" width="200"/>
-
- **The Flagship Zen Coder Model**

- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
- [![HuggingFace](https://img.shields.io/badge/🤗-zenlm%2Fzen--coder--flash-blue)](https://huggingface.co/zenlm/zen-coder-flash)
- </div>

  ## Overview

- **Zen Coder Flash** is the flagship code-focused model in the Zen AI family. Built on a cutting-edge Mixture of Experts architecture, it delivers frontier coding performance with practical efficiency.
-
- | Attribute | Value |
- |-----------|-------|
- | **Parameters** | 31B total / 3B active (MoE) |
- | **Context Length** | 131,072 tokens |
- | **Architecture** | Mixture of Experts (MoE) |
- | **License** | MIT |
- | **Languages** | 100+ programming languages |
-
- ## Why Zen Coder Flash?
-
- - **59.2% on SWE-bench Verified**: nearly **3x better** than comparable models at real coding tasks
- - **Efficient MoE**: 31B parameters, but only 3B active per token
- - **131K context**: handle entire codebases in a single prompt
- - **Native tool calling**: built-in function execution support
- - **Reasoning mode**: extended chain-of-thought for complex problems

- ## Performance
-
- | Benchmark | Score | Improvement |
- |-----------|-------|-------------|
- | SWE-bench Verified | **59.2%** | +37.2% (2.7x) |
- | AIME 2025 | **91.6%** | +6.6% |
- | GPQA | **75.2%** | +1.8% |
- | τ²-Bench | **79.5%** | +30.5% |
-
- ## Zen Coder Family
-
- | Tier | Model | Parameters | Active | Use Case |
- |------|-------|------------|--------|----------|
- | Small | [zen-coder-4b](https://huggingface.co/zenlm/zen-coder) | 4B | 4B | Edge/mobile |
- | **Flagship** | **zen-coder-flash** | **31B MoE** | **3B** | **Balanced** |
- | Max | [zen-max](https://huggingface.co/zenlm/zen-max) | 671B MoE | 14B | Frontier |

  ## Quick Start

- ### Transformers
-
  ```python
- import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "zenlm/zen-coder-flash"
-
  tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(
-     model_id,
-     torch_dtype=torch.bfloat16,
-     device_map="auto",
- )
-
- messages = [{"role": "user", "content": "Write a Python function to find all prime numbers up to n using the Sieve of Eratosthenes"}]
-
- inputs = tokenizer.apply_chat_template(
-     messages,
-     tokenize=True,
-     add_generation_prompt=True,
-     return_dict=True,
-     return_tensors="pt",
- ).to(model.device)
-
- outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
- response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
- print(response)
- ```

- ### vLLM (Recommended for Production)
-
- ```bash
- vllm serve zenlm/zen-coder-flash \
-     --tensor-parallel-size 4 \
-     --speculative-config.method mtp \
-     --speculative-config.num_speculative_tokens 1 \
-     --tool-call-parser zen-coder \
-     --reasoning-parser zen-coder \
-     --enable-auto-tool-choice
  ```

- ### SGLang

  ```bash
- python -m sglang.launch_server \
-     --model-path zenlm/zen-coder-flash \
-     --tp-size 4 \
-     --tool-call-parser zen-coder \
-     --reasoning-parser zen-coder \
-     --speculative-algorithm EAGLE \
-     --speculative-num-steps 3
- ```
-
- ### MLX (Apple Silicon)
-
- ```python
- from mlx_lm import load, generate
-
- model, tokenizer = load("zenlm/zen-coder-flash")
- response = generate(model, tokenizer, prompt="Write a Rust function for binary search", max_tokens=256)
- print(response)
- ```
-
- ## Capabilities
-
- ### Code Generation
- - 100+ programming languages
- - Framework-aware completions
- - Test generation
- - Documentation generation
-
- ### Debugging & Analysis
- - Bug detection and fixes
- - Code review
- - Performance optimization
- - Security analysis
-
- ### Software Engineering
- - Architecture design
- - API design
- - Refactoring suggestions
- - Migration assistance
-
- ### Tool Calling
- ```python
- # Native function calling support
- tools = [
-     {
-         "type": "function",
-         "function": {
-             "name": "run_tests",
-             "description": "Run test suite",
-             "parameters": {"type": "object", "properties": {}}
-         }
-     }
- ]
- ```
-
- ## Identity
-
- I am **Zen Coder Flash**, the flagship code-focused model in the Zen AI family. I combine a cutting-edge MoE architecture with Zen's philosophy of clarity and efficiency. With 31 billion parameters (only 3B active per token) and 131K context, I deliver frontier coding capability that's practical to deploy.
-
- ## Training
-
- Zen Coder Flash is built through identity fine-tuning using MLX LoRA on Apple Silicon. The training emphasizes:
-
- - Zen identity and persona
- - Code-focused instruction following
- - Tool calling capabilities
- - Extended reasoning patterns
-
- ## Citation
-
- ```bibtex
- @misc{zen-coder-flash-2025,
-   title={Zen Coder Flash: Efficient Frontier Code Generation},
-   author={Hanzo AI},
-   year={2025},
-   url={https://huggingface.co/zenlm/zen-coder-flash}
- }
  ```

- ## Links

- - **Website**: [zenlm.org](https://zenlm.org)
- - **GitHub**: [zenlm/zen](https://github.com/zenlm/zen)
- - **Organization**: [Hanzo AI](https://hanzo.ai)

  ## License

- MIT License
-
- ---
-
- *Zen AI: Clarity Through Intelligence*

  ---
+ language: en
+ license: apache-2.0
  tags:
+ - text-generation
+ - zen
+ - zenlm
+ - hanzo
+ - code
+ - coding
+ - fast
+ pipeline_tag: text-generation
+ library_name: transformers
  ---

+ # Zen Coder Flash

+ Ultra-fast, compact code-generation model optimized for real-time completions.

  ## Overview

+ Built on the **Zen MoDE (Mixture of Distilled Experts)** architecture, with 4B parameters and a 64K context window.

+ Developed by [Hanzo AI](https://hanzo.ai) and the [Zoo Labs Foundation](https://zoo.ngo).

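The card does not spell out how MoDE routes tokens, but the core idea behind any mixture-of-experts design is that a router picks a few experts per token and mixes their outputs. A rough, stdlib-only sketch of top-k routing (illustrative only, not Zen's implementation):

```python
# Illustrative top-k expert routing, the basic mechanism behind
# mixture-of-experts designs. Hypothetical sketch, not the model's code.
import math

def route(router_logits, k=2):
    """Pick the top-k experts for a token and softmax-normalize their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    exps = [math.exp(router_logits[i]) for i in top]
    total = sum(exps)
    # Only these k experts run for this token; weights sum to 1.
    return [(i, e / total) for i, e in zip(top, exps)]

experts = route([0.1, 2.0, -1.0, 1.3], k=2)
print([i for i, _ in experts])  # -> [1, 3]
```

This is why a model can have many total parameters while only a small subset is active per token: the other experts are simply never evaluated.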
  ## Quick Start

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "zenlm/zen-coder-flash"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

+ messages = [{"role": "user", "content": "Hello!"}]
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer([text], return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=512)
+ print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
  ```

+ ## API Access

  ```bash
+ curl https://api.hanzo.ai/v1/chat/completions \
+     -H "Authorization: Bearer $HANZO_API_KEY" \
+     -H "Content-Type: application/json" \
+     -d '{"model": "zen-coder-flash", "messages": [{"role": "user", "content": "Hello"}]}'
  ```

+ Get your API key at [console.hanzo.ai](https://console.hanzo.ai) — $5 free credit on signup.
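The same request can be built from Python with only the standard library. A minimal sketch, assuming the endpoint is OpenAI-compatible as the curl payload above suggests (URL, model name, and the `HANZO_API_KEY` variable are taken from that example):

```python
# Build the chat-completions request shown in the curl example above.
# Assumes an OpenAI-compatible API; sending is left commented out.
import json
import os
import urllib.request

def build_request(prompt, model="zen-coder-flash",
                  url="https://api.hanzo.ai/v1/chat/completions"):
    payload = {"model": model,
               "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('HANZO_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Hello")
print(json.loads(req.data)["model"])  # -> zen-coder-flash

# Uncomment to actually send (requires a valid HANZO_API_KEY):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```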

+ ## Model Details

+ | Attribute | Value |
+ |-----------|-------|
+ | Parameters | 4B |
+ | Architecture | Zen MoDE |
+ | Context | 64K tokens |
+ | License | Apache 2.0 |

  ## License
+ Apache 2.0