zenlm/zen4-coder-flash

@@ -1,46 +1,65 @@
 ---
-license: mit
-language:
-- en
-- zh
 tags:
-- zen
-- zen-lm
-- zenlm
-- code
-- moe
-- abliterated
 pipeline_tag: text-generation
 library_name: transformers
 ---
-# Zen Coder Flash 31B
-Code-focused MoE model from the [Zen LM](https://zenlm.org) family by Hanzo AI.
-**Architecture**: Zen MoDE (Mixture of Distilled Experts)
-**Parameters**: 31B total / 3B active (MoE)
-**Context**: 131,072 tokens
-**License**: MIT
-## Abliteration
-Zen Coder Flash 31B weights are derived from an abliterated base model. Abliteration removes
-refusal behavior by identifying and nullifying the "refusal direction" in the model's
-residual stream, as described in
-[Maxime Labonne's methodology](https://huggingface.co/blog/mlabonne/abliteration).
-**Method**: Directional ablation on the residual stream across all layers
-**Source**: Zen MoDE (Mixture of Distilled Experts) architecture base
-**Effect**: Removes refusal behaviors while preserving all other capabilities
-**Identity layer**: Zen identity added via system prompt — full LoRA fine-tuning planned
-Abliteration is a feature, not a limitation. It enables unrestricted research,
-security testing, and applications where safety guardrails are managed at the
-application layer rather than baked into model weights.
-## Links
-- **Website**: https://zenlm.org
-- **GitHub**: https://github.com/zenlm
-- **HuggingFace**: https://huggingface.co/zenlm

 ---
+language: en
+license: apache-2.0
 tags:
+  - text-generation
+  - zen
+  - zenlm
+  - hanzo
+  - zen4
+  - code
+  - coding
+  - fast
 pipeline_tag: text-generation
 library_name: transformers
 ---
+# Zen4 Coder Flash
+Ultra-fast Zen4 code generation model for real-time completions and low-latency coding.
+## Overview
+Built on the **Zen MoDE (Mixture of Distilled Experts)** architecture, with 8B parameters and a 64K-token context window.
+Developed by [Hanzo AI](https://hanzo.ai) and the [Zoo Labs Foundation](https://zoo.ngo).
+## Quick Start
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_id = "zenlm/zen4-coder-flash"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
+messages = [{"role": "user", "content": "Hello!"}]
+text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tokenizer([text], return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=512)
+print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
+```
+## API Access
+```bash
+curl https://api.hanzo.ai/v1/chat/completions \
+  -H "Authorization: Bearer $HANZO_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{"model": "zen4-coder-flash", "messages": [{"role": "user", "content": "Hello"}]}'
+```
+Get your API key at [console.hanzo.ai](https://console.hanzo.ai) — $5 free credit on signup.
+## Model Details
+| Attribute | Value |
+|-----------|-------|
+| Parameters | 8B |
+| Architecture | Zen MoDE |
+| Context | 64K tokens |
+| License | Apache 2.0 |
+## License
+Apache 2.0