zeekay committed
Commit e99f6d9 · verified · 1 parent: 807282c

Update README with Zen branding

Files changed (1):
  1. README.md +31 -72
README.md CHANGED

@@ -1,91 +1,50 @@
 ---
-license: apache-2.0
-tags:
-- vision-language
-- multimodal
-- function-calling
-- visual-agents
-- qwen3-vl
-- zen
-language:
-- en
-- multilingual
-base_model:
-- Qwen/Qwen3-VL-4B-Instruct
 library_name: transformers
 pipeline_tag: image-text-to-text
+tags:
+- vision-language
+- multimodal
+- zen
+- hanzo
+license: apache-2.0
 ---
 
-# Zen Vl 4B Agent
-
-Zen VL 4B Agent - Vision-language model with function calling and tool use capabilities
+# Zen VL 4B Agent
 
-## Model Details
+**Zen LM by Hanzo AI** — Compact vision-language agent for multimodal reasoning.
 
-- **Architecture**: Qwen3-VL
-- **Parameters**: 4B
-- **Context Window**: 256K tokens (expandable to 1M)
-- **License**: Apache 2.0
-- **Training**: Fine-tuned with Zen identity and function calling
+## Specs
 
-## Capabilities
+| Property | Value |
+|----------|-------|
+| Parameters | 4B |
+| Context Length | 32,768 tokens |
+| Architecture | Zen MoDE (Mixture of Distilled Experts) |
+| Task | Vision-Language / Agent |
 
-- 🎨 **Visual Understanding**: Image analysis, video comprehension, spatial reasoning
-- 📝 **OCR**: Text extraction in 32 languages
-- 🧠 **Multimodal Reasoning**: STEM, math, code generation
-- 🛠️ **Function Calling**: Tool use with visual context
-- 🤖 **Visual Agents**: GUI interaction, parameter extraction
-
-## Usage
+## API Access
 
 ```python
-from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
-from PIL import Image
+from openai import OpenAI
 
-# Load model
-model = Qwen3VLForConditionalGeneration.from_pretrained(
-    "zenlm/zen-vl-4b-agent",
-    device_map="auto"
+client = OpenAI(
+    base_url='https://api.hanzo.ai/v1',
+    api_key='your-api-key',
 )
-processor = AutoProcessor.from_pretrained("zenlm/zen-vl-4b-agent")
-
-# Process image
-image = Image.open("example.jpg")
-prompt = "What's in this image?"
 
-messages = [{"role": "user", "content": prompt}]
-text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-inputs = processor(text=text, images=image, return_tensors="pt").to(model.device)
-
-# Generate
-outputs = model.generate(**inputs, max_new_tokens=256)
-response = processor.decode(outputs[0], skip_special_tokens=True)
-print(response)
-```
-
-## Links
-
-- 🌐 **Website**: [zenlm.org](https://zenlm.org)
-- 📚 **GitHub**: [zenlm/zen-vl](https://github.com/zenlm/zen-vl)
-- 📄 **Paper**: Coming soon
-- 🤗 **Model Family**: [zenlm](https://huggingface.co/zenlm)
-
-## Citation
-
-```bibtex
-@misc{zenvl2025,
-  title={Zen VL: Vision-Language Models with Integrated Function Calling},
-  author={Hanzo AI Team},
-  year={2025},
-  publisher={Zen Language Models},
-  url={https://github.com/zenlm/zen-vl}
-}
+response = client.chat.completions.create(
+    model='zen-vl-4b-agent',
+    messages=[{
+        'role': 'user',
+        'content': [
+            {'type': 'text', 'text': 'What is in this image?'},
+            {'type': 'image_url', 'image_url': {'url': 'https://example.com/image.jpg'}},
+        ],
+    }],
+)
+print(response.choices[0].message.content)
 ```
 
 ## License
 
 Apache 2.0
-
----
-
-Created by [Hanzo AI](https://hanzo.ai) for the Zen model family.
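
---

The API Access snippet in the new README passes an image by public URL. OpenAI-compatible endpoints generally also accept base64 data URLs in the same `image_url` field, which is how you would send a local file. Below is a minimal sketch of building such a payload; the endpoint and model name come from the README above, while the helper functions and file path are illustrative assumptions, not part of the Zen VL API:

```python
import base64

def image_to_data_url(path: str, mime: str = "image/jpeg") -> str:
    # Read the local image and encode it as a base64 data URL,
    # usable anywhere an http(s) image URL is accepted.
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

def build_vision_message(text: str, image_url: str) -> dict:
    # One user turn in the OpenAI-compatible multimodal message format:
    # a list mixing text parts and image_url parts.
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

# Hypothetical usage: the resulting message would be passed as
# messages=[msg] to client.chat.completions.create(model='zen-vl-4b-agent', ...).
msg = build_vision_message("What is in this image?", "data:image/jpeg;base64,...")
```

The data-URL route avoids hosting the image anywhere, at the cost of a larger request body (base64 inflates size by roughly a third).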