zeekay committed
Commit e99f6d9 · verified · 1 parent: 807282c

Update README with Zen branding

Files changed (1):
  1. README.md +31 -72
README.md CHANGED

@@ -1,91 +1,50 @@
 ---
-license: apache-2.0
-tags:
-- vision-language
-- multimodal
-- function-calling
-- visual-agents
-- qwen3-vl
-- zen
-language:
-- en
-- multilingual
-base_model:
-- Qwen/Qwen3-VL-4B-Instruct
 library_name: transformers
 pipeline_tag: image-text-to-text
+tags:
+- vision-language
+- multimodal
+- zen
+- hanzo
+license: apache-2.0
 ---
 
-# Zen Vl 4B Agent
-
-Zen VL 4B Agent - Vision-language model with function calling and tool use capabilities
+# Zen VL 4B Agent
 
-## Model Details
+**Zen LM by Hanzo AI** — Compact vision-language agent for multimodal reasoning.
 
-- **Architecture**: Qwen3-VL
-- **Parameters**: 4B
-- **Context Window**: 256K tokens (expandable to 1M)
-- **License**: Apache 2.0
-- **Training**: Fine-tuned with Zen identity and function calling
+## Specs
 
-## Capabilities
+| Property | Value |
+|----------|-------|
+| Parameters | 4B |
+| Context Length | 32,768 tokens |
+| Architecture | Zen MoDE (Mixture of Distilled Experts) |
+| Task | Vision-Language / Agent |
 
-- 🎨 **Visual Understanding**: Image analysis, video comprehension, spatial reasoning
-- 📝 **OCR**: Text extraction in 32 languages
-- 🧠 **Multimodal Reasoning**: STEM, math, code generation
-- 🛠️ **Function Calling**: Tool use with visual context
-- 🤖 **Visual Agents**: GUI interaction, parameter extraction
-
-## Usage
+## API Access
 
 ```python
-from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
-from PIL import Image
+from openai import OpenAI
 
-# Load model
-model = Qwen3VLForConditionalGeneration.from_pretrained(
-    "zenlm/zen-vl-4b-agent",
-    device_map="auto"
+client = OpenAI(
+    base_url='https://api.hanzo.ai/v1',
+    api_key='your-api-key',
 )
-processor = AutoProcessor.from_pretrained("zenlm/zen-vl-4b-agent")
-
-# Process image
-image = Image.open("example.jpg")
-prompt = "What's in this image?"
 
-messages = [{"role": "user", "content": prompt}]
-text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
-inputs = processor(text=text, images=image, return_tensors="pt").to(model.device)
-
-# Generate
-outputs = model.generate(**inputs, max_new_tokens=256)
-response = processor.decode(outputs[0], skip_special_tokens=True)
-print(response)
-```
-
-## Links
-
-- 🌐 **Website**: [zenlm.org](https://zenlm.org)
-- 📚 **GitHub**: [zenlm/zen-vl](https://github.com/zenlm/zen-vl)
-- 📄 **Paper**: Coming soon
-- 🤗 **Model Family**: [zenlm](https://huggingface.co/zenlm)
-
-## Citation
-
-```bibtex
-@misc{zenvl2025,
-  title={Zen VL: Vision-Language Models with Integrated Function Calling},
-  author={Hanzo AI Team},
-  year={2025},
-  publisher={Zen Language Models},
-  url={https://github.com/zenlm/zen-vl}
-}
+response = client.chat.completions.create(
+    model='zen-vl-4b-agent',
+    messages=[{
+        'role': 'user',
+        'content': [
+            {'type': 'text', 'text': 'What is in this image?'},
+            {'type': 'image_url', 'image_url': {'url': 'https://example.com/image.jpg'}},
+        ],
+    }],
+)
+print(response.choices[0].message.content)
 ```
 
 ## License
 
 Apache 2.0
-
----
-
-Created by [Hanzo AI](https://hanzo.ai) for the Zen model family.
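
---

The API Access snippet in the new README passes an image by public URL. OpenAI-compatible endpoints generally also accept base64 data URLs in the same `image_url` field, which is how you would send a local file. Below is a minimal sketch of building such a payload; the endpoint and model name come from the README above, while the helper functions and file path are illustrative assumptions, not part of the Zen VL API:

```python
import base64

def image_to_data_url(path: str, mime: str = "image/jpeg") -> str:
    # Read the local image and encode it as a base64 data URL,
    # usable anywhere an http(s) image URL is accepted.
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

def build_vision_message(text: str, image_url: str) -> dict:
    # One user turn in the OpenAI-compatible multimodal message format:
    # a list mixing text parts and image_url parts.
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

# Hypothetical usage: the resulting message would be passed as
# messages=[msg] to client.chat.completions.create(model='zen-vl-4b-agent', ...).
msg = build_vision_message("What is in this image?", "data:image/jpeg;base64,...")
```

The data-URL route avoids hosting the image anywhere, at the cost of a larger request body (base64 inflates size by roughly a third).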