Zen VL 4B Instruct

Compact 4B vision-language model for image understanding and multimodal instruction following.

Overview

Built on the Zen MoDE (Mixture of Distilled Experts) architecture, with 4B parameters and a 32K-token context window.

Developed by Hanzo AI and the Zoo Labs Foundation.

Quick Start

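from transformers import AutoModelForVision2Seq, AutoProcessor
from PIL import Image
import torch

model_id = "zenlm/zen-vl-4b-instruct"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# "example.jpg" is a placeholder; use any local image file.
image = Image.open("example.jpg").convert("RGB")

# Assumes the chat template accepts typed content parts: an image placeholder
# followed by the text prompt (as Qwen2-VL-style processors do).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Describe this image in detail."},
        ],
    }
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Drop the prompt tokens so only the newly generated answer is decoded.
generated = outputs[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(generated, skip_special_tokens=True)[0])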

API Access

from openai import OpenAI

client = OpenAI(base_url="https://api.hanzo.ai/v1", api_key="your-api-key")
response = client.chat.completions.create(
    model="zen-vl-4b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
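The request above sends text only. OpenAI-compatible chat APIs commonly accept images as a list of content parts, with the image passed as a base64 data URL. The sketch below assumes the Hanzo endpoint accepts that format for zen-vl-4b-instruct; the helper names `image_to_data_url` and `build_vision_messages` and the file `example.jpg` are hypothetical.

```python
import base64
from pathlib import Path


def image_to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_vision_messages(prompt: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> list:
    """Build one OpenAI-style user message holding an image part and a text part.

    Assumption: the endpoint accepts {"type": "image_url"} content parts.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_to_data_url(image_bytes, mime_type)}},
                {"type": "text", "text": prompt},
            ],
        }
    ]


if __name__ == "__main__":
    # "example.jpg" is a placeholder path; the request is only sent if it exists.
    path = Path("example.jpg")
    if path.exists():
        from openai import OpenAI

        client = OpenAI(base_url="https://api.hanzo.ai/v1", api_key="your-api-key")
        response = client.chat.completions.create(
            model="zen-vl-4b-instruct",
            messages=build_vision_messages("Describe this image in detail.", path.read_bytes()),
        )
        print(response.choices[0].message.content)
```

Encoding the image inline avoids hosting it at a public URL, at the cost of a request body roughly a third larger than the raw file.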

Model Details

Attribute      Value
Parameters     4B
Architecture   Zen MoDE
Context        32K tokens
License        Apache 2.0
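The 32K context window has to hold both the prompt (including whatever tokens the processor produces for images) and the generated tokens. A minimal budgeting sketch, assuming "32K" means 32,768 tokens, which the card does not state; the helper names are hypothetical.

```python
# Assumption: "32K" is read as 32,768 tokens; it could also mean 32,000.
CONTEXT_WINDOW = 32_768


def remaining_new_tokens(prompt_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Largest max_new_tokens that still fits after the prompt (0 if none)."""
    return max(context_window - prompt_tokens, 0)


def fits_in_context(prompt_tokens: int, max_new_tokens: int, context_window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus the requested generation stays within the window."""
    return prompt_tokens + max_new_tokens <= context_window


print(remaining_new_tokens(30_000))  # 2768
print(fits_in_context(32_000, 512))  # True (32,512 <= 32,768)
```

With the Transformers Quick Start, `inputs["input_ids"].shape[1]` gives the prompt length to pass as `prompt_tokens`.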

License

Apache 2.0

Weights

Format: Safetensors
Model size: 4B params
Tensor type: BF16
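Because the weights are stored as BF16 (2 bytes per parameter), a rough lower bound on the memory needed just to hold them is parameter count × bytes per parameter. The sketch treats "4B" as exactly 4e9 parameters and ignores activations, the KV cache, and framework overhead; `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


# Assumption: "4B" is treated as exactly 4e9 parameters for this estimate.
print(f"BF16: {weight_memory_gib(4e9, 2):.2f} GiB")  # BF16: 7.45 GiB
print(f"FP32: {weight_memory_gib(4e9, 4):.2f} GiB")  # FP32: 14.90 GiB
```

This is why loading with `torch_dtype=torch.bfloat16` roughly halves the weight footprint compared with FP32.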
