---
library_name: transformers
license: apache-2.0
pipeline_tag: image-text-to-text
tags:
- multimodal
- vision-language
- chat
---
# Rax 3.5 Chat
Rax 3.5 Chat is a compact 2B-parameter multimodal model for vision-language understanding and conversational AI. It accepts text and image inputs and supports a context window of up to 262,144 (262K) tokens.
## Model Details
- **Parameters**: ~2B
- **Context Length**: 262,144 tokens
- **Input Modalities**: Text + Images
- **Attention**: Hybrid linear + full attention (24 layers)
- **Vision Encoder**: 24-layer transformer with 1024 hidden size
- **Text Hidden Size**: 2048
- **Precision**: BFloat16
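
As a rough sizing sketch, the parameter count and precision listed above imply how much memory the weights alone occupy. The value `2e9` is the approximate count from the list, `weight_memory_gib` is a hypothetical helper name, and activations, the KV cache, and framework overhead are not included.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate the memory needed to hold the model weights, in GiB.

    bytes_per_param defaults to 2 because BFloat16 stores each value in 16 bits.
    """
    return num_params * bytes_per_param / 2**30


# ~2B parameters in BFloat16 (approximate count from the model details)
print(f"{weight_memory_gib(2e9):.2f} GiB")
```

Under these assumptions the weights need a little under 4 GiB, so total memory use in practice will be higher once the cache and activations are added.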
## Key Features
- **Multimodal Understanding**: Processes text and images together in a single conversation
- **Long Context**: Supports up to 262K tokens for extended conversations
- **Efficient Architecture**: Hybrid linear and full attention keeps long-sequence inference efficient
- **Production Ready**: Compatible with vLLM, SGLang, and Transformers
## Usage
### With Transformers
```python
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "raxcore-dev/rax-3.5-chat"
model = AutoModelForImageTextToText.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Text-only conversation
messages = [{"role": "user", "content": "What is the capital of France?"}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Slice off the prompt tokens so only the generated reply is printed
reply_ids = outputs[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(reply_ids, skip_special_tokens=True))

# With image: the {"type": "image"} entry marks where the image is placed in the prompt
image = Image.open("image.jpg")
messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe this image."}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, images=image, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
reply_ids = outputs[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(reply_ids, skip_special_tokens=True))
```
### With vLLM
```bash
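# --max-model-len caps the context window at 8192 tokens to reduce memory use; raise it for longer inputs
vllm serve raxcore-dev/rax-3.5-chat --port 8000 --max-model-len 8192
```
```python
from openai import OpenAI

# Any placeholder key works unless the server was started with --api-key
client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")
response = client.chat.completions.create(
    model="raxcore-dev/rax-3.5-chat",  # must match the name passed to `vllm serve`
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=512,
)
print(response.choices[0].message.content)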
```
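
Image inputs can be sent through the same OpenAI-compatible endpoint using `image_url` content parts. The sketch below shows one way to send a local file as a base64 data URL. The helper names `build_image_message` and `describe_image` and the file `image.jpg` are hypothetical, and the code assumes the vLLM server above is running.

```python
import base64


def build_image_message(image_bytes: bytes, prompt: str, mime_type: str = "image/jpeg") -> dict:
    """Build one OpenAI-style user message holding an inline image and a text prompt."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
            {"type": "text", "text": prompt},
        ],
    }


def describe_image(path: str, model: str, prompt: str = "Describe this image in one sentence.") -> str:
    """Send a local image to a running vLLM server and return the reply text."""
    # Imported here so the message builder above works without the openai package
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")
    with open(path, "rb") as f:
        message = build_image_message(f.read(), prompt)
    response = client.chat.completions.create(model=model, messages=[message], max_tokens=128)
    return response.choices[0].message.content
```

Call `describe_image("image.jpg", model=...)` with the same model name that was passed to `vllm serve`. A publicly reachable image can be referenced by putting its `https://` URL in the `url` field instead of a data URL.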
## Architecture Highlights
- **Hybrid Attention**: Alternates between linear attention and full attention layers for efficiency
- **Vision Encoder**: 24-layer transformer with patch size 16 and spatial merge 2x2
- **Efficient KV Cache**: 2 key-value heads for reduced memory footprint
- **Multi-resolution Position Embeddings**: Optimized for long-context understanding
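
The patch size and spatial merge listed above give a quick way to estimate how many tokens an image adds to the context. This is only the arithmetic implied by those two numbers. The actual processor may resize images or add special tokens, so treat the result as an approximation; `visual_token_count` is a hypothetical helper.

```python
def visual_token_count(height: int, width: int, patch_size: int = 16, merge: int = 2) -> int:
    """Count merged visual tokens for an image whose sides divide evenly into merged patches."""
    unit = patch_size * merge  # one merged token covers a 32x32 pixel area
    if height % unit or width % unit:
        raise ValueError(f"height and width must be multiples of {unit}")
    return (height // unit) * (width // unit)


print(visual_token_count(512, 512))   # 16 * 16 = 256
print(visual_token_count(768, 1024))  # 24 * 32 = 768
```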
## Best Practices
- Use temperature 0.6–0.8 for factual tasks, 0.8–1.0 for creative tasks
- For long context (>32K tokens), ensure sufficient GPU memory
- Pass `trust_remote_code=True` when loading the model with Transformers
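
To put the long-context memory advice in numbers, the sketch below estimates KV cache size for a single sequence. Only the 2 key-value heads, the 24-layer count, and BFloat16 precision come from this card. The head dimension of 128 is an assumed value, and treating all 24 layers as full attention gives an upper bound, since linear-attention layers typically keep a fixed-size state instead of a per-token cache.

```python
def kv_cache_gib(
    num_tokens: int,
    full_attention_layers: int = 24,  # upper bound: assumes every layer is full attention
    kv_heads: int = 2,                # from the architecture highlights
    head_dim: int = 128,              # assumption, not stated in this card
    bytes_per_value: int = 2,         # BFloat16
) -> float:
    """Estimate KV cache size in GiB for one sequence of num_tokens tokens."""
    # Factor of 2 accounts for storing both keys and values
    per_token_bytes = 2 * full_attention_layers * kv_heads * head_dim * bytes_per_value
    return num_tokens * per_token_bytes / 2**30


for n in (32_768, 262_144):
    print(f"{n:>7} tokens: {kv_cache_gib(n):.2f} GiB")
```

With these assumed values the cache is 0.75 GiB at 32K tokens and 6 GiB at the full 262,144-token window, on top of the weights and activations. Batching several sequences multiplies the figure accordingly.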
## Limitations
- At roughly 2B parameters, the model may handle complex reasoning less well than larger models
- Vision understanding is optimized for natural images
- Long-context inputs require substantial GPU memory
## License
Apache 2.0
## Citation
```bibtex
@misc{rax3.5chat,
title={Rax 3.5 Chat: Efficient Multimodal Assistant Model},
author={Raxcore},
year={2026}
}
```