Instructions to use raxcore-dev/rax-3.5-chat with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use raxcore-dev/rax-3.5-chat with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="raxcore-dev/rax-3.5-chat")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("raxcore-dev/rax-3.5-chat")
model = AutoModelForImageTextToText.from_pretrained("raxcore-dev/rax-3.5-chat")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Local Apps
- vLLM
How to use raxcore-dev/rax-3.5-chat with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "raxcore-dev/rax-3.5-chat"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "raxcore-dev/rax-3.5-chat",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
Use Docker
```bash
docker model run hf.co/raxcore-dev/rax-3.5-chat
```
- SGLang
How to use raxcore-dev/rax-3.5-chat with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "raxcore-dev/rax-3.5-chat" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "raxcore-dev/rax-3.5-chat",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "raxcore-dev/rax-3.5-chat" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "raxcore-dev/rax-3.5-chat",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
- Docker Model Runner
How to use raxcore-dev/rax-3.5-chat with Docker Model Runner:
```bash
docker model run hf.co/raxcore-dev/rax-3.5-chat
```
---
library_name: transformers
license: apache-2.0
pipeline_tag: image-text-to-text
tags:
- multimodal
- vision-language
- chat
---
# Rax 3.5 Chat

Rax 3.5 Chat is a compact 2B-parameter multimodal model for vision-language understanding and conversational AI. It accepts text and image inputs and supports a context window of up to 262,144 tokens (262K).
## Model Details

- **Parameters**: ~2B
- **Context Length**: 262,144 tokens
- **Input Modalities**: Text + Images
- **Attention**: Hybrid linear + full attention (24 layers)
- **Vision Encoder**: 24-layer transformer with 1024 hidden size
- **Text Hidden Size**: 2048
- **Precision**: BFloat16
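
As a rough back-of-the-envelope check, the parameter count and precision listed above imply the size of the weights in memory. The sketch below is an illustration only: it assumes exactly 2 billion parameters (the card says "~2B") and counts weights alone, not activations or KV cache. The helper name `weight_memory_gib` is hypothetical.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for model weights, in GiB.

    BFloat16 stores each parameter in 2 bytes, hence the default.
    """
    return num_params * bytes_per_param / 1024**3


# Assumption: "~2B" is treated as exactly 2e9 parameters for illustration.
print(f"{weight_memory_gib(2e9):.2f} GiB")  # prints "3.73 GiB"
```

Actual memory use at inference time will be higher, since activations and the attention cache are not included here.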
## Key Features

- **Multimodal Understanding**: Reasons over text and images jointly
- **Long Context**: Supports up to 262K tokens for extended conversations
- **Efficient Architecture**: Hybrid attention mechanism that mixes linear and full attention layers for efficiency
- **Production Ready**: Compatible with vLLM, SGLang, and Transformers
## Usage

### With Transformers

```python
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

# AutoModelForImageTextToText replaces the deprecated AutoModelForVision2Seq
model = AutoModelForImageTextToText.from_pretrained("raxcore/Rax-3.5-Chat", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("raxcore/Rax-3.5-Chat", trust_remote_code=True)

# Text-only conversation
messages = [{"role": "user", "content": "What is the capital of France?"}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

# With image
image = Image.open("image.jpg")
messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe this image."}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=text, images=image, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
### With vLLM

```bash
vllm serve raxcore/Rax-3.5-Chat --port 8000 --max-model-len 8192
```

```python
from openai import OpenAI

# The API key is a placeholder; vLLM only checks it if the server was started with --api-key
client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")

response = client.chat.completions.create(
    model="raxcore/Rax-3.5-Chat",
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.7,
    max_tokens=512
)
print(response.choices[0].message.content)
```
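
The vLLM example above sends text only. The OpenAI-compatible chat format also accepts `image_url` content parts, and a local image can be embedded as a base64 data URL. The following is a minimal sketch of building such a message; the helper names `image_to_data_url` and `build_image_message` are hypothetical, and the four placeholder bytes stand in for the contents of a real image file.

```python
import base64


def image_to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_image_message(prompt: str, image_url: str) -> dict:
    """Build one user message with a text part and an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


# Placeholder bytes (a JPEG header prefix), not a real image.
# In practice, read them with open("image.jpg", "rb").read().
fake_image = b"\xff\xd8\xff\xe0"
message = build_image_message(
    "Describe this image in one sentence.",
    image_to_data_url(fake_image),
)
print(message["content"][1]["image_url"]["url"])
```

The resulting `message` can be passed as `messages=[message]` to `client.chat.completions.create(...)` in place of the text-only message.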
## Architecture Highlights

- **Hybrid Attention**: Alternates between linear attention and full attention layers for efficiency
- **Vision Encoder**: 24-layer transformer with patch size 16 and spatial merge 2x2
- **Efficient KV Cache**: 2 key-value heads for reduced memory footprint
- **Multi-resolution Position Embeddings**: Optimized for long-context understanding
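
To make the vision encoder numbers concrete, the patch size and spatial merge factor determine roughly how many tokens an image contributes to the context. The sketch below is an estimate under stated assumptions: the image has already been resized so both sides are multiples of 32 (patch size 16 times merge factor 2), and any special tokens the processor adds around the image are ignored. The real processor may resize images differently, and `vision_tokens` is a hypothetical helper.

```python
def vision_tokens(height: int, width: int, patch_size: int = 16, merge: int = 2) -> int:
    """Estimate the number of vision tokens after 2x2 spatial merging."""
    unit = patch_size * merge
    if height % unit or width % unit:
        raise ValueError(f"height and width must be multiples of {unit}")
    # Number of 16x16 patches covering the image
    patches = (height // patch_size) * (width // patch_size)
    # Each merge x merge group of patches becomes a single token
    return patches // (merge * merge)


# 448 / 16 = 28 patches per side, 28 * 28 = 784 patches, 784 / 4 = 196 tokens
print(vision_tokens(448, 448))  # prints 196
```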
## Best Practices

- Use temperature 0.6–0.8 for factual tasks, 0.8–1.0 for creative tasks
- For long context (>32K tokens), ensure sufficient GPU memory, or cap the context with `--max-model-len` when serving with vLLM
- Pass `trust_remote_code=True` when loading the model
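
The temperature guidance above can be captured in a small lookup. This is only a sketch: the table `TEMPERATURE_RANGES` and the helper `pick_temperature` are hypothetical names, and picking the midpoint of each range is an arbitrary choice, not a recommendation from the model authors.

```python
# Ranges taken from the guidance above
TEMPERATURE_RANGES = {
    "factual": (0.6, 0.8),
    "creative": (0.8, 1.0),
}


def pick_temperature(task: str) -> float:
    """Return the midpoint of the suggested temperature range for a task type."""
    low, high = TEMPERATURE_RANGES[task]
    return round((low + high) / 2, 2)


print(pick_temperature("factual"))   # prints 0.7
print(pick_temperature("creative"))  # prints 0.9
```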
## Limitations

- 2B parameters may limit complex reasoning compared to larger models
- Vision understanding is optimized for natural images
- Long context requires significant memory resources
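
To get a feel for the long-context memory cost, the size of the key-value cache can be estimated with the standard formula: 2 (keys and values) × full-attention layers × KV heads × head dimension × sequence length × bytes per element. The card gives 2 KV heads, BFloat16 precision, and a 262,144-token context. It does not state the head dimension or how many of the 24 layers use full attention, so the values `head_dim=128` and `full_attn_layers=6` below are assumptions chosen purely for illustration. The helper `kv_cache_gib` is hypothetical.

```python
def kv_cache_gib(
    seq_len: int,
    full_attn_layers: int,
    kv_heads: int = 2,        # from the model card
    head_dim: int = 128,      # ASSUMPTION: not stated in the model card
    bytes_per_elem: int = 2,  # BFloat16
) -> float:
    """Estimate KV cache size in GiB for one sequence."""
    # Factor 2 accounts for storing both keys and values
    total_bytes = 2 * full_attn_layers * kv_heads * head_dim * seq_len * bytes_per_elem
    return total_bytes / 1024**3


# ASSUMPTION: 6 full-attention layers, a made-up value for this example
print(kv_cache_gib(seq_len=262_144, full_attn_layers=6))  # prints 1.5
```

This estimate covers one sequence only and leaves out model weights, activations, and the state kept by linear attention layers, so total memory use will be higher.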
## License

Apache 2.0

## Citation

```bibtex
@misc{rax3.5chat,
  title={Rax 3.5 Chat: Efficient Multimodal Assistant Model},
  author={Raxcore},
  year={2026}
}
```