Instructions to use raxcore-dev/Rax-4.5 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use raxcore-dev/Rax-4.5 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="raxcore-dev/Rax-4.5")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("raxcore-dev/Rax-4.5")
model = AutoModelForMultimodalLM.from_pretrained("raxcore-dev/Rax-4.5")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
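
The slice in the final `print` is needed because, for decoder-style models, `generate` typically returns the prompt tokens followed by the newly generated ones; dropping the first `input_ids.shape[-1]` positions leaves only the reply. A minimal sketch of that indexing, with plain Python lists standing in for tensors (the token ids and the `strip_prompt` helper are hypothetical, for illustration only):

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the tokens that follow the prompt."""
    return output_ids[prompt_length:]


# Hypothetical ids: the first four belong to the prompt, the rest are the reply.
prompt_ids = [101, 7592, 2088, 102]
output_ids = prompt_ids + [2023, 2003, 1037]

new_tokens = strip_prompt(output_ids, len(prompt_ids))
print(new_tokens)
```

Decoding `new_tokens` instead of the full sequence keeps the chat template and the question out of the printed answer.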

Local Apps

vLLM

How to use raxcore-dev/Rax-4.5 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "raxcore-dev/Rax-4.5"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "raxcore-dev/Rax-4.5",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
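
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl payload above; the helper names `build_payload` and `chat` are made up for this example, and the response is assumed to follow the usual OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_payload(model, prompt, image_url):
    """Build an OpenAI-style chat request with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(base_url, payload, timeout=60):
    """POST the payload to <base_url>/v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


payload = build_payload(
    "raxcore-dev/Rax-4.5",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
# With the server from the previous step running:
# print(chat("http://localhost:8000", payload))
```

Pointing `chat` at `http://localhost:30000` should work the same way against the SGLang server, since both expose the same endpoint path.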

Use Docker

docker model run hf.co/raxcore-dev/Rax-4.5

SGLang

How to use raxcore-dev/Rax-4.5 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "raxcore-dev/Rax-4.5" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "raxcore-dev/Rax-4.5",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "raxcore-dev/Rax-4.5" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "raxcore-dev/Rax-4.5",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner
How to use raxcore-dev/Rax-4.5 with Docker Model Runner:
```
docker model run hf.co/raxcore-dev/Rax-4.5
```

raxder-ai committed 8 days ago

Commit 4465e83 (verified) · 1 parent: 7f957d0

Upload README.md with huggingface_hub

Files changed (1)

README.md +139 -57

@@ -5,105 +5,187 @@ pipeline_tag: image-text-to-text
 tags:
 - multimodal
 - vision-language
-- chat
 ---
-# Rax 3.5 Chat
-Rax 3.5 Chat is a compact 2B parameter multimodal model for vision-language understanding and conversational AI. It supports text and image inputs with extended context up to 262K tokens.
-## Model Details
-- **Parameters**: ~2B
-- **Context Length**: 262,144 tokens
-- **Input Modalities**: Text + Images
-- **Attention**: Hybrid linear + full attention (24 layers)
-- **Vision Encoder**: 24-layer transformer with 1024 hidden size
-- **Text Hidden Size**: 2048
-- **Precision**: BFloat16
-## Key Features
-- **Multimodal Understanding**: Processes text and images in unified reasoning
-- **Long Context**: Supports up to 262K tokens for extended conversations
-- **Efficient Architecture**: Hybrid attention mechanism for optimal performance
-- **Production Ready**: Compatible with vLLM, SGLang, and Transformers
-## Usage
-### With Transformers
-```python
 from transformers import AutoModelForVision2Seq, AutoProcessor
 from PIL import Image
-model = AutoModelForVision2Seq.from_pretrained("raxcore/Rax-3.5-Chat", trust_remote_code=True)
-processor = AutoProcessor.from_pretrained("raxcore/Rax-3.5-Chat", trust_remote_code=True)
-# Text-only conversation
-messages = [{"role": "user", "content": "What is the capital of France?"}]
 text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 inputs = processor(text=text, return_tensors="pt")
 outputs = model.generate(**inputs, max_new_tokens=512)
 print(processor.decode(outputs[0], skip_special_tokens=True))
-# With image
-image = Image.open("image.jpg")
-messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe this image."}]}]
 text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 inputs = processor(text=text, images=image, return_tensors="pt")
 outputs = model.generate(**inputs, max_new_tokens=512)
 print(processor.decode(outputs[0], skip_special_tokens=True))
-```
-### With vLLM
-```bash
-vllm serve raxcore/Rax-3.5-Chat --port 8000 --max-model-len 8192
-```
-```python
 from openai import OpenAI
 client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")
 response = client.chat.completions.create(
-    model="raxcore/Rax-3.5-Chat",
-    messages=[{"role": "user", "content": "Hello!"}],
     temperature=0.7,
-    max_tokens=512
 )
 print(response.choices[0].message.content)
-```
-## Architecture Highlights
-- **Hybrid Attention**: Alternates between linear attention and full attention layers for efficiency
-- **Vision Encoder**: 24-layer transformer with patch size 16 and spatial merge 2x2
-- **Efficient KV Cache**: 2 key-value heads for reduced memory footprint
-- **Multi-resolution Position Embeddings**: Optimized for long-context understanding
-## Best Practices
-- Use temperature 0.6–0.8 for factual tasks, 0.8–1.0 for creative tasks
-- For long context (>32K tokens), ensure sufficient GPU memory
-- Enable trust_remote_code when loading the model
-## Limitations
-- 2B parameters may limit complex reasoning compared to larger models
-- Vision understanding optimized for natural images
-- Long context requires significant memory resources
-## License
-Apache 2.0
-## Citation
-```bibtex
-@misc{rax3.5chat,
-  title={Rax 3.5 Chat: Efficient Multimodal Assistant Model},
   author={Raxcore},
-  year={2026}
 }
-```

 tags:
 - multimodal
 - vision-language
+- vision
+- image-to-text
+- llm
+- vision-language-model
+- computer-vision
+- deep-learning
+- pytorch
+- transformers
+- vlm
+- 2b
+- efficient
+- production
+inference: true
 ---
+# Rax 4.5 - Efficient 2B Vision Language Model | Multimodal AI
+**Rax 4.5** is a state-of-the-art 2 billion parameter multimodal vision-language model optimized for production use. Process images and text together with up to 262K token context length.
+## 🚀 Why Rax 4.5?
+- **⚡ Fast & Efficient**: Only 2B parameters for quick inference
+- **🖼️ Vision + Text**: True multimodal understanding of images and language
+- **📏 Long Context**: 262,144 token context window for complex tasks
+- **🔧 Production Ready**: Works with vLLM, SGLang, Transformers out of the box
+- **💾 Memory Efficient**: Hybrid attention architecture reduces VRAM usage
+## Model Specifications
+| Feature | Details |
+|---------|---------|
+| **Parameters** | ~2 Billion |
+| **Context Length** | 262,144 tokens |
+| **Input Types** | Text + Images |
+| **Architecture** | Hybrid Linear + Full Attention (24 layers) |
+| **Vision Encoder** | 24-layer ViT, 1024 hidden size |
+| **Text Hidden Size** | 2048 |
+| **Precision** | BFloat16 |
+| **License** | Apache 2.0 |
+## 🔥 Key Capabilities
+✅ **Image Understanding** - Analyze, describe, and answer questions about images
+✅ **Visual Question Answering** - Extract information from screenshots, documents, charts
+✅ **Multimodal Reasoning** - Combine visual and textual information for complex tasks
+✅ **Long Context Processing** - Handle extensive documents with visual elements
+✅ **Production Deployment** - Optimized for real-world applications
+## Quick Start
+### Installation
+\`\`\`bash
+pip install transformers pillow torch accelerate
+\`\`\`
+### Basic Usage with Transformers
+\`\`\`python
 from transformers import AutoModelForVision2Seq, AutoProcessor
 from PIL import Image
+# Load model
+model = AutoModelForVision2Seq.from_pretrained(
+    "raxcore-dev/rax-3.5-chat",
+    trust_remote_code=True
+)
+processor = AutoProcessor.from_pretrained(
+    "raxcore-dev/rax-3.5-chat",
+    trust_remote_code=True
+)
+# Text generation
+messages = [{"role": "user", "content": "Explain quantum computing"}]
 text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 inputs = processor(text=text, return_tensors="pt")
 outputs = model.generate(**inputs, max_new_tokens=512)
 print(processor.decode(outputs[0], skip_special_tokens=True))
+# Image analysis
+image = Image.open("photo.jpg")
+messages = [{
+    "role": "user",
+    "content": [
+        {"type": "image"},
+        {"type": "text", "text": "What's in this image? Be detailed."}
+    ]
+}]
 text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 inputs = processor(text=text, images=image, return_tensors="pt")
 outputs = model.generate(**inputs, max_new_tokens=512)
 print(processor.decode(outputs[0], skip_special_tokens=True))
+\`\`\`
+### Deploy with vLLM (High Performance)
+\`\`\`bash
+# Start vLLM server
+vllm serve raxcore-dev/rax-3.5-chat --port 8000 --max-model-len 8192
+\`\`\`
+\`\`\`python
 from openai import OpenAI
 client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")
 response = client.chat.completions.create(
+    model="raxcore-dev/rax-3.5-chat",
+    messages=[
+        {"role": "system", "content": "You are a helpful AI assistant."},
+        {"role": "user", "content": "Write a Python function to sort a list."}
+    ],
     temperature=0.7,
+    max_tokens=1024
 )
 print(response.choices[0].message.content)
+\`\`\`
+## 🏗️ Architecture Details
+- **Hybrid Attention Mechanism**: Alternates between linear and full attention for efficiency
+- **Vision Transformer**: 24-layer encoder with 16x16 patch size, 2x2 spatial merging
+- **Optimized KV Cache**: 2 key-value heads for 75% memory reduction
+- **Multi-Resolution Position Embeddings**: Handles various image sizes and long sequences
+- **Cross-Modal Fusion**: Advanced alignment between vision and language representations
+## 📊 Use Cases
+- **Document Analysis**: Extract data from invoices, receipts, forms
+- **Visual QA Systems**: Build AI that answers questions about images
+- **Content Moderation**: Analyze images with contextual understanding
+- **Educational Tools**: Explain diagrams, charts, and scientific images
+- **Accessibility**: Generate detailed image descriptions for visually impaired users
+- **E-commerce**: Product analysis and description generation
+- **Medical Imaging**: Assist with image interpretation (not diagnostic)
+## ⚙️ Performance Tips
+- **Temperature**: Use 0.6-0.8 for factual tasks, 0.8-1.0 for creative content
+- **Context Window**: For >32K tokens, ensure 24GB+ VRAM
+- **Batch Processing**: Process multiple images/texts together for efficiency
+- **Quantization**: Use 4-bit/8-bit quantization for lower memory footprint
+- **GPU Requirements**: Minimum 12GB VRAM (16GB recommended)
+## 🚨 Limitations
+- 2B parameters may struggle with highly complex reasoning vs larger models
+- Vision encoder optimized for natural images (not specialized medical/satellite imagery)
+- Long context (>100K tokens) requires significant GPU memory
+- Not fine-tuned for specific domains without additional training
+## 🤝 Model Comparison
+| Model | Params | Context | Multimodal | Speed |
+|-------|--------|---------|------------|-------|
+| **Rax 4.5** | 2B | 262K | ✅ | ⚡⚡⚡ |
+| LLaVA 1.5 | 7B | 4K | ✅ | ⚡⚡ |
+| GPT-4V | - | 128K | ✅ | ⚡ |
+| Qwen-VL | 7B | 32K | ✅ | ⚡⚡ |
+## 📖 Citation
+\`\`\`bibtex
+@misc{rax4.5,
+  title={Rax 4.5: Efficient Multimodal Vision-Language Model},
   author={Raxcore},
+  year={2026},
+  url={https://huggingface.co/raxcore-dev/rax-3.5-chat}
 }
+\`\`\`
+## 📄 License
+Apache 2.0 - Free for commercial and research use
+## 🔗 Links
+- [Model Card](https://huggingface.co/raxcore-dev/rax-3.5-chat)
+- [Raxcore GitHub](https://github.com/raxcore-dev)
+---
+**Keywords**: vision language model, multimodal AI, image to text, VLM, computer vision, transformers, efficient LLM, 2B parameters, long context, production AI, visual question answering, image understanding, open source AI model
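
The "2 key-value heads for 75% memory reduction" bullet in the new README can be sanity-checked with simple arithmetic. The sketch below assumes the baseline is 8 key-value heads, a head dimension of 256, and 6 full-attention layers that hold a cache. None of these three values is stated in the README, so treat them as placeholders. Only the 2 KV heads, the BFloat16 precision (2 bytes per value), and the 262,144-token context come from the model card.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Size of the key and value caches for one sequence, in bytes."""
    # The leading factor 2 covers the separate key and value tensors.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Assumptions, not README facts:
ASSUMED_FULL_ATTENTION_LAYERS = 6
ASSUMED_HEAD_DIM = 256
ASSUMED_BASELINE_KV_HEADS = 8

# From the model card:
KV_HEADS = 2
CONTEXT_LENGTH = 262_144

baseline = kv_cache_bytes(
    ASSUMED_FULL_ATTENTION_LAYERS, ASSUMED_BASELINE_KV_HEADS, ASSUMED_HEAD_DIM, CONTEXT_LENGTH
)
grouped = kv_cache_bytes(
    ASSUMED_FULL_ATTENTION_LAYERS, KV_HEADS, ASSUMED_HEAD_DIM, CONTEXT_LENGTH
)
reduction = 1 - grouped / baseline

print(f"baseline cache: {baseline / 2**30:.1f} GiB")
print(f"2-KV-head cache: {grouped / 2**30:.1f} GiB")
print(f"reduction: {reduction:.0%}")
```

The reduction depends only on the ratio of KV heads (2 versus the assumed 8), so it comes out the same whatever layer count or head dimension is plugged in; the absolute sizes, by contrast, scale directly with those assumptions.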