Instructions for using Quatfit/Quatfit-Mini with libraries and local apps.

Libraries

How to use Quatfit/Quatfit-Mini with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Quatfit/Quatfit-Mini", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Quatfit/Quatfit-Mini", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use Quatfit/Quatfit-Mini with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Quatfit/Quatfit-Mini"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Quatfit/Quatfit-Mini",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
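
The same request can also be sent from Python. The following is a minimal sketch using only the standard library. It assumes the vLLM server started above is listening on `localhost:8000`; the helper names `build_payload` and `chat` are illustrative and are not part of vLLM.

```python
import json
import urllib.request


def build_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style chat completion body with one text part and one image."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(payload: dict, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to /v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Example (requires a running server):
# payload = build_payload(
#     "Quatfit/Quatfit-Mini",
#     "Describe this image in one sentence.",
#     "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
# )
# print(chat(payload))
```

Passing a different `base_url` (for example `http://localhost:30000`) should let the same helper talk to any other server that exposes the same OpenAI-compatible route.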

Use Docker

docker model run hf.co/Quatfit/Quatfit-Mini

SGLang

How to use Quatfit/Quatfit-Mini with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Quatfit/Quatfit-Mini" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Quatfit/Quatfit-Mini",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Quatfit/Quatfit-Mini" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Quatfit/Quatfit-Mini",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner

How to use Quatfit/Quatfit-Mini with Docker Model Runner:
```
docker model run hf.co/Quatfit/Quatfit-Mini
```

You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

🚀 Quatfit Mini

Fast • Compact • Multimodal • Long Context • Agentic

📄 Technical Report

Quatfit Mini is an 8-billion-parameter multimodal foundation model developed by Quatfit AI Research.

Built for practical intelligence, Quatfit Mini combines advanced reasoning, multimodal understanding, coding capabilities, long-context processing, and agentic tool use in an efficient architecture optimized for real-world deployment.

Quatfit Mini supports a 131K-token context window and native vision and audio understanding, and reports up to about 4× faster inference than a standard 8B model (see Inference Speed), while remaining deployable on consumer hardware.

✨ Key Features

🧠 Native Multimodal Architecture
⚡ Up to 4× Faster Inference
📚 131K Token Context Window
💻 Strong Coding Performance
🖼️ Vision Understanding
🎙️ Audio Understanding
🤖 Agentic Tool Calling
🪶 Consumer GPU Optimized
🔥 GGUF Support
🌍 Multilingual

📊 Performance Highlights

Benchmark	Score
Overall Accuracy	89.08%
Coding	92.5%
Science	91.7%
Agentic Tasks	92.5%
CLI	95.0%
Exams	93.3%
Finance	90.0%
Social Intelligence	90.0%

🏗 Architecture

Quatfit Mini is built on the Quatfit 1 Architecture, engineered for efficient multimodal intelligence.

Language Model

Component	Value
Parameters	8B
Layers	42
Hidden Size	2560
Attention Heads	8
KV Heads	2
Shared KV Layers	18
Feed Forward	GeGLU
Precision	BF16
Vocabulary	262K
Context Length	131,072
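
To get a feel for what these values mean for long-context memory, the sketch below estimates the KV cache size at the full 131,072-token context. It is a back-of-the-envelope calculation, not a measurement. Two things are assumptions that the table does not state: the head dimension is taken as hidden size divided by attention heads, and "Shared KV Layers" is read as 18 layers that reuse another layer's cache. Sliding window attention, which would shrink the cache further, is ignored.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2: one tensor for keys and one for values in every caching layer.
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


LAYERS = 42
KV_HEADS = 2
SHARED_KV_LAYERS = 18
CONTEXT = 131_072
HEAD_DIM = 2560 // 8  # ASSUMPTION: hidden size / attention heads = 320

# Every layer keeps its own cache (BF16 = 2 bytes per value).
full = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)

# ASSUMPTION: the 18 shared layers store no cache of their own.
shared = kv_cache_bytes(LAYERS - SHARED_KV_LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)

print(f"{full / 2**30:.3f} GiB")    # 13.125 GiB
print(f"{shared / 2**30:.3f} GiB")  # 7.500 GiB
```

Under these assumptions, using 2 KV heads instead of 8 already cuts the cache to a quarter of the multi-head size, and KV sharing removes a further 18/42 of what remains.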

Vision Encoder

Vision Transformer
16 Transformer Layers
280 Visual Tokens
Patch Size: 16×16
Pan & Scan High-Resolution Support
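
Because images are encoded into a fixed number of tokens, it is straightforward to budget them against the context window. The sketch below assumes each image (or each crop, when Pan & Scan splits a high-resolution image) costs exactly 280 tokens; `crops_per_image` is a hypothetical parameter, since the number of crops Pan & Scan produces is not given here.

```python
CONTEXT_LENGTH = 131_072   # from the Language Model table
TOKENS_PER_IMAGE = 280     # from the Vision Encoder list


def remaining_text_tokens(num_images, max_new_tokens, crops_per_image=1):
    """Tokens left for text after reserving room for images and the reply."""
    image_tokens = num_images * crops_per_image * TOKENS_PER_IMAGE
    remaining = CONTEXT_LENGTH - image_tokens - max_new_tokens
    if remaining < 0:
        raise ValueError("request exceeds the context window")
    return remaining


print(remaining_text_tokens(num_images=4, max_new_tokens=512))  # 129440
```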

Audio Encoder

Conformer Architecture
12 Layers
Streaming Compatible
Causal Chunk Attention

⚡ Performance Optimizations

Quatfit Mini integrates multiple inference optimizations, including:

Flash Attention 3
Sliding Window Attention
Grouped Query Attention (GQA)
KV Cache Sharing
Speculative Decoding
GGUF Quantization

Inference Speed

Configuration	Relative Speed
Standard 8B Model	1×
Quatfit Mini BF16	2.5×
BF16 + Speculative Decoding	3.9×
GGUF Q4_K_M	4.1×
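
The relative speeds above can be turned into rough latency estimates. The sketch below is illustrative only: the baseline throughput of 20 tokens per second is a made-up figure, not a measurement from this model card, and real speedups depend on hardware and batch size.

```python
RELATIVE_SPEED = {
    "Standard 8B Model": 1.0,
    "Quatfit Mini BF16": 2.5,
    "BF16 + Speculative Decoding": 3.9,
    "GGUF Q4_K_M": 4.1,
}


def seconds_for(tokens, baseline_tokens_per_second, config):
    """Estimated time to generate `tokens` under the given configuration."""
    return tokens / (baseline_tokens_per_second * RELATIVE_SPEED[config])


def gain(config, over):
    """Speedup of one configuration relative to another."""
    return RELATIVE_SPEED[config] / RELATIVE_SPEED[over]


BASELINE_TPS = 20.0  # HYPOTHETICAL baseline throughput, for illustration

print(seconds_for(512, BASELINE_TPS, "Standard 8B Model"))   # 25.6
print(seconds_for(512, BASELINE_TPS, "Quatfit Mini BF16"))   # 10.24
print(round(gain("BF16 + Speculative Decoding", "Quatfit Mini BF16"), 2))  # 1.56
```

Read this way, the table implies that speculative decoding contributes about 1.56× on top of the BF16 configuration.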

📈 Benchmark Breakdown

Domain	Accuracy
Coding	92.5%
Science	91.7%
Agentic Tasks	92.5%
CLI	95.0%
Finance	90.0%
Security	90.0%
Reasoning	88.9%
Expert Knowledge	83.8%
Mathematics	81.3%
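
As a sanity check on how these numbers relate to the 89.08% overall accuracy, the sketch below averages the nine domains listed here and works out how many correct answers the overall figure would imply on the 815-question internal suite. It assumes every question is weighted equally, which the card does not state.

```python
DOMAIN_ACCURACY = {
    "Coding": 92.5,
    "Science": 91.7,
    "Agentic Tasks": 92.5,
    "CLI": 95.0,
    "Finance": 90.0,
    "Security": 90.0,
    "Reasoning": 88.9,
    "Expert Knowledge": 83.8,
    "Mathematics": 81.3,
}

# Unweighted mean of the nine domains shown in the table.
mean = sum(DOMAIN_ACCURACY.values()) / len(DOMAIN_ACCURACY)
print(f"{mean:.2f}%")  # 89.52%

# ASSUMPTION: overall accuracy = correct answers / total questions.
TOTAL_QUESTIONS = 815
implied_correct = round(0.8908 * TOTAL_QUESTIONS)
print(implied_correct)                              # 726
print(f"{implied_correct / TOTAL_QUESTIONS:.2%}")   # 89.08%
```

The unweighted mean differs from the overall score, which is expected: the overall figure covers 32 categories, of which only nine appear in this table.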

🚀 Quick Start

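from transformers import AutoProcessor, AutoModelForImageTextToText

# Load the weights in the dtype recorded for the checkpoint and place them
# automatically on the available devices.
model = AutoModelForImageTextToText.from_pretrained(
    "Quatfit/Quatfit-Mini",
    torch_dtype="auto",
    device_map="auto"
)

# The processor bundles the tokenizer with the model's input preprocessing.
processor = AutoProcessor.from_pretrained(
    "Quatfit/Quatfit-Mini"
)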

💬 Example

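messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Explain this image."
            },
            {
                "type": "image",
                "image": "example.jpg"
            }
        ]
    }
]

# Render the chat template, tokenize, and preprocess the image in one call.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the marker that starts the assistant turn
    tokenize=True,
    return_dict=True,  # return a dict so it can be unpacked into generate()
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512
)

# Decode only the newly generated tokens, not the echoed prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(processor.decode(outputs[0][prompt_length:], skip_special_tokens=True))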

💻 GGUF Support

Optimized GGUF builds are available for:

llama.cpp
Ollama
LM Studio
Jan
Open WebUI

Recommended Quantizations

Quantization	Approx. VRAM
Q4_K_M	~5 GB
Q5_K_M	~6 GB
Q6_K	~7 GB
Q8_0	~9 GB
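
One way to use this table is to pick the largest quantization that fits a given GPU. The sketch below is a minimal helper built on the approximate figures above; the 1 GB default headroom (for the KV cache and activations) is an assumption, and `pick_quantization` is an illustrative name rather than part of any tool listed here.

```python
# Approximate VRAM per quantization, copied from the table (in GB),
# ordered from smallest to largest.
APPROX_VRAM_GB = {"Q4_K_M": 5, "Q5_K_M": 6, "Q6_K": 7, "Q8_0": 9}


def pick_quantization(vram_gb, headroom_gb=1.0):
    """Return the largest listed quantization that fits, or None if none do."""
    budget = vram_gb - headroom_gb
    fitting = [name for name, need in APPROX_VRAM_GB.items() if need <= budget]
    # Dicts keep insertion order, so the last fitting entry is the largest.
    return fitting[-1] if fitting else None


print(pick_quantization(8))   # Q6_K
print(pick_quantization(12))  # Q8_0
print(pick_quantization(4))   # None
```

Long prompts need more headroom than short ones, so raising `headroom_gb` is a reasonable adjustment when working near the 131K context limit.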

🎯 Recommended Applications

Quatfit Mini is designed for practical AI systems, including:

AI Assistants
Agentic AI
Workflow Automation
Tool Calling
Research Copilots
Long-Document Analysis
OCR
Vision-Language Tasks
Audio Understanding
Information Retrieval
General Chat
MVP Software Development

📚 Training

Quatfit Mini was trained on approximately 10 trillion tokens, including:

Web Data
Programming Code
Mathematics
Scientific Literature
Wikipedia
Books
Multilingual Data
Image-Text Pairs
Audio Transcriptions

Post-training

Supervised Fine-Tuning (SFT)
Reinforcement Learning from Human Feedback (RLHF)
Constitutional AI Alignment

🌟 Core Strengths

✅ Agentic AI
✅ Long-Context Reasoning
✅ Tool Use
✅ Coding Assistance
✅ Vision Understanding
✅ Audio Understanding
✅ Scientific Knowledge
✅ Multilingual Intelligence

🎯 Intended Use

Quatfit Mini is an 8B multimodal foundation model primarily optimized for agentic AI applications.

It excels at:

Multi-step reasoning
Autonomous workflows
Tool orchestration
Long-context understanding
Research assistance
Document analysis
Vision-language tasks
Audio understanding
Productivity automation

While Quatfit Mini delivers strong programming performance, it is designed as a general-purpose reasoning model rather than a specialized coding model.

It performs well for:

Code generation
Debugging
API development
Script writing
Code explanation
MVP application development

⚠️ Limitations

Quatfit Mini prioritizes reasoning, multimodal intelligence, and agentic capabilities over benchmark-focused coding performance.

Although highly capable for everyday software development, it is not specifically optimized for:

Repository-scale software engineering
Competitive programming
Enterprise-scale refactoring
Performance-critical code synthesis

As with all foundation models, outputs should be reviewed before deployment in production or safety-critical environments.

📖 Citation

@article{quatfitmini2026,
  title={Quatfit Mini: A Compact Multimodal Foundation Model with Up to 4× Faster Inference},
  author={Quatfit AI Research},
  year={2026}
}

📜 License

Quatfit Mini is released under the Quatfit Non-Commercial License v1.

Commercial licensing is available through Quatfit AI Research.

🌍 Quatfit AI Research

Building practical AI systems that think, reason, create, and collaborate.

Performance First • Practical Intelligence • Open Innovation

⭐ If Quatfit Mini helps your work, consider starring the repository and sharing your projects with the community.

Safetensors

Model size: 8B params
Tensor type: F32

Evaluation results

Overall Accuracy on Internal Evaluation Suite (815 Questions / 32 Categories), self-reported: 89.08%