
🚀 Quatfit Mini

Fast • Compact • Multimodal • Long Context • Agentic

📄 Technical Report


Quatfit Mini is an 8-billion-parameter multimodal foundation model developed by Quatfit AI Research.

Built for practical intelligence, Quatfit Mini combines advanced reasoning, multimodal understanding, coding capabilities, long-context processing, and agentic tool use in an efficient architecture optimized for real-world deployment.

With a 131,072-token (131K) context window, native vision and audio understanding, and up to 4× faster inference than a standard 8B model (4.1× with GGUF Q4_K_M quantization), Quatfit Mini delivers frontier-level capabilities while remaining accessible on consumer hardware.


✨ Key Features

  • 🧠 Native Multimodal Architecture
  • ⚡ Up to 4× Faster Inference
  • 📚 131K Token Context Window
  • 💻 Strong Coding Performance
  • 🖼️ Vision Understanding
  • 🎙️ Audio Understanding
  • 🤖 Agentic Tool Calling
  • 🪶 Consumer GPU Optimized
  • 🔥 GGUF Support
  • 🌍 Multilingual

📊 Performance Highlights

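| Benchmark | Score |
|---|---|
| Overall Accuracy | 89.08% |
| Coding | 92.5% |
| Science | 91.7% |
| Agentic Tasks | 92.5% |
| CLI | 95.0% |
| Exams | 93.3% |
| Finance | 90.0% |
| Social Intelligence | 90.0% |

Scores are self-reported on an internal evaluation suite of 815 questions across 32 categories.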

🏗 Architecture

Quatfit Mini is built on the Quatfit 1 Architecture, engineered for efficient multimodal intelligence.

Language Model

| Component | Value |
|---|---|
| Parameters | 8B |
| Layers | 42 |
| Hidden Size | 2560 |
| Attention Heads | 8 |
| KV Heads | 2 |
| Shared KV Layers | 18 |
| Feed Forward | GeGLU |
| Precision | BF16 |
| Vocabulary | 262K tokens |
| Context Length | 131,072 tokens |
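The feed-forward blocks use GeGLU, a gated linear unit with a GELU gate. Below is a minimal pure-Python sketch of the standard formulation, down(GELU(gate(x)) ⊙ up(x)), assuming bias-free gate/up/down projections. The toy weights and dimensions are illustrative only and are not Quatfit Mini's.

```python
import math
from typing import List

Matrix = List[List[float]]


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with the Gaussian CDF computed via erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def matvec(matrix: Matrix, vector: List[float]) -> List[float]:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def geglu_ffn(x: List[float], w_gate: Matrix, w_up: Matrix, w_down: Matrix) -> List[float]:
    """GeGLU feed-forward: W_down @ (GELU(W_gate @ x) * (W_up @ x))."""
    gate = [gelu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]  # element-wise gating
    return matvec(w_down, hidden)


# Toy sizes: model dim 2, intermediate dim 3 (the real model uses hidden size 2560).
x = [1.0, -0.5]
w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_up = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
w_down = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(geglu_ffn(x, w_gate, w_up, w_down))
```

The gate branch decides, per intermediate unit, how much of the linear "up" projection passes through, which is what distinguishes GeGLU from a plain GELU MLP.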

Vision Encoder

  • Vision Transformer
  • 16 Transformer Layers
  • 280 Visual Tokens
  • Patch Size: 16×16
  • Pan & Scan High-Resolution Support
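To illustrate the patch arithmetic, here is a hedged sketch that counts 16×16 patches and, for large images, splits the image into a grid of crops in the spirit of Pan & Scan. The crop size (896 px), the crop budget (4), and the tiling rule are hypothetical choices for illustration; the card does not specify Quatfit Mini's actual cropping algorithm or how patches are reduced to 280 visual tokens.

```python
import math
from typing import List, Tuple

PATCH_SIZE = 16  # from the model card: 16x16 patches
CROP_SIZE = 896  # hypothetical crop resolution, not from the model card
MAX_CROPS = 4    # hypothetical cap on the number of crops

Box = Tuple[int, int, int, int]  # (top, left, bottom, right)


def patch_grid(height: int, width: int, patch: int = PATCH_SIZE) -> Tuple[int, int]:
    """Patch rows and columns covering an image (partial edge patches dropped)."""
    return height // patch, width // patch


def pan_and_scan_boxes(height: int, width: int, crop: int = CROP_SIZE,
                       max_crops: int = MAX_CROPS) -> List[Box]:
    """Split an image into an evenly spaced grid of crop boxes.

    Images no larger than one crop yield a single full-image box.
    """
    rows = max(1, math.ceil(height / crop))
    cols = max(1, math.ceil(width / crop))
    # Shrink the grid along its larger dimension until it fits the crop budget.
    while rows * cols > max_crops:
        if rows >= cols:
            rows -= 1
        else:
            cols -= 1
    boxes = []
    for r in range(rows):
        for c in range(cols):
            top, bottom = r * height // rows, (r + 1) * height // rows
            left, right = c * width // cols, (c + 1) * width // cols
            boxes.append((top, left, bottom, right))
    return boxes


# A tall 1792x896 image becomes two square crops.
for top, left, bottom, right in pan_and_scan_boxes(1792, 896):
    rows, cols = patch_grid(bottom - top, right - left)
    print((top, left, bottom, right), rows * cols, "patches")
```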

Audio Encoder

  • Conformer Architecture
  • 12 Layers
  • Streaming Compatible
  • Causal Chunk Attention
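Causal chunk attention lets a streaming encoder process audio in fixed-size chunks: each frame attends to every frame in its own chunk and in earlier chunks, but never to later chunks. A minimal sketch of such a mask follows; the chunk size and the optional `left_chunks` limit on visible history are illustrative parameters, not values from the model card.

```python
from typing import List, Optional


def chunk_causal_mask(num_frames: int, chunk_size: int,
                      left_chunks: Optional[int] = None) -> List[List[bool]]:
    """mask[i][j] is True when frame i may attend to frame j.

    A frame sees its whole chunk (including later frames in the same chunk)
    plus earlier chunks; left_chunks optionally caps how far back it looks.
    """
    mask = []
    for i in range(num_frames):
        query_chunk = i // chunk_size
        row = []
        for j in range(num_frames):
            key_chunk = j // chunk_size
            visible = key_chunk <= query_chunk
            if left_chunks is not None:
                visible = visible and (query_chunk - key_chunk) <= left_chunks
            row.append(visible)
        mask.append(row)
    return mask


for row in chunk_causal_mask(num_frames=8, chunk_size=4):
    print("".join("x" if v else "." for v in row))
```

This prints four rows of `xxxx....` followed by four rows of `xxxxxxxx`. Because no frame looks ahead beyond its own chunk, a chunk's encoder outputs are final as soon as that chunk has arrived, which is what makes streaming possible.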

⚡ Performance Optimizations

Quatfit Mini integrates multiple inference optimizations, including:

  • FlashAttention-3
  • Sliding Window Attention
  • Grouped Query Attention (GQA)
  • KV Cache Sharing
  • Speculative Decoding
  • GGUF Quantization
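GQA and KV cache sharing both shrink the memory needed to cache keys and values during long-context generation. The back-of-the-envelope sketch below uses the architecture table's values (42 layers, 8 attention heads, 2 KV heads, 18 shared KV layers, BF16, 131,072-token context). It assumes a head dimension of 2560 / 8 = 320 and that the 18 shared layers reuse other layers' caches instead of storing their own; neither detail is stated in the card. Sliding window attention would reduce memory further and is ignored here.

```python
def kv_cache_bytes(context: int, cached_layers: int, kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes to cache K and V for `context` tokens (factor 2 = keys + values)."""
    per_token_per_layer = 2 * kv_heads * head_dim * bytes_per_value
    return context * cached_layers * per_token_per_layer


CONTEXT = 131_072
LAYERS = 42
SHARED_KV_LAYERS = 18  # assumption: these layers store no cache of their own
HEAD_DIM = 2560 // 8   # assumption: hidden size / attention heads
GIB = 1024 ** 3

configs = {
    "MHA, no sharing (8 KV heads, 42 layers)": (LAYERS, 8),
    "GQA, no sharing (2 KV heads, 42 layers)": (LAYERS, 2),
    "GQA + KV sharing (2 KV heads, 24 layers)": (LAYERS - SHARED_KV_LAYERS, 2),
}
for name, (layers, kv_heads) in configs.items():
    size = kv_cache_bytes(CONTEXT, layers, kv_heads, HEAD_DIM)  # BF16 = 2 bytes
    print(f"{name}: {size / GIB:.2f} GiB")
```

Under these assumptions the full-context cache comes to about 52.5, 13.1, and 7.5 GiB respectively: GQA cuts it 4× relative to full multi-head attention (8 to 2 KV heads), and sharing 18 of 42 layers removes a further ~43%.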

Inference Speed

| Configuration | Relative Speed |
|---|---|
| Standard 8B Model (baseline) | 1.0× |
| Quatfit Mini BF16 | 2.5× |
| Quatfit Mini BF16 + Speculative Decoding | 3.9× |
| Quatfit Mini GGUF Q4_K_M | 4.1× |
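Speculative decoding speeds up generation by letting a cheap draft model propose several tokens that the full model then verifies. The card does not describe Quatfit Mini's draft model or acceptance rule, so this is a generic greedy-verification sketch: `draft_next` and `target_next` are hypothetical stand-ins for the two models, and a real implementation verifies all drafted positions in one batched forward pass rather than one call per token.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # maps a token prefix to the greedy next token


def speculative_generate(prompt: List[int], draft_next: NextToken,
                         target_next: NextToken, num_new: int,
                         lookahead: int = 4) -> List[int]:
    """Greedy speculative decoding; output equals greedy decoding with target_next."""
    tokens = list(prompt)
    target_len = len(prompt) + num_new
    while len(tokens) < target_len:
        # 1. The draft model proposes `lookahead` tokens autoregressively.
        draft: List[int] = []
        for _ in range(lookahead):
            draft.append(draft_next(tokens + draft))
        # 2. The target model keeps the longest agreeing prefix of the draft.
        accepted: List[int] = []
        for proposal in draft:
            expected = target_next(tokens + accepted)
            if proposal != expected:
                accepted.append(expected)  # target's correction replaces the mismatch
                break
            accepted.append(proposal)
        else:
            # Every proposal matched: verification also yields one bonus token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:target_len]


def greedy_generate(prompt: List[int], next_token: NextToken, num_new: int) -> List[int]:
    """Reference: plain token-by-token greedy decoding."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(next_token(tokens))
    return tokens


# Toy models over digits: the draft is wrong whenever the last token is 5.
target = lambda seq: (seq[-1] + 1) % 10
draft = lambda seq: 0 if seq[-1] == 5 else (seq[-1] + 1) % 10
print(speculative_generate([0], draft, target, num_new=12))
print(speculative_generate([0], draft, target, 12) == greedy_generate([0], target, 12))
```

The output is identical to greedy decoding with the target model; the speedup comes from accepting several drafted tokens per expensive verification step.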

📈 Benchmark Breakdown

| Domain | Accuracy |
|---|---|
| Coding | 92.5% |
| Science | 91.7% |
| Agentic Tasks | 92.5% |
| CLI | 95.0% |
| Finance | 90.0% |
| Security | 90.0% |
| Reasoning | 88.9% |
| Expert Knowledge | 83.8% |
| Mathematics | 81.3% |

🚀 Quick Start

```python
from transformers import AutoProcessor, AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained(
    "Quatfit/Quatfit-Mini",
    torch_dtype="auto",
    device_map="auto"
)

processor = AutoProcessor.from_pretrained(
    "Quatfit/Quatfit-Mini"
)
```

💬 Example

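```python
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Explain this image."},
            {"type": "image", "image": "example.jpg"},
        ],
    }
]

# return_dict=True makes **inputs unpack into input_ids, attention_mask, pixel values.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn prefix
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(processor.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```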

💻 GGUF Support

Optimized GGUF builds are available for:

  • llama.cpp
  • Ollama
  • LM Studio
  • Jan
  • Open WebUI

Recommended Quantizations

| Quantization | Approx. VRAM |
|---|---|
| Q4_K_M | ~5 GB |
| Q5_K_M | ~6 GB |
| Q6_K | ~7 GB |
| Q8_0 | ~9 GB |
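The VRAM figures follow roughly from bits per weight × parameter count. A rough sketch, counting weights only: Q6_K and Q8_0 have fixed llama.cpp block layouts (6.5625 and 8.5 bits per weight), while the Q4_K_M and Q5_K_M values are approximate assumptions because those "_M" mixes vary by model. The KV cache, activations, and runtime overhead add to the real footprint.

```python
# Average bits per weight for llama.cpp quantization types.
# Q6_K and Q8_0 are exact block layouts; the K-quant "_M" mixes are approximate.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,   # approximate
    "Q5_K_M": 5.7,    # approximate
    "Q6_K": 6.5625,   # 210 bytes per 256 weights
    "Q8_0": 8.5,      # 34 bytes per 32 weights
}

PARAMS = 8e9  # nominal parameter count from the model card


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{weight_gb(PARAMS, bpw):.1f} GB of weights")
```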

🎯 Recommended Applications

Quatfit Mini is designed for practical AI systems, including:

  • AI Assistants
  • Agentic AI
  • Workflow Automation
  • Tool Calling
  • Research Copilots
  • Long-Document Analysis
  • OCR
  • Vision-Language Tasks
  • Audio Understanding
  • Information Retrieval
  • General Chat
  • MVP Software Development

📚 Training

Quatfit Mini was trained on approximately 10 trillion tokens, including:

  • Web Data
  • Programming Code
  • Mathematics
  • Scientific Literature
  • Wikipedia
  • Books
  • Multilingual Data
  • Image-Text Pairs
  • Audio Transcriptions

Post-training

  • Supervised Fine-Tuning (SFT)
  • Reinforcement Learning from Human Feedback (RLHF)
  • Constitutional AI Alignment

🌟 Core Strengths

  • ✅ Agentic AI
  • ✅ Long-Context Reasoning
  • ✅ Tool Use
  • ✅ Coding Assistance
  • ✅ Vision Understanding
  • ✅ Audio Understanding
  • ✅ Scientific Knowledge
  • ✅ Multilingual Intelligence

🎯 Intended Use

Quatfit Mini is an 8B multimodal foundation model primarily optimized for agentic AI applications.

It excels at:

  • Multi-step reasoning
  • Autonomous workflows
  • Tool orchestration
  • Long-context understanding
  • Research assistance
  • Document analysis
  • Vision-language tasks
  • Audio understanding
  • Productivity automation

While Quatfit Mini delivers strong programming performance, it is designed as a general-purpose reasoning model rather than a specialized coding model.

It performs well for:

  • Code generation
  • Debugging
  • API development
  • Script writing
  • Code explanation
  • MVP application development

⚠️ Limitations

Quatfit Mini prioritizes reasoning, multimodal intelligence, and agentic capabilities over benchmark-focused coding performance.

Although highly capable for everyday software development, it is not specifically optimized for:

  • Repository-scale software engineering
  • Competitive programming
  • Enterprise-scale refactoring
  • Performance-critical code synthesis

As with all foundation models, outputs should be reviewed before deployment in production or safety-critical environments.


📖 Citation

@article{quatfitmini2026,
  title={Quatfit Mini: A Compact Multimodal Foundation Model with Up to 4× Faster Inference},
  author={Quatfit AI Research},
  year={2026}
}

📜 License

Quatfit Mini is released under the Quatfit Non-Commercial License v1.

Commercial licensing is available through Quatfit AI Research.


🌍 Quatfit AI Research

Building practical AI systems that think, reason, create, and collaborate.

Performance First • Practical Intelligence • Open Innovation

⭐ If Quatfit Mini helps your work, consider starring the repository and sharing your projects with the community.

Safetensors · Model size: 8B params · Tensor type: F32

Evaluation results

  • Overall Accuracy on Internal Evaluation Suite (815 Questions / 32 Categories): 89.08 (self-reported)