Astria

Astria is a next-generation, fully local multimodal foundation model built on top of a Ministral-based language backbone and a custom vision encoder. This architecture significantly improves visual grounding, multilingual reasoning, and agentic reliability while remaining efficient enough for edge deployment.


🚀 Astria Update Highlights

Me7war's latest Astria update pushes the limits of small-scale multimodal AI, combining efficiency, reasoning, and vision capabilities:

Key Features

  • Vision Mastery: Custom encoder enables deep image understanding and precise visual–text alignment.
  • Multilingual Support: Handles dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Arabic, Chinese, Japanese, and Korean, while maintaining strong reasoning and generation.
  • Agent-Ready: Native function calls, reliable JSON outputs, and strict prompt adherence make Astria suitable for agentic workflows.
  • Edge Efficiency: Optimized for minimal hardware without sacrificing performance.
  • Large Context Window: Up to 256k tokens for long-form reasoning, document-level comprehension, and complex multi-step tasks.
  • Enhanced Reasoning: Ministral backbone ensures stronger factual grounding, smoother multimodal alignment, and improved long-horizon reasoning.
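As an illustration of the agent-ready point above, here is a minimal sketch of how a caller might validate a JSON function call before executing it. Astria's actual tool-call format is not specified in this card, so the `{"name": ..., "arguments": {...}}` shape, the `get_weather` tool, and the `parse_tool_call` helper are all hypothetical.

```python
import json

# Hypothetical tool registry: tool name -> required argument names and types.
TOOLS = {
    "get_weather": {"city": str, "unit": str},
}


def parse_tool_call(raw: str) -> dict:
    """Parse and validate a tool call of the assumed form
    {"name": ..., "arguments": {...}}; raise ValueError on any mismatch."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")

    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")

    args = call.get("arguments")
    if not isinstance(args, dict):
        raise ValueError("'arguments' must be a JSON object")

    schema = TOOLS[name]
    missing = schema.keys() - args.keys()
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    for key, expected_type in schema.items():
        if not isinstance(args[key], expected_type):
            raise ValueError(f"argument {key!r} must be {expected_type.__name__}")
    return {"name": name, "arguments": args}
```

Validating on the caller's side keeps a malformed or unexpected call from reaching real tools, regardless of how reliable the model's JSON output is.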

A fully local, compact model redefining what edge-deployable multimodal AI can achieve.


📊 Visual Reasoning Performance

Astria is scored on a custom evaluation that uses GPT-5 PRO as the judge.

Astria: 92.53% (new SOTA)

LLaVA baseline: 90.92%

A custom evaluation on 30 unseen images with 3 instruction types per image (conversation, description, complex reasoning) shows Astria outperforms GPT-5 in all categories.

Evaluation: Astria vs GPT-5

A custom evaluation set of 30 unseen images was constructed. Each image includes three instruction types:

  1. Conversational understanding
  2. Detailed visual description
  3. Complex multimodal reasoning

This yields 90 unique image–language tasks, evaluated on:

  • Astria
  • GPT-5

Scoring was performed by GPT-5 PRO, using a 1–10 scale per task.
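The protocol above can be sketched as a task grid plus a per-category aggregation. This is only an illustration under stated assumptions: the card does not say how judge scores were combined or how the 92.53% figure was derived, so the plain per-category mean, the `build_tasks` and `summarize` helpers, and the instruction-type labels are hypothetical.

```python
from collections import defaultdict
from itertools import product
from statistics import mean

# Labels for the three instruction types (hypothetical identifiers).
INSTRUCTION_TYPES = ("conversation", "description", "complex_reasoning")
NUM_IMAGES = 30


def build_tasks(num_images: int = NUM_IMAGES) -> list[tuple[int, str]]:
    """Enumerate every (image_id, instruction_type) pair."""
    return list(product(range(num_images), INSTRUCTION_TYPES))


def summarize(scores: dict[tuple[int, str], float]) -> dict[str, float]:
    """Average 1-10 judge scores per instruction type and overall.

    Assumption: a simple mean; the card does not state the real aggregation.
    """
    by_type = defaultdict(list)
    for (_, instruction_type), score in scores.items():
        if not 1 <= score <= 10:
            raise ValueError(f"score out of range: {score}")
        by_type[instruction_type].append(score)
    summary = {t: mean(values) for t, values in by_type.items()}
    summary["overall"] = mean(list(scores.values()))
    return summary


tasks = build_tasks()
print(len(tasks))  # 30 images x 3 instruction types = 90
```

Running `summarize` once per model on the same task keys gives directly comparable per-category means.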

Results

Astria outperforms GPT-5 across all instruction categories, validating the effectiveness of the custom vision encoder combined with the Ministral knowledge-enhanced language model.


Model Summary

  • Vision Encoder: Custom-built, with precise visual-text alignment
  • Language Backbone: Ministral-based, optimized for reasoning and factual accuracy
  • Training: End-to-end multimodal alignment with knowledge supervision
  • Output: Grounded, structured, and context-aware responses
  • Deployment: Fully local and edge-optimized, supporting up to 256k token context
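For local deployment planning, a rough weight-memory estimate follows from the parameter count and tensor type listed on this page (9B params, BF16). The sketch below is back-of-the-envelope arithmetic only: the `weight_memory_gib` helper is hypothetical, and the figure excludes activations and the KV cache, which grows with context length and depends on architecture details not given here.

```python
# Bytes per parameter for common tensor types.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str = "bf16") -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# 9e9 params x 2 bytes = 18e9 bytes of weights.
print(f"{weight_memory_gib(9e9):.1f} GiB")  # prints "16.8 GiB"
```

Quantized variants would shrink this proportionally (for example, about half at 8 bits per weight), but no such variants are mentioned in this card.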

License

Astria is released under the Astria License for personal and non-commercial use. Commercial use requires explicit permission from the creator.

Model size: 9B params (Safetensors) · Tensor type: BF16