---
title: Misraj AI
colorFrom: green
colorTo: purple
sdk: static
pinned: true
---

مِسراج · Misraj AI

Built on Trust. Measured by Impact.
The next-generation Arabic AI lab, building the foundational infrastructure for Arabic language understanding, generation, and document intelligence.


🧭 About Us

Misraj AI is the AI research division of Misraj Technology, a Saudi-based technology group with over 10 years of experience delivering enterprise digital solutions across 15 sectors. Our AI lab is dedicated to a singular mission: making Arabic a first-class language in the modern AI era.

We develop open models, large-scale datasets, rigorous benchmarks, and production-ready AI systems, all purpose-built for Arabic: a morphologically rich language that has long been underserved by mainstream AI research.

From our research lab to operational products, we build a comprehensive system that enables governments and enterprises to adopt AI with confidence, depth, and speed.

📊 15+ research papers · 35 billion open Arabic data tokens · Honored by AI Pioneers


🏢 Areas of Expertise

Our AI solutions span critical industry verticals, combining deep domain knowledge with state-of-the-art Arabic NLP:

  • πŸ₯ Healthcare Technology β€” Clinical documentation and Arabic medical NLP
  • 🏦 Financial Technology β€” Document intelligence for banking and finance
  • βš–οΈ Legal Technology β€” Contract analysis and legal document processing
  • πŸŽ“ Educational Technology β€” Arabic learning and knowledge systems
  • πŸ›οΈ Administrative Technology β€” Government and enterprise document automation

📈 Open Benchmarks & SOTA Results

We develop rigorous, expert-verified benchmarks to establish clear performance standards for Arabic AI. Our models consistently lead these benchmarks against both open-source and commercial competitors.

| Benchmark | Focus | Key Performance (SOTA) |
|---|---|---|
| Misraj-DocOCR | Arabic Document OCR | Baseer achieves a 0.25 word error rate (WER), outperforming Azure AI and Gemini 2.5 Pro. |
| KITAB-Reviewed | PDF-to-Markdown | Baseer leads in structure, with a TEDS score of 56 and a MARS score of 68.13. |
| Tarjama-25 | Bi-directional Translation | Mutarjim (1.5B) outperforms models 20x its size (including GPT-4o mini) on EN→AR. |
| SadeedDiac-25 | Arabic Diacritization | Sadeed achieves a competitive 1.2% Diacritic Error Rate (DER). |
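WER and DER are error rates, so lower is better. To illustrate what they measure, here is a minimal Python sketch assuming textbook definitions: WER as word-level Levenshtein distance divided by the number of reference words, and a simplified DER that compares the diacritic marks attached to each base character. The function names are hypothetical, and this is not the evaluation code behind Misraj-DocOCR or SadeedDiac-25. Published DER variants differ in which characters they count (for example, whether case endings or undiacritized letters are included), while this sketch counts every non-space character.

```python
import unicodedata


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences; every edit costs 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def split_diacritics(text):
    """Pair each non-space base character with its following combining marks.

    Marks are sorted so that, e.g., shadda+fatha and fatha+shadda compare equal.
    Stray marks with no preceding character in the same word are dropped.
    """
    pairs = []
    can_attach = False  # True while the last char seen was a base character
    for ch in text:
        if unicodedata.combining(ch):
            if can_attach:
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        elif ch.isspace():
            can_attach = False
        else:
            pairs.append((ch, ""))
            can_attach = True
    return [(base, "".join(sorted(marks))) for base, marks in pairs]


def der(reference, hypothesis):
    """Simplified diacritic error rate: share of characters with wrong marks."""
    ref = split_diacritics(reference)
    hyp = split_diacritics(hypothesis)
    if [b for b, _ in ref] != [b for b, _ in hyp]:
        raise ValueError("texts must have the same undiacritized characters")
    if not ref:
        raise ValueError("reference must contain at least one character")
    errors = sum(r_marks != h_marks for (_, r_marks), (_, h_marks) in zip(ref, hyp))
    return errors / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sat on mat")` is 1/6 (one deleted word out of six reference words). Because insertions also count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.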

📦 Open Datasets

Our large-scale datasets provide the foundational fuel for high-performance Arabic model training.

| Dataset | Description | Scale |
|---|---|---|
| msdd | Misraj Structured Document Dataset | 26.4M rows |
| mudd | Misraj Unstructured Document Dataset | 4.76M rows |
| Arabic-Image-Captioning | Multimodal Arabic captioning pairs | 100M pairs |
| Sadeed Tashkeela | Cleaned and expert-filtered diacritization corpus | 1.05M samples |

📊 35+ billion open Arabic data tokens released and growing.


📬 Connect With Us

| Platform | Link |
|---|---|
| 🌐 Misraj AI | misraj.ai/en |
| 🌐 Misraj Technology | misraj.sa/en |
| 🔵 Baseer OCR | baseerocr.com |
| 🤗 Hugging Face | huggingface.co/Misraj |
| 💼 LinkedIn | linkedin.com/company/aimisraj |
| 🐦 X / Twitter | @aimisraj |
| 💻 GitHub | github.com/misraj-ai |
| 📸 Instagram | @misraj__ai |