title: Misraj AI
colorFrom: green
colorTo: purple
sdk: static
pinned: true
مِسراج - Misraj AI
Built on Trust. Measured by Impact.
The next-generation Arabic AI lab, building the foundational infrastructure for Arabic language understanding, generation, and document intelligence.
🧠 About Us
Misraj AI is the AI research division of Misraj Technology, a Saudi-based technology group with over 10 years of experience delivering enterprise digital solutions across 15 sectors. Our AI lab is dedicated to a singular mission: making Arabic a first-class language in the modern AI era.
We develop open models, large-scale datasets, rigorous benchmarks, and production-ready AI systems, all purpose-built for Arabic, a morphologically rich language that has long been underserved by mainstream AI research.
From our research lab to operational products, we build a comprehensive system that enables governments and enterprises to adopt AI with confidence, depth, and speed.
15+ research papers · 35+ billion open Arabic data tokens · Honored by AI Pioneers
🏢 Areas of Expertise
Our AI solutions span critical industry verticals, combining deep domain knowledge with state-of-the-art Arabic NLP:
- 🏥 Healthcare Technology: Clinical documentation and Arabic medical NLP
- 🏦 Financial Technology: Document intelligence for banking and finance
- ⚖️ Legal Technology: Contract analysis and legal document processing
- 🎓 Educational Technology: Arabic learning and knowledge systems
- 🏛️ Administrative Technology: Government and enterprise document automation
Open Benchmarks & SOTA Results
We develop rigorous, expert-verified benchmarks to establish clear performance standards for Arabic AI. Our models consistently lead these benchmarks against both open-source and commercial competitors.
| Benchmark | Focus | Key Performance (SOTA) |
|---|---|---|
| Misraj-DocOCR | Arabic Document OCR | Baseer achieves 0.25 WER, outperforming Azure AI and Gemini 2.5 Pro. |
| KITAB-Reviewed | PDF-to-Markdown | Baseer leads in structure with a TEDS of 56 and a MARS score of 68.13. |
| Tarjama-25 | Bi-directional Translation | Mutarjim (1.5B) outperforms models 20x its size (including GPT-4o mini) in EN→AR. |
| SadeedDiac-25 | Arabic Diacritization | Sadeed achieves a competitive 1.2% Diacritic Error Rate (DER). |
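
For readers less familiar with the metric in the first row, word error rate (WER) is conventionally the word-level edit distance between a reference and a model output, divided by the number of reference words. The sketch below is a generic, standard-library illustration of that definition. It is not the Misraj-DocOCR evaluation script, and the function name `word_error_rate`, the whitespace tokenization, and the sample sentences are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative only: tokenization is plain whitespace splitting, with no
    Arabic-specific normalization (an assumption for this sketch).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical OCR output that drops the hamza in one of four words.
reference = "ذهب الولد إلى المدرسة"
hypothesis = "ذهب الولد الى المدرسة"
print(word_error_rate(reference, hypothesis))
```

In this toy pair, one of the four reference words is substituted, so the function returns 0.25. Diacritic error rate follows the same ratio idea but counts diacritic marks rather than whole words.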
📦 Open Datasets
Our large-scale datasets provide the foundational fuel for high-performance Arabic model training.
| Dataset | Description | Scale |
|---|---|---|
| msdd | Misraj Structured Document Dataset | 26.4M rows |
| mudd | Misraj Unstructured Document Dataset | 4.76M rows |
| Arabic-Image-Captioning | Multimodal Arabic captioning pairs | 100M pairs |
| Sadeed Tashkeela | Cleaned & expert-filtered diacritization corpus | 1.05M samples |
35+ billion open Arabic data tokens released and growing.
📬 Connect With Us
| Platform | Link |
|---|---|
| Misraj AI | misraj.ai/en |
| Misraj Technology | misraj.sa/en |
| Baseer OCR | baseerocr.com |
| 🤗 Hugging Face | huggingface.co/Misraj |
| 💼 LinkedIn | linkedin.com/company/aimisraj |
| 🐦 X / Twitter | @aimisraj |
| 💻 GitHub | github.com/misraj-ai |
| 📸 Instagram | @misraj__ai |