| --- |
| title: Misraj AI |
| colorFrom: green |
| colorTo: purple |
| sdk: static |
| pinned: true |
| --- |
| |
| # مِسراج - Misraj AI |
|
|
| > **Built on Trust. Measured by Impact.** |
| > The next-generation Arabic AI lab, building the foundational infrastructure for Arabic language understanding, generation, and document intelligence. |
|
|
| --- |
|
|
| ## 🧠 About Us |
|
|
| **Misraj AI** is the AI research division of **Misraj Technology**, a Saudi-based technology group with over **10 years** of experience delivering enterprise digital solutions across 15 sectors. Our AI lab is dedicated to a singular mission: making Arabic a first-class language in the modern AI era. |
|
|
| We develop open models, large-scale datasets, rigorous benchmarks, and production-ready AI systems, all purpose-built for Arabic, a morphologically rich language that has long been underserved by mainstream AI research. |
|
|
| From our research lab to operational products, we build a comprehensive system that enables governments and enterprises to adopt AI with **confidence, depth, and speed**. |
|
|
| > **15+ research papers** · **35 billion open Arabic data tokens** · Honored by **AI Pioneers** |
|
|
| --- |
|
|
| ## 🏢 Areas of Expertise |
|
|
| Our AI solutions span critical industry verticals, combining deep domain knowledge with state-of-the-art Arabic NLP: |
|
|
| - 🏥 **Healthcare Technology**: Clinical documentation and Arabic medical NLP |
| - 🏦 **Financial Technology**: Document intelligence for banking and finance |
| - ⚖️ **Legal Technology**: Contract analysis and legal document processing |
| - 🎓 **Educational Technology**: Arabic learning and knowledge systems |
| - 🏛️ **Administrative Technology**: Government and enterprise document automation |
|
|
| --- |
|
|
| ## Open Benchmarks & SOTA Results |
|
|
| We develop rigorous, expert-verified benchmarks to establish clear performance standards for Arabic AI. Our models consistently lead these benchmarks against both open-source and commercial competitors. |
|
|
| | Benchmark | Focus | Key Performance (SOTA) | |
| | :--- | :--- | :--- | |
| | [**Misraj-DocOCR**](https://huggingface.co/datasets/Misraj/Misraj-DocOCR) | Arabic Document OCR | **Baseer** achieves **0.25 WER**, outperforming Azure AI and Gemini 2.5 Pro. | |
| | [**KITAB-Reviewed**](https://huggingface.co/datasets/Misraj/KITAB_pdf_to_markdown_reviewed) | PDF-to-Markdown | **Baseer** leads in structure preservation, scoring **56 TEDS** and **68.13 MARS**. | |
| | [**Tarjama-25**](https://huggingface.co/datasets/Misraj/Tarjama-25) | Bi-directional Translation | **Mutarjim (1.5B)** outperforms models 20x its size (including GPT-4o mini) in English-Arabic translation. | |
| | [**SadeedDiac-25**](https://huggingface.co/datasets/Misraj/SadeedDiac-25) | Arabic Diacritization | **Sadeed** achieves a competitive **1.2% Diacritic Error Rate (DER)**. | |
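
For readers unfamiliar with the metric in the first row, word error rate (WER) is conventionally the word-level edit distance between a system's output and the reference text, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the evaluation script used for these benchmarks; the function name `word_error_rate` and the whitespace tokenization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count.

    Illustrative only: tokenization is plain whitespace splitting, which is
    an assumption and may differ from the benchmark's own normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference word
                    curr[j - 1] + 1,     # insertion of a hypothesis word
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# One of four words differs (a missing hamza), so one substitution out of four.
reference = "ذهب الولد إلى المدرسة"
hypothesis = "ذهب الولد الى المدرسة"
print(word_error_rate(reference, hypothesis))  # 0.25
```

Lower is better: a perfect transcription scores 0, and heavy insertion errors can push the value above 1.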
|
|
| --- |
|
|
| ## 📦 Open Datasets |
|
|
| Our large-scale datasets provide the foundational fuel for high-performance Arabic model training. |
|
|
| | Dataset | Description | Scale | |
| | :--- | :--- | :--- | |
| | [msdd](https://huggingface.co/datasets/Misraj/msdd) | Misraj Structured Document Dataset | 26.4M rows | |
| | [mudd](https://huggingface.co/datasets/Misraj/mudd) | Misraj Unstructured Document Dataset | 4.76M rows | |
| | [Arabic-Image-Captioning](https://huggingface.co/datasets/Misraj/Arabic-Image-Captioning_100M) | Multimodal Arabic captioning pairs | 100M pairs | |
| | [Sadeed Tashkeela](https://huggingface.co/datasets/Misraj/Sadeed_Tashkeela) | Cleaned & expert-filtered diacritization corpus | 1.05M samples | |
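
Given the sizes listed above, it may be more practical to stream a few rows than to download a full dataset. The following is a minimal sketch using the Hugging Face `datasets` library; the repository IDs are taken from the table, while the helper name `peek`, the `"train"` split, and the absence of a configuration name are assumptions that should be checked against each dataset card.

```python
from itertools import islice

# Repository IDs copied from the table above.
MISRAJ_DATASETS = {
    "msdd": "Misraj/msdd",
    "mudd": "Misraj/mudd",
    "arabic-image-captioning": "Misraj/Arabic-Image-Captioning_100M",
    "sadeed-tashkeela": "Misraj/Sadeed_Tashkeela",
}


def peek(name: str, n: int = 3, split: str = "train") -> list:
    """Return the first n rows of a dataset without downloading all of it.

    Hypothetical helper: the default split name is an assumption, and some
    repositories may also require a configuration name.
    """
    # Imported lazily so the module loads even without the library installed.
    from datasets import load_dataset  # pip install datasets

    stream = load_dataset(MISRAJ_DATASETS[name], split=split, streaming=True)
    return list(islice(stream, n))


# Example usage (requires network access):
# for row in peek("sadeed-tashkeela", n=2):
#     print(row)
```

Streaming returns an iterable rather than an in-memory table, so only the rows actually consumed are fetched.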
|
|
| > **35+ billion** open Arabic data tokens released and growing. |
|
|
| --- |
|
|
| ## 📬 Connect With Us |
|
|
| | Platform | Link | |
| |---|---| |
| | Misraj AI | [misraj.ai/en](https://misraj.ai/en) | |
| | Misraj Technology | [misraj.sa/en](https://misraj.sa/en) | |
| | 🔵 Baseer OCR | [baseerocr.com](https://baseerocr.com/) | |
| | 🤗 Hugging Face | [huggingface.co/Misraj](https://huggingface.co/Misraj) | |
| | 💼 LinkedIn | [linkedin.com/company/aimisraj](https://www.linkedin.com/company/aimisraj) | |
| | 🐦 X / Twitter | [@aimisraj](https://x.com/aimisraj) | |
| | 💻 GitHub | [github.com/misraj-ai](https://github.com/misraj-ai) | |
| | 📸 Instagram | [@misraj__ai](https://www.instagram.com/misraj__ai/) | |
|
|