---
language:
- ar
base_model:
- Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2
tags:
- sentence-transformers
- sentence-similarity
---

### Model Summary:
This model is a **Sentence Transformer** based on **Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2**, fine-tuned for **semantic textual similarity** and **information retrieval** tasks. It maps sentences to dense vector representations for tasks like search, clustering, and text classification.
### Dataset:
- The dataset used for training is derived from **Egyptian law books**.
- It consists of **synthetic data** generated using a **Large Language Model (LLM)**.
- The dataset contains **20,252 samples**, formatted as **question-answer pairs**.
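
Question-answer pairs like these suit losses that use in-batch negatives, such as the MultipleNegativesRankingLoss listed under Key Features. The sketch below is a minimal pure-Python illustration of that idea, not the training code used for this model: each question's paired answer is treated as the positive and every other answer in the batch as a negative. The toy 2-dimensional vectors and the `scale` value of 20 are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_negatives_loss(question_vecs, answer_vecs, scale=20.0):
    """Mean cross-entropy where answer i is the positive for question i
    and all other answers in the batch act as negatives."""
    losses = []
    for i, q in enumerate(question_vecs):
        logits = [scale * cosine(q, a) for a in answer_vecs]
        # Numerically stable log-sum-exp over every answer in the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_z - logits[i])
    return sum(losses) / len(losses)


# Toy vectors standing in for real embeddings (hypothetical values).
questions = [[1.0, 0.0], [0.0, 1.0]]
aligned_answers = [[0.9, 0.1], [0.1, 0.9]]   # each answer matches its question
swapped_answers = [[0.1, 0.9], [0.9, 0.1]]   # answers paired with the wrong question

loss_aligned = in_batch_negatives_loss(questions, aligned_answers)
loss_swapped = in_batch_negatives_loss(questions, swapped_answers)
print(loss_aligned < loss_swapped)
```

The loss is small when each question is closest to its own answer and grows when another answer in the batch is closer, which is why larger batches of distinct pairs tend to give a stronger training signal.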
### Key Features:
- **Vector Representation:** 768-dimensional embeddings.
- **Training Loss:** MatryoshkaLoss & MultipleNegativesRankingLoss.
- **Evaluation Metrics:** Cosine similarity-based metrics (Accuracy, Precision, Recall, NDCG).
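
Because training used MatryoshkaLoss, shorter prefixes of an embedding are meant to remain useful on their own. The following is a minimal sketch of how truncation and cosine similarity fit together, using toy 8-dimensional vectors in place of the real 768-dimensional embeddings. The vector values and the truncation size of 4 are hypothetical; this card does not state which truncated dimensions the model was trained for.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def truncate_embedding(vec, dim):
    """Keep the first `dim` components, then re-normalize to unit length."""
    return l2_normalize(vec[:dim])


def cosine_similarity(u, v):
    """Cosine similarity, computed as the dot product of unit vectors."""
    u, v = l2_normalize(u), l2_normalize(v)
    return sum(a * b for a, b in zip(u, v))


# Toy vectors standing in for a query and a document embedding (hypothetical).
query = [0.9, 0.1, 0.3, 0.0, 0.05, 0.02, 0.01, 0.03]
doc = [0.8, 0.2, 0.25, 0.1, 0.01, 0.04, 0.02, 0.0]

full_score = cosine_similarity(query, doc)
short_score = cosine_similarity(
    truncate_embedding(query, 4), truncate_embedding(doc, 4)
)
print(round(full_score, 3), round(short_score, 3))
```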
---
## 🏆 Leaderboard Performance
The **Muffakir\_Embedding** model has achieved notable rankings on the [Arabic RAG Leaderboard](https://huggingface.co/spaces/Navid-AI/The-Arabic-Rag-Leaderboard), securing:
**🥇 1st place** in the **Islamic Dataset**
This result underscores the model's effectiveness at both retrieving relevant information and ranking it accurately within Arabic Retrieval-Augmented Generation (RAG) systems.
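
Ranking quality of this kind is commonly measured with NDCG, which is also among the evaluation metrics listed for this model. The sketch below shows one common formulation (linear gain, log2 discount) in plain Python; it is not the leaderboard's scoring code. It assumes the ideal ordering can be built from the same retrieved list, and the relevance labels are made up for illustration. A list with relevant items first scores 1.0, and pushing them down the list lowers the score.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (rank 0 is the top hit)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the best possible ordering of the same items."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance of each retrieved passage, in ranked order (hypothetical labels).
perfect = ndcg_at_k([1, 1, 0, 0], k=4)   # relevant passages ranked first
worse = ndcg_at_k([0, 0, 1, 1], k=4)     # relevant passages ranked last
print(perfect, round(worse, 3))
```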
---
This model is optimized for **legal document retrieval** and other NLP applications in Arabic.