---
language:
- ar
base_model:
- Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2
tags:
- sentence-transformers
- sentence-similarity
---

### Model Summary:

This model is a **Sentence Transformer** based on **Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2**, fine-tuned for **semantic textual similarity** and **information retrieval**. It maps Arabic sentences to dense vector representations for tasks such as semantic search, clustering, and text classification.

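As a rough illustration of the search use case, the sketch below ranks a small corpus against a query by cosine similarity. It assumes the embeddings have already been produced by the model (for example via `SentenceTransformer(...).encode(...)` in the sentence-transformers library). The 4-dimensional toy vectors and the helper names `normalize`, `cosine_similarity`, and `rank_corpus` are hypothetical stand-ins, not part of this model's API.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm (returns a copy unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity: the dot product of the two unit-normalized vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_corpus(query_emb, corpus_embs, top_k=3):
    """Return (index, score) pairs for the top_k corpus vectors most similar to the query."""
    scores = [(i, cosine_similarity(query_emb, doc)) for i, doc in enumerate(corpus_embs)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Hypothetical toy vectors standing in for the model's 768-dimensional embeddings.
query = [0.9, 0.1, 0.0, 0.2]
corpus = [
    [0.8, 0.2, 0.1, 0.1],   # points in nearly the same direction as the query
    [0.0, 0.9, 0.4, 0.0],   # mostly orthogonal to the query
    [-0.7, 0.0, 0.1, 0.3],  # points away from the query
]
print(rank_corpus(query, corpus, top_k=2))
```

Normalizing each vector first makes the dot product equal to cosine similarity; with a real corpus the same ranking is usually computed as a single matrix product over pre-normalized embeddings.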
### Dataset:

- The training data is derived from **Egyptian law books**.
- It consists of **synthetic data** generated with a **Large Language Model (LLM)**.
- It contains **20,252 samples**, formatted as **question-answer pairs**.

### Key Features:

- **Vector Representation:** 768-dimensional embeddings.
- **Training Loss:** MatryoshkaLoss & MultipleNegativesRankingLoss.
- **Evaluation Metrics:** Cosine similarity-based metrics (Accuracy, Precision, Recall, NDCG).

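In sentence-transformers, the two training losses listed above are commonly combined by wrapping MultipleNegativesRankingLoss inside MatryoshkaLoss. Assuming that setup (the card does not state the exact configuration or which truncation dimensions were used), the pure-Python sketch below shows the underlying computation rather than the library's implementation. Each question must score its own answer above every other answer in the batch, and this ranking loss is summed over embeddings truncated to several prefix lengths. The scale of 20.0 matches the library's default for MultipleNegativesRankingLoss. The toy batch, the `dims=[4, 2]` prefixes, and the helper names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def mnr_loss(anchors, positives, scale=20.0):
    """In-batch negatives loss: anchor i should score positive i above all other positives."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over every candidate in the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights=None):
    """Weighted sum of the inner loss on embeddings truncated to each prefix length in dims."""
    weights = weights or [1.0] * len(dims)
    loss = 0.0
    for d, w in zip(dims, weights):
        truncated_a = [vec[:d] for vec in anchors]
        truncated_p = [vec[:d] for vec in positives]
        loss += w * mnr_loss(truncated_a, truncated_p)
    return loss


# Hypothetical batch: two question embeddings and their matching answer embeddings.
questions = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]
answers = [[0.9, 0.1, 0.4, 0.0], [0.1, 0.9, 0.0, 0.6]]
print(matryoshka_loss(questions, answers, dims=[4, 2]))
```

Because every prefix is trained to work as an embedding on its own, the 768-dimensional vectors can be truncated at inference time (for example with the `truncate_dim` argument of `SentenceTransformer`), trading some accuracy for smaller indexes.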

---

## 🏆 Leaderboard Performance

The **Muffakir\_Embedding** model has achieved the following ranking on the [Arabic RAG Leaderboard](https://huggingface.co/spaces/Navid-AI/The-Arabic-Rag-Leaderboard):

**🥇 1st place** in the **Islamic Dataset**

This result highlights the model's effectiveness at both retrieving relevant information and ranking it accurately within Arabic Retrieval-Augmented Generation (RAG) systems.

---

This model is optimized for **legal document retrieval** and other Arabic NLP applications.