---
language:
- ar
base_model:
- Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2
tags:
- sentence-transformers
- sentence-similarity
---



![image/png](https://cdn-uploads.huggingface.co/production/uploads/662294730e805d4fcb06a892/ICUwF5-avEYDDl1rAgSPZ.png)

### Model Summary:

This model is a **Sentence Transformer** based on **Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2**, fine-tuned for **semantic textual similarity** and **information retrieval**. It maps sentences to dense vector representations that can be used for search, clustering, and text classification.
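A minimal usage sketch of the search use case described above, assuming the `sentence-transformers` package is installed. This card does not state the repository id, so `model_id` is left as an argument you must supply. The helper names `cosine`, `rank_by_cosine`, and `search` are illustrative and not part of any library.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vec, doc_vecs) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, most similar first."""
    scores = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def search(model_id: str, query: str, docs: List[str]) -> List[Tuple[str, float]]:
    """Embed a query and candidate passages, then rank the passages.

    `model_id` is an assumption: pass the Hub id or local path of this model.
    """
    # Imported lazily so the helpers above work without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    vectors = model.encode([query] + docs)  # one vector per input sentence
    ranked = rank_by_cosine(vectors[0], vectors[1:])
    return [(docs[i], score) for i, score in ranked]
```

Calling `search(model_id, question, passages)` with an Arabic question and a list of candidate passages returns the passages ordered by similarity to the question.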

### Dataset:
- The dataset used for training is derived from **Egyptian law books**.
- It consists of **synthetic data** generated using a **Large Language Model (LLM)**.
- The dataset contains **20,252 samples**, formatted as **question-answer pairs**.

### Key Features:
- **Vector Representation:** 768-dimensional embeddings.
- **Training Loss:** MatryoshkaLoss & MultipleNegativesRankingLoss.
- **Evaluation Metrics:** Cosine similarity-based metrics (Accuracy, Precision, Recall, NDCG).
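Models trained with MatryoshkaLoss are meant to stay useful when only the leading dimensions of an embedding are kept. The sketch below shows that truncation step on a plain Python vector. The function name `truncate_embedding` is hypothetical, and the truncated sizes this model was trained for are not stated in this card, so any `dim` you pick should be validated on your own data.

```python
import math
from typing import List, Sequence


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale them to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be between 1 and {len(vec)}, got {dim}")
    head = [float(x) for x in vec[:dim]]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]
```

Re-normalizing after truncation keeps cosine and dot-product scores comparable. Recent `sentence-transformers` releases also accept a `truncate_dim` argument when loading a model, which applies the truncation at encode time.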

---

## 🏆 Leaderboard Performance

The **Muffakir\_Embedding** model has achieved notable rankings on the [Arabic RAG Leaderboard](https://huggingface.co/spaces/Navid-AI/The-Arabic-Rag-Leaderboard), securing:

**🥇 1st place** in the **Islamic Dataset**



This result underscores the model's effectiveness at both retrieving relevant information and ranking it accurately within Arabic Retrieval-Augmented Generation (RAG) systems.

---


This model is optimized for **legal document retrieval** and other NLP applications in Arabic.