mohamed2811/Muffakir_Embedding

	---
	language:
	- ar
	base_model:
	- Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2
	tags:
	- sentence-transformers
	- sentence-similarity
	---



	![image/png](https://cdn-uploads.huggingface.co/production/uploads/662294730e805d4fcb06a892/ICUwF5-avEYDDl1rAgSPZ.png)

	### Model Summary:

	This model is a Sentence Transformer based on Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2, fine-tuned for semantic textual similarity and information retrieval tasks. It maps sentences to dense vector representations for tasks like search, clustering, and text classification.
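A minimal usage sketch, assuming the model loads through the standard `sentence-transformers` API under the repository name shown on this page. The helper names `embed`, `cosine`, and `rank` are illustrative and are not part of the model or the library. The import is deferred so that the ranking helpers can be used without downloading the model.

```python
import math

MODEL_ID = "mohamed2811/Muffakir_Embedding"  # repository name as shown on this page


def embed(texts, model_id=MODEL_ID):
    """Encode texts into dense vectors (hypothetical helper; downloads the model on first call)."""
    from sentence_transformers import SentenceTransformer  # imported lazily

    model = SentenceTransformer(model_id)
    # normalize_embeddings=True returns unit-length vectors,
    # so a plain dot product equals cosine similarity.
    return model.encode(texts, normalize_embeddings=True).tolist()


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_vec, passage_vecs):
    """Return passage indices ordered from most to least similar to the query."""
    scores = [cosine(query_vec, p) for p in passage_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

For retrieval, one would call `vecs = embed([question] + passages)` and then `rank(vecs[0], vecs[1:])` to order the passages by similarity to the question.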

	### Dataset:
	- The dataset used for training is derived from Egyptian law books.
	- It consists of synthetic data generated using a Large Language Model (LLM).
	- The dataset contains 20,252 samples, formatted as question-answer pairs.

	### Key Features:
	- Vector Representation: 768-dimensional embeddings.
	- Training Loss: MatryoshkaLoss & MultipleNegativesRankingLoss.
	- Evaluation Metrics: Cosine similarity-based metrics (Accuracy, Precision, Recall, NDCG).
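The training objective named above can be sketched in plain Python. This is an illustration of the general technique, not the training script used for this model: it assumes MatryoshkaLoss wraps MultipleNegativesRankingLoss (the usual pairing), uses cosine similarity with a scale of 20, and takes the list of truncation sizes `dims` as a parameter because the card does not state which sizes were used.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mnr_loss(questions, answers, scale=20.0):
    """Multiple negatives ranking loss over one batch of (question, answer) embeddings.

    answers[i] is the positive for questions[i]; every other answer in the
    batch serves as a negative. scale=20.0 is an assumed value.
    """
    total = 0.0
    for i, q in enumerate(questions):
        logits = [scale * cosine(q, a) for a in answers]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(questions)


def matryoshka_loss(questions, answers, dims, scale=20.0):
    """Sum the ranking loss over embeddings truncated to each prefix length in dims.

    dims is an assumption supplied by the caller, e.g. [768, 256, 64].
    """
    return sum(
        mnr_loss([q[:d] for q in questions], [a[:d] for a in answers], scale)
        for d in dims
    )
```

Because the loss is applied to every prefix length, the leading coordinates of each embedding are trained to be useful on their own, which is what allows Matryoshka-style models to be truncated to shorter vectors at query time.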

	---

	## 🏆 Leaderboard Performance

	The Muffakir\_Embedding model has achieved notable rankings on the [Arabic RAG Leaderboard](https://huggingface.co/spaces/Navid-AI/The-Arabic-Rag-Leaderboard), securing:

	🥇 1st place in the Islamic Dataset



	These results underscore the model's effectiveness in both retrieving relevant information and accurately ranking it within Arabic Retrieval-Augmented Generation (RAG) systems.
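Ranking quality of this kind is commonly measured with NDCG, which the Key Features list names among the evaluation metrics. The sketch below shows one standard formulation (linear gain, log2 discount). Whether the leaderboard scores models this way is not stated here, so treat it as an illustration only.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k relevance grades, in ranked order."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering.

    The ideal ordering is built from the grades passed in, so relevant
    documents that were never retrieved are not penalized in this sketch.
    Returns 0.0 when no retrieved document is relevant.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal else 0.0
```

Here `relevances` lists the relevance grade of each retrieved passage in the order the model ranked them, for example `[0, 1, 0]` when only the second result is relevant.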

	---


	This model is optimized for legal document retrieval and other NLP applications in Arabic.