e5-finetuned-georgian
This repository contains a fine-tuned version of the intfloat/multilingual-e5-small model, specifically adapted for generating text embeddings for the Georgian language.
Model Description
This model was developed by fine-tuning the intfloat/multilingual-e5-small base model on a large-scale dataset of Georgian text pairs. The goal was to improve how well the embeddings capture the meaning of Georgian text, so that semantically related Georgian texts map to nearby vectors.
The model is intended for tasks such as:
- Semantic search
- Text similarity and clustering
- Retrieval-Augmented Generation (RAG)
- Zero-shot classification
Training Data
The model was fine-tuned using the sithet/georgian-text-pairs dataset from the Hugging Face Hub.
- Dataset: sithet/georgian-text-pairs
Benchmark Results
BelebeleRetrieval (zero-shot)
| Task | NDCG@1 | NDCG@10 | NDCG@1000 |
|---|---|---|---|
| Georgian → Georgian | 0.613 | 0.7178 | 0.7492 |
| Georgian → English | 0.513 | 0.6561 | 0.6938 |
| English → Georgian | 0.530 | 0.6608 | 0.7004 |
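
For readers unfamiliar with the metric, NDCG@k compares the discounted gain of the returned ranking with that of an ideal ranking. The sketch below is a minimal pure-Python illustration that assumes the linear-gain formulation with a log2 rank discount. The function names `dcg_at_k` and `ndcg_at_k` and the passage ids are hypothetical, and this is not the evaluation code that produced the table.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (rank 1 has discount log2(2) = 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_ids, relevance, k):
    """NDCG@k for one query.

    ranked_ids: document ids in the order the system returned them.
    relevance:  dict mapping document id -> graded relevance (missing ids count as 0).
    """
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy query with a single relevant passage ("p2"); ids are made up.
relevance = {"p2": 1}
first = ndcg_at_k(["p2", "p7", "p9"], relevance, k=10)   # relevant passage at rank 1
second = ndcg_at_k(["p7", "p2", "p9"], relevance, k=10)  # relevant passage at rank 2
print(first, second)
```

If every query has exactly one relevant passage, NDCG@1 reduces to the fraction of queries whose top-ranked result is the correct one.
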
GeorgianFAQRetrieval (fine-tuned domain)
| Metric | Value |
|---|---|
| NDCG@10 | 0.4702 |
| MAP@10 | 0.4209 |
| MRR@10 | 0.4210 |
| Recall@10 | 0.6259 |
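
The remaining metrics in this table can be sketched in the same spirit. The code below is an illustrative pure-Python version, assuming binary relevance and an average precision normalised by the number of relevant documents (other normalisations exist). The function names and document ids are hypothetical, and this is not the harness that produced the reported numbers.

```python
def reciprocal_rank_at_k(ranked_ids, relevant, k):
    """1 / rank of the first relevant document within the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(ranked_ids, relevant, k):
    """Sum of precision at each relevant rank in the top k, divided by len(relevant)."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def recall_at_k(ranked_ids, relevant, k):
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


# Toy run: two queries, each with one relevant FAQ entry (made-up ids).
runs = {
    "q1": (["d3", "d1", "d8"], {"d1"}),  # relevant entry at rank 2
    "q2": (["d5", "d6", "d7"], {"d9"}),  # relevant entry not retrieved
}
k = 10
n = len(runs)
mrr = sum(reciprocal_rank_at_k(r, rel, k) for r, rel in runs.values()) / n
mean_ap = sum(average_precision_at_k(r, rel, k) for r, rel in runs.values()) / n
recall = sum(recall_at_k(r, rel, k) for r, rel in runs.values()) / n
print(mrr, mean_ap, recall)
```

When a query has a single relevant document, its average precision equals its reciprocal rank, so near-identical MAP@10 and MRR@10 values are what one would expect if most queries in the benchmark have one relevant entry.
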
Tatoeba (Georgian → English)
| Metric | Score |
|---|---|
| Accuracy | 0.8378 |
| Precision | 0.7741 |
| Recall | 0.8378 |
| F1 | 0.7943 |
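
Tatoeba is a bitext mining task: each source sentence is matched to the target sentence whose embedding is most similar, and the match is scored against the known translation. The sketch below illustrates the idea with toy 2-D vectors standing in for real embeddings. The helper names and the vectors are hypothetical, and the scoring is a simplified accuracy rather than the exact evaluation behind the table.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_pairs(source_vecs, target_vecs):
    """For each source vector, return the index of the most similar target vector."""
    return [
        max(range(len(target_vecs)), key=lambda j: cosine(src, target_vecs[j]))
        for src in source_vecs
    ]


def accuracy(predicted, gold):
    """Fraction of source sentences matched to their true translation."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy stand-ins for Georgian (source) and English (target) sentence embeddings.
georgian = [[1.0, 0.0], [0.0, 1.0], [0.95, 0.3]]
english = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.6]]
gold = [0, 1, 2]  # sentence i on one side translates sentence i on the other

predicted = mine_pairs(georgian, english)
score = accuracy(predicted, gold)
print(predicted, score)  # the third sentence is mismatched in this toy setup
```

The identical Accuracy and Recall values in the table are consistent with recall being averaged over classes weighted by support, which is mathematically equal to accuracy. Whether the figures were computed that way is an assumption.
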
How to Use
You can use this model directly with the sentence-transformers library.
First, install the library:
```bash
pip install -U sentence-transformers
```
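Then load the model and encode a text:

```python
from sentence_transformers import SentenceTransformer

# Load the fine-tuned model from the Hugging Face Hub
model = SentenceTransformer("sithet/e5-finetuned-georgian")
query = "რა არის საქართველოს დედაქალაქი?"  # Georgian for "What is the capital of Georgia?"
embedding = model.encode(query)  # NumPy vector
```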
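
Building on the snippet above, semantic search can be sketched as: embed the query and the candidate passages, then rank the passages by cosine similarity. The code below is a minimal illustration, not an official recipe. The `search` and `demo` helpers and the example passages are hypothetical. The base multilingual-e5 models are documented as expecting `query: ` and `passage: ` prefixes on their inputs; whether this fine-tune was trained with them is not stated here, so the prefixes are left as optional parameters.

```python
def search(model, query, passages, top_k=3, query_prefix="", passage_prefix=""):
    """Rank passages by cosine similarity to the query.

    model: any object with a SentenceTransformer-style encode() method.
    query_prefix / passage_prefix: optional E5-style prefixes (an assumption here).
    """
    texts = [query_prefix + query] + [passage_prefix + p for p in passages]
    # normalize_embeddings=True yields unit-length vectors, so the dot
    # product below equals cosine similarity.
    vectors = model.encode(texts, normalize_embeddings=True)
    query_vec, passage_vecs = vectors[0], vectors[1:]
    scored = [
        (float(sum(a * b for a, b in zip(query_vec, vec))), passage)
        for vec, passage in zip(passage_vecs, passages)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def demo():
    """Run the search against the real model (downloads weights on first use)."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sithet/e5-finetuned-georgian")
    passages = [
        "თბილისი საქართველოს დედაქალაქია.",    # "Tbilisi is the capital of Georgia."
        "ქართული ანბანი 33 ასოსგან შედგება.",  # "The Georgian alphabet consists of 33 letters."
    ]
    # Query: "What is the capital of Georgia?"
    for score, passage in search(model, "რა არის საქართველოს დედაქალაქი?", passages):
        print(f"{score:.3f}  {passage}")
```

Calling `demo()` runs the search end to end. To try the E5 prefix convention, pass `query_prefix="query: "` and `passage_prefix="passage: "` to `search` and compare the rankings.
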