e5-finetuned-georgian

This repository contains a fine-tuned version of the intfloat/multilingual-e5-small model, specifically adapted for generating text embeddings for the Georgian language.

Model Description

This model was developed by fine-tuning the intfloat/multilingual-e5-small base model on a large-scale Georgian text pair dataset. The goal was to enhance its ability to understand the nuances of the Georgian language and produce more accurate and semantically rich vector representations of Georgian text.

The model is ideal for tasks such as:

  • Semantic search
  • Text similarity and clustering
  • Retrieval-Augmented Generation (RAG)
  • Zero-shot classification
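For instance, zero-shot classification with an embedding model reduces to nearest-label retrieval: embed the text and each label description, then pick the closest label. A minimal sketch below; the `classify` helper and the toy vectors are illustrative, and in practice the vectors would come from `model.encode`:

```python
import numpy as np

def classify(text_emb, label_embs, labels):
    # Cosine similarity between the text and each label description.
    sims = label_embs @ text_emb / (
        np.linalg.norm(label_embs, axis=1) * np.linalg.norm(text_emb)
    )
    return labels[int(np.argmax(sims))]

labels = ["sports", "politics"]
label_embs = np.array([[1.0, 0.1], [0.1, 1.0]])  # toy label embeddings
text_emb = np.array([0.95, 0.2])                 # toy embedding of a sports article

print(classify(text_emb, label_embs, labels))    # → sports
```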

Training Data

The model was fine-tuned using the sithet/georgian-text-pairs dataset from the Hugging Face Hub.

Benchmark Results

BelebeleRetrieval (zero-shot)

Task                 NDCG@1   NDCG@10   NDCG@1000
Georgian → Georgian  0.613    0.7178    0.7492
Georgian → English   0.513    0.6561    0.6938
English → Georgian   0.530    0.6608    0.7004

GeorgianFAQRetrieval (fine-tuned domain)

Metric      Value
NDCG@10     0.4702
MAP@10      0.4209
MRR@10      0.4210
Recall@10   0.6259

Tatoeba (Georgian ↔ English)

Metric      Score
Accuracy    0.8378
Precision   0.7741
Recall      0.8378
F1          0.7943
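NDCG@k, the headline metric in the retrieval tables above, rewards rankings that place relevant documents near the top, with a logarithmic discount by rank. A minimal sketch of the computation (the relevance judgments here are toy values):

```python
import math

def dcg_at_k(relevances, k):
    # Discounted cumulative gain: relevance discounted by log2(rank + 1),
    # where ranks are 1-based.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    # Normalize by the DCG of the ideal (best possible) ordering.
    ideal = sorted(relevances, reverse=True)
    denom = dcg_at_k(ideal, k)
    return dcg_at_k(relevances, k) / denom if denom > 0 else 0.0

# Toy run: the single relevant document was retrieved at rank 2.
print(round(ndcg_at_k([0, 1, 0], 10), 4))  # → 0.6309
```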

How to Use

You can use this model directly with the sentence-transformers library.

First, install the library:

pip install -U sentence-transformers

Then load the model and encode text:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sithet/e5-finetuned-georgian")

query = "მთვარე"  # "moon"
embedding = model.encode(query)
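Note that E5-family models are trained with task prefixes: "query: " for search queries and "passage: " for documents. Assuming this fine-tune preserves that convention, a retrieval call would prefix the inputs before encoding and rank passages by cosine similarity. The sketch below uses a small helper and hand-made unit vectors to illustrate the ranking step; the `with_prefix` helper and the toy embeddings are not part of the model's API:

```python
import numpy as np

def with_prefix(texts, prefix):
    # E5 convention: prepend "query: " or "passage: " to each input.
    return [f"{prefix}: {t}" for t in texts]

# In real use:
#   q_emb = model.encode(with_prefix([query], "query"), normalize_embeddings=True)
#   p_emb = model.encode(with_prefix(passages, "passage"), normalize_embeddings=True)
# Here, tiny hand-made unit vectors stand in for the model's output.
q_emb = np.array([[1.0, 0.0]])
p_emb = np.array([[0.9, 0.435889894354],   # close to the query
                  [0.0, 1.0]])             # orthogonal to the query

scores = q_emb @ p_emb.T          # cosine similarity (vectors are unit-norm)
ranking = np.argsort(-scores[0])  # best match first
print(ranking.tolist())           # → [0, 1]
```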
Model Details

Weights: Safetensors, 0.1B parameters, F32 tensors