---
language:
- ka
base_model:
- intfloat/multilingual-e5-small
tags:
- text-embeddings
- georgian
- multilingual-e5
---

# Georgian E5 Fine-tuned Text Embeddings

Fine-tuned version of `intfloat/multilingual-e5-small` for Georgian text embeddings using contrastive learning.

## Model Performance
- Validation Accuracy: 82.53%
- Training completed over 3 epochs
- Contrastive loss with margin=0.5
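
To illustrate what a margin of 0.5 means in practice, here is a minimal sketch of a pairwise contrastive loss. It assumes the common formulation over cosine distance, in which similar pairs are pulled together and dissimilar pairs are pushed apart only until they are at least `margin` apart. This card does not state the exact loss implementation, so the function names `cosine_distance` and `contrastive_loss` and the choice of distance metric are assumptions made for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def contrastive_loss(u, v, is_similar, margin=0.5):
    """Pairwise contrastive loss (assumed form, not confirmed by this card).

    Similar pairs are penalised by their squared distance.
    Dissimilar pairs are penalised only while their distance
    is still smaller than the margin.
    """
    d = cosine_distance(u, v)
    if is_similar:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


# Toy 2-dimensional "embeddings" (hypothetical values, not model output).
same_direction = contrastive_loss([1.0, 0.0], [1.0, 0.0], is_similar=False)
orthogonal = contrastive_loss([1.0, 0.0], [0.0, 1.0], is_similar=False)
```

Under this formulation, a dissimilar pair stops contributing to the loss once its cosine distance reaches 0.5, so training effort concentrates on pairs that are still too close together.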

## Dataset
- 13,000+ Georgian text pairs across 9 semantic relationship types

## Usage
```python
from transformers import AutoTokenizer, AutoModel

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("matsut21/georgian-e5-finetuned")
model = AutoModel.from_pretrained("matsut21/georgian-e5-finetuned")