Ancient Greek Variant SBERT
A sentence-transformers model fine-tuned from pranaydeeps/Ancient-Greek-BERT for semantic similarity of Ancient Greek biblical texts.
Model Description
This model maps Ancient Greek sentences and paragraphs to a 768-dimensional dense vector space, optimized for:
- Semantic textual similarity between biblical verses
- Clustering of related passages
- Semantic search across biblical corpora
The model was fine-tuned using Multiple Negatives Ranking Loss to learn meaningful representations that capture semantic relationships between biblical Greek texts.
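With Multiple Negatives Ranking Loss, each (anchor, positive) pair in a batch becomes a classification problem: the anchor must score its own positive higher than every other positive in the batch, which serve as in-batch negatives. The actual training uses sentence-transformers' `MultipleNegativesRankingLoss`; the following is only a minimal pure-Python sketch of that objective (the scale of 20 matches that implementation's default):

```python
import math

def cos(u, v):
    # Plain cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

def mnr_loss(anchors, positives, scale=20.0):
    """Softmax cross-entropy over in-batch similarities: for anchor i,
    positives[i] is the target class and all other positives are negatives."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        scores = [scale * cos(anchors[i], p) for p in positives]
        m = max(scores)  # subtract max for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += -(scores[i] - log_z)
    return total / n
```

Aligned pairs drive the loss toward zero, while mismatched pairs are penalized, which is what pushes variant readings of the same verse close together in the embedding space.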
Strengths
This model excels at:
- Variant Detection: High similarity scores (>0.9) for verses that are textually identical or near-identical, even with minor orthographic differences
- Semantic Clustering: Effectively groups related passages and parallel texts
- Spelling Robustness: Handles common manuscript variations in biblical Greek
Usage
Installation & Training
For installation instructions and training scripts, see the GitHub repository.
Inference Example
```python
import unicodedata

from sentence_transformers import SentenceTransformer, util


def strip_accents_and_lowercase(s):
    # NFD decomposition splits each accented letter into its base character
    # plus combining marks (category "Mn"), which are then filtered out.
    return "".join(
        c for c in unicodedata.normalize("NFD", s) if unicodedata.category(c) != "Mn"
    ).lower()


xsent = [
    "οι δε φαρισεοι ακουσαντες οτι εφιμωσε τους σαδδουκεους συνηχθησαν επι το αυτο",
    "ειπεν δε αυτοις οταν προσευχησθε λεγετε πατερ αγιασθητω το ονομα σου ελθατω η βασιλια σου γενηθητω το θελημα σου ως εν ουρανω και επι γης",
    "νυν κρισις εστιν του κοσμου νυν ο αρχων τουτου τουτου νυν ο αρχων του κοσμου τουτου εκβληθησεται εξω",
    "διο προσλαμβανεσθαι αλληλους καθως και ο χς προσελαβετο υμας εις δοξαν του θυ",
]
ysent = [
    "οι δε φαρισαιοι ακουσαντες οτι εφιμωσε τους σαδδουκεους συνηχθησαν επι το αυτο",
    "ειπεν δε αυτοις οταν προσευχησθε λεγετε πατερ αγιασθητω το ονομα σου ελθατω η βασιλια σου γενηθητω το θελημα σου ως εν ουρανω και επι γης και ρυσαι ημας απο του πονηρου",
    "νυν δε προς σε ερχομαι και ταυτα λαλω εν τω κοσμω ινα εχωσιν την χαραν την εμην πεπληρωκενην εν αυτοις",
    "μετανοησαται ουν και επιστρεψαται προς το εξαλιφθηναι υμων τας αμαρτιας",
]

# Normalize before encoding: the model was trained on accent-stripped, lowercased text.
xsent_norm = [strip_accents_and_lowercase(s) for s in xsent]
ysent_norm = [strip_accents_and_lowercase(s) for s in ysent]

model = SentenceTransformer("Paulanerus/AncientGreekVariantSBERT")
x_embeddings = model.encode(xsent_norm, convert_to_tensor=True)
y_embeddings = model.encode(ysent_norm, convert_to_tensor=True)

print("Similarities:")
for i in range(len(xsent_norm)):
    similarity = util.cos_sim(x_embeddings[i], y_embeddings[i]).item()
    print(f"Pair {i + 1}: {similarity:.4f}")
```
Expected output:

```
Similarities:
Pair 1: 0.9882  # Near-identical verses (minor spelling difference)
Pair 2: 0.9000  # Same verse, one with additional text
Pair 3: 0.1772  # Unrelated verses
Pair 4: 0.1724  # Unrelated verses
```
Training Details
Base Model
- Base: pranaydeeps/Ancient-Greek-BERT
- Architecture: BERT-base (12 layers, 768 hidden dimensions)
Training Configuration
| Parameter | Value |
|---|---|
| Batch Size | 256 |
| Epochs | 8 |
| Learning Rate | 2e-5 |
| Loss Function | MultipleNegativesRankingLoss |
| Warmup | 10% of training steps |
| Hardware | NVIDIA A100 80GB PCIe |
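The warmup figure is a fraction of the total optimizer steps, which follow from dataset size, batch size, and epoch count. As a sketch with a hypothetical number of training pairs (the actual training-set size is not stated here):

```python
import math

num_pairs = 100_000   # hypothetical -- the real dataset size is not published
batch_size = 256      # from the training configuration table
epochs = 8
warmup_ratio = 0.10   # "10% of training steps"

steps_per_epoch = math.ceil(num_pairs / batch_size)  # last partial batch still counts
total_steps = steps_per_epoch * epochs
warmup_steps = int(warmup_ratio * total_steps)

print(steps_per_epoch, total_steps, warmup_steps)  # 391 3128 312
```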
Preprocessing
All input text should be normalized by:
- Removing diacritics/accents (NFD normalization)
- Converting to lowercase
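Both steps can be done with Python's standard unicodedata module, mirroring the `strip_accents_and_lowercase` helper from the inference example (the polytonic sample input here is illustrative):

```python
import unicodedata

def normalize_greek(text: str) -> str:
    # NFD splits each accented letter into base character + combining marks;
    # category "Mn" (nonspacing mark) identifies the diacritics to drop.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return stripped.lower()

print(normalize_greek("Ἐν ἀρχῇ ἦν ὁ Λόγος"))  # εν αρχη ην ο λογος
```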
Evaluation Results
Evaluated on an information retrieval task:
| Metric | Score |
|---|---|
| Accuracy@1 | 0.43 |
| Accuracy@3 | 1.00 |
| Accuracy@5 | 1.00 |
| Precision@3 | 0.74 |
| Precision@10 | 0.88 |
| MRR@10 | 0.715 |
| NDCG@10 | 0.843 |
| MAP@100 | 0.911 |
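For reference, the rank-based metrics in the table can be computed from a ranked result list and a relevance set; a toy sketch (the ranking and relevance judgments below are made up, not the actual evaluation data):

```python
def accuracy_at_k(ranked, relevant, k):
    """1 if any relevant document appears in the top-k results, else 0."""
    return int(any(doc in relevant for doc in ranked[:k]))

def mrr_at_k(ranked, relevant, k):
    """Reciprocal rank of the first relevant document within the top k (0 if none)."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

# Toy query: the system ranked documents [7, 2, 5]; documents {2, 5} are relevant.
ranked, relevant = [7, 2, 5], {2, 5}
print(accuracy_at_k(ranked, relevant, 1))  # 0
print(accuracy_at_k(ranked, relevant, 3))  # 1
print(mrr_at_k(ranked, relevant, 10))      # 0.5
```

Per-query values are then averaged over all queries to produce the table's scores.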
Limitations
- Optimized specifically for biblical Greek; performance may degrade on other registers of Ancient Greek (Classical, Homeric, etc.)
- Requires text preprocessing (accent stripping, lowercasing) for best results
Citation
```bibtex
@misc{ancient-greek-variant-sbert,
  author       = {Fröhlich, Paul},
  title        = {Ancient Greek Variant SBERT: Fine-tuned Embeddings for Biblical Text Verses in Ancient Greek},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/Paulanerus/AncientGreekVariantSBERT}},
  note         = {Model release}
}
```
Acknowledgments
This model builds upon the Ancient Greek BERT by Singh, Rutten, and Lefever (2021).