Ancient Greek Variant SBERT

A sentence-transformers model fine-tuned from pranaydeeps/Ancient-Greek-BERT for semantic similarity of Ancient Greek biblical texts.

Model Description

This model maps Ancient Greek sentences and paragraphs to a 768-dimensional dense vector space, optimized for:

  • Semantic textual similarity between biblical verses
  • Clustering of related passages
  • Semantic search across biblical corpora

The model was fine-tuned using Multiple Negatives Ranking Loss to learn meaningful representations that capture semantic relationships between biblical Greek texts.

Strengths

This model excels at:

  • Variant Detection: High similarity scores (>0.9) for verses that are textually identical or near-identical, even with minor orthographic differences
  • Semantic Clustering: Effectively groups related passages and parallel texts
  • Spelling Robustness: Handles common manuscript-level orthographic variation in biblical Greek

Usage

Installation & Training

For installation instructions and training scripts, see the GitHub repository.

Inference Example

import unicodedata
from sentence_transformers import SentenceTransformer, util


def strip_accents_and_lowercase(s):
    # NFD-decompose, drop combining marks (Unicode category "Mn"), lowercase.
    return "".join(
        c for c in unicodedata.normalize("NFD", s) if unicodedata.category(c) != "Mn"
    ).lower()


xsent = [
    "οι δε φαρισεοι ακουσαντες οτι εφιμωσε τους σαδδουκεους συνηχθησαν επι το αυτο",
    "ειπεν δε αυτοις οταν προσευχησθε λεγετε πατερ αγιασθητω το ονομα σου ελθατω η βασιλια σου γενηθητω το θελημα σου ως εν ουρανω και επι γης",
    "νυν κρισις εστιν του κοσμου νυν ο αρχων τουτου τουτου νυν ο αρχων του κοσμου τουτου εκβληθησεται εξω",
    "διο προσλαμβανεσθαι αλληλους καθως και ο χς προσελαβετο υμας εις δοξαν του θυ",
]

ysent = [
    "οι δε φαρισαιοι ακουσαντες οτι εφιμωσε τους σαδδουκεους συνηχθησαν επι το αυτο",
    "ειπεν δε αυτοις οταν προσευχησθε λεγετε πατερ αγιασθητω το ονομα σου ελθατω η βασιλια σου γενηθητω το θελημα σου ως εν ουρανω και επι γης και ρυσαι ημας απο του πονηρου",
    "νυν δε προς σε ερχομαι και ταυτα λαλω εν τω κοσμω ινα εχωσιν την χαραν την εμην πεπληρωκενην εν αυτοις",
    "μετανοησαται ουν και επιστρεψαται προς το εξαλιφθηναι υμων τας αμαρτιας",
]

xsent_norm = [strip_accents_and_lowercase(s) for s in xsent]
ysent_norm = [strip_accents_and_lowercase(s) for s in ysent]

model = SentenceTransformer("Paulanerus/AncientGreekVariantSBERT")

x_embeddings = model.encode(xsent_norm, convert_to_tensor=True)
y_embeddings = model.encode(ysent_norm, convert_to_tensor=True)

print("Similarities:")
for i in range(len(xsent_norm)):
    similarity = util.cos_sim(x_embeddings[i], y_embeddings[i]).item()
    print(f"Pair {i + 1}: {similarity:.4f}")

Expected output:

Similarities:
Pair 1: 0.9882  # Near-identical verses (minor spelling difference)
Pair 2: 0.9000  # Same verse, one with additional text
Pair 3: 0.1772  # Different verses
Pair 4: 0.1724  # Different verses
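The pairwise comparison above generalizes to semantic search over a whole corpus. The sketch below is illustrative rather than part of the model card: `search` is a hypothetical helper built on `util.semantic_search`, and `cos_sim` is a plain-Python version of the cosine measure that `util.cos_sim` computes.

```python
import math


def cos_sim(u, v):
    # Plain-Python cosine similarity: dot(u, v) / (||u|| * ||v||).
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def search(query, corpus, top_k=3):
    # Heavy imports kept inside the function so the pure helper above
    # works without torch / sentence-transformers installed.
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("Paulanerus/AncientGreekVariantSBERT")
    corpus_emb = model.encode(corpus, convert_to_tensor=True)
    query_emb = model.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, corpus_emb, top_k=top_k)[0]
    return [(corpus[h["corpus_id"]], h["score"]) for h in hits]


print(round(cos_sim([1.0, 0.0], [1.0, 1.0]), 4))  # 0.7071
```

Both `search` inputs are assumed to be pre-normalized with the accent-stripping step shown above.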

Training Details

Base Model

pranaydeeps/Ancient-Greek-BERT (Singh, Rutten, and Lefever, 2021)

Training Configuration

Parameter       Value
Batch Size      256
Epochs          8
Learning Rate   2e-5
Loss Function   MultipleNegativesRankingLoss
Warmup          10% of training steps
Hardware        NVIDIA A100 80GB PCIe
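The configuration above maps directly onto the standard sentence-transformers fit API. The following is a sketch under assumptions, not the author's actual training script: the (anchor, positive) pair format and the `train` entry point are hypothetical, while the hyperparameters mirror the table.

```python
import math


def warmup_steps(num_pairs, batch_size=256, epochs=8, fraction=0.10):
    # "Warmup: 10% of training steps" from the table above.
    steps_per_epoch = math.ceil(num_pairs / batch_size)
    return int(steps_per_epoch * epochs * fraction)


def train(pairs):
    # pairs: list of (anchor, positive) verse tuples -- hypothetical data format.
    # Imports kept inside the function so the lightweight helper above
    # can run without torch / sentence-transformers installed.
    from torch.utils.data import DataLoader
    from sentence_transformers import SentenceTransformer, InputExample, losses

    model = SentenceTransformer("pranaydeeps/Ancient-Greek-BERT")
    examples = [InputExample(texts=[a, p]) for a, p in pairs]
    loader = DataLoader(examples, shuffle=True, batch_size=256)
    # MultipleNegativesRankingLoss uses in-batch negatives: every other
    # positive in the batch serves as a negative for a given anchor.
    loss = losses.MultipleNegativesRankingLoss(model)
    model.fit(
        train_objectives=[(loader, loss)],
        epochs=8,
        warmup_steps=warmup_steps(len(pairs)),
        optimizer_params={"lr": 2e-5},
    )
    return model


print(warmup_steps(25600))  # 80
```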

Preprocessing

All input text should be normalized by:

  1. Removing diacritics/accents (NFD normalization)
  2. Converting to lowercase
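Both steps can be demonstrated in a minimal, self-contained snippet (the example phrase is illustrative, not taken from the training data):

```python
import unicodedata


def normalize(text):
    # Step 1: NFD splits each accented character into base letter plus
    # combining marks; filtering Unicode category "Mn" (nonspacing mark)
    # removes accents and breathings.
    stripped = "".join(
        c for c in unicodedata.normalize("NFD", text)
        if unicodedata.category(c) != "Mn"
    )
    # Step 2: lowercase the result.
    return stripped.lower()


print(normalize("Πάτερ ἡμῶν ὁ ἐν τοῖς οὐρανοῖς"))  # πατερ ημων ο εν τοις ουρανοις
```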

Evaluation Results

Evaluated on an information retrieval task:

Metric         Score
Accuracy@1     0.43
Accuracy@3     1.00
Accuracy@5     1.00
Precision@3    0.74
Precision@10   0.88
MRR@10         0.715
NDCG@10        0.843
MAP@100        0.911
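Metrics of this kind can be reproduced with sentence-transformers' InformationRetrievalEvaluator. In the sketch below, the query/corpus dictionaries and the `evaluate` wrapper are hypothetical; `mrr_at_k` shows how a score such as MRR@10 is computed from per-query ranks.

```python
def mrr_at_k(first_relevant_ranks, k=10):
    # MRR@k: mean of 1/rank of the first relevant hit per query, counting 0
    # when no relevant document appears in the top k. Ranks are 1-based;
    # None means nothing relevant was retrieved for that query.
    reciprocal = [
        1.0 / r if r is not None and r <= k else 0.0
        for r in first_relevant_ranks
    ]
    return sum(reciprocal) / len(reciprocal)


def evaluate(model, queries, corpus, relevant_docs):
    # queries: {qid: text}, corpus: {cid: text},
    # relevant_docs: {qid: set of relevant cids} -- hypothetical inputs.
    from sentence_transformers.evaluation import InformationRetrievalEvaluator

    evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs)
    return evaluator(model)  # computes accuracy@k, precision@k, MRR, NDCG, MAP


print(mrr_at_k([1, 2, None]))  # 0.5
```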

Limitations

  • Optimized specifically for biblical Greek; performance may degrade on other Ancient Greek genres (Classical, Homeric, etc.)
  • Requires text preprocessing (accent stripping, lowercasing) for best results

Citation

@misc{ancient-greek-variant-sbert,
  author = {Fröhlich, Paul},
  title = {Ancient Greek Variant SBERT: Fine-tuned Embeddings for Biblical Verses in Ancient Greek},
  year = {2026},
  howpublished = {\url{https://huggingface.co/Paulanerus/AncientGreekVariantSBERT}},
  note = {Model release}
}

Acknowledgments

This model builds upon the Ancient Greek BERT by Singh, Rutten, and Lefever (2021).
