DistilSPhilBERTa

Cross-lingual sentence mining and similarity detection for Latin and Ancient Greek are relatively new areas of research.

In our paper, "Evaluating Latin and Ancient Greek Sentence Alignment through Parallel Sentence Mining", we introduce DistilSPhilBERTa, a sentence-transformers model fine-tuned from bowphs/SPhilBerta for exactly these tasks. On our custom sentence mining benchmark, it achieves significantly better results than SPhilBERTa and other classical cross-lingual language models. It maps sentences and paragraphs to a 768-dimensional vector space and can be used for semantic textual similarity, semantic search, parallel sentence mining, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: bowphs/SPhilBerta
Maximum Sequence Length: 128 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Dataset: grosenthal/latin_english_parallel, English-Greek Dataset
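Since the model's similarity function is cosine similarity, the following minimal sketch shows how that score is computed for two embedding vectors. It uses plain Python and toy 3-dimensional vectors in place of real 768-dimensional embeddings; the function name `cosine_similarity` and the example vectors are illustrative assumptions, not part of the model's API.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Toy vectors standing in for sentence embeddings (hypothetical values).
latin_vec = [1.0, 2.0, 0.0]
greek_vec = [2.0, 4.0, 0.0]   # points in the same direction as latin_vec
other_vec = [0.0, 0.0, 3.0]   # orthogonal to latin_vec

sim_parallel = cosine_similarity(latin_vec, greek_vec)    # close to 1
sim_unrelated = cosine_similarity(latin_vec, other_vec)   # exactly 0
```

Because the score depends only on direction, vectors of different magnitude that point the same way are treated as maximally similar.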

Usage

Direct Usage (Sentence Transformers)

With the Sentence Transformers library installed, you can use the model in the following way:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sebastian-reichbauer/DistilSPhilBERTa")
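# Run inference: encode a list of sentences into one embedding per sentence
sentences = [
    'Νομίζω οὖν τοῦτο καλὸν ὑπάρχειν διὰ τὴν ἐνεστῶσαν ἀνάγκην, ὅτι καλὸν ἀνθρώπῳ τὸ οὕτως εἶναι.'
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (number of sentences, 768)
print(embeddings)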
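To illustrate how such embeddings could be used for parallel sentence mining, here is a minimal sketch of a nearest-neighbour search by cosine similarity. It is not the procedure from the paper, which is not described here. The function names `normalize` and `mine_parallel`, the threshold of 0.8, and the toy vectors are all assumptions chosen for illustration; in practice the vectors would come from `model.encode(...)`.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def mine_parallel(source_vecs, target_vecs, threshold=0.8):
    """Pair each source vector with its most similar target vector.

    Returns (source_index, target_index, score) tuples for pairs whose
    cosine similarity is at least `threshold` (an arbitrary example value).
    """
    targets = [normalize(t) for t in target_vecs]
    pairs = []
    for i, source in enumerate(source_vecs):
        unit = normalize(source)
        scores = [sum(a * b for a, b in zip(unit, t)) for t in targets]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            pairs.append((i, best, scores[best]))
    return pairs

# Toy 3-dimensional stand-ins for Latin and Ancient Greek sentence embeddings.
latin = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [1.0, 1.0, 1.0]]
greek = [[0.1, 0.9, 0.1], [0.9, 0.0, 0.1]]

pairs = mine_parallel(latin, greek)
for src, tgt, score in pairs:
    print(f"latin[{src}] <-> greek[{tgt}] (cosine {score:.3f})")
```

The third toy Latin vector has no sufficiently close Greek counterpart, so the threshold filters it out. A real mining setup would typically tune this cut-off on held-out data.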

Model size: 0.1B params (Safetensors)
Tensor type: F32

Evaluation results

Negative MSE on Latin to Ancient Greek (self-reported): -0.134
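For readers unfamiliar with the metric, the sketch below shows one common way a negative mean squared error between two sets of embeddings can be computed: the squared differences are averaged over all coordinates and the result is negated, so that values closer to zero are better. Which embeddings are compared for the reported score, and whether a scaling factor is applied, is not stated here. The function name `negative_mse` and the toy values are assumptions.

```python
def negative_mse(predicted_vecs, reference_vecs):
    """Negated mean squared error over all coordinates of all vector pairs."""
    if len(predicted_vecs) != len(reference_vecs):
        raise ValueError("both sets must contain the same number of vectors")
    total = 0.0
    count = 0
    for pred, ref in zip(predicted_vecs, reference_vecs):
        if len(pred) != len(ref):
            raise ValueError("paired vectors must have the same length")
        for a, b in zip(pred, ref):
            total += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("no values to compare")
    return -total / count

# Toy 2-dimensional vectors (hypothetical values, not real embeddings).
predicted = [[0.0, 1.0], [1.0, 1.0]]
reference = [[0.0, 0.0], [1.0, 3.0]]

score = negative_mse(predicted, reference)
print(score)
```

Negating the error turns it into a "higher is better" score, which is convenient when a training loop selects the best checkpoint by maximizing a single metric.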