PatriSBERT-STS

A Sentence-BERT model fine-tuned for semantic textual similarity on patristic and biblical Latin texts. It is designed to detect and measure text reuse between early Christian writings and the Vulgate Bible.

Usage

from sentence_transformers import SentenceTransformer, util

# Load the fine-tuned STS model
model = SentenceTransformer("tdelaselle/PatriSBERT-STS")

# Encode two Latin sentences and compare their embeddings
sentences = [
    "In principio erat Verbum",
    "Et Verbum caro factum est",
]
embeddings = model.encode(sentences)
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(f"Cosine similarity: {similarity.item():.4f}")
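For reuse detection at scale, the same cosine score can rank many candidate passages against a single query verse. A minimal NumPy sketch of that ranking step (the toy vectors below stand in for precomputed model.encode(...) output; util.cos_sim performs the equivalent normalized dot product):

```python
import numpy as np

def rank_by_cosine(query_vec, candidate_vecs):
    """Rank candidate embeddings by cosine similarity to the query, descending."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores)
    return order, scores[order]

# Toy embeddings standing in for encoded Latin passages
query = np.array([1.0, 0.0, 0.0])
cands = np.array([
    [0.9, 0.1, 0.0],   # near-duplicate of the query
    [0.0, 1.0, 0.0],   # unrelated
    [0.7, 0.7, 0.0],   # partially related
])
order, scores = rank_by_cosine(query, cands)
print(order)  # most similar candidate first: [0 2 1]
```

In practice a similarity threshold on these scores (tuned on the held-out test set) would decide which candidate passages count as reuses.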

Training

  • Base model: PatriSBERT-NLI, an SBERT model trained on an NLI-style dataset of Latin biblical reuses, itself derived from PatriBERT (a BERT model pre-trained on Latin patristic texts)
  • Task: Semantic textual similarity (STS) via triplet fine-tuning
  • Dataset: Latin biblical reuse triplets
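The triplet objective pulls each anchor closer to its positive (a genuine reuse) than to its negative, by at least a margin. A minimal sketch of that loss on toy embeddings (the margin value and cosine distance are illustrative assumptions, not the exact training configuration):

```python
import numpy as np

def cosine_distance(u, v):
    """1 minus cosine similarity."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Zero when the positive is already closer to the anchor than the
    negative by at least `margin`; positive otherwise."""
    return max(0.0, cosine_distance(anchor, positive)
                    - cosine_distance(anchor, negative) + margin)

# Toy embeddings standing in for encoded (anchor, reuse, non-reuse) texts
a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])   # close to the anchor
n = np.array([0.0, 1.0])   # orthogonal to the anchor
print(triplet_loss(a, p, n))  # 0.0: triplet already satisfied
print(triplet_loss(a, n, p))  # > 0: swapped triplet incurs a loss
```

In sentence-transformers this objective corresponds to losses.TripletLoss applied to (anchor, positive, negative) examples.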

Evaluation

See the eval/ folder for evaluation metrics on the held-out test set.

Citation

If you use this model, please cite:

@misc{patriSBERT2026,
  author = {TdelaSelle},
  title  = {PatriSBERT-STS},
  year   = {2026},
  url    = {https://huggingface.co/TdelaSelle/PatriSBERT-STS}
}