PatriSBERT-STS

A Sentence-BERT model fine-tuned for semantic textual similarity on patristic and biblical Latin texts. It is designed to detect and measure text reuse between early Christian writings and the Vulgate Bible.

Usage

from sentence_transformers import SentenceTransformer, util

# Load the fine-tuned STS model
model = SentenceTransformer("tdelaselle/PatriSBERT-STS")

# Encode two Latin sentences and compare their embeddings
sentences = [
    "In principio erat Verbum",
    "Et Verbum caro factum est",
]
embeddings = model.encode(sentences)
similarity = util.cos_sim(embeddings[0], embeddings[1])
print(f"Cosine similarity: {similarity.item():.4f}")
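For reuse detection at scale, the same cosine score can rank many candidate passages against a single query verse. A minimal NumPy sketch of that ranking step (the toy vectors below stand in for precomputed model.encode(...) output; util.cos_sim performs the equivalent normalized dot product):

```python
import numpy as np

def rank_by_cosine(query_vec, candidate_vecs):
    """Rank candidate embeddings by cosine similarity to the query, descending."""
    q = query_vec / np.linalg.norm(query_vec)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    scores = c @ q
    order = np.argsort(-scores)
    return order, scores[order]

# Toy embeddings standing in for encoded Latin passages
query = np.array([1.0, 0.0, 0.0])
cands = np.array([
    [0.9, 0.1, 0.0],   # near-duplicate of the query
    [0.0, 1.0, 0.0],   # unrelated
    [0.7, 0.7, 0.0],   # partially related
])
order, scores = rank_by_cosine(query, cands)
print(order)  # most similar candidate first: [0 2 1]
```

In practice a similarity threshold on these scores (tuned on the held-out test set) would decide which candidate passages count as reuses.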

Training

  • Base model: PatriSBERT-NLI, an SBERT model trained on an NLI-style dataset of Latin biblical reuses, itself derived from PatriBERT (a BERT model pre-trained on Latin patristic texts)
  • Task: Semantic textual similarity (STS) via triplet fine-tuning
  • Dataset: Latin biblical reuse triplets
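The triplet objective pulls each anchor closer to its positive (a genuine reuse) than to its negative, by at least a margin. A minimal sketch of that loss on toy embeddings (the margin value and cosine distance are illustrative assumptions, not the exact training configuration):

```python
import numpy as np

def cosine_distance(u, v):
    """1 minus cosine similarity."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Zero when the positive is already closer to the anchor than the
    negative by at least `margin`; positive otherwise."""
    return max(0.0, cosine_distance(anchor, positive)
                    - cosine_distance(anchor, negative) + margin)

# Toy embeddings standing in for encoded (anchor, reuse, non-reuse) texts
a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])   # close to the anchor
n = np.array([0.0, 1.0])   # orthogonal to the anchor
print(triplet_loss(a, p, n))  # 0.0: triplet already satisfied
print(triplet_loss(a, n, p))  # > 0: swapped triplet incurs a loss
```

In sentence-transformers this objective corresponds to losses.TripletLoss applied to (anchor, positive, negative) examples.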

Evaluation

See the eval/ folder for evaluation metrics on the held-out test set.

Citation

If you use this model, please cite:

@misc{patriSBERT2026,
  author = {TdelaSelle},
  title  = {PatriSBERT-STS},
  year   = {2026},
  url    = {https://huggingface.co/TdelaSelle/PatriSBERT-STS}
}