Cross-Translation Bible Embeddings

A sentence transformer fine-tuned to create a shared embedding space where semantically equivalent Bible verses across different translations map to nearby vectors.

Usage

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("LoveJesus/biblical-cross-translation-chirho")

# Each verse is prefixed with its translation tag, as in the examples below.
verses = [
    "[KJV] In the beginning God created the heaven and the earth.",
    "[BBE] At the first God made the heaven and the earth.",
    "[KJV] And the earth was without form, and void;",
]

# Encode all verses and compare every pair by cosine similarity.
embeddings = model.encode(verses)
similarities = cos_sim(embeddings, embeddings)
print(similarities)
# Gen 1:1 KJV vs Gen 1:1 BBE: ~0.95 (same verse, different translation)
# Gen 1:1 KJV vs Gen 1:2 KJV: ~0.30 (different verses)
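
Beyond pairwise comparison, the same embeddings can drive cross-translation lookup. The sketch below uses sentence_transformers.util.semantic_search to find the closest KJV verse for a BBE query; the corpus and query texts here are illustrative only, not part of the model.

from sentence_transformers import SentenceTransformer
from sentence_transformers.util import semantic_search

model = SentenceTransformer("LoveJesus/biblical-cross-translation-chirho")

# Illustrative corpus: a handful of KJV verses (any verse list would work).
kjv_corpus = [
    "[KJV] In the beginning God created the heaven and the earth.",
    "[KJV] And the earth was without form, and void;",
    "[KJV] And God said, Let there be light: and there was light.",
]
query = "[BBE] And God said, Let there be light: and there was light."

corpus_embeddings = model.encode(kjv_corpus, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank corpus verses by cosine similarity to the query and keep the best hit.
hits = semantic_search(query_embedding, corpus_embeddings, top_k=1)
best = hits[0][0]
print(kjv_corpus[best["corpus_id"]], best["score"])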

Training

  • Base model: paraphrase-multilingual-MiniLM-L12-v2 (118M params, 384-dim)
  • Training: Contrastive learning (CosineSimilarityLoss) on ~300K verse pairs (sketched below)
  • Translations: KJV, ASV, YLT, BBE, WEB (all public domain)
  • Positive pairs: Same verse in different translations
  • Negative pairs: Different verses from the same translation
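
The training script itself is not published here; as a rough illustration of the setup described above, a minimal sentence-transformers fine-tuning loop with CosineSimilarityLoss over labeled verse pairs might look like the following (the two example pairs stand in for the ~300K real ones):

from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Placeholder pairs: label 1.0 = same verse across translations, 0.0 = different verses.
train_examples = [
    InputExample(texts=["[KJV] In the beginning God created the heaven and the earth.",
                        "[BBE] At the first God made the heaven and the earth."], label=1.0),
    InputExample(texts=["[KJV] In the beginning God created the heaven and the earth.",
                        "[KJV] And the earth was without form, and void;"], label=0.0),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)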

Part of bible.systems

This is model 5 of 5 in the bible.systems ML pipeline.

Evaluation Results

Evaluated on a held-out test set of cross-translation verse pairs.

Metric                                 Score
Accuracy@0.5 (cosine sim threshold)    0.9988
ROC AUC                                1.0000
Spearman Correlation                   0.4915
Avg Positive Similarity                0.9841
Avg Negative Similarity                0.0359
Similarity Gap (pos - neg)             0.9482

The model achieves near-perfect discrimination between same-verse pairs across translations (high positive similarity) and different-verse pairs (low negative similarity), with a gap of 0.95. The Spearman correlation is moderate because within-class similarity variance is low (most positive pairs cluster near 0.98).
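
For context, Accuracy@0.5 simply thresholds the cosine similarity at 0.5: pairs above the threshold are predicted "same verse", pairs below "different". A minimal recomputation of that metric and of the similarity gap on your own labeled pairs could look like the sketch below (the two pairs are placeholders, not the held-out test set):

import numpy as np
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("LoveJesus/biblical-cross-translation-chirho")

# Placeholder pairs: (text_a, text_b, label) with label 1 = same verse, 0 = different.
pairs = [
    ("[KJV] In the beginning God created the heaven and the earth.",
     "[BBE] At the first God made the heaven and the earth.", 1),
    ("[KJV] In the beginning God created the heaven and the earth.",
     "[KJV] And the earth was without form, and void;", 0),
]

sims, labels = [], []
for a, b, label in pairs:
    emb = model.encode([a, b])
    sims.append(float(cos_sim(emb[0], emb[1])))
    labels.append(label)

sims, labels = np.array(sims), np.array(labels)
accuracy = np.mean((sims >= 0.5) == (labels == 1))         # Accuracy@0.5
gap = sims[labels == 1].mean() - sims[labels == 0].mean()  # similarity gap (pos - neg)
print(accuracy, gap)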


For God so loved the world... — John 3:16
