Dataset
unicamp-dl/mmarco
How to use andreaschari/bge-m3-RU_MMARCO_50_MIXED with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

# Load the fine-tuned model from the Hugging Face Hub
model = SentenceTransformer("andreaschari/bge-m3-RU_MMARCO_50_MIXED")

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium."
]
# Encode the sentences into dense embeddings
embeddings = model.encode(sentences)
# Compute pairwise similarity scores between all embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

This is a BGE-M3 model post-trained on the Russian subset of mMARCO (v2). The training queries are a 50/50 split between native Russian (Cyrillic script) and Russian transliterated into Latin script using uroman.
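To make the 50/50 query mix concrete, here is a minimal sketch of how such a training set could be assembled. It is not the author's pipeline: the real data was romanized with uroman, whereas `toy_romanize` below is a hypothetical stand-in using a small hand-written Cyrillic-to-Latin table, and `mix_queries`, its `ratio` and `seed` parameters, and the per-query random selection are all assumptions made for illustration.

```python
import random

# Hypothetical stand-in for uroman: a toy lowercase Cyrillic-to-Latin table.
# This is NOT uroman's actual mapping; it only illustrates romanization.
_TOY_ROMAN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def toy_romanize(text: str) -> str:
    """Romanize Cyrillic letters; all other characters pass through unchanged."""
    out = []
    for ch in text:
        low = ch.lower()
        if low in _TOY_ROMAN:
            latin = _TOY_ROMAN[low]
            # Keep an initial capital when the source letter was uppercase.
            out.append(latin.capitalize() if ch != low else latin)
        else:
            out.append(ch)
    return "".join(out)


def mix_queries(queries, romanize=toy_romanize, ratio=0.5, seed=0):
    """Return a copy of `queries` in which a random `ratio` of them are romanized."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_roman = int(len(queries) * ratio)
    roman_idx = set(rng.sample(range(len(queries)), n_roman))
    return [romanize(q) if i in roman_idx else q for i, q in enumerate(queries)]


print(toy_romanize("Привет, мир"))
print(mix_queries(["кошка", "собака", "дом", "река"]))
```

With four queries and `ratio=0.5`, exactly two come back in Latin script and two stay in Cyrillic. In a real setup you would pass a uroman-based function as `romanize`; whether the original split was drawn per query, per training pair, or with a fixed seed is not stated in this card.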
The model was used in the SIGIR 2025 short paper "Lost in Transliteration: Bridging the Script Gap in Neural IR".
Base model
BAAI/bge-m3