mpnet-use-ubertext-no-pt

This model is a fine-tuned version of paraphrase-multilingual-mpnet-base-v2, trained on the Ukrainian text corpus UberText 2.0 without any data augmentation or pool targets. It is part of the Ukrainian Sentence Embeddings collection, which explores the effect of different training strategies on sentence embedding quality for Ukrainian.

Model Description

The model was fine-tuned using a contrastive objective on UberText 2.0, a large general-purpose Ukrainian text corpus, without any additional augmentation techniques. This makes it the most general variant in the collection and serves as a baseline for comparing the effect of augmentation strategies in other variants.
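The card does not specify which contrastive loss was used, so purely as an illustration, the following NumPy sketch shows an in-batch contrastive (InfoNCE-style) objective over L2-normalized embeddings: each sentence pair on the diagonal is the positive, and the other rows of the batch act as negatives. The function name, temperature value, and toy data are all assumptions for demonstration, not the model's actual training code.

```python
import numpy as np

def info_nce_loss(emb_a: np.ndarray, emb_b: np.ndarray, temperature: float = 0.05) -> float:
    """In-batch contrastive loss: row i of emb_a should match row i of emb_b,
    with the remaining rows in the batch serving as negatives."""
    # L2-normalize so dot products are cosine similarities
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (batch, batch) similarity matrix
    # Cross-entropy with the diagonal as the correct class
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Toy orthonormal "embeddings": matched pairs give a much lower loss
anchors = np.eye(3, 8)
shuffled = np.roll(anchors, 1, axis=0)  # deliberately misaligned pairs
print(info_nce_loss(anchors, anchors) < info_nce_loss(anchors, shuffled))  # True
```

Minimizing such a loss pulls paired sentences together in embedding space while pushing apart unrelated sentences from the same batch.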

Collection Overview

  • mpnet-use-ubertext-no-pt (this model): raw UberText 2.0, no augmentation, no pool targets
  • mpnet-use-combined-no-pt: combined augmentation strategies, no pool targets
  • mpnet-use-markov-pt: Markov-based augmentation with pool targets

Usage

from sentence_transformers import SentenceTransformer

# Load the fine-tuned model from the Hugging Face Hub
model = SentenceTransformer("victormuryn/mpnet-use-ubertext-no-pt")

sentences = [
    "Проводжає сина мати захищати рідний край",  # "A mother sees her son off to defend his native land"
    "Хоч би малесеньку хатину він мріяв мати над Дніпром",  # "He dreamed of at least a tiny house above the Dnipro"
]

# encode() returns one embedding vector per sentence
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 768): the mpnet-base architecture produces 768-dimensional embeddings
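The embeddings returned by encode() are plain NumPy arrays, so sentence similarity can be computed directly as cosine similarity. A minimal sketch, using toy vectors in place of actual model output:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for two rows of model.encode() output
a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 1.0, 0.0])
print(round(cosine_similarity(a, b), 3))  # 0.5
```

For batches, sentence_transformers.util.cos_sim performs the same computation over all pairs of rows at once.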

Training Details

  • Base model: paraphrase-multilingual-mpnet-base-v2
  • Training corpus: UberText 2.0
  • Augmentation: None
  • Pool targets: No

Citation

To be added

License

Apache 2.0

Model size: 0.3B parameters (F32 tensors, Safetensors format)