Instructions to use Lajavaness/bilingual-document-embedding with libraries, inference providers, notebooks, and local apps.
- Libraries
- sentence-transformers
How to use Lajavaness/bilingual-document-embedding with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

# trust_remote_code=True allows the custom model code in the repository to run
model = SentenceTransformer("Lajavaness/bilingual-document-embedding", trust_remote_code=True)

sentences = [
    "C'est une personne heureuse",
    "C'est un chien heureux",
    "C'est une personne très heureuse",
    "Aujourd'hui est une journée ensoleillée"
]
embeddings = model.encode(sentences)

# Pairwise similarity matrix between all sentence embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
```
- Transformers
How to use Lajavaness/bilingual-document-embedding with Transformers:
```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Lajavaness/bilingual-document-embedding", trust_remote_code=True, dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
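The snippets above turn sentences into vectors with `model.encode` and compare them with `model.similarity`. The plain-Python sketch below shows roughly what those two steps amount to, under stated assumptions: token vectors are averaged over non-padding positions (mean pooling) and compared with cosine similarity, which is the default similarity function in sentence-transformers. Because this model loads custom code via `trust_remote_code`, its actual pooling may differ (for example, using the CLS token instead). The helpers `mean_pool` and `cosine_similarity_matrix` are hypothetical, and the toy vectors stand in for the model's real hidden states.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask value is 1; padding tokens are ignored."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / max(count, 1) for t in total]


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between all embeddings, shape [n, n]."""
    norms = [math.sqrt(sum(x * x for x in e)) for e in embeddings]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(embeddings, norms)]
        for u, nu in zip(embeddings, norms)
    ]


# Toy token embeddings for four "sentences" (two tokens each). The second token
# of sentence 2 is padding (mask 0), so it must not affect its sentence vector.
batch = [
    ([[1.0, 0.0], [0.8, 0.2]], [1, 1]),
    ([[0.0, 1.0], [9.0, 9.0]], [1, 0]),
    ([[0.9, 0.1], [1.0, 0.0]], [1, 1]),
    ([[0.5, 0.5], [0.5, 0.5]], [1, 1]),
]
embeddings = [mean_pool(tokens, mask) for tokens, mask in batch]
similarities = cosine_similarity_matrix(embeddings)
print(len(similarities), len(similarities[0]))  # 4 4
```

As in the library snippet, four inputs yield a 4 x 4 similarity matrix whose diagonal is 1.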
Questions on Training and Architecture
I’m exploring this model, particularly its training methods and architectural specifics, and I have a few questions:
Is the second stage missing from the description of the "Training and Fine-tuning process", or is that a typo?
How architecturally distinct is this model from BGE3, and are there practical differences in its embedding approach?
What evaluation metrics did you use during training, and are any benchmarks available for comparison?
Could you share more about fine-tuning, especially for generating custom embeddings or for using the model in domain-specific applications?
Also, could you share the training code, or give us an idea of exactly how you did it?
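The authors' training code is not shown here, so the following is not their method. As a point of reference for the fine-tuning questions, a common objective for fine-tuning embedding models on (query, relevant passage) pairs is a contrastive loss with in-batch negatives: each query should score its own positive above every other positive in the batch. sentence-transformers exposes this idea as `MultipleNegativesRankingLoss` (cosine similarity scaled by 20 by default). The pure-Python sketch below only computes the loss value, with no gradients or training loop; `in_batch_negatives_loss` and `cosine` are hypothetical helper names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def in_batch_negatives_loss(queries, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for queries[i]
    and every other positives[j] in the batch acts as a negative."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [scale * cosine(q, p) for p in positives]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(queries)


queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each query's positive is the closest vector
swapped = [[0.1, 0.9], [0.9, 0.1]]  # each query's positive is the farther vector
aligned_loss = in_batch_negatives_loss(queries, aligned)
swapped_loss = in_batch_negatives_loss(queries, swapped)
print(aligned_loss < swapped_loss)  # True
```

Larger batches supply more negatives per query, which is one reason this objective is popular for retrieval-style fine-tuning.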
Thank you in advance for any insights!
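On the question of evaluation metrics: the source does not say which ones were used. For retrieval-oriented embedding models, Recall@k and Mean Reciprocal Rank (MRR) are common choices, computed over documents ranked by similarity to each query. The sketch below assumes the rankings are already computed; `recall_at_k`, `mrr`, and the document IDs are hypothetical and illustrative only.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top-k results."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def mrr(ranked_lists, relevant_sets):
    """Mean reciprocal rank of the first relevant result per query (0 if none)."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


# Hypothetical rankings for two queries (e.g. documents sorted by cosine similarity).
ranked = [["d3", "d1", "d7"], ["d2", "d5", "d4"]]
relevant = [{"d1"}, {"d2", "d4"}]
print(recall_at_k(ranked[1], relevant[1], k=2))  # 0.5
print(mrr(ranked, relevant))  # 0.75
```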