Sentence Similarity
sentence-transformers
Safetensors
Transformers
French
bilingual
feature-extraction
sentence-embedding
mteb
custom_code
Instructions to use Lajavaness/bilingual-document-embedding with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use Lajavaness/bilingual-document-embedding with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Lajavaness/bilingual-document-embedding", trust_remote_code=True)

sentences = [
    "C'est une personne heureuse",              # "This is a happy person"
    "C'est un chien heureux",                   # "This is a happy dog"
    "C'est une personne très heureuse",         # "This is a very happy person"
    "Aujourd'hui est une journée ensoleillée",  # "Today is a sunny day"
]
embeddings = model.encode(sentences)

# Pairwise similarity between all four sentence embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
```
- Transformers
How to use Lajavaness/bilingual-document-embedding with Transformers:
```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("Lajavaness/bilingual-document-embedding", trust_remote_code=True, dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
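The sentence-transformers snippet above ends with a `[4, 4]` similarity matrix, which holds one score for every pair of the four sentences. The following plain-Python sketch shows what that computation amounts to. It assumes the model uses sentence-transformers' default similarity function, cosine similarity, although a model can configure a different one. The helper `cosine_similarity_matrix` and the toy 3-dimensional vectors are hypothetical stand-ins for the real embeddings and are not part of the library.

```python
import math


def cosine_similarity_matrix(embeddings: list[list[float]]) -> list[list[float]]:
    """Pairwise cosine similarity between every pair of embedding vectors.

    Assumes no vector is all zeros (its norm would be 0).
    """
    norms = [math.sqrt(sum(x * x for x in v)) for v in embeddings]
    matrix = []
    for a, norm_a in zip(embeddings, norms):
        row = []
        for b, norm_b in zip(embeddings, norms):
            dot = sum(x * y for x, y in zip(a, b))
            row.append(dot / (norm_a * norm_b))
        matrix.append(row)
    return matrix


# Toy 3-dimensional vectors standing in for the four sentence embeddings
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 2.0],
]
sims = cosine_similarity_matrix(toy_embeddings)
print(len(sims), len(sims[0]))  # 4 4
```

N input vectors always yield an N×N matrix. The diagonal is (up to floating-point rounding) 1.0, because each vector is compared with itself.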
Difference between bilingual-embedding-large and bilingual-embedding-large-8k
#1
by thomlevy - opened
Hello,
Thanks for this model.
This doesn't seem obvious from the model card.
What is the difference between bilingual-embedding-large and bilingual-embedding-large-8k?
What does the "8k" suffix mean?
Thanks
Hi @thomlevy
The maximum input length of bilingual-embedding-large is 512 tokens, while bilingual-embedding-large-8k accepts up to 8096 tokens.
Tuan
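In practice, sentence-transformers truncates any input longer than the model's `max_seq_length`, so the text beyond the limit does not contribute to the embedding. The following minimal sketch illustrates what that means for a long document. It works on made-up integer token IDs rather than real tokenizer output. The helpers `truncate_to_limit` and `dropped_tokens` are hypothetical illustrations and are not part of either library. Real tokenizers also typically reserve a couple of positions for special tokens, so slightly fewer content tokens fit than the raw limit suggests.

```python
# Hypothetical illustration of how a maximum sequence length caps what the
# encoder sees. Token IDs are made-up integers, not real tokenizer output.

MAX_LEN_LARGE = 512      # bilingual-embedding-large
MAX_LEN_LARGE_8K = 8096  # bilingual-embedding-large-8k, as stated in the reply


def truncate_to_limit(token_ids: list[int], max_seq_length: int) -> list[int]:
    """Keep only the first max_seq_length tokens (right-side truncation)."""
    if max_seq_length <= 0:
        raise ValueError("max_seq_length must be positive")
    return token_ids[:max_seq_length]


def dropped_tokens(num_tokens: int, max_seq_length: int) -> int:
    """Number of tokens discarded for an input of num_tokens tokens."""
    return max(0, num_tokens - max_seq_length)


if __name__ == "__main__":
    document = list(range(3000))  # stand-in for a 3,000-token document
    for name, limit in [("large", MAX_LEN_LARGE), ("large-8k", MAX_LEN_LARGE_8K)]:
        kept = truncate_to_limit(document, limit)
        print(f"{name}: kept {len(kept)} tokens, dropped {dropped_tokens(len(document), limit)}")
```

For a 3,000-token document, the 512-token model keeps only the first 512 tokens and drops 2,488. The 8k variant keeps the whole document.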
Hi @dangvantuan
Thanks. Very clear.
Maybe it would help to mention this updated max_seq_length in the model card:
https://huggingface.co/Lajavaness/bilingual-embedding-large-8k#full-model-architecture
Thomas
thomlevy changed discussion status to closed