Sentence Similarity
sentence-transformers
Safetensors
Transformers
Vietnamese
feature-extraction
phobert
vietnamese
sentence-embedding
custom_code
Instructions to use dangvantuan/vietnamese-document-embedding with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use dangvantuan/vietnamese-document-embedding with sentence-transformers:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("dangvantuan/vietnamese-document-embedding", trust_remote_code=True)

sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # [3, 3]
```
- Transformers
How to use dangvantuan/vietnamese-document-embedding with Transformers:
```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "dangvantuan/vietnamese-document-embedding",
    trust_remote_code=True,
    dtype="auto",
)
```
- Notebooks
- Google Colab
- Kaggle
Does this apply late chunking?
#1
by thatpham2k - opened
When working with LLMs, we sometimes use context-sensitive chunking (aka late chunking) to ensure the model understands the context.
More detail: https://github.com/jina-ai/late-chunking/blob/main/README.md
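For readers unfamiliar with the idea: late chunking runs one forward pass over the whole document to get token-level embeddings first, and only afterwards pools each chunk's token vectors, so every chunk embedding is conditioned on the full document context. Here is a minimal sketch of that pooling step; the token embeddings are simulated random vectors, not output from this model, and the chunk boundaries are made up for illustration:

```python
import numpy as np

# Simulated token-level embeddings for a 12-token document (dim 4).
# In real late chunking these would come from a single forward pass over
# the *entire* document, so each vector already reflects the full context.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(12, 4))

# Chunk boundaries are applied *after* encoding (hence "late"):
# tokens [0, 5) form chunk 1, tokens [5, 12) form chunk 2.
spans = [(0, 5), (5, 12)]

# Each chunk embedding is the mean of its tokens' contextualized vectors.
chunk_embeddings = np.stack(
    [token_embeddings[start:end].mean(axis=0) for start, end in spans]
)

print(chunk_embeddings.shape)  # (2, 4)
```

The key contrast with traditional chunking is the order of operations: traditional chunking splits the text first and encodes each chunk in isolation, while late chunking encodes first and pools second.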
I have tried late chunking with this embedding model, but it doesn't seem to work. I put a Colab notebook here: https://colab.research.google.com/drive/1JYas5VJWModbJfGZak0XDN8KTDQyR3g6?usp=sharing. If you're free, please check it. The results are worse than with traditional chunking (without late chunking applied).