Cannot reproduce QuoraRetrieval NDCG@10 score
thanks.
I am trying to reproduce the mteb/retrieval result for QuoraRetrieval, but I get an NDCG@10 score of 80.73.
I have confirmed that the query embeddings use the prompt and the document embeddings do not.
The NDCG@10 scores for other datasets reproduce fine, for example SCIDOCS and ArguAna.
QuoraRetrieval is a duplicate question retrieval task, i.e. matching queries to other queries instead of queries to documents. As such, we follow the common practice of using the query prefix for both queries and documents when embedding this dataset (this was not our brilliant idea by any means, it goes back to the E5 paper at least -- see their Appendix B).
I do not believe this was properly documented anywhere, though, even in our tech report. My apologies for the oversight!
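For anyone reproducing this, here is a minimal sketch of the symmetric setup. It assumes a sentence-transformers style model whose configuration defines a prompt named "query"; the helper name `embed_for_retrieval` and its `symmetric` flag are made up for illustration, and this is not the MTEB evaluation code itself.

```python
def embed_for_retrieval(model, queries, documents, symmetric=False):
    """Embed queries and documents with an optional symmetric query prompt.

    symmetric=False: prompt on queries only (the usual query-to-document setup).
    symmetric=True:  the same query prompt on both sides, as described above
                     for duplicate-question tasks such as QuoraRetrieval.

    `model` is assumed to expose encode(sentences, prompt_name=...), as
    sentence-transformers models do; "query" is an assumed prompt name.
    """
    query_embeddings = model.encode(list(queries), prompt_name="query")
    # For symmetric tasks the "documents" are really queries too,
    # so they get the query prompt instead of no prompt.
    doc_prompt = "query" if symmetric else None
    doc_embeddings = model.encode(list(documents), prompt_name=doc_prompt)
    return query_embeddings, doc_embeddings
```

With the real model this would be called as `embed_for_retrieval(SentenceTransformer("Snowflake/snowflake-arctic-embed-m"), queries, documents, symmetric=True)`, provided the installed sentence-transformers version supports `prompt_name`.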
You should see if this symmetrical embedding improves your organization's Stella models' scores on QuoraRetrieval, too, if you haven't yet!
(And good luck with the write-up for that one. We're looking forward to reading it when it's ready!)
Thanks, got it.