What languages does snowflake-arctic-embed-m-v2.0 support?
Thank you for your excellent work, which has greatly helped our project. We would now like to train a multilingual quality classifier based on snowflake-arctic-embed-m-v2.0, but we are unsure which languages the model supports. Could you let us know?
The best way to find out whether this model is a good choice for your fine-tuning task is to try it out. The training details in our technical report (linked in the news section of our model card) describe which languages we focused on during contrastive training, but that focus may or may not predict how well the model will perform on a quality classification task.
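One rough way to "try it out" before committing to a full fine-tune is to fit a cheap probe on frozen embeddings and look at held-out accuracy per language. The sketch below is only an illustration under stated assumptions: `per_language_accuracy` is a hypothetical helper, the nearest-centroid probe is a stand-in for whatever classifier you actually plan to train, and `embed` is any callable that maps a list of texts to a 2-D array of vectors.

```python
from collections import defaultdict
from typing import Callable, Sequence

import numpy as np


def per_language_accuracy(
    embed: Callable[[Sequence[str]], np.ndarray],
    train: Sequence[tuple[str, int, str]],
    test: Sequence[tuple[str, int, str]],
) -> dict[str, float]:
    """Fit a nearest-centroid probe on frozen embeddings, score it per language.

    Hypothetical helper for illustration. Each example is a
    (text, label, language_code) triple; `embed` returns one row per text.
    """

    def unit(x: np.ndarray) -> np.ndarray:
        # L2-normalize rows so that dot products are cosine similarities.
        x = np.asarray(x, dtype=np.float64)
        norms = np.linalg.norm(x, axis=1, keepdims=True)
        return x / np.clip(norms, 1e-12, None)

    train_vecs = unit(embed([text for text, _, _ in train]))
    train_labels = np.array([label for _, label, _ in train])

    # One centroid per class, computed from the training embeddings.
    classes = np.unique(train_labels)
    centroids = unit(
        np.stack([train_vecs[train_labels == c].mean(axis=0) for c in classes])
    )

    # Predict the class whose centroid is most similar to each test embedding.
    test_vecs = unit(embed([text for text, _, _ in test]))
    preds = classes[np.argmax(test_vecs @ centroids.T, axis=1)]

    # Group hits by language so weak languages stand out.
    hits: dict[str, list[int]] = defaultdict(list)
    for (_, label, lang), pred in zip(test, preds):
        hits[lang].append(int(pred == label))
    return {lang: sum(v) / len(v) for lang, v in hits.items()}
```

To run this against the model, pass the `encode` method of `SentenceTransformer("Snowflake/snowflake-arctic-embed-m-v2.0", trust_remote_code=True)` as `embed`, along with a small labeled sample in each language you care about. Languages where the probe scores poorly are the ones to examine more closely, though a fine-tuned classifier may behave differently from this simple probe.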
Other potentially helpful details: the tokenizer is from XLM-R, and the MLM pretraining details are here: https://arxiv.org/abs/2407.19669