---
license: mit
language:
- en
base_model:
- thenlper/gte-large
---
## News
11/12/2024: Release of Algolia/Algolia-large-en-generic-v2410, Algolia's English embedding model.

## Models
Algolia-large-en-generic-v2410 is the first addition to Algolia's suite of embedding models, built for retrieval performance and efficiency in e-commerce search.
Algolia v2410 models are state-of-the-art for their size and use cases, and are now available under an MIT license.

### Quality Benchmarks
|Model|MTEB EN rank|Public e-comm rank|Algolia private e-comm rank|
|---|---|---|---|
|Algolia-large-en-generic-v2410|11|2|10|

Note that our benchmarks cover the retrieval task only and include open-source models of approximately 500M parameters or smaller, as well as commercially available embedding models.

## Usage

### Using Sentence Transformers
```python
# Load model and tokenizer
import pandas as pd
from scipy.spatial.distance import cosine
from sentence_transformers import SentenceTransformer

modelname = "algolia/algolia-large-en-generic-v2410"
model = SentenceTransformer(modelname)

# Define embedding and compute_similarity
def get_embedding(text):
    embedding = model.encode([text])
    return embedding[0]

def compute_similarity(query, documents):
    query_emb = get_embedding(query)
    doc_embeddings = [get_embedding(doc) for doc in documents]
    # Calculate cosine similarity
    similarities = [1 - cosine(query_emb, doc_emb) for doc_emb in doc_embeddings]
    ranked_docs = sorted(zip(documents, similarities), key=lambda x: x[1], reverse=True)
    # Format output
    return [{"document": doc, "similarity_score": round(sim, 4)} for doc, sim in ranked_docs]

# Define inputs
query = "query: " + "running shoes"
documents = [
    "adidas sneakers, great for outdoor running",
    "nike soccer boots indoor, it can be used on turf",
    "new balance light weight, good for jogging",
    "hiking boots, good for bushwalking",
]

# Output the results
result_df = pd.DataFrame(compute_similarity(query, documents))
print(query)
result_df.head()
```
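The ranking step above reduces to computing cosine similarity between the query embedding and each document embedding, then sorting in descending order. The following dependency-free sketch illustrates that step with hypothetical 3-dimensional vectors standing in for real `model.encode` outputs (the vector values and document names are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    # Equivalent to 1 - scipy.spatial.distance.cosine(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings; real ones would come from model.encode
query_emb = [1.0, 0.0, 0.0]
doc_embeddings = {
    "running sneakers": [0.9, 0.1, 0.0],  # nearly parallel to the query
    "soccer boots": [0.1, 0.9, 0.0],      # mostly orthogonal to the query
}

# Score every document against the query, then rank by descending similarity
scores = {name: cosine_similarity(query_emb, emb) for name, emb in doc_embeddings.items()}
ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)
print(ranked)
```

As in `compute_similarity`, the document whose embedding points in the direction closest to the query's comes first in the ranked output.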

## Contact
Feel free to open an issue or pull request if you have any questions or suggestions about this project.
You can also email Rasit Abay (rasit.abay@algolia.com).

## License
Algolia EN v2410 is licensed under the [MIT license](https://mit-license.org/). The released models can be used for commercial purposes free of charge.