---
license: mit
language:
- en
base_model:
- thenlper/gte-large
---
## News
11/12/2024: Release of Algolia/Algolia-large-en-generic-v2410, Algolia's English embedding model.

## Models
Algolia-large-en-generic-v2410 is the first addition to Algolia's suite of embedding models, built for retrieval performance and efficiency in e-commerce search.
Algolia v2410 models are state-of-the-art for their size and use cases, and are now available under an MIT license.

### Quality Benchmarks
|Model|MTEB EN rank|Public e-comm rank|Algolia private e-comm rank|
|---|---|---|---|
|Algolia-large-en-generic-v2410|11|2|10|

Note that our benchmarks cover the retrieval task only and include open-source models of approximately 500M parameters or smaller, as well as commercially available embedding models.

## Usage

### Using Sentence Transformers
```python
# Load model and tokenizer
import pandas as pd
from scipy.spatial.distance import cosine
from sentence_transformers import SentenceTransformer

modelname = "algolia/algolia-large-en-generic-v2410"
model = SentenceTransformer(modelname)

# Define embedding and compute_similarity
def get_embedding(text):
    embedding = model.encode([text])
    return embedding[0]

def compute_similarity(query, documents):
    query_emb = get_embedding(query)
    doc_embeddings = [get_embedding(doc) for doc in documents]
    # Calculate cosine similarity
    similarities = [1 - cosine(query_emb, doc_emb) for doc_emb in doc_embeddings]
    ranked_docs = sorted(zip(documents, similarities), key=lambda x: x[1], reverse=True)
    # Format output
    return [{"document": doc, "similarity_score": round(sim, 4)} for doc, sim in ranked_docs]

# Define inputs
query = "query: " + "running shoes"
documents = [
    "adidas sneakers, great for outdoor running",
    "nike soccer boots indoor, it can be used on turf",
    "new balance light weight, good for jogging",
    "hiking boots, good for bushwalking",
]

# Output the results
result_df = pd.DataFrame(compute_similarity(query, documents))
print(query)
result_df.head()
```
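The ranking step above reduces to computing cosine similarity between the query embedding and each document embedding, then sorting in descending order. The following dependency-free sketch illustrates that step with hypothetical 3-dimensional vectors standing in for real `model.encode` outputs (the vector values and document names are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    # Equivalent to 1 - scipy.spatial.distance.cosine(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings; real ones would come from model.encode
query_emb = [1.0, 0.0, 0.0]
doc_embeddings = {
    "running sneakers": [0.9, 0.1, 0.0],  # nearly parallel to the query
    "soccer boots": [0.1, 0.9, 0.0],      # mostly orthogonal to the query
}

# Score every document against the query, then rank by descending similarity
scores = {name: cosine_similarity(query_emb, emb) for name, emb in doc_embeddings.items()}
ranked = sorted(scores.items(), key=lambda x: x[1], reverse=True)
print(ranked)
```

As in `compute_similarity`, the document whose embedding points in the direction closest to the query's comes first in the ranked output.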

## Contact
Feel free to open an issue or pull request if you have any questions or suggestions about this project.
You can also email Rasit Abay (rasit.abay@algolia.com).

## License
Algolia EN v2410 is licensed under the [MIT license](https://mit-license.org/). The released models can be used for commercial purposes free of charge.