Update README.md

README.md CHANGED
@@ -2604,34 +2604,15 @@ model-index:
       value: 78.25741142443962
 ---
 
-
-
-<p align="center">
-  <img src="https://console.llmrails.com/assets/img/logo-black.svg" width="150px">
-</p>
 
 This model has been trained on an extensive corpus of text pairs that encompass a broad spectrum of domains, including finance, science, medicine, law, and various others. During the training process, we incorporated techniques derived from the [RetroMAE](https://arxiv.org/abs/2205.12035) and [SetFit](https://arxiv.org/abs/2209.11055) research papers.
 
-We are pleased to offer this model as an API service through our platform, [LLMRails](https://llmrails.com/?ref=ember-v1). If you are interested, please don't hesitate to sign up.
-
 ### Plans
 - The research paper will be published soon.
 - The v2 of the model is currently in development and will feature an extended maximum sequence length of 4,000 tokens.
 
 ## Usage
-Use with API request:
-```bash
-curl --location 'https://api.llmrails.com/v1/embeddings' \
---header 'X-API-KEY: {token}' \
---header 'Content-Type: application/json' \
---data '{
-  "input": ["This is an example sentence"],
-  "model":"embedding-english-v1" # equals to ember-v1
-}'
-```
-API docs: https://docs.llmrails.com/embedding/embed-text<br>
-Langchain plugin: https://python.langchain.com/docs/integrations/text_embedding/llm_rails
-
 Use with transformers:
 ```python
 import torch.nn.functional as F
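The curl call removed in the hunk above maps one-to-one onto a plain Python request. As a minimal sketch using only the standard library (endpoint, headers, and payload are taken from the curl command; `{token}` stays a placeholder, and the request is built but deliberately not sent):

```python
import json
import urllib.request

# Same endpoint, headers, and body as the removed curl example; '{token}' is a placeholder.
payload = {"input": ["This is an example sentence"], "model": "embedding-english-v1"}
req = urllib.request.Request(
    "https://api.llmrails.com/v1/embeddings",
    data=json.dumps(payload).encode("utf-8"),
    headers={"X-API-KEY": "{token}", "Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would perform the call; it is not executed here.
```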
@@ -2692,4 +2673,15 @@ Our model achieves state-of-the-art performance on [MTEB leaderboard](https://hug
 
 This model exclusively caters to English texts, and any lengthy texts will be truncated to a maximum of 512 tokens.
 
-
       value: 78.25741142443962
 ---
 
+<h1 align="center">ember-v1</h1>
 
 This model has been trained on an extensive corpus of text pairs that encompass a broad spectrum of domains, including finance, science, medicine, law, and various others. During the training process, we incorporated techniques derived from the [RetroMAE](https://arxiv.org/abs/2205.12035) and [SetFit](https://arxiv.org/abs/2209.11055) research papers.
 
 ### Plans
 - The research paper will be published soon.
 - The v2 of the model is currently in development and will feature an extended maximum sequence length of 4,000 tokens.
 
 ## Usage
 Use with transformers:
 ```python
 import torch.nn.functional as F
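The transformers snippet is cut off by the diff context before the pooling and normalization steps. As a sketch of what typically follows for embedding models of this kind (the mean-pooling choice here is an assumption, not taken from the elided code), shown on dummy tensors so it runs without downloading the model:

```python
import torch
import torch.nn.functional as F

def average_pool(last_hidden_states: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # Zero out padding positions, then average the remaining token vectors.
    masked = last_hidden_states.masked_fill(~attention_mask.bool().unsqueeze(-1), 0.0)
    return masked.sum(dim=1) / attention_mask.sum(dim=1, keepdim=True)

# Stand-in for model(**batch).last_hidden_state: 2 sequences, 4 tokens, hidden size 8.
hidden = torch.randn(2, 4, 8)
mask = torch.tensor([[1, 1, 1, 1], [1, 1, 1, 0]])  # second sequence has one padding token

embeddings = F.normalize(average_pool(hidden, mask), p=2, dim=1)
scores = embeddings @ embeddings.T  # cosine similarities
```

With the real model, `hidden` and `mask` would come from the tokenizer and model outputs instead of random tensors.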
 
 This model exclusively caters to English texts, and any lengthy texts will be truncated to a maximum of 512 tokens.
 
+## License
+MIT
+
+## Citation
+
+```bibtex
+@misc{nur2024emberv1,
+  title={ember-v1: SOTA embedding model},
+  author={Enrike Nur and Anar Aliyev},
+  year={2023},
+}
+```