Update README: migrate from bflhc to Octen organization and update citation
--- a/README.md
+++ b/README.md
@@ -36,7 +36,7 @@ Octen-Embedding-4B is a text embedding model designed for semantic search and re
 ```python
 from sentence_transformers import SentenceTransformer
 
-model = SentenceTransformer("
+model = SentenceTransformer("Octen/Octen-Embedding-4B")
 
 # Encode sentences
 sentences = [
@@ -61,8 +61,8 @@ from transformers import AutoModel, AutoTokenizer
 import torch
 import torch.nn.functional as F
 
-tokenizer = AutoTokenizer.from_pretrained("
-model = AutoModel.from_pretrained("
+tokenizer = AutoTokenizer.from_pretrained("Octen/Octen-Embedding-4B", padding_side="left")
+model = AutoModel.from_pretrained("Octen/Octen-Embedding-4B")
 model.eval()
 
 def encode(texts):
@@ -105,26 +105,19 @@ This model is licensed under the [Apache License 2.0](https://www.apache.org/lic
 
 This model is derived from [Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B), which is also licensed under Apache License 2.0.
 
+## Paper
+
+For more details, please refer to our blog post: [Octen-Embedding: Reproducible 1st Place on RTEB](https://octen-team.github.io/octen_blog/posts/octen-rteb-first-place/)
+
 ## Citation
 
-If you
+If you find our work helpful, please consider citing:
 
 ```bibtex
-@misc{
-title={Octen-Embedding
+@misc{octen2025rteb,
+title={Octen-Embedding: Reproducible 1st Place on RTEB},
 author={Octen Team},
 year={2025},
-url={https://
-}
-```
-
-Please also cite the base model:
-
-```bibtex
-@article{qwen3embedding,
-title={Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models},
-author={Zhang, Yanzhao and Li, Mingxin and Long, Dingkun and Zhang, Xin and Lin, Huan and Yang, Baosong and Xie, Pengjun and Yang, An and Liu, Dayiheng and Lin, Junyang and Huang, Fei and Zhou, Jingren},
-journal={arXiv preprint arXiv:2506.05176},
-year={2025}
+url={https://octen-team.github.io/octen_blog/posts/octen-rteb-first-place/}
 }
 ```
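A note on `padding_side="left"` in the second hunk: the diff does not show the body of `encode(texts)`, so how the README pools hidden states is not visible here. A common reason to pad on the left with decoder-style embedding models is last-token pooling, and the sketch below assumes that is the intent. It uses plain Python lists instead of tensors, and `PAD`, `pad_batch`, and `last_token_index` are hypothetical names introduced only for illustration.

```python
from typing import List, Tuple

PAD = 0  # hypothetical pad token id, chosen only for this illustration


def pad_batch(seqs: List[List[int]], side: str) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad token-id sequences to equal length; return (input_ids, attention_mask)."""
    width = max(len(s) for s in seqs)
    ids, masks = [], []
    for s in seqs:
        n_pad = width - len(s)
        if side == "left":
            ids.append([PAD] * n_pad + s)
            masks.append([0] * n_pad + [1] * len(s))
        else:
            ids.append(s + [PAD] * n_pad)
            masks.append([1] * len(s) + [0] * n_pad)
    return ids, masks


def last_token_index(mask: List[int]) -> int:
    """Position of the last real (non-pad) token in one row; assumes a non-empty sequence."""
    return max(i for i, m in enumerate(mask) if m == 1)


seqs = [[5, 6, 7], [8, 9]]

# Left padding: the last real token of every row sits in the final column,
# so a single slice such as hidden[:, -1] would select it for the whole batch.
_, left_masks = pad_batch(seqs, "left")
left_last = [last_token_index(m) for m in left_masks]

# Right padding: the last real token sits at a different column per row,
# so pooling would need a per-row index computed from the attention mask.
_, right_masks = pad_batch(seqs, "right")
right_last = [last_token_index(m) for m in right_masks]

print(left_last, right_last)
```

If `encode` pools differently (for example, mean pooling over the attention mask), the padding side matters less and this sketch does not apply.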