---
language:
- en
- zh
- multilingual
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- embedding
- text-embedding
- retrieval
- quantization
- int8
pipeline_tag: sentence-similarity
base_model: Qwen/Qwen3-Embedding-8B
---
# Octen-Embedding-8B-INT8
Octen-Embedding-8B-INT8 is a text embedding model designed for semantic search and retrieval tasks. It is fine-tuned from [Qwen/Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B) and supports multiple languages, including English and Chinese.
**Quantization**: This is an INT8 quantized version produced with bitsandbytes. INT8 quantization roughly halves the memory footprint (~8GB vs ~16GB for the BF16 version), which makes the model suitable for deployment in resource-constrained environments. Lower memory usage does not imply faster inference: on some hardware this version may be slightly slower than the BF16 version.
## Model Details
- **Base Model**: [Qwen/Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B)
- **Model Size**: 8B parameters (INT8 quantized)
- **Max Sequence Length**: 40,960 tokens
- **Embedding Dimension**: 4096
- **Languages**: English, Chinese, and multilingual support
- **Training Method**: LoRA fine-tuning
- **Quantization**: INT8 (bitsandbytes)
- **Memory Footprint**: ~8GB (vs ~16GB for BF16 version)
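The memory figures above follow from the number of bytes stored per weight. The sketch below is a back-of-the-envelope estimate, not a measurement: it assumes a nominal 8 billion parameters, counts weights only, and ignores activations, quantization metadata, and framework overhead. The helper name `estimate_weight_memory_gb` is illustrative.

```python
def estimate_weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * bytes_per_param / 1e9


# Assumption: a nominal 8e9 parameters; the exact count may differ.
NUM_PARAMS = 8e9

bf16_gb = estimate_weight_memory_gb(NUM_PARAMS, 2)  # BF16 stores 2 bytes per weight
int8_gb = estimate_weight_memory_gb(NUM_PARAMS, 1)  # INT8 stores 1 byte per weight

print(f"BF16: ~{bf16_gb:.0f} GB, INT8: ~{int8_gb:.0f} GB")
```

Actual usage at inference time will be higher than these numbers, since it also depends on batch size and sequence length.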
## Usage
### Using Sentence Transformers
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("Octen/Octen-Embedding-8B-INT8")
# Encode sentences
sentences = [
    "This is an example sentence",
    "Each sentence is converted to a vector"
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# Output: (2, 4096)
# Compute similarity
from sentence_transformers.util import cos_sim
similarity = cos_sim(embeddings[0], embeddings[1])
print(f"Similarity: {similarity.item():.4f}")
```
### Using Transformers
```python
from transformers import AutoModel, AutoTokenizer
import torch
import torch.nn.functional as F
tokenizer = AutoTokenizer.from_pretrained("Octen/Octen-Embedding-8B-INT8", padding_side="left")
model = AutoModel.from_pretrained("Octen/Octen-Embedding-8B-INT8")
model.eval()
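def encode(texts):
    # Left padding (set on the tokenizer) keeps a real token in the last position
    inputs = tokenizer(texts, padding=True, truncation=True,
                       max_length=8192, return_tensors="pt")
    # Move inputs to the model's device to avoid a CPU/GPU mismatch
    inputs = inputs.to(model.device)
    with torch.no_grad():
        outputs = model(**inputs)
    # Use last token embedding
    embeddings = outputs.last_hidden_state[:, -1, :]
    # Normalize embeddings
    embeddings = F.normalize(embeddings, p=2, dim=1)
    return embeddings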
# Example usage
texts = ["Hello world", "你好世界"]
embeddings = encode(texts)
similarity = torch.matmul(embeddings[0], embeddings[1])
print(f"Similarity: {similarity.item():.4f}")
```
## Recommended Use Cases
- Semantic search and information retrieval
- Document similarity and clustering
- Question answering
- Cross-lingual retrieval
- Text classification with embeddings
- Deployment on GPU-constrained environments
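For the semantic search use case, retrieval typically reduces to ranking document embeddings by cosine similarity to a query embedding. The following is a minimal sketch in pure Python, assuming the embeddings have already been computed; the toy 3-dimensional vectors stand in for the model's 4096-dimensional outputs, and the helper names `cosine_similarity` and `top_k` are illustrative, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return (doc_index, score) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings (assumption for illustration)
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query_vec = [1.0, 0.1, 0.0]

results = top_k(query_vec, doc_vecs, k=2)
for index, score in results:
    print(f"doc {index}: {score:.4f}")
```

For large collections, a vectorized or approximate nearest-neighbor index would usually replace this linear scan.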
## Limitations
- Performance may vary across different domains and languages
- Documents longer than the maximum sequence length (40,960 tokens) must be truncated
- Optimized for retrieval tasks, not for text generation
- INT8 quantization may introduce minor accuracy degradation compared to BF16 version
- Inference speed may not improve despite reduced memory usage
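One common alternative to truncating long documents is to split them into overlapping chunks and embed each chunk separately. This is a general technique, not something this model card prescribes. The sketch below works on a list of token IDs and assumes tokenization has happened elsewhere; the function name `chunk_token_ids` and the `overlap` parameter are hypothetical.

```python
def chunk_token_ids(token_ids, max_len=40960, overlap=0):
    """Split token IDs into windows of at most max_len, sharing `overlap` tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")

    step = max_len - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        # Stop once a window reaches the end of the document
        if start + max_len >= len(token_ids):
            break
    return chunks


# Toy example: 10 token IDs, windows of 4 with 1 token of overlap
chunks = chunk_token_ids(list(range(10)), max_len=4, overlap=1)
print(chunks)
```

How the per-chunk embeddings are combined or indexed afterwards is an application-level choice.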
## License
This model is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).
This model is derived from [Qwen/Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B), which is also licensed under Apache License 2.0.
## Paper
For more details, please refer to our blog post: [Octen Series: Optimizing Embedding Models to #1 on RTEB Leaderboard](https://octen-team.github.io/octen_blog/posts/octen-rteb-first-place/)
## Citation
If you find our work helpful, please consider citing:
```bibtex
@misc{octen2025rteb,
title={Octen Series: Optimizing Embedding Models to #1 on RTEB Leaderboard},
author={Octen Team},
year={2025},
url={https://octen-team.github.io/octen_blog/posts/octen-rteb-first-place/}
}
```