	---
	language:
	- en
	- zh
	- multilingual
	license: apache-2.0
	library_name: sentence-transformers
	tags:
	- sentence-transformers
	- sentence-similarity
	- feature-extraction
	- embedding
	- text-embedding
	- retrieval
	- quantization
	- int8
	pipeline_tag: sentence-similarity
	base_model: Qwen/Qwen3-Embedding-8B
	---

	# Octen-Embedding-8B-INT8

	Octen-Embedding-8B-INT8 is a text embedding model designed for semantic search and retrieval tasks. It is fine-tuned from [Qwen/Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B) and supports English, Chinese, and other languages.

	Quantization: This is an INT8 version of the model, quantized with bitsandbytes. INT8 quantization roughly halves the memory footprint relative to BF16 (~8GB vs ~16GB), which makes the model suitable for deployment in resource-constrained environments. Lower memory usage does not imply faster inference: on some hardware, this version may run slightly slower than the BF16 version.

	## Model Details

	- Base Model: [Qwen/Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B)
	- Model Size: 8B parameters (INT8 quantized)
	- Max Sequence Length: 40,960 tokens
	- Embedding Dimension: 4096
	- Languages: English, Chinese, and multilingual support
	- Training Method: LoRA fine-tuning
	- Quantization: INT8 (bitsandbytes)
	- Memory Footprint: ~8GB (vs ~16GB for BF16 version)
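
The footprint figures above follow from simple arithmetic on weight storage. The sketch below is a rough estimate only: it assumes a round 8 billion parameters and counts weights alone, ignoring activations, quantization scales, and framework overhead, so real usage will differ somewhat.

```python
# Back-of-the-envelope weight memory.
# PARAMS is an assumed round number, not the exact parameter count.
PARAMS = 8_000_000_000
BYTES_PER_PARAM = {"bf16": 2, "int8": 1}


def weight_memory_gb(params, dtype):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


bf16_gb = weight_memory_gb(PARAMS, "bf16")
int8_gb = weight_memory_gb(PARAMS, "int8")

print(f"BF16: ~{bf16_gb:.0f} GB, INT8: ~{int8_gb:.0f} GB")
print(f"Reduction: {1 - int8_gb / bf16_gb:.0%}")
```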

	## Usage

	### Using Sentence Transformers

	```python
	from sentence_transformers import SentenceTransformer

	model = SentenceTransformer("Octen/Octen-Embedding-8B-INT8")

	# Encode sentences
	sentences = [
	    "This is an example sentence",
	    "Each sentence is converted to a vector",
	]

	embeddings = model.encode(sentences)
	print(embeddings.shape)
	# Output: (2, 4096)

	# Compute similarity
	from sentence_transformers.util import cos_sim
	similarity = cos_sim(embeddings[0], embeddings[1])
	print(f"Similarity: {similarity.item():.4f}")
	```

	### Using Transformers

	```python
	from transformers import AutoModel, AutoTokenizer
	import torch
	import torch.nn.functional as F

	tokenizer = AutoTokenizer.from_pretrained("Octen/Octen-Embedding-8B-INT8", padding_side="left")
	model = AutoModel.from_pretrained("Octen/Octen-Embedding-8B-INT8")
	model.eval()

	def encode(texts):
	    # Left padding (set on the tokenizer) keeps the last position a real token
	    inputs = tokenizer(texts, padding=True, truncation=True,
	                       max_length=8192, return_tensors="pt")
	    # Move inputs to the same device as the model weights
	    inputs = inputs.to(model.device)

	    with torch.no_grad():
	        outputs = model(**inputs)
	        # Use last token embedding
	        embeddings = outputs.last_hidden_state[:, -1, :]
	        # Normalize embeddings
	        embeddings = F.normalize(embeddings, p=2, dim=1)

	    return embeddings

	# Example usage
	texts = ["Hello world", "你好世界"]
	embeddings = encode(texts)
	similarity = torch.matmul(embeddings[0], embeddings[1])
	print(f"Similarity: {similarity.item():.4f}")
	```
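
The Transformers example takes the hidden state at position `-1`, which is only the last real token because the tokenizer is created with `padding_side="left"`. The plain-Python sketch below illustrates the underlying idea on toy data: locate the last non-padding position from the attention mask, which works for either padding side. The helper names `last_token_index` and `last_token_pool` are hypothetical and not part of any library.

```python
def last_token_index(attention_mask):
    """Return the position of the last real (mask == 1) token."""
    for pos in range(len(attention_mask) - 1, -1, -1):
        if attention_mask[pos] == 1:
            return pos
    raise ValueError("sequence contains no real tokens")


def last_token_pool(hidden_states, attention_masks):
    """Select, for each sequence, the hidden vector at its last real token."""
    return [
        seq[last_token_index(mask)]
        for seq, mask in zip(hidden_states, attention_masks)
    ]


# Toy batch: 2 sequences, 4 positions each, hidden size 2.
hidden = [
    [[0.0, 0.0], [0.0, 0.0], [1.0, 2.0], [3.0, 4.0]],  # left-padded
    [[5.0, 6.0], [7.0, 8.0], [0.0, 0.0], [0.0, 0.0]],  # right-padded
]
masks = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]

pooled = last_token_pool(hidden, masks)
print(pooled)
```

With right padding, indexing position `-1` directly would return the padding vector for the second sequence, which is why the padding side matters when using the simpler slice.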

	## Recommended Use Cases

	- Semantic search and information retrieval
	- Document similarity and clustering
	- Question answering
	- Cross-lingual retrieval
	- Text classification with embeddings
	- Deployment in environments with limited GPU memory
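
As an illustration of the semantic search use case, the sketch below ranks documents by cosine similarity to a query. The vectors are tiny hand-written stand-ins for real embeddings, and `cosine` and `top_k` are hypothetical helpers written for this example; in practice the vectors would come from `model.encode(...)`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Return the k best (doc_index, score) pairs, highest score first."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


# Toy 3-dimensional "embeddings" (assumed values, for illustration only).
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.7, 0.7, 0.0],
    [0.0, 0.0, 1.0],
]
query_vec = [0.9, 0.1, 0.0]

results = top_k(query_vec, doc_vecs, k=2)
for idx, score in results:
    print(f"doc {idx}: {score:.4f}")
```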

	## Limitations

	- Performance may vary across different domains and languages
	- Documents longer than the 40,960-token maximum sequence length must be truncated
	- Optimized for retrieval tasks, not for text generation
	- INT8 quantization may introduce minor accuracy degradation compared to the BF16 version
	- Inference speed may not improve despite reduced memory usage
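
For inputs that exceed the sequence limit, one common workaround (an assumption here, not something this model card prescribes) is to split the token sequence into overlapping windows and embed each window separately. A minimal sketch with a hypothetical `chunk_token_ids` helper, operating on a plain list of token IDs:

```python
def chunk_token_ids(token_ids, max_length, overlap=0):
    """Split token_ids into windows of at most max_length tokens.

    Consecutive windows share `overlap` tokens.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not 0 <= overlap < max_length:
        raise ValueError("overlap must satisfy 0 <= overlap < max_length")

    step = max_length - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_length])
        # Stop once a window has reached the end of the sequence.
        if start + max_length >= len(token_ids):
            break
    return chunks


# Toy example: 10 "tokens", windows of 4 with 1 token of overlap.
token_ids = list(range(10))
chunks = chunk_token_ids(token_ids, max_length=4, overlap=1)
print(chunks)
```

How the per-window embeddings are combined afterwards (for example, averaging them or indexing each window separately) is a design choice that depends on the application.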

	## License

	This model is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

	This model is derived from [Qwen/Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B), which is also licensed under Apache License 2.0.

	## Paper

	For more details, please refer to our blog post: [Octen Series: Optimizing Embedding Models to #1 on RTEB Leaderboard](https://octen-team.github.io/octen_blog/posts/octen-rteb-first-place/)

	## Citation

	If you find our work helpful, please consider citing:

	```bibtex
	@misc{octen2025rteb,
	title={Octen Series: Optimizing Embedding Models to #1 on RTEB Leaderboard},
	author={Octen Team},
	year={2025},
	url={https://octen-team.github.io/octen_blog/posts/octen-rteb-first-place/}
	}
	```