---
license: apache-2.0
tags:
- sentence-transformers
- sentence-similarity
- mteb
- retriever
- text-embeddings-inference
---
# QZhou-Embedding
## Introduction
We present QZhou-Embedding ("Qingzhou Embedding"), a general-purpose contextual text embedding model with exceptional text representation capabilities. Built on the Qwen2.5-7B-Instruct foundation model, we designed a unified multi-task framework and a data synthesis pipeline that leverages LLM APIs, effectively improving the diversity and quality of the training data and further enhancing the model's generalization and text representation capabilities. We also employ a two-stage training strategy, consisting of initial retrieval-focused training followed by full-task fine-tuning, which lets the embedding model extend its capabilities on top of robust retrieval performance. Our model achieves state-of-the-art results on the MTEB and CMTEB benchmarks, ranking first on both leaderboards (August 27, 2025).
**Latest Updates:**
**1. Our technical report has now been released. We welcome your feedback!** Link: [QZhou-Embedding](https://arxiv.org/abs/2508.21632)
**2. We have added support for vLLM.**
## Basic Features
- Powerful text embedding capabilities
- Long context: supports up to 8k context length
- 7B parameters
## Model Refactoring
For the Qwen base model, we implemented the following modifications:
1. Replaced causal attention with bidirectional attention and constructed a new QZhouModel module based on Qwen2Model;
2. Modified the tokenizer's `padding_side` to "left" (see the sketch below).
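For reference, the sketch below shows how these two changes surface when loading the checkpoint. The repo id is a placeholder, and `trust_remote_code=True` is an assumption (only needed if the custom QZhouModel class ships with the checkpoint rather than being merged into `transformers`).
```python
from transformers import AutoModel, AutoTokenizer

model_id = "QZhou-Embedding"  # placeholder: substitute the actual repo id or local path

# Modification 2: left padding, so token positions are aligned at the end of a batch
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = "left"

# Modification 1 (bidirectional attention) lives inside the custom QZhouModel class
# built on Qwen2Model; trust_remote_code=True is assumed to pick it up from the repo.
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,
)
```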
## MTEB/CMTEB Results
## Usage
### Completely replicate the benchmark results
We provide detailed parameters and environment configurations, including environment dependencies and model loading arguments, so that you can reproduce results on your own machine that are fully consistent with the MTEB leaderboard.
#### Requirements
- Python: 3.10.12
- Sentence Transformers: 3.4.1
- Transformers: 4.51.1
- PyTorch: 2.7.1
- Accelerate: 1.3.0
- Datasets: 3.2.0
- Tokenizers: 0.21.2
- mteb: 1.38.30
- vllm: 0.10.1.1
#### Transformers model load arguments
- `torch_dtype=torch.bfloat16`
- `attn_implementation='sdpa'`

**NOTE:** The leaderboard evaluation results were obtained with the "sdpa" attention implementation. Other implementations ('eager', 'flash_attention_2') may produce slightly different numbers, but overall performance remains consistent.
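Concretely, a minimal loading sketch with exactly these arguments (the repo id below is a placeholder for the actual checkpoint path):
```python
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "QZhou-Embedding"  # placeholder: actual repo id or local checkpoint path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # dtype used for the leaderboard runs
    attn_implementation="sdpa",   # attention mode used for the leaderboard runs
    # add trust_remote_code=True if the custom QZhouModel class ships with the checkpoint
)
```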
#### Instruction Adding Rules
Details can be found on our GitHub.
#### Evaluation code usage
Our benchmark evaluation code is available on GitHub. The MTEB benchmark script is **run_mteb_all_v2.py**, and the CMTEB benchmark script is **run_cmteb_all.py**. Run the following command:
```bash
POOLING_MODE=mean
normalize=true
use_instruction=true
export TOKENIZERS_PARALLELISM=true
model_name_or_path=   # path to the downloaded QZhou-Embedding checkpoint

python3 ./run_cmteb_all.py \
    --model_name_or_path ${model_name_or_path} \
    --pooling_mode ${POOLING_MODE} \
    --normalize ${normalize} \
    --use_instruction ${use_instruction} \
    --output_dir   # directory where the evaluation results are written
```
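As a point of reference, the sketch below shows how the same settings (mean pooling, normalized embeddings) look as a plain `sentence-transformers` call outside the evaluation harness. It assumes the checkpoint ships a sentence-transformers configuration with mean pooling; the repo id and example sentences are placeholders, and query instructions should be added according to the rules referenced above.
```python
import torch
from sentence_transformers import SentenceTransformer

# Placeholder repo id; assumes the checkpoint includes a sentence-transformers
# config with mean pooling (matching POOLING_MODE=mean above).
model = SentenceTransformer(
    "QZhou-Embedding",
    model_kwargs={"torch_dtype": torch.bfloat16, "attn_implementation": "sdpa"},
)

# For retrieval, prepend the task instruction to queries per the rules on GitHub.
queries = ["what is a text embedding model?"]
documents = [
    "Text embedding models map sentences to dense vectors for retrieval and similarity.",
    "The weather in Beijing is sunny today.",
]

# normalize_embeddings=True mirrors normalize=true in the evaluation script.
query_emb = model.encode(queries, normalize_embeddings=True)
doc_emb = model.encode(documents, normalize_embeddings=True)

# Cosine similarity = dot product of normalized vectors.
print(query_emb @ doc_emb.T)
```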