YuPeng0214 committed on
Commit bc5c8a0 · verified · 1 Parent(s): 882df3c

Update README.md

Files changed (1): README.md +13 -3
README.md CHANGED
@@ -14,9 +14,10 @@ tags:
 
 
 ## Introduction
-We present <a href="https://huggingface.co/Kingsoft-LLM/QZhou-Embedding">QZhou-Embedding</a> (called "Qingzhou Embedding"), a general-purpose contextual text embedding model with exceptional text representation capabilities. Built upon the <a href="https://huggingface.co/Qwen/Qwen2.5-7B-Instruct">Qwen2.5-7B-Instruct</a> foundation model, we designed a unified multi-task framework and developed a data synthesis pipeline leveraging LLM APIs, effectively improving the diversity and quality of training data, further enhancing the model's generalization and text representation capabilities. Additionally, we employ a two-stage training strategy, comprising initial retrieval-focused training followed by full-task fine-tuning, enabling the embedding model to extend its capabilities based on robust retrieval performance. Our model achieves state-of-the-art results on the MTEB and CMTEB benchmarks, ranking first on both leaderboards(August 27, 2025).
+We present <a href="https://huggingface.co/Kingsoft-LLM/QZhou-Embedding">QZhou-Embedding</a> (called "Qingzhou Embedding"), a general-purpose contextual text embedding model with exceptional text representation capabilities. Built upon the <a href="https://huggingface.co/Qwen/Qwen2.5-7B-Instruct">Qwen2.5-7B-Instruct</a> foundation model, we designed a unified multi-task framework and developed a data synthesis pipeline leveraging LLM APIs, effectively improving the diversity and quality of training data and further enhancing the model's generalization and text representation capabilities. Additionally, we employ a two-stage training strategy, comprising initial retrieval-focused training followed by full-task fine-tuning, enabling the embedding model to extend its capabilities on top of robust retrieval performance. Our model achieves state-of-the-art results on the MTEB and CMTEB benchmarks, ranking first on both leaderboards (August 27, 2025).
 
-**<span style="color:red">We will promptly release our technical report—stay tuned!</span>**
+**<span style="color:green">Our technical report has now been released. We welcome your feedback!</span>**
+Link: <a href="https://arxiv.org/abs/2508.21632">QZhou-Embedding Technical Report</a>
 
 ## Basic Features
 
@@ -171,7 +172,16 @@ Our initial research experiments commenced prior to the release of Qwen3. To mai
 ### Citation
 If you find our work worth citing, please use the following citation:<br>
 **Technical Report:**<br>
-Coming soon...<br>
+@misc{yu2025qzhouembeddingtechnicalreport,
+      title={QZhou-Embedding Technical Report},
+      author={Peng Yu and En Xu and Bin Chen and Haibiao Chen and Yinfei Xu},
+      year={2025},
+      eprint={2508.21632},
+      archivePrefix={arXiv},
+      primaryClass={cs.CL},
+      url={https://arxiv.org/abs/2508.21632},
+}
+
 **Qwen2.5-7B-Instruct:**
 ```
 @misc{qwen2.5,