YuPeng0214 committed · verified
Commit c2be2ae · Parent(s): cd313ac

Update README.md

Files changed (1): README.md +3 -0
README.md CHANGED
@@ -12,6 +12,9 @@ tags:
  <img src="assets/image-1.png" width="800" height="300"></img>
  </div>
 
+ **<span style="color:red">Important Notice:</span>**
+ Our model parameters were **<span style="color:red">updated on August 24</span>**. If you downloaded the files before this date, please update to the latest version as soon as possible.
+
  ## Introduction
  We have released <a href="https://huggingface.co/Kingsoft-LLM/QZhou-Embedding">QZhou-Embedding</a> (called "Qingzhou Embedding"), a large-scale, general-purpose text embedding model that excels at a variety of text embedding tasks (retrieval, re-ranking, sentence similarity, and classification). We modified the model structure of Qwen2Model by replacing the causal attention mechanism with bidirectional attention, so that each token can capture global context semantics; the new module is named QZhouModel. Leveraging the general language capabilities of its underlying model, which was pre-trained on massive amounts of text, QZhou-Embedding achieves even more powerful text embedding representations. The model is trained on millions of examples from high-quality open-source embedding datasets and over 5 million high-quality synthetic examples (produced with two synthesis techniques: rewriting and expansion). Initial retrieval training gives the model a foundation in query-document semantic matching; later multi-dimensional training, on tasks such as STS and clustering, drives continued gains across tasks. QZhou-Embedding is a 7B-parameter model that supports input texts of up to 8k tokens. It achieved the highest average score on the MTEB/CMTEB evaluation benchmarks, including the highest average scores on the clustering, pair classification, re-ranking, and STS tasks.
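The causal-to-bidirectional change described above can be pictured as a change of attention mask. The sketch below is illustrative only (it is not the QZhouModel implementation, which is not shown in this README): under a causal mask, token *i* attends only to tokens 0..*i*, while the bidirectional mask lets every token attend to the full sequence.

```python
# Minimal sketch (hypothetical, not the actual QZhouModel code) contrasting
# a causal attention mask with the bidirectional mask described above.

def causal_mask(n: int) -> list[list[bool]]:
    # Lower-triangular mask: token i may attend only to tokens 0..i.
    return [[j <= i for j in range(n)] for i in range(n)]

def bidirectional_mask(n: int) -> list[list[bool]]:
    # Every token attends to the whole sequence, so each position
    # can capture global context semantics.
    return [[True] * n for _ in range(n)]

c, b = causal_mask(4), bidirectional_mask(4)
print(c[0][3], b[0][3])  # → False True (token 0 sees token 3 only bidirectionally)
```

In practice this amounts to replacing the lower-triangular mask used for causal language modeling with an all-ones mask before computing attention scores.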
  ## Basic Features