---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-Embedding-4B
library_name: sentence-transformers
---
## Description
This is a [CSRv2](https://arxiv.org/abs/2602.05735) model fine-tuned on [MTEB](https://huggingface.co/mteb)
STS datasets with [Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B) as the backbone.
For more details, including benchmark evaluation, hardware requirements, and inference performance, please
refer to our [GitHub repository](https://github.com/Y-Research-SBU/CSRv2).
## Sentence Transformer Usage
You can evaluate this model with Sentence Transformers using the following code snippet (taking STS12 as an example):
```python
import mteb
from sentence_transformers import SparseEncoder

model = SparseEncoder(
    "Y-Research-Group/CSRv2-sts",
    trust_remote_code=True,
)
model.prompts = {
    "STS12": "Instruct: Retrieve semantically similar text\n Query:"
}

task = mteb.get_tasks(tasks=["STS12"])
evaluation = mteb.MTEB(tasks=task)
evaluation.run(
    model,
    eval_splits=["test"],
    output_folder="./results/STS12",
    show_progress_bar=True,
    # MTEB does not support sparse tensors yet, so convert to dense tensors
    encode_kwargs={"convert_to_sparse_tensor": False, "batch_size": 8},
)
```
It is suggested that you use our [default prompts](https://github.com/Y-Research-SBU/CSRv2/blob/main/text/dataset_to_prompt.json)
in evaluation.
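For intuition on why sparse embeddings are cheap to compare: a similarity score only needs the few dimensions that are active in both vectors. The sketch below is purely illustrative (the model itself returns tensors via `SparseEncoder.encode`; the dict representation here is an assumption for clarity):

```python
# Minimal sketch: cosine similarity between two sparse embeddings,
# each represented as {dimension_index: activation} with only k active entries.
import math

def sparse_cosine(a: dict, b: dict) -> float:
    # The dot product only touches dimensions active in both vectors.
    dot = sum(v * b[i] for i, v in a.items() if i in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Two toy embeddings with k=2 active neurons out of a 10240-dim space.
emb1 = {17: 0.8, 4203: 0.6}
emb2 = {17: 0.5, 9999: 0.7}
print(round(sparse_cosine(emb1, emb2), 4))  # → 0.465
```

With only `k` non-zeros per vector, each comparison costs O(k) instead of O(d) for the full embedding dimension.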
## Multi-TopK Support
Our model supports different sparsity levels thanks to the **Multi-TopK** loss used in training.
You can change the sparsity level by adjusting the `k` parameter in `3_SparseAutoEncoder/config.json`.
The sparsity level is set to 2 by default.
For instance, if you want to evaluate with sparsity level $K=8$ (which means there are 8 activated neurons in
each embedding vector), the `3_SparseAutoEncoder/config.json` should look like this:
```json
{
  "input_dim": 2560,
  "hidden_dim": 10240,
  "k": 8,
  "k_aux": 1024,
  "normalize": false,
  "dead_threshold": 30
}
```
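For intuition, a TopK constraint keeps only the `k` largest activations of the autoencoder's hidden layer and zeroes the rest. Below is a minimal NumPy sketch of that mechanism (an illustration of the general technique, not the repository's actual implementation):

```python
import numpy as np

def topk_activation(h: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest activations, zero everything else."""
    out = np.zeros_like(h)
    idx = np.argpartition(h, -k)[-k:]  # indices of the k largest entries
    out[idx] = h[idx]
    return out

hidden = np.array([0.1, 0.9, -0.3, 0.7, 0.2, 0.05])
sparse = topk_activation(hidden, k=2)
print(sparse)                     # only the two largest activations survive
print(np.count_nonzero(sparse))   # → 2
```

Raising `k` in the config trades sparsity for representational capacity: more active neurons per embedding, at higher storage and comparison cost.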
## CSRv2 Qwen Series
We will release a series of [CSRv2](https://arxiv.org/abs/2602.05735) models fine-tuned on common tasks in
[MTEB](https://huggingface.co/mteb) with [Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B)
as the backbone. These tasks are:
- **[Classification](https://huggingface.co/Y-Research-Group/CSRv2-classification)**
- **[Clustering](https://huggingface.co/Y-Research-Group/CSRv2-clustering)**
- **[Retrieval](https://huggingface.co/Y-Research-Group/CSRv2-retrieval)**
- **[STS](https://huggingface.co/Y-Research-Group/CSRv2-sts)**
- **[Pair Classification](https://huggingface.co/Y-Research-Group/CSRv2-pair_classification)**
- **[Reranking](https://huggingface.co/Y-Research-Group/CSRv2-reranking)**
## Citation
```bibtex
@inproceedings{guo2026csrv2,
  title={{CSR}v2: Unlocking Ultra-sparse Embeddings},
  author={Guo, Lixuan and Wang, Yifei and Wen, Tiansheng and Wang, Yifan and Feng, Aosong and Chen, Bo and Jegelka, Stefanie and You, Chenyu},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
```