---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-Embedding-4B
library_name: sentence-transformers
---
## Description
This is a [CSRv2](https://arxiv.org/abs/2602.05735) model fine-tuned on [MTEB](https://huggingface.co/mteb) 
classification datasets, with [Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B) as the backbone.

For more details, including benchmark evaluation, hardware requirements, and inference performance, please 
refer to our [GitHub repository](https://github.com/Y-Research-SBU/CSRv2).

## Sentence Transformer Usage
You can evaluate this model with Sentence Transformers using the following code snippet (taking Banking77 as an example):
```python
import mteb
from sentence_transformers import SparseEncoder
model = SparseEncoder(
    "Y-Research-Group/CSRv2-classification",  
    trust_remote_code=True
)
model.prompts = {
    "Banking77Classification": "Instruct: Given a online banking query, find the corresponding intents\nQuery:"
}
task = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = mteb.MTEB(tasks=task)
evaluation.run(
    model,
    eval_splits=["test"],
    output_folder="./results/Banking77Classification",
    show_progress_bar=True,
    encode_kwargs={"convert_to_sparse_tensor": False, "batch_size": 8},
)  # MTEB doesn't support sparse tensors yet, so we convert to dense tensors
```

We suggest using our [default prompts](https://github.com/Y-Research-SBU/CSRv2/blob/main/text/dataset_to_prompt.json)
for evaluation.
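
Assuming the `dataset_to_prompt.json` file above has been downloaded locally, the prompt mapping can be loaded and assigned in one step; `load_prompts` is a hypothetical helper, not part of the repository:

```python
import json

def load_prompts(path):
    """Load a task-name -> prompt mapping from a local dataset_to_prompt.json."""
    with open(path) as f:
        return json.load(f)

# Assumed local copy of the repository's prompt file:
# model.prompts = load_prompts("dataset_to_prompt.json")
```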

## Multi-TopK Support

Our model supports different sparsity levels thanks to the **Multi-TopK** loss used during training. 
You can change the sparsity level by adjusting the `k` parameter in the file `3_SparseAutoEncoder/config.json`. 
We set the sparsity level to 2 by default.

For instance, if you want to evaluate with sparsity level $K=8$ (meaning 8 activated neurons in 
each embedding vector), `3_SparseAutoEncoder/config.json` should look like this:

```json
{
    "input_dim": 2560,
    "hidden_dim": 10240,
    "k": 8,
    "k_aux": 1024,
    "normalize": false,
    "dead_threshold": 30
}
```
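
If you switch sparsity levels often, editing the file by hand can be scripted. A minimal sketch, assuming the `3_SparseAutoEncoder/config.json` layout shown above; `set_topk` is a hypothetical helper, not part of the repository:

```python
import json

def set_topk(config_path, k):
    """Rewrite the TopK sparsity level `k` in a SparseAutoEncoder config.json."""
    with open(config_path) as f:
        cfg = json.load(f)
    cfg["k"] = k  # number of activated neurons per embedding vector
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=4)
    return cfg

# e.g. set_topk("3_SparseAutoEncoder/config.json", 8) before loading the model
```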


## CSRv2 Qwen Series
We will release a series of [CSRv2](https://arxiv.org/abs/2602.05735) models fine-tuned on common tasks in
[MTEB](https://huggingface.co/mteb), with [Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B) 
as the backbone. These tasks are:  

- **[Classification](https://huggingface.co/Y-Research-Group/CSRv2-classification)**
- **[Clustering](https://huggingface.co/Y-Research-Group/CSRv2-clustering)**
- **[Retrieval](https://huggingface.co/Y-Research-Group/CSRv2-retrieval)**
- **[STS](https://huggingface.co/Y-Research-Group/CSRv2-sts)**
- **[Pair_classification](https://huggingface.co/Y-Research-Group/CSRv2-pair_classification)**
- **[Reranking](https://huggingface.co/Y-Research-Group/CSRv2-reranking)**

## Citation
```bibtex
@inproceedings{guo2026csrv2,
    title={{CSR}v2: Unlocking Ultra-sparse Embeddings},
    author={Guo, Lixuan and Wang, Yifei and Wen, Tiansheng and Wang, Yifan and Feng, Aosong and Chen, Bo and Jegelka, Stefanie and You, Chenyu},
    booktitle={International Conference on Learning Representations (ICLR)},
    year={2026}
}
```