---
license: mit
datasets:
- mteb/nfcorpus
language:
- en
pipeline_tag: text-retrieval
library_name: sentence-transformers
tags:
- mteb
- text
- transformers
- text-embeddings-inference
- sparse-encoder
- sparse
- csr
model-index:
- name: NV-Embed-v2
  results:
  - dataset:
      name: MTEB NFCorpus
      type: mteb/nfcorpus
      revision: ec0fa4fe99da2ff19ca1214b7966684033a58814
      config: default
      split: test
      languages:
      - eng-Latn
    metrics:
    - type: ndcg@1
      value: 0.43189
    - type: ndcg@3
      value: 0.41132
    - type: ndcg@5
      value: 0.40406
    - type: ndcg@10
      value: 0.39624
    - type: ndcg@20
      value: 0.38517
    - type: ndcg@100
      value: 0.40068
    - type: ndcg@1000
      value: 0.49126
    - type: map@10
      value: 0.14342
    - type: map@100
      value: 0.21866
    - type: map@1000
      value: 0.2427
    - type: recall@10
      value: 0.1968
    - type: recall@100
      value: 0.45592
    - type: recall@1000
      value: 0.78216
    - type: precision@1
      value: 0.45511
    - type: precision@10
      value: 0.32353
    - type: mrr@10
      value: 0.537792
    - type: main_score
      value: 0.39624
    task:
      type: Retrieval
base_model:
- nvidia/NV-Embed-v2
---
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [GitHub repository](https://github.com/neilwen987/CSR_Adaptive_Rep).
## Usage
📌 **Tip**: For NV-Embed-v2, Transformers versions **later** than 4.47.0 may degrade performance, because ``model_type=bidir_mistral`` in ``config.json`` is no longer supported.
We recommend using Transformers ``4.47.0``.
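To catch a mismatched install early, a small check like the one below can be run before loading the model. It is only a sketch: the helper names (`parse_release`, `is_newer_than_recommended`, `check_installed_transformers`) are hypothetical and not part of this repository, and the comparison looks only at the numeric `major.minor.patch` prefix of the version string.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Version recommended in the tip above.
RECOMMENDED = (4, 47, 0)


def parse_release(version_string: str) -> tuple:
    """Return the numeric (major, minor, patch) prefix, ignoring suffixes such as '.dev0'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    # A missing patch component (e.g. "4.47") is treated as 0.
    return tuple(int(group or 0) for group in match.groups())


def is_newer_than_recommended(version_string: str) -> bool:
    """True if the given version is later than the recommended 4.47.0."""
    return parse_release(version_string) > RECOMMENDED


def check_installed_transformers() -> str:
    """Describe how the installed Transformers version compares with 4.47.0."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return "transformers is not installed"
    if is_newer_than_recommended(installed):
        return f"transformers {installed} is later than 4.47.0; consider installing 4.47.0"
    return f"transformers {installed} is not later than 4.47.0"


if __name__ == "__main__":
    print(check_installed_transformers())
```

If the check reports a later version, installing the recommended release with `pip install transformers==4.47.0` is one way to match the tip above.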
### Sentence Transformers Usage
The following snippet loads the model with Sentence Transformers and evaluates it on NFCorpus with MTEB:
```python
import mteb
from sentence_transformers import SparseEncoder

# trust_remote_code=True allows the custom code shipped with the model repository to run.
model = SparseEncoder(
    "Y-Research-Group/CSR-NV_Embed_v2-Retrieval-NFcorpus",
    trust_remote_code=True,
)

# Instruction prompt for NFCorpus queries.
model.prompts = {
    "NFCorpus-query": "Instruct: Given a question, retrieve relevant documents that answer the question\nQuery:"
}

tasks = mteb.get_tasks(tasks=["NFCorpus"])
evaluation = mteb.MTEB(tasks=tasks)

# MTEB does not support sparse tensors yet, so embeddings are returned as dense tensors.
evaluation.run(
    model,
    eval_splits=["test"],
    output_folder="./results/NFCorpus",
    show_progress_bar=True,
    encode_kwargs={"convert_to_sparse_tensor": False, "batch_size": 8},
)
```
## Citation
```bibtex
@misc{wen2025matryoshkarevisitingsparsecoding,
title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
author={Tiansheng Wen and Yifei Wang and Zequn Zeng and Zhong Peng and Yudi Su and Xinyang Liu and Bo Chen and Hongwei Liu and Stefanie Jegelka and Chenyu You},
year={2025},
eprint={2503.01776},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2503.01776},
}
```