---
license: mit
datasets:
- mteb/nfcorpus
language:
- en
pipeline_tag: text-retrieval
library_name: sentence-transformers
tags:
- mteb
- text
- transformers
- text-embeddings-inference
- sparse-encoder
- sparse
- csr
model-index:
- name: NV-Embed-v2
  results:
  - dataset:
      name: MTEB NFCorpus
      type: mteb/nfcorpus
      revision: ec0fa4fe99da2ff19ca1214b7966684033a58814
      config: default
      split: test
      languages:
      - eng-Latn
    metrics:
    - type: ndcg@1
      value: 0.43189
    - type: ndcg@3
      value: 0.41132
    - type: ndcg@5
      value: 0.40406
    - type: ndcg@10
      value: 0.39624
    - type: ndcg@20
      value: 0.38517
    - type: ndcg@100
      value: 0.40068
    - type: ndcg@1000
      value: 0.49126
    - type: map@10
      value: 0.14342
    - type: map@100
      value: 0.21866
    - type: map@1000
      value: 0.2427
    - type: recall@10
      value: 0.1968
    - type: recall@100
      value: 0.45592
    - type: recall@1000
      value: 0.78216
    - type: precision@1
      value: 0.45511
    - type: precision@10
      value: 0.32353
    - type: mrr@10
      value: 0.537792
    - type: main_score
      value: 0.39624
    task:
      type: Retrieval
base_model:
- nvidia/NV-Embed-v2
---
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [GitHub repository](https://github.com/neilwen987/CSR_Adaptive_Rep).
## Usage
📌 **Tip**: For NV-Embed-v2, Transformers versions **later** than 4.47.0 may lead to performance degradation, because newer releases no longer support ``model_type=bidir_mistral`` in ``config.json``.

We recommend using ``transformers==4.47.0``.
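The version recommendation above can be checked before loading the model. The sketch below uses only the Python standard library; the helper names `parse_release` and `is_later_than_recommended` are hypothetical, and the check assumes it is enough to compare the numeric release segment (major.minor.patch) of the installed `transformers` version against 4.47.0, ignoring suffixes such as `.dev0`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Release recommended in this card; later releases may degrade NV-Embed-v2 results.
RECOMMENDED = (4, 47, 0)


def parse_release(v: str) -> tuple:
    """Return the first three numeric release components, e.g. "4.48.0.dev0" -> (4, 48, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unrecognised version string: {v!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions such as "4.47" with zeros, then keep exactly three components.
    return (parts + (0, 0, 0))[:3]


def is_later_than_recommended(v: str) -> bool:
    """True if `v` is a newer release than the recommended 4.47.0."""
    return parse_release(v) > RECOMMENDED


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed; install transformers==4.47.0")
    else:
        if is_later_than_recommended(installed):
            print(f"transformers {installed} is later than 4.47.0; results may degrade")
        else:
            print(f"transformers {installed} is not later than the recommended 4.47.0")
```

If the script reports a newer release, pinning with `pip install transformers==4.47.0` restores the recommended version.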
### Sentence Transformers Usage
You can load this model with Sentence Transformers and evaluate it on NFCorpus with MTEB using the following snippet:
```python
import mteb
from sentence_transformers import SparseEncoder

model = SparseEncoder("Y-Research-Group/CSR-NV_Embed_v2-Retrieval-NFcorpus", trust_remote_code=True)
# Instruction prompt applied to NFCorpus queries during evaluation
model.prompts = {
    "NFCorpus-query": "Instruct: Given a question, retrieve relevant documents that answer the question\nQuery:"
}

tasks = mteb.get_tasks(tasks=["NFCorpus"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(
    model,
    eval_splits=["test"],
    output_folder="./results/NFCorpus",
    show_progress_bar=True,
    # MTEB doesn't support sparse tensors yet, so return dense embeddings instead
    encode_kwargs={"convert_to_sparse_tensor": False, "batch_size": 8},
)
```
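The `convert_to_sparse_tensor=False` setting above only changes how embeddings are returned, not how documents are ranked: a dot product computed over the non-zero entries of two sparse vectors equals the dot product of their dense expansions. The toy sketch below illustrates this in plain Python, assuming dot-product scoring and using hypothetical vectors stored as `{dimension: value}` dictionaries; it does not reproduce the internal code of Sentence Transformers or MTEB.

```python
from typing import Dict, List

# Hypothetical toy data: sparse vectors stored as {dimension index: non-zero value}.
DIM = 8
query = {1: 0.9, 5: 0.4}
docs = {
    "doc_a": {1: 0.7, 3: 0.2},
    "doc_b": {5: 0.8, 6: 0.5},
    "doc_c": {0: 0.3, 7: 0.6},
}


def to_dense(vec: Dict[int, float], dim: int) -> List[float]:
    """Expand a sparse {index: value} vector into a dense list of length `dim`."""
    dense = [0.0] * dim
    for i, v in vec.items():
        dense[i] = v
    return dense


def sparse_dot(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Dot product that only visits dimensions where both vectors are non-zero."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(v * b[i] for i, v in a.items() if i in b)


def dense_dot(a: List[float], b: List[float]) -> float:
    """Dot product over every dimension, zeros included."""
    return sum(x * y for x, y in zip(a, b))


def rank(scores: Dict[str, float]) -> List[str]:
    """Document ids sorted by descending score."""
    return sorted(scores, key=scores.get, reverse=True)


sparse_scores = {d: sparse_dot(query, v) for d, v in docs.items()}
dense_query = to_dense(query, DIM)
dense_scores = {d: dense_dot(dense_query, to_dense(v, DIM)) for d, v in docs.items()}

print(rank(sparse_scores))  # ['doc_a', 'doc_b', 'doc_c']
print(rank(dense_scores))   # ['doc_a', 'doc_b', 'doc_c']
```

The sparse form stores only the non-zero entries, which is where the memory savings of sparse embeddings come from; the dense form gives up that saving in exchange for compatibility with tools that expect fixed-length arrays.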
## Citation
```bibtex
@misc{wen2025matryoshkarevisitingsparsecoding,
  title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
  author={Tiansheng Wen and Yifei Wang and Zequn Zeng and Zhong Peng and Yudi Su and Xinyang Liu and Bo Chen and Hongwei Liu and Stefanie Jegelka and Chenyu You},
  year={2025},
  eprint={2503.01776},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2503.01776},
}
```