metadata
license: mit
datasets:
- mteb/scifact
language:
- en
pipeline_tag: text-retrieval
library_name: sentence-transformers
tags:
- mteb
- text
- transformers
- text-embeddings-inference
- CSR
model-index:
- name: NV-Embed-v2
  results:
  - dataset:
      name: MTEB SciFact
      type: mteb/scifact
      revision: 0228b52cf27578f30900b9e5271d331663a030d7
      config: default
      split: test
      languages:
      - eng-Latn
    metrics:
    - type: ndcg@1
      value: 0.67
    - type: ndcg@3
      value: 0.7635
    - type: ndcg@5
      value: 0.78982
    - type: ndcg@10
      value: 0.80426
    - type: ndcg@20
      value: 0.80967
    - type: ndcg@100
      value: 0.81514
    - type: ndcg@1000
      value: 0.81692
    - type: map@10
      value: 0.75662
    - type: map@100
      value: 0.7593
    - type: map@1000
      value: 0.75937
    - type: recall@10
      value: 0.93889
    - type: recall@100
      value: 0.98667
    - type: recall@1000
      value: 1
    - type: precision@1
      value: 0.67
    - type: precision@10
      value: 0.106
    - type: mrr@10
      value: 0.76503
    - type: main_score
      value: 0.80426
    task:
      type: Retrieval
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our GitHub repository.
Usage
📌 Tip: For NV-Embed-v2, using Transformers versions later than 4.47.0 may lead to performance degradation, as model_type=bidir_mistral in config.json is no longer supported.
We recommend using Transformers 4.47.0.
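As a quick sanity check before loading the model, the installed Transformers version can be compared against the recommended one. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `is_newer_than_recommended` are hypothetical and not part of any library.

```python
import re

RECOMMENDED = "4.47.0"  # version recommended in the tip above


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_newer_than_recommended(installed, recommended=RECOMMENDED):
    """Return True when `installed` is later than `recommended`."""
    return parse_version(installed) > parse_version(recommended)


if __name__ == "__main__":
    try:
        import transformers
    except ImportError:
        print("transformers is not installed")
    else:
        installed = transformers.__version__
        if is_newer_than_recommended(installed):
            print(f"Warning: transformers {installed} is newer than {RECOMMENDED}")
        else:
            print(f"transformers {installed} is not newer than {RECOMMENDED}")
```

If the check warns, pinning the package (for example `pip install transformers==4.47.0`) matches the recommendation above.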
Sentence Transformers Usage
You can load this model with Sentence Transformers and evaluate it on SciFact with the following code snippet:
import mteb
from sentence_transformers import SentenceTransformer

# Load the model; trust_remote_code=True allows the custom model code in the repository to run.
model = SentenceTransformer(
    "Y-Research-Group/CSR-NV_Embed_v2-Retrieval-SciFACT",
    trust_remote_code=True
)

# Instruction prefix used for SciFact queries.
model.prompts = {
    "SciFact-query": "Instruct: Given a scientific claim, retrieve documents that support or refute the claim\nQuery:"
}

# Evaluate on the SciFact test split and write the results to ./results/SciFact.
task = mteb.get_tasks(tasks=["SciFact"])
evaluation = mteb.MTEB(tasks=task)
evaluation.run(
    model,
    eval_splits=["test"],
    output_folder="./results/SciFact",
    batch_size=32,
    show_progress_bar=True,
)
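The `model.prompts` entry maps a prompt name to an instruction prefix, and Sentence Transformers prepends the selected prompt to each input text before encoding. The plain-Python sketch below only illustrates the string that results for a query; `apply_prompt` and the sample claim are hypothetical and do not call the model.

```python
# Prompt dictionary in the same shape as `model.prompts`.
prompts = {
    "SciFact-query": (
        "Instruct: Given a scientific claim, retrieve documents "
        "that support or refute the claim\nQuery:"
    )
}


def apply_prompt(prompts, prompt_name, text):
    """Hypothetical helper: prepend the named prompt to `text`."""
    return prompts[prompt_name] + text


# Made-up claim, used only to show the final input string.
claim = " Example scientific claim."
print(apply_prompt(prompts, "SciFact-query", claim))
```

Documents have no entry in the dictionary here, so under this setup only queries receive the instruction prefix.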
Citation
@inproceedings{wenbeyond,
title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
author={Wen, Tiansheng and Wang, Yifei and Zeng, Zequn and Peng, Zhong and Su, Yudi and Liu, Xinyang and Chen, Bo and Liu, Hongwei and Jegelka, Stefanie and You, Chenyu},
booktitle={Forty-second International Conference on Machine Learning}
}