license: mit
datasets:
  - mteb/scifact
language:
  - en
pipeline_tag: text-retrieval
library_name: sentence-transformers
tags:
  - mteb
  - text
  - transformers
  - text-embeddings-inference
  - CSR
model-index:
  - name: NV-Embed-v2
    results:
      - dataset:
          name: MTEB SciFact
          type: mteb/scifact
          revision: 0228b52cf27578f30900b9e5271d331663a030d7
          config: default
          split: test
          languages:
            - eng-Latn
        metrics:
          - type: ndcg@1
            value: 0.67
          - type: ndcg@3
            value: 0.7635
          - type: ndcg@5
            value: 0.78982
          - type: ndcg@10
            value: 0.80426
          - type: ndcg@20
            value: 0.80967
          - type: ndcg@100
            value: 0.81514
          - type: ndcg@1000
            value: 0.81692
          - type: map@10
            value: 0.75662
          - type: map@100
            value: 0.7593
          - type: map@1000
            value: 0.75937
          - type: recall@10
            value: 0.93889
          - type: recall@100
            value: 0.98667
          - type: recall@1000
            value: 1
          - type: precision@1
            value: 0.67
          - type: precision@10
            value: 0.106
          - type: mrr@10
            value: 0.76503
          - type: main_score
            value: 0.80426
        task:
          type: Retrieval
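The `model-index` block above reports standard ranking metrics on the SciFact test split, with `main_score` equal to nDCG@10 (0.80426). As a rough guide to what these numbers measure, here is a minimal sketch of the per-query definitions with binary relevance judgments. It is illustrative only: MTEB computes these with its own evaluator and averages them over all test queries. The document IDs and judgments below are hypothetical toy data.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the top-k gains (index 0 is rank 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, qrels, k):
    """nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids]
    ideal_dcg = dcg_at_k(sorted(qrels.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def recall_at_k(ranked_ids, qrels, k):
    """Fraction of all relevant documents that appear in the top k."""
    relevant = {d for d, rel in qrels.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant.intersection(ranked_ids[:k])) / len(relevant)


def precision_at_k(ranked_ids, qrels, k):
    """Fraction of the top-k slots occupied by relevant documents."""
    return sum(1 for d in ranked_ids[:k] if qrels.get(d, 0) > 0) / k


def mrr_at_k(ranked_ids, qrels, k):
    """Reciprocal rank of the first relevant document within the top k (0 if none)."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if qrels.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0


# Hypothetical query: the system ranks d3 first, but only d1 and d2 are relevant.
ranking = ["d3", "d1", "d7", "d2"]
judgments = {"d1": 1, "d2": 1}
print(ndcg_at_k(ranking, judgments, 3), recall_at_k(ranking, judgments, 3),
      precision_at_k(ranking, judgments, 3), mrr_at_k(ranking, judgments, 3))
```

Because precision@10 divides by 10 regardless of how many relevant documents a query has, it can be low (0.106 here) even when recall@10 is high (0.93889) if most queries have only one or two relevant documents.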

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our GitHub.

Usage

📌 Tip: For NV-Embed-v2, using Transformers versions later than 4.47.0 may lead to performance degradation, because `model_type=bidir_mistral` in `config.json` is no longer supported.

We recommend using Transformers 4.47.0.
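To catch an incompatible install before running an evaluation, a small stdlib-only check could compare the installed Transformers version against 4.47.0. This is a hypothetical helper, not part of the model repository, and its parsing only considers the numeric major.minor.patch prefix of the version string.

```python
from importlib.metadata import PackageNotFoundError, version

# Version recommended by this model card.
RECOMMENDED_TRANSFORMERS = (4, 47, 0)


def parse_release(ver: str) -> tuple:
    """Extract the numeric major.minor.patch prefix, e.g. '4.47.0.dev0' -> (4, 47, 0)."""
    parts = []
    for piece in ver.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_newer_than_recommended(ver: str) -> bool:
    """True if the given version is later than the recommended 4.47.0."""
    return parse_release(ver) > RECOMMENDED_TRANSFORMERS


def check_installed_transformers() -> None:
    """Print a warning if the installed transformers is missing or newer than 4.47.0."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed")
        return
    if is_newer_than_recommended(installed):
        print(f"transformers {installed} is newer than 4.47.0; results may degrade")


check_installed_transformers()
```

If the check warns, pinning the version with `pip install "transformers==4.47.0"` restores the recommended setup.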

Sentence Transformers Usage

You can load this model with Sentence Transformers and evaluate it on the MTEB SciFact test split with the following snippet:

```python
import mteb
from sentence_transformers import SentenceTransformer

# The model repository uses custom modeling code, hence trust_remote_code=True.
model = SentenceTransformer(
    "Y-Research-Group/CSR-NV_Embed_v2-Retrieval-SciFACT",
    trust_remote_code=True,
)

# Instruction prefix applied to SciFact queries only; documents get no prompt.
model.prompts = {
    "SciFact-query": "Instruct: Given a scientific claim, retrieve documents that support or refute the claim\nQuery:"
}

task = mteb.get_tasks(tasks=["SciFact"])
evaluation = mteb.MTEB(tasks=task)
evaluation.run(
    model,
    eval_splits=["test"],
    output_folder="./results/SciFact",
    batch_size=32,
    show_progress_bar=True,
)
```
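Conceptually, the evaluation encodes each SciFact claim with the query prompt prepended, encodes the corpus documents without a prompt, and ranks documents by embedding similarity. The sketch below mimics those two steps in plain Python under stated assumptions: Sentence Transformers prepends the prompt string verbatim to each input, and ranking uses cosine similarity. The helpers `apply_prompt` and `rank_documents` and the toy vectors are hypothetical; the real pipeline runs inside MTEB and the model.

```python
import math

# Query prompt mirroring the one configured in the usage snippet.
QUERY_PROMPT = (
    "Instruct: Given a scientific claim, retrieve documents that support "
    "or refute the claim\nQuery:"
)


def apply_prompt(texts, prompt=None):
    """Prepend the prompt verbatim to each text; leave texts unchanged if no prompt."""
    return [prompt + t if prompt else t for t in texts]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_documents(query_vec, doc_vecs, k):
    """Return the indices of the k documents most similar to the query, best first."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)[:k]


queries = apply_prompt(["Vitamin D reduces fracture risk."], QUERY_PROMPT)
documents = apply_prompt(["A trial of vitamin D supplementation ..."])  # no prompt
# Toy 2-d "embeddings" standing in for the model's outputs.
print(rank_documents([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]], k=2))
```

Only queries carry the instruction, so the same document embeddings can be reused across different query instructions.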

Citation

```bibtex
@inproceedings{wenbeyond,
  title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
  author={Wen, Tiansheng and Wang, Yifei and Zeng, Zequn and Peng, Zhong and Su, Yudi and Liu, Xinyang and Chen, Bo and Liu, Hongwei and Jegelka, Stefanie and You, Chenyu},
  booktitle={Forty-second International Conference on Machine Learning}
}
```