README.md · Y-Research-Group/CSR-NV_Embed_v2-Retrieval-NFcorpus at main

	---
	license: mit
	datasets:
	- mteb/nfcorpus
	language:
	- en
	pipeline_tag: text-retrieval
	library_name: sentence-transformers
	tags:
	- mteb
	- text
	- transformers
	- text-embeddings-inference
	- sparse-encoder
	- sparse
	- csr
	model-index:
	- name: NV-Embed-v2
	  results:
	  - dataset:
	      name: MTEB NFCorpus
	      type: mteb/nfcorpus
	      revision: ec0fa4fe99da2ff19ca1214b7966684033a58814
	      config: default
	      split: test
	      languages:
	      - eng-Latn
	    metrics:
	    - type: ndcg@1
	      value: 0.43189
	    - type: ndcg@3
	      value: 0.41132
	    - type: ndcg@5
	      value: 0.40406
	    - type: ndcg@10
	      value: 0.39624
	    - type: ndcg@20
	      value: 0.38517
	    - type: ndcg@100
	      value: 0.40068
	    - type: ndcg@1000
	      value: 0.49126
	    - type: map@10
	      value: 0.14342
	    - type: map@100
	      value: 0.21866
	    - type: map@1000
	      value: 0.2427
	    - type: recall@10
	      value: 0.1968
	    - type: recall@100
	      value: 0.45592
	    - type: recall@1000
	      value: 0.78216
	    - type: precision@1
	      value: 0.45511
	    - type: precision@10
	      value: 0.32353
	    - type: mrr@10
	      value: 0.537792
	    - type: main_score
	      value: 0.39624
	    task:
	      type: Retrieval
	base_model:
	- nvidia/NV-Embed-v2
	---


	For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [GitHub repository](https://github.com/neilwen987/CSR_Adaptive_Rep).


	## Usage
	📌 Tip: For NV-Embed-v2, using Transformers versions later than 4.47.0 may degrade performance, because ``model_type=bidir_mistral`` in ``config.json`` is no longer supported.

	We recommend using Transformers ``4.47.0``.
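To guard against the version issue described in the tip, a small check can be run before loading the model. This is a minimal sketch, not part of this repository: the helper names `release_tuple` and `is_newer_than_recommended` are hypothetical, and the check only compares the first three numeric components of the installed version.

```python
import re
from importlib.metadata import PackageNotFoundError, version

RECOMMENDED = (4, 47, 0)  # Transformers version recommended in the tip above


def release_tuple(version_string: str) -> tuple:
    """Return the first three numeric release components.

    Hypothetical helper: '4.47.0rc1' -> (4, 47, 0), '5.0' -> (5, 0, 0).
    """
    numbers = []
    for piece in version_string.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def is_newer_than_recommended(version_string: str) -> bool:
    """True if the given version is later than the recommended 4.47.0."""
    return release_tuple(version_string) > RECOMMENDED


try:
    installed = version("transformers")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("transformers is not installed")
elif is_newer_than_recommended(installed):
    print(f"transformers {installed} is later than 4.47.0; consider pinning transformers==4.47.0")
else:
    print(f"transformers {installed} is not later than 4.47.0")
```

If the check reports a later version, pinning the package (for example with `pip install transformers==4.47.0`) is one way to follow the recommendation.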

	### Sentence Transformers Usage
	You can load this model with Sentence Transformers and evaluate it on NFCorpus with MTEB using the following code snippet:
	```python
	import mteb
	from sentence_transformers import SparseEncoder

	# trust_remote_code is required because the model ships custom modeling code.
	model = SparseEncoder("Y-Research-Group/CSR-NV_Embed_v2-Retrieval-NFcorpus", trust_remote_code=True)

	# Instruction prompt prepended to NFCorpus queries.
	model.prompts = {
	    "NFCorpus-query": "Instruct: Given a question, retrieve relevant documents that answer the question\nQuery:"
	}

	task = mteb.get_tasks(tasks=["NFCorpus"])
	evaluation = mteb.MTEB(tasks=task)
	# MTEB does not support sparse tensors yet, so embeddings are returned as dense tensors.
	evaluation.run(
	    model,
	    eval_splits=["test"],
	    output_folder="./results/NFCorpus",
	    show_progress_bar=True,
	    encode_kwargs={"convert_to_sparse_tensor": False, "batch_size": 8},
	)
	```
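Returning dense tensors, as the snippet above does for MTEB, changes only how the embeddings are stored, not the similarity scores: the zero entries contribute nothing to a dot product. The following pure-Python sketch illustrates this with toy vectors. The helper names (`to_dense`, `sparse_dot`, `dense_dot`) and the sample values are hypothetical and are not part of Sentence Transformers or MTEB.

```python
def to_dense(sparse: dict, dim: int) -> list:
    """Expand an {index: value} sparse vector into a dense list of length dim."""
    dense = [0.0] * dim
    for index, value in sparse.items():
        dense[index] = value
    return dense


def sparse_dot(a: dict, b: dict) -> float:
    """Dot product over the indices that are non-zero in both sparse vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(value * b[index] for index, value in a.items() if index in b)


def dense_dot(a: list, b: list) -> float:
    """Plain dot product of two dense vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


DIM = 8                           # toy dimensionality, for illustration only
query = {1: 0.5, 4: 2.0}          # toy sparse query embedding
document = {4: 1.5, 6: 3.0}       # toy sparse document embedding

score_sparse = sparse_dot(query, document)
score_dense = dense_dot(to_dense(query, DIM), to_dense(document, DIM))
print(score_sparse, score_dense)  # both scores are identical
```

The trade-off is memory: the dense form stores every dimension, which is why the snippet above pairs it with a small `batch_size`.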

	## Citation
	```bibtex
	@misc{wen2025matryoshkarevisitingsparsecoding,
	title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
	author={Tiansheng Wen and Yifei Wang and Zequn Zeng and Zhong Peng and Yudi Su and Xinyang Liu and Bo Chen and Hongwei Liu and Stefanie Jegelka and Chenyu You},
	year={2025},
	eprint={2503.01776},
	archivePrefix={arXiv},
	primaryClass={cs.LG},
	url={https://arxiv.org/abs/2503.01776},
	}
	```