	---
	title: README
	emoji: 🐠
	colorFrom: pink
	colorTo: blue
	sdk: static
	pinned: false
	license: cc-by-sa-4.0
	---

	# Welcome to RLHN (EMNLP 2025 Findings)
	RLHN (ReLabeling Hard Negatives) uses a cascading LLM framework to identify and relabel false negatives in information retrieval (IR) training datasets.

	This repository contains the training datasets curated by RLHN and the models fine-tuned on these curated datasets.
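
To give a rough sense of what cascaded relabeling of hard negatives can look like, here is a minimal sketch. It is not the RLHN implementation: the two-stage layout, the `TrainingExample` record, the `Judge` callable type, and the `relabel_false_negatives` function are all hypothetical names chosen for illustration. The judges stand in for LLM calls, which are not shown here.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical judge type: takes (query, passage) and returns True when the
# judge considers the passage relevant. In practice this would wrap an LLM call.
Judge = Callable[[str, str], bool]


@dataclass
class TrainingExample:
    """Hypothetical record for one query with its labeled passages."""
    query: str
    positives: List[str]
    hard_negatives: List[str]


def relabel_false_negatives(
    example: TrainingExample,
    cheap_judge: Judge,
    strong_judge: Judge,
) -> TrainingExample:
    """Return a copy of `example` with suspected false negatives moved to positives.

    Assumed cascade: the cheap judge screens every hard negative, and the
    strong judge is consulted only for passages the cheap judge flags.
    """
    positives = list(example.positives)
    remaining_negatives: List[str] = []
    for passage in example.hard_negatives:
        # `and` short-circuits, so the strong judge runs only on flagged passages.
        if cheap_judge(example.query, passage) and strong_judge(example.query, passage):
            positives.append(passage)  # relabel as a positive
        else:
            remaining_negatives.append(passage)
    return TrainingExample(example.query, positives, remaining_negatives)
```

Calling `relabel_false_negatives(example, cheap_judge, strong_judge)` leaves the input record unchanged and returns a new one, so the original labels remain available for comparison. Consulting the stronger judge only on flagged passages is one way to keep the cost of the second stage low.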

	List of Contributors:
	- Nandan Thakur*
	- Crystina Zhang*
	- Xueguang Ma
	- Jimmy Lin

	Paper URL: https://aclanthology.org/2025.findings-emnlp.481/

	# Citation

	```
	@inproceedings{thakur-etal-2025-hard,
	title = "Hard Negatives, Hard Lessons: Revisiting Training Data Quality for Robust Information Retrieval with {LLM}s",
	author = "Thakur, Nandan and
	Zhang, Crystina and
	Ma, Xueguang and
	Lin, Jimmy",
	editor = "Christodoulopoulos, Christos and
	Chakraborty, Tanmoy and
	Rose, Carolyn and
	Peng, Violet",
	booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
	month = nov,
	year = "2025",
	address = "Suzhou, China",
	publisher = "Association for Computational Linguistics",
	url = "https://aclanthology.org/2025.findings-emnlp.481/",
	doi = "10.18653/v1/2025.findings-emnlp.481",
	pages = "9064--9083",
	ISBN = "979-8-89176-335-7",
	abstract = "Training robust retrieval and reranker models typically relies on large-scale retrieval datasets; for example, the BGE collection contains 1.6 million query-passage pairs sourced from various data sources. However, we find that certain datasets can negatively impact model effectiveness {---} pruning 8 out of 15 datasets from the BGE collection, reduces the training set size by 2.35{\texttimes}, surprisingly increases nDCG@10 on BEIR by 1.0 point. This motivates a deeper examination of training data quality, with a particular focus on ``false negatives'', where relevant passages are incorrectly labeled as irrelevant. We utilize LLMs as a simple, cost-effective approach to \textit{identify} and \textit{relabel} false negatives in training datasets. Experimental results show that relabeling false negatives as true positives improves both E5 (base) and Qwen2.5-7B retrieval models by 0.7-1.4 points on BEIR and by 1.7-1.8 points at nDCG@10 on zero-shot AIR-Bench evaluation. Similar gains are observed for rerankers fine-tuned on the relabeled data, such as Qwen2.5-3B on BEIR. The reliability of LLMs to identify false negatives is supported by human annotation results. Our training dataset and code are publicly available."
	}
	```