	---
	library_name: transformers
	license: cc-by-nc-sa-4.0
	pipeline_tag: text-ranking
	---

	<div align="center">

	# Contextual AI Reranker v2 2B-NVFP4

	<img src="Contextual_AI_Brand_Mark_Dark.png" width="10%" alt="Contextual_AI"/>

	[![Blog Post](https://img.shields.io/badge/📝%20Blog-ContextualReranker-green)](https://contextual.ai/blog/rerank-v2)
	[![Hugging Face Collection](https://img.shields.io/badge/🤗%20Hugging%20Face-Model%20Collection-yellow)](https://huggingface.co/collections/ContextualAI/contextual-ai-reranker-v2)

	</div>

	<hr>

	## Highlights

	Contextual AI's reranker is the first instruction-following reranker capable of handling retrieval conflicts and ranking with custom instructions (e.g., prioritizing recent information). It achieves state-of-the-art performance on BEIR and sits on the cost/performance Pareto frontier across:

	- Instruction following
	- Question answering
	- Multilinguality (100+ languages)
	- Product search & recommendation
	- Real-world use cases

	<p align="center">
	<img src="main_benchmark.png" width="1200"/>
	</p>

	For detailed benchmarks, see our [blog post](https://contextual.ai/blog/rerank-v2).

	## Overview

	- Model Type: Text Reranking
	- Supported Languages: 100+
	- Parameters: 2B
	- Precision: NVFP4 (4-bit floating point)
	- Context Length: up to 32K

	## When to Use This Model

	Use this reranker when you need to:
	- Re-rank retrieved documents with custom instructions
	- Handle conflicting information in retrieval results
	- Prioritize documents by recency or other criteria
	- Support multilingual search (100+ languages)
	- Process long contexts (up to 32K tokens)
	- Maximize efficiency with 4-bit precision (NVFP4)

	## Quickstart

	### Basic Usage

	```python
	# Requires vLLM==0.10.0 for NVFP4 support.
	# infer_w_vllm is defined in the full implementation under "vLLM Usage" below.

	model_path = "ContextualAI/ctxl-rerank-v2-instruct-multilingual-2b-nvfp4"

	query = "What are the health benefits of exercise?"
	instruction = "Prioritize recent medical research"
	documents = [
	    "Regular exercise reduces risk of heart disease and improves mental health.",
	    "A 2024 study shows exercise enhances cognitive function in older adults.",
	    "Ancient Greeks valued physical fitness for military training.",
	]

	infer_w_vllm(model_path, query, instruction, documents)
	```

	Expected Output:
	> ⚠️ Warning: These scores are produced using the BF16 model. If you run the same query with NVFP4, the scores may be slightly different.
	```
	Query: What are the health benefits of exercise?
	Instruction: Prioritize recent medical research
	Score: 0.8398 | Doc: A 2024 study shows exercise enhances cognitive function in older adults.
	Score: -2.5469 | Doc: Regular exercise reduces risk of heart disease and improves mental health.
	Score: -9.3750 | Doc: Ancient Greeks valued physical fitness for military training.
	```
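
The scores above are only meaningful relative to one another: a higher score means the document ranks higher for the query and instruction. As a minimal sketch of how a caller might consume them, the hypothetical helper `top_k` below (not part of the model's code) keeps the `k` best documents from a list of `(score, document)` pairs. The scores are copied from the BF16 example output above.

```python
def top_k(scored_docs: list[tuple[float, str]], k: int) -> list[str]:
    """Return the k highest-scoring documents, best first (hypothetical helper)."""
    ranked = sorted(scored_docs, key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:k]]


# Scores taken from the example output above, listed in the original document order.
scored = [
    (-2.5469, "Regular exercise reduces risk of heart disease and improves mental health."),
    (0.8398, "A 2024 study shows exercise enhances cognitive function in older adults."),
    (-9.3750, "Ancient Greeks valued physical fitness for military training."),
]

best = top_k(scored, k=2)
for doc in best:
    print(doc)
```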

	### vLLM Usage

	Requires `vllm==0.10.0` for NVFP4 support.

	```python
	import os
	os.environ['VLLM_USE_V1'] = '0' # v1 engine doesn't support logits processor yet

	import torch
	from vllm import LLM, SamplingParams


	def logits_processor(_, scores):
	    """Custom logits processor for vLLM reranking.

	    Forces the sampled token id to be the 16-bit pattern of the logit at
	    vocabulary index 0, so the caller can recover that logit as the score.
	    """
	    # Reinterpret the bfloat16 logit at index 0 as an unsigned 16-bit integer.
	    index = scores[0].view(torch.uint16)
	    # Mask out every token except the one whose id equals that bit pattern.
	    scores = torch.full_like(scores, float("-inf"))
	    scores[index] = 1
	    return scores


	def format_prompts(query: str, instruction: str, documents: list[str]) -> list[str]:
	    """Format query and documents into prompts for reranking."""
	    if instruction:
	        instruction = f" {instruction}"
	    prompts = []
	    for doc in documents:
	        prompt = f"Check whether a given document contains information helpful to answer the query.\n<Document> {doc}\n<Query> {query}{instruction} ??"
	        prompts.append(prompt)
	    return prompts


	def infer_w_vllm(model_path: str, query: str, instruction: str, documents: list[str]):
	    model = LLM(
	        model=model_path,
	        gpu_memory_utilization=0.85,
	        max_model_len=8192,  # raise for longer inputs; the model supports up to 32K
	        dtype="bfloat16",
	        max_logprobs=2,
	        max_num_batched_tokens=262144,
	    )
	    sampling_params = SamplingParams(
	        temperature=0,
	        max_tokens=1,
	        logits_processors=[logits_processor],
	    )
	    prompts = format_prompts(query, instruction, documents)

	    outputs = model.generate(prompts, sampling_params, use_tqdm=False)

	    # Extract scores and create results
	    results = []
	    for i, output in enumerate(outputs):
	        # Undo the logits-processor encoding: token id -> uint16 bits -> bfloat16 score.
	        score = (
	            torch.tensor([output.outputs[0].token_ids[0]], dtype=torch.uint16)
	            .view(torch.bfloat16)
	            .item()
	        )
	        results.append((score, i, documents[i]))

	    # Sort by score (descending)
	    results = sorted(results, key=lambda x: x[0], reverse=True)

	    print(f"Query: {query}")
	    print(f"Instruction: {instruction}")
	    for score, doc_id, doc in results:
	        print(f"Score: {score:.4f} | Doc: {doc}")


	# Example usage
	if __name__ == "__main__":
	    model_path = "ContextualAI/ctxl-rerank-v2-instruct-multilingual-2b-nvfp4"
	    query = "What are the health benefits of exercise?"
	    instruction = "Prioritize recent medical research"
	    documents = [
	        "Regular exercise reduces risk of heart disease and improves mental health.",
	        "A 2024 study shows exercise enhances cognitive function in older adults.",
	        "Ancient Greeks valued physical fitness for military training.",
	    ]

	    infer_w_vllm(model_path, query, instruction, documents)
	```
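
The logits processor above passes the score out through the sampled token id: it reinterprets a bfloat16 logit as a 16-bit unsigned integer, and `infer_w_vllm` reverses that step on the returned token id. The following is a minimal standard-library sketch of that round trip, included only to illustrate the encoding. The helper names are hypothetical and are not part of the model's code. It also assumes truncation to bfloat16 (the top 16 bits of a float32) where PyTorch rounds, which makes no difference for values that are already exactly representable, such as the 0.83984375 that prints as `0.8398` in the example output.

```python
import struct


def bf16_to_token_id(value: float) -> int:
    """Return the 16-bit pattern of `value` truncated to bfloat16 (hypothetical helper)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", value))
    return bits32 >> 16  # bfloat16 is the upper half of a float32


def token_id_to_bf16(token_id: int) -> float:
    """Reinterpret a 16-bit token id as a bfloat16 value (hypothetical helper)."""
    (value,) = struct.unpack(">f", struct.pack(">I", (token_id & 0xFFFF) << 16))
    return value


token_id = bf16_to_token_id(0.83984375)
print(token_id)                    # 16215 (0x3F57)
print(token_id_to_bf16(token_id))  # 0.83984375
```

Because the encoding uses all 16 bits, the recovered score has bfloat16 precision, which is why the example scores are multiples of small powers of two.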

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{ctxl_rerank_v2_instruct_multilingual,
	    title={Contextual AI Reranker v2},
	    author={Halal, George and Agrawal, Sheshansh},
	    year={2025},
	    url={https://contextual.ai/blog/rerank-v2},
	}
	```

	## License

	Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

	## Contact

	For questions or issues, please open an issue on the model repository.