---
library_name: transformers
license: cc-by-nc-sa-4.0
pipeline_tag: text-ranking
---

<div align="center">

# Contextual AI Reranker v2 2B-NVFP4

<img src="Contextual_AI_Brand_Mark_Dark.png" width="10%" alt="Contextual_AI"/>

[Blog Post](https://contextual.ai/blog/rerank-v2)
[Hugging Face Collection](https://huggingface.co/collections/ContextualAI/contextual-ai-reranker-v2)

</div>

<hr>

## Highlights

Contextual AI's reranker is the **first instruction-following reranker** capable of handling retrieval conflicts and ranking with custom instructions (e.g., prioritizing recent information). It achieves state-of-the-art performance on BEIR and sits on the cost/performance Pareto frontier across:

- Instruction following
- Question answering
- Multilinguality (100+ languages)
- Product search & recommendation
- Real-world use cases

<p align="center">
<img src="main_benchmark.png" width="1200"/>
</p>

For detailed benchmarks, see our [blog post](https://contextual.ai/blog/rerank-v2).

## Overview

- **Model Type**: Text reranking
- **Supported Languages**: 100+
- **Parameters**: 2B
- **Precision**: NVFP4 (4-bit floating point)
- **Context Length**: up to 32K tokens

## When to Use This Model

Use this reranker when you need to:

- Rerank retrieved documents with custom instructions
- Handle conflicting information in retrieval results
- Prioritize documents by recency or other criteria
- Support multilingual search (100+ languages)
- Process long contexts (up to 32K tokens)
- **Reduce memory footprint with 4-bit floating-point (NVFP4) precision**

## Quickstart

### Basic Usage

```python
# Requires vllm==0.10.0 for NVFP4 support.
# infer_w_vllm is defined in the full example under "vLLM Usage" below.

model_path = "ContextualAI/ctxl-rerank-v2-instruct-multilingual-2b-nvfp4"

query = "What are the health benefits of exercise?"
instruction = "Prioritize recent medical research"
documents = [
    "Regular exercise reduces risk of heart disease and improves mental health.",
    "A 2024 study shows exercise enhances cognitive function in older adults.",
    "Ancient Greeks valued physical fitness for military training."
]

infer_w_vllm(model_path, query, instruction, documents)
```

**Expected Output:**

> ⚠️ **Warning:** These scores were produced with the **BF16** model. Running the same query with the **NVFP4** model may give slightly different scores.

```
Query: What are the health benefits of exercise?
Instruction: Prioritize recent medical research
Score: 0.8398 | Doc: A 2024 study shows exercise enhances cognitive function in older adults.
Score: -2.5469 | Doc: Regular exercise reduces risk of heart disease and improves mental health.
Score: -9.3750 | Doc: Ancient Greeks valued physical fitness for military training.
```
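
The scores are raw model outputs rather than probabilities (note the negative values above), so the sketch below uses them only for ordering. A common next step in a retrieval pipeline is to keep just the k best documents before passing them to a generator. It assumes results are `(score, doc_index, doc)` tuples, the shape `infer_w_vllm` builds internally; `top_k_documents` is a hypothetical helper, not part of vLLM or this model's API.

```python
import heapq


def top_k_documents(results, k):
    """Return the k highest-scoring (score, doc_index, doc) tuples, best first.

    Hypothetical helper: heapq.nlargest avoids fully sorting long candidate lists.
    """
    return heapq.nlargest(k, results, key=lambda r: r[0])


# Scores copied from the BF16 expected output above; doc_index is the position in `documents`.
results = [
    (-2.5469, 0, "Regular exercise reduces risk of heart disease and improves mental health."),
    (0.8398, 1, "A 2024 study shows exercise enhances cognitive function in older adults."),
    (-9.3750, 2, "Ancient Greeks valued physical fitness for military training."),
]

for score, doc_index, doc in top_k_documents(results, k=2):
    print(f"{score:.4f} | #{doc_index} | {doc}")
```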

### vLLM Usage

Requires `vllm==0.10.0` for NVFP4 support.

```python
import os

# Must be set before importing vllm: the V1 engine does not support
# custom logits processors yet, so this example uses the V0 engine.
os.environ["VLLM_USE_V1"] = "0"

import torch
from vllm import LLM, SamplingParams


def logits_processor(_, scores):
    """Custom logits processor for vLLM reranking.

    The relevance score is the logit of token id 0. Its 16 bfloat16 bits are
    reinterpreted as an unsigned integer and used as a token id; every other
    token is masked to -inf, so greedy decoding emits exactly that id.
    Assumes bfloat16 logits (see dtype="bfloat16" below).
    """
    index = scores[0].view(torch.uint16)
    scores = torch.full_like(scores, float("-inf"))
    scores[index] = 1
    return scores


def format_prompts(query: str, instruction: str, documents: list[str]) -> list[str]:
    """Format query and documents into prompts for reranking."""
    # Append the instruction after the query; None or "" adds nothing.
    suffix = f" {instruction}" if instruction else ""
    prompts = []
    for doc in documents:
        prompt = f"Check whether a given document contains information helpful to answer the query.\n<Document> {doc}\n<Query> {query}{suffix} ??"
        prompts.append(prompt)
    return prompts


def infer_w_vllm(model_path: str, query: str, instruction: str, documents: list[str]):
    """Score each document for the query, print them best-first, and return the results.

    Note: a new LLM (and its weights) is loaded on every call.
    """
    model = LLM(
        model=model_path,
        gpu_memory_utilization=0.85,
        max_model_len=8192,  # can be raised (up to the 32K context) for longer documents
        dtype="bfloat16",
        max_logprobs=2,
        max_num_batched_tokens=262144,
    )
    sampling_params = SamplingParams(
        temperature=0,  # greedy decoding picks the single unmasked token
        max_tokens=1,  # one generated token carries the score
        logits_processors=[logits_processor],
    )
    prompts = format_prompts(query, instruction, documents)

    outputs = model.generate(prompts, sampling_params, use_tqdm=False)

    # Decode each generated token id back into the bfloat16 score it encodes
    results = []
    for i, output in enumerate(outputs):
        score = (
            torch.tensor([output.outputs[0].token_ids[0]], dtype=torch.uint16)
            .view(torch.bfloat16)
            .item()
        )
        results.append((score, i, documents[i]))

    # Sort by score (descending)
    results = sorted(results, key=lambda x: x[0], reverse=True)

    print(f"Query: {query}")
    print(f"Instruction: {instruction}")
    for score, doc_id, doc in results:
        print(f"Score: {score:.4f} | Doc: {doc}")
    return results


# Example usage
if __name__ == "__main__":
    model_path = "ContextualAI/ctxl-rerank-v2-instruct-multilingual-2b-nvfp4"
    query = "What are the health benefits of exercise?"
    instruction = "Prioritize recent medical research"
    documents = [
        "Regular exercise reduces risk of heart disease and improves mental health.",
        "A 2024 study shows exercise enhances cognitive function in older adults.",
        "Ancient Greeks valued physical fitness for military training."
    ]

    infer_w_vllm(model_path, query, instruction, documents)
```
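
The logits processor returns the score through the generated token id rather than through text: it reinterprets the 16 bits of the bfloat16 logit at index 0 as an unsigned integer, masks every other token so greedy decoding must emit that id, and `infer_w_vllm` reinterprets the id as bfloat16 again. The round trip can be sketched in pure Python, assuming bfloat16 is the upper half of an IEEE-754 float32; `bf16_bits` and `from_bf16_bits` are illustrative names, whereas the code above does the same reinterpretation with PyTorch's `Tensor.view(dtype)`. The rounding step only matters when starting from a Python float; inside the model the logit is already bfloat16, so the reinterpretation is exact. Negative scores have the sign bit set and map to ids of 32,768 or more, so the trick relies on the vocabulary being large enough for those ids to be valid.

```python
import struct


def bf16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern of x, rounding to nearest even.

    bfloat16 is the upper half of an IEEE-754 float32 (sign, 8 exponent bits,
    7 mantissa bits). NaN handling is omitted in this sketch.
    """
    (f32,) = struct.unpack("<I", struct.pack("<f", x))
    rounding_bias = 0x7FFF + ((f32 >> 16) & 1)  # round half to even
    return ((f32 + rounding_bias) >> 16) & 0xFFFF


def from_bf16_bits(bits: int) -> float:
    """Interpret a 16-bit pattern (such as a generated token id) as a bfloat16 value."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))
    return value


for score in (0.8398, -2.546875, -9.375):
    token_id = bf16_bits(score)  # the only token the logits processor leaves unmasked
    print(f"{score:>9} -> token id {token_id:>5} -> score {from_bf16_bits(token_id):.4f}")
```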
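
`infer_w_vllm` constructs a new `LLM` on each call, which reloads the weights, and scores a single query. When reranking candidates for many queries, one option is to create the `LLM` once, send every (query, document) prompt in one `generate` call (vLLM returns outputs in the same order as the input prompts), and regroup the decoded scores per query. The bookkeeping is sketched below under those assumptions: `rerank_many` is a hypothetical helper, and `score_prompts` stands in for `model.generate(prompts, sampling_params)` followed by the bfloat16 decoding in `infer_w_vllm`. The toy scorer exists only so the example runs without a GPU.

```python
from typing import Callable, List, Optional, Tuple

# Same prompt template as format_prompts above.
PROMPT_TEMPLATE = (
    "Check whether a given document contains information helpful to answer the query.\n"
    "<Document> {doc}\n<Query> {query}{suffix} ??"
)

Ranked = List[Tuple[float, int, str]]  # (score, doc_index, doc), best first


def rerank_many(
    requests: List[Tuple[str, Optional[str], List[str]]],
    score_prompts: Callable[[List[str]], List[float]],
) -> List[Ranked]:
    """Score every (query, document) pair in one batch, then regroup per query.

    Each request is (query, instruction or None, documents). `score_prompts` is a
    hypothetical callable that must return one score per prompt, in input order.
    """
    prompts: List[str] = []
    owners: List[Tuple[int, int]] = []  # (request index, document index) per prompt
    for req_idx, (query, instruction, docs) in enumerate(requests):
        suffix = f" {instruction}" if instruction else ""
        for doc_idx, doc in enumerate(docs):
            prompts.append(PROMPT_TEMPLATE.format(doc=doc, query=query, suffix=suffix))
            owners.append((req_idx, doc_idx))

    scores = score_prompts(prompts)  # a single batched call instead of one per query
    grouped: List[Ranked] = [[] for _ in requests]
    for (req_idx, doc_idx), score in zip(owners, scores):
        grouped[req_idx].append((score, doc_idx, requests[req_idx][2][doc_idx]))
    return [sorted(group, key=lambda r: r[0], reverse=True) for group in grouped]


def toy_scorer(prompts: List[str]) -> List[float]:
    """Toy stand-in for the model: longer prompts score higher."""
    return [float(len(p)) for p in prompts]


ranked = rerank_many(
    [
        ("query A", None, ["short", "a much longer document"]),
        ("query B", "Prefer recent work", ["x", "yy", "zzz"]),
    ],
    toy_scorer,
)
for group in ranked:
    print([doc for _, _, doc in group])
```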

## Citation

If you use this model, please cite:

```bibtex
@misc{ctxl_rerank_v2_instruct_multilingual,
  title={Contextual AI Reranker v2},
  author={Halal, George and Agrawal, Sheshansh},
  year={2025},
  url={https://contextual.ai/blog/rerank-v2},
}
```

## License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

## Contact

For questions or issues, please open an issue on the model repository.