
CoREB-Reranker

CoREB-Reranker is a code reranker fine-tuned from Qwen3-Reranker-4B via LoRA on a mixed reranker corpus. Among the rerankers we evaluate, it is the only one that improves nDCG@10 on all three code search tasks (text-to-code, code-to-text, and code-to-code).

Highlights

  • Fine-tuned from Qwen3-Reranker-4B using LoRA (rank=16, alpha=16) on 3.1M training samples from a mixed corpus
  • Evaluated on CoREB v202603, which is problem-disjoint from the training set, so there is no data leakage
  • Achieves a positive reranking delta on all three tasks; every off-the-shelf reranker tested lowers nDCG@10 on at least one task

Reranking Results (nDCG@10 Delta %)

Reranking delta on CoREB v202603, using C2LLM-7B as the first-stage retriever:

Reranker                 Text-to-Code   Code-to-Text   Code-to-Code
Jina Reranker v2                 -8.3          -22.4           -8.8
Jina Reranker v3                 -2.2           -5.0           -0.1
Qwen3-Reranker-0.6B              -0.6           -8.2           -2.3
Qwen3-Reranker-4B                -0.1           -3.2           +3.3
CoREB-Reranker (ours)            +1.1           +0.8           +5.1
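
For readers who want to see how a number like those in the table can arise, here is a minimal sketch of a reranking delta for a single query. It assumes binary relevance labels, assumes the ideal ranking is built only from the candidates in the list, and assumes the delta is the difference in nDCG@10 expressed in percentage points. This section does not say whether CoREB reports absolute points or a relative change, so treat that definition and the function names as illustrative rather than as the benchmark's own code.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k positions."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """nDCG@k, with the ideal ordering taken from the same candidate list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def reranking_delta(first_stage: Sequence[float], reranked: Sequence[float], k: int = 10) -> float:
    """Assumed definition: nDCG@k after reranking minus nDCG@k before, in points."""
    return 100.0 * (ndcg_at_k(reranked, k) - ndcg_at_k(first_stage, k))


# Relevance labels in ranked order: the reranker moves the relevant item from rank 2 to rank 1.
delta = reranking_delta(first_stage=[0, 1, 0, 0], reranked=[1, 0, 0, 0])
print(f"{delta:+.1f}")
```

A positive value means the reranker improved the first-stage ordering for that query; a negative value means it made the ordering worse.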

Training Details

  • Base model: Qwen/Qwen3-Reranker-4B
  • Method: LoRA (rank=16, alpha=16, dropout=0.05)
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Training data: A mixed reranker corpus consisting of CoREB v202602, CodeSearchNet (code-to-code, code-to-text, text-to-code), APPS, CosQA, and CodeFeedback (single-turn and multi-turn). Each record is normalized into binary reranking examples (instruction, query, document, yes/no). Positives are duplicated twice; one easy negative and one hard negative are sampled per record.
  • Evaluation data: CoREB v202603 (problem-disjoint from CoREB v202602 training split; covers a different contest time window)
  • Training samples: ~3.1M binary reranking examples across text-to-code, code-to-text, and code-to-code tasks
  • Top-k retrieval for reranking: 128
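
The normalization described under "Training data" can be sketched roughly as follows. This is a guess at the shape of the data, not the authors' preprocessing code: it reads "duplicated twice" as two copies of each positive, and it takes the easy and hard negative pools as given, since how they are mined is not described here. `RerankExample` and `build_examples` are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class RerankExample:
    instruction: str
    query: str
    document: str
    label: str  # "yes" for a positive, "no" for a negative


def build_examples(
    instruction: str,
    query: str,
    positive: str,
    easy_negatives: Sequence[str],
    hard_negatives: Sequence[str],
    rng: random.Random,
    positive_copies: int = 2,  # assumption: "duplicated twice" means two copies
) -> List[RerankExample]:
    """Turn one record into binary reranking examples."""
    examples = [
        RerankExample(instruction, query, positive, "yes")
        for _ in range(positive_copies)
    ]
    # One easy and one hard negative are sampled per record.
    examples.append(RerankExample(instruction, query, rng.choice(easy_negatives), "no"))
    examples.append(RerankExample(instruction, query, rng.choice(hard_negatives), "no"))
    return examples


examples = build_examples(
    instruction="Given a natural language programming task, retrieve code that correctly solves or implements the task.",
    query="reverse a string",
    positive="def rev(s): return s[::-1]",
    easy_negatives=["def add(a, b): return a + b"],
    hard_negatives=["def rev(s): return sorted(s)"],
    rng=random.Random(0),
)
print([e.label for e in examples])
```

Under this reading each record yields two "yes" and two "no" examples, which keeps the labels balanced.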

Usage

CoREB-Reranker follows the same usage pattern as Qwen3-Reranker. The instruction is task-specific; use the one that matches your retrieval task:

from enum import Enum
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

class Task(Enum):
    TEXT_TO_CODE = "Given a natural language programming task, retrieve code that correctly solves or implements the task."
    CODE_TO_CODE = "Given a code snippet, retrieve code that is semantically equivalent or solves the same task."
    CODE_TO_TEXT = "Given a code snippet, retrieve the natural language description or problem statement that best matches the code."

model_id = "hq-bench/coreb-code-reranker"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, trust_remote_code=True)
model.eval()

PREFIX = '<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n'
SUFFIX = "<|im_end|>\n<|im_start|>assistant\n"
yes_id = tokenizer.convert_tokens_to_ids("yes")
no_id = tokenizer.convert_tokens_to_ids("no")

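def score(query: str, document: str, task: Task) -> float:
    # Build the chat-style prompt; the model answers "yes" or "no" as its next token.
    prompt = f"{PREFIX}<Instruct>: {task.value}\n<Query>: {query}\n<Document>: {document}{SUFFIX}"
    # Inputs longer than 4096 tokens are truncated, which can cut off part of the
    # prompt template; shorten very long documents before scoring.
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=4096).to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1, :]
    # Relevance score: logit margin of "yes" over "no" (higher means more relevant).
    return (logits[yes_id] - logits[no_id]).item()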

# Text-to-Code: natural language query -> code
print(score(
    query="binary search implementation",
    document="def binary_search(arr, target):\n    lo, hi = 0, len(arr) - 1\n    ...",
    task=Task.TEXT_TO_CODE,
))

# Code-to-Code: code -> semantically equivalent code
print(score(
    query="def binary_search(arr, target): ...",
    document="int binarySearch(int[] arr, int target) { ... }",
    task=Task.CODE_TO_CODE,
))

# Code-to-Text: code -> problem description
print(score(
    query="def binary_search(arr, target): ...",
    document="Find the index of a target value in a sorted array using binary search.",
    task=Task.CODE_TO_TEXT,
))
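
The `score` function above returns a raw logit margin, which is enough for ranking. If a value in [0, 1] is more convenient, for example for thresholding, the margin can be passed through a logistic function; this equals the softmax probability of "yes" when only the "yes" and "no" logits are considered. The helper below is an illustrative addition, not part of the model's API.

```python
import math


def margin_to_probability(margin: float) -> float:
    """Map a yes-minus-no logit margin to P(yes) over the two tokens."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    e = math.exp(margin)
    return e / (1.0 + e)


print(margin_to_probability(0.0))  # equal logits
```

The mapping is monotonic, so it does not change the order produced by the raw margins.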

For batch reranking with the CoREB evaluation pipeline, see the CoREB repository.
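
Outside that pipeline, reranking a first-stage candidate list with a pairwise scorer can be sketched as below. This is not the repository's implementation: `rerank` and `score_fn` are hypothetical names, and `score_fn` stands for any callable that maps a (query, document) pair to a relevance score, such as a thin wrapper around `score` with a fixed task. The default `top_k=128` mirrors the value listed under Training Details.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 128,
) -> List[Tuple[str, float]]:
    """Score the first top_k first-stage candidates and sort them by score."""
    pool = list(candidates[:top_k])  # candidates are assumed to be in first-stage order
    scored = [(doc, score_fn(query, doc)) for doc in pool]
    # sorted() is stable, so ties keep their first-stage order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy scorer for illustration only: counts words shared by query and document.
def word_overlap(query: str, document: str) -> float:
    return float(len(set(query.split()) & set(document.split())))


ranked = rerank(
    "binary search",
    ["sort a list", "binary search tree", "binary heap"],
    word_overlap,
    top_k=2,
)
print(ranked)
```

With the real model, `score_fn` could be something like `lambda q, d: score(q, d, Task.TEXT_TO_CODE)`. Scoring pairs one at a time is simple but slow, so a batched forward pass is preferable for large candidate sets.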

Citation

@article{xue2026coreb,
  title={Beyond Retrieval: A Multitask Benchmark and Reranker for Code Search},
  author={Xue, Siqiao and Liao, Zihan and Qin, Jin and Zhang, Ziyin and Mu, Yixiang and Zhou, Fan and Yu, Hang},
  journal={arXiv preprint arXiv:2605.04615},
  year={2026},
  url={https://arxiv.org/abs/2605.04615}
}