How to use from the sentence-transformers library
from sentence_transformers import CrossEncoder

# Load the cross-encoder from the Hugging Face Hub
model = CrossEncoder("jamie8johnson/code-reranker-v1")

query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet."
]

# Score every (query, passage) pair; a higher score means the model rates the passage as more relevant
scores = model.predict([(query, passage) for passage in passages])
print(scores)

code-reranker-v1

A cross-encoder reranker for code search, trained on CodeSearchNet pairs. Experimental: it does not improve retrieval in our benchmarks and in fact regresses it. Published for reproducibility.

Status: Negative Result

This reranker regresses retrieval quality on our hard eval (55 confusable function pairs):

| Config | Recall@1 | Delta |
|---|---|---|
| No reranker | 90.9% | n/a |
| Web-trained cross-encoder | 80.0% | -10.9pp |
| This model (code-trained) | 9.1% | -81.8pp |
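
For readers who want to run a comparable measurement, here is a minimal sketch of how Recall@1 and a percentage-point delta can be computed. The function names (`recall_at_1`, `delta_pp`) and the toy rankings are hypothetical and are not taken from our eval harness; the sketch only illustrates the metric itself.

```python
from typing import Dict, List


def recall_at_1(rankings: Dict[str, List[str]], gold: Dict[str, str]) -> float:
    """Fraction of queries whose top-ranked candidate is the gold answer."""
    hits = sum(
        1 for query, ranked in rankings.items()
        if ranked and ranked[0] == gold[query]
    )
    return hits / len(rankings)


def delta_pp(baseline: float, variant: float) -> float:
    """Difference between two rates, expressed in percentage points."""
    return round((variant - baseline) * 100, 1)


# Toy data (hypothetical): three queries, each with a ranked candidate list.
gold = {"q1": "parse_json", "q2": "read_file", "q3": "sort_list"}
no_reranker = {
    "q1": ["parse_json", "parse_yaml"],
    "q2": ["read_file", "write_file"],
    "q3": ["sort_list", "sort_dict"],
}
with_reranker = {
    "q1": ["parse_yaml", "parse_json"],  # confusable pair flipped by the reranker
    "q2": ["read_file", "write_file"],
    "q3": ["sort_dict", "sort_list"],    # confusable pair flipped by the reranker
}

baseline = recall_at_1(no_reranker, gold)
variant = recall_at_1(with_reranker, gold)
print(f"baseline={baseline:.1%} variant={variant:.1%} "
      f"delta={delta_pp(baseline, variant):+.1f}pp")
```

A delta in percentage points is the plain difference between the two rates, which is how the Delta column above should be read.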

Root cause: The model was trained with random same-language negatives, which are too easy for a cross-encoder to separate from the positive. As a result it learns surface-level language patterns instead of semantic code discrimination. A V2 trained with BM25 hard negatives may fix this.
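
To make the BM25 hard-negative idea concrete, here is a rough sketch of mining one hard negative per query. It is not the V2 training pipeline (none exists in this repository); the tokenizer, the `k1`/`b` defaults, the helper names, and the toy corpus are all assumptions chosen for illustration. It uses Okapi BM25 with the non-negative IDF variant, implemented in plain Python so it runs without extra dependencies.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumerics (also splits snake_case names)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(toks) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


def hardest_negative(query: str, docs: List[str], positive_idx: int) -> int:
    """Index of the highest-scoring document that is NOT the known positive."""
    scores = bm25_scores(query, docs)
    candidates = [i for i in range(len(docs)) if i != positive_idx]
    return max(candidates, key=lambda i: scores[i])


# Toy corpus (hypothetical): docs[0] is the positive for the query.
docs = [
    "def read_json_file(path): return json.load(open(path))",
    "def write_json_file(path, data): json.dump(data, open(path, 'w'))",
    "def add(x, y): return x + y",
]
query = "read json file from path"
neg_idx = hardest_negative(query, docs, positive_idx=0)
print(docs[neg_idx])
```

In this toy corpus the lexically similar `write_json_file` snippet is selected over the unrelated `add` snippet. That is the kind of confusable negative random sampling rarely produces.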

Training

  • Architecture: Cross-encoder (BERT-base)
  • Data: 50,000 CodeSearchNet pairs + 7,500 docstring pairs
  • Epochs: 3
  • Negatives: Random same-language (this was the mistake)
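
For reference, random same-language negative sampling can be sketched as below. This is an illustration of the general technique, not the actual training script: the field names (`language`, `docstring`, `code`), the one-negative-per-positive ratio, and the `build_pairs` helper are assumptions.

```python
import random
from typing import Dict, List, Tuple

Example = Dict[str, str]
Pair = Tuple[str, str, int]  # (docstring, code, label)


def build_pairs(examples: List[Example], seed: int = 0) -> List[Pair]:
    """Emit one positive and one random same-language negative per example."""
    rng = random.Random(seed)
    by_language: Dict[str, List[Example]] = {}
    for ex in examples:
        by_language.setdefault(ex["language"], []).append(ex)

    pairs: List[Pair] = []
    for ex in examples:
        # Positive: the docstring paired with its own code.
        pairs.append((ex["docstring"], ex["code"], 1))
        # Negative: code drawn uniformly from another example in the same language.
        others = [o for o in by_language[ex["language"]] if o is not ex]
        if others:
            negative = rng.choice(others)
            pairs.append((ex["docstring"], negative["code"], 0))
    return pairs


# Toy examples (hypothetical), two per language.
examples = [
    {"language": "python", "docstring": "Add two numbers.",
     "code": "def add(a, b): return a + b"},
    {"language": "python", "docstring": "Read a file.",
     "code": "def read(p): return open(p).read()"},
    {"language": "go", "docstring": "Add two ints.",
     "code": "func Add(a, b int) int { return a + b }"},
    {"language": "go", "docstring": "Print hello.",
     "code": "func Hello() { fmt.Println(\"hello\") }"},
]
pairs = build_pairs(examples)
for docstring, code, label in pairs:
    print(label, "|", docstring, "|", code)
```

Because the negative is drawn uniformly, it is usually unrelated to the docstring, so a model can separate it from the positive without understanding what the code does.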

Usage (if you want to experiment)

# In cqs — NOT default, opt-in only
CQS_RERANKER_MODEL=jamie8johnson/code-reranker-v1 cqs "query" --rerank
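
Outside cqs, the same opt-in reranking step can be approximated in Python. The sketch below is an assumption about how one might wire it up, not how cqs does it internally: `rerank` and `overlap_scorer` are hypothetical helpers, and the scorer is a stand-in so the example runs without downloading the model.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rerank(
    query: str,
    candidates: Sequence[str],
    score_pairs: Callable[[List[Tuple[str, str]]], Sequence[float]],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score (query, candidate) pairs and return candidates sorted best-first."""
    pairs = [(query, candidate) for candidate in candidates]
    scores = [float(s) for s in score_pairs(pairs)]
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def overlap_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    """Stand-in scorer (hypothetical): counts words shared by query and candidate."""
    return [
        float(len(set(q.lower().split()) & set(c.lower().split())))
        for q, c in pairs
    ]


candidates = ["open a tcp socket", "parse a json string into a dict"]
results = rerank("parse json string", candidates, overlap_scorer)
for text, score in results:
    print(f"{score:.1f}  {text}")
```

To try the real model, pass `CrossEncoder("jamie8johnson/code-reranker-v1").predict` as `score_pairs` in place of `overlap_scorer`. Given the negative result above, compare the reranked order against the original retrieval order before relying on it.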

License

Apache 2.0.

