---
language:
  - en
  - code
library_name: sentence-transformers
pipeline_tag: text-classification
tags:
  - cross-encoder
  - reranker
  - code-search
  - onnx
  - bert
datasets:
  - code_search_net
license: apache-2.0
---

# code-reranker-v1

A cross-encoder reranker for code search, trained on CodeSearchNet pairs. Experimental: in our benchmarks it degrades retrieval quality rather than improving it. Published for reproducibility.

## Status: Negative Result

This reranker regresses retrieval quality on our hard eval (55 confusable function pairs):

| Config | Recall@1 | Delta |
|---|---|---|
| No reranker | 90.9% | |
| Web-trained cross-encoder | 80.0% | -10.9pp |
| This model (code-trained) | 9.1% | -81.8pp |

**Root cause:** the model was trained with random same-language negatives, which are too easy for cross-encoders. The model learns surface-level language patterns instead of semantic code discrimination. A V2 trained with BM25 hard negatives may fix this.
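
The proposed V2 fix can be sketched as follows. This is a minimal, pure-Python illustration of BM25 hard-negative mining, not the actual training code; the toy corpus, tokenizer, and function names are assumptions for the example.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    # Okapi BM25: higher score = more lexically similar to the query.
    n = len(corpus_tokens)
    avgdl = sum(len(d) for d in corpus_tokens) / n
    df = Counter()
    for doc in corpus_tokens:
        df.update(set(doc))
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores

def mine_hard_negatives(query, positive_idx, corpus, top_k=2):
    # Hard negatives: the highest-BM25 documents that are NOT the positive.
    # These share surface tokens with the query, so the cross-encoder must
    # learn semantics rather than lexical overlap to separate them.
    tokenized = [doc.lower().split() for doc in corpus]
    scores = bm25_scores(query.lower().split(), tokenized)
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if i != positive_idx][:top_k]

corpus = [
    "parse a json string into a dict",     # positive for the query below
    "read a json file and parse it",       # lexically close: a hard negative
    "add two numbers and return the sum",  # lexically distant: an easy negative
]
print(mine_hard_negatives("parse json string", 0, corpus, top_k=1))  # -> [1]
```

The key property: the mined negative shares query terms with the positive, so the only way to separate them is to model what the code actually does.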

## Training

- **Architecture:** Cross-encoder (BERT-base)
- **Data:** 50,000 CodeSearchNet pairs + 7,500 docstring pairs
- **Epochs:** 3
- **Negatives:** Random same-language (this was the mistake)
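
For concreteness, the negative-sampling setup listed above (the one identified as the mistake) looks roughly like this. Field names and the pairing function are illustrative, not the actual training script.

```python
import random

def build_training_pairs(examples, rng):
    # Each example pairs a natural-language query with its matching code.
    # Positives: (query, matching code, 1).
    # Negatives: (query, RANDOM code in the same language, 0) -- the easy
    # setup this card identifies as the root cause of the regression.
    by_lang = {}
    for ex in examples:
        by_lang.setdefault(ex["lang"], []).append(ex)
    pairs = []
    for ex in examples:
        pairs.append((ex["query"], ex["code"], 1))
        pool = [other for other in by_lang[ex["lang"]] if other is not ex]
        if pool:
            neg = rng.choice(pool)
            pairs.append((ex["query"], neg["code"], 0))
    return pairs

examples = [
    {"query": "parse json", "code": "json.loads(s)", "lang": "python"},
    {"query": "sort a list", "code": "sorted(xs)", "lang": "python"},
]
pairs = build_training_pairs(examples, random.Random(0))
print([label for _, _, label in pairs])  # -> [1, 0, 1, 0]
```

Because the random negative usually shares no tokens with the query, the classifier can reach low training loss on surface cues alone, which is consistent with the collapse on confusable pairs reported above.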

## Usage (if you want to experiment)

```bash
# In cqs — NOT default, opt-in only
CQS_RERANKER_MODEL=jamie8johnson/code-reranker-v1 cqs "query" --rerank
```
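
At the Python level, reranking is just rescoring the retriever's candidates and sorting. A minimal sketch; the toy `score_fn` stands in for the model (with sentence-transformers, `CrossEncoder("jamie8johnson/code-reranker-v1").predict` over (query, code) pairs would play this role, assuming the checkpoint loads as a CrossEncoder).

```python
def rerank(query, candidates, score_fn, top_k=None):
    # Score each (query, candidate) pair and return candidates best-first.
    ranked = sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)
    return ranked[:top_k] if top_k is not None else ranked

# Toy scorer: token overlap with the query. In real use this would be the
# cross-encoder's predicted relevance score for the pair.
def score_fn(query, candidate):
    return len(set(query.split()) & set(candidate.split()))

hits = ["add two numbers", "parse a json string", "parse json quickly"]
print(rerank("parse json", hits, score_fn, top_k=2))
```

Note that a reranker can only reorder what the retriever returns; per the eval above, this checkpoint reorders badly, which is why it is opt-in.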

## License

Apache 2.0.