---
language:
- en
- code
library_name: sentence-transformers
pipeline_tag: text-classification
tags:
- cross-encoder
- reranker
- code-search
- onnx
- bert
datasets:
- code_search_net
license: apache-2.0
---

# code-reranker-v1

A cross-encoder reranker for code search, trained on CodeSearchNet pairs. **Experimental — does not improve retrieval in our benchmarks.** Published for reproducibility.

## Status: Negative Result

This reranker **regresses** retrieval quality on our hard eval (55 confusable function pairs):

| Config | Recall@1 | Delta |
|--------|----------|-------|
| No reranker | 90.9% | — |
| Web-trained cross-encoder | 80.0% | **-10.9pp** |
| **This model (code-trained)** | **9.1%** | **-81.8pp** |

**Root cause:** Trained with random same-language negatives, which are too easy for cross-encoders. The model learns surface-level language patterns instead of semantic code discrimination. A V2 with BM25 hard negatives may fix this.

## Training

- **Architecture:** Cross-encoder (BERT-base)
- **Data:** 50,000 CodeSearchNet pairs + 7,500 docstring pairs
- **Epochs:** 3
- **Negatives:** Random same-language (this was the mistake)

## Usage (if you want to experiment)

```bash
# In cqs — NOT default, opt-in only
CQS_RERANKER_MODEL=jamie8johnson/code-reranker-v1 cqs "query" --rerank
```

## License

Apache 2.0.