---
language:
- en
- code
library_name: sentence-transformers
pipeline_tag: text-classification
tags:
- cross-encoder
- reranker
- code-search
- onnx
- bert
datasets:
- code_search_net
license: apache-2.0
---

# code-reranker-v1

A cross-encoder reranker for code search, trained on CodeSearchNet pairs. **Experimental — does not improve retrieval in our benchmarks.** Published for reproducibility.

## Status: Negative Result

This reranker **regresses** retrieval quality on our hard eval (55 confusable function pairs):

| Config | Recall@1 | Delta |
|--------|----------|-------|
| No reranker | 90.9% | — |
| Web-trained cross-encoder | 80.0% | **-10.9pp** |
| **This model (code-trained)** | **9.1%** | **-81.8pp** |

**Root cause:** Trained with random same-language negatives, which are too easy for cross-encoders. The model learns surface-level language patterns instead of semantic code discrimination. A V2 with BM25 hard negatives may fix this.
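The planned V2 fix can be sketched as follows. This is a minimal, self-contained BM25 hard-negative miner, not the actual training pipeline; the tokenizer, the `k1`/`b` defaults, and the function names are illustrative assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Crude identifier-splitting tokenizer; a real pipeline would use a
    # code-aware tokenizer. Purely illustrative.
    return re.findall(r"[a-z0-9]+", text.lower())

class BM25:
    """Minimal Okapi BM25 over a small in-memory corpus."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = [tokenize(d) for d in docs]
        self.k1, self.b = k1, b
        self.n = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.n
        # Document frequency per term.
        self.df = Counter(t for d in self.docs for t in set(d))

    def score(self, query, idx):
        doc = self.docs[idx]
        tf = Counter(doc)
        s = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            idf = math.log(1 + (self.n - self.df[t] + 0.5) / (self.df[t] + 0.5))
            denom = tf[t] + self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
            s += idf * tf[t] * (self.k1 + 1) / denom
        return s

def mine_hard_negatives(query, positive_idx, bm25, top_k=3):
    """Top BM25-scoring documents that are NOT the positive: lexically
    similar but semantically wrong -- the hard negatives V1 was missing."""
    ranked = sorted(range(bm25.n), key=lambda i: bm25.score(query, i),
                    reverse=True)
    return [i for i in ranked if i != positive_idx][:top_k]
```

The point is that lexically close non-matches force the cross-encoder to discriminate on semantics rather than surface language features.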

## Training

- **Architecture:** Cross-encoder (BERT-base)
- **Data:** 50,000 CodeSearchNet pairs + 7,500 docstring pairs
- **Epochs:** 3
- **Negatives:** Random same-language (this was the mistake)
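For context, V1 pair construction looked roughly like this. This is a hedged reconstruction from the bullets above, not the actual training script; the `make_pairs` helper and the `query`/`code`/`lang` field names are invented for illustration.

```python
import random

def make_pairs(examples, seed=0):
    """Build (query, code, label) triples with RANDOM same-language
    negatives -- the flawed V1 scheme. Each positive gets one negative
    drawn uniformly from other examples in the same language."""
    rng = random.Random(seed)
    by_lang = {}
    for ex in examples:
        by_lang.setdefault(ex["lang"], []).append(ex)
    pairs = []
    for ex in examples:
        pairs.append((ex["query"], ex["code"], 1))  # positive
        pool = [o for o in by_lang[ex["lang"]] if o is not ex]
        if pool:
            # Random draw: the negative is usually trivially unrelated,
            # so the model never has to learn fine-grained discrimination.
            neg = rng.choice(pool)
            pairs.append((ex["query"], neg["code"], 0))
    return pairs
```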

## Usage (if you want to experiment)

```bash
# In cqs — NOT default, opt-in only
CQS_RERANKER_MODEL=jamie8johnson/code-reranker-v1 cqs "query" --rerank
```
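In Python, applying the model to reorder retrieval hits looks roughly like this. Loading via `CrossEncoder` and scoring via `predict` is the standard sentence-transformers API; the `rerank` helper itself is an illustrative sketch, not part of any package.

```python
def rerank(query, candidates, score_fn, top_k=None):
    """Reorder candidates by descending cross-encoder score.
    score_fn takes a list of (query, text) pairs and returns scores."""
    scores = score_fn([(query, c) for c in candidates])
    order = sorted(range(len(candidates)),
                   key=lambda i: scores[i], reverse=True)
    ranked = [candidates[i] for i in order]
    return ranked[:top_k] if top_k else ranked

# With the actual model (requires sentence-transformers and a download):
#   from sentence_transformers import CrossEncoder
#   model = CrossEncoder("jamie8johnson/code-reranker-v1")
#   results = rerank("parse json", hits, model.predict)
```

Given the benchmark results above, expect reranked output to be *worse* than the retrieval order; this is provided only for reproducing the negative result.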

## License

Apache 2.0.