---
language:
- en
- code
library_name: sentence-transformers
pipeline_tag: text-classification
tags:
- cross-encoder
- reranker
- code-search
- onnx
- bert
datasets:
- code_search_net
license: apache-2.0
---
# code-reranker-v1
A cross-encoder reranker for code search, trained on CodeSearchNet pairs. **Experimental — does not improve retrieval in our benchmarks.** Published for reproducibility.
## Status: Negative Result
This reranker **regresses** retrieval quality on our hard eval (55 confusable function pairs):
| Config | Recall@1 | Delta |
|--------|----------|-------|
| No reranker | 90.9% | — |
| Web-trained cross-encoder | 80.0% | **-10.9pp** |
| **This model (code-trained)** | **9.1%** | **-81.8pp** |
**Root cause:** Trained with random same-language negatives, which are too easy for cross-encoders. The model learns surface-level language patterns instead of semantic code discrimination. A V2 with BM25 hard negatives may fix this.
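To make the proposed V2 fix concrete, here is a minimal sketch of hard-negative mining. A simple token-overlap scorer stands in for BM25 (a real miner would use a proper BM25 implementation or a search index); the corpus, query, and function names are illustrative, not from the actual training pipeline.

```python
import re
from collections import Counter

def overlap_score(query: str, doc: str) -> float:
    """Crude lexical similarity: fraction of query tokens found in doc.
    Stand-in for BM25 scoring (no IDF or length normalization)."""
    q = Counter(re.findall(r"\w+", query.lower()))
    d = set(re.findall(r"\w+", doc.lower()))
    hits = sum(c for tok, c in q.items() if tok in d)
    return hits / max(sum(q.values()), 1)

def mine_hard_negatives(query: str, positive: str, corpus: list, k: int = 2) -> list:
    """Return the k corpus snippets most lexically similar to the query,
    excluding the true positive. These are 'hard' because they share
    surface tokens with the query but are not the right answer."""
    candidates = [c for c in corpus if c != positive]
    candidates.sort(key=lambda c: overlap_score(query, c), reverse=True)
    return candidates[:k]

corpus = [
    "def parse_json(path): return json.load(open(path))",
    "def parse_yaml(path): return yaml.safe_load(open(path))",
    "def quicksort(xs): ...",
    "def load_json_config(path): return json.loads(open(path).read())",
]
query = "load a json config file"
positive = corpus[3]
negatives = mine_hard_negatives(query, positive, corpus)
print(negatives)
```

The top-ranked negative is the snippet that shares `json` and `load` with the query, which is exactly the kind of confusable pair random sampling almost never produces.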
## Training
- **Architecture:** Cross-encoder (BERT-base)
- **Data:** 50,000 CodeSearchNet pairs + 7,500 docstring pairs
- **Epochs:** 3
- **Negatives:** Random same-language (this was the mistake)
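The negative-sampling mistake is easy to reproduce. This is a hedged sketch of how the pairs were likely constructed (the exact pipeline is not published; the examples and seed are illustrative): each query is paired with its true code (label 1) and one randomly drawn same-language snippet (label 0).

```python
import random

def build_pairs(examples, seed=0):
    """examples: list of (docstring_query, code) tuples from one language.
    Each query gets its true code (label 1) plus one random other code
    (label 0). Random negatives rarely share tokens with the query, so a
    cross-encoder can separate them on surface cues alone -- the failure
    mode described above."""
    rng = random.Random(seed)
    pairs = []
    for i, (query, code) in enumerate(examples):
        pairs.append((query, code, 1))
        j = rng.randrange(len(examples) - 1)
        if j >= i:
            j += 1  # never pick the positive as its own negative
        pairs.append((query, examples[j][1], 0))
    return pairs

examples = [
    ("reverse a string", "def rev(s): return s[::-1]"),
    ("sum a list", "def total(xs): return sum(xs)"),
    ("read file lines", "def lines(p): return open(p).read().splitlines()"),
]
pairs = build_pairs(examples)
```

A V2 would replace the `rng.randrange` draw with the hard negatives mined above.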
## Usage (if you want to experiment)
```bash
# In cqs — NOT default, opt-in only
CQS_RERANKER_MODEL=jamie8johnson/code-reranker-v1 cqs "query" --rerank
```
## License
Apache 2.0.