How to use jamie8johnson/code-reranker-v1 with sentence-transformers:

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("jamie8johnson/code-reranker-v1")

query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)
```
Add model card documenting negative result
README.md
ADDED
---
language:
- en
- code
library_name: sentence-transformers
pipeline_tag: text-classification
tags:
- cross-encoder
- reranker
- code-search
- onnx
- bert
datasets:
- code_search_net
license: apache-2.0
---

# code-reranker-v1

A cross-encoder reranker for code search, trained on CodeSearchNet pairs. **Experimental: does not improve retrieval in our benchmarks.** Published for reproducibility.

## Status: Negative Result

This reranker **regresses** retrieval quality on our hard eval (55 confusable function pairs):

| Config | Recall@1 | Delta |
|--------|----------|-------|
| No reranker | 90.9% | n/a |
| Web-trained cross-encoder | 80.0% | **-10.9pp** |
| **This model (code-trained)** | **9.1%** | **-81.8pp** |
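
For readers who want to check the Delta column, here is a minimal sketch of how Recall@1 and a percentage-point delta can be computed. The `recall_at_1` helper and the toy rankings are hypothetical illustrations, not the evaluation harness or data behind the table; only the three reported percentages come from the table.

```python
def recall_at_1(ranked_ids, gold_ids):
    """Fraction of queries whose top-ranked candidate is the gold answer."""
    hits = sum(
        1 for ranked, gold in zip(ranked_ids, gold_ids)
        if ranked and ranked[0] == gold
    )
    return hits / len(gold_ids)


# Hypothetical toy rankings (NOT the real eval data): 3 queries, 2 correct at rank 1.
toy_rankings = [["f_a", "f_b"], ["f_d", "f_c"], ["f_e", "f_f"]]
toy_gold = ["f_a", "f_c", "f_e"]
toy_recall = recall_at_1(toy_rankings, toy_gold)

# Recall@1 values copied from the table, in percent.
reported = {
    "No reranker": 90.9,
    "Web-trained cross-encoder": 80.0,
    "This model (code-trained)": 9.1,
}
baseline = reported["No reranker"]

# Delta in percentage points (pp) relative to the no-reranker baseline.
deltas_pp = {name: round(value - baseline, 1) for name, value in reported.items()}

print(toy_recall)
print(deltas_pp)
```

The deltas are plain differences of percentages, so they are expressed in percentage points rather than as relative changes.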

**Root cause:** The model was trained with random same-language negatives, which are too easy for a cross-encoder. As a result it learns surface-level language patterns instead of semantic code discrimination. A V2 trained with BM25 hard negatives may fix this.
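
To make the idea of BM25 hard negatives concrete, the following is a minimal pure-Python sketch, not the planned V2 pipeline. The `BM25` class, the `mine_hard_negatives` helper, the tokenizer, the parameters `k1` and `b`, and the four-snippet corpus are all assumptions chosen for illustration. The point is that negatives are picked because they share vocabulary with the query, so the cross-encoder has to look past surface overlap.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer for illustration; splits identifiers on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Small Okapi BM25 scorer over an in-memory corpus (illustrative only)."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        tokens = [tokenize(d) for d in docs]
        self.doc_len = [len(t) for t in tokens]
        self.avg_len = sum(self.doc_len) / len(docs)
        self.tf = [Counter(t) for t in tokens]
        df = Counter(term for t in tokens for term in set(t))
        n = len(docs)
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()
        }

    def score(self, query, i):
        total = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * self.doc_len[i] / self.avg_len)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total


def mine_hard_negatives(query, positive_idx, bm25, n_neg=2):
    """Return indices of the highest-scoring documents that are not the positive."""
    scored = [
        (bm25.score(query, i), i) for i in range(len(bm25.tf)) if i != positive_idx
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:n_neg]]


# Hypothetical corpus; index 0 is the true match for the query.
corpus = [
    "def read_json(path): return json.load(open(path))",
    "def write_json(path, obj): json.dump(obj, open(path, 'w'))",
    "def read_csv(path): return list(csv.reader(open(path)))",
    "def add(a, b): return a + b",
]
query = "read json file from path"

bm25 = BM25(corpus)
hard_negatives = mine_hard_negatives(query, positive_idx=0, bm25=bm25, n_neg=2)
print(hard_negatives)
```

In this toy corpus the lexically similar `write_json` and `read_csv` helpers are selected, while the unrelated `add` function, the kind of negative random sampling tends to produce, is not.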

## Training

- **Architecture:** Cross-encoder (BERT-base)
- **Data:** 50,000 CodeSearchNet pairs + 7,500 docstring pairs
- **Epochs:** 3
- **Negatives:** Random same-language (this was the mistake)
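
As an illustration of the negative-sampling choice listed above, here is a rough sketch of random same-language negative sampling. It is not the actual training script: the `sample_random_negatives` function, the record fields, the one-negative-per-positive ratio, and the toy pairs are assumptions, since the card does not specify them.

```python
import random


def sample_random_negatives(pairs, seed=0):
    """Build (query, code, label) examples with one random same-language negative each.

    `pairs` is a list of dicts with hypothetical keys: query, code, language.
    """
    rng = random.Random(seed)

    # Group pair indices by language so negatives come from the same language.
    by_language = {}
    for i, pair in enumerate(pairs):
        by_language.setdefault(pair["language"], []).append(i)

    examples = []
    for i, pair in enumerate(pairs):
        examples.append((pair["query"], pair["code"], 1))
        candidates = [j for j in by_language[pair["language"]] if j != i]
        if candidates:  # a language with a single pair yields no negative
            j = rng.choice(candidates)
            examples.append((pair["query"], pairs[j]["code"], 0))
    return examples


# Hypothetical toy data, not drawn from CodeSearchNet.
toy_pairs = [
    {"query": "read a json file", "code": "def read_json(p): ...", "language": "python"},
    {"query": "sum two numbers", "code": "def add(a, b): ...", "language": "python"},
    {"query": "reverse a list", "code": "def rev(xs): ...", "language": "python"},
    {"query": "open a tcp socket", "code": "func dial(addr string) {}", "language": "go"},
]

examples = sample_random_negatives(toy_pairs)
for query, code, label in examples:
    print(label, query, "->", code)
```

Because the negative is drawn uniformly, it usually has nothing in common with the query beyond its language, which is consistent with the root cause described in the Status section.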

## Usage (if you want to experiment)

```bash
# In cqs: NOT default, opt-in only
CQS_RERANKER_MODEL=jamie8johnson/code-reranker-v1 cqs "query" --rerank
```
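
For experimenting outside cqs, the sketch below shows the general shape of a rerank step: score each (query, candidate) pair, then sort candidates by score. It does not describe how cqs implements `--rerank`. The `rerank` function and the `token_overlap_scorer` stand-in are hypothetical; with sentence-transformers installed, `CrossEncoder("jamie8johnson/code-reranker-v1").predict` could be passed as `score_pairs` instead, since it accepts a list of pairs and returns one score per pair.

```python
def rerank(query, candidates, score_pairs, top_k=None):
    """Re-order first-stage candidates by a pairwise relevance score, best first."""
    scores = score_pairs([(query, candidate) for candidate in candidates])
    # sorted() is stable, so ties keep their first-stage order.
    order = sorted(range(len(candidates)), key=lambda i: -scores[i])
    ranked = [(candidates[i], float(scores[i])) for i in order]
    return ranked if top_k is None else ranked[:top_k]


def token_overlap_scorer(pairs):
    # Stand-in scorer so the sketch runs offline; NOT a cross-encoder.
    scores = []
    for query, code in pairs:
        query_tokens = set(query.lower().split())
        code_tokens = set(code.lower().replace("_", " ").replace("(", " ").split())
        scores.append(len(query_tokens & code_tokens))
    return scores


# Hypothetical first-stage retrieval results.
candidates = [
    "def add(a, b)",
    "def parse_json_config(path)",
    "def parse_args(argv)",
]
ranked = rerank("parse json config", candidates, token_overlap_scorer)
for code, score in ranked:
    print(score, code)
```

Given the benchmark results above, compare the reranked order against the unreranked order on your own queries before relying on it.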

## License

Apache 2.0.