Code Reranker MiniLM v1
A fine-tuned cross-encoder reranker for code relevance ranking, trained on TypeScript/Python codebases.
Model Details
- Base model: cross-encoder/ms-marco-MiniLM-L-6-v2
- Training data: 19,810 (query, code snippet) pairs from a real-world TypeScript/Python codebase
- Training: 2 epochs, batch size 8, lr 2e-5
- Hardware: GTX 1660 SUPER (6GB VRAM), CPU training
Benchmarks
| Metric | Original | Fine-tuned |
|---|---|---|
| Accuracy | 88.4% | 98.0% |
| Best Accuracy | 96.2% | 99.0% |
| AUC | 98.64% | 99.95% |
Usage
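from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the fine-tuned tokenizer and cross-encoder from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("trd92/code-reranker-minilm-v1")
model = AutoModelForSequenceClassification.from_pretrained("trd92/code-reranker-minilm-v1")

query = "How to define configuration?"
code = "def define_config(): ..."

# Encode the query and the snippet together as one sequence pair.
inputs = tokenizer(query, code, return_tensors="pt", truncation=True)

# Run inference without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)

# The model emits a single logit; a higher value means a more relevant snippet.
score = outputs.logits.item()
print(f"Relevance score: {score:.4f}")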
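In practice a reranker usually scores one query against many candidate snippets. The sketch below shows one way to do that in a single batched forward pass and sort the results. The helper names `make_scorer` and `rerank`, the `top_k` parameter, and the `max_length=512` limit are illustrative assumptions, not part of this model's published interface. It also assumes the model returns one logit per pair, as in the single-pair example above.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A scorer maps (query, snippets) to one relevance score per snippet.
Scorer = Callable[[str, Sequence[str]], List[float]]


def make_scorer(model, tokenizer, max_length: int = 512) -> Scorer:
    """Wrap a loaded cross-encoder so it scores one query against many snippets.

    `model` and `tokenizer` are the objects loaded in the usage example.
    `max_length=512` is an assumed limit, not a documented one.
    """
    import torch  # imported lazily so `rerank` has no heavy dependencies

    def score(query: str, snippets: Sequence[str]) -> List[float]:
        # Pair the same query with every snippet and pad to a common length.
        inputs = tokenizer(
            [query] * len(snippets),
            list(snippets),
            padding=True,
            truncation=True,
            max_length=max_length,
            return_tensors="pt",
        )
        with torch.no_grad():
            logits = model(**inputs).logits  # assumed shape: (batch, 1)
        return logits.squeeze(-1).tolist()

    return score


def rerank(
    query: str,
    snippets: Sequence[str],
    scorer: Scorer,
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Return (snippet, score) pairs sorted from most to least relevant."""
    if not snippets:
        return []
    scores = scorer(query, snippets)
    ranked = sorted(zip(snippets, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]
```

With the `model` and `tokenizer` from the example above, a call would look like `rerank(query, candidates, make_scorer(model, tokenizer), top_k=5)`. Passing the scorer in as a callable keeps the sorting logic independent of the model, so it can be exercised with a stub.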
Training
Trained using contrastive learning on code triplets (query, positive_code, negative_code) extracted via AST parsing.