Code Reranker MiniLM v1

A cross-encoder reranker fine-tuned to score how relevant a code snippet is to a natural-language query, trained on a TypeScript/Python codebase.

Model Details

  • Base model: cross-encoder/ms-marco-MiniLM-L-6-v2
  • Training data: 19,810 (query, code snippet) pairs from a real-world TypeScript/Python codebase
  • Training: 2 epochs, batch size 8, lr 2e-5
  • Hardware: GTX 1660 SUPER (6GB VRAM), CPU training
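The stated hyperparameters imply a rough training length. The sketch below computes it, assuming that each of the 19,810 pairs is one training example, that there is one optimizer step per batch with no gradient accumulation, and that the final partial batch is kept. None of these details are stated in the card, and the helper name optimizer_steps is hypothetical.

```python
import math

# Hyperparameters as listed under Model Details.
NUM_PAIRS = 19_810
BATCH_SIZE = 8
EPOCHS = 2
LEARNING_RATE = 2e-5


def optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Total optimizer steps, assuming one step per batch, no gradient
    accumulation, and the last partial batch kept (drop_last=False)."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    per_epoch = math.ceil(NUM_PAIRS / BATCH_SIZE)
    total = optimizer_steps(NUM_PAIRS, BATCH_SIZE, EPOCHS)
    # 19,810 / 8 = 2,476.25, so 2,477 batches per epoch under these assumptions.
    print(f"steps/epoch={per_epoch}, total steps={total}, lr={LEARNING_RATE}")
```

If training actually iterated over triplets rather than pairs, the example count and therefore the step count would differ.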

Benchmarks

Metric          Original   Fine-tuned
Accuracy        88.4%      98.0%
Best accuracy   96.2%      99.0%
AUC             98.64%     99.95%
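The card does not say how these metrics were computed. One plausible reading is sketched below, assuming binary relevance labels (1 = relevant, 0 = not) on held-out (query, code) pairs. Accuracy thresholds the raw logit at 0, which is equivalent to a sigmoid probability of 0.5. "Best accuracy" is taken to mean the highest accuracy over all thresholds, and AUC is the ROC AUC. All three interpretations and the function names are assumptions.

```python
from typing import Sequence


def accuracy_at(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.0) -> float:
    """Fraction of pairs where (score > threshold) agrees with the 0/1 label."""
    correct = sum((s > threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


def best_accuracy(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Highest accuracy over every threshold that changes the predictions.
    Using each observed score (plus one value below the minimum) covers all
    distinct splits, because predictions are 'score > threshold'."""
    candidates = [min(scores) - 1.0] + list(scores)
    return max(accuracy_at(scores, labels, t) for t in candidates)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the Mann-Whitney statistic: the probability that a random
    relevant pair outscores a random irrelevant one (ties count as 0.5).
    O(P * N); fine for a sketch, slow for large evaluation sets."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, with scores [2.1, 0.3, -1.0, -0.2] and labels [1, 1, 0, 1], accuracy at threshold 0 is 0.75, but a threshold of -1.0 separates the classes perfectly, so best accuracy and AUC are both 1.0. This shows why "best accuracy" can sit well above fixed-threshold accuracy.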

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("trd92/code-reranker-minilm-v1")
model = AutoModelForSequenceClassification.from_pretrained("trd92/code-reranker-minilm-v1")

query = "How to define configuration?"
code = "def define_config(): ..."

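# truncation=True trims pairs that exceed the model's maximum input length
inputs = tokenizer(query, code, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    score = outputs.logits.item()  # raw logit, not a probability; higher = more relevant
print(f"Relevance score: {score:.4f}")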
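As a reranker, the model is usually applied to a list of candidate snippets for one query, and the candidates are then sorted by score. The sketch below keeps the ranking logic independent of the model: rerank takes any scoring function. make_model_scorer wires in the model from the example above by batching all (query, snippet) pairs in one forward pass. Both function names are hypothetical and not part of the model's published interface.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A scorer maps (query, candidate snippets) to one relevance score per snippet.
ScoreFn = Callable[[str, Sequence[str]], List[float]]


def rerank(query: str, candidates: Sequence[str], score_fn: ScoreFn,
           top_k: Optional[int] = None) -> List[Tuple[str, float]]:
    """Return (snippet, score) pairs sorted from most to least relevant."""
    scores = score_fn(query, candidates)
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def make_model_scorer(model_id: str = "trd92/code-reranker-minilm-v1") -> ScoreFn:
    # Imported lazily so rerank() can be used without transformers installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    def score(query: str, codes: Sequence[str]) -> List[float]:
        # Parallel lists encode as (query, code_i) pairs; padding aligns lengths.
        inputs = tokenizer([query] * len(codes), list(codes), padding=True,
                           truncation=True, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits.squeeze(-1)  # shape (num_codes,)
        return logits.tolist()

    return score
```

Usage: `rerank("How to define configuration?", snippets, make_model_scorer(), top_k=5)`. Ranking uses the raw logits directly; applying a sigmoid would not change the order because it is monotonic.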

Training

Trained using contrastive learning on code triplets (query, positive_code, negative_code) extracted via AST parsing.

Model size: 22.7M parameters (F32, Safetensors)