	---
	license: apache-2.0
	tags:
	- cross-encoder
	- reranker
	- code
	- retrieval
	- typescript
	- python
	pipeline_tag: text-classification
	---

	# Code Reranker MiniLM v1

	A cross-encoder reranker fine-tuned to rank code snippets by relevance to a query, trained on TypeScript and Python code.

	## Model Details

	- Base model: [cross-encoder/ms-marco-MiniLM-L-6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2)
	- Training data: 19,810 (query, code snippet) pairs from a real-world TypeScript/Python codebase
	- Training: 2 epochs, batch size 8, lr 2e-5
	- Hardware: GTX 1660 SUPER (6GB VRAM), CPU training

	## Benchmarks

	| Metric | Original | Fine-tuned |
	|--------|----------|------------|
	| Accuracy | 88.4% | 98.0% |
	| Best Accuracy | 96.2% | 99.0% |
	| AUC | 98.64% | 99.95% |
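
The card does not say how these metrics were computed. As a rough sketch only, assuming a held-out set of scored pairs with binary labels, that "Accuracy" uses a fixed decision threshold of 0, and that "Best Accuracy" is the accuracy at the best-performing threshold (both readings are assumptions, and the function names are hypothetical), the three numbers could be derived like this:

```python
from typing import Sequence


def accuracy_at(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """Fraction of pairs classified correctly when score > threshold means 'relevant'."""
    correct = sum((s > threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


def best_accuracy(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Highest accuracy over thresholds placed at each observed score (and one below the minimum)."""
    candidates = [min(scores) - 1.0] + list(scores)
    return max(accuracy_at(scores, labels, t) for t in candidates)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, purely illustrative (not the model's evaluation set).
toy_scores = [2.1, 1.3, 0.6, -1.7, 0.4, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]

print(accuracy_at(toy_scores, toy_labels, 0.0))
print(best_accuracy(toy_scores, toy_labels))
print(roc_auc(toy_scores, toy_labels))
```

On the toy data the fixed threshold misclassifies the two low-scoring negatives, while a better-placed threshold separates the classes perfectly, which is one way a gap between "Accuracy" and "Best Accuracy" can arise.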

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	tokenizer = AutoTokenizer.from_pretrained("trd92/code-reranker-minilm-v1")
	model = AutoModelForSequenceClassification.from_pretrained("trd92/code-reranker-minilm-v1")

	query = "How to define configuration?"
	code = "def define_config(): ..."

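	# Encode the (query, code) pair; truncate inputs longer than the model's limit
	inputs = tokenizer(query, code, return_tensors="pt", truncation=True)
	with torch.no_grad():
	    outputs = model(**inputs)
	# Single relevance logit; higher means more relevant
	score = outputs.logits.item()
	print(f"Relevance score: {score:.4f}")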
	```
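
In a retrieval pipeline the score is usually computed for several candidate snippets and then used to sort them. The sketch below shows that step in isolation. `rerank`, `score_fn`, and `toy_score` are hypothetical names, not part of this model's API; `score_fn` stands in for any callable that returns a relevance score for a (query, code) pair, such as a wrapper around the model call above.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every candidate against the query and return the top_k, best first."""
    scored = [(code, score_fn(query, code)) for code in candidates]
    # Python's sort is stable, so tied candidates keep their original order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def toy_score(query: str, code: str) -> float:
    """Stand-in scorer for illustration: counts query words that appear in the snippet."""
    words = query.lower().replace("?", "").split()
    return float(sum(w in code.lower() for w in words))


candidates = [
    "def parse_args(): ...",
    "def define_config(): ...",
    "class Config: ...",
]
results = rerank("How to define configuration?", candidates, toy_score, top_k=2)
for code, score in results:
    print(f"{score:.1f}  {code}")
```

To use the real model, replace `toy_score` with a function that tokenizes the pair and returns `model(**inputs).logits.item()`.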

	## Training

	Trained using contrastive learning on code triplets (query, positive_code, negative_code) extracted via AST parsing.
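
The card does not include the extraction code. As an illustration only, assuming Python sources, that a function's docstring serves as the query, and that the negative is sampled from the other functions in the same file (all three are assumptions, and `extract_triplets` is a hypothetical helper), triplets could be built with the standard `ast` module:

```python
import ast
import random
from typing import List, Tuple


def extract_triplets(source: str, seed: int = 0) -> List[Tuple[str, str, str]]:
    """Build (query, positive_code, negative_code) triplets from one Python file."""
    tree = ast.parse(source)
    rng = random.Random(seed)

    # Collect (docstring, source text) for every documented function.
    funcs = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            code = ast.get_source_segment(source, node)
            if doc and code:
                funcs.append((doc, code))

    # Pair each function with a randomly chosen different function as the negative.
    triplets = []
    for i, (doc, code) in enumerate(funcs):
        others = [c for j, (_, c) in enumerate(funcs) if j != i]
        if others:
            triplets.append((doc, code, rng.choice(others)))
    return triplets


sample = '''
def load_config(path):
    """Read configuration from a file."""
    return open(path).read()

def add(a, b):
    """Add two numbers."""
    return a + b
'''

triplets = extract_triplets(sample)
for query, positive, negative in triplets:
    print(query, "|", positive.splitlines()[0], "|", negative.splitlines()[0])
```

TypeScript sources would need a different parser, since `ast` only handles Python.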