How to use Matthieufromparis/bge-small-code-search-v1 with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("Matthieufromparis/bge-small-code-search-v1")
sentences = [
"That is a happy person",
"That is a happy dog",
"That is a very happy person",
"Today is a sunny day"
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
A BGE-small-en-v1.5 model fine-tuned on CodeSearchNet (Python) for semantic code search.
Maps natural language queries and code snippets into the same 384-dimensional vector space. Search your codebase by describing what a function does.
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("Matthieufromparis/bge-small-code-search-v1")
query = "parse JSON config file and return a dictionary"
code_snippets = [...]  # placeholder: replace with a list of code strings from your codebase
query_emb = model.encode(query)
code_embs = model.encode(code_snippets)
similarities = model.similarity(query_emb, code_embs)
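The snippet above stops at the similarity scores. A minimal sketch of turning those scores into a ranked result list follows. The helper name `top_k_snippets` and the example scores are hypothetical and used only for illustration; the scores stand in for what `similarities[0].tolist()` would return for a single query.

```python
from typing import List, Tuple


def top_k_snippets(
    scores: List[float], snippets: List[str], k: int = 3
) -> List[Tuple[float, str]]:
    """Return the k highest-scoring (score, snippet) pairs, best first."""
    if len(scores) != len(snippets):
        raise ValueError("scores and snippets must have the same length")
    # Sort by score only, highest first, then keep the first k pairs.
    ranked = sorted(zip(scores, snippets), key=lambda pair: pair[0], reverse=True)
    return ranked[:k]


# Hypothetical scores; in practice use similarities[0].tolist()
example_scores = [0.12, 0.83, 0.47]
example_snippets = [
    "def a(): ...",
    "def load_config(path): ...",
    "def b(): ...",
]
best = top_k_snippets(example_scores, example_snippets, k=2)
for score, snippet in best:
    print(f"{score:.2f}  {snippet}")
```

With a single query, `model.similarity` returns one row of scores, so passing `similarities[0].tolist()` together with `code_snippets` should be enough to list the best matches.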
| Metric | Base (BGE-small) | Fine-Tuned | Relative improvement |
|---|---|---|---|
| NDCG@10 | 0.9761 | 0.9849 | +0.9% |
| Accuracy@1 | 0.948 | 0.960 | +1.3% |
| MRR@10 | 0.975 | 0.978 | +0.3% |
Evaluated on 500 held-out Python code-comment pairs from CodeSearchNet.
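For readers unfamiliar with the three metrics, here is a rough sketch of how they can be computed. It assumes each query has exactly one correct snippet, which seems natural for code-comment pairs but is not confirmed as the evaluation protocol used here. The function name `retrieval_metrics` and the example ranks are hypothetical.

```python
import math
from typing import Dict, List


def retrieval_metrics(ranks: List[int], k: int = 10) -> Dict[str, float]:
    """Compute Accuracy@1, MRR@k and NDCG@k.

    ranks[i] is the 1-based position of the single correct snippet
    in the ranked results for query i (assumption: one relevant item per query).
    """
    n = len(ranks)
    # Accuracy@1: fraction of queries whose correct snippet is ranked first.
    accuracy_at_1 = sum(1 for r in ranks if r == 1) / n
    # MRR@k: mean of 1/rank, counting 0 when the snippet is outside the top k.
    mrr = sum(1.0 / r for r in ranks if r <= k) / n
    # NDCG@k with one relevant item: the ideal DCG is 1, so the gain
    # reduces to 1/log2(rank + 1) inside the top k and 0 outside it.
    ndcg = sum(1.0 / math.log2(r + 1) for r in ranks if r <= k) / n
    return {"accuracy@1": accuracy_at_1, f"mrr@{k}": mrr, f"ndcg@{k}": ndcg}


# Hypothetical ranks for four queries
metrics = retrieval_metrics([1, 1, 2, 11])
print(metrics)
```

Under this single-relevant-item assumption NDCG@10 is never lower than MRR@10, which is consistent with the table.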
pip install sentence-transformers
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("Matthieufromparis/bge-small-code-search-v1")
query_embedding = model.encode("function that sorts a list of dictionaries by a key")
code_embedding = model.encode("def sort_dicts_by_key(dicts, key): return sorted(dicts, key=lambda x: x.get(key, ''))")
similarity = model.similarity(query_embedding, code_embedding)
print(f"Similarity: {similarity.item():.4f}")
Designed for asymmetric search: queries are natural language, documents are code.
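Since the documents are code, one possible way to build the document side is to split Python source files into function-level snippets before encoding them. The sketch below uses only the standard-library `ast` module. The helper name `extract_functions` and the sample source are hypothetical, and function-level granularity is an assumption rather than something the model card prescribes.

```python
import ast
from typing import List


def extract_functions(source: str) -> List[str]:
    """Return the source text of every function or method defined in `source`."""
    tree = ast.parse(source)
    snippets = []
    # ast.walk visits every node, so methods and nested functions are included.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                snippets.append(segment)
    return snippets


# Hypothetical sample file contents
sample = '''
def load_config(path):
    import json
    with open(path) as f:
        return json.load(f)

class Greeter:
    def greet(self, name):
        return f"hello {name}"
'''
snippets = extract_functions(sample)
print(len(snippets))
```

The resulting list of strings can be passed to `model.encode` as the code side, while queries stay in plain English.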
Author: Matthieu.AI (Matthieufromparis) — License: Apache 2.0
Base model: BAAI/bge-small-en-v1.5