Shuu12121/CodeSearch-ModernBERT-Crow-Plus 🐦‍⬛
This is a Sentence Transformer model based on Shuu12121/CodeModernBERT-Crow, fine-tuned for high performance on multilingual code search tasks.
- Developer: Shuu12121
- Base Model: Shuu12121/CodeModernBERT-Crow
- License: Apache-2.0
Try It Now on Google Colab
You can easily try a function-level code search system for GitHub repositories that is built on this model.
MTEB Leaderboard Results
CodeSearch-ModernBERT-Crow-Plus ranks highly on the following tasks of the Massive Text Embedding Benchmark (MTEB).
See the leaderboard for the current rankings.
| Task | nDCG@10 score | Rank (as of April 2025) |
|---|---|---|
| CodeSearchNetRetrieval | 0.89296 | 8th of 146 models |
| COIRCodeSearchNetRetrieval | 0.79884 | 5th of 15 models |
These results show that the model is competitive on code search tasks. In particular, it is a practical choice among Sentence Transformer models for multilingual natural-language-to-code retrieval.
🧩 Integration with Related Projects
With CodeSearch-ModernBERT-Crow-Plus you can easily build a function-level code search system over real GitHub repositories, as demonstrated in CodeSearchCrow.ipynb.
The notebook performs the following steps:
- Clone a specified GitHub repository
- Extract functions and code cells from .py and .ipynb files
- Embed the function code with the Sentence Transformer model
- Build a FAISS index for fast search
- Translate Japanese queries into English with the Qwen3-8B-FP8 model, so that searching in Japanese also works naturally
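The extraction step above can be sketched with Python's standard `ast` and `json` modules. This is a minimal illustration, not the notebook's actual code: the helper names `extract_functions` and `extract_code_cells` are hypothetical, and the sketch assumes notebooks follow the nbformat 4 JSON layout (a top-level `cells` list whose entries carry `cell_type` and `source`).

```python
import ast
import json
from typing import List


def extract_functions(source: str) -> List[str]:
    """Return the source text of every function and method in a Python module.

    Hypothetical helper: because ast.walk visits every node, a nested function
    appears inside its parent's text and again as its own entry.
    """
    tree = ast.parse(source)
    functions = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # get_source_segment needs Python 3.8+; decorator lines are not included
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                functions.append(segment)
    return functions


def extract_code_cells(notebook_text: str) -> List[str]:
    """Return the non-empty code cells of an .ipynb file (assumes nbformat 4 JSON)."""
    notebook = json.loads(notebook_text)
    cells = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores a cell's source either as one string or as a list of lines
        text = "".join(source) if isinstance(source, list) else source
        if text.strip():
            cells.append(text)
    return cells
```

Each extracted snippet can then be passed to `model.encode` as in the How to Use example.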
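🔹 Features
- The index is built on the first run and can be reused quickly afterwards
- Code is searched at the function level, so the most semantically similar code can be retrieved accurately
- Japanese queries are fully supported (they are translated into English with Qwen3-8B-FP8 before searching)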
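The "build once, reuse afterwards" behaviour can be sketched as an on-disk embedding cache. The notebook uses FAISS; to keep this sketch dependency-light it swaps in a plain NumPy matrix with exact inner-product search, which is the same computation a flat inner-product index performs. The helper names, the `.npy` cache file and the injected `embed` callable (for example `model.encode`) are assumptions for illustration.

```python
import os
from typing import Callable, List, Sequence, Tuple

import numpy as np


def load_or_build_index(
    texts: Sequence[str],
    embed: Callable[[List[str]], np.ndarray],
    cache_path: str,
) -> np.ndarray:
    """Embed `texts` on the first call and save the matrix; later calls just load it.

    cache_path should end in ".npy" (np.save appends that suffix otherwise).
    The cache is keyed only by the path, so delete the file when the code changes.
    """
    if os.path.exists(cache_path):
        return np.load(cache_path)
    vectors = np.asarray(embed(list(texts)), dtype=np.float32)
    # L2-normalise each row so that a dot product equals cosine similarity
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    vectors = vectors / np.maximum(norms, 1e-12)
    np.save(cache_path, vectors)
    return vectors


def search(index: np.ndarray, query_vector: np.ndarray, k: int = 5) -> List[Tuple[int, float]]:
    """Return (row, cosine score) pairs for the k rows most similar to the query."""
    query = np.asarray(query_vector, dtype=np.float32)
    query = query / max(float(np.linalg.norm(query)), 1e-12)
    scores = index @ query
    top = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in top]
```

A possible usage is `index = load_or_build_index(functions, model.encode, "functions.npy")` followed by `search(index, model.encode("read a CSV file"), k=5)`; only the first run pays the embedding cost.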
Links
- Runnable notebook: .ipynb
Overview
CodeSearch-ModernBERT-Crow-Plus is a Sentence Transformer model designed to capture the semantic similarity between natural-language queries and code snippets (primarily at the function level) across multiple programming languages (Python, Java, JavaScript, PHP, Ruby, Go, Rust). It inherits the code understanding capabilities of its base model, CodeModernBERT-Crow, and is optimized for code search and code similarity tasks.
Model Details
- Base Model: Shuu12121/CodeModernBERT-Crow
- Architecture: ModernBERT (hidden_size: 768, layers: 12, heads: 12)
- Max Sequence Length: 1024 tokens
- Fine-tuning: The model is believed to have been fine-tuned on a similarity learning task using code paired with its documentation (e.g., the CodeSearchNet dataset). A Pooling layer has been added so that it can be used with the Sentence Transformers library.
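The Pooling layer turns the per-token vectors produced by the encoder into one fixed-size vector per input. The card does not state which pooling mode is configured (it is recorded in the pooling module's configuration saved with the model), so this NumPy sketch assumes mean pooling over non-padding tokens purely as an illustration.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padding positions.

    token_embeddings: (batch, seq_len, hidden), hidden = 768 for this architecture
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    returns:          (batch, hidden)
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero for all-padding rows
    return summed / counts


# Two sequences of length 3 with hidden size 2; the second one ends in a padding token
tokens = np.array([
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[2.0, 0.0], [4.0, 2.0], [100.0, 100.0]],
])
mask = np.array([[1, 1, 1], [1, 1, 0]])
pooled = mean_pool(tokens, mask)  # rows are [3, 4] and [3, 1]; the padded vector is ignored
```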
How to Use
You can easily use this model with the sentence-transformers library.
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# Load the model
model = SentenceTransformer("Shuu12121/CodeSearch-ModernBERT-Crow-Plus")

# Texts to encode (code or natural language)
code_snippets = [
    "def factorial(n):\n    if n == 0:\n        return 1\n    return n * factorial(n - 1)",
    "function binarySearch(arr, target) { let left = 0, right = arr.length - 1; while (left <= right) { const mid = Math.floor((left + right) / 2); if (arr[mid] === target) return mid; if (arr[mid] < target) left = mid + 1; else right = mid - 1; } return -1; }"
]
natural_language_queries = [
    "calculate the factorial of a number recursively",
    "find an element in a sorted array using binary search"
]

# Get embeddings
code_embeddings = model.encode(code_snippets)
query_embeddings = model.encode(natural_language_queries)
print("Code Embeddings Shape:", code_embeddings.shape)
print("Query Embeddings Shape:", query_embeddings.shape)

# Cosine similarity: rows are queries, columns are code snippets
similarities = cos_sim(query_embeddings, code_embeddings)
print(similarities)
```
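`cos_sim` produces a matrix whose entry (i, j) is the cosine similarity between query i and code snippet j; retrieval then picks the highest-scoring column in each row. The NumPy sketch below reproduces that computation on small made-up vectors (stand-ins, not real model output) so the orientation of the result is easy to see.

```python
import numpy as np


def cosine_similarity_matrix(queries: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Entry (i, j) is the cosine similarity between queries[i] and corpus[j]."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    return q @ c.T


# Made-up 3-dimensional vectors standing in for real query and code embeddings
query_vecs = np.array([[0.9, 0.1, 0.0], [0.0, 0.2, 0.9]])
code_vecs = np.array([[1.0, 0.0, 0.1], [0.1, 0.1, 1.0], [0.5, 0.5, 0.5]])

scores = cosine_similarity_matrix(query_vecs, code_vecs)  # shape (2, 3)
best = scores.argmax(axis=1)  # index of the best-matching snippet for each query
for i, j in enumerate(best):
    print(f"query {i} -> snippet {j} (score {scores[i, j]:.3f})")
```

With real embeddings, the `similarities` tensor from the example above has the same layout, so `similarities.argmax(dim=1)` gives the top snippet per query.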
Evaluation
This model has been evaluated on MTEB (Massive Text Embedding Benchmark).
Task: CodeSearchNet Retrieval

| Metric | Standard MTEB evaluation | COIR version |
|---|---|---|
| main_score (nDCG@10) | 0.89296 | 0.79884 |
| ndcg_at_1 | 0.8135 | 0.7152 |
| ndcg_at_3 | 0.8781 | 0.7762 |
| ndcg_at_5 | 0.8868 | 0.7885 |
| ndcg_at_10 | 0.8930 | 0.7988 |
| ndcg_at_20 | 0.8947 | 0.8056 |
| ndcg_at_100 | 0.8971 | 0.8134 |
| ndcg_at_1000 | 0.8995 | 0.8172 |
| map_at_10 | 0.8705 | 0.7729 |
| recall_at_10 | 0.9610 | 0.8794 |
| mrr_at_10 | 0.8705 | 0.7729 |
Note: The scores for the same CodeSearchNet Retrieval task differ because the two evaluations use different settings.
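When each query has exactly one relevant snippet (as the identical map_at_10 and mrr_at_10 values above suggest), the metrics simplify: if the relevant item appears at rank r ≤ k, nDCG@k is 1/log2(r + 1), MRR@k is 1/r and recall@k is 1; otherwise all three are 0. The sketch below computes these from hypothetical ranks under that single-relevant-document assumption; it is an illustration, not the MTEB implementation.

```python
import math
from typing import Dict, Optional, Sequence


def single_relevant_metrics(rank: Optional[int], k: int) -> Dict[str, float]:
    """Per-query metrics when exactly one document is relevant.

    rank: 1-based position of the relevant document, or None if it was not retrieved.
    """
    hit = rank is not None and rank <= k
    return {
        # ideal DCG is 1/log2(2) = 1, so nDCG equals the DCG of the single hit
        f"ndcg_at_{k}": 1.0 / math.log2(rank + 1) if hit else 0.0,
        f"mrr_at_{k}": 1.0 / rank if hit else 0.0,  # also equals map_at_k in this setting
        f"recall_at_{k}": 1.0 if hit else 0.0,
    }


def average_metrics(ranks: Sequence[Optional[int]], k: int) -> Dict[str, float]:
    """Average the per-query metrics over all queries."""
    if not ranks:
        raise ValueError("need at least one query")
    per_query = [single_relevant_metrics(r, k) for r in ranks]
    return {name: sum(m[name] for m in per_query) / len(per_query) for name in per_query[0]}


# Hypothetical ranks of the correct snippet for four queries (None = not retrieved)
print(average_metrics([1, 2, 11, None], k=10))
```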
For reference, the MRR@100 scores of the base model Shuu12121/CodeModernBERT-Crow on the CodeSearchNet test split (computed with a fixed evaluation script) are:
| Language | Python | Java | JavaScript | PHP | Ruby | Go |
|---|---|---|---|---|---|---|
| MRR@100 | 0.9372 | 0.8642 | 0.8118 | 0.8388 | 0.8392 | 0.8522 |
Intended Use & Limitations
- Intended Use:
  - Multilingual code search (natural language to code, code to code)
  - Code similarity detection
  - Feature extraction for code classification and clustering
  - Code recommendation systems
- Target Languages: Python, Java, JavaScript, PHP, Ruby, Go, Rust
- Limitations:
  - The model is optimized mainly for function-level code snippets. Performance may degrade on very long inputs such as entire source files, or on syntactically incomplete code.
  - Tasks specific to a particular domain or library may benefit from additional fine-tuning.
  - The model is not suited to generation tasks (it is an encoder model).
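For inputs longer than the 1024-token limit, one common workaround is to split them into overlapping token windows and embed each window separately (splitting at function boundaries, as the notebook does, is usually preferable for code). The helper below is a hypothetical sketch of that windowing on a list of token ids; it does not call the tokenizer itself, and the window and stride values are assumptions.

```python
from typing import List, Sequence


def token_windows(token_ids: Sequence[int], window: int, stride: int) -> List[List[int]]:
    """Split token ids into windows of at most `window` ids, starting every `stride` ids.

    Consecutive windows overlap by (window - stride) ids, so context at window edges is kept.
    """
    if not 0 < stride <= window:
        raise ValueError("need 0 < stride <= window")
    if len(token_ids) == 0:
        return []
    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + window]))
        if start + window >= len(token_ids):
            break  # the last window already reaches the end of the input
        start += stride
    return windows


# 10 fake token ids, windows of 4 with stride 3 (overlap of 1)
print(token_windows(list(range(10)), window=4, stride=3))
```

In practice the ids could come from `model.tokenizer.encode(text, add_special_tokens=False)`, with `window` set a little below 1024 to leave room for the special tokens the model adds; each window can be decoded with `model.tokenizer.decode(ids)` and embedded separately.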
Note:
This model was evaluated on MTEB with commit hash 044a7a4b552f86e284817234c336bccf16f895ce.
The current README may have been updated since that version, but the model weights remain unchanged.
Contact
For questions or suggestions, please contact the developer, Shuu12121.