faisalmumtaz committed on
Commit
e9fd252
·
verified ·
1 Parent(s): 3a7f436

Update benchmark comparison tables (SOTA on CodeTrans-DL, Top-4 on CSN-Python)

Files changed (1)
  1. README.md +61 -42
README.md CHANGED
@@ -21,6 +21,16 @@ base_model: Qwen/Qwen2.5-Coder-0.5B
  model-index:
  - name: CodeCompass-Embed
    results:
    - task:
        type: retrieval
        name: Code Retrieval
@@ -42,7 +52,8 @@ model-index:
 
  ## Model Highlights
 
- - 🏆 **SOTA on CodeSearchNet-Python**: NDCG@10 = 0.9228, MRR@10 = 0.9106
  - ⚡ **Efficient**: 494M parameters, runs on consumer GPUs
  - 🔄 **Bidirectional Attention**: Converted from causal to bidirectional for embedding tasks
  - 📏 **Flexible Context**: Trained at 512 tokens, supports up to 32K via RoPE extrapolation
@@ -64,30 +75,46 @@ model-index:
 
  We evaluate on the [CoIR Benchmark](https://github.com/CoIR-team/coir) (ACL 2025), the gold standard for code retrieval evaluation.
 
- ### Per-Task Results
-
- | Task | NDCG@10 | MRR@10 | Recall@10 |
- |------|---------|--------|-----------|
- | **codesearchnet-python** | **0.9228** | **0.9106** | 0.9600 |
- | stackoverflow-qa | 0.6480 | 0.6156 | 0.7500 |
- | synthetic-text2sql | 0.5673 | 0.4853 | 0.8220 |
- | codefeedback-st | 0.4080 | 0.3698 | 0.5300 |
- | codetrans-dl | 0.3305 | 0.2161 | 0.7167 |
- | apps | 0.1277 | 0.1097 | 0.1860 |
- | **Average** | **0.5007** | **0.4512** | - |
-
- ### Comparison with SOTA Models
-
- | Model | Params | Avg NDCG@10 | CodeSearchNet-Python |
- |-------|--------|-------------|---------------------|
- | SFR-Embedding-Code-400M | 400M | 0.6786 | - |
- | CodeRankEmbed | 137M | 0.6303 | - |
- | Jina-Code-v2 | 161M | 0.5789 | - |
- | BGE-M3 | 568M | 0.5547 | - |
- | **CodeCompass-Embed (ours)** | **494M** | **0.5007** | **0.9228** |
- | CodeT5+-110M | 110M | 0.4817 | - |
-
- > **Note**: CodeCompass achieves state-of-the-art on CodeSearchNet-Python (NL → Code retrieval), which is the primary use case for code search applications.
 
  ## Usage
 
@@ -111,8 +138,7 @@ model.eval()
  def encode(texts, is_query=False):
      # Add instruction prefix for queries
      if is_query:
-         texts = [f"Instruct: Find the most relevant code snippet given the following query:
- Query: {t}" for t in texts]
 
      inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
@@ -132,12 +158,9 @@ Query: {t}" for t in texts]
  # Example: Code Search
  query = "How to sort a list in Python"
  code_snippets = [
-     "def sort_list(lst):
- return sorted(lst)",
-     "def add_numbers(a, b):
- return a + b",
-     "def reverse_string(s):
- return s[::-1]",
  ]
 
  query_emb = encode([query], is_query=True)
@@ -156,14 +179,10 @@ For optimal performance, use these instruction prefixes for queries:
 
  | Task | Instruction Template |
  |------|---------------------|
- | NL → Code | `Instruct: Find the most relevant code snippet given the following query:
- Query: {query}` |
- | Code → Code | `Instruct: Find an equivalent code snippet given the following code snippet:
- Query: {query}` |
- | Tech Q&A | `Instruct: Find the most relevant answer given the following question:
- Query: {query}` |
- | Text → SQL | `Instruct: Given a natural language question and schema, find the corresponding SQL query:
- Query: {query}` |
 
  **Note**: Document/corpus texts do NOT need instruction prefixes.
 
@@ -181,7 +200,7 @@ Query: {query}` |
 
  ## Limitations
 
- - Optimized for **NL → Code** retrieval; weaker on code translation tasks
  - Trained primarily on Python/JavaScript/Java/Go/PHP/Ruby
  - May not generalize well to low-resource programming languages
 
  model-index:
  - name: CodeCompass-Embed
    results:
+   - task:
+       type: retrieval
+       name: Code Retrieval
+     dataset:
+       type: CoIR-Retrieval/codetrans-dl
+       name: CodeTrans-DL
+     metrics:
+     - type: ndcg@10
+       value: 0.3305
+       name: NDCG@10
    - task:
        type: retrieval
        name: Code Retrieval
 
 
  ## Model Highlights
 
+ - 🏆 **SOTA on CodeTrans-DL**: #1 on code translation benchmark (+20.7% over next best)
+ - 🥇 **Top-4 on CodeSearchNet-Python**: NDCG@10 = 0.9228 (competitive with 400M models)
  - ⚡ **Efficient**: 494M parameters, runs on consumer GPUs
  - 🔄 **Bidirectional Attention**: Converted from causal to bidirectional for embedding tasks
  - 📏 **Flexible Context**: Trained at 512 tokens, supports up to 32K via RoPE extrapolation
 
 
  We evaluate on the [CoIR Benchmark](https://github.com/CoIR-team/coir) (ACL 2025), the gold standard for code retrieval evaluation.
 
+ ### 🏆 CodeTrans-DL: State-of-the-Art
+
+ CodeCompass-Embed achieves **#1** on CodeTrans-DL (code translation between deep learning frameworks), beating all existing models by **+20.7%**.
+
+ | Rank | Model | Params | CodeTrans NDCG@10 |
+ |------|-------|--------|-------------------|
+ | **🥇 1** | **CodeCompass-Embed (ours)** | **494M** | **0.3305** |
+ | 2 | Jina-Code-v2 | 161M | 0.2739 |
+ | 3 | SFR-Embedding-Code | 400M | 0.2683 |
+ | 4 | CodeRankEmbed | 137M | 0.2604 |
+ | 5 | BGE-M3 | 568M | 0.2194 |
+ | 6 | BGE-Base-en-v1.5 | 109M | 0.2125 |
+ | 7 | Snowflake-Arctic-Embed-L | 568M | 0.1958 |
+ | 8 | CodeT5+-110M | 110M | 0.1794 |
+
+ ### CodeSearchNet-Python: Top 4
+
+ Strong performance on the primary code search benchmark (NL → Code retrieval).
+
+ | Rank | Model | Params | CSN-Python NDCG@10 |
+ |------|-------|--------|--------------------|
+ | 1 | SFR-Embedding-Code | 400M | 0.9505 |
+ | 2 | Jina-Code-v2 | 161M | 0.9439 |
+ | 3 | CodeRankEmbed | 137M | 0.9378 |
+ | **4** | **CodeCompass-Embed (ours)** | **494M** | **0.9228** |
+ | 5 | Snowflake-Arctic-Embed-L | 568M | 0.9146 |
+ | 6 | BGE-M3 | 568M | 0.8976 |
+ | 7 | BGE-Base-en-v1.5 | 109M | 0.8944 |
+ | 8 | CodeT5+-110M | 110M | 0.8702 |
+
+ ### Full Results (All Tasks)
+
+ | Task | NDCG@10 | MRR@10 |
+ |------|---------|--------|
+ | **codesearchnet-python** | **0.9228** | **0.9106** |
+ | stackoverflow-qa | 0.6480 | 0.6156 |
+ | synthetic-text2sql | 0.5673 | 0.4853 |
+ | codefeedback-st | 0.4080 | 0.3698 |
+ | **codetrans-dl** | **0.3305** 🏆 | **0.2161** |
+ | apps | 0.1277 | 0.1097 |
 
  ## Usage
 
 
  def encode(texts, is_query=False):
      # Add instruction prefix for queries
      if is_query:
+         texts = [f"Instruct: Find the most relevant code snippet given the following query:\nQuery: {t}" for t in texts]
 
      inputs = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
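The hunk above ends at tokenization, so the pooling step never appears in this diff. As a hedged sketch only: assuming the embeddings come from attention-masked mean pooling followed by L2 normalization (an assumption for illustration, the card does not show its pooling code), the step after the forward pass could look like:

```python
import numpy as np

def masked_mean_pool(last_hidden_state, attention_mask):
    """Mean-pool token vectors while ignoring padding, then L2-normalize.

    last_hidden_state: (batch, seq_len, hidden) array of token embeddings
    attention_mask:    (batch, seq_len) array of 1s (real tokens) and 0s (padding)
    """
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)  # (B, T, 1)
    summed = (last_hidden_state * mask).sum(axis=1)                   # (B, H)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid /0 on all-pad rows
    pooled = summed / counts
    # L2-normalize so that a plain dot product equals cosine similarity
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

# Toy batch: 2 sequences of 3 tokens with 4-dim hidden states; second row has one pad token
hidden = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
mask = np.array([[1, 1, 1], [1, 1, 0]])
emb = masked_mean_pool(hidden, mask)
print(emb.shape)  # (2, 4)
```

With L2-normalized outputs, the dot product between a query vector and a document vector is their cosine similarity.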
 
 
  # Example: Code Search
  query = "How to sort a list in Python"
  code_snippets = [
+     "def sort_list(lst):\n    return sorted(lst)",
+     "def add_numbers(a, b):\n    return a + b",
+     "def reverse_string(s):\n    return s[::-1]",
  ]
 
  query_emb = encode([query], is_query=True)
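The added snippet stops after encoding the query; the ranking step itself is not part of this diff. A minimal sketch of it, using placeholder vectors in place of real `encode()` output (the numeric values below are illustrative only):

```python
import numpy as np

def rank_by_cosine(query_emb, doc_embs):
    """Return document indices sorted most-to-least similar, plus the scores."""
    q = query_emb / np.linalg.norm(query_emb)
    d = doc_embs / np.linalg.norm(doc_embs, axis=1, keepdims=True)
    scores = d @ q                      # cosine similarity per document
    return np.argsort(-scores), scores

# Placeholder embeddings standing in for encode() output
query_vec = np.array([1.0, 0.0, 0.0])
doc_vecs = np.array([
    [0.9, 0.1, 0.0],  # nearly parallel to the query
    [0.0, 1.0, 0.0],  # orthogonal
    [0.5, 0.5, 0.0],  # in between
])
order, scores = rank_by_cosine(query_vec, doc_vecs)
print(order.tolist())  # [0, 2, 1]
```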
 
 
  | Task | Instruction Template |
  |------|---------------------|
+ | NL → Code | `Instruct: Find the most relevant code snippet given the following query:\nQuery: {query}` |
+ | Code → Code | `Instruct: Find an equivalent code snippet given the following code snippet:\nQuery: {query}` |
+ | Tech Q&A | `Instruct: Find the most relevant answer given the following question:\nQuery: {query}` |
+ | Text → SQL | `Instruct: Given a natural language question and schema, find the corresponding SQL query:\nQuery: {query}` |
 
  **Note**: Document/corpus texts do NOT need instruction prefixes.
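The four templates above are plain strings with a `\n` before `Query:`; a small helper that applies them could look like this (the dict keys are illustrative, not defined by the card — only the template strings come from the table):

```python
# Template strings copied from the table above; the task keys are illustrative.
TEMPLATES = {
    "nl2code": "Instruct: Find the most relevant code snippet given the following query:\nQuery: {query}",
    "code2code": "Instruct: Find an equivalent code snippet given the following code snippet:\nQuery: {query}",
    "qa": "Instruct: Find the most relevant answer given the following question:\nQuery: {query}",
    "text2sql": "Instruct: Given a natural language question and schema, find the corresponding SQL query:\nQuery: {query}",
}

def with_instruction(query, task="nl2code"):
    """Prefix a query with its task instruction; corpus/document texts stay unprefixed."""
    return TEMPLATES[task].format(query=query)

print(with_instruction("How to sort a list in Python"))
```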
 
 
 
  ## Limitations
 
+ - Optimized for **NL → Code** retrieval; weaker on Q&A style tasks
  - Trained primarily on Python/JavaScript/Java/Go/PHP/Ruby
  - May not generalize well to low-resource programming languages