faisalmumtaz committed
Commit 6321f18 · verified · 1 Parent(s): 5a88cb2

Simplify README: single benchmark table, factual highlights

Files changed (1):
  1. README.md +24 -52

README.md CHANGED
@@ -52,12 +52,12 @@ model-index:
 
 ## Model Highlights
 
- - 🏆 **SOTA on CodeTrans-DL**: #1 on code translation benchmark (+20.7% over next best)
- - 🥇 **Top-4 on CodeSearchNet-Python**: NDCG@10 = 0.9228 (competitive with 400M models)
- - ⚡ **Efficient**: 494M parameters, runs on consumer GPUs
- - 🔄 **Bidirectional Attention**: Converted from causal to bidirectional for embedding tasks
- - 📏 **Flexible Context**: Trained at 512 tokens, supports up to 32K via RoPE extrapolation
- - 🎯 **Mean Pooling**: Robust to variable-length inputs
 
 ## Model Details
 
@@ -73,48 +73,20 @@ model-index:
 
 ## Benchmark Results (CoIR)
 
- We evaluate on the [CoIR Benchmark](https://github.com/CoIR-team/coir) (ACL 2025), the gold standard for code retrieval evaluation.
-
- ### 🏆 CodeTrans-DL — State-of-the-Art
-
- CodeCompass-Embed achieves **#1** on CodeTrans-DL (code translation between deep learning frameworks), beating all existing models by **+20.7%**.
-
- | Rank | Model | Params | CodeTrans NDCG@10 |
- |------|-------|--------|-------------------|
- | **🥇 1** | **CodeCompass-Embed (ours)** | **494M** | **0.3305** |
- | 2 | Jina-Code-v2 | 161M | 0.2739 |
- | 3 | SFR-Embedding-Code | 400M | 0.2683 |
- | 4 | CodeRankEmbed | 137M | 0.2604 |
- | 5 | BGE-M3 | 568M | 0.2194 |
- | 6 | BGE-Base-en-v1.5 | 109M | 0.2125 |
- | 7 | Snowflake-Arctic-Embed-L | 568M | 0.1958 |
- | 8 | CodeT5+-110M | 110M | 0.1794 |
-
- ### CodeSearchNet-Python — Top 4
-
- Strong performance on the primary code search benchmark (NL → Code retrieval).
-
- | Rank | Model | Params | CSN-Python NDCG@10 |
- |------|-------|--------|--------------------|
- | 1 | SFR-Embedding-Code | 400M | 0.9505 |
- | 2 | Jina-Code-v2 | 161M | 0.9439 |
- | 3 | CodeRankEmbed | 137M | 0.9378 |
- | **4** | **CodeCompass-Embed (ours)** | **494M** | **0.9228** |
- | 5 | Snowflake-Arctic-Embed-L | 568M | 0.9146 |
- | 6 | BGE-M3 | 568M | 0.8976 |
- | 7 | BGE-Base-en-v1.5 | 109M | 0.8944 |
- | 8 | CodeT5+-110M | 110M | 0.8702 |
-
- ### Full Results (All Tasks)
-
- | Task | NDCG@10 | MRR@10 |
- |------|---------|--------|
- | **codesearchnet-python** | **0.9228** | **0.9106** |
- | stackoverflow-qa | 0.6480 | 0.6156 |
- | synthetic-text2sql | 0.5673 | 0.4853 |
- | codefeedback-st | 0.4080 | 0.3698 |
- | **codetrans-dl** | **0.3305** 🏆 | **0.2161** |
- | apps | 0.1277 | 0.1097 |
 
 ## Usage
 
@@ -191,16 +163,16 @@ For optimal performance, use these instruction prefixes for queries:
 - **Base Model**: Qwen2.5-Coder-0.5B
 - **Training Data**: 8.8M samples from CoRNStack, StackOverflow, CodeSearchNet
 - **Architecture Modification**: Converted all 24 attention layers from causal to bidirectional
- - **Pooling**: Mean pooling (robust for variable-length extrapolation)
 - **Loss**: InfoNCE with temperature τ=0.05
 - **Hard Negatives**: 7 per sample (embedding-mined)
 - **Effective Batch Size**: 1024 (via GradCache)
- - **Training Steps**: 950 (early stopping at best MRR)
- - **Hardware**: NVIDIA H100 (95GB)
 
 ## Limitations
 
- - Optimized for **NL → Code** retrieval; weaker on Q&A style tasks
 - Trained primarily on Python/JavaScript/Java/Go/PHP/Ruby
 - May not generalize well to low-resource programming languages
 
 
@@ -52,12 +52,12 @@ model-index:
 
 ## Model Highlights
 
+ - 🏆 #1 on CodeTrans-DL (code translation between frameworks)
+ - 🥇 #4 on CodeSearchNet-Python (natural language to code search)
+ - ⚡ 494M parameters, 896-dim embeddings
+ - 🔄 Bidirectional attention (converted from causal LLM)
+ - 🎯 Mean pooling with L2 normalization
+ - 📏 Trained at 512 tokens, extrapolates to longer sequences via RoPE
 
 ## Model Details
 
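The "mean pooling with L2 normalization" named in the updated highlights can be sketched as follows. This is an illustrative NumPy sketch, not the model's actual code; the array names and shapes are assumptions:

```python
import numpy as np

def mean_pool(last_hidden: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over non-padding positions, then L2-normalize.

    last_hidden:    (batch, seq_len, hidden) token embeddings
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(float)    # (batch, seq_len, 1)
    summed = (last_hidden * mask).sum(axis=1)         # ignore padding positions
    counts = np.clip(mask.sum(axis=1), 1e-9, None)    # avoid divide-by-zero
    pooled = summed / counts                          # mean over real tokens only
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-9, None)        # unit-length embeddings
```

Because the mask zeroes out padding before averaging, the same code handles any input length, which is what makes mean pooling robust for the variable-length extrapolation mentioned above.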
@@ -73,48 +73,20 @@ model-index:
 
 ## Benchmark Results (CoIR)
 
+ Evaluated on the [CoIR Benchmark](https://github.com/CoIR-team/coir) (NDCG@10). Sorted by CSN-Python.
+
+ | Model | Params | CSN-Python | CodeTrans-DL | Text2SQL | SO-QA | CF-ST | Apps |
+ |-------|--------|------------|--------------|----------|-------|-------|------|
+ | SFR-Embedding-Code | 400M | 0.9505 | 0.2683 | 0.9949 | 0.9107 | 0.7258 | 0.2212 |
+ | Jina-Code-v2 | 161M | 0.9439 | 0.2739 | 0.5169 | 0.8874 | 0.6975 | 0.1538 |
+ | CodeRankEmbed | 137M | 0.9378 | 0.2604 | 0.7686 | 0.8990 | 0.7166 | 0.1993 |
+ | **CodeCompass-Embed** | **494M** | **0.9228** | **0.3305** | **0.5673** | **0.6480** | **0.4080** | **0.1277** |
+ | Snowflake-Arctic-Embed-L | 568M | 0.9146 | 0.1958 | 0.5401 | 0.8718 | 0.6503 | 0.1435 |
+ | BGE-M3 | 568M | 0.8976 | 0.2194 | 0.5728 | 0.8501 | 0.6437 | 0.1445 |
+ | BGE-Base-en-v1.5 | 109M | 0.8944 | 0.2125 | 0.5265 | 0.8581 | 0.6423 | 0.1415 |
+ | CodeT5+-110M | 110M | 0.8702 | 0.1794 | 0.3275 | 0.8147 | 0.5804 | 0.1179 |
+
+ *CodeCompass-Embed ranks #1 on CodeTrans-DL and #4 on CSN-Python.*
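Every score in the benchmark table is NDCG@10. For readers unfamiliar with the metric, a minimal single-query implementation (binary relevance, standard log2 discount; illustrative only, not the CoIR evaluation code):

```python
import math

def ndcg_at_k(relevances, k=10):
    """NDCG@k for one ranked result list.

    `relevances` holds the gain of each retrieved item in rank order,
    e.g. 1 for the gold document and 0 otherwise. The score is the DCG of
    the actual ranking divided by the DCG of the ideal (sorted) ranking.
    """
    def dcg(rels):
        # rank 0 is discounted by log2(2)=1, rank 1 by log2(3), etc.
        return sum(r / math.log2(rank + 2) for rank, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A perfect ranking scores 1.0; placing the gold document lower in the top-10 discounts the score logarithmically, which is why small ordering differences separate the closely packed CSN-Python column.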
 
 ## Usage
 
 
@@ -191,16 +163,16 @@ For optimal performance, use these instruction prefixes for queries:
 - **Base Model**: Qwen2.5-Coder-0.5B
 - **Training Data**: 8.8M samples from CoRNStack, StackOverflow, CodeSearchNet
 - **Architecture Modification**: Converted all 24 attention layers from causal to bidirectional
+ - **Pooling**: Mean pooling
 - **Loss**: InfoNCE with temperature τ=0.05
 - **Hard Negatives**: 7 per sample (embedding-mined)
 - **Effective Batch Size**: 1024 (via GradCache)
+ - **Training Steps**: 950
+ - **Hardware**: NVIDIA H100
 
 ## Limitations
 
+ - Weaker on Q&A style tasks (StackOverflow-QA, CodeFeedback)
 - Trained primarily on Python/JavaScript/Java/Go/PHP/Ruby
 - May not generalize well to low-resource programming languages
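The loss named in the training details (InfoNCE with temperature τ=0.05 over in-batch and mined hard negatives) has roughly this shape. A minimal NumPy sketch with assumed tensor layouts; the actual training code (GradCache batching, per-sample hard-negative handling) is more involved:

```python
import numpy as np

def info_nce_loss(queries: np.ndarray, docs: np.ndarray, tau: float = 0.05) -> float:
    """InfoNCE over L2-normalized embeddings.

    Row i of `queries` is paired with row i of `docs`; every other doc row
    (an in-batch document or a mined hard negative) serves as a negative
    for query i. Lower temperature sharpens the softmax over similarities.
    """
    logits = queries @ docs.T / tau                   # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = queries.shape[0]
    # negative log-likelihood of each query's positive document
    return float(-log_softmax[np.arange(n), np.arange(n)].mean())
```

The loss is near zero when each query is far more similar to its own document than to any negative, and grows as hard negatives crowd the positive, which is what the embedding-mined negatives are chosen to exploit.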
178