faisalmumtaz committed
Commit 6321f18 · verified · 1 Parent(s): 5a88cb2

Simplify README: single benchmark table, factual highlights

Files changed (1):
  1. README.md +24 -52

README.md CHANGED
@@ -52,12 +52,12 @@ model-index:
 
 ## Model Highlights
 
- - 🏆 **SOTA on CodeTrans-DL**: #1 on code translation benchmark (+20.7% over next best)
- - 🥇 **Top-4 on CodeSearchNet-Python**: NDCG@10 = 0.9228 (competitive with 400M models)
- - ⚡ **Efficient**: 494M parameters, runs on consumer GPUs
- - 🔄 **Bidirectional Attention**: Converted from causal to bidirectional for embedding tasks
- - 📏 **Flexible Context**: Trained at 512 tokens, supports up to 32K via RoPE extrapolation
- - 🎯 **Mean Pooling**: Robust to variable-length inputs
 
 ## Model Details
 
@@ -73,48 +73,20 @@ model-index:
 
 ## Benchmark Results (CoIR)
 
- We evaluate on the [CoIR Benchmark](https://github.com/CoIR-team/coir) (ACL 2025), the gold standard for code retrieval evaluation.
-
- ### 🏆 CodeTrans-DL — State-of-the-Art
-
- CodeCompass-Embed achieves **#1** on CodeTrans-DL (code translation between deep learning frameworks), beating all existing models by **+20.7%**.
-
- | Rank | Model | Params | CodeTrans NDCG@10 |
- |------|-------|--------|-------------------|
- | **🥇 1** | **CodeCompass-Embed (ours)** | **494M** | **0.3305** |
- | 2 | Jina-Code-v2 | 161M | 0.2739 |
- | 3 | SFR-Embedding-Code | 400M | 0.2683 |
- | 4 | CodeRankEmbed | 137M | 0.2604 |
- | 5 | BGE-M3 | 568M | 0.2194 |
- | 6 | BGE-Base-en-v1.5 | 109M | 0.2125 |
- | 7 | Snowflake-Arctic-Embed-L | 568M | 0.1958 |
- | 8 | CodeT5+-110M | 110M | 0.1794 |
-
- ### CodeSearchNet-Python — Top 4
-
- Strong performance on the primary code search benchmark (NL → Code retrieval).
-
- | Rank | Model | Params | CSN-Python NDCG@10 |
- |------|-------|--------|--------------------|
- | 1 | SFR-Embedding-Code | 400M | 0.9505 |
- | 2 | Jina-Code-v2 | 161M | 0.9439 |
- | 3 | CodeRankEmbed | 137M | 0.9378 |
- | **4** | **CodeCompass-Embed (ours)** | **494M** | **0.9228** |
- | 5 | Snowflake-Arctic-Embed-L | 568M | 0.9146 |
- | 6 | BGE-M3 | 568M | 0.8976 |
- | 7 | BGE-Base-en-v1.5 | 109M | 0.8944 |
- | 8 | CodeT5+-110M | 110M | 0.8702 |
-
- ### Full Results (All Tasks)
-
- | Task | NDCG@10 | MRR@10 |
- |------|---------|--------|
- | **codesearchnet-python** | **0.9228** | **0.9106** |
- | stackoverflow-qa | 0.6480 | 0.6156 |
- | synthetic-text2sql | 0.5673 | 0.4853 |
- | codefeedback-st | 0.4080 | 0.3698 |
- | **codetrans-dl** | **0.3305** 🏆 | **0.2161** |
- | apps | 0.1277 | 0.1097 |
 
 ## Usage
 
@@ -191,16 +163,16 @@ For optimal performance, use these instruction prefixes for queries:
 - **Base Model**: Qwen2.5-Coder-0.5B
 - **Training Data**: 8.8M samples from CoRNStack, StackOverflow, CodeSearchNet
 - **Architecture Modification**: Converted all 24 attention layers from causal to bidirectional
- - **Pooling**: Mean pooling (robust for variable-length extrapolation)
 - **Loss**: InfoNCE with temperature τ=0.05
 - **Hard Negatives**: 7 per sample (embedding-mined)
 - **Effective Batch Size**: 1024 (via GradCache)
- - **Training Steps**: 950 (early stopping at best MRR)
- - **Hardware**: NVIDIA H100 (95GB)
 
 ## Limitations
 
- - Optimized for **NL → Code** retrieval; weaker on Q&A style tasks
 - Trained primarily on Python/JavaScript/Java/Go/PHP/Ruby
 - May not generalize well to low-resource programming languages
 
 
@@ -52,12 +52,12 @@ model-index:
 
 ## Model Highlights
 
+ - 🏆 #1 on CodeTrans-DL (code translation between frameworks)
+ - 🥇 #4 on CodeSearchNet-Python (natural language to code search)
+ - ⚡ 494M parameters, 896-dim embeddings
+ - 🔄 Bidirectional attention (converted from causal LLM)
+ - 🎯 Mean pooling with L2 normalization
+ - 📏 Trained at 512 tokens, extrapolates to longer sequences via RoPE
 
 ## Model Details
 
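The "mean pooling with L2 normalization" named in the updated highlights can be sketched as follows. This is an illustrative NumPy sketch, not the model's actual code; the array names and shapes are assumptions:

```python
import numpy as np

def mean_pool(last_hidden: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors over non-padding positions, then L2-normalize.

    last_hidden:    (batch, seq_len, hidden) token embeddings
    attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(float)    # (batch, seq_len, 1)
    summed = (last_hidden * mask).sum(axis=1)         # ignore padding positions
    counts = np.clip(mask.sum(axis=1), 1e-9, None)    # avoid divide-by-zero
    pooled = summed / counts                          # mean over real tokens only
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-9, None)        # unit-length embeddings
```

Because the mask zeroes out padding before averaging, the same code handles any input length, which is what makes mean pooling robust for the variable-length extrapolation mentioned above.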
@@ -73,48 +73,20 @@ model-index:
 
 ## Benchmark Results (CoIR)
 
+ Evaluated on the [CoIR Benchmark](https://github.com/CoIR-team/coir) (NDCG@10). Sorted by CSN-Python.
+
+ | Model | Params | CSN-Python | CodeTrans-DL | Text2SQL | SO-QA | CF-ST | Apps |
+ |-------|--------|------------|--------------|----------|-------|-------|------|
+ | SFR-Embedding-Code | 400M | 0.9505 | 0.2683 | 0.9949 | 0.9107 | 0.7258 | 0.2212 |
+ | Jina-Code-v2 | 161M | 0.9439 | 0.2739 | 0.5169 | 0.8874 | 0.6975 | 0.1538 |
+ | CodeRankEmbed | 137M | 0.9378 | 0.2604 | 0.7686 | 0.8990 | 0.7166 | 0.1993 |
+ | **CodeCompass-Embed** | **494M** | **0.9228** | **0.3305** | **0.5673** | **0.6480** | **0.4080** | **0.1277** |
+ | Snowflake-Arctic-Embed-L | 568M | 0.9146 | 0.1958 | 0.5401 | 0.8718 | 0.6503 | 0.1435 |
+ | BGE-M3 | 568M | 0.8976 | 0.2194 | 0.5728 | 0.8501 | 0.6437 | 0.1445 |
+ | BGE-Base-en-v1.5 | 109M | 0.8944 | 0.2125 | 0.5265 | 0.8581 | 0.6423 | 0.1415 |
+ | CodeT5+-110M | 110M | 0.8702 | 0.1794 | 0.3275 | 0.8147 | 0.5804 | 0.1179 |
+
+ *CodeCompass-Embed ranks #1 on CodeTrans-DL and #4 on CSN-Python.*
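Every score in the benchmark table is NDCG@10. For readers unfamiliar with the metric, a minimal single-query implementation (binary relevance, standard log2 discount; illustrative only, not the CoIR evaluation code):

```python
import math

def ndcg_at_k(relevances, k=10):
    """NDCG@k for one ranked result list.

    `relevances` holds the gain of each retrieved item in rank order,
    e.g. 1 for the gold document and 0 otherwise. The score is the DCG of
    the actual ranking divided by the DCG of the ideal (sorted) ranking.
    """
    def dcg(rels):
        # rank 0 is discounted by log2(2)=1, rank 1 by log2(3), etc.
        return sum(r / math.log2(rank + 2) for rank, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A perfect ranking scores 1.0; placing the gold document lower in the top-10 discounts the score logarithmically, which is why small ordering differences separate the closely packed CSN-Python column.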
 
 ## Usage
 
 
@@ -191,16 +163,16 @@ For optimal performance, use these instruction prefixes for queries:
 - **Base Model**: Qwen2.5-Coder-0.5B
 - **Training Data**: 8.8M samples from CoRNStack, StackOverflow, CodeSearchNet
 - **Architecture Modification**: Converted all 24 attention layers from causal to bidirectional
+ - **Pooling**: Mean pooling
 - **Loss**: InfoNCE with temperature τ=0.05
 - **Hard Negatives**: 7 per sample (embedding-mined)
 - **Effective Batch Size**: 1024 (via GradCache)
+ - **Training Steps**: 950
+ - **Hardware**: NVIDIA H100
 
 ## Limitations
 
+ - Weaker on Q&A style tasks (StackOverflow-QA, CodeFeedback)
 - Trained primarily on Python/JavaScript/Java/Go/PHP/Ruby
 - May not generalize well to low-resource programming languages
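The loss named in the training details (InfoNCE with temperature τ=0.05 over in-batch and mined hard negatives) has roughly this shape. A minimal NumPy sketch with assumed tensor layouts; the actual training code (GradCache batching, per-sample hard-negative handling) is more involved:

```python
import numpy as np

def info_nce_loss(queries: np.ndarray, docs: np.ndarray, tau: float = 0.05) -> float:
    """InfoNCE over L2-normalized embeddings.

    Row i of `queries` is paired with row i of `docs`; every other doc row
    (an in-batch document or a mined hard negative) serves as a negative
    for query i. Lower temperature sharpens the softmax over similarities.
    """
    logits = queries @ docs.T / tau                   # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    n = queries.shape[0]
    # negative log-likelihood of each query's positive document
    return float(-log_softmax[np.arange(n), np.arange(n)].mean())
```

The loss is near zero when each query is far more similar to its own document than to any negative, and grows as hard negatives crowd the positive, which is what the embedding-mined negatives are chosen to exploit.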
178