Simplify README: single benchmark table, factual highlights
README.md CHANGED

@@ -52,12 +52,12 @@ model-index:

 ## Model Highlights

-- 🏆 …
-- 🥈 …
-- ⚡ …
-- 🔄 …
-- …
-- …
+- 🏆 #1 on CodeTrans-DL (code translation between frameworks)
+- 🥈 #4 on CodeSearchNet-Python (natural language to code search)
+- ⚡ 494M parameters, 896-dim embeddings
+- 🔄 Bidirectional attention (converted from causal LLM)
+- 🎯 Mean pooling with L2 normalization
+- 📏 Trained at 512 tokens, extrapolates to longer sequences via RoPE

 ## Model Details

@@ -73,48 +73,20 @@ model-index:

 ## Benchmark Results (CoIR)

-…
-| 7 | Snowflake-Arctic-Embed-L | 568M | 0.1958 |
-| 8 | CodeT5+-110M | 110M | 0.1794 |
-
-### CodeSearchNet-Python – Top 4
-
-Strong performance on the primary code search benchmark (NL → Code retrieval).
-
-| Rank | Model | Params | CSN-Python NDCG@10 |
-|------|-------|--------|-------------------|
-| 1 | SFR-Embedding-Code | 400M | 0.9505 |
-| 2 | Jina-Code-v2 | 161M | 0.9439 |
-| 3 | CodeRankEmbed | 137M | 0.9378 |
-| **4** | **CodeCompass-Embed (ours)** | **494M** | **0.9228** |
-| 5 | Snowflake-Arctic-Embed-L | 568M | 0.9146 |
-| 6 | BGE-M3 | 568M | 0.8976 |
-| 7 | BGE-Base-en-v1.5 | 109M | 0.8944 |
-| 8 | CodeT5+-110M | 110M | 0.8702 |
-
-### Full Results (All Tasks)
-
-| Task | NDCG@10 | MRR@10 |
-|------|---------|--------|
-| **codesearchnet-python** | **0.9228** | **0.9106** |
-| stackoverflow-qa | 0.6480 | 0.6156 |
-| synthetic-text2sql | 0.5673 | 0.4853 |
-| codefeedback-st | 0.4080 | 0.3698 |
-| **codetrans-dl** | **0.3305** 🏆 | **0.2161** |
-| apps | 0.1277 | 0.1097 |
+Evaluated on the [CoIR Benchmark](https://github.com/CoIR-team/coir) (NDCG@10). Sorted by CSN-Python.
+
+| Model | Params | CSN-Python | CodeTrans-DL | Text2SQL | SO-QA | CF-ST | Apps |
+|-------|--------|------------|--------------|----------|-------|-------|------|
+| SFR-Embedding-Code | 400M | 0.9505 | 0.2683 | 0.9949 | 0.9107 | 0.7258 | 0.2212 |
+| Jina-Code-v2 | 161M | 0.9439 | 0.2739 | 0.5169 | 0.8874 | 0.6975 | 0.1538 |
+| CodeRankEmbed | 137M | 0.9378 | 0.2604 | 0.7686 | 0.8990 | 0.7166 | 0.1993 |
+| **CodeCompass-Embed** | **494M** | **0.9228** | **0.3305** | **0.5673** | **0.6480** | **0.4080** | **0.1277** |
+| Snowflake-Arctic-Embed-L | 568M | 0.9146 | 0.1958 | 0.5401 | 0.8718 | 0.6503 | 0.1435 |
+| BGE-M3 | 568M | 0.8976 | 0.2194 | 0.5728 | 0.8501 | 0.6437 | 0.1445 |
+| BGE-Base-en-v1.5 | 109M | 0.8944 | 0.2125 | 0.5265 | 0.8581 | 0.6423 | 0.1415 |
+| CodeT5+-110M | 110M | 0.8702 | 0.1794 | 0.3275 | 0.8147 | 0.5804 | 0.1179 |
+
+*CodeCompass-Embed ranks #1 on CodeTrans-DL and #4 on CSN-Python.*

 ## Usage

@@ -191,16 +163,16 @@ For optimal performance, use these instruction prefixes for queries:
 - **Base Model**: Qwen2.5-Coder-0.5B
 - **Training Data**: 8.8M samples from CoRNStack, StackOverflow, CodeSearchNet
 - **Architecture Modification**: Converted all 24 attention layers from causal to bidirectional
-- **Pooling**: Mean pooling
+- **Pooling**: Mean pooling
 - **Loss**: InfoNCE with temperature τ=0.05
 - **Hard Negatives**: 7 per sample (embedding-mined)
 - **Effective Batch Size**: 1024 (via GradCache)
-- **Training Steps**: 950
+- **Training Steps**: 950
-- **Hardware**: NVIDIA H100
+- **Hardware**: NVIDIA H100

 ## Limitations

-- …
+- Weaker on Q&A style tasks (StackOverflow-QA, CodeFeedback)
 - Trained primarily on Python/JavaScript/Java/Go/PHP/Ruby
 - May not generalize well to low-resource programming languages
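The "Mean pooling with L2 normalization" step named in the new Model Highlights can be sketched in plain Python. This is a toy illustration over list-of-lists token embeddings, not the model's actual implementation; the helper names are made up for the example:

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, counting only non-padding positions."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    n = 0
    for vec, mask in zip(token_embeddings, attention_mask):
        if mask:  # 1 = real token, 0 = padding
            n += 1
            for i, x in enumerate(vec):
                total[i] += x
    return [x / n for x in total]

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

# Toy example: three token vectors, the last one is padding.
tokens = [[1.0, 0.0], [0.0, 1.0], [9.9, 9.9]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)   # [0.5, 0.5], padding ignored
embedding = l2_normalize(pooled)   # unit-length embedding
```

With unit-length embeddings, a plain dot product equals cosine similarity, so retrieval rankings (and hence NDCG@10) can be computed directly from dot products.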
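The training recipe lists InfoNCE with temperature τ=0.05 and 7 embedding-mined hard negatives per sample. A minimal single-query sketch of that loss in pure Python follows; the real training code operates on batches (via GradCache) rather than one query at a time, and the function name here is an assumption for the example:

```python
import math

def info_nce(query, positive, negatives, tau=0.05):
    """InfoNCE: cross-entropy of the positive against positive + negatives,
    with similarities divided by temperature tau (lower tau sharpens)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    logits = [dot(query, positive) / tau] + [dot(query, n) / tau for n in negatives]
    # Numerically stable log-softmax of the positive (index 0).
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]  # -log p(positive)
```

The loss shrinks when the query is closer to its positive than to the negatives; at τ=0.05 a similarity gap of 0.1 already translates to a logit gap of 2.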
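The architecture modification, converting all 24 causal attention layers to bidirectional, amounts to replacing the lower-triangular attention mask with a full one, so every token can attend to tokens after it as well as before. Schematically, with toy 0/1 masks rather than the actual Qwen2.5 code:

```python
def causal_mask(seq_len):
    """Lower-triangular mask: position i may attend only to j <= i (decoder-style)."""
    return [[1 if j <= i else 0 for j in range(seq_len)] for i in range(seq_len)]

def bidirectional_mask(seq_len):
    """Full mask: every position attends to every other (BERT-style encoder)."""
    return [[1] * seq_len for _ in range(seq_len)]
```

This is why mean pooling works here: with bidirectional attention every token's final hidden state can incorporate the whole sequence, not just its prefix.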