Update README.md
README.md
CHANGED
@@ -207,7 +207,7 @@ This approach enables Alpie Core to deliver reliable, aligned, and context-aware
 | MBPP (pass@1) | **75.20%** | 65.0% | 72.6% | 68.4% | - | 65.6% | 69.64% |
 | HumanEval (pass@1) | **57.23%** | 43.3% | 53.0% | 54.9% | - | 48.8% | - |
 
-### SWE-Bench Verified Performance
+### SWE-Bench Verified Performance
 
 | Rank | Model | Accuracy (%) | vs Alpie |
 |------|-------|-------------|----------|
@@ -219,7 +219,7 @@ This approach enables Alpie Core to deliver reliable, aligned, and context-aware
 | 6 | DeepSeek R1 | 49.2 | -8.6% |
 | 7 | Devstral | 46.8 | -11.0% |
 
-### Humanity's Last Exam Leaderboard
+### Humanity's Last Exam Leaderboard
 
 | Rank | Model | Accuracy (%) | vs Alpie |
 |------|-------|-------------|----------|