Update README.md
README.md
CHANGED
@@ -64,6 +64,7 @@ Debugged vibecoder dataset
 | Tasks | Version | Filter | n-shot | Metric | Vcoder-120B | gpt-oss-120 | DeepSeek-V3.2-Exp |
 |---------------------|---------|------------------|--------|------------|-------------|------------ |-------------------|
 | gsm8k (cot) | 3 | flexible-extract | 5 | exact_match ↑ | 0.88 | 0.88 | - |
+- Benchmark used [The Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/main)
 
 **Notes:**
 - The `(+value)` indicates delta over baseline evaluation.
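The table row reports an `exact_match` score under a `flexible-extract` filter. As a rough illustration of what such a score measures, here is a minimal sketch that assumes a simplified rule: take the last number that appears in each model answer and compare it to the reference string. The helpers `flexible_extract` and `exact_match` are hypothetical names for this example and are not the harness's own implementation, whose filter may differ in detail.

```python
import re
from typing import List, Optional

# Assumption: a simplified "last number in the text" rule, chosen for illustration only.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def flexible_extract(text: str) -> Optional[str]:
    """Return the last number found in text, with thousands separators removed."""
    matches = NUMBER.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def exact_match(predictions: List[str], targets: List[str]) -> float:
    """Fraction of predictions whose extracted number equals the target string."""
    if not targets:
        return 0.0
    hits = sum(flexible_extract(p) == t for p, t in zip(predictions, targets))
    return hits / len(targets)


if __name__ == "__main__":
    preds = ["She has 3 + 4 = 7 apples.", "So the answer is 10."]
    refs = ["7", "12"]
    print(exact_match(preds, refs))
```

Under this reading, a score of 0.88 would mean the extracted answer matched the reference on 88% of the evaluated problems.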