Update README.md
README.md CHANGED
@@ -63,7 +63,7 @@ Debugged vibecoder dataset
 
 | Tasks | Version | Filter | n-shot | Metric | Vcoder-120B | gpt-oss-120 | DeepSeek-V3.2-Exp |
 |---------------------|---------|------------------|--------|------------|-------------|------------ |-------------------|
-| gsm8k (cot) | 3 | flexible-extract | 5 | exact_match ↑ | 0.
+| gsm8k (cot) | 3 | flexible-extract | 5 | exact_match ↑ | 0.9557 | 0.88 | - |
 - Benchmark used [The Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/main)
 
 **Notes:**