Update README.md
README.md
CHANGED
@@ -85,6 +85,27 @@ _Note:_ Arcee’s internal evals may use different harnesses; avoid cross-harnes
- **MMLU-Pro** increases difficulty (10 options; more reasoning-heavy); small deltas are still meaningful.
- **IFEVAL** checks **verifiable** constraints (length, keyword counts, format, etc.).
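To make "verifiable" concrete, here is a minimal sketch of what programmatic constraint checks can look like. It is not the IFEval implementation; the function names (`check_max_words`, `check_keyword_count`, `check_is_json`) are hypothetical and chosen only for illustration.

```python
import json


def check_max_words(response: str, max_words: int) -> bool:
    """Length constraint: at most `max_words` whitespace-separated words."""
    return len(response.split()) <= max_words


def check_keyword_count(response: str, keyword: str, min_count: int) -> bool:
    """Keyword constraint: `keyword` occurs at least `min_count` times (case-insensitive)."""
    return response.lower().count(keyword.lower()) >= min_count


def check_is_json(response: str) -> bool:
    """Format constraint: the whole response parses as JSON."""
    try:
        json.loads(response)
        return True
    except json.JSONDecodeError:
        return False
```

Each check returns a plain pass/fail without needing a judge model, which is what makes this kind of constraint cheap to score.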

| mmlu                  | AFM-4.5B-OpenMed | AFM-4.5B |
| :-------------------- | :--------------- | :------- |
| **other**             |                  |          |
| clinical_knowledge    | 67.55            | 65.66    |
| college_medicine      | 64.74            | 54.34    |
| professional_medicine | 63.97            | 59.56    |
| virology              | 49.40            | 48.19    |
| **stem**              |                  |          |
| anatomy               | 62.96            | 56.30    |
| college_biology       | 78.47            | 65.97    |
| college_chemistry     | 44.00            | 37.00    |
| high_school_biology   | 79.03            | 71.29    |
| high_school_chemistry | 53.20            | 43.84    |
| **groups**            |                  |          |
| humanities            | 56.13            | 50.46    |
| other                 | 68.97            | 63.47    |
| social sciences       | 73.25            | 68.61    |
| stem                  | 48.91            | 42.53    |

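To read the table above as per-task gaps, the following sketch computes the difference between the two columns for the individual subject rows. The values are copied from the table; the names `SCORES` and `score_deltas` are hypothetical helpers, not part of any evaluation harness.

```python
# (task, AFM-4.5B-OpenMed, AFM-4.5B), copied from the per-subject rows of the table.
SCORES = [
    ("clinical_knowledge", 67.55, 65.66),
    ("college_medicine", 64.74, 54.34),
    ("professional_medicine", 63.97, 59.56),
    ("virology", 49.40, 48.19),
    ("anatomy", 62.96, 56.30),
    ("college_biology", 78.47, 65.97),
    ("college_chemistry", 44.00, 37.00),
    ("high_school_biology", 79.03, 71.29),
    ("high_school_chemistry", 53.20, 43.84),
]


def score_deltas(rows):
    """Return (task, OpenMed minus base) pairs, largest gap first."""
    pairs = [(task, round(tuned - base, 2)) for task, tuned, base in rows]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for task, delta in score_deltas(SCORES):
        print(f"{task:<24}{delta:+.2f}")
```

On these rows, `college_biology` shows the largest gap (12.50 points) and `virology` the smallest (1.21 points).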
### Reproduce (example commands)
```bash