zechen-nlp committed on
Commit c8a5a32 · verified · 1 parent: 3cc0033

Update Automated MNLP evaluation report (2026-05-03)

Files changed (1)
  1. EVAL_REPORT.md +1 -12
EVAL_REPORT.md CHANGED
@@ -2,7 +2,7 @@
 
  - **Model repo:** [`cs-552-2026-taadmin/math_model`](https://huggingface.co/cs-552-2026-taadmin/math_model)
  - **Owner(s):** group **taadmin**
- - **Generated at:** 2026-05-03T02:10:11+00:00 (UTC)
+ - **Generated at:** 2026-05-03T02:20:59+00:00 (UTC)
  - **Pipeline:** [mnlp-project-ci](https://github.com/eric11eca/mnlp-project-ci)
 
  _This PR is opened automatically by the course CI. It is **non-blocking** — you do not need to merge it. The next nightly run will refresh this file._
@@ -16,17 +16,6 @@ _This PR is opened automatically by the course CI. It is **non-blocking** — yo
  | Multilingual | `pass@1` | — | — | not run |
  | Safety | `pass@1` | — | — | not run |
 
- ## Per-source breakdown
-
- ### Math
-
- | Source | Metric | Accuracy | # problems |
- |---|---|---:|---:|
- | `HuggingFaceH4/MATH-500` | `pass@8` | 0.6000 | 5 |
- | `MathArena/aime_2025` | `pass@8` | 0.0000 | 2 |
- | `MathArena/aime_2026` | `pass@8` | 0.0000 | 1 |
- | `MathArena/hmmt_feb_2026` | `pass@8` | 0.0000 | 2 |
-
  ## Sample completions
 
  ### Math
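The report scores the math sources with `pass@8` and the other tracks with `pass@1`, but this commit does not show how the CI computes those numbers. As a hedged illustration only, the sketch assumes the commonly used unbiased pass@k estimator from the Codex paper (Chen et al., 2021): for one problem with n generated samples of which c are judged correct, pass@k = 1 - C(n-c, k) / C(n, k), and a source-level score is the mean over its problems. The helper names `pass_at_k` and `mean_pass_at_k` are hypothetical, not part of mnlp-project-ci.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (assumed estimator).

    n: number of samples generated for the problem
    c: number of those samples judged correct
    k: sampling budget being scored (e.g. 1 or 8)
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    if n - c < k:
        # Too few incorrect samples to fill a size-k draw, so every
        # size-k subset contains at least one correct sample.
        return 1.0
    # Probability that a random size-k subset contains no correct sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average per-problem pass@k over (n, c) pairs for one source."""
    if not results:
        raise ValueError("need at least one problem")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical per-problem counts, 8 samples each.
    counts = [(8, 1), (8, 0), (8, 3), (8, 0), (8, 8)]
    print(mean_pass_at_k(counts, k=8))
```

When n = k = 8, each problem contributes 1 if any of its 8 samples is correct and 0 otherwise, so the source-level score is simply the fraction of problems solved at least once (three of five problems, for instance, gives 0.6). The combinatorial form matters only when more samples are drawn than the budget k being reported.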