zechen-nlp committed on
Commit f3db357 · verified · 1 Parent(s): 1153fe8

Update Automated MNLP evaluation report (2026-05-27)

Files changed (1)
  1. EVAL_REPORT.md +5 -5
EVAL_REPORT.md CHANGED
@@ -2,7 +2,7 @@

  - **Model repo:** [`cs-552-2026-thinkinsidethebox/general_knowledge_model`](https://huggingface.co/cs-552-2026-thinkinsidethebox/general_knowledge_model)
  - **Owner(s):** group **thinkinsidethebox**
- - **Generated at:** 2026-05-26T12:29:10+00:00 (UTC)
+ - **Generated at:** 2026-05-27T13:02:22+00:00 (UTC)
  - **Pipeline:** [mnlp-project-ci](https://github.com/eric11eca/mnlp-project-ci)

  _This PR is opened automatically by the course CI. It is **non-blocking**; you do not need to merge it. The next nightly run will refresh this file._
@@ -12,7 +12,7 @@ _This PR is opened automatically by the course CI. It is **non-blocking**; yo
  | Benchmark | Accuracy | Status |
  |---|---:|---|
  | Math | - | not run |
- | Knowledge | 0.3300 | ok |
+ | Knowledge | 0.3400 | ok |
  | Multilingual | - | not run |
  | Safety | - | not run |

@@ -39,9 +39,9 @@ _Prompts are intentionally omitted to avoid revealing benchmark contents. For mu

  **Incorrect** (1 shown)

- - **reference**: `A`
+ - **reference**: `F`
  - **overall** (0/1 completions correct)
- - **extracted** (✗): `I`
+ - **extracted** (✗): `C`
  - **completion**:

  ```text
@@ -49,5 +49,5 @@ _Prompts are intentionally omitted to avoid revealing benchmark contents. For mu

  </think>

- \boxed{I}
+ \boxed{C}
  ```
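
The report pairs each completion with an **extracted** answer and a **reference** letter, which suggests the answer is read from a `\boxed{...}` marker and compared with the reference. The pipeline's actual extraction code is not shown here, so the following is only a minimal sketch of how such scoring might work. The names `extract_boxed` and `accuracy` are hypothetical, and taking the last `\boxed{...}` in the completion is an assumption.

```python
import re
from typing import Optional, Sequence

# Assumption: answers appear as \boxed{X} with no nested braces.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_boxed(completion: str) -> Optional[str]:
    # Return the content of the last boxed answer, or None if there is none.
    matches = BOXED.findall(completion)
    return matches[-1].strip() if matches else None


def accuracy(completions: Sequence[str], references: Sequence[str]) -> float:
    # Fraction of completions whose extracted answer equals the reference.
    if len(completions) != len(references) or not references:
        raise ValueError("need equally sized, non-empty inputs")
    correct = sum(
        extract_boxed(c) == r for c, r in zip(completions, references)
    )
    return correct / len(references)


if __name__ == "__main__":
    sample = "</think>\n\n\\boxed{C}"
    print(extract_boxed(sample))       # extracted answer for the sample
    print(accuracy([sample], ["F"]))   # the sample is scored against F
```

Under these assumptions, the sample shown in the report (extracted `C`, reference `F`) counts as incorrect, consistent with the "0/1 completions correct" line.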