zechen-nlp committed on
Commit b769af3 · verified · 1 Parent(s): cce7953

Update Automated MNLP evaluation report (2026-06-04)

Files changed (1)
  1. EVAL_REPORT.md +3 -3
EVAL_REPORT.md CHANGED
@@ -2,7 +2,7 @@
 
  - **Model repo:** [`cs-552-2026-the-transformers/general_knowledge_model`](https://huggingface.co/cs-552-2026-the-transformers/general_knowledge_model)
  - **Owner(s):** group **the-transformers**
- - **Generated at:** 2026-06-03T00:17:23+00:00 (UTC)
+ - **Generated at:** 2026-06-04T22:19:52+00:00 (UTC)
  - **Pipeline:** [mnlp-project-ci](https://github.com/eric11eca/mnlp-project-ci)
 
  _This PR is opened automatically by the course CI. It is **non-blocking** — you do not need to merge it. The next nightly run will refresh this file._
@@ -12,7 +12,7 @@ _This PR is opened automatically by the course CI. It is **non-blocking** — yo
  | Benchmark | Accuracy | Status |
  |---|---:|---|
  | Math | — | not run |
- | Knowledge | 0.2500 | ok |
+ | Knowledge | 0.2400 | ok |
  | Multilingual | — | not run |
  | Safety | — | not run |
 
@@ -46,7 +46,7 @@ _Prompts are intentionally omitted to avoid revealing benchmark contents. For mu
 
  ```text
  <think>
- This engineering question. Option C is the correct answer.
+ This engineering question. Option D is the correct answer.
  </think>
 
  \boxed{B}