Automated MNLP evaluation report (2026-05-26)

#1
by zechen-nlp - opened
Files changed (1)
  1. EVAL_REPORT.md +60 -0
EVAL_REPORT.md ADDED
@@ -0,0 +1,60 @@
+ # Automated MNLP evaluation report
+
+ - **Model repo:** [`cs-552-2026-mvte/general_knowledge_model`](https://huggingface.co/cs-552-2026-mvte/general_knowledge_model)
+ - **Owner(s):** group **mvte**
+ - **Generated at:** 2026-05-26T12:29:10+00:00 (UTC)
+ - **Pipeline:** [mnlp-project-ci](https://github.com/eric11eca/mnlp-project-ci)
+
+ _This PR is opened automatically by the course CI. It is **non-blocking**: you do not need to merge it. The next nightly run will refresh this file._
+
+ ## Summary
+
+ | Benchmark | Accuracy | Status |
+ |---|---:|---|
+ | Math | — | not run |
+ | Knowledge | 0.3300 | ok |
+ | Multilingual | — | not run |
+ | Safety | — | not run |
+
+ ## Sample completions
+
+ _Prompts are intentionally omitted to avoid revealing benchmark contents. For multi-completion problems, only one completion is shown per sample._
+
+ ### Knowledge
+
+ **Correct** (1 shown)
+
+ - **reference**: `C`
+ - **overall** (1/1 completions correct)
+ - **extracted** (✓): `C`
+ - **completion**:
+
+ ```text
+ To determine why the program outputs 0 when adding 5 and 3 using 3-bit integers, we need to consider how integers are represented in binary and how addition works in such a system.
+
+ - **5 in binary is 101** (3 bits).
+ - **3 in binary is 011** (3 bits).
+ - Adding these:
+ $$
+ 101 \text{ (5)} + 011 \text{ (3)} = 1000 \text{ (8)}
+ $$
+ However, since we're only using 3 bits, the result would be **000**, which is 0.
+
+ This is due to **overflow** — when the result of an operation exceeds the number of bits available to represent it, the value wraps around to 0.
+
+ ### Correct Answer:
+ \boxed{C}
+ ```
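
The arithmetic in the completion above can be checked with a small sketch. It assumes an unsigned 3-bit register, so a sum is kept modulo 2^3 = 8. The helper name `add_nbit` is hypothetical and is not part of the CI pipeline.

```python
def add_nbit(a: int, b: int, bits: int = 3) -> int:
    """Add two unsigned integers and keep only the lowest `bits` bits.

    Assumption: unsigned wraparound arithmetic, as in the sample completion.
    """
    mask = (1 << bits) - 1  # 0b111 when bits == 3
    return (a + b) & mask


print(format(5 + 3, "b"))  # full-width sum in binary: 1000
print(add_nbit(5, 3))      # lowest 3 bits of 1000 are 000, so this prints 0
print(add_nbit(5, 4))      # 9 modulo 8, so this prints 1
```

The wrapped value is the sum modulo 8, so the result is 0 here only because 5 + 3 equals exactly 8; other overflowing sums wrap to non-zero values.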
+
+ **Incorrect** (1 shown)
+
+ - **reference**: `G`
+ - **overall** (0/1 completions correct)
+ - **extracted** (✗): `J`
+ - **completion**:
+
+ ```text
+ In the Thomas and Chess model of temperament, the child's behavior is described based on two dimensions: **easy** and **difficult**. The child in question is **passive** (quiet, does not initiate activities) and **fearful** (holds tightly to her mother's hand, shows anxiety).
+
+ The correct answer is: \boxed{J}
+ ```
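
Each sample above pairs an `extracted` letter with a `reference` letter. The report does not say how the pipeline performs the extraction, so the following is only a minimal sketch, assuming the last `\boxed{...}` letter in a completion is taken as the answer. `extract_boxed` and `is_correct` are hypothetical names, not functions from mnlp-project-ci.

```python
import re
from typing import Optional

# Assumption: answers are single capital letters wrapped in \boxed{...}.
BOXED = re.compile(r"\\boxed\{([A-Z])\}")


def extract_boxed(completion: str) -> Optional[str]:
    """Return the last boxed letter in the completion, or None if absent."""
    matches = BOXED.findall(completion)
    return matches[-1] if matches else None


def is_correct(completion: str, reference: str) -> bool:
    """Compare the extracted letter with the reference letter."""
    return extract_boxed(completion) == reference


print(is_correct(r"### Correct Answer: \boxed{C}", "C"))     # True
print(is_correct(r"The correct answer is: \boxed{J}", "G"))  # False
```

Under this assumption, a completion with no boxed letter yields `None` and is counted as incorrect.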