Automated MNLP evaluation report (2026-05-03)

#1
Files changed (1)
  1. EVAL_REPORT.md +114 -0
EVAL_REPORT.md ADDED
@@ -0,0 +1,114 @@
+ # Automated MNLP evaluation report
+
+ - **Model repo:** [`cs-552-2026-taadmin/general_knowledge_model`](https://huggingface.co/cs-552-2026-taadmin/general_knowledge_model)
+ - **Owner(s):** group **taadmin**
+ - **Generated at:** 2026-05-03T02:20:59+00:00 (UTC)
+ - **Pipeline:** [mnlp-project-ci](https://github.com/eric11eca/mnlp-project-ci)
+
+ _This PR is opened automatically by the course CI. It is **non-blocking** — you do not need to merge it. The next nightly run will refresh this file._
+
+ ## Summary
+
+ | Benchmark | Metric | Accuracy | # problems | Status |
+ |---|---|---:|---:|---|
+ | Math | `pass@8` | — | — | not run |
+ | Knowledge | `pass@1` | 0.1500 | 20 | ok |
+ | Multilingual | `pass@1` | — | — | not run |
+ | Safety | `pass@1` | — | — | not run |
+
+ ## Sample completions
+
+ ### Knowledge
+
+ **Correct** (2 shown)
+
+ - **prompt** (truncated): `Suppose a spin-half particle is prepared in the state \(0.5|\uparrow\rangle+\frac{\sqrt{3}}{2}|\downarrow\rangle\). Given that \(|\uparrow\rangle\) and \(|\downarrow\rangle\) are eigenstates of \(\sigma_z\), determine the expectation value, rounded to one decimal place, of \(10\sigma_z+5\sigma_x\). Here, the symbols have their usual meanings.
+
+ Choices:
+ A. 1.65
+ B. -0.7
+ C. 0.85
+ D. -1.4`
+ - **reference**: `B`
+ - **extracted answers** (1/1 correct, ✓):
+ - `B`
+ - **raw completions**:
+ - completion 1:
+ ```text
+ <think>
+ Okay, so I need to find the expectation value of the operator $10\sigma_z + 5\sigma_x$ for a spin-half particle in the state $0.5|\uparrow\rangle + \frac{\sqrt{3}}{2}|\downarrow\rangle$. Let me start by recalling some basic concepts about quantum mechanics and spin operators.
+
+ First, the Pauli matrices $\sigma_z$ and $\sigma_x$ are the matrices for the spin operators in the z and x directions, respectively. For a spin-1/2 particle, the eigenstates of $\sigma_z$ are $|\uparrow\rangle$ and $|\downarrow\rangle$, which are the basis states. The expectation value of an operator is calculated by taking the inner product of the state with the operator multiplied by the state, right?
+
+ So, th...
+ ```
+
+ - **prompt** (truncated): `Which one of the following statements regarding the molecular biology of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS‑CoV‑2) is not correct?
+
+ Choices:
+ A. Programmed ribosomal frameshifting creates two polyproteins near to 5` end of the genome by moving back by 1 nucleotide with the help of slippery nucleotides, and pseudoknot. The SARS-CoV-2 programmed ribosomal frameshifting mostly has t...`
+ - **reference**: `C`
+ - **extracted answers** (1/1 correct, ✓):
+ - `C`
+ - **raw completions**:
+ - completion 1:
+ ```text
+ <think>
+ Okay, let's tackle this question. So, the task is to find which statement about the molecular biology of SARS-CoV-2 is not correct. The options are A, B, C, D. Let me go through each one step by step.
+
+ Starting with option A: It talks about programmed ribosomal frameshifting in SARS-CoV-2. The statement says that it creates two polyproteins near the 5' end by moving back by 1 nucleotide with slippery nucleotides and pseudoknots. Also, it mentions that the SARS-CoV-2 frameshifting is similar to SARS-CoV. I remember that both SARS-CoV and SARS-CoV-2 use similar mechanisms for frameshifting, like the -1 frameshift signals. So this seems correct. So A might be correct.
+
+ Option B: It says...
+ ```
+
+ **Incorrect** (2 shown)
+
+ - **prompt** (truncated): `An unknown organic compound has been synthesized, and its 1H NMR spectrum gave the following data:
+ 1H NMR: chemical reference (ppm): 7.0 (1H, d, J = 16.0 Hz), 5.5 (1H, dq), 2.1 (3H, s), 1.6 (3H, d)
+ Determine which of the following compounds it is: Cis-propenyl acetate, Trans-propenyl acetate, Cis-butenyl acetate, or Trans-butenyl acetate.
+
+ Choices:
+ A. Cis-propenyl acetate
+ B. Cis-butenyl acetate
+ C....`
+ - **reference**: `D`
+ - **extracted answers** (0/1 correct, ✗):
+ - `<no answer>`
+ - **raw completions**:
+ - completion 1:
+ ```text
+ <think>
+ Okay, let's try to figure out which compound this is. The NMR data given is for an unknown organic compound with the following peaks:
+
+ - 7.0 (1H, d, J=16.0 Hz)
+ - 5.5 (1H, dq)
+ - 2.1 (3H, s)
+ - 1.6 (3H, d)
+
+ First, I need to recall what each of these NMR peaks represents. The 2.1 ppm peak is a singlet at 2.1, which is probably a methyl group (since it's a singlet and the rest of the molecule is probably not splitting it). The 1.6 ppm peak is a doublet, but wait, the third peak is 1.6 (3H, d). Wait, the 1.6 is a doublet? Wait, no. Wait, the data is:
+
+ Wait, the user wrote:
+
+ "1H NMR: chemical reference (ppm): 7.0 (1H, d, J = 16.0 Hz), 5.5 (1H, dq), 2.1 (3H, s), 1.6 (3H, d)"
+
+ So the peaks ar...
+ ```
+
+ - **prompt** (truncated): `Which of the following starting material(s) would be used to prepare methyl 2-propyl-1,2,4a,5,6,7,8,8a-octahydronaphthalene-1-carboxylate?
+
+ Choices:
+ A. methyl (2E,4E,10Z)-tetradeca-2,4,10-trienoate
+ B. methyl (2E,8E,10E)-tetradeca-2,8,10-trienoate
+ C. Cyclohexene and methyl 2,3-dimethylenehexanoate
+ D. 1-vinylcyclohex-1-ene and methyl hex-2-ynoate`
+ - **reference**: `B`
+ - **extracted answers** (0/1 correct, ✗):
+ - `<no answer>`
+ - **raw completions**:
+ - completion 1:
+ ```text
+ <think>
+ Okay, let's try to figure out which starting material is used to prepare methyl 2-propyl-1,2,4a,5,6,7,8,8a-octahydronaphthalene-1-carboxylate. Hmm, first, I need to understand the structure of the compound in question.
+
+ The name given is methyl 2-propyl-1,2,4a,5,6,7,8,8a-octahydronaphthalene-1-carboxylate. Let me break that down. The parent ring is naphthalene, which has two fused benzene rings. The numbering is 1,2,4a,5,6,7,8,8a. So, the structure is a naphthalene ring with eight hydrogen atoms, but here it's substituted with certain groups. The carboxylate group is at position 1, and there are methyl and propyl groups on positions 2 and 3. Wait, the name mentions "2-propyl" so may...
+ ```
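
Note on the Summary metrics: the report lists `pass@1` and `pass@8` but does not say how the pipeline computes them. The sketch below uses the widely used unbiased pass@k estimator, on the assumption (not confirmed by the report) that the CI does something equivalent. The function names and the per-problem counts are hypothetical, chosen only to be consistent with the Knowledge row (0.1500 over 20 problems with one completion each, which corresponds to 3 problems answered correctly).

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n sampled completions, c of them correct."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    # 1 - P(all k drawn samples are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Mean pass@k over problems, given the number of correct samples per problem."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical per-problem results: 3 of 20 problems solved, one sample each.
counts = [1] * 3 + [0] * 17
score = benchmark_pass_at_k(counts, n=1, k=1)
print(f"{score:.4f}")  # 0.1500
```

With a single completion per problem, `pass@1` reduces to plain accuracy, so a `<no answer>` extraction counts the same as a wrong answer in this sketch.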