zechen-nlp committed
Commit 2f01b96 · verified · 1 Parent(s): c6b6768

Update Automated MNLP evaluation report (2026-05-13)

Files changed (1)
  1. EVAL_REPORT.md +6 -55
EVAL_REPORT.md CHANGED
@@ -2,7 +2,7 @@
 
  - **Model repo:** [`cs-552-2026-the-transformers/math_model`](https://huggingface.co/cs-552-2026-the-transformers/math_model)
  - **Owner(s):** group **the-transformers**
- - **Generated at:** 2026-05-13T04:17:52+00:00 (UTC)
  - **Pipeline:** [mnlp-project-ci](https://github.com/eric11eca/mnlp-project-ci)
 
  _This PR is opened automatically by the course CI. It is **non-blocking** - you do not need to merge it. The next nightly run will refresh this file._
@@ -11,7 +11,7 @@ _This PR is opened automatically by the course CI. It is **non-blocking** - yo
 
  | Benchmark | Accuracy | Status |
  |---|---:|---|
- | Math | 0.3900 | ok |
  | Knowledge | - | not run |
  | Multilingual | - | not run |
  | Safety | - | not run |
@@ -22,66 +22,17 @@ _Prompts are intentionally omitted to avoid revealing benchmark contents. For mu
 
  ### Math
 
- **Correct** (1 shown)
-
- - **reference**: `0`
- - **overall** (4/8 completions correct)
- - **extracted** (✓): `0`
- - **completion**:
-
- ```text
- <think>
- To determine the degree of the polynomial \(p(x)\), we first need to understand the structure of the polynomial and its components.
-
- The polynomial \(p(x)\) is given by:
- \[
- p(x)=\frac{(x-b)(x-c)}{(a-b)(a-c)}+\frac{(x-a)(x-c)}{(b-a)(b-c)}+\frac{(x-a)(x-b)}{(c-a)(c-b)}.
- \]
-
- We can simplify this expression by recognizing that it is a form of a Lagrange interpolation polynomial. The Lagrange interpolation polynomial of degree 2 that passes through the points \((a, f(a))\), \((b, f(b))\), and \((c, f(c))\) is given by:
- \[
- f(x) = \frac{(x-b)(x-c)}{(a-b)(a-c)}f(a) + \frac{(x-a)(x-c)}{(b-a)(b-c)}f(b) + \frac{(x-a)(x-b)}{(c-a)(c-b)}f(c).
- \]
-
- In our case, \(f(x) = 1\) for all \(x\). Therefore, the polynomial \(p(x)\) is the Lagrange interpolation polynomial that passes through the points \((a, 1)\), \((b, 1)\), and \((c, 1)\). This polynomial is identically equal to 1 for all \(x\), because it is the unique polynomial of degree at most 2 that takes the value 1 at three distinct points.
-
- Since \(p(x)\) is identically equal to 1, it is a constant polynomial of degree 0.
-
- Therefore, the degree of the polynomial \(p(x)\) is:
- \[
- </think>
-
- \boxed{0}.
- \]
- ```
-
  **Incorrect** (1 shown)
 
  - **reference**: `$(n-2)2^n +1$`
  - **overall** (0/8 completions correct)
- - **extracted** (✗): `2^{n-1}`
  - **completion**:
 
  ```text
- <think>
- To solve this problem, we need to understand the structure of the set \( C_n \) and determine the greatest positive integer that cannot be represented as a sum of elements from \( C_n \).
-
- First, let's define the set \( C_n \):
- \[ C_n = \{2^n - 2^i : i \text{ is an integer and } 0 \leq i < n\} \]
-
- This set contains \( n \) elements, each of which is \( 2^n - 2^i \) for \( i = 0, 1, 2, \ldo
-
- ... [truncated 732 chars] ...
-
- d the greatest positive integer that cannot be represented as a sum of elements from \( C_n \), we can use the concept of the Frobenius number, which is the largest number that cannot be expressed as a non-negative integer combination of given integers. However, the Frobenius number is only known for two numbers, and it is not generally computable for larger sets.
-
- Instead, we can consider the structure of the set \( C_n \) and the fact that it contains all integers of the form \( 2^n - 2^i \). The key insight is that the set \( C_n \) contains all integers from \( 1 \) to \( 2^n - 1 \) except for those that are not of the form \( 2^i(2^{n-i} - 1) \).
-
- By examining the structure of the set \( C_n \), we can determine that the greatest positive integer that cannot be represented as a sum of elements from \( C_n \) is:
- \[ \boxed{2^{n-1}} \]
 
- This result is based on the fact that the set \( C_n \) contains all integers from \( 1 \) to \( 2^n - 1 \) except for those that are not of the form \( 2^i(2^{n-i} - 1) \), and the largest such number is \( 2^{n-1} \).
 
- Therefore, the greatest positive integer that cannot be represented as a sum of elements of \( C_n \) is:
- \[ \boxed{2^{n-1}} \]
  ```
 
 
  - **Model repo:** [`cs-552-2026-the-transformers/math_model`](https://huggingface.co/cs-552-2026-the-transformers/math_model)
  - **Owner(s):** group **the-transformers**
+ - **Generated at:** 2026-05-13T23:30:17+00:00 (UTC)
  - **Pipeline:** [mnlp-project-ci](https://github.com/eric11eca/mnlp-project-ci)
 
  _This PR is opened automatically by the course CI. It is **non-blocking** - you do not need to merge it. The next nightly run will refresh this file._
 
 
  | Benchmark | Accuracy | Status |
  |---|---:|---|
+ | Math | 0.0000 | ok |
  | Knowledge | - | not run |
  | Multilingual | - | not run |
  | Safety | - | not run |
 
 
  ### Math
 
  **Incorrect** (1 shown)
 
  - **reference**: `$(n-2)2^n +1$`
  - **overall** (0/8 completions correct)
+ - **extracted** (✗): `<no answer>`
  - **completion**:
 
  ```text
+ !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 
+ ... [truncated 823 chars] ...
 
+ !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  ```
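
The report shows an extracted answer of `<no answer>` next to a completion made up entirely of `!` characters. The sketch below illustrates how such a case could be detected. It is not the course CI's actual logic, which this diff does not show: `extract_boxed`, `is_degenerate`, the `NO_ANSWER` sentinel, and the 0.9 threshold are all hypothetical names and values chosen for illustration, and it assumes answers are read from the last `\boxed{...}` in a completion.

```python
from collections import Counter

# Sentinel mirroring the string shown in the report (assumed, not confirmed).
NO_ANSWER = "<no answer>"


def extract_boxed(completion: str) -> str:
    """Return the contents of the last \\boxed{...}, or NO_ANSWER if absent."""
    marker = "\\boxed{"
    start = completion.rfind(marker)
    if start == -1:
        return NO_ANSWER
    depth = 1
    collected = []
    # Walk forward, tracking brace depth so nested braces such as 2^{n-1} survive.
    for ch in completion[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(collected).strip()
        collected.append(ch)
    return NO_ANSWER  # braces never closed


def is_degenerate(completion: str, threshold: float = 0.9) -> bool:
    """True if a single non-whitespace character dominates the completion."""
    chars = [c for c in completion if not c.isspace()]
    if not chars:
        return True
    _, count = Counter(chars).most_common(1)[0]
    return count / len(chars) >= threshold


if __name__ == "__main__":
    print(extract_boxed(r"\[ \boxed{2^{n-1}} \]"))
    print(extract_boxed("!" * 500), is_degenerate("!" * 500))
```

A check like `is_degenerate` could be run before answer extraction to tell a model that reasons its way to a wrong answer apart from one that emits no usable text at all, which is the difference between the two versions of the report in this diff.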