Commit 0d62e2c by voidash (verified) · 1 parent: 840263e

Add v4b step600 baseline300 full eval

eval/reports/sft_v4b_step600_baseline300_full_eval/SUMMARY.md ADDED
@@ -0,0 +1,42 @@
+ # SFT v1 Eval Summary: sft_v4b_step600_baseline300_full_eval
+
+ Date: 2026-05-04 05:22:53
+
+ Base: `google/gemma-4-E2B-it`
+ Adapter: `/home/ubuntu/checkpoints/sft_v4b_e2b_seed42/step600`
+
+
+ ## 1. Full Gold (167 items)
+ - grounded n=73 chrF=22.81 url_recall=0.75 wrongly_refused=0 (0.0%)
+ - refusal n=91 correct_pct=83.5% hallucinated=15
+
+ ## 2. LLM-as-judge (DeepSeek)
+ - skipped: no data (all 50 judge requests failed with HTTP 402 "Insufficient Balance"; see `llm_judge.json`)
+
+ ## 3. Belebele (50 NE MC) — regression check
+ - accuracy: 58.0% (29/50)
+ - Gemma 4 IT baseline: see `eval/gemma3_nepali_baseline.md`
+
+ ## 4. GSM8K-en (30) — English replay regression
+ - accuracy: 53.3% (16/30)
+
+ ## 5. Roman-NE qualitative (10 prompts)
+ - n_degen: 0/10
+ - loops: 0, mojibake: 0, empty: 0
+ - target: ≤1 degen (was 3 with base Gemma 4 IT)
+
+ ## 6. Side-by-side vs baseline
+ - skipped: no baseline file
+
+ ## Demoability call
+
+ **PASS signals**: 4
+
+ - ✓ url_recall 0.75 ≥ 0.70
+ - ✓ wrongly_refused 0% ≤ 10%
+ - ✓ roman_degen 0/10 ≤ 1
+ - ✓ belebele 0.58 ≥ 0.55
+
+ **FAIL signals**: 1
+
+ - ✗ refusal_correct 84% < 90%
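The demoability call in SUMMARY.md is a set of threshold checks, so it can be reproduced mechanically. The sketch below is one possible way to do that, not the script that generated the report: the `Gate` class and `demoability` function are hypothetical names, and only the metric values and thresholds are copied from the summary.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """One pass/fail signal: a metric value compared against a threshold."""
    name: str
    value: float
    op: str          # ">=" (higher is better) or "<=" (lower is better)
    threshold: float

    def passed(self) -> bool:
        compare = operator.ge if self.op == ">=" else operator.le
        return compare(self.value, self.threshold)


def demoability(gates):
    """Split gate names into (passed, failed) lists, preserving order."""
    passed = [g.name for g in gates if g.passed()]
    failed = [g.name for g in gates if not g.passed()]
    return passed, failed


# Values and thresholds as printed in SUMMARY.md (rates expressed as fractions).
GATES = [
    Gate("url_recall", 0.75, ">=", 0.70),
    Gate("wrongly_refused", 0.0, "<=", 0.10),
    Gate("roman_degen", 0, "<=", 1),
    Gate("belebele", 0.58, ">=", 0.55),
    Gate("refusal_correct", 0.835, ">=", 0.90),
]

passed, failed = demoability(GATES)
print(f"PASS signals: {len(passed)} {passed}")
print(f"FAIL signals: {len(failed)} {failed}")
```

Keeping the thresholds in one table like this makes it easy to see that the only failing signal is `refusal_correct`, which follows from the 15 hallucinated answers among the 91 refusal items (76/91 ≈ 83.5%).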
eval/reports/sft_v4b_step600_baseline300_full_eval/belebele.json ADDED
@@ -0,0 +1,310 @@
1
+ {
2
+ "summary": {
3
+ "n": 50,
4
+ "correct": 29,
5
+ "accuracy": 0.58,
6
+ "elapsed_sec": 17.7
7
+ },
8
+ "items": [
9
+ {
10
+ "idx": 654,
11
+ "gold": "B",
12
+ "model": "B",
13
+ "ok": true
14
+ },
15
+ {
16
+ "idx": 114,
17
+ "gold": "B",
18
+ "model": "B",
19
+ "ok": true
20
+ },
21
+ {
22
+ "idx": 25,
23
+ "gold": "B",
24
+ "model": "B",
25
+ "ok": true
26
+ },
27
+ {
28
+ "idx": 759,
29
+ "gold": "C",
30
+ "model": "C",
31
+ "ok": true
32
+ },
33
+ {
34
+ "idx": 281,
35
+ "gold": "C",
36
+ "model": "C",
37
+ "ok": true
38
+ },
39
+ {
40
+ "idx": 250,
41
+ "gold": "D",
42
+ "model": "A",
43
+ "ok": false
44
+ },
45
+ {
46
+ "idx": 228,
47
+ "gold": "D",
48
+ "model": "D",
49
+ "ok": true
50
+ },
51
+ {
52
+ "idx": 142,
53
+ "gold": "A",
54
+ "model": "B",
55
+ "ok": false
56
+ },
57
+ {
58
+ "idx": 754,
59
+ "gold": "A",
60
+ "model": "A",
61
+ "ok": true
62
+ },
63
+ {
64
+ "idx": 104,
65
+ "gold": "A",
66
+ "model": "A",
67
+ "ok": true
68
+ },
69
+ {
70
+ "idx": 692,
71
+ "gold": "B",
72
+ "model": "B",
73
+ "ok": true
74
+ },
75
+ {
76
+ "idx": 758,
77
+ "gold": "D",
78
+ "model": "D",
79
+ "ok": true
80
+ },
81
+ {
82
+ "idx": 558,
83
+ "gold": "B",
84
+ "model": "B",
85
+ "ok": true
86
+ },
87
+ {
88
+ "idx": 89,
89
+ "gold": "A",
90
+ "model": "A",
91
+ "ok": true
92
+ },
93
+ {
94
+ "idx": 604,
95
+ "gold": "C",
96
+ "model": "C",
97
+ "ok": true
98
+ },
99
+ {
100
+ "idx": 432,
101
+ "gold": "C",
102
+ "model": "B",
103
+ "ok": false
104
+ },
105
+ {
106
+ "idx": 32,
107
+ "gold": "C",
108
+ "model": "C",
109
+ "ok": true
110
+ },
111
+ {
112
+ "idx": 30,
113
+ "gold": "B",
114
+ "model": "B",
115
+ "ok": true
116
+ },
117
+ {
118
+ "idx": 95,
119
+ "gold": "A",
120
+ "model": "A",
121
+ "ok": true
122
+ },
123
+ {
124
+ "idx": 223,
125
+ "gold": "C",
126
+ "model": "C",
127
+ "ok": true
128
+ },
129
+ {
130
+ "idx": 238,
131
+ "gold": "B",
132
+ "model": "B",
133
+ "ok": true
134
+ },
135
+ {
136
+ "idx": 517,
137
+ "gold": "C",
138
+ "model": "B",
139
+ "ok": false
140
+ },
141
+ {
142
+ "idx": 616,
143
+ "gold": "C",
144
+ "model": "C",
145
+ "ok": true
146
+ },
147
+ {
148
+ "idx": 27,
149
+ "gold": "A",
150
+ "model": "C",
151
+ "ok": false
152
+ },
153
+ {
154
+ "idx": 574,
155
+ "gold": "A",
156
+ "model": "C",
157
+ "ok": false
158
+ },
159
+ {
160
+ "idx": 203,
161
+ "gold": "B",
162
+ "model": "A",
163
+ "ok": false
164
+ },
165
+ {
166
+ "idx": 733,
167
+ "gold": "A",
168
+ "model": "B",
169
+ "ok": false
170
+ },
171
+ {
172
+ "idx": 665,
173
+ "gold": "D",
174
+ "model": "C",
175
+ "ok": false
176
+ },
177
+ {
178
+ "idx": 718,
179
+ "gold": "B",
180
+ "model": "B",
181
+ "ok": true
182
+ },
183
+ {
184
+ "idx": 429,
185
+ "gold": "C",
186
+ "model": "C",
187
+ "ok": true
188
+ },
189
+ {
190
+ "idx": 225,
191
+ "gold": "C",
192
+ "model": "B",
193
+ "ok": false
194
+ },
195
+ {
196
+ "idx": 459,
197
+ "gold": "A",
198
+ "model": "B",
199
+ "ok": false
200
+ },
201
+ {
202
+ "idx": 603,
203
+ "gold": "B",
204
+ "model": "B",
205
+ "ok": true
206
+ },
207
+ {
208
+ "idx": 284,
209
+ "gold": "D",
210
+ "model": "D",
211
+ "ok": true
212
+ },
213
+ {
214
+ "idx": 828,
215
+ "gold": "C",
216
+ "model": "B",
217
+ "ok": false
218
+ },
219
+ {
220
+ "idx": 890,
221
+ "gold": "D",
222
+ "model": "C",
223
+ "ok": false
224
+ },
225
+ {
226
+ "idx": 6,
227
+ "gold": "C",
228
+ "model": "C",
229
+ "ok": true
230
+ },
231
+ {
232
+ "idx": 777,
233
+ "gold": "C",
234
+ "model": "B",
235
+ "ok": false
236
+ },
237
+ {
238
+ "idx": 825,
239
+ "gold": "A",
240
+ "model": "C",
241
+ "ok": false
242
+ },
243
+ {
244
+ "idx": 163,
245
+ "gold": "B",
246
+ "model": "B",
247
+ "ok": true
248
+ },
249
+ {
250
+ "idx": 714,
251
+ "gold": "C",
252
+ "model": "A",
253
+ "ok": false
254
+ },
255
+ {
256
+ "idx": 348,
257
+ "gold": "C",
258
+ "model": "B",
259
+ "ok": false
260
+ },
261
+ {
262
+ "idx": 159,
263
+ "gold": "C",
264
+ "model": "C",
265
+ "ok": true
266
+ },
267
+ {
268
+ "idx": 220,
269
+ "gold": "D",
270
+ "model": "B",
271
+ "ok": false
272
+ },
273
+ {
274
+ "idx": 781,
275
+ "gold": "B",
276
+ "model": "D",
277
+ "ok": false
278
+ },
279
+ {
280
+ "idx": 344,
281
+ "gold": "C",
282
+ "model": "C",
283
+ "ok": true
284
+ },
285
+ {
286
+ "idx": 94,
287
+ "gold": "B",
288
+ "model": "C",
289
+ "ok": false
290
+ },
291
+ {
292
+ "idx": 389,
293
+ "gold": "C",
294
+ "model": "B",
295
+ "ok": false
296
+ },
297
+ {
298
+ "idx": 99,
299
+ "gold": "B",
300
+ "model": "B",
301
+ "ok": true
302
+ },
303
+ {
304
+ "idx": 367,
305
+ "gold": "D",
306
+ "model": "D",
307
+ "ok": true
308
+ }
309
+ ]
310
+ }
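Because `belebele.json` stores both a `summary` and the per-item records, the two can be cross-checked. The following is a minimal sketch under stated assumptions: the function names are hypothetical, rounding accuracy to four decimals is a guess at the report's convention, and `SAMPLE` contains only the first eight items of the file with a summary computed for that excerpt (not the file's real 29/50 summary).

```python
from collections import Counter


def recompute_summary(items):
    """Recount correct answers from the gold/model letters."""
    correct = sum(1 for it in items if it["model"] == it["gold"])
    accuracy = round(correct / len(items), 4) if items else 0.0
    return {"n": len(items), "correct": correct, "accuracy": accuracy}


def check_report(report):
    """Return a list of inconsistencies between summary and items."""
    problems = []
    for it in report["items"]:
        if it["ok"] != (it["model"] == it["gold"]):
            problems.append(f"idx {it['idx']}: ok flag disagrees with gold/model")
    for key, value in recompute_summary(report["items"]).items():
        if report["summary"].get(key) != value:
            problems.append(f"summary.{key}: {report['summary'].get(key)} != {value}")
    return problems


def confusions(items):
    """Count (gold, model) pairs for the wrong answers only."""
    return Counter((it["gold"], it["model"]) for it in items if not it["ok"])


# First eight items of belebele.json; the summary here is for this excerpt only.
SAMPLE = {
    "summary": {"n": 8, "correct": 6, "accuracy": 0.75},
    "items": [
        {"idx": 654, "gold": "B", "model": "B", "ok": True},
        {"idx": 114, "gold": "B", "model": "B", "ok": True},
        {"idx": 25, "gold": "B", "model": "B", "ok": True},
        {"idx": 759, "gold": "C", "model": "C", "ok": True},
        {"idx": 281, "gold": "C", "model": "C", "ok": True},
        {"idx": 250, "gold": "D", "model": "A", "ok": False},
        {"idx": 228, "gold": "D", "model": "D", "ok": True},
        {"idx": 142, "gold": "A", "model": "B", "ok": False},
    ],
}

print(check_report(SAMPLE))
print(confusions(SAMPLE["items"]))
```

To run it on the real file, load it with `json.load` and pass the resulting dict to `check_report`. The confusion counts would then show whether the wrong answers cluster on a single option letter.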
eval/reports/sft_v4b_step600_baseline300_full_eval/full_gold.json ADDED
The diff for this file is too large to render. See raw diff
 
eval/reports/sft_v4b_step600_baseline300_full_eval/gsm8k.json ADDED
@@ -0,0 +1,220 @@
1
+ {
2
+ "summary": {
3
+ "n": 30,
4
+ "correct": 16,
5
+ "accuracy": 0.5333,
6
+ "elapsed_sec": 708.7
7
+ },
8
+ "items": [
9
+ {
10
+ "idx": 1309,
11
+ "gold": "2280",
12
+ "pred": "1880",
13
+ "ok": false,
14
+ "resp": "Here is the step-by-step solution to the problem:\n\n1. **Find Alexandra's total amount raised:**\n Alexandra raises $430.\n\n2. **Find Kim's total amount raised:**\n Kim raises $320 more than Alexandra"
15
+ },
16
+ {
17
+ "idx": 228,
18
+ "gold": "1",
19
+ "pred": "1",
20
+ "ok": true,
21
+ "resp": "Here is the step-by-step solution to the puzzle problem:\n\n1. **Determine the total number of pieces in the puzzle:** The puzzle has 360 pieces.\n2. **Calculate the combined rate at which Kalinda and he"
22
+ },
23
+ {
24
+ "idx": 51,
25
+ "gold": "5",
26
+ "pred": "0",
27
+ "ok": false,
28
+ "resp": "Here is the step-by-step solution to the problem:\n\n1. **Calculate the time Tom takes to travel from 1 PM to 4 PM.**\n - Distance = 4 miles - 1 mile = 3 miles.\n - Time = Distance / Speed = 3 miles /"
29
+ },
30
+ {
31
+ "idx": 563,
32
+ "gold": "12",
33
+ "pred": "3",
34
+ "ok": false,
35
+ "resp": "Here is the step-by-step solution to the problem:\n\n1. **Determine the ages of James' two sons:**\n - The older son is 12 years old.\n - The younger son is 4 years younger than his brother.\n - Youn"
36
+ },
37
+ {
38
+ "idx": 501,
39
+ "gold": "273",
40
+ "pred": "36",
41
+ "ok": false,
42
+ "resp": "To solve this problem, we need to find the total amount of yarn Mariah and her grandma used together.\n\n1. Calculate the amount of yarn Mariah used:\n Mariah used $\\frac{1}{4}$ of a skein.\n\n2. Calcula"
43
+ },
44
+ {
45
+ "idx": 457,
46
+ "gold": "45",
47
+ "pred": "45",
48
+ "ok": true,
49
+ "resp": "Here is the step-by-step solution to the problem:\n\n1. **Initial number of fairies:** Katelyn initially saw 50 fairies.\n2. **Fairies from the east:** Half as many fairies as Katelyn saw came from the e"
50
+ },
51
+ {
52
+ "idx": 285,
53
+ "gold": "21",
54
+ "pred": "21",
55
+ "ok": true,
56
+ "resp": "Here is the step-by-step solution:\n\n1. **Determine Ann's brother's current age:** Ann is 9 years old, and her brother is twice her age. So, Ann's brother is $2 \\times 9 = 18$ years old.\n2. **Determine"
57
+ },
58
+ {
59
+ "idx": 209,
60
+ "gold": "145",
61
+ "pred": "3",
62
+ "ok": false,
63
+ "resp": "Here is the step-by-step solution to the problem:\n\n1. **Calculate the total cost of buying twenty dozen cups:**\n - There are 12 cups in a dozen.\n - Twenty dozen cups is $20 \\times 12 = 240$ cups.\n"
64
+ },
65
+ {
66
+ "idx": 1116,
67
+ "gold": "60",
68
+ "pred": "60",
69
+ "ok": true,
70
+ "resp": "To solve this problem, we need to calculate a 20% increase in the number of students enrolled in the calligraphy class.\n\nHere is the step-by-step reasoning:\n\n1. **Identify the initial number of studen"
71
+ },
72
+ {
73
+ "idx": 178,
74
+ "gold": "122",
75
+ "pred": "122",
76
+ "ok": true,
77
+ "resp": "Here is the step-by-step solution to the problem:\n\n1. **Find the number of crabs Monic has:**\n Monic has 4 fewer crabs than Bo.\n Number of crabs Monic has = Number of crabs Bo has - 4\n Number of"
78
+ },
79
+ {
80
+ "idx": 1209,
81
+ "gold": "29",
82
+ "pred": "29",
83
+ "ok": true,
84
+ "resp": "Here is the step-by-step solution to the problem:\n\n1. **Determine the water removed by each setting per day:**\n - Low setting: 1 liter/day\n - Medium setting: 2 liters/day (twice the low setting)\n "
85
+ },
86
+ {
87
+ "idx": 864,
88
+ "gold": "80",
89
+ "pred": "580",
90
+ "ok": false,
91
+ "resp": "Here is the step-by-step solution to the problem:\n\n1. **Calculate the total number of burritos ordered:**\n George ordered 600 burritos.\n\n2. **Calculate the total number of burritos eaten by Mr. Geor"
92
+ },
93
+ {
94
+ "idx": 65,
95
+ "gold": "36",
96
+ "pred": "7",
97
+ "ok": false,
98
+ "resp": "Here is the step-by-step solution to the problem:\n\n1. **Calculate the total time Jim spends watching TV in one week:**\n Jim watches TV for 2 hours per day.\n Total time watching TV in one week: $2 "
99
+ },
100
+ {
101
+ "idx": 61,
102
+ "gold": "1430",
103
+ "pred": "1430",
104
+ "ok": true,
105
+ "resp": "Here is the step-by-step solution to the problem:\n\n1. **Calculate the total cost of the brooch:**\n Cost of material + Cost of construction = Total cost\n $\\$500 + \\$800 = \\$1300$\n\n2. **Calculate th"
106
+ },
107
+ {
108
+ "idx": 191,
109
+ "gold": "5",
110
+ "pred": "3",
111
+ "ok": false,
112
+ "resp": "To solve this problem, we need to determine the total weight of insects the ducks need per week and then divide that by the number of ducks to find the amount of insects they need per day.\n\nHere is th"
113
+ },
114
+ {
115
+ "idx": 447,
116
+ "gold": "5",
117
+ "pred": "5",
118
+ "ok": true,
119
+ "resp": "Here is the step-by-step solution to the problem:\n\n1. **Joey's new score:** Joey started with 214 points and scored 26 points.\n Joey's new total is $214 + 26 = 240$ points.\n\n2. **Marcy's new score:*"
120
+ },
121
+ {
122
+ "idx": 476,
123
+ "gold": "5",
124
+ "pred": "5",
125
+ "ok": true,
126
+ "resp": "Here is the step-by-step solution to the problem:\n\n1. **Calculate the total number of books Sarah bought:**\n Divide the total amount she spent by the cost of one book.\n $$\\text{Total books} = \\fra"
127
+ },
128
+ {
129
+ "idx": 1034,
130
+ "gold": "66",
131
+ "pred": "66",
132
+ "ok": true,
133
+ "resp": "To solve this problem, we need to calculate the total amount Ted spent on books and pencils.\n\n1. Calculate the total cost of the 3 books:\n Cost of 3 books = 3 books $\\times$ \\$16/book\n\n2. Calculate "
134
+ },
135
+ {
136
+ "idx": 1232,
137
+ "gold": "15",
138
+ "pred": "4",
139
+ "ok": false,
140
+ "resp": "Let $P$ be the original number of pomelos Eve had.\nLet $G$ be the number of pomelos Eve gave to her friend.\n\nThe problem states that after giving her friend some pomelos, Eve is left with $\\frac{1}{4}"
141
+ },
142
+ {
143
+ "idx": 54,
144
+ "gold": "40",
145
+ "pred": "12",
146
+ "ok": false,
147
+ "resp": "Here is the step-by-step solution to the problem:\n\n1. **Identify the total number of kittens:** The Doubtfire sisters have 7 kittens in total.\n2. **Determine the number of kittens Patchy has:** Patchy"
148
+ },
149
+ {
150
+ "idx": 1149,
151
+ "gold": "93",
152
+ "pred": "93",
153
+ "ok": true,
154
+ "resp": "Here is the step-by-step solution:\n\n1. **Determine the total number of fish Jen has:** Jen has 3 fish.\n2. **Calculate the total amount of food needed:** Each fish needs $1 worth of food per day. So, $"
155
+ },
156
+ {
157
+ "idx": 407,
158
+ "gold": "2000",
159
+ "pred": "1000",
160
+ "ok": false,
161
+ "resp": "Here is the step-by-step solution to the problem:\n\n1. **Elliott's total planned steps:** 10,000 steps.\n2. **Steps taken to and from school:** Half of his total planned steps.\n3. **Steps taken during h"
162
+ },
163
+ {
164
+ "idx": 859,
165
+ "gold": "1520",
166
+ "pred": "1520",
167
+ "ok": true,
168
+ "resp": "Here is the step-by-step solution to the problem:\n\n1. **Calculate Rafaela's total tip:** Rafaela received a \\$20 tip from each of the forty customers.\n Rafaela's total tip = $20 \\times 40 = \\$800$.\n"
169
+ },
170
+ {
171
+ "idx": 451,
172
+ "gold": "11050",
173
+ "pred": "113",
174
+ "ok": false,
175
+ "resp": "Here is the step-by-step solution to the problem:\n\n1. **Define variables:**\n - Let $C_1$ be the cost of filling one balloon on the first day.\n - Let $C_2$ be the cost of filling one balloon on the"
176
+ },
177
+ {
178
+ "idx": 919,
179
+ "gold": "90",
180
+ "pred": "90",
181
+ "ok": true,
182
+ "resp": "Here is the step-by-step solution to the problem:\n\n1. **Calculate the total number of lollipops sold by the class:**\n Number of students $\\times$ Lollipops sold per student\n $30 \\text{ students} \\"
183
+ },
184
+ {
185
+ "idx": 1206,
186
+ "gold": "40000",
187
+ "pred": "40000",
188
+ "ok": true,
189
+ "resp": "Here is the step-by-step solution to the problem:\n\n1. **Calculate the amount of the loan John receives:**\n The loan is 40% of the house's value.\n Loan Amount = $250,000 \\times 0.40 = $100,000\n\n2. "
190
+ },
191
+ {
192
+ "idx": 569,
193
+ "gold": "21",
194
+ "pred": "21",
195
+ "ok": true,
196
+ "resp": "Here is the step-by-step solution to the problem:\n\n1. **Determine Liam's total earnings:** Liam made $63 one weekend.\n2. **Determine Mitchell's total earnings:** Mitchell sold 21 lemonades at $4 each."
197
+ },
198
+ {
199
+ "idx": 13,
200
+ "gold": "18",
201
+ "pred": "5",
202
+ "ok": false,
203
+ "resp": "Let $W$ be the initial number of vacuum cleaners Melanie had.\n\n1. She sold a third of her vacuum cleaners at the green house.\n Number sold at the green house: $\\frac{1}{3}W$\n\n2. She sold 2 more to t"
204
+ },
205
+ {
206
+ "idx": 326,
207
+ "gold": "14",
208
+ "pred": "14",
209
+ "ok": true,
210
+ "resp": "Let $B$ be the initial number of people on the bus at the terminal.\n\nAt the first bus stop, 5 more people got on the bus. The number of people on the bus is now $B + 5$.\n\nAt the second bus stop, 7 peo"
211
+ },
212
+ {
213
+ "idx": 865,
214
+ "gold": "23",
215
+ "pred": "33",
216
+ "ok": false,
217
+ "resp": "Let $R$ be the number of red cars on the rink.\nLet $G$ be the number of green cars on the rink.\nLet $B$ be the number of blue cars on the rink.\nLet $Y$ be the number of yellow cars on the rink.\n\nFrom "
218
+ }
219
+ ]
220
+ }
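Each GSM8K record pairs a `gold` answer with a `pred` extracted from the model's response, and the stored `resp` is cut off, so the extraction step cannot be audited from this file alone. The sketch below shows one common heuristic, taking the last number in the full response. This is an assumption for illustration, not necessarily what the eval harness does; the helper names are hypothetical.

```python
import re
from decimal import Decimal, InvalidOperation

# A number with optional sign, thousands separators, and decimal part.
# The lookbehind keeps "5-3" from being read as 5 and -3.
_NUMBER = re.compile(r"(?<!\d)-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(text):
    """Return the last number in text as a string without commas, or None."""
    matches = _NUMBER.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def same_number(pred, gold):
    """Compare numerically so that '1520.0' matches '1520'."""
    if pred is None:
        return False
    try:
        return Decimal(pred) == Decimal(gold.replace(",", ""))
    except InvalidOperation:
        return False


def accuracy(pairs):
    """pairs is a list of (pred, gold) strings."""
    if not pairs:
        return 0.0
    return sum(same_number(p, g) for p, g in pairs) / len(pairs)


# (pred, gold) for the first five records of gsm8k.json.
PAIRS = [("1880", "2280"), ("1", "1"), ("0", "5"), ("3", "12"), ("36", "273")]

print(extract_final_number("Loan Amount = $250,000 \\times 0.40 = $100,000"))
print(accuracy(PAIRS))
```

Storing the untruncated response, or at least its final lines, alongside `pred` would make it possible to tell extraction mistakes apart from genuine arithmetic errors in the 14 failed items.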
eval/reports/sft_v4b_step600_baseline300_full_eval/llm_judge.json ADDED
@@ -0,0 +1,208 @@
1
+ {
2
+ "summary": {
3
+ "n": 0,
4
+ "n_errors": 50
5
+ },
6
+ "results": [
7
+ {
8
+ "id": "u_grnd_057",
9
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
10
+ },
11
+ {
12
+ "id": "u_grnd_023",
13
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
14
+ },
15
+ {
16
+ "id": "u_grnd_061",
17
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
18
+ },
19
+ {
20
+ "id": "u_grnd_034",
21
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
22
+ },
23
+ {
24
+ "id": "u_grnd_009",
25
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
26
+ },
27
+ {
28
+ "id": "u_grnd_035",
29
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
30
+ },
31
+ {
32
+ "id": "u_grnd_019",
33
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
34
+ },
35
+ {
36
+ "id": "u_grnd_025",
37
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
38
+ },
39
+ {
40
+ "id": "u_grnd_053",
41
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
42
+ },
43
+ {
44
+ "id": "u_grnd_065",
45
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
46
+ },
47
+ {
48
+ "id": "u_grnd_018",
49
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
50
+ },
51
+ {
52
+ "id": "u_grnd_083",
53
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
54
+ },
55
+ {
56
+ "id": "u_grnd_094",
57
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
58
+ },
59
+ {
60
+ "id": "u_grnd_055",
61
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
62
+ },
63
+ {
64
+ "id": "u_grnd_049",
65
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
66
+ },
67
+ {
68
+ "id": "u_grnd_048",
69
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
70
+ },
71
+ {
72
+ "id": "u_grnd_052",
73
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
74
+ },
75
+ {
76
+ "id": "u_grnd_038",
77
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
78
+ },
79
+ {
80
+ "id": "u_grnd_086",
81
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
82
+ },
83
+ {
84
+ "id": "u_grnd_045",
85
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
86
+ },
87
+ {
88
+ "id": "u_grnd_056",
89
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
90
+ },
91
+ {
92
+ "id": "u_grnd_004",
93
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
94
+ },
95
+ {
96
+ "id": "u_grnd_084",
97
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
98
+ },
99
+ {
100
+ "id": "u_grnd_098",
101
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
102
+ },
103
+ {
104
+ "id": "u_grnd_046",
105
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
106
+ },
107
+ {
108
+ "id": "u_grnd_054",
109
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
110
+ },
111
+ {
112
+ "id": "u_grnd_028",
113
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
114
+ },
115
+ {
116
+ "id": "u_grnd_062",
117
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
118
+ },
119
+ {
120
+ "id": "u_grnd_030",
121
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
122
+ },
123
+ {
124
+ "id": "u_grnd_010",
125
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
126
+ },
127
+ {
128
+ "id": "u_grnd_024",
129
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
130
+ },
131
+ {
132
+ "id": "u_grnd_067",
133
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
134
+ },
135
+ {
136
+ "id": "u_grnd_027",
137
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
138
+ },
139
+ {
140
+ "id": "u_grnd_041",
141
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
142
+ },
143
+ {
144
+ "id": "u_grnd_029",
145
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
146
+ },
147
+ {
148
+ "id": "u_grnd_071",
149
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
150
+ },
151
+ {
152
+ "id": "u_grnd_008",
153
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
154
+ },
155
+ {
156
+ "id": "u_grnd_058",
157
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
158
+ },
159
+ {
160
+ "id": "u_grnd_070",
161
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
162
+ },
163
+ {
164
+ "id": "u_grnd_011",
165
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
166
+ },
167
+ {
168
+ "id": "u_grnd_059",
169
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
170
+ },
171
+ {
172
+ "id": "u_grnd_026",
173
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
174
+ },
175
+ {
176
+ "id": "u_grnd_032",
177
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
178
+ },
179
+ {
180
+ "id": "u_grnd_012",
181
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
182
+ },
183
+ {
184
+ "id": "u_grnd_001",
185
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
186
+ },
187
+ {
188
+ "id": "u_grnd_091",
189
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
190
+ },
191
+ {
192
+ "id": "u_grnd_042",
193
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
194
+ },
195
+ {
196
+ "id": "u_grnd_092",
197
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
198
+ },
199
+ {
200
+ "id": "u_grnd_069",
201
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
202
+ },
203
+ {
204
+ "id": "u_grnd_031",
205
+ "error": "RuntimeError: HTTP 402: {\"error\":{\"message\":\"Insufficient Balance\",\"type\":\"unknown_error\",\"param\":null,\"code\":\"invalid_request_error\"}}"
206
+ }
207
+ ]
208
+ }
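All 50 judge requests failed with the same HTTP 402 "Insufficient Balance" error, which suggests the loop kept calling the API after the first billing failure. A possible fail-fast variant is sketched below. It assumes the judge call raises `RuntimeError` with a message starting `HTTP <status>`, as the logged error strings suggest; `run_judge`, `judge_one`, and the `FATAL_STATUSES` set are hypothetical and not taken from the project's code.

```python
import re

# Assumption: auth and billing failures will not succeed on the next item.
FATAL_STATUSES = {"401", "402", "403"}


def http_status(exc):
    """Pull a three-digit HTTP status out of an exception message, if any."""
    match = re.search(r"HTTP (\d{3})", str(exc))
    return match.group(1) if match else None


def run_judge(item_ids, judge_one):
    """Judge items in order; stop at the first fatal HTTP error.

    judge_one is a hypothetical callable taking an item id and returning a verdict.
    """
    results = []
    aborted_on = None
    for item_id in item_ids:
        try:
            results.append({"id": item_id, "verdict": judge_one(item_id)})
        except RuntimeError as exc:
            results.append({"id": item_id, "error": f"{type(exc).__name__}: {exc}"})
            status = http_status(exc)
            if status in FATAL_STATUSES:
                aborted_on = f"HTTP {status}"
                break
    n_ok = sum(1 for r in results if "verdict" in r)
    summary = {
        "n": n_ok,
        "n_errors": len(results) - n_ok,
        "n_skipped": len(item_ids) - len(results),
        "aborted_on": aborted_on,
    }
    return {"summary": summary, "results": results}


def out_of_credit(item_id):
    """Stand-in judge that always fails the way the logged calls did."""
    raise RuntimeError('HTTP 402: {"error":{"message":"Insufficient Balance"}}')


report = run_judge(["u_grnd_057", "u_grnd_023", "u_grnd_061"], out_of_credit)
print(report["summary"])
```

Recording `aborted_on` and `n_skipped` in the summary would also let SUMMARY.md state why the section was skipped instead of reporting only "no data".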
eval/reports/sft_v4b_step600_baseline300_full_eval/roman_ne.json ADDED
@@ -0,0 +1,82 @@
+ {
+   "summary": {
+     "n": 10,
+     "n_degen": 0,
+     "n_loop": 0,
+     "n_mojibake": 0,
+     "n_empty": 0,
+     "elapsed_sec": 32.3
+   },
+   "items": [
+     {
+       "q": "mero nagarikta banauna ko lagi kun office janu parcha?",
+       "a": "Yo prashnako adhikarik srot bhetina. Nagarikta sambandhi thap jankari ko lagi Hello Sarkar 1111 ma sampark garnuhos.",
+       "loop": false,
+       "mojibake": false,
+       "empty": false
+     },
+     {
+       "q": "passport renew garna kaha janu parcha?",
+       "a": "Yo prashnako adhikarik srot bhetina. Passport renewal ko lagi Department of Passports ko website ma jane, athawa passport office ma sampark garnu hola.",
+       "loop": false,
+       "mojibake": false,
+       "empty": false
+     },
+     {
+       "q": "company registration kasari garne?",
+       "a": "Yo prashnako adhikarik srot bhetina. Company registration ko lagi Nepal ko Office of the Company Registrar ma sampark garnuhos.",
+       "loop": false,
+       "mojibake": false,
+       "empty": false
+     },
+     {
+       "q": "PAN number kasari banaune?",
+       "a": "Yo prashnako adhikarik srot bhetina. PAN number banaune lagi aafno tax office ma sampark garnuhos.",
+       "loop": false,
+       "mojibake": false,
+       "empty": false
+     },
+     {
+       "q": "driving license ko lagi k k chaine?",
+       "a": "Yo prashnako adhikarik srot bhetina. Driving license ko lagi Department of Transport Management, City Traffic Management, and Parking Services ma sampark garnuhos.",
+       "loop": false,
+       "mojibake": false,
+       "empty": false
+     },
+     {
+       "q": "VAT ra PAN ma k farak cha?",
+       "a": "Yo prashnako adhikarik srot bhetina. VAT ra PAN ko barema thap jankari ko lagi aantarik kar karyalaya ma sampark garnuhos.",
+       "loop": false,
+       "mojibake": false,
+       "empty": false
+     },
+     {
+       "q": "nagarikta certificate hareyo, kaha janu parcha?",
+       "a": "Yo prashnako adhikarik srot bhetina. Nagarikta certificate harayo bhane, tapaile nagarikta karyalaya ma sampark garnu parcha.",
+       "loop": false,
+       "mojibake": false,
+       "empty": false
+     },
+     {
+       "q": "jagga ko malpot kaha tirne?",
+       "a": "Yo prashnako adhikarik srot bhetina. Malpot karyalaya ma sampark garnuhos.",
+       "loop": false,
+       "mojibake": false,
+       "empty": false
+     },
+     {
+       "q": "bachhako janmadarta kasari garne?",
+       "a": "Yo prashnako adhikarik srot bhetina. Janmadarta sambandhi thap jankari ko lagi local ward office ma sampark garnuhos.",
+       "loop": false,
+       "mojibake": false,
+       "empty": false
+     },
+     {
+       "q": "online tax file kasari garne?",
+       "a": "Yo prashnako adhikarik srot bhetina. Online tax filing ko lagi Inland Revenue Department ko website ma jane ma saknu huncha.",
+       "loop": false,
+       "mojibake": false,
+       "empty": false
+     }
+   ]
+ }
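The `loop`, `mojibake`, and `empty` flags in `roman_ne.json` are surface-level degeneration checks. The report does not show how they are computed, so the sketch below is only one plausible set of heuristics: the function names, the 4-gram repetition rule, and the mojibake markers are all assumptions chosen for illustration.

```python
from collections import Counter

# Typical debris when UTF-8 (including Devanagari, bytes E0 A4 / E0 A5) is
# decoded as Latin-1/cp1252, plus the Unicode replacement character.
MOJIBAKE_MARKERS = ("\ufffd", "Ã", "â€", "à¤", "à¥")


def is_empty(text):
    return not text.strip()


def has_mojibake(text):
    return any(marker in text for marker in MOJIBAKE_MARKERS)


def has_loop(text, n=4, min_repeats=3):
    """Flag text in which some word n-gram occurs at least min_repeats times."""
    words = text.split()
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return any(count >= min_repeats for count in grams.values())


def degen_flags(answer):
    flags = {
        "loop": has_loop(answer),
        "mojibake": has_mojibake(answer),
        "empty": is_empty(answer),
    }
    flags["degen"] = any(flags.values())
    return flags


# Answer copied from the "jagga ko malpot kaha tirne?" item.
clean = "Yo prashnako adhikarik srot bhetina. Malpot karyalaya ma sampark garnuhos."
# Hypothetical degenerate output, not taken from the report.
looping = "ma sampark garnuhos hai " * 5

print(degen_flags(clean))
print(degen_flags(looping))
```

Checks of this kind measure only whether the output is well-formed text. All ten answers in the file open with the same sentence, "Yo prashnako adhikarik srot bhetina.", and none of these flags would register that, so a separate refusal-rate count on this prompt set may be worth tracking alongside `n_degen`.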
eval/reports/sft_v4b_step600_baseline300_full_eval/side_by_side.json ADDED
@@ -0,0 +1,45 @@
1
+ {
2
+ "sft_results": [
3
+ {
4
+ "id": "u_grnd_057",
5
+ "model_output": "राष्ट्रिय कार्यमञ्चको विधान, २०६५ बमोजिम यसले आफ्ना उद्देश्यहरू कार्यान्वयन गर्न प्रचलित नेपाल कानुनबमोजिम सम्बन्धित निकायबाट अनुमति वा सोको प्रतिलिपि प्राप्त गरेर मात्र आफ्ना उद्देश्यहरू कार्यान्वयन गर्ने [https://moha.gov.np/upload/e66443e81e8cc9c4fa5c099a1fb1bb87/files/National_20Platform_20on_20DRR_20Bidhan_202065_52_20110916032012_1_(1).pdf]। यसले आफ्ना उद्देश्यहरू कार्यान्वयन गर्न आवश्यक पर्ने भवन निर्माण गर्ने वा भाडामा वा लिजमा लिने [https://moha.gov.np/upload/e66443e81e8cc9c4fa5c099a1fb1bb87/files/National_20Platform_20on_20DRR_20Bidhan_202065_52_20110916032012_1_(1).pdf]। यसले आवश्यक पर्ने कर्मचारी नियुक्त गर्ने तथा तिनको सेवाका सर्तहरू निर्धारण गर्ने [https://moha.gov.np/upload/e664"
6
+ },
7
+ {
8
+ "id": "u_grnd_023",
9
+ "model_output": "जिरी नगरपालिकामा अहिले ७ वटा माध्यमिक विद्यालय सञ्चालनमा रहेका छन् [https://jirimun.gov.np/sites/jirimun.gov.np/files/documents/%E0%A4%A8%E0%A4%97%E0%A4%B0%20%E0%A4%AA%E0%A5%8D%E0%A4%B0%E0%A5%8B%E0%A4%AB%E0%A4%BE%E0%A4%88%E0%A4%B2%20%E0%A5%A8%E0%A5%A6%E0%A5%AE%E0%A5%A6.pdf]। तीमध्ये ७ वटा सार्वजनिक र ३ वटा संस्थागत हुन् [https://jirimun.gov.np/sites/jirimun.gov.np/files/documents/%E0%A4%A8%E0%A4%97%E0%A4%B0%20%E0%A4%AA%E0%A5%8D%E0%A4%B0%E0%A5%8B%E0%A4%AB%E0%A4%"
10
+ },
11
+ {
12
+ "id": "u_grnd_061",
13
+ "model_output": "Pestivirus virus group of Flaviviridae family is found in pigs, hogs, and wild boars [https://dls.gov.np/progressfiles/TADS-Manual-Nepali-Final-Revised-2-Sept-2024-1760434631.pdf]. It can be transmitted from pig to pig through direct contact, contaminated feed or water, or through the respiratory tract [https://dls.gov.np/progressfiles/TADS-Manual-Nepali-Final-Revised-2-Sept-2024-1760434631.pdf]. The virus can survive in the pig's intestines for up to 27 days [https://dls.gov.np/progressfiles/TADS-Manual-Nepali-Final-Revised-2-Sept-2024-1760434631.pdf].\nThe virus can be transmitted from pig to pig through contact with infected feces or milk [https://dls.gov.np/progressfiles/TADS-Manual-Nepali-Final-Revised-2-Sept-2024-1760434631.pdf].\nThe virus can be transmitted from pig to pig through contact with infected meat [https://dls.gov.np/progressfiles/TADS-Manual-Nepali-Final-"
14
+ },
15
+ {
16
+ "id": "u_grnd_034",
17
+ "model_output": "Registration number 13906 ma darta SAROJ KUMAR YADAV le 537.97 quintal ukhu supply gareko thiyo [https://doanepal.gov.np/downloadfile/Sugarcane Book_2082_1752400803.pdf]. Unle 37657.90 rupiya incentive paaune ho [https://doanepal.gov.np/downloadfile/Sugarcane Book_2082_13896_13906.pdf]."
18
+ },
19
+ {
20
+ "id": "u_grnd_009",
21
+ "model_output": "भैँसीले पाडो फाल्यो भन्दैमा ब्रुसेलोसिस भनी रिपोर्ट गर्ने मिल्दैन जबसम्म प्रयोगशाला प्रविधिबाट रोग निदान गरिदैन [https://www.moha.gov.np/downloadsfiles/Manual-for-Epidemiological-Reporting-and-Animal-Disease-Recognition-1770710135.pdf]। रोग निदान कार्यमा सजिलो होस् / सबैले अपनाउने रोग निदानका आधारहरूमा एकरुपता आओस भनेर रोग विवरण तथा रोग निदान चार्ट यस पुस्तिकामा समाबेश गरीएको छ [https://www.moha.gov.np/downloadsfiles/Manual-for-Epidemiological-Reporting-and-Animal-Disease-Recognition-1770710135.pdf]।"
22
+ },
23
+ {
24
+ "id": "u_grnd_035",
25
+ "model_output": "Rautahat Akar Pather (112.76 quintal) supply gareko cha [https://doanepal.gov.np/downloadfile/Sugarcane Book_2082_1752400803.pdf]. Uslai 810 Sugar Mill bata NPR 18916.10 incentive mileko cha [https://doanepal.gov.np/downloadfile/Sugarcane Book_2082_1752400803.pdf]."
26
+ },
27
+ {
28
+ "id": "u_grnd_019",
29
+ "model_output": "क्यासिनोमा खेल्न आउने खेलाडीले अधिकतम पाँच हजार (५,०००) अमेरिकी डलर नगद विदेशी मुद्रा ल्याउन सक्नेछ [https://www.nrb.org.np/fxm/%e0%a5%a7%e0%a5%a7-%e0%a5%a8%e0%a5%a6%e0%a5%ae%e0%a5%a8-%e0%a5%ae%e0%a5%a9-%e0%a4%8f%e0%a4%95%e0%a5%80%e0%a4%95%e0%a5%83%e0%a4%a4-%e0%a4%aa%e0%a4%b0%e0%a4%bf%e0%a4%aa%e0%a4%a4%e0%a5%8d/]। सोभन्दा बढी रकमको हकमा भन्सार घोषणा गरी प्रमाणित गरेको हुनुपर्नेछ वा बैंकिङ्ग प्रणालीमार्फत् रकम प्राप्त गरेको हुनुपर्नेछ [https://www.nrb.org.np/fxm/%e0%a5%a7%e0%a5%a7-%e0%a5%"
30
+ },
31
+ {
32
+ "id": "u_grnd_025",
33
+ "model_output": "नेपालमा विदेशी नागरिकको प्रवेश, बसाइ र प्रस्थानको व्यवस्थापन, नियमन, नियन्त्रण तथा अभिलेखीकरण गर्ने जिम्मेवारी अध्यागमन विभागको हो [https://moha.gov.np/page/main-functions]।"
34
+ },
35
+ {
36
+ "id": "u_grnd_053",
37
+ "model_output": "MAHABIR MAHATO (Mahabir Mahato ko chhora) lai ESCI LTD le unkako ukhu ko lagi 140.18 rupaiya diyeko chha [https://doanepal.gov.np/downloadfile/Sugarcane Book_2082_1752400803.pdf].\n[1] https://doanepal.gov.np/downloadfile/Sugarcane Book_2082_1752400803.pdf"
38
+ },
39
+ {
40
+ "id": "u_grnd_065",
41
+ "model_output": "स्थानीय सरकार सञ्चालन ऐन, २०७४ को दफा ७१ अनुसार स्थानीय सरकारले आगामी आर्थिक वर्षको राजस्व र व्ययको अनुमान कार्यक्रम तथा बजेट प्रस्ताव नगरपालिकाबाट स्वीकृति गराई असार १० गतेसम्म पेश गर्नुपर्छ [https://jirimun.gov.np/sites/jirimun.gov.np/files/documents/mSsj01630472169%E0%A4%9C%E0%A4%BF%E0%A4%B0%E0%A5%80%20%E0%A4%A8%E0%A4%97%E0%A4%B0%E0%A4%AA%E0%A4%BE%E0%A4%B2%E0%A4%BF%E0%A4%95%E0%A4%BE%2C%20%E0%A5%A8%E0%A5%A6%E0%A5%AD%E0%A5%AE.pdf]। बजेट सभामा पेश भएको मितिले २०७६।३।११ मा पारित भएको हो [https://jirimun.gov.np/sites/jirimun.gov.np/files/documents/mSsj01"
42
+ }
43
+ ],
44
+ "pairwise": []
45
+ }
eval/reports/sft_v4b_step600_baseline300_full_eval/side_by_side.md ADDED
@@ -0,0 +1,253 @@
1
+ # Side-by-side: SFT vs sonnet-4-6-baseline
2
+
3
+
4
+ ---
5
+
6
+ ## 1. u_grnd_057 (`other` / `devanagari`)
7
+
8
+ **Question**: विपद् जोखिम न्यूनीकरण राष्ट्रिय कार्यमञ्चले आफ्ना उद्देश्यहरू कार्यान्वयन गर्न के–के गर्न पाउँछ र त्यसका लागि कुनै अनुमति लिनुपर्छ?
9
+
10
+
11
+ ### Gold
12
+ ```
13
+ The excerpt states the platform must first obtain permission or approval from the relevant authority under prevailing Nepal law before implementing its objectives, and may also construct or lease premises, purchase or hire office equipment and vehicles, and appoint staff with defined service conditions.
14
+ ```
15
+
16
+
17
+ ### Baseline (sonnet-4-6-baseline)
18
+ ```
19
+
20
+ ```
21
+
22
+
23
+ ### SFT
24
+ ```
25
+ राष्ट्रिय कार्यमञ्चको विधान, २०६५ बमोजिम यसले आफ्ना उद्देश्यहरू कार्यान्वयन गर्न प्रचलित नेपाल कानुनबमोजिम सम्बन्धित निकायबाट अनुमति वा सोको प्रतिलिपि प्राप्त गरेर मात्र आफ्ना उद्देश्यहरू कार्यान्वयन गर्ने [https://moha.gov.np/upload/e66443e81e8cc9c4fa5c099a1fb1bb87/files/National_20Platform_20on_20DRR_20Bidhan_202065_52_20110916032012_1_(1).pdf]। यसले आफ्ना उद्देश्यहरू कार्यान्वयन गर्न आवश्यक पर्ने भवन निर्माण गर्ने वा भाडामा वा लिजमा लिने [https://moha.gov.np/upload/e66443e81e8cc9c4fa5c099a1fb1bb87/files/National_20Platform_20on_20DRR_20Bidhan_202065_52_20110916032012_1_(1).pdf]। यसले आवश्यक
26
+ ```
27
+
28
+
29
+ ---
30
+
31
+ ## 2. u_grnd_023 (`education` / `devanagari`)
32
+
33
+ **Question**: जिरी नगरपालिकामा अहिले कति माध्यमिक विद्यालय सञ्चालनमा छन् र ती मध्ये कति सार्वजनिक र कति संस्थागत हुन्?
34
+
35
+
36
+ ### Gold
37
+ ```
38
+ The excerpt states that Jiri Municipality currently has 10 secondary schools in operation — 7 public and 3 institutional — along with 1 technical school and 2 bachelor-level colleges.
39
+ ```
40
+
41
+
42
+ ### Baseline (sonnet-4-6-baseline)
43
+ ```
44
+
45
+ ```
46
+
47
+
48
+ ### SFT
49
+ ```
50
+ जिरी नगरपालिकामा अहिले ७ वटा माध्यमिक विद्यालय सञ्चालनमा रहेका छन् [https://jirimun.gov.np/sites/jirimun.gov.np/files/documents/%E0%A4%A8%E0%A4%97%E0%A4%B0%20%E0%A4%AA%E0%A5%8D%E0%A4%B0%E0%A5%8B%E0%A4%AB%E0%A4%BE%E0%A4%88%E0%A4%B2%20%E0%A5%A8%E0%A5%A6%E0%A5%AE%E0%A5%A6.pdf]। तीमध्ये ७ वटा सार्वजनिक र ३ वटा संस्थागत हुन् [https://jirimun.gov.np/sites/jirimun.gov.np/files/documents/%E0%A4%A8%E0%A4%97%E0%A4%B0%20%E0%A4%AA%E0%A5%8D%E0%A4%B0%E0%A5%8B%E0%A4%AB%E0%A4%
51
+ ```
52
+
53
+
54
+ ---
55
+
56
+ ## 3. u_grnd_061 (`agriculture` / `roman_nepali`)
57
+
58
+ **Question**: Swine fever lageko bangurko maasu maa yo virus kati din samma bachcha rahanchha?
59
+
60
+
61
+ ### Gold
62
+ ```
63
+ The excerpt states the virus survives up to 27 days in infected pig meat and from several months to years in frozen meat, making disease control extremely difficult.
64
+ ```
65
+
66
+
67
+ ### Baseline (sonnet-4-6-baseline)
68
+ ```
69
+
70
+ ```
71
+
72
+
73
+ ### SFT
74
+ ```
75
+ Pestivirus virus group of Flaviviridae family is found in pigs, hogs, and wild boars [https://dls.gov.np/progressfiles/TADS-Manual-Nepali-Final-Revised-2-Sept-2024-1760434631.pdf]. It can be transmitted from pig to pig through direct contact, contaminated feed or water, or through the respiratory tract [https://dls.gov.np/progressfiles/TADS-Manual-Nepali-Final-Revised-2-Sept-2024-1760434631.pdf]. The virus can survive in the pig's intestines for up to 27 days [https://dls.gov.np/progressfiles/TADS-Manual-Nepali-Final-Revised-2-Sept-2024-1760434631.pdf].
76
+ The virus can be transmitted from pig to
77
+ ```
78
+
79
+
80
+ ---
81
+
82
+ ## 4. u_grnd_034 (`agriculture` / `roman_nepali`)
83
+
84
+ **Question**: Registration number 13906 ma darta SAROJ KUMAR YADAV le kati quintal ukhu supply gareko thiyo ra unle kati rupiya incentive paaune ho?
85
+
86
+
87
+ ### Gold
88
+ ```
89
+ The excerpt's tabular register shows S.N. 13906 Saroj Kumar Yadav supplied 537.97 quintals of sugarcane and is entitled to Rs. 37,657.90 in production incentive from MS PVT.LTD.
90
+ ```
91
+
92
+
93
+ ### Baseline (sonnet-4-6-baseline)
94
+ ```
95
+
96
+ ```
97
+
98
+
99
+ ### SFT
100
+ ```
101
+ Registration number 13906 ma darta SAROJ KUMAR YADAV le 537.97 quintal ukhu supply gareko thiyo [https://doanepal.gov.np/downloadfile/Sugarcane Book_2082_1752400803.pdf]. Unle 37657.90 rupiya incentive paaune ho [https://doanepal.gov.np/downloadfile/Sugarcane Book_2082_13896_13906.pdf].
102
+ ```
103
+
104
+
105
+ ---
106
+
107
+ ## 5. u_grnd_009 (`agriculture` / `devanagari`)
108
+
109
+ **Question**: भैँसीले पाडो फाल्यो भन्दैमा सोझै ब्रुसेलोसिस भनी रिपोर्ट गर्न मिल्छ कि मिल्दैन?
110
+
111
+
112
+ ### Gold
113
+ ```
114
+ मिल्दैन — प्रयोगशाला प्रविधिबाट रोग निदान नगरिकन गर्भपतनको आधारमा मात्र ब्रुसेलोसिस रिपोर्ट गर्न पाइँदैन भनी म्यानुअलमा स्पष्ट उल्लेख गरिएको छ।
115
+ ```
116
+
117
+
118
+ ### Baseline (sonnet-4-6-baseline)
119
+ ```
120
+
121
+ ```
122
+
123
+
124
+ ### SFT
125
+ ```
126
+ भैँसीले पाडो फाल्यो भन्दैमा ब्रुसेलोसिस भनी रिपोर्ट गर्ने मिल्दैन जबसम्म प्रयोगशाला प्रविधिबाट रोग निदान गरिदैन [https://www.moha.gov.np/downloadsfiles/Manual-for-Epidemiological-Reporting-and-Animal-Disease-Recognition-1770710135.pdf]। रोग निदान कार्यमा सजिलो होस् / सबैले अपनाउने रोग निदानका आधारहरूमा एकरुपता आओस भनेर रोग विवरण तथा रोग निदान चार्ट यस पुस्तिकामा समाबेश गरीएको छ [https://www.moha.gov.np/downloadsfiles/Manual-for-Epidemiological-Reporting-and-Animal-Disease-Recognition-1770710135.pdf]।
127
+ ```
128
+
129
+
130
+ ---
131
+
132
+ ## 6. u_grnd_035 (`agriculture` / `roman_nepali`)
133
+
134
+ **Question**: Rautahat Akar Pather ka HARENDRA PATEL le kati quintal ukhu supply gareko cha ra uslai kun sugar mill bata kati paisa incentive mileko cha?
135
+
136
+
137
+ ### Gold
138
+ ```
139
+ The excerpt's register shows S.N. 815 Harendra Patel of Rautahat, Akar Pather supplied 410.45 quintals of sugarcane and received Rs. 28,731.50 in incentive from SBBSCI PVT.LTD.
140
+ ```
141
+
142
+
143
+ ### Baseline (sonnet-4-6-baseline)
144
+ ```
145
+
146
+ ```
147
+
148
+
149
+ ### SFT
150
+ ```
151
+ Rautahat Akar Pather (112.76 quintal) supply gareko cha [https://doanepal.gov.np/downloadfile/Sugarcane Book_2082_1752400803.pdf]. Uslai 810 Sugar Mill bata NPR 18916.10 incentive mileko cha [https://doanepal.gov.np/downloadfile/Sugarcane Book_2082_1752400803.pdf].
152
+ ```
153
+
154
+
155
+ ---
156
+
157
+ ## 7. u_grnd_019 (`banking` / `devanagari`)
158
+
159
+ **Question**: क्यासिनोमा खेल्न आउने खेलाडीले कति अमेरिकी डलर नगद विदेशी मुद्रा ल्याउन पाउँछ, र त्योभन्दा बढी रकम ल्याउनु परेमा के गर्नुपर्छ?
160
+
161
+
162
+ ### Gold
163
+ ```
164
+ Players may bring up to USD 5,000 (or equivalent convertible foreign currency) in cash; any amount above that must either be declared and verified at customs or received through the banking system.
165
+ ```
166
+
167
+
168
+ ### Baseline (sonnet-4-6-baseline)
169
+ ```
170
+
171
+ ```
172
+
173
+
174
+ ### SFT
175
+ ```
176
+ क्यासिनोमा खेल्न आउने खेलाडीले अधिकतम पाँच हजार (५,०००) अमेरिकी डलर नगद विदेशी मुद्रा ल्याउन सक्नेछ [https://www.nrb.org.np/fxm/%e0%a5%a7%e0%a5%a7-%e0%a5%a8%e0%a5%a6%e0%a5%ae%e0%a5%a8-%e0%a5%ae%e0%a5%a9-%e0%a4%8f%e0%a4%95%e0%a5%80%e0%a4%95%e0%a5%83%e0%a4%a4-%e0%a4%aa%e0%a4%b0%e0%a4%bf%e0%a4%aa%e0%a4%a4%e0%a5%8d/]। सोभन्दा बढी रकमको हकमा भन्सार घोषणा गरी प्रमाणित गरेको हुनुपर्नेछ वा बैंकिङ्ग प्रणालीमार्फत् रकम प्राप्त गरेको हुनुपर्नेछ [https://www.nrb.org.np/fxm/%e0%a5%a7%e0%a5%a7-%e0%a5%
177
+ ```
178
+
179
+
180
+ ---
181
+
182
+ ## 8. u_grnd_025 (`visa_immigration` / `devanagari`)
183
+
184
+ **Question**: नेपालमा विदेशी नागरिकको प्रवेश, बसाइ र प्रस्थानको व्यवस्थापन तथा अभिलेखीकरण गर्ने जिम्मेवारी कुन निकायको हो?
185
+
186
+
187
+ ### Gold
188
+ ```
189
+ The excerpt lists this as item 30 under the Ministry of Home Affairs' main functions: management, regulation, control, and recording of foreign nationals' entry, presence, activities, and departure in Nepal.
190
+ ```
191
+
192
+
193
+ ### Baseline (sonnet-4-6-baseline)
194
+ ```
195
+
196
+ ```
197
+
198
+
199
+ ### SFT
200
+ ```
201
+ नेपालमा विदेशी नागरिकको प्रवेश, बसाइ र प्रस्थानको व्यवस्थापन, नियमन, नियन्त्रण तथा अभिलेखीकरण गर्ने जिम्मेवारी अध्यागमन विभागको हो [https://moha.gov.np/page/main-functions]।
202
+ ```
203
+
204
+
205
+ ---
206
+
207
+ ## 9. u_grnd_053 (`agriculture` / `roman_nepali`)
208
+
209
+ **Question**: Mahottari Ramnagar ko MANGAL MAHATO (Mahabir Mahato ko chhora) lai ESCI LTD le unkako ukhu ko lagi kati rupiya diyeko chha?
210
+
211
+
212
+ ### Gold
213
+ ```
214
+ The excerpt lists Mangal Mahato (son of Mahabir Mahato) of Mahottari Ramnagar with 140.18 quintals of sugarcane and a payment of Rs 9,812.6 from ESCI LTD.
215
+ ```
216
+
217
+
218
+ ### Baseline (sonnet-4-6-baseline)
219
+ ```
220
+
221
+ ```
222
+
223
+
224
+ ### SFT
225
+ ```
226
+ MAHABIR MAHATO (Mahabir Mahato ko chhora) lai ESCI LTD le unkako ukhu ko lagi 140.18 rupaiya diyeko chha [https://doanepal.gov.np/downloadfile/Sugarcane Book_2082_1752400803.pdf].
227
+ [1] https://doanepal.gov.np/downloadfile/Sugarcane Book_2082_1752400803.pdf
228
+ ```
229
+
230
+
231
+ ---
232
+
233
+ ## 10. u_grnd_065 (`local_services` / `devanagari`)
234
+
235
+ **Question**: स्थानीय सरकार सञ्चालन ऐन अनुसार नगरपालिकाले बजेट सभामा कुन मितिसम्म पेश गर्नुपर्छ र कहिलेसम्म पारित गर्नुपर्छ?
236
+
237
+
238
+ ### Gold
239
+ ```
240
+ स्थानीय सरकार सञ्चालन ऐन, २०७४ को दफा ७१ अनुसार कार्यपालिकाबाट स्वीकृत बजेट असार १० गतेभित्र सभामा पेश गर्नुपर्छ र असार मसान्तभित्र सभाबाट पारित गर्नुपर्ने व्यवस्था छ।
241
+ ```
242
+
243
+
244
+ ### Baseline (sonnet-4-6-baseline)
245
+ ```
246
+
247
+ ```
248
+
249
+
250
+ ### SFT
251
+ ```
252
+ स्थानीय सरकार सञ्चालन ऐन, २०७४ को दफा ७१ अनुसार स्थानीय सरकारले आगामी आर्थिक वर्षको राजस्व र व्ययको अनुमान कार्यक्रम तथा बजेट प्रस्ताव नगरपालिकाबाट स्वीकृति गराई असार १० गतेसम्म पेश गर्नुपर्छ [https://jirimun.gov.np/sites/jirimun.gov.np/files/documents/mSsj01630472169%E0%A4%9C%E0%A4%BF%E0%A4%B0%E0%A5%80%20%E0%A4%A8%E0%A4%97%E0%A4%B0%E0%A4%AA%E0%A4%BE%E0%A4%B2%E0%A4%BF%E0%A4%95%E0%A4%BE%2C%20%E0%A5%A8%E0%A5%A6%E0%A5%AD%E0%A5%AE.pdf]। बजेट सभामा पेश भएको मितिले २०७६।३।११ मा पारित भएको हो [https://jirimun.gov.np/sites/jirimun.gov.np/files/documents/mSsj01
253
+ ```