Committed by Varshith dharmaj
Commit 6874b0b · verified · 1 parent: 5545864

Upload benchmark_competitive_results.txt with huggingface_hub

Files changed (1):
  1. benchmark_competitive_results.txt +37 -0
benchmark_competitive_results.txt ADDED
@@ -0,0 +1,37 @@
+ ============================================================
+ 🎓 MVM² ADVANCED COMPETITIVE EXAM BENCHMARK (GATE / JEE)
+ ============================================================
+ Total problems queued: 5
+
+ [EVALUATING] GATE (CS) - Linear Algebra
+ Problem: Let M be a 2x2 matrix such that M = [[4, 1], [2, 3]]. Find the sum of the eigenv...
+ [ATTENTION] System flagged errors in logic: []
+ -> Result: ❌ FLAGGED | Confidence: 57.4% | Latency: 5.224s
+
+ [EVALUATING] JEE Advanced - Calculus
+ Problem: Evaluate the definite integral of x * e^x from x=0 to x=1....
+ [ATTENTION] System flagged errors in logic: []
+ -> Result: ❌ FLAGGED | Confidence: 44.8% | Latency: 4.883s
+
+ [EVALUATING] GATE (EC) - Probability
+ Problem: A box contains 4 red balls and 6 black balls. Three balls are drawn at random wi...
+ [ATTENTION] System flagged errors in logic: []
+ -> Result: ❌ FLAGGED | Confidence: 58.3% | Latency: 1.121s
+
+ [EVALUATING] JEE Mains - Kinematics Paradox
+ Problem: A particle moves such that its velocity v is given by v = t^2 - 4t + 3. Find the...
+ [ATTENTION] System flagged errors in logic: []
+ -> Result: ❌ FLAGGED | Confidence: 58.3% | Latency: 1.319s
+
+ [EVALUATING] GATE (ME) - Differential Equations
+ Problem: Solve the initial value problem dy/dx = 2xy, y(0) = 1. Find y at x = 1....
+ [ATTENTION] System flagged errors in logic: []
+ -> Result: ❌ FLAGGED | Confidence: 58.3% | Latency: 1.355s
+
+ ============================================================
+ 🏆 FINAL COMPETITIVE BENCHMARK METRICS
+ ============================================================
+ Advanced Exam Accuracy: 0.0% (Expected > 85%)
+ Average Confidence: 55.4%
+ Average Latency: 2.780s
+ ============================================================
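The summary figures in the log follow from the five per-problem result lines: accuracy is the share of problems that were not flagged (0 of 5), and the two averages are plain arithmetic means, i.e. (57.4 + 44.8 + 58.3 + 58.3 + 58.3) / 5 = 55.42% confidence and (5.224 + 4.883 + 1.121 + 1.319 + 1.355) / 5 = 2.7804 s latency. The sketch below recomputes them from the log text. The regular expression and the `summarize` helper are hypothetical, not part of MVM², and counting any status other than FLAGGED as a pass is an assumption, since the log never shows what a passing status looks like.

```python
import re
from statistics import mean

# Per-problem result lines, copied verbatim from the log.
RESULT_LINES = [
    "-> Result: ❌ FLAGGED | Confidence: 57.4% | Latency: 5.224s",
    "-> Result: ❌ FLAGGED | Confidence: 44.8% | Latency: 4.883s",
    "-> Result: ❌ FLAGGED | Confidence: 58.3% | Latency: 1.121s",
    "-> Result: ❌ FLAGGED | Confidence: 58.3% | Latency: 1.319s",
    "-> Result: ❌ FLAGGED | Confidence: 58.3% | Latency: 1.355s",
]

# Hypothetical pattern: assumes every result line has exactly this shape,
# with an optional status icon before the upper-case status word.
PATTERN = re.compile(
    r"Result: (?:\S+ )?(?P<status>[A-Z]+) \| "
    r"Confidence: (?P<conf>[\d.]+)% \| Latency: (?P<lat>[\d.]+)s"
)


def summarize(lines):
    """Recompute accuracy, mean confidence and mean latency from result lines."""
    rows = []
    for line in lines:
        m = PATTERN.search(line)
        if m is None:
            raise ValueError(f"unrecognized result line: {line!r}")
        rows.append((m["status"], float(m["conf"]), float(m["lat"])))
    # Assumption: anything that is not FLAGGED counts as a correct answer.
    passed = sum(1 for status, _, _ in rows if status != "FLAGGED")
    return {
        "accuracy_pct": 100.0 * passed / len(rows),
        "avg_confidence_pct": mean(conf for _, conf, _ in rows),
        "avg_latency_s": mean(lat for _, _, lat in rows),
    }


summary = summarize(RESULT_LINES)
print(f"Advanced Exam Accuracy: {summary['accuracy_pct']:.1f}%")
print(f"Average Confidence: {summary['avg_confidence_pct']:.1f}%")
print(f"Average Latency: {summary['avg_latency_s']:.3f}s")
```

Rounded to the log's precision, this reproduces the reported 0.0%, 55.4% and 2.780s, so the summary block is consistent with the individual results.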
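The log marks every problem as FLAGGED but records neither the system's answers nor the expected ones. For orientation, the three prompts that are stated fully enough to solve can be checked independently; these are not the benchmark's own reference answers. The sketch below uses only the Python standard library and compares a numerical method with the closed form in each case: the characteristic polynomial λ² − tr(M)λ + det(M) = 0 for the 2x2 matrix (eigenvalues 5 and 2, sum 7 = trace), composite Simpson's rule for ∫₀¹ x eˣ dx = [(x − 1)eˣ]₀¹ = 1, and classic RK4 for dy/dx = 2xy, y(0) = 1, whose solution y = e^(x²) gives y(1) = e. It assumes the truncated Linear Algebra prompt asks for the sum of the eigenvalues; the probability and kinematics prompts are cut off in the log and are not attempted. All helper names are illustrative.

```python
import math


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix from lambda^2 - tr(M)*lambda + det(M) = 0.

    Assumes real eigenvalues (non-negative discriminant).
    """
    (a, b), (c, d) = m
    tr = a + d
    det = a * d - b * c
    disc = math.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2


def simpson(f, lo, hi, n=1000):
    """Composite Simpson's rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3


def rk4(f, x0, y0, x1, steps=1000):
    """Classic 4th-order Runge-Kutta for dy/dx = f(x, y) from x0 to x1."""
    h = (x1 - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y


l1, l2 = eigenvalues_2x2([[4, 1], [2, 3]])
eig_sum = l1 + l2
integral = simpson(lambda x: x * math.exp(x), 0.0, 1.0)
y_at_1 = rk4(lambda x, y: 2 * x * y, 0.0, 1.0, 1.0)

print(f"eigenvalues {l1:g}, {l2:g}; sum = {eig_sum:g}")  # closed form: sum = trace = 7
print(f"integral of x*e^x on [0, 1] ~ {integral:.6f}")   # closed form: 1
print(f"y(1) for dy/dx = 2xy, y(0) = 1 ~ {y_at_1:.6f}")   # closed form: e
```

Checking numerically against the closed forms guards against an algebra slip in either direction; with 1000 steps both Simpson's rule and RK4 agree with the exact values far beyond the six decimals printed.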