Varshith dharmaj committed on
Upload gate_benchmark_results.txt with huggingface_hub
gate_benchmark_results.txt
ADDED
@@ -0,0 +1,37 @@
+============================================================
+🎓 MVM² ADVANCED COMPETITIVE EXAM BENCHMARK (GATE / JEE)
+============================================================
+Total problems queued: 5
+
+[EVALUATING] GATE (CS) - Linear Algebra
+Problem: Let M be a 2x2 matrix such that M = [[4, 1], [2, 3]]. Find the sum of the eigenv...
+[ATTENTION] System flagged errors in logic: []
+-> Result: ❌ FLAGGED | Confidence: 57.4% | Latency: 4.747s
+
+[EVALUATING] JEE Advanced - Calculus
+Problem: Evaluate the definite integral of x * e^x from x=0 to x=1....
+[ATTENTION] System flagged errors in logic: []
+-> Result: ❌ FLAGGED | Confidence: 43.8% | Latency: 5.266s
+
+[EVALUATING] GATE (EC) - Probability
+Problem: A box contains 4 red balls and 6 black balls. Three balls are drawn at random wi...
+[ATTENTION] System flagged errors in logic: []
+-> Result: ❌ FLAGGED | Confidence: 58.3% | Latency: 1.392s
+
+[EVALUATING] JEE Mains - Kinematics Paradox
+Problem: A particle moves such that its velocity v is given by v = t^2 - 4t + 3. Find the...
+[ATTENTION] System flagged errors in logic: []
+-> Result: ❌ FLAGGED | Confidence: 58.3% | Latency: 1.385s
+
+[EVALUATING] GATE (ME) - Differential Equations
+Problem: Solve the initial value problem dy/dx = 2xy, y(0) = 1. Find y at x = 1....
+[ATTENTION] System flagged errors in logic: []
+-> Result: ❌ FLAGGED | Confidence: 58.3% | Latency: 1.504s
+
+============================================================
+🏆 FINAL COMPETITIVE BENCHMARK METRICS
+============================================================
+Advanced Exam Accuracy: 0.0% (Expected > 85%)
+Average Confidence: 55.2%
+Average Latency: 1.504s
+============================================================
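
The summary metrics in the log can be cross-checked against the five per-problem result lines. The following is a minimal sketch, not the benchmark's own code: the `summarize` function, the `RESULT_RE` pattern, and the rule that any status other than `FLAGGED` counts as a correct answer are all assumptions made for illustration. Only the five result lines are taken from the log.

```python
import re

# Per-problem result lines copied verbatim from the benchmark log.
LOG = """\
-> Result: ❌ FLAGGED | Confidence: 57.4% | Latency: 4.747s
-> Result: ❌ FLAGGED | Confidence: 43.8% | Latency: 5.266s
-> Result: ❌ FLAGGED | Confidence: 58.3% | Latency: 1.392s
-> Result: ❌ FLAGGED | Confidence: 58.3% | Latency: 1.385s
-> Result: ❌ FLAGGED | Confidence: 58.3% | Latency: 1.504s
"""

# Hypothetical pattern: status icon, status word, confidence in %, latency in s.
RESULT_RE = re.compile(
    r"Result:\s*\S+\s+(?P<status>\w+)\s*\|"
    r"\s*Confidence:\s*(?P<conf>[\d.]+)%\s*\|"
    r"\s*Latency:\s*(?P<lat>[\d.]+)s"
)


def summarize(log_text):
    """Recompute accuracy, mean confidence, and mean latency from result lines."""
    rows = [m.groupdict() for m in RESULT_RE.finditer(log_text)]
    if not rows:
        raise ValueError("no result lines found")
    n = len(rows)
    # Assumption: every status other than FLAGGED counts as a correct answer.
    correct = sum(1 for r in rows if r["status"] != "FLAGGED")
    return {
        "problems": n,
        "accuracy_pct": 100.0 * correct / n,
        "avg_confidence_pct": sum(float(r["conf"]) for r in rows) / n,
        "avg_latency_s": sum(float(r["lat"]) for r in rows) / n,
    }


summary = summarize(LOG)
print(f"Problems:           {summary['problems']}")
print(f"Accuracy:           {summary['accuracy_pct']:.1f}%")
print(f"Average confidence: {summary['avg_confidence_pct']:.1f}%")
print(f"Average latency:    {summary['avg_latency_s']:.3f}s")
```

Under these assumptions the recomputed values agree with the reported totals of 0.0% accuracy, 55.2% average confidence, and 2.859s average latency.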