Metrics are not matching for ARC 1 and ARC 2
I ran the evaluation using https://github.com/SamsungSAILMontreal/TinyRecursiveModels/blob/main/evaluators/arc.py
but the metrics are very low. For ARC 1 it gives:
```
{'all': {'accuracy': np.float32(0.65308), 'exact_accuracy': np.float32(0.021479713), 'lm_loss': np.float32(1.8157994), 'q_halt_accuracy': np.float32(0.8997613), 'q_halt_loss': np.float32(0.28415218), 'steps': np.float32(16.0)}, 'ARC/pass@1': 0.02375, 'ARC/pass@2': 0.02375, 'ARC/pass@5': 0.02375, 'ARC/pass@10': 0.02375, 'ARC/pass@100': 0.02375, 'ARC/pass@1000': 0.02375}
```
and for ARC 2 it gives:
```
{'all': {'accuracy': np.float32(0.6418964), 'exact_accuracy': np.float32(0.0), 'lm_loss': np.float32(1.7541337), 'q_halt_accuracy': np.float32(0.9186047), 'q_halt_loss': np.float32(0.29709587), 'steps': np.float32(16.0)}, 'ARC/pass@1': 0.0, 'ARC/pass@2': 0.0, 'ARC/pass@5': 0.0, 'ARC/pass@10': 0.0, 'ARC/pass@100': 0.0, 'ARC/pass@1000': 0.0}
```
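For context, a high per-cell `accuracy` (~0.64–0.65) combined with a near-zero `exact_accuracy` typically means the model predicts most grid cells correctly but almost never produces a fully correct grid, which is what pass@k counts. A minimal sketch of that distinction, assuming the metrics are computed roughly like this (the hypothetical `cell_and_exact_accuracy` helper is for illustration only, not the repo's actual evaluator code):

```python
import numpy as np

def cell_and_exact_accuracy(preds: np.ndarray, targets: np.ndarray):
    """Contrast per-cell accuracy with exact (whole-grid) accuracy.

    preds, targets: integer arrays of shape (num_puzzles, num_cells),
    one flattened grid per puzzle.
    """
    matches = preds == targets
    cell_acc = matches.mean()              # fraction of correct cells
    exact_acc = matches.all(axis=1).mean() # fraction of fully correct grids
    return float(cell_acc), float(exact_acc)

# Toy example: 2 puzzles, 4 cells each. 7 of 8 cells are correct,
# but only the first grid matches exactly.
preds   = np.array([[1, 2, 3, 4], [1, 2, 3, 9]])
targets = np.array([[1, 2, 3, 4], [1, 2, 3, 4]])
print(cell_and_exact_accuracy(preds, targets))  # (0.875, 0.5)
```

So the numbers above are internally consistent; the question is why exact/pass@k is so low in the first place.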
Please let me know if I am evaluating it incorrectly or if there is some other issue.