               precision    recall  f1-score   support

Brainstorming       0.67      0.56      0.61        32
       Coding       0.90      0.95      0.93        20
   Extraction       0.75      0.55      0.63        11
   Factual QA       0.24      0.80      0.36         5
   Generation       0.80      0.83      0.82        54
         Math       0.88      0.96      0.92        24
    Reasoning       0.60      0.21      0.32        14

     accuracy                           0.74       160
    macro avg       0.69      0.69      0.66       160
 weighted avg       0.76      0.74      0.73       160