| Class | Precision | Recall | F1-score | Support |
|---|---:|---:|---:|---:|
| Brainstorming | 0.67 | 0.56 | 0.61 | 32 |
| Coding | 0.90 | 0.95 | 0.93 | 20 |
| Extraction | 0.75 | 0.55 | 0.63 | 11 |
| Factual QA | 0.24 | 0.80 | 0.36 | 5 |
| Generation | 0.80 | 0.83 | 0.82 | 54 |
| Math | 0.88 | 0.96 | 0.92 | 24 |
| Reasoning | 0.60 | 0.21 | 0.32 | 14 |
| accuracy | | | 0.74 | 160 |
| macro avg | 0.69 | 0.69 | 0.66 | 160 |
| weighted avg | 0.76 | 0.74 | 0.73 | 160 |
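Assuming this table is scikit-learn `classification_report` output, as its layout suggests, the two summary rows are derived from the per-class rows. The macro avg is the unweighted mean over the seven classes. The weighted avg weights each class by its support. The sketch below recomputes both from the two-decimal values in the table. The `REPORT` dictionary and the `summary_rows` helper are illustrative names, not part of any library.

```python
# Per-class rows copied from the table: (precision, recall, f1-score, support).
REPORT = {
    "Brainstorming": (0.67, 0.56, 0.61, 32),
    "Coding": (0.90, 0.95, 0.93, 20),
    "Extraction": (0.75, 0.55, 0.63, 11),
    "Factual QA": (0.24, 0.80, 0.36, 5),
    "Generation": (0.80, 0.83, 0.82, 54),
    "Math": (0.88, 0.96, 0.92, 24),
    "Reasoning": (0.60, 0.21, 0.32, 14),
}


def summary_rows(report):
    """Return (macro, weighted, total_support) for precision, recall and f1."""
    rows = list(report.values())
    total = sum(row[3] for row in rows)
    # Macro average: every class counts equally, regardless of its support.
    macro = tuple(sum(row[i] for row in rows) / len(rows) for i in range(3))
    # Weighted average: each class counts in proportion to its support.
    weighted = tuple(sum(row[i] * row[3] for row in rows) / total for i in range(3))
    return macro, weighted, total


macro, weighted, total = summary_rows(REPORT)
print("macro avg   ", [round(x, 2) for x in macro], total)
print("weighted avg", [round(x, 2) for x in weighted], total)
```

Because the inputs are already rounded, the recomputed weighted f1 comes out at about 0.736 rather than the reported 0.73. Exact agreement would require the unrounded per-class scores. The gap between macro f1 (0.66) and weighted f1 (0.73) reflects that the two weakest classes, Factual QA and Reasoning, also have small support (5 and 14).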