| Selected checkpoint directory: checkpoints/step_100 | |
| Mean F1: 0.000 | |
| Per-benchmark F1s: | |
| code_generation: 0.000 | |
| common_sense: 0.000 | |
| creative_writing: 0.000 | |
| dialogue_generation: 0.000 | |
| instruction_following: 0.000 | |
| knowledge_retrieval: 0.000 | |
| logical_reasoning: 0.000 | |
| math_reasoning: 0.000 | |
| question_answering: 0.000 | |
| reading_comprehension: 0.000 | |
| safety_evaluation: 0.000 | |
| sentiment_analysis: 0.000 | |
| summarization: 0.000 | |
| text_classification: 0.000 | |
| translation: 0.000 | |
| SHA256(config.json): 7087b54618ddc9cd146c068edaae90d07ca5227b4a0d9bdc9f54e3c03d4dcd39 | |
| SHA256(pytorch_model.bin): 965362299a238de576a92dfdd3e32aea7a2bacc94b2c41541c8c9258b923f587 | |
| Timestamp (UTC): 2026-02-10T04:11:10Z | |