TSRBench: A Comprehensive Multi-task Multi-modal Time Series Reasoning Benchmark for Generalist Models
Paper • 2601.18744