Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection
Abstract
VisAnomReasoner, a parameter-efficient vision-language model for time-series anomaly detection, is fine-tuned on VisAnomBench, a benchmark that adds natural-language rationales to public time-series datasets. It outperforms all baselines on VisAnomBench and generalizes to the TSB-AD-U benchmark.
Recent advances in Vision-Language Models (VLMs) have achieved impressive performance across many tasks, yet prior studies report unsatisfactory performance when applying large language or multimodal models to finding abnormal patterns in sequential data. Public anomaly detection benchmarks typically provide interval annotations but not natural-language rationales, making it difficult to fine-tune VLMs to produce grounded, interpretable decisions. To address this gap, we construct VisAnomBench, a curated benchmark built from public time-series datasets and augmented with high-quality anomaly explanations selected from multiple large VLMs using fine-grained, task-specific rewards. Through fine-tuning on this benchmark, we develop VisAnomReasoner, a parameter-efficient VLM for time-series anomaly detection. Experimental results on VisAnomBench show that VisAnomReasoner achieves more accurate anomaly localization and consistently outperforms all baselines, with improvements of at least 21.23 and 23.87 percentage points in precision and F1, respectively. Additional experiments on the TSB-AD-U benchmark demonstrate strong cross-benchmark generalization, with VisAnomReasoner improving precision and F1 by 9.57 and 13.39 percentage points, respectively.
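To make the reported metrics concrete, the following is a minimal sketch of how precision and F1 can be computed from predicted and annotated anomaly intervals. It assumes point-wise scoring over half-open index intervals; the abstract does not state the exact evaluation protocol, so the function names `intervals_to_mask` and `pointwise_scores` and the scoring convention are illustrative assumptions, not the authors' method.

```python
from typing import List, Tuple

# Assumption: an interval is (start, end) with start inclusive, end exclusive.
Interval = Tuple[int, int]


def intervals_to_mask(intervals: List[Interval], length: int) -> List[bool]:
    """Mark every time step covered by at least one interval."""
    mask = [False] * length
    for start, end in intervals:
        for i in range(max(start, 0), min(end, length)):
            mask[i] = True
    return mask


def pointwise_scores(
    pred: List[Interval], truth: List[Interval], length: int
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) computed per time step."""
    p = intervals_to_mask(pred, length)
    t = intervals_to_mask(truth, length)
    tp = sum(a and b for a, b in zip(p, t))
    fp = sum(a and not b for a, b in zip(p, t))
    fn = sum(b and not a for a, b in zip(p, t))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical example: prediction overlaps half of the annotated anomaly.
precision, recall, f1 = pointwise_scores([(10, 20)], [(15, 25)], length=30)
```

Under this convention, a prediction that is shifted relative to the annotation is penalized in both precision and recall, which is why localization accuracy shows up directly in F1.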
Community
We introduce Tiny but Trusted, a parameter-efficient vision-language framework for time-series anomaly detection with grounded reasoning, enabling compact VLMs to provide accurate and explainable reasoning over sequential data. Instead of treating anomaly detection as interval prediction alone, we construct VisAnomBench, a curated benchmark from public time-series datasets augmented with natural-language anomaly rationales selected using fine-grained, task-specific rewards from multiple large VLMs. Fine-tuning on this benchmark yields VisAnomReasoner, a lightweight VLM that jointly localizes abnormal temporal regions and explains the underlying pattern shifts. The model improves anomaly localization and interpretability while outperforming strong baselines on VisAnomBench and generalizing to TSB-AD-U.
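The reward-based selection of rationales described above can be sketched as a best-of-n choice among candidate outputs. The summary does not specify the rewards, so this sketch substitutes a single hypothetical reward, the point-wise IoU between a candidate's predicted intervals and the annotated intervals; the `Candidate` class, `interval_iou`, and `select_best` are illustrative names, not part of the paper.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # assumption: start inclusive, end exclusive


@dataclass
class Candidate:
    source: str                # which VLM produced this candidate (hypothetical label)
    intervals: List[Interval]  # anomaly intervals the candidate predicted
    rationale: str             # natural-language explanation


def covered_points(intervals: List[Interval]) -> Set[int]:
    """Expand intervals into the set of time steps they cover."""
    return {i for start, end in intervals for i in range(start, end)}


def interval_iou(pred: List[Interval], truth: List[Interval]) -> float:
    """Hypothetical reward: intersection over union of covered time steps."""
    p, t = covered_points(pred), covered_points(truth)
    union = p | t
    return len(p & t) / len(union) if union else 1.0


def select_best(candidates: List[Candidate], truth: List[Interval]) -> Candidate:
    """Keep the candidate whose intervals best match the annotation."""
    return max(candidates, key=lambda c: interval_iou(c.intervals, truth))


# Hypothetical candidates from two different models.
candidates = [
    Candidate("model_a", [(0, 10)], "Level shift near the start."),
    Candidate("model_b", [(14, 26)], "Spike followed by a slow decay."),
]
best = select_best(candidates, truth=[(15, 25)])
```

A real pipeline would likely combine several reward terms, for example one for localization and one for rationale quality; this sketch uses only localization so that the selection step itself stays visible.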