arxiv:2605.30344

Tiny but Trusted: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection

Published on May 28

Submitted by Ismini Lourentzou on May 29

Perception and LANguage Lab @UIUC

Abstract

AI-generated summary: A parameter-efficient vision-language model for time-series anomaly detection is fine-tuned on a new benchmark annotated with natural-language rationales. It outperforms the baselines on that benchmark and generalizes to a second benchmark, TSB-AD-U.

Recent advances in Vision-Language Models (VLMs) have achieved impressive performance across many tasks, yet prior studies report unsatisfactory performance when applying large language or multimodal models to finding abnormal patterns in sequential data. Public anomaly detection benchmarks typically provide interval annotations but not natural-language rationales, making it difficult to fine-tune VLMs to produce grounded, interpretable decisions. To address this gap, we construct VisAnomBench, a curated benchmark built from public time-series datasets and augmented with high-quality anomaly explanations selected from multiple large VLMs using fine-grained, task-specific rewards. Through fine-tuning on this benchmark, we develop VisAnomReasoner, a parameter-efficient VLM for time-series anomaly detection. Experimental results on VisAnomBench show that VisAnomReasoner achieves more accurate anomaly localization and consistently outperforms all baselines, with improvements of at least 21.23 and 23.87 percentage points in precision and F1, respectively. Additional experiments on the TSB-AD-U benchmark demonstrate strong cross-benchmark generalization, with VisAnomReasoner improving precision and F1 by 9.57 and 13.39 percentage points, respectively.
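The abstract reports precision and F1 for anomaly localization but does not say here how predicted intervals are scored against the annotated ones. As a minimal sketch, assuming a simple point-wise protocol (every time step inside an interval counts as one positive), the metrics could be computed as below. The function names and the inclusive-endpoint convention are illustrative assumptions, not the paper's evaluation code.

```python
from typing import List, Set, Tuple

# Assumption: an interval is (start, end) in time-step indices, both inclusive.
Interval = Tuple[int, int]


def intervals_to_points(intervals: List[Interval]) -> Set[int]:
    """Expand a list of inclusive intervals into the set of covered time steps."""
    points: Set[int] = set()
    for start, end in intervals:
        points.update(range(start, end + 1))
    return points


def pointwise_prf(pred: List[Interval], truth: List[Interval]) -> Tuple[float, float, float]:
    """Point-wise precision, recall, and F1 between predicted and true intervals."""
    pred_points = intervals_to_points(pred)
    true_points = intervals_to_points(truth)
    true_positives = len(pred_points & true_points)

    precision = true_positives / len(pred_points) if pred_points else 0.0
    recall = true_positives / len(true_points) if true_points else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical example: the prediction overlaps half of the true anomaly.
    print(pointwise_prf(pred=[(10, 19)], truth=[(15, 24)]))
```

Under this protocol, a prediction that is much wider than the true anomaly keeps high recall but loses precision, which is why the two numbers are reported separately. Other protocols (event-wise or range-based scoring) would give different values for the same prediction.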

Community

isminoula

Paper submitter about 3 hours ago

We introduce Tiny but Trusted, a parameter-efficient vision-language framework for time-series anomaly detection with grounded reasoning, enabling compact VLMs to provide accurate and explainable reasoning over sequential data. Instead of treating anomaly detection as interval prediction alone, we construct VisAnomBench, a curated benchmark from public time-series datasets augmented with natural-language anomaly rationales selected using fine-grained, task-specific rewards from multiple large VLMs. Fine-tuning on this benchmark yields VisAnomReasoner, a lightweight VLM that jointly localizes abnormal temporal regions and explains the underlying pattern shifts. The model improves anomaly localization and interpretability while outperforming strong baselines on VisAnomBench and generalizing to TSB-AD-U.
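The rationale-selection step described above can be pictured as best-of-N selection: several large VLMs each propose an explanation, and the highest-reward one is kept. The sketch below is only an illustration of that idea. The actual reward terms are not given on this page, so the reward used here (overlap between the interval a candidate claims and the annotated interval) is a stand-in assumption, and `Candidate`, `interval_iou`, and `select_rationale` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: an interval is (start, end) in time-step indices, both inclusive.
Interval = Tuple[int, int]


@dataclass
class Candidate:
    source: str          # which VLM produced this candidate (hypothetical label)
    interval: Interval   # the anomalous region the candidate points to
    rationale: str       # the natural-language explanation


def interval_iou(a: Interval, b: Interval) -> float:
    """Intersection-over-union of two inclusive intervals."""
    intersection = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - intersection
    return intersection / union


def select_rationale(candidates: List[Candidate], truth: Interval) -> Candidate:
    """Keep the candidate whose claimed interval best matches the annotation.

    Stand-in reward: interval IoU only. A task-specific reward would likely
    combine several terms; this sketch uses one for clarity.
    """
    return max(candidates, key=lambda c: interval_iou(c.interval, truth))


if __name__ == "__main__":
    truth = (100, 149)
    candidates = [
        Candidate("vlm_a", (90, 139), "Level shift starting near step 90."),
        Candidate("vlm_b", (100, 149), "Sustained spike between steps 100 and 149."),
        Candidate("vlm_c", (200, 220), "Noise burst near step 200."),
    ]
    best = select_rationale(candidates, truth)
    print(best.source, best.rationale)
```

Scoring candidates against the existing interval annotations ties each retained explanation to a region that the benchmark already labels as anomalous, which is one plausible way to keep the rationales grounded.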
