Papers
arxiv:2509.23418

Beyond Metadata: Multimodal, Policy-Aware Detection of YouTube Scam Videos

Published on Apr 1
Authors:
,
,
,

Abstract

Multimodal approaches combining text, audio, and visual data outperform unimodal methods in detecting YouTube scams, offering improved robustness and interpretable reasoning for automated content moderation.

YouTube is a major platform for information and entertainment, but its wide accessibility also makes it attractive for scammers to upload deceptive or malicious content. Prior detection approaches rely largely on textual or statistical metadata, such as titles, descriptions, view counts, or likes, which are effective in many cases but can be evaded through benign-looking text, manipulated statistics, or other obfuscation strategies (e.g., 'Leetspeak'), while ignoring visual cues. In this study, we systematically investigate multimodal approaches for detecting YouTube scams. Our dataset consolidates established scam categories and augments them with full-length videos and policy-grounded reasoning annotations. Experiments show that a text-only model using titles and descriptions (fine-tuned BERT) achieves moderate performance (76.61% F1 score), improving slightly with audio transcripts (77.98% F1 score). Visual analysis with a fine-tuned LLaVA-Video model performs better (79.61% F1 score), while a multimodal framework combining titles, descriptions, and video frames achieves the highest performance (82.96% F1 score). Moreover, the multimodal framework showed greater robustness to adversarial perturbations, with accuracy dropping only 1-3%, compared to 12-38% for modality-specific models. Beyond accuracy, the multimodal framework provides interpretable, policy-grounded reasoning, enhancing transparency and practical utility in automated moderation. Using this approach, we analyzed 6,374 in-the-wild YouTube videos and detected 1,864 scams with explicit reasoning, providing a valuable resource for future research.

Community

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2509.23418
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 3

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2509.23418 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2509.23418 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.