arxiv:2602.03817

Adaptive Evidence Weighting for Audio-Spatiotemporal Fusion

Published on Feb 3

· Submitted by

Authors:

Abstract

A fusion framework called FINCH combines audio and spatiotemporal predictors for bioacoustic classification by adaptively weighting evidence based on reliability estimates, outperforming fixed-weight methods and audio-only approaches.

AI-generated summary

Many machine learning systems have access to multiple sources of evidence for the same prediction target, yet these sources often differ in reliability and informativeness across inputs. In bioacoustic classification, species identity may be inferred both from the acoustic signal and from spatiotemporal context such as location and season; while Bayesian inference motivates multiplicative evidence combination, in practice we typically only have access to discriminative predictors rather than calibrated generative models. We introduce Fusion under INdependent Conditional Hypotheses (FINCH), an adaptive log-linear evidence fusion framework that integrates a pre-trained audio classifier with a structured spatiotemporal predictor. FINCH learns a per-sample gating function that estimates the reliability of contextual information from uncertainty and informativeness statistics. The resulting fusion family contains the audio-only classifier as a special case and explicitly bounds the influence of contextual evidence, yielding a risk-contained hypothesis class with an interpretable audio-only fallback. Across benchmarks, FINCH consistently outperforms fixed-weight fusion and audio-only baselines, improving robustness and error trade-offs even when contextual information is weak in isolation. We achieve state-of-the-art performance on CBI and competitive or improved performance on several subsets of BirdSet using a lightweight, interpretable, evidence-based approach. Code is available: \href{https://anonymous.4open.science/r/birdnoise-85CD/README.md{anonymous-repository}}

View arXiv page View PDF Add to collection

Community

leharris3

Paper submitter about 7 hours ago

•

edited about 7 hours ago

Authors introduce a family of adaptive evidence weighting models for audio spatial-temporal fusion; SoTA results on CBI audio-acoustic classification.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2602.03817 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2602.03817 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2602.03817 in a Space README.md to link it from this page.