# FINCH: Adaptive Evidence Weighting for Audio-Spatiotemporal Fusion
This is the Stage B checkpoint for FINCH, a bioacoustic species identification framework that fuses audio classification with spatiotemporal priors from eBird abundance data. A frozen NatureLM-audio encoder is paired with a learned gating network that adaptively weights audio evidence against space-time context.
- Paper: arXiv:2602.03817
- Code: github.com/leharris3/birdnoise
## Model description
The Stage B model computes fused logits as:

```
final_logits = audio_logits / T + w(a, x, t) * log(prior + eps)
```

where `w(a, x, t)` is a small gating MLP conditioned on audio confidence, prior confidence, location, and time-of-year; `T` (a temperature) and `eps` are learned scalars.
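The fusion rule can be sketched in plain NumPy. This is an illustrative sketch, not the repository's implementation: the function and variable names here are invented, and in the real model `w` comes from the gating network rather than being passed in directly.

```python
import numpy as np

def fuse_logits(audio_logits, prior, w, T, eps=1e-6):
    """Combine audio logits with a log-prior, weighted by the gate output.

    audio_logits : per-species logits from the audio classifier
    prior        : per-species spatiotemporal abundance prior (>= 0)
    w            : scalar gate weight from the gating network
    T            : learned temperature applied to the audio logits
    eps          : learned floor that keeps log() finite for zero priors
    """
    return audio_logits / T + w * np.log(prior + eps)

# With a uniform prior, the log-prior term is (nearly) constant,
# so fusion reduces to temperature-scaled audio logits.
fused = fuse_logits(np.array([2.0, 0.0]), np.array([1.0, 1.0]), w=0.5, T=1.0)
```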
Only the trainable parameters (classifier head, gating network, temperature, epsilon) are stored here (~3 MB). The frozen NatureLM-audio encoder (~8B params) is downloaded separately from EarthSpeciesProject/NatureLM-audio at load time.
## How to use
Requires Python 3.10+, uv, and a HuggingFace account with access to Meta Llama 3.1 8B Instruct (required by NatureLM-audio).
```bash
# Clone the repository
git clone https://github.com/leharris3/birdnoise.git
cd birdnoise

# Install dependencies
uv sync
cd NatureLM-audio && uv pip install -r requirements.txt && cd ..

# Log in to HuggingFace
huggingface-cli login
```
```python
from Models import HFStageBModel

# Download weights from the Hub
model = HFStageBModel.from_pretrained("leharris3/FINCH")
```
## Training details
- Dataset: BirdCLEF 2021 (184 species)
- Encoder: NatureLM-audio (frozen)
- Stage A: Linear probe + scalar fusion weight (30 epochs)
- Stage B: Gating network `w(a, x, t)` + learned `T`, `eps` (warm-started from Stage A)
- Best val accuracy: 82.6%
## Citation
```bibtex
@article{ovanger2026adaptive,
  title   = {Adaptive Evidence Weighting for Audio-Spatiotemporal Fusion},
  author  = {Oscar Ovanger and Levi Harris and Timothy H. Keitt},
  journal = {arXiv preprint arXiv:2602.03817},
  year    = {2026},
  url     = {https://arxiv.org/abs/2602.03817}
}
```