# FINCH: Adaptive Evidence Weighting for Audio-Spatiotemporal Fusion
This is the Stage B checkpoint for FINCH, a bioacoustic species identification framework that fuses audio classification with spatiotemporal priors from eBird abundance data. A frozen NatureLM-audio encoder is paired with a learned gating network that adaptively weights audio evidence against space-time context.
- Paper: arXiv:2602.03817
- Code: github.com/leharris3/birdnoise
## Model description
The Stage B model computes fused logits as:

```
final_logits = audio_logits / T + w(a, x, t) * log(prior + eps)
```

where `w(a, x, t)` is a small gating MLP conditioned on audio confidence, prior confidence, location, and time-of-year; `T` (a temperature) and `eps` are learned scalars.
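The fusion rule can be sketched in plain NumPy. This is an illustrative sketch, not the repository's implementation: the function and variable names here are invented, and in the real model `w` comes from the gating network rather than being passed in directly.

```python
import numpy as np

def fuse_logits(audio_logits, prior, w, T, eps=1e-6):
    """Combine audio logits with a log-prior, weighted by the gate output.

    audio_logits : per-species logits from the audio classifier
    prior        : per-species spatiotemporal abundance prior (>= 0)
    w            : scalar gate weight from the gating network
    T            : learned temperature applied to the audio logits
    eps          : learned floor that keeps log() finite for zero priors
    """
    return audio_logits / T + w * np.log(prior + eps)

# With a uniform prior, the log-prior term is (nearly) constant,
# so fusion reduces to temperature-scaled audio logits.
fused = fuse_logits(np.array([2.0, 0.0]), np.array([1.0, 1.0]), w=0.5, T=1.0)
```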
Only the trainable parameters (classifier head, gating network, temperature, epsilon) are stored here (~3 MB). The frozen NatureLM-audio encoder (~8B params) is downloaded separately from EarthSpeciesProject/NatureLM-audio at load time.
## How to use
Requires Python 3.10+, uv, and a HuggingFace account with access to Meta Llama 3.1 8B Instruct (required by NatureLM-audio).
```bash
# Clone the repository
git clone https://github.com/leharris3/birdnoise.git
cd birdnoise

# Install dependencies
uv sync
cd NatureLM-audio && uv pip install -r requirements.txt && cd ..

# Log in to HuggingFace
huggingface-cli login
```
```python
from Models import HFStageBModel

# Download weights from the Hub
model = HFStageBModel.from_pretrained("leharris3/FINCH")
```
## Training details
- Dataset: BirdCLEF 2021 (184 species)
- Encoder: NatureLM-audio (frozen)
- Stage A: Linear probe + scalar fusion weight (30 epochs)
- Stage B: Gating network `w(a, x, t)` + learned `T`, `eps` (warm-started from Stage A)
- Best val accuracy: 82.6%
## Citation
```bibtex
@article{ovanger2026adaptive,
  title   = {Adaptive Evidence Weighting for Audio-Spatiotemporal Fusion},
  author  = {Oscar Ovanger and Levi Harris and Timothy H. Keitt},
  journal = {arXiv preprint arXiv:2602.03817},
  year    = {2026},
  url     = {https://arxiv.org/abs/2602.03817}
}
```