---
library_name: finch
pipeline_tag: audio-classification
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- bioacoustics
- bird-species
license: apache-2.0
language:
- en
datasets:
- birdclef-2021
---

# FINCH: Adaptive Evidence Weighting for Audio-Spatiotemporal Fusion

This is the Stage B checkpoint for **FINCH**, a bioacoustic species identification framework that fuses audio classification with spatiotemporal priors from [eBird](https://ebird.org/) abundance data. A frozen [NatureLM-audio](https://huggingface.co/EarthSpeciesProject/NatureLM-audio) encoder is paired with a learned gating network that adaptively weights audio evidence against space-time context.

- **Paper:** [arXiv:2602.03817](https://arxiv.org/abs/2602.03817)
- **Code:** [github.com/leharris3/birdnoise](https://github.com/leharris3/birdnoise)

## Model description

The Stage B model computes fused logits as:

```
final_logits = audio_logits / T + w(a, x, t) * log(prior + eps)
```

where `w(a, x, t)` is a small gating MLP conditioned on audio confidence, prior confidence, location, and time of year. `T` and `eps` are learned scalars.

Only the trainable parameters (classifier head, gating network, temperature, epsilon) are stored here (~3 MB). The frozen NatureLM-audio encoder (~8B params) is downloaded separately from [EarthSpeciesProject/NatureLM-audio](https://huggingface.co/EarthSpeciesProject/NatureLM-audio) at load time.

## How to use

Requires Python 3.10+, [uv](https://docs.astral.sh/uv/), and a HuggingFace account with access to [Meta Llama 3.1 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) (required by NatureLM-audio).

```bash
# Clone the repository
git clone https://github.com/leharris3/birdnoise.git
cd birdnoise

# Install dependencies
uv sync
cd NatureLM-audio && uv pip install -r requirements.txt && cd ..
# Log in to HuggingFace
huggingface-cli login
```

```python
from Models import HFStageBModel

# Download the Stage B weights from the Hub
model = HFStageBModel.from_pretrained("leharris3/FINCH")
```

## Training details

- **Dataset:** BirdCLEF 2021 (184 species)
- **Encoder:** NatureLM-audio (frozen)
- **Stage A:** Linear probe + scalar fusion weight (30 epochs)
- **Stage B:** Gating network `w(a, x, t)` + learned `T`, `eps` (warm-started from Stage A)
- **Best val accuracy:** 82.6%

## Citation

```bibtex
@article{ovanger2026adaptive,
  title   = {Adaptive Evidence Weighting for Audio-Spatiotemporal Fusion},
  author  = {Oscar Ovanger and Levi Harris and Timothy H. Keitt},
  journal = {arXiv preprint arXiv:2602.03817},
  year    = {2026},
  url     = {https://arxiv.org/abs/2602.03817}
}
```
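For intuition, the fusion rule from the *Model description* section can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the released implementation: `fuse_logits` is a hypothetical helper, the gating output `w` is passed in directly rather than computed by the MLP, and `T`/`eps` are fixed here instead of learned.

```python
import numpy as np

def fuse_logits(audio_logits, prior, w, T=1.5, eps=1e-6):
    """final_logits = audio_logits / T + w * log(prior + eps).

    audio_logits : per-species logits from the frozen audio encoder
    prior        : per-species eBird abundance prior (probabilities)
    w            : gating weight, i.e. the output of w(a, x, t)
    T, eps       : temperature and prior floor (learned scalars in
                   FINCH; fixed here for illustration)
    """
    return audio_logits / T + w * np.log(prior + eps)

# Toy example: two species, audio slightly favors species 0,
# but the spatiotemporal prior strongly favors species 1.
audio_logits = np.array([1.2, 1.0])
prior = np.array([0.01, 0.60])

print(np.argmax(fuse_logits(audio_logits, prior, w=0.0)))  # → 0 (audio only)
print(np.argmax(fuse_logits(audio_logits, prior, w=1.0)))  # → 1 (prior flips it)
```

With the gate closed (`w = 0`) the prediction follows the audio alone; as the gate opens, the log-prior term can overrule weak audio evidence, which is the adaptive weighting behavior the gating network learns.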