---
library_name: finch
pipeline_tag: audio-classification
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- bioacoustics
- bird-species
license: apache-2.0
language:
- en
datasets:
- birdclef-2021
---
# FINCH: Adaptive Evidence Weighting for Audio-Spatiotemporal Fusion
This is the Stage B checkpoint for **FINCH**, a bioacoustic species identification framework that fuses audio classification with spatiotemporal priors from [eBird](https://ebird.org/) abundance data. A frozen [NatureLM-audio](https://huggingface.co/EarthSpeciesProject/NatureLM-audio) encoder is paired with a learned gating network that adaptively weights audio evidence against space-time context.
- **Paper:** [arXiv:2602.03817](https://arxiv.org/abs/2602.03817)
- **Code:** [github.com/leharris3/birdnoise](https://github.com/leharris3/birdnoise)
## Model description
The Stage B model computes fused logits as:
```
final_logits = audio_logits / T + w(a, x, t) * log(prior + eps)
```
where `w(a, x, t)` is a small gating MLP conditioned on audio confidence, prior confidence, location, and time-of-year. `T` and `eps` are learned scalars.
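The fusion rule above can be sketched in plain NumPy; the values below (`T=1.5`, `eps=1e-6`, gate weight `w=0.8`, and the toy priors) are illustrative, not the learned parameters:

```python
import numpy as np

def fuse_logits(audio_logits, prior, w, T=1.5, eps=1e-6):
    """Combine temperature-scaled audio logits with a log-abundance prior.

    w is the gate weight produced by the gating network; T and eps stand in
    for the learned temperature and prior-smoothing scalars.
    """
    return audio_logits / T + w * np.log(prior + eps)

# Toy 3-species example: moderately confident audio, confident spatial prior.
audio_logits = np.array([2.0, 0.5, -1.0])
prior = np.array([0.7, 0.2, 0.1])  # illustrative eBird abundance values
fused = fuse_logits(audio_logits, prior, w=0.8)
```

A higher gate weight pushes the ranking toward the spatiotemporal prior, which helps when the audio evidence is ambiguous.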
Only the trainable parameters (classifier head, gating network, temperature, epsilon) are stored here (~3 MB). The frozen NatureLM-audio encoder (~8B params) is downloaded separately from [EarthSpeciesProject/NatureLM-audio](https://huggingface.co/EarthSpeciesProject/NatureLM-audio) at load time.
## How to use
Requires Python 3.10+, [uv](https://docs.astral.sh/uv/), and a HuggingFace account with access to [Meta Llama 3.1 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) (required by NatureLM-audio).
```bash
# Clone the repository
git clone https://github.com/leharris3/birdnoise.git
cd birdnoise
# Install dependencies
uv sync
cd NatureLM-audio && uv pip install -r requirements.txt && cd ..
# Log in to HuggingFace
huggingface-cli login
```
```python
from Models import HFStageBModel
# Download weights from the Hub
model = HFStageBModel.from_pretrained("leharris3/FINCH")
```
## Training details
- **Dataset:** BirdCLEF 2021 (184 species)
- **Encoder:** NatureLM-audio (frozen)
- **Stage A:** Linear probe + scalar fusion weight (30 epochs)
- **Stage B:** Gating network `w(a, x, t)` + learned T, eps (warm-started from Stage A)
- **Best val accuracy:** 82.6%
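The card describes the gating network only as a small MLP conditioned on audio confidence, prior confidence, location, and time-of-year. A minimal sketch of one plausible form, with hypothetical layer sizes and feature encoding, is:

```python
import numpy as np

rng = np.random.default_rng(0)

def gate(features, W1, b1, W2, b2):
    # Two-layer MLP; the sigmoid keeps the evidence weight in (0, 1).
    h = np.tanh(features @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))

# Hypothetical conditioning features: audio confidence, prior confidence,
# normalized latitude/longitude, and sin/cos of day-of-year.
features = np.array([0.9, 0.3, 0.52, -1.7, 0.8, 0.6])
W1 = rng.normal(scale=0.1, size=(6, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 1)); b2 = np.zeros(1)
w = gate(features, W1, b1, W2, b2)
```

The sigmoid output means the model can smoothly down-weight the prior when audio evidence alone is decisive.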
## Citation
```bibtex
@article{ovanger2026adaptive,
  title   = {Adaptive Evidence Weighting for Audio-Spatiotemporal Fusion},
  author  = {Oscar Ovanger and Levi Harris and Timothy H. Keitt},
  journal = {arXiv preprint arXiv:2602.03817},
  year    = {2026},
  url     = {https://arxiv.org/abs/2602.03817}
}
```