---
library_name: finch
pipeline_tag: audio-classification
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- bioacoustics
- bird-species
license: apache-2.0
language:
- en
datasets:
- birdclef-2021
---

# FINCH: Adaptive Evidence Weighting for Audio-Spatiotemporal Fusion

This is the Stage B checkpoint for **FINCH**, a bioacoustic species identification framework that fuses audio classification with spatiotemporal priors from [eBird](https://ebird.org/) abundance data. A frozen [NatureLM-audio](https://huggingface.co/EarthSpeciesProject/NatureLM-audio) encoder is paired with a learned gating network that adaptively weights audio evidence against space-time context.

- **Paper:** [arXiv:2602.03817](https://arxiv.org/abs/2602.03817)
- **Code:** [github.com/leharris3/birdnoise](https://github.com/leharris3/birdnoise)

## Model description

The Stage B model computes fused logits as:

```
final_logits = audio_logits / T + w(a, x, t) * log(prior + eps)
```

where `w(a, x, t)` is a small gating MLP conditioned on audio confidence, prior confidence, location, and time-of-year. `T` and `eps` are learned scalars.
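
As a concrete illustration, the fusion rule can be sketched in NumPy. This is a minimal sketch, not the repository's implementation: the function name, the toy values, and treating `w` as a precomputed scalar (rather than the output of the gating MLP) are all assumptions made here for clarity.

```python
import numpy as np

def fuse_logits(audio_logits, prior, w, T=1.0, eps=1e-6):
    """Illustrative fusion of audio logits with a spatiotemporal prior.

    audio_logits: (num_species,) raw scores from the audio head
    prior:        (num_species,) abundance prior in [0, 1]
    w:            scalar gate weight, standing in for w(a, x, t)
    T, eps:       temperature and floor (learned scalars in the real model)
    """
    return audio_logits / T + w * np.log(prior + eps)

# Toy example with 3 species and identical audio scores: the prior
# alone separates them, boosting species 0 and suppressing species 2.
audio_logits = np.array([2.0, 2.0, 2.0])
prior = np.array([0.9, 0.5, 0.01])
fused = fuse_logits(audio_logits, prior, w=0.5, T=1.5)
```

Because the prior enters through `log(prior + eps)`, a near-zero abundance value acts as a strong penalty, while the gate `w` controls how much that penalty is trusted relative to the audio evidence.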

Only the trainable parameters (classifier head, gating network, temperature, epsilon) are stored here (~3 MB). The frozen NatureLM-audio encoder (~8B params) is downloaded separately from [EarthSpeciesProject/NatureLM-audio](https://huggingface.co/EarthSpeciesProject/NatureLM-audio) at load time.
## How to use

Requires Python 3.10+, [uv](https://docs.astral.sh/uv/), and a Hugging Face account with access to [Meta Llama 3.1 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) (required by NatureLM-audio).

```bash
# Clone the repository
git clone https://github.com/leharris3/birdnoise.git
cd birdnoise

# Install dependencies
uv sync
cd NatureLM-audio && uv pip install -r requirements.txt && cd ..

# Log in to Hugging Face
huggingface-cli login
```

```python
from Models import HFStageBModel

# Download weights from the Hub
model = HFStageBModel.from_pretrained("leharris3/FINCH")
```

## Training details

- **Dataset:** BirdCLEF 2021 (184 species)
- **Encoder:** NatureLM-audio (frozen)
- **Stage A:** Linear probe + scalar fusion weight (30 epochs)
- **Stage B:** Gating network `w(a, x, t)` + learned `T`, `eps` (warm-started from Stage A)
- **Best val accuracy:** 82.6%
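
To make the Stage B component more concrete, the gating network `w(a, x, t)` can be sketched as a tiny MLP that maps a feature vector (audio confidence, prior confidence, location, day-of-year) to a positive gate weight. The input features, layer sizes, and the softplus output here are assumptions for illustration, not the released architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(z):
    """Smooth, strictly positive activation: log(1 + e^z)."""
    return np.log1p(np.exp(z))

class GatingMLP:
    """Illustrative w(a, x, t): a 2-layer MLP producing a positive scalar gate."""

    def __init__(self, in_dim=6, hidden=32):
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, feats):
        h = np.maximum(feats @ self.W1 + self.b1, 0.0)  # ReLU hidden layer
        return softplus(h @ self.W2 + self.b2).squeeze(-1)  # positive gate weight

gate = GatingMLP()
feats = rng.normal(size=(4, 6))  # batch of 4 hypothetical feature vectors
w = gate(feats)
```

Keeping the gate output positive means the prior can only be up- or down-weighted, never sign-flipped, which matches the additive `w * log(prior + eps)` form of the fusion rule.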

## Citation

```bibtex
@article{ovanger2026adaptive,
  title   = {Adaptive Evidence Weighting for Audio-Spatiotemporal Fusion},
  author  = {Oscar Ovanger and Levi Harris and Timothy H. Keitt},
  journal = {arXiv preprint arXiv:2602.03817},
  year    = {2026},
  url     = {https://arxiv.org/abs/2602.03817}
}
```