---
library_name: finch
pipeline_tag: audio-classification
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
- bioacoustics
- bird-species
license: apache-2.0
language:
- en
datasets:
- birdclef-2021
---

# FINCH: Adaptive Evidence Weighting for Audio-Spatiotemporal Fusion

This is the Stage B checkpoint for **FINCH**, a bioacoustic species identification framework that fuses audio classification with spatiotemporal priors from [eBird](https://ebird.org/) abundance data. A frozen [NatureLM-audio](https://huggingface.co/EarthSpeciesProject/NatureLM-audio) encoder is paired with a learned gating network that adaptively weights audio evidence against space-time context.

- **Paper:** [arXiv:2602.03817](https://arxiv.org/abs/2602.03817)
- **Code:** [github.com/leharris3/birdnoise](https://github.com/leharris3/birdnoise)

## Model description

The Stage B model computes fused logits as:

```
final_logits = audio_logits / T + w(a, x, t) * log(prior + eps)
```

where `w(a, x, t)` is a small gating MLP conditioned on audio confidence, prior confidence, location, and time-of-year. `T` and `eps` are learned scalars.
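
As a concrete illustration, the fusion rule can be sketched in NumPy. This is a minimal sketch, not the repository's implementation: the function name, the toy values, and treating `w` as a precomputed scalar (rather than the output of the gating MLP) are all assumptions made here for clarity.

```python
import numpy as np

def fuse_logits(audio_logits, prior, w, T=1.0, eps=1e-6):
    """Illustrative fusion of audio logits with a spatiotemporal prior.

    audio_logits: (num_species,) raw scores from the audio head
    prior:        (num_species,) abundance prior in [0, 1]
    w:            scalar gate weight, standing in for w(a, x, t)
    T, eps:       temperature and floor (learned scalars in the real model)
    """
    return audio_logits / T + w * np.log(prior + eps)

# Toy example with 3 species and identical audio scores: the prior
# alone separates them, boosting species 0 and suppressing species 2.
audio_logits = np.array([2.0, 2.0, 2.0])
prior = np.array([0.9, 0.5, 0.01])
fused = fuse_logits(audio_logits, prior, w=0.5, T=1.5)
```

Because the prior enters through `log(prior + eps)`, a near-zero abundance value acts as a strong penalty, while the gate `w` controls how much that penalty is trusted relative to the audio evidence.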

Only the trainable parameters (classifier head, gating network, temperature, epsilon) are stored here (~3 MB). The frozen NatureLM-audio encoder (~8B params) is downloaded separately from [EarthSpeciesProject/NatureLM-audio](https://huggingface.co/EarthSpeciesProject/NatureLM-audio) at load time.
## How to use

Requires Python 3.10+, [uv](https://docs.astral.sh/uv/), and a Hugging Face account with access to [Meta Llama 3.1 8B Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct) (required by NatureLM-audio).

```bash
# Clone the repository
git clone https://github.com/leharris3/birdnoise.git
cd birdnoise

# Install dependencies
uv sync
cd NatureLM-audio && uv pip install -r requirements.txt && cd ..

# Log in to Hugging Face
huggingface-cli login
```

```python
from Models import HFStageBModel

# Download weights from the Hub
model = HFStageBModel.from_pretrained("leharris3/FINCH")
```

## Training details

- **Dataset:** BirdCLEF 2021 (184 species)
- **Encoder:** NatureLM-audio (frozen)
- **Stage A:** Linear probe + scalar fusion weight (30 epochs)
- **Stage B:** Gating network `w(a, x, t)` + learned `T`, `eps` (warm-started from Stage A)
- **Best val accuracy:** 82.6%
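
To make the Stage B component more concrete, the gating network `w(a, x, t)` can be sketched as a tiny MLP that maps a feature vector (audio confidence, prior confidence, location, day-of-year) to a positive gate weight. The input features, layer sizes, and the softplus output here are assumptions for illustration, not the released architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def softplus(z):
    """Smooth, strictly positive activation: log(1 + e^z)."""
    return np.log1p(np.exp(z))

class GatingMLP:
    """Illustrative w(a, x, t): a 2-layer MLP producing a positive scalar gate."""

    def __init__(self, in_dim=6, hidden=32):
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, feats):
        h = np.maximum(feats @ self.W1 + self.b1, 0.0)  # ReLU hidden layer
        return softplus(h @ self.W2 + self.b2).squeeze(-1)  # positive gate weight

gate = GatingMLP()
feats = rng.normal(size=(4, 6))  # batch of 4 hypothetical feature vectors
w = gate(feats)
```

Keeping the gate output positive means the prior can only be up- or down-weighted, never sign-flipped, which matches the additive `w * log(prior + eps)` form of the fusion rule.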

## Citation

```bibtex
@article{ovanger2026adaptive,
  title   = {Adaptive Evidence Weighting for Audio-Spatiotemporal Fusion},
  author  = {Oscar Ovanger and Levi Harris and Timothy H. Keitt},
  journal = {arXiv preprint arXiv:2602.03817},
  year    = {2026},
  url     = {https://arxiv.org/abs/2602.03817}
}
```