leharris3 committed
Commit 75afa95 · verified · Parent: 796dfc3

Add model card

Files changed (1):
  README.md (+52 −4)

README.md CHANGED
@@ -4,12 +4,60 @@ pipeline_tag: audio-classification
  tags:
  - model_hub_mixin
  - pytorch_model_hub_mixin
  license: apache-2.0
  language:
  - en
  ---

- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- - Code: [More Information Needed]
- - Paper: [More Information Needed]
- - Docs: [More Information Needed]
  tags:
  - model_hub_mixin
  - pytorch_model_hub_mixin
+ - bioacoustics
+ - bird-species
  license: apache-2.0
  language:
  - en
+ datasets:
+ - birdclef-2021
  ---

+ # FINCH: Adaptive Evidence Weighting for Audio-Spatiotemporal Fusion
+
+ This is the Stage B checkpoint for **FINCH**, a bioacoustic species identification framework that fuses audio classification with spatiotemporal priors from [eBird](https://ebird.org/) abundance data. A frozen [NatureLM-audio](https://huggingface.co/EarthSpeciesProject/NatureLM-audio) encoder is paired with a learned gating network that adaptively weights audio evidence against space-time context.
+
+ - **Paper:** [arXiv:2602.03817](https://arxiv.org/abs/2602.03817)
+ - **Code:** [github.com/leharris3/birdnoise](https://github.com/leharris3/birdnoise)
+
+ ## Model description
+
+ The Stage B model computes fused logits as:
+
+ ```
+ final_logits = audio_logits / T + w(a, x, t) * log(prior + eps)
+ ```
+
+ where `w(a, x, t)` is a small gating MLP conditioned on audio confidence, prior confidence, location, and time of year. `T` and `eps` are learned scalars.
+
+ Only the trainable parameters (classifier head, gating network, temperature, epsilon) are stored here (~3 MB). The frozen NatureLM-audio encoder (~8B parameters) is downloaded separately from [EarthSpeciesProject/NatureLM-audio](https://huggingface.co/EarthSpeciesProject/NatureLM-audio) at load time.
+
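As a concrete reading of the fusion formula, here is a minimal PyTorch sketch of the Stage B step. The layer sizes, the six-dimensional context encoding, and all class and variable names are illustrative assumptions; only the fusion formula itself comes from this card.

```python
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    """Sketch of the Stage B fusion step (hypothetical shapes and names).

    Implements final_logits = audio_logits / T + w(a, x, t) * log(prior + eps),
    where w is a small gating MLP over context features.
    """

    def __init__(self, ctx_dim: int = 6, hidden: int = 32):
        super().__init__()
        # Gating MLP w(a, x, t): context features -> one scalar weight.
        self.gate = nn.Sequential(
            nn.Linear(ctx_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        # Learned scalars, parameterized in log-space to stay positive.
        self.log_T = nn.Parameter(torch.zeros(()))          # temperature T
        self.log_eps = nn.Parameter(torch.full((), -6.0))   # prior floor eps

    def forward(self, audio_logits, prior, ctx):
        # ctx (assumed): [audio conf, prior conf, lat, lon, sin(doy), cos(doy)]
        T = self.log_T.exp()
        eps = self.log_eps.exp()
        w = self.gate(ctx)  # (batch, 1), broadcasts over species
        return audio_logits / T + w * torch.log(prior + eps)
```

With a batch of 2 recordings over 184 species, `GatedFusion()(torch.randn(2, 184), torch.rand(2, 184), torch.randn(2, 6))` returns a `(2, 184)` tensor of fused logits.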
+ ## How to use
+
+ Requires the [FINCH source code](https://github.com/leharris3/birdnoise) and its dependencies.
+
+ ```python
+ from Models import HFStageBModel
+
+ # Downloads the ~3 MB Stage B weights; the frozen NatureLM-audio
+ # encoder is fetched separately at load time.
+ model = HFStageBModel.from_pretrained("leharris3/FINCH")
+ ```
+
+ ## Training details
+
+ - **Dataset:** BirdCLEF 2021 (184 species)
+ - **Encoder:** NatureLM-audio (frozen)
+ - **Stage A:** Linear probe + scalar fusion weight (30 epochs)
+ - **Stage B:** Gating network `w(a, x, t)` + learned `T`, `eps` (warm-started from Stage A)
+ - **Best validation accuracy:** 82.0%
+
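The difference between the two stages can be sketched as follows: Stage A replaces the gating MLP with a single learned scalar on top of a linear probe. The embedding size, initial values, and all names here are assumptions for illustration, not taken from the training code.

```python
import torch
import torch.nn as nn


class StageAFusion(nn.Module):
    """Hypothetical Stage A sketch: a linear probe over frozen encoder
    embeddings plus one scalar fusion weight (vs. Stage B's gating MLP)."""

    def __init__(self, embed_dim: int = 1024, num_species: int = 184):
        super().__init__()
        self.head = nn.Linear(embed_dim, num_species)       # linear probe
        self.w = nn.Parameter(torch.zeros(()))              # scalar fusion weight
        self.log_T = nn.Parameter(torch.zeros(()))          # learned temperature
        self.log_eps = nn.Parameter(torch.full((), -6.0))   # learned prior floor

    def forward(self, emb, prior):
        audio_logits = self.head(emb)
        return audio_logits / self.log_T.exp() + self.w * torch.log(
            prior + self.log_eps.exp()
        )
```

Warm-starting Stage B would then amount to copying the probe, `T`, and `eps` from this model and initializing the gating MLP near the learned scalar `w`.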
+ ## Citation
+
+ ```bibtex
+ @article{ovanger2026adaptive,
+   title   = {Adaptive Evidence Weighting for Audio-Spatiotemporal Fusion},
+   author  = {Oscar Ovanger and Levi Harris and Timothy H. Keitt},
+   journal = {arXiv preprint arXiv:2602.03817},
+   year    = {2026},
+   url     = {https://arxiv.org/abs/2602.03817}
+ }
+ ```