---
license: cc-by-nc-4.0
tags:
- audio
- bird
- nature
- bioacoustics
- embeddings
- onnx
- backbone
pipeline_tag: feature-extraction
base_model: justinchuby/BirdNET-onnx
---
# BirdNET v2.4 ONNX Backbone
Backbone-only ONNX exports of the [BirdNET v2.4](https://huggingface.co/justinchuby/BirdNET-onnx) bird sound classifier.
The classification head has been removed, leaving only the audio frontend and the feature-extraction layers.
Two variants are provided, matching the originals from [justinchuby/BirdNET-onnx](https://huggingface.co/justinchuby/BirdNET-onnx/tree/main): `model_backbone.onnx` and `birdnet_backbone.onnx`. Both models output a single tensor named **`embedding`** with shape `(1, 1024)`.
Embeddings are numerically verified against the reference TF SavedModel published on Zenodo
([BirdNET_v2.4_protobuf](https://zenodo.org/records/15050749)).
---
## Quick start
```python
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
# Download backbone
path = hf_hub_download(
    repo_id="biodiversica/BirdNET-onnx-backbone",
    filename="model_backbone.onnx",
)
sess = ort.InferenceSession(path)
# 3 s of audio at 48 kHz
audio = np.zeros((1, 144000), dtype=np.float32)
(embedding,) = sess.run(["embedding"], {"INPUT": audio})
print(embedding.shape) # (1, 1024)
```
For `birdnet_backbone.onnx`, the input key is `"input"` (lowercase). This snippet reuses the imports and the `audio` array from the one above:
```python
path = hf_hub_download(
    repo_id="biodiversica/BirdNET-onnx-backbone",
    filename="birdnet_backbone.onnx",
)
sess = ort.InferenceSession(path)
(embedding,) = sess.run(["embedding"], {"input": audio})
print(embedding.shape) # (1, 1024)
```
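Recordings are usually longer than 3 s, so a small helper for cutting a waveform into model-sized windows can be useful. The following is a minimal sketch, not part of this repository: `split_into_windows` and `embed_windows` are hypothetical helper names, and the choice of non-overlapping windows with zero-padding of the last one is an assumption. It runs one window at a time, matching the `(1, 144000)` input used in the quick start.

```python
import numpy as np

SAMPLE_RATE = 48_000        # Hz, as in the quick start
WINDOW = 3 * SAMPLE_RATE    # 144000 samples per model input


def split_into_windows(waveform, window=WINDOW):
    """Split a mono waveform into non-overlapping windows.

    The last window is zero-padded (an assumption, not the authors' method).
    """
    waveform = np.asarray(waveform, dtype=np.float32).reshape(-1)
    # Ceiling division, with at least one window even for empty input.
    n_windows = max(1, -(-len(waveform) // window))
    padded = np.zeros(n_windows * window, dtype=np.float32)
    padded[: len(waveform)] = waveform
    return padded.reshape(n_windows, window)


def embed_windows(sess, windows, input_name="INPUT"):
    """Run the backbone one window at a time and stack the results."""
    outputs = [
        sess.run(["embedding"], {input_name: w[None, :]})[0]
        for w in windows
    ]
    return np.concatenate(outputs, axis=0)  # shape (n_windows, 1024)
```

With a session created as in the quick start, `embed_windows(sess, split_into_windows(waveform))` returns one embedding row per 3 s window. Pass `input_name="input"` for `birdnet_backbone.onnx`.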
---
## Extraction procedure
The extraction and testing procedure can be reproduced using `extract_backbone.py`. The script will:
1. Download `model.onnx` and `birdnet.onnx` from [justinchuby/BirdNET-onnx](https://huggingface.co/justinchuby/BirdNET-onnx).
2. Download the BirdNET v2.4 TF SavedModel from Zenodo ([BirdNET_v2.4_protobuf](https://zenodo.org/records/15050749)).
3. Extract the backbone subgraph (everything up to and including the `model/GLOBAL_AVG_POOL/Mean_reduced_0` node), renaming the output to `embedding`.
4. Save `model_backbone.onnx` and `birdnet_backbone.onnx`.
5. Run a numerical comparison between ONNX and TF SavedModel embeddings on a fixed random waveform (seed 42, 3 s at 48 kHz).
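The comparison in step 5 can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code in `extract_backbone.py`: the function names are hypothetical, and the random generator and value range used for the waveform are guesses (only the seed, duration, sample rate, and tolerances come from this card).

```python
import numpy as np


def make_test_waveform(seed=42, n_samples=144_000):
    """Fixed random waveform: 3 s at 48 kHz.

    The generator and the [-1, 1) range are assumptions.
    """
    rng = np.random.default_rng(seed)
    return rng.uniform(-1.0, 1.0, size=(1, n_samples)).astype(np.float32)


def compare_embeddings(reference, candidate, rtol=1e-3, atol=1e-3):
    """Report mean/max absolute difference and an allclose verdict."""
    diff = np.abs(
        np.asarray(reference, dtype=np.float64)
        - np.asarray(candidate, dtype=np.float64)
    )
    passed = np.allclose(candidate, reference, rtol=rtol, atol=atol)
    return {
        "mean": float(diff.mean()),
        "max": float(diff.max()),
        "passed": bool(passed),
    }
```

Here `reference` would be the embedding from the TF SavedModel and `candidate` the output of one of the ONNX backbones, both computed on the same waveform.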
Expected output:
```
=== Downloading models ===
Downloaded model.onnx -> ...
Downloaded birdnet.onnx -> ...
Downloading BirdNET protobuf from Zenodo...
Extracted audio-model -> ...
=== Extracting backbones ===
Backbone saved -> model_backbone.onnx
inputs : ['INPUT']
outputs: ['embedding']
Backbone saved -> birdnet_backbone.onnx
inputs : ['input']
outputs: ['embedding']
=== Comparing embeddings against Zenodo TF SavedModel ===
PB embedding shape: (1, 1024)
model_backbone.onnx:
ONNX embedding shape: (1, 1024)
|diff| mean=1.230468e-06 max=9.298325e-06
Embeddings match PB reference with rtol=1e-03, atol=1e-03 PASSED
birdnet_backbone.onnx:
ONNX embedding shape: (1, 1024)
|diff| mean=6.440870e-05 max=5.004406e-04
Embeddings match PB reference with rtol=1e-03, atol=1e-03 PASSED
```
---
## How extraction works
The `_extract` function in `extract_backbone.py` performs a backward breadth-first search from the
`model/GLOBAL_AVG_POOL/Mean_reduced_0` node (the global average pool), collecting
every node that contributes to its output and discarding everything downstream (the
classification dense layer). The function then rebuilds a minimal ONNX graph containing only
the retained nodes and their initializers, with the output tensor renamed to `embedding`.
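To make the traversal concrete, here is a simplified sketch of a backward BFS over a graph. It is not the `_extract` implementation: nodes are plain dictionaries instead of ONNX `NodeProto` objects, and the function name, node names, and tensor names in the toy graph are all hypothetical.

```python
from collections import deque


def collect_upstream_nodes(nodes, target_output):
    """Return the nodes that contribute to `target_output`, in original order.

    Each node is a dict with 'name', 'inputs', and 'outputs' (tensor names).
    """
    # Map every tensor name to the index of the node that produces it.
    producer = {out: i for i, n in enumerate(nodes) for out in n["outputs"]}
    keep = set()
    seen = {target_output}
    queue = deque([target_output])
    while queue:
        tensor = queue.popleft()
        idx = producer.get(tensor)
        if idx is None:
            # No producing node: a graph input or an initializer (weights).
            continue
        keep.add(idx)
        for inp in nodes[idx]["inputs"]:
            if inp not in seen:
                seen.add(inp)
                queue.append(inp)
    return [n for i, n in enumerate(nodes) if i in keep]


# Toy graph: frontend -> conv -> pool -> dense (the classification head).
toy_graph = [
    {"name": "frontend", "inputs": ["INPUT"], "outputs": ["spec"]},
    {"name": "conv", "inputs": ["spec", "conv_w"], "outputs": ["feat"]},
    {"name": "pool", "inputs": ["feat"], "outputs": ["pooled"]},
    {"name": "dense", "inputs": ["pooled", "dense_w"], "outputs": ["logits"]},
]
kept = collect_upstream_nodes(toy_graph, "pooled")
print([n["name"] for n in kept])  # ['frontend', 'conv', 'pool']
```

The `dense` node is dropped because nothing upstream of `pooled` depends on it. In a real ONNX graph, the same walk would also determine which initializers need to be carried over, since tensors without a producing node are either graph inputs or weights.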
---
## Credits
- Original ONNX conversion: [justinchuby/BirdNET-onnx](https://huggingface.co/justinchuby/BirdNET-onnx)
- [BirdNET Team](https://birdnet.cornell.edu/)