---
license: cc-by-nc-4.0
tags:
- audio
- bird
- nature
- bioacoustics
- embeddings
- onnx
- backbone
pipeline_tag: feature-extraction
base_model: justinchuby/BirdNET-onnx
---

# BirdNET v2.4 ONNX Backbone

Backbone-only ONNX exports of the [BirdNET v2.4](https://huggingface.co/justinchuby/BirdNET-onnx) bird sound classifier. The classification head has been removed, leaving only the frontend and feature extraction.

Two variants are provided, matching the originals from [justinchuby/BirdNET-onnx](https://huggingface.co/justinchuby/BirdNET-onnx/tree/main): `model_backbone.onnx` and `birdnet_backbone.onnx`. Both models output a single tensor named **`embedding`** with shape `(1, 1024)`.

Embeddings are numerically verified against the reference TF SavedModel published on Zenodo ([BirdNET_v2.4_protobuf](https://zenodo.org/records/15050749)).

---

## Quick start

```python
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download

# Download backbone
path = hf_hub_download(
    repo_id="biodiversica/BirdNET-onnx-backbone",
    filename="model_backbone.onnx",
)
sess = ort.InferenceSession(path)

# 3 s of audio at 48 kHz
audio = np.zeros((1, 144000), dtype=np.float32)
(embedding,) = sess.run(["embedding"], {"INPUT": audio})
print(embedding.shape)  # (1, 1024)
```

For `birdnet_backbone.onnx` the input key is `"input"` (lowercase):

```python
path = hf_hub_download(
    repo_id="biodiversica/BirdNET-onnx-backbone",
    filename="birdnet_backbone.onnx",
)
sess = ort.InferenceSession(path)
(embedding,) = sess.run(["embedding"], {"input": audio})
print(embedding.shape)  # (1, 1024)
```

---

## Extraction procedure

The extraction and testing procedure can be reproduced using `extract_backbone.py`. The script will:

1. Download `model.onnx` and `birdnet.onnx` from [justinchuby/BirdNET-onnx](https://huggingface.co/justinchuby/BirdNET-onnx).
2. Download the BirdNET v2.4 TF SavedModel from Zenodo ([BirdNET_v2.4_protobuf](https://zenodo.org/records/15050749)).
3. Extract the backbone subgraph (everything up to and including the `model/GLOBAL_AVG_POOL/Mean_reduced_0` node), renaming the output to `embedding`.
4. Save `model_backbone.onnx` and `birdnet_backbone.onnx`.
5. Run a numerical comparison between ONNX and TF SavedModel embeddings on a fixed random waveform (seed 42, 3 s at 48 kHz).

Expected output:

```
=== Downloading models ===
Downloaded model.onnx -> ...
Downloaded birdnet.onnx -> ...
Downloading BirdNET protobuf from Zenodo...
Extracted audio-model -> ...

=== Extracting backbones ===
Backbone saved -> model_backbone.onnx
  inputs : ['INPUT']
  outputs: ['embedding']
Backbone saved -> birdnet_backbone.onnx
  inputs : ['input']
  outputs: ['embedding']

=== Comparing embeddings against Zenodo TF SavedModel ===
PB embedding shape: (1, 1024)

model_backbone.onnx:
  ONNX embedding shape: (1, 1024)
  |diff| mean=1.230468e-06 max=9.298325e-06
  Embeddings match PB reference with rtol=1e-03, atol=1e-03
  PASSED

birdnet_backbone.onnx:
  ONNX embedding shape: (1, 1024)
  |diff| mean=6.440870e-05 max=5.004406e-04
  Embeddings match PB reference with rtol=1e-03, atol=1e-03
  PASSED
```

---

## How extraction works

The `_extract` function in `extract_backbone.py` performs a backwards BFS from the `model/GLOBAL_AVG_POOL/Mean_reduced_0` output node (the global average pool), collecting every node that contributes to that output and discarding everything downstream (the classification dense layer). The output tensor is then renamed to `embedding`. It then rebuilds a minimal ONNX graph containing only the retained nodes and their initializers.

---

## Credits

- Original ONNX conversion: [justinchuby/BirdNET-onnx](https://huggingface.co/justinchuby/BirdNET-onnx)
- [BirdNET Team](https://birdnet.cornell.edu/)