	---
	license: cc-by-nc-4.0
	tags:
	- audio
	- bird
	- nature
	- bioacoustics
	- embeddings
	- onnx
	- backbone
	pipeline_tag: feature-extraction
	base_model: justinchuby/BirdNET-onnx
	---

	# BirdNET v2.4 ONNX Backbone

	Backbone-only ONNX exports of the [BirdNET v2.4](https://huggingface.co/justinchuby/BirdNET-onnx) bird sound classifier.
	The classification head has been removed, leaving only the frontend and the feature-extraction layers.

	Two variants are provided, matching the originals from [justinchuby/BirdNET-onnx](https://huggingface.co/justinchuby/BirdNET-onnx/tree/main): `model_backbone.onnx` and `birdnet_backbone.onnx`. Both models output a single tensor named `embedding` with shape `(1, 1024)`.

	Embeddings from both variants match the reference TF SavedModel published on Zenodo
	([BirdNET_v2.4_protobuf](https://zenodo.org/records/15050749)) within `rtol=1e-03, atol=1e-03`.

	---

	## Quick start

	```python
	import numpy as np
	import onnxruntime as ort
	from huggingface_hub import hf_hub_download

	# Download backbone
	path = hf_hub_download(
	    repo_id="biodiversica/BirdNET-onnx-backbone",
	    filename="model_backbone.onnx",
	)

	sess = ort.InferenceSession(path)

	# 3 s of audio at 48 kHz
	audio = np.zeros((1, 144000), dtype=np.float32)
	(embedding,) = sess.run(["embedding"], {"INPUT": audio})
	print(embedding.shape) # (1, 1024)
	```

	For `birdnet_backbone.onnx` the input key is `"input"` (lowercase):

	```python
	# Reuses the imports and the `audio` array from the snippet above
	path = hf_hub_download(
	    repo_id="biodiversica/BirdNET-onnx-backbone",
	    filename="birdnet_backbone.onnx",
	)
	sess = ort.InferenceSession(path)
	(embedding,) = sess.run(["embedding"], {"input": audio})
	print(embedding.shape) # (1, 1024)
	```
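Recordings longer than 3 s have to be cut into model-sized windows before inference. The sketch below is one possible way to do that, not part of this repository: `split_into_windows` and `embed_recording` are hypothetical helper names, and the choice of non-overlapping windows with zero-padding of the last one is an assumption. The input name is read from the session, so the same helper should work for both variants.

```python
import numpy as np

WINDOW_SAMPLES = 144_000  # 3 s at 48 kHz, as in the quick start


def split_into_windows(waveform, window=WINDOW_SAMPLES):
    """Split a 1-D mono waveform into non-overlapping windows.

    Assumption: the last window is zero-padded to full length.
    """
    waveform = np.asarray(waveform, dtype=np.float32)
    n_windows = max(1, -(-len(waveform) // window))  # ceiling division
    padded = np.zeros(n_windows * window, dtype=np.float32)
    padded[: len(waveform)] = waveform
    return padded.reshape(n_windows, window)


def embed_recording(sess, waveform):
    """Return one embedding row per 3 s window, stacked along axis 0."""
    # Read the input name from the session instead of hard-coding
    # "INPUT" or "input".
    input_name = sess.get_inputs()[0].name
    rows = [
        # The models take a batch of one window: shape (1, 144000).
        sess.run(["embedding"], {input_name: w[np.newaxis, :]})[0]
        for w in split_into_windows(waveform)
    ]
    return np.concatenate(rows, axis=0)
```

With `sess` created as in the quick start, `embed_recording(sess, waveform)` is expected to return an array with one 1024-dimensional row per window. The waveform is assumed to be mono and already resampled to 48 kHz.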

	---

	## Extraction procedure

	The extraction and testing procedure can be reproduced using `extract_backbone.py`. The script will:

	1. Download `model.onnx` and `birdnet.onnx` from [justinchuby/BirdNET-onnx](https://huggingface.co/justinchuby/BirdNET-onnx).
	2. Download the BirdNET v2.4 TF SavedModel from Zenodo ([BirdNET_v2.4_protobuf](https://zenodo.org/records/15050749)).
	3. Extract the backbone subgraph (everything up to and including the `model/GLOBAL_AVG_POOL/Mean_reduced_0` node), renaming the output to `embedding`.
	4. Save `model_backbone.onnx` and `birdnet_backbone.onnx`.
	5. Run a numerical comparison between ONNX and TF SavedModel embeddings on a fixed random waveform (seed 42, 3 s at 48 kHz).

	Expected output:

	```
	=== Downloading models ===
	Downloaded model.onnx -> ...
	Downloaded birdnet.onnx -> ...
	Downloading BirdNET protobuf from Zenodo...
	Extracted audio-model -> ...

	=== Extracting backbones ===
	Backbone saved -> model_backbone.onnx
	inputs : ['INPUT']
	outputs: ['embedding']
	Backbone saved -> birdnet_backbone.onnx
	inputs : ['input']
	outputs: ['embedding']

	=== Comparing embeddings against Zenodo TF SavedModel ===
	PB embedding shape: (1, 1024)

	model_backbone.onnx:
	ONNX embedding shape: (1, 1024)
	|diff| mean=1.230468e-06 max=9.298325e-06
	Embeddings match PB reference with rtol=1e-03, atol=1e-03 PASSED

	birdnet_backbone.onnx:
	ONNX embedding shape: (1, 1024)
	|diff| mean=6.440870e-05 max=5.004406e-04
	Embeddings match PB reference with rtol=1e-03, atol=1e-03 PASSED
	```
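The comparison in step 5 reports the mean and maximum absolute difference and a pass/fail check at `rtol=1e-03, atol=1e-03`. A minimal sketch of such a check follows. It is not taken from `extract_backbone.py`: `compare_embeddings` is a hypothetical helper, the use of `np.allclose` is an assumption, and the two arrays are random stand-ins for the TF SavedModel and ONNX embeddings.

```python
import numpy as np


def compare_embeddings(reference, candidate, rtol=1e-3, atol=1e-3):
    """Summarise the element-wise difference between two embeddings."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    diff = np.abs(reference - candidate)
    return {
        "mean": float(diff.mean()),
        "max": float(diff.max()),
        # np.allclose tests |candidate - reference| <= atol + rtol * |reference|
        "passed": bool(np.allclose(candidate, reference, rtol=rtol, atol=atol)),
    }


# Stand-in data: in the real procedure these would be the embeddings
# produced by the TF SavedModel and by an ONNX backbone.
rng = np.random.default_rng(42)
reference = rng.standard_normal((1, 1024))
candidate = reference + 1e-5  # small, uniform perturbation

report = compare_embeddings(reference, candidate)
print(f"|diff| mean={report['mean']:e} max={report['max']:e}")
print("PASSED" if report["passed"] else "FAILED")
```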

	---

	## How extraction works

	The `_extract` function in `extract_backbone.py` performs a backward breadth-first search (BFS)
	from the `model/GLOBAL_AVG_POOL/Mean_reduced_0` output node (the global average pool). It collects
	every node that contributes to that output and discards everything downstream (the
	classification dense layer). The function then renames the output tensor to `embedding` and
	rebuilds a minimal ONNX graph containing only the retained nodes and their initializers.
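The backward traversal can be illustrated on a toy graph. The sketch below is not the `_extract` implementation: it uses plain dictionaries instead of ONNX protobuf objects, and `collect_upstream` and all node and tensor names are hypothetical. It only shows the idea of walking from the target tensor back through each producing node.

```python
from collections import deque


def collect_upstream(nodes, target_tensor):
    """Return the nodes that contribute to `target_tensor`, in original order."""
    # Map each tensor name to the node that produces it.
    producer = {out: node for node in nodes for out in node["outputs"]}
    keep = set()
    seen = {target_tensor}
    queue = deque([target_tensor])
    while queue:
        tensor = queue.popleft()
        node = producer.get(tensor)
        if node is None:
            # No producer: a graph input or an initializer (weight).
            continue
        keep.add(node["name"])
        for inp in node["inputs"]:
            if inp not in seen:
                seen.add(inp)
                queue.append(inp)
    # Filtering the original list preserves its topological order.
    return [node for node in nodes if node["name"] in keep]


# Toy stand-in for the classifier: frontend -> conv -> pool -> dense head.
toy_graph = [
    {"name": "frontend", "inputs": ["INPUT"], "outputs": ["spec"]},
    {"name": "conv", "inputs": ["spec", "conv_w"], "outputs": ["feat"]},
    {"name": "pool", "inputs": ["feat"], "outputs": ["pooled"]},
    {"name": "dense", "inputs": ["pooled", "dense_w"], "outputs": ["logits"]},
]

kept = collect_upstream(toy_graph, "pooled")
print([node["name"] for node in kept])  # ['frontend', 'conv', 'pool']
```

The `dense` node is never reached because nothing upstream of `pooled` consumes its output, which is how the classification head falls away.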

	---

	## Credits

	- Original ONNX conversion: [justinchuby/BirdNET-onnx](https://huggingface.co/justinchuby/BirdNET-onnx)
	- Original model: [BirdNET Team](https://birdnet.cornell.edu/)