Commit 3c97f2f by biodiversica (verified) · 1 parent: 367c3fa

Create README.md

Files changed (1): README.md (added, +118 -0)

---
license: cc-by-nc-4.0
tags:
- audio
- bird
- nature
- bioacoustics
- embeddings
- onnx
- backbone
pipeline_tag: feature-extraction
base_model: justinchuby/BirdNET-onnx
---

# BirdNET v2.4 ONNX Backbone

Backbone-only ONNX exports of the [BirdNET v2.4](https://huggingface.co/justinchuby/BirdNET-onnx) bird sound classifier.
The classification head has been removed. Only the audio frontend and the feature-extraction layers remain.

Two variants are provided, matching the originals from [justinchuby/BirdNET-onnx](https://huggingface.co/justinchuby/BirdNET-onnx/tree/main): `model_backbone.onnx` (derived from `model.onnx`) and `birdnet_backbone.onnx` (derived from `birdnet.onnx`). Both models output a single tensor named **`embedding`** with shape `(1, 1024)`.

Embeddings are numerically verified against the reference TF SavedModel published on Zenodo
([BirdNET_v2.4_protobuf](https://zenodo.org/records/15050749)).

---

## Quick start

```python
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download

# Download backbone
path = hf_hub_download(
    repo_id="biodiversica/BirdNET-onnx-backbone",
    filename="model_backbone.onnx",
)

sess = ort.InferenceSession(path)

# 3 s of audio at 48 kHz (3 * 48000 = 144000 samples)
audio = np.zeros((1, 144000), dtype=np.float32)
(embedding,) = sess.run(["embedding"], {"INPUT": audio})
print(embedding.shape)  # (1, 1024)
```

For `birdnet_backbone.onnx`, the input key is `"input"` (lowercase):

```python
# Reuses the imports and `audio` from the block above
path = hf_hub_download(
    repo_id="biodiversica/BirdNET-onnx-backbone",
    filename="birdnet_backbone.onnx",
)
sess = ort.InferenceSession(path)
(embedding,) = sess.run(["embedding"], {"input": audio})
print(embedding.shape)  # (1, 1024)
```
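The quick start embeds a single 3 s window. For longer recordings, one option is to split the waveform into 3 s chunks and embed each chunk separately. The sketch below makes several assumptions. The input is a mono `float32` waveform already resampled to 48 kHz. Windows do not overlap, and the final partial window is zero-padded. Both the windowing and the helper names `split_into_windows` and `embed_recording` are illustrative and hypothetical; they are not part of this repository. The helper reads the input name from `sess.get_inputs()`, so it should work for both the `"INPUT"` and `"input"` variants.

```python
import numpy as np

SAMPLE_RATE = 48_000          # BirdNET v2.4 input rate, as in the quick start
WINDOW = 3 * SAMPLE_RATE      # 3 s -> 144000 samples


def split_into_windows(audio: np.ndarray, window: int = WINDOW) -> np.ndarray:
    """Split a mono waveform into non-overlapping windows of `window` samples.

    The final partial window is zero-padded (an illustrative choice).
    Returns shape (n_windows, window); empty input yields one silent window.
    """
    audio = np.asarray(audio, dtype=np.float32).ravel()
    n_windows = max(1, -(-audio.size // window))  # ceiling division
    padded = np.zeros(n_windows * window, dtype=np.float32)
    padded[: audio.size] = audio
    return padded.reshape(n_windows, window)


def embed_recording(sess, audio: np.ndarray) -> np.ndarray:
    """Embed each 3 s window with an onnxruntime session; returns (n_windows, 1024)."""
    # "INPUT" for model_backbone.onnx, "input" for birdnet_backbone.onnx
    input_name = sess.get_inputs()[0].name
    embeddings = []
    for chunk in split_into_windows(audio):
        # One window per call, with a batch dimension of 1 as in the quick start
        (emb,) = sess.run(["embedding"], {input_name: chunk[np.newaxis, :]})
        embeddings.append(emb[0])
    return np.stack(embeddings)
```

For example, 10 s of 48 kHz audio (480,000 samples) gives an array of shape `(4, 1024)`. The last window holds 1 s of audio followed by 2 s of zeros. If you need overlapping windows, adapt `split_into_windows` to use a hop shorter than 3 s.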

---

## Extraction procedure

The extraction and testing procedure can be reproduced using `extract_backbone.py`. The script will:

1. Download `model.onnx` and `birdnet.onnx` from [justinchuby/BirdNET-onnx](https://huggingface.co/justinchuby/BirdNET-onnx).
2. Download the BirdNET v2.4 TF SavedModel from Zenodo ([BirdNET_v2.4_protobuf](https://zenodo.org/records/15050749)).
3. Extract the backbone subgraph (everything up to and including the `model/GLOBAL_AVG_POOL/Mean_reduced_0` node), renaming the output to `embedding`.
4. Save `model_backbone.onnx` and `birdnet_backbone.onnx`.
5. Run a numerical comparison between ONNX and TF SavedModel embeddings on a fixed random waveform (seed 42, 3 s at 48 kHz).
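The check in step 5 boils down to comparing two `(1, 1024)` arrays. The sketch below assumes NumPy's `default_rng` and a uniform distribution on [-1, 1) for the seeded waveform. The README does not say how `extract_backbone.py` draws its input, so this will not reproduce the script's exact waveform. `make_test_waveform` and `compare_embeddings` are hypothetical names. The default tolerances mirror the `rtol=1e-03, atol=1e-03` reported in the expected output. With `np.allclose`, an element passes when `|onnx - ref| <= atol + rtol * |ref|`.

```python
import numpy as np


def make_test_waveform(seed: int = 42, seconds: float = 3.0, sr: int = 48_000) -> np.ndarray:
    """Deterministic pseudo-random waveform of shape (1, seconds * sr).

    Assumption: default_rng + uniform[-1, 1); extract_backbone.py may differ.
    """
    rng = np.random.default_rng(seed)
    n = int(round(seconds * sr))
    return rng.uniform(-1.0, 1.0, size=(1, n)).astype(np.float32)


def compare_embeddings(onnx_emb, ref_emb, rtol: float = 1e-3, atol: float = 1e-3) -> bool:
    """Print |diff| statistics and check |onnx - ref| <= atol + rtol * |ref| elementwise."""
    onnx_emb = np.asarray(onnx_emb, dtype=np.float64)
    ref_emb = np.asarray(ref_emb, dtype=np.float64)
    if onnx_emb.shape != ref_emb.shape:  # refuse to compare via broadcasting
        print(f"shape mismatch: {onnx_emb.shape} vs {ref_emb.shape}")
        return False
    diff = np.abs(onnx_emb - ref_emb)
    print(f"|diff| mean={diff.mean():e} max={diff.max():e}")
    return bool(np.allclose(onnx_emb, ref_emb, rtol=rtol, atol=atol))
```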

Expected output:

```
=== Downloading models ===
Downloaded model.onnx -> ...
Downloaded birdnet.onnx -> ...
Downloading BirdNET protobuf from Zenodo...
Extracted audio-model -> ...

=== Extracting backbones ===
Backbone saved -> model_backbone.onnx
inputs : ['INPUT']
outputs: ['embedding']
Backbone saved -> birdnet_backbone.onnx
inputs : ['input']
outputs: ['embedding']

=== Comparing embeddings against Zenodo TF SavedModel ===
PB embedding shape: (1, 1024)

model_backend.onnx:
ONNX embedding shape: (1, 1024)
|diff| mean=1.230468e-06 max=9.298325e-06
Embeddings match PB reference with rtol=1e-03, atol=1e-03 PASSED

birdnet_backend.onnx:
ONNX embedding shape: (1, 1024)
|diff| mean=6.440870e-05 max=5.004406e-04
Embeddings match PB reference with rtol=1e-03, atol=1e-03 PASSED
```

---

## How extraction works

The `_extract` function in `extract_backbone.py` performs a backward breadth-first search (BFS) from the
`model/GLOBAL_AVG_POOL/Mean_reduced_0` output (the global average pool). It collects
every node that contributes to that output and discards everything downstream of it (the
classification dense layer). It then renames that output tensor to `embedding` and
rebuilds a minimal ONNX graph containing only the retained nodes and their initializers.
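Below is a simplified sketch of this backward slice. It is not the actual `_extract` code. `Node` is a hypothetical stand-in that exposes only the `input` and `output` tensor-name lists, which `onnx.NodeProto` also has. The same functions should therefore also accept `model.graph.node` from the `onnx` package. Tensors with no producing node (`leaves`) are graph inputs or initializers. Filtering `model.graph.initializer` by name against that set gives the initializers to keep, and `onnx.helper.make_graph` and `onnx.helper.make_model` can then assemble the smaller model.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for onnx.NodeProto: only the tensor-name lists are used."""
    name: str
    input: list = field(default_factory=list)
    output: list = field(default_factory=list)


def backward_slice(nodes, target):
    """Backward BFS from tensor `target`.

    Returns (kept_nodes, leaves): the nodes that contribute to `target`, in their
    original (already topological) order, and the tensor names with no producing
    node, i.e. graph inputs or initializers.
    """
    producer = {t: i for i, n in enumerate(nodes) for t in n.output}
    keep, leaves = set(), set()
    queue, seen = deque([target]), {target}
    while queue:
        tensor = queue.popleft()
        i = producer.get(tensor)
        if i is None:            # not produced by any node: graph input or initializer
            leaves.add(tensor)
            continue
        if i in keep:            # multi-output node already visited
            continue
        keep.add(i)
        for t in nodes[i].input:
            if t and t not in seen:  # ONNX uses "" for omitted optional inputs
                seen.add(t)
                queue.append(t)
    return [n for i, n in enumerate(nodes) if i in keep], leaves


def rename_tensor(nodes, old, new):
    """Rename tensor `old` to `new` in every node's inputs and outputs (in place)."""
    for n in nodes:
        for names in (n.input, n.output):
            for j, t in enumerate(names):
                if t == old:
                    names[j] = new
```

Take a toy chain `frontend -> conv -> pool -> dense`. Slicing from the pool output keeps `frontend`, `conv` and `pool` and drops `dense`. `leaves` then holds the audio input and the conv weights.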

---

## Credits

- Original ONNX conversion: [justinchuby/BirdNET-onnx](https://huggingface.co/justinchuby/BirdNET-onnx)
- Reference protobuf: [BirdNET_v2.4_protobuf on Zenodo](https://zenodo.org/records/15050749)