Perch v2 ONNX Backbone

Backbone-only ONNX exports of the Perch v2 bird vocalization classifier. The classification head has been removed, leaving only the audio frontend and the feature-extraction layers.

Two variants are provided, matching the originals from justinchuby/Perch-onnx: perch_v2_backbone.onnx and perch_v2_no_dft_backbone.onnx. Both models output a single tensor named embedding with shape (1, 1536).

Embeddings are numerically verified against the reference TF SavedModel published on Kaggle (google/bird-vocalization-classifier).

Quick start

import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download

# Download backbone
path = hf_hub_download(
    repo_id="biodiversica/Perch-onnx-backbone",
    filename="perch_v2_backbone.onnx",
)

sess = ort.InferenceSession(path)

# 5 s of audio at 32 kHz
audio = np.zeros((1, 160000), dtype=np.float32)
(embedding,) = sess.run(["embedding"], {"inputs": audio})
print(embedding.shape)  # (1, 1536)
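
The quick start feeds exactly 160000 samples (5 s at 32 kHz), so longer or shorter recordings need to be cut or padded first. The following is a minimal sketch of one way to do that, not part of this repository: `frame_audio` is a hypothetical helper, and it assumes mono audio already resampled to 32 kHz, non-overlapping windows, and zero-padding of the final window.

```python
import numpy as np

SAMPLE_RATE = 32_000               # Hz, as in the quick start
WINDOW_SAMPLES = 5 * SAMPLE_RATE   # 160000 samples per model input


def frame_audio(waveform, window=WINDOW_SAMPLES):
    """Split a mono waveform into non-overlapping windows.

    Hypothetical helper: the last window is zero-padded, which is an
    assumption and not something the model card prescribes.
    """
    waveform = np.asarray(waveform, dtype=np.float32).ravel()
    n_windows = max(1, int(np.ceil(len(waveform) / window)))
    padded = np.zeros(n_windows * window, dtype=np.float32)
    padded[: len(waveform)] = waveform
    return padded.reshape(n_windows, window)


# 12 s of placeholder noise -> three 5 s windows (the last one padded)
frames = frame_audio(np.random.default_rng(0).standard_normal(12 * SAMPLE_RATE))
print(frames.shape)  # (3, 160000)
```

Each row can then be passed to the session one at a time as `frames[i:i + 1]`, which matches the `(1, 160000)` input shape used above. Whether the exported graph accepts a larger batch dimension is not stated here, so this sketch does not rely on it.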

Extraction procedure

The extraction and testing procedure can be reproduced using extract_backbone.py. The script will:

1. Download perch_v2.onnx and perch_v2_no_dft.onnx from justinchuby/Perch-onnx.
2. Download the Perch v2 TF SavedModel from Kaggle (google/bird-vocalization-classifier).
3. Extract the backbone subgraph (everything up to and including the embedding node).
4. Save perch_v2_backbone.onnx and perch_v2_no_dft_backbone.onnx.
5. Run a numerical comparison between ONNX and TF SavedModel embeddings on a fixed random waveform (seed 42, 5 s at 32 kHz).

Expected output:

=== Downloading models ===
Downloaded perch_v2.onnx -> ...
Downloaded perch_v2_no_dft.onnx -> ...
Downloaded Kaggle model -> ...

=== Extracting backbones ===
Backbone saved -> perch_v2_backbone.onnx
  inputs : ['inputs']
  outputs: ['embedding']
Backbone saved -> perch_v2_no_dft_backbone.onnx
  inputs : ['inputs']
  outputs: ['embedding']

=== Comparing embeddings against Kaggle TF SavedModel ===
PB embedding shape: (1, 1536)

perch_v2:
  ONNX embedding shape: (1, 1536)
  |diff| mean=<small value>  max=<small value>
  Embeddings match PB reference  PASSED

perch_v2_no_dft:
  ONNX embedding shape: (1, 1536)
  |diff| mean=<small value>  max=<small value>
  Embeddings match PB reference  PASSED
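
The `|diff| mean` and `max` figures in the output above are elementwise absolute differences between two embeddings. A rough sketch of such a check follows. It is not the code from extract_backbone.py: `compare_embeddings` is a hypothetical helper, the `1e-4` tolerance is an assumed value, and the way the seeded waveform is drawn is a guess. Placeholder arrays stand in for the real ONNX and TF outputs so the snippet runs without either model.

```python
import numpy as np


def compare_embeddings(onnx_emb, ref_emb, atol=1e-4):
    """Return mean/max absolute difference and a pass flag.

    Hypothetical helper; atol=1e-4 is an assumed tolerance, not the
    threshold used by the actual script.
    """
    diff = np.abs(
        np.asarray(onnx_emb, dtype=np.float64)
        - np.asarray(ref_emb, dtype=np.float64)
    )
    max_diff = float(diff.max())
    return {"mean": float(diff.mean()), "max": max_diff, "passed": max_diff <= atol}


rng = np.random.default_rng(42)
# Fixed random waveform, 5 s at 32 kHz (the distribution is an assumption)
waveform = rng.standard_normal((1, 160000)).astype(np.float32)

# Placeholder embeddings standing in for the TF and ONNX outputs
ref = rng.standard_normal((1, 1536)).astype(np.float32)
onnx_like = ref + np.float32(1e-6)

report = compare_embeddings(onnx_like, ref)
print(f"|diff| mean={report['mean']:.2e}  max={report['max']:.2e}")
print("PASSED" if report["passed"] else "FAILED")
```

In a real run, `ref` would come from the SavedModel and `onnx_like` from `sess.run(["embedding"], {"inputs": waveform})`.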

How extraction works

The _extract function in extract_backbone.py performs a backward breadth-first search (BFS) starting from the embedding output, collecting every node that contributes to that output and discarding everything downstream (the classification head). It then rebuilds a minimal ONNX graph containing only the retained nodes and their initializers.
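
The traversal described above can be illustrated on a toy graph. This is a simplified sketch and not the actual `_extract` implementation: the `Node` class, the `collect_upstream` function, and the node and tensor names are all hypothetical stand-ins for ONNX graph nodes, whose `input` and `output` fields play the same role.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for an ONNX node (name, input and output tensors)."""
    name: str
    inputs: list
    outputs: list


def collect_upstream(nodes, target_output):
    """Backward BFS: keep only nodes that contribute to target_output."""
    producer = {out: node for node in nodes for out in node.outputs}
    keep = set()
    seen = {target_output}
    queue = deque([target_output])
    while queue:
        tensor = queue.popleft()
        node = producer.get(tensor)
        if node is None:
            # Graph input or initializer: nothing upstream to visit
            continue
        keep.add(node.name)
        for inp in node.inputs:
            if inp not in seen:
                seen.add(inp)
                queue.append(inp)
    # Preserve the original node order of the source graph
    return [n for n in nodes if n.name in keep]


toy_graph = [
    Node("frontend", ["inputs"], ["spectrogram"]),
    Node("encoder", ["spectrogram", "w_enc"], ["embedding"]),
    Node("head", ["embedding", "w_head"], ["logits"]),
]

kept = collect_upstream(toy_graph, "embedding")
print([n.name for n in kept])  # ['frontend', 'encoder']
```

The `head` node is dropped because nothing on the path back from `embedding` depends on it. Tensors with no producer (`inputs`, `w_enc`) correspond to graph inputs and initializers, which a real extraction would carry over into the rebuilt graph.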

Credits

Original ONNX conversion: justinchuby/Perch-onnx
Original model: Google, bird-vocalization-classifier on Kaggle
Perch paper: Google Research — Perch

Paper: Perch 2.0: The Bittern Lesson for Bioacoustics (arXiv 2508.04665, published Aug 6, 2025)