| --- |
| pretty_name: Google Perch v2 (ONNX) |
| license: apache-2.0 |
| tags: |
| - audio |
| - bird |
| - nature |
| - science |
| - vocalization |
| - bio |
| - birds-classification |
| - bioacoustics |
| - onnx |
| base_model: |
| - cgeorgiaw/Perch |
| --- |
| |
| # Google Perch v2 (ONNX) |
|
|
| This repository provides the Google Perch v2 bird vocalization classifier in ONNX format, repackaged for use with [BirdNET-Go](https://github.com/tphakala/birdnet-go). |
|
|
| ## Origin and Attribution |
|
|
| The Perch v2 model was developed by **Google Research** as part of the [bird-vocalization-classifier](https://www.kaggle.com/models/google/bird-vocalization-classifier/) project. It uses an EfficientNet-B3 architecture with approximately 12 million embedding parameters and 91 million classification parameters covering nearly 15,000 species. |
|
|
| The ONNX conversion was performed by [justinchuby](https://huggingface.co/justinchuby/Perch-onnx), who also created an optimized variant with the DFT node converted to MatMul for additional speedup. |
|
|
| The label file originates from the [cgeorgiaw/Perch](https://huggingface.co/cgeorgiaw/Perch) repository on Hugging Face. |
|
|
| ## Files |
|
|
| | File | Size | Description | |
| |------|------|-------------| |
| | `perch_v2.onnx` | 409 MB | Standard ONNX model; outputs match the TFLite model within a tolerance of 1e-5 | |
| | `perch_v2_no_dft.onnx` | 413 MB | ONNX model with the DFT node converted to MatMul for faster inference; outputs match the TFLite model within a tolerance of 2e-4 | |
| | `labels.txt` | 313 KB | Species labels (14,795 entries, iNaturalist taxonomy) | |
| | `SHA256SUMS` | - | Checksums for integrity verification | |
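The downloaded files can be checked against `SHA256SUMS` before use. The following is a minimal sketch using only the Python standard library. It assumes that `SHA256SUMS` follows the usual `sha256sum` layout (`<hex digest>  <filename>` per line), which this card does not state explicitly; the helper names `sha256_of`, `parse_sums`, and `verify` are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large models are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sums(text):
    """Parse sha256sum-style lines (assumed format: '<hex digest>  <filename>')."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, _, name = line.partition(" ")
        # sha256sum marks binary mode with a leading '*' on the file name.
        expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def verify(directory, sums_name="SHA256SUMS"):
    """Return {filename: True/False} for every entry listed in the sums file."""
    directory = Path(directory)
    expected = parse_sums((directory / sums_name).read_text())
    return {
        name: (directory / name).is_file()
        and sha256_of(directory / name) == digest
        for name, digest in expected.items()
    }
```

Calling `verify(".")` in the download directory returns a mapping from file name to a boolean; any `False` entry indicates a missing or corrupted file.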
|
|
| ## Model Information |
|
|
| ### Input |
|
|
| - **Name:** `inputs` |
| - **Shape:** `[batch, 160000]` (5 seconds at 32 kHz) |
| - **Type:** `float32` |
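Recordings longer or shorter than 5 seconds have to be cut or padded to the fixed 160,000-sample window. One possible way to do this with NumPy is sketched below. It assumes the audio is already mono and resampled to 32 kHz, and it zero-pads the final window; the padding strategy and the helper name `frame_audio` are assumptions, not something this card prescribes.

```python
import numpy as np

SAMPLE_RATE = 32_000      # Hz, from the input specification above
WINDOW_SAMPLES = 160_000  # 5 seconds at 32 kHz


def frame_audio(waveform, hop_samples=WINDOW_SAMPLES):
    """Split a mono 32 kHz waveform into [batch, 160000] float32 windows.

    The last window is zero-padded (an assumption; other padding schemes
    may suit the model equally well or better).
    """
    samples = np.asarray(waveform, dtype=np.float32)
    if samples.ndim != 1:
        raise ValueError("expected a mono waveform with shape [num_samples]")
    if hop_samples <= 0:
        raise ValueError("hop_samples must be positive")

    # Number of windows needed to cover every sample (ceiling division).
    extra = max(0, samples.size - WINDOW_SAMPLES)
    n_windows = 1 + -(-extra // hop_samples)

    padded_len = (n_windows - 1) * hop_samples + WINDOW_SAMPLES
    padded = np.zeros(padded_len, dtype=np.float32)
    padded[: samples.size] = samples

    starts = np.arange(n_windows) * hop_samples
    return np.stack([padded[s : s + WINDOW_SAMPLES] for s in starts])
```

For example, a 6.25-second clip (200,000 samples) yields a `[2, 160000]` array with the default non-overlapping hop. Passing `hop_samples=SAMPLE_RATE` would instead slide the window in 1-second steps.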
|
|
| ### Outputs |
|
|
| | Output | Shape | Type | Description | |
| |--------|-------|------|-------------| |
| | `embedding` | `[batch, 1536]` | float32 | Audio embedding vector | |
| | `spatial_embedding` | `[batch, 16, 4, 1536]` | float32 | Spatial features | |
| | `spectrogram` | `[batch, 500, 128]` | float32 | Computed spectrogram | |
| | `label` | `[batch, 14795]` | float32 | Species classification logits | |
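The sketch below shows one way the inputs and outputs listed above might be used with the `onnxruntime` Python package (BirdNET-Go itself is not a Python program, so this is only an illustration). It assumes that `labels.txt` holds one label per line in the same order as the `label` output, which this card does not state explicitly. The helper names `load_labels`, `top_k`, and `run_perch` are hypothetical.

```python
import numpy as np


def load_labels(path="labels.txt"):
    """Read labels, assuming one label per line in output-index order."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def top_k(logits, labels, k=5):
    """Return the k highest-scoring (label, logit) pairs for each batch row."""
    logits = np.asarray(logits)
    # Sort ascending, reverse to descending, keep the first k indices per row.
    order = np.argsort(logits, axis=-1)[:, ::-1][:, :k]
    return [
        [(labels[i], float(row[i])) for i in idx]
        for row, idx in zip(logits, order)
    ]


def run_perch(model_path, audio):
    """Run the model on a [batch, 160000] float32 array; returns outputs by name."""
    import onnxruntime as ort  # imported lazily so the helpers above work without it

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    names = [output.name for output in session.get_outputs()]
    values = session.run(names, {"inputs": np.asarray(audio, dtype=np.float32)})
    return dict(zip(names, values))
```

A typical call would be `outputs = run_perch("perch_v2.onnx", audio)` followed by `top_k(outputs["label"], load_labels())`. The scores are raw logits, so they rank species but are not probabilities.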
|
|
| ### Species Coverage |
|
|
| - Approximately 10,000 bird species |
| - Frogs, crickets, grasshoppers, and mammals |
| - Trained on data from Xeno-Canto, iNaturalist, Animal Sound Archive, and FSD50K |
|
|
| ### Performance (ONNX vs TFLite, 100 runs) |
|
|
| | Metric | ONNX | TFLite | |
| |--------|------|--------| |
| | Mean | 66.4 ms | 608.8 ms | |
| | Speedup | **9.2x faster** | baseline | |
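The speedup figure is the ratio of the two mean latencies (608.8 ms / 66.4 ms ≈ 9.2). The card does not describe the benchmark setup (hardware, warm-up, batch size), so the following is only a sketch of how a comparable mean-latency measurement could be taken; `mean_latency_ms` and `speedup` are illustrative helpers, and the warm-up count is an assumption.

```python
import statistics
import time


def mean_latency_ms(fn, runs=100, warmup=5):
    """Call fn() `runs` times after a short warm-up and return the mean in ms."""
    for _ in range(warmup):
        fn()  # warm-up calls are excluded from the measurement
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.fmean(samples)


def speedup(baseline_ms, candidate_ms):
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Reproduce the ratio reported in the table above.
print(f"{speedup(608.8, 66.4):.1f}x")  # prints "9.2x"
```

In practice `fn` would wrap a single inference call on a fixed 5-second input, measured once per runtime on the same machine.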
|
|
| ## License |
|
|
| **Apache 2.0**, following the original Google Perch license. |
|
|
| ## Credits |
|
|
| - **[Google Research](https://www.kaggle.com/models/google/bird-vocalization-classifier/)** for the original Perch v2 model |
| - **[justinchuby](https://huggingface.co/justinchuby/Perch-onnx)** for the ONNX conversion and DFT-to-MatMul optimization |
| - **[cgeorgiaw](https://huggingface.co/cgeorgiaw/Perch)** for the Keras-compatible model and label files |
|
|
| ## References |
|
|
| - Perch model: [Google Bird Vocalization Classifier on Kaggle](https://www.kaggle.com/models/google/bird-vocalization-classifier/) |
| - ONNX source: [justinchuby/Perch-onnx](https://huggingface.co/justinchuby/Perch-onnx) |
| - Labels and Keras model: [cgeorgiaw/Perch](https://huggingface.co/cgeorgiaw/Perch) |
| - Perch codebase: [google-research/perch](https://github.com/google-research/perch) |
|
|