---
pretty_name: Google Perch v2 (ONNX)
license: apache-2.0
tags:
- audio
- bird
- nature
- science
- vocalization
- bio
- birds-classification
- bioacoustics
- onnx
base_model:
- cgeorgiaw/Perch
---

# Google Perch v2 (ONNX)

ONNX format of the Google Perch v2 bird vocalization classifier, repackaged for use with [BirdNET-Go](https://github.com/tphakala/birdnet-go).

## Origin and Attribution

The Perch v2 model was developed by **Google Research** as part of the [bird-vocalization-classifier](https://www.kaggle.com/models/google/bird-vocalization-classifier/) project. It uses an EfficientNet-B3 architecture with approximately 12 million embedding parameters and 91 million classification parameters covering nearly 15,000 species.

The ONNX conversion was performed by [justinchuby](https://huggingface.co/justinchuby/Perch-onnx), who also created an optimized variant with the DFT node converted to MatMul for additional speedup.

The label file originates from the [cgeorgiaw/Perch](https://huggingface.co/cgeorgiaw/Perch) repository on HuggingFace.
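The DFT-to-MatMul idea mentioned above can be illustrated with a small NumPy sketch. This is not the conversion applied to the ONNX graph, whose details are not described here; it only shows the underlying principle that a discrete Fourier transform of fixed length is a multiplication by a constant matrix. The helper name `dft_matrix` and the frame length of 64 are illustrative assumptions.

```python
import numpy as np


def dft_matrix(n):
    """Return the n x n DFT matrix W with W[j, k] = exp(-2*pi*i*j*k / n)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


# A random test frame; the length 64 is arbitrary and chosen for illustration.
rng = np.random.default_rng(0)
frame = rng.standard_normal(64)

# Route 1: DFT expressed as a plain matrix-vector product.
via_matmul = dft_matrix(64) @ frame

# Route 2: NumPy's FFT, used here as the reference result.
via_fft = np.fft.fft(frame)

# The two results agree up to floating-point rounding.
max_abs_diff = np.max(np.abs(via_matmul - via_fft))
print(f"max |matmul - fft| = {max_abs_diff:.3e}")
```

Because the matrix is constant for a fixed frame length, a runtime can treat the transform as an ordinary matrix multiplication, an operation that inference backends typically optimize well.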
## Files

| File | Size | Description |
|------|------|-------------|
| `perch_v2.onnx` | 409 MB | Standard ONNX model (tolerance 1e-5 vs TFLite) |
| `perch_v2_no_dft.onnx` | 413 MB | ONNX with DFT converted to MatMul, faster inference (tolerance 2e-4) |
| `labels.txt` | 313 KB | Species labels (14,795 entries, iNaturalist taxonomy) |
| `SHA256SUMS` | - | Checksums for integrity verification |

## Model Information

### Input

- **Name:** `inputs`
- **Shape:** `[batch, 160000]` (5 seconds at 32 kHz)
- **Type:** `float32`

### Outputs

| Output | Shape | Type | Description |
|--------|-------|------|-------------|
| `embedding` | `[batch, 1536]` | float32 | Audio embedding vector |
| `spatial_embedding` | `[batch, 16, 4, 1536]` | float32 | Spatial features |
| `spectrogram` | `[batch, 500, 128]` | float32 | Computed spectrogram |
| `label` | `[batch, 14795]` | float32 | Species classification logits |

### Species Coverage

- Approximately 10,000 bird species
- Frogs, crickets, grasshoppers, and mammals
- Based on training data from Xeno-Canto, iNaturalist, Animal Sound Archive, and FSD50k

### Performance (ONNX vs TFLite, 100 runs)

| Metric | ONNX | TFLite |
|--------|------|--------|
| Mean | 66.4 ms | 608.8 ms |
| Speedup | **9.2x faster** | baseline |

## License

**Apache 2.0**, following the original Google Perch license.
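## Usage Sketch

The input and output specification under Model Information can be exercised with ONNX Runtime roughly as follows. This is a minimal sketch, not an official example. It assumes that `onnxruntime` and `numpy` are installed, that the audio is already a mono waveform sampled at 32 kHz, and that `labels.txt` holds one label per line in the same order as the `label` output. The helper names (`prepare_window`, `top_k`, `classify`) are hypothetical, and zero-padding short clips is a choice made for illustration.

```python
import numpy as np

SAMPLE_RATE = 32_000       # Hz, from the input specification
WINDOW_SAMPLES = 160_000   # 5 seconds at 32 kHz


def prepare_window(audio):
    """Pad with zeros or trim mono audio to one window of shape [1, 160000]."""
    audio = np.asarray(audio, dtype=np.float32).reshape(-1)
    if audio.size < WINDOW_SAMPLES:
        audio = np.pad(audio, (0, WINDOW_SAMPLES - audio.size))
    return audio[:WINDOW_SAMPLES][np.newaxis, :]


def top_k(logits, labels, k=5):
    """Return the k highest-scoring (label, logit) pairs, best first."""
    order = np.argsort(logits)[::-1][:k]
    return [(labels[int(i)], float(logits[int(i)])) for i in order]


def classify(model_path, labels_path, audio, k=5):
    """Run one 5-second window through the model and return the top-k labels."""
    # Imported lazily so the helpers above work without ONNX Runtime installed.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

    # Assumption: one label per line, ordered like the `label` output.
    with open(labels_path, encoding="utf-8") as f:
        labels = f.read().splitlines()

    # Request only the `label` output; the input tensor is named `inputs`.
    (logits,) = session.run(["label"], {"inputs": prepare_window(audio)})
    return top_k(logits[0], labels, k)
```

With the files from this repository in the working directory, `classify("perch_v2.onnx", "labels.txt", waveform)` would return a list of `(label, logit)` pairs. The scores are raw logits rather than probabilities, so any thresholding should account for that. Passing `["embedding"]` instead of `["label"]` to `session.run` would return the 1536-dimensional embedding listed in the outputs table.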
## Credits

- **[Google Research](https://www.kaggle.com/models/google/bird-vocalization-classifier/)** for the original Perch v2 model
- **[justinchuby](https://huggingface.co/justinchuby/Perch-onnx)** for the ONNX conversion and DFT-to-MatMul optimization
- **[cgeorgiaw](https://huggingface.co/cgeorgiaw/Perch)** for the Keras-compatible model and label files

## References

- Perch model: [Google Bird Vocalization Classifier on Kaggle](https://www.kaggle.com/models/google/bird-vocalization-classifier/)
- ONNX source: [justinchuby/Perch-onnx](https://huggingface.co/justinchuby/Perch-onnx)
- Labels and Keras model: [cgeorgiaw/Perch](https://huggingface.co/cgeorgiaw/Perch)
- Perch codebase: [google-research/perch](https://github.com/google-research/perch)