---
pretty_name: Google Perch v2 (ONNX)
license: apache-2.0
tags:
- audio
- bird
- nature
- science
- vocalization
- bio
- birds-classification
- bioacoustics
- onnx
base_model:
- cgeorgiaw/Perch
---
# Google Perch v2 (ONNX)
The Google Perch v2 bird vocalization classifier in ONNX format, repackaged for use with [BirdNET-Go](https://github.com/tphakala/birdnet-go).
## Origin and Attribution
The Perch v2 model was developed by **Google Research** as part of the [bird-vocalization-classifier](https://www.kaggle.com/models/google/bird-vocalization-classifier/) project. It uses an EfficientNet-B3 architecture with approximately 12 million embedding parameters and 91 million classification parameters covering nearly 15,000 species.
The ONNX conversion was performed by [justinchuby](https://huggingface.co/justinchuby/Perch-onnx), who also created an optimized variant with the DFT node converted to MatMul for additional speedup.
The label file originates from the [cgeorgiaw/Perch](https://huggingface.co/cgeorgiaw/Perch) repository on Hugging Face.
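Converting a DFT node to MatMul is possible because a length-N discrete Fourier transform is a linear map: for real input `x`, the real and imaginary parts of `X[k] = sum_t x[t] * exp(-2πi·k·t/N)` are `x` multiplied by fixed cosine and negative-sine matrices. The plain-Python sketch below illustrates only that equivalence. It is not the graph rewrite justinchuby actually applied, and all function names are hypothetical.

```python
import cmath
import math


def dft_matrices(n: int) -> tuple[list[list[float]], list[list[float]]]:
    """Build the (n x n) cosine and negative-sine matrices of a length-n DFT."""
    cos_m = [[math.cos(2 * math.pi * k * t / n) for k in range(n)] for t in range(n)]
    sin_m = [[-math.sin(2 * math.pi * k * t / n) for k in range(n)] for t in range(n)]
    return cos_m, sin_m


def matmul_row(x: list[float], m: list[list[float]]) -> list[float]:
    """Multiply row vector x (time axis) by matrix m (time x frequency)."""
    return [sum(x[t] * m[t][k] for t in range(len(x))) for k in range(len(m[0]))]


def dft_via_matmul(x: list[float]) -> list[complex]:
    """Real-input DFT expressed as two plain matrix multiplications."""
    cos_m, sin_m = dft_matrices(len(x))
    re = matmul_row(x, cos_m)  # Re X[k] = sum_t x[t] * cos(2*pi*k*t/n)
    im = matmul_row(x, sin_m)  # Im X[k] = -sum_t x[t] * sin(2*pi*k*t/n)
    return [complex(r, i) for r, i in zip(re, im)]


def dft_direct(x: list[float]) -> list[complex]:
    """Reference DFT computed straight from the complex-exponential definition."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


# Self-check: both formulations agree to floating-point precision.
signal = [0.0, 1.0, 0.5, -0.25, -1.0, 0.75, 0.0, -0.5]
for a, b in zip(dft_via_matmul(signal), dft_direct(signal)):
    assert abs(a - b) < 1e-9
```

Because the MatMul form accumulates many more floating-point products than a dedicated DFT kernel, small numerical differences between the two model files are expected; the card reports them as separate tolerances.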
## Files
| File | Size | Description |
|------|------|-------------|
| `perch_v2.onnx` | 409 MB | Standard ONNX model (numerical tolerance 1e-5 vs TFLite) |
| `perch_v2_no_dft.onnx` | 413 MB | ONNX model with the DFT node converted to MatMul for faster inference (numerical tolerance 2e-4 vs TFLite) |
| `labels.txt` | 313 KB | Species labels (14,795 entries, iNaturalist taxonomy) |
| `SHA256SUMS` | - | Checksums for integrity verification |
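A minimal sketch for checking the downloaded files against `SHA256SUMS`, assuming the file uses the usual `sha256sum` layout (`<hex digest>  <filename>` on each line). The function names are hypothetical; on systems with GNU coreutils, `sha256sum -c SHA256SUMS` performs the same check.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks so large models fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_checksums(sums_file: Path) -> dict[str, bool]:
    """Check each '<digest>  <filename>' line (assumed sha256sum-style format)."""
    results: dict[str, bool] = {}
    for line in sums_file.read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        digest, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # '*' marks binary mode in sha256sum output
        target = sums_file.parent / name
        results[name] = target.exists() and sha256_of(target) == digest.lower()
    return results
```

Calling `verify_checksums(Path("SHA256SUMS"))` from the download directory returns a dict mapping each listed filename to whether its digest matched; a missing file counts as a failure.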
## Model Information
### Input
- **Name:** `inputs`
- **Shape:** `[batch, 160000]` (5 seconds at 32 kHz)
- **Type:** `float32`
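The model expects fixed 5-second windows (32,000 Hz × 5 s = 160,000 samples). Below is a minimal sketch of slicing a longer recording into such windows, assuming the audio is already mono, resampled to 32 kHz, and stored as floats. Zero-padding the final window and the optional overlap via `hop` are illustrative choices, not behavior documented for Perch or BirdNET-Go, and `frame_audio` is a hypothetical name.

```python
SAMPLE_RATE = 32_000                    # Hz, per the model card
WINDOW_SECONDS = 5
WINDOW = SAMPLE_RATE * WINDOW_SECONDS   # 160_000 samples, the model's input length


def frame_audio(samples: list[float], hop: int = WINDOW) -> list[list[float]]:
    """Slice a mono 32 kHz waveform into 160_000-sample windows.

    hop == WINDOW gives back-to-back windows; a smaller hop makes them overlap.
    The last partial window is zero-padded (an illustrative assumption).
    """
    if hop <= 0:
        raise ValueError("hop must be a positive number of samples")
    frames: list[list[float]] = []
    start = 0
    while start < len(samples):
        chunk = samples[start:start + WINDOW]
        frames.append(chunk + [0.0] * (WINDOW - len(chunk)))  # pad to full length
        if start + WINDOW >= len(samples):
            break  # this window already reaches the end of the recording
        start += hop
    return frames
```

Stacking the returned windows with `numpy.asarray(frames, dtype=numpy.float32)` yields the `[batch, 160000]` array that the `inputs` tensor expects.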
### Outputs
| Output | Shape | Type | Description |
|--------|-------|------|-------------|
| `embedding` | `[batch, 1536]` | float32 | Audio embedding vector |
| `spatial_embedding` | `[batch, 16, 4, 1536]` | float32 | Spatial features |
| `spectrogram` | `[batch, 500, 128]` | float32 | Computed spectrogram |
| `label` | `[batch, 14795]` | float32 | Species classification logits |
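A sketch of running the model with ONNX Runtime and mapping the `label` logits back to `labels.txt`. It assumes `labels.txt` holds one label per line in the same order as the 14,795 logits, and it applies an independent sigmoid per class to turn logits into scores. Both are assumptions, as is every helper name here. `numpy` and `onnxruntime` are imported inside `run_perch` so the pure-Python helpers work without them.

```python
import heapq
import math


def load_labels(path: str) -> list[str]:
    """Read labels.txt, assuming one label per line in logit-index order."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |x|)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def top_k(logits: list[float], labels: list[str], k: int = 5) -> list[tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs.

    Assumption: scores use an independent sigmoid per class (multi-label);
    the model itself only emits raw logits.
    """
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    best = heapq.nlargest(k, range(len(logits)), key=logits.__getitem__)
    return [(labels[i], sigmoid(logits[i])) for i in best]


def run_perch(model_path: str, windows):
    """Run the ONNX model on a [batch, 160000] batch of audio windows.

    Returns (embedding [batch, 1536], label logits [batch, 14795]).
    In real use, create the session once and reuse it across calls.
    """
    import numpy as np  # lazy imports: only needed for actual inference
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    batch = np.asarray(windows, dtype=np.float32)
    embedding, label_logits = session.run(["embedding", "label"], {"inputs": batch})
    return embedding, label_logits
```

For example, `emb, logits = run_perch("perch_v2.onnx", frames)` followed by `top_k(logits[0].tolist(), load_labels("labels.txt"))` lists the five strongest classes for the first window.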
### Species Coverage
- Approximately 10,000 bird species
- Frogs, crickets, grasshoppers, and mammals
- Trained on data from Xeno-Canto, iNaturalist, the Animal Sound Archive, and FSD50K
### Performance (ONNX vs TFLite, 100 runs)
| Metric | ONNX | TFLite |
|--------|------|--------|
| Mean latency | 66.4 ms | 608.8 ms |
| Speedup | **9.2x faster** | baseline |
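The card does not document the benchmark harness, the hardware, or which ONNX file was timed. The sketch below shows one way to measure a comparable mean over 100 runs with `time.perf_counter`; the warm-up count and the function names are assumptions. It also reproduces the speedup figure as the ratio of the two mean latencies.

```python
import statistics
import time
from typing import Callable


def mean_latency_ms(fn: Callable[[], object], runs: int = 100, warmup: int = 3) -> float:
    """Time fn() over `runs` calls and return the mean wall-clock latency in ms."""
    for _ in range(warmup):
        fn()  # assumed warm-up: excludes one-time costs such as lazy initialization
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.fmean(timings)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# Figures from the table: 608.8 ms (TFLite) / 66.4 ms (ONNX)
print(f"{speedup(608.8, 66.4):.1f}x")  # → 9.2x
```

To benchmark the model itself, pass a closure such as `lambda: session.run(["label"], {"inputs": batch})` around a reused ONNX Runtime session.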
## License
**Apache 2.0**, following the original Google Perch license.
## Credits
- **[Google Research](https://www.kaggle.com/models/google/bird-vocalization-classifier/)** for the original Perch v2 model
- **[justinchuby](https://huggingface.co/justinchuby/Perch-onnx)** for the ONNX conversion and DFT-to-MatMul optimization
- **[cgeorgiaw](https://huggingface.co/cgeorgiaw/Perch)** for the Keras-compatible model and label files
## References
- Perch model: [Google Bird Vocalization Classifier on Kaggle](https://www.kaggle.com/models/google/bird-vocalization-classifier/)
- ONNX source: [justinchuby/Perch-onnx](https://huggingface.co/justinchuby/Perch-onnx)
- Labels and Keras model: [cgeorgiaw/Perch](https://huggingface.co/cgeorgiaw/Perch)
- Perch codebase: [google-research/perch](https://github.com/google-research/perch)