---
pretty_name: Google Perch v2 (ONNX)
license: apache-2.0
tags:
- audio
- bird
- nature
- science
- vocalization
- bio
- birds-classification
- bioacoustics
- onnx
base_model:
- cgeorgiaw/Perch
---

# Google Perch v2 (ONNX)

This repository provides the Google Perch v2 bird vocalization classifier in ONNX format, repackaged for use with [BirdNET-Go](https://github.com/tphakala/birdnet-go).

## Origin and Attribution

The Perch v2 model was developed by **Google Research** as part of the [bird-vocalization-classifier](https://www.kaggle.com/models/google/bird-vocalization-classifier/) project. It uses an EfficientNet-B3 architecture with approximately 12 million embedding parameters and 91 million classification parameters covering nearly 15,000 species.

The ONNX conversion was performed by [justinchuby](https://huggingface.co/justinchuby/Perch-onnx), who also created an optimized variant with the DFT node converted to MatMul for additional speedup.

The label file originates from the [cgeorgiaw/Perch](https://huggingface.co/cgeorgiaw/Perch) repository on Hugging Face.

## Files

| File | Size | Description |
|------|------|-------------|
| `perch_v2.onnx` | 409 MB | Standard ONNX model (outputs match TFLite within a tolerance of 1e-5) |
| `perch_v2_no_dft.onnx` | 413 MB | ONNX model with the DFT node converted to MatMul for faster inference (tolerance 2e-4) |
| `labels.txt` | 313 KB | Species labels (14,795 entries, iNaturalist taxonomy) |
| `SHA256SUMS` | - | Checksums for integrity verification |
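
As a minimal sketch of the integrity check, the following Python verifies downloaded files against `SHA256SUMS`. It assumes the manifest uses the common `sha256sum` layout (`<hex digest>  <filename>` per line, paths relative to the manifest); the helper names `sha256_of` and `verify_checksums` are illustrative and not part of this repository.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(sums_file: Path) -> dict[str, bool]:
    """Check every file listed in a sha256sum-style manifest.

    Assumption: each non-empty line is "<hex digest>  <filename>", and
    filenames are relative to the directory containing the manifest.
    """
    results = {}
    for line in sums_file.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.strip().split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum prefixes '*' in binary mode
        target = sums_file.parent / name
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results
```

Calling `verify_checksums(Path("SHA256SUMS"))` from the download directory returns a mapping from filename to `True` or `False`; a missing file is reported as `False` rather than raising.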

## Model Information

### Input

- **Name:** `inputs`
- **Shape:** `[batch, 160000]` (5 seconds at 32 kHz)
- **Type:** `float32`
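
The input shape implies that audio must be cut into 5-second windows of 160,000 samples before inference. The sketch below shows one way to do that with NumPy, assuming the waveform is already mono and resampled to 32 kHz. Zero-padding the final partial window is an assumption of this example, not a documented requirement, and `to_windows` is a hypothetical helper name.

```python
import numpy as np

SAMPLE_RATE = 32_000      # Hz, from the input spec above
WINDOW_SAMPLES = 160_000  # 5 s * 32 kHz


def to_windows(waveform: np.ndarray) -> np.ndarray:
    """Split a mono 32 kHz waveform into [batch, 160000] float32 windows.

    Assumption: the last partial window is zero-padded to full length.
    """
    audio = np.asarray(waveform, dtype=np.float32).reshape(-1)
    # Ceiling division; always produce at least one window.
    n_windows = max(1, -(-audio.size // WINDOW_SAMPLES))
    padded = np.zeros(n_windows * WINDOW_SAMPLES, dtype=np.float32)
    padded[: audio.size] = audio
    return padded.reshape(n_windows, WINDOW_SAMPLES)
```

A 7-second clip, for example, yields a `(2, 160000)` array in which the second window holds 2 seconds of audio followed by 3 seconds of zeros.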

### Outputs

| Output | Shape | Type | Description |
|--------|-------|------|-------------|
| `embedding` | `[batch, 1536]` | float32 | Audio embedding vector |
| `spatial_embedding` | `[batch, 16, 4, 1536]` | float32 | Spatial features |
| `spectrogram` | `[batch, 500, 128]` | float32 | Computed spectrogram |
| `label` | `[batch, 14795]` | float32 | Species classification logits |
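
To illustrate how the `inputs` tensor and the `label` output fit together, here is a hedged sketch using the `onnxruntime` Python package. It assumes `labels.txt` holds one label per line in the same order as the logits; `classify` and `top_k` are illustrative names. The example ranks by raw logit, so it makes no assumption about which activation (softmax or sigmoid) should be applied.

```python
import numpy as np


def top_k(logits: np.ndarray, labels: list[str], k: int = 5) -> list[list[tuple[str, float]]]:
    """Return the k highest-scoring (label, logit) pairs for each batch row."""
    logits = np.asarray(logits)
    order = np.argsort(-logits, axis=1)[:, :k]  # indices sorted by descending logit
    return [[(labels[i], float(row[i])) for i in idx] for row, idx in zip(logits, order)]


def classify(model_path: str, labels_path: str, windows: np.ndarray, k: int = 5):
    """Run the model on [batch, 160000] float32 windows and rank the labels."""
    import onnxruntime as ort  # lazy import; requires `pip install onnxruntime`

    # Assumption: one label per line, aligned with the logit indices.
    with open(labels_path, encoding="utf-8") as fh:
        labels = [line.strip() for line in fh if line.strip()]

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    # Request only the `label` output; feed the tensor named `inputs`.
    (logits,) = session.run(["label"], {"inputs": windows.astype(np.float32)})
    return top_k(logits, labels, k)
```

Passing `["embedding"]` instead of `["label"]` to `session.run` would return the `[batch, 1536]` embedding for use in downstream similarity search or custom classifiers.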

### Species Coverage

- Approximately 10,000 bird species
- Frogs, crickets, grasshoppers, and mammals
- Trained on data from Xeno-Canto, iNaturalist, Animal Sound Archive, and FSD50K

### Performance (ONNX vs TFLite, 100 runs)

| Metric | ONNX | TFLite |
|--------|------|--------|
| Mean inference time | 66.4 ms | 608.8 ms |
| Speedup | **9.2x faster** | baseline |
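
The figures above can be reproduced in spirit with a small timing harness. This is a sketch only: the hardware, batch size, and warm-up policy behind the published numbers are not stated here, so the `warmup` parameter and the helper names are assumptions of this example.

```python
import statistics
import time
from typing import Callable


def mean_latency_ms(fn: Callable[[], object], runs: int = 100, warmup: int = 5) -> float:
    """Return the mean wall-clock time of fn() in milliseconds over `runs` calls.

    Assumption: a few untimed warm-up calls are made first so that one-time
    initialization does not skew the mean.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.fmean(samples)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Ratio of baseline latency to candidate latency."""
    return baseline_ms / candidate_ms
```

With the table's values, `speedup(608.8, 66.4)` is about 9.17, which rounds to the reported 9.2x. To time the ONNX model, wrap a single inference call in a zero-argument lambda and pass it as `fn`.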

## License

**Apache 2.0**, following the original Google Perch license.

## Credits

- **[Google Research](https://www.kaggle.com/models/google/bird-vocalization-classifier/)** for the original Perch v2 model
- **[justinchuby](https://huggingface.co/justinchuby/Perch-onnx)** for the ONNX conversion and DFT-to-MatMul optimization
- **[cgeorgiaw](https://huggingface.co/cgeorgiaw/Perch)** for the Keras-compatible model and label files

## References

- Perch model: [Google Bird Vocalization Classifier on Kaggle](https://www.kaggle.com/models/google/bird-vocalization-classifier/)
- ONNX source: [justinchuby/Perch-onnx](https://huggingface.co/justinchuby/Perch-onnx)
- Labels and Keras model: [cgeorgiaw/Perch](https://huggingface.co/cgeorgiaw/Perch)
- Perch codebase: [google-research/perch](https://github.com/google-research/perch)