# Perch V2 — Optimized TFLite Models for Raspberry Pi
Optimized variants of Google's Perch V2 bird vocalization classifier for edge deployment on Raspberry Pi and ARM64 devices.
Three model variants converted directly from the official Google SavedModel, each targeting a different performance/quality trade-off.
## Models
| Model | Size | Inference (RPi 5) | Embedding cosine | Top-1 agree | Top-5 agree | Best for |
|---|---|---|---|---|---|---|
| `perch_v2_original.tflite` | 409 MB | 435 ms | baseline | baseline | baseline | Reference / high-RAM devices |
| `perch_v2_fp16.tflite` | 205 MB | 384 ms | 0.9999 | 100% | 99% | RPi 5 (recommended) |
| `perch_v2_dynint8.tflite` | 105 MB | 299 ms | 0.9927 | 93% | 90% | RPi 4 / low-RAM devices |
Benchmarked on Raspberry Pi 5 Model B (8GB, Cortex-A76 @ 2.4GHz), 20 real bird recordings from 20 species, 5 runs each, 4 threads.
## Quick Start
### Choose your model
- **RPi 5 (4-8 GB):** use `perch_v2_fp16.tflite` for near-perfect accuracy at half the size of the original
- **RPi 4 (2-4 GB):** use `perch_v2_dynint8.tflite`, 4x smaller and 31% faster with very good accuracy
- **Desktop / reference:** use `perch_v2_original.tflite`, the exact Google baseline
### Usage
```python
# Works with ai-edge-litert, tflite-runtime, or tensorflow
from ai_edge_litert.interpreter import Interpreter
import numpy as np

model_path = "perch_v2_fp16.tflite"  # or dynint8, or original
interpreter = Interpreter(model_path=model_path, num_threads=4)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()
out = interpreter.get_output_details()

# Input: 5 seconds of audio at 32 kHz
audio = np.zeros((1, 160000), dtype=np.float32)  # replace with real audio
interpreter.set_tensor(inp[0]["index"], audio)
interpreter.invoke()

# Get species logits (14,795 classes)
logits = interpreter.get_tensor(out[3]["index"])[0]
top_species = np.argsort(logits)[-5:][::-1]
```
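To turn the logits above into readable predictions, normalize them and map the top indices to class names. A minimal sketch, assuming `labels.txt` holds one class name per line in index order (dummy logits and labels stand in here for the real model output and file):

```python
import numpy as np

# Dummy logits stand in for the model output from the usage example above
rng = np.random.default_rng(0)
logits = rng.standard_normal(14795).astype(np.float32)

# Assumption: labels.txt has one class name per line, line i = class index i.
# A generated list stands in for the real file:
labels = [f"species_{i}" for i in range(14795)]
# labels = open("labels.txt").read().splitlines()

# Softmax over the logits gives normalized scores for ranking/reporting
probs = np.exp(logits - logits.max())
probs /= probs.sum()

top5 = np.argsort(probs)[-5:][::-1]
for idx in top5:
    print(f"{labels[idx]}: {probs[idx]:.4f}")
```

If you only need a ranking, sorting the raw logits (as in the usage example) gives the same order; the softmax is only needed when you want comparable confidence scores.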
### Download a single model
```python
from huggingface_hub import hf_hub_download

# Download only the model you need
model_path = hf_hub_download(
    "ernensbjorn/perch-v2-int8-tflite",
    "perch_v2_fp16.tflite",
)
```
## Model Details
### Architecture
- Backbone: EfficientNet-B3 (~12M params for embeddings)
- Classification head: 91M params (101.8M total)
- Input: 5.0 seconds @ 32,000 Hz = 160,000 float32 samples
- Outputs:
- Index 0: Spatial embeddings (16 x 4 x 1536)
- Index 1: Temporal features
- Index 2: 1536-dim global embedding
- Index 3: 14,795 species logits (use this for classification)
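Because the model expects exactly 160,000 samples, longer recordings must be split into 5-second windows before inference. A minimal framing sketch with non-overlapping windows and a zero-padded tail (`frame_audio` is a hypothetical helper, not part of this package):

```python
import math
import numpy as np

SAMPLE_RATE = 32_000
WINDOW = 5 * SAMPLE_RATE  # 160,000 samples = one model input

def frame_audio(audio: np.ndarray) -> np.ndarray:
    """Split a mono float32 recording into (n, 160000) non-overlapping
    windows, zero-padding the final window to full length."""
    n = max(1, math.ceil(len(audio) / WINDOW))
    padded = np.zeros(n * WINDOW, dtype=np.float32)
    padded[: len(audio)] = audio
    return padded.reshape(n, WINDOW)

# A 12-second recording yields three 5-second windows (last one padded)
chunks = frame_audio(np.zeros(12 * SAMPLE_RATE, dtype=np.float32))
print(chunks.shape)  # (3, 160000)
```

Each row of the result can be passed as `audio[np.newaxis, :]` to the interpreter from the usage example.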
### Species Coverage
10,340 bird species + frogs, insects, mammals (14,795 total classes).
Use the included `labels.txt` for class names and `bird_indices.json` to filter to bird-only classes.
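A minimal sketch of bird-only filtering, assuming `bird_indices.json` is a flat JSON list of class indices (a small inline list stands in for the real file):

```python
import json
import numpy as np

# Assumption: bird_indices.json is a flat JSON list of class indices.
# An inline example stands in for the real file:
bird_indices = np.array(json.loads("[0, 2, 3]"))
# bird_indices = np.array(json.load(open("bird_indices.json")))

logits = np.array([1.0, 9.0, 2.0, 0.5, 7.0], dtype=np.float32)

# Restrict the argmax to bird classes only: indices 1 (9.0) and 4 (7.0)
# are non-bird in this toy setup, so the best bird class is index 2.
best_bird = bird_indices[np.argmax(logits[bird_indices])]
print(best_bird)  # 2
```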
### Quantization Methods
| Variant | Method | What's quantized | File size reduction |
|---|---|---|---|
| original | None (float32 baseline) | Nothing | 1x |
| fp16 | TFLite float16 quantization | Weights stored as float16, dequantized at runtime | 2x smaller |
| dynint8 | TFLite dynamic range quantization | Weights quantized to int8, activations remain float32 | 4x smaller |
All variants were converted directly from the official Google Perch V2 SavedModel using `tf.lite.TFLiteConverter` with the appropriate optimization flags. No binary patching or post-hoc manipulation.
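A configuration sketch of the converter flags implied above, based on the standard `tf.lite.TFLiteConverter` post-training quantization API; `saved_model_dir` is a placeholder path, and the exact flags used to produce these files are an assumption:

```python
import tensorflow as tf

# Placeholder path to the official Perch V2 SavedModel
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")

# fp16 variant: weights stored as float16, dequantized at runtime
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]

# dynint8 variant: drop the float16 target and keep only the
# DEFAULT optimization flag for dynamic-range int8 weights:
# converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model = converter.convert()
```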
## Detailed Benchmarks
### Raspberry Pi 5 (8 GB, Cortex-A76 @ 2.4 GHz, 4 threads)
| Model | Size | p50 latency | p95 latency | Embedding cosine (mean) | Embedding cosine (min) | Top-1 | Top-5 |
|---|---|---|---|---|---|---|---|
| original | 409 MB | 435 ms | 534 ms | baseline | baseline | baseline | baseline |
| fp16 | 205 MB | 384 ms | 477 ms | 0.999994 | 0.999991 | 100% | 99% |
| dynint8 | 105 MB | 299 ms | 405 ms | 0.992748 | 0.972732 | 93% | 90% |
- Embedding cosine: Cosine similarity of the 1536-dim embedding vector vs the float32 baseline. Values > 0.99 indicate negligible quality loss for downstream tasks.
- Top-1/Top-5 agreement: How often the quantized model's top-1 prediction matches the original's top-1, and how often it falls within the original's top-5.
- Test data: 20 real field recordings from 20 species (European Robin, Eurasian Curlew, Redwing, Eurasian Teal, Water Rail, etc.)
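The two quality metrics above can be reproduced with a few lines of NumPy. A sketch, using the definitions given in the bullet list (`cosine` and `top1_in_topk` are illustrative helpers, exercised here on toy data):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top1_in_topk(quant_logits: np.ndarray, ref_logits: np.ndarray, k: int) -> float:
    """Fraction of clips where the quantized model's top-1 class
    appears in the reference model's top-k classes."""
    hits = 0
    for q, r in zip(quant_logits, ref_logits):
        hits += int(np.argmax(q) in np.argsort(r)[-k:])
    return hits / len(quant_logits)

# Sanity check: identical outputs give perfect agreement and similarity
rng = np.random.default_rng(0)
ref = rng.standard_normal((4, 10))
print(top1_in_topk(ref, ref, 1), cosine(ref[0], ref[0]))
```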
### Raspberry Pi 4 Estimates
The RPi 4 (Cortex-A72 @ 1.8 GHz) is roughly 2-3x slower than the RPi 5. Expected latencies:
| Model | Estimated p50 | RAM needed |
|---|---|---|
| original | ~1000-1300 ms | ~500 MB |
| fp16 | ~900-1150 ms | ~300 MB |
| dynint8 | ~700-900 ms | ~150 MB |
For RPi 4 with 2 GB RAM, dynint8 is strongly recommended.
## Origin
Converted from the official Google Perch V2 SavedModel (hosted by Google researcher cgeorgiaw on HuggingFace).
Created as part of the Birdash project — an open-source bird detection dashboard and engine for Raspberry Pi.
## License
Apache 2.0 (same as the original Perch V2 model by Google)
## Citation
If you use these models, please cite the original Perch V2 work:
```bibtex
@article{ghani2023global,
  title={Global birdsong embeddings enable superior transfer learning for bioacoustic classification},
  author={Ghani, Burooj and Denton, Tom and Kahl, Stefan and Klinck, Holger},
  journal={Scientific Reports},
  year={2023}
}
```