BSG-BAT ONNX Models
ONNX conversions of BSG-BAT v0.21, a convolutional neural network that identifies 21 European bat species from ultrasonic audio recordings.
Unlike embedding-based bat classifiers, BSG-BAT is a standalone end-to-end CNN. It takes a precomputed log-mel spectrogram of true 384 kHz ultrasonic audio and outputs per-species logits directly. There is no BirdNET backbone and no "slow-down" reinterpretation of the sample rate: the model sees real ultrasonic frequencies (9-150 kHz).
These ONNX files are a faithful, numerically verified conversion of the original PyTorch checkpoints. They are not retrained or modified.
How it works
384 kHz mono WAV
-> log10 mel spectrogram (n_fft=1024, hop=768, n_mels=128, fmin=9000, fmax=150000)
-> per-segment normalize (mean/std, subtract per-bin median, clip 0..6)
-> sliding window 512 frames (~1.024 s), hop 250 frames (0.5 s)
-> CNN [1,1,512,128] -> [1,22] logits
-> sigmoid (multi-label; prob = 1/(1+exp(-logit)))
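The time figures in the pipeline follow directly from the sample rate and the spectrogram hop. The sketch below reproduces that arithmetic and maps a frame count to segment start and end times. The function names are illustrative and not part of the original code; the window loop is assumed to match the `range(0, max(n - 512, 0) + 1, 250)` pattern used in the usage example.

```python
SAMPLE_RATE = 384_000     # Hz, as stated in the pipeline
HOP_LENGTH = 768          # samples between spectrogram frames
WINDOW_FRAMES = 512       # frames per model input
WINDOW_HOP_FRAMES = 250   # frames between consecutive windows


def frame_seconds():
    """Duration of one spectrogram frame step in seconds (768 / 384000 = 2 ms)."""
    return HOP_LENGTH / SAMPLE_RATE


def segment_times(n_frames):
    """(start, end) in seconds of every full window over n_frames frames."""
    dt = frame_seconds()
    last_start = max(n_frames - WINDOW_FRAMES, 0)
    return [
        (start * dt, (start + WINDOW_FRAMES) * dt)
        for start in range(0, last_start + 1, WINDOW_HOP_FRAMES)
    ]


if __name__ == "__main__":
    print(f"window length: {WINDOW_FRAMES * frame_seconds():.3f} s")
    print(f"window hop:    {WINDOW_HOP_FRAMES * frame_seconds():.3f} s")
```

With these values, row `k` of the model output covers the audio starting at roughly `k * 0.5` seconds.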
Model architecture
- Input: spectrogram [batch, 1, 512, 128] float32 (1 channel, 512 time frames, 128 mel bins)
- Output: logits [batch, 22] float32 (21 species + Background)
- Activation: sigmoid per class (multi-label, trained with BCEWithLogits)
- Backbone: 6x Conv2d + max-pool, 3x fully connected (see original_code/supervised.py)
The spectrogram must be produced exactly as in original_code/data384.py (wav2spectrograms). The mel bin center frequencies are provided in original_code/mel128_freq9k_150k.txt.
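Because the input layout is fixed, a shape check before inference can catch a transposed or untyped array early. The following is a minimal sketch; `as_model_input` is a hypothetical helper, not something shipped with the model.

```python
import numpy as np


def as_model_input(segments):
    """Convert [n, 512, 128] (or a single [512, 128]) into [n, 1, 512, 128] float32."""
    x = np.asarray(segments, dtype=np.float32)
    if x.ndim == 2:
        x = x[None]  # a single segment becomes a batch of one
    if x.ndim != 3 or x.shape[1:] != (512, 128):
        raise ValueError(
            f"expected [n, 512, 128] (time frames, mel bins), got {x.shape}"
        )
    # Insert the channel axis the CNN expects and make the memory contiguous.
    return np.ascontiguousarray(x[:, None, :, :])
```

A `[n, 128, 512]` array (mel bins first) is rejected rather than silently transposed, since the model expects time frames before mel bins.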
Species (output index order)
bsgbat_labels.txt lists the 22 output classes in model index order:
0 Barbastella barbastellus 11 Pipistrellus nathusii
1 Eptesicus nilssonii 12 Pipistrellus pipistrellus
2 Eptesicus serotinus 13 Pipistrellus pygmaeus
3 Hypsugo savii 14 Plecotus auritus
4 Miniopterus schreibersii 15 Plecotus austriacus
5 Myotis alcathoe 16 Rhinolophus euryale
6 Myotis crypticus 17 Rhinolophus ferrumequinum
7 Myotis daubentonii 18 Rhinolophus hipposideros
8 Nyctalus leisleri 19 Tadarida teniotis
9 Nyctalus noctula 20 Vespertilio murinus
10 Pipistrellus kuhlii 21 Background
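For quick experiments the index order can also be embedded directly. The sketch below copies the 22 names from the list above and ranks the species for one segment; `top_species` is an illustrative helper, and in practice bsgbat_labels.txt should remain the source of truth.

```python
import numpy as np

# Copied from the list above, in model output index order.
LABELS = [
    "Barbastella barbastellus", "Eptesicus nilssonii", "Eptesicus serotinus",
    "Hypsugo savii", "Miniopterus schreibersii", "Myotis alcathoe",
    "Myotis crypticus", "Myotis daubentonii", "Nyctalus leisleri",
    "Nyctalus noctula", "Pipistrellus kuhlii", "Pipistrellus nathusii",
    "Pipistrellus pipistrellus", "Pipistrellus pygmaeus", "Plecotus auritus",
    "Plecotus austriacus", "Rhinolophus euryale", "Rhinolophus ferrumequinum",
    "Rhinolophus hipposideros", "Tadarida teniotis", "Vespertilio murinus",
    "Background",
]


def top_species(probs_row, k=3):
    """Return the k highest-probability species for one segment, skipping Background."""
    probs_row = np.asarray(probs_row)
    order = np.argsort(probs_row)[::-1]  # indices from highest to lowest probability
    ranked = [
        (LABELS[i], float(probs_row[i])) for i in order if LABELS[i] != "Background"
    ]
    return ranked[:k]
```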
Ensemble
Six independently trained checkpoints are provided (bsgbat_v0.21_r1.onnx through r6.onnx), matching the original release. The authors intend them to be used as an ensemble: run all six and combine the per-class logits (or probabilities) with min, max, mean, or median. A single checkpoint also works on its own.
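The four combination rules can be sketched with NumPy as follows. `combine_ensemble` is a hypothetical helper; it assumes each checkpoint's output has already been collected as an `[n_segments, 22]` array.

```python
import numpy as np

_REDUCERS = {"min": np.min, "max": np.max, "mean": np.mean, "median": np.median}


def combine_ensemble(per_model_outputs, how="mean"):
    """Reduce a list of [n_segments, 22] arrays (one per checkpoint) to one array."""
    if how not in _REDUCERS:
        raise ValueError(f"how must be one of {sorted(_REDUCERS)}, got {how!r}")
    stacked = np.stack(per_model_outputs)  # [n_models, n_segments, 22]
    return _REDUCERS[how](stacked, axis=0)


def sigmoid(logits):
    """Per-class probability from logits."""
    return 1.0 / (1.0 + np.exp(-logits))
```

Since the sigmoid is monotonic, `min` and `max` give the same probabilities whether applied to logits or to probabilities. `mean` does not, and with six models neither does `median`, because the median of an even count averages the two middle values. It is therefore worth deciding once which of the two spaces to combine in.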
Files
| File | Description |
|---|---|
| bsgbat_v0.21_r1.onnx .. r6.onnx | The six ensemble checkpoints (FP32, ~83 MB each) |
| bsgbat_labels.txt | 22 class labels in output index order |
| original_code/ | Original BSG-BAT preprocessing and model code (for exact reproduction) |
| original_code/mel128_freq9k_150k.txt | Mel bin center frequencies |
| original_code/species21bg | Original label/index mapping |
| export_onnx.py | The script used to convert the PyTorch checkpoints to ONNX |
| SHA256SUMS | Checksums for all files |
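Downloads can be checked against SHA256SUMS before use. The sketch below assumes the file follows the usual `sha256sum` layout (hex digest, whitespace, relative path, one entry per line); that layout is an assumption and is not confirmed by this README. `verify_sums` is an illustrative helper.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so the ~83 MB models are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sums(sums_file):
    """Return {relative path: True/False} for every entry in a sha256sum-style file."""
    sums_path = Path(sums_file)
    results = {}
    for line in sums_path.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with a leading '*'
        target = sums_path.parent / name
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results
```

Calling `verify_sums("SHA256SUMS")` from the model directory returns a mapping in which any `False` entry marks a missing or altered file.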
Usage with Python (ONNX Runtime)
import numpy as np
import librosa
import onnxruntime as ort

# 1. Build the spectrogram exactly as the model expects (see original_code/data384.py)
def wav_to_segments(wavfile, ntime=512, nhop=250, nfreq=128):
    y, sr = librosa.load(wavfile, sr=384000, mono=True, res_type="kaiser_fast")
    S = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=1024, hop_length=768,
        n_mels=nfreq, fmin=9000, fmax=150000,
    ).T  # [frames, 128]
    segs = []
    # Slide a 512-frame window in 250-frame steps; trailing frames that do not
    # fill a whole window are dropped. A recording shorter than 512 frames
    # yields one short segment that the fixed-size model input will reject.
    for start in range(0, max(len(S) - ntime, 0) + 1, nhop):
        seg = np.log10(S[start:start + ntime] + 1e-6)
        seg = (seg - seg.mean()) / seg.std()  # per-segment mean/std
        seg = np.clip(seg - np.median(seg, axis=0), 0.0, 6.0)  # per-bin median, clip 0..6
        segs.append(seg.astype(np.float32))
    return np.stack(segs)  # [n, 512, 128]

segments = wav_to_segments("bat_recording_384kHz.wav")
x = segments[:, None, :, :]  # [n, 1, 512, 128]

# 2. Run the ensemble and average the logits
sessions = [
    ort.InferenceSession(f"bsgbat_v0.21_r{i}.onnx", providers=["CPUExecutionProvider"])
    for i in range(1, 7)
]
logits = np.mean([s.run(["logits"], {"spectrogram": x})[0] for s in sessions], axis=0)
probs = 1.0 / (1.0 + np.exp(-logits))  # [n, 22], multi-label

# 3. Report every species that exceeds the threshold in at least one segment
with open("bsgbat_labels.txt") as f:
    labels = [line.strip() for line in f if line.strip()]
detected = (probs > 0.5).any(axis=0)
for i, present in enumerate(detected):
    if present and labels[i] != "Background":
        print(f"{labels[i]}: max prob {probs[:, i].max():.2f}")
probs > 0.5 corresponds to the original default threshold (logit > 0). The original compute_logits.py writes per-segment logits so you can choose species-specific thresholds.
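Species-specific thresholds can be applied to the per-segment probabilities with a small lookup. This is a sketch only: `detect_species` and the threshold values in the usage note are hypothetical and are not recommendations from the BSG-BAT authors.

```python
import numpy as np


def detect_species(probs, labels, thresholds=None, default=0.5):
    """Return {species: peak probability} for species whose peak exceeds its threshold.

    probs is [n_segments, n_classes]; thresholds maps species name to a
    probability cutoff, with `default` used for species that are not listed.
    """
    thresholds = thresholds or {}
    probs = np.asarray(probs)
    hits = {}
    for i, name in enumerate(labels):
        if name == "Background":
            continue
        peak = float(probs[:, i].max())
        if peak > thresholds.get(name, default):
            hits[name] = peak
    return hits
```

For example, `detect_species(probs, labels, {"Myotis alcathoe": 0.8})` would require a higher peak probability for that one species and keep 0.5 for the rest. A probability cutoff `p` corresponds to a logit cutoff of `log(p / (1 - p))`, so the same rule can be applied to raw logits.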
Conversion details
- Source: BSG-BAT v0.21 PyTorch checkpoints (model_v0.21_r1.pt .. r6.pt).
- Exported with PyTorch torch.onnx.export (dynamo exporter, opset 18), dynamic batch axis, weights stored inline (single self-contained .onnx).
- Verified: for every checkpoint, PyTorch vs ONNX Runtime output max absolute difference < 3e-6 on random input.
- The exporter script is included as export_onnx.py. The Net definition is copied verbatim from original_code/supervised.py.
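A parity check of the kind described above can be sketched independently of either framework by comparing two callables on the same random batch. The input distribution (standard normal), batch size, and seed below are assumptions; the README only says "random input". `max_abs_diff` is an illustrative helper, not the verification code used for the release.

```python
import numpy as np


def max_abs_diff(run_a, run_b, batch=4, seed=0):
    """Largest absolute output difference between two models on one random batch.

    run_a and run_b are callables that take a [batch, 1, 512, 128] float32
    array and return a NumPy array of logits.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((batch, 1, 512, 128)).astype(np.float32)
    out_a = np.asarray(run_a(x))
    out_b = np.asarray(run_b(x))
    return float(np.max(np.abs(out_a - out_b)))
```

In practice `run_a` might wrap the PyTorch `Net` (for example `lambda x: net(torch.from_numpy(x)).detach().numpy()`) and `run_b` an ONNX Runtime session (`lambda x: session.run(["logits"], {"spectrogram": x})[0]`), with the result compared against the 3e-6 bound. Using a batch size other than 1 also exercises the dynamic batch axis.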
Audio requirements
- Sample rate: 384 kHz mono. Lower-rate recordings can be resampled to 384 kHz, but content above the source Nyquist will be absent. The model reads the 9-150 kHz band.
- Codec: lossless (WAV/FLAC). Lossy codecs (AAC, MP3, Opus) discard ultrasonic content.
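Before running the model it can help to check how much of the 9-150 kHz band a recording can actually contain. The sketch below reads only the WAV header with the standard-library `wave` module, which handles plain PCM WAV files; FLAC and other containers would need a different reader. `describe_wav` is a hypothetical helper.

```python
import wave

MODEL_FMAX_HZ = 150_000   # upper edge of the band the model reads
TARGET_RATE_HZ = 384_000  # sample rate the model expects


def describe_wav(path):
    """Summarize sample rate, channel count, and band coverage of a PCM WAV file."""
    with wave.open(str(path), "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
    nyquist = rate / 2  # highest frequency the recording can represent
    covers_band = nyquist >= MODEL_FMAX_HZ
    return {
        "sample_rate": rate,
        "channels": channels,
        "nyquist_hz": nyquist,
        "needs_resampling": rate != TARGET_RATE_HZ,
        "covers_model_band": covers_band,
        # Content above this frequency is absent even after resampling to 384 kHz.
        "missing_above_hz": None if covers_band else nyquist,
    }
```

A 250 kHz recording, for instance, has a Nyquist frequency of 125 kHz, so the 125-150 kHz part of the model's band stays empty after resampling.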
License
CC-BY-4.0, following the original BSG-BAT release. You may use, share, and adapt these models, including commercially, with attribution.
Citation and attribution
These ONNX files are a conversion of the original work:
bsg-bat team (2025). BSG-BAT (v0.21). Zenodo. https://doi.org/10.5281/zenodo.15495676
Acknowledgments
- The bsg-bat team for the original BSG-BAT model, training, and code (CC-BY-4.0).
- Conversion to ONNX by tphakala for use with BirdNET-Go and other ONNX Runtime pipelines.