BSG-BAT ONNX Models

ONNX conversions of BSG-BAT v0.21, a convolutional neural network that identifies 21 European bat species from ultrasonic audio recordings.

Unlike embedding-based bat classifiers, BSG-BAT is a standalone end-to-end CNN. It takes a precomputed log-mel spectrogram of true 384 kHz ultrasonic audio and outputs per-species logits directly. There is no BirdNET backbone and no "slow-down" reinterpretation of the sample rate: the model sees real ultrasonic frequencies (9-150 kHz).

These ONNX files are a faithful, numerically verified conversion of the original PyTorch checkpoints. They are not retrained or modified.

How it works

384 kHz mono WAV
   -> log10 mel spectrogram  (n_fft=1024, hop=768, n_mels=128, fmin=9000, fmax=150000)
   -> per-segment normalize   (mean/std, subtract per-bin median, clip 0..6)
   -> sliding window           512 frames (~1.024 s), hop 250 frames (0.5 s)
   -> CNN  [1,1,512,128] -> [1,22] logits
   -> sigmoid                  (multi-label; prob = 1/(1+exp(-logit)))
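
The timing figures in the pipeline follow directly from the hop length and sample rate. The sketch below works through that arithmetic; the constant and function names are illustrative and do not come from the original code, and it counts only full 512-frame windows.

```python
# Illustrative helper names; only the numeric constants come from the pipeline above.
SAMPLE_RATE = 384_000      # Hz
HOP_SAMPLES = 768          # STFT hop length in samples
WINDOW_FRAMES = 512        # spectrogram frames per model input
WINDOW_HOP_FRAMES = 250    # frames between consecutive windows


def frame_seconds():
    """Duration of one spectrogram frame: 768 / 384000 = 2 ms."""
    return HOP_SAMPLES / SAMPLE_RATE


def window_start_seconds(n_frames):
    """Start time (s) of every full 512-frame window in an n_frames spectrogram."""
    last_start = n_frames - WINDOW_FRAMES
    if last_start < 0:
        return []  # too short for even one full window
    step = frame_seconds()
    return [start * step for start in range(0, last_start + 1, WINDOW_HOP_FRAMES)]


if __name__ == "__main__":
    print(f"frame: {frame_seconds() * 1000:.1f} ms")
    print(f"window: {WINDOW_FRAMES * frame_seconds():.3f} s")
    print(f"windows in 5001 frames: {len(window_start_seconds(5001))}")
```

With librosa's default centered framing, a 10 s recording gives 1 + 3,840,000 // 768 = 5001 frames, so consecutive windows start roughly 0.5 s apart and overlap by about half.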

Model architecture

Input: spectrogram [batch, 1, 512, 128] float32 (1 channel, 512 time frames, 128 mel bins)
Output: logits [batch, 22] float32 (21 species + Background)
Activation: sigmoid per class (multi-label, trained with BCEWithLogits)
Backbone: 6x Conv2d + max-pool, 3x fully connected (see original_code/supervised.py)

The spectrogram must be produced exactly as in original_code/data384.py (wav2spectrograms). The mel bin center frequencies are provided in original_code/mel128_freq9k_150k.txt.
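
When inspecting a spectrogram it can help to know which of the 128 mel bins covers a given call frequency. The following is a minimal sketch that assumes mel128_freq9k_150k.txt holds whitespace-separated center frequencies in Hz, in ascending order; the file layout and units are assumptions, so check the file before relying on this. The function names are illustrative.

```python
import bisect


def load_bin_centers(path):
    """Read center frequencies, assuming whitespace-separated numbers in Hz."""
    with open(path) as fh:
        return [float(token) for token in fh.read().split()]


def nearest_bin(centers, freq_hz):
    """Index of the bin whose center is closest to freq_hz (centers ascending)."""
    pos = bisect.bisect_left(centers, freq_hz)
    if pos == 0:
        return 0
    if pos == len(centers):
        return len(centers) - 1
    before, after = centers[pos - 1], centers[pos]
    # Ties go to the lower bin.
    return pos if after - freq_hz < freq_hz - before else pos - 1
```

For example, `nearest_bin(load_bin_centers("original_code/mel128_freq9k_150k.txt"), 45000)` would return the index along the last input axis nearest to 45 kHz, if the file matches the assumed layout.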

Species (output index order)

bsgbat_labels.txt lists the 22 output classes in model index order:

0  Barbastella barbastellus      11 Pipistrellus nathusii
1  Eptesicus nilssonii           12 Pipistrellus pipistrellus
2  Eptesicus serotinus           13 Pipistrellus pygmaeus
3  Hypsugo savii                 14 Plecotus auritus
4  Miniopterus schreibersii      15 Plecotus austriacus
5  Myotis alcathoe               16 Rhinolophus euryale
6  Myotis crypticus              17 Rhinolophus ferrumequinum
7  Myotis daubentonii            18 Rhinolophus hipposideros
8  Nyctalus leisleri             19 Tadarida teniotis
9  Nyctalus noctula              20 Vespertilio murinus
10 Pipistrellus kuhlii           21 Background
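
To turn one row of logits into named predictions, pair each output index with the label on the same line of bsgbat_labels.txt. This is a sketch that assumes the file holds one label per line, spelled as in the table above; `top_species` and `sigmoid` are illustrative helpers, not part of the original code.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def load_labels(path="bsgbat_labels.txt"):
    """Read labels, assuming one per line in output index order."""
    with open(path) as fh:
        return [line.strip() for line in fh if line.strip()]


def top_species(logits, labels, k=3):
    """Return the k most probable (label, probability) pairs, skipping Background."""
    scored = [
        (label, sigmoid(z))
        for label, z in zip(labels, logits)
        if label != "Background"
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

Because the model is multi-label, the probabilities do not sum to 1; several species can score highly for the same segment.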

Ensemble

Six independently trained checkpoints are provided (bsgbat_v0.21_r1.onnx through bsgbat_v0.21_r6.onnx), matching the original release. The authors intend them to be used as an ensemble: run all six and combine the per-class logits (or probabilities) with min, max, mean, or median. A single checkpoint also works on its own.
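
The four combination rules can be sketched with NumPy as below. `combine_logits` and `REDUCERS` are illustrative names, and the input is assumed to be a list of per-checkpoint logit arrays of identical shape [n_segments, 22].

```python
import numpy as np

# Map each rule name to the NumPy reduction that implements it.
REDUCERS = {"min": np.min, "max": np.max, "mean": np.mean, "median": np.median}


def combine_logits(per_model_logits, how="mean"):
    """Combine per-checkpoint logits [n_segments, n_classes] across checkpoints."""
    if how not in REDUCERS:
        raise ValueError(f"unknown rule {how!r}; expected one of {sorted(REDUCERS)}")
    stacked = np.stack(per_model_logits)  # [n_models, n_segments, n_classes]
    return REDUCERS[how](stacked, axis=0)
```

Because the sigmoid is monotonic, min and max give the same result whether applied to logits or to probabilities. The mean generally does not, and neither does the median of an even number of checkpoints, which averages the two middle values.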

Files

File	Description
`bsgbat_v0.21_r1.onnx` .. `r6.onnx`	The six ensemble checkpoints (FP32, ~83 MB each)
`bsgbat_labels.txt`	22 class labels in output index order
`original_code/`	Original BSG-BAT preprocessing and model code (for exact reproduction)
`original_code/mel128_freq9k_150k.txt`	Mel bin center frequencies
`original_code/species21bg`	Original label/index mapping
`export_onnx.py`	The script used to convert the PyTorch checkpoints to ONNX
`SHA256SUMS`	Checksums for all files
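
Downloaded files can be checked against SHA256SUMS before use. The sketch below assumes the file uses the common `sha256sum` layout (hex digest, whitespace, file name, one entry per line) with paths relative to the SHA256SUMS file; that layout is an assumption, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sums(sums_file):
    """Return the names of listed files that are missing or do not match."""
    sums_path = Path(sums_file)
    failed = []
    for line in sums_path.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # sha256sum prefixes binary-mode names with '*'
        target = sums_path.parent / name
        if not target.is_file() or sha256_of(target) != expected.lower():
            failed.append(name)
    return failed
```

An empty list from `verify_sums("SHA256SUMS")` means every listed file was found and matched.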

Usage with Python (ONNX Runtime)

import numpy as np
import librosa
import onnxruntime as ort

# 1. Build the spectrogram exactly as the model expects (see original_code/data384.py)
def wav_to_segments(wavfile, ntime=512, nhop=250, nfreq=128):
    y, sr = librosa.load(wavfile, sr=384000, mono=True, res_type="kaiser_fast")
    S = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=1024, hop_length=768,
        n_mels=nfreq, fmin=9000, fmax=150000,
    ).T
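    if len(S) < ntime:
        # The model input is fixed at ntime frames, so a shorter clip cannot fill a window.
        raise ValueError(f"need at least {ntime} frames, got {len(S)}")
    segs = []
    for start in range(0, len(S) - ntime + 1, nhop):
        seg = np.log10(S[start:start + ntime] + 1e-6)            # log10 power
        seg = (seg - seg.mean()) / seg.std()                     # per-segment mean/std
        seg = np.clip(seg - np.median(seg, axis=0), 0.0, 6.0)    # per-bin median, clip 0..6
        segs.append(seg.astype(np.float32))
    return np.stack(segs)  # [n, 512, 128]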

segments = wav_to_segments("bat_recording_384kHz.wav")
x = segments[:, None, :, :]  # [n, 1, 512, 128]

# 2. Run the ensemble and average the logits
sessions = [ort.InferenceSession(f"bsgbat_v0.21_r{i}.onnx",
                                 providers=["CPUExecutionProvider"]) for i in range(1, 7)]
logits = np.mean([s.run(["logits"], {"spectrogram": x})[0] for s in sessions], axis=0)
probs = 1.0 / (1.0 + np.exp(-logits))  # [n, 22], multi-label

with open("bsgbat_labels.txt") as fh:
    labels = [line.strip() for line in fh if line.strip()]
detected = (probs > 0.5).any(axis=0)
for i, present in enumerate(detected):
    if present and labels[i] != "Background":
        print(f"{labels[i]}: max prob {probs[:, i].max():.2f}")

A threshold of probs > 0.5 is equivalent to the original default of logit > 0, because sigmoid(0) = 0.5. The original compute_logits.py writes per-segment logits, so you can apply species-specific thresholds instead.
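
Species-specific thresholds can be applied directly to logits by converting each probability cutoff with the inverse sigmoid. This is a minimal sketch: the `detect` helper and any threshold values passed to it are hypothetical and are not taken from the original release.

```python
import math


def prob_to_logit(p):
    """Inverse sigmoid: the logit corresponding to probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def detect(segment_logits, labels, thresholds, default=0.5):
    """Labels whose logit exceeds their probability cutoff in at least one segment.

    segment_logits: rows of per-class logits, one row per segment.
    thresholds: hypothetical per-label probability cutoffs; others use `default`.
    """
    hits = []
    for i, label in enumerate(labels):
        if label == "Background":
            continue
        cutoff = prob_to_logit(thresholds.get(label, default))
        if any(row[i] > cutoff for row in segment_logits):
            hits.append(label)
    return hits
```

For instance, `detect(logits, labels, {"Myotis daubentonii": 0.8})` would require a logit above roughly 1.39 for that species while keeping logit > 0 for the rest; the 0.8 is an arbitrary illustration, not a recommended value.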

Conversion details

Source: BSG-BAT v0.21 PyTorch checkpoints (model_v0.21_r1.pt .. r6.pt).
Exported with PyTorch torch.onnx.export (dynamo exporter, opset 18), dynamic batch axis, weights stored inline (single self-contained .onnx).
Verified: for every checkpoint, PyTorch vs ONNX Runtime output max absolute difference < 3e-6 on random input.
The exporter script is included as export_onnx.py. The Net definition is copied verbatim from original_code/supervised.py.
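
A parity check of this kind can be reproduced by comparing two output arrays elementwise. The sketch below shows only the comparison step and is not taken from export_onnx.py; `reference` would hold the PyTorch output and `candidate` the ONNX Runtime output for the same input, and both function names are illustrative.

```python
import numpy as np


def max_abs_diff(reference, candidate):
    """Largest elementwise absolute difference between two same-shaped arrays."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    return float(np.max(np.abs(reference - candidate)))


def outputs_match(reference, candidate, tol=3e-6):
    """True if the two outputs agree to within tol (default from the figure above)."""
    return max_abs_diff(reference, candidate) < tol
```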

Audio requirements

Sample rate: 384 kHz mono. Lower-rate recordings can be resampled to 384 kHz, but content above the source Nyquist will be absent. The model reads the 9-150 kHz band.
Codec: lossless (WAV/FLAC). Lossy codecs (AAC, MP3, Opus) discard ultrasonic content.
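
A quick header check can catch unsuitable recordings before any spectrogram is computed. This sketch uses Python's standard `wave` module, which reads PCM WAV files only, so FLAC and some WAV variants need a different reader; `check_wav` is an illustrative helper.

```python
import wave

REQUIRED_RATE = 384_000   # Hz, the rate the model expects
BAND_TOP_HZ = 150_000     # upper edge of the band the model reads


def check_wav(path):
    """Return a list of problems found in the WAV header; empty means none found."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    problems = []
    if channels != 1:
        problems.append(f"{channels} channels: mix down to mono")
    if rate != REQUIRED_RATE:
        problems.append(f"sample rate {rate} Hz: resample to {REQUIRED_RATE} Hz")
    if rate / 2 < BAND_TOP_HZ:
        problems.append(
            f"Nyquist {rate / 2:.0f} Hz is below {BAND_TOP_HZ} Hz: upper bins will be empty"
        )
    return problems
```

A 250 kHz recording, for example, has a Nyquist frequency of 125 kHz, so after resampling the bins between 125 and 150 kHz carry no signal.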

License

CC-BY-4.0, following the original BSG-BAT release. You may use, share, and adapt these models, including commercially, with attribution.

Citation and attribution

These ONNX files are a conversion of the original work:

bsg-bat team (2025). BSG-BAT (v0.21). Zenodo. https://doi.org/10.5281/zenodo.15495676

Acknowledgments

The bsg-bat team for the original BSG-BAT model, training, and code (CC-BY-4.0).
Conversion to ONNX by tphakala for use with BirdNET-Go and other ONNX Runtime pipelines.
