🐱 Cat Distress Detection

Binary classifier that detects distress in cat vocalisations, distinguishing isolation meows (distress) from brushing and food-anticipation meows (normal). Every prediction returns a label together with the predicted probability of distress.

Dataset

CatMeows — Ntalampiras et al. (2019). 440 recordings from 21 cats (10 Maine Coon, 11 European Shorthair). Recorded via Bluetooth collar microphone at 8 kHz.

Context	Files	Label
Brushing	127	normal (0)
Isolation	221	distress (1)
Food anticipation	92	normal (0)
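
As a quick sanity check on the label mapping, the sketch below collapses the three contexts into the two classes and counts files per class. The dictionary keys are illustrative names chosen here, not identifiers from the dataset or the training notebook.

```python
# Context -> binary label, following the table above.
# The key names are illustrative; the dataset itself may encode contexts differently.
CONTEXT_TO_LABEL = {"brushing": 0, "isolation": 1, "food_anticipation": 0}
FILE_COUNTS = {"brushing": 127, "isolation": 221, "food_anticipation": 92}


def class_counts(file_counts, mapping):
    """Sum the number of files that fall into each binary class."""
    counts = {0: 0, 1: 0}
    for context, n_files in file_counts.items():
        counts[mapping[context]] += n_files
    return counts


counts = class_counts(FILE_COUNTS, CONTEXT_TO_LABEL)
total = sum(counts.values())
print(counts, f"distress share = {counts[1] / total:.1%}")
```

With these counts the two classes are almost balanced (219 normal vs. 221 distress).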

Preprocessing

Applied prior to feature extraction:

Native sample rate preserved at 8 kHz (hardware Nyquist = 4 kHz — no resampling)
DC offset removal
High-pass filter: 100 Hz, 5th-order Butterworth, zero-phase
Padded to 2.5 s with trailing zeros
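
The steps above could be sketched as follows. This is a minimal illustration using NumPy and SciPy, not the code from the training notebook: the function name `preprocess` is hypothetical, the step order simply follows the list, and trimming clips longer than 2.5 s is an assumption taken from the usage snippet further down.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

SR = 8000             # native sample rate of the recordings
TARGET_SECONDS = 2.5  # fixed clip length


def preprocess(y, sr=SR):
    """DC removal, zero-phase 100 Hz high-pass, then pad/trim to 2.5 s."""
    y = np.asarray(y, dtype=np.float64)

    # 1. Remove DC offset.
    y = y - y.mean()

    # 2. 5th-order Butterworth high-pass at 100 Hz, applied forwards and
    #    backwards (sosfiltfilt) so the result has zero phase shift.
    sos = butter(5, 100, btype="highpass", fs=sr, output="sos")
    y = sosfiltfilt(sos, y)

    # 3. Trim (assumption) and pad with trailing zeros to the target length.
    n_target = int(sr * TARGET_SECONDS)
    y = y[:n_target]
    return np.pad(y, (0, n_target - len(y)))
```

Filtering before padding keeps the appended tail exactly zero. Forward-backward filtering also applies the magnitude response twice, so the effective roll-off is steeper than that of a single 5th-order pass.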

Features (81 total)

Group	Count	Description
MFCCs	52	13 coefficients × (mean, std) + delta × (mean, std)
Spectral	20	centroid, bandwidth, rolloff, flatness, contrast (4 bands), ZCR
Spectral entropy	2	mean + std — captures tonal vs. noisy signal structure
Temporal	4	RMS mean/std, onset rate, temporal centroid
Pitch (F0)	3	mean, std, voiced ratio via pyin
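
To illustrate what the spectral entropy feature measures, here is a NumPy-only sketch. The frame length, hop size, Hann window and normalisation to [0, 1] are assumptions made for this example; the training notebook may compute the feature differently.

```python
import numpy as np


def spectral_entropy(y, frame_len=256, hop=128, eps=1e-12):
    """Per-frame spectral entropy, normalised to roughly [0, 1].

    Low values mean energy is concentrated in a few frequency bins (tonal);
    high values mean energy is spread across the spectrum (noisy).
    """
    y = np.asarray(y, dtype=np.float64)
    n_frames = max(0, 1 + (len(y) - frame_len) // hop)
    window = np.hanning(frame_len)
    entropy = np.empty(n_frames)
    for i in range(n_frames):
        frame = y[i * hop : i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        p = power / (power.sum() + eps)  # treat the spectrum as a distribution
        entropy[i] = -np.sum(p * np.log2(p + eps)) / np.log2(len(p))
    return entropy


sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)                   # tonal signal
noise = np.random.default_rng(0).standard_normal(sr)  # noisy signal

h_tone, h_noise = spectral_entropy(tone), spectral_entropy(noise)
# The two summary statistics used as features: mean and std over frames.
print(h_tone.mean(), h_tone.std(), h_noise.mean(), h_noise.std())
```

The tone yields a much lower mean entropy than the white noise, which is the tonal vs. noisy contrast the feature is meant to capture.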

Model

LightGBM classifier, hyperparameters tuned via Optuna (100 trials). Decision threshold optimised for F2 score (recall weighted 2× over precision) — prioritises catching distress over avoiding false alarms, appropriate for welfare monitoring.
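
The threshold selection can be sketched in plain Python. This is an illustration of F2-based thresholding in general, not the notebook's code: the function names are hypothetical, and using every distinct predicted probability as a candidate threshold is an assumption.

```python
def fbeta(y_true, y_pred, beta=2.0):
    """F-beta score: beta > 1 weights recall more heavily than precision."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_threshold(y_true, proba, beta=2.0):
    """Return (threshold, score) maximising F-beta over observed probabilities."""
    best_t, best_score = None, -1.0
    for t in sorted(set(proba)):
        y_pred = [1 if p >= t else 0 for p in proba]
        score = fbeta(y_true, y_pred, beta)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Toy data, invented for illustration only.
y_true = [0, 0, 1, 0, 1, 1]
proba = [0.05, 0.10, 0.15, 0.30, 0.60, 0.90]
threshold, score = best_threshold(y_true, proba)
print(threshold, score)
```

On the toy data the chosen threshold is 0.15: it accepts one false alarm in order to keep all three distress cases, which is the trade-off F2 rewards.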

Evaluation

Strategy: Leave-one-cat-out (LOCO) cross-validation — trained on 20 cats, evaluated on the held-out cat, repeated for all evaluable cats. This tests generalisation to cats never seen during training.
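
A leave-one-cat-out split can be sketched in plain Python as below. The function name and the toy cat IDs are hypothetical. Cats whose recordings all share one label are skipped, on the assumption that this is what makes a cat non-evaluable (AUC is undefined with a single class).

```python
from collections import defaultdict


def loco_splits(cat_ids, labels):
    """Yield (held_out_cat, train_idx, test_idx) for every evaluable cat."""
    by_cat = defaultdict(list)
    for i, cat in enumerate(cat_ids):
        by_cat[cat].append(i)

    for cat in sorted(by_cat):
        test_idx = by_cat[cat]
        # Skip cats with only one class in their recordings.
        if len({labels[i] for i in test_idx}) < 2:
            continue
        train_idx = [i for i, c in enumerate(cat_ids) if c != cat]
        yield cat, train_idx, test_idx


# Toy example: cat "C" has only distress recordings, so it is never held out,
# but its recordings still appear in the training folds of the other cats.
cat_ids = ["A", "A", "B", "B", "C"]
labels = [0, 1, 0, 1, 1]
splits = list(loco_splits(cat_ids, labels))
for cat, train_idx, test_idx in splits:
    print(cat, train_idx, test_idx)
```

scikit-learn's `LeaveOneGroupOut` provides the same grouping logic, without the single-class check.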

Metric	Value
Mean LOCO AUC	0.832 ± 0.14
Min LOCO AUC	0.5
Max LOCO AUC	1.0
Cats with AUC ≥ 0.80	11 / 15
Decision threshold	0.135
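
Summary statistics of this kind can be derived from per-cat AUC values as sketched below. The helper names are hypothetical, the AUC is computed by direct pairwise comparison, and the use of the population standard deviation is an assumption, since the card does not say which estimator was used.

```python
import statistics


def auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined when only one class is present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarise(per_cat_auc, cutoff=0.80):
    """Aggregate per-cat AUCs into the statistics shown in the table."""
    values = list(per_cat_auc.values())
    return {
        "mean": statistics.mean(values),
        "std": statistics.pstdev(values),  # assumption: population std
        "min": min(values),
        "max": max(values),
        "n_at_or_above_cutoff": sum(v >= cutoff for v in values),
        "n_cats": len(values),
    }


# Toy values, invented for illustration only.
example = summarise({"cat_a": 0.5, "cat_b": 1.0, "cat_c": 0.9})
print(example)
```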

Known limitations:

21 cats is a small population, so high cat-to-cat variability in AUC (std = 0.14) is expected
5 cats could not be evaluated (only one context recorded): BRI01, CLE01, IND01, JJX01, LEO01
Lower performance for cats with few recordings or unusual vocal styles: REG01, WHO01, SPI01, CAN01
Recorded via collar microphone at 8 kHz — may not generalise to other recording setups

Usage

import joblib
import librosa
import numpy as np

# Load model artefacts
art = joblib.load("cat_distress_model_tuned.joblib")

# Load audio (must be 8 kHz — same preprocessing as training)
y, sr = librosa.load("your_cat_meow.wav", sr=8000, mono=True)

# Pad/trim to 2.5 s
n_samples = int(8000 * 2.5)
y = np.pad(y, (0, max(0, n_samples - len(y))))[:n_samples]

# Extract features using extract_features() from the training notebook
feats = extract_features(y, sr=art['native_sr'])
X = np.array([feats.get(c, 0.0) for c in art['feature_cols']]).reshape(1, -1)
X = art['scaler'].transform(X)

# Predict
proba = art['model'].predict_proba(X)[0, 1]
label = 'distress' if proba >= art['threshold'] else 'normal'

print(f"Prediction  : {label}")
print(f"Confidence  : {proba:.1%}")
print(f"Threshold   : {art['threshold']:.3f}")

Example outputs:

Prediction  : distress
Confidence  : 84.3%
Threshold   : 0.135

Prediction  : normal
Confidence  : 12.7%
Threshold   : 0.135

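Interpreting confidence: The value printed as "Confidence" is the predicted probability of distress, not the model's confidence in whichever label it chose, so a "normal" prediction shows a low value. Values close to the threshold (e.g. 0.10–0.20) indicate the model is uncertain. Values above 0.60 are high-confidence distress detections.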
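
These rules of thumb can be wrapped in a small helper. The function name is hypothetical, the band edges are taken from the paragraph above, and the fallback string for probabilities outside the described ranges is a placeholder introduced here.

```python
def interpret(proba, threshold=0.135, uncertain_band=(0.10, 0.20), high_conf=0.60):
    """Return (label, note) for a predicted probability of distress."""
    label = "distress" if proba >= threshold else "normal"
    low, high = uncertain_band
    if low <= proba <= high:
        note = "uncertain"
    elif proba > high_conf:
        note = "high-confidence"
    else:
        note = "no guidance given"  # placeholder: range not characterised above
    return label, note


# The two example outputs shown earlier:
print(interpret(0.843))
print(interpret(0.127))
```

The second example output (12.7%) is labelled normal but falls inside the uncertain band, so it would be a natural candidate for a second look.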

References

Ntalampiras et al. (2019). Automatic Classification of Cat Vocalizations Emitted in Different Contexts. Animals 9(8):543. https://doi.org/10.3390/ani9080543
CatMeows dataset: https://zenodo.org/records/4008297
