🐱 Cat Distress Detection

Binary classifier that detects distress in cat vocalisations, distinguishing isolation meows (distress) from brushing and food-anticipation meows (normal). Returns both a label and a confidence probability for every prediction.

Dataset

CatMeows (Ntalampiras et al., 2019): 440 recordings from 21 cats (10 Maine Coon, 11 European Shorthair), recorded via a Bluetooth collar microphone at 8 kHz.

Context             Files   Label
Brushing            127     normal (0)
Isolation           221     distress (1)
Food anticipation   92      normal (0)

Preprocessing

Applied prior to feature extraction:

  • Native sample rate preserved at 8 kHz (hardware Nyquist = 4 kHz, so no resampling)
  • DC offset removal
  • High-pass filter: 100 Hz, 5th-order Butterworth, zero-phase
  • Padded to 2.5 s with trailing zeros
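
The steps above can be sketched with NumPy and SciPy as follows. This is a minimal illustration, not the training code: it assumes the steps run in the order listed, that "zero-phase" means forward-backward filtering (`sosfiltfilt`), and that clips longer than 2.5 s are trimmed, as the usage snippet further down does. The function name `preprocess` is hypothetical.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

SR = 8000         # native sample rate stated above
TARGET_S = 2.5    # target clip length stated above


def preprocess(y, sr=SR, cutoff_hz=100.0, order=5, target_s=TARGET_S):
    """Hypothetical helper: DC removal, zero-phase high-pass, pad/trim."""
    y = np.asarray(y, dtype=np.float64)

    # 1. DC offset removal: subtract the mean of the clip.
    y = y - y.mean()

    # 2. 5th-order Butterworth high-pass at 100 Hz, applied forward and
    #    backward so the net phase shift is zero (assumed meaning of
    #    "zero-phase").
    sos = butter(order, cutoff_hz, btype="highpass", fs=sr, output="sos")
    y = sosfiltfilt(sos, y)

    # 3. Pad with trailing zeros to 2.5 s (trim if the clip is longer).
    n_samples = int(sr * target_s)
    y = np.pad(y, (0, max(0, n_samples - len(y))))[:n_samples]
    return y


# Example: a 1 s, 1 kHz tone riding on a constant offset of 0.5.
t = np.arange(SR) / SR
clip = 0.5 + np.sin(2 * np.pi * 1000 * t)
out = preprocess(clip)
```

Filtering before padding keeps the trailing zeros exactly zero, so the padded region carries no filter transient.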

Features (81 total)

Group              Count   Description
MFCCs              52      13 coefficients × (mean, std) + 13 deltas × (mean, std)
Spectral           20      centroid, bandwidth, rolloff, flatness, contrast (4 bands), ZCR
Spectral entropy   2       mean + std; captures tonal vs. noisy signal structure
Temporal           4       RMS mean/std, onset rate, temporal centroid
Pitch (F0)         3       mean, std, voiced ratio via pyin
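
As an illustration of one feature group, the sketch below computes a per-frame spectral entropy and summarises it as mean and std. It is one common formulation, not necessarily the one used in the training notebook: the frame length, hop size, Hann window, and normalisation to [0, 1] are all assumptions, and `spectral_entropy_stats` is a hypothetical name.

```python
import numpy as np


def spectral_entropy_stats(y, frame_len=256, hop=128, eps=1e-12):
    """Return (mean, std) of normalised per-frame spectral entropy."""
    y = np.asarray(y, dtype=np.float64)
    if len(y) < frame_len:
        # Make sure at least one full frame exists.
        y = np.pad(y, (0, frame_len - len(y)))

    n_frames = 1 + (len(y) - frame_len) // hop
    window = np.hanning(frame_len)
    entropies = np.empty(n_frames)

    for i in range(n_frames):
        frame = y[i * hop : i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Treat the power spectrum as a probability distribution.
        p = power / (power.sum() + eps)
        h = -np.sum(p * np.log2(p + eps))      # Shannon entropy in bits
        entropies[i] = h / np.log2(len(p))     # scale to roughly [0, 1]

    return float(entropies.mean()), float(entropies.std())


# A pure tone concentrates energy in a few bins (low entropy);
# white noise spreads it across the spectrum (high entropy).
sr = 8000
t = np.arange(sr) / sr
tone_mean, tone_std = spectral_entropy_stats(np.sin(2 * np.pi * 1000 * t))
rng = np.random.default_rng(0)
noise_mean, noise_std = spectral_entropy_stats(rng.standard_normal(sr))
```

Note that frames made entirely of padding zeros score an entropy of zero, so heavily padded clips pull the mean down.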

Model

LightGBM classifier with hyperparameters tuned via Optuna (100 trials). The decision threshold is optimised for F2 score (recall weighted 2× over precision), which prioritises catching distress over avoiding false alarms, as is appropriate for welfare monitoring.
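
A minimal sketch of F2-based threshold selection is shown below. The model card does not say which predictions the threshold was tuned on or how candidates were searched, so the exhaustive sweep over observed probabilities and the names `fbeta` and `best_threshold` are assumptions. The score used is F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP) with beta = 2.

```python
import numpy as np


def fbeta(y_true, y_pred, beta=2.0):
    """F-beta score from binary labels and binary predictions."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    return 0.0 if denom == 0 else float((1 + beta**2) * tp / denom)


def best_threshold(y_true, proba, beta=2.0):
    """Try every observed probability as a threshold and keep the best."""
    y_true = np.asarray(y_true)
    proba = np.asarray(proba, dtype=np.float64)
    candidates = np.unique(proba)
    scores = [fbeta(y_true, (proba >= t).astype(int), beta) for t in candidates]
    i = int(np.argmax(scores))
    return float(candidates[i]), scores[i]


# Toy data (invented for illustration): 1 = distress, 0 = normal.
y_true = [0, 0, 1, 1, 1, 0]
proba = [0.05, 0.30, 0.20, 0.70, 0.90, 0.10]
threshold, score = best_threshold(y_true, proba)
print(threshold, score)  # 0.2 0.9375
```

Because false negatives are weighted four times as heavily as false positives in the denominator, the selected threshold tends to sit well below 0.5, which is consistent with the low threshold reported in the evaluation.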

Evaluation

Strategy: leave-one-cat-out (LOCO) cross-validation. The model is trained on 20 cats and evaluated on the held-out cat, repeated for all evaluable cats. This tests generalisation to cats never seen during training.
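
The splitting scheme can be sketched as below. It is an illustration under assumptions: a cat is treated as evaluable only if its recordings contain both labels (AUC is undefined otherwise), and the cat IDs in the example are invented. `loco_splits` is a hypothetical helper, not a function from the training notebook.

```python
import numpy as np


def loco_splits(cat_ids, labels):
    """Yield (cat, train_idx, test_idx) for each evaluable held-out cat."""
    cat_ids = np.asarray(cat_ids)
    labels = np.asarray(labels)
    for cat in np.unique(cat_ids):
        test_mask = cat_ids == cat
        # AUC needs both classes in the held-out set; skip cats with one.
        if np.unique(labels[test_mask]).size < 2:
            continue
        # Train on every recording from all other cats.
        yield cat, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Toy example with invented cat IDs: only "A" has both labels.
cat_ids = ["A", "A", "B", "B", "C"]
labels = [0, 1, 1, 1, 0]
splits = [(c, tr.tolist(), te.tolist()) for c, tr, te in loco_splits(cat_ids, labels)]
print(splits)  # [('A', [2, 3, 4], [0, 1])]
```

Cats that are skipped as test sets still contribute to training in every other fold.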

Metric                 Value
Mean LOCO AUC          0.832 ± 0.14
Min LOCO AUC           0.5
Max LOCO AUC           1.0
Cats with AUC ≥ 0.80   11 / 15
Decision threshold     0.135

Known limitations:

  • 21 cats is a small population, so high individual variance (std = 0.14) is expected
  • 5 cats could not be evaluated (only one context recorded): BRI01, CLE01, IND01, JJX01, LEO01
  • Lower performance for cats with few recordings or unusual vocal styles: REG01, WHO01, SPI01, CAN01
  • Recorded via collar microphone at 8 kHz, so the model may not generalise to other recording setups

Usage

import joblib
import librosa
import numpy as np

# Load model artefacts
art = joblib.load("cat_distress_model_tuned.joblib")

# Load audio (must be 8 kHz, same preprocessing as training)
y, sr = librosa.load("your_cat_meow.wav", sr=8000, mono=True)

# Pad/trim to 2.5 s
n_samples = int(8000 * 2.5)
y = np.pad(y, (0, max(0, n_samples - len(y))))[:n_samples]

# Extract features using extract_features() from the training notebook
feats = extract_features(y, sr=art['native_sr'])
X = np.array([feats.get(c, 0.0) for c in art['feature_cols']]).reshape(1, -1)
X = art['scaler'].transform(X)

# Predict
proba = art['model'].predict_proba(X)[0, 1]
label = 'distress' if proba >= art['threshold'] else 'normal'

print(f"Prediction  : {label}")
print(f"Confidence  : {proba:.1%}")
print(f"Threshold   : {art['threshold']:.3f}")

Example outputs:

Prediction  : distress
Confidence  : 84.3%
Threshold   : 0.135

Prediction  : normal
Confidence  : 12.7%
Threshold   : 0.135

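Interpreting confidence: The reported confidence is the predicted probability of distress, not the probability of whichever label was chosen, so a "normal" prediction always shows a value below the threshold. Values close to the threshold (e.g. 0.10–0.20) indicate the model is uncertain. Values above 0.60 are high-confidence distress detections.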
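
The guidance above could be wrapped in a small helper like the one below. This is only a sketch: the function name `describe_prediction` is hypothetical, the band edges are taken from the example figures in the text, and probabilities outside the two named bands get no note because the text gives no guidance for them.

```python
def describe_prediction(proba, threshold=0.135,
                        uncertain_band=(0.10, 0.20), high_conf=0.60):
    """Return (label, note) for a predicted distress probability."""
    label = "distress" if proba >= threshold else "normal"
    low, high = uncertain_band
    if low <= proba <= high:
        note = "uncertain"                   # close to the threshold
    elif proba > high_conf:
        note = "high-confidence distress"
    else:
        note = None                          # no guidance stated for this range
    return label, note


print(describe_prediction(0.843))  # ('distress', 'high-confidence distress')
print(describe_prediction(0.127))  # ('normal', 'uncertain')
print(describe_prediction(0.40))   # ('distress', None)
```

Applied to the two example outputs, the first is a high-confidence detection while the second falls inside the uncertain band despite its "normal" label.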

References
