🐱 Cat Distress Detection
Binary classifier that detects distress in cat vocalisations, distinguishing isolation meows (distress) from brushing and food-anticipation meows (normal). Returns both a label and a confidence probability for every prediction.
Dataset
CatMeows (Ntalampiras et al., 2019): 440 recordings from 21 cats (10 Maine Coon, 11 European Shorthair), recorded via Bluetooth collar microphone at 8 kHz.
| Context | Files | Label |
|---|---|---|
| Brushing | 127 | normal (0) |
| Isolation | 221 | distress (1) |
| Food anticipation | 92 | normal (0) |
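The label mapping in the table can be written down as a small lookup, which also makes the class balance explicit. This is only a sketch: the dictionary name and the context keys are illustrative and are not identifiers taken from the training notebook.

```python
# Context -> (file count, binary label), copied from the table above.
# CONTEXTS and its keys are illustrative names, not part of the released code.
CONTEXTS = {
    "brushing": (127, 0),           # normal
    "isolation": (221, 1),          # distress
    "food_anticipation": (92, 0),   # normal
}

total = sum(count for count, _ in CONTEXTS.values())
n_distress = sum(count for count, label in CONTEXTS.values() if label == 1)
n_normal = total - n_distress

print(f"total={total}, distress={n_distress}, normal={n_normal}")
print(f"distress share={n_distress / total:.1%}")
```

Merging the two non-isolation contexts into one class leaves the two classes close to balanced (221 distress vs. 219 normal recordings).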
Preprocessing
Applied prior to feature extraction:
- Native sample rate preserved at 8 kHz (hardware Nyquist = 4 kHz, so no resampling)
- DC offset removal
- High-pass filter: 100 Hz, 5th-order Butterworth, zero-phase
- Padded to 2.5 s with trailing zeros
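The preprocessing steps listed above could be sketched roughly as follows. This is a minimal illustration assuming SciPy is available; the function name `preprocess` is hypothetical, the step order simply follows the list, and trimming clips longer than 2.5 s mirrors the Usage snippet rather than anything stated here. The training notebook may differ in detail.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

SR = 8000              # native sample rate (Hz)
TARGET_SECONDS = 2.5   # fixed clip length

def preprocess(y, sr=SR):
    """Hypothetical helper: DC removal, zero-phase high-pass, pad/trim."""
    y = np.asarray(y, dtype=np.float64)

    # 1. DC offset removal: subtract the mean of the clip.
    y = y - np.mean(y)

    # 2. 5th-order Butterworth high-pass at 100 Hz.
    #    Forward-backward filtering (sosfiltfilt) gives zero phase shift.
    sos = butter(5, 100, btype="highpass", fs=sr, output="sos")
    y = sosfiltfilt(sos, y)

    # 3. Pad with trailing zeros to 2.5 s (assumption: longer clips are trimmed).
    n_target = int(sr * TARGET_SECONDS)
    if len(y) < n_target:
        y = np.pad(y, (0, n_target - len(y)))
    return y[:n_target]

# Example: a 1 s, 440 Hz tone with a DC offset of 0.5
t = np.arange(SR) / SR
clip = preprocess(0.5 + np.sin(2 * np.pi * 440 * t))
print(clip.shape)
```

Because the filter is applied forwards and backwards, the magnitude response is that of the 5th-order filter applied twice.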
Features (81 total)
| Group | Count | Description |
|---|---|---|
| MFCCs | 52 | 13 coefficients × (mean, std) + delta × (mean, std) |
| Spectral | 20 | centroid, bandwidth, rolloff, flatness, contrast (4 bands), ZCR |
| Spectral entropy | 2 | mean + std; captures tonal vs. noisy signal structure |
| Temporal | 4 | RMS mean/std, onset rate, temporal centroid |
| Pitch (F0) | 3 | mean, std, voiced ratio via pyin |
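To illustrate what the spectral entropy pair measures, the sketch below computes a normalised Shannon entropy of each frame's power spectrum and summarises it by mean and standard deviation. The frame length, hop size, window and normalisation are assumptions chosen for illustration; the actual `extract_features()` implementation may define the feature differently.

```python
import numpy as np

def spectral_entropy_stats(y, n_fft=256, hop=128, eps=1e-12):
    """Illustrative sketch: (mean, std) of per-frame spectral entropy in [0, 1]."""
    y = np.asarray(y, dtype=np.float64)

    # Slice the signal into overlapping frames and apply a Hann window.
    n_frames = 1 + (len(y) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(n_fft)

    # Power spectrum per frame, normalised to a probability distribution.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2 + eps
    p = power / power.sum(axis=1, keepdims=True)

    # Shannon entropy divided by its maximum (log2 of the number of bins).
    entropy = -(p * np.log2(p)).sum(axis=1) / np.log2(p.shape[1])
    return float(entropy.mean()), float(entropy.std())

sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)                    # tonal signal
noise = np.random.default_rng(0).standard_normal(sr)   # noisy signal

tone_mean, _ = spectral_entropy_stats(tone)
noise_mean, _ = spectral_entropy_stats(noise)
print(f"tone: {tone_mean:.2f}  noise: {noise_mean:.2f}")
```

A pure tone concentrates its energy in a few bins and scores low, while white noise spreads energy across the spectrum and scores close to 1. Note that in this sketch all-zero frames, such as trailing padding, also score 1.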
Model
LightGBM classifier with hyperparameters tuned via Optuna (100 trials). The decision threshold is optimised for the F2 score (recall weighted 2× over precision), which prioritises catching distress over avoiding false alarms, as is appropriate for welfare monitoring.
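Threshold selection by F2 score can be sketched as a simple grid search over candidate thresholds. The code below is an illustration with made-up toy data, not the procedure from the training notebook; the function names, the grid and the example probabilities are all assumptions.

```python
import numpy as np

def fbeta(y_true, y_pred, beta=2.0):
    """F-beta from confusion counts; beta=2 weights recall over precision."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return float((1 + b2) * tp / denom) if denom > 0 else 0.0

def best_threshold(y_true, proba, beta=2.0):
    """Return (threshold, score) maximising F-beta over a fixed grid."""
    proba = np.asarray(proba)
    grid = np.linspace(0.01, 0.99, 99)  # assumed grid, step 0.01
    scores = [fbeta(y_true, (proba >= t).astype(int), beta) for t in grid]
    best = int(np.argmax(scores))
    return float(grid[best]), scores[best]

# Toy data for illustration only (1 = distress, 0 = normal)
y_true = np.array([0, 0, 0, 1, 1, 1, 0, 1])
proba = np.array([0.05, 0.10, 0.30, 0.20, 0.60, 0.80, 0.15, 0.40])

thr, f2 = best_threshold(y_true, proba)
f2_default = fbeta(y_true, (proba >= 0.5).astype(int))
print(f"best threshold={thr:.2f}, F2={f2:.3f}, F2 at 0.5={f2_default:.3f}")
```

Because missed distress calls (false negatives) are penalised more heavily than false alarms, the selected threshold tends to sit well below 0.5, which is consistent with the low threshold reported for this model.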
Evaluation
Strategy: leave-one-cat-out (LOCO) cross-validation. The model is trained on 20 cats and evaluated on the held-out cat, and this is repeated for every evaluable cat. This tests generalisation to cats never seen during training.
| Metric | Value |
|---|---|
| Mean LOCO AUC | 0.832 ± 0.14 |
| Min LOCO AUC | 0.5 |
| Max LOCO AUC | 1.0 |
| Cats with AUC ≥ 0.80 | 11 / 15 |
| Decision threshold | 0.135 |
Known limitations:
- 21 cats is a small population, so high individual variance (std = 0.14) is expected
- 5 cats could not be evaluated (only one context recorded): BRI01, CLE01, IND01, JJX01, LEO01
- Lower performance for cats with few recordings or unusual vocal styles: REG01, WHO01, SPI01, CAN01
- Recorded via collar microphone at 8 kHz, so the model may not generalise to other recording setups
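The leave-one-cat-out protocol, including the reason some cats cannot be scored, can be sketched as below. AUC is undefined when the held-out cat's recordings all share one label, so such cats are skipped. This is a NumPy-only illustration: `loco_auc`, `roc_auc` and the `fit_predict` callback are hypothetical names, and the toy scorer stands in for the tuned LightGBM model.

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def loco_auc(X, y, cats, fit_predict):
    """Hold out each cat in turn; skip cats whose test set has a single class."""
    aucs, skipped = {}, []
    for cat in np.unique(cats):
        test = cats == cat
        if len(np.unique(y[test])) < 2:
            skipped.append(str(cat))  # AUC undefined for one class
            continue
        scores = fit_predict(X[~test], y[~test], X[test])
        aucs[str(cat)] = roc_auc(y[test], scores)
    return aucs, skipped

# Toy example: three made-up cats, one of which has only distress recordings
cats = np.array(["A", "A", "B", "B", "C", "C"])
y = np.array([0, 1, 0, 1, 1, 1])
X = np.array([[0.1], [0.9], [0.8], [0.2], [0.5], [0.6]])

# Stand-in scorer: ignores the training data and returns the single feature.
toy_scorer = lambda X_train, y_train, X_test: X_test[:, 0]

aucs, skipped = loco_auc(X, y, cats, toy_scorer)
print(aucs, skipped)
```

With scikit-learn available, `sklearn.model_selection.LeaveOneGroupOut` produces the same train/test splits when given the cat IDs as groups.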
Usage
import joblib
import librosa
import numpy as np
# Load model artefacts
art = joblib.load("cat_distress_model_tuned.joblib")
# Load audio (must be 8 kHz; same preprocessing as training)
y, sr = librosa.load("your_cat_meow.wav", sr=8000, mono=True)
# Pad/trim to 2.5 s
n_samples = int(8000 * 2.5)
y = np.pad(y, (0, max(0, n_samples - len(y))))[:n_samples]
# Extract features using extract_features() from the training notebook
# (it must be defined or imported before this line; it is not part of the artefact file)
feats = extract_features(y, sr=art['native_sr'])
X = np.array([feats.get(c, 0.0) for c in art['feature_cols']]).reshape(1, -1)
X = art['scaler'].transform(X)
# Predict
proba = art['model'].predict_proba(X)[0, 1]
label = 'distress' if proba >= art['threshold'] else 'normal'
print(f"Prediction : {label}")
print(f"Confidence : {proba:.1%}")
print(f"Threshold : {art['threshold']:.3f}")
Example outputs:
Prediction : distress
Confidence : 84.3%
Threshold : 0.135
Prediction : normal
Confidence : 12.7%
Threshold : 0.135
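Interpreting confidence: The reported value is the predicted probability of the distress class, so a "normal" prediction shows a low percentage. Values close to the threshold (e.g. 0.10–0.20) indicate that the model is uncertain. Values above 0.60 are high-confidence distress detections.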
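The guidance above can be turned into a small helper that attaches a qualifier to each prediction. The function name and the return format are hypothetical, and the band edges are taken directly from the text as rules of thumb rather than calibrated cut-offs.

```python
def interpret(proba, threshold=0.135, uncertain=(0.10, 0.20), high=0.60):
    """Hypothetical helper: return (label, qualifier) for a distress probability."""
    label = "distress" if proba >= threshold else "normal"
    if proba > high:
        qualifier = "high-confidence"
    elif uncertain[0] <= proba <= uncertain[1]:
        qualifier = "uncertain"
    else:
        qualifier = None  # the text gives no specific guidance for this range
    return label, qualifier

for p in (0.843, 0.127, 0.15, 0.05):
    print(p, interpret(p))
```

Under these bands, the second example output (12.7%) is labelled normal but falls inside the uncertain range, so it may be worth recording again or checking on the cat.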
References
- Ntalampiras et al. (2019). Automatic Classification of Cat Vocalizations Emitted in Different Contexts. Animals 9(8):543. https://doi.org/10.3390/ani9080543
- CatMeows dataset: https://zenodo.org/records/4008297