---
language: en
tags:
- audio
- audio-classification
- bee
- hive-monitoring
- beekeeping
library_name: sklearn
license: mit
metrics:
- accuracy
- f1
---

# Bee Audio Classifier

5-class audio classifier for bee colony health monitoring. Trained on segmented hive recordings using MFCC-based feature extraction.

> Last updated: 2026-05-16 21:27 UTC

## Classes

| Label | Description |
|---|---|
| (classes not found) | — |

## Model performance

| File | Description | Accuracy | F1 (weighted) |
|---|---|---|---|
| `bee_cnn_classifier.h5` | CNN (Mel Spectrogram) | — | — |
| `best_cnn.h5` | CNN checkpoint | — | — |

`label_encoder.pkl` is required by all classical ML models. `cnn_label_encoder.pkl` is required by the CNN models.

## Feature extraction (171 features per 5-second segment)

- 40 MFCCs × (mean + std) = 80
- 40 delta-MFCCs × mean = 40
- 12 Chroma × (mean + std) = 24
- Mel spectrogram stats (mean, std, max, min) = 4
- Spectral centroid (mean + std) = 2
- Spectral bandwidth (mean + std) = 2
- Spectral rolloff (mean + std) = 2
- Spectral contrast × 7 × mean = 7
- Zero crossing rate (mean + std) = 2
- RMS energy (mean + std) = 2
- Tonnetz × 6 × mean = 6

## Quick Python usage

```python
import joblib
import librosa
import numpy as np

model = joblib.load("random_forest_model.pkl")
le = joblib.load("label_encoder.pkl")

def extract_features(y, sr, n_mfcc=40, hop_length=512, n_fft=2048):
    feats = {}

    # MFCCs: mean and std per coefficient
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=n_fft, hop_length=hop_length)
    for i in range(n_mfcc):
        feats[f"mfcc_{i}_mean"] = np.mean(mfcc[i])
        feats[f"mfcc_{i}_std"] = np.std(mfcc[i])

    # Delta-MFCCs: mean per coefficient
    delta = librosa.feature.delta(mfcc)
    for i in range(n_mfcc):
        feats[f"mfcc_delta_{i}_mean"] = np.mean(delta[i])

    # Chroma: mean and std per pitch class
    chroma = librosa.feature.chroma_stft(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length)
    for i in range(12):
        feats[f"chroma_{i}_mean"] = np.mean(chroma[i])
        feats[f"chroma_{i}_std"] = np.std(chroma[i])

    # Mel spectrogram summary statistics (in dB)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, hop_length=hop_length)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    feats["mel_mean"] = np.mean(mel_db)
    feats["mel_std"] = np.std(mel_db)
    feats["mel_max"] = np.max(mel_db)
    feats["mel_min"] = np.min(mel_db)

    # Spectral shape descriptors
    sc = librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=hop_length)
    feats["spectral_centroid_mean"] = np.mean(sc)
    feats["spectral_centroid_std"] = np.std(sc)

    sb = librosa.feature.spectral_bandwidth(y=y, sr=sr, hop_length=hop_length)
    feats["spectral_bandwidth_mean"] = np.mean(sb)
    feats["spectral_bandwidth_std"] = np.std(sb)

    sr_f = librosa.feature.spectral_rolloff(y=y, sr=sr, hop_length=hop_length)
    feats["spectral_rolloff_mean"] = np.mean(sr_f)
    feats["spectral_rolloff_std"] = np.std(sr_f)

    contrast = librosa.feature.spectral_contrast(y=y, sr=sr, hop_length=hop_length)
    for i in range(contrast.shape[0]):
        feats[f"spectral_contrast_{i}_mean"] = np.mean(contrast[i])

    # Time-domain features
    zcr = librosa.feature.zero_crossing_rate(y, hop_length=hop_length)
    feats["zcr_mean"] = np.mean(zcr)
    feats["zcr_std"] = np.std(zcr)

    rms = librosa.feature.rms(y=y, hop_length=hop_length)
    feats["rms_mean"] = np.mean(rms)
    feats["rms_std"] = np.std(rms)

    # Tonnetz on the harmonic component only
    harmonic = librosa.effects.harmonic(y)
    tonnetz = librosa.feature.tonnetz(y=harmonic, sr=sr)
    for i in range(6):
        feats[f"tonnetz_{i}_mean"] = np.mean(tonnetz[i])

    return np.array(list(feats.values())).reshape(1, -1)

# Classify the first 5-second segment of a recording
y, sr = librosa.load("hive_recording.wav", sr=22050)
seg = y[:int(5.0 * sr)]
feat = extract_features(seg, sr)
pred = le.classes_[model.predict(feat)[0]]
print(pred)
```
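
## Segmenting longer recordings

Hive recordings are usually longer than one 5-second window, and the model expects exactly one segment at a time. A minimal sketch of the segmentation step (the `segment_audio` helper below is not part of this repository; it assumes the same 22050 Hz sample rate as the quick-usage example and drops any trailing remainder shorter than 5 seconds):

```python
import numpy as np

def segment_audio(y, sr, seg_seconds=5.0):
    """Split a mono signal into non-overlapping fixed-length segments,
    discarding a trailing remainder shorter than seg_seconds."""
    seg_len = int(seg_seconds * sr)
    n_segments = len(y) // seg_len
    return [y[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]

# Example: a 12-second signal at 22050 Hz yields two full 5-second segments
# (the final 2 seconds are dropped).
sr = 22050
y = np.zeros(12 * sr, dtype=np.float32)
segments = segment_audio(y, sr)
print(len(segments))     # 2
print(len(segments[0]))  # 110250 samples = 5 s at 22050 Hz
```

Each returned segment can then be passed through `extract_features` and `model.predict` exactly as in the quick-usage example; classifying every segment and taking a majority vote over the predictions is one simple way to label a whole recording.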