---
language: en
tags:
  - audio
  - audio-classification
  - bee
  - hive-monitoring
  - beekeeping
library_name: sklearn
license: mit
metrics:
  - accuracy
  - f1
---

# Bee Audio Classifier

A 5-class audio classifier for bee colony health monitoring,
trained on segmented hive recordings using MFCC-based feature extraction.

> Last updated: 2026-05-19 15:54 UTC

## Classes

| Label | Description |
|---|---|
| (classes not found) | — |

## Model performance

| File | Description | Accuracy | F1 (weighted) |
|---|---|---|---|
| `bee_cnn_classifier.h5` | CNN (Mel Spectrogram) | — | — |
| `best_cnn.h5` | CNN checkpoint | — | — |

`label_encoder.pkl` is required by all classical ML models.
`cnn_label_encoder.pkl` is required by the CNN models.

## Feature extraction (171 features per 5-second segment)

- 40 MFCCs × (mean + std) = 80
- 40 delta-MFCCs × mean = 40
- 12 Chroma × (mean + std) = 24
- Mel spectrogram stats (mean, std, max, min) = 4
- Spectral centroid (mean + std) = 2
- Spectral bandwidth (mean + std) = 2
- Spectral rolloff (mean + std) = 2
- Spectral contrast × 7 × mean = 7
- Zero crossing rate (mean + std) = 2
- RMS energy (mean + std) = 2
- Tonnetz × 6 × mean = 6
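
As a quick sanity check, the per-group counts listed above can be summed to get the length of the feature vector. The snippet below is only an arithmetic sketch; the dictionary keys are illustrative names, not identifiers from the training code.

```python
# Per-group feature counts, copied from the list above.
# The keys are illustrative labels, not names used by the model.
feature_groups = {
    "mfcc_mean_std": 40 * 2,        # 40 MFCCs x (mean + std)
    "mfcc_delta_mean": 40,          # 40 delta-MFCCs x mean
    "chroma_mean_std": 12 * 2,      # 12 chroma bins x (mean + std)
    "mel_stats": 4,                 # mean, std, max, min
    "spectral_centroid": 2,
    "spectral_bandwidth": 2,
    "spectral_rolloff": 2,
    "spectral_contrast_mean": 7,
    "zcr": 2,
    "rms": 2,
    "tonnetz_mean": 6,
}

total_features = sum(feature_groups.values())
print(total_features)  # 171
```

A vector of any other length most likely means one of the extraction parameters (for example `n_mfcc`) differs from the values listed here.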

## Quick Python usage

```python
import joblib
import librosa
import numpy as np

model = joblib.load("random_forest_model.pkl")
le    = joblib.load("label_encoder.pkl")

def extract_features(y, sr, n_mfcc=40,
                     hop_length=512, n_fft=2048):
    feats = {}
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, n_fft=n_fft, hop_length=hop_length)
    for i in range(n_mfcc):
        feats[f"mfcc_{i}_mean"] = np.mean(mfcc[i])
        feats[f"mfcc_{i}_std"]  = np.std(mfcc[i])
    delta = librosa.feature.delta(mfcc)
    for i in range(n_mfcc):
        feats[f"mfcc_delta_{i}_mean"] = np.mean(delta[i])
    chroma = librosa.feature.chroma_stft(y=y, sr=sr, n_fft=n_fft, hop_length=hop_length)
    for i in range(12):
        feats[f"chroma_{i}_mean"] = np.mean(chroma[i])
        feats[f"chroma_{i}_std"]  = np.std(chroma[i])
    mel = librosa.feature.melspectrogram(y=y, sr=sr, hop_length=hop_length)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    feats["mel_mean"] = np.mean(mel_db); feats["mel_std"]  = np.std(mel_db)
    feats["mel_max"]  = np.max(mel_db);  feats["mel_min"]  = np.min(mel_db)
    sc = librosa.feature.spectral_centroid(y=y, sr=sr, hop_length=hop_length)
    feats["spectral_centroid_mean"] = np.mean(sc); feats["spectral_centroid_std"] = np.std(sc)
    sb = librosa.feature.spectral_bandwidth(y=y, sr=sr, hop_length=hop_length)
    feats["spectral_bandwidth_mean"] = np.mean(sb); feats["spectral_bandwidth_std"] = np.std(sb)
    sr_f = librosa.feature.spectral_rolloff(y=y, sr=sr, hop_length=hop_length)
    feats["spectral_rolloff_mean"] = np.mean(sr_f); feats["spectral_rolloff_std"] = np.std(sr_f)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr, hop_length=hop_length)
    for i in range(contrast.shape[0]):
        feats[f"spectral_contrast_{i}_mean"] = np.mean(contrast[i])
    zcr = librosa.feature.zero_crossing_rate(y, hop_length=hop_length)
    feats["zcr_mean"] = np.mean(zcr); feats["zcr_std"] = np.std(zcr)
    rms = librosa.feature.rms(y=y, hop_length=hop_length)
    feats["rms_mean"] = np.mean(rms); feats["rms_std"] = np.std(rms)
    harmonic = librosa.effects.harmonic(y)
    tonnetz = librosa.feature.tonnetz(y=harmonic, sr=sr)
    for i in range(6):
        feats[f"tonnetz_{i}_mean"] = np.mean(tonnetz[i])
    return np.array(list(feats.values())).reshape(1, -1)

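# Load at 22,050 Hz and classify the first 5-second segment.
y, sr = librosa.load("hive_recording.wav", sr=22050)
seg   = y[:int(5.0 * sr)]
feat  = extract_features(seg, sr)  # shape (1, n_features)
# Indexing le.classes_ assumes the model predicts integer-encoded labels.
pred  = le.classes_[model.predict(feat)[0]]
print(pred)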
```
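
The usage snippet above classifies only the first 5 seconds of a recording. For longer recordings, one possible approach is to split the signal into consecutive 5-second segments and classify each one. The helper below is a minimal sketch, not part of the released code: `split_segments` is a hypothetical name, and the choice of non-overlapping segments with the trailing remainder dropped is an assumption, since the segmentation used for training is not documented here.

```python
def split_segments(y, sr, seg_seconds=5.0):
    """Split a 1-D signal into consecutive, non-overlapping segments.

    Hypothetical helper: any trailing remainder shorter than one full
    segment is dropped (an assumption, not the documented training setup).
    Works on any sliceable sequence, including NumPy arrays.
    """
    seg_len = int(seg_seconds * sr)
    if seg_len <= 0:
        raise ValueError("segment length must be positive")
    n_full = len(y) // seg_len
    return [y[i * seg_len:(i + 1) * seg_len] for i in range(n_full)]


# Toy check: 12 "seconds" of samples at a made-up rate of 10 Hz.
signal = list(range(12 * 10))
segments = split_segments(signal, sr=10)
print(len(segments), len(segments[0]))  # 2 50
```

With the names from the usage snippet, each element of `split_segments(y, sr)` could be passed to `extract_features(seg, sr)` in turn, and the per-segment predictions combined, for instance by majority vote.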