---
license: cc-by-nc-sa-4.0
library_name: sklearn
tags:
- bioacoustics
- insect-classification
- birdnet
- edge-ai
- raspberry-pi
- non-commercial
datasets:
- InsectSet459
- iNatSounds
- ESC-50
---

# InsectNet

A BirdNET-Pi sidecar that classifies insect sounds in real time.

**Research prototype, under active development.**

## What It Is

InsectNet is a lightweight scikit-learn classifier head trained on BirdNET's 6,522-dimensional logit space. It runs alongside BirdNET-Pi on a Raspberry Pi, watches the audio stream, and sorts captured WAV files into acoustic classes.

The architecture is simple: StandardScaler → OneVsRest(LogisticRegression). Nothing about it is novel. The interesting part is that BirdNET's logit space encodes insect acoustic structure well enough for a linear probe to work on several classes.

## What's Validated

Field validation at Pine Hollow, Tennessee (35.8565, -83.3744):

| Class | Status | Field confidence | Notes |
|-------|--------|------------------|-------|
| background | Production | N/A | 0.984 F1; 1,669 public clips + field negatives. |
| cicada_drone | Working | 83-100% | Natural capture at 83%, playback at 99-100%. AC unit false positive at 92%. |
| frog | Working | 51-99% | Natural chorus confirmed. 440+ captures in one evening, two species identified. |
| cricket_katydid | Likely working | 99+% | Playback at 100%. Natural summer data pending. |
| grasshopper | Data-limited | TBD | 183 training clips, 0.701 F1. Not production-ready. |
| bee | Untrained | TBD | 43 training clips, 0.608 F1. No real field captures. Known false positives from weed whackers and night noise. |

## What It's Not

This is not a finished product. It's a working research prototype that has been field-tested enough to show that it catches real insects, and also that it produces enough false positives that it shouldn't be trusted blindly.

- The F1 numbers come from cross-validation on public training data, not from field deployment. Actual performance varies with environment, mic placement, and insect proximity.
- All threshold tuning was done over one month at a single location.
- The grasshopper and bee classes need substantially more training data before they can be used without human review.

## Known Limitations

- **BirdNET dependency.** The classifier requires BirdNET's TFLite model to extract logits. Without BirdNET, the classifier can't run.
- **Mic placement.** The outdoor mic at Pine Hollow faces upward for birds, so ground-level insect sounds must be loud to reach it.
- **No cicada species channels.** BirdNET has zero cicada labels, so cicada detection relies on general acoustic features in BirdNET's logit space.
- **False positives.** AC units → cicada_drone (92%). Weed whackers → bee (98%). Night noise → bee (50-70%).
- **All BirdNET species IDs are approximate.** BirdNET maps each sound to the closest species in its 6,522-label set, which may not be the true species.

## How to Use

The classifier isn't useful on its own: it needs BirdNET's TFLite model to produce logits. The full capture pipeline lives on GitHub: [https://github.com/vortexpjeff/insectnet](https://github.com/vortexpjeff/insectnet)

```python
# Assumes `logits` already holds BirdNET's 6,522-dim logit vector for one clip;
# the extraction step lives in the GitHub pipeline, not in this snippet.
import joblib
import numpy as np

clf = joblib.load("classifier.joblib")  # dict with "scaler", "classifier", "classes"

X = clf["scaler"].transform(np.asarray(logits, dtype=float).reshape(1, -1))
scores = clf["classifier"].predict_proba(X)[0]
for cls, p in zip(clf["classes"], scores):
    print(f"{cls}: {p * 100:.1f}%")
```

## Training Data

| Source | Clips | License | Content |
|--------|-------|---------|---------|
| InsectSet459 | ~1,800 | CC BY-NC-SA 4.0 | 459 insect species, primarily Orthoptera |
| iNatSounds | ~1,041 | CC BY-NC 4.0 | iNaturalist insect observations |
| ESC-50 | 1,519 | CC BY-NC 4.0 | Environmental noise (background class) |
| Pine Hollow field | 38 (unreviewed) | CC BY-NC-SA 4.0 | Natural captures from the Pi sidecar |

All training data and the BirdNET backbone are licensed for non-commercial use only. Derivative classifiers must use a compatible license.

## Project Status

Actively developed.
Summer 2026 is the primary field data collection window for improving the grasshopper, bee, and cricket classes. New captures are accumulated continuously via the BirdNET-Pi sidecar.

## License

CC BY-NC-SA 4.0. See the LICENSE file.
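## Appendix: Architecture Sketch

The StandardScaler → OneVsRest(LogisticRegression) head described under What It Is fits in a few lines of scikit-learn. This is a minimal sketch, not InsectNet's training script: the data is random, and `train_head`, `score`, `N_LOGITS`, `CLASSES` and the hyperparameters (`C=1.0`, `max_iter=1000`) are hypothetical. It also shows one way to build a bundle with the `scaler`, `classifier` and `classes` keys that the How to Use snippet reads. Storing `classes` in the fitted model's `classes_` order matters, because the `predict_proba` columns follow that order.

```python
# Minimal sketch of a StandardScaler -> OneVsRest(LogisticRegression) head over
# 6,522-dim BirdNET logit vectors. Data is random; C and max_iter are assumptions,
# not InsectNet's actual settings. All names here are hypothetical.
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

N_LOGITS = 6522  # size of BirdNET's label set, per this card
CLASSES = ["background", "cicada_drone", "frog", "cricket_katydid", "grasshopper", "bee"]


def train_head(X, y):
    """Fit scaler + one-vs-rest logistic regression and return a dict bundle
    with the keys the How to Use snippet indexes."""
    scaler = StandardScaler().fit(X)
    clf = OneVsRestClassifier(LogisticRegression(C=1.0, max_iter=1000))
    clf.fit(scaler.transform(X), y)
    # classes_ is sorted; predict_proba columns follow this order.
    return {"scaler": scaler, "classifier": clf, "classes": [str(c) for c in clf.classes_]}


def score(bundle, logits):
    """Scale one logit vector and return {class_name: probability}."""
    x = bundle["scaler"].transform(np.asarray(logits, dtype=float).reshape(1, -1))
    probs = bundle["classifier"].predict_proba(x)[0]
    return dict(zip(bundle["classes"], probs))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in data: 20 random "logit" vectors per class, with one
    # feature shifted per class so the toy problem is learnable.
    y = np.repeat(CLASSES, 20)
    X = rng.normal(size=(len(y), N_LOGITS))
    for i, c in enumerate(CLASSES):
        X[y == c, i] += 3.0
    bundle = train_head(X, y)

    # Round-trip through joblib, matching the file the How to Use snippet loads.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "classifier.joblib")
        joblib.dump(bundle, path)
        loaded = joblib.load(path)

    for cls, p in score(loaded, X[0]).items():
        print(f"{cls}: {p * 100:.1f}%")
```

In the single-label case, `OneVsRestClassifier.predict_proba` normalizes the independent per-class scores so each row sums to 1. A class can therefore read high partly because the others read low, which is worth keeping in mind when setting per-class thresholds.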