---
license: cc-by-nc-sa-4.0
library_name: sklearn
tags:
- bioacoustics
- insect-classification
- birdnet
- edge-ai
- raspberry-pi
- non-commercial
datasets:
- InsectSet459
- iNatSounds
- ESC-50
---
# InsectNet
A BirdNET-Pi sidecar that classifies insect sounds in real time.
**Research prototype, in active development.**
## What It Is
InsectNet is a lightweight sklearn head trained on BirdNET's 6,522-dim logit
space. It runs alongside BirdNET-Pi on a Raspberry Pi, watches the audio
stream, and sorts captured WAVs into acoustic classes.
The architecture is simple: StandardScaler → OneVsRest(LogisticRegression).
Nothing novel here; the interesting part is that BirdNET's logit space encodes
insect acoustic structure well enough that a linear probe works for several
classes.
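A minimal sketch of that architecture, assuming scikit-learn and NumPy are installed. It trains on random stand-in vectors rather than real BirdNET logits, and the step names, class ordering, and `max_iter` value are illustrative assumptions, not the project's actual training configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

N_LOGITS = 6522  # BirdNET logit dimensionality stated above
# Class names from the validation table; the ordering here is an assumption.
CLASSES = ["background", "cicada_drone", "frog",
           "cricket_katydid", "grasshopper", "bee"]


def make_synthetic_logits(n_per_class=20, seed=0):
    """Random stand-in for BirdNET logits: one random centre per class plus noise."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [], []
    for name in CLASSES:
        centre = rng.normal(0.0, 1.0, size=N_LOGITS)
        X_parts.append(centre + 0.5 * rng.normal(size=(n_per_class, N_LOGITS)))
        y_parts.extend([name] * n_per_class)
    return np.vstack(X_parts), np.array(y_parts)


X, y = make_synthetic_logits()

# Linear probe: standardise each logit, then fit one logistic regression per class.
probe = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", OneVsRestClassifier(LogisticRegression(max_iter=1000))),
])
probe.fit(X, y)

# Per-class probabilities for the first synthetic clip.
probs = probe.predict_proba(X[:1])[0]
for cls, p in zip(probe.named_steps["classifier"].classes_, probs):
    print(f"{cls}: {p * 100:.1f}%")
```

Because the stand-in data is synthetic and easily separable, the scores it prints say nothing about real-world accuracy; the sketch only shows how the two stages fit together.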
## What's Validated
Field validation at Pine Hollow, Tennessee (35.8565, -83.3744):
| Class | Status | Confidence (field) | Notes |
|-------|--------|-------------------|-------|
| background | Production | N/A | 0.984 F1, 1,669 public clips + field negatives |
| cicada_drone | Working | 83-100% | Natural capture at 83%, playback at 99-100%. AC unit false positive at 92%. |
| frog | Working | 51-99% | Natural chorus confirmed. 440+ captures in one evening, two species identified. |
| cricket_katydid | Likely working | 99+% | Playback at 100%. Natural summer data pending. |
| grasshopper | Data-limited | TBD | 183 training clips, 0.701 F1. Not production-ready. |
| bee | Untrained | TBD | 43 training clips, 0.608 F1. No real field captures. Known false positives from weed whacker and night noise. |
## What It's Not
This is not a finished product. It's a working research prototype that has
been field-tested enough to know it catches real insects, and also catches
enough false positives to know it shouldn't be trusted blindly.
- The F1 numbers are from cross-validation on public training data, not from
field deployment. Actual performance varies with environment, mic placement,
and insect proximity.
- All threshold tuning was done over one month at a single location.
- Grasshopper and bee classes need substantially more training data before
they can be used without human review.
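
One way to act on these caveats is to route each detection by class and confidence, sending the data-limited classes to a human reviewer. The sketch below is hypothetical: the `route` function, the threshold values, and the three outcome names are placeholders for illustration, not the thresholds tuned at Pine Hollow or code from the capture pipeline.

```python
# Hypothetical per-class confidence thresholds (fractions, not percentages).
# These are illustrative placeholders, NOT the values tuned in the field.
THRESHOLDS = {
    "cicada_drone": 0.80,
    "frog": 0.50,
    "cricket_katydid": 0.90,
    "grasshopper": 0.90,
    "bee": 0.90,
}

# Classes the README says still need human review before use.
ALWAYS_REVIEW = {"grasshopper", "bee"}


def route(label, confidence):
    """Return 'discard', 'review', or 'accept' for a single detection."""
    if label == "background":
        return "discard"
    threshold = THRESHOLDS.get(label)
    if threshold is None or confidence < threshold:
        return "discard"
    if label in ALWAYS_REVIEW:
        return "review"
    return "accept"


print(route("bee", 0.98))           # review
print(route("cicada_drone", 0.92))  # accept
print(route("frog", 0.30))          # discard
```

Note that a threshold alone would not catch a high-confidence false positive: a clip scored as cicada_drone at 92% is accepted by this sketch whether it came from a cicada or from something else.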
## Known Limitations
- **BirdNET dependency.** The classifier requires BirdNET's TFLite model to
extract logits. Without BirdNET, the classifier can't run.
- **Mic placement.** The outdoor mic at Pine Hollow is upward-facing for birds.
Ground-level insect sounds must be loud to reach it.
- **No cicada species channels.** BirdNET has zero cicada labels. Cicada
detection relies on general acoustic features in the BirdNET logit space.
- **False positives.** AC units → cicada_drone (92%). Weed whackers → bee
(98%). Night noise → bee (50-70%).
- **All BirdNET species IDs are approximate.** BirdNET maps to the closest
species in its 6,522-label set, which may not be the true species.
## How to Use
The classifier isn't useful on its own: it needs BirdNET's TFLite
model to produce logits. The full capture pipeline lives on GitHub:
[https://github.com/vortexpjeff/insectnet](https://github.com/vortexpjeff/insectnet)
```python
# After extracting BirdNET logits (6,522-dim NumPy vector named `logits`):
import joblib

# The bundle is a dict holding the fitted scaler, the classifier, and the class names.
clf = joblib.load("classifier.joblib")
X = clf["scaler"].transform(logits.reshape(1, -1))
scores = clf["classifier"].predict_proba(X)[0]
for i, cls in enumerate(clf["classes"]):
    print(f"{cls}: {scores[i]*100:.1f}%")
```
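
The same steps can be wrapped in a small helper that checks the input length and returns classes ranked by score. This is a sketch under assumptions: `rank_classes` is a hypothetical name, the bundle layout (`scaler`, `classifier`, `classes` keys) is inferred from the snippet above, and the toy 8-feature bundle exists only so the example runs without BirdNET or `classifier.joblib`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler


def rank_classes(bundle, logits):
    """Return (class_name, probability) pairs, highest probability first."""
    logits = np.asarray(logits, dtype=float).ravel()
    expected = bundle["scaler"].n_features_in_  # set by scikit-learn at fit time
    if logits.shape[0] != expected:
        raise ValueError(f"expected {expected} logits, got {logits.shape[0]}")
    X = bundle["scaler"].transform(logits.reshape(1, -1))
    scores = bundle["classifier"].predict_proba(X)[0]
    pairs = [(name, float(s)) for name, s in zip(bundle["classes"], scores)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Toy stand-in bundle: 3 classes, 8 features, each class shifted along its own axis.
rng = np.random.default_rng(0)
labels = ["background", "cicada_drone", "frog"]
means = 4.0 * np.eye(3, 8)
X_toy = np.vstack([rng.normal(means[k], 1.0, size=(20, 8)) for k in range(3)])
y_toy = np.repeat(labels, 20)

scaler = StandardScaler().fit(X_toy)
model = OneVsRestClassifier(LogisticRegression()).fit(scaler.transform(X_toy), y_toy)
toy_bundle = {"scaler": scaler, "classifier": model, "classes": list(model.classes_)}

# Query with the centre of the "frog" cluster.
ranked = rank_classes(toy_bundle, means[2])
for name, score in ranked:
    print(f"{name}: {score * 100:.1f}%")
```

With the real bundle, the length check would reject anything that is not a 6,522-dim vector before it reaches the scaler.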
## Training Data
| Source | Clips | License | Content |
|--------|-------|---------|---------|
| InsectSet459 | ~1,800 | CC BY-NC-SA 4.0 | 459 insect species, primarily Orthoptera |
| iNatSounds | ~1,041 | CC BY-NC 4.0 | iNaturalist insect observations |
| ESC-50 | 1,519 | CC BY-NC 4.0 | Environmental noise (background class) |
| Pine Hollow field | 38 (unreviewed) | CC BY-NC-SA 4.0 | Natural captures from Pi sidecar |
All training data and the BirdNET backbone are licensed for non-commercial
use only. Derivative classifiers must use a compatible license.
## Project Status
Actively developed. Summer 2026 is the primary field data collection window
for improving grasshopper, bee, and cricket classes. New captures are being
accumulated continuously via the BirdNET-Pi sidecar.
## License
CC BY-NC-SA 4.0. See the LICENSE file.