---
license: cc-by-nc-sa-4.0
library_name: sklearn
tags:
- bioacoustics
- insect-classification
- birdnet
- edge-ai
- raspberry-pi
- non-commercial
datasets:
- InsectSet459
- iNatSounds
- ESC-50
---

# InsectNet

A BirdNET-Pi sidecar that classifies insect sounds in real time.
**Research prototype, in active development.**

## What It Is

InsectNet is a lightweight sklearn head trained on BirdNET's 6,522-dim logit
space. It runs alongside BirdNET-Pi on a Raspberry Pi, watches the audio
stream, and sorts captured WAVs into acoustic classes.

The architecture is simple: StandardScaler → OneVsRest(LogisticRegression).
Nothing novel. The interesting part is that BirdNET's logit space encodes
insect acoustic structure well enough that a linear probe works for several
classes.
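
A minimal sketch of that architecture in scikit-learn, assuming random stand-in vectors in place of real BirdNET logits. The helper name `build_head` and the `max_iter=200` setting are illustrative assumptions, not the project's training configuration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import StandardScaler

N_LOGITS = 6522  # BirdNET logit dimension stated above


def build_head(X, y):
    """Fit StandardScaler -> OneVsRest(LogisticRegression) on logit vectors."""
    scaler = StandardScaler().fit(X)
    classifier = OneVsRestClassifier(LogisticRegression(max_iter=200))
    classifier.fit(scaler.transform(X), y)
    return scaler, classifier


# Synthetic stand-in data: 60 random "logit" vectors over 3 classes.
rng = np.random.default_rng(0)
y_demo = np.repeat(["background", "cicada_drone", "frog"], 20)
X_demo = rng.normal(size=(60, N_LOGITS))

scaler, classifier = build_head(X_demo, y_demo)
# One probability per class, in the order given by classifier.classes_.
probs = classifier.predict_proba(scaler.transform(X_demo[:1]))[0]
```

Each class gets its own binary logistic regression over the scaled logits, which is what makes the head a linear probe.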

## What's Validated

Field validation at Pine Hollow, Tennessee (35.8565, -83.3744):

| Class | Status | Confidence (field) | Notes |
|-------|--------|-------------------|-------|
| background | Production | N/A | 0.984 F1, 1,669 public clips + field negatives |
| cicada_drone | Working | 83-100% | Natural capture at 83%, playback at 99-100%. AC unit false positive at 92%. |
| frog | Working | 51-99% | Natural chorus confirmed. 440+ captures in one evening, two species identified. |
| cricket_katydid | Likely working | 99+% | Playback at 100%. Natural summer data pending. |
| grasshopper | Data-limited | TBD | 183 training clips, 0.701 F1. Not production-ready. |
| bee | Untrained | TBD | 43 training clips, 0.608 F1. No real field captures. Known false positives from weed whacker and night noise. |

## What It's Not

This is not a finished product. It's a working research prototype that has
been field-tested enough to know it catches real insects, and also catches
enough false positives to know it shouldn't be trusted blindly.

- The F1 numbers are from cross-validation on public training data, not from
  field deployment. Actual performance varies with environment, mic placement,
  and insect proximity.
- All threshold tuning was done over one month at a single location.
- Grasshopper and bee classes need substantially more training data before
  they can be used without human review.
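
For context on the first point, per-class F1 from cross-validation can be computed roughly as follows. This is a sketch on synthetic low-dimensional features, assuming 5-fold stratified splits; the project's actual fold count and evaluation script are not stated here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for labelled logit vectors (16 dims for speed).
rng = np.random.default_rng(0)
classes = np.array(["background", "bee", "frog"])
y = np.repeat(classes, 30)
# Shift each class by a different mean so the toy problem is learnable.
X = rng.normal(size=(90, 16)) + np.repeat(np.arange(3), 30)[:, None]

model = make_pipeline(
    StandardScaler(),
    OneVsRestClassifier(LogisticRegression(max_iter=500)),
)

# Out-of-fold predictions: each clip is scored by a model that never saw it.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_pred = cross_val_predict(model, X, y, cv=cv)

per_class_f1 = f1_score(y, y_pred, labels=classes, average=None)
for cls, f1 in zip(classes, per_class_f1):
    print(f"{cls}: F1 = {f1:.3f}")
```

Keeping the scaler inside the pipeline means it is refit on each training fold, so held-out clips do not leak into the scaling statistics.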

## Known Limitations

- **BirdNET dependency.** The classifier requires BirdNET's TFLite model to
  extract logits. Without BirdNET, the classifier can't run.
- **Mic placement.** The outdoor mic at Pine Hollow is upward-facing for birds.
  Ground-level insect sounds must be loud to reach it.
- **No cicada species channels.** BirdNET has zero cicada labels. Cicada
  detection relies on general acoustic features in the BirdNET embedding space.
- **False positives.** AC units → cicada_drone (92%). Weed whackers → bee
  (98%). Night noise → bee (50-70%).
- **All BirdNET species IDs are approximate.** BirdNET maps to the closest
  species in its 6,522-label set, which may not be the true species.

## How to Use

The classifier isn't useful on its own: it needs BirdNET's TFLite model to
produce logits. The full capture pipeline lives on GitHub:

[https://github.com/vortexpjeff/insectnet](https://github.com/vortexpjeff/insectnet)

```python
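# After extracting BirdNET logits. `logits` is assumed to be a NumPy array
# of shape (6522,) produced by BirdNET's TFLite model.
import joblib

# The bundle is indexed like a dict: fitted scaler, classifier, class names.
clf = joblib.load("classifier.joblib")
X = clf["scaler"].transform(logits.reshape(1, -1))  # shape (1, 6522)
scores = clf["classifier"].predict_proba(X)[0]      # one probability per class
for cls, score in zip(clf["classes"], scores):
    print(f"{cls}: {score * 100:.1f}%")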
```
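
Given the false positives listed above, raw scores are probably best gated by per-class thresholds before a capture is accepted. The sketch below shows one way to do that. The `decide` helper, the threshold values, and the default are hypothetical and are not the values tuned at Pine Hollow.

```python
import numpy as np

# Hypothetical per-class thresholds; the project's tuned values are not given here.
THRESHOLDS = {"cicada_drone": 0.95, "frog": 0.50, "bee": 0.99}
DEFAULT_THRESHOLD = 0.80  # assumed fallback for classes without an entry


def decide(classes, scores, thresholds=THRESHOLDS, default=DEFAULT_THRESHOLD):
    """Return the top class if its score clears its threshold, else None."""
    best = int(np.argmax(scores))
    label = classes[best]
    if label == "background":
        return None  # background is never reported as a detection
    if scores[best] >= thresholds.get(label, default):
        return label
    return None


demo_classes = ["background", "cicada_drone", "frog"]
# 0.92 is below the assumed 0.95 cicada threshold, so this is rejected.
rejected = decide(demo_classes, np.array([0.05, 0.92, 0.03]))
accepted = decide(demo_classes, np.array([0.02, 0.97, 0.01]))
```

With these assumed values, a 92% cicada_drone score (the level reported for the AC unit) is dropped, at the cost of also dropping the 83% natural capture, which is why flagged classes still need human review.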

## Training Data

| Source | Clips | License | Content |
|--------|-------|---------|---------|
| InsectSet459 | ~1,800 | CC BY-NC-SA 4.0 | 459 insect species, primarily Orthoptera |
| iNatSounds | ~1,041 | CC BY-NC 4.0 | iNaturalist insect observations |
| ESC-50 | 1,519 | CC BY-NC 4.0 | Environmental noise (background class) |
| Pine Hollow field | 38 (unreviewed) | CC BY-NC-SA 4.0 | Natural captures from Pi sidecar |

All training data and the BirdNET backbone are licensed for non-commercial
use only. Derivative classifiers must use a compatible license.

## Project Status

Actively developed. Summer 2026 is the primary field data collection window
for improving grasshopper, bee, and cricket classes. New captures are being
accumulated continuously via the BirdNET-Pi sidecar.

## License

CC BY-NC-SA 4.0. See the LICENSE file.