insectnet / architecture.md
TheVortexProject's picture
Initial upload: 6-class BirdNET-logit classifier, model card, docs
0e7b80b verified
|
Raw
History Blame Contribute Delete
4.91 kB
# Architecture
How InsectNet integrates with BirdNET-Pi and why it's designed this way.
## BirdNET-Pi Model
BirdNET-Pi uses a **socket-based client-server architecture** for audio analysis:
```
arecord (15s WAV β†’ StreamData/)
β””β†’ birdnet_analysis.sh (shell loop)
β””β†’ analyze.py (socket client on port 5050)
β””β†’ BirdNET-Lite server loads WAV, runs TFLite, returns CSV
β””β†’ detection: WAV β†’ Extracted/By_Date/{species}/
β””β†’ no detection: WAV deleted
```
Key design patterns InsectNet mirrors:
- **Binary WAV lifecycle** β€” every WAV is processed once. Keep or delete, no middle state.
- **Detection-only persistence** β€” non-detections produce zero artifacts.
- **Shell-based orchestration** β€” each service is an independent systemd unit.
## InsectNet's Role
InsectNet is a **read-only sidecar**. It never touches BirdNET-Pi's files β€” it
reads StreamData/ via inotify and copies WAVs to its own directory before
BirdNET-Pi deletes them.
```
StreamData/ (new WAV)
β”‚
β”œβ”€β”€β†’ BirdNET-Lite (port 5050) β†’ CSV β†’ keep/delete
β”‚
└──→ InsectNet inotify β†’ copy WAV β†’ librosa β†’ TFLite β†’ logits β†’ sklearn β†’ keep/delete
β”‚
captures/{class}/{ts}_{cls}_{conf}.wav
detections.jsonl (append)
```
## Why BirdNET Logits
InsectNet classifiers train on BirdNET's **6,522-dim logit space**, not raw
audio. This is possible because BirdNET v2.4 has 31 Orthoptera species in its
label set β€” field crickets, tree crickets, conehead katydids, ground crickets,
and meadow katydids. The logit space already encodes insect acoustic structure.
Cicadas are absent from BirdNET's labels, but their acoustic features still
produce distinguishable patterns in the logit space (confirmed by field
validation with cosine similarity against training centroids).
## Classifier Architecture
All production InsectNet classifiers use:
```
StandardScaler β†’ OneVsRest(LogisticRegression(C=0.1, class_weight='balanced'))
```
- **StandardScaler** normalizes the 6,522-dim logit vectors
- **OneVsRest** trains one binary classifier per class (sigmoid output)
- **LogisticRegression** with L2 regularization (C=0.1), balanced class weights
This is the same architecture BirdNET uses internally without the softmax β€”
sigmoid-per-class allows multi-label predictions (one clip can be both
"cicada_drone" and "frog").
## Multi-Label Training
Training data format: clips are labeled with lists of active classes, not a
single category. A clip containing overlapping frog and cricket calls is
labeled `["frog", "cricket_katydid"]`.
`MultiLabelBinarizer` converts to an indicator matrix. Per-class
F1-optimized thresholds are swept 0.1-0.95 during evaluation. Each class gets
its own decision threshold.
## Background Training Data
Background clips come from two sources:
1. **BirdNET bird clips** β€” every labeled bird clip is confirmed non-insect
audio from the same microphone and environment.
2. **Public datasets** (ESC-50 for environmental noise, iNatSounds for
labeled insect audio).
## Two-Tier System
InsectNet operates at two levels:
| Layer | Runs On | Backbone | Purpose |
|-------|---------|----------|---------|
| **Sidecar** | BirdNET-Pi (Pi 4) | BirdNET TFLite logits | Real-time capture, keeps WAVs |
| **Archive** | Workstation | Perch 2.0 embeddings | Offline enrichment, multi-taxa discovery |
The sidecar is the edge capture system. The archive (separate repo) is the
analysis layer that pulls captures, embeds them with Perch 2.0, and enables
multi-taxa classification. They are complementary.
## BirdNET Species Coverage
BirdNET v2.4 has 6,522 species labels. Insect-relevant coverage:
| Group | In BirdNET? | Notes |
|-------|-------------|-------|
| 31 Orthoptera (crickets, katydids) | βœ… | Field crickets, tree crickets, coneheads, ground crickets, meadow katydids |
| 0 Cicada species | ❌ | Zero cicada labels β€” relies on general acoustic features |
| 0 Bee species | ❌ | Zero Hymenoptera labels |
| 0 Grasshopper species | ❌ | Though some Acrididae may trigger Orthoptera channels |
This means 31 logit channels carry insect-class information directly; the
other 6,491 channels may carry incidental insect structure.
## BirdNET-Pi Access
Default credentials:
- **Host:** 192.168.1.223
- **User:** birdnetpi / birdnetpi
- **Python:** `/home/birdnetpi/BirdNET-Pi/birdnet/bin/python3`
- **Model:** `/home/birdnetpi/BirdNET-Pi/model/BirdNET_GLOBAL_6K_V2.4_Model_FP16.tflite`
- **StreamData:** `/home/birdnetpi/BirdSongs/StreamData/`
The sidecar expects the TFLite model at `DEFAULT_BIRDNET_MODEL` and StreamData
at `DEFAULT_STREAMDATA` (both configurable via CLI args).