How to use TheVortexProject/insectnet with Scikit-learn:
```python
from huggingface_hub import hf_hub_download
import joblib

model = joblib.load(
    hf_hub_download("TheVortexProject/insectnet", "sklearn_model.joblib")
)
# only load pickle files from sources you trust
# read more about it here https://skops.readthedocs.io/en/stable/persistence.html
```
| license: cc-by-nc-sa-4.0 | |
| library_name: sklearn | |
| tags: | |
| - bioacoustics | |
| - insect-classification | |
| - birdnet | |
| - edge-ai | |
| - raspberry-pi | |
| - non-commercial | |
| datasets: | |
| - InsectSet459 | |
| - iNatSounds | |
| - ESC-50 | |
| # InsectNet | |
| A BirdNET-Pi sidecar that classifies insect sounds in real time. | |
| **Research prototype, in active development.** | |
| ## What It Is | |
| InsectNet is a lightweight sklearn head trained on BirdNET's 6,522-dim logit | |
| space. It runs alongside BirdNET-Pi on a Raspberry Pi, watches the audio | |
| stream, and sorts captured WAVs into acoustic classes. | |
| The architecture is simple: StandardScaler → OneVsRest(LogisticRegression). | |
| Nothing novel; the interesting part is that BirdNET's logit space encodes | |
| insect acoustic structure well enough that a linear probe works for several | |
| classes. | |
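The linear-probe idea described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the project's training script: the features are synthetic stand-ins for BirdNET logits, the class list is a subset chosen for the example, and the two stages are wrapped in a `Pipeline` for brevity (the loading snippet later in this card suggests the released artifact stores the scaler and classifier separately).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

N_LOGITS = 6522  # BirdNET logit dimensionality, as stated in this card
CLASSES = ["background", "cicada_drone", "frog"]  # illustrative subset only

# Synthetic stand-in for BirdNET logits (assumption): each class shifts
# a different block of 100 dimensions so the classes are separable.
rng = np.random.default_rng(0)
X_parts, y_parts = [], []
for idx, name in enumerate(CLASSES):
    block = rng.normal(size=(40, N_LOGITS))
    block[:, idx * 100:(idx + 1) * 100] += 2.0
    X_parts.append(block)
    y_parts.extend([name] * 40)
X = np.vstack(X_parts)
y = np.array(y_parts)

# StandardScaler -> OneVsRest(LogisticRegression), the two stages named above.
probe = make_pipeline(
    StandardScaler(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
probe.fit(X, y)

# One probability per class for a single logit vector.
probs = probe.predict_proba(X[:1])[0]
for name, p in zip(probe.classes_, probs):
    print(f"{name}: {p * 100:.1f}%")
```

For single-label multiclass targets, `OneVsRestClassifier.predict_proba` normalizes the per-class scores so each row sums to 1.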
| ## What's Validated | |
| Field validation at Pine Hollow, Tennessee (35.8565, -83.3744): | |
| | Class | Status | Confidence (field) | Notes | | |
| |-------|--------|-------------------|-------| | |
| | background | Production | N/A | 0.984 F1, 1,669 public clips + field negatives | | |
| | cicada_drone | Working | 83-100% | Natural capture at 83%, playback at 99-100%. AC unit false positive at 92%. | | |
| | frog | Working | 51-99% | Natural chorus confirmed. 440+ captures in one evening, two species identified. | | |
| | cricket_katydid | Likely working | 99+% | Playback at 100%. Natural summer data pending. | | |
| | grasshopper | Data-limited | TBD | 183 training clips, 0.701 F1. Not production-ready. | | |
| | bee | Untrained | TBD | 43 training clips, 0.608 F1. No real field captures. Known false positives from weed whacker and night noise. | | |
| ## What It's Not | |
| This is not a finished product. It's a working research prototype that has | |
| been field-tested enough to know it catches real insects, and also catches | |
| enough false positives to know it shouldn't be trusted blindly. | |
| - The F1 numbers are from cross-validation on public training data, not from | |
| field deployment. Actual performance varies with environment, mic placement, | |
| and insect proximity. | |
| - All threshold tuning was done over one month at a single location. | |
| - Grasshopper and bee classes need substantially more training data before | |
| they can be used without human review. | |
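For readers unfamiliar with how cross-validated per-class F1 scores like those quoted above are typically produced, here is a minimal sketch. It assumes synthetic 256-dim features in place of real BirdNET logits, an illustrative three-class subset, and a 5-fold stratified split; the fold count and data here are placeholders, not the project's actual evaluation setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 50 clips per class, 256 features.
rng = np.random.default_rng(0)
classes = ["background", "bee", "frog"]
X = rng.normal(size=(150, 256))
y = np.repeat(classes, 50)
for idx, name in enumerate(classes):
    # Shift a class-specific block of features to make classes learnable.
    X[y == name, idx * 20:(idx + 1) * 20] += 1.5

probe = make_pipeline(
    StandardScaler(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)

# Every clip is predicted by a model that never saw it during fitting.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
y_pred = cross_val_predict(probe, X, y, cv=cv)

# average=None returns one F1 score per class, in the order of `labels`.
per_class_f1 = f1_score(y, y_pred, labels=classes, average=None)
for name, score in zip(classes, per_class_f1):
    print(f"{name}: F1 = {score:.3f}")
```

Scores computed this way describe held-out clips from the same distribution as the training data, which is why they can overstate performance at a new site or microphone.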
| ## Known Limitations | |
| - **BirdNET dependency.** The classifier requires BirdNET's TFLite model to | |
| extract logits. Without BirdNET, the classifier can't run. | |
| - **Mic placement.** The outdoor mic at Pine Hollow is upward-facing for birds. | |
| Ground-level insect sounds must be loud to reach it. | |
| - **No cicada species channels.** BirdNET has zero cicada labels. Cicada | |
| detection relies on general acoustic features in the BirdNET embedding space. | |
| - **False positives.** AC units → cicada_drone (92%). Weed whackers → bee | |
| (98%). Night noise → bee (50-70%). | |
| - **All BirdNET species IDs are approximate.** BirdNET maps to the closest | |
| species in its 6,522-label set, which may not be the true species. | |
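Because the known false positives above score as high as 92-98%, a confidence threshold alone cannot filter them out. One way to act on that is to pair per-class thresholds with a human-review queue for the weaker classes. The sketch below is hypothetical: the `triage` function, the threshold values, and the review set are illustrative assumptions, not the project's tuned settings or pipeline code.

```python
# Hypothetical per-class confidence thresholds (NOT the project's tuned values).
THRESHOLDS = {
    "cicada_drone": 0.80,
    "frog": 0.50,
    "cricket_katydid": 0.90,
    "grasshopper": 0.90,
    "bee": 0.90,
}

# Classes this card describes as needing more training data before unreviewed use.
NEEDS_REVIEW = {"grasshopper", "bee"}


def triage(scores):
    """Map a {class_name: probability} dict to (label, action).

    action is "discard", "review", or "keep".
    """
    label = max(scores, key=scores.get)
    if label == "background":
        return label, "discard"
    # Unknown classes default to a threshold of 1.0, so they are discarded
    # unless the model is fully certain.
    if scores[label] < THRESHOLDS.get(label, 1.0):
        return label, "discard"
    if label in NEEDS_REVIEW:
        return label, "review"
    return label, "keep"


print(triage({"background": 0.05, "cicada_drone": 0.92, "bee": 0.03}))
print(triage({"background": 0.01, "bee": 0.98, "frog": 0.01}))
```

Note that the first example would still be kept if the sound were an AC unit scoring 92% as `cicada_drone`; thresholds reduce low-confidence noise but do not replace spot-checking captures.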
| ## How to Use | |
| The classifier isn't useful on its own: it needs BirdNET's TFLite | |
| model to produce logits. The full capture pipeline lives on GitHub: | |
| [https://github.com/vortexpjeff/insectnet](https://github.com/vortexpjeff/insectnet) | |
| ```python | |
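| # After extracting BirdNET logits as a NumPy array `logits` of shape (6522,): | |
| import joblib | |
| # joblib files are pickles: only load them from sources you trust. | |
| # The artifact is a dict holding the fitted scaler, classifier, and class names. | |
| clf = joblib.load("classifier.joblib") | |
| X = clf["scaler"].transform(logits.reshape(1, -1)) | |
| scores = clf["classifier"].predict_proba(X)[0] | |
| for cls, score in zip(clf["classes"], scores): | |
|     print(f"{cls}: {score*100:.1f}%") | |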
| ``` | |
| ## Training Data | |
| | Source | Clips | License | Content | | |
| |--------|-------|---------|---------| | |
| | InsectSet459 | ~1,800 | CC BY-NC-SA 4.0 | 459 insect species, primarily Orthoptera | | |
| | iNatSounds | ~1,041 | CC BY-NC 4.0 | iNaturalist insect observations | | |
| | ESC-50 | 1,519 | CC BY-NC 4.0 | Environmental noise (background class) | | |
| | Pine Hollow field | 38 (unreviewed) | CC BY-NC-SA 4.0 | Natural captures from Pi sidecar | | |
| All training data and the BirdNET backbone are licensed for non-commercial | |
| use only. Derivative classifiers must use a compatible license. | |
| ## Project Status | |
| Actively developed. Summer 2026 is the primary field data collection window | |
| for improving grasshopper, bee, and cricket classes. New captures are being | |
| accumulated continuously via the BirdNET-Pi sidecar. | |
| ## License | |
| CC BY-NC-SA 4.0. See the LICENSE file. | |