	---
	language:
	- en
	license: mit
	tags:
	- bioacoustics
	- audio-classification
	- bird-sound-identification
	- pytorch
	- vision-transformers
	- protoclr
	- umap
	- hdbscan
	pipeline_tag: audio-classification
	library_name: pytorch
	metrics:
	- accuracy
	---

	# 🦅 Edge-Optimized Bioacoustic Atlas & Real-Time Avian Classifier

	[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1EL5VS_vAKvojPf5UPuQVFbK5gkgP51hB?usp=sharing)
	[![Hugging Face Repository](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Repository-blue)](https://huggingface.co/sukriramli/tiny-bird-diffusion)
	[![Framework: PyTorch](https://img.shields.io/badge/Framework-PyTorch-ee4c2c?logo=pytorch)](https://pytorch.org/)
	[![Optimization: UMAP + HDBSCAN](https://img.shields.io/badge/Optimization-UMAP%20%2B%20HDBSCAN-green)]()
	[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

	A decoupled, ultra-lightweight machine learning pipeline designed for low-latency edge deployment, automated avian species tracking, and interactive audio streaming.

	By combining a Prototypical Contrastive Learning (ProtoCLR) backbone with UMAP dimensionality reduction and HDBSCAN density-based clustering, this system projects audio embeddings onto a dense 2D map. The resulting atlas covers 168 species across 149 eco-acoustic clusters and runs natively in a client browser window with sub-second latency, without needing a compute-heavy cloud inference backend.

	---

	## 🛠️ The System Architecture Problem & Our Solution

	### Our Decoupled Geometric Solution
	This repository decouples feature extraction from inference. Heavy feature extraction is performed upfront, and the high-dimensional latent space is then compressed once into a frozen 2D lookup table of coordinates. At inference time the client device only runs cheap spatial distance computations against that table, which keeps edge latency low.
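The client-side distance lookup described above can be sketched as a nearest-neighbor search over a frozen 2D table. This is a minimal illustration, not the repository's actual code: `FROZEN_ATLAS`, `nearest_species`, and the coordinates and labels are hypothetical placeholders, and the sketch assumes the query clip already has 2D coordinates (how those are obtained on the client is not covered here).

```python
import math

# Hypothetical frozen lookup table: (x, y, species_label) rows computed offline.
# The coordinates and labels are made-up placeholders, not real atlas data.
FROZEN_ATLAS = [
    (0.0, 0.0, "species_a"),
    (5.0, 5.0, "species_b"),
    (-4.0, 3.0, "species_c"),
]


def nearest_species(x, y, atlas=FROZEN_ATLAS):
    """Return (label, distance) for the atlas point closest to (x, y)."""
    # Linear scan with Euclidean distance; fine for small tables.
    best = min(atlas, key=lambda row: math.dist((x, y), (row[0], row[1])))
    return best[2], math.dist((x, y), (best[0], best[1]))


label, distance = nearest_species(4.2, 4.9)
print(label, round(distance, 3))
```

A linear scan is the simplest choice; a larger table would likely benefit from a spatial index such as a k-d tree or a grid.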

	---

	## 🔬 Core Engineering Pillars

	### 1. High-Ratio Manifold Compression
	Instead of forcing edge hardware to hold dense classification-layer weights, we take the 512-dimensional floating-point latent vectors generated by the transformer and use UMAP (Uniform Manifold Approximation and Projection) to project each one down to a 2D coordinate (X, Y). Storing 2 values instead of 512 per embedding cuts the lookup table's RAM footprint by over 99% (2/512 is about 0.4% of the original size) while preserving semantic boundaries between species.
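The "over 99%" figure follows directly from the dimension counts. The sketch below works through the arithmetic, assuming 32-bit floats for both representations and a hypothetical table of 10,000 embeddings; neither the dtype nor the table size is taken from the repository.

```python
BYTES_PER_FLOAT32 = 4  # assumption: values are stored as 32-bit floats


def table_bytes(num_points, dims, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw size in bytes of a num_points x dims table of floats."""
    return num_points * dims * bytes_per_value


def reduction_percent(src_dims, dst_dims):
    """Percentage of storage saved by going from src_dims to dst_dims."""
    return 100.0 * (1.0 - dst_dims / src_dims)


num_points = 10_000  # hypothetical atlas size, for illustration only
full = table_bytes(num_points, 512)    # 512-dimensional latent vectors
compact = table_bytes(num_points, 2)   # 2D UMAP coordinates
print(full, compact, round(reduction_percent(512, 2), 2))
# prints: 20480000 80000 99.61
```

The saving is independent of the number of points, since it depends only on the ratio of dimensions.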

	### 2. Acoustic Domain Shift Mitigation (augment.py)
	Pre-trained foundation models are typically trained on clean, high-quality wildlife recordings, so they often fail in noisy consumer environments. To bridge this gap, our data preparation pipeline passes clean data shards through a custom acoustic corruption stage (noise, echo, and muffling filters) that mimics real-world conditions.
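As a rough illustration of what such a corruption stage might look like, the sketch below chains three simple effects: additive noise, a single echo, and a one-pole low-pass filter for muffling. The function names, parameters, and default values are assumptions made for this example and are not taken from `augment.py`. Plain Python lists keep it dependency-free; a real pipeline would more likely use NumPy or torchaudio.

```python
import math
import random


def add_noise(samples, noise_level=0.05, seed=0):
    """Add uniform white noise in [-noise_level, noise_level] to each sample."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    return [s + noise_level * rng.uniform(-1.0, 1.0) for s in samples]


def add_echo(samples, delay=4, decay=0.5):
    """Mix in one delayed, attenuated copy of the original signal."""
    out = list(samples)
    for i in range(delay, len(samples)):
        out[i] += decay * samples[i - delay]
    return out


def muffle(samples, alpha=0.2):
    """One-pole low-pass filter: y[n] = alpha * x[n] + (1 - alpha) * y[n-1]."""
    out, prev = [], 0.0
    for s in samples:
        prev = alpha * s + (1.0 - alpha) * prev
        out.append(prev)
    return out


def corrupt(samples):
    """Apply noise, then echo, then muffling (order is an arbitrary choice)."""
    return muffle(add_echo(add_noise(samples)))


# Short 440 Hz test tone sampled at 8 kHz, standing in for a clean recording.
clean = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(64)]
noisy = corrupt(clean)
print(len(noisy))
```

Each effect preserves the clip length, so corrupted clips can be fed to the same feature extractor as the clean ones.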

	---

	## 📁 Modular Codebase Layout

	* `augment.py`: Digital Signal Processing (DSP) environment warping functions (Noise, Echo, Muffling filters).
	* `pipeline.py`: Low-latency engineering pipeline managing model configuration and UMAP coordinate projection.
	* `api.py`: Clean, production-ready prediction endpoint designed for real-time app integration.