---
title: 🧠 VocClassifier — Infant Vocalization Classifier
emoji: 🍼
colorFrom: indigo
colorTo: yellow
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: true
license: mit
---
# 🧠 VocClassifier — Infant Vocalization Classifier
This interactive demo classifies infant vocalizations into five acoustic categories based on their spectral features.
**Classes:** 🗣️ Non-canonical • 🎶 Other • 👶 Canonical • 😢 Cry • 😂 Laugh
## 🧩 How It Works
- Upload or record an infant vocalization clip (`.wav`, ≤10 seconds).
- The system converts it into a Mel-spectrogram and classifies it using a ConvNeXt model trained with FastAI + TIMM.
- It outputs predicted probabilities for each of the five classes.
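The Mel-spectrogram step above can be sketched in plain NumPy. This is a simplified illustration, not the app's actual feature extractor (which renders a 224×224 Jet-colormap image, per the Model Details below); the function name and parameter defaults here are assumptions for the example.

```python
import numpy as np

def mel_spectrogram(wave, sr=16000, n_fft=512, hop=256, n_mels=64):
    """Simplified Mel-spectrogram: windowed STFT power mapped through a
    triangular mel filterbank, returned in dB. Shape: (n_mels, n_frames)."""
    # Frame the signal with a Hann window and take the power spectrum
    frames = np.lib.stride_tricks.sliding_window_view(wave, n_fft)[::hop]
    spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)) ** 2

    # Triangular mel filterbank (simplified construction)
    def hz_to_mel(f):
        return 2595 * np.log10(1 + f / 700)

    def mel_to_hz(m):
        return 700 * (10 ** (m / 2595) - 1)

    mel_pts = mel_to_hz(np.linspace(0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        if c > l:
            fb[i, l:c] = (np.arange(l, c) - l) / (c - l)   # rising slope
        if r > c:
            fb[i, c:r] = (r - np.arange(c, r)) / (r - c)   # falling slope
    return 10 * np.log10(fb @ spec.T + 1e-10)  # dB scale, avoid log(0)

# Example: one second of a 440 Hz tone at 16 kHz
sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
mel = mel_spectrogram(np.sin(2 * np.pi * 440 * t), sr=sr)
print(mel.shape)  # (n_mels, n_frames)
```

In the real pipeline this time–frequency matrix is rendered as a color image and fed to the ConvNeXt classifier like any other picture.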
The model was fine-tuned on Mel-spectrograms derived from the BabbleCor corpus (BabbleCor Dataset on OSF).
You can retrain this model on your own infant data using the open-source repository below 👇
📦 **Source code & training pipeline:**
👉 https://github.com/arunps12/VisionInfantNet
## ⚙️ Supported Classes
| Class | Description |
|---|---|
| Non-canonical | Early, immature, non-syllabic vocalizations |
| Other | Unclassified or environmental sounds |
| Canonical | Mature babbling, clear consonant–vowel patterns |
| Cry | Distress or discomfort vocalizations |
| Laugh | Playful or positive expressions |
## 🧠 Model Details
- Architecture: ConvNeXt (Tiny)
- Framework: FastAI (v2.7) + PyTorch (v2.4)
- Features: Mel-spectrogram (224×224 px, Jet colormap)
- Training corpus: BabbleCor (restricted access; BabbleCor Dataset on OSF)
- Fine-tuning: 5-way infant vocalization classification
## 🧰 How to Use
- Click **Upload** or record a short infant vocalization.
- Press **Submit**.
- View the predicted class and probability distribution.
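Before submitting, a clip's length can be checked against the demo's stated 10-second limit using only the Python standard library. This is an illustrative sketch, not code from the app; the helper name and the generated silent test clip are assumptions.

```python
import io
import wave

MAX_SECONDS = 10  # the demo's stated clip limit

def clip_duration(wav_bytes: bytes) -> float:
    """Return the duration of a PCM .wav payload in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()

# Build a 2-second silent mono clip at 16 kHz purely for illustration
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)        # 16-bit samples
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * 32000)  # 32000 frames = 2 s

dur = clip_duration(buf.getvalue())
print(dur, dur <= MAX_SECONDS)  # 2.0 True
```

A check like this keeps overly long recordings from being sent to the classifier in the first place.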
## 🧾 Credits
**Author:** Arun Singh
**Affiliation:** University of Oslo, Norway
**License:** MIT
**GitHub:** https://github.com/arunps12/VisionInfantNet
## 💬 Notes
This demo showcases how spectral features of infant sounds can be visualized and classified.
The underlying model can be fine-tuned for any infant dataset following the VisionInfantNet pipeline.
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference