# 🍼 TotTalk Cry Eval
Real-time multi-model baby cry classification tool. Available as a **CLI** (terminal with live mic) and a **Gradio web app** (browser-based, deployable for free).
## Models
| # | Name | Type | Source | Speed |
|---|------|------|--------|-------|
| 1 | **foduucom-SVC** | sklearn SVC, 194-dim MFCC features | [HuggingFace](https://huggingface.co/foduucom/baby-cry-classification) | < 1 ms |
| 2 | **DistilHuBERT** | DistilHuBERT fine-tune (5 classes) | [HuggingFace](https://huggingface.co/AmeerHesham/distilhubert-finetuned-baby_cry) | ~35 ms |
| 3 | **Kibalama-9c** | Wav2Vec2 fine-tune (9 classes incl. discomfort, tired, cold/hot) | [HuggingFace](https://huggingface.co/Kibalama/baby_cry_classification_model) | ~90 ms |
| 4 | **YAMNet-detector** | TF Hub YAMNet (binary cry gate) | [TF Hub](https://tfhub.dev/google/yamnet/1) | < 10 ms |
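
The "binary cry gate" role of the YAMNet detector can be pictured as a cheap check that decides whether the slower classifiers run at all. The sketch below is only an illustration of that idea, not the project's code: `gated_classify`, `GatedResult`, and the `0.5` threshold are hypothetical, and the detector and classifier are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical threshold; the value the project actually uses is not documented here.
CRY_THRESHOLD = 0.5


@dataclass
class GatedResult:
    cry_score: float
    label: Optional[str]  # None when the gate rejects the audio window


def gated_classify(
    samples: Sequence[float],
    detector: Callable[[Sequence[float]], float],
    classifier: Callable[[Sequence[float]], str],
    threshold: float = CRY_THRESHOLD,
) -> GatedResult:
    """Run the classifier only when the detector score reaches the threshold."""
    score = detector(samples)
    if score < threshold:
        # Gate closed: skip the more expensive classifier entirely.
        return GatedResult(cry_score=score, label=None)
    return GatedResult(cry_score=score, label=classifier(samples))


# Stand-in callables; real models would replace these lambdas.
quiet = gated_classify([0.0, 0.0], lambda s: 0.1, lambda s: "hungry")
loud = gated_classify([0.4, -0.3], lambda s: 0.9, lambda s: "hungry")
```

With these stand-ins, `quiet.label` is `None` and `loud.label` is `"hungry"`. Passing `--no-yamnet-gate` on the CLI would presumably correspond to skipping this check.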
## Web app (Gradio)
```bash
cd cry-eval
uv sync
uv run python app.py
```
Open `http://localhost:7860`, then record audio from your mic or upload a file.
### Deploy for free on HuggingFace Spaces
1. Go to [huggingface.co/new-space](https://huggingface.co/new-space)
2. Select **Gradio → Blank**, **CPU Basic** (free), Public visibility
3. Create the Space, then push:
```bash
cp README.md README_GITHUB.md
cp README_HF.md README.md
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/cry-eval
git add -A && git commit -m "Configure for HF Spaces"
git push hf main
```
4. Deploys automatically (~5 min first build)
## CLI (terminal)
```bash
# Run with mic input
uv run python main.py
# Run with an audio file
uv run python main.py --file path/to/cry.wav
# Select specific models
uv run python main.py --models svc,hubert,kibalama
# Disable YAMNet gating
uv run python main.py --no-yamnet-gate
# Save predictions to JSONL
uv run python main.py --save-log results.jsonl
```
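For readers who want to see how the flags above fit together, here is a minimal `argparse` sketch. It assumes only the flag names shown in the commands; the defaults, help strings, and the `build_parser` helper are hypothetical and may differ from what `main.py` actually does.

```python
import argparse
from typing import List


def parse_model_list(value: str) -> List[str]:
    """Split a comma-separated list such as 'svc,hubert' into clean names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags only.
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--file", default=None,
                        help="audio file to classify instead of the live mic")
    parser.add_argument("--models", type=parse_model_list, default=None,
                        help="comma-separated model names, e.g. svc,hubert,kibalama")
    parser.add_argument("--no-yamnet-gate", action="store_true",
                        help="run classifiers without the YAMNet cry gate")
    parser.add_argument("--save-log", default=None,
                        help="path of a JSONL file to append predictions to")
    return parser


args = build_parser().parse_args(
    ["--models", "svc,hubert", "--no-yamnet-gate", "--save-log", "results.jsonl"]
)
```

In this sketch, `args.models` is `["svc", "hubert"]`, `args.no_yamnet_gate` is `True`, and `args.file` stays `None`, which would mean mic input.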
## Requirements
- Python ≥ 3.11
- A working microphone (for live mode)
- ~1 GB RAM for transformer models

Model weights are auto-downloaded on first run into HuggingFace/TF Hub caches.
## Project structure
```
cry-eval/
├── pyproject.toml
├── requirements.txt      # for HF Spaces / pip deployments
├── README.md
├── README_HF.md          # HuggingFace Spaces metadata
├── app.py                # Gradio web UI
├── main.py               # CLI entrypoint
├── models/
│   ├── base.py           # abstract CryClassifier + CryPrediction
│   ├── foduucom_svc.py   # sklearn SVC
│   ├── wiam_wav2vec2.py  # DistilHuBERT fine-tune
│   ├── kibalama.py       # Wav2Vec2 9-class fine-tune
│   ├── yamnet.py         # YAMNet binary detector
│   └── ensemble.py       # orchestrates all models
├── audio/
│   ├── capture.py        # MicCapture + FileCapture
│   └── preprocess.py     # MFCC, mel, resample, RMS
├── display/
│   └── table.py          # Rich live table renderer
└── weights/              # auto-downloaded (gitignored)
```
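As a rough picture of what `models/base.py` could contain, the sketch below pairs an abstract `CryClassifier` with a `CryPrediction` record. Only those two class names come from the tree above; the fields, the `predict` signature, and the `LoudnessStub` example model are assumptions made for illustration.

```python
import math
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CryPrediction:
    # Hypothetical fields; the real dataclass may carry more (e.g. latency).
    model: str
    label: str
    confidence: float


class CryClassifier(ABC):
    """Common interface each model wrapper would implement."""

    name: str = "base"

    @abstractmethod
    def predict(self, samples: Sequence[float], sample_rate: int) -> CryPrediction:
        """Classify one window of mono audio samples."""


class LoudnessStub(CryClassifier):
    """Toy model for illustration: labels a window by its RMS level."""

    name = "loudness-stub"

    def predict(self, samples: Sequence[float], sample_rate: int) -> CryPrediction:
        rms = math.sqrt(sum(x * x for x in samples) / len(samples)) if samples else 0.0
        label = "cry" if rms > 0.1 else "silence"
        return CryPrediction(model=self.name, label=label, confidence=min(1.0, rms))


prediction = LoudnessStub().predict([0.5, -0.5, 0.5, -0.5], sample_rate=16000)
```

A shared interface like this is what would let `ensemble.py` loop over the models without knowing whether a given one is an sklearn SVC or a transformer.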