🍼 TotTalk Cry Eval

Real-time multi-model baby cry classification tool. Available as a CLI (terminal with live mic) and a Gradio web app (browser-based, deployable for free).

Models

| # | Name | Type | Source | Speed |
|---|------|------|--------|-------|
| 1 | foduucom-SVC | sklearn SVC, 194-dim MFCC features | HuggingFace | < 1 ms |
| 2 | DistilHuBERT | DistilHuBERT fine-tune (5 classes) | HuggingFace | ~35 ms |
| 3 | Kibalama-9c | Wav2Vec2 fine-tune (9 classes incl. discomfort, tired, cold/hot) | HuggingFace | ~90 ms |
| 4 | YAMNet-detector | TF Hub YAMNet (binary cry gate) | TF Hub | < 10 ms |
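
The table lists YAMNet as a binary cry gate in front of the slower classifiers. A minimal sketch of how such a gate could work is shown below. The function name `classify_chunk`, the 0.5 threshold, and the stub models are assumptions for illustration, not the project's actual code.

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical type aliases: a detector returns a cry probability in [0, 1],
# a classifier returns a label for the same audio chunk.
Detector = Callable[[Sequence[float]], float]
Classifier = Callable[[Sequence[float]], str]


def classify_chunk(
    chunk: Sequence[float],
    detector: Detector,
    classifiers: Dict[str, Classifier],
    gate: bool = True,
    threshold: float = 0.5,  # assumed cutoff, not taken from the project
) -> Optional[Dict[str, str]]:
    """Run the classifiers only when the detector reports a cry."""
    if gate and detector(chunk) < threshold:
        return None  # gate closed: skip the slower models
    return {name: clf(chunk) for name, clf in classifiers.items()}


# Stub models standing in for the real ones.
def loud_detector(chunk: Sequence[float]) -> float:
    return 1.0 if max(abs(s) for s in chunk) > 0.1 else 0.0


stubs: Dict[str, Classifier] = {
    "svc": lambda c: "discomfort",
    "kibalama": lambda c: "tired",
}

quiet = [0.0, 0.01, -0.02]
loud = [0.0, 0.6, -0.4]

print(classify_chunk(quiet, loud_detector, stubs))              # gate closed
print(classify_chunk(loud, loud_detector, stubs))               # gate open
print(classify_chunk(quiet, loud_detector, stubs, gate=False))  # gating disabled
```

Passing `gate=False` mirrors the idea behind the `--no-yamnet-gate` CLI flag: every chunk reaches the classifiers regardless of the detector's output.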

Web app (Gradio)

cd cry-eval
uv sync
uv run python app.py

Open http://localhost:7860, then record audio from your mic or upload a file.

Deploy for free on HuggingFace Spaces

  1. Go to huggingface.co/new-space
  2. Select Gradio → Blank, CPU Basic (free), Public visibility
  3. Create the Space, then push:
    cp README.md README_GITHUB.md
    cp README_HF.md README.md
    git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/cry-eval
    git add -A && git commit -m "Configure for HF Spaces"
    git push hf main
    
  4. The Space builds and deploys automatically (about 5 minutes for the first build)

CLI (terminal)

# Run with mic input
uv run python main.py

# Run with an audio file
uv run python main.py --file path/to/cry.wav

# Select specific models
uv run python main.py --models svc,hubert,kibalama

# Disable YAMNet gating
uv run python main.py --no-yamnet-gate

# Save predictions to JSONL
uv run python main.py --save-log results.jsonl
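
The flags above could be wired up with `argparse` roughly as follows. This is a sketch under assumptions: only the flag names come from the commands above, while the defaults, help strings, and the helpers `build_parser`, `parse_models`, and `to_jsonl_line` are hypothetical and may differ from what `main.py` actually does.

```python
import argparse
import json
from typing import Dict, List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the four flags shown in the CLI examples."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--file", default=None,
                        help="audio file to classify; omit to use the mic")
    parser.add_argument("--models", default=None,
                        help="comma-separated model keys, e.g. svc,hubert,kibalama")
    parser.add_argument("--no-yamnet-gate", action="store_true",
                        help="run the classifiers even when no cry is detected")
    parser.add_argument("--save-log", default=None,
                        help="append predictions to this JSONL file")
    return parser


def parse_models(value: Optional[str]) -> Optional[List[str]]:
    """Split 'svc,hubert' into ['svc', 'hubert']; None means all models."""
    if value is None:
        return None
    return [m.strip() for m in value.split(",") if m.strip()]


def to_jsonl_line(record: Dict[str, object]) -> str:
    """Serialize one prediction as a single JSONL line (one JSON object per line)."""
    return json.dumps(record) + "\n"


args = build_parser().parse_args(["--models", "svc,hubert", "--no-yamnet-gate"])
print(parse_models(args.models), args.no_yamnet_gate, args.file)
print(to_jsonl_line({"model": "svc", "label": "tired"}), end="")
```

The record fields `model` and `label` are placeholders; the real log schema is not documented here.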

Requirements

  • Python ≥ 3.11
  • A working microphone (for live mode)
  • ~1 GB RAM for transformer models

Model weights are auto-downloaded on first run into HuggingFace/TF Hub caches.

Project structure

cry-eval/
├── pyproject.toml
├── requirements.txt       # for HF Spaces / pip deployments
├── README.md
├── README_HF.md           # HuggingFace Spaces metadata
├── app.py                 # Gradio web UI
├── main.py                # CLI entrypoint
├── models/
│   ├── base.py           # abstract CryClassifier + CryPrediction
│   ├── foduucom_svc.py   # sklearn SVC
│   ├── wiam_wav2vec2.py  # DistilHuBERT fine-tune
│   ├── kibalama.py       # Wav2Vec2 9-class fine-tune
│   ├── yamnet.py         # YAMNet binary detector
│   └── ensemble.py       # orchestrates all models
├── audio/
│   ├── capture.py        # MicCapture + FileCapture
│   └── preprocess.py     # MFCC, mel, resample, RMS
├── display/
│   └── table.py          # Rich live table renderer
└── weights/              # auto-downloaded (gitignored)
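
The tree describes `models/base.py` as holding an abstract `CryClassifier` and a `CryPrediction`. A minimal sketch of what that pair might look like is given below. Only the two class names come from the tree; the fields, the `predict` signature, and the `LoudnessStub` example model are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class CryPrediction:
    # Hypothetical fields; the real dataclass may carry different ones.
    model: str
    label: str
    confidence: float
    scores: Dict[str, float]


class CryClassifier(ABC):
    """Common interface so an ensemble can treat every model the same way."""

    name: str = "base"

    @abstractmethod
    def predict(self, samples: Sequence[float], sample_rate: int) -> CryPrediction:
        """Classify one chunk of mono audio samples."""


class LoudnessStub(CryClassifier):
    """Toy model: treats loud audio as a cry, using RMS energy only."""

    name = "loudness-stub"

    def predict(self, samples: Sequence[float], sample_rate: int) -> CryPrediction:
        # Root-mean-square amplitude of the chunk (0.0 for empty input).
        rms = (sum(s * s for s in samples) / len(samples)) ** 0.5 if samples else 0.0
        p_cry = min(1.0, rms * 2)  # arbitrary scaling, for the demo only
        scores = {"cry": p_cry, "no_cry": 1.0 - p_cry}
        label = max(scores, key=scores.get)
        return CryPrediction(self.name, label, scores[label], scores)


pred = LoudnessStub().predict([0.5, -0.5, 0.5, -0.5], sample_rate=16000)
print(pred.label, pred.confidence)
```

Keeping the interface this small means a new model only needs to implement `predict` to plug into whatever orchestration `ensemble.py` performs.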