# 🍼 TotTalk Cry Eval

Real-time multi-model baby cry classification tool. Available as a **CLI** (terminal with live mic) and a **Gradio web app** (browser-based, deployable for free).
## Models

| # | Name | Type | Source | Speed |
|---|------|------|--------|-------|
| 1 | **foduucom-SVC** | sklearn SVC, 194-dim MFCC features | [HuggingFace](https://huggingface.co/foduucom/baby-cry-classification) | < 1 ms |
| 2 | **DistilHuBERT** | DistilHuBERT fine-tune (5 classes) | [HuggingFace](https://huggingface.co/AmeerHesham/distilhubert-finetuned-baby_cry) | ~35 ms |
| 3 | **Kibalama-9c** | Wav2Vec2 fine-tune (9 classes incl. discomfort, tired, cold/hot) | [HuggingFace](https://huggingface.co/Kibalama/baby_cry_classification_model) | ~90 ms |
| 4 | **YAMNet-detector** | TF Hub YAMNet (binary cry gate) | [TF Hub](https://tfhub.dev/google/yamnet/1) | < 10 ms |
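
The table describes YAMNet as a binary cry gate in front of the slower classifiers. A minimal sketch of how such a gate could work, assuming the detector returns a cry probability in [0, 1] and each classifier returns a label-to-score mapping. The function name `gated_predict`, the callable signatures, and the `threshold` of 0.5 are illustrative assumptions, not the project's actual API:

```python
from typing import Callable, Dict, Optional, Sequence

# Hypothetical signatures; the real classes in models/ may look different.
Detector = Callable[[Sequence[float]], float]               # audio -> cry probability
Classifier = Callable[[Sequence[float]], Dict[str, float]]  # audio -> label scores


def gated_predict(
    audio: Sequence[float],
    detector: Detector,
    classifiers: Dict[str, Classifier],
    threshold: float = 0.5,  # assumed cut-off, not taken from the project
) -> Optional[Dict[str, str]]:
    """Run the classifiers only when the detector considers the audio a cry."""
    if detector(audio) < threshold:
        return None  # gate closed: skip the slower models entirely

    # Gate open: collect each model's top-scoring label.
    results: Dict[str, str] = {}
    for name, classify in classifiers.items():
        scores = classify(audio)
        results[name] = max(scores, key=scores.get)
    return results
```

With a gate like this, a cheap detector call can spare the transformer models from running on silence or background noise. Disabling the gate would presumably amount to skipping the threshold check.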
## Web app (Gradio)

```bash
cd cry-eval
uv sync
uv run python app.py
```

Open `http://localhost:7860`, then record audio from your mic or upload a file.
### Deploy for free on HuggingFace Spaces

1. Go to [huggingface.co/new-space](https://huggingface.co/new-space).
2. Select **Gradio → Blank**, **CPU Basic** (free), and Public visibility.
3. Create the Space, then push:

```bash
# Keep a copy of the GitHub README, then use the Spaces README as README.md
cp README.md README_GITHUB.md
cp README_HF.md README.md
git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/cry-eval
git add -A && git commit -m "Configure for HF Spaces"
git push hf main
```

4. The Space deploys automatically (the first build takes about 5 minutes).
## CLI (terminal)

```bash
# Run with mic input
uv run python main.py

# Run with an audio file
uv run python main.py --file path/to/cry.wav

# Select specific models
uv run python main.py --models svc,hubert,kibalama

# Disable YAMNet gating
uv run python main.py --no-yamnet-gate

# Save predictions to JSONL
uv run python main.py --save-log results.jsonl
```
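
`--save-log` writes predictions as JSONL, meaning one JSON object per line. The record schema is not documented here, so the sketch below only assumes that each non-empty line is a JSON object. The `label` field used for counting is a hypothetical name chosen for illustration; check a real log file for the actual keys.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Dict, Iterator


def read_jsonl(path: str) -> Iterator[Dict]:
    """Yield one parsed JSON object per non-empty line of the file."""
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_labels(path: str, field: str = "label") -> Counter:
    """Count the values of `field`; the default field name is an assumption."""
    return Counter(rec[field] for rec in read_jsonl(path) if field in rec)
```

For example, `count_labels("results.jsonl")` would tally how often each label appears, provided the records really carry a `label` key. Records without that key are skipped.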
## Requirements

- Python ≥ 3.11
- A working microphone (for live mode)
- ~1 GB RAM for transformer models

Model weights are auto-downloaded on first run into HuggingFace/TF Hub caches.
## Project structure

```
cry-eval/
├── pyproject.toml
├── requirements.txt        # for HF Spaces / pip deployments
├── README.md
├── README_HF.md            # HuggingFace Spaces metadata
├── app.py                  # Gradio web UI
├── main.py                 # CLI entrypoint
├── models/
│   ├── base.py             # abstract CryClassifier + CryPrediction
│   ├── foduucom_svc.py     # sklearn SVC
│   ├── wiam_wav2vec2.py    # DistilHuBERT fine-tune
│   ├── kibalama.py         # Wav2Vec2 9-class fine-tune
│   ├── yamnet.py           # YAMNet binary detector
│   └── ensemble.py         # orchestrates all models
├── audio/
│   ├── capture.py          # MicCapture + FileCapture
│   └── preprocess.py       # MFCC, mel, resample, RMS
├── display/
│   └── table.py            # Rich live table renderer
└── weights/                # auto-downloaded (gitignored)
```
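
The tree names an abstract `CryClassifier` and a `CryPrediction` in `models/base.py`. One plausible shape for that pair, sketched with the standard library only. The fields (`model`, `label`, `confidence`, `latency_ms`), the `predict` and `_classify` signatures, the 16 kHz default, and the `ConstantClassifier` stand-in are all hypothetical and may not match the real module:

```python
import time
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class CryPrediction:
    """One model's verdict on one audio window (assumed fields)."""
    model: str
    label: str
    confidence: float
    latency_ms: float


class CryClassifier(ABC):
    """Common interface so an ensemble can treat every model alike."""
    name: str = "base"

    @abstractmethod
    def _classify(self, audio: Sequence[float], sample_rate: int) -> Tuple[str, float]:
        """Return (label, confidence) for the given samples."""

    def predict(self, audio: Sequence[float], sample_rate: int = 16000) -> CryPrediction:
        # Time the model call so per-model latency can be reported.
        start = time.perf_counter()
        label, confidence = self._classify(audio, sample_rate)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        return CryPrediction(self.name, label, confidence, elapsed_ms)


class ConstantClassifier(CryClassifier):
    """Toy stand-in that always answers 'hungry'; for illustration only."""
    name = "constant"

    def _classify(self, audio: Sequence[float], sample_rate: int) -> Tuple[str, float]:
        return "hungry", 1.0
```

Keeping the timing in the base class means each concrete model only implements `_classify`, and every prediction carries a comparable latency figure.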