grungecoder committed
Commit 66c65bc · 1 Parent(s): ea2601f

Add Gradio web app + HuggingFace Spaces deployment config

Files changed (6)
  1. README.md +27 -4
  2. README_HF.md +11 -0
  3. app.py +204 -0
  4. pyproject.toml +1 -0
  5. requirements.txt +12 -0
  6. uv.lock +0 -0
README.md CHANGED
@@ -1,6 +1,6 @@
  # 🍼 TotTalk Cry Eval
 
- Real-time multi-model baby cry classification CLI tool. Captures mic audio (or reads a file), runs four open-source models simultaneously on 1-second windows, and displays a live comparison table in the terminal.
+ Real-time multi-model baby cry classification tool. Available as a **CLI** (terminal with live mic) and a **Gradio web app** (browser-based, deployable for free).
 
  ## Models
 
@@ -11,13 +11,33 @@ Real-time multi-model baby cry classification CLI tool. Captures mic audio (or r
  | 3 | **Kibalama-9c** | Wav2Vec2 fine-tune (9 classes incl. discomfort, tired, cold/hot) | [HuggingFace](https://huggingface.co/Kibalama/baby_cry_classification_model) | ~90 ms |
  | 4 | **YAMNet-detector** | TF Hub YAMNet (binary cry gate) | [TF Hub](https://tfhub.dev/google/yamnet/1) | < 10 ms |
 
- ## Quick start
+ ## Web app (Gradio)
 
  ```bash
- # Install dependencies (using uv)
  cd cry-eval
  uv sync
+ uv run python app.py
+ ```
+
+ Open `http://localhost:7860` — record audio from your mic or upload a file.
+
+ ### Deploy for free on HuggingFace Spaces
 
+ 1. Go to [huggingface.co/new-space](https://huggingface.co/new-space)
+ 2. Select **Gradio → Blank**, **CPU Basic** (free), Public visibility
+ 3. Create the Space, then push:
+ ```bash
+ cp README.md README_GITHUB.md
+ cp README_HF.md README.md
+ git remote add hf https://huggingface.co/spaces/YOUR_USERNAME/cry-eval
+ git add -A && git commit -m "Configure for HF Spaces"
+ git push hf main
+ ```
+ 4. Deploys automatically (~5 min first build)
+
+ ## CLI (terminal)
+
+ ```bash
  # Run with mic input
  uv run python main.py
 
@@ -47,8 +67,11 @@ Model weights are auto-downloaded on first run into HuggingFace/TF Hub caches.
  ```
  cry-eval/
  ├── pyproject.toml
+ ├── requirements.txt # for HF Spaces / pip deployments
  ├── README.md
- ├── main.py # CLI entrypoint
+ ├── README_HF.md # HuggingFace Spaces metadata
+ ├── app.py # Gradio web UI
+ ├── main.py # CLI entrypoint
  ├── models/
  │ ├── base.py # abstract CryClassifier + CryPrediction
  │ ├── foduucom_svc.py # sklearn SVC
README_HF.md ADDED
@@ -0,0 +1,11 @@
+ ---
+ title: TotTalk Cry Classifier
+ emoji: 👶
+ colorFrom: gray
+ colorTo: gray
+ sdk: gradio
+ sdk_version: "5.23.0"
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
app.py ADDED
@@ -0,0 +1,204 @@
+ """TotTalk Cry Eval — Gradio web UI."""
+
+ from __future__ import annotations
+
+ from collections import Counter
+
+ import gradio as gr
+ import librosa
+ import numpy as np
+
+ from audio.preprocess import SAMPLE_RATE, is_silent, normalize_audio, resample
+ from models.base import LABEL_EMOJI, LABEL_MEANING, CryPrediction
+ from models.ensemble import EnsembleClassifier, compute_consensus
+
+ # ── Load models at startup (cached in process) ───────────────────────────────
+ ensemble = EnsembleClassifier(use_yamnet_gate=True)
+ ensemble.load_all()
+
+
+ # ── Core analysis function ────────────────────────────────────────────────────
+ def analyze(audio_tuple: tuple[int, np.ndarray] | None) -> str:
+     """Accept audio from Gradio, run ensemble, return styled HTML."""
+     if audio_tuple is None:
+         return _wrap("Upload or record audio to get started.")
+
+     sr, data = audio_tuple
+
+     # Gradio gives int16 or float — normalize to float32
+     if data.dtype != np.float32:
+         data = data.astype(np.float32) / max(np.abs(data).max(), 1)
+
+     # Mono
+     if data.ndim > 1:
+         data = data.mean(axis=1)
+
+     # Resample to 16 kHz
+     if sr != SAMPLE_RATE:
+         data = resample(data, sr, SAMPLE_RATE)
+
+     # Pick the loudest 1-second window
+     window_len = SAMPLE_RATE
+     hop = window_len // 2
+     best_window = None
+     best_rms = 0.0
+
+     for start in range(0, len(data) - window_len + 1, hop):
+         chunk = data[start : start + window_len]
+         rms = float(np.sqrt(np.mean(chunk**2)))
+         if rms > best_rms:
+             best_rms = rms
+             best_window = chunk
+
+     if best_window is None or is_silent(best_window):
+         return _card("Result", "No cry detected",
+                      "The audio seems silent or doesn't contain a baby cry.")
+
+     best_window = normalize_audio(best_window)
+     predictions = ensemble.predict_all(best_window, SAMPLE_RATE)
+
+     return _render_results(predictions)
+
+
+ # ── HTML renderers ────────────────────────────────────────────────────────────
+ def _render_results(predictions: list[CryPrediction]) -> str:
+     """Build the full results HTML."""
+     parts: list[str] = []
+
+     # Consensus
+     consensus_text = compute_consensus(predictions)
+     if consensus_text:
+         valid = [p.label for p in predictions
+                  if p.model_name != "YAMNet-detector"
+                  and not p.error
+                  and p.label not in ("no_cry", "timeout", "error")]
+         winning = Counter(valid).most_common(1)[0][0] if valid else ""
+         advice = LABEL_MEANING.get(winning, "")
+         parts.append(_card("Consensus", consensus_text, advice))
+
+     # Model breakdown
+     parts.append('<div style="margin-top:1.25rem; font-size:0.7rem; '
+                  'text-transform:uppercase; letter-spacing:0.08em; '
+                  'color:#666; font-weight:500;">Model breakdown</div>')
+
+     for pred in predictions:
+         if pred.label == "no_cry" and pred.confidence == 0.0:
+             continue
+         emoji = LABEL_EMOJI.get(pred.label, "")
+         label = pred.label.replace("_", " ").title()
+         pct = int(pred.confidence * 100)
+         parts.append(
+             f'<div style="background:#111; border:1px solid #222; '
+             f'border-radius:12px; padding:1.1rem 1.4rem; margin-top:0.6rem;">'
+             f'<div style="font-size:0.7rem; text-transform:uppercase; '
+             f'letter-spacing:0.08em; color:#666; font-weight:500;">{pred.model_name}</div>'
+             f'<div style="font-size:1.3rem; font-weight:600; color:#fff; '
+             f'margin-top:0.15rem;">{emoji} {label}</div>'
+             f'<div style="font-size:0.8rem; color:#666; margin-top:0.1rem;">'
+             f'{pct}% confidence · {pred.latency_ms:.0f} ms</div>'
+             f'<div style="background:#1a1a1a; border-radius:4px; height:6px; '
+             f'margin-top:0.4rem;">'
+             f'<div style="background:#fff; border-radius:4px; height:6px; '
+             f'width:{pct}%;"></div></div></div>'
+         )
+
+     return "\n".join(parts)
+
+
+ def _card(title: str, main: str, sub: str = "") -> str:
+     """A centered highlight card."""
+     sub_html = f'<div style="font-size:0.85rem; color:#666; margin-top:0.5rem; font-style:italic;">{sub}</div>' if sub else ""
+     return (
+         f'<div style="background:#111; border:1px solid #333; border-radius:12px; '
+         f'padding:1.5rem; text-align:center;">'
+         f'<div style="font-size:0.7rem; text-transform:uppercase; '
+         f'letter-spacing:0.1em; color:#666;">{title}</div>'
+         f'<div style="font-size:1.7rem; font-weight:300; color:#fff; '
+         f'margin-top:0.25rem;">{main}</div>'
+         f'{sub_html}</div>'
+     )
+
+
+ def _wrap(msg: str) -> str:
+     return f'<div style="text-align:center; color:#666; padding:2rem 0;">{msg}</div>'
+
+
+ # ── Custom CSS for dark monochrome look ───────────────────────────────────────
+ CUSTOM_CSS = """
+ @import url('https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap');
+ body, .gradio-container { font-family: 'Inter', sans-serif !important; }
+ .gradio-container { max-width: 720px !important; margin: auto !important; }
+ footer { display: none !important; }
+ h1 { font-weight: 300 !important; letter-spacing: -0.03em !important; }
+ """
+
+ # ── Theme ─────────────────────────────────────────────────────────────────────
+ THEME = gr.themes.Base(
+     primary_hue=gr.themes.colors.gray,
+     secondary_hue=gr.themes.colors.gray,
+     neutral_hue=gr.themes.colors.gray,
+     font=gr.themes.GoogleFont("Inter"),
+ ).set(
+     body_background_fill="#0a0a0a",
+     body_background_fill_dark="#0a0a0a",
+     block_background_fill="#111111",
+     block_background_fill_dark="#111111",
+     block_border_color="#222222",
+     block_border_color_dark="#222222",
+     block_label_text_color="#666666",
+     block_label_text_color_dark="#666666",
+     block_title_text_color="#e0e0e0",
+     block_title_text_color_dark="#e0e0e0",
+     body_text_color="#e0e0e0",
+     body_text_color_dark="#e0e0e0",
+     body_text_color_subdued="#666666",
+     body_text_color_subdued_dark="#666666",
+     button_primary_background_fill="transparent",
+     button_primary_background_fill_dark="transparent",
+     button_primary_border_color="#222222",
+     button_primary_border_color_dark="#222222",
+     button_primary_text_color="#e0e0e0",
+     button_primary_text_color_dark="#e0e0e0",
+     input_background_fill="#111111",
+     input_background_fill_dark="#111111",
+     input_border_color="#222222",
+     input_border_color_dark="#222222",
+ )
+
+ # ── App ───────────────────────────────────────────────────────────────────────
+ with gr.Blocks(title="TotTalk · Cry Classifier") as app:
+
+     gr.Markdown("# 👶 TotTalk\nUpload or record a baby cry and get an instant multi-model analysis.")
+
+     with gr.Tabs():
+         with gr.TabItem("🎙 Record"):
+             mic_input = gr.Audio(
+                 sources=["microphone"],
+                 type="numpy",
+                 label="Record from mic",
+             )
+             mic_btn = gr.Button("Analyze recording", variant="primary", size="lg")
+         with gr.TabItem("📁 Upload file"):
+             file_input = gr.Audio(
+                 sources=["upload"],
+                 type="numpy",
+                 label="Upload WAV / MP3 / FLAC",
+             )
+             file_btn = gr.Button("Analyze file", variant="primary", size="lg")
+
+     output = gr.HTML(
+         value=_wrap("Upload or record audio above, then click Analyze."),
+         label="Results",
+     )
+
+     mic_btn.click(fn=analyze, inputs=mic_input, outputs=output)
+     file_btn.click(fn=analyze, inputs=file_input, outputs=output)
+
+     gr.Markdown(
+         '<p style="text-align:center; font-size:0.75rem; color:#444; margin-top:2rem;">'
+         "TotTalk Cry Eval · Open-source multi-model comparison tool · "
+         "Models run server-side — your audio is not stored.</p>"
+     )
+
+ if __name__ == "__main__":
+     app.launch(theme=THEME, css=CUSTOM_CSS)
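
Review note: the window selection in `analyze` can be exercised without Gradio or the models. The sketch below is a minimal stand-alone version, written against plain Python sequences so it runs without NumPy; the arithmetic follows the loop in `app.py`. The names `loudest_window` and `rms_level` are hypothetical helpers, not functions in the repository, and the 16 kHz rate is an assumption taken from the "Resample to 16 kHz" comment rather than from `audio.preprocess`.

```python
import math
from typing import Optional, Sequence, Tuple

SAMPLE_RATE = 16_000  # assumed value; the diff only says "Resample to 16 kHz"


def rms_level(chunk: Sequence[float]) -> float:
    """Root-mean-square level of one window."""
    return math.sqrt(sum(x * x for x in chunk) / len(chunk))


def loudest_window(
    data: Sequence[float], window_len: int = SAMPLE_RATE
) -> Tuple[Optional[Sequence[float]], float]:
    """Return (window, rms) for the loudest window, or (None, 0.0) if none qualifies.

    Hypothetical helper: windows overlap by 50% (hop = window_len // 2),
    and a window is only kept if its RMS is strictly greater than 0.0.
    """
    hop = window_len // 2
    best_window, best_rms = None, 0.0
    for start in range(0, len(data) - window_len + 1, hop):
        chunk = data[start : start + window_len]
        level = rms_level(chunk)
        if level > best_rms:
            best_window, best_rms = chunk, level
    return best_window, best_rms


# Two seconds of audio: a quiet first second followed by a louder second.
quiet = [0.01] * SAMPLE_RATE
loud = [0.5] * SAMPLE_RATE
window, level = loudest_window(quiet + loud)

# A clip shorter than one window yields no candidate at all.
short_window, short_level = loudest_window([0.3] * (SAMPLE_RATE // 2))
```

Because the range is empty for clips shorter than one window, `analyze` falls through to the "No cry detected" card for sub-second recordings, even when they are loud. Padding short clips to one second before the loop would be one way to handle that case, if it matters for the web UI.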
pyproject.toml CHANGED
@@ -18,6 +18,7 @@ dependencies = [
  "rich>=13.7.0",
  "click>=8.1.0",
  "soundfile>=0.12.0",
+ "gradio>=4.0.0",
  ]
 
  [project.scripts]
requirements.txt ADDED
@@ -0,0 +1,12 @@
+ numpy>=1.24.0
+ librosa>=0.10.0
+ scikit-learn>=1.3.0
+ joblib>=1.3.0
+ torch>=2.1.0
+ torchaudio>=2.1.0
+ transformers>=4.38.0
+ tensorflow>=2.15.0
+ tensorflow-hub>=0.15.0
+ huggingface-hub>=0.20.0
+ soundfile>=0.12.0
+ gradio>=4.0.0
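
Review note: `README_HF.md` pins `sdk_version: "5.23.0"`, while `requirements.txt` and `pyproject.toml` only require `gradio>=4.0.0`, so a local install may resolve to a different Gradio release than the one the Space runs. The sketch below is a rough consistency check, assuming plain dotted numeric versions and a single `>=` bound. `parse_version` and `satisfies_lower_bound` are hypothetical helpers; a real project would more likely rely on the `packaging` library, which handles pre-releases and compound specifiers.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '5.23.0' into (5, 23, 0). Assumes purely numeric components."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies_lower_bound(requirement: str, pinned: str) -> bool:
    """Check a pinned version against a 'name>=X.Y.Z' requirement line.

    Hypothetical helper: only the '>=' operator is understood.
    """
    _name, sep, minimum = requirement.partition(">=")
    if not sep:
        raise ValueError(f"unsupported requirement: {requirement!r}")
    # Tuples compare component by component, so (5, 23, 0) > (5, 9, 0),
    # which a plain string comparison would get wrong.
    return parse_version(pinned) >= parse_version(minimum)


# Values taken from this commit: requirements.txt and README_HF.md.
pin_is_allowed = satisfies_lower_bound("gradio>=4.0.0", "5.23.0")
```

The check only confirms that the pinned SDK version falls inside the declared range. It does not show that `app.py` behaves the same on every Gradio release in that range.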
uv.lock CHANGED
The diff for this file is too large to render. See raw diff