testingfaces committed
Commit 933850f · verified · 1 parent: ffe4c21

Upload 3 files

Files changed (3)
  1. README.md +51 -6
  2. app.py +272 -0
  3. requirements.txt +9 -0
README.md CHANGED
@@ -1,14 +1,59 @@
 ---
-title: Clearwave Ai
-emoji: 🌖
+title: ClearWave AI
+emoji: 🎵
 colorFrom: blue
-colorTo: red
+colorTo: green
 sdk: gradio
-sdk_version: 6.8.0
+sdk_version: "4.0.0"
 app_file: app.py
 pinned: false
 license: mit
-short_description: 'AI audio pipeline: denoise, transcribe, and translate audio '
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# 🎵 ClearWave AI
+
+**Professional 3-Department Audio Processing Pipeline**
+Runs 100% free on Hugging Face ZeroGPU (A10G · 24 GB VRAM)
+
+## What It Does
+
+Upload any audio file and ClearWave AI runs it through three AI departments:
+
+| Dept | Model | What it does |
+|------|-------|--------------|
+| 🎙️ Denoiser | DeepFilterNet3 | Removes background noise, EBU R128 normalisation |
+| 📝 Transcriber | Groq Whisper large-v3 | Speech-to-text, 10-20x faster than local Whisper |
+| 🌐 Translator | NLLB-200-distilled-600M | Offline translation, 200 languages |
+
+**Example:**
+```
+Input          : English audio "Hello this is a test"
+Original (EN)  : Hello this is a test
+Translated (TE): హలో ఇది ఒక పరీక్ష
+Total time     : ~6 seconds
+```
+
+## Setting Your Groq API Key
+
+1. Get a free key at [console.groq.com](https://console.groq.com)
+2. In your Space: **Settings → Variables and secrets → New secret**
+3. Name: `GROQ_API_KEY`, Value: your key (`gsk_...`)
+4. Save; the Space restarts automatically
+
+Without a key, the app falls back to a local Whisper small model (slower, but it still works).
+
+## How to Use
+
+1. Upload any audio file (MP3, WAV, AAC, OGG, M4A, FLAC, OPUS, ...)
+2. Set Input Language (or leave it on Auto Detect)
+3. Set Output Language
+4. Click **Process Audio**
+5. View results in the Text Results, Clean Audio, and Timings tabs
+
+## Supported Languages
+
+English · Telugu · Hindi · Tamil · Kannada in the UI (NLLB-200 itself covers 200 languages)
+
+## Cost
+
+**$0**: Hugging Face ZeroGPU + Groq free tier (14,400 s of audio/day)
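The Denoiser row in the README promises EBU R128 normalisation. Whatever the `services/denoiser` module does internally (it is not part of this commit), the core of loudness normalisation is simple arithmetic: measure the clip's integrated loudness in LUFS, then apply one constant gain that moves it to the -23 LUFS target set by EBU R128. A minimal sketch of just that step, assuming the loudness has already been measured (for instance with pyloudnorm, which requirements.txt lists); all function names here are hypothetical:

```python
"""Hypothetical sketch: gain needed to bring a clip to the EBU R128 target."""

EBU_R128_TARGET_LUFS = -23.0  # programme loudness target defined by EBU R128


def loudness_gain_db(measured_lufs: float, target_lufs: float = EBU_R128_TARGET_LUFS) -> float:
    """Return the gain in dB that moves `measured_lufs` to `target_lufs`."""
    # LUFS is a logarithmic scale, so the required gain is a plain difference.
    return target_lufs - measured_lufs


def db_to_linear(gain_db: float) -> float:
    """Convert an amplitude gain in dB to a linear sample multiplier."""
    return 10.0 ** (gain_db / 20.0)


def normalise(samples: list[float], measured_lufs: float) -> list[float]:
    """Scale every sample by the same linear factor (no limiting or clipping guard)."""
    factor = db_to_linear(loudness_gain_db(measured_lufs))
    return [s * factor for s in samples]


if __name__ == "__main__":
    # A quiet clip measured at -30 LUFS needs +7 dB, i.e. roughly x2.24 amplitude.
    print(loudness_gain_db(-30.0))       # 7.0
    print(round(db_to_linear(7.0), 4))   # 2.2387
```

A plain gain can push peaks past full scale, so a production normaliser would normally also check true peak (EBU R128 pairs the loudness target with a maximum true-peak level) and limit or attenuate accordingly.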
app.py ADDED
@@ -0,0 +1,272 @@
+"""
+ClearWave AI - Cloud Audio Processing Pipeline
+Deployed on Hugging Face Spaces with ZeroGPU
+"""
+
+import gradio as gr
+import spaces
+import os
+import time
+import tempfile
+import shutil
+
+from services.denoiser import Denoiser
+from services.transcriber import Transcriber
+from services.translator import Translator
+
+# ─────────────────────────────────────────────
+# Init all 3 departments ONCE at startup
+# ─────────────────────────────────────────────
+print("🚀 ClearWave AI starting up...")
+denoiser = Denoiser()
+transcriber = Transcriber()
+translator = Translator()
+print("✅ All 3 departments ready!")
+
+# ─────────────────────────────────────────────
+# Language mappings
+# ─────────────────────────────────────────────
+INPUT_LANG_MAP = {
+    "Auto Detect": "auto",
+    "English": "en",
+    "Telugu": "te",
+    "Hindi": "hi",
+    "Tamil": "ta",
+    "Kannada": "kn",
+}
+
+OUTPUT_LANG_MAP = {
+    "Telugu": "te",
+    "Hindi": "hi",
+    "Tamil": "ta",
+    "English": "en",
+    "Kannada": "kn",
+}
+
+LANG_BADGES = {
+    "en": "🇬🇧 English",
+    "te": "🇮🇳 Telugu",
+    "hi": "🇮🇳 Hindi",
+    "ta": "🇮🇳 Tamil",
+    "kn": "🇮🇳 Kannada",
+    "auto": "🔍 Auto-detected",
+}
+
+# ─────────────────────────────────────────────
+# Core pipeline
+# ─────────────────────────────────────────────
+@spaces.GPU
+def process_audio(audio_path, input_lang_label, output_lang_label, progress=gr.Progress()):
+    if audio_path is None:
+        return None, "⚠️ Please upload an audio file.", "", "", "❌ No audio uploaded"
+
+    input_lang = INPUT_LANG_MAP.get(input_lang_label, "auto")
+    output_lang = OUTPUT_LANG_MAP.get(output_lang_label, "te")
+
+    temp_dir = tempfile.mkdtemp(prefix="clearwave_")
+    timings = {}
+    total_start = time.time()
+
+    try:
+        # ─── Dept 1: Denoise ─────────────────────────
+        progress(0.05, desc="🎙️ Dept 1 — Denoising audio with DeepFilterNet3…")
+        t0 = time.time()
+        denoised_path = denoiser.process(audio_path, temp_dir)
+        timings["denoise"] = time.time() - t0
+        progress(0.40, desc=f"✅ Denoised in {timings['denoise']:.1f}s")
+
+        # ─── Dept 2: Transcribe ───────────────────────
+        progress(0.45, desc="📝 Dept 2 — Transcribing with Groq Whisper large-v3…")
+        t0 = time.time()
+        transcript, detected_lang, tx_method = transcriber.transcribe(
+            denoised_path, language=input_lang
+        )
+        timings["transcribe"] = time.time() - t0
+        progress(0.75, desc=f"✅ Transcribed in {timings['transcribe']:.1f}s [{tx_method}]")
+
+        # ─── Dept 3: Translate ────────────────────────
+        progress(0.80, desc="🌐 Dept 3 — Translating with NLLB-200…")
+        t0 = time.time()
+
+        effective_src = detected_lang if input_lang == "auto" else input_lang
+        if effective_src == output_lang:
+            translated = transcript
+            tr_method = "skipped (same language)"
+        else:
+            translated, tr_method = translator.translate(
+                transcript, src_lang=effective_src, tgt_lang=output_lang
+            )
+        timings["translate"] = time.time() - t0
+        progress(0.95, desc=f"✅ Translated in {timings['translate']:.1f}s [{tr_method}]")
+
+        total_time = time.time() - total_start
+
+        # ─── Format outputs ───────────────────────────
+        src_badge = LANG_BADGES.get(effective_src, "🔍 Unknown")
+        tgt_badge = LANG_BADGES.get(output_lang, "🌐")
+
+        transcript_md = f"**{src_badge}**\n\n{transcript}"
+        translated_md = f"**{tgt_badge}**\n\n{translated}"
+
+        timing_md = (
+            f"### ⏱️ Processing Times\n\n"
+            f"| Department | Time | Method |\n"
+            f"|---|---|---|\n"
+            f"| 🎙️ Denoiser (Dept 1) | `{timings['denoise']:.1f}s` | DeepFilterNet3 |\n"
+            f"| 📝 Transcriber (Dept 2) | `{timings['transcribe']:.1f}s` | {tx_method} |\n"
+            f"| 🌐 Translator (Dept 3) | `{timings['translate']:.1f}s` | {tr_method} |\n"
+            f"| **⚡ Total** | **`{total_time:.1f}s`** | 3-dept pipeline |\n\n"
+            f"> Running on Hugging Face ZeroGPU (A10G 24GB) — 100% free"
+        )
+
+        progress(1.0, desc=f"🎉 Complete! {total_time:.1f}s total")
+
+        # Copy denoised file to stable output path
+        out_audio = os.path.join(temp_dir, "clearwave_denoised.wav")
+        shutil.copy(denoised_path, out_audio)
+
+        return (
+            out_audio,
+            transcript_md,
+            translated_md,
+            timing_md,
+            f"✅ Pipeline complete in {total_time:.1f}s"
+        )
+
+    except Exception as e:
+        import traceback
+        err = traceback.format_exc()
+        print(f"[ClearWave] Pipeline error:\n{err}")
+        # Clean up temp on error
+        shutil.rmtree(temp_dir, ignore_errors=True)
+        return (
+            None,
+            f"❌ Error: {str(e)}",
+            "",
+            f"**Error details:**\n```\n{err}\n```",
+            f"❌ Failed — {str(e)}"
+        )
+
+
+# ─────────────────────────────────────────────
+# UI
+# ─────────────────────────────────────────────
+CSS = """
+body, .gradio-container { background:#0d1117 !important; color:#e6edf3 !important; }
+
+.header-wrap {
+    background: linear-gradient(135deg,#161b22,#1c2128);
+    border:1px solid #30363d; border-radius:12px;
+    padding:28px 32px; margin-bottom:18px; text-align:center;
+}
+.header-wrap h1 {
+    font-size:2.2em; font-weight:700; margin:0 0 6px;
+    background:linear-gradient(90deg,#58a6ff,#3fb950,#f78166);
+    -webkit-background-clip:text; -webkit-text-fill-color:transparent;
+}
+.header-wrap p { color:#8b949e; font-size:0.98em; margin:0; }
+
+.pipe-strip {
+    display:flex; gap:8px; justify-content:center; flex-wrap:wrap; margin-bottom:14px;
+}
+.dept-pill {
+    background:#21262d; border:1px solid #30363d;
+    border-radius:20px; padding:5px 14px;
+    font-size:0.82em; color:#8b949e;
+}
+
+.panel { background:#161b22 !important; border:1px solid #30363d !important; border-radius:10px !important; }
+
+footer { display:none !important; }
+"""
+
+with gr.Blocks(css=CSS, title="ClearWave AI", theme=gr.themes.Base()) as demo:
+
+    # Header
+    gr.HTML("""
+    <div class="header-wrap">
+        <h1>🎵 ClearWave AI</h1>
+        <p>Professional 3-Department Audio Processing Pipeline · ZeroGPU · 100% Free</p>
+    </div>
+    <div class="pipe-strip">
+        <span class="dept-pill">🎙️ Dept 1 · DeepFilterNet3 Denoiser</span>
+        <span class="dept-pill">📝 Dept 2 · Groq Whisper large-v3</span>
+        <span class="dept-pill">🌐 Dept 3 · NLLB-200 Translator</span>
+    </div>
+    """)
+
+    with gr.Row(equal_height=False):
+
+        # ── Left: Input controls ──────────────────────
+        with gr.Column(scale=1, min_width=280):
+            audio_in = gr.Audio(
+                label="🎤 Upload or Record Audio",
+                type="filepath",
+                sources=["upload", "microphone"],
+            )
+
+            with gr.Group():
+                input_lang = gr.Dropdown(
+                    label="Input Language",
+                    choices=list(INPUT_LANG_MAP.keys()),
+                    value="Auto Detect",
+                )
+                output_lang = gr.Dropdown(
+                    label="Output Language",
+                    choices=list(OUTPUT_LANG_MAP.keys()),
+                    value="Telugu",
+                )
+
+            run_btn = gr.Button("⚡ Process Audio", variant="primary", size="lg")
+            status_md = gr.Markdown("*Upload audio and press Process.*")
+
+        # ── Right: Results ────────────────────────────
+        with gr.Column(scale=2):
+            with gr.Tabs():
+                with gr.Tab("📝 Text Results"):
+                    with gr.Row():
+                        with gr.Column():
+                            gr.Markdown("#### Original Transcript")
+                            transcript_out = gr.Markdown("*Will appear here…*")
+                        with gr.Column():
+                            gr.Markdown("#### Translation")
+                            translation_out = gr.Markdown("*Will appear here…*")
+
+                with gr.Tab("🎵 Clean Audio"):
+                    audio_out = gr.Audio(
+                        label="Denoised Audio (download)",
+                        type="filepath",
+                        interactive=False,
+                    )
+                    gr.Markdown(
+                        "*Noise-cancelled with DeepFilterNet3, "
+                        "normalized to EBU R128 broadcast standard.*"
+                    )
+
+                with gr.Tab("⏱️ Timings"):
+                    timing_out = gr.Markdown("*Timings will appear after processing…*")
+
+    # Footer
+    gr.HTML("""
+    <div style="text-align:center;padding:16px;color:#484f58;font-size:0.8em;
+                border-top:1px solid #21262d;margin-top:16px;">
+        ClearWave AI · DeepFilterNet3 + Groq Whisper large-v3 + NLLB-200-distilled-600M ·
+        Hugging Face ZeroGPU (A10G 24GB)
+    </div>
+    """)
+
+    # Wire up
+    run_btn.click(
+        fn=process_audio,
+        inputs=[audio_in, input_lang, output_lang],
+        outputs=[audio_out, transcript_out, translation_out, timing_out, status_md],
+        show_progress=True,
+    )
+
+if __name__ == "__main__":
+    demo.launch(
+        server_name="0.0.0.0",
+        server_port=7860,
+        show_error=True,
+        max_file_size="100mb",
+    )
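`process_audio` picks the translation source in two steps: the dropdown label becomes an ISO 639-1 code via `INPUT_LANG_MAP`/`OUTPUT_LANG_MAP`, and `"auto"` is replaced by whatever language the transcriber reports; if source and target then match, NLLB is skipped. The standalone sketch below restates that logic without Gradio so it can be tested in isolation. NLLB-200 itself expects FLORES-200 codes such as `tel_Telu`, so the `NLLB_CODES` table and the `plan_translation`/`nllb_pair` helpers are assumptions about what the `Translator` service (not included in this commit) has to do internally, not its actual code:

```python
"""Hypothetical sketch of ClearWave's language-resolution step, minus Gradio."""
from typing import NamedTuple

INPUT_LANG_MAP = {"Auto Detect": "auto", "English": "en", "Telugu": "te",
                  "Hindi": "hi", "Tamil": "ta", "Kannada": "kn"}
OUTPUT_LANG_MAP = {"Telugu": "te", "Hindi": "hi", "Tamil": "ta",
                   "English": "en", "Kannada": "kn"}

# Assumed mapping from ISO 639-1 to the FLORES-200 codes NLLB-200 uses.
NLLB_CODES = {"en": "eng_Latn", "te": "tel_Telu", "hi": "hin_Deva",
              "ta": "tam_Taml", "kn": "kan_Knda"}


class TranslationPlan(NamedTuple):
    src: str                 # ISO 639-1 source code after auto-detection
    tgt: str                 # ISO 639-1 target code
    needs_translation: bool  # False when source and target match


def plan_translation(input_label: str, output_label: str, detected_lang: str) -> TranslationPlan:
    # Same defaults as process_audio: unknown labels fall back to "auto" / "te".
    input_lang = INPUT_LANG_MAP.get(input_label, "auto")
    output_lang = OUTPUT_LANG_MAP.get(output_label, "te")
    # "auto" defers to the language reported by the transcriber.
    src = detected_lang if input_lang == "auto" else input_lang
    return TranslationPlan(src, output_lang, src != output_lang)


def nllb_pair(plan: TranslationPlan) -> tuple[str, str]:
    """Return (src_lang, tgt_lang) in NLLB form; raises KeyError for unmapped codes."""
    return NLLB_CODES[plan.src], NLLB_CODES[plan.tgt]


if __name__ == "__main__":
    plan = plan_translation("Auto Detect", "Telugu", detected_lang="en")
    print(plan)             # TranslationPlan(src='en', tgt='te', needs_translation=True)
    print(nllb_pair(plan))  # ('eng_Latn', 'tel_Telu')
```

In this sketch a `KeyError` from `nllb_pair` signals that auto-detection returned a language outside the five the UI exposes; how the real `Translator` handles such codes is not visible in this commit.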
requirements.txt ADDED
@@ -0,0 +1,9 @@
+deepfilternet
+soundfile
+pyloudnorm
+groq
+faster-whisper
+sentencepiece
+sacremoses
+deep-translator
+gradio>=4.0.0
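requirements.txt pulls in both `groq` and `faster-whisper`, which matches the README's promise that transcription goes through Groq when `GROQ_API_KEY` is set and falls back to a local Whisper small model otherwise. The `Transcriber` service that makes this choice is not in this commit, so the following is only an assumed outline of the selection rule; the `choose_backend` function and its backend labels are hypothetical:

```python
"""Hypothetical sketch: choose a transcription backend from the environment."""
import os
from collections.abc import Mapping


def choose_backend(env: Mapping[str, str] = os.environ) -> str:
    """Return "groq" when a non-blank GROQ_API_KEY is present, else the local fallback.

    The labels are illustrative; they mirror the README, not the real Transcriber.
    """
    key = env.get("GROQ_API_KEY", "").strip()
    # A Space secret is exposed to the app as an ordinary environment variable.
    return "groq" if key else "local-whisper-small"


if __name__ == "__main__":
    print(choose_backend({"GROQ_API_KEY": "gsk_example"}))  # groq
    print(choose_backend({}))                                # local-whisper-small
```

Taking the environment as a parameter keeps the rule testable without touching the real process environment; a blank or whitespace-only secret is treated the same as a missing one.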