Ashish Kumar committed on
Commit 65fbf1d · 0 Parent(s):

Fix: Add @spaces.GPU function to suppress runtime error

Files changed (4)
  1. DEPLOYMENT.md +62 -0
  2. README.md +48 -0
  3. app.py +279 -0
  4. requirements.txt +6 -0
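
Review note: the commit message describes the fix as declaring a `@spaces.GPU` function. A minimal sketch of that optional-import pattern, assuming the `spaces` package is only present on Hugging Face Spaces; the names `gpu` and `gpu_placeholder` are hypothetical and are not the ones used in this commit. Unlike a bare `pass` in the `except` branch, the fallback decorator keeps the decorated function defined in both environments.

```python
try:
    import spaces  # provided on Hugging Face Spaces; usually absent elsewhere
    gpu = spaces.GPU
except ImportError:
    def gpu(func):
        """Fallback decorator: return the function unchanged when `spaces` is missing."""
        return func


@gpu
def gpu_placeholder():
    """Does no GPU work; exists only so a decorated function is present."""
    return "ok"


if __name__ == "__main__":
    print(gpu_placeholder())
```

Whether a placeholder like this is the right fix, rather than switching the Space to CPU hardware, depends on how the Space is configured.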
DEPLOYMENT.md ADDED
@@ -0,0 +1,62 @@
+ # Deploy to Hugging Face Space
+
+ ## Quick Deployment Steps
+
+ 1. **Go to your Space**: https://huggingface.co/spaces/ashishkblink/NuralVoice
+
+ 2. **Upload Files**:
+    - Click on "Files" tab
+    - Upload these files:
+      - `app.py` (main application)
+      - `requirements.txt` (dependencies)
+      - `README.md` (optional, but recommended)
+
+ 3. **Wait for Build**:
+    - Hugging Face will automatically:
+      - Install dependencies from `requirements.txt`
+      - Download your NuralVoiceSTT model
+      - Start the Gradio app
+    - First build takes ~5-10 minutes (model download)
+    - You'll see build logs in real-time
+
+ 4. **Test Your Playground**:
+    - Once built, click "App" tab
+    - Click the microphone button
+    - Allow microphone permissions
+    - Start speaking!
+
+ ## Files to Upload
+
+ Make sure these files are in your Space:
+
+ ```
+ hf_space/
+ ├── app.py              ← Main playground application
+ ├── requirements.txt    ← Python dependencies
+ └── README.md           ← Space description (optional)
+ ```
+
+ ## What the Playground Does
+
+ - ✅ Real-time microphone input
+ - ✅ Live transcription as you speak
+ - ✅ Beautiful, user-friendly interface
+ - ✅ Automatic model download from your HF repo
+ - ✅ Works directly in the browser
+
+ ## Troubleshooting
+
+ If the app doesn't work:
+ 1. Check build logs for errors
+ 2. Verify model repo ID is correct: `ashishkblink/NuralVoiceSTT`
+ 3. Make sure all files are uploaded
+ 4. Check that Gradio version is compatible
+
+ ## Customization
+
+ You can customize:
+ - Colors in the `custom_css` section
+ - Instructions text
+ - UI layout
+ - Model settings
+
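
Review note: the "Files to Upload" and "Make sure all files are uploaded" steps can be checked locally before uploading. A minimal sketch, assuming the files sit in a local folder named `hf_space/` as in the tree in DEPLOYMENT.md; the helper name `missing_space_files` is hypothetical and not part of this commit.

```python
from pathlib import Path

# File names taken from DEPLOYMENT.md; README.md is described there as optional.
REQUIRED_FILES = ("app.py", "requirements.txt")
OPTIONAL_FILES = ("README.md",)


def missing_space_files(folder):
    """Return (missing_required, missing_optional) file names for `folder`."""
    root = Path(folder)
    missing_required = [n for n in REQUIRED_FILES if not (root / n).is_file()]
    missing_optional = [n for n in OPTIONAL_FILES if not (root / n).is_file()]
    return missing_required, missing_optional


if __name__ == "__main__":
    required, optional = missing_space_files("hf_space")
    if required:
        print("Missing required files:", ", ".join(required))
    if optional:
        print("Missing optional files:", ", ".join(optional))
    if not required and not optional:
        print("All listed files are present.")
```

The check only confirms that the files exist; it does not validate their contents.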
README.md ADDED
@@ -0,0 +1,48 @@
+ ---
+ title: NuralVoiceSTT Playground
+ emoji: 🎤
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 4.0.0
+ app_file: app.py
+ pinned: false
+ license: apache-2.0
+ ---
+
+ # NuralVoiceSTT Playground
+
+ **Developed by Blink Digital**
+
+ Real-time speech-to-text playground. Click, speak, and watch your words appear instantly!
+
+ ## Features
+
+ - 🎙️ **Live Microphone Input** - Click to start recording
+ - ⚡ **Real-time Transcription** - See text appear as you speak
+ - 🎯 **High Accuracy** - Powered by NuralVoiceSTT model
+ - 🌐 **Browser-based** - No installation needed
+ - 🔒 **Privacy-friendly** - Audio processed in real-time
+
+ ## How to Use
+
+ 1. Click the **microphone button**
+ 2. Allow microphone permissions when prompted
+ 3. Start speaking clearly into your microphone
+ 4. Watch your speech convert to text in real-time!
+ 5. Click **"Stop Recording"** when finished
+
+ ## Tips for Best Results
+
+ - Speak clearly and at a moderate pace
+ - Reduce background noise
+ - Use a good quality microphone
+ - Wait a moment after speaking to see final results
+
+ ## About
+
+ NuralVoiceSTT is a high-accuracy English speech-to-text model developed by Blink Digital, optimized for both callcenter and wideband audio scenarios.
+
+ ---
+
+ **Developed by Blink Digital** | [Model Repository](https://huggingface.co/ashishkblink/NuralVoiceSTT)
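
Review note: the README tip about waiting for final results reflects how Vosk reports text. `PartialResult()` returns a provisional `partial` string that can still change, while `Result()` and `FinalResult()` return a settled `text` string. A minimal sketch of keeping the two apart so settled text is not overwritten by a later partial; the `Transcript` class and its method names are hypothetical and do not appear in this commit, and the JSON strings in the usage lines are hand-written stand-ins for recognizer output.

```python
import json


class Transcript:
    """Accumulates settled segments and tracks the current provisional text."""

    def __init__(self):
        self.final_segments = []
        self.partial = ""

    def add_result(self, result_json):
        """Record a settled segment (JSON with a "text" key)."""
        text = json.loads(result_json).get("text", "")
        if text:
            self.final_segments.append(text)
        self.partial = ""  # the provisional text has been superseded

    def add_partial(self, partial_json):
        """Record provisional text (JSON with a "partial" key)."""
        self.partial = json.loads(partial_json).get("partial", "")

    def render(self):
        """Return settled segments followed by any provisional text."""
        parts = list(self.final_segments)
        if self.partial:
            parts.append(self.partial)
        return " ".join(parts)


if __name__ == "__main__":
    transcript = Transcript()
    transcript.add_partial('{"partial": "hello wor"}')
    transcript.add_result('{"text": "hello world"}')
    transcript.add_partial('{"partial": "how are"}')
    print(transcript.render())
```

In a shared Space, one such object per user session would be needed; how to scope it is left open here.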
app.py ADDED
@@ -0,0 +1,279 @@
+ """
+ NuralVoiceSTT Playground - Hugging Face Space
+ Real-time speech-to-text playground with microphone input
+ Developed by Blink Digital
+
+ Note: This app uses CPU (Vosk doesn't require GPU), but we declare a GPU function
+ to suppress the warning if the Space is configured with GPU hardware.
+ """
+ import gradio as gr
+ import json
+ import numpy as np
+ import os
+ import sys
+
+ # Declare GPU function to suppress Hugging Face Spaces warning
+ # This is required if the Space is configured with GPU hardware
+ # Even though we use CPU, this prevents the runtime error
+ try:
+     import spaces
+     @spaces.GPU
+     def gpu_function():
+         """Dummy GPU function to satisfy Hugging Face Spaces GPU requirement"""
+         # Vosk runs on CPU, so this function does nothing
+         # It's only here to suppress the "No @spaces.GPU function detected" warning
+         pass
+ except ImportError:
+     # If spaces module is not available, we're not running on HF Spaces
+     pass
+
+ # Try to import vosk, but handle gracefully if it fails
+ try:
+     from vosk import Model, KaldiRecognizer, SetLogLevel
+     from huggingface_hub import snapshot_download
+     VOSK_AVAILABLE = True
+     SetLogLevel(-1)
+ except ImportError as e:
+     print(f"Warning: Vosk not available: {e}")
+     VOSK_AVAILABLE = False
+
+ # Global model variable
+ model = None
+ model_path = None
+ model_loading = False
+
+ def load_model():
+     """Load the NuralVoiceSTT model from Hugging Face"""
+     global model, model_path, model_loading
+
+     if not VOSK_AVAILABLE:
+         return None
+
+     if model is not None:
+         return model
+
+     if model_loading:
+         return None
+
+     model_loading = True
+     try:
+         print("Loading NuralVoiceSTT model from Hugging Face...")
+
+         # Download model from Hugging Face (now public, no token needed)
+         # Hugging Face Spaces automatically provides HF_TOKEN if needed
+         token = os.environ.get("HF_TOKEN", None)
+
+         model_path = snapshot_download(
+             repo_id="ashishkblink/NuralVoiceSTT",
+             local_dir="./nuralvoice_model",
+             token=token  # Will be None for public repo, but Spaces may provide it
+         )
+
+         # Load the model
+         model = Model(model_path)
+         print("✅ Model loaded successfully!")
+         model_loading = False
+         return model
+     except Exception as e:
+         print(f"Error loading model: {e}")
+         print(f"Error type: {type(e).__name__}")
+         model_loading = False
+         return None
+
+ # Global recognizer for streaming (one per session)
+ recognizer = None
+ current_sample_rate = None
+
+ def process_streaming_audio(audio_data):
+     """
+     Process streaming audio in real-time and return transcription as you speak
+     This function is called continuously during recording
+     """
+     global model, recognizer, current_sample_rate
+
+     if not VOSK_AVAILABLE:
+         return "❌ Error: Vosk library not available."
+
+     if model is None:
+         model = load_model()
+         if model is None:
+             return "⏳ Loading model... Please wait a moment."
+
+     if audio_data is None:
+         recognizer = None
+         current_sample_rate = None
+         return ""
+
+     try:
+         sample_rate, audio_array = audio_data
+
+         # Initialize recognizer if sample rate changed or first time
+         if recognizer is None or current_sample_rate != sample_rate:
+             recognizer = KaldiRecognizer(model, sample_rate)
+             recognizer.SetWords(True)
+             current_sample_rate = sample_rate
+
+         # Convert to numpy array if needed
+         if isinstance(audio_array, list):
+             audio_array = np.array(audio_array, dtype=np.float32)
+
+         # Normalize audio to [-1, 1] if needed
+         if audio_array.dtype != np.int16:
+             if audio_array.max() > 1.0 or audio_array.min() < -1.0:
+                 max_val = np.max(np.abs(audio_array))
+                 if max_val > 0:
+                     audio_array = audio_array / max_val
+             audio_array = (audio_array * 32767).astype(np.int16)
+
+         # Convert to bytes
+         audio_bytes = audio_array.tobytes()
+
+         # Process audio chunk in real-time
+         if recognizer.AcceptWaveform(audio_bytes):
+             # Final result for this chunk
+             result = json.loads(recognizer.Result())
+             if 'text' in result and result['text']:
+                 return result['text']
+         else:
+             # Partial result (still processing)
+             partial = json.loads(recognizer.PartialResult())
+             if 'partial' in partial and partial['partial']:
+                 return partial['partial']
+
+         return ""
+
+     except Exception as e:
+         return f"❌ Error: {str(e)}"
+
+ def get_final_transcription(audio_data):
+     """Get final transcription when recording stops"""
+     global recognizer
+
+     if recognizer is None:
+         return ""
+
+     try:
+         final_result = json.loads(recognizer.FinalResult())
+         recognizer = None  # Reset for next session
+         if 'text' in final_result and final_result['text']:
+             return final_result['text']
+     except:
+         recognizer = None
+
+     return ""
+
+ # Create Gradio interface
+ with gr.Blocks(
+     title="NuralVoiceSTT Playground - Blink Digital"
+ ) as demo:
+
+     # Header
+     gr.Markdown("""
+     # 🎤 NuralVoiceSTT Playground
+
+     **Developed by Blink Digital**
+
+     **Real-time streaming speech-to-text** - See your words appear instantly as you speak!
+     """)
+
+     # Instructions
+     with gr.Accordion("📋 How to Use", open=False):
+         gr.Markdown("""
+         1. Click the **microphone button** below
+         2. Allow microphone permissions when prompted
+         3. Start speaking - **text appears in real-time as you speak!**
+         4. No need to stop - it streams continuously
+         5. Click **"Stop"** when finished
+         """)
+
+     with gr.Row():
+         with gr.Column():
+             gr.Markdown("### 🎙️ Live Audio Stream")
+             microphone = gr.Audio(
+                 label="Click to Start Streaming",
+                 type="numpy",
+                 sources=["microphone"],
+                 streaming=True,  # Enable streaming mode
+                 show_label=True
+             )
+             status = gr.HTML("""
+             <div style="padding: 10px; background: #d4edda; color: #155724; border-radius: 5px; margin-top: 10px;">
+                 ✅ Ready - Click microphone to start real-time transcription
+             </div>
+             """)
+
+         with gr.Column():
+             gr.Markdown("### 📝 Live Transcription")
+             output = gr.Textbox(
+                 label="Real-time Text Output",
+                 lines=12,
+                 placeholder="Your speech will appear here in real-time as you speak...",
+                 interactive=False,
+                 autoscroll=True
+             )
+
+     # Tips
+     with gr.Accordion("💡 Tips for Best Results", open=False):
+         gr.Markdown("""
+         - Speak clearly and at a moderate pace
+         - Reduce background noise for better accuracy
+         - Use a good quality microphone if possible
+         - Wait a moment after speaking to see final results
+         """)
+
+     # About
+     gr.Markdown("""
+     ---
+     ### About NuralVoiceSTT
+
+     **Developed by Blink Digital**
+
+     NuralVoiceSTT is a high-accuracy English speech-to-text model optimized for both callcenter and wideband audio scenarios.
+     """)
+
+     # Real-time streaming transcription (updates as you speak)
+     microphone.stream(
+         fn=process_streaming_audio,
+         inputs=microphone,
+         outputs=output,
+         show_progress=False
+     )
+
+     # Update status when microphone starts/stops
+     def update_status(audio_data):
+         if audio_data is None:
+             return gr.HTML("""
+             <div style="padding: 10px; background: #d4edda; color: #155724; border-radius: 5px; margin-top: 10px;">
+                 ✅ Ready - Click microphone to start real-time transcription
+             </div>
+             """)
+         else:
+             return gr.HTML("""
+             <div style="padding: 10px; background: #fff3cd; color: #856404; border-radius: 5px; margin-top: 10px;">
+                 🎤 Streaming... Speak now - text appears in real-time!
+             </div>
+             """)
+
+     microphone.change(
+         fn=update_status,
+         inputs=microphone,
+         outputs=status
+     )
+
+ # Load model in background (non-blocking)
+ if VOSK_AVAILABLE:
+     import threading
+     def load_model_background():
+         load_model()
+     threading.Thread(target=load_model_background, daemon=True).start()
+
+ # Enable queuing for better performance
+ demo.queue()
+
+ # For Hugging Face Spaces, the demo must be accessible at module level
+ # Spaces will automatically call demo.launch() - we don't need to call it manually
+ # The demo object being defined is enough for Spaces to detect and run it
+
+ # For local testing only
+ if __name__ == "__main__":
+     demo.launch(server_name="0.0.0.0", server_port=7860, theme=gr.themes.Soft())
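
Review note on `load_model()`: the `model_loading` flag is read and written without a lock, so the background thread and the first streaming callback could, in principle, both pass the check and start a download. A minimal sketch of one way to guard a one-time load with `threading.Lock`; `LazyModel` and `fake_loader` are hypothetical names, and the loader stands in for the `snapshot_download` plus `Model(...)` steps in the commit.

```python
import threading
import time


class LazyModel:
    """Runs `loader` at most once at a time and caches its result."""

    def __init__(self, loader):
        self._loader = loader  # zero-argument callable that builds the model
        self._lock = threading.Lock()
        self._model = None

    def get(self, wait=True):
        """Return the model; with wait=False, return None while another thread loads."""
        if self._model is not None:
            return self._model
        if not self._lock.acquire(blocking=wait):
            return None  # a load is in progress; the caller can show "Loading..."
        try:
            # Re-check under the lock: another thread may have finished loading.
            if self._model is None:
                self._model = self._loader()
            return self._model
        finally:
            self._lock.release()


if __name__ == "__main__":
    def fake_loader():
        time.sleep(0.1)  # stands in for the download and model construction
        return "model"

    lazy = LazyModel(fake_loader)
    threading.Thread(target=lazy.get, daemon=True).start()
    time.sleep(0.01)
    print(lazy.get(wait=False))  # likely None while the load is still running
    print(lazy.get())            # blocks until the load completes
```

With `wait=False` the streaming callback keeps the commit's non-blocking behaviour of showing a loading message instead of stalling.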
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ gradio>=4.0.0
+ vosk>=0.3.45
+ huggingface-hub>=0.16.0
+ soundfile>=0.12.0
+ numpy>=1.21.0
+
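
Review note: every entry in `requirements.txt` is a lower bound, so the versions actually installed can differ from build to build, which bears on the troubleshooting step about Gradio compatibility. A minimal sketch for listing what is installed, using the standard-library `importlib.metadata`; the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Distribution names as they appear in requirements.txt.
PACKAGES = ("gradio", "vosk", "huggingface-hub", "soundfile", "numpy")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the build environment and comparing the output with `sdk_version` in the README front matter is one way to spot a mismatch.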