Spaces: ashishkblink/NuralVoice

Ashish Kumar committed on Jan 7

Commit 65fbf1d

0 Parent(s):

Fix: Add @spaces.GPU function to suppress runtime error

Files changed (4)

DEPLOYMENT.md +62 -0
README.md +48 -0
app.py +279 -0
requirements.txt +6 -0

DEPLOYMENT.md ADDED

	@@ -0,0 +1,62 @@

+# Deploy to Hugging Face Space
+## Quick Deployment Steps
+1. **Go to your Space**: https://huggingface.co/spaces/ashishkblink/NuralVoice
+2. **Upload Files**:
+   - Click on "Files" tab
+   - Upload these files:
+     - `app.py` (main application)
+     - `requirements.txt` (dependencies)
+     - `README.md` (optional, but recommended)
+3. **Wait for Build**:
+   - Hugging Face will automatically:
+     - Install dependencies from `requirements.txt`
+     - Download your NuralVoiceSTT model
+     - Start the Gradio app
+   - First build takes ~5-10 minutes (model download)
+   - You'll see build logs in real-time
+4. **Test Your Playground**:
+   - Once built, click "App" tab
+   - Click the microphone button
+   - Allow microphone permissions
+   - Start speaking!
+## Files to Upload
+Make sure these files are in your Space:
+```
+hf_space/
+├── app.py              ← Main playground application
+├── requirements.txt    ← Python dependencies
+└── README.md          ← Space description (optional)
+```
+## What the Playground Does
+- ✅ Real-time microphone input
+- ✅ Live transcription as you speak
+- ✅ Beautiful, user-friendly interface
+- ✅ Automatic model download from your HF repo
+- ✅ Works directly in the browser
+## Troubleshooting
+If the app doesn't work:
+1. Check build logs for errors
+2. Verify model repo ID is correct: `ashishkblink/NuralVoiceSTT`
+3. Make sure all files are uploaded
+4. Check that the Gradio version is compatible (`sdk_version` in `README.md`, `gradio` in `requirements.txt`)
+## Customization
+You can customize:
+- Colors in the inline `style` attributes of the status HTML blocks
+- Instructions text
+- UI layout
+- Model settings
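
The troubleshooting step "Make sure all files are uploaded" can be checked locally before uploading. The following is a minimal sketch, not part of this commit: `missing_files` and `REQUIRED_FILES` are hypothetical names, and the required set assumes that only `app.py` and `requirements.txt` are mandatory, as the upload steps in DEPLOYMENT.md describe.

```python
from pathlib import Path

# Assumption: README.md is optional, as stated in the upload steps.
REQUIRED_FILES = ("app.py", "requirements.txt")


def missing_files(space_dir, required=REQUIRED_FILES):
    """Return the required file names that are absent from space_dir."""
    root = Path(space_dir)
    return [name for name in required if not (root / name).is_file()]
```

For example, `missing_files("hf_space")` returns an empty list when the folder is ready to upload.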

README.md ADDED

	@@ -0,0 +1,48 @@

+---
+title: NuralVoiceSTT Playground
+emoji: 🎤
+colorFrom: blue
+colorTo: purple
+sdk: gradio
+sdk_version: 4.0.0
+app_file: app.py
+pinned: false
+license: apache-2.0
+---
+# NuralVoiceSTT Playground
+**Developed by Blink Digital**
+Real-time speech-to-text playground. Click, speak, and watch your words appear instantly!
+## Features
+- 🎙️ **Live Microphone Input** - Click to start recording
+- ⚡ **Real-time Transcription** - See text appear as you speak
+- 🎯 **High Accuracy** - Powered by NuralVoiceSTT model
+- 🌐 **Browser-based** - No installation needed
+- 🔒 **Privacy-friendly** - Audio processed in real-time
+## How to Use
+1. Click the **microphone button**
+2. Allow microphone permissions when prompted
+3. Start speaking clearly into your microphone
+4. Watch your speech convert to text in real-time!
+5. Click **"Stop Recording"** when finished
+## Tips for Best Results
+- Speak clearly and at a moderate pace
+- Reduce background noise
+- Use a good quality microphone
+- Wait a moment after speaking to see final results
+## About
+NuralVoiceSTT is a high-accuracy English speech-to-text model developed by Blink Digital, optimized for both call-center and wideband audio scenarios.
+---
+**Developed by Blink Digital** | [Model Repository](https://huggingface.co/ashishkblink/NuralVoiceSTT)

app.py ADDED

	@@ -0,0 +1,279 @@

+"""
+NuralVoiceSTT Playground - Hugging Face Space
+Real-time speech-to-text playground with microphone input
+Developed by Blink Digital
+Note: This app uses CPU (Vosk doesn't require GPU), but we declare a GPU function
+to suppress the warning if the Space is configured with GPU hardware.
+"""
+import gradio as gr
+import json
+import numpy as np
+import os
+import sys
+# Declare GPU function to suppress Hugging Face Spaces warning
+# This is required if the Space is configured with GPU hardware
+# Even though we use CPU, this prevents the runtime error
+try:
+    import spaces
+    @spaces.GPU
+    def gpu_function():
+        """Dummy GPU function to satisfy Hugging Face Spaces GPU requirement"""
+        # Vosk runs on CPU, so this function does nothing
+        # It's only here to suppress the "No @spaces.GPU function detected" warning
+        pass
+except ImportError:
+    # If spaces module is not available, we're not running on HF Spaces
+    pass
+# Try to import vosk, but handle gracefully if it fails
+try:
+    from vosk import Model, KaldiRecognizer, SetLogLevel
+    from huggingface_hub import snapshot_download
+    VOSK_AVAILABLE = True
+    SetLogLevel(-1)
+except ImportError as e:
+    print(f"Warning: Vosk not available: {e}")
+    VOSK_AVAILABLE = False
+# Global model variable
+model = None
+model_path = None
+model_loading = False
+def load_model():
+    """Load the NuralVoiceSTT model from Hugging Face"""
+    global model, model_path, model_loading
+    if not VOSK_AVAILABLE:
+        return None
+    if model is not None:
+        return model
+    if model_loading:
+        return None
+    model_loading = True
+    try:
+        print("Loading NuralVoiceSTT model from Hugging Face...")
+        # Download model from Hugging Face (now public, no token needed)
+        # Hugging Face Spaces automatically provides HF_TOKEN if needed
+        token = os.environ.get("HF_TOKEN", None)
+        model_path = snapshot_download(
+            repo_id="ashishkblink/NuralVoiceSTT",
+            local_dir="./nuralvoice_model",
+            token=token  # Will be None for public repo, but Spaces may provide it
+        )
+        # Load the model
+        model = Model(model_path)
+        print("✅ Model loaded successfully!")
+        model_loading = False
+        return model
+    except Exception as e:
+        print(f"Error loading model: {e}")
+        print(f"Error type: {type(e).__name__}")
+        model_loading = False
+        return None
+# Global recognizer for streaming (one per session)
+recognizer = None
+current_sample_rate = None
+def process_streaming_audio(audio_data):
+    """
+    Process streaming audio in real-time and return transcription as you speak
+    This function is called continuously during recording
+    """
+    global model, recognizer, current_sample_rate
+    if not VOSK_AVAILABLE:
+        return "❌ Error: Vosk library not available."
+    if model is None:
+        model = load_model()
+        if model is None:
+            return "⏳ Loading model... Please wait a moment."
+    if audio_data is None:
+        recognizer = None
+        current_sample_rate = None
+        return ""
+    try:
+        sample_rate, audio_array = audio_data
+        # Initialize recognizer if sample rate changed or first time
+        if recognizer is None or current_sample_rate != sample_rate:
+            recognizer = KaldiRecognizer(model, sample_rate)
+            recognizer.SetWords(True)
+            current_sample_rate = sample_rate
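+        # Convert to numpy array if needed
+        if isinstance(audio_array, list):
+            audio_array = np.array(audio_array, dtype=np.float32)
+        # Skip empty chunks; max()/min() below would raise on them
+        if audio_array.size == 0:
+            return ""
+        # The recognizer is fed a single channel: average (samples, channels) input to mono
+        if audio_array.ndim > 1:
+            audio_array = audio_array.mean(axis=1)
+        # Convert anything that is not already 16-bit PCM: rescale to [-1, 1]
+        # if the values exceed that range, then scale to the int16 range
+        if audio_array.dtype != np.int16:
+            if audio_array.max() > 1.0 or audio_array.min() < -1.0:
+                max_val = np.max(np.abs(audio_array))
+                if max_val > 0:
+                    audio_array = audio_array / max_val
+            audio_array = (audio_array * 32767).astype(np.int16)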
+        # Convert to bytes
+        audio_bytes = audio_array.tobytes()
+        # Process audio chunk in real-time
+        if recognizer.AcceptWaveform(audio_bytes):
+            # Final result for this chunk
+            result = json.loads(recognizer.Result())
+            if 'text' in result and result['text']:
+                return result['text']
+        else:
+            # Partial result (still processing)
+            partial = json.loads(recognizer.PartialResult())
+            if 'partial' in partial and partial['partial']:
+                return partial['partial']
+        return ""
+    except Exception as e:
+        return f"❌ Error: {str(e)}"
+def get_final_transcription(audio_data):
+    """Get final transcription when recording stops"""
+    global recognizer
+    if recognizer is None:
+        return ""
+    try:
+        final_result = json.loads(recognizer.FinalResult())
+        recognizer = None  # Reset for next session
+        if 'text' in final_result and final_result['text']:
+            return final_result['text']
+    except Exception:
+        # Discard a recognizer that failed so the next session starts clean
+        recognizer = None
+    return ""
+# Create Gradio interface
+with gr.Blocks(
+    title="NuralVoiceSTT Playground - Blink Digital",
+    theme=gr.themes.Soft()
+) as demo:
+    # Header
+    gr.Markdown("""
+    # 🎤 NuralVoiceSTT Playground
+    **Developed by Blink Digital**
+    **Real-time streaming speech-to-text** - See your words appear instantly as you speak!
+    """)
+    # Instructions
+    with gr.Accordion("📋 How to Use", open=False):
+        gr.Markdown("""
+        1. Click the **microphone button** below
+        2. Allow microphone permissions when prompted
+        3. Start speaking - **text appears in real-time as you speak!**
+        4. No need to stop - it streams continuously
+        5. Click **"Stop"** when finished
+        """)
+    with gr.Row():
+        with gr.Column():
+            gr.Markdown("### 🎙️ Live Audio Stream")
+            microphone = gr.Audio(
+                label="Click to Start Streaming",
+                type="numpy",
+                sources=["microphone"],
+                streaming=True,  # Enable streaming mode
+                show_label=True
+            )
+            status = gr.HTML("""
+            <div style="padding: 10px; background: #d4edda; color: #155724; border-radius: 5px; margin-top: 10px;">
+                ✅ Ready - Click microphone to start real-time transcription
+            </div>
+            """)
+        with gr.Column():
+            gr.Markdown("### 📝 Live Transcription")
+            output = gr.Textbox(
+                label="Real-time Text Output",
+                lines=12,
+                placeholder="Your speech will appear here in real-time as you speak...",
+                interactive=False,
+                autoscroll=True
+            )
+    # Tips
+    with gr.Accordion("💡 Tips for Best Results", open=False):
+        gr.Markdown("""
+        - Speak clearly and at a moderate pace
+        - Reduce background noise for better accuracy
+        - Use a good quality microphone if possible
+        - Wait a moment after speaking to see final results
+        """)
+    # About
+    gr.Markdown("""
+    ---
+    ### About NuralVoiceSTT
+    **Developed by Blink Digital**
+    NuralVoiceSTT is a high-accuracy English speech-to-text model optimized for both call-center and wideband audio scenarios.
+    """)
+    # Real-time streaming transcription (updates as you speak)
+    microphone.stream(
+        fn=process_streaming_audio,
+        inputs=microphone,
+        outputs=output,
+        show_progress=False
+    )
+    # Update status when microphone starts/stops
+    def update_status(audio_data):
+        if audio_data is None:
+            return gr.HTML("""
+            <div style="padding: 10px; background: #d4edda; color: #155724; border-radius: 5px; margin-top: 10px;">
+                ✅ Ready - Click microphone to start real-time transcription
+            </div>
+            """)
+        else:
+            return gr.HTML("""
+            <div style="padding: 10px; background: #fff3cd; color: #856404; border-radius: 5px; margin-top: 10px;">
+                🎤 Streaming... Speak now - text appears in real-time!
+            </div>
+            """)
+    microphone.change(
+        fn=update_status,
+        inputs=microphone,
+        outputs=status
+    )
+# Load model in background (non-blocking)
+if VOSK_AVAILABLE:
+    import threading
+    def load_model_background():
+        load_model()
+    threading.Thread(target=load_model_background, daemon=True).start()
+# Enable queuing for better performance
+demo.queue()
+# For Hugging Face Spaces, the demo must be accessible at module level
+# Spaces will automatically call demo.launch() - we don't need to call it manually
+# The demo object being defined is enough for Spaces to detect and run it
+# For local testing only
+# In Gradio 4.x (the sdk_version set in README.md), `theme` is an argument of
+# gr.Blocks(); launch() does not accept it
+if __name__ == "__main__":
+    demo.launch(server_name="0.0.0.0", server_port=7860)
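
The conversion step in `process_streaming_audio` (list or float input to 16-bit PCM bytes) can be isolated into a helper that is easy to test on its own. The following is a minimal sketch, not the committed code: `to_pcm16_bytes` is a hypothetical name, it assumes float input is already in [-1, 1] and integer input is already on the int16 scale, and it clips out-of-range values instead of rescaling each chunk by its own peak as `app.py` does.

```python
import numpy as np


def to_pcm16_bytes(audio) -> bytes:
    """Convert one audio chunk to little-endian 16-bit mono PCM bytes."""
    samples = np.asarray(audio)
    # Remember the input kind before widening to float64 for the arithmetic.
    is_integer = np.issubdtype(samples.dtype, np.integer)
    samples = samples.astype(np.float64)
    if samples.ndim == 2:
        # Assumption: shape is (samples, channels); average the channels to mono.
        samples = samples.mean(axis=1)
    if not is_integer:
        # Assumption: float audio lies in [-1, 1]; scale it to the int16 range.
        samples = samples * 32767.0
    # Clip rather than renormalize, so the gain stays constant across chunks.
    samples = np.clip(samples, -32768, 32767)
    return samples.astype("<i2").tobytes()
```

With such a helper, the streaming callback could pass `to_pcm16_bytes(audio_array)` directly to `recognizer.AcceptWaveform`. Clipping keeps the level consistent from chunk to chunk, at the cost of distorting input that really does exceed the assumed range.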

requirements.txt ADDED

	@@ -0,0 +1,6 @@

+gradio>=4.0.0
+vosk>=0.3.45
+huggingface-hub>=0.16.0
+soundfile>=0.12.0
+numpy>=1.21.0
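
When debugging a failed build, it can help to compare these minimum versions with what is actually installed. The following is a rough sketch under stated assumptions: the helper names (`parse_minimum`, `installed_version`, `at_least`) are hypothetical, only the `name>=X.Y.Z` form used in this file is handled, and versions are compared on their leading numeric parts only, which is not a full implementation of Python's version rules.

```python
import re
from importlib import metadata


def parse_minimum(line):
    """Split 'name>=1.2.3' into ('name', (1, 2, 3)); return None for other lines."""
    match = re.fullmatch(
        r"\s*([A-Za-z0-9_.-]+)\s*>=\s*([0-9]+(?:\.[0-9]+)*)\s*", line
    )
    if match is None:
        return None
    name, version = match.groups()
    return name, tuple(int(part) for part in version.split("."))


def installed_version(name):
    """Return the installed version as a tuple of leading integers, or None."""
    try:
        raw = metadata.version(name)
    except metadata.PackageNotFoundError:
        return None
    numbers = re.match(r"[0-9]+(?:\.[0-9]+)*", raw)
    if numbers is None:
        return None
    return tuple(int(part) for part in numbers.group().split("."))


def at_least(found, minimum):
    """Compare version tuples after padding the shorter one with zeros."""
    width = max(len(found), len(minimum))
    pad = lambda v: v + (0,) * (width - len(v))
    return pad(found) >= pad(minimum)


def check_requirements(lines):
    """Map each parsed package name to True if its minimum version is met."""
    report = {}
    for line in lines:
        parsed = parse_minimum(line)
        if parsed is None:
            continue
        name, minimum = parsed
        found = installed_version(name)
        report[name] = found is not None and at_least(found, minimum)
    return report
```

Calling `check_requirements(open("requirements.txt"))` in the Space's environment returns a dictionary such as `{"gradio": True, ...}`, with `False` for any package that is missing or older than requested.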