Rahuluni committed on
Commit 1095508 · 1 Parent(s): 86fcf07

add eng only

Files changed (2)
  1. README.md +30 -6
  2. app.py +18 -10
README.md CHANGED
@@ -4,14 +4,38 @@ sdk: gradio
 emoji: 🚀
 colorFrom: red
 ---
-# Whisper-Small Speech-to-Text (Gradio)
+---
+license: apache-2.0
+sdk: gradio
+emoji: 🚀
+colorFrom: red
+---
+# Whisper-Small Speech-to-English (Gradio)
 
-Drop these files into a new Hugging Face Space (Gradio template):
-- app.py
-- requirements.txt
+Drop these files into a Hugging Face Space (Gradio template):
+- `app.py`
+- `requirements.txt`
 
-The app uses `openai/whisper-small` via Hugging Face Transformers pipeline for CPU-friendly offline transcription.
+This app uses `openai/whisper-small` in translate mode to convert spoken audio into English text (Whisper's `translate` task). The model runs CPU-only by default and is suitable for small/medium audio files.
 
 ## Usage
 - Click the microphone recorder to record or upload an audio file.
-- Click **Transcribe** to get the text.
+- Click **Transcribe** to get English text output (the app translates input speech into English).
+
+## Debug
+Set `DEBUG = True` in `app.py` to enable logging and save resampled WAVs (written to your system temp directory) for inspection.
+
+## Run locally
+```powershell
+# Windows PowerShell
+python -m venv venv_hf
+venv_hf\Scripts\Activate.ps1
+pip install -r requirements.txt
+python app.py
+```
+
+Open the Gradio URL shown in the console (usually http://0.0.0.0:7860).
+
+## Notes
+- The `openai/whisper-small` model runs on CPU and may take time for longer files.
+- For other target languages or lower latency, consider using the Hugging Face Inference API or a separate text translation pipeline.
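The README's Usage section promises English text from whatever the microphone captures, but the diff never shows how the recorded audio reaches Whisper. Gradio's microphone component typically returns a `(sample_rate, int16_array)` tuple, while the transformers ASR pipeline expects a 16 kHz float32 mono array. The sketch below is hypothetical (`to_whisper_input` is an illustrative name, not a function from this repo) and shows the conversion such an app usually performs:

```python
import numpy as np

TARGET_SR = 16000  # Whisper models are trained on 16 kHz audio


def to_whisper_input(sample_rate: int, samples: np.ndarray) -> np.ndarray:
    """Convert a Gradio (sample_rate, int16 array) microphone tuple into the
    16 kHz float32 mono array a transformers ASR pipeline accepts."""
    audio = samples.astype(np.float32)
    if samples.dtype == np.int16:
        audio /= 32768.0  # scale int16 range into [-1.0, 1.0]
    if audio.ndim == 2:  # stereo -> mono by averaging channels
        audio = audio.mean(axis=1)
    if sample_rate != TARGET_SR:
        # naive linear-interpolation resample; librosa/torchaudio do this better
        duration = audio.shape[0] / sample_rate
        n_out = int(duration * TARGET_SR)
        x_old = np.linspace(0.0, duration, num=audio.shape[0], endpoint=False)
        x_new = np.linspace(0.0, duration, num=n_out, endpoint=False)
        audio = np.interp(x_new, x_old, audio).astype(np.float32)
    return audio
```

The resulting array is what the README's Debug section would dump as a resampled WAV for inspection.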
app.py CHANGED
@@ -14,7 +14,14 @@ from transformers import pipeline
 # The model "openai/whisper-small" is public and works on CPU (smaller memory footprint).
 # Loading may take a few seconds at startup.
 ASR_MODEL = "openai/whisper-small"
-asr = pipeline("automatic-speech-recognition", model=ASR_MODEL, chunk_length_s=30, ignore_warning=True)
+# Use Whisper's translate task so output is English regardless of input language
+asr = pipeline(
+    "automatic-speech-recognition",
+    model=ASR_MODEL,
+    chunk_length_s=30,
+    ignore_warning=True,
+    generate_kwargs={"task": "translate"},
+)
 
 # Debug flag: set True to print audio shapes/dtypes and save resampled temp WAVs
 DEBUG = False
@@ -183,12 +190,13 @@ def transcribe(audio):
 def clear_audio():
     return None, ""
 
-with gr.Blocks(title="Whisper Tiny Speech-to-Text (Free on HF Spaces)") as demo:
+
+with gr.Blocks(title="Whisper-Small Speech-to-English") as demo:
     gr.Markdown(
         """
-        # 🎙️ Whisper-Small Speech-to-Text
-        Record or upload audio and click **Transcribe**.
-        Uses the `openai/whisper-small` model (runs CPU-only).
+        # 🎙️ Whisper-Small Speech-to-English
+        Record or upload audio and click **Transcribe**.
+        This app uses `openai/whisper-small` in translate mode and returns English text.
         """
     )
 
@@ -227,14 +235,14 @@ with gr.Blocks(title="Whisper Tiny Speech-to-Text (Free on HF Spaces)") as demo:
 
     # Copy transcript to clipboard (Gradio has `copy` action for buttons)
     copy_btn.click(
-        fn=lambda txt: txt,
-        inputs=transcript,
-        outputs=None
+        fn=lambda txt: txt,
+        inputs=transcript,
+        outputs=None,
     )
 
     gr.Markdown(
-        "Notes: Small model runs on CPU but will still take a bit of time for longer files. "
-        "If you need translation to English or better latency, consider smaller models or the HF Inference API."
+        "Notes: The app translates spoken audio to English using Whisper (translate task). "
+        "Small model runs on CPU and may take time for longer files. For lower latency or other target languages, consider the HF Inference API or additional translation pipelines."
     )
 
 if __name__ == "__main__":
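The pipeline configuration this commit introduces can be exercised outside the Gradio UI roughly as follows. This is a sketch, not code from the repo: the `tone` helper is purely illustrative, the model (about 1 GB) downloads on first use, so the heavy import and the call sit under a main guard, and the commit's `ignore_warning=True` is omitted here since support for it varies across transformers versions.

```python
import numpy as np


def tone(freq_hz: float, seconds: float, sr: int = 16000) -> np.ndarray:
    """Generate a quiet float32 sine tone -- a stand-in for recorded speech."""
    t = np.arange(int(seconds * sr)) / sr
    return (0.1 * np.sin(2 * np.pi * freq_hz * t)).astype(np.float32)


if __name__ == "__main__":
    from transformers import pipeline  # heavy import: pulls in torch

    # Same configuration as the commit: the translate task forces English output.
    asr = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-small",
        chunk_length_s=30,
        generate_kwargs={"task": "translate"},
    )
    # The pipeline accepts a dict with a raw waveform and its sampling rate.
    result = asr({"raw": tone(440.0, 2.0), "sampling_rate": 16000})
    print(result["text"])
```

Feeding a pure tone will of course produce empty or junk text; substitute a real 16 kHz recording to see the translation behavior described in the README.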