Spaces: Shanuka01/Stt_Test_01


Shanuka01 committed on Oct 25, 2023

Commit

95e0c29

1 Parent(s): 3693f95

Upload 6 files


Files changed (6)

Dockerfile +16 -0
Pipfile +4 -0
Pipfile.lock +0 -0
README.md +84 -0
app.py +44 -0
requirements.txt +80 -0

Dockerfile ADDED

	@@ -0,0 +1,16 @@

+FROM python:3.8-slim-buster
+RUN apt update && apt install -y ffmpeg git
+WORKDIR /app
+COPY requirements.txt requirements.txt
+RUN pip3 install -r requirements.txt
+EXPOSE 8000
+ENV GRADIO_SERVER_PORT=8000
+COPY . .
+CMD [ "python3", "app.py"]

Pipfile ADDED

	@@ -0,0 +1,4 @@

+[packages]
+whisper = {git = "https://github.com/openai/whisper.git"}
+gradio = "*"
+ffmpeg-python = "*"

Pipfile.lock ADDED

The diff for this file is too large to render. See raw diff

README.md ADDED

	@@ -0,0 +1,84 @@

+# OpenAI Whisper Gradio Web Implementation
+A Gradio web UI for Whisper, OpenAI's automatic speech recognition (ASR) system.
+## Installation
+Install ffmpeg on your device
+```bash
+  # on Ubuntu or Debian
+  sudo apt update
+  sudo apt install ffmpeg
+  # on MacOS using Homebrew (https://brew.sh/)
+  brew install ffmpeg
+  # on Windows using Chocolatey (https://chocolatey.org/)
+  choco install ffmpeg
+  # on Windows using Scoop (https://scoop.sh/)
+  scoop install ffmpeg
+```
+Download Program
+```bash
+  mkdir whisper-speech2txt
+  cd whisper-speech2txt
+  git clone https://github.com/innovatorved/whisper-openai-gradio-implementation.git .
+  pip install -r requirements.txt
+```
+Run Program
+```bash
+  python app.py
+```
+## Available models and languages ([Credit](https://github.com/innovatorved/whisper-openai-gradio-implementation/blob/main/README.md))
+There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.
+|  Size  | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
+|:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
+|  tiny  |    39 M    |     `tiny.en`      |       `tiny`       |     ~1 GB     |      ~32x      |
+|  base  |    74 M    |     `base.en`      |       `base`       |     ~1 GB     |      ~16x      |
+| small  |   244 M    |     `small.en`     |      `small`       |     ~2 GB     |      ~6x       |
+| medium |   769 M    |    `medium.en`     |      `medium`      |     ~5 GB     |      ~2x       |
+| large  |   1550 M   |        N/A         |      `large`       |    ~10 GB     |       1x       |
+For English-only applications, the `.en` models tend to perform better, especially for the `tiny.en` and `base.en` models. We observed that the difference becomes less significant for the `small.en` and `medium.en` models.
+## Screenshots
+![Screenshot](https://raw.githubusercontent.com/innovatorved/whisper-openai-gradio-implementation/main/img/screenshort.png)
+## License
+[MIT](https://choosealicense.com/licenses/mit/)
+## Reference
+- [https://github.com/openai/whisper](https://github.com/openai/whisper)
+- [https://openai.com/blog/whisper/](https://openai.com/blog/whisper/)
+## Authors
+- [Ved Gupta](https://www.github.com/innovatorved)
+## 🚀 About Me
+I'm a developer. I feel the code, then write it.
+## Support
+For support, email vedgupta@protonmail.com
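
Note on the model table in the README: the VRAM and speed columns can be turned into a small lookup when deciding what to assign to `modelname` in `app.py`. The sketch below is only an illustration. `pick_model` and `MODELS` are hypothetical names that do not exist in this commit or in the whisper package, and the numbers are the approximate figures copied from the table.

```python
# Figures copied from the README table: (name, approx. VRAM in GB, relative speed).
# MODELS and pick_model are hypothetical helpers, not part of whisper.
MODELS = [
    ("tiny", 1, 32),
    ("base", 1, 16),
    ("small", 2, 6),
    ("medium", 5, 2),
    ("large", 10, 1),
]


def pick_model(vram_gb, english_only=False):
    """Return the largest model whose approximate VRAM need fits in vram_gb."""
    fitting = [name for name, need, _speed in MODELS if need <= vram_gb]
    if not fitting:
        raise ValueError(f"no model fits in {vram_gb} GB of VRAM")
    name = fitting[-1]  # MODELS is ordered from smallest to largest
    # The table lists no English-only variant for the large model.
    if english_only and name != "large":
        name += ".en"
    return name


if __name__ == "__main__":
    print(pick_model(4))                     # largest model needing <= 4 GB
    print(pick_model(5, english_only=True))  # English-only variant if one exists
```

The returned string could then be assigned to `modelname` before `whisper.load_model(modelname)` is called. Treat the thresholds as rough guidance, since the table only gives approximate requirements.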

app.py ADDED

	@@ -0,0 +1,44 @@

+import time
+import gradio as gr
+import whisper
+# Choose a model name from the table in the README and update modelname.
+modelname = "base"
+model = whisper.load_model(modelname)
+def SpeechToText(audio):
+    """Transcribe an audio file and return (detected language, text)."""
+    # Return one empty value per output component when nothing was recorded.
+    if audio is None:
+        return "", ""
+    time.sleep(1)
+    audio = whisper.load_audio(audio)
+    # Pad or trim the audio to Whisper's fixed 30-second input window.
+    audio = whisper.pad_or_trim(audio)
+    # Make the log-Mel spectrogram and move it to the same device as the model.
+    mel = whisper.log_mel_spectrogram(audio).to(model.device)
+    # Detect the most probable language.
+    _, probs = model.detect_language(mel)
+    language = max(probs, key=probs.get)
+    # Decode the audio to text (fp16 disabled, as needed for CPU inference).
+    options = whisper.DecodingOptions(fp16=False)
+    result = whisper.decode(model, mel, options)
+    return language, result.text
+print("Starting the Gradio Web UI")
+gr.Interface(
+    title = 'OpenAI Whisper implementation on Gradio Web UI',
+    fn=SpeechToText,
+    inputs=[
+        gr.Audio(source="microphone", type="filepath")
+    ],
+    outputs=[
+        "label",
+        "textbox",
+    ],
+    live=True
+).launch(
+    debug=False,
+)
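
Note on the language detection step in `app.py`: `model.detect_language(mel)` returns a dictionary mapping language codes to probabilities, and `max(probs, key=probs.get)` keeps only the top entry. A minimal sketch of ranking several candidates instead is shown below. `top_languages` is a hypothetical helper, and the probabilities are invented for illustration rather than real model output.

```python
def top_languages(probs, k=3):
    """Return the k most probable (language, probability) pairs, best first."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Made-up probabilities for illustration only, not real Whisper output.
example_probs = {"en": 0.62, "si": 0.30, "de": 0.05, "fr": 0.03}

# Same selection as in app.py: the single most probable language code.
best = max(example_probs, key=example_probs.get)

# The two most probable candidates with their scores.
ranked = top_languages(example_probs, k=2)

if __name__ == "__main__":
    print(best)
    print(ranked)
```

Because Gradio's label output can also display a dictionary of labels and confidences, returning `dict(top_languages(probs))` in place of the single `language` string may be one way to show the runners-up in the UI. That has not been tested against this Space.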

requirements.txt ADDED

	@@ -0,0 +1,80 @@

+aiofiles==23.1.0
+aiohttp==3.8.4
+aiosignal==1.3.1
+altair==5.0.1
+anyio==3.7.0
+async-timeout==4.0.2
+attrs==23.1.0
+bcrypt==4.0.0
+certifi==2023.5.7
+cffi==1.15.1
+charset-normalizer==3.1.0
+click==8.1.3
+contourpy==1.0.7
+cryptography==38.0.1
+cycler==0.11.0
+exceptiongroup==1.1.1
+fastapi==0.95.2
+ffmpeg-python==0.2.0
+ffmpy==0.3.0
+filelock==3.12.0
+fonttools==4.39.4
+frozenlist==1.3.3
+fsspec==2023.5.0
+future==0.18.2
+gradio==3.32.0
+gradio_client==0.2.5
+h11==0.14.0
+httpcore==0.17.2
+httpx==0.24.1
+huggingface-hub==0.14.1
+idna==3.4
+Jinja2==3.1.2
+jsonschema==4.17.3
+kiwisolver==1.4.4
+linkify-it-py==2.0.2
+markdown-it-py==2.2.0
+MarkupSafe==2.1.2
+matplotlib==3.7.1
+mdit-py-plugins==0.3.3
+mdurl==0.1.2
+more-itertools==8.14.0
+multidict==6.0.4
+numpy==1.24.3
+orjson==3.8.14
+packaging==23.1
+pandas==2.0.1
+paramiko==2.11.0
+Pillow==9.5.0
+pycparser==2.21
+pycryptodome==3.15.0
+pydantic==1.10.8
+pydub==0.25.1
+Pygments==2.15.1
+PyNaCl==1.5.0
+pyparsing==3.0.9
+pyrsistent==0.19.3
+python-dateutil==2.8.2
+python-multipart==0.0.6
+pytz==2023.3
+PyYAML==6.0
+regex==2022.9.13
+requests==2.31.0
+rfc3986==1.5.0
+semantic-version==2.10.0
+six==1.16.0
+sniffio==1.3.0
+starlette==0.27.0
+tokenizers==0.12.1
+toolz==0.12.0
+torch==1.12.1
+tqdm==4.65.0
+transformers==4.22.2
+typing_extensions==4.6.2
+tzdata==2023.3
+uc-micro-py==1.0.2
+urllib3==2.0.2
+uvicorn==0.22.0
+websockets==11.0.3
+whisper @ git+https://github.com/openai/whisper.git@0b1ba3d46ebf7fe6f953acfd8cad62a4f851b49f
+yarl==1.9.2