bniladridas committed on
Commit cf00f16 · 1 Parent(s): 6369f85

Update README with rebranding and cleanup

Files changed (1): README.md +5 -22
README.md CHANGED
@@ -22,12 +22,10 @@ tags:
 
 # Speech Recognition AI: Fine-Tuned Whisper and Wav2Vec2 for Real-Time Audio
 
- ![Hugging Face](https://huggingface.co/front/assets/huggingface_logo-noborder.svg)
-
 This project fine-tunes OpenAI's Whisper (`whisper-small`) and Facebook's Wav2Vec2 (`wav2vec2-base-960h`) models for real-time speech recognition using live audio recordings. It’s designed for dynamic environments where low-latency transcription is key, such as live conversations or streaming audio.
 
 ## Model Description
- This is a fine-tuned version of [OpenAI's Whisper small model](https://huggingface.co/openai/whisper-small) and [Facebook's Wav2Vec2 base model](https://huggingface.co/facebook/wav2vec2-base-960h), optimized for real-time speech-to-text transcription. The models were trained on live 16kHz mono audio recordings, improving transcription accuracy over their base versions for continuous input scenarios.
+ Fine-tuned Whisper and Wav2Vec2 models for real-time speech recognition on live audio.
 
 ## Features
 - **Real-time audio recording**: Captures live 16kHz mono audio via microphone input.
@@ -36,21 +34,6 @@ This is a fine-tuned version of [OpenAI's Whisper small model](https://huggingfa
 - **Model saving/loading**: Automatically saves fine-tuned models with timestamps.
 - **Dual model support**: Choose between Whisper and Wav2Vec2 architectures.
 
- *Note*: Currently supports English-only transcription.
-
- ## Installation
- Clone the repository and install the dependencies:
- ```bash
- git clone https://github.com/bniladridas/speech-model.git
- cd speech-model
- pip install -r requirements.txt
- ```
-
- Optional: Install system dependencies for Sounddevice (e.g., libsoundio on Linux):
- ```bash
- sudo apt-get install libsndfile1
- ```
-
 ## Usage
 
 ### Start Fine-Tuning
@@ -112,8 +95,8 @@ A GPU is recommended for faster fine-tuning. See `requirements.txt` for the full
 To load the models from Hugging Face:
 ```python
 from transformers import WhisperForConditionalGeneration, WhisperProcessor
- model = WhisperForConditionalGeneration.from_pretrained("bniladridas/speech-recognition-ai-fine-tune")
- processor = WhisperProcessor.from_pretrained("bniladridas/speech-recognition-ai-fine-tune")
+ model = WhisperForConditionalGeneration.from_pretrained("harpertoken/harpertokenASR")
+ processor = WhisperProcessor.from_pretrained("harpertoken/harpertokenASR")
 ```
 
 ## Repository Structure
@@ -132,7 +115,7 @@ speech-model/
 The models are fine-tuned on live audio recordings collected during runtime. No pre-existing dataset is required—users generate their own data via microphone input.
 
 ## Evaluation Results
- Placeholder: Future updates will include WER (Word Error Rate) metrics compared to base models.
+ Future updates will include WER (Word Error Rate) metrics compared to base models.
 
 ## License
- Licensed under the MIT License. See the LICENSE file for details.
+ Licensed under the MIT License.
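For context on the snippet change above: the commit only swaps the checkpoint id in the README's loading code. Below is a minimal end-to-end sketch, assuming the standard `sounddevice` and `transformers` APIs, of capturing the live 16kHz mono audio the README describes and transcribing it with the renamed checkpoint; the recording and generate/decode steps are illustrative, not code from this repository.

```python
# Sketch only: the sounddevice capture and generate/decode flow are
# assumptions based on standard library usage, not repo code.
import sounddevice as sd
from transformers import WhisperForConditionalGeneration, WhisperProcessor

MODEL_ID = "harpertoken/harpertokenASR"  # new id introduced by this commit
SAMPLE_RATE = 16_000                     # 16kHz mono, per the README

model = WhisperForConditionalGeneration.from_pretrained(MODEL_ID)
processor = WhisperProcessor.from_pretrained(MODEL_ID)

# Record five seconds of mono audio from the default microphone.
seconds = 5
audio = sd.rec(int(seconds * SAMPLE_RATE), samplerate=SAMPLE_RATE, channels=1)
sd.wait()  # block until the recording finishes

# The processor pads to Whisper's 30s window and builds log-mel features.
inputs = processor(audio.squeeze(), sampling_rate=SAMPLE_RATE, return_tensors="pt")
ids = model.generate(input_features=inputs.input_features)
print(processor.batch_decode(ids, skip_special_tokens=True)[0])
```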
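The updated Evaluation Results line promises WER (Word Error Rate) numbers against the base models. For reference, WER is word-level edit distance divided by the number of reference words; here is a self-contained sketch with a hypothetical `wer` helper, not part of the repository.

```python
# Hypothetical helper, not repo code: WER = (substitutions + insertions +
# deletions) / number of reference words, via word-level edit distance.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat", "the cat sat down"))  # 1 insertion / 3 words ≈ 0.33
```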