bniladridas
committed on
Commit
·
cf00f16
1
Parent(s):
6369f85
Update README with rebranding and cleanup
README.md
CHANGED
|
@@ -22,12 +22,10 @@ tags:
|
|
| 22 |
|
| 23 |
# Speech Recognition AI: Fine-Tuned Whisper and Wav2Vec2 for Real-Time Audio
|
| 24 |
|
| 25 |
-

|
| 26 |
-
|
| 27 |
This project fine-tunes OpenAI's Whisper (`whisper-small`) and Facebook's Wav2Vec2 (`wav2vec2-base-960h`) models for real-time speech recognition using live audio recordings. It’s designed for dynamic environments where low-latency transcription is key, such as live conversations or streaming audio.
|
| 28 |
|
| 29 |
## Model Description
|
| 30 |
-
|
| 31 |
|
| 32 |
## Features
|
| 33 |
- **Real-time audio recording**: Captures live 16kHz mono audio via microphone input.
|
|
@@ -36,21 +34,6 @@ This is a fine-tuned version of [OpenAI's Whisper small model](https://huggingfa
|
|
| 36 |
- **Model saving/loading**: Automatically saves fine-tuned models with timestamps.
|
| 37 |
- **Dual model support**: Choose between Whisper and Wav2Vec2 architectures.
|
| 38 |
|
| 39 |
-
*Note*: Currently supports English-only transcription.
|
| 40 |
-
|
| 41 |
-
## Installation
|
| 42 |
-
Clone the repository and install the dependencies:
|
| 43 |
-
```bash
|
| 44 |
-
git clone https://github.com/bniladridas/speech-model.git
|
| 45 |
-
cd speech-model
|
| 46 |
-
pip install -r requirements.txt
|
| 47 |
-
```
|
| 48 |
-
|
| 49 |
-
Optional: Install system dependencies for Sounddevice (e.g., libsoundio on Linux):
|
| 50 |
-
```bash
|
| 51 |
-
sudo apt-get install libsndfile1
|
| 52 |
-
```
|
| 53 |
-
|
| 54 |
## Usage
|
| 55 |
|
| 56 |
### Start Fine-Tuning
|
|
@@ -112,8 +95,8 @@ A GPU is recommended for faster fine-tuning. See `requirements.txt` for the full
|
|
| 112 |
To load the models from Hugging Face:
|
| 113 |
```python
|
| 114 |
from transformers import WhisperForConditionalGeneration, WhisperProcessor
|
| 115 |
-
model = WhisperForConditionalGeneration.from_pretrained("
|
| 116 |
-
processor = WhisperProcessor.from_pretrained("
|
| 117 |
```
|
| 118 |
|
| 119 |
## Repository Structure
|
|
@@ -132,7 +115,7 @@ speech-model/
|
|
| 132 |
The models are fine-tuned on live audio recordings collected during runtime. No pre-existing dataset is required—users generate their own data via microphone input.
|
| 133 |
|
| 134 |
## Evaluation Results
|
| 135 |
-
|
| 136 |
|
| 137 |
## License
|
| 138 |
-
Licensed under the MIT License.
|
|
|
|
| 22 |
|
| 23 |
# Speech Recognition AI: Fine-Tuned Whisper and Wav2Vec2 for Real-Time Audio
|
| 24 |
|
| 25 |
This project fine-tunes OpenAI's Whisper (`whisper-small`) and Facebook's Wav2Vec2 (`wav2vec2-base-960h`) models for real-time speech recognition using live audio recordings. It’s designed for dynamic environments where low-latency transcription is key, such as live conversations or streaming audio.
|
| 26 |
|
| 27 |
## Model Description
|
| 28 |
+
Fine-tuned Whisper and Wav2Vec2 models for real-time speech recognition on live audio.
|
| 29 |
|
| 30 |
## Features
|
| 31 |
- **Real-time audio recording**: Captures live 16kHz mono audio via microphone input.
|
|
|
|
| 34 |
- **Model saving/loading**: Automatically saves fine-tuned models with timestamps.
|
| 35 |
- **Dual model support**: Choose between Whisper and Wav2Vec2 architectures.
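The timestamped saving mentioned in the feature list could look roughly like the sketch below. The README does not show the actual naming scheme, so the helper name `timestamped_model_dir`, the `models` base directory, and the `%Y%m%d_%H%M%S` format are all assumptions made for illustration.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_model_dir(base_dir: str, model_name: str,
                          now: Optional[datetime] = None) -> Path:
    """Build a save path such as models/whisper_20250102_030405.

    Hypothetical helper: the project's real directory layout may differ.
    """
    # Use the supplied time when given (handy for tests), else the current time.
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(base_dir) / f"{model_name}_{stamp}"


if __name__ == "__main__":
    print(timestamped_model_dir("models", "whisper"))
```

A path built this way could then be passed to `save_pretrained` on both the model and the processor, so each fine-tuning run lands in its own directory.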
|
| 36 |
|
| 37 |
## Usage
|
| 38 |
|
| 39 |
### Start Fine-Tuning
|
|
|
|
| 95 |
To load the models from Hugging Face:
|
| 96 |
```python
|
| 97 |
from transformers import WhisperForConditionalGeneration, WhisperProcessor
|
| 98 |
+
model = WhisperForConditionalGeneration.from_pretrained("harpertoken/harpertokenASR")
|
| 99 |
+
processor = WhisperProcessor.from_pretrained("harpertoken/harpertokenASR")
|
| 100 |
```
|
| 101 |
|
| 102 |
## Repository Structure
|
|
|
|
| 115 |
The models are fine-tuned on live audio recordings collected during runtime. No pre-existing dataset is required—users generate their own data via microphone input.
|
| 116 |
|
| 117 |
## Evaluation Results
|
| 118 |
+
Future updates will include WER (Word Error Rate) metrics compared to base models.
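For reference, WER is the word-level edit distance (substitutions, insertions, and deletions) between a hypothesis and a reference transcript, divided by the number of reference words. The sketch below is a minimal pure-Python version, not the project's evaluation code. The function name `word_error_rate` and the lowercase-and-split normalization are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = word-level edit distance / number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Comparing a fine-tuned model against its base model would then mean running both on the same reference transcripts and averaging this score. Real evaluations usually also normalize punctuation, which this sketch does not.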
|
| 119 |
|
| 120 |
## License
|
| 121 |
+
Licensed under the MIT License.
|