tags:
- python
- pytorch
---
# 🎙️ ClearSpeech

**AI-Powered Speech Enhancement & Transcription System**

ClearSpeech uses a custom U-Net deep learning model to remove background noise from audio, then transcribes the enhanced audio using OpenAI's Whisper. It is well suited to cleaning up voice recordings, meeting audio, podcasts, or any other noisy speech.

**🌐 Live Website (will be updated)**: https://clearspeech.yourdomain.com

## 🌟 Features

- 🧹 **AI-Powered Noise Reduction**: Custom U-Net model trained to remove background noise
- 📝 **Automatic Transcription**: Whisper integration for accurate speech-to-text
- ⚡ **Fast Processing**: Optimized pipeline with GPU support
- 🌐 **REST API**: Easy-to-use FastAPI backend
- 🎯 **High Quality**: Validation loss of 0.031
- 🔧 **Flexible**: Enhancement-only, transcription-only, or both

## 📋 Table of Contents

- Installation
- Quick Start
- API Documentation
- Project Structure
- Contributing

## 🚀 Installation

### Prerequisites

- Python 3.8+
- pip
- CUDA-capable GPU (optional)

### Step 1: Clone the Repository

```bash
git clone https://github.com/yourusername/ClearSpeech.git
cd ClearSpeech
```

### Step 2: Create a Virtual Environment

```bash
# Create the environment
python3 -m venv venv

# Activate it (macOS/Linux)
source venv/bin/activate

# Activate it (Windows)
venv\Scripts\activate
```

### Step 3: Install Dependencies

```bash
pip install -r requirements.txt
```

### Step 4: Download the Pretrained Model

```bash
python -c "
from huggingface_hub import hf_hub_download
hf_hub_download(
    repo_id='thecodeworm/clearspeech-unet',
    filename='best_model.pt',
    local_dir='enhancement_model/checkpoints/'
)
"
```

### Step 5: Generate Noisy Samples

1. Record your own WAV sample.
2. Run `generate_noisy_samples.py` on the sample to add noise for testing the model:

```bash
# Generate all noise types at multiple SNR levels
python generate_noisy_samples.py \
    --input my_clean_voice.wav \
    --output test_samples/
```
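
The internals of `generate_noisy_samples.py` aren't shown here, but mixing noise into clean speech at a target SNR generally comes down to one scale factor. The sketch below (a hypothetical helper, not the script's actual code) shows the idea with NumPy:

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the mixture has the requested signal-to-noise ratio."""
    # Tile or trim the noise to match the clean signal's length
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Solve clean_power / (scale^2 * noise_power) = 10^(snr_db / 10) for scale
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise
```

Calling this once per noise type and per SNR value (e.g. 0, 5, 10, 15 dB) yields a grid of test files like the ones the script writes to `test_samples/`.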

## ⚡ Quick Start

### Method 1: Using the API (Recommended)

**Start the backend server:**

```bash
python -m backend.app
```

The server starts at `http://localhost:8000`.

**Start the frontend:**

```bash
cd frontend
python -m http.server 3000
```

The frontend starts at `http://localhost:3000`.

**Process audio:**

```bash
# Full pipeline (enhance + transcribe)
curl -X POST "http://localhost:8000/process" \
  -F "file=@your_audio.wav" \
  | jq .

# Enhance only
curl -X POST "http://localhost:8000/enhance" \
  -F "file=@your_audio.wav" \
  -o enhanced_output.wav

# Transcribe only
curl -X POST "http://localhost:8000/transcribe" \
  -F "file=@your_audio.wav" \
  -F "enhance=true" \
  | jq .
```

### Method 2: Using Python

```python
from backend.inference_pipeline import EnhancementPipeline

# Initialize the pipeline
pipeline = EnhancementPipeline(
    cnn_checkpoint_path="enhancement_model/checkpoints/best_model.pt",
    whisper_model_name="base",
    device="cpu"  # or "cuda" or "mps"
)

# Process audio
result = pipeline.process("path/to/noisy_audio.wav")

print(f"Transcript: {result['transcript']}")
print(f"Duration: {result['duration']:.2f}s")

# Save the enhanced audio
import soundfile as sf
sf.write("enhanced.wav", result['enhanced_audio'], result['sample_rate'])
```

### Method 3: Command Line

```bash
# Enhance an audio file
python enhancement_model/infer.py \
    --checkpoint enhancement_model/checkpoints/best_model.pt \
    --input noisy_audio.wav \
    --output enhanced_audio.wav \
    --comparison  # Creates a stereo comparison file
```

## 📚 API Documentation

### Interactive Docs

Once the server is running, visit:

- **Swagger UI**: [http://localhost:8000/docs](http://localhost:8000/docs)
- **ReDoc**: [http://localhost:8000/redoc](http://localhost:8000/redoc)

### Endpoints

#### `POST /process`

Process audio with enhancement and transcription.

**Request:**

```bash
curl -X POST "http://localhost:8000/process" \
  -F "file=@audio.wav" \
  -F "language=en" \
  -F "skip_enhancement=false"
```

**Response:**

```json
{
  "success": true,
  "transcript": "Transcribed text here",
  "duration": 3.5,
  "language": "en",
  "enhanced_audio_url": "/download/enhanced_123.wav",
  "segments": [...],
  "processing_time": 2.3
}
```
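
From Python, a client might unpack this response before fetching the enhanced file. The helper below is a hedged sketch (`parse_process_response` is hypothetical, and the field names are assumed from the example response above):

```python
import json
from urllib.parse import urljoin

def parse_process_response(body: str, base_url: str = "http://localhost:8000") -> dict:
    """Extract the useful fields from a /process JSON response body."""
    data = json.loads(body)
    if not data.get("success"):
        raise RuntimeError("processing failed")
    return {
        "transcript": data["transcript"],
        "language": data.get("language"),
        # Resolve the relative download path against the server base URL
        "enhanced_audio_url": urljoin(base_url, data["enhanced_audio_url"]),
    }
```

The resolved `enhanced_audio_url` can then be passed to any HTTP client to retrieve the WAV via the `GET /download/...` endpoint.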

#### `POST /enhance`

Enhance audio only (no transcription).

**Request:**

```bash
curl -X POST "http://localhost:8000/enhance" \
  -F "file=@audio.wav" \
  -o enhanced.wav
```

**Response:** Enhanced audio file (WAV)

#### `POST /transcribe`

Transcribe audio with optional enhancement.

**Request:**

```bash
curl -X POST "http://localhost:8000/transcribe" \
  -F "file=@audio.wav" \
  -F "language=en" \
  -F "enhance=true"
```

**Response:**

```json
{
  "success": true,
  "transcript": "Transcribed text",
  "duration": 3.5,
  "language": "en",
  "segments": [...]
}
```

#### `GET /download/{filename}`

Download an enhanced audio file.

#### `GET /health`

Health check endpoint.

## 📁 Project Structure

```text
ClearSpeech/
├── backend/                  # FastAPI backend
│   ├── app.py                # Main API server
│   ├── inference_pipeline.py # Processing pipeline
│   └── requirements.txt
├── enhancement_model/        # U-Net model
│   ├── model.py              # U-Net architecture
│   ├── dataset.py            # PyTorch dataset
│   ├── train.py              # Training script
│   ├── infer.py              # Inference script
│   ├── checkpoints/          # Trained models
│   │   └── best_model.pt
│   └── requirements.txt
├── data/                     # Training/test data
│   ├── audio_clean/          # Clean audio
│   ├── audio_raw/            # Noisy audio
│   ├── metadata/
│   │   └── metadata.json     # Dataset metadata
│   └── spectrograms/         # Mel-spectrograms
│       ├── clean/
│       └── noisy/
├── frontend/                 # Web interface (optional)
│   ├── index.html
│   └── script.js
├── tests/                    # Test files
│   └── test_backend.py
├── README.md
└── requirements.txt
```

## 🤝 Contributing

We welcome contributions! Here's how:

1. **Fork the repository**
2. **Create a feature branch**: `git checkout -b feature/amazing-feature`
3. **Commit changes**: `git commit -m 'Add amazing feature'`
4. **Push to the branch**: `git push origin feature/amazing-feature`
5. **Open a Pull Request**

**Development Setup**

```bash
# Install dev dependencies
pip install -r requirements-dev.txt

# Run tests before committing
python -m pytest tests/

# Format code
black backend/ enhancement_model/
```

## 🙏 Acknowledgments

- **U-Net Architecture**: Inspired by [Ronneberger et al.](https://arxiv.org/abs/1505.04597)
- **Whisper**: [OpenAI Whisper](https://github.com/openai/whisper)
- **Training Data**: [LibriSpeech](http://www.openslr.org/12/), [MS-SNSD](https://github.com/microsoft/MS-SNSD)

## 📧 Contact

**Project Maintainers**: Aditya Chanda, Josh Pal, Advik Kumar Singh

**Project Link**: [https://github.com/thecodeworm/ClearSpeech](https://github.com/thecodeworm/ClearSpeech)

## ⭐ Show Your Support

Give a ⭐️ if this project helped you!

---

**Built with ❤️ using PyTorch and FastAPI**