thecodeworm committed 4756c56 (verified) · Parent(s): fa9567d

Upload README.md

Files changed (1): README.md (+289 -11)
# 🎙️ ClearSpeech

**AI-Powered Speech Enhancement & Transcription System**

ClearSpeech uses a custom U-Net deep learning model to remove background noise from audio, then transcribes the enhanced audio with OpenAI's Whisper. It is well suited to cleaning up voice recordings, meeting audio, podcasts, or any other noisy speech.

**🌐 Live Website (will be updated)**: https://clearspeech.yourdomain.com

## 🌟 Features

- 🧹 **AI-Powered Noise Reduction**: Custom U-Net model trained to remove background noise
- 📝 **Automatic Transcription**: Whisper integration for accurate speech-to-text
- ⚡ **Fast Processing**: Optimized pipeline with GPU support
- 🌐 **REST API**: Easy-to-use FastAPI backend
- 🎯 **High Quality**: Validation loss of 0.031
- 🔧 **Flexible**: Enhancement-only, transcription-only, or both

## 📋 Table of Contents

- Installation
- Quick Start
- API Documentation
- Project Structure
- Contributing
## 🚀 Installation

### Prerequisites

- Python 3.8+
- pip
- CUDA GPU (optional)

### Step 1: Clone the Repository
```
git clone https://github.com/yourusername/ClearSpeech.git
cd ClearSpeech
```

### Step 2: Create a Virtual Environment
```
# Create the environment
python3.10 -m venv venv

# Activate (macOS/Linux)
source venv/bin/activate

# Activate (Windows)
venv\Scripts\activate
```

### Step 3: Install Dependencies
```
pip install -r requirements.txt
```

### Step 4: Download the Pretrained Model
```
python -c "
from huggingface_hub import hf_hub_download
hf_hub_download(
    repo_id='thecodeworm/clearspeech-unet',
    filename='best_model.pt',
    local_dir='enhancement_model/checkpoints/'
)
"
```

### Step 5: Generate Noisy Samples

1. Record your own clean WAV sample.
2. Run `generate_noisy_samples.py` on the sample to add noise for testing the model:

```
# Generate all noise types at multiple SNR levels
python generate_noisy_samples.py \
    --input my_clean_voice.wav \
    --output test_samples/
```
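The script above handles this for you, but the core idea is simple: scale the noise so the speech-to-noise power ratio hits a target SNR, then add it to the clean signal. A minimal NumPy sketch of that mixing step (the actual logic inside `generate_noisy_samples.py` may differ):

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add noise to a clean signal at a target SNR (in dB)."""
    # Loop/trim the noise to match the clean signal's length
    if len(noise) < len(clean):
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[: len(clean)]

    # Scale the noise so that 10*log10(P_clean / P_noise_scaled) == snr_db
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise
```

At 0 dB the speech and noise carry equal power; higher SNR values leave the recording cleaner, so sweeping several values gives the model a range of difficulty.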

## ⚡ Quick Start

### Method 1: Using the API (Recommended)

**Start the backend server:**
```
python -m backend.app
```
The server starts at `http://localhost:8000`.

**Start the frontend:**
```
cd frontend
python -m http.server 3000
```
The frontend starts at `http://localhost:3000`.

**Process audio:**
```
# Full pipeline (enhance + transcribe)
curl -X POST "http://localhost:8000/process" \
  -F "file=@your_audio.wav" \
  | jq .

# Enhance only
curl -X POST "http://localhost:8000/enhance" \
  -F "file=@your_audio.wav" \
  -o enhanced_output.wav

# Transcribe only
curl -X POST "http://localhost:8000/transcribe" \
  -F "file=@your_audio.wav" \
  -F "enhance=true" \
  | jq .
```
### Method 2: Using Python
```
from backend.inference_pipeline import EnhancementPipeline

# Initialize the pipeline
pipeline = EnhancementPipeline(
    cnn_checkpoint_path="enhancement_model/checkpoints/best_model.pt",
    whisper_model_name="base",
    device="cpu"  # or "cuda" or "mps"
)

# Process audio
result = pipeline.process("path/to/noisy_audio.wav")

print(f"Transcript: {result['transcript']}")
print(f"Duration: {result['duration']:.2f}s")

# Save the enhanced audio
import soundfile as sf
sf.write("enhanced.wav", result['enhanced_audio'], result['sample_rate'])
```

### Method 3: Command Line
```
# Enhance an audio file
python enhancement_model/infer.py \
    --checkpoint enhancement_model/checkpoints/best_model.pt \
    --input noisy_audio.wav \
    --output enhanced_audio.wav \
    --comparison  # creates a stereo comparison file
```
## 📚 API Documentation

### Interactive Docs

Once the server is running, visit:

- **Swagger UI**: [http://localhost:8000/docs](http://localhost:8000/docs)
- **ReDoc**: [http://localhost:8000/redoc](http://localhost:8000/redoc)

### Endpoints

#### `POST /process`

Process audio with enhancement and transcription.

**Request:**
```
curl -X POST "http://localhost:8000/process" \
  -F "file=@audio.wav" \
  -F "language=en" \
  -F "skip_enhancement=false"
```
**Response:**
```
{
  "success": true,
  "transcript": "Transcribed text here",
  "duration": 3.5,
  "language": "en",
  "enhanced_audio_url": "/download/enhanced_123.wav",
  "segments": [...],
  "processing_time": 2.3
}
```
#### `POST /enhance`

Enhance audio only (no transcription).

**Request:**
```
curl -X POST "http://localhost:8000/enhance" \
  -F "file=@audio.wav" \
  -o enhanced.wav
```
**Response:** the enhanced audio file (WAV)

#### `POST /transcribe`

Transcribe audio with optional enhancement.

**Request:**
```
curl -X POST "http://localhost:8000/transcribe" \
  -F "file=@audio.wav" \
  -F "language=en" \
  -F "enhance=true"
```
**Response:**
```
{
  "success": true,
  "transcript": "Transcribed text",
  "duration": 3.5,
  "language": "en",
  "segments": [...]
}
```

#### `GET /download/{filename}`

Download an enhanced audio file.

#### `GET /health`

Health check endpoint.
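For scripting against these endpoints from Python rather than curl, a small client sketch using `requests` (endpoint paths and response fields follow the examples above; `BASE_URL` and the helper names are illustrative):

```python
import requests

BASE_URL = "http://localhost:8000"

def process_audio(path: str, language: str = "en") -> dict:
    """Send a WAV file to /process and return the parsed JSON response."""
    with open(path, "rb") as f:
        resp = requests.post(
            f"{BASE_URL}/process",
            files={"file": f},
            data={"language": language, "skip_enhancement": "false"},
        )
    resp.raise_for_status()
    return resp.json()

def summarize(result: dict) -> str:
    """Build a one-line summary from a /process response."""
    return f"[{result['language']}] {result['transcript']} ({result['duration']:.1f}s)"

if __name__ == "__main__":
    print(summarize(process_audio("your_audio.wav")))
```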
## 📁 Project Structure
```
ClearSpeech/
├── backend/                  # FastAPI backend
│   ├── app.py                # Main API server
│   ├── inference_pipeline.py # Processing pipeline
│   └── requirements.txt
├── enhancement_model/        # U-Net model
│   ├── model.py              # U-Net architecture
│   ├── dataset.py            # PyTorch dataset
│   ├── train.py              # Training script
│   ├── infer.py              # Inference script
│   ├── checkpoints/          # Trained models
│   │   └── best_model.pt
│   └── requirements.txt
├── data/                     # Training/test data
│   ├── audio_clean/          # Clean audio
│   ├── audio_raw/            # Noisy audio
│   ├── metadata/
│   │   └── metadata.json     # Dataset metadata
│   └── spectrograms/         # Mel-spectrograms
│       ├── clean/
│       └── noisy/
├── frontend/                 # Web interface (optional)
│   ├── index.html
│   └── script.js
├── tests/                    # Test files
│   └── test_backend.py
├── README.md
└── requirements.txt
```
## 🤝 Contributing

We welcome contributions! Here's how:

1. **Fork the repository**
2. **Create a feature branch**: `git checkout -b feature/amazing-feature`
3. **Commit changes**: `git commit -m 'Add amazing feature'`
4. **Push to the branch**: `git push origin feature/amazing-feature`
5. **Open a Pull Request**

**Development Setup**
```
# Install dev dependencies
pip install -r requirements-dev.txt

# Run tests before committing
python -m pytest tests/

# Format code
black backend/ enhancement_model/
```
## 🙏 Acknowledgments

- **U-Net Architecture**: Inspired by [Ronneberger et al.](https://arxiv.org/abs/1505.04597)
- **Whisper**: [OpenAI Whisper](https://github.com/openai/whisper)
- **Training Data**: [LibriSpeech](http://www.openslr.org/12/), [MS-SNSD](https://github.com/microsoft/MS-SNSD)

## 📧 Contact

**Project Maintainers**: Aditya Chanda, Josh Pal, Advik Kumar Singh

**Project Link**: [https://github.com/thecodeworm/ClearSpeech](https://github.com/thecodeworm/ClearSpeech)

## ⭐ Show Your Support

Give a ⭐️ if this project helped you!

---

**Built with ❤️ using PyTorch and FastAPI**