# Clean Speak

A robust multimodal system for detecting and rephrasing profanity in both speech and text, leveraging advanced NLP models to ensure accurate filtering while preserving conversational context.

![Profanity Detection System](https://img.shields.io/badge/AI-NLP%20System-blue)
![Python](https://img.shields.io/badge/Python-3.10%2B-green)
![Transformers](https://img.shields.io/badge/HuggingFace-Transformers-yellow)

## 🌐 Live Demo

Try the system without installation via our Hugging Face Spaces deployment:

[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/sidchak/cleanspeak)

This live version leverages Hugging Face's ZeroGPU technology, which provides on-demand GPU acceleration for inference while optimizing resource usage.

## 📋 Features

- **Multimodal Analysis**: Processes both written text and spoken audio
- **Context-Aware Detection**: Goes beyond simple keyword matching
- **Automatic Content Refinement**: Rephrases content while preserving meaning
- **Audio Synthesis**: Converts rephrased content into high-quality spoken audio
- **Toxicity Classification**: Automatically categorizes content from "No Toxicity" to "Severe Toxicity"
- **User-Friendly Interface**: Intuitive Gradio-based UI
- **Real-time Streaming**: Processes audio in real time as you speak
- **Adjustable Sensitivity**: Fine-tune the profanity detection threshold
- **Visual Highlighting**: Flags detected profane words directly in the text
- **Performance Optimization**: Half-precision support for improved GPU memory efficiency
- **Cloud Deployment**: Available as a hosted service on Hugging Face Spaces

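As a concrete illustration of the visual-highlighting feature, a detector that returns a set of flagged words could render them with `<mark>` tags for display in a Gradio HTML component. This is a hypothetical sketch, not the actual implementation in `profanity_detector.py`:

```python
import html
import re


def highlight(text: str, flagged: set[str]) -> str:
    """Wrap flagged words in <mark> tags, HTML-escaping everything else."""
    def repl(match: re.Match) -> str:
        word = match.group(0)
        if word.lower() in flagged:
            return f"<mark>{html.escape(word)}</mark>"
        return html.escape(word)

    # Process word runs and non-word runs separately so punctuation is preserved
    return re.sub(r"\w+|\W+", repl, text)


print(highlight("you absolute muppet", {"muppet"}))
# -> you absolute <mark>muppet</mark>
```

The resulting string can be handed straight to a `gr.HTML` output for rendering in the browser.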
## 🧠 Models Used

The system leverages four models:

1. **Profanity Detection**: `parsawar/profanity_model_3.1` - a RoBERTa-based model trained for offensive-language detection
2. **Content Refinement**: `s-nlp/t5-paranmt-detox` - a T5-based model for rephrasing offensive language
3. **Speech-to-Text**: OpenAI's `Whisper` (large-v2) - for transcribing spoken audio
4. **Text-to-Speech**: Microsoft's `SpeechT5` - for converting rephrased text back to audio
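
A rough sketch of how these four models could be loaded. The lazy imports keep heavy dependencies out of module import time; the `AutoModel` classes are an assumption about the checkpoints' architectures, and the actual loading code in `profanity_detector.py` may differ:

```python
# Model identifiers from the list above
MODELS = {
    "profanity": "parsawar/profanity_model_3.1",  # RoBERTa-based classifier
    "detox": "s-nlp/t5-paranmt-detox",            # T5-based rephraser
    "asr": "large-v2",                            # Whisper checkpoint name
    "tts": "microsoft/speecht5_tts",              # SpeechT5 text-to-speech
}


def load_text_models():
    """Load the profanity classifier and detox rephraser (downloads on first call)."""
    from transformers import (AutoModelForSeq2SeqLM,
                              AutoModelForSequenceClassification, AutoTokenizer)
    clf_tok = AutoTokenizer.from_pretrained(MODELS["profanity"])
    clf = AutoModelForSequenceClassification.from_pretrained(MODELS["profanity"])
    t5_tok = AutoTokenizer.from_pretrained(MODELS["detox"])
    t5 = AutoModelForSeq2SeqLM.from_pretrained(MODELS["detox"])
    return clf_tok, clf, t5_tok, t5


def load_audio_models():
    """Load Whisper for transcription and SpeechT5 for synthesis."""
    import whisper
    from transformers import SpeechT5ForTextToSpeech, SpeechT5Processor
    asr = whisper.load_model(MODELS["asr"])
    tts_proc = SpeechT5Processor.from_pretrained(MODELS["tts"])
    tts = SpeechT5ForTextToSpeech.from_pretrained(MODELS["tts"])
    return asr, tts_proc, tts
```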

## 🚀 Deployment Options

### Online Deployment (No Installation Required)

Access the application directly through Hugging Face Spaces:

- **URL**: [https://huggingface.co/spaces/sidchak/cleanspeak](https://huggingface.co/spaces/sidchak/cleanspeak)
- **Technology**: Built with ZeroGPU for efficient GPU resource allocation
- **Features**: All features of the full application, accessible through your browser
- **Source Code**: [GitHub Repository](https://github.com/sidchak-gh/cleanspeak)

### Local Installation

#### Prerequisites

- Python 3.10+
- CUDA-compatible GPU recommended (CPU mode works too)
- FFmpeg for audio processing

#### Option 1: Using Conda (Recommended for Local Development)

```bash
# Clone the repository
git clone https://github.com/sidchak-gh/cleanspeak.git
cd cleanspeak

# Method A: Create the environment from environment.yml (recommended)
conda env create -f environment.yml
conda activate llm_project

# Method B: Create a new conda environment manually
conda create -n profanity-detection python=3.10
conda activate profanity-detection

# Install PyTorch with CUDA support (adjust the CUDA version if needed)
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

# Install FFmpeg for audio processing
conda install -c conda-forge ffmpeg

# Install Pillow from conda-forge to avoid DLL errors
conda install -c conda-forge pillow

# Install the remaining dependencies
pip install -r requirements.txt

# Set an environment variable to avoid OpenMP conflicts (recommended)
conda env config vars set KMP_DUPLICATE_LIB_OK=TRUE
conda activate profanity-detection  # Re-activate to apply the variable
```

#### Option 2: Using Docker

```bash
# Clone the repository
git clone https://github.com/sidchak-gh/cleanspeak.git
cd cleanspeak

# Build and run the Docker container
docker-compose build --no-cache
docker-compose up
```

## 🔧 Usage

### Using the Online Interface (Hugging Face Spaces)

1. Visit [https://huggingface.co/spaces/sidchak/cleanspeak](https://huggingface.co/spaces/sidchak/cleanspeak)
2. The interface might take a moment to load on first access while resources are allocated
3. Follow the same usage instructions as below, starting with "Initialize Models"

### Using the Local Interface

1. **Initialize Models**
   - Click the "Initialize Models" button when you first open the interface
   - Wait for all models to load (this may take a few minutes on first run)

2. **Text Analysis Tab**
   - Enter text into the text box
   - Adjust the "Profanity Detection Sensitivity" slider if needed
   - Click "Analyze Text"
   - Review the profanity score, toxicity classification, and rephrased content
   - See profane words highlighted in the text
   - Listen to the audio version of the rephrased content

3. **Audio Analysis Tab**
   - Upload an audio file or record directly with your microphone
   - Click "Analyze Audio"
   - View the transcription, profanity analysis, and rephrased content
   - Listen to the cleaned audio version of the rephrased content

4. **Real-time Streaming Tab**
   - Click "Start Real-time Processing"
   - Speak into your microphone
   - Watch as your speech is transcribed, analyzed, and rephrased in real time
   - Listen to the clean audio output
   - Click "Stop Real-time Processing" when finished

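Conceptually, the sensitivity slider and the toxicity labels amount to thresholding the classifier's score. A minimal sketch, where the band boundaries and intermediate label names are illustrative placeholders rather than the values used in `profanity_detector.py`:

```python
def classify_toxicity(score: float, sensitivity: float = 0.5) -> str:
    """Map a profanity score in [0, 1] to a toxicity label.

    `sensitivity` shifts the detection threshold: lowering it flags more
    content. Band boundaries here are illustrative placeholders.
    """
    if score < sensitivity * 0.5:
        return "No Toxicity"
    if score < sensitivity:
        return "Mild Toxicity"
    if score < (1 + sensitivity) / 2:
        return "Moderate Toxicity"
    return "Severe Toxicity"


print(classify_toxicity(0.9))  # -> Severe Toxicity
```

Only the content above the chosen band is passed to the detox model for rephrasing; text in the "No Toxicity" band can be returned unchanged.
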
## ⚠️ Troubleshooting

### OpenMP Runtime Conflict

If you encounter this error:

```
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
```

**Solutions:**

1. **Temporary fix**: set the environment variable before running:

   ```bash
   set KMP_DUPLICATE_LIB_OK=TRUE     # Windows
   export KMP_DUPLICATE_LIB_OK=TRUE  # Linux/macOS
   ```

2. **Code-based fix**: add this to the beginning of the script:

   ```python
   import os
   os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'
   ```

3. **Permanent fix for the Conda environment**:

   ```bash
   conda env config vars set KMP_DUPLICATE_LIB_OK=TRUE -n profanity-detection
   conda deactivate
   conda activate profanity-detection
   ```

### GPU Memory Issues

If you encounter CUDA out-of-memory errors:

1. Use smaller models:

   ```python
   # Change Whisper from "large-v2" to "medium" or "small"
   whisper_model = whisper.load_model("medium").to(device)

   # Keep the TTS model on CPU to save GPU memory
   tts_model = SpeechT5ForTextToSpeech.from_pretrained(TTS_MODEL)  # CPU mode
   ```

2. Run some models on the CPU instead of the GPU:

   ```python
   # Omit .to(device) to keep the model on the CPU
   t5_model = AutoModelForSeq2SeqLM.from_pretrained(T5_MODEL)  # CPU mode
   ```

3. Pin the container to a single GPU in `docker-compose.yml` (note that Docker cannot cap GPU memory per container, so if memory is tight, prefer the smaller-model options above):

   ```yaml
   # In docker-compose.yml
   deploy:
     resources:
       reservations:
         devices:
           - driver: nvidia
             count: 1
             capabilities: [gpu]
   ```

### Hugging Face Spaces-Specific Issues

1. **Long initialization time**: the first time you access the Space, it may take longer to initialize while models are downloaded and cached.

2. **Timeout errors**: if processing takes too long, try again with shorter text or audio inputs.

3. **Browser compatibility**: ensure your browser allows microphone access for the audio recording features.

### First-Time Slowness

On first run, the application downloads all models, which may take some time. Subsequent runs are faster because the models are cached locally. The text-to-speech model requires additional download time on first use.
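
Models fetched via `transformers` land in the standard Hugging Face cache directory (`~/.cache/huggingface` by default). If that disk is small, the cache can be redirected with the standard `HF_HOME` variable before launching the app; the path below is only an example:

```shell
# Redirect the Hugging Face model cache to a larger disk (example path)
export HF_HOME=/data/hf-cache
echo "Models will be cached under: $HF_HOME"
# then launch the app as usual, e.g. python profanity_detector.py
```

Note that OpenAI's `whisper` package keeps its own cache (under `~/.cache/whisper` by default), so redirecting `HF_HOME` does not move the Whisper checkpoint.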

## 📄 Project Structure

```
cleanspeak/
├── profanity_detector.py   # Main application file
├── Dockerfile              # For containerized deployment
├── docker-compose.yml      # Container orchestration
├── requirements.txt        # Python dependencies
├── environment.yml         # Conda environment specification
└── README.md               # This file
```

## Author

- Siddharth Chakraborty

## 📚 References

- [HuggingFace Transformers](https://huggingface.co/docs/transformers/index)
- [OpenAI Whisper](https://github.com/openai/whisper)
- [Microsoft SpeechT5](https://huggingface.co/microsoft/speecht5_tts)
- [Gradio Documentation](https://gradio.app/docs/)
- [Hugging Face Spaces](https://huggingface.co/spaces)

## 📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- This project utilizes models from the HuggingFace Hub, Microsoft, and OpenAI
- Inspired by research in content moderation and responsible AI
- Thanks to Hugging Face for providing the Spaces platform with ZeroGPU technology