---
title: Voice Recognition Security System
emoji: 🎤
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: "4.44.0"
app_file: app.py
pinned: false
---

# Voice Recognition Security System

A voice recognition system built with PyTorch and deployed on Hugging Face Spaces using Gradio. It applies transfer learning with a ResNet18 backbone to MFCC audio features, trained with data augmentation, to make voice-based access-control decisions.

## Features

- **Advanced Voice Recognition**: Uses transfer learning with ResNet18 for high-accuracy voice identification
- **Security-Focused**: Implements confidence thresholds and authorization checks
- **Data Augmentation**: Trained with comprehensive audio augmentation techniques
- **Real-time Processing**: Fast inference for real-time voice recognition
- **User-Friendly Interface**: Clean Gradio interface for easy interaction

## Model Architecture

- **Base Model**: ResNet18 with transfer learning
- **Input**: MFCC features (40 coefficients × 174 time frames)
- **Output**: Multi-class classification for voice identification
- **Security**: Confidence-based access control with authorized-user validation

## Technical Details

### Audio Processing

- **Feature Extraction**: 40 MFCC coefficients
- **Sample Rate**: Flexible (auto-detected from the input file)
- **Input Length**: 174 time frames (standardized)
- **Augmentation**: Noise addition, time shifting, pitch shifting, time stretching
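
A minimal sketch of the audio-processing steps above, assuming librosa's `feature.mfcc` with its default hop length and zero-padding or truncation along the time axis to reach 174 frames. `standardize_frames` and `extract_mfcc` are hypothetical names, and the project's own preprocessing may differ.

```python
import numpy as np

N_MFCC = 40       # MFCC coefficients, per the README
MAX_FRAMES = 174  # standardized number of time frames, per the README


def standardize_frames(mfcc: np.ndarray, max_frames: int = MAX_FRAMES) -> np.ndarray:
    """Zero-pad (on the right) or truncate an (n_mfcc, n_frames) matrix to max_frames columns."""
    n_frames = mfcc.shape[1]
    if n_frames < max_frames:
        return np.pad(mfcc, ((0, 0), (0, max_frames - n_frames)), mode="constant")
    return mfcc[:, :max_frames]


def extract_mfcc(path: str) -> np.ndarray:
    """Load an audio file at its native sample rate and return a (40, 174) MFCC matrix."""
    import librosa  # imported here so standardize_frames works without librosa installed

    y, sr = librosa.load(path, sr=None)  # sr=None keeps the file's own sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)
    return standardize_frames(mfcc)
```

With librosa's default hop length of 512 samples, 174 frames cover about 4 seconds of audio at 22,050 Hz. Because `sr=None` keeps each file's native rate, the same 174 frames span less time for higher-rate recordings, which is one reason some pipelines resample every file to a fixed rate instead.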

### Model Training

- **Transfer Learning**: Pre-trained ResNet18 backbone
- **Optimization**: Adam optimizer with learning-rate scheduling
- **Regularization**: Dropout (0.5) and weight decay
- **Batch Size**: 32
- **Epochs**: Up to 25, with optional early stopping
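
The README mentions early stopping without stating its criterion. A minimal sketch, assuming the common rule of stopping once validation loss has not improved for `patience` consecutive epochs, with the 25-epoch cap from the list above; `EarlyStopping`, `epochs_run`, and the patience values are hypothetical.

```python
class EarlyStopping:
    """Signal a stop once validation loss fails to improve for `patience` epochs in a row."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta  # minimum decrease that counts as an improvement
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


MAX_EPOCHS = 25  # upper bound from the README


def epochs_run(val_losses, patience: int = 3) -> int:
    """Return how many epochs a loop capped at MAX_EPOCHS would run for these losses."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses[:MAX_EPOCHS], start=1):
        if stopper.step(loss):
            return epoch
    return min(len(val_losses), MAX_EPOCHS)
```

In PyTorch, the other items in the list would typically map to `torch.optim.Adam(model.parameters(), lr=..., weight_decay=...)`, a scheduler from `torch.optim.lr_scheduler` such as `ReduceLROnPlateau`, and a `DataLoader` with `batch_size=32`; the learning rate, weight-decay value, and scheduler type used in this project are not stated here.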

### Security Features

- **Authorization List**: Predefined authorized users (user1 through user7)
- **Confidence Threshold**: Configurable (default: 0.7)
- **False Acceptance Rate**: Minimized through strict thresholding
- **Access Control**: Binary grant/deny decisions with detailed logging
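
A minimal sketch of the grant/deny logic described above, assuming the model's logits are converted to probabilities with a softmax and that a request is granted only if the top prediction clears the confidence threshold and belongs to the authorization list. `decide_access` and `AccessResult` are hypothetical names; the four fields mirror the outputs listed under API Usage.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

AUTHORIZED_USERS = frozenset(f"user{i}" for i in range(1, 8))  # user1 .. user7
CONFIDENCE_THRESHOLD = 0.7  # default threshold from the README


@dataclass
class AccessResult:
    granted: bool
    predicted_user: str
    confidence: float
    status: str


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide_access(logits: Sequence[float], labels: Sequence[str],
                  threshold: float = CONFIDENCE_THRESHOLD,
                  authorized=AUTHORIZED_USERS) -> AccessResult:
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    user, confidence = labels[best], probs[best]
    # Check 1: reject uncertain predictions outright.
    if confidence < threshold:
        return AccessResult(False, user, confidence,
                            f"Denied: confidence {confidence:.2f} is below threshold {threshold:.2f}")
    # Check 2: a confident prediction must still name an authorized user.
    if user not in authorized:
        return AccessResult(False, user, confidence, f"Denied: {user} is not an authorized user")
    return AccessResult(True, user, confidence, f"Granted: {user} (confidence {confidence:.2f})")
```

Checking confidence first means an uncertain match is denied even when the top label is authorized, which is what keeps the false acceptance rate low.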

## Usage

### Online Demo

Visit the Hugging Face Space to try the system: [Your Space URL]

### Local Installation

```bash
# Clone the repository
git clone [your-repo-url]
cd voice-recognition-security

# Install dependencies
pip install -r requirements.txt

# Run the application
python app.py
```

### API Usage

The system accepts audio files in several formats (.wav, .mp3, .flac, .ogg, .m4a, .aac) and returns:

- Access decision (Granted/Denied)
- Predicted user
- Confidence score
- Detailed status information

## Model Performance

### Training Results

- **Overall Accuracy**: ~95%+ on the test set
- **False Acceptance Rate**: <0.05
- **False Rejection Rate**: <0.10
- **Security Score**: >85%
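
The README does not say how its false acceptance and false rejection rates were measured. Under the standard definitions, FAR is the fraction of impostor attempts that were granted access and FRR is the fraction of genuine attempts that were denied. The hypothetical helper below computes both from labelled trials.

```python
from typing import Iterable, Tuple


def far_frr(trials: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Compute (FAR, FRR) from (is_genuine, was_granted) pairs."""
    false_accepts = impostor_attempts = 0
    false_rejects = genuine_attempts = 0
    for is_genuine, was_granted in trials:
        if is_genuine:
            genuine_attempts += 1
            false_rejects += int(not was_granted)  # genuine user wrongly denied
        else:
            impostor_attempts += 1
            false_accepts += int(was_granted)  # impostor wrongly admitted
    far = false_accepts / impostor_attempts if impostor_attempts else 0.0
    frr = false_rejects / genuine_attempts if genuine_attempts else 0.0
    return far, frr
```

Raising the confidence threshold (0.7 by default) generally lowers FAR at the cost of a higher FRR, so both numbers should be reported for the same threshold.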

### Supported Audio Formats

- WAV (recommended)
- MP3
- FLAC
- OGG
- M4A
- AAC

## Technical Implementation

### Data Augmentation Techniques

1. **White/Pink Noise Addition**: Improves robustness to background noise
2. **Time Shifting**: Handles timing variations in speech
3. **Pitch Shifting**: Accounts for natural voice variations
4. **Time Stretching**: Adapts to different speaking speeds
5. **Volume Changes**: Improves robustness to different recording levels
6. **Frequency/Time Masking**: SpecAugment-style masking for better generalization
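
The augmentation list above can be sketched in NumPy as follows. The parameter ranges (noise level, shift fraction, gain range, mask widths) are assumptions, since the README does not give them; only white noise is shown, and pitch shifting and time stretching are delegated to librosa's `effects.pitch_shift` and `effects.time_stretch`. The SpecAugment-style masks zero out one band of rows and one band of columns; applied to MFCCs, the "frequency" mask covers a range of cepstral coefficients rather than frequency bins. All function names are hypothetical.

```python
import numpy as np


def add_white_noise(y: np.ndarray, rng: np.random.Generator, noise_level: float = 0.005) -> np.ndarray:
    """Add Gaussian white noise scaled by noise_level."""
    return y + noise_level * rng.standard_normal(y.shape)


def time_shift(y: np.ndarray, rng: np.random.Generator, max_fraction: float = 0.2) -> np.ndarray:
    """Circularly shift the waveform by up to max_fraction of its length in either direction."""
    limit = int(len(y) * max_fraction)
    return np.roll(y, int(rng.integers(-limit, limit + 1)))


def change_volume(y: np.ndarray, rng: np.random.Generator, low: float = 0.8, high: float = 1.2) -> np.ndarray:
    """Scale the amplitude by a random gain in [low, high)."""
    return y * rng.uniform(low, high)


def pitch_and_tempo(y: np.ndarray, sr: int, n_steps: float = 2.0, rate: float = 1.1) -> np.ndarray:
    """Shift pitch by n_steps semitones, then speed up (rate > 1) or slow down (rate < 1)."""
    import librosa  # only needed for these two transforms

    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=n_steps)
    return librosa.effects.time_stretch(y, rate=rate)


def spec_augment(spec: np.ndarray, rng: np.random.Generator, max_freq: int = 8, max_time: int = 20) -> np.ndarray:
    """Zero one random band of rows (frequency mask) and one band of columns (time mask)."""
    out = spec.copy()  # leave the caller's array untouched
    n_freq, n_time = out.shape
    f = int(rng.integers(0, max_freq + 1))
    f0 = int(rng.integers(0, n_freq - f + 1))
    out[f0:f0 + f, :] = 0.0
    t = int(rng.integers(0, max_time + 1))
    t0 = int(rng.integers(0, n_time - t + 1))
    out[:, t0:t0 + t] = 0.0
    return out
```

Waveform transforms (noise, shift, volume, pitch, tempo) would run before MFCC extraction, while the masks are applied to the extracted feature matrix.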

### Security Measures

1. **Layered Checks**: Voice identification combined with confidence scoring
2. **Threshold-based Rejection**: Configurable confidence thresholds
3. **Authorization Validation**: Whitelist-based access control
4. **Anomaly Detection**: Rejection of low-confidence samples

## Security Considerations

This system is designed for demonstration purposes. For production deployment, consider the following:

1. **Encrypt Model Files**: Protect model weights from unauthorized access
2. **Secure Audio Transmission**: Use HTTPS and audio encryption
3. **Rate Limiting**: Implement request throttling
4. **Audit Logging**: Log all access attempts
5. **Regular Retraining**: Update the model with new voice samples
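
For the rate-limiting and audit-logging recommendations, a standard-library sketch might look like this. It assumes a per-client sliding-window limit and one JSON line per access attempt; `SlidingWindowRateLimiter`, `log_attempt`, the `voice_access.audit` logger name, and the default limits are all hypothetical.

```python
import json
import logging
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict

audit_logger = logging.getLogger("voice_access.audit")  # hypothetical logger name


class SlidingWindowRateLimiter:
    """Allow at most max_requests attempts per client within any window_seconds span."""

    def __init__(self, max_requests: int = 5, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._attempts: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        attempts = self._attempts[client_id]
        # Drop timestamps that have fallen out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_requests:
            return False
        attempts.append(now)
        return True


def log_attempt(client_id: str, predicted_user: str, confidence: float, granted: bool) -> dict:
    """Write one JSON line per access attempt to the audit logger and return the record."""
    record = {
        "timestamp": time.time(),
        "client_id": client_id,
        "predicted_user": predicted_user,
        "confidence": round(confidence, 4),
        "granted": granted,
    }
    audit_logger.info(json.dumps(record))
    return record
```

In a request handler, `allow()` would be checked before running inference so throttled requests never reach the model, and `log_attempt()` called with the final decision. The limiter keeps its state in process memory, so a deployment with several replicas would need shared storage instead.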

## Requirements

- Python 3.8+
- PyTorch 2.1.0+
- Gradio 4.44.0+
- Librosa 0.10.1+
- Scikit-learn 1.3.2+
- NumPy, SciPy, Matplotlib

## Contributing

Contributions are welcome! Please feel free to submit a pull request. For major changes, please open an issue first to discuss what you would like to change.

## License

This project is licensed under the MIT License. See the LICENSE file for details.

## Acknowledgments

- **PyTorch Team**: For the excellent deep learning framework
- **Librosa Developers**: For comprehensive audio processing tools
- **Hugging Face**: For providing the deployment platform
- **Gradio Team**: For the intuitive interface framework

## Contact

For questions, issues, or collaboration opportunities, please open an issue on GitHub or contact [your-contact-info].

Built with ❤️ using PyTorch, Gradio, and Hugging Face Spaces