Update README.md
#1 by msmaje - opened
README.md CHANGED
@@ -7,133 +7,7 @@ sdk: gradio
 sdk_version: 5.32.1
 app_file: app.py
 pinned: false
----
-
-Voice Recognition Security System
-A sophisticated voice recognition system built with PyTorch and deployed on Hugging Face Spaces using Gradio. This system uses transfer learning with ResNet18 and advanced audio processing techniques for secure voice-based access control.
-Features
-
-Advanced Voice Recognition: Uses transfer learning with ResNet18 for high-accuracy voice identification
-Security-Focused: Implements confidence thresholds and authorization checks
-Data Augmentation: Trained with comprehensive audio augmentation techniques
-Real-time Processing: Fast inference for real-time voice recognition
-User-Friendly Interface: Clean Gradio interface for easy interaction
-
-Model Architecture
-
-Base Model: ResNet18 with transfer learning
-Input: MFCC features (40 coefficients, 174 time frames)
-Output: Multi-class classification for voice identification
-Security: Confidence-based access control with authorized user validation
-
-Technical Details
-Audio Processing
-
-Feature Extraction: 40 MFCC coefficients
-Sample Rate: Flexible (auto-detected)
-Window Length: 174 time frames (standardized)
-Augmentation: Noise addition, time shifting, pitch shifting, time stretching
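
The removed README says the MFCC input is standardized to 174 time frames but does not say how. A minimal sketch of one common approach, assuming zero-padding for short clips and truncation for long ones; the helper name `fix_length` and the nested-list representation are illustrative, not taken from the project:

```python
from typing import List

N_MFCC = 40       # coefficients per frame, as stated in the README
MAX_FRAMES = 174  # standardized number of time frames, as stated in the README


def fix_length(mfcc: List[List[float]], max_frames: int = MAX_FRAMES) -> List[List[float]]:
    """Zero-pad or truncate every coefficient row to exactly max_frames values.

    Assumption: the README does not specify the padding strategy; zero-padding
    on the right is used here purely for illustration.
    """
    fixed = []
    for row in mfcc:
        if len(row) >= max_frames:
            fixed.append(row[:max_frames])  # long clip: keep the first frames
        else:
            fixed.append(row + [0.0] * (max_frames - len(row)))  # short clip: pad
    return fixed


# Synthetic feature matrices standing in for real MFCC output.
short_clip = [[1.0] * 100 for _ in range(N_MFCC)]
long_clip = [[1.0] * 300 for _ in range(N_MFCC)]
short_fixed = fix_length(short_clip)
long_fixed = fix_length(long_clip)
print(len(short_fixed), len(short_fixed[0]))
print(len(long_fixed), len(long_fixed[0]))
```

Either way the classifier sees a fixed 40 x 174 input, which is what lets an image backbone such as ResNet18 treat the features as a single-size "image".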
-
-Model Training
-
-Transfer Learning: Pre-trained ResNet18 backbone
-Optimization: Adam optimizer with learning rate scheduling
-Regularization: Dropout (0.5) and weight decay
-Batch Size: 32
-Epochs: 25 with early stopping capability
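
The training list above mentions 25 epochs "with early stopping capability" without giving the stopping rule. The following is a sketch of a patience-based rule on validation loss; the class `EarlyStopping`, the helper `epochs_run`, and the `patience` and `min_delta` values are assumptions chosen for illustration, not the project's settings:

```python
from typing import Sequence


class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def epochs_run(val_losses: Sequence[float], max_epochs: int = 25, patience: int = 3) -> int:
    """Return how many epochs would run before stopping (capped at max_epochs)."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if stopper.step(loss):
            return epoch
    return min(len(val_losses), max_epochs)


# Synthetic loss curve: improves for three epochs, then plateaus.
plateau = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.5]
print(epochs_run(plateau))
```

In a real training loop, `step` would be called once per epoch after validation, usually together with saving the best checkpoint.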
-
-Security Features
-
-Authorization List: Predefined authorized users (user1-user7)
-Confidence Threshold: Configurable (default: 0.7)
-False Acceptance Rate: Minimized through strict thresholding
-Access Control: Binary grant/deny decisions with detailed logging
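
The access-control rule described above (authorized list plus confidence threshold) can be sketched as follows. The authorized names `user1` to `user7` and the default threshold of 0.7 come from the README; the function names, the extra `unknown` class, and the returned dictionary layout are assumptions for illustration:

```python
import math
from typing import Dict, List, Sequence

AUTHORIZED_USERS = {f"user{i}" for i in range(1, 8)}  # user1..user7, per the README
CONFIDENCE_THRESHOLD = 0.7                            # README default


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw classifier scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide_access(logits: Sequence[float], class_names: Sequence[str],
                  threshold: float = CONFIDENCE_THRESHOLD) -> Dict[str, object]:
    """Grant access only if the top class is authorized AND confident enough."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    user, confidence = class_names[best], probs[best]
    if user not in AUTHORIZED_USERS:
        status = "predicted speaker is not on the authorized list"
    elif confidence < threshold:
        status = "confidence below threshold"
    else:
        status = "authorized speaker above threshold"
    granted = user in AUTHORIZED_USERS and confidence >= threshold
    return {"granted": granted, "user": user, "confidence": confidence, "status": status}


# Hypothetical label set: the seven authorized users plus one catch-all class.
names = [f"user{i}" for i in range(1, 8)] + ["unknown"]
print(decide_access([0, 0, 5, 0, 0, 0, 0, 0], names))  # confident, authorized
print(decide_access([0] * 8, names))                   # uniform scores, low confidence
print(decide_access([0] * 7 + [5], names))             # confident, but not authorized
```

Raising the threshold trades fewer false acceptances for more false rejections, which is why the README calls it configurable.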
-
-Usage
-Online Demo
-Visit the Hugging Face Space to try the system: [Your Space URL]
-Local Installation
-bash# Clone the repository
-git clone [your-repo-url]
-cd voice-recognition-security
-
-# Install dependencies
-pip install -r requirements.txt
-
-# Run the application
-python app.py
-API Usage
-The system accepts audio files in various formats (.wav, .mp3, .flac, .ogg, .m4a, .aac) and returns:
-
-Access decision (Granted/Denied)
-Predicted user
-Confidence score
-Detailed status information
-
-Model Performance
-Training Results
-
-Overall Accuracy: ~95%+ on test set
-False Acceptance Rate: <0.05
-False Rejection Rate: <0.10
-Security Score: >85%
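
The README reports a False Acceptance Rate and a False Rejection Rate but not how they were computed. A sketch using the standard definitions (FAR = accepted impostor attempts / all impostor attempts, FRR = rejected genuine attempts / all genuine attempts); the function `far_frr` and the attempt data below are synthetic illustrations, not the project's results:

```python
from typing import Iterable, Tuple


def far_frr(attempts: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Compute (FAR, FRR) from (is_genuine, was_granted) pairs.

    FAR: fraction of impostor attempts that were wrongly granted.
    FRR: fraction of genuine attempts that were wrongly denied.
    """
    attempts = list(attempts)
    impostor = [granted for is_genuine, granted in attempts if not is_genuine]
    genuine = [granted for is_genuine, granted in attempts if is_genuine]
    far = sum(impostor) / len(impostor) if impostor else 0.0
    frr = sum(not g for g in genuine) / len(genuine) if genuine else 0.0
    return far, frr


# Synthetic log: 20 genuine attempts (1 rejected), 40 impostor attempts (1 accepted).
log = ([(True, True)] * 19 + [(True, False)]
       + [(False, False)] * 39 + [(False, True)])
far, frr = far_frr(log)
print(f"FAR={far:.3f} FRR={frr:.3f}")
```

The "Security Score" listed above is not defined in the README, so it is deliberately left out of this sketch.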
-
-Supported Audio Formats
-
-WAV (recommended)
-MP3
-FLAC
-OGG
-M4A
-AAC
-
-Technical Implementation
-Data Augmentation Techniques
-
-White/Pink Noise Addition: Improves robustness to background noise
-Time Shifting: Handles timing variations in speech
-Pitch Shifting: Accounts for natural voice variations
-Time Stretching: Adapts to different speaking speeds
-Volume Changes: Normalizes for different recording levels
-Frequency/Time Masking: SpecAugment for better generalization
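
Three of the waveform-level augmentations listed above (white noise, time shifting, volume change) are simple enough to sketch with the standard library. The function names, the noise level, and the choice of a circular shift are assumptions; the README does not describe the project's actual parameters, and pitch shifting, time stretching, and SpecAugment masking are not covered here:

```python
import random
from typing import List, Optional, Sequence


def add_white_noise(samples: Sequence[float], noise_std: float = 0.005,
                    rng: Optional[random.Random] = None) -> List[float]:
    """Add zero-mean Gaussian noise to every sample."""
    rng = rng or random.Random()
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def time_shift(samples: Sequence[float], shift: int) -> List[float]:
    """Circularly shift the waveform by `shift` samples (positive = delay).

    Assumption: samples that fall off one end wrap around to the other.
    """
    samples = list(samples)
    if not samples:
        return []
    k = shift % len(samples)
    return samples[-k:] + samples[:-k] if k else samples


def change_volume(samples: Sequence[float], gain: float) -> List[float]:
    """Scale the amplitude of every sample by `gain`."""
    return [s * gain for s in samples]


wave = [0.0, 0.1, 0.2, 0.3, 0.4]
print(time_shift(wave, 2))
print(change_volume(wave, 0.5))
print(add_white_noise(wave, rng=random.Random(0)))  # seeded for repeatability
```

Each function returns a new list, so several augmented copies can be derived from one recording without modifying the original.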
-
-Security Measures
-
-Multi-factor Authentication: Voice + confidence scoring
-Threshold-based Rejection: Configurable confidence thresholds
-Authorization Validation: Whitelist-based access control
-Anomaly Detection: Low-confidence sample rejection
-
-Security Considerations
-This system is designed for demonstration purposes. For production deployment:
-
-Encrypt Model Files: Protect model weights from unauthorized access
-Secure Audio Transmission: Use HTTPS and audio encryption
-Rate Limiting: Implement request throttling
-Audit Logging: Log all access attempts
-Regular Retraining: Update model with new voice samples
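
Of the production items above, request throttling is the easiest to illustrate in isolation. A minimal in-memory sliding-window sketch; the class name, the limits, and the idea of keying on a client identifier are assumptions, and a real deployment would more likely rely on a reverse proxy or a shared store than on per-process state:

```python
import time
from collections import deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits: Dict[str, Deque[float]] = {}  # client id -> request timestamps

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        """Return True and record the request if the client is under its limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits.setdefault(client_id, deque())
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
for t in (0, 1, 2, 3):
    print(t, limiter.allow("client-a", now=t))
```

Rejected attempts are also worth recording, since the same list asks for all access attempts to be logged.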
-
-Requirements
-
-Python 3.8+
-PyTorch 2.1.0+
-Gradio 4.44.0+
-Librosa 0.10.1+
-Scikit-learn 1.3.2+
-NumPy, SciPy, Matplotlib
 
-Contributing
-Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
-License
-This project is licensed under the MIT License - see the LICENSE file for details.
-Acknowledgments
 
-PyTorch Team: For the excellent deep learning framework
-Librosa Developers: For comprehensive audio processing tools
-Hugging Face: For providing the deployment platform
-Gradio Team: For the intuitive interface framework
 
-Contact
-For questions, issues, or collaboration opportunities, please open an issue on GitHub or contact [your-contact-info].
 
-Built with ❤️ using PyTorch, Gradio, and Hugging Face Spaces