---
language: en
license: mit
tags:
- audio-classification
- underwater-acoustics
- marine-biology
- pytorch
- mel-spectrogram
- resnet18
datasets:
- custom
metrics:
- accuracy
- balanced_accuracy
model-index:
- name: Marine1-Underwater-Acoustic-Classifier
  results:
  - task:
      type: audio-classification
      name: Underwater Sound Classification
    metrics:
    - type: accuracy
      value: 98.33
      name: Accuracy
    - type: balanced_accuracy
      value: 91.67
      name: Balanced Accuracy
---

# Marine1: Underwater Acoustic Classifier 🌊🐋

> **⚠️ IMPORTANT: Use Safetensors Format**
> This repository contains both `.pth` (pickle) and `.safetensors` formats.
> **Please use the `.safetensors` files** to avoid pickle security vulnerabilities.
> See the [How to Use](#how-to-use) section below for examples.

## Model Description

Marine1 is a deep learning model for classifying underwater acoustic events. Built on a fine-tuned ResNet18, it reaches 98.33% accuracy (91.67% balanced accuracy) across four categories of marine soundscapes. It is designed for marine biologists, oceanographers, researchers, and conservationists working with underwater audio data.

## Model Details

- **Model Type**: Audio classification (CNN-based)
- **Architecture**: Fine-tuned ResNet18 (transfer learning from ImageNet)
- **Input**: Audio files (WAV, MP3, FLAC); clips are truncated/padded to 10 seconds
- **Output**: 4-class classification with confidence scores
- **Framework**: PyTorch 2.0+
- **Parameters**: ~11M
- **Training Time**: ~10 minutes (4 epochs)
- **Formats**: safetensors (recommended) and PyTorch `.pth`

### 🔒 Security Note

This model is available in **safetensors** format, the **recommended secure format** that avoids pickle vulnerabilities. A short load-path comparison follows the Categories section below.

**Why safetensors?**

- ✅ **No arbitrary code execution** - safe to load from any source
- ✅ **Fast loading times**
- ✅ **Memory-efficient**
- ✅ **Cross-platform compatibility**

**⚠️ The `.pth` files are provided for backward compatibility only. Always prefer `.safetensors` for production use.**

Learn more: [Hugging Face Safetensors Documentation](https://huggingface.co/docs/safetensors/index)

## Categories

The model classifies underwater sounds into four distinct categories:

1. **🐋 Marine Animals** (`marine_animal`)
   - Whales, dolphins, orcas, and other marine mammals
   - Fish sounds and biological vocalizations
2. **🚢 Vessels** (`vessel`)
   - Ships, boats, submarines
   - Maritime traffic and propeller sounds
3. **🌊 Natural Sounds** (`natural_sound`)
   - Ocean waves and water movement
   - Bubbles, rain, and other environmental sounds
4. **🔧 Other Anthropogenic** (`other_anthropogenic`)
   - Human-made, non-vessel sounds
   - Underwater construction, sonar, etc.
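To make the Security Note above concrete, the sketch below contrasts the two load paths using the filenames from this repository. `safetensors.torch.load_file` reads plain tensors and cannot execute code; `torch.load` unpickles the file, so if you must read a `.pth`, pass `weights_only=True` (available in recent PyTorch releases) to restrict what may be deserialized:

```python
import torch
from safetensors.torch import load_file

# Recommended: safetensors stores raw tensors; loading cannot run code.
state_dict = load_file("best_model_finetuned.safetensors")

# Legacy .pth (avoid where possible). weights_only=True limits unpickling
# to tensors and basic containers instead of arbitrary Python objects.
legacy_state_dict = torch.load(
    "best_model_finetuned.pth", map_location="cpu", weights_only=True
)
```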
## Performance

| Metric | Value |
|--------|-------|
| **Accuracy** | 98.33% |
| **Balanced Accuracy** | 91.67% |
| **Training Epochs** | 4 |
| **Training Time** | ~10 minutes |

### Per-Class Performance

- Marine Animals: excellent (high precision/recall)
- Vessels: excellent (high precision/recall)
- Natural Sounds: good (limited training data)
- Other Anthropogenic: good (limited training data)

## How to Use

### Installation

```bash
pip install torch torchvision torchaudio librosa numpy safetensors
```

### Quick Start (Recommended - Safetensors)

```python
import torch
import torch.nn as nn
import librosa
import numpy as np
from safetensors.torch import load_file
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Build the architecture before loading the weights. This construction
# (single-channel conv1, 4-class head) is an assumption inferred from this
# card; the repository's inference.py is the authoritative reference.
model = models.resnet18()
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(model.fc.in_features, 4)

# Load model weights (secure format)
state_dict = load_file("best_model_finetuned.safetensors", device=str(device))
model.load_state_dict(state_dict)
model.to(device)

# Load and process audio
audio_path = "underwater_sound.wav"
y, sr = librosa.load(audio_path, sr=16000, duration=10.0)

# Create mel spectrogram
mel_spec = librosa.feature.melspectrogram(
    y=y, sr=sr, n_mels=128, n_fft=2048, hop_length=512, fmax=8000
)
log_mel_spec = librosa.power_to_db(mel_spec, ref=np.max)

# Prepare input: (batch, channel, mels, frames)
input_tensor = torch.FloatTensor(log_mel_spec).unsqueeze(0).unsqueeze(0).to(device)

# Predict
model.eval()
with torch.no_grad():
    outputs = model(input_tensor)
    probabilities = torch.nn.functional.softmax(outputs, dim=1)[0]
    predicted_class = probabilities.argmax().item()
    confidence = probabilities[predicted_class].item()

# Map to class names
class_names = ["vessel", "marine_animal", "natural_sound", "other_anthropogenic"]
print(f"Prediction: {class_names[predicted_class]} ({confidence*100:.2f}%)")
```

### Using the Inference Class (Easiest)

```python
from huggingface_hub import hf_hub_download
from inference import Marine1Classifier

# Download model (safetensors format - secure!)
model_path = hf_hub_download(
    repo_id="shiv207/Marine1",
    filename="best_model_finetuned.safetensors"
)

# Initialize classifier
classifier = Marine1Classifier(model_path)

# Make prediction
result = classifier.predict("underwater_sound.wav")
print(f"Prediction: {result['predicted_class']}")
print(f"Confidence: {result['confidence']*100:.2f}%")
```

### Using the Complete Pipeline

For a full-featured implementation with preprocessing and JSON output:

```bash
# Clone the repository
git clone https://github.com/shiv207/underwater-audio-classifier.git
cd underwater-audio-classifier

# Install dependencies
pip install -r requirements.txt

# Run prediction (supports both .pth and .safetensors)
python predict_minimal.py --audio your_audio.wav --model models/best_model_finetuned.safetensors

# Generate UDA-compliant JSON
python generate_json.py --audio your_audio.wav --output result.json
```

### Streamlit Web Interface

```bash
streamlit run app.py
```

Features:

- Drag-and-drop upload of multiple audio files
- Batch processing with progress tracking
- Visual spectrograms and probability charts
- Export results as JSON (individual or batch)
- Event detection mode with temporal localization

## Training Data

The model was trained on a diverse dataset of underwater acoustic recordings:

- **Marine Animals**: whale songs, dolphin clicks, orca vocalizations
- **Vessels**: various ship types (cargo, tanker, passenger, tug)
- **Natural Sounds**: ocean waves, water movement, environmental sounds
- **Other Anthropogenic**: human-made, non-vessel sounds

**Note**: The natural-sound and other-anthropogenic categories have only 13 training samples each, which may affect performance on edge cases.
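Given the imbalance noted above, anyone fine-tuning Marine1 on similarly skewed data may want to weight the loss by inverse class frequency. The card does not state whether the original training compensated for imbalance, so the sketch below is a generic mitigation, and the sample counts for the two larger classes are hypothetical placeholders:

```python
import torch
import torch.nn as nn

# Per-class sample counts in class-index order:
# vessel, marine_animal, natural_sound, other_anthropogenic.
# Only the 13s are stated in this card; the first two are hypothetical.
counts = torch.tensor([100.0, 100.0, 13.0, 13.0])

# Inverse-frequency weights, normalized so they average to 1.
weights = counts.sum() / (len(counts) * counts)
criterion = nn.CrossEntropyLoss(weight=weights)
```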
## Technical Details

### Audio Processing Pipeline

1. **Resampling**: audio resampled to 16 kHz
2. **Duration**: truncated/padded to 10 seconds
3. **Spectrogram**: 128-band mel spectrogram (n_fft=2048, hop_length=512)
4. **Normalization**: log-scale conversion with dB normalization
5. **Augmentation** (training only): speed, pitch, noise injection

### Model Architecture

```
ResNet18 (pre-trained on ImageNet)
├── Conv1  (frozen)
├── Layer1 (frozen)
├── Layer2 (fine-tuned)
├── Layer3 (fine-tuned)
├── Layer4 (fine-tuned)
└── FC     (4 classes, trained from scratch)
```

### Training Configuration

- **Optimizer**: Adam (lr=0.0001)
- **Loss**: CrossEntropyLoss
- **Batch Size**: 8
- **Epochs**: 4
- **Data Split**: 80% train, 20% validation
- **Device**: CPU/MPS (Apple Silicon compatible)

A sketch of this layer-freezing setup appears after the Model Variants section below.

## Limitations

1. **Limited data**: the natural-sound and other-anthropogenic categories have few training samples
2. **Duration**: optimized for 10-second audio clips
3. **Sample rate**: best performance at 16 kHz
4. **Domain**: trained on specific underwater environments; may need fine-tuning for other acoustic conditions
5. **Background noise**: performance may degrade at high noise levels

## Intended Use

### Primary Use Cases

- Marine biology research and species monitoring
- Vessel traffic analysis and maritime surveillance
- Environmental impact assessment
- Underwater acoustic event detection
- Marine conservation and ecosystem monitoring

### Out-of-Scope Use

- Real-time streaming (the model requires offline preprocessing)
- Non-underwater audio classification
- Medical or security applications
- High-stakes decision making without human oversight

## Ethical Considerations

- Intended for research and conservation purposes
- Should not be the sole basis for regulatory or enforcement decisions
- May carry biases from the training data distribution
- Performance varies across marine environments
- Users should validate results in their own context

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{marine1-underwater-classifier,
  author = {Shivam Sharma},
  title = {Marine1: Underwater Acoustic Classifier},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/shiv207/Marine1}},
}
```

## Model Variants

This repository includes multiple model formats:

### Safetensors Format (🔒 Recommended - Secure)

1. **best_model_finetuned.safetensors** ⭐
   - Fine-tuned ResNet18
   - 98.33% accuracy
   - Secure format (no pickle vulnerabilities)
   - Best overall performance
2. **best_model_simple.safetensors**
   - Custom CNN trained from scratch
   - 93% accuracy
   - Lighter-weight alternative
   - Secure format

### Legacy Formats (⚠️ Not Recommended)

3. **best_model_finetuned.pth**
   - PyTorch pickle format (legacy)
   - ⚠️ Carries pickle security warnings
   - **Use the safetensors version instead**
4. **best_model_simple.pth**
   - PyTorch pickle format (legacy)
   - ⚠️ Carries pickle security warnings
   - **Use the safetensors version instead**

### Other Formats

5. **Marine 1.mlmodel**
   - CoreML format for iOS/macOS deployment
   - Optimized for Apple devices
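As referenced in the Training Configuration section, here is a minimal sketch of the layer-freezing and head-replacement scheme described in this card. It is a reconstruction, not the repository's training script; in particular, collapsing the pretrained stem to one input channel (to match the single-channel spectrograms used elsewhere in this card) and freezing `bn1` alongside `conv1` are assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from ImageNet-pretrained ResNet18.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Assumption: average the pretrained 3-channel stem filters down to one
# channel so the network accepts single-channel mel spectrograms.
stem = model.conv1.weight.data.mean(dim=1, keepdim=True)
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.conv1.weight.data.copy_(stem)

# Freeze the early layers, per the architecture diagram above
# (freezing bn1 with conv1 is an assumption).
for module in (model.conv1, model.bn1, model.layer1):
    for p in module.parameters():
        p.requires_grad = False

# New 4-class head, trained from scratch.
model.fc = nn.Linear(model.fc.in_features, 4)

# Optimizer and loss per the Training Configuration section.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
criterion = nn.CrossEntropyLoss()
```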
## Additional Resources

- **GitHub Repository**: [underwater-audio-classifier](https://github.com/shiv207/underwater-audio-classifier)
- **Demo Application**: Streamlit web interface included
- **Documentation**: comprehensive README and code comments
- **JSON Export**: compatible with the Grand Challenge UDA format

## License

MIT License - free for commercial and research use.

## Acknowledgments

- Built with PyTorch and Streamlit
- ResNet18 architecture from torchvision
- Audio processing with librosa
- Inspired by marine conservation efforts

## Contact

For questions, issues, or collaboration opportunities:

- GitHub: [@shiv207](https://github.com/shiv207)
- Repository Issues: [Report a bug](https://github.com/shiv207/underwater-audio-classifier/issues)

---

**Model Card Authors**: Shivam Sharma
**Last Updated**: October 2024