---
language: en
license: mit
tags:
- audio-classification
- underwater-acoustics
- marine-biology
- pytorch
- mel-spectrogram
- resnet18
datasets:
- custom
metrics:
- accuracy
- balanced_accuracy
model-index:
- name: Marine1-Underwater-Acoustic-Classifier
  results:
  - task:
      type: audio-classification
      name: Underwater Sound Classification
    metrics:
    - type: accuracy
      value: 98.33
      name: Accuracy
    - type: balanced_accuracy
      value: 91.67
      name: Balanced Accuracy
---

# Marine1: Underwater Acoustic Classifier 🌊🐋

> **⚠️ IMPORTANT: Use Safetensors Format**
> This repository contains both `.pth` (pickle) and `.safetensors` formats.
> **Please use the `.safetensors` files** to avoid pickle security vulnerabilities.
> See the [How to Use](#how-to-use) section below for examples.

## Model Description

Marine1 is a deep learning model for classifying underwater acoustic events. Built on a fine-tuned ResNet18, it reaches 98.33% accuracy (91.67% balanced accuracy) across four categories of marine soundscapes. It is designed for marine biologists, oceanographers, researchers, and conservationists working with underwater audio data.

## Model Details

- **Model Type**: Audio classification (CNN-based)
- **Architecture**: Fine-tuned ResNet18 (transfer learning from ImageNet)
- **Input**: Audio files (WAV, MP3, FLAC); clips are truncated/padded to 10 seconds
- **Output**: 4-class classification with confidence scores
- **Framework**: PyTorch 2.0+
- **Parameters**: ~11M
- **Training Time**: ~10 minutes (4 epochs)
- **Formats**: safetensors (recommended) and PyTorch `.pth`

### 🔒 Security Note

This model is available in **safetensors** format, the **recommended secure format** that avoids pickle vulnerabilities. A short load-path comparison follows the Categories section below.

**Why safetensors?**

- ✅ **No arbitrary code execution** - safe to load from any source
- ✅ **Fast loading times**
- ✅ **Memory-efficient**
- ✅ **Cross-platform compatibility**

**⚠️ The `.pth` files are provided for backward compatibility only. Always prefer `.safetensors` for production use.**

Learn more: [Hugging Face Safetensors Documentation](https://huggingface.co/docs/safetensors/index)

## Categories

The model classifies underwater sounds into four distinct categories:

1. **🐋 Marine Animals** (`marine_animal`)
   - Whales, dolphins, orcas, and other marine mammals
   - Fish sounds and biological vocalizations
2. **🚢 Vessels** (`vessel`)
   - Ships, boats, submarines
   - Maritime traffic and propeller sounds
3. **🌊 Natural Sounds** (`natural_sound`)
   - Ocean waves and water movement
   - Bubbles, rain, and other environmental sounds
4. **🔧 Other Anthropogenic** (`other_anthropogenic`)
   - Human-made, non-vessel sounds
   - Underwater construction, sonar, etc.
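To make the Security Note above concrete, the sketch below contrasts the two load paths using the filenames from this repository. `safetensors.torch.load_file` reads plain tensors and cannot execute code; `torch.load` unpickles the file, so if you must read a `.pth`, pass `weights_only=True` (available in recent PyTorch releases) to restrict what may be deserialized:

```python
import torch
from safetensors.torch import load_file

# Recommended: safetensors stores raw tensors; loading cannot run code.
state_dict = load_file("best_model_finetuned.safetensors")

# Legacy .pth (avoid where possible). weights_only=True limits unpickling
# to tensors and basic containers instead of arbitrary Python objects.
legacy_state_dict = torch.load(
    "best_model_finetuned.pth", map_location="cpu", weights_only=True
)
```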
## Performance

| Metric | Value |
|--------|-------|
| **Accuracy** | 98.33% |
| **Balanced Accuracy** | 91.67% |
| **Training Epochs** | 4 |
| **Training Time** | ~10 minutes |

### Per-Class Performance

- Marine Animals: excellent (high precision/recall)
- Vessels: excellent (high precision/recall)
- Natural Sounds: good (limited training data)
- Other Anthropogenic: good (limited training data)

## How to Use

### Installation

```bash
pip install torch torchvision torchaudio librosa numpy safetensors
```

### Quick Start (Recommended - Safetensors)

```python
import torch
import torch.nn as nn
import librosa
import numpy as np
from safetensors.torch import load_file
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Build the architecture before loading the weights. This construction
# (single-channel conv1, 4-class head) is an assumption inferred from this
# card; the repository's inference.py is the authoritative reference.
model = models.resnet18()
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(model.fc.in_features, 4)

# Load model weights (secure format)
state_dict = load_file("best_model_finetuned.safetensors", device=str(device))
model.load_state_dict(state_dict)
model.to(device)

# Load and process audio
audio_path = "underwater_sound.wav"
y, sr = librosa.load(audio_path, sr=16000, duration=10.0)

# Create mel spectrogram
mel_spec = librosa.feature.melspectrogram(
    y=y, sr=sr, n_mels=128, n_fft=2048, hop_length=512, fmax=8000
)
log_mel_spec = librosa.power_to_db(mel_spec, ref=np.max)

# Prepare input: (batch, channel, mels, frames)
input_tensor = torch.FloatTensor(log_mel_spec).unsqueeze(0).unsqueeze(0).to(device)

# Predict
model.eval()
with torch.no_grad():
    outputs = model(input_tensor)
    probabilities = torch.nn.functional.softmax(outputs, dim=1)[0]
    predicted_class = probabilities.argmax().item()
    confidence = probabilities[predicted_class].item()

# Map to class names
class_names = ["vessel", "marine_animal", "natural_sound", "other_anthropogenic"]
print(f"Prediction: {class_names[predicted_class]} ({confidence*100:.2f}%)")
```

### Using the Inference Class (Easiest)

```python
from huggingface_hub import hf_hub_download
from inference import Marine1Classifier

# Download model (safetensors format - secure!)
model_path = hf_hub_download(
    repo_id="shiv207/Marine1",
    filename="best_model_finetuned.safetensors"
)

# Initialize classifier
classifier = Marine1Classifier(model_path)

# Make prediction
result = classifier.predict("underwater_sound.wav")
print(f"Prediction: {result['predicted_class']}")
print(f"Confidence: {result['confidence']*100:.2f}%")
```

### Using the Complete Pipeline

For a full-featured implementation with preprocessing and JSON output:

```bash
# Clone the repository
git clone https://github.com/shiv207/underwater-audio-classifier.git
cd underwater-audio-classifier

# Install dependencies
pip install -r requirements.txt

# Run prediction (supports both .pth and .safetensors)
python predict_minimal.py --audio your_audio.wav --model models/best_model_finetuned.safetensors

# Generate UDA-compliant JSON
python generate_json.py --audio your_audio.wav --output result.json
```

### Streamlit Web Interface

```bash
streamlit run app.py
```

Features:

- Drag-and-drop upload of multiple audio files
- Batch processing with progress tracking
- Visual spectrograms and probability charts
- Export results as JSON (individual or batch)
- Event detection mode with temporal localization

## Training Data

The model was trained on a diverse dataset of underwater acoustic recordings:

- **Marine Animals**: whale songs, dolphin clicks, orca vocalizations
- **Vessels**: various ship types (cargo, tanker, passenger, tug)
- **Natural Sounds**: ocean waves, water movement, environmental sounds
- **Other Anthropogenic**: human-made, non-vessel sounds

**Note**: The natural-sound and other-anthropogenic categories have only 13 training samples each, which may affect performance on edge cases.
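Given the imbalance noted above, anyone fine-tuning Marine1 on similarly skewed data may want to weight the loss by inverse class frequency. The card does not state whether the original training compensated for imbalance, so the sketch below is a generic mitigation, and the sample counts for the two larger classes are hypothetical placeholders:

```python
import torch
import torch.nn as nn

# Per-class sample counts in class-index order:
# vessel, marine_animal, natural_sound, other_anthropogenic.
# Only the 13s are stated in this card; the first two are hypothetical.
counts = torch.tensor([100.0, 100.0, 13.0, 13.0])

# Inverse-frequency weights, normalized so they average to 1.
weights = counts.sum() / (len(counts) * counts)
criterion = nn.CrossEntropyLoss(weight=weights)
```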
## Technical Details

### Audio Processing Pipeline

1. **Resampling**: audio resampled to 16 kHz
2. **Duration**: truncated/padded to 10 seconds
3. **Spectrogram**: 128-band mel spectrogram (n_fft=2048, hop_length=512)
4. **Normalization**: log-scale conversion with dB normalization
5. **Augmentation** (training only): speed, pitch, noise injection

### Model Architecture

```
ResNet18 (pre-trained on ImageNet)
├── Conv1  (frozen)
├── Layer1 (frozen)
├── Layer2 (fine-tuned)
├── Layer3 (fine-tuned)
├── Layer4 (fine-tuned)
└── FC     (4 classes, trained from scratch)
```

### Training Configuration

- **Optimizer**: Adam (lr=0.0001)
- **Loss**: CrossEntropyLoss
- **Batch Size**: 8
- **Epochs**: 4
- **Data Split**: 80% train, 20% validation
- **Device**: CPU/MPS (Apple Silicon compatible)

A sketch of this layer-freezing setup appears after the Model Variants section below.

## Limitations

1. **Limited data**: the natural-sound and other-anthropogenic categories have few training samples
2. **Duration**: optimized for 10-second audio clips
3. **Sample rate**: best performance at 16 kHz
4. **Domain**: trained on specific underwater environments; may need fine-tuning for other acoustic conditions
5. **Background noise**: performance may degrade at high noise levels

## Intended Use

### Primary Use Cases

- Marine biology research and species monitoring
- Vessel traffic analysis and maritime surveillance
- Environmental impact assessment
- Underwater acoustic event detection
- Marine conservation and ecosystem monitoring

### Out-of-Scope Use

- Real-time streaming (the model requires offline preprocessing)
- Non-underwater audio classification
- Medical or security applications
- High-stakes decision making without human oversight

## Ethical Considerations

- Intended for research and conservation purposes
- Should not be the sole basis for regulatory or enforcement decisions
- May carry biases from the training data distribution
- Performance varies across marine environments
- Users should validate results in their own context

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{marine1-underwater-classifier,
  author = {Shivam Sharma},
  title = {Marine1: Underwater Acoustic Classifier},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/shiv207/Marine1}},
}
```

## Model Variants

This repository includes multiple model formats:

### Safetensors Format (🔒 Recommended - Secure)

1. **best_model_finetuned.safetensors** ⭐
   - Fine-tuned ResNet18
   - 98.33% accuracy
   - Secure format (no pickle vulnerabilities)
   - Best overall performance
2. **best_model_simple.safetensors**
   - Custom CNN trained from scratch
   - 93% accuracy
   - Lighter-weight alternative
   - Secure format

### Legacy Formats (⚠️ Not Recommended)

3. **best_model_finetuned.pth**
   - PyTorch pickle format (legacy)
   - ⚠️ Carries pickle security warnings
   - **Use the safetensors version instead**
4. **best_model_simple.pth**
   - PyTorch pickle format (legacy)
   - ⚠️ Carries pickle security warnings
   - **Use the safetensors version instead**

### Other Formats

5. **Marine 1.mlmodel**
   - CoreML format for iOS/macOS deployment
   - Optimized for Apple devices
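As referenced in the Training Configuration section, here is a minimal sketch of the layer-freezing and head-replacement scheme described in this card. It is a reconstruction, not the repository's training script; in particular, collapsing the pretrained stem to one input channel (to match the single-channel spectrograms used elsewhere in this card) and freezing `bn1` alongside `conv1` are assumptions:

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from ImageNet-pretrained ResNet18.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Assumption: average the pretrained 3-channel stem filters down to one
# channel so the network accepts single-channel mel spectrograms.
stem = model.conv1.weight.data.mean(dim=1, keepdim=True)
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.conv1.weight.data.copy_(stem)

# Freeze the early layers, per the architecture diagram above
# (freezing bn1 with conv1 is an assumption).
for module in (model.conv1, model.bn1, model.layer1):
    for p in module.parameters():
        p.requires_grad = False

# New 4-class head, trained from scratch.
model.fc = nn.Linear(model.fc.in_features, 4)

# Optimizer and loss per the Training Configuration section.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
criterion = nn.CrossEntropyLoss()
```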
## Additional Resources

- **GitHub Repository**: [underwater-audio-classifier](https://github.com/shiv207/underwater-audio-classifier)
- **Demo Application**: Streamlit web interface included
- **Documentation**: comprehensive README and code comments
- **JSON Export**: compatible with the Grand Challenge UDA format

## License

MIT License - free for commercial and research use.

## Acknowledgments

- Built with PyTorch and Streamlit
- ResNet18 architecture from torchvision
- Audio processing with librosa
- Inspired by marine conservation efforts

## Contact

For questions, issues, or collaboration opportunities:

- GitHub: [@shiv207](https://github.com/shiv207)
- Repository Issues: [Report a bug](https://github.com/shiv207/underwater-audio-classifier/issues)

---

**Model Card Authors**: Shivam Sharma
**Last Updated**: October 2024