shiv207 committed on
Commit 0195699 · 1 Parent(s): a5f411e

Add safetensors models (secure format) and update documentation

- Add best_model_finetuned.safetensors (98.33% accuracy)
- Add best_model_simple.safetensors (93% accuracy)
- Update inference.py to support both .pth and .safetensors
- Update README with security information
- Add safetensors to requirements.txt
- Safetensors format avoids pickle vulnerabilities
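
The last point can be illustrated without any model code. The following is a minimal sketch, assuming the documented safetensors layout: an 8-byte little-endian header length, then a JSON header with an optional `__metadata__` string map, then raw tensor bytes. The helper reads only that header, so nothing is unpickled or executed. The names `read_safetensors_metadata`, `write_demo_file`, and `demo.safetensors` are hypothetical, and storing `class_to_id` as a stringified dict is an assumption based on how `inference.py` in this commit parses it.

```python
import ast
import json
import struct
import tempfile
from pathlib import Path


def read_safetensors_metadata(path):
    """Return the __metadata__ map from a safetensors header (hypothetical helper)."""
    with open(path, "rb") as f:
        # First 8 bytes: header size as an unsigned little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        # The header is plain JSON, so parsing it cannot run arbitrary code.
        header = json.loads(f.read(header_len).decode("utf-8"))
    return header.get("__metadata__", {})


def write_demo_file(path, metadata):
    """Write a tiny file in the safetensors layout holding one 2-element float32 tensor."""
    payload = struct.pack("<2f", 1.0, 2.0)
    header = {
        "__metadata__": metadata,
        "w": {"dtype": "F32", "shape": [2], "data_offsets": [0, len(payload)]},
    }
    header_bytes = json.dumps(header).encode("utf-8")
    Path(path).write_bytes(struct.pack("<Q", len(header_bytes)) + header_bytes + payload)


# Assumption: the class mapping is stored as a stringified dict, as inference.py expects.
demo_path = Path(tempfile.gettempdir()) / "demo.safetensors"
write_demo_file(demo_path, {"class_to_id": str({"vessel": 0, "marine_animal": 1})})

metadata = read_safetensors_metadata(demo_path)
class_to_id = ast.literal_eval(metadata.get("class_to_id", "{}"))
print(class_to_id)
```

By contrast, loading a `.pth` checkpoint with `weights_only=False` goes through pickle, which can invoke arbitrary callables during deserialization.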

.gitattributes CHANGED
@@ -1,5 +1,6 @@
  *.pth filter=lfs diff=lfs merge=lfs -text
  *.pt filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
  *.mlmodel filter=lfs diff=lfs merge=lfs -text
  *.bin filter=lfs diff=lfs merge=lfs -text
  *.h5 filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -45,6 +45,17 @@ This model is designed for marine biologists, oceanographers, researchers, and c
  - **Framework**: PyTorch 2.0+
  - **Parameters**: ~11M parameters
  - **Training Time**: ~10 minutes (4 epochs)
+ - **Format**: Available in both safetensors (recommended) and PyTorch formats
+
+ ### 🔒 Security Note
+
+ This model is available in **safetensors** format, which is the recommended secure format that avoids pickle vulnerabilities. The safetensors format provides:
+ - ✅ No arbitrary code execution risks
+ - ✅ Fast loading times
+ - ✅ Memory-efficient
+ - ✅ Cross-platform compatibility
+
+ We recommend using the `.safetensors` files for production use.
 
  ## Categories
 
@@ -86,19 +97,20 @@ The model classifies underwater sounds into four distinct categories:
  ### Installation
 
  ```bash
- pip install torch torchaudio librosa numpy
+ pip install torch torchaudio librosa numpy safetensors
  ```
 
- ### Quick Start
+ ### Quick Start (Recommended - Safetensors)
 
  ```python
  import torch
  import librosa
  import numpy as np
+ from safetensors.torch import load_file
 
- # Load model
+ # Load model (secure format)
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
- checkpoint = torch.load("best_model_finetuned.pth", map_location=device)
+ state_dict = load_file("best_model_finetuned.safetensors", device=str(device))
 
  # Load and process audio
  audio_path = "underwater_sound.wav"
@@ -127,6 +139,27 @@ class_names = ["vessel", "marine_animal", "natural_sound", "other_anthropogenic"
  print(f"Prediction: {class_names[predicted_class]} ({confidence*100:.2f}%)")
  ```
 
+ ### Using the Inference Class (Easiest)
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ from inference import Marine1Classifier
+
+ # Download model (safetensors format - secure!)
+ model_path = hf_hub_download(
+     repo_id="shiv207/Marine1",
+     filename="best_model_finetuned.safetensors"
+ )
+
+ # Initialize classifier
+ classifier = Marine1Classifier(model_path)
+
+ # Make prediction
+ result = classifier.predict("underwater_sound.wav")
+ print(f"Prediction: {result['predicted_class']}")
+ print(f"Confidence: {result['confidence']*100:.2f}%")
+ ```
+
  ### Using the Complete Pipeline
 
  For a full-featured implementation with preprocessing and JSON output:
@@ -139,8 +172,8 @@ cd underwater-audio-classifier
  # Install dependencies
  pip install -r requirements.txt
 
- # Run prediction
- python predict_minimal.py --audio your_audio.wav --model models/best_model_finetuned.pth
+ # Run prediction (supports both .pth and .safetensors)
+ python predict_minimal.py --audio your_audio.wav --model models/best_model_finetuned.safetensors
 
  # Generate UDA-compliant JSON
  python generate_json.py --audio your_audio.wav --output result.json
@@ -248,19 +281,33 @@ If you use this model in your research, please cite:
 
  ## Model Variants
 
- This repository includes three model variants:
+ This repository includes multiple model formats:
+
+ ### Safetensors Format (🔒 Recommended - Secure)
 
- 1. **best_model_finetuned.pth** (Recommended)
+ 1. **best_model_finetuned.safetensors**
  - Fine-tuned ResNet18
  - 98.33% accuracy
+ - Secure format (no pickle vulnerabilities)
  - Best overall performance
 
- 2. **best_model_simple.pth**
+ 2. **best_model_simple.safetensors**
  - Custom CNN trained from scratch
  - 93% accuracy
  - Lighter weight alternative
+ - Secure format
+
+ ### Legacy Formats
+
+ 3. **best_model_finetuned.pth**
+ - PyTorch pickle format (legacy)
+ - Use safetensors version instead
+
+ 4. **best_model_simple.pth**
+ - PyTorch pickle format (legacy)
+ - Use safetensors version instead
 
- 3. **Marine 1.mlmodel**
+ 5. **Marine 1.mlmodel**
  - CoreML format for iOS/macOS deployment
  - Optimized for Apple devices
 
best_model_finetuned.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:348e72f30c774807db9ed46fdab3508448ab9440009d1033863aa81eda533218
+ size 45262356
best_model_simple.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:43beb5722867e95ef4b1c6361d209d2d688e96e4a1e388c6e721180ee5ae1d3d
+ size 2479820
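
The two pointer files above record only a SHA-256 object ID and a byte size; the weights themselves live in LFS storage. The following is a minimal sketch, assuming a downloaded copy of the real file is available locally, of how such a pointer could be parsed and checked against that file. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of this repository.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer file into a dict (hypothetical helper)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and hash recorded in the pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Hash in chunks so a multi-megabyte weights file is never fully held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer text copied from best_model_simple.safetensors in this commit.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:43beb5722867e95ef4b1c6361d209d2d688e96e4a1e388c6e721180ee5ae1d3d
size 2479820
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("best_model_simple.safetensors", pointer)` on a local download would then confirm the file is the one this commit references.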
inference.py CHANGED
@@ -1,6 +1,7 @@
  """
  Marine1 Underwater Acoustic Classifier - Inference Script
  Simple example for using the model with Hugging Face
+ Supports both .pth (pickle) and .safetensors formats
  """
 
  import torch
@@ -10,6 +11,13 @@ from typing import Dict, Tuple
  import warnings
  warnings.filterwarnings('ignore')
 
+ try:
+     from safetensors.torch import load_file
+     SAFETENSORS_AVAILABLE = True
+ except ImportError:
+     SAFETENSORS_AVAILABLE = False
+     print("Warning: safetensors not installed. Install with: pip install safetensors")
+
 
  class Marine1Classifier:
      """Underwater acoustic classifier using Marine1 model"""
@@ -19,7 +27,7 @@ class Marine1Classifier:
          Initialize the classifier
 
          Args:
-             model_path: Path to the .pth model file
+             model_path: Path to the model file (.pth or .safetensors)
              device: Device to run on ('cuda', 'cpu', or 'mps'). Auto-detected if None.
          """
          if device is None:
@@ -33,25 +41,116 @@ class Marine1Classifier:
          self.device = torch.device(device)
          print(f"Using device: {self.device}")
 
-         # Load checkpoint
-         checkpoint = torch.load(model_path, map_location=self.device, weights_only=False)
+         # Determine file format
+         is_safetensors = model_path.endswith('.safetensors')
+
+         if is_safetensors:
+             if not SAFETENSORS_AVAILABLE:
+                 raise ImportError("safetensors not installed. Install with: pip install safetensors")
+
+             print(f"Loading safetensors model (secure format)...")
+             # Load safetensors
+             state_dict = load_file(model_path, device=str(self.device))
+
+             # Parse metadata
+             from safetensors import safe_open
+             with safe_open(model_path, framework="pt", device=str(self.device)) as f:
+                 metadata = f.metadata()
+
+             # Get class mapping from metadata
+             import ast
+             self.class_to_id = ast.literal_eval(metadata.get('class_to_id', "{}"))
+             if not self.class_to_id:
+                 # Default mapping
+                 self.class_to_id = {
+                     'vessel': 0, 'marine_animal': 1,
+                     'natural_sound': 2, 'other_anthropogenic': 3
+                 }
+         else:
+             print(f"Loading PyTorch model (.pth format)...")
+             # Load checkpoint
+             checkpoint = torch.load(model_path, map_location=self.device, weights_only=False)
+
+             # Get class mapping
+             self.class_to_id = checkpoint['class_to_id']
+             state_dict = checkpoint['model_state_dict']
 
-         # Get class mapping
-         self.class_to_id = checkpoint['class_to_id']
          self.id_to_class = {v: k for k, v in self.class_to_id.items()}
          self.class_names = [self.id_to_class[i] for i in range(len(self.id_to_class))]
 
-         # Load model
-         from torchvision import models
-         self.model = models.resnet18(weights=None)
-         self.model.conv1 = torch.nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
-         self.model.fc = torch.nn.Linear(self.model.fc.in_features, len(self.class_names))
+         # Load model architecture (custom fine-tuned ResNet18)
+         self.model = self._create_model_architecture(len(self.class_names))
 
-         self.model.load_state_dict(checkpoint['model_state_dict'])
+         # Load weights
+         self.model.load_state_dict(state_dict)
          self.model.to(self.device)
          self.model.eval()
 
-         print(f"Model loaded successfully with {len(self.class_names)} classes")
+         format_type = "safetensors (secure)" if is_safetensors else "PyTorch (.pth)"
+         print(f"✅ Model loaded successfully ({format_type})")
+         print(f" Classes: {len(self.class_names)}")
+
+     def _create_model_architecture(self, num_classes: int):
+         """Create the model architecture matching the trained model"""
+         import torch.nn as nn
+         from torchvision import models
+
+         class LightweightFineTuned(nn.Module):
+             def __init__(self, num_classes=4):
+                 super(LightweightFineTuned, self).__init__()
+
+                 resnet = models.resnet18(weights=None)
+
+                 # Adapt first layer for grayscale spectrograms
+                 self.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
+                 self.bn1 = resnet.bn1
+                 self.relu = resnet.relu
+                 self.maxpool = resnet.maxpool
+
+                 self.layer1 = resnet.layer1
+                 self.layer2 = resnet.layer2
+                 self.layer3 = resnet.layer3
+                 self.layer4 = resnet.layer4
+                 self.avgpool = resnet.avgpool
+
+                 self.classifier = nn.Sequential(
+                     nn.Dropout(0.5),
+                     nn.Linear(512, 256),
+                     nn.ReLU(),
+                     nn.Dropout(0.25),
+                     nn.Linear(256, num_classes)
+                 )
+
+                 self.confidence_head = nn.Sequential(
+                     nn.Linear(512, 1),
+                     nn.Sigmoid()
+                 )
+
+             def forward(self, x, return_confidence=False):
+                 if len(x.shape) == 3:
+                     x = x.unsqueeze(1)
+
+                 x = self.conv1(x)
+                 x = self.bn1(x)
+                 x = self.relu(x)
+                 x = self.maxpool(x)
+
+                 x = self.layer1(x)
+                 x = self.layer2(x)
+                 x = self.layer3(x)
+                 x = self.layer4(x)
+
+                 x = self.avgpool(x)
+                 features = torch.flatten(x, 1)
+                 logits = self.classifier(features)
+
+                 if return_confidence:
+                     confidence = self.confidence_head(features)
+                     return logits, confidence
+
+                 return logits
+
+         return LightweightFineTuned(num_classes=num_classes)
 
      def process_audio(self, audio_path: str, sr: int = 16000, duration: float = 10.0) -> np.ndarray:
          """
requirements.txt CHANGED
@@ -5,3 +5,4 @@ librosa>=0.10.0
  numpy>=1.24.0
  scipy>=1.10.0
  soundfile>=0.12.0
+ safetensors>=0.4.0