cxlrd committed
Commit c57bdfc · verified · 1 Parent(s): 0b1bf3e

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +91 -0
README.md ADDED
@@ -0,0 +1,91 @@
+ ---
+ language: en
+ license: mit
+ tags:
+ - audio-classification
+ - engine-diagnostics
+ - knock-detection
+ - resnet
+ datasets:
+ - custom
+ metrics:
+ - accuracy
+ - f1
+ ---
+
+ # Engine Knock Detection - ResNet-18
+
+ This model detects engine knock in audio recordings. It is a ResNet-18, pretrained on ImageNet and fine-tuned on mel-spectrograms of engine sound.
+
+ ## Model Description
+
+ - **Architecture**: ResNet-18 (pretrained on ImageNet, fine-tuned on mel-spectrogram images)
+ - **Input**: Mel-spectrograms resized to 224x224, with the single channel repeated to 3 channels
+ - **Output**: Binary classification (class 0 = clean, class 1 = knocking)
+ - **Framework**: PyTorch
+
+ ## Performance Metrics
+
+ Evaluated on the held-out test set (15% of the dataset):
+
+ | Metric    | Score  |
+ |-----------|--------|
+ | Accuracy  | 0.8222 |
+ | Precision | 0.9710 |
+ | Recall    | 0.6907 |
+ | F1-Score  | 0.8072 |
+
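As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the reported precision and recall. The helper name `f1_from_precision_recall` is illustrative and not part of this repository.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the metrics table.
f1 = f1_from_precision_recall(0.9710, 0.6907)
print(f"{f1:.4f}")
```

Assuming "knocking" is the positive class, the gap between precision and recall suggests the model rarely flags a clean engine as knocking but misses roughly 31% of knocking recordings.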
+ ## Usage
+
+ ```python
+ import torch
+ import torchaudio
+ from torchvision import models
+ from huggingface_hub import hf_hub_download
+
+ # Build the architecture without ImageNet weights, then load the fine-tuned checkpoint
+ model = models.resnet18(weights=None)  # `pretrained=False` is deprecated in torchvision >= 0.13
+ model.fc = torch.nn.Linear(model.fc.in_features, 2)
+ model_path = hf_hub_download(repo_id="cxlrd/engine-knock-resnet18", filename="model.pth")
+ model.load_state_dict(torch.load(model_path, map_location='cpu'))
+ model.eval()
+
+ # Prepare audio
+ waveform, sample_rate = torchaudio.load('audio.wav')
+ # Down-mix to mono so the 3-channel repeat below is valid (assumes mono training audio)
+ waveform = waveform.mean(dim=0, keepdim=True)
+ # Resample so the audio matches the 16 kHz rate the mel transform expects
+ if sample_rate != 16000:
+     waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)
+ mel_spec = torchaudio.transforms.MelSpectrogram(
+     sample_rate=16000, n_fft=1024, hop_length=512, n_mels=128
+ )(waveform)
+ mel_spec_db = torchaudio.transforms.AmplitudeToDB()(mel_spec)
+ # (1, n_mels, time) -> (1, 1, 224, 224) -> (1, 3, 224, 224)
+ mel_spec_db = torch.nn.functional.interpolate(
+     mel_spec_db.unsqueeze(0), size=(224, 224), mode='bilinear'
+ ).repeat(1, 3, 1, 1)
+
+ # Predict (class 0 = clean, class 1 = knocking)
+ with torch.no_grad():
+     output = model(mel_spec_db)
+     prediction = torch.argmax(output, dim=1).item()
+     print('Clean' if prediction == 0 else 'Knocking')
+ ```
+
+ ## Training Details
+
+ - **Dataset**: Custom engine sound recordings (1199 samples)
+ - **Data Split**: 70% train, 15% validation, 15% test
+ - **Optimizer**: Adam (lr=1e-4, weight_decay=1e-4)
+ - **Batch Size**: 16
+ - **Early Stopping**: Patience of 5 epochs
+ - **Data Augmentation**: Mel-spectrogram normalization
+
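The split and early-stopping settings listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the training script. The rounding of the split sizes, the choice of validation loss as the monitored quantity, and the names `split_sizes` and `EarlyStopping` are all hypothetical.

```python
def split_sizes(n_samples, train_frac=0.70, val_frac=0.15):
    """Return (train, val, test) counts, with the test set taking the remainder.

    Assumption: train and validation counts are floored; the rounding
    actually used for this model is not documented.
    """
    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)
    return n_train, n_val, n_samples - n_train - n_val


class EarlyStopping:
    """Signal a stop once the loss has not improved for `patience` epochs in a row."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Split sizes for the 1199-sample dataset under the flooring assumption.
print(split_sizes(1199))
```

In a training loop, `step` would be called once per epoch with that epoch's validation loss, and the loop would break as soon as it returns `True`.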
+ ## Citation
+
+ If you use this model, please cite:
+
+ ```bibtex
+ @misc{engine-knock-resnet18,
+   author = {cxlrd},
+   title = {Engine Knock Detection with ResNet-18},
+   year = {2025},
+   publisher = {HuggingFace},
+   howpublished = {\url{https://huggingface.co/cxlrd/engine-knock-resnet18}}
+ }
+ ```