---
language: en
license: mit
tags:
- audio-classification
- engine-diagnostics
- knock-detection
- resnet
datasets:
- custom
metrics:
- accuracy
- f1
---

# Engine Knock Detection - ResNet-18

This model detects engine knock in audio recordings using a ResNet-18 fine-tuned on mel-spectrograms.

## Model Description

- **Architecture**: ResNet-18 initialized from ImageNet-pretrained weights, with the final fully connected layer replaced by a 2-class head and fine-tuned on engine audio
- **Input**: Mel-spectrograms in dB (128 mel bins from 16 kHz audio), resized to 224x224 and repeated across 3 channels
- **Output**: Binary classification: class 0 = clean, class 1 = knocking
- **Framework**: PyTorch

## Performance Metrics

Evaluated on the held-out test split (15% of the data):

| Metric    | Score  |
|-----------|--------|
| Accuracy  | 0.8778 |
| Precision | 0.9518 |
| Recall    | 0.8144 |
| F1-Score  | 0.8778 |
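
As a quick consistency check, the F1-score in the table follows from the reported precision and recall, since F1 is their harmonic mean. The inputs below are the rounded values from the table, so the result matches only to the fourth decimal place.

```python
# Reported test-set precision and recall (rounded values from the table)
precision = 0.9518
recall = 0.8144

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.4f}")  # 0.8778
```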

## Usage

```python
import torch
import torchaudio
from torchvision import models
from huggingface_hub import hf_hub_download

# Load model: ResNet-18 with a 2-class head (0 = clean, 1 = knocking)
model = models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 2)
model_path = hf_hub_download(repo_id="cxlrd/engine-knock-resnet18", filename="model.pth")
model.load_state_dict(torch.load(model_path, map_location='cpu', weights_only=True))
model.eval()

# Prepare audio: mix down to mono and resample to the 16 kHz rate used by the mel transform
waveform, sample_rate = torchaudio.load('audio.wav')
waveform = waveform.mean(dim=0, keepdim=True)
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)

mel_spec = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=1024, hop_length=512, n_mels=128
)(waveform)  # (1, 128, frames)
mel_spec_db = torchaudio.transforms.AmplitudeToDB()(mel_spec)

# Resize to 224x224 and repeat the single channel to 3 channels: (1, 3, 224, 224)
mel_spec_db = torch.nn.functional.interpolate(
    mel_spec_db.unsqueeze(0), size=(224, 224), mode='bilinear', align_corners=False
).repeat(1, 3, 1, 1)

# Predict
with torch.no_grad():
    output = model(mel_spec_db)
prediction = torch.argmax(output, dim=1).item()
print('Clean' if prediction == 0 else 'Knocking')
```

## Training Details

- **Dataset**: Custom engine sound recordings (1,199 samples)
- **Data Split**: 70% train, 15% validation, 15% test
- **Optimizer**: Adam (lr=1e-4, weight_decay=1e-4)
- **Batch Size**: 16
- **Early Stopping**: Patience of 5 epochs
- **Preprocessing**: Mel-spectrogram normalization
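
The optimizer and early-stopping settings above translate roughly into the following PyTorch sketch. It is not the author's training script: `make_optimizer` and `EarlyStopping` are hypothetical names, and the card does not say which quantity early stopping monitored, so validation loss (lower is better) is assumed here. The batch size of 16 would be passed as `batch_size=16` to a `torch.utils.data.DataLoader`.

```python
import torch

# Hyperparameters as listed in this card
LEARNING_RATE = 1e-4
WEIGHT_DECAY = 1e-4
PATIENCE = 5


def make_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    """Hypothetical helper: Adam with the learning rate and weight decay listed above."""
    return torch.optim.Adam(model.parameters(), lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY)


class EarlyStopping:
    """Hypothetical helper: signal a stop once validation loss fails to improve for `patience` epochs.

    Assumption: validation loss is the monitored quantity; the card does not specify it.
    """

    def __init__(self, patience: int = PATIENCE, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss  # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1  # no improvement this epoch
        return self.bad_epochs >= self.patience


# Example: the best loss (0.50) is reached at epoch 3, then 5 epochs pass without improvement
stopper = EarlyStopping()
for epoch, loss in enumerate([0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.50, 0.54], start=1):
    if stopper.step(loss):
        print(f"stopping after epoch {epoch}")  # stopping after epoch 8
        break
```

A real loop would typically also save the weights whenever `stopper.best` improves, so the checkpoint from the best epoch is the one kept.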

## Citation

If you use this model, please cite:

```bibtex
@misc{engine-knock-resnet18,
  author = {cxlrd},
  title = {Engine Knock Detection with ResNet-18},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/cxlrd/engine-knock-resnet18}}
}
```