---
language: en
license: mit
tags:
- audio-classification
- engine-diagnostics
- knock-detection
- cnn
- 1d-cnn
datasets:
- custom
metrics:
- accuracy
- f1
---
# Engine Knock Detection - Custom 1D CNN
This model detects engine knock from raw audio waveforms using a custom 1D Convolutional Neural Network.
## Model Description
- **Architecture**: Custom 4-layer 1D CNN
- **Input**: Raw audio waveforms (80,000 samples at 16 kHz = 5 seconds)
- **Output**: Binary classification (clean vs knocking)
- **Framework**: PyTorch
- **Parameters**: ~5M trainable parameters
## Performance Metrics
Evaluated on the held-out test set (15% of the dataset):
| Metric | Score |
|-----------|--------|
| Accuracy | 0.7389 |
| Precision | 0.7660 |
| Recall | 0.7423 |
| F1-Score | 0.7539 |
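As a quick consistency check, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded values in the table; the helper name `f1_from_pr` is illustrative and not part of the repository.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall taken from the table above
f1 = f1_from_pr(0.7660, 0.7423)
print(f"F1 = {f1:.4f}")
```

The recomputed value agrees with the reported F1-score to within the rounding of the inputs.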
## Usage
```python
import torch
import torchaudio
from huggingface_hub import hf_hub_download

# Load the model architecture. Custom1DCNN is not part of PyTorch;
# define it yourself or copy it from the repository.
from model import Custom1DCNN

model = Custom1DCNN(num_classes=2)
model_path = hf_hub_download(repo_id="cxlrd/engine-knock-cnn1d", filename="model.pth")
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()

# Prepare audio: mix down to mono and resample to 16 kHz
waveform, sample_rate = torchaudio.load("audio.wav")  # shape: (channels, samples)
waveform = waveform.mean(dim=0, keepdim=True)
if sample_rate != 16000:
    waveform = torchaudio.transforms.Resample(sample_rate, 16000)(waveform)

# Pad or truncate to 80,000 samples (5 seconds)
if waveform.shape[1] > 80000:
    waveform = waveform[:, :80000]
else:
    waveform = torch.nn.functional.pad(waveform, (0, 80000 - waveform.shape[1]))

# Predict. Assumes forward() takes (batch, 1, samples); drop unsqueeze(0)
# if the repository's class adds that dimension itself.
with torch.no_grad():
    output = model(waveform.unsqueeze(0))
prediction = torch.argmax(output, dim=1).item()
print("Clean" if prediction == 0 else "Knocking")
```
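The final layer listed under Architecture Details is a plain `Linear(128→2)`, so the model output is presumably a pair of raw logits rather than probabilities. A minimal sketch of turning such logits into a label with a confidence value follows; `logits_to_label` and the stand-in tensor are illustrative assumptions, and the label order (0 = clean, 1 = knocking) is taken from the usage example.

```python
import torch

LABELS = ("Clean", "Knocking")  # index 0 = clean, index 1 = knocking


def logits_to_label(logits: torch.Tensor) -> tuple[str, float]:
    """Map a (1, 2) logit tensor to a label and its softmax probability."""
    probs = torch.softmax(logits, dim=1)
    idx = int(torch.argmax(probs, dim=1).item())
    return LABELS[idx], float(probs[0, idx])


# Stand-in logits; in practice this would be `output` from the model
example_logits = torch.tensor([[0.2, 1.5]])
label, confidence = logits_to_label(example_logits)
print(label, round(confidence, 3))
```

Given the reported accuracy of about 74%, a confidence value like this is best treated as a rough ranking signal rather than a calibrated probability.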
## Training Details
- **Dataset**: Custom engine sound recordings (1,199 audio clips)
- **Data Split**: 70% train, 15% validation, 15% test
- **Optimizer**: Adam (lr=1e-3, weight_decay=1e-4)
- **Batch Size**: 32
- **Early Stopping**: Patience of 5 epochs
- **No Feature Extraction**: Raw waveforms are fed to the network directly, after resampling to 16 kHz and padding or truncating to 80,000 samples
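The split and early-stopping settings above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's training script: the rounding used for the split, the choice of validation loss as the monitored quantity, and the names `split_sizes` and `EarlyStopping` are all hypothetical, and the loss values are made up.

```python
def split_sizes(n: int, train: float = 0.70, val: float = 0.15) -> tuple[int, int, int]:
    """Return (train, val, test) counts; the test set takes the remainder."""
    n_train = int(n * train)
    n_val = int(n * val)
    return n_train, n_val, n - n_train - n_val


class EarlyStopping:
    """Stop after `patience` consecutive epochs without a new best validation loss."""

    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


sizes = split_sizes(1199)
stopper = EarlyStopping(patience=5)
val_losses = [0.70, 0.62, 0.60, 0.61, 0.63, 0.60, 0.64, 0.65, 0.66]  # made-up values
stopped_at = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(sizes, stopped_at)
```

With this rounding the test split holds about 180 clips, so each test-set metric rests on a fairly small sample.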
## Architecture Details
```
Conv1D(1→64, k=80, s=4) → BatchNorm → ReLU → MaxPool(4)
Conv1D(64→128, k=3) → BatchNorm → ReLU → MaxPool(4)
Conv1D(128→256, k=3) → BatchNorm → ReLU → MaxPool(4)
Conv1D(256→512, k=3) → BatchNorm → ReLU → AdaptiveAvgPool
Dropout(0.5) → Linear(512→128) → ReLU → Dropout(0.3) → Linear(128→2)
```
```
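The layer listing above could be written as a PyTorch module roughly as follows. This is a sketch under assumptions, not the repository's `Custom1DCNN`: the class name `KnockCNN1DSketch` is hypothetical, the convolutions are assumed to use no padding, the `k=3` layers are assumed to use stride 1, and the adaptive pooling is assumed to reduce to a single time step. Attribute names may not match the keys in the released `model.pth`, so the checkpoint may not load into it.

```python
import torch
from torch import nn


class KnockCNN1DSketch(nn.Module):
    """Hypothetical reconstruction of the listed 4-layer 1D CNN."""

    def __init__(self, num_classes: int = 2):
        super().__init__()

        def block(c_in, c_out, k, stride=1, pool=None):
            # Conv -> BatchNorm -> ReLU -> pooling (MaxPool(4) unless overridden)
            return [
                nn.Conv1d(c_in, c_out, kernel_size=k, stride=stride),
                nn.BatchNorm1d(c_out),
                nn.ReLU(),
                pool if pool is not None else nn.MaxPool1d(4),
            ]

        self.features = nn.Sequential(
            *block(1, 64, 80, stride=4),
            *block(64, 128, 3),
            *block(128, 256, 3),
            *block(256, 512, 3, pool=nn.AdaptiveAvgPool1d(1)),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(512, 128), nn.ReLU(),
            nn.Dropout(0.3), nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, samples) -> features: (batch, 512, 1) -> (batch, 512)
        x = self.features(x).flatten(1)
        return self.classifier(x)


model = KnockCNN1DSketch().eval()
with torch.no_grad():
    logits = model(torch.zeros(2, 1, 80000))  # two silent 5-second clips
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(tuple(logits.shape), n_params)
```

Under these assumptions the sketch has roughly 0.59M trainable parameters, which is well below the ~5M quoted in the model description, so the released model may use different layer sizes than this listing suggests.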
## Advantages
- **Fast Inference**: No spectrogram conversion is needed before the forward pass
- **Lightweight**: Processes raw audio directly, with no separate feature-extraction stage
- **Real-time Capable**: Designed with edge deployment in mind
## Citation
If you use this model, please cite:
```bibtex
@misc{engine-knock-cnn1d,
author = {cxlrd},
title = {Engine Knock Detection with 1D CNN},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/cxlrd/engine-knock-cnn1d}}
}
```