Engine Knock Detection - Custom 1D CNN
This model detects engine knock from raw audio waveforms using a custom 1D Convolutional Neural Network.
Model Description
- Architecture: Custom 4-layer 1D CNN
- Input: Raw audio waveforms (80,000 samples at 16 kHz = 5 seconds)
- Output: Binary classification (clean vs knocking)
- Framework: PyTorch
- Parameters: ~5M trainable parameters
Performance Metrics
Evaluated on the held-out test set (15% of the dataset):
| Metric | Score |
|---|---|
| Accuracy | 0.7389 |
| Precision | 0.7660 |
| Recall | 0.7423 |
| F1-Score | 0.7539 |
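As a quick consistency check, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded values in the table; the helper name `f1_from_pr` is illustrative and not part of the repository.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall from the table above
f1 = f1_from_pr(0.7660, 0.7423)
print(f"{f1:.3f}")
```

The result agrees with the reported F1-Score to three decimal places; the fourth decimal can differ because the inputs are themselves rounded.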
Usage
import torch
import torchaudio
from huggingface_hub import hf_hub_download

# Load the model architecture (you need to define the Custom1DCNN class yourself;
# see the model architecture in the repository)
from model import Custom1DCNN

model = Custom1DCNN(num_classes=2)
model_path = hf_hub_download(repo_id="cxlrd/engine-knock-cnn1d", filename="model.pth")
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()

# Prepare audio: downmix to mono, then resample to 16 kHz if needed
waveform, sample_rate = torchaudio.load('audio.wav')  # shape: (channels, samples)
waveform = waveform.mean(dim=0, keepdim=True)         # shape: (1, samples)
if sample_rate != 16000:
    waveform = torchaudio.transforms.Resample(sample_rate, 16000)(waveform)

# Pad or truncate to 80000 samples
if waveform.shape[1] > 80000:
    waveform = waveform[:, :80000]
else:
    waveform = torch.nn.functional.pad(waveform, (0, 80000 - waveform.shape[1]))

# Predict. waveform has shape (1, 80000); if your Custom1DCNN expects
# (batch, channels, samples), pass waveform.unsqueeze(0) instead.
with torch.no_grad():
    output = model(waveform)
prediction = torch.argmax(output, dim=1).item()
print('Clean' if prediction == 0 else 'Knocking')
Training Details
- Dataset: Custom engine sound recordings (1199 samples)
- Training Split: 70% train, 15% validation, 15% test
- Optimizer: Adam (lr=1e-3, weight_decay=1e-4)
- Batch Size: 32
- Early Stopping: Patience of 5 epochs
- No Preprocessing: Direct raw waveform input
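The early-stopping rule listed above can be sketched as a small helper. This is a minimal illustration, assuming the monitored quantity is validation loss and that only a strict improvement resets the counter; the class name `EarlyStopping` and the loss values are hypothetical and not taken from the training code.

```python
class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: three improving epochs, then a plateau
losses = [0.70, 0.62, 0.58, 0.59, 0.60, 0.58, 0.61, 0.59, 0.57]
stopper = EarlyStopping(patience=5)
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at)
```

In a PyTorch training loop, a helper like this would sit alongside an optimizer created with `torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)`, with `step` called once per epoch after validation.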
Architecture Details
Conv1D(1→64, k=80, s=4) → BatchNorm → ReLU → MaxPool(4)
Conv1D(64→128, k=3) → BatchNorm → ReLU → MaxPool(4)
Conv1D(128→256, k=3) → BatchNorm → ReLU → MaxPool(4)
Conv1D(256→512, k=3) → BatchNorm → ReLU → AdaptiveAvgPool
Dropout(0.5) → Linear(512→128) → ReLU → Dropout(0.3) → Linear(128→2)
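To make the listing concrete, the sketch below traces the temporal length of an 80,000-sample input through each block and tallies the trainable parameters. It rests on several assumptions that the listing does not state: stride 1 and no padding wherever they are not given, bias terms on every conv and linear layer, a learnable scale and shift in each BatchNorm, MaxPool stride equal to its kernel size, and an AdaptiveAvgPool output size of 1. The helper `conv_out_len` is illustrative only.

```python
def conv_out_len(length: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a 1D convolution or pooling window (floor mode, no dilation)."""
    return (length + 2 * padding - kernel) // stride + 1


# (in_channels, out_channels, kernel, stride) for each conv layer in the listing
conv_layers = [(1, 64, 80, 4), (64, 128, 3, 1), (128, 256, 3, 1), (256, 512, 3, 1)]
linear_layers = [(512, 128), (128, 2)]

length = 80_000  # 5 seconds at 16 kHz
params = 0
lengths = []  # temporal length after each conv block
for i, (c_in, c_out, k, s) in enumerate(conv_layers):
    length = conv_out_len(length, k, s)
    params += c_in * c_out * k + c_out  # conv weights + bias
    params += 2 * c_out                 # BatchNorm scale + shift
    if i < len(conv_layers) - 1:
        length = conv_out_len(length, 4, 4)  # MaxPool(4)
    else:
        length = 1  # AdaptiveAvgPool, assumed output size 1
    lengths.append(length)

for n_in, n_out in linear_layers:
    params += n_in * n_out + n_out  # linear weights + bias

print(lengths)
print(params)
```

Under these assumptions the per-block lengths are 4995, 1248, 311 and 1, and the listed layers total 590,018 trainable parameters. The released model may differ if its actual definition departs from this listing.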
Advantages
- Fast Inference: No spectrogram conversion needed
- Lightweight: Processes raw audio directly
- Real-time Capable: Suitable for edge deployment
Citation
If you use this model, please cite:
@misc{engine-knock-cnn1d,
author = {cxlrd},
title = {Engine Knock Detection with 1D CNN},
year = {2025},
publisher = {HuggingFace},
howpublished = {\url{https://huggingface.co/cxlrd/engine-knock-cnn1d}}
}