---
language: en
license: mit
tags:
- audio-classification
- engine-diagnostics
- knock-detection
- cnn
- 1d-cnn
datasets:
- custom
metrics:
- accuracy
- f1
---

# Engine Knock Detection - Custom 1D CNN

This model detects engine knock from raw audio waveforms using a custom 1D Convolutional Neural Network.

## Model Description

- **Architecture**: Custom 4-layer 1D CNN
- **Input**: Raw audio waveforms (80,000 samples @ 16 kHz = 5 seconds)
- **Output**: Binary classification (clean vs. knocking)
- **Framework**: PyTorch
- **Parameters**: ~5M trainable parameters

## Performance Metrics

Evaluated on the test set:

| Metric    | Score  |
|-----------|--------|
| Accuracy  | 0.7389 |
| Precision | 0.7660 |
| Recall    | 0.7423 |
| F1-Score  | 0.7539 |

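As a quick consistency check, F1 is the harmonic mean of precision and recall. The snippet below recomputes it from the rounded values in the table; since the inputs are rounded to four decimals, a small difference in the last digit from the reported score is expected.

```python
# Values copied from the metrics table (already rounded to 4 decimals)
precision = 0.7660
recall = 0.7423

# F1 = harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 recomputed from rounded precision/recall: {f1:.4f}")
```
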
## Usage

```python
import torch
import torchaudio
from huggingface_hub import hf_hub_download

# Load the model architecture. The Custom1DCNN class must be importable;
# see the model definition in the repository.
from model import Custom1DCNN

model = Custom1DCNN(num_classes=2)
model_path = hf_hub_download(repo_id="cxlrd/engine-knock-cnn1d", filename="model.pth")
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()

# Prepare audio: resample to 16 kHz if needed
waveform, sample_rate = torchaudio.load('audio.wav')
if sample_rate != 16000:
    waveform = torchaudio.transforms.Resample(sample_rate, 16000)(waveform)

# Downmix to mono, since the first conv layer takes a single input channel
waveform = waveform.mean(dim=0, keepdim=True)

# Pad or truncate to 80000 samples
if waveform.shape[1] > 80000:
    waveform = waveform[:, :80000]
else:
    waveform = torch.nn.functional.pad(waveform, (0, 80000 - waveform.shape[1]))

# Predict (adds a batch dimension, assuming the model expects
# input of shape (batch, 1, samples))
with torch.no_grad():
    output = model(waveform.unsqueeze(0))
    prediction = torch.argmax(output, dim=1).item()
    print('Clean' if prediction == 0 else 'Knocking')
```

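When classifying many clips, it can help to factor the length handling into one function. The following is a minimal sketch using only `torch`; `prepare_waveform` and `TARGET_LEN` are hypothetical names that do not come from the repository, and the helper assumes the audio has already been resampled to 16 kHz and that the model takes input of shape `(batch, 1, samples)`.

```python
import torch
import torch.nn.functional as F

TARGET_LEN = 80_000  # 5 seconds at 16 kHz, as stated in the model description


def prepare_waveform(waveform: torch.Tensor, target_len: int = TARGET_LEN) -> torch.Tensor:
    """Turn a (channels, samples) waveform into a (1, 1, target_len) tensor.

    Hypothetical helper: assumes the input is already sampled at 16 kHz.
    """
    mono = waveform.mean(dim=0, keepdim=True)  # downmix to (1, samples)
    if mono.shape[1] > target_len:
        mono = mono[:, :target_len]  # truncate long clips
    else:
        mono = F.pad(mono, (0, target_len - mono.shape[1]))  # zero-pad on the right
    return mono.unsqueeze(0)  # add the batch dimension


# Random noise stands in for real recordings here
short_clip = torch.randn(2, 48_000)   # 3 s stereo clip
long_clip = torch.randn(1, 100_000)   # 6.25 s mono clip
print(prepare_waveform(short_clip).shape)
print(prepare_waveform(long_clip).shape)
```

Zero-padding on the right mirrors the padding used in the usage snippet; whether training used the same convention is not stated here.
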
## Training Details

- **Dataset**: Custom engine sound recordings (1199 samples)
- **Training Split**: 70% train, 15% validation, 15% test
- **Optimizer**: Adam (lr=1e-3, weight_decay=1e-4)
- **Batch Size**: 32
- **Early Stopping**: Patience of 5 epochs
- **No Preprocessing**: Direct raw waveform input

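The settings above could be wired together roughly as follows. This is a sketch under stated assumptions, not the training script used for this model: the placeholder dataset, the stand-in `nn.Linear` model, the `EarlyStopping` class, the random seed, and the choice of validation loss as the monitored quantity are all hypothetical. The exact per-split sample counts are also an assumption, since only the percentages are given.

```python
import torch
from torch import nn
from torch.utils.data import TensorDataset, random_split

# Placeholder dataset with the stated 1199 items; real data would hold waveforms and labels
dataset = TensorDataset(torch.arange(1199))
n_train = int(0.70 * len(dataset))
n_val = int(0.15 * len(dataset))
n_test = len(dataset) - n_train - n_val  # remainder goes to the test split
train_set, val_set, test_set = random_split(
    dataset, [n_train, n_val, n_test], generator=torch.Generator().manual_seed(0)
)

model = nn.Linear(8, 2)  # stand-in for the CNN
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)


class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Simulated validation losses: improvement stops after the second epoch
stopper = EarlyStopping(patience=5)
simulated_val_losses = [1.0, 0.9, 0.95, 0.95, 0.95, 0.95, 0.95]
stop_flags = [stopper.step(loss) for loss in simulated_val_losses]
print(len(train_set), len(val_set), len(test_set), stop_flags)
```
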
## Architecture Details

```
Conv1D(1→64, k=80, s=4) → BatchNorm → ReLU → MaxPool(4)
Conv1D(64→128, k=3) → BatchNorm → ReLU → MaxPool(4)
Conv1D(128→256, k=3) → BatchNorm → ReLU → MaxPool(4)
Conv1D(256→512, k=3) → BatchNorm → ReLU → AdaptiveAvgPool
Dropout(0.5) → Linear(512→128) → ReLU → Dropout(0.3) → Linear(128→2)
```

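To make the listing concrete, here is an illustrative PyTorch reconstruction. It is a sketch, not the repository's `Custom1DCNN`: the class name `KnockCNNSketch` is hypothetical, and it assumes no convolution padding, bias terms enabled, and an adaptive pooling output size of 1, none of which the listing specifies. Its parameter names will not necessarily match the published `model.pth`.

```python
import torch
import torch.nn as nn


class KnockCNNSketch(nn.Module):
    """Illustrative reconstruction of the layer listing (assumptions noted above)."""

    def __init__(self, num_classes: int = 2):
        super().__init__()

        def block(c_in, c_out, k, stride=1, pool=None):
            # Conv -> BatchNorm -> ReLU -> pooling (MaxPool(4) unless overridden)
            return nn.Sequential(
                nn.Conv1d(c_in, c_out, kernel_size=k, stride=stride),
                nn.BatchNorm1d(c_out),
                nn.ReLU(),
                pool if pool is not None else nn.MaxPool1d(4),
            )

        self.features = nn.Sequential(
            block(1, 64, k=80, stride=4),
            block(64, 128, k=3),
            block(128, 256, k=3),
            block(256, 512, k=3, pool=nn.AdaptiveAvgPool1d(1)),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(512, 128), nn.ReLU(),
            nn.Dropout(0.3), nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, samples) -> features: (batch, 512, 1) -> (batch, 512)
        x = self.features(x).squeeze(-1)
        return self.classifier(x)


model = KnockCNNSketch().eval()
with torch.no_grad():
    logits = model(torch.zeros(1, 1, 80_000))  # one silent 5-second clip
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(logits.shape, n_params)
```

Under these assumptions the sketch comes to roughly 0.59M trainable parameters, so the published checkpoint may differ from it in details the listing does not show.
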
## Advantages

- **Fast Inference**: No spectrogram conversion needed
- **Lightweight**: Processes raw audio directly
- **Real-time Capable**: Suitable for edge deployment

## Citation

If you use this model, please cite:

```bibtex
@misc{engine-knock-cnn1d,
  author = {cxlrd},
  title = {Engine Knock Detection with 1D CNN},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/cxlrd/engine-knock-cnn1d}}
}
```