---
language: en
license: mit
tags:
- audio-classification
- engine-diagnostics
- knock-detection
- cnn
- 1d-cnn
datasets:
- custom
metrics:
- accuracy
- f1
---

# Engine Knock Detection - Custom 1D CNN

This model detects engine knock from raw audio waveforms using a custom 1D Convolutional Neural Network.

## Model Description

- **Architecture**: Custom 4-layer 1D CNN
- **Input**: Raw audio waveforms (80,000 samples @ 16 kHz = 5 seconds)
- **Output**: Binary classification (clean vs. knocking)
- **Framework**: PyTorch
- **Parameters**: ~5M trainable parameters

## Performance Metrics

Evaluated on the held-out test set (15% of the data):

| Metric    | Score  |
|-----------|--------|
| Accuracy  | 0.7389 |
| Precision | 0.7660 |
| Recall    | 0.7423 |
| F1-Score  | 0.7539 |

## Usage

```python
import torch
import torchaudio
from huggingface_hub import hf_hub_download

# Load model architecture (you'll need to define the Custom1DCNN class;
# see the model architecture in the repository)
from model import Custom1DCNN

model = Custom1DCNN(num_classes=2)
model_path = hf_hub_download(repo_id="cxlrd/engine-knock-cnn1d", filename="model.pth")
model.load_state_dict(torch.load(model_path, map_location='cpu'))
model.eval()

# Prepare audio: torchaudio.load returns a (channels, samples) tensor
waveform, sample_rate = torchaudio.load('audio.wav')
if waveform.shape[0] > 1:
    # Downmix multi-channel audio to mono (the first conv layer takes 1 channel)
    waveform = waveform.mean(dim=0, keepdim=True)
if sample_rate != 16000:
    waveform = torchaudio.transforms.Resample(sample_rate, 16000)(waveform)

# Pad or truncate to 80000 samples (5 seconds at 16 kHz)
if waveform.shape[1] > 80000:
    waveform = waveform[:, :80000]
else:
    waveform = torch.nn.functional.pad(waveform, (0, 80000 - waveform.shape[1]))

# Note: if your Custom1DCNN.forward expects (batch, channels, samples),
# add a batch dimension first with waveform = waveform.unsqueeze(0)

# Predict (class 0 = clean, class 1 = knocking)
with torch.no_grad():
    output = model(waveform)
    prediction = torch.argmax(output, dim=1).item()

print('Clean' if prediction == 0 else 'Knocking')
```

## Training Details

- **Dataset**: Custom engine sound recordings (1199 samples)
- **Training Split**: 70% train, 15% validation, 15% test
- **Optimizer**: Adam (lr=1e-3, weight_decay=1e-4)
- **Batch Size**: 32
- **Early Stopping**: Patience of 5 epochs
- **No Preprocessing**: Direct raw waveform input (no spectrogram or other feature extraction)

## Architecture Details

```
Conv1D(1→64, k=80, s=4) → BatchNorm → ReLU → MaxPool(4)
Conv1D(64→128, k=3) → BatchNorm → ReLU → MaxPool(4)
Conv1D(128→256, k=3) → BatchNorm → ReLU → MaxPool(4)
Conv1D(256→512, k=3) → BatchNorm → ReLU → AdaptiveAvgPool
Dropout(0.5) → Linear(512→128) → ReLU → Dropout(0.3) → Linear(128→2)
```

## Advantages

- **Fast Inference**: No spectrogram conversion needed
- **Lightweight**: Processes raw audio directly
- **Real-time Capable**: Suitable for edge deployment

## Citation

If you use this model, please cite:

```bibtex
@misc{engine-knock-cnn1d,
  author = {cxlrd},
  title = {Engine Knock Detection with 1D CNN},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/cxlrd/engine-knock-cnn1d}}
}
```
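
## Example: Architecture Sketch

The Usage snippet imports a `Custom1DCNN` class that this card does not define. As a rough illustration, the layer listing under Architecture Details could be written in PyTorch as below. This is a minimal sketch and not the author's code: the padding (assumed to be the PyTorch default of 0), the adaptive pooling output length (assumed to be 1), and the attribute names `features` and `classifier` are all assumptions.

```python
import torch
import torch.nn as nn


class Custom1DCNN(nn.Module):
    """Hypothetical reconstruction of the layer listing; not the released code."""

    def __init__(self, num_classes: int = 2):
        super().__init__()

        def block(c_in, c_out, kernel, stride=1, pool=None):
            # Conv1D -> BatchNorm -> ReLU -> pooling, as in the listing.
            # Assumption: padding=0 (PyTorch default); the card does not state it.
            return nn.Sequential(
                nn.Conv1d(c_in, c_out, kernel_size=kernel, stride=stride),
                nn.BatchNorm1d(c_out),
                nn.ReLU(),
                pool if pool is not None else nn.MaxPool1d(4),
            )

        self.features = nn.Sequential(
            block(1, 64, kernel=80, stride=4),
            block(64, 128, kernel=3),
            block(128, 256, kernel=3),
            # Assumption: AdaptiveAvgPool collapses the time axis to length 1.
            block(256, 512, kernel=3, pool=nn.AdaptiveAvgPool1d(1)),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(512, 128),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, samples) -> logits: (batch, num_classes)
        x = self.features(x).flatten(1)  # (batch, 512)
        return self.classifier(x)


if __name__ == "__main__":
    model = Custom1DCNN(num_classes=2).eval()
    dummy = torch.zeros(1, 1, 80000)  # one 5-second clip at 16 kHz
    with torch.no_grad():
        logits = model(dummy)
    print(logits.shape)  # torch.Size([1, 2])
```

Because the module names here are placeholders, the published `model.pth` state dict is not expected to load into this sketch; use the class definition from the repository for that. Counting the layers in this listing also gives roughly 0.59M trainable parameters rather than the ~5M quoted under Model Description, which suggests the released model differs in details the listing omits.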