---
language: en
license: mit
tags:
- audio-classification
- engine-diagnostics
- knock-detection
- cnn
- 1d-cnn
datasets:
- custom
metrics:
- accuracy
- f1
---

# Engine Knock Detection - Custom 1D CNN

This model detects engine knock in raw audio waveforms using a custom 1D convolutional neural network (CNN).

## Model Description

- Architecture: custom 1D CNN with four convolutional blocks followed by a two-layer fully connected classifier
- Input: raw mono audio waveform of 80,000 samples (5 seconds at 16 kHz)
- Output: binary classification (class 0 = clean, class 1 = knocking)
- Framework: PyTorch
- Parameters: ~0.59M trainable parameters (computed from the layer stack listed under Architecture Details)
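
The usage example below picks the class with `torch.argmax`. The card does not say so explicitly, but the final `Linear(128→2)` layer has no softmax after it, so the two outputs are presumably unnormalized logits. Under that assumption, a softmax turns them into a knock probability, which also lets you choose a decision threshold other than 0.5. The following is a minimal pure-Python sketch. The helper names `knock_probability` and `classify` and the default threshold are illustrative, not part of the released code.

```python
import math


def knock_probability(logits: tuple[float, float]) -> float:
    """Softmax over two logits (index 0 = clean, 1 = knocking); returns P(knocking)."""
    clean, knock = logits
    # Subtract the max logit before exponentiating for numerical stability
    m = max(clean, knock)
    e_clean = math.exp(clean - m)
    e_knock = math.exp(knock - m)
    return e_knock / (e_clean + e_knock)


def classify(logits: tuple[float, float], threshold: float = 0.5) -> str:
    """Label a clip as knocking when P(knocking) exceeds the (hypothetical) threshold."""
    return "Knocking" if knock_probability(logits) > threshold else "Clean"


print(knock_probability((0.2, 1.3)))
print(classify((0.2, 1.3)))
```

With the default threshold of 0.5, `classify` gives the same label as `argmax`, including ties, which both resolve to class 0. Raising the threshold trades recall on the knocking class for precision.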

## Performance Metrics

Evaluated on the held-out test split (15% of the 1,199 recordings):

| Metric    | Score  |
|-----------|--------|
| Accuracy  | 0.7389 |
| Precision | 0.7660 |
| Recall    | 0.7423 |
| F1-Score  | 0.7539 |
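
F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Recomputing it from the rounded table values gives about 0.7540, which agrees with the reported 0.7539 up to rounding. The card does not state which class precision and recall refer to, so this sketch only checks the arithmetic.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above (rounded to 4 decimals)
f1 = f1_score(0.7660, 0.7423)
print(f"F1 recomputed from the table: {f1:.4f}")
```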

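## Usage

```python
import torch
import torchaudio
from huggingface_hub import hf_hub_download

# Load the model architecture. Custom1DCNN is not provided by any pip package:
# define it in a local model.py (see the model architecture in the repository).
from model import Custom1DCNN

model = Custom1DCNN(num_classes=2)
model_path = hf_hub_download(repo_id="cxlrd/engine-knock-cnn1d", filename="model.pth")
# weights_only=True restricts unpickling to tensors and plain containers (PyTorch >= 1.13)
model.load_state_dict(torch.load(model_path, map_location="cpu", weights_only=True))
model.eval()

TARGET_SR = 16000
NUM_SAMPLES = 80000  # 5 seconds at 16 kHz

# Prepare audio: torchaudio.load returns a (channels, samples) tensor
waveform, sample_rate = torchaudio.load("audio.wav")
# Downmix multi-channel recordings to mono, keeping shape (1, samples)
waveform = waveform.mean(dim=0, keepdim=True)
if sample_rate != TARGET_SR:
    waveform = torchaudio.transforms.Resample(sample_rate, TARGET_SR)(waveform)

# Zero-pad or truncate to exactly 80,000 samples
if waveform.shape[1] > NUM_SAMPLES:
    waveform = waveform[:, :NUM_SAMPLES]
else:
    waveform = torch.nn.functional.pad(waveform, (0, NUM_SAMPLES - waveform.shape[1]))

# Predict. The (1, 80000) tensor is passed as a batch of one; if
# Custom1DCNN.forward expects (batch, 1, samples), pass waveform.unsqueeze(1) instead.
with torch.no_grad():
    output = model(waveform)
    prediction = torch.argmax(output, dim=1).item()
print("Clean" if prediction == 0 else "Knocking")
```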
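
The snippet above scores only the first 5 seconds of a file. This card does not describe a method for longer recordings. One option is to cut the waveform into overlapping 80,000-sample windows and run the model on each. The following pure-Python sketch computes the window boundaries. The helper name `window_bounds` and the 40,000-sample hop (2.5 s, 50% overlap) are illustrative assumptions.

```python
def window_bounds(num_samples: int, window: int = 80_000, hop: int = 40_000) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of windows that cover a recording.

    The last window may extend past the end of the recording; its tail would be
    zero-padded exactly like a short clip in the usage example.
    """
    if hop <= 0:
        raise ValueError("hop must be positive")
    if num_samples <= 0:
        return []
    bounds = []
    start = 0
    while True:
        bounds.append((start, start + window))
        if start + window >= num_samples:
            break
        start += hop
    return bounds


# A 12-second recording at 16 kHz
print(window_bounds(12 * 16_000))
```

Each slice `waveform[:, start:end]` can then be padded and passed through the model. How to combine the per-window predictions is a design choice this card leaves open; for example, you could flag the file if any window is classified as knocking.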

## Training Details

- Dataset: custom engine sound recordings (1,199 samples)
- Data split: 70% train, 15% validation, 15% test
- Optimizer: Adam (lr=1e-3, weight_decay=1e-4)
- Batch size: 32
- Early stopping: patience of 5 epochs
- Preprocessing: none; the network receives the raw waveform directly
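
Early stopping with a patience of 5 epochs halts training once the monitored quantity has failed to improve for 5 consecutive epochs. The card does not say which quantity was monitored or whether the best checkpoint was restored. The sketch below assumes validation loss (lower is better) and tracks the epoch whose weights you would keep. `EarlyStopping` is a hypothetical helper, not a PyTorch class, and the loss values are simulated.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta  # smallest decrease that counts as an improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def step(self, epoch: int, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Simulated validation losses: the best value occurs at epoch 2
losses = [0.69, 0.61, 0.55, 0.56, 0.58, 0.57, 0.60, 0.59, 0.62]
stopper = EarlyStopping(patience=5)
for epoch, loss in enumerate(losses):
    if stopper.step(epoch, loss):
        print(f"Stopping after epoch {epoch}; best epoch was {stopper.best_epoch}")
        break
```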

## Architecture Details

```
Conv1D(1→64, k=80, s=4) → BatchNorm → ReLU → MaxPool(4)
Conv1D(64→128, k=3) → BatchNorm → ReLU → MaxPool(4)
Conv1D(128→256, k=3) → BatchNorm → ReLU → MaxPool(4)
Conv1D(256→512, k=3) → BatchNorm → ReLU → AdaptiveAvgPool
Dropout(0.5) → Linear(512→128) → ReLU → Dropout(0.3) → Linear(128→2)
```
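
You can check the layer listing by hand. The pure-Python sketch below traces the sequence length of an 80,000-sample input through the stack and counts the trainable parameters. It rests on several assumptions, since the listing does not state them:

- The convolutions use no padding.
- `MaxPool(4)` has stride 4.
- Every convolution and linear layer has a bias term.
- The adaptive average pool reduces to length 1, which the 512 inputs of the first linear layer imply.

Under these assumptions the stack has 590,018 trainable parameters. Dropout and ReLU add none, and BatchNorm contributes one scale and one shift per channel.

```python
def conv1d_out_len(length: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a 1D convolution (dilation 1)."""
    return (length + 2 * padding - kernel) // stride + 1


def maxpool1d_out_len(length: int, kernel: int = 4) -> int:
    """Output length of a 1D max pool whose stride equals its kernel size."""
    return (length - kernel) // kernel + 1


# (in_channels, out_channels, kernel, stride) for the four convolutional blocks
CONV_BLOCKS = [(1, 64, 80, 4), (64, 128, 3, 1), (128, 256, 3, 1), (256, 512, 3, 1)]
# (in_features, out_features) for the classifier head
LINEAR_LAYERS = [(512, 128), (128, 2)]


def trace_and_count(num_samples: int = 80_000) -> tuple[list[int], int]:
    """Return the sequence length after each conv block and the trainable parameter count."""
    lengths = [num_samples]
    params = 0
    length = num_samples
    for i, (c_in, c_out, k, s) in enumerate(CONV_BLOCKS):
        length = conv1d_out_len(length, k, s)
        params += c_out * c_in * k + c_out  # conv weights + bias
        params += 2 * c_out                 # BatchNorm weight (gamma) + bias (beta)
        if i < len(CONV_BLOCKS) - 1:        # the last block uses adaptive pooling instead
            length = maxpool1d_out_len(length, 4)
        lengths.append(length)
    # Adaptive average pooling to length 1 leaves a 512-dimensional feature vector
    for f_in, f_out in LINEAR_LAYERS:
        params += f_in * f_out + f_out      # weights + bias
    return lengths, params


lengths, n_params = trace_and_count()
print("sequence length after each block:", lengths)
print(f"trainable parameters: {n_params:,}")
```

The large stride-4, 80-sample kernel in the first layer shrinks the 80,000-sample input to 19,981 time steps before any pooling. This keeps the later layers cheap.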

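## Advantages

- Fast inference: no spectrogram or other time-frequency conversion is needed before the forward pass
- Simple, lightweight pipeline: the network processes raw audio directly, with no separate feature-extraction step
- Real-time capable: intended for edge deployment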

## Citation

If you use this model, please cite:

```bibtex
@misc{engine-knock-cnn1d,
  author = {cxlrd},
  title = {Engine Knock Detection with 1D CNN},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/cxlrd/engine-knock-cnn1d}}
}
```