---
language: en
license: mit
tags:
- audio-classification
- engine-diagnostics
- knock-detection
- resnet
datasets:
- custom
metrics:
- accuracy
- f1
---

# Engine Knock Detection - ResNet-18

This model detects engine knock in audio recordings. It is a ResNet-18 fine-tuned on mel-spectrograms of engine sound.

## Model Description

- Architecture: ResNet-18, pretrained on ImageNet and fine-tuned on mel-spectrograms of engine audio
- Input: Mel-spectrograms resized to 224x224 and replicated across 3 channels
- Output: Binary classification (class 0 = clean, class 1 = knocking)
- Framework: PyTorch
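To make the input format concrete, the sketch below traces tensor shapes from a mono mel-spectrogram to the 3-channel 224x224 tensor the network takes. It uses a random tensor in place of a real spectrogram. The helper name `to_resnet_input` and the frame count of 94 are illustrative assumptions, and the resize-then-replicate steps follow the usage snippet in this card, not a confirmed training pipeline.

```python
import torch
import torch.nn.functional as F


def to_resnet_input(mel_spec_db: torch.Tensor) -> torch.Tensor:
    """Resize a (1, n_mels, frames) spectrogram to (1, 3, 224, 224).

    Hypothetical helper: the name and signature are not part of this repo.
    """
    x = mel_spec_db.unsqueeze(0)  # add batch dim -> (1, 1, n_mels, frames)
    x = F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False)
    return x.repeat(1, 3, 1, 1)  # copy the single channel three times


# Random stand-in for a 128-band spectrogram with 94 frames (assumed size).
dummy = torch.randn(1, 128, 94)
batch = to_resnet_input(dummy)
print(batch.shape)
```

All three channels hold identical data, so the replication only satisfies the shape that the ImageNet-pretrained first convolution expects.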

## Performance Metrics

Evaluated on the held-out test set:

| Metric    | Score  |
|-----------|--------|
| Accuracy  | 0.8778 |
| Precision | 0.9518 |
| Recall    | 0.8144 |
| F1-Score  | 0.8778 |
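As a quick consistency check, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two values in the table. The helper name `f1_score` below is only illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above.
f1 = f1_score(0.9518, 0.8144)
print(f"{f1:.4f}")  # prints 0.8778
```

The result matches the reported F1-Score. The accuracy happens to round to the same four digits, but it is a separate metric.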

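## Usage

```python
import torch
import torchaudio
from torchvision import models
from huggingface_hub import hf_hub_download

# Load model: ResNet-18 backbone with a 2-class head (0 = clean, 1 = knocking).
# weights=None replaces the deprecated pretrained=False (torchvision >= 0.13).
model = models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, 2)
model_path = hf_hub_download(repo_id="cxlrd/engine-knock-resnet18", filename="model.pth")
model.load_state_dict(torch.load(model_path, map_location="cpu"))
model.eval()

# Prepare audio. Assumption: mix down to mono so the result has exactly
# 3 channels, and resample to the 16 kHz that the mel transform is set up for.
waveform, sample_rate = torchaudio.load("audio.wav")
waveform = waveform.mean(dim=0, keepdim=True)
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)

mel_spec = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=1024, hop_length=512, n_mels=128
)(waveform)
mel_spec_db = torchaudio.transforms.AmplitudeToDB()(mel_spec)
# (1, n_mels, frames) -> (1, 1, 224, 224) -> (1, 3, 224, 224)
mel_spec_db = torch.nn.functional.interpolate(
    mel_spec_db.unsqueeze(0), size=(224, 224), mode="bilinear", align_corners=False
).repeat(1, 3, 1, 1)

# Predict
with torch.no_grad():
    output = model(mel_spec_db)
    prediction = torch.argmax(output, dim=1).item()
print("Clean" if prediction == 0 else "Knocking")
```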
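The usage snippet reports only the arg-max class. When a confidence score is more useful, the two logits can be turned into a knock probability with a softmax, as in the sketch below. The helper `knock_probability`, the example logits, and the 0.5 threshold are all illustrative assumptions; nothing in this card states a calibrated threshold.

```python
import torch


def knock_probability(logits: torch.Tensor) -> torch.Tensor:
    """Return P(knocking) for each row of a (batch, 2) logit tensor.

    Assumes class index 1 means "knocking", as in the usage snippet.
    """
    return torch.softmax(logits, dim=1)[:, 1]


# Made-up logits standing in for model(mel_spec_db) on two clips.
example_logits = torch.tensor([[2.0, -1.0], [-0.5, 1.5]])
probs = knock_probability(example_logits)

threshold = 0.5  # assumed; equivalent to arg-max for two classes
labels = ["Knocking" if p >= threshold else "Clean" for p in probs.tolist()]
print(labels)
```

Lowering the threshold would flag more clips as knocking, which generally trades precision for recall. Any such change should be checked on validation data.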

## Training Details

- Dataset: Custom engine sound recordings (1199 samples)
- Data Split: 70% train, 15% validation, 15% test
- Optimizer: Adam (lr=1e-4, weight_decay=1e-4)
- Batch Size: 16
- Early Stopping: Patience of 5 epochs
- Data Augmentation: Mel-spectrogram normalization
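The split and early-stopping settings above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the training script behind this model: the rounding rule in `split_sizes`, the choice of validation loss as the monitored quantity, the `EarlyStopping` class, and the loss values are all hypothetical.

```python
def split_sizes(n_samples: int, train_frac: float = 0.70, val_frac: float = 0.15):
    """Return (train, val, test) counts; the test set takes the remainder.

    Assumption: fractions are floored. The card does not state the rounding.
    """
    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)
    return n_train, n_val, n_samples - n_train - n_val


class EarlyStopping:
    """Stop once the monitored loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's loss and return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


print(split_sizes(1199))  # prints (839, 179, 181)

stopper = EarlyStopping(patience=5)
val_losses = [0.60, 0.48, 0.45, 0.46, 0.47, 0.45, 0.49, 0.50]  # made-up values
stopped_epoch = None
for epoch, loss in enumerate(val_losses, start=1):
    if stopper.step(loss):
        stopped_epoch = epoch
        break
print(stopped_epoch)  # prints 8
```

With these example losses the best value is reached at epoch 3. The tie at epoch 6 does not count as an improvement, so the fifth epoch without improvement is epoch 8.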

## Citation

If you use this model, please cite:

```bibtex
@misc{engine-knock-resnet18,
  author = {cxlrd},
  title = {Engine Knock Detection with ResNet-18},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/cxlrd/engine-knock-resnet18}}
}
```