Project title
Engine sound classification — diagnostic model
Short description (1–2 sentences)
Train a robust audio classification model that assigns engine sounds to diagnostic categories (e.g., Normal, Bearing_Fault, Misfire, Knocking, Belt_Slip, Unknown_Noise). The model should work on single short audio clips (1–10 seconds), tolerate background noise, and output a probability for each class.
Detailed project description / objective
We want an automatic audio-based diagnostic model that, given a short recording of an engine or machine, returns the most likely fault class and a confidence score. The model must:
Distinguish normal engine operation from several distinct fault types (listed below).
Work with recordings captured on smartphones or simple field recorders (variable SNR).
Provide top-3 predictions with probabilities and a recommended single label (highest probability).
Be robust to moderate background noise, small variations in microphone and distance to engine, and short recording lengths (1–10 s).
Use this model for monitoring, early fault detection, and alerts.
Task type
Multi-class audio classification (single label per clip). Also report per-class F1 scores and a confusion matrix.
Target classes (exact label names)
Normal
Bearing_Fault
Misfire
Knocking
Belt_Slip
Unknown_Noise
(If you have other classes, replace or add them; keep class names concise and underscore-separated.)
Dataset format & labeling instructions
Each audio file should be a separate .wav file. Prefer lossless or high-bitrate (16-bit).
Mono audio preferred. If stereo, the platform may mix down to mono.
Recommended sample rate: 16,000 Hz (16 kHz). If original files are at 44.1 kHz, resample to 16 kHz for consistency.
File length: ideally 1–10 seconds. If longer, use labeled segments.
Label each file with exactly one of the target classes above. If a clip contains multiple concurrent faults, label the dominant audible fault; if no single fault is clearly dominant, use Unknown_Noise.
Provide a CSV metadata file with columns: filename,label,recording_device,environment,SNR_estimate,timestamp (SNR_estimate optional). Example row:
engine_001.wav,Bearing_Fault,smartphone_X,workshop_indoor,12dB,2024-09-14T10:12:00Z
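A quick sanity check of the metadata CSV before upload catches most labeling mistakes. The sketch below is illustrative, not part of any platform API; the function name and the exact problem messages are hypothetical.

```python
import csv
import io

# Exact label names from the "Target classes" section above.
VALID_LABELS = {"Normal", "Bearing_Fault", "Misfire", "Knocking", "Belt_Slip", "Unknown_Noise"}
REQUIRED_COLUMNS = ["filename", "label", "recording_device", "environment"]

def validate_metadata(csv_text):
    """Return a list of (row_number, problem) tuples; empty list means the CSV is clean."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [(1, f"missing columns: {missing}")]
    for i, row in enumerate(reader, start=2):  # row 1 is the header
        if not row["filename"].strip().endswith(".wav"):
            problems.append((i, "filename is not a .wav file"))
        label = row["label"].strip()
        if label not in VALID_LABELS:
            problems.append((i, f"unknown label {label!r}"))
    return problems
```

Rows are stripped of surrounding whitespace before checking, so hand-edited CSVs with spaces after commas still validate.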
Minimum dataset guidance
Aim for >= 200 clips per class for reasonable performance; more is better.
If class imbalance exists, apply balancing or class-weighting during training. If a class has fewer than 50 samples, expect lower accuracy for that class.
Preprocessing & input pipeline (what the platform should do)
Convert to mono and resample to 16 kHz.
Normalize amplitude (peak normalization).
Trim or pad to a fixed duration during training (e.g., 4 s windows with overlap). For very short clips (<1s) pad with silence.
Convert audio to log-mel spectrograms (e.g., 64 mel bins, window 25 ms, hop 10 ms) as model input.
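The normalization and trim/pad steps above can be sketched in a few lines of numpy. Resampling and the log-mel transform are deliberately left to a DSP library (e.g. librosa or torchaudio); the constants mirror the recommendations above, and the function name is illustrative.

```python
import numpy as np

TARGET_SR = 16_000       # recommended sample rate (16 kHz)
WINDOW_SECONDS = 4.0     # fixed training window

def preprocess(waveform: np.ndarray) -> np.ndarray:
    """Mono float waveform -> peak-normalized, fixed-length 4 s array."""
    x = np.asarray(waveform, dtype=np.float32)
    if x.ndim == 2:                       # stereo -> mono by averaging channels
        x = x.mean(axis=1)
    peak = np.max(np.abs(x))
    if peak > 0:
        x = x / peak                      # peak normalization to [-1, 1]
    n = int(TARGET_SR * WINDOW_SECONDS)
    if len(x) >= n:
        x = x[:n]                         # trim long clips to the window
    else:
        x = np.pad(x, (0, n - len(x)))    # pad short clips with silence
    return x
```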
Augmentation (please enable)
Additive background noise (real-world noises) at random SNRs (0–20 dB).
Random time-stretching (±10%).
Random pitch shift (±1 semitone).
Random gain and small clipping simulation.
SpecAugment (time/frequency masking) if available.
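Of the augmentations above, additive noise at a target SNR is the one most often hand-rolled. A minimal sketch, assuming both signal and noise are mono float arrays at the same sample rate (the function name is illustrative):

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db, rng=None):
    """Add `noise` to `signal`, scaled so the mix has the requested SNR in dB."""
    rng = rng or np.random.default_rng()
    if len(noise) < len(signal):                       # loop noise if too short
        noise = np.resize(noise, len(signal))
    start = rng.integers(0, len(noise) - len(signal) + 1)
    noise = noise[start:start + len(signal)]           # random noise crop
    p_sig = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    # scale so that 10*log10(p_sig / (scale**2 * p_noise)) == snr_db
    scale = np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return signal + scale * noise
```

During training, `snr_db` would be drawn uniformly from the 0–20 dB range suggested above.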
Model & training preferences
Use a lightweight convolutional or transformer-based audio classifier appropriate for deployment. Prioritize a balance between accuracy and model size.
If multiple architectures are available, try both a small CNN and a transformer/CRNN and choose the best by validation F1.
Use early stopping on validation loss, and save the best checkpoint.
Use class weights or focal loss to mitigate class imbalance.
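If the training tool accepts per-class weights, a common choice is inverse frequency, w_c = N / (K * n_c), which gives every class weight 1.0 on a balanced dataset. A minimal sketch:

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: w_c = N / (K * n_c)."""
    counts = Counter(labels)
    n_total, n_classes = len(labels), len(counts)
    return {c: n_total / (n_classes * n) for c, n in counts.items()}
```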
Evaluation & metrics
Primary metric: Macro F1-score (to give balanced performance across classes).
Secondary metrics: per-class precision/recall/F1, overall accuracy, a confusion matrix, and one-vs-rest ROC-AUC for each class.
Provide a validation set and a held-out test set. Suggested split: 70% train / 15% validation / 15% test (or 80/10/10 if data is limited). Ensure random but stratified splitting by class.
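The stratified split can be done by shuffling within each class and cutting each class at the same ratios, so train/val/test all keep the overall class proportions. A self-contained sketch (the function name and fixed seed are illustrative):

```python
import random
from collections import defaultdict

def stratified_split(filenames, labels, ratios=(0.70, 0.15, 0.15), seed=42):
    """Per-class shuffled 70/15/15 split; returns three lists of (file, label)."""
    by_class = defaultdict(list)
    for f, y in zip(filenames, labels):
        by_class[y].append(f)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for y, files in by_class.items():
        rng.shuffle(files)
        n_train = round(len(files) * ratios[0])
        n_val = round(len(files) * ratios[1])
        train += [(f, y) for f in files[:n_train]]
        val += [(f, y) for f in files[n_train:n_train + n_val]]
        test += [(f, y) for f in files[n_train + n_val:]]
    return train, val, test
```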
Expected outputs from model / inference behaviour
For each input clip, return JSON with:
predictions: list of classes with probabilities (sorted descending),
top_prediction: class name,
top_confidence: float (0–1),
duration_seconds: float,
metadata_passed_back: original filename or request id.
Also provide a threshold option for alerts (e.g., alert only if top_confidence >= 0.7).
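The JSON contract above can be sketched from a softmax probability vector; the exact field names follow the list above, while the `alert` field and the `format_result` name are assumptions for illustration.

```python
import json

CLASSES = ["Normal", "Bearing_Fault", "Misfire", "Knocking", "Belt_Slip", "Unknown_Noise"]

def format_result(probs, filename, duration_s, alert_threshold=0.7):
    """Turn a per-class probability vector (summing to 1) into the response JSON."""
    ranked = sorted(zip(CLASSES, probs), key=lambda kv: kv[1], reverse=True)
    top_class, top_p = ranked[0]
    return json.dumps({
        "predictions": [{"class": c, "probability": round(float(p), 4)} for c, p in ranked],
        "top_prediction": top_class,
        "top_confidence": round(float(top_p), 4),
        "alert": bool(top_p >= alert_threshold),   # threshold option for alerts
        "duration_seconds": float(duration_s),
        "metadata_passed_back": filename,
    })
```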
Latency & deployment constraints (choose one according to your need)
Cloud deployment (no strict latency): ok for batch processing and APIs.
Edge deployment (low-resource devices): request a lightweight model <= 10–20 MB and inference latency <200 ms on ARM CPU. If targeting edge, prioritize smaller architectures and quantization.
Privacy & licensing
Dataset contains field recordings of machines only (no human speech). If any recordings include human speech, strip or anonymize voice if required.
Preferred model license for reuse: MIT or Apache-2.0. If you require commercial use, choose a permissive license.
Extra metadata & quality control
Tag each audio file with recording_device and environment (e.g., mobile_indoor, mobile_outdoor, stationary_mic) for future domain analysis.
Provide at least 2–3 different recording devices/environments per class to improve generalization.
Provide a short list of sample filenames and labels (paste these examples into the UI if a sample list is requested):
normal_0001.wav,Normal
bearing_0123.wav,Bearing_Fault
misfire_0045.wav,Misfire
knock_0099.wav,Knocking
belt_0210.wav,Belt_Slip
unknown_0300.wav,Unknown_Noise
Notes / edge cases
If a clip has extremely low SNR (near silence), label it Unknown_Noise; if it is too degraded to be informative, exclude it from training.
If a clip contains engine idling + intermittent knock, label with the most characteristic/diagnostic sound (prefer manual review).
If you later add new classes, retrain from scratch or use transfer learning with a new fine-tuning run.
Desired acceptance criteria (what “good” looks like)
Macro F1 ≥ 0.80 on held-out test set (ideal target; may be lower depending on dataset size).
Per-class F1 ≥ 0.70 for critical fault classes (Bearing_Fault, Misfire, Knocking).
Confusion between Normal and light faults should be minimized — false negatives for faults should be rare for safety.
Optional additional deliverables:
A short version (2–3 sentences) for a “project summary” field.
A CSV template you can paste into the dataset upload tool.
A sample inference JSON example showing expected model output.
Sample recordings
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/lIXRLSHsOEfyy0wa3k9M0.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/RoldbMdGAj47OwpofvCvb.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/n3fQHudUeEo7DsyWx6Xhb.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/MtWShkyyvr3kpitq4dtd-.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/AggjNS8HVJDjAUd7FwV1b.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/4DKzY8utV7gpV7xJP7DXG.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/IG6QSm4KA0RJTtzrBt6wC.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/eiZqY1xWQRVueqsRlhKMd.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/MC8iKveC1Q9EicsM5m0J8.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/k36WarorGpLBflYlmqufj.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/gmKU5Tzhn4_F1nbj29Uqx.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/ulwDitcHOPGsZ8vvW2jXz.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/Xt0cgSoz7unh5BCaBeLnu.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/qznYZRFSGRU0Hy3ww8qP0.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/uNa23LHDtFCTG4zNTCba9.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/CzfkupTHvFGBcYP7nxkb6.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/2MFFIHx__mpwa7TUgl6bu.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/oITQkCJTJkxgDCPyGGG7f.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/D4aROs5ONxcRuEeYDqkxW.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/q5MRGpMRfgMci1U6AdLOY.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/h31Ygu8w_sMSlNgECFkel.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/ULdmqSH_fdVtYNlzrB13Y.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/08lTLX1DGK8SiOSd70xE_.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/LKST4mrBG_Tfq9R6V2i8F.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/ar37Y5u7mWWHXyO3mfGfF.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/ZxWqoBuRP5iyLg6Fqzs3z.wav"></audio>
<audio controls src="https://cdn-uploads.huggingface.co/production/uploads/69283a22752be65115fe2dc2/iO48p4fp37Yvy2mjRCqWr.wav"></audio>
---
datasets:
- nvidia/PhysicalAI-Autonomous-Vehicles
language:
- ar
metrics:
- bleu
base_model:
- deepseek-ai/DeepSeek-OCR
new_version: deepseek-ai/DeepSeek-OCR
pipeline_tag: graph-ml
library_name: fairseq
tags:
- agent
- code
- legal
---