Elwolfie AIOmarRehan committed
Commit 5988096 · 0 parents

Duplicate from AIOmarRehan/CNN_Audio_Classification_Model_with_Spectrogram

Files changed (3):
1. .gitattributes +35 -0
2. Audio_Model_Classification.h5 +3 -0
3. README.md +320 -0
.gitattributes ADDED
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
Audio_Model_Classification.h5 ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:ceef1269f64afc26d31dc35e4bcacf68c2d91181aa28afeecec0e2403aabf739
size 22083448
README.md ADDED
---
language: en
license: mit
tags:
- audio-classification
- tensorflow
- mel-spectrogram-images
- audio-processing
inference: true
datasets:
- AIOmarRehan/Mel_Spectrogram_Images_for_Audio_Classification
---

[For a detailed explanation of this project, see the accompanying Medium article.](https://medium.com/@ai.omar.rehan/building-a-complete-audio-classification-pipeline-using-deep-learning-from-raw-audio-to-mel-9894bd438d85)

---

[The project is also available for interactive testing on Hugging Face Spaces.](https://huggingface.co/spaces/AIOmarRehan/Deep_Audio_Classifier_using_CNN)

---

# Audio-Classification-Raw-Audio-to-Mel-Spectrogram-CNNs
A complete end-to-end audio classification pipeline using deep learning. From raw recordings to Mel-spectrogram CNNs, it covers preprocessing, augmentation, dataset validation, model training, and evaluation: a reproducible blueprint for speech, environmental, or general sound classification tasks.

---

# Audio Classification Pipeline: From Raw Audio to Mel-Spectrogram CNNs

> *“In machine learning, the model is rarely the problem; the data almost always is.”*
> A reminder I kept repeating to myself while building this project.

This repository contains a complete, professional, end-to-end pipeline for **audio classification using deep learning**, starting from **raw, messy audio recordings** and ending with a fully trained **CNN** on **Mel spectrograms**.

The workflow includes:

* Raw audio loading
* Cleaning & normalization
* Silence trimming
* Noise reduction
* Chunking
* Data augmentation
* Mel spectrogram generation
* Dataset validation
* CNN training
* Evaluation & metrics

It is a fully reproducible blueprint for real-world audio classification tasks.

---

# Project Structure

Here is a quick table summarizing the core stages of the pipeline:

| Stage | Description | Output |
| ----------------------- | -------------------------------------- | ---------------- |
| **1. Raw Audio** | Unprocessed WAV/MP3 files | Audio dataset |
| **2. Preprocessing** | Trimming, cleaning, resampling | Cleaned signals |
| **3. Augmentation** | Pitch shift, time stretch, noise | Expanded dataset |
| **4. Mel Spectrograms** | Converts audio → images | PNG files |
| **5. CNN Training** | Deep model learns spectrogram patterns | `.h5` model |
| **6. Evaluation** | Accuracy, F1, confusion matrix | Metrics + plots |

---

# 1. Loading & Inspecting Raw Audio

The dataset is loaded from its directory structure (one folder per class):

```python
from pathlib import Path

import pandas as pd

# `extract_to` is the dataset root; `audio_extensions` is a set like {'.wav', '.mp3'}
paths = [(path.parts[-2], path.name, str(path))
         for path in Path(extract_to).rglob('*.*')
         if path.suffix.lower() in audio_extensions]

df = pd.DataFrame(paths, columns=['class', 'filename', 'full_path'])
df = df.sort_values('class').reset_index(drop=True)
```

During EDA, I computed, per file:

* Duration
* Sample rate
* Peak amplitude

And visualized the duration distribution:

```python
import matplotlib.pyplot as plt

plt.hist(df['duration'], bins=30, edgecolor='black')
plt.xlabel("Duration (seconds)")
plt.ylabel("Number of recordings")
plt.title("Audio Duration Distribution")
plt.show()
```

---

# 2. Audio Cleaning & Normalization

Bad samples were removed, silent files were filtered out, and amplitudes were peak-normalized:

```python
peak = np.abs(y).max()
if peak > 0:
    y = y / peak * 0.99  # scale to 99% of full range to leave headroom
```

This keeps levels consistent across recordings and prevents the model from learning from corrupted audio.
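The silent-file filter itself is not shown above; a minimal sketch could look like the following (the function name and threshold value are my assumptions, not the notebook's actual code):

```python
import numpy as np

SILENCE_PEAK = 1e-4  # assumed threshold; anything quieter is treated as silent

def is_usable(y):
    """Reject empty, non-finite, or effectively silent signals."""
    if y is None or len(y) == 0:
        return False
    if not np.all(np.isfinite(y)):
        return False
    return float(np.abs(y).max()) > SILENCE_PEAK
```

Running this check before normalization avoids dividing by a near-zero peak.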

---

# 3. Advanced Preprocessing

Preprocessing included:

* Silence trimming
* Noise reduction
* Resampling to **16 kHz**
* Mono conversion
* 5-second chunking

```python
TARGET_DURATION = 5.0          # seconds per chunk
TARGET_SR = 16000              # target sample rate (Hz)
TARGET_LENGTH = int(TARGET_DURATION * TARGET_SR)   # 80,000 samples
```

Every audio file becomes a set of clean, consistent chunks ready for feature extraction.
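The chunking step above can be sketched in a few lines (the function name is illustrative; the repository's actual helper may differ):

```python
import numpy as np

TARGET_DURATION = 5.0
TARGET_SR = 16000
TARGET_LENGTH = int(TARGET_DURATION * TARGET_SR)  # 80,000 samples

def chunk_signal(y, target_length=TARGET_LENGTH):
    """Split a mono signal into fixed-length chunks, zero-padding the last one."""
    chunks = []
    for start in range(0, len(y), target_length):
        chunk = y[start:start + target_length]
        if len(chunk) < target_length:
            chunk = np.pad(chunk, (0, target_length - len(chunk)))
        chunks.append(chunk)
    return chunks
```

Zero-padding the final chunk keeps every training example exactly `TARGET_LENGTH` samples long.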

---

# 4. Audio Augmentation

To improve generalization, I applied waveform-level augmentations with `audiomentations`:

```python
from audiomentations import AddGaussianNoise, Compose, PitchShift, Shift, TimeStretch

augment = Compose([
    Shift(min_shift=-0.3, max_shift=0.3, p=0.5),
    PitchShift(min_semitones=-2, max_semitones=2, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5)
])
```

Every augmented file receives a unique name to avoid collisions with the originals.
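One simple way to guarantee unique output names is a short random suffix; this is a sketch, not necessarily the project's actual naming scheme:

```python
import uuid
from pathlib import Path

def unique_name(original_path, tag="aug"):
    """Return a collision-free filename like 'dog_bark_aug_3f2a9c1d.wav'."""
    p = Path(original_path)
    return f"{p.stem}_{tag}_{uuid.uuid4().hex[:8]}{p.suffix}"
```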

---

# 5. Mel Spectrogram Generation

Each cleaned audio chunk is transformed into a **Mel spectrogram**:

```python
import librosa
import numpy as np

# SR, N_FFT, HOP_LENGTH, and N_MELS are the pipeline's global settings
S = librosa.feature.melspectrogram(
    y=y, sr=SR,
    n_fft=N_FFT,
    hop_length=HOP_LENGTH,
    n_mels=N_MELS
)
S_dB = librosa.power_to_db(S, ref=np.max)  # convert power to decibels
```

* Output: **128×128 PNG images**
* Separate directories per class
* Supports both original & augmented samples

These images become the CNN input.
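Writing `S_dB` out as a fixed-size image can be done with Matplotlib; this is a hedged sketch (the project's exact figure settings and colormap are not shown above):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for batch export
import matplotlib.pyplot as plt

def save_spectrogram_png(S_dB, out_path, size_px=128, dpi=100):
    """Render a dB-scaled spectrogram to a square PNG with no axes or margins."""
    fig = plt.figure(figsize=(size_px / dpi, size_px / dpi), dpi=dpi)
    ax = fig.add_axes([0, 0, 1, 1])  # fill the whole canvas
    ax.axis("off")
    ax.imshow(S_dB, origin="lower", aspect="auto", cmap="magma")
    fig.savefig(out_path, dpi=dpi)
    plt.close(fig)
```

Removing axes and margins matters: tick marks and whitespace would otherwise become spurious features for the CNN.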

### Example Mel Spectrogram Images

![Example Mel spectrogram](https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F27304693%2Ffdf7046a261734cd8f503c8f448ca6ad%2Fdownload.png?generation=1763570826533634&alt=media)

![Example Mel spectrogram](<https://www.googleapis.com/download/storage/v1/b/kaggle-user-content/o/inbox%2F27304693%2Fea53570ce051601192c90770091f7ceb%2Fdownload%20(1).png?generation=1763570855911665&alt=media>)

---

# 6. Dataset Validation

After spectrogram creation:

* Corrupted images removed
* Duplicate hashes filtered
* Filename integrity checked
* Class folders validated

```python
df['file_hash'] = df['full_path'].apply(get_hash)
duplicate_hashes = df[df.duplicated(subset=['file_hash'], keep=False)]
```

This step ensures **clean, reliable** training data.
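`get_hash` is not shown in the snippet above; a typical implementation (an assumption on my part) hashes the raw file bytes:

```python
import hashlib

def get_hash(path, chunk_size=65536):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()
```

Content hashing catches byte-identical duplicates even when filenames differ.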

---

# 7. Building TensorFlow Datasets

The training dataset is built with shuffling, batching, and prefetching:

```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

train_ds = tf.data.Dataset.from_tensor_slices((train_paths, train_labels))
train_ds = train_ds.map(load_and_preprocess, num_parallel_calls=AUTOTUNE)
train_ds = train_ds.shuffle(1024).batch(batch_size).prefetch(AUTOTUNE)
```
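`load_and_preprocess` is not defined above; a plausible sketch is below. The target size and channel count are my assumptions, chosen to match the `(231, 232, 4)` input shape used by the augmentation layer later on:

```python
import tensorflow as tf

IMG_SIZE = (231, 232)  # assumed; matches the InputLayer shape used later

def load_and_preprocess(path, label):
    """Read a spectrogram PNG, resize it, and scale pixels to [0, 1]."""
    img = tf.io.read_file(path)
    img = tf.image.decode_png(img, channels=4)   # RGBA PNGs
    img = tf.image.resize(img, IMG_SIZE) / 255.0
    return img, label
```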

I used a simple image-level augmentation pipeline on top of the spectrograms:

```python
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(231, 232, 4)),
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])
```

---

# 8. CNN Architecture

The CNN learns time-frequency patterns from the Mel spectrogram images.

Key features:

* Multiple Conv2D + BatchNorm blocks
* Dropout
* L2 regularization
* Softmax output

```python
from tensorflow.keras.layers import (BatchNormalization, Conv2D, Dense,
                                     Dropout, Flatten, MaxPooling2D)
from tensorflow.keras.models import Sequential
from tensorflow.keras.regularizers import l2

model = Sequential([
    data_augmentation,
    Conv2D(32, (3,3), padding='same', activation='relu', kernel_regularizer=l2(weight_decay)),
    BatchNormalization(),
    MaxPooling2D((2,2)),
    Dropout(0.2),
    # ... more layers ...
    Flatten(),
    Dense(num_classes, activation='softmax')
])
```

---

# 9. Training Strategy

```python
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=10)
early_stopping = EarlyStopping(monitor='val_loss', patience=40, restore_best_weights=True)

history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=50,
    callbacks=[reduce_lr, early_stopping]
)
```

The model converges smoothly while avoiding overfitting.
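A quick way to check that convergence claim is to plot the `history` object returned by `fit` (a standard sketch, not code from the notebook):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

def plot_history(history):
    """Plot training vs. validation loss and accuracy side by side."""
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    ax_loss.plot(history.history["loss"], label="train")
    ax_loss.plot(history.history["val_loss"], label="val")
    ax_loss.set_title("Loss")
    ax_loss.legend()
    ax_acc.plot(history.history["accuracy"], label="train")
    ax_acc.plot(history.history["val_accuracy"], label="val")
    ax_acc.set_title("Accuracy")
    ax_acc.legend()
    return fig
```

A widening gap between the train and validation curves is the usual sign of overfitting.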

---

# 10. Evaluation

Performance is evaluated using:

* Accuracy
* Precision, recall, F1-score
* Confusion matrix
* ROC/AUC curves

```python
import numpy as np
from sklearn.metrics import classification_report

y_pred = np.argmax(model.predict(test_ds), axis=1)
print(classification_report(y_true, y_pred, target_names=le.classes_))
```

Confusion matrix:

```python
import seaborn as sns
from sklearn.metrics import confusion_matrix

sns.heatmap(confusion_matrix(y_true, y_pred), annot=True, cmap='Blues')
plt.title("Confusion Matrix")
plt.show()
```
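The ROC/AUC computation is not shown above; for the multi-class case, a one-vs-rest macro-averaged AUC over the softmax probabilities is a common choice (a scikit-learn sketch, not the notebook's code):

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

def macro_auc(y_true, y_prob, num_classes):
    """Macro-averaged one-vs-rest ROC AUC from class probabilities."""
    y_bin = label_binarize(y_true, classes=list(range(num_classes)))
    return roc_auc_score(y_bin, y_prob, average="macro")
```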

---

# 11. Saving the Model & Dataset

```python
import shutil

model.save("Audio_Model_Classification.h5")
shutil.make_archive("/content/spectrograms", 'zip', "/content/spectrograms")
```

The entire spectrogram dataset is also zipped for sharing or deployment.
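For deployment, the saved `.h5` file can be reloaded and used for prediction. A minimal sketch (the input batch must be preprocessed exactly as during training, which is not repeated here):

```python
import numpy as np
import tensorflow as tf

def classify_spectrograms(model_path, image_batch):
    """Load a saved Keras model and return predicted class indices."""
    model = tf.keras.models.load_model(model_path)
    probs = model.predict(image_batch, verbose=0)  # softmax probabilities
    return np.argmax(probs, axis=1)
```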

---

# Final Notes

This project demonstrates:

* How to clean & prepare raw audio at a professional level
* Audio augmentation best practices
* How Mel spectrograms unlock CNN performance
* A full TensorFlow training pipeline
* Proper evaluation, reporting, and dataset integrity

If you're working on sound recognition, speech tasks, or environmental audio detection, this pipeline gives you a **complete production-grade foundation**.

---

# **Results**

> **Note:** Click the image below to view a video showcasing the project's results.

<a href="https://files.catbox.moe/suzziy.mp4">
<img src="https://images.unsplash.com/photo-1611162616475-46b635cb6868?q=80&w=1974&auto=format&fit=crop&ixlib=rb-4.1.0&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D" width="400">
</a>

<hr style="border-bottom: 5px solid gray; margin-top: 10px;">

> **Note:** If the video above does not play, you can access it directly via the link below.

[Watch Demo Video](Results/Spectrogram_CNN_Audio_Classification.mp4)