# 🔊 Audio Section: Frugal AI Challenge

## Strategy for solving the problem

To minimize energy consumption, we deliberately **chose not to use deep learning techniques** such as CNN-based spectrogram analysis, LSTMs on raw audio signals, or transformer models, which are generally **more computationally intensive**.

Instead, a more **lightweight approach** was adopted:
- Feature extraction from the audio signal (MFCCs and spectral contrast; the code in this section computes spectral contrast only)
- Training a simple machine learning model (decision tree) on these extracted features

### Potential improvements (not yet tested)
- Hyperparameter tuning for better performance
- Exploring alternative lightweight ML models, such as logistic regression or k-nearest neighbors
- Feature extraction without Librosa, using NumPy directly to compute basic signal properties, further reducing dependencies and overhead.
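
As a rough illustration of the last idea, the sketch below computes three basic descriptors (RMS energy, zero-crossing rate, and spectral centroid) with NumPy alone. The helper name `basic_signal_features` and the choice of descriptors are assumptions made for this example. They have not been evaluated on the challenge data, and the test signal is a synthetic tone rather than a dataset clip.

```python
import numpy as np

def basic_signal_features(audio_array, sampling_rate):
    """Hypothetical helper: a few cheap descriptors computed with NumPy only."""
    x = np.asarray(audio_array, dtype=np.float64)

    # Root-mean-square energy of the whole clip
    rms = np.sqrt(np.mean(x ** 2))

    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs
    signs = np.signbit(x)
    zcr = np.mean(signs[1:] != signs[:-1])

    # Spectral centroid: magnitude-weighted mean frequency of the spectrum (Hz)
    magnitude = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sampling_rate)
    centroid = np.sum(freqs * magnitude) / (np.sum(magnitude) + 1e-12)

    return np.array([rms, zcr, centroid])

# Synthetic 1-second, 440 Hz tone sampled at 12 kHz (not a dataset clip)
sr = 12000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
features = basic_signal_features(tone, sr)
print(features)
```

Each clip is reduced to a three-value vector, which could be passed to the same classifier in place of the spectral contrast features.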

## Installation and library imports

In [1]:
!pip install librosa soundfile datasets



In [2]:
import pandas as pd
from datasets import load_dataset
from IPython.display import Audio
import librosa
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from tqdm import tqdm
import joblib
import itertools

In [3]:
dataset = load_dataset("rfcx/frugalai", streaming=True)
example = next(iter(dataset['train']))
print(example)
# The 'array' field holds the decoded waveform, not a URL
audio_array = example['audio']['array']
Audio(audio_array, rate=example['audio']['sampling_rate'])

{'audio': {'path': 'pooks_6ebcaf77-aa92-4f10-984e-ecc5a919bcbb_41-44.wav', 'array': array([-0.00915527,  0.01025391, -0.01452637, ..., -0.00628662,
        0.00064087,  0.00137329]), 'sampling_rate': 12000}, 'label': 1}


In [8]:
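def extract_features(audio_array, sampling_rate):
    """
    Return the time-averaged spectral contrast of an audio signal
    (one mean value per frequency band).
    """
    # Note: sampling_rate is not forwarded to librosa, so the band edges are
    # computed with librosa's default sr of 22050 Hz rather than the clip's rate.
    contrast = librosa.feature.spectral_contrast(y=audio_array)
    # Average over the time axis to get a fixed-length feature vector
    return np.mean(contrast, axis=1)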

# Collect one feature vector and one label per clip
X, y = [], []
dataset_train = dataset['train']

for example in tqdm(dataset_train):
    audio_array = example['audio']['array']
    sampling_rate = example['audio']['sampling_rate']

    # Extract audio features
    X.append(extract_features(audio_array, sampling_rate))
    y.append(example['label'])

# The feature vectors are small, so plain lists are enough;
# convert to arrays once at the end instead of stacking batch by batch
X_total = np.array(X)
y_total = np.array(y)

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X_total, y_total, test_size=0.2, random_state=42)

# Train a decision tree classifier
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

# Evaluate the model
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))

35277it [06:27, 91.13it/s] 


              precision    recall  f1-score   support

           0       0.80      0.79      0.80      2912
           1       0.86      0.86      0.86      4144

    accuracy                           0.83      7056
   macro avg       0.83      0.83      0.83      7056
weighted avg       0.83      0.83      0.83      7056
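
The report above comes from an untuned decision tree. As a hedged sketch of how the untested ideas listed earlier (alternative lightweight models and hyperparameter tuning) might be tried, the example below compares three scikit-learn classifiers with 5-fold cross-validation and then runs a small grid search. It uses synthetic data from `make_classification` as a stand-in for the extracted features, and the grid values are arbitrary illustrative choices, so the scores it prints say nothing about the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the extracted features (assumption: not the real data)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=7, n_informative=5, random_state=42
)

# Candidate lightweight models; the linear and distance-based ones get scaled inputs
candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

scores = {}
for name, model in candidates.items():
    # Mean accuracy over 5 cross-validation folds
    scores[name] = cross_val_score(model, X_demo, y_demo, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")

# Small grid search over two decision tree hyperparameters
param_grid = {"max_depth": [3, 5, 10, None], "min_samples_leaf": [1, 5, 20]}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X_demo, y_demo)
print(search.best_params_, round(search.best_score_, 3))
```

On the real features, `X_demo` and `y_demo` would be replaced by `X_train` and `y_train`, keeping the held-out test split untouched until a final model is chosen.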



## Model export

In [9]:
model_filename = "model_audio.pkl"
joblib.dump(clf, model_filename)
print(f"Model name : {model_filename}")

Model name : model_audio.pkl
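
To reuse the exported file, the model can be read back with `joblib.load`. The sketch below shows a save and reload round trip on a small stand-in model trained on random data. The names `demo_clf` and `model_audio_demo.pkl` are hypothetical and not part of the original notebook.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Stand-in model trained on random data (assumption: replaces the real clf)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 7))
y_demo = (X_demo[:, 0] > 0).astype(int)
demo_clf = DecisionTreeClassifier(random_state=0).fit(X_demo, y_demo)

# Save to a temporary directory, then load the model back from disk
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_audio_demo.pkl")
    joblib.dump(demo_clf, path)
    loaded_clf = joblib.load(path)

# The reloaded model should reproduce the original predictions
same_predictions = np.array_equal(
    demo_clf.predict(X_demo), loaded_clf.predict(X_demo)
)
print(f"Reloaded model matches original: {same_predictions}")
```

As with any pickle-based format, a `.pkl` file should only be loaded from a trusted source, ideally with the same scikit-learn version that wrote it.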
