---
license: apache-2.0
language:
- en
tags:
- audio-classification
- pronunciation
- audio-quality
- whisper
- speech
library_name: transformers
base_model: openai/whisper-base
pipeline_tag: audio-classification
---

# ReadAI - Pronunciation & Audio Quality Assessment Models

This repository contains two models for audio assessment:

## 1. Pronunciation Assessment Model (`pronunciation_v3/`)

A fine-tuned **WhisperForAudioClassification** model (based on `openai/whisper-base`) for binary pronunciation quality classification.

### Labels

| Label | ID |
|-------|-----|
| Bad | 0 |
| Good | 1 |

### Usage

```python
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification, pipeline

repo_id = "jecallora/readai"
subfolder = "pronunciation_v3"

# pipeline() has no dedicated `subfolder` parameter, so load the model and
# feature extractor from the subfolder explicitly and pass them in.
model = AutoModelForAudioClassification.from_pretrained(repo_id, subfolder=subfolder)
feature_extractor = AutoFeatureExtractor.from_pretrained(repo_id, subfolder=subfolder)

classifier = pipeline(
    task="audio-classification",
    model=model,
    feature_extractor=feature_extractor,
)

result = classifier("audio_sample.wav")
print(result)
# Example output: [{'label': 'Good', 'score': 0.95}, {'label': 'Bad', 'score': 0.05}]
```

### Model Details

- **Architecture:** WhisperForAudioClassification
- **Base Model:** openai/whisper-base
- **Sampling Rate:** 16,000 Hz
- **Input Format:** Audio (WAV, MP3, etc.)
- **Framework:** PyTorch (safetensors)

---

## 2. Audio Quality Classifier (`audio_quality/`)

A scikit-learn classifier for audio quality assessment.

### Labels

| Quality | Score |
|-----------|-------|
| Very Good | 100 |
| Good | 75 |
| Bad | 50 |
| Very Bad | 25 |

### Files

- `audio_classifier.joblib`: trained classifier
- `scaler.joblib`: `StandardScaler` for feature normalization
- `label_encoder.joblib`: label encoder for decoding predictions into quality labels

### Usage

```python
import joblib
import librosa
import numpy as np

# Load the classifier, scaler, and label encoder
# (paths are relative to a local copy of this repository)
classifier = joblib.load("audio_quality/audio_classifier.joblib")
scaler = joblib.load("audio_quality/scaler.joblib")
label_encoder = joblib.load("audio_quality/label_encoder.joblib")

# Load the audio as 16 kHz mono
y, sr = librosa.load("audio_sample.wav", sr=16000, mono=True)

# Your feature extraction pipeline here...
# It must produce the same feature vector the scaler and classifier were trained on.
# features = extract_features(y)
# scaled = scaler.transform([features])
# prediction = classifier.predict(scaled)
# label = label_encoder.inverse_transform(prediction)
```

### Dependencies

- scikit-learn==1.5.0
- librosa==0.10.2.post1
- numpy==1.26.4
- joblib

---

## Requirements

```
transformers>=4.41.2
torch>=2.3.1
torchaudio>=2.3.1
scikit-learn>=1.5.0
librosa>=0.10.2.post1
soundfile>=0.12.1
numpy>=1.26.4
```
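Both models ultimately return string labels. As a hedged sketch that is not part of this repository, the helpers below pick the top label from an audio-classification pipeline result (the list-of-dicts format shown in the pronunciation usage example) and map decoded audio-quality labels to the scores in the audio-quality Labels table. They assume the label encoder's classes are exactly the strings `Very Good`, `Good`, `Bad`, and `Very Bad`; the names `top_label`, `quality_scores`, and `QUALITY_SCORES` are hypothetical.

```python
from typing import Any, Dict, List, Mapping, Sequence

# Hypothetical mapping copied from the audio-quality "Labels" table.
# Assumes label_encoder.inverse_transform returns exactly these strings.
QUALITY_SCORES: Dict[str, int] = {
    "Very Good": 100,
    "Good": 75,
    "Bad": 50,
    "Very Bad": 25,
}


def top_label(pipeline_output: Sequence[Mapping[str, Any]]) -> str:
    """Return the label with the highest score from an audio-classification result."""
    best = max(pipeline_output, key=lambda item: item["score"])
    return best["label"]


def quality_scores(labels: Sequence[str]) -> List[int]:
    """Map decoded audio-quality labels to their numeric scores."""
    return [QUALITY_SCORES[label] for label in labels]


if __name__ == "__main__":
    # Example result in the format printed by the pronunciation pipeline.
    example = [{"label": "Good", "score": 0.95}, {"label": "Bad", "score": 0.05}]
    print(top_label(example))  # Good
    print(quality_scores(["Very Good", "Bad"]))  # [100, 50]
```

`top_label` uses `max` rather than indexing `[0]`, so it does not depend on the results being sorted by score. An unknown label in `quality_scores` raises `KeyError`, which surfaces a mismatch between the label encoder's classes and the table instead of silently returning a default.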