Add model card README (#1)

- Add model card README (e4780aa437242c404ed00812e792d55918383efb)

Co-authored-by: Rj Francisco <rjfrncsc@users.noreply.huggingface.co>

Files changed (1)

README.md +109 -0

README.md CHANGED

@@ -1,3 +1,112 @@
 ---
 license: apache-2.0
+language:
+  - en
+tags:
+  - audio-classification
+  - pronunciation
+  - audio-quality
+  - whisper
+  - speech
+library_name: transformers
+base_model: openai/whisper-base
+pipeline_tag: audio-classification
 ---
+# ReadAI - Pronunciation & Audio Quality Assessment Models
+This repository contains two models for audio assessment: a pronunciation classifier and an audio quality classifier.
+## 1. Pronunciation Assessment Model (`pronunciation_v3/`)
+A fine-tuned **WhisperForAudioClassification** model (based on `openai/whisper-base`) for binary pronunciation quality classification.
+### Labels
+| Label | ID |
+|-------|-----|
+| Bad   | 0   |
+| Good  | 1   |
+### Usage
+```python
+from transformers import pipeline
+classifier = pipeline(
+    task="audio-classification",
+    model="jecallora/readai",
+    subfolder="pronunciation_v3"
+)
+result = classifier("audio_sample.wav")
+print(result)
+# Example output (scores are illustrative):
+# [{'label': 'Good', 'score': 0.95}, {'label': 'Bad', 'score': 0.05}]
+```
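If the installed `transformers` version does not forward `subfolder` through `pipeline(...)`, the model can probably be loaded explicitly instead. The following is a minimal sketch, not the author's method. It assumes that `pronunciation_v3/` contains both the model config and the feature extractor config, and that the label IDs match the Labels table. The helper names `scores_from_logits` and `classify` are hypothetical.

```python
import math

REPO_ID = "jecallora/readai"
SUBFOLDER = "pronunciation_v3"
ID2LABEL = {0: "Bad", 1: "Good"}  # copied from the Labels table above


def scores_from_logits(logits, id2label=ID2LABEL):
    """Softmax the raw logits and return pipeline-style dicts, best first."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    ranked = [{"label": id2label[i], "score": e / total} for i, e in enumerate(exps)]
    return sorted(ranked, key=lambda r: r["score"], reverse=True)


def classify(path):
    """Hypothetical helper: load from the subfolder and score one audio file."""
    # Heavy imports are kept local so the helper above works without them.
    import librosa
    import torch
    from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

    # Assumption: both configs live under pronunciation_v3/ in the repository.
    extractor = AutoFeatureExtractor.from_pretrained(REPO_ID, subfolder=SUBFOLDER)
    model = AutoModelForAudioClassification.from_pretrained(REPO_ID, subfolder=SUBFOLDER)
    model.eval()

    audio, _ = librosa.load(path, sr=16000, mono=True)  # resample to 16 kHz mono
    inputs = extractor(audio, sampling_rate=16000, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return scores_from_logits(logits)
```

Calling `classify("audio_sample.wav")` should return a list shaped like the pipeline output shown above, provided the assumptions about the repository layout hold.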
+### Model Details
+- **Architecture:** WhisperForAudioClassification
+- **Base Model:** openai/whisper-base
+- **Sampling Rate:** 16,000 Hz
+- **Input Format:** Audio (WAV, MP3, etc.)
+- **Framework:** PyTorch (safetensors)
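When passing raw audio arrays rather than a file path, it may help to confirm that a file already matches the 16,000 Hz rate listed above. This sketch uses only the standard-library `wave` module, so it covers uncompressed WAV files only. The function name `wav_matches_target` is hypothetical, and the mono requirement is an assumption not stated in the model details.

```python
import wave

TARGET_RATE = 16_000  # sampling rate listed under Model Details


def wav_matches_target(path, target_rate=TARGET_RATE):
    """Return True if the WAV file is mono and already sampled at target_rate."""
    with wave.open(path, "rb") as wav:
        # Mono is an assumption here; only the rate is documented above.
        return wav.getframerate() == target_rate and wav.getnchannels() == 1
```

Files that fail the check can be resampled on load, for example with `librosa.load(path, sr=16000, mono=True)`.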
+---
+## 2. Audio Quality Classifier (`audio_quality/`)
+A scikit-learn classifier for audio quality assessment.
+### Labels
+| Quality   | Score |
+|-----------|-------|
+| Very Good | 100   |
+| Good      | 75    |
+| Bad       | 50    |
+| Very Bad  | 25    |
+### Files
+- `audio_classifier.joblib` — Trained classifier
+- `scaler.joblib` — StandardScaler for feature normalization
+- `label_encoder.joblib` — Label encoder
+### Usage
+```python
+import joblib
+import librosa
+import numpy as np
+# Load models
+classifier = joblib.load("audio_quality/audio_classifier.joblib")
+scaler = joblib.load("audio_quality/scaler.joblib")
+label_encoder = joblib.load("audio_quality/label_encoder.joblib")
+# Load the audio as 16 kHz mono
+y, sr = librosa.load("audio_sample.wav", sr=16000, mono=True)
+# Your feature extraction pipeline here. It must produce the same features,
+# in the same order, that the scaler and classifier were fitted on.
+# features = extract_features(y)
+# scaled = scaler.transform([features])
+# prediction = classifier.predict(scaled)
+# label = label_encoder.inverse_transform(prediction)
+```
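The commented-out steps above can be wrapped in one function once a feature vector is available. This is a sketch under assumptions, not the repository's own code: `predict_quality` and `QUALITY_SCORES` are hypothetical names, the scores are copied from the Labels table, and the exact label strings stored in `label_encoder.joblib` are assumed to match that table. Feature extraction is deliberately left to the caller.

```python
# Scores copied from the Labels table. Whether the label encoder stores
# exactly these strings is an assumption.
QUALITY_SCORES = {"Very Good": 100, "Good": 75, "Bad": 50, "Very Bad": 25}


def predict_quality(features, classifier, scaler, label_encoder):
    """Scale one feature vector, classify it, and return (label, score)."""
    scaled = scaler.transform([features])          # a single-row batch
    prediction = classifier.predict(scaled)        # encoded class index
    label = str(label_encoder.inverse_transform(prediction)[0])
    # score is None if the label is not one of the four table entries
    return label, QUALITY_SCORES.get(label)
```

With the objects loaded via `joblib.load(...)` as shown above, a call would look like `predict_quality(features, classifier, scaler, label_encoder)`.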
+### Dependencies
+- scikit-learn==1.5.0
+- librosa==0.10.2.post1
+- numpy==1.26.4
+- joblib
+---
+## Requirements
+```
+transformers>=4.41.2
+torch>=2.3.1
+torchaudio>=2.3.1
+scikit-learn>=1.5.0
+librosa>=0.10.2.post1
+soundfile>=0.12.1
+numpy>=1.26.4
+```