Commit d0ec0b6 (parent 6d719fd) by nice-bill: readme updated
README.md ADDED
# VigilAudio: AI-Powered Audio Moderation Engine

**A production-ready audio emotion classification system built for content moderation.**

VigilAudio is the first phase of a multimodal moderation suite designed to detect distress, aggression, and safety risks in user-generated content. Unlike traditional keyword-based moderation, VigilAudio listens to the *tone* of the voice, detecting anger, fear, or distress even when the words themselves are neutral.

## Key Features

* **State-of-the-Art Architecture:** Fine-tuned `facebook/wav2vec2-base-960h` Transformer model.
* **High Accuracy:** Achieved **82% accuracy** on a 7-class emotion dataset (Angry, Happy, Sad, Fearful, Disgusted, Neutral, Surprised).
* **Production Pipeline:** End-to-end data harmonization, stratified splitting, and efficient feature extraction.
* **Cloud-Native Training:** Optimized training scripts for Google Colab (T4 GPU), reducing training time from 50+ hours to under 20 minutes.

## Technology Stack

* **Language:** Python 3.10+
* **Environment:** `uv` (for fast dependency management)
* **ML Framework:** PyTorch, Hugging Face Transformers, Accelerate
* **Audio Processing:** Librosa, Soundfile
* **Data Ops:** Pandas, Scikit-learn

## Installation

1. **Clone the repository:**
   ```bash
   git clone https://github.com/yourusername/vigilaudio.git
   cd vigilaudio
   ```

2. **Initialize the environment:**
   We use `uv` for lightning-fast setups.
   ```bash
   uv sync
   ```
## Execution Guide

### 1. Data Pipeline (Harmonization)
Turn raw, messy folders into a clean, stratified dataset.
```bash
uv run src/data/harmonize.py
```
* **Input:** Raw audio folders (`Emotions/Angry`, `Emotions/Happy`, ...)
* **Output:** `data/processed/metadata.csv` (unified labels + 80/10/10 splits)
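The 80/10/10 split is done in two stratified stages: first 80% train vs. 20% held out, then the held-out portion is halved into validation and test. A minimal, self-contained sketch with scikit-learn (the `emotion` and `split` column names are assumed from the pipeline description; the data here is synthetic):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the harmonized metadata: 100 clips, 2 emotions
df = pd.DataFrame({
    "path": [f"clip_{i}.wav" for i in range(100)],
    "emotion": ["Angry", "Happy"] * 50,
})

# Stage 1: 80% train, 20% held out, stratified by emotion
train_df, temp_df = train_test_split(
    df, test_size=0.2, stratify=df["emotion"], random_state=42
)
# Stage 2: split the held-out 20% evenly into validation and test
val_df, test_df = train_test_split(
    temp_df, test_size=0.5, stratify=temp_df["emotion"], random_state=42
)

train_df = train_df.assign(split="train")
val_df = val_df.assign(split="val")
test_df = test_df.assign(split="test")
final_df = pd.concat([train_df, val_df, test_df])
print(final_df["split"].value_counts())  # 80 train / 10 val / 10 test
```

Stratifying both stages keeps each emotion's proportion identical across all three splits, which matters most for under-represented classes.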
### 2. Feature Extraction (Local Test)
Verify that your machine can process audio using the Wav2Vec2 processor.
```bash
uv run src/features/extractor.py
```
* **Output:** Prints the embedding shape `(768,)` for a sample file.
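The `(768,)` vector comes from mean-pooling Wav2Vec2's last hidden state over time: the model emits one 768-dimensional frame per roughly 20 ms of audio, and averaging the frames gives a fixed-size clip embedding. A NumPy sketch of just the pooling step (the hidden states below are random stand-ins for real model output):

```python
import numpy as np

# Fake Wav2Vec2 output: batch of 1, ~249 time frames (about 5 s of 16 kHz audio), 768 dims
hidden_states = np.random.randn(1, 249, 768)

# Mean-pool over the time axis, then flatten the batch dim -> (768,)
embedding = hidden_states.mean(axis=1).flatten()
print(embedding.shape)  # (768,)
```

In the real extractor, `hidden_states` would be `model(**inputs).last_hidden_state`; the pooling is identical.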
### 3. Model Training (The "Professional" Way)
Training a Transformer on a CPU is too slow, so training runs on Google Colab.

1. Upload `train_colab.py` and your `Emotions` folder to Google Drive.
2. Open `VigilAudio_Fine_Tuning.ipynb` in Colab.
3. Set the runtime to **T4 GPU**.
4. Run the training script.

* **Result:** A fine-tuned model saved to `wav2vec2-finetuned/`.
* **Performance:** ~82% accuracy / 0.81 F1 score.
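At inference time, the fine-tuned model produces one logit per emotion class; a softmax turns these into probabilities and an argmax picks the label. A hedged NumPy sketch of that final step (the logit values are illustrative, and the label order is assumed to match the dataset's seven classes):

```python
import numpy as np

LABELS = ["Angry", "Happy", "Sad", "Fearful", "Disgusted", "Neutral", "Surprised"]

logits = np.array([2.7, 0.1, 0.4, 1.9, -0.3, 0.2, -1.1])  # illustrative values

# Numerically stable softmax: subtract the max before exponentiating
probs = np.exp(logits - logits.max())
probs /= probs.sum()

prediction = LABELS[int(np.argmax(probs))]
print(prediction)  # "Angry" -- the highest logit wins
```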
## Dataset

The model was trained on a combined dataset of **12,798 audio recordings** across 7 emotions.
* **Source:** [Kaggle - Audio Emotions Dataset](https://www.kaggle.com/datasets/uldisvalainis/audio-emotions)
* **Composition:** An amalgam of the CREMA-D, TESS, RAVDESS, and SAVEE datasets.

## Results Summary

| Model | Architecture | Training Time | Accuracy |
|-------|--------------|---------------|----------|
| Baseline | Simple MLP (CPU) | ~3 hours | 54% |
| **VigilAudio** | **Fine-Tuned Wav2Vec2 (GPU)** | **17 mins** | **82%** |

## License

MIT
src/data/harmonize.py CHANGED
```diff
@@ -6,7 +6,7 @@ from tqdm import tqdm
 import librosa

 def harmonize_data(raw_data_path, output_path):
-    print(f"🔍 Scanning directory: {raw_data_path}")
+    print(f"Scanning directory: {raw_data_path}")

     data = []
     # Folder names are our labels
@@ -20,7 +20,7 @@ def harmonize_data(raw_data_path, output_path):
         folder_path = Path(raw_data_path) / folder
         files = list(folder_path.glob("*.wav"))

-        print(f"📂 Processing {folder}: {len(files)} files")
+        print(f"Processing {folder}: {len(files)} files")

         for file_path in tqdm(files, desc=f"Processing {folder}"):
             try:
@@ -33,16 +33,16 @@ def harmonize_data(raw_data_path, output_path):
                     "path": str(file_path.absolute())
                 })
             except Exception as e:
-                print(f"❌ Error processing {file_path}: {e}")
+                print(f"Error processing {file_path}: {e}")

     df = pd.DataFrame(data)

     if df.empty:
-        print("❌ No data found! Please check the raw_data_path.")
+        print("No data found! Please check the raw_data_path.")
         return

     # --- Stratified Splitting (80/10/10) ---
-    print("\n⚖️ Creating stratified splits...")
+    print("\nCreating stratified splits...")

     # First split: Train vs Temp (20%)
     train_df, temp_df = train_test_split(
@@ -66,9 +66,9 @@ def harmonize_data(raw_data_path, output_path):
     os.makedirs(os.path.dirname(output_path), exist_ok=True)
     final_df.to_csv(output_path, index=False)

-    print(f"\n✅ Harmonization Complete!")
-    print(f"📊 Total files: {len(final_df)}")
-    print(f"📁 Metadata saved to: {output_path}")
+    print(f"\nHarmonization Complete!")
+    print(f"Total files: {len(final_df)}")
+    print(f"Metadata saved to: {output_path}")
     print("\nSplit Statistics:")
     print(final_df.groupby(['split', 'emotion']).size().unstack(fill_value=0))
```
src/features/build_features.py CHANGED
```diff
@@ -13,7 +13,7 @@ def build_all_features(metadata_path, output_dir):
     df = pd.read_csv(metadata_path)
     extractor = AudioFeatureExtractor()

-    print(f"📊 Starting bulk extraction for {len(df)} files...")
+    print(f"Starting bulk extraction for {len(df)} files...")

     # 2. Loop with progress bar
     # We use a custom naming scheme: {split}_{original_filename}.npy
@@ -32,8 +32,8 @@
         if embedding is not None:
             np.save(embedding_path, embedding)

-    print(f"\n✅ Bulk Extraction Complete!")
-    print(f"📁 Embeddings saved to: {output_dir.absolute()}")
+    print(f"\nBulk Extraction Complete!")
+    print(f"Embeddings saved to: {output_dir.absolute()}")

 if __name__ == "__main__":
     METADATA = "data/processed/metadata.csv"
@@ -42,4 +42,4 @@ if __name__ == "__main__":
     if os.path.exists(METADATA):
         build_all_features(METADATA, OUTPUT)
     else:
-        print("❌ Metadata not found. Run harmonize.py first.")
+        print("Metadata not found. Run harmonize.py first.")
```
src/features/extractor.py CHANGED
```diff
@@ -12,8 +12,8 @@ class AudioFeatureExtractor:
         self.cache_dir = Path(cache_dir)
         self.cache_dir.mkdir(parents=True, exist_ok=True)

-        print(f"🚀 Loading model: {model_name}...")
-        print(f"📦 Cache directory: {self.cache_dir.absolute()}")
+        print(f"Loading model: {model_name}...")
+        print(f"Cache directory: {self.cache_dir.absolute()}")

         # Load processor and model with explicit cache_dir
         self.processor = Wav2Vec2Processor.from_pretrained(model_name, cache_dir=self.cache_dir)
@@ -24,7 +24,7 @@ class AudioFeatureExtractor:
         self.model.to(self.device)
         self.model.eval()

-        print(f"✅ Model loaded on {self.device}")
+        print(f"Model loaded on {self.device}")

     def extract(self, audio_path):
         """
@@ -48,7 +48,7 @@ class AudioFeatureExtractor:
             return embeddings.cpu().numpy().flatten()

         except Exception as e:
-            print(f"❌ Error extracting features from {audio_path}: {e}")
+            print(f"Error extracting features from {audio_path}: {e}")
             return None

 if __name__ == "__main__":
@@ -66,7 +66,7 @@ if __name__ == "__main__":
     if embedding is not None:
         print(f"\n✨ Success!")
         print(f"File: {sample_path}")
-        print(f"Embedding shape: {embedding.shape}") # Should be (768,)
+        print(f"Embedding shape: {embedding.shape}")
         print(f"First 5 values: {embedding[:5]}")
     else:
-        print("❌ Metadata not found. Please run harmonization first.")
+        print("Metadata not found. Please run harmonization first.")
```