Commit · 72983a7
Parent(s):
voice authenticity openenv - initial submission

Files changed:
- Dockerfile +18 -0
- README.md +341 -0
- environment/__init__.py +0 -0
- environment/data/features.npy +0 -0
- environment/data/features_adversarial.npy +0 -0
- environment/data/features_compressed.npy +0 -0
- environment/data/features_raw.npy +0 -0
- environment/data/labels.npy +0 -0
- environment/data/labels_adversarial.npy +0 -0
- environment/data/labels_compressed.npy +0 -0
- environment/data/mean.npy +0 -0
- environment/data/std.npy +0 -0
- environment/env.py +98 -0
- environment/graders.py +33 -0
- environment/models.py +21 -0
- inference.py +132 -0
- openenv.yaml +39 -0
- requirements.txt +8 -0
- scripts/download_data.py +30 -0
- scripts/extract_features.py +223 -0
Dockerfile
ADDED
@@ -0,0 +1,18 @@
FROM python:3.10-slim

WORKDIR /app

RUN apt-get update && apt-get install -y \
    libsndfile1 \
    praat \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

ENV API_BASE_URL=https://router.huggingface.co/v1
ENV MODEL_NAME=Qwen/Qwen2.5-72B-Instruct

CMD ["python", "inference.py"]
README.md
ADDED
@@ -0,0 +1,341 @@
# Voice Authenticity Detection – OpenEnv Environment

A reinforcement learning environment for training and evaluating AI agents
that detect synthetic (AI-generated) speech across real-world degradation
conditions.

> Voice fraud is a growing crisis. This environment trains agents to detect
> synthetic speech under clean, compressed, and adversarial conditions,
> directly applicable to fraud detection, content moderation, and voice
> authentication systems.

---

## Real-World Motivation

AI-generated voices (ElevenLabs, Coqui, etc.) are increasingly used for:
- **Phone fraud** and social engineering attacks
- **Deepfake audio** in misinformation campaigns
- **Identity spoofing** in voice authentication systems

This environment provides a structured benchmark for training agents to
detect synthetic speech under realistic degradation conditions that existing
classifiers struggle with.

---

## Environment Overview

The environment serves **48-dimensional feature vectors** extracted from
audio samples. Agents must classify each sample as real or synthetic,
with a confidence score.

### Why Feature Vectors, Not Raw Audio?
- Fits within 2 vCPU / 8 GB RAM constraints
- Feature extraction is done offline, so inference is fast
- Interpretable observations for LLM-based agents

### Dataset
- **Real speech**: 250 samples from
  `garystafford/deepfake-audio-detection` (authentic human recordings)
- **Synthetic speech**: 250 samples (ElevenLabs, Hume AI, and other
  TTS platforms)
- **Total**: 500 labeled samples across 3 task variants

---

## Observation Space

Each observation is a **48-dimensional float32 vector**:

| Index | Feature | Description |
|-------|---------|-------------|
| 0–19  | MFCC means | Timbre and spectral shape |
| 20–39 | MFCC std devs | Variation; synthetic voices are too stable |
| 40    | Zero crossing rate | Signal sign changes per frame |
| 41    | Spectral centroid | Brightness of the sound |
| 42    | Jitter | Frequency instability; real voices wobble slightly |
| 43    | Shimmer | Amplitude instability; real voices vary naturally |
| 44    | HNR | Harmonics-to-noise ratio; synthetic voices are too clean |
| 45–47 | Compression artifacts | Spectral bandwidth, rolloff, RMS energy |

### Key Discriminators
```
Real speech:      jitter > 0.025, shimmer > 0.10, hnr < 12.0
Synthetic speech: jitter < 0.020, shimmer < 0.09, hnr > 12.0
```
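For intuition, here is a minimal rule-of-thumb baseline that uses only the three raw values surfaced in the observation hint. It is a sketch for illustration, not part of the environment: `parse_hint` is a hypothetical helper, and it assumes the `jitter=… shimmer=… hnr=…` hint format produced by `environment/env.py` in this commit.

```python
import re

def parse_hint(hint: str) -> dict:
    """Pull jitter/shimmer/hnr out of the hint string produced by env.py."""
    pairs = re.findall(r"(jitter|shimmer|hnr)=([-\d.]+)", hint)
    return {name: float(value) for name, value in pairs}

def rule_of_thumb(hint: str) -> dict:
    """Threshold baseline: majority vote over the three key discriminators."""
    v = parse_hint(hint)
    synthetic_votes = sum([
        v["jitter"] < 0.020,
        v["shimmer"] < 0.09,
        v["hnr"] > 12.0,
    ])
    label = 1 if synthetic_votes >= 2 else 0
    votes_for_label = synthetic_votes if label == 1 else 3 - synthetic_votes
    confidence = min(0.5 + 0.1 * votes_for_label, 0.85)
    return {"label": label, "confidence": confidence, "reasoning": "threshold vote"}
```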
### Observation Schema (Pydantic)
```python
class VoiceObservation(BaseModel):
    features: List[float]    # 48-dim feature vector (normalized)
    task_name: str           # current task
    step_number: int         # current step
    difficulty: str          # easy | medium | hard
    sample_id: int           # index into dataset
    hint: Optional[str]      # key raw values + task warning
```

---

## Action Space

```python
class VoiceAction(BaseModel):
    label: int          # 0 = real, 1 = synthetic
    confidence: float   # confidence in [0.0, 1.0]
    reasoning: str      # brief explanation
```

---

## Tasks

### Task 1 – Clean Detection (Easy)
- **Description**: Classify real vs synthetic speech from clean,
  unmodified audio features
- **Difficulty**: Easy
- **Expected agent score**: 0.7–1.0
- **Scoring**: Binary; correct = 1.0, incorrect = 0.0

### Task 2 – Compressed Detection (Medium)
- **Description**: Classify speech after codec compression degradation.
  Jitter and shimmer are reduced, compression artifacts added.
- **Difficulty**: Medium
- **Expected agent score**: 0.5–0.9
- **Scoring**: Partial credit based on confidence calibration
```
correct + high confidence → 1.0
correct + low confidence  → 0.6
wrong + low confidence    → 0.2
wrong + high confidence   → 0.0
```

### Task 3 – Adversarial Detection (Hard)
- **Description**: Synthetic audio specifically crafted to mimic real
  speech features. Jitter and shimmer are artificially elevated.
- **Difficulty**: Hard
- **Expected agent score**: 0.3–0.97
- **Scoring**: Rewards correct classification AND penalizes overconfidence
```
correct + calibrated confidence (~0.7) → ~1.0
correct + overconfident (0.9–1.0)      → 0.85–0.90
wrong + appropriately uncertain        → 0.15
wrong + overconfident                  → 0.0
```

---

## Reward Function

The reward function provides **partial, meaningful signals**, not just a
binary win/lose.

```python
def grade(true_label, action, difficulty):
    correct = (action["label"] == true_label)
    confidence = action["confidence"]

    if difficulty == "easy":
        return 1.0 if correct else 0.0

    elif difficulty == "medium":
        if correct:
            return 0.6 + 0.4 * confidence
        else:
            return max(0.0, 0.2 - 0.3 * confidence)

    elif difficulty == "hard":
        if correct:
            base = 0.5
            calibration_bonus = 0.5 * (1 - abs(confidence - 0.7))
            return base + calibration_bonus
        else:
            return 0.15 if confidence < 0.4 else 0.0
```

### Why Confidence Calibration Matters
An agent that is **wrong but uncertain** is more useful than one that is
**wrong but confident**. This reward design teaches agents to express
appropriate uncertainty, which is critical for real-world fraud detection systems.
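To see the shape of this incentive, the short sketch below tabulates the hard-task reward from the `grade` function above across confidence levels. It is illustrative only and simply re-implements the formula quoted above.

```python
# Sketch: hard-task reward as a function of confidence, for correct vs wrong answers.
def hard_reward(correct: bool, confidence: float) -> float:
    if correct:
        return 0.5 + 0.5 * (1 - abs(confidence - 0.7))
    return 0.15 if confidence < 0.4 else 0.0

for c in (0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"confidence={c:.1f}  correct={hard_reward(True, c):.2f}  wrong={hard_reward(False, c):.2f}")
# The correct-answer reward peaks at confidence ~0.7;
# being wrong with confidence >= 0.4 scores 0.
```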
---

## OpenEnv API

```python
from environment.env import VoiceAuthenticityEnv

env = VoiceAuthenticityEnv(task_name="clean_detection")

# Reset episode
obs = env.reset()
# obs.features   → 48-dim list
# obs.hint       → key raw values for interpretation
# obs.difficulty → "easy"

# Take action
action = {"label": 1, "confidence": 0.8, "reasoning": "low jitter"}
obs, reward, done, info = env.step(action)
# reward → float in [0.0, 1.0]
# done   → True (one classification per episode)
# info["true_label"] → ground truth

# Get state
state = env.state()
```

---

## Baseline Scores

Scores from `Qwen/Qwen2.5-72B-Instruct` across multiple runs:

| Task | Difficulty | Avg Reward | Notes |
|------|-----------|------------|-------|
| clean_detection | Easy | ~0.80 | Strong baseline |
| compressed_detection | Medium | ~0.70 | Compression reduces confidence |
| adversarial_detection | Hard | ~0.75 | Calibration reward helps |

---

## Setup & Usage

### Requirements
```
Python 3.10+
Docker
HuggingFace account
```

### Local Setup

```bash
# Clone the repo
git clone https://huggingface.co/spaces/YOUR_USERNAME/voice-authenticity-openenv
cd voice-authenticity-openenv

# Install dependencies
pip install -r requirements.txt

# Download dataset and extract features
python scripts/download_data.py
python scripts/extract_features.py

# Set environment variables
cp .env.example .env
# Edit .env with your HF_TOKEN

# Run baseline inference
python inference.py
```

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `API_BASE_URL` | LLM API endpoint | `https://router.huggingface.co/v1` |
| `MODEL_NAME` | Model identifier | `Qwen/Qwen2.5-72B-Instruct` |
| `HF_TOKEN` | HuggingFace API token | required |
| `VOICE_TASK` | Task to run | `clean_detection` |

### Docker

```bash
# Build
docker build -t voice-authenticity .

# Run
docker run --env-file .env voice-authenticity
```

---

## Project Structure

```
voice-authenticity-openenv/
├── environment/
│   ├── __init__.py
│   ├── env.py                       # step() / reset() / state()
│   ├── models.py                    # Pydantic Observation/Action/Reward
│   ├── graders.py                   # scoring logic per task
│   └── data/
│       ├── features.npy             # clean features (500 × 48)
│       ├── features_compressed.npy  # codec-degraded features
│       ├── features_adversarial.npy # adversarially perturbed
│       ├── features_raw.npy         # unnormalized, for hints
│       └── labels.npy               # ground-truth labels
├── scripts/
│   ├── download_data.py             # fetch dataset from HuggingFace
│   └── extract_features.py          # audio → feature vectors
├── inference.py                     # baseline LLM agent
├── openenv.yaml                     # OpenEnv spec
├── Dockerfile
├── requirements.txt
└── README.md
```

---

## Technical Details

### Feature Extraction Pipeline
```
Audio (.wav / .flac)
  → librosa (MFCCs, spectral features)
  → parselmouth/Praat (jitter, shimmer, HNR)
  → z-score normalization
  → 48-dim float32 vector
  → stored as .npy arrays
```

### Compression Simulation (Task 2)
Codec compression is simulated (see `scripts/extract_features.py`) by:
- Degrading MFCC standard deviations (compression flattens variation)
- Reducing jitter and shimmer values
- Adding spectral artifact signals to indices 45–47

### Adversarial Simulation (Task 3)
Adversarial perturbation is applied to synthetic samples only:
- Artificially elevates jitter (+0.005 to +0.02)
- Artificially elevates shimmer (+0.01 to +0.05)
- Slightly reduces HNR to mimic real speech

---

## Expected stdout Format

```
[START] task=clean_detection env=voice-authenticity model=Qwen/Qwen2.5-72B-Instruct
[STEP] step=1 action={"label":0,"confidence":0.95,"reasoning":"..."} reward=1.00 done=true error=null
[END] success=true steps=1 score=1.000 rewards=1.00
```
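If you need to post-process these logs, a small regex-based parser along the following lines would work; this is a hypothetical helper, not shipped with the environment, and it assumes the exact field order shown above.

```python
import re

STEP_RE = re.compile(r"\[STEP\] step=(\d+) action=(\{.*\}) reward=([\d.]+) done=(\w+) error=(\S+)")
END_RE = re.compile(r"\[END\] success=(\w+) steps=(\d+) score=([\d.]+) rewards=(\S+)")

def parse_line(line: str):
    """Turn a [STEP] or [END] line into a dict; return None for anything else."""
    if m := STEP_RE.match(line):
        return {"type": "step", "step": int(m[1]), "reward": float(m[3]), "done": m[4] == "true"}
    if m := END_RE.match(line):
        return {"type": "end", "success": m[1] == "true", "score": float(m[3])}
    return None
```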
---

## License
MIT

---

## .env.example

Create a `.env.example` in the project root (safe to commit, no real token):

```
API_BASE_URL=https://router.huggingface.co/v1
MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
HF_TOKEN=your_huggingface_token_here
VOICE_TASK=clean_detection
```

---
environment/__init__.py
ADDED
File without changes
environment/data/features.npy
ADDED
Binary file (96.1 kB)

environment/data/features_adversarial.npy
ADDED
Binary file (96.1 kB)

environment/data/features_compressed.npy
ADDED
Binary file (96.1 kB)

environment/data/features_raw.npy
ADDED
Binary file (96.1 kB)

environment/data/labels.npy
ADDED
Binary file (2.13 kB)

environment/data/labels_adversarial.npy
ADDED
Binary file (2.13 kB)

environment/data/labels_compressed.npy
ADDED
Binary file (2.13 kB)

environment/data/mean.npy
ADDED
Binary file (320 Bytes)

environment/data/std.npy
ADDED
Binary file (320 Bytes)
environment/env.py
ADDED
@@ -0,0 +1,98 @@
import numpy as np
import random
from environment.models import VoiceObservation

TASKS = ["clean_detection", "compressed_detection", "adversarial_detection"]

DIFFICULTY_MAP = {
    "clean_detection": "easy",
    "compressed_detection": "medium",
    "adversarial_detection": "hard",
}

DATA_FILES = {
    "clean_detection": (
        "environment/data/features.npy",
        "environment/data/labels.npy",
    ),
    "compressed_detection": (
        "environment/data/features_compressed.npy",
        "environment/data/labels_compressed.npy",
    ),
    "adversarial_detection": (
        "environment/data/features_adversarial.npy",
        "environment/data/labels_adversarial.npy",
    ),
}


class VoiceAuthenticityEnv:
    def __init__(self, task_name: str = "clean_detection"):
        assert task_name in TASKS, f"Unknown task: {task_name}"
        self.task_name = task_name
        self.difficulty = DIFFICULTY_MAP[task_name]

        feat_file, label_file = DATA_FILES[task_name]
        self.features = np.load(feat_file)
        self.labels = np.load(label_file)

        # Load raw features for interpretable key values
        self.raw_features = np.load("environment/data/features_raw.npy")

        self.indices = list(range(len(self.labels)))
        self.current_idx = None
        self.step_number = 0
        self.done = False
        self.max_steps = 1

    def reset(self):
        self.step_number = 0
        self.done = False
        self.current_idx = random.choice(self.indices)
        return self._make_observation()

    def step(self, action: dict):
        if self.done:
            raise RuntimeError("Episode done. Call reset().")

        from environment.graders import grade
        true_label = int(self.labels[self.current_idx])
        reward = grade(true_label, action, self.difficulty)

        self.step_number += 1
        self.done = True

        obs = self._make_observation()
        info = {
            "true_label": true_label,
            "difficulty": self.difficulty,
            "task": self.task_name,
        }
        return obs, reward, self.done, info

    def state(self):
        return {
            "task_name": self.task_name,
            "difficulty": self.difficulty,
            "step_number": self.step_number,
            "done": self.done,
            "current_idx": self.current_idx,
        }

    def _make_observation(self) -> VoiceObservation:
        feat = self.features[self.current_idx].tolist()
        raw = self.raw_features[self.current_idx]

        hint = None
        if self.difficulty == "medium":
            hint = "Audio has been codec-compressed. Features may be degraded."
        elif self.difficulty == "hard":
            hint = "Warning: adversarial sample - synthetic audio crafted to mimic real speech."

        return VoiceObservation(
            features=feat,
            task_name=self.task_name,
            step_number=self.step_number,
            difficulty=self.difficulty,
            sample_id=int(self.current_idx),
            hint=(hint or "") + f" | Key values: jitter={raw[42]:.5f} shimmer={raw[43]:.5f} hnr={raw[44]:.4f}",
        )
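As a complement to the single-episode API shown in the README, a minimal evaluation loop over several episodes might look like the sketch below. It is illustrative only; the always-predict-synthetic policy is a placeholder, not the shipped baseline agent.

```python
# Sketch: average reward of a trivial agent over a handful of episodes.
from environment.env import VoiceAuthenticityEnv

env = VoiceAuthenticityEnv(task_name="clean_detection")
rewards = []
for _ in range(10):
    obs = env.reset()
    # Placeholder policy: always guess "synthetic" with moderate confidence.
    action = {"label": 1, "confidence": 0.6, "reasoning": "placeholder policy"}
    obs, reward, done, info = env.step(action)
    rewards.append(reward)

print(f"mean reward over {len(rewards)} episodes: {sum(rewards) / len(rewards):.2f}")
```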
environment/graders.py
ADDED
@@ -0,0 +1,33 @@
def grade(true_label: int, action: dict, difficulty: str) -> float:
    label = action.get("label")
    confidence = action.get("confidence", 0.5)
    correct = (label == true_label)

    if difficulty == "easy":
        if correct:
            return 1.0
        else:
            return 0.0

    elif difficulty == "medium":
        if correct:
            # reward confidence when correct
            base = 0.6
            bonus = 0.4 * confidence
            return round(base + bonus, 3)
        else:
            # penalize overconfidence when wrong
            penalty = 0.3 * confidence
            return round(max(0.0, 0.2 - penalty), 3)

    elif difficulty == "hard":
        if correct:
            # correct, but penalize overconfidence (hard task, be humble)
            base = 0.5
            calibration_bonus = 0.5 * (1 - abs(confidence - 0.7))
            return round(base + calibration_bonus, 3)
        else:
            if confidence < 0.4:
                return 0.15  # wrong but appropriately uncertain
            else:
                return 0.0   # wrong + overconfident = worst case
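A few sanity checks against the grading rules above, with expected values computed directly from the formulas (a sketch; the commit does not include a test suite):

```python
from environment.graders import grade

# Easy: binary.
assert grade(1, {"label": 1, "confidence": 0.9}, "easy") == 1.0
assert grade(1, {"label": 0, "confidence": 0.9}, "easy") == 0.0

# Medium: confidence-weighted partial credit.
assert grade(1, {"label": 1, "confidence": 0.9}, "medium") == 0.96  # 0.6 + 0.4*0.9
assert grade(1, {"label": 0, "confidence": 0.9}, "medium") == 0.0   # max(0, 0.2 - 0.3*0.9)

# Hard: calibration around 0.7 confidence.
assert grade(0, {"label": 0, "confidence": 0.7}, "hard") == 1.0     # 0.5 + 0.5*1.0
assert grade(0, {"label": 1, "confidence": 0.3}, "hard") == 0.15    # wrong but uncertain
```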
environment/models.py
ADDED
@@ -0,0 +1,21 @@
from pydantic import BaseModel, Field
from typing import Optional, List


class VoiceObservation(BaseModel):
    features: List[float]
    task_name: str
    step_number: int
    difficulty: str
    sample_id: int
    hint: Optional[str] = None  # extra context for hard task


class VoiceAction(BaseModel):
    label: int = Field(..., ge=0, le=1)  # 0=real, 1=synthetic
    confidence: float = Field(..., ge=0.0, le=1.0)
    reasoning: str = Field(default="")


class VoiceReward(BaseModel):
    score: float
    correct: bool
    confidence_penalty: float
    breakdown: str
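These models let the environment reject malformed agent output. A minimal sketch of how the `Field` constraints behave (requirements.txt does not pin a Pydantic major version, so the v1/v2 difference is noted in the comments):

```python
from pydantic import ValidationError
from environment.models import VoiceAction

# Valid action: label in {0, 1}, confidence in [0, 1].
action = VoiceAction(label=1, confidence=0.8, reasoning="low jitter, high HNR")
print(action.model_dump())  # pydantic v2; use .dict() on pydantic v1

# Out-of-range confidence is rejected by the Field(ge=..., le=...) constraints.
try:
    VoiceAction(label=1, confidence=1.5)
except ValidationError as exc:
    print("rejected:", exc.errors()[0]["type"])
```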
inference.py
ADDED
@@ -0,0 +1,132 @@
from dotenv import load_dotenv
load_dotenv()
import asyncio
import os
import textwrap
import json
from typing import List, Optional
from openai import OpenAI
from environment.env import VoiceAuthenticityEnv
from environment.models import VoiceAction

API_KEY = os.getenv("HF_TOKEN") or os.getenv("API_KEY")
API_BASE_URL = os.getenv("API_BASE_URL") or "https://router.huggingface.co/v1"
MODEL_NAME = os.getenv("MODEL_NAME") or "Qwen/Qwen2.5-72B-Instruct"
TASK_NAME = os.getenv("VOICE_TASK", "clean_detection")
BENCHMARK = "voice-authenticity"
MAX_STEPS = 1
SUCCESS_SCORE_THRESHOLD = 0.5

SYSTEM_PROMPT = textwrap.dedent("""
    You are an expert audio forensics agent detecting synthetic (AI-generated) speech.
    You receive a 48-dimensional normalized feature vector AND key raw values in the hint.

    Always use the KEY VALUES in the hint for classification:

    REAL speech thresholds (from dataset):
    - jitter > 0.025
    - shimmer > 0.10
    - hnr < 12.0

    SYNTHETIC speech thresholds:
    - jitter < 0.020
    - shimmer < 0.09
    - hnr > 12.0

    When in doubt, lower your confidence. Never exceed 0.85 confidence on hard tasks.

    Respond ONLY with valid JSON:
    {"label": 0 or 1, "confidence": 0.0-1.0, "reasoning": "brief"}
    0 = real human speech
    1 = synthetic/AI-generated speech
""").strip()


def log_start(task, env, model):
    print(f"[START] task={task} env={env} model={model}", flush=True)

def log_step(step, action, reward, done, error):
    error_val = error if error else "null"
    print(f"[STEP] step={step} action={action} reward={reward:.2f} done={str(done).lower()} error={error_val}", flush=True)

def log_end(success, steps, score, rewards):
    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
    print(f"[END] success={str(success).lower()} steps={steps} score={score:.3f} rewards={rewards_str}", flush=True)


def get_agent_action(client, observation) -> dict:
    user_prompt = f"""
    Audio sample features: {observation.features}
    Task: {observation.task_name} (difficulty: {observation.difficulty})
    {f'Note: {observation.hint}' if observation.hint else ''}

    Classify this audio sample. Respond with JSON only. Keep reasoning under 100 characters.
    """
    try:
        completion = client.chat.completions.create(
            model=MODEL_NAME,
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": user_prompt.strip()}
            ],
            temperature=0.3,
            max_tokens=120,
            stream=False
        )
        text = completion.choices[0].message.content.strip()
        text = text.replace("```json", "").replace("```", "").strip()
        last_brace = text.rfind("}")
        if last_brace != -1:
            text = text[:last_brace + 1]
        result = json.loads(text)
        result["label"] = int(result.get("label", 0))
        result["confidence"] = float(result.get("confidence", 0.5))
        result["label"] = result["label"] if result["label"] in [0, 1] else 0
        result["confidence"] = max(0.0, min(1.0, result["confidence"]))
        return result
    except Exception as e:
        print(f"[DEBUG] Model error: {e}", flush=True)
        return {"label": 0, "confidence": 0.5, "reasoning": "fallback"}


async def run_task(client, task_name: str):
    env = VoiceAuthenticityEnv(task_name=task_name)
    rewards = []
    steps_taken = 0
    success = False
    score = 0.0

    log_start(task=task_name, env=BENCHMARK, model=MODEL_NAME)

    try:
        obs = env.reset()

        for step in range(1, MAX_STEPS + 1):
            action_dict = get_agent_action(client, obs)
            action_str = json.dumps(action_dict)

            obs, reward, done, info = env.step(action_dict)

            rewards.append(reward)
            steps_taken = step
            error = None

            log_step(step=step, action=action_str, reward=reward, done=done, error=error)

            if done:
                break

        score = sum(rewards) / len(rewards) if rewards else 0.0
        success = score >= SUCCESS_SCORE_THRESHOLD

    finally:
        score_val = sum(rewards) / len(rewards) if rewards else 0.0
        log_end(success=success, steps=steps_taken, score=score_val, rewards=rewards)


async def main():
    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
    tasks = ["clean_detection", "compressed_detection", "adversarial_detection"]
    for task in tasks:
        await run_task(client, task)


if __name__ == "__main__":
    asyncio.run(main())
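Note that `inference.py` imports `VoiceAction` but steps the environment with the plain parsed dict. A hedged sketch of how the Pydantic model could be used to validate the model reply before stepping (illustrative only, not part of the committed script):

```python
from pydantic import ValidationError
from environment.models import VoiceAction

def validate_action(raw: dict) -> dict:
    """Coerce the parsed JSON through VoiceAction; fall back to a safe default on failure."""
    try:
        return VoiceAction(**raw).model_dump()  # .dict() on pydantic v1
    except (ValidationError, TypeError):
        return {"label": 0, "confidence": 0.5, "reasoning": "fallback"}
```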
openenv.yaml
ADDED
@@ -0,0 +1,39 @@
name: voice-authenticity
version: "1.0.0"
description: "Voice authenticity detection across real-world degradation conditions"
author: "Akshara-Sharma"
tags: ["speech", "fraud-detection", "content-moderation", "audio"]
tasks:
  - name: clean_detection
    difficulty: easy
    description: "Classify real vs synthetic speech from clean audio features"
  - name: compressed_detection
    difficulty: medium
    description: "Classify speech under codec compression degradation"
  - name: adversarial_detection
    difficulty: hard
    description: "Classify adversarially crafted synthetic speech"
observation_space:
  type: object
  properties:
    features:
      type: array
      description: "48-dim feature vector: MFCCs, jitter, shimmer, HNR"
    task_name:
      type: string
    step_number:
      type: integer
    difficulty:
      type: string
action_space:
  type: object
  properties:
    label:
      type: integer
      description: "0=real, 1=synthetic"
    confidence:
      type: number
      description: "confidence in [0.0, 1.0]"
    reasoning:
      type: string
      description: "brief explanation of decision"
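A quick way to sanity-check the spec is to load it and list the declared tasks. Note that `pyyaml` is not in `requirements.txt`, so this sketch assumes you install it separately.

```python
# Sketch: load openenv.yaml and list its tasks.
# Assumes: pip install pyyaml (not listed in requirements.txt)
import yaml

with open("openenv.yaml") as f:
    spec = yaml.safe_load(f)

print(spec["name"], spec["version"])
for task in spec["tasks"]:
    print(f"  {task['name']:24s} difficulty={task['difficulty']}")
```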
requirements.txt
ADDED
@@ -0,0 +1,8 @@
librosa
praat-parselmouth
scikit-learn
numpy
openai
pydantic
python-dotenv
soundfile
scripts/download_data.py
ADDED
@@ -0,0 +1,30 @@
from datasets import load_dataset
import soundfile as sf
import os

os.makedirs("data/real", exist_ok=True)
os.makedirs("data/fake", exist_ok=True)

dataset = load_dataset("garystafford/deepfake-audio-detection", split="train")

real_count = 0
fake_count = 0

for item in dataset:
    audio = item["audio"]
    label = item["label"]  # 0=real, 1=fake

    if label == 0 and real_count < 250:
        sf.write(f"data/real/real_{real_count:04d}.wav",
                 audio["array"], audio["sampling_rate"])
        real_count += 1

    elif label == 1 and fake_count < 250:
        sf.write(f"data/fake/fake_{fake_count:04d}.wav",
                 audio["array"], audio["sampling_rate"])
        fake_count += 1

    if real_count >= 250 and fake_count >= 250:
        break

print(f"Downloaded: {real_count} real, {fake_count} fake")
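Note that this script imports `datasets`, which is not listed in `requirements.txt`, so install it separately. If downloading the whole split up front is too heavy, a hedged variant using the `datasets` streaming mode is sketched below; `streaming=True` is a real `load_dataset` option, but whether audio decoding behaves identically for this particular dataset is an assumption worth verifying.

```python
# Sketch: stream the dataset instead of materializing the full train split.
# Assumes: pip install datasets soundfile
from datasets import load_dataset
import soundfile as sf
import os

os.makedirs("data/real", exist_ok=True)
os.makedirs("data/fake", exist_ok=True)

stream = load_dataset("garystafford/deepfake-audio-detection", split="train", streaming=True)

counts = {0: 0, 1: 0}  # label -> number of clips written
for item in stream:
    label = item["label"]
    if counts[label] >= 250:
        if all(c >= 250 for c in counts.values()):
            break
        continue
    subdir, prefix = ("data/real", "real") if label == 0 else ("data/fake", "fake")
    audio = item["audio"]
    sf.write(f"{subdir}/{prefix}_{counts[label]:04d}.wav", audio["array"], audio["sampling_rate"])
    counts[label] += 1

print(f"Downloaded: {counts[0]} real, {counts[1]} fake")
```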
scripts/extract_features.py
ADDED
@@ -0,0 +1,223 @@
import numpy as np
import librosa
import parselmouth
from parselmouth.praat import call
import os
import warnings
warnings.filterwarnings("ignore")

REAL_DIR = "data/real"
FAKE_DIR = "data/fake"
OUTPUT_DIR = "environment/data"
os.makedirs(OUTPUT_DIR, exist_ok=True)


def extract_features(file_path):
    """
    Extract a 48-dim feature vector from an audio file.
    Returns None if the file fails.
    """
    try:
        # Load audio
        y, sr = librosa.load(file_path, sr=16000, duration=5.0)

        if len(y) < 1600:  # skip clips shorter than 0.1 s
            return None

        # -- MFCC (40 features) --------------------------------------
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
        mfcc_mean = mfcc.mean(axis=1)  # 20 values
        mfcc_std = mfcc.std(axis=1)    # 20 values

        # -- Spectral features (2 features) ----------------------------
        zcr = librosa.feature.zero_crossing_rate(y).mean()
        spec_centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()

        # -- Voice authenticity features (3 features) ------------------
        # These are the KEY discriminators between real and fake
        try:
            snd = parselmouth.Sound(file_path)
            pp = call(snd, "To PointProcess (periodic, cc)", 75, 500)

            jitter = call(pp, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
            shimmer = call([snd, pp], "Get shimmer (local)",
                           0, 0, 0.0001, 0.02, 1.3, 1.6)
            harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
            hnr = call(harmonicity, "Get mean", 0, 0)

            # Replace NaN/inf with 0
            jitter = float(jitter) if np.isfinite(jitter) else 0.0
            shimmer = float(shimmer) if np.isfinite(shimmer) else 0.0
            hnr = float(hnr) if np.isfinite(hnr) else 0.0

        except Exception:
            jitter, shimmer, hnr = 0.0, 0.0, 0.0

        # -- Compression artifact features (3 features) ----------------
        # Used to simulate codec degradation for task 2
        spec_bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr).mean()
        spec_rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr).mean()
        rms = librosa.feature.rms(y=y).mean()

        # -- Assemble final 48-dim vector -------------------------------
        features = np.concatenate([
            mfcc_mean,                            # 0-19
            mfcc_std,                             # 20-39
            [zcr, spec_centroid],                 # 40-41
            [jitter, shimmer, hnr],               # 42-44
            [spec_bandwidth, spec_rolloff, rms]   # 45-47
        ])

        return features.astype(np.float32)

    except Exception as e:
        print(f"  ERROR on {file_path}: {e}")
        return None


def process_directory(directory, label, desc):
    files = [
        f for f in os.listdir(directory)
        if f.endswith((".wav", ".flac", ".mp3"))
    ]
    print(f"\nProcessing {desc}: {len(files)} files found")

    features_list = []
    labels_list = []
    failed = 0

    for i, fname in enumerate(files):
        path = os.path.join(directory, fname)
        feat = extract_features(path)

        if feat is not None:
            features_list.append(feat)
            labels_list.append(label)
            if (i + 1) % 50 == 0:
                print(f"  {i+1}/{len(files)} done...")
        else:
            failed += 1

    print(f"  Success: {len(features_list)}, Failed: {failed}")
    return features_list, labels_list


def add_compression_artifacts(features, strength=0.3):
    degraded = features.copy()

    degraded[20:40] *= (1 - strength * np.random.uniform(0.5, 1.0, 20))
    degraded[42] *= (1 - strength * np.random.uniform(0.3, 0.7))
    degraded[43] *= (1 - strength * np.random.uniform(0.3, 0.7))
    degraded[44] *= (1 + strength * np.random.uniform(0.1, 0.4))
    degraded[45] *= (1 + strength * np.random.uniform(0.3, 0.8))
    degraded[46] *= (1 - strength * np.random.uniform(0.2, 0.6))
    degraded[47] += strength * np.random.uniform(0.1, 0.4)

    return degraded


def add_adversarial_perturbation(features, label):
    perturbed = features.copy()

    if label == 1:
        perturbed[42] += np.random.uniform(0.005, 0.02)
        perturbed[43] += np.random.uniform(0.01, 0.05)
        perturbed[44] -= np.random.uniform(1.0, 3.0)

    return perturbed


def main():
    print("=" * 50)
    print("Feature Extraction Pipeline")
    print("=" * 50)

    real_feat, real_labels = process_directory(
        REAL_DIR, label=0, desc="REAL audio"
    )

    fake_feat, fake_labels = process_directory(
        FAKE_DIR, label=1, desc="FAKE audio"
    )

    all_features = np.array(real_feat + fake_feat, dtype=np.float32)
    all_labels = np.array(real_labels + fake_labels, dtype=np.int32)

    idx = np.random.permutation(len(all_labels))
    all_features = all_features[idx]
    all_labels = all_labels[idx]

    mean = all_features.mean(axis=0)
    std = all_features.std(axis=0) + 1e-8
    all_features_norm = (all_features - mean) / std

    np.save(f"{OUTPUT_DIR}/features.npy", all_features_norm)

    # Save raw unnormalized features for the env to use in hints
    np.save(f"{OUTPUT_DIR}/features_raw.npy", all_features)

    np.save(f"{OUTPUT_DIR}/labels.npy", all_labels)
    np.save(f"{OUTPUT_DIR}/mean.npy", mean)
    np.save(f"{OUTPUT_DIR}/std.npy", std)

    print(f"\nTask 1 (clean): {len(all_labels)} samples saved")

    # -- TASK 2: Compressed features --------------------------------
    compressed_features = np.array([
        add_compression_artifacts(f, strength=0.3)
        for f in (real_feat + fake_feat)
    ], dtype=np.float32)

    compressed_features = compressed_features[idx]
    compressed_norm = (compressed_features - mean) / std

    np.save(f"{OUTPUT_DIR}/features_compressed.npy", compressed_norm)
    np.save(f"{OUTPUT_DIR}/labels_compressed.npy", all_labels)

    print(f"Task 2 (compressed): {len(all_labels)} samples saved")

    # -- TASK 3: Adversarial features --------------------------------
    raw_combined = real_feat + fake_feat
    raw_labels_combined = real_labels + fake_labels

    adversarial_features = np.array([
        add_adversarial_perturbation(f, l)
        for f, l in zip(raw_combined, raw_labels_combined)
    ], dtype=np.float32)

    adversarial_features = adversarial_features[idx]
    adversarial_norm = (adversarial_features - mean) / std

    np.save(f"{OUTPUT_DIR}/features_adversarial.npy", adversarial_norm)
    np.save(f"{OUTPUT_DIR}/labels_adversarial.npy", all_labels)

    print(f"Task 3 (adversarial): {len(all_labels)} samples saved")

    print(f"\n{'='*50}")
    print("DONE")
    print(f"Total samples : {len(all_labels)}")
    print(f"Real samples  : {all_labels.tolist().count(0)}")
    print(f"Fake samples  : {all_labels.tolist().count(1)}")
    print(f"Feature shape : {all_features_norm.shape}")
    print(f"{'='*50}")

    print("\nSanity check - jitter/shimmer/HNR comparison:")
    for i in range(min(2, len(all_labels))):
        label_str = "REAL" if all_labels[i] == 0 else "FAKE"
        print(f"\n  [{label_str}]")
        print(f"  Clean       jitter={all_features[i][42]:.4f} shimmer={all_features[i][43]:.4f} hnr={all_features[i][44]:.4f}")
        print(f"  Compressed  jitter={compressed_features[i][42]:.4f} shimmer={compressed_features[i][43]:.4f} hnr={compressed_features[i][44]:.4f}")
        print(f"  Adversarial jitter={adversarial_features[i][42]:.4f} shimmer={adversarial_features[i][43]:.4f} hnr={adversarial_features[i][44]:.4f}")


if __name__ == "__main__":
    main()
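After running this pipeline, a quick check that the generated arrays line up is useful; the sketch below assumes the script completed and wrote its outputs to `environment/data/`.

```python
import numpy as np

# Sketch: verify the saved arrays have matching shapes and both classes are present.
feats = np.load("environment/data/features.npy")
raw = np.load("environment/data/features_raw.npy")
labels = np.load("environment/data/labels.npy")

assert feats.shape == raw.shape and feats.shape[1] == 48
assert feats.shape[0] == labels.shape[0]
print("samples:", feats.shape[0],
      "real:", int((labels == 0).sum()),
      "fake:", int((labels == 1).sum()))
```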