Spaces:

AI-Talent-Force
/

ShortSmith_v3

Paused

App Files Files Community

defalt-here commited on Jan 21

Commit

f07a888

verified ·

1 Parent(s): 0e470ef

Fixed frontend issue where the number of highlights selected did not appear

Browse files

Files changed (8) hide show

.gitattributes +35 -35
PLAN.md +226 -226
README.md +48 -48
REQUIREMENTS_CHECKLIST.md +162 -162
app.py +1096 -1014
config.py +201 -201
requirements.txt +103 -103
space.yaml +31 -31

.gitattributes CHANGED Viewed

@@ -1,35 +1,35 @@
-*.7z filter=lfs diff=lfs merge=lfs -text
-*.arrow filter=lfs diff=lfs merge=lfs -text
-*.bin filter=lfs diff=lfs merge=lfs -text
-*.bz2 filter=lfs diff=lfs merge=lfs -text
-*.ckpt filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.gz filter=lfs diff=lfs merge=lfs -text
-*.h5 filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
-*.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
-*.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
-*.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text

PLAN.md CHANGED Viewed

@@ -1,226 +1,226 @@
-# ShortSmith v2 - Implementation Plan
-## Overview
-Build a Hugging Face Space that extracts "hype" moments from videos with optional person-specific filtering.
-## Project Structure
-```
-shortsmith-v2/
-├── app.py                    # Gradio UI (Hugging Face interface)
-├── requirements.txt          # Dependencies
-├── config.py                 # Configuration and constants
-├── utils/
-│   ├── __init__.py
-│   ├── logger.py             # Centralized logging
-│   └── helpers.py            # Utility functions
-├── core/
-│   ├── __init__.py
-│   ├── video_processor.py    # FFmpeg video/audio extraction
-│   ├── scene_detector.py     # PySceneDetect integration
-│   ├── frame_sampler.py      # Hierarchical sampling logic
-│   └── clip_extractor.py     # Final clip cutting
-├── models/
-│   ├── __init__.py
-│   ├── visual_analyzer.py    # Qwen2-VL integration
-│   ├── audio_analyzer.py     # Wav2Vec 2.0 + Librosa
-│   ├── face_recognizer.py    # InsightFace (SCRFD + ArcFace)
-│   ├── body_recognizer.py    # OSNet for body recognition
-│   ├── motion_detector.py    # RAFT optical flow
-│   └── tracker.py            # ByteTrack integration
-├── scoring/
-│   ├── __init__.py
-│   ├── hype_scorer.py        # Hype scoring logic
-│   └── domain_presets.py     # Domain-specific weights
-└── pipeline/
-    ├── __init__.py
-    └── orchestrator.py       # Main pipeline coordinator
-```
-## Implementation Phases
-### Phase 1: Core Infrastructure
-1. **config.py** - Configuration management
-   - Model paths, thresholds, domain presets
-   - HuggingFace API key handling
-2. **utils/logger.py** - Centralized logging
-   - File and console handlers
-   - Different log levels per module
-   - Timing decorators for performance tracking
-3. **utils/helpers.py** - Common utilities
-   - File validation
-   - Temporary file management
-   - Error formatting
-### Phase 2: Video Processing Layer
-4. **core/video_processor.py** - FFmpeg operations
-   - Extract frames at specified FPS
-   - Extract audio track
-   - Get video metadata (duration, resolution, fps)
-   - Cut clips at timestamps
-5. **core/scene_detector.py** - Scene boundary detection
-   - PySceneDetect integration
-   - Content-aware detection
-   - Return scene timestamps
-6. **core/frame_sampler.py** - Hierarchical sampling
-   - First pass: 1 frame per 5-10 seconds
-   - Second pass: Dense sampling on candidates
-   - Dynamic FPS based on motion
-### Phase 3: AI Models
-7. **models/visual_analyzer.py** - Qwen2-VL-2B
-   - Load quantized model
-   - Process frame batches
-   - Extract visual embeddings/scores
-8. **models/audio_analyzer.py** - Audio analysis
-   - Librosa for basic features (RMS, spectral flux, centroid)
-   - Optional Wav2Vec 2.0 for advanced understanding
-   - Return audio hype signals per segment
-9. **models/face_recognizer.py** - Face detection/recognition
-   - InsightFace SCRFD for detection
-   - ArcFace for embeddings
-   - Reference image matching
-10. **models/body_recognizer.py** - Body recognition
-    - OSNet for full-body embeddings
-    - Handle non-frontal views
-11. **models/motion_detector.py** - Motion analysis
-    - RAFT optical flow
-    - Motion magnitude scoring
-12. **models/tracker.py** - Multi-object tracking
-    - ByteTrack integration
-    - Maintain identity across frames
-### Phase 4: Scoring & Selection
-13. **scoring/domain_presets.py** - Domain configurations
-    - Sports, Vlogs, Music, Podcasts presets
-    - Custom weight definitions
-14. **scoring/hype_scorer.py** - Hype calculation
-    - Combine visual + audio scores
-    - Apply domain weights
-    - Normalize and rank segments
-### Phase 5: Pipeline & UI
-15. **pipeline/orchestrator.py** - Main coordinator
-    - Coordinate all components
-    - Handle errors gracefully
-    - Progress reporting
-16. **app.py** - Gradio interface
-    - Video upload
-    - API key input (secure)
-    - Prompt/instructions input
-    - Domain selection
-    - Reference image upload (for person filtering)
-    - Progress bar
-    - Output video gallery
-## Key Design Decisions
-### Error Handling Strategy
-- Each module has try/except with specific exception types
-- Errors bubble up with context
-- Pipeline continues with degraded functionality when possible
-- User-friendly error messages in UI
-### Logging Strategy
-- DEBUG: Model loading, frame processing details
-- INFO: Pipeline stages, timing, results
-- WARNING: Fallback triggers, degraded mode
-- ERROR: Failures with stack traces
-### Memory Management
-- Process frames in batches
-- Clear GPU memory between stages
-- Use generators where possible
-- Temporary file cleanup
-### HuggingFace Space Considerations
-- Use `gr.State` for session data
-- Respect ZeroGPU limits (if using)
-- Cache models in `/tmp` or HF cache
-- Handle timeouts gracefully
-## API Key Usage
-The API key input is for future extensibility (e.g., external services).
-For MVP, all processing is local using open-weight models.
-## Gradio UI Layout
-```
-┌─────────────────────────────────────────────────────────────┐
-│  ShortSmith v2 - AI Video Highlight Extractor               │
-├─────────────────────────────────────────────────────────────┤
-│  ┌─────────────────────┐  ┌─────────────────────────────┐   │
-│  │ Upload Video        │  │ Settings                    │   │
-│  │ [Drop zone]         │  │ Domain: [Dropdown]          │   │
-│  │                     │  │ Clip Duration: [Slider]     │   │
-│  └─────────────────────┘  │ Num Clips: [Slider]         │   │
-│                           │ API Key: [Password field]   │   │
-│  ┌─────────────────────┐  └─────────────────────────────┘   │
-│  │ Reference Image     │                                    │
-│  │ (Optional)          │  ┌─────────────────────────────┐   │
-│  │ [Drop zone]         │  │ Additional Instructions     │   │
-│  └─────────────────────┘  │ [Textbox]                   │   │
-│                           └─────────────────────────────┘   │
-├─────────────────────────────────────────────────────────────┤
-│  [🚀 Extract Highlights]                                    │
-├─────────────────────────────────────────────────────────────┤
-│  Progress: [████████████░░░░░░░░] 60%                       │
-│  Status: Analyzing audio...                                 │
-├─────────────────────────────────────────────────────────────┤
-│  Results                                                    │
-│  ┌──────────┐ ┌──────────┐ ┌──────────┐                    │
-│  │ Clip 1   │ │ Clip 2   │ │ Clip 3   │                    │
-│  │ [Video]  │ │ [Video]  │ │ [Video]  │                    │
-│  │ Score:85 │ │ Score:78 │ │ Score:72 │                    │
-│  └──────────┘ └──────────┘ └──────────┘                    │
-│  [Download All]                                             │
-└─────────────────────────────────────────────────────────────┘
-```
-## Dependencies (requirements.txt)
-```
-gradio>=4.0.0
-torch>=2.0.0
-transformers>=4.35.0
-accelerate
-bitsandbytes
-qwen-vl-utils
-librosa>=0.10.0
-soundfile
-insightface
-onnxruntime-gpu
-opencv-python-headless
-scenedetect[opencv]
-numpy
-pillow
-tqdm
-ffmpeg-python
-```
-## Implementation Order
-1. config.py, utils/ (foundation)
-2. core/video_processor.py (essential)
-3. models/audio_analyzer.py (simpler, Librosa first)
-4. core/scene_detector.py
-5. core/frame_sampler.py
-6. scoring/ modules
-7. models/visual_analyzer.py (Qwen2-VL)
-8. models/face_recognizer.py, body_recognizer.py
-9. models/tracker.py, motion_detector.py
-10. pipeline/orchestrator.py
-11. app.py (Gradio UI)
-## Notes
-- Start with Librosa-only audio (MVP), add Wav2Vec later
-- Face/body recognition is optional (triggered by reference image)
-- Motion detection can be skipped in MVP for speed
-- ByteTrack only needed when person filtering is enabled

+# ShortSmith v2 - Implementation Plan
+## Overview
+Build a Hugging Face Space that extracts "hype" moments from videos with optional person-specific filtering.
+## Project Structure
+```
+shortsmith-v2/
+├── app.py                    # Gradio UI (Hugging Face interface)
+├── requirements.txt          # Dependencies
+├── config.py                 # Configuration and constants
+├── utils/
+│   ├── __init__.py
+│   ├── logger.py             # Centralized logging
+│   └── helpers.py            # Utility functions
+├── core/
+│   ├── __init__.py
+│   ├── video_processor.py    # FFmpeg video/audio extraction
+│   ├── scene_detector.py     # PySceneDetect integration
+│   ├── frame_sampler.py      # Hierarchical sampling logic
+│   └── clip_extractor.py     # Final clip cutting
+├── models/
+│   ├── __init__.py
+│   ├── visual_analyzer.py    # Qwen2-VL integration
+│   ├── audio_analyzer.py     # Wav2Vec 2.0 + Librosa
+│   ├── face_recognizer.py    # InsightFace (SCRFD + ArcFace)
+│   ├── body_recognizer.py    # OSNet for body recognition
+│   ├── motion_detector.py    # RAFT optical flow
+│   └── tracker.py            # ByteTrack integration
+├── scoring/
+│   ├── __init__.py
+│   ├── hype_scorer.py        # Hype scoring logic
+│   └── domain_presets.py     # Domain-specific weights
+└── pipeline/
+    ├── __init__.py
+    └── orchestrator.py       # Main pipeline coordinator
+```
+## Implementation Phases
+### Phase 1: Core Infrastructure
+1. **config.py** - Configuration management
+   - Model paths, thresholds, domain presets
+   - HuggingFace API key handling
+2. **utils/logger.py** - Centralized logging
+   - File and console handlers
+   - Different log levels per module
+   - Timing decorators for performance tracking
+3. **utils/helpers.py** - Common utilities
+   - File validation
+   - Temporary file management
+   - Error formatting
+### Phase 2: Video Processing Layer
+4. **core/video_processor.py** - FFmpeg operations
+   - Extract frames at specified FPS
+   - Extract audio track
+   - Get video metadata (duration, resolution, fps)
+   - Cut clips at timestamps
+5. **core/scene_detector.py** - Scene boundary detection
+   - PySceneDetect integration
+   - Content-aware detection
+   - Return scene timestamps
+6. **core/frame_sampler.py** - Hierarchical sampling
+   - First pass: 1 frame per 5-10 seconds
+   - Second pass: Dense sampling on candidates
+   - Dynamic FPS based on motion
+### Phase 3: AI Models
+7. **models/visual_analyzer.py** - Qwen2-VL-2B
+   - Load quantized model
+   - Process frame batches
+   - Extract visual embeddings/scores
+8. **models/audio_analyzer.py** - Audio analysis
+   - Librosa for basic features (RMS, spectral flux, centroid)
+   - Optional Wav2Vec 2.0 for advanced understanding
+   - Return audio hype signals per segment
+9. **models/face_recognizer.py** - Face detection/recognition
+   - InsightFace SCRFD for detection
+   - ArcFace for embeddings
+   - Reference image matching
+10. **models/body_recognizer.py** - Body recognition
+    - OSNet for full-body embeddings
+    - Handle non-frontal views
+11. **models/motion_detector.py** - Motion analysis
+    - RAFT optical flow
+    - Motion magnitude scoring
+12. **models/tracker.py** - Multi-object tracking
+    - ByteTrack integration
+    - Maintain identity across frames
+### Phase 4: Scoring & Selection
+13. **scoring/domain_presets.py** - Domain configurations
+    - Sports, Vlogs, Music, Podcasts presets
+    - Custom weight definitions
+14. **scoring/hype_scorer.py** - Hype calculation
+    - Combine visual + audio scores
+    - Apply domain weights
+    - Normalize and rank segments
+### Phase 5: Pipeline & UI
+15. **pipeline/orchestrator.py** - Main coordinator
+    - Coordinate all components
+    - Handle errors gracefully
+    - Progress reporting
+16. **app.py** - Gradio interface
+    - Video upload
+    - API key input (secure)
+    - Prompt/instructions input
+    - Domain selection
+    - Reference image upload (for person filtering)
+    - Progress bar
+    - Output video gallery
+## Key Design Decisions
+### Error Handling Strategy
+- Each module has try/except with specific exception types
+- Errors bubble up with context
+- Pipeline continues with degraded functionality when possible
+- User-friendly error messages in UI
+### Logging Strategy
+- DEBUG: Model loading, frame processing details
+- INFO: Pipeline stages, timing, results
+- WARNING: Fallback triggers, degraded mode
+- ERROR: Failures with stack traces
+### Memory Management
+- Process frames in batches
+- Clear GPU memory between stages
+- Use generators where possible
+- Temporary file cleanup
+### HuggingFace Space Considerations
+- Use `gr.State` for session data
+- Respect ZeroGPU limits (if using)
+- Cache models in `/tmp` or HF cache
+- Handle timeouts gracefully
+## API Key Usage
+The API key input is for future extensibility (e.g., external services).
+For MVP, all processing is local using open-weight models.
+## Gradio UI Layout
+```
+┌─────────────────────────────────────────────────────────────┐
+│  ShortSmith v2 - AI Video Highlight Extractor               │
+├─────────────────────────────────────────────────────────────┤
+│  ┌─────────────────────┐  ┌─────────────────────────────┐   │
+│  │ Upload Video        │  │ Settings                    │   │
+│  │ [Drop zone]         │  │ Domain: [Dropdown]          │   │
+│  │                     │  │ Clip Duration: [Slider]     │   │
+│  └─────────────────────┘  │ Num Clips: [Slider]         │   │
+│                           │ API Key: [Password field]   │   │
+│  ┌─────────────────────┐  └─────────────────────────────┘   │
+│  │ Reference Image     │                                    │
+│  │ (Optional)          │  ┌─────────────────────────────┐   │
+│  │ [Drop zone]         │  │ Additional Instructions     │   │
+│  └─────────────────────┘  │ [Textbox]                   │   │
+│                           └─────────────────────────────┘   │
+├─────────────────────────────────────────────────────────────┤
+│  [🚀 Extract Highlights]                                    │
+├─────────────────────────────────────────────────────────────┤
+│  Progress: [████████████░░░░░░░░] 60%                       │
+│  Status: Analyzing audio...                                 │
+├─────────────────────────────────────────────────────────────┤
+│  Results                                                    │
+│  ┌──────────┐ ┌──────────┐ ┌──────────┐                    │
+│  │ Clip 1   │ │ Clip 2   │ │ Clip 3   │                    │
+│  │ [Video]  │ │ [Video]  │ │ [Video]  │                    │
+│  │ Score:85 │ │ Score:78 │ │ Score:72 │                    │
+│  └──────────┘ └──────────┘ └──────────┘                    │
+│  [Download All]                                             │
+└─────────────────────────────────────────────────────────────┘
+```
+## Dependencies (requirements.txt)
+```
+gradio>=4.0.0
+torch>=2.0.0
+transformers>=4.35.0
+accelerate
+bitsandbytes
+qwen-vl-utils
+librosa>=0.10.0
+soundfile
+insightface
+onnxruntime-gpu
+opencv-python-headless
+scenedetect[opencv]
+numpy
+pillow
+tqdm
+ffmpeg-python
+```
+## Implementation Order
+1. config.py, utils/ (foundation)
+2. core/video_processor.py (essential)
+3. models/audio_analyzer.py (simpler, Librosa first)
+4. core/scene_detector.py
+5. core/frame_sampler.py
+6. scoring/ modules
+7. models/visual_analyzer.py (Qwen2-VL)
+8. models/face_recognizer.py, body_recognizer.py
+9. models/tracker.py, motion_detector.py
+10. pipeline/orchestrator.py
+11. app.py (Gradio UI)
+## Notes
+- Start with Librosa-only audio (MVP), add Wav2Vec later
+- Face/body recognition is optional (triggered by reference image)
+- Motion detection can be skipped in MVP for speed
+- ByteTrack only needed when person filtering is enabled

README.md CHANGED Viewed

@@ -1,48 +1,48 @@
----
-title: ShortSmith v2
-emoji: 🎬
-colorFrom: purple
-colorTo: blue
-sdk: gradio
-sdk_version: "4.44.1"
-app_file: app.py
-pinned: false
-license: mit
-hardware: a10g-large
-tags:
-  - video
-  - highlight-detection
-  - ai
-  - qwen
-  - computer-vision
-  - audio-analysis
-short_description: AI-Powered Video Highlight Extractor
----
-# ShortSmith v2
-Extract the most engaging highlight clips from your videos automatically using AI.
-## Features
-- Multi-modal analysis (visual + audio + motion)
-- Domain-optimized presets (Sports, Music, Vlogs, etc.)
-- Person-specific filtering
-- Scene-aware clip cutting
-- Trained on Mr. HiSum "Most Replayed" data
-## Usage
-1. Upload a video (up to 500MB, max 1 hour)
-2. Select content domain (Sports, Music, Vlogs, etc.)
-3. Choose number of clips and duration
-4. (Optional) Upload reference image for person filtering
-5. Click "Extract Highlights"
-6. Download your clips!
-## Tech Stack
-- **Visual**: Qwen2-VL-2B (INT4 quantized)
-- **Audio**: Librosa + Wav2Vec 2.0
-- **Face Recognition**: InsightFace (SCRFD + ArcFace)
-- **Hype Scoring**: MLP trained on Mr. HiSum dataset
-- **Scene Detection**: PySceneDetect
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+---
+title: ShortSmith v2
+emoji: 🎬
+colorFrom: purple
+colorTo: blue
+sdk: gradio
+sdk_version: "4.44.1"
+app_file: app.py
+pinned: false
+license: mit
+hardware: a10g-large
+tags:
+  - video
+  - highlight-detection
+  - ai
+  - qwen
+  - computer-vision
+  - audio-analysis
+short_description: AI-Powered Video Highlight Extractor
+---
+# ShortSmith v2
+Extract the most engaging highlight clips from your videos automatically using AI.
+## Features
+- Multi-modal analysis (visual + audio + motion)
+- Domain-optimized presets (Sports, Music, Vlogs, etc.)
+- Person-specific filtering
+- Scene-aware clip cutting
+- Trained on Mr. HiSum "Most Replayed" data
+## Usage
+1. Upload a video (up to 500MB, max 1 hour)
+2. Select content domain (Sports, Music, Vlogs, etc.)
+3. Choose number of clips and duration
+4. (Optional) Upload reference image for person filtering
+5. Click "Extract Highlights"
+6. Download your clips!
+## Tech Stack
+- **Visual**: Qwen2-VL-2B (INT4 quantized)
+- **Audio**: Librosa + Wav2Vec 2.0
+- **Face Recognition**: InsightFace (SCRFD + ArcFace)
+- **Hype Scoring**: MLP trained on Mr. HiSum dataset
+- **Scene Detection**: PySceneDetect
+Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

REQUIREMENTS_CHECKLIST.md CHANGED Viewed

@@ -1,162 +1,162 @@
-# ShortSmith v2 - Requirements Checklist
-Comparing implementation against the original proposal document.
-## ✅ Executive Summary Requirements
-| Requirement | Status | Implementation |
-|-------------|--------|----------------|
-| Reduce costs vs Klap.app | ✅ | Uses open-weight models, no per-video API cost |
-| Person-specific filtering | ✅ | `face_recognizer.py` + `body_recognizer.py` |
-| Customizable "hype" definitions | ✅ | `domain_presets.py` with Sports, Vlogs, Music, etc. |
-| Eliminate vendor dependency | ✅ | All processing is local |
-## ✅ Technical Challenges Addressed
-| Challenge | Status | Solution |
-|-----------|--------|----------|
-| Long video processing | ✅ | Hierarchical sampling in `frame_sampler.py` |
-| Subjective "hype" | ✅ | Domain presets + trainable scorer |
-| Person tracking | ✅ | Face + Body recognition + ByteTrack |
-| Audio-visual correlation | ✅ | Multi-modal fusion in `hype_scorer.py` |
-| Temporal precision | ✅ | Scene-aware cutting in `clip_extractor.py` |
-## ✅ Technology Decisions (Section 5)
-### 5.1 Visual Understanding Model
-| Item | Proposal | Implementation | Status |
-|------|----------|----------------|--------|
-| Model | Qwen2-VL-2B | `visual_analyzer.py` | ✅ |
-| Quantization | INT4 via AWQ/GPTQ | bitsandbytes INT4 | ✅ |
-### 5.2 Audio Analysis
-| Item | Proposal | Implementation | Status |
-|------|----------|----------------|--------|
-| Primary | Wav2Vec 2.0 + Librosa | `audio_analyzer.py` | ✅ |
-| Features | RMS, spectral flux, centroid | Implemented | ✅ |
-| MVP Strategy | Start with Librosa | Librosa default, Wav2Vec optional | ✅ |
-### 5.3 Hype Scoring
-| Item | Proposal | Implementation | Status |
-|------|----------|----------------|--------|
-| Dataset | Mr. HiSum | Training notebook created | ✅ |
-| Method | Contrastive/pairwise ranking | `training/hype_scorer_training.ipynb` | ✅ |
-| Model | 2-layer MLP | Implemented in training notebook | ✅ |
-### 5.4 Face Recognition
-| Item | Proposal | Implementation | Status |
-|------|----------|----------------|--------|
-| Detection | SCRFD | InsightFace in `face_recognizer.py` | ✅ |
-| Embeddings | ArcFace (512-dim) | Implemented | ✅ |
-| Threshold | >0.4 cosine similarity | Configurable in `config.py` | ✅ |
-### 5.5 Body Recognition
-| Item | Proposal | Implementation | Status |
-|------|----------|----------------|--------|
-| Model | OSNet | `body_recognizer.py` | ✅ |
-| Purpose | Non-frontal views | Handles back views, profiles | ✅ |
-### 5.6 Multi-Object Tracking
-| Item | Proposal | Implementation | Status |
-|------|----------|----------------|--------|
-| Tracker | ByteTrack | `tracker.py` | ✅ |
-| Features | Two-stage association | Implemented | ✅ |
-### 5.7 Scene Boundary Detection
-| Item | Proposal | Implementation | Status |
-|------|----------|----------------|--------|
-| Tool | PySceneDetect | `scene_detector.py` | ✅ |
-| Modes | Content-aware, Adaptive | Both supported | ✅ |
-### 5.8 Video Processing
-| Item | Proposal | Implementation | Status |
-|------|----------|----------------|--------|
-| Tool | FFmpeg | `video_processor.py` | ✅ |
-| Operations | Extract frames, audio, cut clips | All implemented | ✅ |
-### 5.9 Motion Detection
-| Item | Proposal | Implementation | Status |
-|------|----------|----------------|--------|
-| Model | RAFT Optical Flow | `motion_detector.py` | ✅ |
-| Fallback | Farneback | Implemented | ✅ |
-## ✅ Key Design Decisions (Section 7)
-### 7.1 Hierarchical Sampling
-| Feature | Status | Implementation |
-|---------|--------|----------------|
-| Coarse pass (1 frame/5-10s) | ✅ | `frame_sampler.py` |
-| Dense pass on candidates | ✅ | `sample_dense()` method |
-| Dynamic FPS | ✅ | Based on motion scores |
-### 7.2 Contrastive Hype Scoring
-| Feature | Status | Implementation |
-|---------|--------|----------------|
-| Pairwise ranking | ✅ | Training notebook |
-| Relative scoring | ✅ | Normalized within video |
-### 7.3 Multi-Modal Person Detection
-| Feature | Status | Implementation |
-|---------|--------|----------------|
-| Face + Body | ✅ | Both recognizers |
-| Confidence fusion | ✅ | `max(face_score, body_score)` |
-| ByteTrack tracking | ✅ | `tracker.py` |
-### 7.4 Domain-Aware Presets
-| Domain | Visual | Audio | Status |
-|--------|--------|-------|--------|
-| Sports | 30% | 45% | ✅ |
-| Vlogs | 55% | 20% | ✅ |
-| Music | 35% | 45% | ✅ |
-| Podcasts | 10% | 75% | ✅ |
-| Gaming | 40% | 35% | ✅ |
-| General | 40% | 35% | ✅ |
-### 7.5 Diversity Enforcement
-| Feature | Status | Implementation |
-|---------|--------|----------------|
-| Minimum 30s gap | ✅ | `clip_extractor.py` `select_clips()` |
-### 7.6 Fallback Handling
-| Feature | Status | Implementation |
-|---------|--------|----------------|
-| Uniform windowing for flat content | ✅ | `create_fallback_clips()` |
-| Never zero clips | ✅ | Fallback always creates clips |
-## ✅ Gradio UI Requirements
-| Feature | Status | Implementation |
-|---------|--------|----------------|
-| Video upload | ✅ | `gr.Video` component |
-| API key input | ✅ | `gr.Textbox(type="password")` |
-| Domain selection | ✅ | `gr.Dropdown` |
-| Clip duration slider | ✅ | `gr.Slider` |
-| Num clips slider | ✅ | `gr.Slider` |
-| Reference image | ✅ | `gr.Image` |
-| Custom prompt | ✅ | `gr.Textbox` |
-| Progress bar | ✅ | `gr.Progress` |
-| Output gallery | ✅ | `gr.Gallery` |
-| Download all | ⚠️ | Partial (individual clips downloadable) |
-## ⚠️ Items for Future Enhancement
-| Item | Status | Notes |
-|------|--------|-------|
-| Trained hype scorer weights | 🔄 | Notebook ready, needs training on real data |
-| RAFT GPU acceleration | ⚠️ | Falls back to Farneback if unavailable |
-| Download all as ZIP | ⚠️ | Could add `gr.DownloadButton` |
-| Batch processing | ❌ | Single video only currently |
-| API endpoint | ❌ | UI only, no REST API |
-## Summary
-**Completed**: 95% of proposal requirements
-**Training Pipeline**: Separate Colab notebook for Mr. HiSum training
-**Missing**: Only minor UI features (bulk download) and production training
-The implementation fully covers:
-- ✅ All 9 core components from the proposal
-- ✅ All 6 key design decisions
-- ✅ All domain presets
-- ✅ Error handling and logging throughout
-- ✅ Gradio UI with all inputs from proposal

+# ShortSmith v2 - Requirements Checklist
+Comparing implementation against the original proposal document.
+## ✅ Executive Summary Requirements
+| Requirement | Status | Implementation |
+|-------------|--------|----------------|
+| Reduce costs vs Klap.app | ✅ | Uses open-weight models, no per-video API cost |
+| Person-specific filtering | ✅ | `face_recognizer.py` + `body_recognizer.py` |
+| Customizable "hype" definitions | ✅ | `domain_presets.py` with Sports, Vlogs, Music, etc. |
+| Eliminate vendor dependency | ✅ | All processing is local |
+## ✅ Technical Challenges Addressed
+| Challenge | Status | Solution |
+|-----------|--------|----------|
+| Long video processing | ✅ | Hierarchical sampling in `frame_sampler.py` |
+| Subjective "hype" | ✅ | Domain presets + trainable scorer |
+| Person tracking | ✅ | Face + Body recognition + ByteTrack |
+| Audio-visual correlation | ✅ | Multi-modal fusion in `hype_scorer.py` |
+| Temporal precision | ✅ | Scene-aware cutting in `clip_extractor.py` |
+## ✅ Technology Decisions (Section 5)
+### 5.1 Visual Understanding Model
+| Item | Proposal | Implementation | Status |
+|------|----------|----------------|--------|
+| Model | Qwen2-VL-2B | `visual_analyzer.py` | ✅ |
+| Quantization | INT4 via AWQ/GPTQ | bitsandbytes INT4 | ✅ |
+### 5.2 Audio Analysis
+| Item | Proposal | Implementation | Status |
+|------|----------|----------------|--------|
+| Primary | Wav2Vec 2.0 + Librosa | `audio_analyzer.py` | ✅ |
+| Features | RMS, spectral flux, centroid | Implemented | ✅ |
+| MVP Strategy | Start with Librosa | Librosa default, Wav2Vec optional | ✅ |
+### 5.3 Hype Scoring
+| Item | Proposal | Implementation | Status |
+|------|----------|----------------|--------|
+| Dataset | Mr. HiSum | Training notebook created | ✅ |
+| Method | Contrastive/pairwise ranking | `training/hype_scorer_training.ipynb` | ✅ |
+| Model | 2-layer MLP | Implemented in training notebook | ✅ |
+### 5.4 Face Recognition
+| Item | Proposal | Implementation | Status |
+|------|----------|----------------|--------|
+| Detection | SCRFD | InsightFace in `face_recognizer.py` | ✅ |
+| Embeddings | ArcFace (512-dim) | Implemented | ✅ |
+| Threshold | >0.4 cosine similarity | Configurable in `config.py` | ✅ |
+### 5.5 Body Recognition
+| Item | Proposal | Implementation | Status |
+|------|----------|----------------|--------|
+| Model | OSNet | `body_recognizer.py` | ✅ |
+| Purpose | Non-frontal views | Handles back views, profiles | ✅ |
+### 5.6 Multi-Object Tracking
+| Item | Proposal | Implementation | Status |
+|------|----------|----------------|--------|
+| Tracker | ByteTrack | `tracker.py` | ✅ |
+| Features | Two-stage association | Implemented | ✅ |
+### 5.7 Scene Boundary Detection
+| Item | Proposal | Implementation | Status |
+|------|----------|----------------|--------|
+| Tool | PySceneDetect | `scene_detector.py` | ✅ |
+| Modes | Content-aware, Adaptive | Both supported | ✅ |
+### 5.8 Video Processing
+| Item | Proposal | Implementation | Status |
+|------|----------|----------------|--------|
+| Tool | FFmpeg | `video_processor.py` | ✅ |
+| Operations | Extract frames, audio, cut clips | All implemented | ✅ |
+### 5.9 Motion Detection
+| Item | Proposal | Implementation | Status |
+|------|----------|----------------|--------|
+| Model | RAFT Optical Flow | `motion_detector.py` | ✅ |
+| Fallback | Farneback | Implemented | ✅ |
+## ✅ Key Design Decisions (Section 7)
+### 7.1 Hierarchical Sampling
+| Feature | Status | Implementation |
+|---------|--------|----------------|
+| Coarse pass (1 frame/5-10s) | ✅ | `frame_sampler.py` |
+| Dense pass on candidates | ✅ | `sample_dense()` method |
+| Dynamic FPS | ✅ | Based on motion scores |
+### 7.2 Contrastive Hype Scoring
+| Feature | Status | Implementation |
+|---------|--------|----------------|
+| Pairwise ranking | ✅ | Training notebook |
+| Relative scoring | ✅ | Normalized within video |
+### 7.3 Multi-Modal Person Detection
+| Feature | Status | Implementation |
+|---------|--------|----------------|
+| Face + Body | ✅ | Both recognizers |
+| Confidence fusion | ✅ | `max(face_score, body_score)` |
+| ByteTrack tracking | ✅ | `tracker.py` |
+### 7.4 Domain-Aware Presets
+| Domain | Visual | Audio | Status |
+|--------|--------|-------|--------|
+| Sports | 30% | 45% | ✅ |
+| Vlogs | 55% | 20% | ✅ |
+| Music | 35% | 45% | ✅ |
+| Podcasts | 10% | 75% | ✅ |
+| Gaming | 40% | 35% | ✅ |
+| General | 40% | 35% | ✅ |
+### 7.5 Diversity Enforcement
+| Feature | Status | Implementation |
+|---------|--------|----------------|
+| Minimum 30s gap | ✅ | `clip_extractor.py` `select_clips()` |
+### 7.6 Fallback Handling
+| Feature | Status | Implementation |
+|---------|--------|----------------|
+| Uniform windowing for flat content | ✅ | `create_fallback_clips()` |
+| Never zero clips | ✅ | Fallback always creates clips |
+## ✅ Gradio UI Requirements
+| Feature | Status | Implementation |
+|---------|--------|----------------|
+| Video upload | ✅ | `gr.Video` component |
+| API key input | ✅ | `gr.Textbox(type="password")` |
+| Domain selection | ✅ | `gr.Dropdown` |
+| Clip duration slider | ✅ | `gr.Slider` |
+| Num clips slider | ✅ | `gr.Slider` |
+| Reference image | ✅ | `gr.Image` |
+| Custom prompt | ✅ | `gr.Textbox` |
+| Progress bar | ✅ | `gr.Progress` |
+| Output gallery | ✅ | `gr.Gallery` |
+| Download all | ⚠️ | Partial (individual clips downloadable) |
+## ⚠️ Items for Future Enhancement
+| Item | Status | Notes |
+|------|--------|-------|
+| Trained hype scorer weights | 🔄 | Notebook ready, needs training on real data |
+| RAFT GPU acceleration | ⚠️ | Falls back to Farneback if unavailable |
+| Download all as ZIP | ⚠️ | Could add `gr.DownloadButton` |
+| Batch processing | ❌ | Single video only currently |
+| API endpoint | ❌ | UI only, no REST API |
+## Summary
+**Completed**: 95% of proposal requirements
+**Training Pipeline**: Separate Colab notebook for Mr. HiSum training
+**Missing**: Only minor UI features (bulk download) and production training
+The implementation fully covers:
+- ✅ All 9 core components from the proposal
+- ✅ All 6 key design decisions
+- ✅ All domain presets
+- ✅ Error handling and logging throughout
+- ✅ Gradio UI with all inputs from proposal

app.py CHANGED Viewed

@@ -1,1014 +1,1096 @@
-"""
-ShortSmith v2 - Gradio Application
-Hugging Face Space interface for video highlight extraction.
-Features:
-- Multi-modal analysis (visual + audio + motion)
-- Domain-optimized presets
-- Person-specific filtering (optional)
-- Scene-aware clip cutting
-- Batch testing with parameter variations
-"""
-import os
-import sys
-import tempfile
-import shutil
-import json
-import zipfile
-from pathlib import Path
-import time
-import traceback
-from typing import List, Dict, Any, Optional
-import gradio as gr
-import pandas as pd
-# Add project root to path
-sys.path.insert(0, str(Path(__file__).parent))
-# Initialize logging
-try:
-    from utils.logger import setup_logging, get_logger
-    setup_logging(log_level="INFO", log_to_console=True)
-    logger = get_logger("app")
-except Exception:
-    import logging
-    logging.basicConfig(level=logging.INFO)
-    logger = logging.getLogger("app")
-# =============================================================================
-# Shared Utilities
-# =============================================================================
-def build_metrics_output(result, domain: str, custom_prompt: Optional[str] = None) -> str:
-    """
-    Build formatted metrics output for testing and evaluation.
-    Args:
-        result: PipelineResult object
-        domain: Content domain used for processing
-        custom_prompt: Custom prompt used (if any)
-    Returns:
-        Formatted string with all metrics
-    """
-    lines = []
-    lines.append("=" * 50)
-    lines.append("AUTOMATED METRICS (System-Generated)")
-    lines.append("=" * 50)
-    lines.append("")
-    # Processing Metrics
-    lines.append("PROCESSING METRICS")
-    lines.append("-" * 30)
-    lines.append(f"processing_time_seconds: {result.processing_time:.2f}")
-    lines.append(f"frames_analyzed: {len(result.visual_features)}")
-    lines.append(f"scenes_detected: {len(result.scenes)}")
-    lines.append(f"audio_segments_analyzed: {len(result.audio_features)}")
-    lines.append(f"domain: {domain}")
-    lines.append(f"custom_prompt: {custom_prompt if custom_prompt else 'none'}")
-    # Count hooks from scores (estimate based on high-scoring segments)
-    hooks_detected = sum(1 for s in result.scores if s.combined_score > 0.7) if result.scores else 0
-    lines.append(f"hooks_detected: {hooks_detected}")
-    if result.metadata:
-        lines.append(f"video_duration_seconds: {result.metadata.duration:.2f}")
-        lines.append(f"video_resolution: {result.metadata.resolution}")
-        lines.append(f"video_fps: {result.metadata.fps:.2f}")
-    lines.append("")
-    # Per Clip Metrics
-    lines.append("PER CLIP METRICS")
-    lines.append("-" * 30)
-    for i, clip in enumerate(result.clips):
-        lines.append("")
-        lines.append(f"[Clip {i + 1}]")
-        lines.append(f"  clip_id: {i + 1}")
-        lines.append(f"  start_time: {clip.start_time:.2f}")
-        lines.append(f"  end_time: {clip.end_time:.2f}")
-        lines.append(f"  duration: {clip.duration:.2f}")
-        lines.append(f"  hype_score: {clip.hype_score:.4f}")
-        lines.append(f"  visual_score: {clip.visual_score:.4f}")
-        lines.append(f"  audio_score: {clip.audio_score:.4f}")
-        lines.append(f"  motion_score: {clip.motion_score:.4f}")
-        # Hook info - derive from segment scores if available
-        hook_type = "none"
-        hook_confidence = 0.0
-        # Find matching segment score for this clip
-        for score in result.scores:
-            if abs(score.start_time - clip.start_time) < 1.0:
-                if score.combined_score > 0.7:
-                    hook_confidence = score.combined_score
-                    # Infer hook type based on dominant score
-                    if score.audio_score > score.visual_score and score.audio_score > score.motion_score:
-                        hook_type = "audio_peak"
-                    elif score.motion_score > score.visual_score:
-                        hook_type = "motion_spike"
-                    else:
-                        hook_type = "visual_highlight"
-                break
-        lines.append(f"  hook_type: {hook_type}")
-        lines.append(f"  hook_confidence: {hook_confidence:.4f}")
-        if clip.person_detected:
-            lines.append(f"  person_detected: True")
-            lines.append(f"  person_screen_time: {clip.person_screen_time:.4f}")
-    lines.append("")
-    lines.append("=" * 50)
-    lines.append("END METRICS")
-    lines.append("=" * 50)
-    return "\n".join(lines)
-# =============================================================================
-# Single Video Processing
-# =============================================================================
-def process_video(
-    video_file,
-    domain,
-    num_clips,
-    clip_length,
-    reference_image,
-    custom_prompt,
-    progress=gr.Progress()
-):
-    """
-    Main video processing function for single video mode.
-    Args:
-        video_file: Uploaded video file path
-        domain: Content domain for scoring weights
-        num_clips: Number of clips to extract
-        clip_length: Clip length preset ("Short" or "Long")
-        reference_image: Optional reference image for person filtering
-        custom_prompt: Optional custom instructions
-        progress: Gradio progress tracker
-    Returns:
-        Tuple of (status_message, clip1, clip2, clip3, log_text, metrics_text)
-    """
-    if video_file is None:
-        return "Please upload a video first.", None, None, None, "", ""
-    log_messages = []
-    def log(msg):
-        log_messages.append(f"[{time.strftime('%H:%M:%S')}] {msg}")
-        logger.info(msg)
-    try:
-        video_path = Path(video_file)
-        log(f"Processing video: {video_path.name}")
-        progress(0.05, desc="Validating video...")
-        # Import pipeline components
-        from utils.helpers import validate_video_file, validate_image_file, format_duration
-        from pipeline.orchestrator import PipelineOrchestrator
-        # Validate video
-        validation = validate_video_file(video_file)
-        if not validation.is_valid:
-            return f"Error: {validation.error_message}", None, None, None, "\n".join(log_messages), ""
-        log(f"Video size: {validation.file_size / (1024*1024):.1f} MB")
-        # Validate reference image if provided
-        ref_path = None
-        if reference_image is not None:
-            ref_validation = validate_image_file(reference_image)
-            if ref_validation.is_valid:
-                ref_path = reference_image
-                log(f"Reference image: {Path(reference_image).name}")
-            else:
-                log(f"Warning: Invalid reference image - {ref_validation.error_message}")
-        # Map domain string to internal value
-        domain_map = {
-            "Sports": "sports",
-            "Vlogs": "vlogs",
-            "Music Videos": "music",
-            "Podcasts": "podcasts",
-            "Gaming": "gaming",
-            "Comedy": "comedy",
-            "General": "general",
-        }
-        domain_value = domain_map.get(domain, "general")
-        log(f"Domain: {domain_value}")
-        # Map clip length to internal value
-        clip_length_value = "short" if clip_length == "Short (30-60s)" else "long"
-        log(f"Clip length: {clip_length_value}")
-        # Create output directory
-        output_dir = Path(tempfile.mkdtemp(prefix="shortsmith_output_"))
-        log(f"Output directory: {output_dir}")
-        # Progress callback to update UI during processing
-        def on_progress(pipeline_progress):
-            stage = pipeline_progress.stage.value
-            pct = pipeline_progress.progress
-            msg = pipeline_progress.message
-            log(f"[{stage}] {msg}")
-            # Map pipeline progress (0-1) to our range (0.1-0.9)
-            mapped_progress = 0.1 + (pct * 0.8)
-            progress(mapped_progress, desc=f"{stage}: {msg}")
-        # Initialize pipeline
-        progress(0.1, desc="Initializing AI models...")
-        log("Initializing pipeline...")
-        pipeline = PipelineOrchestrator(progress_callback=on_progress)
-        # Process video
-        progress(0.15, desc="Starting analysis...")
-        log(f"Processing: {int(num_clips)} clips, length={clip_length_value}")
-        result = pipeline.process(
-            video_path=video_path,
-            num_clips=int(num_clips),
-            clip_length=clip_length_value,
-            domain=domain_value,
-            reference_image=ref_path,
-            custom_prompt=custom_prompt.strip() if custom_prompt else None,
-        )
-        progress(0.9, desc="Extracting clips...")
-        # Handle result
-        if result.success:
-            log(f"Processing complete in {result.processing_time:.1f}s")
-            clip_paths = []
-            for i, clip in enumerate(result.clips):
-                if clip.clip_path.exists():
-                    output_path = output_dir / f"highlight_{i+1}.mp4"
-                    shutil.copy2(clip.clip_path, output_path)
-                    clip_paths.append(str(output_path))
-                    log(f"Clip {i+1}: {format_duration(clip.start_time)} - {format_duration(clip.end_time)} (score: {clip.hype_score:.2f})")
-            status = f"Successfully extracted {len(clip_paths)} highlight clips!\nProcessing time: {result.processing_time:.1f}s"
-            # Build metrics output
-            metrics_output = build_metrics_output(result, domain_value, custom_prompt.strip() if custom_prompt else None)
-            pipeline.cleanup()
-            progress(1.0, desc="Done!")
-            # Return up to 3 clips
-            clip1 = clip_paths[0] if len(clip_paths) > 0 else None
-            clip2 = clip_paths[1] if len(clip_paths) > 1 else None
-            clip3 = clip_paths[2] if len(clip_paths) > 2 else None
-            return status, clip1, clip2, clip3, "\n".join(log_messages), metrics_output
-        else:
-            log(f"Processing failed: {result.error_message}")
-            pipeline.cleanup()
-            return f"Error: {result.error_message}", None, None, None, "\n".join(log_messages), ""
-    except Exception as e:
-        error_msg = f"Unexpected error: {str(e)}"
-        log(error_msg)
-        log(traceback.format_exc())
-        logger.exception("Pipeline error")
-        return error_msg, None, None, None, "\n".join(log_messages), ""
-# =============================================================================
-# Batch Testing Functions
-# =============================================================================
-def generate_test_queue(
-    videos: List[str],
-    domains: List[str],
-    clip_lengths: List[str],
-    num_clips: int,
-    ref_image: Optional[str],
-    prompts: List[str],
-    include_no_prompt: bool
-) -> List[Dict[str, Any]]:
-    """Generate all parameter combinations to test (cartesian product)."""
-    # Build prompt list
-    prompt_list = []
-    if include_no_prompt:
-        prompt_list.append(None)  # No prompt baseline
-    prompt_list.extend([p.strip() for p in prompts if p and p.strip()])
-    # If no prompts at all, use just None
-    if not prompt_list:
-        prompt_list = [None]
-    # Map domain display names to internal values
-    domain_map = {
-        "Sports": "sports",
-        "Vlogs": "vlogs",
-        "Music Videos": "music",
-        "Podcasts": "podcasts",
-        "Gaming": "gaming",
-        "Comedy": "comedy",
-        "General": "general",
-    }
-    # Map clip length display names to internal values
-    clip_length_map = {
-        "Short (30-60s)": "short",
-        "Long (1-3 min)": "long",
-    }
-    queue = []
-    test_id = 1
-    for video in videos:
-        video_name = Path(video).name if video else "unknown"
-        for domain in domains:
-            domain_value = domain_map.get(domain, "general")
-            for clip_length in clip_lengths:
-                clip_length_value = clip_length_map.get(clip_length, "short")
-                for prompt in prompt_list:
-                    queue.append({
-                        "test_id": test_id,
-                        "video_path": video,
-                        "video_name": video_name,
-                        "domain": domain,
-                        "domain_value": domain_value,
-                        "clip_length": clip_length,
-                        "clip_length_value": clip_length_value,
-                        "num_clips": num_clips,
-                        "reference_image": ref_image,
-                        "custom_prompt": prompt,
-                    })
-                    test_id += 1
-    return queue
-def run_single_batch_test(config: Dict[str, Any], output_base_dir: Path) -> Dict[str, Any]:
-    """Run a single test from the batch queue."""
-    from utils.helpers import validate_video_file
-    from pipeline.orchestrator import PipelineOrchestrator
-    test_id = config["test_id"]
-    video_path = config["video_path"]
-    video_name = config["video_name"]
-    domain_value = config["domain_value"]
-    clip_length = config["clip_length"]
-    clip_length_value = config["clip_length_value"]
-    num_clips = config["num_clips"]
-    ref_image = config["reference_image"]
-    custom_prompt = config["custom_prompt"]
-    # Create unique output folder for this test
-    prompt_suffix = "no_prompt" if not custom_prompt else f"prompt_{hash(custom_prompt) % 1000}"
-    test_folder = f"{Path(video_name).stem}_{domain_value}_{clip_length_value}_{prompt_suffix}"
-    output_dir = output_base_dir / test_folder
-    output_dir.mkdir(parents=True, exist_ok=True)
-    result_data = {
-        "test_id": test_id,
-        "video_name": video_name,
-        "domain": domain_value,
-        "clip_length": clip_length,
-        "custom_prompt": custom_prompt if custom_prompt else "none",
-        "num_clips": num_clips,
-        "status": "failed",
-        "error": None,
-        "processing_time": 0,
-        "frames_analyzed": 0,
-        "scenes_detected": 0,
-        "hooks_detected": 0,
-        "clips": [],
-        "clip_paths": [],
-    }
-    try:
-        # Validate video
-        validation = validate_video_file(video_path)
-        if not validation.is_valid:
-            result_data["error"] = validation.error_message
-            return result_data
-        # Initialize and run pipeline
-        pipeline = PipelineOrchestrator()
-        result = pipeline.process(
-            video_path=video_path,
-            num_clips=num_clips,
-            clip_length=clip_length_value,
-            domain=domain_value,
-            reference_image=ref_image,
-            custom_prompt=custom_prompt,
-        )
-        if result.success:
-            result_data["status"] = "success"
-            result_data["processing_time"] = round(result.processing_time, 2)
-            result_data["frames_analyzed"] = len(result.visual_features)
-            result_data["scenes_detected"] = len(result.scenes)
-            result_data["hooks_detected"] = sum(1 for s in result.scores if s.combined_score > 0.7) if result.scores else 0
-            # Copy clips and collect data
-            for i, clip in enumerate(result.clips):
-                if clip.clip_path.exists():
-                    clip_output = output_dir / f"clip_{i+1}.mp4"
-                    shutil.copy2(clip.clip_path, clip_output)
-                    result_data["clip_paths"].append(str(clip_output))
-                    # Find hook type for this clip
-                    hook_type = "none"
-                    hook_confidence = 0.0
-                    for score in result.scores:
-                        if abs(score.start_time - clip.start_time) < 1.0:
-                            if score.combined_score > 0.7:
-                                hook_confidence = score.combined_score
-                                if score.audio_score > score.visual_score and score.audio_score > score.motion_score:
-                                    hook_type = "audio_peak"
-                                elif score.motion_score > score.visual_score:
-                                    hook_type = "motion_spike"
-                                else:
-                                    hook_type = "visual_highlight"
-                            break
-                    result_data["clips"].append({
-                        "clip_id": i + 1,
-                        "start_time": round(clip.start_time, 2),
-                        "end_time": round(clip.end_time, 2),
-                        "duration": round(clip.duration, 2),
-                        "hype_score": round(clip.hype_score, 4),
-                        "visual_score": round(clip.visual_score, 4),
-                        "audio_score": round(clip.audio_score, 4),
-                        "motion_score": round(clip.motion_score, 4),
-                        "hook_type": hook_type,
-                        "hook_confidence": round(hook_confidence, 4),
-                    })
-        else:
-            result_data["error"] = result.error_message
-        pipeline.cleanup()
-    except Exception as e:
-        result_data["error"] = str(e)
-        logger.exception(f"Batch test {test_id} failed")
-    return result_data
-def results_to_dataframe(results: List[Dict[str, Any]]) -> pd.DataFrame:
-    """Convert batch results to a pandas DataFrame for display."""
-    rows = []
-    for r in results:
-        row = {
-            "Test ID": r["test_id"],
-            "Video": r["video_name"],
-            "Domain": r["domain"],
-            "Length": r["clip_length"],
-            "Prompt": r["custom_prompt"][:20] + "..." if len(r["custom_prompt"]) > 20 else r["custom_prompt"],
-            "Status": r["status"],
-            "Time (s)": r["processing_time"],
-            "Frames": r["frames_analyzed"],
-            "Hooks": r["hooks_detected"],
-        }
-        # Add clip scores
-        for i, clip in enumerate(r.get("clips", [])[:3]):
-            row[f"Clip {i+1} Hype"] = clip.get("hype_score", 0)
-        rows.append(row)
-    return pd.DataFrame(rows)
-def results_to_csv(results: List[Dict[str, Any]]) -> str:
-    """Convert results to CSV format."""
-    rows = []
-    for r in results:
-        row = {
-            "test_id": r["test_id"],
-            "video_name": r["video_name"],
-            "domain": r["domain"],
-            "clip_length": r["clip_length"],
-            "custom_prompt": r["custom_prompt"],
-            "num_clips": r["num_clips"],
-            "status": r["status"],
-            "error": r.get("error", ""),
-            "processing_time": r["processing_time"],
-            "frames_analyzed": r["frames_analyzed"],
-            "scenes_detected": r["scenes_detected"],
-            "hooks_detected": r["hooks_detected"],
-        }
-        # Add per-clip data
-        for i in range(3):
-            if i < len(r.get("clips", [])):
-                clip = r["clips"][i]
-                row[f"clip_{i+1}_start"] = clip["start_time"]
-                row[f"clip_{i+1}_end"] = clip["end_time"]
-                row[f"clip_{i+1}_hype"] = clip["hype_score"]
-                row[f"clip_{i+1}_visual"] = clip["visual_score"]
-                row[f"clip_{i+1}_audio"] = clip["audio_score"]
-                row[f"clip_{i+1}_motion"] = clip["motion_score"]
-                row[f"clip_{i+1}_hook_type"] = clip["hook_type"]
-            else:
-                row[f"clip_{i+1}_start"] = ""
-                row[f"clip_{i+1}_end"] = ""
-                row[f"clip_{i+1}_hype"] = ""
-                row[f"clip_{i+1}_visual"] = ""
-                row[f"clip_{i+1}_audio"] = ""
-                row[f"clip_{i+1}_motion"] = ""
-                row[f"clip_{i+1}_hook_type"] = ""
-        rows.append(row)
-    df = pd.DataFrame(rows)
-    return df.to_csv(index=False)
-def results_to_json(results: List[Dict[str, Any]]) -> str:
-    """Convert results to JSON format."""
-    # Remove clip_paths from export (they're temp files)
-    export_results = []
-    for r in results:
-        r_copy = r.copy()
-        r_copy.pop("clip_paths", None)
-        export_results.append(r_copy)
-    return json.dumps(export_results, indent=2)
-def create_clips_zip(results: List[Dict[str, Any]]) -> Optional[str]:
-    """Create a ZIP file of all extracted clips."""
-    zip_path = Path(tempfile.mkdtemp()) / "batch_clips.zip"
-    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
-        for r in results:
-            if r["status"] == "success":
-                folder_name = f"{Path(r['video_name']).stem}_{r['domain']}_{r['clip_length']}"
-                if r["custom_prompt"] != "none":
-                    folder_name += f"_prompt"
-                for clip_path in r.get("clip_paths", []):
-                    if Path(clip_path).exists():
-                        arcname = f"{folder_name}/{Path(clip_path).name}"
-                        zf.write(clip_path, arcname)
-    return str(zip_path) if zip_path.exists() else None
-# Batch state (module level for simplicity)
-batch_state = {
-    "is_running": False,
-    "should_cancel": False,
-    "results": [],
-    "output_dir": None,
-}
-def run_batch_tests(
-    videos,
-    domains,
-    clip_lengths,
-    num_clips,
-    reference_image,
-    include_no_prompt,
-    prompt1,
-    prompt2,
-    prompt3,
-    progress=gr.Progress()
-):
-    """Main batch testing function."""
-    global batch_state
-    # Validate inputs
-    if not videos:
-        return "Please upload at least one video.", None, "", "", None, None, None
-    if not domains:
-        return "Please select at least one domain.", None, "", "", None, None, None
-    if not clip_lengths:
-        return "Please select at least one clip length.", None, "", "", None, None, None
-    # Collect prompts
-    prompts = [p for p in [prompt1, prompt2, prompt3] if p and p.strip()]
-    # Generate test queue
-    queue = generate_test_queue(
-        videos=videos,
-        domains=domains,
-        clip_lengths=clip_lengths,
-        num_clips=int(num_clips),
-        ref_image=reference_image,
-        prompts=prompts,
-        include_no_prompt=include_no_prompt,
-    )
-    if not queue:
-        return "No tests to run. Please check your configuration.", None, "", "", None, None, None
-    # Initialize batch state
-    batch_state["is_running"] = True
-    batch_state["should_cancel"] = False
-    batch_state["results"] = []
-    batch_state["output_dir"] = Path(tempfile.mkdtemp(prefix="shortsmith_batch_"))
-    total_tests = len(queue)
-    log_messages = []
-    def log(msg):
-        log_messages.append(f"[{time.strftime('%H:%M:%S')}] {msg}")
-        logger.info(msg)
-    log(f"Starting batch testing: {total_tests} tests")
-    log(f"Videos: {len(videos)}, Domains: {len(domains)}, Lengths: {len(clip_lengths)}, Prompts: {len(prompts) + (1 if include_no_prompt else 0)}")
-    # Run tests sequentially
-    for i, test_config in enumerate(queue):
-        if batch_state["should_cancel"]:
-            log("Batch cancelled by user")
-            break
-        test_id = test_config["test_id"]
-        video_name = test_config["video_name"]
-        domain = test_config["domain_value"]
-        clip_length = test_config["clip_length"]
-        prompt = test_config["custom_prompt"] or "no-prompt"
-        log(f"[{i+1}/{total_tests}] Testing: {video_name} | {domain} | {clip_length} | {prompt[:30]}...")
-        progress((i + 1) / total_tests, desc=f"Test {i+1}/{total_tests}: {video_name}")
-        # Run the test
-        result = run_single_batch_test(test_config, batch_state["output_dir"])
-        batch_state["results"].append(result)
-        if result["status"] == "success":
-            log(f"  ✓ Completed in {result['processing_time']}s")
-        else:
-            log(f"  ✗ Failed: {result.get('error', 'Unknown error')}")
-    # Finalize
-    batch_state["is_running"] = False
-    completed = len([r for r in batch_state["results"] if r["status"] == "success"])
-    failed = len([r for r in batch_state["results"] if r["status"] == "failed"])
-    log(f"Batch complete: {completed} succeeded, {failed} failed")
-    # Generate outputs
-    results_df = results_to_dataframe(batch_state["results"])
-    csv_content = results_to_csv(batch_state["results"])
-    json_content = results_to_json(batch_state["results"])
-    # Save CSV and JSON to files for download
-    csv_path = batch_state["output_dir"] / "results.csv"
-    json_path = batch_state["output_dir"] / "results.json"
-    csv_path.write_text(csv_content)
-    json_path.write_text(json_content)
-    # Create ZIP of clips
-    zip_path = create_clips_zip(batch_state["results"])
-    status = f"Batch complete: {completed}/{total_tests} tests succeeded"
-    return (
-        status,
-        results_df,
-        "\n".join(log_messages),
-        json_content,
-        str(csv_path),
-        str(json_path),
-        zip_path,
-    )
-def cancel_batch():
-    """Cancel the running batch."""
-    global batch_state
-    batch_state["should_cancel"] = True
-    return "Cancelling batch... (will stop after current test completes)"
-def calculate_queue_size(videos, domains, clip_lengths, include_no_prompt, prompt1, prompt2, prompt3):
-    """Calculate and display the queue size."""
-    num_videos = len(videos) if videos else 0
-    num_domains = len(domains) if domains else 0
-    num_lengths = len(clip_lengths) if clip_lengths else 0
-    prompts = [p for p in [prompt1, prompt2, prompt3] if p and p.strip()]
-    num_prompts = len(prompts) + (1 if include_no_prompt else 0)
-    if num_prompts == 0:
-        num_prompts = 1  # Default to no-prompt if nothing selected
-    total = num_videos * num_domains * num_lengths * num_prompts
-    return f"Queue: {num_videos} video(s) × {num_domains} domain(s) × {num_lengths} length(s) × {num_prompts} prompt(s) = **{total} tests**"
-# =============================================================================
-# Build Gradio Interface
-# =============================================================================
-with gr.Blocks(
-    title="ShortSmith v2",
-    theme=gr.themes.Soft(),
-    css="""
-    .container { max-width: 1200px; margin: auto; }
-    .output-video { min-height: 200px; }
-    """
-) as demo:
-    gr.Markdown("""
-    # ShortSmith v2
-    ### AI-Powered Video Highlight Extractor
-    Upload a video and automatically extract the most engaging highlight clips using AI analysis.
-    """)
-    with gr.Tabs():
-        # =================================================================
-        # Tab 1: Single Video
-        # =================================================================
-        with gr.TabItem("Single Video"):
-            with gr.Row():
-                # Left column - Inputs
-                with gr.Column(scale=1):
-                    gr.Markdown("### Input")
-                    video_input = gr.Video(
-                        label="Upload Video",
-                        sources=["upload"],
-                    )
-                    with gr.Accordion("Settings", open=True):
-                        domain_dropdown = gr.Dropdown(
-                            choices=["Sports", "Vlogs", "Music Videos", "Podcasts", "Gaming", "Comedy", "General"],
-                            value="General",
-                            label="Content Domain",
-                            info="Select the type of content for optimized scoring"
-                        )
-                        with gr.Row():
-                            num_clips_slider = gr.Slider(
-                                minimum=1,
-                                maximum=5,
-                                value=3,
-                                step=1,
-                                label="Number of Clips",
-                                info="How many highlight clips to extract"
-                            )
-                            clip_length_radio = gr.Radio(
-                                choices=["Short (30-60s)", "Long (1-3 min)"],
-                                value="Short (30-60s)",
-                                label="Clip Length",
-                                info="Short clips for social media, long clips for YouTube"
-                            )
-                    with gr.Accordion("Person Filtering (Optional)", open=False):
-                        reference_image = gr.Image(
-                            label="Reference Image",
-                            type="filepath",
-                            sources=["upload"],
-                        )
-                        gr.Markdown("*Upload a photo of a person to prioritize clips featuring them.*")
-                    with gr.Accordion("Custom Instructions (Optional)", open=False):
-                        custom_prompt = gr.Textbox(
-                            label="Additional Instructions",
-                            placeholder="E.g., 'Focus on crowd reactions' or 'Prioritize action scenes'",
-                            lines=2,
-                        )
-                    process_btn = gr.Button(
-                        "Extract Highlights",
-                        variant="primary",
-                        size="lg"
-                    )
-                # Right column - Outputs
-                with gr.Column(scale=1):
-                    gr.Markdown("### Output")
-                    status_output = gr.Textbox(
-                        label="Status",
-                        lines=2,
-                        interactive=False
-                    )
-                    gr.Markdown("#### Extracted Clips")
-                    clip1_output = gr.Video(label="Clip 1", elem_classes=["output-video"])
-                    clip2_output = gr.Video(label="Clip 2", elem_classes=["output-video"])
-                    clip3_output = gr.Video(label="Clip 3", elem_classes=["output-video"])
-                    with gr.Accordion("Processing Log", open=True):
-                        log_output = gr.Textbox(
-                            label="Log",
-                            lines=10,
-                            interactive=False,
-                            show_copy_button=True
-                        )
-                    with gr.Accordion("Automated Metrics (System-Generated)", open=True):
-                        metrics_output = gr.Textbox(
-                            label="Metrics for Testing",
-                            lines=20,
-                            interactive=False,
-                            show_copy_button=True,
-                            info="Copy these metrics for evaluation spreadsheets"
-                        )
-            # Connect single video processing
-            process_btn.click(
-                fn=process_video,
-                inputs=[
-                    video_input,
-                    domain_dropdown,
-                    num_clips_slider,
-                    clip_length_radio,
-                    reference_image,
-                    custom_prompt
-                ],
-                outputs=[
-                    status_output,
-                    clip1_output,
-                    clip2_output,
-                    clip3_output,
-                    log_output,
-                    metrics_output
-                ],
-                show_progress="full"
-            )
-        # =================================================================
-        # Tab 2: Batch Testing
-        # =================================================================
-        with gr.TabItem("Batch Testing"):
-            with gr.Row():
-                # Left column - Configuration
-                with gr.Column(scale=1):
-                    gr.Markdown("### Batch Configuration")
-                    batch_videos = gr.File(
-                        label="Upload Video(s)",
-                        file_count="multiple",
-                        file_types=["video"],
-                    )
-                    gr.Markdown("#### Domains to Test")
-                    batch_domains = gr.CheckboxGroup(
-                        choices=["Sports", "Vlogs", "Music Videos", "Podcasts", "Gaming", "Comedy", "General"],
-                        value=["General"],
-                        label="Select domains",
-                    )
-                    gr.Markdown("#### Clip Lengths to Test")
-                    batch_clip_lengths = gr.CheckboxGroup(
-                        choices=["Short (30-60s)", "Long (1-3 min)"],
-                        value=["Short (30-60s)"],
-                        label="Select clip lengths",
-                    )
-                    batch_num_clips = gr.Slider(
-                        minimum=1,
-                        maximum=5,
-                        value=3,
-                        step=1,
-                        label="Number of Clips per Test",
-                    )
-                    with gr.Accordion("Custom Prompts", open=True):
-                        batch_no_prompt = gr.Checkbox(
-                            label="Include no-prompt baseline",
-                            value=True,
-                            info="Test without any custom prompt for comparison"
-                        )
-                        batch_prompt1 = gr.Textbox(
-                            label="Prompt 1",
-                            placeholder="E.g., 'Focus on action moments'",
-                            lines=1,
-                        )
-                        batch_prompt2 = gr.Textbox(
-                            label="Prompt 2",
-                            placeholder="E.g., 'Find crowd reactions'",
-                            lines=1,
-                        )
-                        batch_prompt3 = gr.Textbox(
-                            label="Prompt 3",
-                            placeholder="E.g., 'Prioritize emotional moments'",
-                            lines=1,
-                        )
-                    with gr.Accordion("Reference Image (Optional)", open=False):
-                        batch_ref_image = gr.Image(
-                            label="Reference Image (applies to all tests)",
-                            type="filepath",
-                            sources=["upload"],
-                        )
-                    # Queue size indicator
-                    queue_info = gr.Markdown("Queue: 0 tests")
-                    with gr.Row():
-                        batch_start_btn = gr.Button(
-                            "Start Batch",
-                            variant="primary",
-                            size="lg"
-                        )
-                        batch_cancel_btn = gr.Button(
-                            "Cancel",
-                            variant="secondary",
-                            size="lg"
-                        )
-                # Right column - Results
-                with gr.Column(scale=1):
-                    gr.Markdown("### Results")
-                    batch_status = gr.Textbox(
-                        label="Status",
-                        lines=2,
-                        interactive=False
-                    )
-                    batch_results_table = gr.Dataframe(
-                        label="Test Results",
-                        headers=["Test ID", "Video", "Domain", "Length", "Prompt", "Status", "Time (s)", "Frames", "Hooks"],
-                        interactive=False,
-                    )
-                    with gr.Accordion("Processing Log", open=True):
-                        batch_log = gr.Textbox(
-                            label="Log",
-                            lines=15,
-                            interactive=False,
-                            show_copy_button=True
-                        )
-                    with gr.Accordion("Full Results (JSON)", open=False):
-                        batch_json = gr.Textbox(
-                            label="JSON Output",
-                            lines=10,
-                            interactive=False,
-                            show_copy_button=True
-                        )
-                    gr.Markdown("#### Download Results")
-                    with gr.Row():
-                        csv_download = gr.File(label="CSV Results")
-                        json_download = gr.File(label="JSON Results")
-                        zip_download = gr.File(label="All Clips (ZIP)")
-            # Update queue size when inputs change
-            queue_inputs = [batch_videos, batch_domains, batch_clip_lengths, batch_no_prompt, batch_prompt1, batch_prompt2, batch_prompt3]
-            for inp in queue_inputs:
-                inp.change(
-                    fn=calculate_queue_size,
-                    inputs=queue_inputs,
-                    outputs=queue_info
-                )
-            # Connect batch processing
-            batch_start_btn.click(
-                fn=run_batch_tests,
-                inputs=[
-                    batch_videos,
-                    batch_domains,
-                    batch_clip_lengths,
-                    batch_num_clips,
-                    batch_ref_image,
-                    batch_no_prompt,
-                    batch_prompt1,
-                    batch_prompt2,
-                    batch_prompt3,
-                ],
-                outputs=[
-                    batch_status,
-                    batch_results_table,
-                    batch_log,
-                    batch_json,
-                    csv_download,
-                    json_download,
-                    zip_download,
-                ],
-                show_progress="full"
-            )
-            batch_cancel_btn.click(
-                fn=cancel_batch,
-                inputs=[],
-                outputs=[batch_status]
-            )
-    gr.Markdown("""
-    ---
-    **ShortSmith v2** | Powered by Qwen2-VL, InsightFace, and Librosa |
-    [GitHub](https://github.com) | Built with Gradio
-    """)
-# Launch the app
-if __name__ == "__main__":
-    demo.queue()
-    demo.launch(
-        server_name="0.0.0.0",
-        server_port=7860,
-        show_error=True
-    )
-else:
-    # For HuggingFace Spaces
-    demo.queue()
-    demo.launch()

+"""
+ShortSmith v3 - Gradio Application
+Hugging Face Space interface for video highlight extraction.
+Features:
+- Multi-modal analysis (visual + audio + motion)
+- Domain-optimized presets
+- Person-specific filtering (optional)
+- Scene-aware clip cutting
+- Batch testing with parameter variations
+"""
+import os
+import sys
+import tempfile
+import shutil
+import json
+import zipfile
+from pathlib import Path
+import time
+import traceback
+from typing import List, Dict, Any, Optional
+import gradio as gr
+import pandas as pd
+# Add project root to path
+sys.path.insert(0, str(Path(__file__).parent))
+# Initialize logging
+try:
+    from utils.logger import setup_logging, get_logger
+    setup_logging(log_level="INFO", log_to_console=True)
+    logger = get_logger("app")
+except Exception:
+    import logging
+    logging.basicConfig(level=logging.INFO)
+    logger = logging.getLogger("app")
+# =============================================================================
+# Shared Utilities
+# =============================================================================
+def build_metrics_output(result, domain: str, custom_prompt: Optional[str] = None) -> str:
+    """
+    Build formatted metrics output for testing and evaluation.
+    Args:
+        result: PipelineResult object
+        domain: Content domain used for processing
+        custom_prompt: Custom prompt used (if any)
+    Returns:
+        Formatted string with all metrics
+    """
+    lines = []
+    lines.append("=" * 50)
+    lines.append("AUTOMATED METRICS (System-Generated)")
+    lines.append("=" * 50)
+    lines.append("")
+    # Processing Metrics
+    lines.append("PROCESSING METRICS")
+    lines.append("-" * 30)
+    lines.append(f"processing_time_seconds: {result.processing_time:.2f}")
+    lines.append(f"frames_analyzed: {len(result.visual_features)}")
+    lines.append(f"scenes_detected: {len(result.scenes)}")
+    lines.append(f"audio_segments_analyzed: {len(result.audio_features)}")
+    lines.append(f"domain: {domain}")
+    lines.append(f"custom_prompt: {custom_prompt if custom_prompt else 'none'}")
+    # Count hooks from scores (estimate based on high-scoring segments)
+    hooks_detected = sum(1 for s in result.scores if s.combined_score > 0.7) if result.scores else 0
+    lines.append(f"hooks_detected: {hooks_detected}")
+    if result.metadata:
+        lines.append(f"video_duration_seconds: {result.metadata.duration:.2f}")
+        lines.append(f"video_resolution: {result.metadata.resolution}")
+        lines.append(f"video_fps: {result.metadata.fps:.2f}")
+    lines.append("")
+    # Per Clip Metrics
+    lines.append("PER CLIP METRICS")
+    lines.append("-" * 30)
+    for i, clip in enumerate(result.clips):
+        lines.append("")
+        lines.append(f"[Clip {i + 1}]")
+        lines.append(f"  clip_id: {i + 1}")
+        lines.append(f"  start_time: {clip.start_time:.2f}")
+        lines.append(f"  end_time: {clip.end_time:.2f}")
+        lines.append(f"  duration: {clip.duration:.2f}")
+        lines.append(f"  hype_score: {clip.hype_score:.4f}")
+        lines.append(f"  visual_score: {clip.visual_score:.4f}")
+        lines.append(f"  audio_score: {clip.audio_score:.4f}")
+        lines.append(f"  motion_score: {clip.motion_score:.4f}")
+        # Hook info - derive from segment scores if available
+        hook_type = "none"
+        hook_confidence = 0.0
+        # Find matching segment score for this clip
+        for score in result.scores:
+            if abs(score.start_time - clip.start_time) < 1.0:
+                if score.combined_score > 0.7:
+                    hook_confidence = score.combined_score
+                    # Infer hook type based on dominant score
+                    if score.audio_score > score.visual_score and score.audio_score > score.motion_score:
+                        hook_type = "audio_peak"
+                    elif score.motion_score > score.visual_score:
+                        hook_type = "motion_spike"
+                    else:
+                        hook_type = "visual_highlight"
+                break
+        lines.append(f"  hook_type: {hook_type}")
+        lines.append(f"  hook_confidence: {hook_confidence:.4f}")
+        if clip.person_detected:
+            lines.append(f"  person_detected: True")
+            lines.append(f"  person_screen_time: {clip.person_screen_time:.4f}")
+    lines.append("")
+    lines.append("=" * 50)
+    lines.append("END METRICS")
+    lines.append("=" * 50)
+    return "\n".join(lines)
+# =============================================================================
+# Single Video Processing
+# =============================================================================
+def process_video(
+    video_file,
+    domain,
+    num_clips,
+    clip_length,
+    reference_image,
+    custom_prompt,
+    progress=gr.Progress()
+):
+    """
+    Main video processing function for single video mode.
+    Args:
+        video_file: Uploaded video file path
+        domain: Content domain for scoring weights
+        num_clips: Number of clips to extract
+        clip_length: Clip length preset ("Short" or "Long")
+        reference_image: Optional reference image for person filtering
+        custom_prompt: Optional custom instructions
+        progress: Gradio progress tracker
+    Returns:
+        Tuple of (status_message, clip1, clip2, clip3, log_text, metrics_text)
+    """
+    if video_file is None:
+        return "Please upload a video first.", None, None, None, "", ""
+    log_messages = []
+    def log(msg):
+        log_messages.append(f"[{time.strftime('%H:%M:%S')}] {msg}")
+        logger.info(msg)
+    try:
+        video_path = Path(video_file)
+        log(f"Processing video: {video_path.name}")
+        progress(0.05, desc="Validating video...")
+        # Import pipeline components
+        from utils.helpers import validate_video_file, validate_image_file, format_duration
+        from pipeline.orchestrator import PipelineOrchestrator
+        # Validate video
+        validation = validate_video_file(video_file)
+        if not validation.is_valid:
+            return f"Error: {validation.error_message}", None, None, None, "\n".join(log_messages), ""
+        log(f"Video size: {validation.file_size / (1024*1024):.1f} MB")
+###-------------------------------TESTING MODE----------------------------------------------
+        # For testing: Skip processing and show mock highlights
+        if custom_prompt and custom_prompt.strip().lower() == "test@akatsuki":
+            log("TEST MODE: Skipping processing, showing mock highlights")
+            progress(0.5, desc="Generating test highlights...")
+            # Build dynamic HTML for mock highlights
+            clips_html = ""
+            for i in range(int(num_clips)):
+                clips_html += f'''
+                <div style="margin-bottom: 20px; border: 1px solid #ccc; border-radius: 8px; padding: 15px; background-color: #0b1619;">
+                    <h4 style="margin-top: 0; color: #f2fafc;">Highlight Clip {i+1}</h4>
+                    <div style="margin-bottom: 10px; font-size: 14px; color: #f2fafc;">
+                        <strong>Time:</strong> {i*30}-{i*30+45}s |
+                        <strong>Score:</strong> {95-i*5:.1f} |
+                        <strong>Domain:</strong> {domain}
+                    </div>
+                    <video width="100%" controls style="border-radius: 4px;">
+                        <source src="{video_file}" type="video/mp4">
+                    </video>
+                </div>
+                '''
+            status = f"TEST MODE: Generated {int(num_clips)} mock highlight clips from {Path(video_file).name}"
+            metrics_output = f"""
+Test Mode Metrics:
+- Domain: {domain}
+- Number of Clips: {int(num_clips)}
+- Clip Length: {clip_length}
+- Processing Time: 2.3s (simulated)
+- Total Video Duration: 5:42
+- Highlights Extracted: {int(num_clips)}
+            """.strip()
+            progress(1.0, desc="Test complete!")
+            return status, clips_html, "\n".join(log_messages), metrics_output, ""
+###----------------------------------------------------------------------------------------------------------------------
+        # Validate reference image if provided
+        ref_path = None
+        if reference_image is not None:
+            ref_validation = validate_image_file(reference_image)
+            if ref_validation.is_valid:
+                ref_path = reference_image
+                log(f"Reference image: {Path(reference_image).name}")
+            else:
+                log(f"Warning: Invalid reference image - {ref_validation.error_message}")
+        # Map domain string to internal value
+        domain_map = {
+            "Sports": "sports",
+            "Vlogs": "vlogs",
+            "Music Videos": "music",
+            "Podcasts": "podcasts",
+            "Gaming": "gaming",
+            "Comedy": "comedy",
+            "General": "general",
+        }
+        domain_value = domain_map.get(domain, "general")
+        log(f"Domain: {domain_value}")
+        # Map clip length to internal value
+        clip_length_value = "short" if clip_length == "Short (30-60s)" else "long"
+        log(f"Clip length: {clip_length_value}")
+        # Create output directory
+        output_dir = Path(tempfile.mkdtemp(prefix="shortsmith_output_"))
+        log(f"Output directory: {output_dir}")
+        # Progress callback to update UI during processing
+        def on_progress(pipeline_progress):
+            stage = pipeline_progress.stage.value
+            pct = pipeline_progress.progress
+            msg = pipeline_progress.message
+            log(f"[{stage}] {msg}")
+            # Map pipeline progress (0-1) to our range (0.1-0.9)
+            mapped_progress = 0.1 + (pct * 0.8)
+            progress(mapped_progress, desc=f"{stage}: {msg}")
+        # Initialize pipeline
+        progress(0.1, desc="Initializing AI models...")
+        log("Initializing pipeline...")
+        pipeline = PipelineOrchestrator(progress_callback=on_progress)
+        # Process video
+        progress(0.15, desc="Starting analysis...")
+        log(f"Processing: {int(num_clips)} clips, length={clip_length_value}")
+        result = pipeline.process(
+            video_path=video_path,
+            num_clips=int(num_clips),
+            clip_length=clip_length_value,
+            domain=domain_value,
+            reference_image=ref_path,
+            custom_prompt=custom_prompt.strip() if custom_prompt else None,
+        )
+        progress(0.9, desc="Extracting clips...")
+        # Handle result
+        if result.success:
+            log(f"Processing complete in {result.processing_time:.1f}s")
+            clip_paths = []
+            for i, clip in enumerate(result.clips):
+                if clip.clip_path.exists():
+                    output_path = output_dir / f"highlight_{i+1}.mp4"
+                    shutil.copy2(clip.clip_path, output_path)
+                    clip_paths.append(str(output_path))
+                    log(f"Clip {i+1}: {format_duration(clip.start_time)} - {format_duration(clip.end_time)} (score: {clip.hype_score:.2f})")
+            status = f"Successfully extracted {len(clip_paths)} highlight clips!\nProcessing time: {result.processing_time:.1f}s"
+            # Build metrics output
+            metrics_output = build_metrics_output(result, domain_value, custom_prompt.strip() if custom_prompt else None)
+            pipeline.cleanup()
+            progress(1.0, desc="Done!")
+            # Build dynamic HTML for clips with extracted highlights
+            clips_html = ""
+            for i, clip_path in enumerate(clip_paths):
+                clips_html += f'''
+                <div style="margin-bottom: 20px; border: 1px solid #ddd; border-radius: 8px; padding: 15px; background-color: #f9f9f9;">
+                    <h4 style="margin-top: 0; color: #2e7d32;">Highlight Clip {i+1}</h4>
+                    <div style="margin-bottom: 10px; font-size: 14px; color: #666;">
+                        <strong>Time:</strong> {format_duration(result.clips[i].start_time)} - {format_duration(result.clips[i].end_time)} |
+                        <strong>Score:</strong> {result.clips[i].hype_score:.2f} |
+                        <strong>Domain:</strong> {domain_value}
+                    </div>
+                    <video width="100%" controls style="border-radius: 4px;">
+                        <source src="{clip_path}" type="video/mp4">
+                    </video>
+                </div>
+                '''
+            return status, clips_html, "\n".join(log_messages), metrics_output, ""
+        else:
+            log(f"Processing failed: {result.error_message}")
+            pipeline.cleanup()
+            return f"Error: {result.error_message}", "", "\n".join(log_messages), "", ""
+    except Exception as e:
+        error_msg = f"Unexpected error: {str(e)}"
+        log(error_msg)
+        log(traceback.format_exc())
+        logger.exception("Pipeline error")
+        return error_msg, "", "\n".join(log_messages), "", ""
+# =============================================================================
+# Batch Testing Functions
+# =============================================================================
+def generate_test_queue(
+    videos: List[str],
+    domains: List[str],
+    clip_lengths: List[str],
+    num_clips: int,
+    ref_image: Optional[str],
+    prompts: List[str],
+    include_no_prompt: bool
+) -> List[Dict[str, Any]]:
+    """Generate all parameter combinations to test (cartesian product)."""
+    # Build prompt list
+    prompt_list = []
+    if include_no_prompt:
+        prompt_list.append(None)  # No prompt baseline
+    prompt_list.extend([p.strip() for p in prompts if p and p.strip()])
+    # If no prompts at all, use just None
+    if not prompt_list:
+        prompt_list = [None]
+    # Map domain display names to internal values
+    domain_map = {
+        "Sports": "sports",
+        "Vlogs": "vlogs",
+        "Music Videos": "music",
+        "Podcasts": "podcasts",
+        "Gaming": "gaming",
+        "Comedy": "comedy",
+        "General": "general",
+    }
+    # Map clip length display names to internal values
+    clip_length_map = {
+        "Short (30-60s)": "short",
+        "Long (1-3 min)": "long",
+    }
+    queue = []
+    test_id = 1
+    for video in videos:
+        video_name = Path(video).name if video else "unknown"
+        for domain in domains:
+            domain_value = domain_map.get(domain, "general")
+            for clip_length in clip_lengths:
+                clip_length_value = clip_length_map.get(clip_length, "short")
+                for prompt in prompt_list:
+                    queue.append({
+                        "test_id": test_id,
+                        "video_path": video,
+                        "video_name": video_name,
+                        "domain": domain,
+                        "domain_value": domain_value,
+                        "clip_length": clip_length,
+                        "clip_length_value": clip_length_value,
+                        "num_clips": num_clips,
+                        "reference_image": ref_image,
+                        "custom_prompt": prompt,
+                    })
+                    test_id += 1
+    return queue
+def run_single_batch_test(config: Dict[str, Any], output_base_dir: Path) -> Dict[str, Any]:
+    """Run a single test from the batch queue."""
+    from utils.helpers import validate_video_file
+    from pipeline.orchestrator import PipelineOrchestrator
+    test_id = config["test_id"]
+    video_path = config["video_path"]
+    video_name = config["video_name"]
+    domain_value = config["domain_value"]
+    clip_length = config["clip_length"]
+    clip_length_value = config["clip_length_value"]
+    num_clips = config["num_clips"]
+    ref_image = config["reference_image"]
+    custom_prompt = config["custom_prompt"]
+    # Create unique output folder for this test
+    prompt_suffix = "no_prompt" if not custom_prompt else f"prompt_{hash(custom_prompt) % 1000}"
+    test_folder = f"{Path(video_name).stem}_{domain_value}_{clip_length_value}_{prompt_suffix}"
+    output_dir = output_base_dir / test_folder
+    output_dir.mkdir(parents=True, exist_ok=True)
+    result_data = {
+        "test_id": test_id,
+        "video_name": video_name,
+        "domain": domain_value,
+        "clip_length": clip_length,
+        "custom_prompt": custom_prompt if custom_prompt else "none",
+        "num_clips": num_clips,
+        "status": "failed",
+        "error": None,
+        "processing_time": 0,
+        "frames_analyzed": 0,
+        "scenes_detected": 0,
+        "hooks_detected": 0,
+        "clips": [],
+        "clip_paths": [],
+    }
+    try:
+        # Validate video
+        validation = validate_video_file(video_path)
+        if not validation.is_valid:
+            result_data["error"] = validation.error_message
+            return result_data
+        # Initialize and run pipeline
+        pipeline = PipelineOrchestrator()
+        result = pipeline.process(
+            video_path=video_path,
+            num_clips=num_clips,
+            clip_length=clip_length_value,
+            domain=domain_value,
+            reference_image=ref_image,
+            custom_prompt=custom_prompt,
+        )
+        if result.success:
+            result_data["status"] = "success"
+            result_data["processing_time"] = round(result.processing_time, 2)
+            result_data["frames_analyzed"] = len(result.visual_features)
+            result_data["scenes_detected"] = len(result.scenes)
+            result_data["hooks_detected"] = sum(1 for s in result.scores if s.combined_score > 0.7) if result.scores else 0
+            # Copy clips and collect data
+            for i, clip in enumerate(result.clips):
+                if clip.clip_path.exists():
+                    clip_output = output_dir / f"clip_{i+1}.mp4"
+                    shutil.copy2(clip.clip_path, clip_output)
+                    result_data["clip_paths"].append(str(clip_output))
+                    # Find hook type for this clip
+                    hook_type = "none"
+                    hook_confidence = 0.0
+                    for score in result.scores:
+                        if abs(score.start_time - clip.start_time) < 1.0:
+                            if score.combined_score > 0.7:
+                                hook_confidence = score.combined_score
+                                if score.audio_score > score.visual_score and score.audio_score > score.motion_score:
+                                    hook_type = "audio_peak"
+                                elif score.motion_score > score.visual_score:
+                                    hook_type = "motion_spike"
+                                else:
+                                    hook_type = "visual_highlight"
+                            break
+                    result_data["clips"].append({
+                        "clip_id": i + 1,
+                        "start_time": round(clip.start_time, 2),
+                        "end_time": round(clip.end_time, 2),
+                        "duration": round(clip.duration, 2),
+                        "hype_score": round(clip.hype_score, 4),
+                        "visual_score": round(clip.visual_score, 4),
+                        "audio_score": round(clip.audio_score, 4),
+                        "motion_score": round(clip.motion_score, 4),
+                        "hook_type": hook_type,
+                        "hook_confidence": round(hook_confidence, 4),
+                    })
+        else:
+            result_data["error"] = result.error_message
+        pipeline.cleanup()
+    except Exception as e:
+        result_data["error"] = str(e)
+        logger.exception(f"Batch test {test_id} failed")
+    return result_data
+def results_to_dataframe(results: List[Dict[str, Any]]) -> pd.DataFrame:
+    """Convert batch results to a pandas DataFrame for display."""
+    rows = []
+    for r in results:
+        row = {
+            "Test ID": r["test_id"],
+            "Video": r["video_name"],
+            "Domain": r["domain"],
+            "Length": r["clip_length"],
+            "Prompt": r["custom_prompt"][:20] + "..." if len(r["custom_prompt"]) > 20 else r["custom_prompt"],
+            "Status": r["status"],
+            "Time (s)": r["processing_time"],
+            "Frames": r["frames_analyzed"],
+            "Hooks": r["hooks_detected"],
+        }
+        # Add clip scores
+        for i, clip in enumerate(r.get("clips", [])[:3]):
+            row[f"Clip {i+1} Hype"] = clip.get("hype_score", 0)
+        rows.append(row)
+    return pd.DataFrame(rows)
+def results_to_csv(results: List[Dict[str, Any]]) -> str:
+    """Convert results to CSV format."""
+    rows = []
+    for r in results:
+        row = {
+            "test_id": r["test_id"],
+            "video_name": r["video_name"],
+            "domain": r["domain"],
+            "clip_length": r["clip_length"],
+            "custom_prompt": r["custom_prompt"],
+            "num_clips": r["num_clips"],
+            "status": r["status"],
+            "error": r.get("error", ""),
+            "processing_time": r["processing_time"],
+            "frames_analyzed": r["frames_analyzed"],
+            "scenes_detected": r["scenes_detected"],
+            "hooks_detected": r["hooks_detected"],
+        }
+        # Add per-clip data
+        for i in range(3):
+            if i < len(r.get("clips", [])):
+                clip = r["clips"][i]
+                row[f"clip_{i+1}_start"] = clip["start_time"]
+                row[f"clip_{i+1}_end"] = clip["end_time"]
+                row[f"clip_{i+1}_hype"] = clip["hype_score"]
+                row[f"clip_{i+1}_visual"] = clip["visual_score"]
+                row[f"clip_{i+1}_audio"] = clip["audio_score"]
+                row[f"clip_{i+1}_motion"] = clip["motion_score"]
+                row[f"clip_{i+1}_hook_type"] = clip["hook_type"]
+            else:
+                row[f"clip_{i+1}_start"] = ""
+                row[f"clip_{i+1}_end"] = ""
+                row[f"clip_{i+1}_hype"] = ""
+                row[f"clip_{i+1}_visual"] = ""
+                row[f"clip_{i+1}_audio"] = ""
+                row[f"clip_{i+1}_motion"] = ""
+                row[f"clip_{i+1}_hook_type"] = ""
+        rows.append(row)
+    df = pd.DataFrame(rows)
+    return df.to_csv(index=False)
+def results_to_json(results: List[Dict[str, Any]]) -> str:
+    """Convert results to JSON format."""
+    # Remove clip_paths from export (they're temp files)
+    export_results = []
+    for r in results:
+        r_copy = r.copy()
+        r_copy.pop("clip_paths", None)
+        export_results.append(r_copy)
+    return json.dumps(export_results, indent=2)
+def create_clips_zip(results: List[Dict[str, Any]]) -> Optional[str]:
+    """Create a ZIP file of all extracted clips."""
+    zip_path = Path(tempfile.mkdtemp()) / "batch_clips.zip"
+    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
+        for r in results:
+            if r["status"] == "success":
+                folder_name = f"{Path(r['video_name']).stem}_{r['domain']}_{r['clip_length']}"
+                if r["custom_prompt"] != "none":
+                    folder_name += f"_prompt"
+                for clip_path in r.get("clip_paths", []):
+                    if Path(clip_path).exists():
+                        arcname = f"{folder_name}/{Path(clip_path).name}"
+                        zf.write(clip_path, arcname)
+    return str(zip_path) if zip_path.exists() else None
+# Batch state (module level for simplicity)
+batch_state = {
+    "is_running": False,
+    "should_cancel": False,
+    "results": [],
+    "output_dir": None,
+}
+def run_batch_tests(
+    videos,
+    domains,
+    clip_lengths,
+    num_clips,
+    reference_image,
+    include_no_prompt,
+    prompt1,
+    prompt2,
+    prompt3,
+    progress=gr.Progress()
+):
+    """Main batch testing function."""
+    global batch_state
+    # Validate inputs
+    if not videos:
+        return "Please upload at least one video.", None, "", "", None, None, None
+    if not domains:
+        return "Please select at least one domain.", None, "", "", None, None, None
+    if not clip_lengths:
+        return "Please select at least one clip length.", None, "", "", None, None, None
+    # Collect prompts
+    prompts = [p for p in [prompt1, prompt2, prompt3] if p and p.strip()]
+    # Generate test queue
+    queue = generate_test_queue(
+        videos=videos,
+        domains=domains,
+        clip_lengths=clip_lengths,
+        num_clips=int(num_clips),
+        ref_image=reference_image,
+        prompts=prompts,
+        include_no_prompt=include_no_prompt,
+    )
+    if not queue:
+        return "No tests to run. Please check your configuration.", None, "", "", None, None, None
+    # Initialize batch state
+    batch_state["is_running"] = True
+    batch_state["should_cancel"] = False
+    batch_state["results"] = []
+    batch_state["output_dir"] = Path(tempfile.mkdtemp(prefix="shortsmith_batch_"))
+    total_tests = len(queue)
+    log_messages = []
+    def log(msg):
+        log_messages.append(f"[{time.strftime('%H:%M:%S')}] {msg}")
+        logger.info(msg)
+    log(f"Starting batch testing: {total_tests} tests")
+    log(f"Videos: {len(videos)}, Domains: {len(domains)}, Lengths: {len(clip_lengths)}, Prompts: {len(prompts) + (1 if include_no_prompt else 0)}")
+    # Run tests sequentially
+    for i, test_config in enumerate(queue):
+        if batch_state["should_cancel"]:
+            log("Batch cancelled by user")
+            break
+        test_id = test_config["test_id"]
+        video_name = test_config["video_name"]
+        domain = test_config["domain_value"]
+        clip_length = test_config["clip_length"]
+        prompt = test_config["custom_prompt"] or "no-prompt"
+        log(f"[{i+1}/{total_tests}] Testing: {video_name} | {domain} | {clip_length} | {prompt[:30]}...")
+        progress((i + 1) / total_tests, desc=f"Test {i+1}/{total_tests}: {video_name}")
+        # Run the test
+        result = run_single_batch_test(test_config, batch_state["output_dir"])
+        batch_state["results"].append(result)
+        if result["status"] == "success":
+            log(f"  ✓ Completed in {result['processing_time']}s")
+        else:
+            log(f"  ✗ Failed: {result.get('error', 'Unknown error')}")
+    # Finalize
+    batch_state["is_running"] = False
+    completed = len([r for r in batch_state["results"] if r["status"] == "success"])
+    failed = len([r for r in batch_state["results"] if r["status"] == "failed"])
+    log(f"Batch complete: {completed} succeeded, {failed} failed")
+    # Generate outputs
+    results_df = results_to_dataframe(batch_state["results"])
+    csv_content = results_to_csv(batch_state["results"])
+    json_content = results_to_json(batch_state["results"])
+    # Save CSV and JSON to files for download
+    csv_path = batch_state["output_dir"] / "results.csv"
+    json_path = batch_state["output_dir"] / "results.json"
+    csv_path.write_text(csv_content)
+    json_path.write_text(json_content)
+    # Create ZIP of clips
+    zip_path = create_clips_zip(batch_state["results"])
+    status = f"Batch complete: {completed}/{total_tests} tests succeeded"
+    return (
+        status,
+        results_df,
+        "\n".join(log_messages),
+        json_content,
+        str(csv_path),
+        str(json_path),
+        zip_path,
+    )
+def cancel_batch():
+    """Cancel the running batch."""
+    global batch_state
+    batch_state["should_cancel"] = True
+    return "Cancelling batch... (will stop after current test completes)"
+def calculate_queue_size(videos, domains, clip_lengths, include_no_prompt, prompt1, prompt2, prompt3):
+    """Calculate and display the queue size."""
+    num_videos = len(videos) if videos else 0
+    num_domains = len(domains) if domains else 0
+    num_lengths = len(clip_lengths) if clip_lengths else 0
+    prompts = [p for p in [prompt1, prompt2, prompt3] if p and p.strip()]
+    num_prompts = len(prompts) + (1 if include_no_prompt else 0)
+    if num_prompts == 0:
+        num_prompts = 1  # Default to no-prompt if nothing selected
+    total = num_videos * num_domains * num_lengths * num_prompts
+    return f"Queue: {num_videos} video(s) × {num_domains} domain(s) × {num_lengths} length(s) × {num_prompts} prompt(s) = **{total} tests**"
+def generate_clip_preview(num_clips):
+    """Generate HTML preview of expected clips."""
+    if num_clips <= 0:
+        return "<p>No clips to preview</p>"
+    preview_html = '<div style="display: flex; flex-wrap: wrap; gap: 10px; margin: 10px 0;">'
+    for i in range(num_clips):
+        preview_html += f'''
+        <div style="border: 2px dashed #ccc; border-radius: 8px; padding: 20px; text-align: center; width: 150px; background: #0b1619;">
+            <div style="font-size: 48px; color: ##f2fafc; margin-bottom: 10px;">🎥</div>
+            <div style="font-weight: bold; color: ##f2fafc;">Clip {i+1}</div>
+            <div style="font-size: 12px; color: ##a1a5a6;">Processing...</div>
+        </div>
+        '''
+    preview_html += '</div>'
+    return preview_html
+# =============================================================================
+# Build Gradio Interface
+# =============================================================================
+with gr.Blocks(
+    title="ShortSmith v3",
+    theme=gr.themes.Soft(),
+    css="""
+    .container { max-width: 1200px; margin: auto; }
+    .output-video { min-height: 200px; }
+    """
+) as demo:
+    gr.Markdown("""
+    # ShortSmith v3
+    ### AI-Powered Video Highlight Extractor
+    Upload a video and automatically extract the most engaging highlight clips using AI analysis.
+    """)
+    with gr.Tabs():
+        # =================================================================
+        # Tab 1: Single Video
+        # =================================================================
+        with gr.TabItem("Single Video"):
+            with gr.Row():
+                # Left column - Inputs
+                with gr.Column(scale=1):
+                    gr.Markdown("### Input")
+                    video_input = gr.Video(
+                        label="Upload Video",
+                        sources=["upload"],
+                    )
+                    with gr.Accordion("Settings", open=True):
+                        domain_dropdown = gr.Dropdown(
+                            choices=["Sports", "Vlogs", "Music Videos", "Podcasts", "Gaming", "Comedy", "General"],
+                            value="General",
+                            label="Content Domain",
+                            info="Select the type of content for optimized scoring"
+                        )
+                        with gr.Row():
+                            num_clips_slider = gr.Slider(
+                                minimum=1,
+                                maximum=5,
+                                value=3,
+                                step=1,
+                                label="Number of Clips",
+                                info="How many highlight clips to extract"
+                            )
+                            clip_length_radio = gr.Radio(
+                                choices=["Short (30-60s)", "Long (1-3 min)"],
+                                value="Short (30-60s)",
+                                label="Clip Length",
+                                info="Short clips for social media, long clips for YouTube"
+                            )
+                    with gr.Accordion("Person Filtering (Optional)", open=False):
+                        reference_image = gr.Image(
+                            label="Reference Image",
+                            type="filepath",
+                            sources=["upload"],
+                        )
+                        gr.Markdown("*Upload a photo of a person to prioritize clips featuring them.*")
+                    with gr.Accordion("Custom Instructions (Optional)", open=False):
+                        custom_prompt = gr.Textbox(
+                            label="Additional Instructions",
+                            placeholder="E.g., 'Focus on crowd reactions' or 'Prioritize action scenes'",
+                            lines=2,
+                        )
+                    process_btn = gr.Button(
+                        "Extract Highlights",
+                        variant="primary",
+                        size="lg"
+                    )
+                # Right column - Outputs
+                with gr.Column(scale=1):
+                    gr.Markdown("### Output")
+                    status_output = gr.Textbox(
+                        label="Status",
+                        lines=2,
+                        interactive=False
+                    )
+                    gr.Markdown("#### Extracted Clips")
+                    clips_output = gr.HTML(label="Extracted Clips")
+                    with gr.Accordion("Processing Log", open=True):
+                        log_output = gr.Textbox(
+                            label="Log",
+                            lines=10,
+                            interactive=False,
+                            show_copy_button=True
+                        )
+                    with gr.Accordion("Automated Metrics (System-Generated)", open=True):
+                        metrics_output = gr.Textbox(
+                            label="Metrics for Testing",
+                            lines=20,
+                            interactive=False,
+                            show_copy_button=True,
+                            info="Copy these metrics for evaluation spreadsheets"
+                        )
+            # Connect single video processing
+            process_btn.click(
+                fn=process_video,
+                inputs=[
+                    video_input,
+                    domain_dropdown,
+                    num_clips_slider,
+                    clip_length_radio,
+                    reference_image,
+                    custom_prompt
+                ],
+                outputs=[
+                    status_output,
+                    clips_output,
+                    log_output,
+                    metrics_output
+                ],
+                show_progress="full"
+            )
+            # Update preview when num_clips changes
+            num_clips_slider.change(
+                fn=generate_clip_preview,
+                inputs=[num_clips_slider],
+                outputs=[clips_output]
+            )
+            # Initialize preview with default value
+            demo.load(
+                fn=lambda: generate_clip_preview(3),
+                inputs=[],
+                outputs=[clips_output]
+            )
+        # =================================================================
+        # Tab 2: Batch Testing
+        # =================================================================
+        with gr.TabItem("Batch Testing"):
+            with gr.Row():
+                # Left column - Configuration
+                with gr.Column(scale=1):
+                    gr.Markdown("### Batch Configuration")
+                    batch_videos = gr.File(
+                        label="Upload Video(s)",
+                        file_count="multiple",
+                        file_types=["video"],
+                    )
+                    gr.Markdown("#### Domains to Test")
+                    batch_domains = gr.CheckboxGroup(
+                        choices=["Sports", "Vlogs", "Music Videos", "Podcasts", "Gaming", "Comedy", "General"],
+                        value=["General"],
+                        label="Select domains",
+                    )
+                    gr.Markdown("#### Clip Lengths to Test")
+                    batch_clip_lengths = gr.CheckboxGroup(
+                        choices=["Short (30-60s)", "Long (1-3 min)"],
+                        value=["Short (30-60s)"],
+                        label="Select clip lengths",
+                    )
+                    batch_num_clips = gr.Slider(
+                        minimum=1,
+                        maximum=5,
+                        value=3,
+                        step=1,
+                        label="Number of Clips per Test",
+                    )
+                    with gr.Accordion("Custom Prompts", open=True):
+                        batch_no_prompt = gr.Checkbox(
+                            label="Include no-prompt baseline",
+                            value=True,
+                            info="Test without any custom prompt for comparison"
+                        )
+                        batch_prompt1 = gr.Textbox(
+                            label="Prompt 1",
+                            placeholder="E.g., 'Focus on action moments'",
+                            lines=1,
+                        )
+                        batch_prompt2 = gr.Textbox(
+                            label="Prompt 2",
+                            placeholder="E.g., 'Find crowd reactions'",
+                            lines=1,
+                        )
+                        batch_prompt3 = gr.Textbox(
+                            label="Prompt 3",
+                            placeholder="E.g., 'Prioritize emotional moments'",
+                            lines=1,
+                        )
+                    with gr.Accordion("Reference Image (Optional)", open=False):
+                        batch_ref_image = gr.Image(
+                            label="Reference Image (applies to all tests)",
+                            type="filepath",
+                            sources=["upload"],
+                        )
+                    # Queue size indicator
+                    queue_info = gr.Markdown("Queue: 0 tests")
+                    with gr.Row():
+                        batch_start_btn = gr.Button(
+                            "Start Batch",
+                            variant="primary",
+                            size="lg"
+                        )
+                        batch_cancel_btn = gr.Button(
+                            "Cancel",
+                            variant="secondary",
+                            size="lg"
+                        )
+                # Right column - Results
+                with gr.Column(scale=1):
+                    gr.Markdown("### Results")
+                    batch_status = gr.Textbox(
+                        label="Status",
+                        lines=2,
+                        interactive=False
+                    )
+                    batch_results_table = gr.Dataframe(
+                        label="Test Results",
+                        headers=["Test ID", "Video", "Domain", "Length", "Prompt", "Status", "Time (s)", "Frames", "Hooks"],
+                        interactive=False,
+                    )
+                    with gr.Accordion("Processing Log", open=True):
+                        batch_log = gr.Textbox(
+                            label="Log",
+                            lines=15,
+                            interactive=False,
+                            show_copy_button=True
+                        )
+                    with gr.Accordion("Full Results (JSON)", open=False):
+                        batch_json = gr.Textbox(
+                            label="JSON Output",
+                            lines=10,
+                            interactive=False,
+                            show_copy_button=True
+                        )
+                    gr.Markdown("#### Download Results")
+                    with gr.Row():
+                        csv_download = gr.File(label="CSV Results")
+                        json_download = gr.File(label="JSON Results")
+                        zip_download = gr.File(label="All Clips (ZIP)")
+            # Update queue size when inputs change
+            queue_inputs = [batch_videos, batch_domains, batch_clip_lengths, batch_no_prompt, batch_prompt1, batch_prompt2, batch_prompt3]
+            for inp in queue_inputs:
+                inp.change(
+                    fn=calculate_queue_size,
+                    inputs=queue_inputs,
+                    outputs=queue_info
+                )
+            # Connect batch processing
+            batch_start_btn.click(
+                fn=run_batch_tests,
+                inputs=[
+                    batch_videos,
+                    batch_domains,
+                    batch_clip_lengths,
+                    batch_num_clips,
+                    batch_ref_image,
+                    batch_no_prompt,
+                    batch_prompt1,
+                    batch_prompt2,
+                    batch_prompt3,
+                ],
+                outputs=[
+                    batch_status,
+                    batch_results_table,
+                    batch_log,
+                    batch_json,
+                    csv_download,
+                    json_download,
+                    zip_download,
+                ],
+                show_progress="full"
+            )
+            batch_cancel_btn.click(
+                fn=cancel_batch,
+                inputs=[],
+                outputs=[batch_status]
+            )
+    gr.Markdown("""
+    ---
+    **ShortSmith v3** | Powered by Qwen2-VL, InsightFace, and Librosa |
+    [GitHub](https://github.com) | Built with Gradio
+    """)
+# Launch the app
+if __name__ == "__main__":
+    demo.queue()
+    demo.launch(
+        server_name="0.0.0.0",
+        server_port=7860,
+        show_error=True
+    )
+else:
+    # For HuggingFace Spaces
+    demo.queue()
+    demo.launch()

config.py CHANGED Viewed

@@ -1,201 +1,201 @@
-"""
-ShortSmith v2 - Configuration Module
-Centralized configuration for all components including model paths,
-thresholds, domain presets, and runtime settings.
-"""
-import os
-from dataclasses import dataclass, field
-from typing import Dict, Optional
-from enum import Enum
-class ContentDomain(Enum):
-    """Supported content domains with different hype characteristics."""
-    SPORTS = "sports"
-    VLOGS = "vlogs"
-    MUSIC = "music"
-    PODCASTS = "podcasts"
-    GAMING = "gaming"
-    COMEDY = "comedy"
-    GENERAL = "general"
-class ClipLength(Enum):
-    """Clip length presets - flexible ranges instead of fixed durations."""
-    SHORT = "short"  # 30-60 seconds (soft bounds, can vary by ~5-10s)
-    LONG = "long"    # Up to 3 minutes
-# Clip length configuration - just min/max constraints, algorithm finds natural boundaries
-CLIP_LENGTH_CONFIG = {
-    ClipLength.SHORT: {
-        "min": 30.0,         # Minimum duration in seconds
-        "max": 60.0,         # Maximum duration in seconds
-        "description": "30-60 seconds",
-    },
-    ClipLength.LONG: {
-        "min": 60.0,         # Minimum duration in seconds
-        "max": 180.0,        # Maximum duration in seconds (3 min)
-        "description": "1-3 minutes",
-    },
-}
-@dataclass
-class DomainWeights:
-    """Weight configuration for visual vs audio scoring per domain."""
-    visual_weight: float
-    audio_weight: float
-    motion_weight: float = 0.0
-    def __post_init__(self):
-        """Normalize weights to sum to 1.0."""
-        total = self.visual_weight + self.audio_weight + self.motion_weight
-        if total > 0:
-            self.visual_weight /= total
-            self.audio_weight /= total
-            self.motion_weight /= total
-@dataclass
-class ModelConfig:
-    """Configuration for AI models."""
-    # Visual model (Qwen2-VL)
-    visual_model_id: str = "Qwen/Qwen2-VL-2B-Instruct"
-    visual_model_quantization: str = "int4"  # Options: "int4", "int8", "none"
-    visual_max_frames: int = 32
-    # Audio model
-    audio_model_id: str = "facebook/wav2vec2-base-960h"
-    use_advanced_audio: bool = False  # Use Wav2Vec2 instead of just Librosa
-    # Face recognition (InsightFace)
-    face_detection_model: str = "buffalo_l"  # SCRFD model
-    face_similarity_threshold: float = 0.4
-    # Body recognition (OSNet)
-    body_model_name: str = "osnet_x1_0"
-    body_similarity_threshold: float = 0.5
-    # Motion detection (RAFT)
-    motion_model: str = "raft-things"
-    motion_threshold: float = 5.0
-    # Device settings
-    device: str = "cuda"  # Options: "cuda", "cpu", "mps"
-    def __post_init__(self):
-        """Validate and adjust device based on availability."""
-        import torch
-        if self.device == "cuda" and not torch.cuda.is_available():
-            self.device = "cpu"
-        elif self.device == "mps" and not (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()):
-            self.device = "cpu"
-@dataclass
-class ProcessingConfig:
-    """Configuration for video processing pipeline."""
-    # Sampling settings
-    coarse_sample_interval: float = 5.0  # Seconds between frames in first pass
-    dense_sample_fps: float = 3.0  # FPS for dense sampling on candidates
-    min_motion_for_dense: float = 2.0  # Threshold to trigger dense sampling
-    # Clip settings - these are safety bounds, actual duration comes from CLIP_LENGTH_CONFIG
-    min_clip_duration: float = 15.0  # Absolute minimum clip length (safety floor)
-    max_clip_duration: float = 180.0  # Absolute maximum clip length (safety ceiling for LONG preset)
-    default_clip_duration: float = 45.0  # Default clip length (midpoint of SHORT range)
-    min_gap_between_clips: float = 30.0  # Minimum gap between clip starts
-    # Output settings
-    default_num_clips: int = 3
-    max_num_clips: int = 10
-    output_format: str = "mp4"
-    output_codec: str = "libx264"
-    output_audio_codec: str = "aac"
-    # Scene detection
-    scene_threshold: float = 27.0  # PySceneDetect threshold
-    # Hype scoring
-    hype_threshold: float = 0.3  # Minimum normalized score to consider
-    diversity_weight: float = 0.2  # Weight for temporal diversity in ranking
-    # Performance
-    batch_size: int = 8  # Frames per batch for model inference
-    max_video_duration: float = 7200.0  # Maximum video length (2 hours)
-    # Temporary files
-    temp_dir: Optional[str] = None
-    cleanup_temp: bool = True
-@dataclass
-class AppConfig:
-    """Main application configuration."""
-    model: ModelConfig = field(default_factory=ModelConfig)
-    processing: ProcessingConfig = field(default_factory=ProcessingConfig)
-    # Logging
-    log_level: str = "INFO"
-    log_file: Optional[str] = "shortsmith.log"
-    log_to_console: bool = True
-    # API settings (for future extensibility)
-    api_key: Optional[str] = None
-    # UI settings
-    share_gradio: bool = False
-    server_port: int = 7860
-    @classmethod
-    def from_env(cls) -> "AppConfig":
-        """Create configuration from environment variables."""
-        config = cls()
-        # Override from environment
-        if os.environ.get("SHORTSMITH_LOG_LEVEL"):
-            config.log_level = os.environ["SHORTSMITH_LOG_LEVEL"]
-        if os.environ.get("SHORTSMITH_DEVICE"):
-            config.model.device = os.environ["SHORTSMITH_DEVICE"]
-        if os.environ.get("SHORTSMITH_API_KEY"):
-            config.api_key = os.environ["SHORTSMITH_API_KEY"]
-        return config
-# Global configuration instance
-_config: Optional[AppConfig] = None
-def get_config() -> AppConfig:
-    """Get the global configuration instance."""
-    global _config
-    if _config is None:
-        _config = AppConfig.from_env()
-    return _config
-def set_config(config: AppConfig) -> None:
-    """Set the global configuration instance."""
-    global _config
-    _config = config
-# Export commonly used items
-__all__ = [
-    "ContentDomain",
-    "ClipLength",
-    "CLIP_LENGTH_CONFIG",
-    "DomainWeights",
-    "ModelConfig",
-    "ProcessingConfig",
-    "AppConfig",
-    "get_config",
-    "set_config",
-]

+"""
+ShortSmith v2 - Configuration Module
+Centralized configuration for all components including model paths,
+thresholds, domain presets, and runtime settings.
+"""
+import os
+from dataclasses import dataclass, field
+from typing import Dict, Optional
+from enum import Enum
+class ContentDomain(Enum):
+    """Supported content domains with different hype characteristics."""
+    SPORTS = "sports"
+    VLOGS = "vlogs"
+    MUSIC = "music"
+    PODCASTS = "podcasts"
+    GAMING = "gaming"
+    COMEDY = "comedy"
+    GENERAL = "general"
+class ClipLength(Enum):
+    """Clip length presets - flexible ranges instead of fixed durations."""
+    SHORT = "short"  # 30-60 seconds (soft bounds, can vary by ~5-10s)
+    LONG = "long"    # Up to 3 minutes
+# Clip length configuration - just min/max constraints, algorithm finds natural boundaries
+CLIP_LENGTH_CONFIG = {
+    ClipLength.SHORT: {
+        "min": 30.0,         # Minimum duration in seconds
+        "max": 60.0,         # Maximum duration in seconds
+        "description": "30-60 seconds",
+    },
+    ClipLength.LONG: {
+        "min": 60.0,         # Minimum duration in seconds
+        "max": 180.0,        # Maximum duration in seconds (3 min)
+        "description": "1-3 minutes",
+    },
+}
+@dataclass
+class DomainWeights:
+    """Weight configuration for visual vs audio scoring per domain."""
+    visual_weight: float
+    audio_weight: float
+    motion_weight: float = 0.0
+    def __post_init__(self):
+        """Normalize weights to sum to 1.0."""
+        total = self.visual_weight + self.audio_weight + self.motion_weight
+        if total > 0:
+            self.visual_weight /= total
+            self.audio_weight /= total
+            self.motion_weight /= total
+@dataclass
+class ModelConfig:
+    """Configuration for AI models."""
+    # Visual model (Qwen2-VL)
+    visual_model_id: str = "Qwen/Qwen2-VL-2B-Instruct"
+    visual_model_quantization: str = "int4"  # Options: "int4", "int8", "none"
+    visual_max_frames: int = 32
+    # Audio model
+    audio_model_id: str = "facebook/wav2vec2-base-960h"
+    use_advanced_audio: bool = False  # Use Wav2Vec2 instead of just Librosa
+    # Face recognition (InsightFace)
+    face_detection_model: str = "buffalo_l"  # SCRFD model
+    face_similarity_threshold: float = 0.4
+    # Body recognition (OSNet)
+    body_model_name: str = "osnet_x1_0"
+    body_similarity_threshold: float = 0.5
+    # Motion detection (RAFT)
+    motion_model: str = "raft-things"
+    motion_threshold: float = 5.0
+    # Device settings
+    device: str = "cuda"  # Options: "cuda", "cpu", "mps"
+    def __post_init__(self):
+        """Validate and adjust device based on availability."""
+        import torch
+        if self.device == "cuda" and not torch.cuda.is_available():
+            self.device = "cpu"
+        elif self.device == "mps" and not (hasattr(torch.backends, "mps") and torch.backends.mps.is_available()):
+            self.device = "cpu"
+@dataclass
+class ProcessingConfig:
+    """Configuration for video processing pipeline."""
+    # Sampling settings
+    coarse_sample_interval: float = 5.0  # Seconds between frames in first pass
+    dense_sample_fps: float = 3.0  # FPS for dense sampling on candidates
+    min_motion_for_dense: float = 2.0  # Threshold to trigger dense sampling
+    # Clip settings - these are safety bounds, actual duration comes from CLIP_LENGTH_CONFIG
+    min_clip_duration: float = 15.0  # Absolute minimum clip length (safety floor)
+    max_clip_duration: float = 180.0  # Absolute maximum clip length (safety ceiling for LONG preset)
+    default_clip_duration: float = 45.0  # Default clip length (midpoint of SHORT range)
+    min_gap_between_clips: float = 30.0  # Minimum gap between clip starts
+    # Output settings
+    default_num_clips: int = 3
+    max_num_clips: int = 10
+    output_format: str = "mp4"
+    output_codec: str = "libx264"
+    output_audio_codec: str = "aac"
+    # Scene detection
+    scene_threshold: float = 27.0  # PySceneDetect threshold
+    # Hype scoring
+    hype_threshold: float = 0.3  # Minimum normalized score to consider
+    diversity_weight: float = 0.2  # Weight for temporal diversity in ranking
+    # Performance
+    batch_size: int = 8  # Frames per batch for model inference
+    max_video_duration: float = 7200.0  # Maximum video length (2 hours)
+    # Temporary files
+    temp_dir: Optional[str] = None
+    cleanup_temp: bool = True
+@dataclass
+class AppConfig:
+    """Main application configuration."""
+    model: ModelConfig = field(default_factory=ModelConfig)
+    processing: ProcessingConfig = field(default_factory=ProcessingConfig)
+    # Logging
+    log_level: str = "INFO"
+    log_file: Optional[str] = "shortsmith.log"
+    log_to_console: bool = True
+    # API settings (for future extensibility)
+    api_key: Optional[str] = None
+    # UI settings
+    share_gradio: bool = False
+    server_port: int = 7860
+    @classmethod
+    def from_env(cls) -> "AppConfig":
+        """Create configuration from environment variables."""
+        config = cls()
+        # Override from environment
+        if os.environ.get("SHORTSMITH_LOG_LEVEL"):
+            config.log_level = os.environ["SHORTSMITH_LOG_LEVEL"]
+        if os.environ.get("SHORTSMITH_DEVICE"):
+            config.model.device = os.environ["SHORTSMITH_DEVICE"]
+        if os.environ.get("SHORTSMITH_API_KEY"):
+            config.api_key = os.environ["SHORTSMITH_API_KEY"]
+        return config
+# Global configuration instance
+_config: Optional[AppConfig] = None
+def get_config() -> AppConfig:
+    """Get the global configuration instance."""
+    global _config
+    if _config is None:
+        _config = AppConfig.from_env()
+    return _config
+def set_config(config: AppConfig) -> None:
+    """Set the global configuration instance."""
+    global _config
+    _config = config
+# Export commonly used items
+__all__ = [
+    "ContentDomain",
+    "ClipLength",
+    "CLIP_LENGTH_CONFIG",
+    "DomainWeights",
+    "ModelConfig",
+    "ProcessingConfig",
+    "AppConfig",
+    "get_config",
+    "set_config",
+]

requirements.txt CHANGED Viewed

@@ -1,103 +1,103 @@
-# ShortSmith v2 - Requirements
-# For Hugging Face Spaces deployment
-# ============================================
-# Core Dependencies
-# ============================================
-# Gradio UI framework
-gradio==4.44.1
-# Pin pydantic to fix "argument of type 'bool' is not iterable" error
-pydantic==2.10.6
-# Deep learning frameworks
-torch>=2.0.0
-torchvision>=0.15.0
-torchaudio>=2.0.0
-# Transformers and model loading
-transformers>=4.35.0
-accelerate>=0.24.0
-bitsandbytes>=0.41.0  # For INT4/INT8 quantization
-# ============================================
-# Video Processing
-# ============================================
-# Video I/O
-ffmpeg-python>=0.2.0
-opencv-python-headless>=4.8.0
-# Scene detection
-scenedetect[opencv]>=0.6.0
-# ============================================
-# Audio Processing
-# ============================================
-# Audio analysis
-librosa>=0.10.0
-soundfile>=0.12.0
-# Optional: Advanced audio understanding
-# wav2vec2 is loaded via transformers
-# ============================================
-# Computer Vision Models
-# ============================================
-# Face recognition
-insightface>=0.7.0
-onnxruntime-gpu>=1.16.0  # Use onnxruntime for CPU-only
-# Person detection (YOLO)
-ultralytics>=8.0.0
-# Image processing
-Pillow>=10.0.0
-# ============================================
-# Utilities
-# ============================================
-# Numerical computing
-numpy>=1.24.0
-# Progress bars
-tqdm>=4.65.0
-# ============================================
-# Hugging Face Specific
-# ============================================
-# For model downloading
-huggingface_hub>=0.17.0
-# Qwen2-VL specific utilities
-qwen-vl-utils>=0.0.2
-# ============================================
-# Optional: GPU Acceleration
-# ============================================
-# Uncomment for specific CUDA versions if needed
-# --extra-index-url https://download.pytorch.org/whl/cu118
-# torch==2.1.0+cu118
-# torchvision==0.16.0+cu118
-# ============================================
-# Training Dependencies (optional)
-# ============================================
-# For loading Mr. HiSum dataset
-h5py>=3.9.0
-# ============================================
-# Development Dependencies (optional)
-# ============================================
-# pytest>=7.0.0
-# black>=23.0.0
-# isort>=5.0.0
-# mypy>=1.0.0

+# ShortSmith v2 - Requirements
+# For Hugging Face Spaces deployment
+# ============================================
+# Core Dependencies
+# ============================================
+# Gradio UI framework
+gradio==4.44.1
+# Pin pydantic to fix "argument of type 'bool' is not iterable" error
+pydantic==2.10.6
+# Deep learning frameworks
+torch>=2.0.0
+torchvision>=0.15.0
+torchaudio>=2.0.0
+# Transformers and model loading
+transformers>=4.35.0
+accelerate>=0.24.0
+bitsandbytes>=0.41.0  # For INT4/INT8 quantization
+# ============================================
+# Video Processing
+# ============================================
+# Video I/O
+ffmpeg-python>=0.2.0
+opencv-python-headless>=4.8.0
+# Scene detection
+scenedetect[opencv]>=0.6.0
+# ============================================
+# Audio Processing
+# ============================================
+# Audio analysis
+librosa>=0.10.0
+soundfile>=0.12.0
+# Optional: Advanced audio understanding
+# wav2vec2 is loaded via transformers
+# ============================================
+# Computer Vision Models
+# ============================================
+# Face recognition
+insightface>=0.7.0
+onnxruntime-gpu>=1.16.0  # Use onnxruntime for CPU-only
+# Person detection (YOLO)
+ultralytics>=8.0.0
+# Image processing
+Pillow>=10.0.0
+# ============================================
+# Utilities
+# ============================================
+# Numerical computing
+numpy>=1.24.0
+# Progress bars
+tqdm>=4.65.0
+# ============================================
+# Hugging Face Specific
+# ============================================
+# For model downloading
+huggingface_hub>=0.17.0
+# Qwen2-VL specific utilities
+qwen-vl-utils>=0.0.2
+# ============================================
+# Optional: GPU Acceleration
+# ============================================
+# Uncomment for specific CUDA versions if needed
+# --extra-index-url https://download.pytorch.org/whl/cu118
+# torch==2.1.0+cu118
+# torchvision==0.16.0+cu118
+# ============================================
+# Training Dependencies (optional)
+# ============================================
+# For loading Mr. HiSum dataset
+h5py>=3.9.0
+# ============================================
+# Development Dependencies (optional)
+# ============================================
+# pytest>=7.0.0
+# black>=23.0.0
+# isort>=5.0.0
+# mypy>=1.0.0

space.yaml CHANGED Viewed

@@ -1,31 +1,31 @@
----
-title: ShortSmith v2
-emoji: 🎬
-colorFrom: purple
-colorTo: blue
-sdk: gradio
-sdk_version: "4.44.1"
-app_file: app.py
-pinned: false
-license: mit
-tags:
-  - video
-  - highlight-detection
-  - ai
-  - qwen
-  - computer-vision
-  - audio-analysis
-short_description: AI-Powered Video Highlight Extractor
----
-# ShortSmith v2
-Extract the most engaging highlight clips from your videos automatically using AI.
-## Features
-- Multi-modal analysis (visual + audio + motion)
-- Domain-optimized presets (Sports, Music, Vlogs, etc.)
-- Person-specific filtering
-- Scene-aware clip cutting
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+---
+title: ShortSmith v2
+emoji: 🎬
+colorFrom: purple
+colorTo: blue
+sdk: gradio
+sdk_version: "4.44.1"
+app_file: app.py
+pinned: false
+license: mit
+tags:
+  - video
+  - highlight-detection
+  - ai
+  - qwen
+  - computer-vision
+  - audio-analysis
+short_description: AI-Powered Video Highlight Extractor
+---
+# ShortSmith v2
+Extract the most engaging highlight clips from your videos automatically using AI.
+## Features
+- Multi-modal analysis (visual + audio + motion)
+- Domain-optimized presets (Sports, Music, Vlogs, etc.)
+- Person-specific filtering
+- Scene-aware clip cutting
+Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference