# Testing Guide: Phase 1-3 Implementation
This guide covers all features implemented in Phases 1-3 of the video processing enhancement project. Use it to validate that each feature works correctly before processing your 3,000+ video library.
## Overview of Changes
### Phase 1: Database Schema & Model Versioning
- Model version tracking (SigLIP, ArcFace, labels hash)
- New database columns for processing metadata
- FTS5 to regular table migration for proper score sorting
### Phase 2: Processing Improvements
- Category-specific scene detection thresholds
- Enhanced hybrid approach (scene + interval fallback)
- Face crop size increased to 256x256 pixels
### Phase 3: Settings UI
- Scene threshold controls in Settings page
- Preset configurations (static, moderate, dynamic)
- Minimum thumbnails configuration
---
## Test 1: Backend Settings Module
### 1.1 Verify Presets Load Correctly
```bash
cd /home/user/Search-UI/backend
python3 -c "
import settings
print('Presets:')
for name, config in settings.PRESET_SETTINGS.items():
    print(f' {name}: threshold={config[\"scene_threshold\"]}, interval={config[\"thumbnail_interval\"]}s, min={config[\"min_thumbnails\"]}')
print(f'\nDefaults: {settings.get_defaults()}')
"
```
**Expected Output:**
```
Presets:
static: threshold=0.12, interval=10.0s, min=30
moderate: threshold=0.2, interval=5.0s, min=35
dynamic: threshold=0.35, interval=2.0s, min=50
Defaults: {'scene_threshold': 0.29, 'thumbnail_interval': 3.0, 'min_thumbnails': 30}
```
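The visual comparison above can also be scripted. The following is a minimal sketch, assuming `settings.PRESET_SETTINGS` is a dict of dicts shaped like the expected output; `EXPECTED_PRESETS` and `diff_presets` are hypothetical names introduced for illustration and are not part of the backend.

```python
# Expected values copied from the "Expected Output" block above.
EXPECTED_PRESETS = {
    "static": {"scene_threshold": 0.12, "thumbnail_interval": 10.0, "min_thumbnails": 30},
    "moderate": {"scene_threshold": 0.2, "thumbnail_interval": 5.0, "min_thumbnails": 35},
    "dynamic": {"scene_threshold": 0.35, "thumbnail_interval": 2.0, "min_thumbnails": 50},
}


def diff_presets(actual, expected=EXPECTED_PRESETS):
    """Return a list of human-readable mismatches (empty list means all match)."""
    problems = []
    for name, want in expected.items():
        got = actual.get(name)
        if got is None:
            problems.append(f"missing preset: {name}")
            continue
        for key, value in want.items():
            if got.get(key) != value:
                problems.append(f"{name}.{key}: expected {value}, got {got.get(key)}")
    return problems


# Stand-in data for illustration. From the backend directory you would pass
# settings.PRESET_SETTINGS instead (assumed to have this shape).
sample = {name: dict(config) for name, config in EXPECTED_PRESETS.items()}
sample["dynamic"]["min_thumbnails"] = 40  # simulate a misconfigured preset
for line in diff_presets(sample):
    print(line)
```

An empty result means every preset matches the documented values, so the check can be dropped into a larger script as `assert not diff_presets(settings.PRESET_SETTINGS)`.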
### 1.2 Test get_processing_settings()
```bash
cd /home/user/Search-UI/backend
python3 -c "
import settings
print(f'Default: {settings.get_processing_settings()}')
settings.set_scene_threshold(0.35)
print(f'Updated: {settings.get_processing_settings()}')
"
```
**Expected Output:**
- Default returns `scene_threshold=0.29` (the value reported by `settings.get_defaults()` in Test 1.1)
- Updated returns `scene_threshold=0.35`
---
## Test 2: Search Images Model Versioning
### 2.1 Verify Version Constants
```bash
cd /home/user/Search-UI/backend
python3 -c "
# Check the source file for constants
with open('search_images.py', 'r') as f:
    content = f.read()
import re
version_match = re.search(r'SIGLIP_MODEL_VERSION = \"([^\"]+)\"', content)
threshold_match = re.search(r'CLASSIFICATION_THRESHOLD = ([0-9.]+)', content)
print(f'SIGLIP_MODEL_VERSION: {version_match.group(1) if version_match else \"NOT FOUND\"}')
print(f'CLASSIFICATION_THRESHOLD: {threshold_match.group(1) if threshold_match else \"NOT FOUND\"}')
print('get_labels_version(): defined' if 'def get_labels_version()' in content else 'get_labels_version(): NOT FOUND')
"
```
**Expected Output:**
```
SIGLIP_MODEL_VERSION: siglip2-so400m-patch16-naflex-v1
CLASSIFICATION_THRESHOLD: 0.1
get_labels_version(): defined
```
### 2.2 Verify Database Schema (processed_videos table)
```bash
cd /home/user/Search-UI/backend
python3 -c "
from database import get_db
conn = get_db()
cursor = conn.execute('PRAGMA table_info(processed_videos)')
columns = [row[1] for row in cursor.fetchall()]
required = ['siglip_model_version', 'labels_version', 'classification_threshold',
            'resolution_label', 'scene_threshold', 'duration_seconds', 'fps', 'total_frames']
for col in required:
    status = '✓' if col in columns else '✗'
    print(f'{status} {col}')
"
```
**Expected:** All columns marked with ✓
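To see how the `PRAGMA table_info` check behaves without touching the real database, the sketch below runs the same idea against a throwaway in-memory SQLite database. The three-column schema is deliberately incomplete and purely hypothetical, and `missing_columns` is an illustrative helper, not a function from `database.py`.

```python
import sqlite3

REQUIRED_COLUMNS = [
    "siglip_model_version", "labels_version", "classification_threshold",
    "resolution_label", "scene_threshold", "duration_seconds", "fps", "total_frames",
]


def missing_columns(conn, table, required):
    """Return the required column names that PRAGMA table_info does not report."""
    # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so only pass trusted names.
    present = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return [col for col in required if col not in present]


# Throwaway in-memory database with a hypothetical, incomplete schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE processed_videos (natural_key TEXT, scene_threshold REAL, fps REAL)"
)
print(missing_columns(conn, "processed_videos", REQUIRED_COLUMNS))
```

Against a fully migrated database the helper should return an empty list. A table that does not exist yields no `table_info` rows, so every required column is reported as missing.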
### 2.3 Verify image_categories is Regular Table (not FTS5)
```bash
cd /home/user/Search-UI/backend
python3 -c "
from database import get_db
conn = get_db()
# Check if it's a regular table (not FTS5 virtual table)
cursor = conn.execute(\"SELECT sql FROM sqlite_master WHERE name='image_categories'\")
row = cursor.fetchone()
if row:
    sql = row[0]
    if 'VIRTUAL TABLE' in sql.upper():
        print('✗ Still FTS5 virtual table')
    else:
        print('✓ Regular table')
        print(f' Schema: {sql[:100]}...')
else:
    print('✗ Table not found')
# Check indexes
cursor = conn.execute(\"SELECT name FROM sqlite_master WHERE type='index' AND tbl_name='image_categories'\")
indexes = [row[0] for row in cursor.fetchall()]
print(f' Indexes: {indexes}')
"
```
**Expected:**
- Regular table (not virtual)
- Indexes include: idx_categories_natural_key, idx_categories_category_name, idx_categories_score
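The same check can be packaged as a small helper that returns a result instead of printing it. This is a sketch under stated assumptions: the column names in the demo table are hypothetical, and `describe_table` is an illustrative name, not part of the backend. It relies only on the fact that SQLite records virtual tables in `sqlite_master` with a `CREATE VIRTUAL TABLE` statement.

```python
import sqlite3


def describe_table(conn, name):
    """Return (kind, index_names) where kind is 'virtual', 'regular', or 'missing'."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    if row is None:
        return "missing", []
    kind = "virtual" if "VIRTUAL TABLE" in row[0].upper() else "regular"
    indexes = [
        r[0]
        for r in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'index' AND tbl_name = ? ORDER BY name",
            (name,),
        )
    ]
    return kind, indexes


# Throwaway in-memory database; the columns are assumptions for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image_categories (natural_key TEXT, category_name TEXT, score REAL)"
)
conn.execute("CREATE INDEX idx_categories_score ON image_categories (score)")
print(describe_table(conn, "image_categories"))
```

Run against the real connection from `get_db()`, the expected result is `'regular'` together with the three index names listed above.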
---
## Test 3: Face Search Constants
### 3.1 Verify Constants
```bash
cd /home/user/Search-UI/backend
python3 -c "
with open('face_search.py', 'r') as f:
content = f.read()
import re
crop_match = re.search(r'FACE_CROP_SIZE = (\d+)', content)
conf_match = re.search(r'MIN_FACE_CONFIDENCE = ([0-9.]+)', content)
version_match = re.search(r'ARCFACE_MODEL_VERSION = \"([^\"]+)\"', content)
print(f'FACE_CROP_SIZE: {crop_match.group(1) if crop_match else \"NOT FOUND\"} (expected: 256)')
print(f'MIN_FACE_CONFIDENCE: {conf_match.group(1) if conf_match else \"NOT FOUND\"}')
print(f'ARCFACE_MODEL_VERSION: {version_match.group(1) if version_match else \"NOT FOUND\"}')
"
```
**Expected:**
```
FACE_CROP_SIZE: 256 (expected: 256)
MIN_FACE_CONFIDENCE: 0.5
ARCFACE_MODEL_VERSION: arcface-mtcnn-v1
```
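Tests 2.1 and 3.1 both read constants out of source files with regular expressions, which is sensitive to spacing and quote style. A possible alternative, sketched here with Python's standard `ast` module, parses the file and evaluates only plain literal assignments. `read_constants` is a hypothetical helper, and the sample source is a stand-in for `face_search.py`.

```python
import ast


def read_constants(source, names):
    """Return {name: value} for simple module-level `NAME = literal` assignments."""
    found = {}
    for node in ast.parse(source).body:
        if not isinstance(node, ast.Assign):
            continue  # annotated assignments (NAME: int = 1) are not handled here
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id in names:
                try:
                    found[target.id] = ast.literal_eval(node.value)
                except ValueError:
                    pass  # not a plain literal (e.g. a computed value); skip it
    return found


# Stand-in source text; in practice read face_search.py from disk instead.
sample_source = '''
FACE_CROP_SIZE = 256  # pixels
MIN_FACE_CONFIDENCE = 0.5
ARCFACE_MODEL_VERSION = "arcface-mtcnn-v1"
'''
wanted = {"FACE_CROP_SIZE", "MIN_FACE_CONFIDENCE", "ARCFACE_MODEL_VERSION"}
print(read_constants(sample_source, wanted))
```

Because the file is parsed rather than imported, this avoids loading any ML models just to read a version string, which is the same reason the tests above use text matching.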
---
## Test 4: Process Video Integration
### 4.1 Verify Scene Threshold Usage
```bash
cd /home/user/Search-UI/backend
grep -n "scene_threshold = processing_settings" process_video.py
grep -n "gt(scene" process_video.py
```
**Expected:** Both patterns found, showing scene_threshold is used dynamically from settings.
### 4.2 Verify FACE_CROP_SIZE Usage
```bash
cd /home/user/Search-UI/backend
grep -n "face_search.FACE_CROP_SIZE" process_video.py
```
**Expected:** Found in process_video.py (not hardcoded 128)
### 4.3 Verify Enhanced Hybrid Approach
```bash
cd /home/user/Search-UI/backend
grep -n "fallback_threshold" process_video.py
grep -n "int(min_thumbnails \* 0.5)" process_video.py
```
**Expected:** Both patterns found, showing that the fallback decision uses a threshold of 50% of `min_thumbnails`.
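For orientation, the decision that the two grep patterns point at might look roughly like the sketch below. Only the `int(min_thumbnails * 0.5)` expression and the `fallback_threshold` name come from the grep patterns. The function name, its signature, the direction of the comparison, and the choice to replace rather than merge the scene-based timestamps are assumptions, so treat this as an illustration rather than the code in `process_video.py`.

```python
def choose_timestamps(scene_times, duration, thumbnail_interval, min_thumbnails):
    """Pick thumbnail timestamps: scene cuts if there are enough, else fixed intervals.

    Hypothetical sketch; the real hybrid logic may combine both sources.
    """
    fallback_threshold = int(min_thumbnails * 0.5)
    if len(scene_times) >= fallback_threshold:
        return "scene", sorted(scene_times)
    # Too few scene changes were detected: sample at a fixed interval instead.
    times = []
    i = 0
    while i * thumbnail_interval < duration:
        times.append(i * thumbnail_interval)
        i += 1
    return "interval", times


# A 60-second clip with only two detected cuts, using the "static" preset values.
mode, times = choose_timestamps([3.2, 41.0], 60.0, 10.0, 30)
print(mode, times)
```

With the static preset (`min_thumbnails=30`) the threshold is 15, so two detected cuts are well below it and the sketch falls back to one timestamp every 10 seconds.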
---
## Test 5: API Endpoints
### 5.1 Start Backend Server
```bash
cd /home/user/Search-UI
tmux new -d -s backend "cd backend && source venv/bin/activate && uvicorn main:app --reload --host 0.0.0.0 2>&1 | tee ../backend.log"
sleep 3
tail -5 backend.log
```
**Expected:** Server running on http://0.0.0.0:8000
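The fixed `sleep 3` may be too short on a cold start. As an optional alternative, the sketch below polls the port until it accepts TCP connections. `wait_for_port` is a hypothetical helper using only the standard library, and the port `8000` is the uvicorn default shown in the expected output.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, poll_interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=poll_interval):
                return True
        except OSError:
            # Connection refused or timed out: the server is not ready yet.
            time.sleep(poll_interval)
    return False


# Example call (not executed here because it blocks until the server is up):
#   wait_for_port("localhost", 8000)
```

A successful connection only shows that the port is open, not that the application finished loading, so checking the tail of `backend.log` is still worthwhile.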
### 5.2 Test GET /api/settings/series
```bash
curl -s "http://localhost:8000/api/settings/series?language=E" | python3 -m json.tool | head -50
```
**Expected Response Structure:**
```json
{
"settings": [...],
"available_categories": [...],
"presets": {
"static": {"scene_threshold": 0.12, ...},
"moderate": {...},
"dynamic": {...}
},
"defaults": {
"scene_threshold": 0.29,
"thumbnail_interval": 3.0,
"min_thumbnails": 30
}
}
```
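The structure above can be checked mechanically instead of by eye. The sketch below validates only the keys that appear in the expected response. `validate_series_response` is a hypothetical helper, and the sample payload is stand-in data rather than real server output.

```python
import json

REQUIRED_TOP_LEVEL = ("settings", "available_categories", "presets", "defaults")
REQUIRED_PRESETS = ("static", "moderate", "dynamic")
REQUIRED_DEFAULT_KEYS = ("scene_threshold", "thumbnail_interval", "min_thumbnails")


def validate_series_response(data):
    """Return a list of structural problems (empty list means the shape is as expected)."""
    problems = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in data:
            problems.append(f"missing top-level key: {key}")
    for name in REQUIRED_PRESETS:
        if name not in data.get("presets", {}):
            problems.append(f"missing preset: {name}")
    for key in REQUIRED_DEFAULT_KEYS:
        if key not in data.get("defaults", {}):
            problems.append(f"missing default: {key}")
    return problems


# Stand-in payload with one preset left out on purpose.
sample = json.loads("""
{
  "settings": [],
  "available_categories": [],
  "presets": {"static": {}, "moderate": {}},
  "defaults": {"scene_threshold": 0.29, "thumbnail_interval": 3.0, "min_thumbnails": 30}
}
""")
print(validate_series_response(sample))
```

To use it on live output, save the `curl` response to a file and pass the parsed JSON to the function.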
### 5.3 Test POST /api/settings/series
```bash
# Add a test setting
curl -X POST "http://localhost:8000/api/settings/series?category=TestDrama&thumbnail_interval=2.0&scene_threshold=0.35&min_thumbnails=50&preset=dynamic"
# Verify it was added
curl -s "http://localhost:8000/api/settings/series?language=E" | python3 -c "
import sys, json
data = json.load(sys.stdin)
for s in data.get('settings', []):
    if s.get('category') == 'TestDrama':
        print(f'Found: {s}')
        break
else:
    print('Not found')
"
# Clean up
curl -s "http://localhost:8000/api/settings/series?language=E" | python3 -c "
import sys, json
data = json.load(sys.stdin)
for s in data.get('settings', []):
    if s.get('category') == 'TestDrama':
        print(f'Setting ID to delete: {s[\"id\"]}')
"
# Delete using the ID from above
# curl -X DELETE "http://localhost:8000/api/settings/series/{id}"
```
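The manual lookup-then-delete step can be combined in one script. This sketch uses only the standard library and the endpoints shown above. The helper names are hypothetical, and it assumes each settings entry carries an `id` field as in the cleanup snippet. The network call is defined but not executed here.

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def find_setting_ids(data, category):
    """Return the IDs of all settings whose category matches."""
    return [s["id"] for s in data.get("settings", []) if s.get("category") == category]


def delete_setting(setting_id, base_url=BASE_URL):
    """Issue DELETE /api/settings/series/{id} and return the HTTP status code."""
    request = urllib.request.Request(
        f"{base_url}/api/settings/series/{setting_id}", method="DELETE"
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# Stand-in for the parsed GET /api/settings/series response.
sample = {
    "settings": [
        {"id": 7, "category": "TestDrama"},
        {"id": 8, "category": "News"},
    ]
}
print(find_setting_ids(sample, "TestDrama"))
```

With the backend running, looping over the returned IDs and calling `delete_setting` on each removes the test entries without copying IDs by hand.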
---
## Test 6: Frontend Settings Page
### 6.1 Start Frontend Server
```bash
cd /home/user/Search-UI
tmux new -d -s frontend "cd frontend && npm run dev -- --host 0.0.0.0 2>&1 | tee ../frontend.log"
sleep 5
tail -5 frontend.log
```
### 6.2 Visual Testing Checklist
Open a browser at http://localhost:5173 (or your network IP on port 5173).
Navigate to the Settings page and verify:
- [ ] **Presets Section** displays all three presets with values:
  - Static: Threshold 0.12, Interval 10s, Min 30
  - Moderate: Threshold 0.20, Interval 5s, Min 35
  - Dynamic: Threshold 0.35, Interval 2s, Min 50
  - Default: Shows current defaults
- [ ] **Add Series Setting Form** has:
  - Category dropdown (populated from available categories)
  - Subcategory dropdown (optional)
  - Preset dropdown (static, moderate, dynamic, or custom)
  - Scene Threshold input (0.05-0.5, step 0.01)
  - Fallback Interval input (0.5-60, step 0.5)
  - Min Thumbnails input (10-200, step 5)
- [ ] **Preset Selection** auto-fills values:
  - Select "Static" preset → threshold=0.12, interval=10, min=30
  - Select "Dynamic" preset → threshold=0.35, interval=2, min=50
  - Custom settings inputs become disabled when a preset is selected
- [ ] **Current Settings Table** shows columns:
  - Category | Subcategory | Threshold | Interval | Min | Preset | Actions
- [ ] **Add/Delete** works:
  - Add a test category with the dynamic preset
  - Verify it appears in the table with correct values
  - Delete it and verify it's removed
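The input limits in the form can be written down as data, which is handy for generating boundary values to type into the fields. This is a sketch under assumptions: it maps the "Fallback Interval" input to the `thumbnail_interval` field used by the API, and whether the backend enforces the same limits is not stated in this guide. `FIELD_RANGES` and `out_of_range` are hypothetical names.

```python
# (minimum, maximum) per field, copied from the form description above.
FIELD_RANGES = {
    "scene_threshold": (0.05, 0.5),
    "thumbnail_interval": (0.5, 60),  # assumed to back the "Fallback Interval" input
    "min_thumbnails": (10, 200),
}


def out_of_range(values, ranges=FIELD_RANGES):
    """Return the subset of `values` that falls outside the documented limits."""
    return {
        name: value
        for name, value in values.items()
        if name in ranges and not (ranges[name][0] <= value <= ranges[name][1])
    }


# One value inside the limits and two outside, as candidate negative-test inputs.
candidate = {"scene_threshold": 0.6, "thumbnail_interval": 2.0, "min_thumbnails": 5}
print(out_of_range(candidate))
```

All three documented presets fall inside these limits, so selecting a preset should never produce a value the form would reject.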
---
## Test 7: End-to-End Video Processing
### 7.1 Test with Single Video
**Prerequisites:** Have a test video in the system with a known category.
```bash
# Process a single video with explicit settings
curl -X POST "http://localhost:8000/api/process-video?natural_key=YOUR_TEST_KEY&label=480p"
```
### 7.2 Verify Processing Metadata Saved
```bash
cd /home/user/Search-UI/backend
python3 -c "
from database import get_db
conn = get_db()
cursor = conn.execute('''
SELECT natural_key, resolution_label, scene_threshold,
siglip_model_version, labels_version, duration_seconds
FROM processed_videos
ORDER BY processed_at DESC
LIMIT 5
''')
print('Recent processed videos:')
for row in cursor.fetchall():
    print(f' {row}')
"
```
**Expected:** Shows processing metadata including scene_threshold, model versions
### 7.3 Verify Thumbnail Count
After processing, check the thumbnails directory:
```bash
# Replace with actual natural_key and label
ls -1 /home/user/Search-UI/backend/videos/{natural_key}/{label}/thumbnails/ | wc -l
```
**Expected:** Number of thumbnails appropriate for the content type and settings
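For a count that ignores non-image files and subdirectories, a small Python helper can be used in place of the shell pipeline. This is a sketch: the set of image suffixes is an assumption about how thumbnails are stored, and `count_thumbnails` is a hypothetical name. The demo runs against a temporary directory rather than the real thumbnails path.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}  # assumed thumbnail formats


def count_thumbnails(directory):
    """Count image files directly inside `directory` (no recursion)."""
    return sum(
        1
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )


# Demo against a throwaway directory; point it at the real thumbnails path instead.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("0001.jpg", "0002.jpg", "0003.JPG", "notes.txt"):
        (Path(tmp) / name).touch()
    print(count_thumbnails(tmp))  # counts the three image files, ignores notes.txt
```

Comparing the result with the `min_thumbnails` value for the video's category gives a quick sanity check, assuming that setting is meant as a lower bound on the number of thumbnails produced.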
---
## Automated Test Script
Run the comprehensive test script:
```bash
cd /home/user/Search-UI
python3 scratchpad/2025-12-25-1708-test-phase1-3.py
```
**Expected:** All 6 modules pass (6/6)
---
## Known Limitations
### Phase 4 (Deferred)
The Unified Person system (combining face and speaker recognition) is planned for a future implementation. It requires:
- ECAPA-TDNN model integration for speaker embeddings
- A new Person management UI
- A cross-reference system for face ↔ speaker associations
### Phase 5 (Deferred)
Validation and performance improvements will be addressed after Phase 4:
- Bulk reprocessing detection
- Performance profiling
---
## Troubleshooting
### Backend Won't Start
```bash
# Check for Python environment issues
cd /home/user/Search-UI/backend
source venv/bin/activate
pip list | grep -E "fastapi|uvicorn|sqlite-vec"
```
### Database Migration Issues
```bash
# Force recreate tables (WARNING: loses data)
cd /home/user/Search-UI/backend
python3 -c "
from database import get_db
conn = get_db()
# Drop and recreate problematic tables
conn.execute('DROP TABLE IF EXISTS image_categories')
# Re-run the app to recreate
"
```
### Frontend Build Issues
```bash
cd /home/user/Search-UI/frontend
npm install
npm run build
```
---
## Summary Checklist
Before processing your video library, verify:
- [ ] All automated tests pass (6/6)
- [ ] Backend API returns presets and defaults correctly
- [ ] Frontend Settings page displays and functions correctly
- [ ] Test video processes with correct scene threshold
- [ ] Processing metadata saved to database
- [ ] Thumbnail count appropriate for content type