refactor & connect ontology
- REFACTORED_STRUCTURE.md +110 -0
- app.py +166 -409
- app_original_backup.py +640 -0
- app_refactored.py +236 -0
- model_processing.py +97 -0
- ontology_integration.py +144 -0
- ui_components.py +320 -0
- video_processing.py +112 -0
REFACTORED_STRUCTURE.md
ADDED
@@ -0,0 +1,110 @@
# 📁 Refactored Code Structure

The application has been refactored into modular components for better maintainability and understanding.

## 🗂️ File Structure

```
📦 Bahngleiserfassung/
├── 🎯 app.py                   # Main Streamlit application (refactored)
├── 📹 video_processing.py      # Video frame extraction and repair utilities
├── 🧠 ontology_integration.py  # Ontology-based scene analysis and risk assessment
├── 🤖 model_processing.py      # Local and remote AI model processing
├── 🖥️ ui_components.py         # Streamlit UI components and rendering
├── 🧮 ontology_eval.py         # Core ontology evaluation logic (unchanged)
├── 🔬 local_models.py          # Local AI models (ViT, BLIP) (unchanged)
└── 💾 app_original_backup.py   # Backup of original monolithic app.py
```

## 📋 Module Responsibilities

### 🎯 `app.py` - Main Application
- **Purpose**: Main entry point and orchestration
- **Functions**:
  - Application initialization and layout
  - Model setup and configuration
  - Main processing workflow coordination
  - Input validation and error handling

### 📹 `video_processing.py` - Video Processing
- **Purpose**: Video frame extraction and repair
- **Functions**:
  - `extract_frames_from_video()` - Extract frames at specified FPS
  - `repair_video_with_ffmpeg()` - Repair corrupted video files
  - Handles various video formats (MP4, AVI, MOV, MKV)
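
For orientation, here is a minimal usage sketch. It assumes only what the module exposes above: `extract_frames_from_video()` takes a binary file-like object and an FPS value, and returns a list of dicts with `frame` (PIL image), `timestamp`, and `frame_number` keys.

```python
from video_processing import extract_frames_from_video

# Sketch: pull roughly one frame per second from a local file.
# In the app, the file-like object comes from st.file_uploader instead.
with open("sample.mp4", "rb") as video_file:
    frames = extract_frames_from_video(video_file, fps=1)

for frame_data in frames:
    # Each entry pairs the PIL image with its position in the video.
    print(f"frame {frame_data['frame_number']} at t={frame_data['timestamp']:.1f}s")
```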

### 🧠 `ontology_integration.py` - Ontology Analysis
- **Purpose**: Scene analysis using ontology-based risk assessment
- **Functions**:
  - `analyze_scene_with_ontology()` - Main ontology analysis function
  - `_extract_ontology_features()` - Extract features from scene descriptions
  - `_calculate_person_on_track_confidence()` - Calculate specific risk confidence
  - `extract_scene_description()` - Extract text from model results
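
A minimal sketch of the analysis round-trip (the return keys mirror those produced by the original `analyze_scene_with_ontology()`, still visible in `app_original_backup.py`; the caption and printed values are illustrative only):

```python
from ontology_integration import analyze_scene_with_ontology, extract_scene_description

# Sketch: classify a caption produced by a vision model.
model_result = {"generated_text": "a man standing on the tracks at a train station"}
scene = extract_scene_description(model_result)

analysis = analyze_scene_with_ontology(scene, use_ontology=True)
print(analysis["severity"])   # one of NONE / LOW / MEDIUM / HIGH / CRITICAL
print(analysis["score"])      # risk score on a 0-100 scale
for explanation in analysis.get("explanations", []):
    print("-", explanation)
```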

### 🤖 `model_processing.py` - Model Processing
- **Purpose**: Handle local and remote AI model processing
- **Functions**:
  - `process_image_locally()` - Process images using local models
  - `query_huggingface_api()` - Process images using remote HF API
  - `process_frame()` - Unified frame processing interface
  - `image_to_base64()` - Image conversion utilities
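
How `process_frame()` is driven from `app.py`, as a sketch. The config keys shown are the ones `validate_inputs()` and `process_video_frames()` read; the actual dict is assembled by `render_sidebar_config()`, so treat the literal below as an assumed shape:

```python
from video_processing import extract_frames_from_video
from model_processing import process_frame

config = {
    "model_type": "Remote API",    # or "Local Models"
    "selected_model": "Salesforce/blip-image-captioning-large",
    "api_token": "hf_...",         # only required for the remote API path
    "prompt": "Describe what you see in this image",
}

with open("sample.mp4", "rb") as video_file:
    frame_data = extract_frames_from_video(video_file, fps=1)[0]

# process_frame() routes to local or remote processing based on
# config["model_type"] - the unified interface mentioned above.
result = process_frame(frame_data, config, local_manager=None)
print(result.get("error") or result.get("generated_text", result))
```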

### 🖥️ `ui_components.py` - UI Components
- **Purpose**: Streamlit UI components and rendering
- **Functions**:
  - `render_sidebar_config()` - Configuration sidebar
  - `render_input_section()` - Video upload interface
  - `render_frame_result()` - Display frame analysis results
  - `render_validation_errors()` - Show validation messages
  - Various helper rendering functions
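
The renderers return plain Python values (a config dict, the uploaded file, a container), which keeps the orchestration in `app.py` thin. A condensed sketch of the wiring, following `main()`:

```python
import streamlit as st
from ui_components import (
    render_sidebar_config,
    render_input_section,
    render_results_header,
)

settings = {}                                         # normally load_settings()
local_manager, local_models_available = None, False   # normally setup_local_models()

col1, col2 = st.columns([1, 1])
with col1:
    config = render_sidebar_config(settings, local_models_available, local_manager)
    video_file = render_input_section()["video_file"]
with col2:
    results_container = render_results_header()
```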

## 🔄 Data Flow

```mermaid
graph TD
    A[app.py] --> B[ui_components.py]
    A --> C[video_processing.py]
    A --> D[model_processing.py]
    A --> E[ontology_integration.py]

    C --> F[Extract Frames]
    D --> G[Process with AI Models]
    E --> H[Ontology Risk Assessment]

    F --> G
    G --> H
    H --> B

    I[local_models.py] --> D
    J[ontology_eval.py] --> E
```
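
Collapsed into code, the flow above is a three-step loop per frame: extract, caption, classify. A sketch mirroring `process_video_frames()` in `app.py`, with the Streamlit progress reporting omitted:

```python
from video_processing import extract_frames_from_video
from model_processing import process_frame
from ontology_integration import analyze_scene_with_ontology, extract_scene_description

def analyze_video(video_file, config, local_manager=None):
    """Sketch of the per-frame pipeline: frames -> model output -> risk assessment."""
    results = []
    for frame_data in extract_frames_from_video(video_file, config["fps"]):
        result = process_frame(frame_data, config, local_manager)
        scene = extract_scene_description(result)
        analysis = analyze_scene_with_ontology(scene, config["use_ontology"])
        results.append({**frame_data, "result": result, "ontology_analysis": analysis})
    return results
```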

## ✨ Benefits of Refactoring

1. **🧩 Modularity**: Each module has a single responsibility
2. **🔧 Maintainability**: Easier to update and debug individual components
3. **📚 Readability**: Clear separation of concerns and smaller, focused files
4. **🧪 Testability**: Each module can be tested independently
5. **🔄 Reusability**: Components can be reused in other projects
6. **👥 Collaboration**: Multiple developers can work on different modules

## 🚀 Usage

The refactored application works exactly the same as before:

```bash
streamlit run app.py
```

All functionality remains identical:
- ✅ NONE / 🟢 LOW / 🟠 MEDIUM / ⚠️ HIGH / 🚨 CRITICAL classification
- Toggle ontology analysis on/off
- Support for local and remote AI models
- Video processing with automatic repair

## 🔒 Backwards Compatibility

- Original functionality is preserved
- API and interface remain unchanged
- Configuration and settings work the same way
- The original monolithic code is backed up as `app_original_backup.py`
app.py
CHANGED
#!/usr/bin/env python3
"""
Main Streamlit application for video frame analysis with ontology-based risk assessment
Refactored for better code organization and maintainability
"""
import streamlit as st
import json
from dotenv import load_dotenv

# Import our modular components
from video_processing import extract_frames_from_video
from ontology_integration import analyze_scene_with_ontology, extract_scene_description
from model_processing import process_frame
from ui_components import (
    render_sidebar_config,
    render_input_section,
    render_prompt_section,
    render_process_button,
    render_results_header,
    render_frame_result,
    render_validation_errors,
    render_instructions
)

# Try to import local models, fall back gracefully if not available
try:
    from local_models import get_local_model_manager
    LOCAL_MODELS_AVAILABLE = True
except ImportError as e:
    LOCAL_MODELS_AVAILABLE = False
    print(f"Local models not available: {e}")
    def get_local_model_manager():
        return None

# Load environment variables
load_dotenv()


def load_settings():
    """Load settings from JSON file"""
    try:
        with open('settings.json', 'r') as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


@st.cache_resource
def initialize_local_models():
    """Initialize local model manager"""
    return get_local_model_manager()


def initialize_app():
    """Initialize the Streamlit application"""
    st.set_page_config(
        page_title="Video Frame Analyzer with Ontology",
        page_icon="🎥",
        layout="wide"
    )

    st.title("🎥 Video Frame Analyzer with Ontology-Based Risk Assessment")
    st.markdown("Upload a video and analyze frames using AI models with ontology-based safety classification")


def setup_local_models():
    """Setup local models and return availability status"""
    local_manager = None
    local_models_available = False

    if LOCAL_MODELS_AVAILABLE:
        try:
            local_manager = initialize_local_models()
            local_models_available = True
            st.success("🤖 Local AI models initialized successfully!")
        except Exception as e:
            st.warning(f"Local AI models not available: {str(e)}")
            st.info("💡 Install AI packages: `pip install torch torchvision transformers accelerate sentencepiece`")
            local_models_available = False
    else:
        st.info("💡 Local AI models not installed. Install with: `pip install torch torchvision transformers accelerate sentencepiece`")

    return local_manager, local_models_available


def process_video_frames(video_file, config, local_manager=None):
    """
    Process all frames in the video and return results
    """
    # Extract frames
    frames = extract_frames_from_video(video_file, config["fps"])

    if not frames:
        st.error("No frames could be extracted from the video")
        return []

    st.success(f"Extracted {len(frames)} frames from video")

    # Process each frame
    results = []
    progress_bar = st.progress(0)

    # Add prompt to config for processing
    processing_config = config.copy()
    processing_config["prompt"] = config.get("prompt", "")

    for i, frame_data in enumerate(frames):
        with st.spinner(f"Analyzing frame {i+1}/{len(frames)}..."):
            # Process frame with selected model
            result = process_frame(frame_data, processing_config, local_manager)

            # Extract scene description for ontology analysis
            scene_description = extract_scene_description(result)

            # Apply ontology analysis
            ontology_analysis = analyze_scene_with_ontology(scene_description, config["use_ontology"])

            results.append({
                'frame_number': frame_data['frame_number'],
                'timestamp': frame_data['timestamp'],
                'image': frame_data['frame'],
                'result': result,
                'ontology_analysis': ontology_analysis
            })

        progress_bar.progress((i + 1) / len(frames))

    return results


def validate_inputs(video_file, prompt, config, local_models_available):
    """
    Validate all required inputs
    """
    model_type = config["model_type"]
    selected_model = config["selected_model"]
    api_token = config["api_token"]

    # Check basic requirements
    if not video_file:
        return False

    # Check prompt requirements
    if not prompt and not (model_type == "Local Models" and selected_model == "Person on Track Detector"):
        return False

    # Check API token for remote models
    if not api_token and model_type == "Remote API":
        return False

    # Check local models availability
    if model_type == "Local Models" and not local_models_available:
        return False

    return True


def main():
    """Main application entry point"""
    # Initialize application
    initialize_app()

    # Load settings and setup models
    settings = load_settings()
    local_manager, local_models_available = setup_local_models()

    # Create main layout
    col1, col2 = st.columns([1, 1])

    with col1:
        # Render sidebar configuration
        config = render_sidebar_config(settings, local_models_available, local_manager)

        # Render input section
        input_data = render_input_section()
        video_file = input_data["video_file"]

        # Render prompt section
        prompt = render_prompt_section(config)

        # Render process button
        process_button = render_process_button()

    with col2:
        # Render results section
        results_container = render_results_header()

    # Main processing logic
    if process_button:
        if validate_inputs(video_file, prompt, config, local_models_available):
            # Add prompt to config for processing
            config["prompt"] = prompt

            with st.spinner("Processing video..."):
                # Process video frames
                results = process_video_frames(video_file, config, local_manager)

            # Display results
            if results:
                with results_container:
                    st.subheader("Analysis Results")

                    # Display summary statistics
                    severity_counts = {}
                    for result in results:
                        severity = result['ontology_analysis'].get('severity', 'NONE')
                        severity_counts[severity] = severity_counts.get(severity, 0) + 1

                    if config["use_ontology"] and severity_counts:
                        st.write("**Summary:**")
                        summary_cols = st.columns(len(severity_counts))
                        for i, (severity, count) in enumerate(severity_counts.items()):
                            icon_map = {
                                'NONE': '✅', 'LOW': '🟢', 'MEDIUM': '🟠',
                                'HIGH': '⚠️', 'CRITICAL': '🚨'
                            }
                            with summary_cols[i]:
                                st.metric(f"{icon_map.get(severity, '❓')} {severity}", count)
                        st.divider()

                    # Display individual frame results
                    for result_data in results:
                        render_frame_result(result_data)
        else:
            # Show validation errors
            render_validation_errors(
                video_file, prompt, config["api_token"],
                config["model_type"], local_models_available, config["selected_model"]
            )

    # Render instructions
    render_instructions()


if __name__ == "__main__":
    main()
app_original_backup.py
ADDED
@@ -0,0 +1,640 @@
import streamlit as st
import cv2
import os
import tempfile
import requests
import base64
import subprocess
import json
from io import BytesIO
from PIL import Image
import numpy as np
from dotenv import load_dotenv
from ontology_eval import Observation, evaluate, OntologyContext, decision_to_triples, triples_to_turtle, Severity
# Try to import local models, fall back gracefully if not available
try:
    from local_models import get_local_model_manager
    LOCAL_MODELS_AVAILABLE = True
except ImportError as e:
    LOCAL_MODELS_AVAILABLE = False
    print(f"Local models not available: {e}")
    def get_local_model_manager():
        return None

# Load environment variables
load_dotenv()

def load_settings():
    """Load settings from JSON file"""
    try:
        with open('settings.json', 'r') as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

# Local models configuration
LOCAL_MODELS_ENABLED = LOCAL_MODELS_AVAILABLE
REMOTE_MODELS_ENABLED = True  # Always allow remote API as fallback

# Initialize local model manager
@st.cache_resource
def initialize_local_models():
    """Initialize local model manager"""
    return get_local_model_manager()

# Hugging Face models for vision-language tasks (kept for compatibility)
AVAILABLE_MODELS = {
    "microsoft/kosmos-2-patch14-224": "Kosmos-2",
    "Salesforce/blip-image-captioning-large": "BLIP Image Captioning",
    "microsoft/DialoGPT-medium": "DialoGPT",
    "microsoft/git-large-coco": "GIT Large COCO",
    "nlpconnect/vit-gpt2-image-captioning": "ViT-GPT2"
}

def repair_video_with_ffmpeg(input_path, output_path):
    """
    Repair corrupted video by moving moov atom to the beginning
    """
    try:
        # Try to fix the video using FFmpeg
        cmd = [
            'ffmpeg',
            '-i', input_path,
            '-c', 'copy',
            '-movflags', 'faststart',
            '-avoid_negative_ts', 'make_zero',
            '-y',  # Overwrite output file
            output_path
        ]

        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=300  # 5 minute timeout
        )

        return result.returncode == 0
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False

def extract_frames_from_video(video_file, fps=1):
    """
    Extract frames from video at specified FPS (default 1 frame per second)
    Automatically handles corrupted videos by attempting repair with FFmpeg
    """
    frames = []

    with tempfile.NamedTemporaryFile(delete=False, suffix='.mp4') as tmp_file:
        tmp_file.write(video_file.read())
        tmp_file_path = tmp_file.name

    repaired_path = None

    try:
        # First attempt: try to open video directly
        cap = cv2.VideoCapture(tmp_file_path)

        # Check if video opened successfully and has frames
        if not cap.isOpened() or cap.get(cv2.CAP_PROP_FRAME_COUNT) == 0:
            cap.release()

            # Second attempt: try to repair the video with FFmpeg
            st.warning("Video appears corrupted (moov atom issue). Attempting repair...")

            with tempfile.NamedTemporaryFile(delete=False, suffix='_repaired.mp4') as repaired_file:
                repaired_path = repaired_file.name

            if repair_video_with_ffmpeg(tmp_file_path, repaired_path):
                st.success("Video repair successful! Processing frames...")
                cap = cv2.VideoCapture(repaired_path)
            else:
                st.error("Failed to repair video. FFmpeg may not be installed or video is severely corrupted.")
                return frames

        # Extract video properties
        video_fps = cap.get(cv2.CAP_PROP_FPS)
        if video_fps <= 0:
            video_fps = 30  # Default fallback FPS

        frame_interval = int(video_fps / fps) if video_fps > fps else 1

        frame_count = 0
        extracted_count = 0

        while True:
            ret, frame = cap.read()
            if not ret:
                break

            if frame_count % frame_interval == 0:
                frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                pil_image = Image.fromarray(frame_rgb)
                frames.append({
                    'frame': pil_image,
                    'timestamp': frame_count / video_fps,
                    'frame_number': extracted_count
                })
                extracted_count += 1

            frame_count += 1

        cap.release()

    finally:
        # Clean up temporary files
        if os.path.exists(tmp_file_path):
            os.unlink(tmp_file_path)
        if repaired_path and os.path.exists(repaired_path):
            os.unlink(repaired_path)

    return frames

def image_to_base64(image):
    """Convert PIL image to base64 string"""
    buffer = BytesIO()
    image.save(buffer, format="PNG")
    img_str = base64.b64encode(buffer.getvalue()).decode()
    return img_str

def process_image_locally(image, prompt, model_name, local_manager):
    """
    Process image using local models
    """
    try:
        if model_name == "Person on Track Detector":
            # Special handling for person-on-track detection
            result = local_manager.person_on_track_detector.detect_person_on_track(image)
            return {"person_on_track_detection": result}
        else:
            caption = local_manager.generate_caption(model_name, image, prompt)
            return {"generated_text": caption}
    except Exception as e:
        return {"error": f"Local processing failed: {str(e)}"}

def query_huggingface_api(image, prompt, model_name, api_token):
    """
    Query Hugging Face API with image and prompt
    """
    API_URL = f"https://api-inference.huggingface.co/models/{model_name}"
    headers = {"Authorization": f"Bearer {api_token}"}

    # Convert image to base64
    img_base64 = image_to_base64(image)

    # Prepare payload based on model type
    if "blip" in model_name.lower():
        # For BLIP models, send image directly
        buffer = BytesIO()
        image.save(buffer, format="PNG")
        response = requests.post(
            API_URL,
            headers=headers,
            files={"file": buffer.getvalue()}
        )
    else:
        # For other vision-language models
        payload = {
            "inputs": {
                "image": img_base64,
                "text": prompt
            }
        }
        response = requests.post(API_URL, headers=headers, json=payload)

    if response.status_code == 200:
        return response.json()
    else:
        return {"error": f"API request failed: {response.status_code} - {response.text}"}

def analyze_scene_with_ontology(scene_description, use_ontology=True):
    """
    Analyze scene description using ontology-based evaluation
    Returns classification and explanation
    """
    if not use_ontology:
        return {
            "severity": "NONE",
            "severity_icon": "✅",
            "score": 0,
            "explanation": "Ontology-based analysis skipped",
            "ontology_used": False,
            "raw_description": scene_description
        }

    # Extract relevant information from scene description for ontology
    scene_lower = scene_description.lower().strip() if scene_description else ""

    # Initialize observation based on scene analysis
    obs = Observation()

    # Analyze scene for ontology features
    person_words = ['person', 'people', 'man', 'woman', 'boy', 'girl', 'human', 'individual', 'someone']
    track_words = ['track', 'tracks', 'rail', 'rails', 'railway', 'railroad']
    platform_words = ['platform', 'station', 'bahnsteig']
    danger_words = ['fallen', 'lying', 'down', 'accident', 'emergency']
    fire_words = ['fire', 'smoke', 'flames', 'burning']
    crowd_words = ['crowd', 'many people', 'group', 'mehrere personen']
    safe_words = ['no people', 'empty', 'clear', 'safe', 'nobody', 'without people']

    # Set observation values based on keyword analysis
    person_mentions = sum(1 for word in person_words if word in scene_lower)
    track_mentions = sum(1 for word in track_words if word in scene_lower)
    platform_mentions = sum(1 for word in platform_words if word in scene_lower)
    danger_mentions = sum(1 for word in danger_words if word in scene_lower)
    fire_mentions = sum(1 for word in fire_words if word in scene_lower)
    crowd_mentions = sum(1 for word in crowd_words if word in scene_lower)
    safe_mentions = sum(1 for word in safe_words if word in scene_lower)

    # Person on track detection (but not if explicitly safe)
    if person_mentions > 0 and track_mentions > 0 and safe_mentions == 0:
        # Check if person is actually on the tracks vs just mentioned
        on_track_indicators = ['on track', 'on the track', 'on rails', 'on the rails', 'standing on', 'walking on']
        on_track_specific = sum(1 for phrase in on_track_indicators if phrase in scene_lower)
        if on_track_specific > 0:
            obs.on_track_person = min(0.8, 0.6 + on_track_specific * 0.1)
        elif person_mentions > 0 and track_mentions > 0:
            # General co-occurrence but less confident - need stronger evidence
            near_indicators = ['near', 'close to', 'next to', 'beside', 'by the']
            near_mentions = sum(1 for phrase in near_indicators if phrase in scene_lower)
            if near_mentions > 0:
                # Person near tracks but not necessarily on them - lower confidence
                obs.on_track_person = min(0.4, 0.25 + near_mentions * 0.05)
            else:
                # Just mention of person and tracks together - very low confidence
                obs.on_track_person = min(0.3, 0.2 + (person_mentions + track_mentions) * 0.02)

    # Fallen person detection
    if person_mentions > 0 and danger_mentions > 0:
        obs.fallen_person = min(0.7, 0.4 + danger_mentions * 0.1)

    # Fire/smoke detection
    if fire_mentions > 0:
        obs.smoke_or_fire = min(0.8, 0.5 + fire_mentions * 0.15)

    # Crowd detection
    if crowd_mentions > 0 and (track_mentions > 0 or platform_mentions > 0):
        obs.crowd_on_track = min(0.7, 0.4 + crowd_mentions * 0.1)

    # Generic object detection (if no person but something mentioned on tracks)
    if track_mentions > 0 and person_mentions == 0 and any(word in scene_lower for word in ['object', 'item', 'thing', 'debris']):
        obs.object_on_track = 0.6

    # Evaluate using ontology
    decision = evaluate(obs)

    # Map severity to icons and colors
    severity_mapping = {
        Severity.NONE: {"icon": "✅", "color": "green"},
        Severity.LOW: {"icon": "🟢", "color": "lightgreen"},
        Severity.MEDIUM: {"icon": "🟠", "color": "orange"},
        Severity.HIGH: {"icon": "⚠️", "color": "red"},
        Severity.CRITICAL: {"icon": "🚨", "color": "darkred"}
    }

    severity_info = severity_mapping[decision.severity]

    return {
        "severity": decision.severity.name,
        "severity_icon": severity_info["icon"],
        "severity_color": severity_info["color"],
        "score": decision.score_0_100,
        "labels": [label.value for label in decision.labels],
        "explanations": decision.explanations,
        "fired_rules": decision.fired_rules,
        "ontology_used": True,
        "raw_description": scene_description,
        "observation": obs,
        "decision": decision
    }

def main():
    st.set_page_config(
        page_title="Video Frame Analyzer",
        page_icon="🎥",
        layout="wide"
    )

    st.title("🎥 Video Frame Analyzer with Local AI Models")
    st.markdown("Upload a video, provide a prompt, and analyze each frame using local AI models (CNN or Transformer)")

    # Load settings and initialize local models
    settings = load_settings()

    # Initialize local models if enabled
    local_manager = None
    local_models_available = False

    if LOCAL_MODELS_ENABLED:
        try:
            local_manager = initialize_local_models()
            local_models_available = True
            st.success("🤖 Local AI models initialized successfully!")
        except Exception as e:
            st.warning(f"Local AI models not available: {str(e)}")
            st.info("💡 Install AI packages: `pip install torch torchvision transformers accelerate sentencepiece`")
            local_models_available = False
    else:
        st.info("💡 Local AI models not installed. Install with: `pip install torch torchvision transformers accelerate sentencepiece`")

    # Sidebar for configuration
    with st.sidebar:
        st.header("Configuration")

        # Model type selection
        available_options = []
        if local_models_available:
            available_options.append("Local Models")
        if REMOTE_MODELS_ENABLED:
            available_options.append("Remote API")

        if not available_options:
            available_options = ["Remote API"]  # Fallback

        model_type = st.radio(
            "Model Type",
            available_options,
            help="Choose between local AI models or remote Hugging Face API"
        )

        if model_type == "Local Models" and local_models_available:
            # Local model selection
            available_local_models = local_manager.get_available_models()
            selected_model = st.selectbox(
                "Select Local Model",
                options=available_local_models,
                help="Choose between CNN (fast) or Transformer (detailed) models"
            )

            # Show model info
            model_info = local_manager.get_model_info()
            if selected_model in model_info:
                with st.expander("Model Information"):
                    st.write(f"**Description:** {model_info[selected_model]['description']}")
                    st.write(f"**Strengths:** {model_info[selected_model]['strengths']}")
                    st.write(f"**Size:** {model_info[selected_model]['size']}")

            api_token = None  # Not needed for local models

        else:
            # Remote API configuration
            default_token = settings.get('hugging_face_api_token', '')
            api_token = st.text_input(
                "Hugging Face API Token",
                value=default_token,
                type="password",
                help="Get your token from https://huggingface.co/settings/tokens or save in settings.json"
            )

            # Remote model selection
            selected_model = st.selectbox(
                "Select Model",
                options=list(AVAILABLE_MODELS.keys()),
                format_func=lambda x: AVAILABLE_MODELS[x]
            )

        # Frame extraction rate
        fps = st.slider(
            "Frames per second to extract",
            min_value=0.1,
            max_value=5.0,
            value=1.0,
            step=0.1
        )

        # Ontology settings
        st.subheader("Ontology Analysis")
        use_ontology = st.checkbox(
            "Enable Ontology Analysis",
            value=True,
            help="Use ontology-based classification (NONE/LOW/MEDIUM/HIGH/CRITICAL)"
        )

        if not use_ontology:
            st.info("🔄 Ontology analysis disabled - showing raw model output only")

    # Main content area
    col1, col2 = st.columns([1, 1])

    with col1:
        st.header("Input")

        # Video upload
        video_file = st.file_uploader(
            "Upload Video",
            type=['mp4', 'avi', 'mov', 'mkv'],
            help="Upload a video file to analyze"
        )

        # Prompt input (conditional based on model)
        if model_type == "Local Models" and local_models_available and selected_model == "Person on Track Detector":
            # Person on Track Detector works automatically
            st.info("🤖 Person on Track Detector works automatically - no prompt needed!")
            prompt = "automatic"  # Set automatic prompt
        else:
            # Regular models need user prompt
            prompt = st.text_area(
                "Analysis Prompt",
                placeholder="Describe what you see in the image...",
                help="Enter the prompt to analyze each frame"
            )

        # Process button
        process_button = st.button("Process Video", type="primary")

    with col2:
        st.header("Results")
        results_container = st.container()

    # Processing logic
    if process_button and video_file and (prompt or (model_type == "Local Models" and selected_model == "Person on Track Detector")) and (api_token or model_type == "Local Models"):
        with st.spinner("Processing video..."):
            # Extract frames
            frames = extract_frames_from_video(video_file, fps)

            if not frames:
                st.error("No frames could be extracted from the video")
                return

            st.success(f"Extracted {len(frames)} frames from video")

            # Process each frame
            results = []
            progress_bar = st.progress(0)

            for i, frame_data in enumerate(frames):
                with st.spinner(f"Analyzing frame {i+1}/{len(frames)}..."):
                    # Process frame based on model type
                    if model_type == "Local Models" and local_models_available:
                        result = process_image_locally(
                            frame_data['frame'],
                            prompt,
                            selected_model,
                            local_manager
                        )
                    else:
                        result = query_huggingface_api(
                            frame_data['frame'],
                            prompt,
                            selected_model,
                            api_token
                        )

                    # Extract scene description for ontology analysis
                    scene_description = ""
                    if 'person_on_track_detection' in result:
                        # For person detection results, use the analysis text
                        scene_description = result['person_on_track_detection'].get('detailed_analysis', {}).get('scene_description', '')
                    elif 'generated_text' in result:
                        scene_description = result['generated_text']
                    elif isinstance(result, list) and len(result) > 0 and 'generated_text' in result[0]:
                        scene_description = result[0]['generated_text']

                    # Apply ontology analysis
                    ontology_analysis = analyze_scene_with_ontology(scene_description, use_ontology)

                    results.append({
                        'frame_number': frame_data['frame_number'],
                        'timestamp': frame_data['timestamp'],
                        'image': frame_data['frame'],
                        'result': result,
                        'ontology_analysis': ontology_analysis
                    })

                    progress_bar.progress((i + 1) / len(frames))

            # Display results
            with results_container:
                st.subheader("Analysis Results")

                for result_data in results:
                    ontology = result_data['ontology_analysis']
                    severity_icon = ontology.get('severity_icon', '✅')
                    severity = ontology.get('severity', 'NONE')

                    # Create expander title with severity indicator
                    expander_title = f"{severity_icon} {severity} - Frame {result_data['frame_number']} (t={result_data['timestamp']:.1f}s)"

                    with st.expander(expander_title):
                        col_img, col_text = st.columns([1, 2])

                        with col_img:
                            st.image(
                                result_data['image'],
                                caption=f"Frame {result_data['frame_number']}",
                                use_container_width=True
                            )

                        with col_text:
                            # Display ontology analysis first if enabled
                            if ontology.get('ontology_used', False):
                                # Severity display with color
                                severity_color = ontology.get('severity_color', 'green')
                                st.markdown(f"**Safety Assessment:** :{severity_color}[{severity_icon} {severity}]")

                                # Score display
                                if ontology.get('score', 0) > 0:
                                    st.metric("Risk Score", f"{ontology['score']}/100")

                                # Show explanations if available
                                if ontology.get('explanations'):
                                    st.write("**Ontology Analysis:**")
                                    for explanation in ontology['explanations']:
                                        st.write(f"• {explanation}")

                                # Show fired rules if available
                                if ontology.get('fired_rules'):
                                    with st.expander("Technical Details"):
                                        st.write("**Triggered Rules:**")
                                        for rule in ontology['fired_rules']:
                                            st.code(rule)

                                        if ontology.get('labels'):
                                            st.write("**Detected Hazard Labels:**")
                                            for label in ontology['labels']:
                                                st.code(label)

                                st.divider()

                            # Display original model results
                            st.write("**Model Output:**")
                            if 'error' in result_data['result']:
                                st.error(f"Error: {result_data['result']['error']}")
                            elif 'person_on_track_detection' in result_data['result']:
                                # Handle person-on-track detection results
                                detection = result_data['result']['person_on_track_detection']

                                people_count = detection.get('people_count', 0)
                                confidence = detection.get('confidence', 0)
                                analysis = detection.get('analysis', 'No analysis')
                                person_on_track = detection.get('person_on_track', False)

                                st.write(f"**Detection Analysis:** {analysis}")

                                # Show metrics
                                col1, col2 = st.columns(2)
                                with col1:
                                    st.metric("👥 People Detected", people_count)
                                with col2:
                                    st.metric("📊 Model Confidence", f"{confidence:.0%}")
                            else:
                                if 'generated_text' in result_data['result']:
                                    # Handle direct generated_text response (local models)
                                    st.write(f"*{result_data['result']['generated_text']}*")
                                elif isinstance(result_data['result'], list) and len(result_data['result']) > 0:
                                    # Handle list responses (common for captioning models)
                                    if 'generated_text' in result_data['result'][0]:
                                        st.write(f"*{result_data['result'][0]['generated_text']}*")
                                    else:
                                        st.json(result_data['result'][0])
                                else:
                                    st.json(result_data['result'])

    elif process_button:
        if not video_file:
            st.error("Please upload a video file")
        if not prompt and not (model_type == "Local Models" and selected_model == "Person on Track Detector"):
            st.error("Please enter an analysis prompt")
        if not api_token and model_type == "Remote API":
            st.error("Please provide your Hugging Face API token for remote models")
|
| 600 |
+
if model_type == "Local Models" and not local_models_available:
|
| 601 |
+
st.error("Local models failed to initialize. Check your installation.")
|
| 602 |
+
|
| 603 |
+
# Instructions
|
| 604 |
+
with st.expander("How to use"):
|
| 605 |
+
st.markdown("""
|
| 606 |
+
## Local AI Models (Recommended)
|
| 607 |
+
1. **Upload a video**: Choose a video file (MP4, AVI, MOV, or MKV)
|
| 608 |
+
2. **Select model type**: Choose "Local Models" for offline processing
|
| 609 |
+
3. **Choose AI model**:
|
| 610 |
+
- **CNN (BLIP)**: Fast, good for object detection (~1.2GB)
|
| 611 |
+
- **Transformer (ViT-GPT2)**: Detailed descriptions (~1.8GB)
|
| 612 |
+
4. **Enter a prompt**: Describe what you want the AI to analyze
|
| 613 |
+
5. **Adjust frame rate**: Set frames per second to extract (default: 1 fps)
|
| 614 |
+
6. **Click Process**: Frames are processed locally on your machine
|
| 615 |
+
|
| 616 |
+
## Remote API Models (Optional)
|
| 617 |
+
1. **Get API token**: Visit [Hugging Face Settings](https://huggingface.co/settings/tokens)
|
| 618 |
+
2. **Select "Remote API"** in model type
|
| 619 |
+
3. **Enter token** and select remote model
|
| 620 |
+
|
| 621 |
+
## Video Support Features
|
| 622 |
+
- **Automatic corruption repair**: Handles videos with corrupted moov atoms
|
| 623 |
+
- **FFmpeg integration**: Auto-repairs problematic video files
|
| 624 |
+
- **Multiple formats**: MP4, AVI, MOV, MKV support
|
| 625 |
+
|
| 626 |
+
## Requirements
|
| 627 |
+
- **Python packages**: torch, transformers, accelerate (see requirements.txt)
|
| 628 |
+
- **Optional**: FFmpeg for video repair (download from https://ffmpeg.org)
|
| 629 |
+
- **Storage**: ~3GB for both local models
|
| 630 |
+
|
| 631 |
+
## Example Prompts
|
| 632 |
+
- "Describe what you see in this image"
|
| 633 |
+
- "Count the number of people in this scene"
|
| 634 |
+
- "What objects are visible in this frame?"
|
| 635 |
+
- "Describe the emotions and actions in this scene"
|
| 636 |
+
- "What is the main activity happening here?"
|
| 637 |
+
""")
|
| 638 |
+
|
| 639 |
+
if __name__ == "__main__":
|
| 640 |
+
main()
|
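The long gating `if` above packs three checks into one expression: a video must be present, a prompt is only optional for the local Person on Track Detector, and an API token is only required for remote models. A minimal sketch of the same predicate as a standalone function (the name `inputs_ready` is ours, for illustration; the refactored `validate_inputs` below factors this out properly):

```python
def inputs_ready(process_button, video_file, prompt, model_type, selected_model, api_token):
    # Mirrors the monolithic gating expression in the backup app.py above.
    prompt_ok = bool(prompt) or (model_type == "Local Models" and selected_model == "Person on Track Detector")
    token_ok = bool(api_token) or model_type == "Local Models"
    return bool(process_button and video_file and prompt_ok and token_ok)

# The local detector needs neither a prompt nor a token:
assert inputs_ready(True, "video.mp4", "", "Local Models", "Person on Track Detector", None)
# A remote model without an API token is rejected:
assert not inputs_ready(True, "video.mp4", "Describe the scene", "Remote API",
                        "Salesforce/blip-image-captioning-large", None)
```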
app_refactored.py
ADDED
@@ -0,0 +1,236 @@
#!/usr/bin/env python3
"""
Main Streamlit application for video frame analysis with ontology-based risk assessment
Refactored for better code organization and maintainability
"""
import streamlit as st
import json
from dotenv import load_dotenv

# Import our modular components
from video_processing import extract_frames_from_video
from ontology_integration import analyze_scene_with_ontology, extract_scene_description
from model_processing import process_frame
from ui_components import (
    render_sidebar_config,
    render_input_section,
    render_prompt_section,
    render_process_button,
    render_results_header,
    render_frame_result,
    render_validation_errors,
    render_instructions
)

# Try to import local models, fall back gracefully if not available
try:
    from local_models import get_local_model_manager
    LOCAL_MODELS_AVAILABLE = True
except ImportError as e:
    LOCAL_MODELS_AVAILABLE = False
    print(f"Local models not available: {e}")
    def get_local_model_manager():
        return None

# Load environment variables
load_dotenv()


def load_settings():
    """Load settings from JSON file"""
    try:
        with open('settings.json', 'r') as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


@st.cache_resource
def initialize_local_models():
    """Initialize local model manager"""
    return get_local_model_manager()


def initialize_app():
    """Initialize the Streamlit application"""
    st.set_page_config(
        page_title="Video Frame Analyzer with Ontology",
        page_icon="🎥",
        layout="wide"
    )

    st.title("🎥 Video Frame Analyzer with Ontology-Based Risk Assessment")
    st.markdown("Upload a video and analyze frames using AI models with ontology-based safety classification")


def setup_local_models():
    """Setup local models and return availability status"""
    local_manager = None
    local_models_available = False

    if LOCAL_MODELS_AVAILABLE:
        try:
            local_manager = initialize_local_models()
            local_models_available = True
            st.success("🤖 Local AI models initialized successfully!")
        except Exception as e:
            st.warning(f"Local AI models not available: {str(e)}")
            st.info("💡 Install AI packages: `pip install torch torchvision transformers accelerate sentencepiece`")
            local_models_available = False
    else:
        st.info("💡 Local AI models not installed. Install with: `pip install torch torchvision transformers accelerate sentencepiece`")

    return local_manager, local_models_available


def process_video_frames(video_file, config, local_manager=None):
    """
    Process all frames in the video and return results
    """
    # Extract frames
    frames = extract_frames_from_video(video_file, config["fps"])

    if not frames:
        st.error("No frames could be extracted from the video")
        return []

    st.success(f"Extracted {len(frames)} frames from video")

    # Process each frame
    results = []
    progress_bar = st.progress(0)

    # Add prompt to config for processing
    processing_config = config.copy()
    processing_config["prompt"] = config.get("prompt", "")

    for i, frame_data in enumerate(frames):
        with st.spinner(f"Analyzing frame {i+1}/{len(frames)}..."):
            # Process frame with selected model
            result = process_frame(frame_data, processing_config, local_manager)

            # Extract scene description for ontology analysis
            scene_description = extract_scene_description(result)

            # Apply ontology analysis
            ontology_analysis = analyze_scene_with_ontology(scene_description, config["use_ontology"])

            results.append({
                'frame_number': frame_data['frame_number'],
                'timestamp': frame_data['timestamp'],
                'image': frame_data['frame'],
                'result': result,
                'ontology_analysis': ontology_analysis
            })

            progress_bar.progress((i + 1) / len(frames))

    return results


def validate_inputs(video_file, prompt, config, local_models_available):
    """
    Validate all required inputs
    """
    model_type = config["model_type"]
    selected_model = config["selected_model"]
    api_token = config["api_token"]

    # Check basic requirements
    if not video_file:
        return False

    # Check prompt requirements
    if not prompt and not (model_type == "Local Models" and selected_model == "Person on Track Detector"):
        return False

    # Check API token for remote models
    if not api_token and model_type == "Remote API":
        return False

    # Check local models availability
    if model_type == "Local Models" and not local_models_available:
        return False

    return True


def main():
    """Main application entry point"""
    # Initialize application
    initialize_app()

    # Load settings and setup models
    settings = load_settings()
    local_manager, local_models_available = setup_local_models()

    # Create main layout
    col1, col2 = st.columns([1, 1])

    with col1:
        # Render sidebar configuration
        config = render_sidebar_config(settings, local_models_available, local_manager)

        # Render input section
        input_data = render_input_section()
        video_file = input_data["video_file"]

        # Render prompt section
        prompt = render_prompt_section(config)

        # Render process button
        process_button = render_process_button()

    with col2:
        # Render results section
        results_container = render_results_header()

    # Main processing logic
    if process_button:
        if validate_inputs(video_file, prompt, config, local_models_available):
            # Add prompt to config for processing
            config["prompt"] = prompt

            with st.spinner("Processing video..."):
                # Process video frames
                results = process_video_frames(video_file, config, local_manager)

            # Display results
            if results:
                with results_container:
                    st.subheader("Analysis Results")

                    # Display summary statistics
                    severity_counts = {}
                    for result in results:
                        severity = result['ontology_analysis'].get('severity', 'NONE')
                        severity_counts[severity] = severity_counts.get(severity, 0) + 1

                    if config["use_ontology"] and severity_counts:
                        st.write("**Summary:**")
                        summary_cols = st.columns(len(severity_counts))
                        for i, (severity, count) in enumerate(severity_counts.items()):
                            icon_map = {
                                'NONE': '✅', 'LOW': '🟢', 'MEDIUM': '🟠',
                                'HIGH': '⚠️', 'CRITICAL': '🚨'
                            }
                            with summary_cols[i]:
                                st.metric(f"{icon_map.get(severity, '❓')} {severity}", count)
                        st.divider()

                    # Display individual frame results
                    for result_data in results:
                        render_frame_result(result_data)
        else:
            # Show validation errors
            render_validation_errors(
                video_file, prompt, config["api_token"],
                config["model_type"], local_models_available, config["selected_model"]
            )

    # Render instructions
    render_instructions()


if __name__ == "__main__":
    main()
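Taken together, the refactored flow is plain function composition. A headless sketch of the same pipeline, assuming the modules import cleanly; the file name and config values are illustrative and the token is a placeholder:

```python
# Hypothetical headless run of the pipeline (no Streamlit layout).
from video_processing import extract_frames_from_video
from model_processing import process_frame
from ontology_integration import analyze_scene_with_ontology, extract_scene_description

config = {
    "model_type": "Remote API",                                  # illustrative choice
    "selected_model": "Salesforce/blip-image-captioning-large",
    "prompt": "Describe the scene",
    "api_token": "hf_...",                                       # placeholder token
    "fps": 1.0,
    "use_ontology": True,
}

with open("clip.mp4", "rb") as f:                                # illustrative file name
    frames = extract_frames_from_video(f, config["fps"])

for frame_data in frames:
    result = process_frame(frame_data, config, local_manager=None)
    scene = extract_scene_description(result)
    decision = analyze_scene_with_ontology(scene, config["use_ontology"])
    print(f"{frame_data['timestamp']:.1f}s", decision["severity"], decision["score"])
```

Note that `extract_frames_from_video` still calls `st.warning`/`st.error` in its repair path, so outside `streamlit run` those surface as bare-mode warnings.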
model_processing.py
ADDED
@@ -0,0 +1,97 @@
#!/usr/bin/env python3
"""
Model processing utilities for local and remote AI models
"""
import requests
import base64
from io import BytesIO
from PIL import Image
from typing import Dict, Any, Optional


def image_to_base64(image: Image.Image) -> str:
    """Convert PIL image to base64 string"""
    buffer = BytesIO()
    image.save(buffer, format="PNG")
    img_str = base64.b64encode(buffer.getvalue()).decode()
    return img_str


def process_image_locally(image: Image.Image, prompt: str, model_name: str, local_manager) -> Dict[str, Any]:
    """
    Process image using local models
    """
    try:
        if model_name == "Person on Track Detector":
            # Special handling for person-on-track detection
            result = local_manager.person_on_track_detector.detect_person_on_track(image)
            return {"person_on_track_detection": result}
        else:
            caption = local_manager.generate_caption(model_name, image, prompt)
            return {"generated_text": caption}
    except Exception as e:
        return {"error": f"Local processing failed: {str(e)}"}


def query_huggingface_api(image: Image.Image, prompt: str, model_name: str, api_token: str) -> Dict[str, Any]:
    """
    Query Hugging Face API with image and prompt
    """
    API_URL = f"https://api-inference.huggingface.co/models/{model_name}"
    headers = {"Authorization": f"Bearer {api_token}"}

    # Convert image to base64
    img_base64 = image_to_base64(image)

    # Prepare payload based on model type
    if "blip" in model_name.lower():
        # For BLIP models, send image directly
        buffer = BytesIO()
        image.save(buffer, format="PNG")
        response = requests.post(
            API_URL,
            headers=headers,
            files={"file": buffer.getvalue()}
        )
    else:
        # For other vision-language models
        payload = {
            "inputs": {
                "image": img_base64,
                "text": prompt
            }
        }
        response = requests.post(API_URL, headers=headers, json=payload)

    if response.status_code == 200:
        return response.json()
    else:
        return {"error": f"API request failed: {response.status_code} - {response.text}"}


def process_frame(frame_data: Dict, config: Dict[str, Any], local_manager=None) -> Dict[str, Any]:
    """
    Process a single frame using the configured model
    """
    model_type = config["model_type"]
    selected_model = config["selected_model"]
    prompt = config.get("prompt", "")
    api_token = config.get("api_token")

    # Process frame based on model type
    if model_type == "Local Models" and local_manager:
        result = process_image_locally(
            frame_data['frame'],
            prompt,
            selected_model,
            local_manager
        )
    else:
        result = query_huggingface_api(
            frame_data['frame'],
            prompt,
            selected_model,
            api_token
        )

    return result
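A usage sketch for the two paths, with a hypothetical blank test image and a placeholder token; the local call is commented out because it needs an initialized manager from `local_models`:

```python
from PIL import Image
from model_processing import process_image_locally, query_huggingface_api

img = Image.new("RGB", (224, 224))  # hypothetical test image

# Local path (requires a manager from local_models.get_local_model_manager()):
# result = process_image_locally(img, "Describe the scene", "CNN (BLIP)", manager)

# Remote path; "blip" in the model name selects the file-upload branch above:
result = query_huggingface_api(img, "Describe the scene",
                               "Salesforce/blip-image-captioning-large", "hf_...")
print(result)  # a caption payload on success, or the {"error": ...} dict
```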
ontology_integration.py
ADDED
@@ -0,0 +1,144 @@
#!/usr/bin/env python3
"""
Ontology integration module for scene analysis and risk assessment
"""
from ontology_eval import Observation, evaluate, Severity
from typing import Dict, Any, Optional


def analyze_scene_with_ontology(scene_description: str, use_ontology: bool = True) -> Dict[str, Any]:
    """
    Analyze scene description using ontology-based evaluation
    Returns classification and explanation
    """
    if not use_ontology:
        return {
            "severity": "NONE",
            "severity_icon": "✅",
            "score": 0,
            "explanation": "Ontology-based analysis skipped",
            "ontology_used": False,
            "raw_description": scene_description
        }

    # Extract relevant information from scene description for ontology
    scene_lower = scene_description.lower().strip() if scene_description else ""

    # Initialize observation based on scene analysis
    obs = _extract_ontology_features(scene_lower)

    # Evaluate using ontology
    decision = evaluate(obs)

    # Map severity to icons and colors
    severity_mapping = {
        Severity.NONE: {"icon": "✅", "color": "green"},
        Severity.LOW: {"icon": "🟢", "color": "lightgreen"},
        Severity.MEDIUM: {"icon": "🟠", "color": "orange"},
        Severity.HIGH: {"icon": "⚠️", "color": "red"},
        Severity.CRITICAL: {"icon": "🚨", "color": "darkred"}
    }

    severity_info = severity_mapping[decision.severity]

    return {
        "severity": decision.severity.name,
        "severity_icon": severity_info["icon"],
        "severity_color": severity_info["color"],
        "score": decision.score_0_100,
        "labels": [label.value for label in decision.labels],
        "explanations": decision.explanations,
        "fired_rules": decision.fired_rules,
        "ontology_used": True,
        "raw_description": scene_description,
        "observation": obs,
        "decision": decision
    }


def _extract_ontology_features(scene_lower: str) -> Observation:
    """
    Extract ontology-relevant features from scene description
    """
    # Initialize observation
    obs = Observation()

    # Define keyword categories
    person_words = ['person', 'people', 'man', 'woman', 'boy', 'girl', 'human', 'individual', 'someone']
    track_words = ['track', 'tracks', 'rail', 'rails', 'railway', 'railroad']
    platform_words = ['platform', 'station', 'bahnsteig']
    danger_words = ['fallen', 'lying', 'down', 'accident', 'emergency']
    fire_words = ['fire', 'smoke', 'flames', 'burning']
    crowd_words = ['crowd', 'many people', 'group', 'mehrere personen']
    safe_words = ['no people', 'empty', 'clear', 'safe', 'nobody', 'without people']

    # Count keyword mentions
    person_mentions = sum(1 for word in person_words if word in scene_lower)
    track_mentions = sum(1 for word in track_words if word in scene_lower)
    platform_mentions = sum(1 for word in platform_words if word in scene_lower)
    danger_mentions = sum(1 for word in danger_words if word in scene_lower)
    fire_mentions = sum(1 for word in fire_words if word in scene_lower)
    crowd_mentions = sum(1 for word in crowd_words if word in scene_lower)
    safe_mentions = sum(1 for word in safe_words if word in scene_lower)

    # Person on track detection (but not if explicitly safe)
    if person_mentions > 0 and track_mentions > 0 and safe_mentions == 0:
        obs.on_track_person = _calculate_person_on_track_confidence(scene_lower, person_mentions, track_mentions)

    # Fallen person detection
    if person_mentions > 0 and danger_mentions > 0:
        obs.fallen_person = min(0.7, 0.4 + danger_mentions * 0.1)

    # Fire/smoke detection
    if fire_mentions > 0:
        obs.smoke_or_fire = min(0.8, 0.5 + fire_mentions * 0.15)

    # Crowd detection
    if crowd_mentions > 0 and (track_mentions > 0 or platform_mentions > 0):
        obs.crowd_on_track = min(0.7, 0.4 + crowd_mentions * 0.1)

    # Generic object detection (if no person but something mentioned on tracks)
    if track_mentions > 0 and person_mentions == 0 and any(word in scene_lower for word in ['object', 'item', 'thing', 'debris']):
        obs.object_on_track = 0.6

    return obs


def _calculate_person_on_track_confidence(scene_lower: str, person_mentions: int, track_mentions: int) -> float:
    """
    Calculate confidence for person on track detection based on specific indicators
    """
    # Check for specific on-track indicators
    on_track_indicators = ['on track', 'on the track', 'on rails', 'on the rails', 'standing on', 'walking on']
    on_track_specific = sum(1 for phrase in on_track_indicators if phrase in scene_lower)

    if on_track_specific > 0:
        return min(0.8, 0.6 + on_track_specific * 0.1)
    else:
        # Check for proximity indicators
        near_indicators = ['near', 'close to', 'next to', 'beside', 'by the']
        near_mentions = sum(1 for phrase in near_indicators if phrase in scene_lower)

        if near_mentions > 0:
            # Person near tracks but not necessarily on them - lower confidence
            return min(0.4, 0.25 + near_mentions * 0.05)
        else:
            # Just mention of person and tracks together - very low confidence
            return min(0.3, 0.2 + (person_mentions + track_mentions) * 0.02)


def extract_scene_description(result: Dict[str, Any]) -> str:
    """
    Extract scene description from various model result formats
    """
    scene_description = ""

    if 'person_on_track_detection' in result:
        # For person detection results, use the analysis text
        scene_description = result['person_on_track_detection'].get('detailed_analysis', {}).get('scene_description', '')
    elif 'generated_text' in result:
        scene_description = result['generated_text']
    elif isinstance(result, list) and len(result) > 0 and 'generated_text' in result[0]:
        scene_description = result[0]['generated_text']

    return scene_description
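A quick worked example of the heuristics above, assuming `ontology_eval` is importable. "on the track" matches one phrase in `on_track_indicators`, so the confidence is min(0.8, 0.6 + 1·0.1) = 0.7; a merely nearby person only reaches min(0.4, 0.25 + 1·0.05) = 0.3:

```python
from ontology_integration import _extract_ontology_features, analyze_scene_with_ontology

obs = _extract_ontology_features("a person on the track")
print(obs.on_track_person)   # 0.7 - one specific on-track phrase matched

obs = _extract_ontology_features("a person near the tracks")
print(obs.on_track_person)   # 0.3 - only a proximity phrase matched

# End to end; the resulting severity depends on the rules inside ontology_eval:
result = analyze_scene_with_ontology("smoke rising over the platform")
print(result["severity"], result["score"])
```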
ui_components.py
ADDED
@@ -0,0 +1,320 @@
#!/usr/bin/env python3
"""
UI components for the Streamlit application
"""
import streamlit as st
from typing import Dict, List, Any, Optional
from local_models import get_local_model_manager


# Available Hugging Face models for remote API
AVAILABLE_MODELS = {
    "microsoft/kosmos-2-patch14-224": "Kosmos-2",
    "Salesforce/blip-image-captioning-large": "BLIP Image Captioning",
    "microsoft/DialoGPT-medium": "DialoGPT",
    "microsoft/git-large-coco": "GIT Large COCO",
    "nlpconnect/vit-gpt2-image-captioning": "ViT-GPT2"
}


def render_sidebar_config(settings: Dict, local_models_available: bool, local_manager: Optional[Any]) -> Dict[str, Any]:
    """
    Render the sidebar configuration panel
    Returns configuration settings
    """
    with st.sidebar:
        st.header("Configuration")

        # Model type selection
        available_options = []
        if local_models_available:
            available_options.append("Local Models")
        available_options.append("Remote API")

        model_type = st.radio(
            "Model Type",
            available_options,
            help="Choose between local AI models or remote Hugging Face API"
        )

        # Model selection based on type
        if model_type == "Local Models" and local_models_available:
            selected_model, api_token = _render_local_model_config(local_manager)
        else:
            selected_model, api_token = _render_remote_model_config(settings)

        # Frame extraction rate
        fps = st.slider(
            "Frames per second to extract",
            min_value=0.1,
            max_value=5.0,
            value=1.0,
            step=0.1
        )

        # Ontology settings
        st.subheader("Ontology Analysis")
        use_ontology = st.checkbox(
            "Enable Ontology Analysis",
            value=True,
            help="Use ontology-based classification (NONE/LOW/MEDIUM/HIGH/CRITICAL)"
        )

        if not use_ontology:
            st.info("🔄 Ontology analysis disabled - showing raw model output only")

    return {
        "model_type": model_type,
        "selected_model": selected_model,
        "api_token": api_token,
        "fps": fps,
        "use_ontology": use_ontology
    }


def _render_local_model_config(local_manager) -> tuple:
    """Render local model configuration"""
    available_local_models = local_manager.get_available_models()
    selected_model = st.selectbox(
        "Select Local Model",
        options=available_local_models,
        help="Choose between CNN (fast) or Transformer (detailed) models"
    )

    # Show model info
    model_info = local_manager.get_model_info()
    if selected_model in model_info:
        with st.expander("Model Information"):
            st.write(f"**Description:** {model_info[selected_model]['description']}")
            st.write(f"**Strengths:** {model_info[selected_model]['strengths']}")
            st.write(f"**Size:** {model_info[selected_model]['size']}")

    return selected_model, None  # No API token needed for local models


def _render_remote_model_config(settings: Dict) -> tuple:
    """Render remote API model configuration"""
    default_token = settings.get('hugging_face_api_token', '')
    api_token = st.text_input(
        "Hugging Face API Token",
        value=default_token,
        type="password",
        help="Get your token from https://huggingface.co/settings/tokens or save in settings.json"
    )

    selected_model = st.selectbox(
        "Select Model",
        options=list(AVAILABLE_MODELS.keys()),
        format_func=lambda x: AVAILABLE_MODELS[x]
    )

    return selected_model, api_token


def render_input_section() -> Dict[str, Any]:
    """
    Render the input section for video upload and prompts
    Returns input data
    """
    st.header("Input")

    # Video upload
    video_file = st.file_uploader(
        "Upload Video",
        type=['mp4', 'avi', 'mov', 'mkv'],
        help="Upload a video file to analyze"
    )

    return {
        "video_file": video_file
    }


def render_prompt_section(config: Dict[str, Any]) -> str:
    """
    Render prompt input section based on model configuration
    """
    model_type = config["model_type"]
    selected_model = config["selected_model"]

    # Prompt input (conditional based on model)
    if (model_type == "Local Models" and
            selected_model == "Person on Track Detector"):
        # Person on Track Detector works automatically
        st.info("🤖 Person on Track Detector works automatically - no prompt needed!")
        return "automatic"
    else:
        # Regular models need user prompt
        return st.text_area(
            "Analysis Prompt",
            placeholder="Describe what you see in the image...",
            help="Enter the prompt to analyze each frame"
        )


def render_process_button() -> bool:
    """Render the process button"""
    return st.button("Process Video", type="primary")


def render_results_header():
    """Render the results section header"""
    st.header("Results")
    return st.container()


def render_frame_result(result_data: Dict[str, Any]):
    """
    Render a single frame result with ontology analysis
    """
    ontology = result_data['ontology_analysis']
    severity_icon = ontology.get('severity_icon', '✅')
    severity = ontology.get('severity', 'NONE')

    # Create expander title with severity indicator
    expander_title = f"{severity_icon} {severity} - Frame {result_data['frame_number']} (t={result_data['timestamp']:.1f}s)"

    with st.expander(expander_title):
        col_img, col_text = st.columns([1, 2])

        with col_img:
            st.image(
                result_data['image'],
                caption=f"Frame {result_data['frame_number']}",
                use_container_width=True
            )

        with col_text:
            # Display ontology analysis first if enabled
            if ontology.get('ontology_used', False):
                _render_ontology_analysis(ontology)
                st.divider()

            # Display original model results
            _render_model_output(result_data['result'])


def _render_ontology_analysis(ontology: Dict[str, Any]):
    """Render ontology analysis section"""
    severity = ontology.get('severity', 'NONE')
    severity_icon = ontology.get('severity_icon', '✅')
    severity_color = ontology.get('severity_color', 'green')

    # Severity display with color
    st.markdown(f"**Safety Assessment:** :{severity_color}[{severity_icon} {severity}]")

    # Score display
    if ontology.get('score', 0) > 0:
        st.metric("Risk Score", f"{ontology['score']}/100")

    # Show explanations if available
    if ontology.get('explanations'):
        st.write("**Ontology Analysis:**")
        for explanation in ontology['explanations']:
            st.write(f"• {explanation}")

    # Show fired rules if available
    if ontology.get('fired_rules'):
        with st.expander("Technical Details"):
            st.write("**Triggered Rules:**")
            for rule in ontology['fired_rules']:
                st.code(rule)

            if ontology.get('labels'):
                st.write("**Detected Hazard Labels:**")
                for label in ontology['labels']:
                    st.code(label)


def _render_model_output(result: Dict[str, Any]):
    """Render original model output section"""
    st.write("**Model Output:**")

    if 'error' in result:
        st.error(f"Error: {result['error']}")
    elif 'person_on_track_detection' in result:
        _render_person_detection_result(result['person_on_track_detection'])
    else:
        _render_general_model_result(result)


def _render_person_detection_result(detection: Dict[str, Any]):
    """Render person on track detection specific results"""
    people_count = detection.get('people_count', 0)
    confidence = detection.get('confidence', 0)
    analysis = detection.get('analysis', 'No analysis')

    st.write(f"**Detection Analysis:** {analysis}")

    # Show metrics
    col1, col2 = st.columns(2)
    with col1:
        st.metric("👥 People Detected", people_count)
    with col2:
        st.metric("📊 Model Confidence", f"{confidence:.0%}")


def _render_general_model_result(result: Dict[str, Any]):
    """Render general model results (captioning, etc.)"""
    if 'generated_text' in result:
        st.write(f"*{result['generated_text']}*")
    elif isinstance(result, list) and len(result) > 0:
        if 'generated_text' in result[0]:
            st.write(f"*{result[0]['generated_text']}*")
        else:
            st.json(result[0])
    else:
        st.json(result)


def render_validation_errors(video_file, prompt, api_token, model_type, local_models_available, selected_model):
    """
    Render validation error messages
    """
    if not video_file:
        st.error("Please upload a video file")
    if not prompt and not (model_type == "Local Models" and selected_model == "Person on Track Detector"):
        st.error("Please enter an analysis prompt")
    if not api_token and model_type == "Remote API":
        st.error("Please provide your Hugging Face API token for remote models")
    if model_type == "Local Models" and not local_models_available:
        st.error("Local models failed to initialize. Check your installation.")


def render_instructions():
    """Render the instructions section"""
    with st.expander("How to use"):
        st.markdown("""
        ## Local AI Models (Recommended)
        1. **Upload a video**: Choose a video file (MP4, AVI, MOV, or MKV)
        2. **Select model type**: Choose "Local Models" for offline processing
        3. **Choose AI model**:
           - **CNN (BLIP)**: Fast, good for object detection (~1.2GB)
           - **Transformer (ViT-GPT2)**: Detailed descriptions (~1.8GB)
        4. **Enter a prompt**: Describe what you want the AI to analyze
        5. **Enable/Disable Ontology**: Toggle ontology-based risk assessment
        6. **Adjust frame rate**: Set frames per second to extract (default: 1 fps)
        7. **Click Process**: Frames are processed locally on your machine

        ## Ontology Analysis
        - **✅ NONE**: No safety concerns detected
        - **🟢 LOW**: Minor safety considerations
        - **🟠 MEDIUM**: Moderate safety risk
        - **⚠️ HIGH**: Significant safety risk
        - **🚨 CRITICAL**: Immediate safety hazard

        ## Remote API Models (Optional)
        1. **Get API token**: Visit [Hugging Face Settings](https://huggingface.co/settings/tokens)
        2. **Select "Remote API"** in model type
        3. **Enter token** and select remote model

        ## Video Support Features
        - **Automatic corruption repair**: Handles videos with corrupted moov atoms
        - **FFmpeg integration**: Auto-repairs problematic video files
        - **Multiple formats**: MP4, AVI, MOV, MKV support

        ## Requirements
        - **Python packages**: torch, transformers, accelerate (see requirements.txt)
        - **Optional**: FFmpeg for video repair (download from https://ffmpeg.org)
        - **Storage**: ~3GB for both local models
        """)
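The dict returned by `render_sidebar_config` is the contract the other modules consume; a sketch of its shape, with illustrative values:

```python
# Shape of the configuration dict produced by render_sidebar_config (values illustrative):
config = {
    "model_type": "Local Models",    # or "Remote API"
    "selected_model": "CNN (BLIP)",  # local model name, or an HF model id for the remote path
    "api_token": None,               # set only for the remote path
    "fps": 1.0,                      # frames per second to extract
    "use_ontology": True,            # toggle the ontology classification layer
}
```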
video_processing.py
ADDED
@@ -0,0 +1,112 @@
#!/usr/bin/env python3
"""
Video processing utilities for frame extraction and repair
"""
import cv2
import os
import tempfile
import subprocess
import streamlit as st
from PIL import Image
from typing import List, Dict


def repair_video_with_ffmpeg(input_path: str, output_path: str) -> bool:
    """
    Repair corrupted video by moving moov atom to the beginning
    """
    try:
        # Try to fix the video using FFmpeg
        cmd = [
            'ffmpeg',
            '-i', input_path,
            '-c', 'copy',
            '-movflags', 'faststart',
            '-avoid_negative_ts', 'make_zero',
            '-y',  # Overwrite output file
            output_path
        ]

        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=300  # 5 minute timeout
        )

        return result.returncode == 0
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False


def extract_frames_from_video(video_file, fps: float = 1) -> List[Dict]:
    """
    Extract frames from video at specified FPS (default 1 frame per second)
    Automatically handles corrupted videos by attempting repair with FFmpeg
    """
    frames = []

    with tempfile.NamedTemporaryFile(delete=False, suffix='.mp4') as tmp_file:
        tmp_file.write(video_file.read())
        tmp_file_path = tmp_file.name

    repaired_path = None

    try:
        # First attempt: try to open video directly
        cap = cv2.VideoCapture(tmp_file_path)

        # Check if video opened successfully and has frames
        if not cap.isOpened() or cap.get(cv2.CAP_PROP_FRAME_COUNT) == 0:
            cap.release()

            # Second attempt: try to repair the video with FFmpeg
            st.warning("Video appears corrupted (moov atom issue). Attempting repair...")

            with tempfile.NamedTemporaryFile(delete=False, suffix='_repaired.mp4') as repaired_file:
                repaired_path = repaired_file.name

            if repair_video_with_ffmpeg(tmp_file_path, repaired_path):
                st.success("Video repair successful! Processing frames...")
                cap = cv2.VideoCapture(repaired_path)
            else:
                st.error("Failed to repair video. FFmpeg may not be installed or video is severely corrupted.")
                return frames

        # Extract video properties
        video_fps = cap.get(cv2.CAP_PROP_FPS)
        if video_fps <= 0:
            video_fps = 30  # Default fallback FPS

        frame_interval = int(video_fps / fps) if video_fps > fps else 1

        frame_count = 0
        extracted_count = 0

        while True:
            ret, frame = cap.read()
            if not ret:
                break

            if frame_count % frame_interval == 0:
                frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                pil_image = Image.fromarray(frame_rgb)
                frames.append({
                    'frame': pil_image,
                    'timestamp': frame_count / video_fps,
                    'frame_number': extracted_count
                })
                extracted_count += 1

            frame_count += 1

        cap.release()

    finally:
        # Clean up temporary files
        if os.path.exists(tmp_file_path):
            os.unlink(tmp_file_path)
        if repaired_path and os.path.exists(repaired_path):
            os.unlink(repaired_path)

    return frames
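The sampling step keeps every `int(video_fps / fps)`-th frame, so a 30 fps clip extracted at 1 fps keeps source frames 0, 30, 60, and so on, each carrying its source timestamp. A usage sketch with an illustrative file name (outside `streamlit run`, the `st.warning`/`st.error` calls in the repair path surface as bare-mode warnings):

```python
from video_processing import extract_frames_from_video

with open("platform_cam.mp4", "rb") as f:           # illustrative file name
    frames = extract_frames_from_video(f, fps=2.0)  # two extracted frames per video second

for fr in frames[:3]:
    print(fr["frame_number"], f"{fr['timestamp']:.2f}s", fr["frame"].size)
```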