dxfoso committed
Commit d33203e · Parent: 2aab908

refactor & connect ontology

REFACTORED_STRUCTURE.md ADDED
@@ -0,0 +1,110 @@
+ # 📁 Refactored Code Structure
+
+ The application has been refactored into modular components for better maintainability and understanding.
+
+ ## 🗂️ File Structure
+
+ ```
+ 📦 Bahngleiserfassung/
+ ├── 🎯 app.py                     # Main Streamlit application (refactored)
+ ├── 📹 video_processing.py        # Video frame extraction and repair utilities
+ ├── 🧠 ontology_integration.py    # Ontology-based scene analysis and risk assessment
+ ├── 🤖 model_processing.py        # Local and remote AI model processing
+ ├── 🖥️ ui_components.py           # Streamlit UI components and rendering
+ ├── 🧮 ontology_eval.py           # Core ontology evaluation logic (unchanged)
+ ├── 🔬 local_models.py            # Local AI models (ViT, BLIP) (unchanged)
+ └── 💾 app_original_backup.py     # Backup of the original monolithic app.py
+ ```
+
+ ## 📋 Module Responsibilities
+
+ ### 🎯 `app.py` - Main Application
+ - **Purpose**: Main entry point and orchestration
+ - **Functions**:
+   - Application initialization and layout
+   - Model setup and configuration
+   - Main processing workflow coordination
+   - Input validation and error handling
+
+ ### 📹 `video_processing.py` - Video Processing
+ - **Purpose**: Video frame extraction and repair
+ - **Functions**:
+   - `extract_frames_from_video()` - Extract frames at a specified FPS
+   - `repair_video_with_ffmpeg()` - Repair corrupted video files
+   - Handles common video formats (MP4, AVI, MOV, MKV)
+
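For illustration, a minimal usage sketch of the frame extractor. The file name is a placeholder; the returned keys (`frame`, `timestamp`, `frame_number`) are the ones `extract_frames_from_video()` populates in this commit.

```python
# Sketch: extract roughly one frame per second from a video file.
# Streamlit's uploader hands this function the same kind of binary
# file-like object; "example.mp4" is only an illustrative path.
from video_processing import extract_frames_from_video

with open("example.mp4", "rb") as video_file:
    frames = extract_frames_from_video(video_file, fps=1)

for frame_data in frames:
    # Each entry carries a PIL image plus its position in the source video.
    print(frame_data["frame_number"],
          f"{frame_data['timestamp']:.1f}s",
          frame_data["frame"].size)
```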
+ ### 🧠 `ontology_integration.py` - Ontology Analysis
+ - **Purpose**: Scene analysis using ontology-based risk assessment
+ - **Functions**:
+   - `analyze_scene_with_ontology()` - Main ontology analysis function
+   - `_extract_ontology_features()` - Extract features from scene descriptions
+   - `_calculate_person_on_track_confidence()` - Calculate specific risk confidence
+   - `extract_scene_description()` - Extract text from model results
+
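A hedged sketch of the analysis entry point, assuming the refactored module keeps the signature and return keys of the original `analyze_scene_with_ontology()` (severity levels NONE/LOW/MEDIUM/HIGH/CRITICAL plus `score` and `explanations`); the caption text is invented:

```python
# Sketch: classify a caption produced by one of the vision models.
from ontology_integration import analyze_scene_with_ontology

caption = "a person standing on the tracks near the platform"  # example text
analysis = analyze_scene_with_ontology(caption, True)  # second arg toggles the ontology rules

print(analysis["severity"], analysis["severity_icon"], analysis["score"])
for explanation in analysis.get("explanations", []):
    print("-", explanation)
```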
+ ### 🤖 `model_processing.py` - Model Processing
+ - **Purpose**: Handle local and remote AI model processing
+ - **Functions**:
+   - `process_image_locally()` - Process images using local models
+   - `query_huggingface_api()` - Process images using the remote HF API
+   - `process_frame()` - Unified frame processing interface
+   - `image_to_base64()` - Image conversion utilities
+
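To show how these functions meet per frame, here is a condensed sketch of the loop that `process_video_frames()` in the new `app.py` runs; the config values are placeholders standing in for what the sidebar returns:

```python
# Sketch of the per-frame pipeline wired together in app.py.
from model_processing import process_frame
from ontology_integration import analyze_scene_with_ontology, extract_scene_description

config = {                                   # assumed sidebar configuration
    "model_type": "Local Models",
    "selected_model": "Person on Track Detector",
    "api_token": None,
    "prompt": "automatic",
    "use_ontology": True,
}

def analyze_frame(frame_data, local_manager):
    # 1. Run the selected model (local or remote) on the extracted frame.
    result = process_frame(frame_data, config, local_manager)
    # 2. Pull a plain-text scene description out of the model output.
    scene_description = extract_scene_description(result)
    # 3. Classify the description with the ontology rules.
    return analyze_scene_with_ontology(scene_description, config["use_ontology"])
```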
+ ### 🖥️ `ui_components.py` - UI Components
+ - **Purpose**: Streamlit UI components and rendering
+ - **Functions**:
+   - `render_sidebar_config()` - Configuration sidebar
+   - `render_input_section()` - Video upload interface
+   - `render_frame_result()` - Display frame analysis results
+   - `render_validation_errors()` - Show validation messages
+   - Various helper rendering functions
+
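The dictionary returned by `render_sidebar_config()` is the glue between the UI and the processing modules. The keys below are the ones read by `validate_inputs()` and `process_video_frames()` in the new `app.py`; the concrete values are illustrative only:

```python
# Sketch: shape of the sidebar configuration consumed downstream.
config = {
    "model_type": "Remote API",          # "Local Models" or "Remote API"
    "selected_model": "Salesforce/blip-image-captioning-large",
    "api_token": "hf_...",               # placeholder; only needed for the remote API
    "fps": 1.0,                          # frames per second to extract
    "use_ontology": True,                # toggle ontology classification
}
# main() adds the analysis prompt before processing starts:
config["prompt"] = "Describe what you see in this image"
```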
+ ## 🔄 Data Flow
+
+ ```mermaid
+ graph TD
+     A[app.py] --> B[ui_components.py]
+     A --> C[video_processing.py]
+     A --> D[model_processing.py]
+     A --> E[ontology_integration.py]
+
+     C --> F[Extract Frames]
+     D --> G[Process with AI Models]
+     E --> H[Ontology Risk Assessment]
+
+     F --> G
+     G --> H
+     H --> B
+
+     I[local_models.py] --> D
+     J[ontology_eval.py] --> E
+ ```
+
+ ## ✨ Benefits of Refactoring
+
+ 1. **🧩 Modularity**: Each module has a single responsibility
+ 2. **🔧 Maintainability**: Easier to update and debug individual components
+ 3. **📚 Readability**: Clear separation of concerns and smaller, focused files
+ 4. **🧪 Testability**: Each module can be tested independently
+ 5. **🔄 Reusability**: Components can be reused in other projects
+ 6. **👥 Collaboration**: Multiple developers can work on different modules
+
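To make the testability point concrete, an illustrative unit test that exercises `ontology_integration` in isolation. It assumes the refactored module preserves the original behavior: with the ontology disabled the analysis reports severity `NONE`, and an all-clear description does not score as critical.

```python
# test_ontology_integration.py -- illustrative sketch, not part of this commit.
from ontology_integration import analyze_scene_with_ontology

def test_disabled_ontology_reports_none():
    analysis = analyze_scene_with_ontology("a person on the tracks", False)
    assert analysis["ontology_used"] is False
    assert analysis["severity"] == "NONE"

def test_clear_scene_is_not_critical():
    analysis = analyze_scene_with_ontology("an empty platform, no people", True)
    assert analysis["severity"] != "CRITICAL"
```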
+ ## 🚀 Usage
+
+ The refactored application works exactly the same as before:
+
+ ```bash
+ streamlit run app.py
+ ```
+
+ All functionality remains identical:
+ - ✅ NONE / 🟢 LOW / 🟠 MEDIUM / ⚠️ HIGH / 🚨 CRITICAL classification
+ - Toggle ontology analysis on/off
+ - Support for local and remote AI models
+ - Video processing with automatic repair
+
+ ## 🔒 Backwards Compatibility
+
+ - Original functionality is preserved
+ - API and interface remain unchanged
+ - Configuration and settings work the same way
+ - The original monolithic code is backed up as `app_original_backup.py`
app.py CHANGED
@@ -1,15 +1,27 @@
1
  import streamlit as st
2
- import cv2
3
- import os
4
- import tempfile
5
- import requests
6
- import base64
7
- import subprocess
8
  import json
9
- from io import BytesIO
10
- from PIL import Image
11
- import numpy as np
12
  from dotenv import load_dotenv
 
 
 
 
13
  # Try to import local models, fall back gracefully if not available
14
  try:
15
  from local_models import get_local_model_manager
@@ -23,6 +35,7 @@ except ImportError as e:
23
  # Load environment variables
24
  load_dotenv()
25
 
 
26
  def load_settings():
27
  """Load settings from JSON file"""
28
  try:
@@ -31,199 +44,31 @@ def load_settings():
31
  except FileNotFoundError:
32
  return {}
33
 
34
- # Local models configuration
35
- LOCAL_MODELS_ENABLED = LOCAL_MODELS_AVAILABLE
36
- REMOTE_MODELS_ENABLED = True # Always allow remote API as fallback
37
 
38
- # Initialize local model manager
39
  @st.cache_resource
40
  def initialize_local_models():
41
  """Initialize local model manager"""
42
  return get_local_model_manager()
43
 
44
- # Hugging Face models for vision-language tasks (kept for compatibility)
45
- AVAILABLE_MODELS = {
46
- "microsoft/kosmos-2-patch14-224": "Kosmos-2",
47
- "Salesforce/blip-image-captioning-large": "BLIP Image Captioning",
48
- "microsoft/DialoGPT-medium": "DialoGPT",
49
- "microsoft/git-large-coco": "GIT Large COCO",
50
- "nlpconnect/vit-gpt2-image-captioning": "ViT-GPT2"
51
- }
52
-
53
- def repair_video_with_ffmpeg(input_path, output_path):
54
- """
55
- Repair corrupted video by moving moov atom to the beginning
56
- """
57
- try:
58
- # Try to fix the video using FFmpeg
59
- cmd = [
60
- 'ffmpeg',
61
- '-i', input_path,
62
- '-c', 'copy',
63
- '-movflags', 'faststart',
64
- '-avoid_negative_ts', 'make_zero',
65
- '-y', # Overwrite output file
66
- output_path
67
- ]
68
-
69
- result = subprocess.run(
70
- cmd,
71
- capture_output=True,
72
- text=True,
73
- timeout=300 # 5 minute timeout
74
- )
75
-
76
- return result.returncode == 0
77
- except (subprocess.TimeoutExpired, FileNotFoundError):
78
- return False
79
-
80
- def extract_frames_from_video(video_file, fps=1):
81
- """
82
- Extract frames from video at specified FPS (default 1 frame per second)
83
- Automatically handles corrupted videos by attempting repair with FFmpeg
84
- """
85
- frames = []
86
-
87
- with tempfile.NamedTemporaryFile(delete=False, suffix='.mp4') as tmp_file:
88
- tmp_file.write(video_file.read())
89
- tmp_file_path = tmp_file.name
90
-
91
- repaired_path = None
92
-
93
- try:
94
- # First attempt: try to open video directly
95
- cap = cv2.VideoCapture(tmp_file_path)
96
-
97
- # Check if video opened successfully and has frames
98
- if not cap.isOpened() or cap.get(cv2.CAP_PROP_FRAME_COUNT) == 0:
99
- cap.release()
100
-
101
- # Second attempt: try to repair the video with FFmpeg
102
- st.warning("Video appears corrupted (moov atom issue). Attempting repair...")
103
-
104
- with tempfile.NamedTemporaryFile(delete=False, suffix='_repaired.mp4') as repaired_file:
105
- repaired_path = repaired_file.name
106
-
107
- if repair_video_with_ffmpeg(tmp_file_path, repaired_path):
108
- st.success("Video repair successful! Processing frames...")
109
- cap = cv2.VideoCapture(repaired_path)
110
- else:
111
- st.error("Failed to repair video. FFmpeg may not be installed or video is severely corrupted.")
112
- return frames
113
-
114
- # Extract video properties
115
- video_fps = cap.get(cv2.CAP_PROP_FPS)
116
- if video_fps <= 0:
117
- video_fps = 30 # Default fallback FPS
118
-
119
- frame_interval = int(video_fps / fps) if video_fps > fps else 1
120
-
121
- frame_count = 0
122
- extracted_count = 0
123
-
124
- while True:
125
- ret, frame = cap.read()
126
- if not ret:
127
- break
128
-
129
- if frame_count % frame_interval == 0:
130
- frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
131
- pil_image = Image.fromarray(frame_rgb)
132
- frames.append({
133
- 'frame': pil_image,
134
- 'timestamp': frame_count / video_fps,
135
- 'frame_number': extracted_count
136
- })
137
- extracted_count += 1
138
-
139
- frame_count += 1
140
-
141
- cap.release()
142
-
143
- finally:
144
- # Clean up temporary files
145
- if os.path.exists(tmp_file_path):
146
- os.unlink(tmp_file_path)
147
- if repaired_path and os.path.exists(repaired_path):
148
- os.unlink(repaired_path)
149
-
150
- return frames
151
-
152
- def image_to_base64(image):
153
- """Convert PIL image to base64 string"""
154
- buffer = BytesIO()
155
- image.save(buffer, format="PNG")
156
- img_str = base64.b64encode(buffer.getvalue()).decode()
157
- return img_str
158
-
159
- def process_image_locally(image, prompt, model_name, local_manager):
160
- """
161
- Process image using local models
162
- """
163
- try:
164
- if model_name == "Person on Track Detector":
165
- # Special handling for person-on-track detection
166
- result = local_manager.person_on_track_detector.detect_person_on_track(image)
167
- return {"person_on_track_detection": result}
168
- else:
169
- caption = local_manager.generate_caption(model_name, image, prompt)
170
- return {"generated_text": caption}
171
- except Exception as e:
172
- return {"error": f"Local processing failed: {str(e)}"}
173
-
174
- def query_huggingface_api(image, prompt, model_name, api_token):
175
- """
176
- Query Hugging Face API with image and prompt
177
- """
178
- API_URL = f"https://api-inference.huggingface.co/models/{model_name}"
179
- headers = {"Authorization": f"Bearer {api_token}"}
180
-
181
- # Convert image to base64
182
- img_base64 = image_to_base64(image)
183
-
184
- # Prepare payload based on model type
185
- if "blip" in model_name.lower():
186
- # For BLIP models, send image directly
187
- buffer = BytesIO()
188
- image.save(buffer, format="PNG")
189
- response = requests.post(
190
- API_URL,
191
- headers=headers,
192
- files={"file": buffer.getvalue()}
193
- )
194
- else:
195
- # For other vision-language models
196
- payload = {
197
- "inputs": {
198
- "image": img_base64,
199
- "text": prompt
200
- }
201
- }
202
- response = requests.post(API_URL, headers=headers, json=payload)
203
-
204
- if response.status_code == 200:
205
- return response.json()
206
- else:
207
- return {"error": f"API request failed: {response.status_code} - {response.text}"}
208
 
209
- def main():
 
210
  st.set_page_config(
211
- page_title="Video Frame Analyzer",
212
  page_icon="🎥",
213
  layout="wide"
214
  )
215
 
216
- st.title("🎥 Video Frame Analyzer with Local AI Models")
217
- st.markdown("Upload a video, provide a prompt, and analyze each frame using local AI models (CNN or Transformer)")
218
-
219
- # Load settings and initialize local models
220
- settings = load_settings()
221
-
222
- # Initialize local models if enabled
223
  local_manager = None
224
  local_models_available = False
225
 
226
- if LOCAL_MODELS_ENABLED:
227
  try:
228
  local_manager = initialize_local_models()
229
  local_models_available = True
@@ -235,245 +80,157 @@ def main():
235
  else:
236
  st.info("💡 Local AI models not installed. Install with: `pip install torch torchvision transformers accelerate sentencepiece`")
237
 
238
- # Sidebar for configuration
239
- with st.sidebar:
240
- st.header("Configuration")
241
-
242
- # Model type selection
243
- available_options = []
244
- if local_models_available:
245
- available_options.append("Local Models")
246
- if REMOTE_MODELS_ENABLED:
247
- available_options.append("Remote API")
248
-
249
- if not available_options:
250
- available_options = ["Remote API"] # Fallback
251
-
252
- model_type = st.radio(
253
- "Model Type",
254
- available_options,
255
- help="Choose between local AI models or remote Hugging Face API"
256
- )
257
-
258
- if model_type == "Local Models" and local_models_available:
259
- # Local model selection
260
- available_local_models = local_manager.get_available_models()
261
- selected_model = st.selectbox(
262
- "Select Local Model",
263
- options=available_local_models,
264
- help="Choose between CNN (fast) or Transformer (detailed) models"
265
- )
266
 
267
- # Show model info
268
- model_info = local_manager.get_model_info()
269
- if selected_model in model_info:
270
- with st.expander("Model Information"):
271
- st.write(f"**Description:** {model_info[selected_model]['description']}")
272
- st.write(f"**Strengths:** {model_info[selected_model]['strengths']}")
273
- st.write(f"**Size:** {model_info[selected_model]['size']}")
274
 
275
- api_token = None # Not needed for local models
 
276
 
277
- else:
278
- # Remote API configuration
279
- default_token = settings.get('hugging_face_api_token', '')
280
- api_token = st.text_input(
281
- "Hugging Face API Token",
282
- value=default_token,
283
- type="password",
284
- help="Get your token from https://huggingface.co/settings/tokens or save in settings.json"
285
- )
286
 
287
- # Remote model selection
288
- selected_model = st.selectbox(
289
- "Select Model",
290
- options=list(AVAILABLE_MODELS.keys()),
291
- format_func=lambda x: AVAILABLE_MODELS[x]
292
- )
293
-
294
- # Frame extraction rate
295
- fps = st.slider(
296
- "Frames per second to extract",
297
- min_value=0.1,
298
- max_value=5.0,
299
- value=1.0,
300
- step=0.1
301
- )
 
 
 
 
302
 
303
- # Main content area
 
 
 
 
304
  col1, col2 = st.columns([1, 1])
305
 
306
  with col1:
307
- st.header("Input")
 
308
 
309
- # Video upload
310
- video_file = st.file_uploader(
311
- "Upload Video",
312
- type=['mp4', 'avi', 'mov', 'mkv'],
313
- help="Upload a video file to analyze"
314
- )
315
 
316
- # Prompt input (conditional based on model)
317
- if model_type == "Local Models" and local_models_available and selected_model == "Person on Track Detector":
318
- # Person on Track Detector works automatically
319
- st.info("🤖 Person on Track Detector works automatically - no prompt needed!")
320
- prompt = "automatic" # Set automatic prompt
321
- else:
322
- # Regular models need user prompt
323
- prompt = st.text_area(
324
- "Analysis Prompt",
325
- placeholder="Describe what you see in the image...",
326
- help="Enter the prompt to analyze each frame"
327
- )
328
 
329
- # Process button
330
- process_button = st.button("Process Video", type="primary")
331
 
332
  with col2:
333
- st.header("Results")
334
- results_container = st.container()
335
-
336
- # Processing logic
337
- if process_button and video_file and (prompt or (model_type == "Local Models" and selected_model == "Person on Track Detector")) and (api_token or model_type == "Local Models"):
338
- with st.spinner("Processing video..."):
339
- # Extract frames
340
- frames = extract_frames_from_video(video_file, fps)
341
-
342
- if not frames:
343
- st.error("No frames could be extracted from the video")
344
- return
345
-
346
- st.success(f"Extracted {len(frames)} frames from video")
347
-
348
- # Process each frame
349
- results = []
350
- progress_bar = st.progress(0)
351
-
352
- for i, frame_data in enumerate(frames):
353
- with st.spinner(f"Analyzing frame {i+1}/{len(frames)}..."):
354
- # Process frame based on model type
355
- if model_type == "Local Models" and local_models_available:
356
- result = process_image_locally(
357
- frame_data['frame'],
358
- prompt,
359
- selected_model,
360
- local_manager
361
- )
362
- else:
363
- result = query_huggingface_api(
364
- frame_data['frame'],
365
- prompt,
366
- selected_model,
367
- api_token
368
- )
369
-
370
- results.append({
371
- 'frame_number': frame_data['frame_number'],
372
- 'timestamp': frame_data['timestamp'],
373
- 'image': frame_data['frame'],
374
- 'result': result
375
- })
376
-
377
- progress_bar.progress((i + 1) / len(frames))
378
-
379
- # Display results
380
- with results_container:
381
- st.subheader("Analysis Results")
382
 
383
- for result_data in results:
384
- with st.expander(f"Frame {result_data['frame_number']} (t={result_data['timestamp']:.1f}s)"):
385
- col_img, col_text = st.columns([1, 2])
 
386
 
387
- with col_img:
388
- st.image(
389
- result_data['image'],
390
- caption=f"Frame {result_data['frame_number']}",
391
- use_container_width=True
392
- )
393
 
394
- with col_text:
395
- if 'error' in result_data['result']:
396
- st.error(f"Error: {result_data['result']['error']}")
397
- elif 'person_on_track_detection' in result_data['result']:
398
- # Handle person-on-track detection results
399
- detection = result_data['result']['person_on_track_detection']
400
-
401
- people_count = detection.get('people_count', 0)
402
- confidence = detection.get('confidence', 0)
403
- analysis = detection.get('analysis', 'No analysis')
404
- person_on_track = detection.get('person_on_track', False)
405
-
406
- # Display analysis with color coding
407
- if person_on_track:
408
- st.error(f"🚨 **{analysis}**")
409
- else:
410
- st.success(f"✅ **{analysis}**")
411
-
412
- # Show metrics
413
- col1, col2 = st.columns(2)
414
- with col1:
415
- st.metric("👥 People on Track", people_count)
416
- with col2:
417
- st.metric("📊 Confidence", f"{confidence:.0%}")
418
- else:
419
- st.write("**Analysis Result:**")
420
- if 'generated_text' in result_data['result']:
421
- # Handle direct generated_text response (local models)
422
- st.write(result_data['result']['generated_text'])
423
- elif isinstance(result_data['result'], list) and len(result_data['result']) > 0:
424
- # Handle list responses (common for captioning models)
425
- if 'generated_text' in result_data['result'][0]:
426
- st.write(result_data['result'][0]['generated_text'])
427
- else:
428
- st.json(result_data['result'][0])
429
- else:
430
- st.json(result_data['result'])
431
-
432
- elif process_button:
433
- if not video_file:
434
- st.error("Please upload a video file")
435
- if not prompt and not (model_type == "Local Models" and selected_model == "Person on Track Detector"):
436
- st.error("Please enter an analysis prompt")
437
- if not api_token and model_type == "Remote API":
438
- st.error("Please provide your Hugging Face API token for remote models")
439
- if model_type == "Local Models" and not local_models_available:
440
- st.error("Local models failed to initialize. Check your installation.")
441
 
442
- # Instructions
443
- with st.expander("How to use"):
444
- st.markdown("""
445
- ## Local AI Models (Recommended)
446
- 1. **Upload a video**: Choose a video file (MP4, AVI, MOV, or MKV)
447
- 2. **Select model type**: Choose "Local Models" for offline processing
448
- 3. **Choose AI model**:
449
- - **CNN (BLIP)**: Fast, good for object detection (~1.2GB)
450
- - **Transformer (ViT-GPT2)**: Detailed descriptions (~1.8GB)
451
- 4. **Enter a prompt**: Describe what you want the AI to analyze
452
- 5. **Adjust frame rate**: Set frames per second to extract (default: 1 fps)
453
- 6. **Click Process**: Frames are processed locally on your machine
454
-
455
- ## Remote API Models (Optional)
456
- 1. **Get API token**: Visit [Hugging Face Settings](https://huggingface.co/settings/tokens)
457
- 2. **Select "Remote API"** in model type
458
- 3. **Enter token** and select remote model
459
-
460
- ## Video Support Features
461
- - **Automatic corruption repair**: Handles videos with corrupted moov atoms
462
- - **FFmpeg integration**: Auto-repairs problematic video files
463
- - **Multiple formats**: MP4, AVI, MOV, MKV support
464
-
465
- ## Requirements
466
- - **Python packages**: torch, transformers, accelerate (see requirements.txt)
467
- - **Optional**: FFmpeg for video repair (download from https://ffmpeg.org)
468
- - **Storage**: ~3GB for both local models
469
-
470
- ## Example Prompts
471
- - "Describe what you see in this image"
472
- - "Count the number of people in this scene"
473
- - "What objects are visible in this frame?"
474
- - "Describe the emotions and actions in this scene"
475
- - "What is the main activity happening here?"
476
- """)
477
 
478
  if __name__ == "__main__":
479
  main()
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Main Streamlit application for video frame analysis with ontology-based risk assessment
4
+ Refactored for better code organization and maintainability
5
+ """
6
  import streamlit as st
 
 
 
 
 
 
7
  import json
 
 
 
8
  from dotenv import load_dotenv
9
+
10
+ # Import our modular components
11
+ from video_processing import extract_frames_from_video
12
+ from ontology_integration import analyze_scene_with_ontology, extract_scene_description
13
+ from model_processing import process_frame
14
+ from ui_components import (
15
+ render_sidebar_config,
16
+ render_input_section,
17
+ render_prompt_section,
18
+ render_process_button,
19
+ render_results_header,
20
+ render_frame_result,
21
+ render_validation_errors,
22
+ render_instructions
23
+ )
24
+
25
  # Try to import local models, fall back gracefully if not available
26
  try:
27
  from local_models import get_local_model_manager
 
35
  # Load environment variables
36
  load_dotenv()
37
 
38
+
39
  def load_settings():
40
  """Load settings from JSON file"""
41
  try:
 
44
  except FileNotFoundError:
45
  return {}
46
 
 
 
 
47
 
 
48
  @st.cache_resource
49
  def initialize_local_models():
50
  """Initialize local model manager"""
51
  return get_local_model_manager()
52
 
 
 
 
 
 
53
 
54
+ def initialize_app():
55
+ """Initialize the Streamlit application"""
56
  st.set_page_config(
57
+ page_title="Video Frame Analyzer with Ontology",
58
  page_icon="🎥",
59
  layout="wide"
60
  )
61
 
62
+ st.title("🎥 Video Frame Analyzer with Ontology-Based Risk Assessment")
63
+ st.markdown("Upload a video and analyze frames using AI models with ontology-based safety classification")
64
+
65
+
66
+ def setup_local_models():
67
+ """Setup local models and return availability status"""
 
68
  local_manager = None
69
  local_models_available = False
70
 
71
+ if LOCAL_MODELS_AVAILABLE:
72
  try:
73
  local_manager = initialize_local_models()
74
  local_models_available = True
 
80
  else:
81
  st.info("💡 Local AI models not installed. Install with: `pip install torch torchvision transformers accelerate sentencepiece`")
82
 
83
+ return local_manager, local_models_available
84
+
85
+
86
+ def process_video_frames(video_file, config, local_manager=None):
87
+ """
88
+ Process all frames in the video and return results
89
+ """
90
+ # Extract frames
91
+ frames = extract_frames_from_video(video_file, config["fps"])
92
+
93
+ if not frames:
94
+ st.error("No frames could be extracted from the video")
95
+ return []
96
+
97
+ st.success(f"Extracted {len(frames)} frames from video")
98
+
99
+ # Process each frame
100
+ results = []
101
+ progress_bar = st.progress(0)
102
+
103
+ # Add prompt to config for processing
104
+ processing_config = config.copy()
105
+ processing_config["prompt"] = config.get("prompt", "")
106
+
107
+ for i, frame_data in enumerate(frames):
108
+ with st.spinner(f"Analyzing frame {i+1}/{len(frames)}..."):
109
+ # Process frame with selected model
110
+ result = process_frame(frame_data, processing_config, local_manager)
111
 
112
+ # Extract scene description for ontology analysis
113
+ scene_description = extract_scene_description(result)
 
 
 
 
 
114
 
115
+ # Apply ontology analysis
116
+ ontology_analysis = analyze_scene_with_ontology(scene_description, config["use_ontology"])
117
 
118
+ results.append({
119
+ 'frame_number': frame_data['frame_number'],
120
+ 'timestamp': frame_data['timestamp'],
121
+ 'image': frame_data['frame'],
122
+ 'result': result,
123
+ 'ontology_analysis': ontology_analysis
124
+ })
 
 
125
 
126
+ progress_bar.progress((i + 1) / len(frames))
127
+
128
+ return results
129
+
130
+
131
+ def validate_inputs(video_file, prompt, config, local_models_available):
132
+ """
133
+ Validate all required inputs
134
+ """
135
+ model_type = config["model_type"]
136
+ selected_model = config["selected_model"]
137
+ api_token = config["api_token"]
138
+
139
+ # Check basic requirements
140
+ if not video_file:
141
+ return False
142
+
143
+ # Check prompt requirements
144
+ if not prompt and not (model_type == "Local Models" and selected_model == "Person on Track Detector"):
145
+ return False
146
+
147
+ # Check API token for remote models
148
+ if not api_token and model_type == "Remote API":
149
+ return False
150
+
151
+ # Check local models availability
152
+ if model_type == "Local Models" and not local_models_available:
153
+ return False
154
+
155
+ return True
156
+
157
+
158
+ def main():
159
+ """Main application entry point"""
160
+ # Initialize application
161
+ initialize_app()
162
 
163
+ # Load settings and setup models
164
+ settings = load_settings()
165
+ local_manager, local_models_available = setup_local_models()
166
+
167
+ # Create main layout
168
  col1, col2 = st.columns([1, 1])
169
 
170
  with col1:
171
+ # Render sidebar configuration
172
+ config = render_sidebar_config(settings, local_models_available, local_manager)
173
 
174
+ # Render input section
175
+ input_data = render_input_section()
176
+ video_file = input_data["video_file"]
 
 
 
177
 
178
+ # Render prompt section
179
+ prompt = render_prompt_section(config)
 
 
 
 
180
 
181
+ # Render process button
182
+ process_button = render_process_button()
183
 
184
  with col2:
185
+ # Render results section
186
+ results_container = render_results_header()
187
+
188
+ # Main processing logic
189
+ if process_button:
190
+ if validate_inputs(video_file, prompt, config, local_models_available):
191
+ # Add prompt to config for processing
192
+ config["prompt"] = prompt
193
+
194
+ with st.spinner("Processing video..."):
195
+ # Process video frames
196
+ results = process_video_frames(video_file, config, local_manager)
 
 
 
 
 
197
 
198
+ # Display results
199
+ if results:
200
+ with results_container:
201
+ st.subheader("Analysis Results")
202
 
203
+ # Display summary statistics
204
+ severity_counts = {}
205
+ for result in results:
206
+ severity = result['ontology_analysis'].get('severity', 'NONE')
207
+ severity_counts[severity] = severity_counts.get(severity, 0) + 1
 
208
 
209
+ if config["use_ontology"] and severity_counts:
210
+ st.write("**Summary:**")
211
+ summary_cols = st.columns(len(severity_counts))
212
+ for i, (severity, count) in enumerate(severity_counts.items()):
213
+ icon_map = {
214
+ 'NONE': '✅', 'LOW': '🟢', 'MEDIUM': '🟠',
215
+ 'HIGH': '⚠️', 'CRITICAL': '🚨'
216
+ }
217
+ with summary_cols[i]:
218
+ st.metric(f"{icon_map.get(severity, '')} {severity}", count)
219
+ st.divider()
220
+
221
+ # Display individual frame results
222
+ for result_data in results:
223
+ render_frame_result(result_data)
224
+ else:
225
+ # Show validation errors
226
+ render_validation_errors(
227
+ video_file, prompt, config["api_token"],
228
+ config["model_type"], local_models_available, config["selected_model"]
229
+ )
 
 
 
 
230
 
231
+ # Render instructions
232
+ render_instructions()
233
+
 
 
 
 
234
 
235
  if __name__ == "__main__":
236
  main()
app_original_backup.py ADDED
@@ -0,0 +1,640 @@
 
 
 
1
+ import streamlit as st
2
+ import cv2
3
+ import os
4
+ import tempfile
5
+ import requests
6
+ import base64
7
+ import subprocess
8
+ import json
9
+ from io import BytesIO
10
+ from PIL import Image
11
+ import numpy as np
12
+ from dotenv import load_dotenv
13
+ from ontology_eval import Observation, evaluate, OntologyContext, decision_to_triples, triples_to_turtle, Severity
14
+ # Try to import local models, fall back gracefully if not available
15
+ try:
16
+ from local_models import get_local_model_manager
17
+ LOCAL_MODELS_AVAILABLE = True
18
+ except ImportError as e:
19
+ LOCAL_MODELS_AVAILABLE = False
20
+ print(f"Local models not available: {e}")
21
+ def get_local_model_manager():
22
+ return None
23
+
24
+ # Load environment variables
25
+ load_dotenv()
26
+
27
+ def load_settings():
28
+ """Load settings from JSON file"""
29
+ try:
30
+ with open('settings.json', 'r') as f:
31
+ return json.load(f)
32
+ except FileNotFoundError:
33
+ return {}
34
+
35
+ # Local models configuration
36
+ LOCAL_MODELS_ENABLED = LOCAL_MODELS_AVAILABLE
37
+ REMOTE_MODELS_ENABLED = True # Always allow remote API as fallback
38
+
39
+ # Initialize local model manager
40
+ @st.cache_resource
41
+ def initialize_local_models():
42
+ """Initialize local model manager"""
43
+ return get_local_model_manager()
44
+
45
+ # Hugging Face models for vision-language tasks (kept for compatibility)
46
+ AVAILABLE_MODELS = {
47
+ "microsoft/kosmos-2-patch14-224": "Kosmos-2",
48
+ "Salesforce/blip-image-captioning-large": "BLIP Image Captioning",
49
+ "microsoft/DialoGPT-medium": "DialoGPT",
50
+ "microsoft/git-large-coco": "GIT Large COCO",
51
+ "nlpconnect/vit-gpt2-image-captioning": "ViT-GPT2"
52
+ }
53
+
54
+ def repair_video_with_ffmpeg(input_path, output_path):
55
+ """
56
+ Repair corrupted video by moving moov atom to the beginning
57
+ """
58
+ try:
59
+ # Try to fix the video using FFmpeg
60
+ cmd = [
61
+ 'ffmpeg',
62
+ '-i', input_path,
63
+ '-c', 'copy',
64
+ '-movflags', 'faststart',
65
+ '-avoid_negative_ts', 'make_zero',
66
+ '-y', # Overwrite output file
67
+ output_path
68
+ ]
69
+
70
+ result = subprocess.run(
71
+ cmd,
72
+ capture_output=True,
73
+ text=True,
74
+ timeout=300 # 5 minute timeout
75
+ )
76
+
77
+ return result.returncode == 0
78
+ except (subprocess.TimeoutExpired, FileNotFoundError):
79
+ return False
80
+
81
+ def extract_frames_from_video(video_file, fps=1):
82
+ """
83
+ Extract frames from video at specified FPS (default 1 frame per second)
84
+ Automatically handles corrupted videos by attempting repair with FFmpeg
85
+ """
86
+ frames = []
87
+
88
+ with tempfile.NamedTemporaryFile(delete=False, suffix='.mp4') as tmp_file:
89
+ tmp_file.write(video_file.read())
90
+ tmp_file_path = tmp_file.name
91
+
92
+ repaired_path = None
93
+
94
+ try:
95
+ # First attempt: try to open video directly
96
+ cap = cv2.VideoCapture(tmp_file_path)
97
+
98
+ # Check if video opened successfully and has frames
99
+ if not cap.isOpened() or cap.get(cv2.CAP_PROP_FRAME_COUNT) == 0:
100
+ cap.release()
101
+
102
+ # Second attempt: try to repair the video with FFmpeg
103
+ st.warning("Video appears corrupted (moov atom issue). Attempting repair...")
104
+
105
+ with tempfile.NamedTemporaryFile(delete=False, suffix='_repaired.mp4') as repaired_file:
106
+ repaired_path = repaired_file.name
107
+
108
+ if repair_video_with_ffmpeg(tmp_file_path, repaired_path):
109
+ st.success("Video repair successful! Processing frames...")
110
+ cap = cv2.VideoCapture(repaired_path)
111
+ else:
112
+ st.error("Failed to repair video. FFmpeg may not be installed or video is severely corrupted.")
113
+ return frames
114
+
115
+ # Extract video properties
116
+ video_fps = cap.get(cv2.CAP_PROP_FPS)
117
+ if video_fps <= 0:
118
+ video_fps = 30 # Default fallback FPS
119
+
120
+ frame_interval = int(video_fps / fps) if video_fps > fps else 1
121
+
122
+ frame_count = 0
123
+ extracted_count = 0
124
+
125
+ while True:
126
+ ret, frame = cap.read()
127
+ if not ret:
128
+ break
129
+
130
+ if frame_count % frame_interval == 0:
131
+ frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
132
+ pil_image = Image.fromarray(frame_rgb)
133
+ frames.append({
134
+ 'frame': pil_image,
135
+ 'timestamp': frame_count / video_fps,
136
+ 'frame_number': extracted_count
137
+ })
138
+ extracted_count += 1
139
+
140
+ frame_count += 1
141
+
142
+ cap.release()
143
+
144
+ finally:
145
+ # Clean up temporary files
146
+ if os.path.exists(tmp_file_path):
147
+ os.unlink(tmp_file_path)
148
+ if repaired_path and os.path.exists(repaired_path):
149
+ os.unlink(repaired_path)
150
+
151
+ return frames
152
+
153
+ def image_to_base64(image):
154
+ """Convert PIL image to base64 string"""
155
+ buffer = BytesIO()
156
+ image.save(buffer, format="PNG")
157
+ img_str = base64.b64encode(buffer.getvalue()).decode()
158
+ return img_str
159
+
160
+ def process_image_locally(image, prompt, model_name, local_manager):
161
+ """
162
+ Process image using local models
163
+ """
164
+ try:
165
+ if model_name == "Person on Track Detector":
166
+ # Special handling for person-on-track detection
167
+ result = local_manager.person_on_track_detector.detect_person_on_track(image)
168
+ return {"person_on_track_detection": result}
169
+ else:
170
+ caption = local_manager.generate_caption(model_name, image, prompt)
171
+ return {"generated_text": caption}
172
+ except Exception as e:
173
+ return {"error": f"Local processing failed: {str(e)}"}
174
+
175
+ def query_huggingface_api(image, prompt, model_name, api_token):
176
+ """
177
+ Query Hugging Face API with image and prompt
178
+ """
179
+ API_URL = f"https://api-inference.huggingface.co/models/{model_name}"
180
+ headers = {"Authorization": f"Bearer {api_token}"}
181
+
182
+ # Convert image to base64
183
+ img_base64 = image_to_base64(image)
184
+
185
+ # Prepare payload based on model type
186
+ if "blip" in model_name.lower():
187
+ # For BLIP models, send image directly
188
+ buffer = BytesIO()
189
+ image.save(buffer, format="PNG")
190
+ response = requests.post(
191
+ API_URL,
192
+ headers=headers,
193
+ files={"file": buffer.getvalue()}
194
+ )
195
+ else:
196
+ # For other vision-language models
197
+ payload = {
198
+ "inputs": {
199
+ "image": img_base64,
200
+ "text": prompt
201
+ }
202
+ }
203
+ response = requests.post(API_URL, headers=headers, json=payload)
204
+
205
+ if response.status_code == 200:
206
+ return response.json()
207
+ else:
208
+ return {"error": f"API request failed: {response.status_code} - {response.text}"}
209
+
210
+ def analyze_scene_with_ontology(scene_description, use_ontology=True):
211
+ """
212
+ Analyze scene description using ontology-based evaluation
213
+ Returns classification and explanation
214
+ """
215
+ if not use_ontology:
216
+ return {
217
+ "severity": "NONE",
218
+ "severity_icon": "✅",
219
+ "score": 0,
220
+ "explanation": "Ontology-based analysis skipped",
221
+ "ontology_used": False,
222
+ "raw_description": scene_description
223
+ }
224
+
225
+ # Extract relevant information from scene description for ontology
226
+ scene_lower = scene_description.lower().strip() if scene_description else ""
227
+
228
+ # Initialize observation based on scene analysis
229
+ obs = Observation()
230
+
231
+ # Analyze scene for ontology features
232
+ person_words = ['person', 'people', 'man', 'woman', 'boy', 'girl', 'human', 'individual', 'someone']
233
+ track_words = ['track', 'tracks', 'rail', 'rails', 'railway', 'railroad']
234
+ platform_words = ['platform', 'station', 'bahnsteig']
235
+ danger_words = ['fallen', 'lying', 'down', 'accident', 'emergency']
236
+ fire_words = ['fire', 'smoke', 'flames', 'burning']
237
+ crowd_words = ['crowd', 'many people', 'group', 'mehrere personen']
238
+ safe_words = ['no people', 'empty', 'clear', 'safe', 'nobody', 'without people']
239
+
240
+ # Set observation values based on keyword analysis
241
+ person_mentions = sum(1 for word in person_words if word in scene_lower)
242
+ track_mentions = sum(1 for word in track_words if word in scene_lower)
243
+ platform_mentions = sum(1 for word in platform_words if word in scene_lower)
244
+ danger_mentions = sum(1 for word in danger_words if word in scene_lower)
245
+ fire_mentions = sum(1 for word in fire_words if word in scene_lower)
246
+ crowd_mentions = sum(1 for word in crowd_words if word in scene_lower)
247
+ safe_mentions = sum(1 for word in safe_words if word in scene_lower)
248
+
249
+ # Person on track detection (but not if explicitly safe)
250
+ if person_mentions > 0 and track_mentions > 0 and safe_mentions == 0:
251
+ # Check if person is actually on the tracks vs just mentioned
252
+ on_track_indicators = ['on track', 'on the track', 'on rails', 'on the rails', 'standing on', 'walking on']
253
+ on_track_specific = sum(1 for phrase in on_track_indicators if phrase in scene_lower)
254
+ if on_track_specific > 0:
255
+ obs.on_track_person = min(0.8, 0.6 + on_track_specific * 0.1)
256
+ elif person_mentions > 0 and track_mentions > 0:
257
+ # General co-occurrence but less confident - need stronger evidence
258
+ near_indicators = ['near', 'close to', 'next to', 'beside', 'by the']
259
+ near_mentions = sum(1 for phrase in near_indicators if phrase in scene_lower)
260
+ if near_mentions > 0:
261
+ # Person near tracks but not necessarily on them - lower confidence
262
+ obs.on_track_person = min(0.4, 0.25 + near_mentions * 0.05)
263
+ else:
264
+ # Just mention of person and tracks together - very low confidence
265
+ obs.on_track_person = min(0.3, 0.2 + (person_mentions + track_mentions) * 0.02)
266
+
267
+ # Fallen person detection
268
+ if person_mentions > 0 and danger_mentions > 0:
269
+ obs.fallen_person = min(0.7, 0.4 + danger_mentions * 0.1)
270
+
271
+ # Fire/smoke detection
272
+ if fire_mentions > 0:
273
+ obs.smoke_or_fire = min(0.8, 0.5 + fire_mentions * 0.15)
274
+
275
+ # Crowd detection
276
+ if crowd_mentions > 0 and (track_mentions > 0 or platform_mentions > 0):
277
+ obs.crowd_on_track = min(0.7, 0.4 + crowd_mentions * 0.1)
278
+
279
+ # Generic object detection (if no person but something mentioned on tracks)
280
+ if track_mentions > 0 and person_mentions == 0 and any(word in scene_lower for word in ['object', 'item', 'thing', 'debris']):
281
+ obs.object_on_track = 0.6
282
+
283
+ # Evaluate using ontology
284
+ decision = evaluate(obs)
285
+
286
+ # Map severity to icons and colors
287
+ severity_mapping = {
288
+ Severity.NONE: {"icon": "✅", "color": "green"},
289
+ Severity.LOW: {"icon": "🟢", "color": "lightgreen"},
290
+ Severity.MEDIUM: {"icon": "🟠", "color": "orange"},
291
+ Severity.HIGH: {"icon": "⚠️", "color": "red"},
292
+ Severity.CRITICAL: {"icon": "🚨", "color": "darkred"}
293
+ }
294
+
295
+ severity_info = severity_mapping[decision.severity]
296
+
297
+ return {
298
+ "severity": decision.severity.name,
299
+ "severity_icon": severity_info["icon"],
300
+ "severity_color": severity_info["color"],
301
+ "score": decision.score_0_100,
302
+ "labels": [label.value for label in decision.labels],
303
+ "explanations": decision.explanations,
304
+ "fired_rules": decision.fired_rules,
305
+ "ontology_used": True,
306
+ "raw_description": scene_description,
307
+ "observation": obs,
308
+ "decision": decision
309
+ }
310
+
311
+ def main():
312
+ st.set_page_config(
313
+ page_title="Video Frame Analyzer",
314
+ page_icon="🎥",
315
+ layout="wide"
316
+ )
317
+
318
+ st.title("🎥 Video Frame Analyzer with Local AI Models")
319
+ st.markdown("Upload a video, provide a prompt, and analyze each frame using local AI models (CNN or Transformer)")
320
+
321
+ # Load settings and initialize local models
322
+ settings = load_settings()
323
+
324
+ # Initialize local models if enabled
325
+ local_manager = None
326
+ local_models_available = False
327
+
328
+ if LOCAL_MODELS_ENABLED:
329
+ try:
330
+ local_manager = initialize_local_models()
331
+ local_models_available = True
332
+ st.success("🤖 Local AI models initialized successfully!")
333
+ except Exception as e:
334
+ st.warning(f"Local AI models not available: {str(e)}")
335
+ st.info("💡 Install AI packages: `pip install torch torchvision transformers accelerate sentencepiece`")
336
+ local_models_available = False
337
+ else:
338
+ st.info("💡 Local AI models not installed. Install with: `pip install torch torchvision transformers accelerate sentencepiece`")
339
+
340
+ # Sidebar for configuration
341
+ with st.sidebar:
342
+ st.header("Configuration")
343
+
344
+ # Model type selection
345
+ available_options = []
346
+ if local_models_available:
347
+ available_options.append("Local Models")
348
+ if REMOTE_MODELS_ENABLED:
349
+ available_options.append("Remote API")
350
+
351
+ if not available_options:
352
+ available_options = ["Remote API"] # Fallback
353
+
354
+ model_type = st.radio(
355
+ "Model Type",
356
+ available_options,
357
+ help="Choose between local AI models or remote Hugging Face API"
358
+ )
359
+
360
+ if model_type == "Local Models" and local_models_available:
361
+ # Local model selection
362
+ available_local_models = local_manager.get_available_models()
363
+ selected_model = st.selectbox(
364
+ "Select Local Model",
365
+ options=available_local_models,
366
+ help="Choose between CNN (fast) or Transformer (detailed) models"
367
+ )
368
+
369
+ # Show model info
370
+ model_info = local_manager.get_model_info()
371
+ if selected_model in model_info:
372
+ with st.expander("Model Information"):
373
+ st.write(f"**Description:** {model_info[selected_model]['description']}")
374
+ st.write(f"**Strengths:** {model_info[selected_model]['strengths']}")
375
+ st.write(f"**Size:** {model_info[selected_model]['size']}")
376
+
377
+ api_token = None # Not needed for local models
378
+
379
+ else:
380
+ # Remote API configuration
381
+ default_token = settings.get('hugging_face_api_token', '')
382
+ api_token = st.text_input(
383
+ "Hugging Face API Token",
384
+ value=default_token,
385
+ type="password",
386
+ help="Get your token from https://huggingface.co/settings/tokens or save in settings.json"
387
+ )
388
+
389
+ # Remote model selection
390
+ selected_model = st.selectbox(
391
+ "Select Model",
392
+ options=list(AVAILABLE_MODELS.keys()),
393
+ format_func=lambda x: AVAILABLE_MODELS[x]
394
+ )
395
+
396
+ # Frame extraction rate
397
+ fps = st.slider(
398
+ "Frames per second to extract",
399
+ min_value=0.1,
400
+ max_value=5.0,
401
+ value=1.0,
402
+ step=0.1
403
+ )
404
+
405
+ # Ontology settings
406
+ st.subheader("Ontology Analysis")
407
+ use_ontology = st.checkbox(
408
+ "Enable Ontology Analysis",
409
+ value=True,
410
+ help="Use ontology-based classification (NONE/LOW/MEDIUM/HIGH/CRITICAL)"
411
+ )
412
+
413
+ if not use_ontology:
414
+ st.info("🔄 Ontology analysis disabled - showing raw model output only")
415
+
416
+ # Main content area
417
+ col1, col2 = st.columns([1, 1])
418
+
419
+ with col1:
420
+ st.header("Input")
421
+
422
+ # Video upload
423
+ video_file = st.file_uploader(
424
+ "Upload Video",
425
+ type=['mp4', 'avi', 'mov', 'mkv'],
426
+ help="Upload a video file to analyze"
427
+ )
428
+
429
+ # Prompt input (conditional based on model)
430
+ if model_type == "Local Models" and local_models_available and selected_model == "Person on Track Detector":
431
+ # Person on Track Detector works automatically
432
+ st.info("🤖 Person on Track Detector works automatically - no prompt needed!")
433
+ prompt = "automatic" # Set automatic prompt
434
+ else:
435
+ # Regular models need user prompt
436
+ prompt = st.text_area(
437
+ "Analysis Prompt",
438
+ placeholder="Describe what you see in the image...",
439
+ help="Enter the prompt to analyze each frame"
440
+ )
441
+
442
+ # Process button
443
+ process_button = st.button("Process Video", type="primary")
444
+
445
+ with col2:
446
+ st.header("Results")
447
+ results_container = st.container()
448
+
449
+ # Processing logic
450
+ if process_button and video_file and (prompt or (model_type == "Local Models" and selected_model == "Person on Track Detector")) and (api_token or model_type == "Local Models"):
451
+ with st.spinner("Processing video..."):
452
+ # Extract frames
453
+ frames = extract_frames_from_video(video_file, fps)
454
+
455
+ if not frames:
456
+ st.error("No frames could be extracted from the video")
457
+ return
458
+
459
+ st.success(f"Extracted {len(frames)} frames from video")
460
+
461
+ # Process each frame
462
+ results = []
463
+ progress_bar = st.progress(0)
464
+
465
+ for i, frame_data in enumerate(frames):
466
+ with st.spinner(f"Analyzing frame {i+1}/{len(frames)}..."):
467
+ # Process frame based on model type
468
+ if model_type == "Local Models" and local_models_available:
469
+ result = process_image_locally(
470
+ frame_data['frame'],
471
+ prompt,
472
+ selected_model,
473
+ local_manager
474
+ )
475
+ else:
476
+ result = query_huggingface_api(
477
+ frame_data['frame'],
478
+ prompt,
479
+ selected_model,
480
+ api_token
481
+ )
482
+
483
+ # Extract scene description for ontology analysis
484
+ scene_description = ""
485
+ if 'person_on_track_detection' in result:
486
+ # For person detection results, use the analysis text
487
+ scene_description = result['person_on_track_detection'].get('detailed_analysis', {}).get('scene_description', '')
488
+ elif 'generated_text' in result:
489
+ scene_description = result['generated_text']
490
+ elif isinstance(result, list) and len(result) > 0 and 'generated_text' in result[0]:
491
+ scene_description = result[0]['generated_text']
492
+
493
+ # Apply ontology analysis
494
+ ontology_analysis = analyze_scene_with_ontology(scene_description, use_ontology)
495
+
496
+ results.append({
497
+ 'frame_number': frame_data['frame_number'],
498
+ 'timestamp': frame_data['timestamp'],
499
+ 'image': frame_data['frame'],
500
+ 'result': result,
501
+ 'ontology_analysis': ontology_analysis
502
+ })
503
+
504
+ progress_bar.progress((i + 1) / len(frames))
505
+
506
+ # Display results
507
+ with results_container:
508
+ st.subheader("Analysis Results")
509
+
510
+ for result_data in results:
511
+ ontology = result_data['ontology_analysis']
512
+ severity_icon = ontology.get('severity_icon', '✅')
513
+ severity = ontology.get('severity', 'NONE')
514
+
515
+ # Create expander title with severity indicator
516
+ expander_title = f"{severity_icon} {severity} - Frame {result_data['frame_number']} (t={result_data['timestamp']:.1f}s)"
517
+
518
+ with st.expander(expander_title):
519
+ col_img, col_text = st.columns([1, 2])
520
+
521
+ with col_img:
522
+ st.image(
523
+ result_data['image'],
524
+ caption=f"Frame {result_data['frame_number']}",
525
+ use_container_width=True
526
+ )
527
+
528
+ with col_text:
529
+ # Display ontology analysis first if enabled
530
+ if ontology.get('ontology_used', False):
531
+ # Severity display with color
532
+ severity_color = ontology.get('severity_color', 'green')
533
+ st.markdown(f"**Safety Assessment:** :{severity_color}[{severity_icon} {severity}]")
534
+
535
+ # Score display
536
+ if ontology.get('score', 0) > 0:
537
+ st.metric("Risk Score", f"{ontology['score']}/100")
538
+
539
+ # Show explanations if available
540
+ if ontology.get('explanations'):
541
+ st.write("**Ontology Analysis:**")
542
+ for explanation in ontology['explanations']:
543
+ st.write(f"• {explanation}")
544
+
545
+ # Show fired rules if available
546
+ if ontology.get('fired_rules'):
547
+ with st.expander("Technical Details"):
548
+ st.write("**Triggered Rules:**")
549
+ for rule in ontology['fired_rules']:
550
+ st.code(rule)
551
+
552
+ if ontology.get('labels'):
553
+ st.write("**Detected Hazard Labels:**")
554
+ for label in ontology['labels']:
555
+ st.code(label)
556
+
557
+ st.divider()
558
+
559
+ # Display original model results
560
+ st.write("**Model Output:**")
561
+ if 'error' in result_data['result']:
562
+ st.error(f"Error: {result_data['result']['error']}")
563
+ elif 'person_on_track_detection' in result_data['result']:
564
+ # Handle person-on-track detection results
565
+ detection = result_data['result']['person_on_track_detection']
566
+
567
+ people_count = detection.get('people_count', 0)
568
+ confidence = detection.get('confidence', 0)
569
+ analysis = detection.get('analysis', 'No analysis')
570
+ person_on_track = detection.get('person_on_track', False)
571
+
572
+ st.write(f"**Detection Analysis:** {analysis}")
573
+
574
+ # Show metrics
575
+ col1, col2 = st.columns(2)
576
+ with col1:
577
+ st.metric("👥 People Detected", people_count)
578
+ with col2:
579
+ st.metric("📊 Model Confidence", f"{confidence:.0%}")
580
+ else:
581
+ if 'generated_text' in result_data['result']:
582
+ # Handle direct generated_text response (local models)
583
+ st.write(f"*{result_data['result']['generated_text']}*")
584
+ elif isinstance(result_data['result'], list) and len(result_data['result']) > 0:
585
+ # Handle list responses (common for captioning models)
586
+ if 'generated_text' in result_data['result'][0]:
587
+ st.write(f"*{result_data['result'][0]['generated_text']}*")
588
+ else:
589
+ st.json(result_data['result'][0])
590
+ else:
591
+ st.json(result_data['result'])
592
+
593
+ elif process_button:
594
+ if not video_file:
595
+ st.error("Please upload a video file")
596
+ if not prompt and not (model_type == "Local Models" and selected_model == "Person on Track Detector"):
597
+ st.error("Please enter an analysis prompt")
598
+ if not api_token and model_type == "Remote API":
599
+ st.error("Please provide your Hugging Face API token for remote models")
600
+ if model_type == "Local Models" and not local_models_available:
601
+ st.error("Local models failed to initialize. Check your installation.")
602
+
603
+ # Instructions
604
+ with st.expander("How to use"):
605
+ st.markdown("""
606
+ ## Local AI Models (Recommended)
607
+ 1. **Upload a video**: Choose a video file (MP4, AVI, MOV, or MKV)
608
+ 2. **Select model type**: Choose "Local Models" for offline processing
609
+ 3. **Choose AI model**:
610
+ - **CNN (BLIP)**: Fast, good for object detection (~1.2GB)
611
+ - **Transformer (ViT-GPT2)**: Detailed descriptions (~1.8GB)
612
+ 4. **Enter a prompt**: Describe what you want the AI to analyze
613
+ 5. **Adjust frame rate**: Set frames per second to extract (default: 1 fps)
614
+ 6. **Click Process**: Frames are processed locally on your machine
615
+
616
+ ## Remote API Models (Optional)
617
+ 1. **Get API token**: Visit [Hugging Face Settings](https://huggingface.co/settings/tokens)
618
+ 2. **Select "Remote API"** in model type
619
+ 3. **Enter token** and select remote model
620
+
621
+ ## Video Support Features
622
+ - **Automatic corruption repair**: Handles videos with corrupted moov atoms
623
+ - **FFmpeg integration**: Auto-repairs problematic video files
624
+ - **Multiple formats**: MP4, AVI, MOV, MKV support
625
+
626
+ ## Requirements
627
+ - **Python packages**: torch, transformers, accelerate (see requirements.txt)
628
+ - **Optional**: FFmpeg for video repair (download from https://ffmpeg.org)
629
+ - **Storage**: ~3GB for both local models
630
+
631
+ ## Example Prompts
632
+ - "Describe what you see in this image"
633
+ - "Count the number of people in this scene"
634
+ - "What objects are visible in this frame?"
635
+ - "Describe the emotions and actions in this scene"
636
+ - "What is the main activity happening here?"
637
+ """)
638
+
639
+ if __name__ == "__main__":
640
+ main()
app_refactored.py ADDED
@@ -0,0 +1,236 @@
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Main Streamlit application for video frame analysis with ontology-based risk assessment
4
+ Refactored for better code organization and maintainability
5
+ """
6
+ import streamlit as st
7
+ import json
8
+ from dotenv import load_dotenv
9
+
10
+ # Import our modular components
11
+ from video_processing import extract_frames_from_video
12
+ from ontology_integration import analyze_scene_with_ontology, extract_scene_description
13
+ from model_processing import process_frame
14
+ from ui_components import (
15
+ render_sidebar_config,
16
+ render_input_section,
17
+ render_prompt_section,
18
+ render_process_button,
19
+ render_results_header,
20
+ render_frame_result,
21
+ render_validation_errors,
22
+ render_instructions
23
+ )
24
+
25
+ # Try to import local models, fall back gracefully if not available
26
+ try:
27
+ from local_models import get_local_model_manager
28
+ LOCAL_MODELS_AVAILABLE = True
29
+ except ImportError as e:
30
+ LOCAL_MODELS_AVAILABLE = False
31
+ print(f"Local models not available: {e}")
32
+ def get_local_model_manager():
33
+ return None
34
+
35
+ # Load environment variables
36
+ load_dotenv()
37
+
38
+
39
+ def load_settings():
40
+ """Load settings from JSON file"""
41
+ try:
42
+ with open('settings.json', 'r') as f:
43
+ return json.load(f)
44
+ except FileNotFoundError:
45
+ return {}
46
+
47
+
48
+ @st.cache_resource
49
+ def initialize_local_models():
50
+ """Initialize local model manager"""
51
+ return get_local_model_manager()
52
+
53
+
54
+ def initialize_app():
55
+ """Initialize the Streamlit application"""
56
+ st.set_page_config(
57
+ page_title="Video Frame Analyzer with Ontology",
58
+ page_icon="🎥",
59
+ layout="wide"
60
+ )
61
+
62
+ st.title("🎥 Video Frame Analyzer with Ontology-Based Risk Assessment")
63
+ st.markdown("Upload a video and analyze frames using AI models with ontology-based safety classification")
64
+
65
+
66
+ def setup_local_models():
67
+ """Setup local models and return availability status"""
68
+ local_manager = None
69
+ local_models_available = False
70
+
71
+ if LOCAL_MODELS_AVAILABLE:
72
+ try:
73
+ local_manager = initialize_local_models()
74
+ local_models_available = True
75
+ st.success("🤖 Local AI models initialized successfully!")
76
+ except Exception as e:
77
+ st.warning(f"Local AI models not available: {str(e)}")
78
+ st.info("💡 Install AI packages: `pip install torch torchvision transformers accelerate sentencepiece`")
79
+ local_models_available = False
80
+ else:
81
+ st.info("💡 Local AI models not installed. Install with: `pip install torch torchvision transformers accelerate sentencepiece`")
82
+
83
+ return local_manager, local_models_available
84
+
85
+
86
+ def process_video_frames(video_file, config, local_manager=None):
87
+ """
88
+ Process all frames in the video and return results
89
+ """
90
+ # Extract frames
91
+ frames = extract_frames_from_video(video_file, config["fps"])
92
+
93
+ if not frames:
94
+ st.error("No frames could be extracted from the video")
95
+ return []
96
+
97
+ st.success(f"Extracted {len(frames)} frames from video")
98
+
99
+ # Process each frame
100
+ results = []
101
+ progress_bar = st.progress(0)
102
+
103
+ # Add prompt to config for processing
104
+ processing_config = config.copy()
105
+ processing_config["prompt"] = config.get("prompt", "")
106
+
107
+ for i, frame_data in enumerate(frames):
108
+ with st.spinner(f"Analyzing frame {i+1}/{len(frames)}..."):
109
+ # Process frame with selected model
110
+ result = process_frame(frame_data, processing_config, local_manager)
111
+
112
+ # Extract scene description for ontology analysis
113
+ scene_description = extract_scene_description(result)
114
+
115
+ # Apply ontology analysis
116
+ ontology_analysis = analyze_scene_with_ontology(scene_description, config["use_ontology"])
117
+
118
+ results.append({
119
+ 'frame_number': frame_data['frame_number'],
120
+ 'timestamp': frame_data['timestamp'],
121
+ 'image': frame_data['frame'],
122
+ 'result': result,
123
+ 'ontology_analysis': ontology_analysis
124
+ })
125
+
126
+ progress_bar.progress((i + 1) / len(frames))
127
+
128
+ return results
129
+
130
+
131
+ def validate_inputs(video_file, prompt, config, local_models_available):
132
+ """
133
+ Validate all required inputs
134
+ """
135
+ model_type = config["model_type"]
136
+ selected_model = config["selected_model"]
137
+ api_token = config["api_token"]
138
+
139
+ # Check basic requirements
140
+ if not video_file:
141
+ return False
142
+
143
+ # Check prompt requirements
144
+ if not prompt and not (model_type == "Local Models" and selected_model == "Person on Track Detector"):
145
+ return False
146
+
147
+ # Check API token for remote models
148
+ if not api_token and model_type == "Remote API":
149
+ return False
150
+
151
+ # Check local models availability
152
+ if model_type == "Local Models" and not local_models_available:
153
+ return False
154
+
155
+ return True
156
+
157
+
158
+ def main():
159
+ """Main application entry point"""
160
+ # Initialize application
161
+ initialize_app()
162
+
163
+ # Load settings and setup models
164
+ settings = load_settings()
165
+ local_manager, local_models_available = setup_local_models()
166
+
167
+ # Create main layout
168
+ col1, col2 = st.columns([1, 1])
169
+
170
+ with col1:
171
+ # Render sidebar configuration
172
+ config = render_sidebar_config(settings, local_models_available, local_manager)
173
+
174
+ # Render input section
175
+ input_data = render_input_section()
176
+ video_file = input_data["video_file"]
177
+
178
+ # Render prompt section
179
+ prompt = render_prompt_section(config)
180
+
181
+ # Render process button
182
+ process_button = render_process_button()
183
+
184
+ with col2:
185
+ # Render results section
186
+ results_container = render_results_header()
187
+
188
+ # Main processing logic
189
+ if process_button:
190
+ if validate_inputs(video_file, prompt, config, local_models_available):
191
+ # Add prompt to config for processing
192
+ config["prompt"] = prompt
193
+
194
+ with st.spinner("Processing video..."):
195
+ # Process video frames
196
+ results = process_video_frames(video_file, config, local_manager)
197
+
198
+ # Display results
199
+ if results:
200
+ with results_container:
201
+ st.subheader("Analysis Results")
202
+
203
+ # Display summary statistics
204
+ severity_counts = {}
205
+ for result in results:
206
+ severity = result['ontology_analysis'].get('severity', 'NONE')
207
+ severity_counts[severity] = severity_counts.get(severity, 0) + 1
208
+
209
+ if config["use_ontology"] and severity_counts:
210
+ st.write("**Summary:**")
211
+ summary_cols = st.columns(len(severity_counts))
212
+ for i, (severity, count) in enumerate(severity_counts.items()):
213
+ icon_map = {
214
+ 'NONE': '✅', 'LOW': '🟢', 'MEDIUM': '🟠',
215
+ 'HIGH': '⚠️', 'CRITICAL': '🚨'
216
+ }
217
+ with summary_cols[i]:
218
+ st.metric(f"{icon_map.get(severity, '❓')} {severity}", count)
219
+ st.divider()
220
+
221
+ # Display individual frame results
222
+ for result_data in results:
223
+ render_frame_result(result_data)
224
+ else:
225
+ # Show validation errors
226
+ render_validation_errors(
227
+ video_file, prompt, config["api_token"],
228
+ config["model_type"], local_models_available, config["selected_model"]
229
+ )
230
+
231
+ # Render instructions
232
+ render_instructions()
233
+
234
+
235
+ if __name__ == "__main__":
236
+ main()
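+ # Launch the app with: streamlit run app.py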
model_processing.py ADDED
@@ -0,0 +1,97 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Model processing utilities for local and remote AI models
4
+ """
5
+ import requests
6
+ import base64
7
+ from io import BytesIO
8
+ from PIL import Image
9
+ from typing import Dict, Any, Optional
10
+
11
+
12
+ def image_to_base64(image: Image.Image) -> str:
13
+ """Convert PIL image to base64 string"""
14
+ buffer = BytesIO()
15
+ image.save(buffer, format="PNG")
16
+ img_str = base64.b64encode(buffer.getvalue()).decode()
17
+ return img_str
18
+
19
+
20
+ def process_image_locally(image: Image.Image, prompt: str, model_name: str, local_manager) -> Dict[str, Any]:
21
+ """
22
+ Process image using local models
23
+ """
24
+ try:
25
+ if model_name == "Person on Track Detector":
26
+ # Special handling for person-on-track detection
27
+ result = local_manager.person_on_track_detector.detect_person_on_track(image)
28
+ return {"person_on_track_detection": result}
29
+ else:
30
+ caption = local_manager.generate_caption(model_name, image, prompt)
31
+ return {"generated_text": caption}
32
+ except Exception as e:
33
+ return {"error": f"Local processing failed: {str(e)}"}
34
+
35
+
36
+ def query_huggingface_api(image: Image.Image, prompt: str, model_name: str, api_token: str) -> Dict[str, Any]:
37
+ """
38
+ Query Hugging Face API with image and prompt
39
+ """
40
+ API_URL = f"https://api-inference.huggingface.co/models/{model_name}"
41
+ headers = {"Authorization": f"Bearer {api_token}"}
42
+
43
+ # Convert image to base64
44
+ img_base64 = image_to_base64(image)
45
+
46
+ # Prepare payload based on model type
47
+ if "blip" in model_name.lower():
48
+ # For BLIP models, send image directly
49
+ buffer = BytesIO()
50
+ image.save(buffer, format="PNG")
51
+ response = requests.post(
52
+ API_URL,
53
+ headers=headers,
54
+ files={"file": buffer.getvalue()}
55
+ )
56
+ else:
57
+ # For other vision-language models
58
+ payload = {
59
+ "inputs": {
60
+ "image": img_base64,
61
+ "text": prompt
62
+ }
63
+ }
64
+ response = requests.post(API_URL, headers=headers, json=payload)
65
+
66
+ if response.status_code == 200:
67
+ return response.json()
68
+ else:
69
+ return {"error": f"API request failed: {response.status_code} - {response.text}"}
70
+
71
+
72
+ def process_frame(frame_data: Dict, config: Dict[str, Any], local_manager=None) -> Dict[str, Any]:
73
+ """
74
+ Process a single frame using the configured model
75
+ """
76
+ model_type = config["model_type"]
77
+ selected_model = config["selected_model"]
78
+ prompt = config.get("prompt", "")
79
+ api_token = config.get("api_token")
80
+
81
+ # Process frame based on model type
82
+ if model_type == "Local Models" and local_manager:
83
+ result = process_image_locally(
84
+ frame_data['frame'],
85
+ prompt,
86
+ selected_model,
87
+ local_manager
88
+ )
89
+ else:
90
+ result = query_huggingface_api(
91
+ frame_data['frame'],
92
+ prompt,
93
+ selected_model,
94
+ api_token
95
+ )
96
+
97
+ return result
ontology_integration.py ADDED
@@ -0,0 +1,144 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Ontology integration module for scene analysis and risk assessment
4
+ """
5
+ from ontology_eval import Observation, evaluate, Severity
6
+ from typing import Dict, Any, Optional
7
+
8
+
9
+ def analyze_scene_with_ontology(scene_description: str, use_ontology: bool = True) -> Dict[str, Any]:
10
+ """
11
+ Analyze scene description using ontology-based evaluation
12
+ Returns classification and explanation
13
+ """
14
+ if not use_ontology:
15
+ return {
16
+ "severity": "NONE",
17
+ "severity_icon": "✅",
18
+ "score": 0,
19
+ "explanation": "Ontology-based analysis skipped",
20
+ "ontology_used": False,
21
+ "raw_description": scene_description
22
+ }
23
+
24
+ # Extract relevant information from scene description for ontology
25
+ scene_lower = scene_description.lower().strip() if scene_description else ""
26
+
27
+ # Initialize observation based on scene analysis
28
+ obs = _extract_ontology_features(scene_lower)
29
+
30
+ # Evaluate using ontology
31
+ decision = evaluate(obs)
32
+
33
+ # Map severity to icons and colors
34
+ severity_mapping = {
35
+ Severity.NONE: {"icon": "✅", "color": "green"},
36
+ Severity.LOW: {"icon": "🟢", "color": "green"},  # st.markdown's :color[] syntax only supports a fixed palette
37
+ Severity.MEDIUM: {"icon": "🟠", "color": "orange"},
38
+ Severity.HIGH: {"icon": "⚠️", "color": "red"},
39
+ Severity.CRITICAL: {"icon": "🚨", "color": "red"}
40
+ }
41
+
42
+ severity_info = severity_mapping[decision.severity]
43
+
44
+ return {
45
+ "severity": decision.severity.name,
46
+ "severity_icon": severity_info["icon"],
47
+ "severity_color": severity_info["color"],
48
+ "score": decision.score_0_100,
49
+ "labels": [label.value for label in decision.labels],
50
+ "explanations": decision.explanations,
51
+ "fired_rules": decision.fired_rules,
52
+ "ontology_used": True,
53
+ "raw_description": scene_description,
54
+ "observation": obs,
55
+ "decision": decision
56
+ }
57
+
58
+
59
+ def _extract_ontology_features(scene_lower: str) -> Observation:
60
+ """
61
+ Extract ontology-relevant features from scene description
62
+ """
63
+ # Initialize observation
64
+ obs = Observation()
65
+
66
+ # Define keyword categories
67
+ person_words = ['person', 'people', 'man', 'woman', 'boy', 'girl', 'human', 'individual', 'someone']
68
+ track_words = ['track', 'tracks', 'rail', 'rails', 'railway', 'railroad']
69
+ platform_words = ['platform', 'station', 'bahnsteig']
70
+ danger_words = ['fallen', 'lying', 'down', 'accident', 'emergency']
71
+ fire_words = ['fire', 'smoke', 'flames', 'burning']
72
+ crowd_words = ['crowd', 'many people', 'group', 'mehrere personen']
73
+ safe_words = ['no people', 'empty', 'clear', 'safe', 'nobody', 'without people']
74
+
75
+ # Count keyword mentions
76
+ person_mentions = sum(1 for word in person_words if word in scene_lower)
77
+ track_mentions = sum(1 for word in track_words if word in scene_lower)
78
+ platform_mentions = sum(1 for word in platform_words if word in scene_lower)
79
+ danger_mentions = sum(1 for word in danger_words if word in scene_lower)
80
+ fire_mentions = sum(1 for word in fire_words if word in scene_lower)
81
+ crowd_mentions = sum(1 for word in crowd_words if word in scene_lower)
82
+ safe_mentions = sum(1 for word in safe_words if word in scene_lower)
83
+
84
+ # Person on track detection (but not if explicitly safe)
85
+ if person_mentions > 0 and track_mentions > 0 and safe_mentions == 0:
86
+ obs.on_track_person = _calculate_person_on_track_confidence(scene_lower, person_mentions, track_mentions)
87
+
88
+ # Fallen person detection
89
+ if person_mentions > 0 and danger_mentions > 0:
90
+ obs.fallen_person = min(0.7, 0.4 + danger_mentions * 0.1)
91
+
92
+ # Fire/smoke detection
93
+ if fire_mentions > 0:
94
+ obs.smoke_or_fire = min(0.8, 0.5 + fire_mentions * 0.15)
95
+
96
+ # Crowd detection
97
+ if crowd_mentions > 0 and (track_mentions > 0 or platform_mentions > 0):
98
+ obs.crowd_on_track = min(0.7, 0.4 + crowd_mentions * 0.1)
99
+
100
+ # Generic object detection (if no person but something mentioned on tracks)
101
+ if track_mentions > 0 and person_mentions == 0 and any(word in scene_lower for word in ['object', 'item', 'thing', 'debris']):
102
+ obs.object_on_track = 0.6
103
+
104
+ return obs
105
+
106
+
107
+ def _calculate_person_on_track_confidence(scene_lower: str, person_mentions: int, track_mentions: int) -> float:
108
+ """
109
+ Calculate confidence for person on track detection based on specific indicators
110
+ """
111
+ # Check for specific on-track indicators
112
+ on_track_indicators = ['on track', 'on the track', 'on rails', 'on the rails', 'standing on', 'walking on']
113
+ on_track_specific = sum(1 for phrase in on_track_indicators if phrase in scene_lower)
114
+
115
+ if on_track_specific > 0:
116
+ return min(0.8, 0.6 + on_track_specific * 0.1)
117
+ else:
118
+ # Check for proximity indicators
119
+ near_indicators = ['near', 'close to', 'next to', 'beside', 'by the']
120
+ near_mentions = sum(1 for phrase in near_indicators if phrase in scene_lower)
121
+
122
+ if near_mentions > 0:
123
+ # Person near tracks but not necessarily on them - lower confidence
124
+ return min(0.4, 0.25 + near_mentions * 0.05)
125
+ else:
126
+ # Just mention of person and tracks together - very low confidence
127
+ return min(0.3, 0.2 + (person_mentions + track_mentions) * 0.02)
128
+
129
+
130
+ def extract_scene_description(result: Dict[str, Any]) -> str:
131
+ """
132
+ Extract scene description from various model result formats
133
+ """
134
+ scene_description = ""
135
+
136
+ if 'person_on_track_detection' in result:
137
+ # For person detection results, use the analysis text
138
+ scene_description = result['person_on_track_detection'].get('detailed_analysis', {}).get('scene_description', '')
139
+ elif 'generated_text' in result:
140
+ scene_description = result['generated_text']
141
+ elif isinstance(result, list) and len(result) > 0 and 'generated_text' in result[0]:
142
+ scene_description = result[0]['generated_text']
143
+
144
+ return scene_description
ui_components.py ADDED
@@ -0,0 +1,320 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ UI components for the Streamlit application
4
+ """
5
+ import streamlit as st
6
+ from typing import Dict, List, Any, Optional
7
+ # Note: local_models is not imported here; the local model manager is passed in as an argument, which keeps app.py's optional-import fallback working when the AI packages are missing
8
+
9
+
10
+ # Available Hugging Face models for remote API
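+ # Keys are Hugging Face model IDs passed to the Inference API; values are display names for the selectbox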
11
+ AVAILABLE_MODELS = {
12
+ "microsoft/kosmos-2-patch14-224": "Kosmos-2",
13
+ "Salesforce/blip-image-captioning-large": "BLIP Image Captioning",
14
+ "microsoft/DialoGPT-medium": "DialoGPT",
15
+ "microsoft/git-large-coco": "GIT Large COCO",
16
+ "nlpconnect/vit-gpt2-image-captioning": "ViT-GPT2"
17
+ }
18
+
19
+
20
+ def render_sidebar_config(settings: Dict, local_models_available: bool, local_manager: Optional[Any]) -> Dict[str, Any]:
21
+ """
22
+ Render the sidebar configuration panel
23
+ Returns configuration settings
24
+ """
25
+ with st.sidebar:
26
+ st.header("Configuration")
27
+
28
+ # Model type selection
29
+ available_options = []
30
+ if local_models_available:
31
+ available_options.append("Local Models")
32
+ available_options.append("Remote API")
33
+
34
+ model_type = st.radio(
35
+ "Model Type",
36
+ available_options,
37
+ help="Choose between local AI models or remote Hugging Face API"
38
+ )
39
+
40
+ # Model selection based on type
41
+ if model_type == "Local Models" and local_models_available:
42
+ selected_model, api_token = _render_local_model_config(local_manager)
43
+ else:
44
+ selected_model, api_token = _render_remote_model_config(settings)
45
+
46
+ # Frame extraction rate
47
+ fps = st.slider(
48
+ "Frames per second to extract",
49
+ min_value=0.1,
50
+ max_value=5.0,
51
+ value=1.0,
52
+ step=0.1
53
+ )
54
+
55
+ # Ontology settings
56
+ st.subheader("Ontology Analysis")
57
+ use_ontology = st.checkbox(
58
+ "Enable Ontology Analysis",
59
+ value=True,
60
+ help="Use ontology-based classification (NONE/LOW/MEDIUM/HIGH/CRITICAL)"
61
+ )
62
+
63
+ if not use_ontology:
64
+ st.info("🔄 Ontology analysis disabled - showing raw model output only")
65
+
66
+ return {
67
+ "model_type": model_type,
68
+ "selected_model": selected_model,
69
+ "api_token": api_token,
70
+ "fps": fps,
71
+ "use_ontology": use_ontology
72
+ }
73
+
74
+
75
+ def _render_local_model_config(local_manager) -> tuple:
76
+ """Render local model configuration"""
77
+ available_local_models = local_manager.get_available_models()
78
+ selected_model = st.selectbox(
79
+ "Select Local Model",
80
+ options=available_local_models,
81
+ help="Choose between CNN (fast) or Transformer (detailed) models"
82
+ )
83
+
84
+ # Show model info
85
+ model_info = local_manager.get_model_info()
86
+ if selected_model in model_info:
87
+ with st.expander("Model Information"):
88
+ st.write(f"**Description:** {model_info[selected_model]['description']}")
89
+ st.write(f"**Strengths:** {model_info[selected_model]['strengths']}")
90
+ st.write(f"**Size:** {model_info[selected_model]['size']}")
91
+
92
+ return selected_model, None # No API token needed for local models
93
+
94
+
95
+ def _render_remote_model_config(settings: Dict) -> tuple:
96
+ """Render remote API model configuration"""
97
+ default_token = settings.get('hugging_face_api_token', '')
98
+ api_token = st.text_input(
99
+ "Hugging Face API Token",
100
+ value=default_token,
101
+ type="password",
102
+ help="Get your token from https://huggingface.co/settings/tokens or save in settings.json"
103
+ )
104
+
105
+ selected_model = st.selectbox(
106
+ "Select Model",
107
+ options=list(AVAILABLE_MODELS.keys()),
108
+ format_func=lambda x: AVAILABLE_MODELS[x]
109
+ )
110
+
111
+ return selected_model, api_token
112
+
113
+
114
+ def render_input_section() -> Dict[str, Any]:
115
+ """
116
+ Render the input section for video upload and prompts
117
+ Returns input data
118
+ """
119
+ st.header("Input")
120
+
121
+ # Video upload
122
+ video_file = st.file_uploader(
123
+ "Upload Video",
124
+ type=['mp4', 'avi', 'mov', 'mkv'],
125
+ help="Upload a video file to analyze"
126
+ )
127
+
128
+ return {
129
+ "video_file": video_file
130
+ }
131
+
132
+
133
+ def render_prompt_section(config: Dict[str, Any]) -> str:
134
+ """
135
+ Render prompt input section based on model configuration
136
+ """
137
+ model_type = config["model_type"]
138
+ selected_model = config["selected_model"]
139
+
140
+ # Prompt input (conditional based on model)
141
+ if (model_type == "Local Models" and
142
+ selected_model == "Person on Track Detector"):
143
+ # Person on Track Detector works automatically
144
+ st.info("🤖 Person on Track Detector works automatically - no prompt needed!")
145
+ return "automatic"
146
+ else:
147
+ # Regular models need user prompt
148
+ return st.text_area(
149
+ "Analysis Prompt",
150
+ placeholder="Describe what you see in the image...",
151
+ help="Enter the prompt to analyze each frame"
152
+ )
153
+
154
+
155
+ def render_process_button() -> bool:
156
+ """Render the process button"""
157
+ return st.button("Process Video", type="primary")
158
+
159
+
160
+ def render_results_header():
161
+ """Render the results section header"""
162
+ st.header("Results")
163
+ return st.container()
164
+
165
+
166
+ def render_frame_result(result_data: Dict[str, Any]):
167
+ """
168
+ Render a single frame result with ontology analysis
169
+ """
170
+ ontology = result_data['ontology_analysis']
171
+ severity_icon = ontology.get('severity_icon', '✅')
172
+ severity = ontology.get('severity', 'NONE')
173
+
174
+ # Create expander title with severity indicator
175
+ expander_title = f"{severity_icon} {severity} - Frame {result_data['frame_number']} (t={result_data['timestamp']:.1f}s)"
176
+
177
+ with st.expander(expander_title):
178
+ col_img, col_text = st.columns([1, 2])
179
+
180
+ with col_img:
181
+ st.image(
182
+ result_data['image'],
183
+ caption=f"Frame {result_data['frame_number']}",
184
+ use_container_width=True
185
+ )
186
+
187
+ with col_text:
188
+ # Display ontology analysis first if enabled
189
+ if ontology.get('ontology_used', False):
190
+ _render_ontology_analysis(ontology)
191
+ st.divider()
192
+
193
+ # Display original model results
194
+ _render_model_output(result_data['result'])
195
+
196
+
197
+ def _render_ontology_analysis(ontology: Dict[str, Any]):
198
+ """Render ontology analysis section"""
199
+ severity = ontology.get('severity', 'NONE')
200
+ severity_icon = ontology.get('severity_icon', '✅')
201
+ severity_color = ontology.get('severity_color', 'green')
202
+
203
+ # Severity display with color
204
+ st.markdown(f"**Safety Assessment:** :{severity_color}[{severity_icon} {severity}]")
205
+
206
+ # Score display
207
+ if ontology.get('score', 0) > 0:
208
+ st.metric("Risk Score", f"{ontology['score']}/100")
209
+
210
+ # Show explanations if available
211
+ if ontology.get('explanations'):
212
+ st.write("**Ontology Analysis:**")
213
+ for explanation in ontology['explanations']:
214
+ st.write(f"• {explanation}")
215
+
216
+ # Show fired rules if available
217
+ if ontology.get('fired_rules'):
218
+ with st.expander("Technical Details"):
219
+ st.write("**Triggered Rules:**")
220
+ for rule in ontology['fired_rules']:
221
+ st.code(rule)
222
+
223
+ if ontology.get('labels'):
224
+ st.write("**Detected Hazard Labels:**")
225
+ for label in ontology['labels']:
226
+ st.code(label)
227
+
228
+
229
+ def _render_model_output(result: Dict[str, Any]):
230
+ """Render original model output section"""
231
+ st.write("**Model Output:**")
232
+
233
+ if 'error' in result:
234
+ st.error(f"Error: {result['error']}")
235
+ elif 'person_on_track_detection' in result:
236
+ _render_person_detection_result(result['person_on_track_detection'])
237
+ else:
238
+ _render_general_model_result(result)
239
+
240
+
241
+ def _render_person_detection_result(detection: Dict[str, Any]):
242
+ """Render person on track detection specific results"""
243
+ people_count = detection.get('people_count', 0)
244
+ confidence = detection.get('confidence', 0)
245
+ analysis = detection.get('analysis', 'No analysis')
246
+
247
+ st.write(f"**Detection Analysis:** {analysis}")
248
+
249
+ # Show metrics
250
+ col1, col2 = st.columns(2)
251
+ with col1:
252
+ st.metric("👥 People Detected", people_count)
253
+ with col2:
254
+ st.metric("📊 Model Confidence", f"{confidence:.0%}")
255
+
256
+
257
+ def _render_general_model_result(result: Dict[str, Any]):
258
+ """Render general model results (captioning, etc.)"""
259
+ if 'generated_text' in result:
260
+ st.write(f"*{result['generated_text']}*")
261
+ elif isinstance(result, list) and len(result) > 0:
262
+ if 'generated_text' in result[0]:
263
+ st.write(f"*{result[0]['generated_text']}*")
264
+ else:
265
+ st.json(result[0])
266
+ else:
267
+ st.json(result)
268
+
269
+
270
+ def render_validation_errors(video_file, prompt, api_token, model_type, local_models_available, selected_model):
271
+ """
272
+ Render validation error messages
273
+ """
274
+ if not video_file:
275
+ st.error("Please upload a video file")
276
+ if not prompt and not (model_type == "Local Models" and selected_model == "Person on Track Detector"):
277
+ st.error("Please enter an analysis prompt")
278
+ if not api_token and model_type == "Remote API":
279
+ st.error("Please provide your Hugging Face API token for remote models")
280
+ if model_type == "Local Models" and not local_models_available:
281
+ st.error("Local models failed to initialize. Check your installation.")
282
+
283
+
284
+ def render_instructions():
285
+ """Render the instructions section"""
286
+ with st.expander("How to use"):
287
+ st.markdown("""
288
+ ## Local AI Models (Recommended)
289
+ 1. **Upload a video**: Choose a video file (MP4, AVI, MOV, or MKV)
290
+ 2. **Select model type**: Choose "Local Models" for offline processing
291
+ 3. **Choose AI model**:
292
+ - **CNN (BLIP)**: Fast, good for object detection (~1.2GB)
293
+ - **Transformer (ViT-GPT2)**: Detailed descriptions (~1.8GB)
294
+ 4. **Enter a prompt**: Describe what you want the AI to analyze
295
+ 5. **Enable/Disable Ontology**: Toggle ontology-based risk assessment
296
+ 6. **Adjust frame rate**: Set frames per second to extract (default: 1 fps)
297
+ 7. **Click Process**: Frames are processed locally on your machine
298
+
299
+ ## Ontology Analysis
300
+ - **✅ NONE**: No safety concerns detected
301
+ - **🟢 LOW**: Minor safety considerations
302
+ - **🟠 MEDIUM**: Moderate safety risk
303
+ - **⚠️ HIGH**: Significant safety risk
304
+ - **🚨 CRITICAL**: Immediate safety hazard
305
+
306
+ ## Remote API Models (Optional)
307
+ 1. **Get API token**: Visit [Hugging Face Settings](https://huggingface.co/settings/tokens)
308
+ 2. **Select "Remote API"** in model type
309
+ 3. **Enter token** and select remote model
310
+
311
+ ## Video Support Features
312
+ - **Automatic corruption repair**: Handles videos with corrupted moov atoms
313
+ - **FFmpeg integration**: Auto-repairs problematic video files
314
+ - **Multiple formats**: MP4, AVI, MOV, MKV support
315
+
316
+ ## Requirements
317
+ - **Python packages**: torch, transformers, accelerate (see requirements.txt)
318
+ - **Optional**: FFmpeg for video repair (download from https://ffmpeg.org)
319
+ - **Storage**: ~3GB for both local models
320
+ """)
video_processing.py ADDED
@@ -0,0 +1,112 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Video processing utilities for frame extraction and repair
4
+ """
5
+ import cv2
6
+ import os
7
+ import tempfile
8
+ import subprocess
9
+ import streamlit as st
10
+ from PIL import Image
11
+ from typing import List, Dict
12
+
13
+
14
+ def repair_video_with_ffmpeg(input_path: str, output_path: str) -> bool:
15
+ """
16
+ Repair corrupted video by moving moov atom to the beginning
17
+ """
18
+ try:
19
+ # Try to fix the video using FFmpeg
20
+ cmd = [
21
+ 'ffmpeg',
22
+ '-i', input_path,
23
+ '-c', 'copy',
24
+ '-movflags', 'faststart',
25
+ '-avoid_negative_ts', 'make_zero',
26
+ '-y', # Overwrite output file
27
+ output_path
28
+ ]
29
+
30
+ result = subprocess.run(
31
+ cmd,
32
+ capture_output=True,
33
+ text=True,
34
+ timeout=300 # 5 minute timeout
35
+ )
36
+
37
+ return result.returncode == 0
38
+ except (subprocess.TimeoutExpired, FileNotFoundError):
39
+ return False
40
+
41
+
42
+ def extract_frames_from_video(video_file, fps: float = 1) -> List[Dict]:
43
+ """
44
+ Extract frames from video at specified FPS (default 1 frame per second)
45
+ Automatically handles corrupted videos by attempting repair with FFmpeg
46
+ """
47
+ frames = []
48
+
49
+ with tempfile.NamedTemporaryFile(delete=False, suffix='.mp4') as tmp_file:
50
+ tmp_file.write(video_file.read())
51
+ tmp_file_path = tmp_file.name
52
+
53
+ repaired_path = None
54
+
55
+ try:
56
+ # First attempt: try to open video directly
57
+ cap = cv2.VideoCapture(tmp_file_path)
58
+
59
+ # Check if video opened successfully and has frames
60
+ if not cap.isOpened() or cap.get(cv2.CAP_PROP_FRAME_COUNT) == 0:
61
+ cap.release()
62
+
63
+ # Second attempt: try to repair the video with FFmpeg
64
+ st.warning("Video appears corrupted (moov atom issue). Attempting repair...")
65
+
66
+ with tempfile.NamedTemporaryFile(delete=False, suffix='_repaired.mp4') as repaired_file:
67
+ repaired_path = repaired_file.name
68
+
69
+ if repair_video_with_ffmpeg(tmp_file_path, repaired_path):
70
+ st.success("Video repair successful! Processing frames...")
71
+ cap = cv2.VideoCapture(repaired_path)
72
+ else:
73
+ st.error("Failed to repair video. FFmpeg may not be installed or video is severely corrupted.")
74
+ return frames
75
+
76
+ # Extract video properties
77
+ video_fps = cap.get(cv2.CAP_PROP_FPS)
78
+ if video_fps <= 0:
79
+ video_fps = 30 # Default fallback FPS
80
+
81
+ frame_interval = int(video_fps / fps) if video_fps > fps else 1
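+ # e.g. a 30 fps video sampled at the default 1 fps keeps every 30th frame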
82
+
83
+ frame_count = 0
84
+ extracted_count = 0
85
+
86
+ while True:
87
+ ret, frame = cap.read()
88
+ if not ret:
89
+ break
90
+
91
+ if frame_count % frame_interval == 0:
92
+ frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
93
+ pil_image = Image.fromarray(frame_rgb)
94
+ frames.append({
95
+ 'frame': pil_image,
96
+ 'timestamp': frame_count / video_fps,
97
+ 'frame_number': extracted_count
98
+ })
99
+ extracted_count += 1
100
+
101
+ frame_count += 1
102
+
103
+ cap.release()
104
+
105
+ finally:
106
+ # Clean up temporary files
107
+ if os.path.exists(tmp_file_path):
108
+ os.unlink(tmp_file_path)
109
+ if repaired_path and os.path.exists(repaired_path):
110
+ os.unlink(repaired_path)
111
+
112
+ return frames