dxfoso committed
Commit d33203e · Parent: 2aab908

refactor & connect ontology

REFACTORED_STRUCTURE.md ADDED
@@ -0,0 +1,110 @@
+ # 📁 Refactored Code Structure
+
+ The application has been refactored into modular components for better maintainability and understanding.
+
+ ## 🗂️ File Structure
+
+ ```
+ 📦 Bahngleiserfassung/
+ ├── 🎯 app.py                     # Main Streamlit application (refactored)
+ ├── 📹 video_processing.py        # Video frame extraction and repair utilities
+ ├── 🧠 ontology_integration.py    # Ontology-based scene analysis and risk assessment
+ ├── 🤖 model_processing.py        # Local and remote AI model processing
+ ├── 🖥️ ui_components.py           # Streamlit UI components and rendering
+ ├── 🧮 ontology_eval.py           # Core ontology evaluation logic (unchanged)
+ ├── 🔬 local_models.py            # Local AI models (ViT, BLIP) (unchanged)
+ └── 💾 app_original_backup.py     # Backup of the original monolithic app.py
+ ```
+
+ ## 📋 Module Responsibilities
+
+ ### 🎯 `app.py` - Main Application
+ - **Purpose**: Main entry point and orchestration
+ - **Functions**:
+   - Application initialization and layout
+   - Model setup and configuration
+   - Main processing workflow coordination
+   - Input validation and error handling
+
+ ### 📹 `video_processing.py` - Video Processing
+ - **Purpose**: Video frame extraction and repair
+ - **Functions**:
+   - `extract_frames_from_video()` - Extract frames at a specified FPS
+   - `repair_video_with_ffmpeg()` - Repair corrupted video files
+   - Handles common video formats (MP4, AVI, MOV, MKV)
+
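For illustration, a minimal usage sketch of the frame extractor. The file name is a placeholder; the returned keys (`frame`, `timestamp`, `frame_number`) are the ones `extract_frames_from_video()` populates in this commit.

```python
# Sketch: extract roughly one frame per second from a video file.
# Streamlit's uploader hands this function the same kind of binary
# file-like object; "example.mp4" is only an illustrative path.
from video_processing import extract_frames_from_video

with open("example.mp4", "rb") as video_file:
    frames = extract_frames_from_video(video_file, fps=1)

for frame_data in frames:
    # Each entry carries a PIL image plus its position in the source video.
    print(frame_data["frame_number"],
          f"{frame_data['timestamp']:.1f}s",
          frame_data["frame"].size)
```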
+ ### 🧠 `ontology_integration.py` - Ontology Analysis
+ - **Purpose**: Scene analysis using ontology-based risk assessment
+ - **Functions**:
+   - `analyze_scene_with_ontology()` - Main ontology analysis function
+   - `_extract_ontology_features()` - Extract features from scene descriptions
+   - `_calculate_person_on_track_confidence()` - Calculate specific risk confidence
+   - `extract_scene_description()` - Extract text from model results
+
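A hedged sketch of the analysis entry point, assuming the refactored module keeps the signature and return keys of the original `analyze_scene_with_ontology()` (severity levels NONE/LOW/MEDIUM/HIGH/CRITICAL plus `score` and `explanations`); the caption text is invented:

```python
# Sketch: classify a caption produced by one of the vision models.
from ontology_integration import analyze_scene_with_ontology

caption = "a person standing on the tracks near the platform"  # example text
analysis = analyze_scene_with_ontology(caption, True)  # second arg toggles the ontology rules

print(analysis["severity"], analysis["severity_icon"], analysis["score"])
for explanation in analysis.get("explanations", []):
    print("-", explanation)
```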
+ ### 🤖 `model_processing.py` - Model Processing
+ - **Purpose**: Handle local and remote AI model processing
+ - **Functions**:
+   - `process_image_locally()` - Process images using local models
+   - `query_huggingface_api()` - Process images using the remote HF API
+   - `process_frame()` - Unified frame processing interface
+   - `image_to_base64()` - Image conversion utilities
+
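To show how these functions meet per frame, here is a condensed sketch of the loop that `process_video_frames()` in the new `app.py` runs; the config values are placeholders standing in for what the sidebar returns:

```python
# Sketch of the per-frame pipeline wired together in app.py.
from model_processing import process_frame
from ontology_integration import analyze_scene_with_ontology, extract_scene_description

config = {                                   # assumed sidebar configuration
    "model_type": "Local Models",
    "selected_model": "Person on Track Detector",
    "api_token": None,
    "prompt": "automatic",
    "use_ontology": True,
}

def analyze_frame(frame_data, local_manager):
    # 1. Run the selected model (local or remote) on the extracted frame.
    result = process_frame(frame_data, config, local_manager)
    # 2. Pull a plain-text scene description out of the model output.
    scene_description = extract_scene_description(result)
    # 3. Classify the description with the ontology rules.
    return analyze_scene_with_ontology(scene_description, config["use_ontology"])
```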
+ ### 🖥️ `ui_components.py` - UI Components
+ - **Purpose**: Streamlit UI components and rendering
+ - **Functions**:
+   - `render_sidebar_config()` - Configuration sidebar
+   - `render_input_section()` - Video upload interface
+   - `render_frame_result()` - Display frame analysis results
+   - `render_validation_errors()` - Show validation messages
+   - Various helper rendering functions
+
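The dictionary returned by `render_sidebar_config()` is the glue between the UI and the processing modules. The keys below are the ones read by `validate_inputs()` and `process_video_frames()` in the new `app.py`; the concrete values are illustrative only:

```python
# Sketch: shape of the sidebar configuration consumed downstream.
config = {
    "model_type": "Remote API",          # "Local Models" or "Remote API"
    "selected_model": "Salesforce/blip-image-captioning-large",
    "api_token": "hf_...",               # placeholder; only needed for the remote API
    "fps": 1.0,                          # frames per second to extract
    "use_ontology": True,                # toggle ontology classification
}
# main() adds the analysis prompt before processing starts:
config["prompt"] = "Describe what you see in this image"
```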
+ ## 🔄 Data Flow
+
+ ```mermaid
+ graph TD
+     A[app.py] --> B[ui_components.py]
+     A --> C[video_processing.py]
+     A --> D[model_processing.py]
+     A --> E[ontology_integration.py]
+
+     C --> F[Extract Frames]
+     D --> G[Process with AI Models]
+     E --> H[Ontology Risk Assessment]
+
+     F --> G
+     G --> H
+     H --> B
+
+     I[local_models.py] --> D
+     J[ontology_eval.py] --> E
+ ```
+
+ ## ✨ Benefits of Refactoring
+
+ 1. **🧩 Modularity**: Each module has a single responsibility
+ 2. **🔧 Maintainability**: Easier to update and debug individual components
+ 3. **📚 Readability**: Clear separation of concerns and smaller, focused files
+ 4. **🧪 Testability**: Each module can be tested independently
+ 5. **🔄 Reusability**: Components can be reused in other projects
+ 6. **👥 Collaboration**: Multiple developers can work on different modules
+
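To make the testability point concrete, an illustrative unit test that exercises `ontology_integration` in isolation. It assumes the refactored module preserves the original behavior: with the ontology disabled the analysis reports severity `NONE`, and an all-clear description does not score as critical.

```python
# test_ontology_integration.py -- illustrative sketch, not part of this commit.
from ontology_integration import analyze_scene_with_ontology

def test_disabled_ontology_reports_none():
    analysis = analyze_scene_with_ontology("a person on the tracks", False)
    assert analysis["ontology_used"] is False
    assert analysis["severity"] == "NONE"

def test_clear_scene_is_not_critical():
    analysis = analyze_scene_with_ontology("an empty platform, no people", True)
    assert analysis["severity"] != "CRITICAL"
```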
+ ## 🚀 Usage
+
+ The refactored application works exactly the same as before:
+
+ ```bash
+ streamlit run app.py
+ ```
+
+ All functionality remains identical:
+ - ✅ NONE / 🟢 LOW / 🟠 MEDIUM / ⚠️ HIGH / 🚨 CRITICAL classification
+ - Toggle ontology analysis on/off
+ - Support for local and remote AI models
+ - Video processing with automatic repair
+
+ ## 🔒 Backwards Compatibility
+
+ - Original functionality is preserved
+ - API and interface remain unchanged
+ - Configuration and settings work the same way
+ - The original monolithic code is backed up as `app_original_backup.py`
app.py CHANGED
@@ -1,15 +1,27 @@
1
  import streamlit as st
2
- import cv2
3
- import os
4
- import tempfile
5
- import requests
6
- import base64
7
- import subprocess
8
  import json
9
- from io import BytesIO
10
- from PIL import Image
11
- import numpy as np
12
  from dotenv import load_dotenv
 
 
 
 
13
  # Try to import local models, fall back gracefully if not available
14
  try:
15
  from local_models import get_local_model_manager
@@ -23,6 +35,7 @@ except ImportError as e:
23
  # Load environment variables
24
  load_dotenv()
25
 
 
26
  def load_settings():
27
  """Load settings from JSON file"""
28
  try:
@@ -31,199 +44,31 @@ def load_settings():
31
  except FileNotFoundError:
32
  return {}
33
 
34
- # Local models configuration
35
- LOCAL_MODELS_ENABLED = LOCAL_MODELS_AVAILABLE
36
- REMOTE_MODELS_ENABLED = True # Always allow remote API as fallback
37
 
38
- # Initialize local model manager
39
  @st.cache_resource
40
  def initialize_local_models():
41
  """Initialize local model manager"""
42
  return get_local_model_manager()
43
 
44
- # Hugging Face models for vision-language tasks (kept for compatibility)
45
- AVAILABLE_MODELS = {
46
- "microsoft/kosmos-2-patch14-224": "Kosmos-2",
47
- "Salesforce/blip-image-captioning-large": "BLIP Image Captioning",
48
- "microsoft/DialoGPT-medium": "DialoGPT",
49
- "microsoft/git-large-coco": "GIT Large COCO",
50
- "nlpconnect/vit-gpt2-image-captioning": "ViT-GPT2"
51
- }
52
-
53
- def repair_video_with_ffmpeg(input_path, output_path):
54
- """
55
- Repair corrupted video by moving moov atom to the beginning
56
- """
57
- try:
58
- # Try to fix the video using FFmpeg
59
- cmd = [
60
- 'ffmpeg',
61
- '-i', input_path,
62
- '-c', 'copy',
63
- '-movflags', 'faststart',
64
- '-avoid_negative_ts', 'make_zero',
65
- '-y', # Overwrite output file
66
- output_path
67
- ]
68
-
69
- result = subprocess.run(
70
- cmd,
71
- capture_output=True,
72
- text=True,
73
- timeout=300 # 5 minute timeout
74
- )
75
-
76
- return result.returncode == 0
77
- except (subprocess.TimeoutExpired, FileNotFoundError):
78
- return False
79
-
80
- def extract_frames_from_video(video_file, fps=1):
81
- """
82
- Extract frames from video at specified FPS (default 1 frame per second)
83
- Automatically handles corrupted videos by attempting repair with FFmpeg
84
- """
85
- frames = []
86
-
87
- with tempfile.NamedTemporaryFile(delete=False, suffix='.mp4') as tmp_file:
88
- tmp_file.write(video_file.read())
89
- tmp_file_path = tmp_file.name
90
-
91
- repaired_path = None
92
-
93
- try:
94
- # First attempt: try to open video directly
95
- cap = cv2.VideoCapture(tmp_file_path)
96
-
97
- # Check if video opened successfully and has frames
98
- if not cap.isOpened() or cap.get(cv2.CAP_PROP_FRAME_COUNT) == 0:
99
- cap.release()
100
-
101
- # Second attempt: try to repair the video with FFmpeg
102
- st.warning("Video appears corrupted (moov atom issue). Attempting repair...")
103
-
104
- with tempfile.NamedTemporaryFile(delete=False, suffix='_repaired.mp4') as repaired_file:
105
- repaired_path = repaired_file.name
106
-
107
- if repair_video_with_ffmpeg(tmp_file_path, repaired_path):
108
- st.success("Video repair successful! Processing frames...")
109
- cap = cv2.VideoCapture(repaired_path)
110
- else:
111
- st.error("Failed to repair video. FFmpeg may not be installed or video is severely corrupted.")
112
- return frames
113
-
114
- # Extract video properties
115
- video_fps = cap.get(cv2.CAP_PROP_FPS)
116
- if video_fps <= 0:
117
- video_fps = 30 # Default fallback FPS
118
-
119
- frame_interval = int(video_fps / fps) if video_fps > fps else 1
120
-
121
- frame_count = 0
122
- extracted_count = 0
123
-
124
- while True:
125
- ret, frame = cap.read()
126
- if not ret:
127
- break
128
-
129
- if frame_count % frame_interval == 0:
130
- frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
131
- pil_image = Image.fromarray(frame_rgb)
132
- frames.append({
133
- 'frame': pil_image,
134
- 'timestamp': frame_count / video_fps,
135
- 'frame_number': extracted_count
136
- })
137
- extracted_count += 1
138
-
139
- frame_count += 1
140
-
141
- cap.release()
142
-
143
- finally:
144
- # Clean up temporary files
145
- if os.path.exists(tmp_file_path):
146
- os.unlink(tmp_file_path)
147
- if repaired_path and os.path.exists(repaired_path):
148
- os.unlink(repaired_path)
149
-
150
- return frames
151
-
152
- def image_to_base64(image):
153
- """Convert PIL image to base64 string"""
154
- buffer = BytesIO()
155
- image.save(buffer, format="PNG")
156
- img_str = base64.b64encode(buffer.getvalue()).decode()
157
- return img_str
158
-
159
- def process_image_locally(image, prompt, model_name, local_manager):
160
- """
161
- Process image using local models
162
- """
163
- try:
164
- if model_name == "Person on Track Detector":
165
- # Special handling for person-on-track detection
166
- result = local_manager.person_on_track_detector.detect_person_on_track(image)
167
- return {"person_on_track_detection": result}
168
- else:
169
- caption = local_manager.generate_caption(model_name, image, prompt)
170
- return {"generated_text": caption}
171
- except Exception as e:
172
- return {"error": f"Local processing failed: {str(e)}"}
173
-
174
- def query_huggingface_api(image, prompt, model_name, api_token):
175
- """
176
- Query Hugging Face API with image and prompt
177
- """
178
- API_URL = f"https://api-inference.huggingface.co/models/{model_name}"
179
- headers = {"Authorization": f"Bearer {api_token}"}
180
-
181
- # Convert image to base64
182
- img_base64 = image_to_base64(image)
183
-
184
- # Prepare payload based on model type
185
- if "blip" in model_name.lower():
186
- # For BLIP models, send image directly
187
- buffer = BytesIO()
188
- image.save(buffer, format="PNG")
189
- response = requests.post(
190
- API_URL,
191
- headers=headers,
192
- files={"file": buffer.getvalue()}
193
- )
194
- else:
195
- # For other vision-language models
196
- payload = {
197
- "inputs": {
198
- "image": img_base64,
199
- "text": prompt
200
- }
201
- }
202
- response = requests.post(API_URL, headers=headers, json=payload)
203
-
204
- if response.status_code == 200:
205
- return response.json()
206
- else:
207
- return {"error": f"API request failed: {response.status_code} - {response.text}"}
208
 
209
- def main():
 
210
  st.set_page_config(
211
- page_title="Video Frame Analyzer",
212
  page_icon="🎥",
213
  layout="wide"
214
  )
215
 
216
- st.title("🎥 Video Frame Analyzer with Local AI Models")
217
- st.markdown("Upload a video, provide a prompt, and analyze each frame using local AI models (CNN or Transformer)")
218
-
219
- # Load settings and initialize local models
220
- settings = load_settings()
221
-
222
- # Initialize local models if enabled
223
  local_manager = None
224
  local_models_available = False
225
 
226
- if LOCAL_MODELS_ENABLED:
227
  try:
228
  local_manager = initialize_local_models()
229
  local_models_available = True
@@ -235,245 +80,157 @@ def main():
235
  else:
236
  st.info("💡 Local AI models not installed. Install with: `pip install torch torchvision transformers accelerate sentencepiece`")
237
 
238
- # Sidebar for configuration
239
- with st.sidebar:
240
- st.header("Configuration")
241
-
242
- # Model type selection
243
- available_options = []
244
- if local_models_available:
245
- available_options.append("Local Models")
246
- if REMOTE_MODELS_ENABLED:
247
- available_options.append("Remote API")
248
-
249
- if not available_options:
250
- available_options = ["Remote API"] # Fallback
251
-
252
- model_type = st.radio(
253
- "Model Type",
254
- available_options,
255
- help="Choose between local AI models or remote Hugging Face API"
256
- )
257
-
258
- if model_type == "Local Models" and local_models_available:
259
- # Local model selection
260
- available_local_models = local_manager.get_available_models()
261
- selected_model = st.selectbox(
262
- "Select Local Model",
263
- options=available_local_models,
264
- help="Choose between CNN (fast) or Transformer (detailed) models"
265
- )
266
 
267
- # Show model info
268
- model_info = local_manager.get_model_info()
269
- if selected_model in model_info:
270
- with st.expander("Model Information"):
271
- st.write(f"**Description:** {model_info[selected_model]['description']}")
272
- st.write(f"**Strengths:** {model_info[selected_model]['strengths']}")
273
- st.write(f"**Size:** {model_info[selected_model]['size']}")
274
 
275
- api_token = None # Not needed for local models
 
276
 
277
- else:
278
- # Remote API configuration
279
- default_token = settings.get('hugging_face_api_token', '')
280
- api_token = st.text_input(
281
- "Hugging Face API Token",
282
- value=default_token,
283
- type="password",
284
- help="Get your token from https://huggingface.co/settings/tokens or save in settings.json"
285
- )
286
 
287
- # Remote model selection
288
- selected_model = st.selectbox(
289
- "Select Model",
290
- options=list(AVAILABLE_MODELS.keys()),
291
- format_func=lambda x: AVAILABLE_MODELS[x]
292
- )
293
-
294
- # Frame extraction rate
295
- fps = st.slider(
296
- "Frames per second to extract",
297
- min_value=0.1,
298
- max_value=5.0,
299
- value=1.0,
300
- step=0.1
301
- )
 
 
 
 
302
 
303
- # Main content area
 
 
 
 
304
  col1, col2 = st.columns([1, 1])
305
 
306
  with col1:
307
- st.header("Input")
 
308
 
309
- # Video upload
310
- video_file = st.file_uploader(
311
- "Upload Video",
312
- type=['mp4', 'avi', 'mov', 'mkv'],
313
- help="Upload a video file to analyze"
314
- )
315
 
316
- # Prompt input (conditional based on model)
317
- if model_type == "Local Models" and local_models_available and selected_model == "Person on Track Detector":
318
- # Person on Track Detector works automatically
319
- st.info("🤖 Person on Track Detector works automatically - no prompt needed!")
320
- prompt = "automatic" # Set automatic prompt
321
- else:
322
- # Regular models need user prompt
323
- prompt = st.text_area(
324
- "Analysis Prompt",
325
- placeholder="Describe what you see in the image...",
326
- help="Enter the prompt to analyze each frame"
327
- )
328
 
329
- # Process button
330
- process_button = st.button("Process Video", type="primary")
331
 
332
  with col2:
333
- st.header("Results")
334
- results_container = st.container()
335
-
336
- # Processing logic
337
- if process_button and video_file and (prompt or (model_type == "Local Models" and selected_model == "Person on Track Detector")) and (api_token or model_type == "Local Models"):
338
- with st.spinner("Processing video..."):
339
- # Extract frames
340
- frames = extract_frames_from_video(video_file, fps)
341
-
342
- if not frames:
343
- st.error("No frames could be extracted from the video")
344
- return
345
-
346
- st.success(f"Extracted {len(frames)} frames from video")
347
-
348
- # Process each frame
349
- results = []
350
- progress_bar = st.progress(0)
351
-
352
- for i, frame_data in enumerate(frames):
353
- with st.spinner(f"Analyzing frame {i+1}/{len(frames)}..."):
354
- # Process frame based on model type
355
- if model_type == "Local Models" and local_models_available:
356
- result = process_image_locally(
357
- frame_data['frame'],
358
- prompt,
359
- selected_model,
360
- local_manager
361
- )
362
- else:
363
- result = query_huggingface_api(
364
- frame_data['frame'],
365
- prompt,
366
- selected_model,
367
- api_token
368
- )
369
-
370
- results.append({
371
- 'frame_number': frame_data['frame_number'],
372
- 'timestamp': frame_data['timestamp'],
373
- 'image': frame_data['frame'],
374
- 'result': result
375
- })
376
-
377
- progress_bar.progress((i + 1) / len(frames))
378
-
379
- # Display results
380
- with results_container:
381
- st.subheader("Analysis Results")
382
 
383
- for result_data in results:
384
- with st.expander(f"Frame {result_data['frame_number']} (t={result_data['timestamp']:.1f}s)"):
385
- col_img, col_text = st.columns([1, 2])
 
386
 
387
- with col_img:
388
- st.image(
389
- result_data['image'],
390
- caption=f"Frame {result_data['frame_number']}",
391
- use_container_width=True
392
- )
393
 
394
- with col_text:
395
- if 'error' in result_data['result']:
396
- st.error(f"Error: {result_data['result']['error']}")
397
- elif 'person_on_track_detection' in result_data['result']:
398
- # Handle person-on-track detection results
399
- detection = result_data['result']['person_on_track_detection']
400
-
401
- people_count = detection.get('people_count', 0)
402
- confidence = detection.get('confidence', 0)
403
- analysis = detection.get('analysis', 'No analysis')
404
- person_on_track = detection.get('person_on_track', False)
405
-
406
- # Display analysis with color coding
407
- if person_on_track:
408
- st.error(f"🚨 **{analysis}**")
409
- else:
410
- st.success(f"✅ **{analysis}**")
411
-
412
- # Show metrics
413
- col1, col2 = st.columns(2)
414
- with col1:
415
- st.metric("👥 People on Track", people_count)
416
- with col2:
417
- st.metric("📊 Confidence", f"{confidence:.0%}")
418
- else:
419
- st.write("**Analysis Result:**")
420
- if 'generated_text' in result_data['result']:
421
- # Handle direct generated_text response (local models)
422
- st.write(result_data['result']['generated_text'])
423
- elif isinstance(result_data['result'], list) and len(result_data['result']) > 0:
424
- # Handle list responses (common for captioning models)
425
- if 'generated_text' in result_data['result'][0]:
426
- st.write(result_data['result'][0]['generated_text'])
427
- else:
428
- st.json(result_data['result'][0])
429
- else:
430
- st.json(result_data['result'])
431
-
432
- elif process_button:
433
- if not video_file:
434
- st.error("Please upload a video file")
435
- if not prompt and not (model_type == "Local Models" and selected_model == "Person on Track Detector"):
436
- st.error("Please enter an analysis prompt")
437
- if not api_token and model_type == "Remote API":
438
- st.error("Please provide your Hugging Face API token for remote models")
439
- if model_type == "Local Models" and not local_models_available:
440
- st.error("Local models failed to initialize. Check your installation.")
441
 
442
- # Instructions
443
- with st.expander("How to use"):
444
- st.markdown("""
445
- ## Local AI Models (Recommended)
446
- 1. **Upload a video**: Choose a video file (MP4, AVI, MOV, or MKV)
447
- 2. **Select model type**: Choose "Local Models" for offline processing
448
- 3. **Choose AI model**:
449
- - **CNN (BLIP)**: Fast, good for object detection (~1.2GB)
450
- - **Transformer (ViT-GPT2)**: Detailed descriptions (~1.8GB)
451
- 4. **Enter a prompt**: Describe what you want the AI to analyze
452
- 5. **Adjust frame rate**: Set frames per second to extract (default: 1 fps)
453
- 6. **Click Process**: Frames are processed locally on your machine
454
-
455
- ## Remote API Models (Optional)
456
- 1. **Get API token**: Visit [Hugging Face Settings](https://huggingface.co/settings/tokens)
457
- 2. **Select "Remote API"** in model type
458
- 3. **Enter token** and select remote model
459
-
460
- ## Video Support Features
461
- - **Automatic corruption repair**: Handles videos with corrupted moov atoms
462
- - **FFmpeg integration**: Auto-repairs problematic video files
463
- - **Multiple formats**: MP4, AVI, MOV, MKV support
464
-
465
- ## Requirements
466
- - **Python packages**: torch, transformers, accelerate (see requirements.txt)
467
- - **Optional**: FFmpeg for video repair (download from https://ffmpeg.org)
468
- - **Storage**: ~3GB for both local models
469
-
470
- ## Example Prompts
471
- - "Describe what you see in this image"
472
- - "Count the number of people in this scene"
473
- - "What objects are visible in this frame?"
474
- - "Describe the emotions and actions in this scene"
475
- - "What is the main activity happening here?"
476
- """)
477
 
478
  if __name__ == "__main__":
479
  main()
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Main Streamlit application for video frame analysis with ontology-based risk assessment
4
+ Refactored for better code organization and maintainability
5
+ """
6
  import streamlit as st
 
 
 
 
 
 
7
  import json
 
 
 
8
  from dotenv import load_dotenv
9
+
10
+ # Import our modular components
11
+ from video_processing import extract_frames_from_video
12
+ from ontology_integration import analyze_scene_with_ontology, extract_scene_description
13
+ from model_processing import process_frame
14
+ from ui_components import (
15
+ render_sidebar_config,
16
+ render_input_section,
17
+ render_prompt_section,
18
+ render_process_button,
19
+ render_results_header,
20
+ render_frame_result,
21
+ render_validation_errors,
22
+ render_instructions
23
+ )
24
+
25
  # Try to import local models, fall back gracefully if not available
26
  try:
27
  from local_models import get_local_model_manager
 
35
  # Load environment variables
36
  load_dotenv()
37
 
38
+
39
  def load_settings():
40
  """Load settings from JSON file"""
41
  try:
 
44
  except FileNotFoundError:
45
  return {}
46
 
 
 
 
47
 
 
48
  @st.cache_resource
49
  def initialize_local_models():
50
  """Initialize local model manager"""
51
  return get_local_model_manager()
52
 
 
 
 
 
 
53
 
54
+ def initialize_app():
55
+ """Initialize the Streamlit application"""
56
  st.set_page_config(
57
+ page_title="Video Frame Analyzer with Ontology",
58
  page_icon="🎥",
59
  layout="wide"
60
  )
61
 
62
+ st.title("🎥 Video Frame Analyzer with Ontology-Based Risk Assessment")
63
+ st.markdown("Upload a video and analyze frames using AI models with ontology-based safety classification")
64
+
65
+
66
+ def setup_local_models():
67
+ """Setup local models and return availability status"""
 
68
  local_manager = None
69
  local_models_available = False
70
 
71
+ if LOCAL_MODELS_AVAILABLE:
72
  try:
73
  local_manager = initialize_local_models()
74
  local_models_available = True
 
80
  else:
81
  st.info("💡 Local AI models not installed. Install with: `pip install torch torchvision transformers accelerate sentencepiece`")
82
 
83
+ return local_manager, local_models_available
84
+
85
+
86
+ def process_video_frames(video_file, config, local_manager=None):
87
+ """
88
+ Process all frames in the video and return results
89
+ """
90
+ # Extract frames
91
+ frames = extract_frames_from_video(video_file, config["fps"])
92
+
93
+ if not frames:
94
+ st.error("No frames could be extracted from the video")
95
+ return []
96
+
97
+ st.success(f"Extracted {len(frames)} frames from video")
98
+
99
+ # Process each frame
100
+ results = []
101
+ progress_bar = st.progress(0)
102
+
103
+ # Add prompt to config for processing
104
+ processing_config = config.copy()
105
+ processing_config["prompt"] = config.get("prompt", "")
106
+
107
+ for i, frame_data in enumerate(frames):
108
+ with st.spinner(f"Analyzing frame {i+1}/{len(frames)}..."):
109
+ # Process frame with selected model
110
+ result = process_frame(frame_data, processing_config, local_manager)
111
 
112
+ # Extract scene description for ontology analysis
113
+ scene_description = extract_scene_description(result)
 
 
 
 
 
114
 
115
+ # Apply ontology analysis
116
+ ontology_analysis = analyze_scene_with_ontology(scene_description, config["use_ontology"])
117
 
118
+ results.append({
119
+ 'frame_number': frame_data['frame_number'],
120
+ 'timestamp': frame_data['timestamp'],
121
+ 'image': frame_data['frame'],
122
+ 'result': result,
123
+ 'ontology_analysis': ontology_analysis
124
+ })
 
 
125
 
126
+ progress_bar.progress((i + 1) / len(frames))
127
+
128
+ return results
129
+
130
+
131
+ def validate_inputs(video_file, prompt, config, local_models_available):
132
+ """
133
+ Validate all required inputs
134
+ """
135
+ model_type = config["model_type"]
136
+ selected_model = config["selected_model"]
137
+ api_token = config["api_token"]
138
+
139
+ # Check basic requirements
140
+ if not video_file:
141
+ return False
142
+
143
+ # Check prompt requirements
144
+ if not prompt and not (model_type == "Local Models" and selected_model == "Person on Track Detector"):
145
+ return False
146
+
147
+ # Check API token for remote models
148
+ if not api_token and model_type == "Remote API":
149
+ return False
150
+
151
+ # Check local models availability
152
+ if model_type == "Local Models" and not local_models_available:
153
+ return False
154
+
155
+ return True
156
+
157
+
158
+ def main():
159
+ """Main application entry point"""
160
+ # Initialize application
161
+ initialize_app()
162
 
163
+ # Load settings and setup models
164
+ settings = load_settings()
165
+ local_manager, local_models_available = setup_local_models()
166
+
167
+ # Create main layout
168
  col1, col2 = st.columns([1, 1])
169
 
170
  with col1:
171
+ # Render sidebar configuration
172
+ config = render_sidebar_config(settings, local_models_available, local_manager)
173
 
174
+ # Render input section
175
+ input_data = render_input_section()
176
+ video_file = input_data["video_file"]
 
 
 
177
 
178
+ # Render prompt section
179
+ prompt = render_prompt_section(config)
 
 
 
 
180
 
181
+ # Render process button
182
+ process_button = render_process_button()
183
 
184
  with col2:
185
+ # Render results section
186
+ results_container = render_results_header()
187
+
188
+ # Main processing logic
189
+ if process_button:
190
+ if validate_inputs(video_file, prompt, config, local_models_available):
191
+ # Add prompt to config for processing
192
+ config["prompt"] = prompt
193
+
194
+ with st.spinner("Processing video..."):
195
+ # Process video frames
196
+ results = process_video_frames(video_file, config, local_manager)
 
 
 
 
 
197
 
198
+ # Display results
199
+ if results:
200
+ with results_container:
201
+ st.subheader("Analysis Results")
202
 
203
+ # Display summary statistics
204
+ severity_counts = {}
205
+ for result in results:
206
+ severity = result['ontology_analysis'].get('severity', 'NONE')
207
+ severity_counts[severity] = severity_counts.get(severity, 0) + 1
 
208
 
209
+ if config["use_ontology"] and severity_counts:
210
+ st.write("**Summary:**")
211
+ summary_cols = st.columns(len(severity_counts))
212
+ for i, (severity, count) in enumerate(severity_counts.items()):
213
+ icon_map = {
214
+ 'NONE': '✅', 'LOW': '🟢', 'MEDIUM': '🟠',
215
+ 'HIGH': '⚠️', 'CRITICAL': '🚨'
216
+ }
217
+ with summary_cols[i]:
218
+ st.metric(f"{icon_map.get(severity, '')} {severity}", count)
219
+ st.divider()
220
+
221
+ # Display individual frame results
222
+ for result_data in results:
223
+ render_frame_result(result_data)
224
+ else:
225
+ # Show validation errors
226
+ render_validation_errors(
227
+ video_file, prompt, config["api_token"],
228
+ config["model_type"], local_models_available, config["selected_model"]
229
+ )
 
 
 
 
230
 
231
+ # Render instructions
232
+ render_instructions()
233
+
 
 
 
 
234
 
235
  if __name__ == "__main__":
236
  main()
app_original_backup.py ADDED
@@ -0,0 +1,640 @@
 
 
 
1
+ import streamlit as st
2
+ import cv2
3
+ import os
4
+ import tempfile
5
+ import requests
6
+ import base64
7
+ import subprocess
8
+ import json
9
+ from io import BytesIO
10
+ from PIL import Image
11
+ import numpy as np
12
+ from dotenv import load_dotenv
13
+ from ontology_eval import Observation, evaluate, OntologyContext, decision_to_triples, triples_to_turtle, Severity
14
+ # Try to import local models, fall back gracefully if not available
15
+ try:
16
+ from local_models import get_local_model_manager
17
+ LOCAL_MODELS_AVAILABLE = True
18
+ except ImportError as e:
19
+ LOCAL_MODELS_AVAILABLE = False
20
+ print(f"Local models not available: {e}")
21
+ def get_local_model_manager():
22
+ return None
23
+
24
+ # Load environment variables
25
+ load_dotenv()
26
+
27
+ def load_settings():
28
+ """Load settings from JSON file"""
29
+ try:
30
+ with open('settings.json', 'r') as f:
31
+ return json.load(f)
32
+ except FileNotFoundError:
33
+ return {}
34
+
35
+ # Local models configuration
36
+ LOCAL_MODELS_ENABLED = LOCAL_MODELS_AVAILABLE
37
+ REMOTE_MODELS_ENABLED = True # Always allow remote API as fallback
38
+
39
+ # Initialize local model manager
40
+ @st.cache_resource
41
+ def initialize_local_models():
42
+ """Initialize local model manager"""
43
+ return get_local_model_manager()
44
+
45
+ # Hugging Face models for vision-language tasks (kept for compatibility)
46
+ AVAILABLE_MODELS = {
47
+ "microsoft/kosmos-2-patch14-224": "Kosmos-2",
48
+ "Salesforce/blip-image-captioning-large": "BLIP Image Captioning",
49
+ "microsoft/DialoGPT-medium": "DialoGPT",
50
+ "microsoft/git-large-coco": "GIT Large COCO",
51
+ "nlpconnect/vit-gpt2-image-captioning": "ViT-GPT2"
52
+ }
53
+
54
+ def repair_video_with_ffmpeg(input_path, output_path):
55
+ """
56
+ Repair corrupted video by moving moov atom to the beginning
57
+ """
58
+ try:
59
+ # Try to fix the video using FFmpeg
60
+ cmd = [
61
+ 'ffmpeg',
62
+ '-i', input_path,
63
+ '-c', 'copy',
64
+ '-movflags', 'faststart',
65
+ '-avoid_negative_ts', 'make_zero',
66
+ '-y', # Overwrite output file
67
+ output_path
68
+ ]
69
+
70
+ result = subprocess.run(
71
+ cmd,
72
+ capture_output=True,
73
+ text=True,
74
+ timeout=300 # 5 minute timeout
75
+ )
76
+
77
+ return result.returncode == 0
78
+ except (subprocess.TimeoutExpired, FileNotFoundError):
79
+ return False
80
+
81
+ def extract_frames_from_video(video_file, fps=1):
82
+ """
83
+ Extract frames from video at specified FPS (default 1 frame per second)
84
+ Automatically handles corrupted videos by attempting repair with FFmpeg
85
+ """
86
+ frames = []
87
+
88
+ with tempfile.NamedTemporaryFile(delete=False, suffix='.mp4') as tmp_file:
89
+ tmp_file.write(video_file.read())
90
+ tmp_file_path = tmp_file.name
91
+
92
+ repaired_path = None
93
+
94
+ try:
95
+ # First attempt: try to open video directly
96
+ cap = cv2.VideoCapture(tmp_file_path)
97
+
98
+ # Check if video opened successfully and has frames
99
+ if not cap.isOpened() or cap.get(cv2.CAP_PROP_FRAME_COUNT) == 0:
100
+ cap.release()
101
+
102
+ # Second attempt: try to repair the video with FFmpeg
103
+ st.warning("Video appears corrupted (moov atom issue). Attempting repair...")
104
+
105
+ with tempfile.NamedTemporaryFile(delete=False, suffix='_repaired.mp4') as repaired_file:
106
+ repaired_path = repaired_file.name
107
+
108
+ if repair_video_with_ffmpeg(tmp_file_path, repaired_path):
109
+ st.success("Video repair successful! Processing frames...")
110
+ cap = cv2.VideoCapture(repaired_path)
111
+ else:
112
+ st.error("Failed to repair video. FFmpeg may not be installed or video is severely corrupted.")
113
+ return frames
114
+
115
+ # Extract video properties
116
+ video_fps = cap.get(cv2.CAP_PROP_FPS)
117
+ if video_fps <= 0:
118
+ video_fps = 30 # Default fallback FPS
119
+
120
+ frame_interval = int(video_fps / fps) if video_fps > fps else 1
121
+
122
+ frame_count = 0
123
+ extracted_count = 0
124
+
125
+ while True:
126
+ ret, frame = cap.read()
127
+ if not ret:
128
+ break
129
+
130
+ if frame_count % frame_interval == 0:
131
+ frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
132
+ pil_image = Image.fromarray(frame_rgb)
133
+ frames.append({
134
+ 'frame': pil_image,
135
+ 'timestamp': frame_count / video_fps,
136
+ 'frame_number': extracted_count
137
+ })
138
+ extracted_count += 1
139
+
140
+ frame_count += 1
141
+
142
+ cap.release()
143
+
144
+ finally:
145
+ # Clean up temporary files
146
+ if os.path.exists(tmp_file_path):
147
+ os.unlink(tmp_file_path)
148
+ if repaired_path and os.path.exists(repaired_path):
149
+ os.unlink(repaired_path)
150
+
151
+ return frames
152
+
153
+ def image_to_base64(image):
154
+ """Convert PIL image to base64 string"""
155
+ buffer = BytesIO()
156
+ image.save(buffer, format="PNG")
157
+ img_str = base64.b64encode(buffer.getvalue()).decode()
158
+ return img_str
159
+
160
+ def process_image_locally(image, prompt, model_name, local_manager):
161
+ """
162
+ Process image using local models
163
+ """
164
+ try:
165
+ if model_name == "Person on Track Detector":
166
+ # Special handling for person-on-track detection
167
+ result = local_manager.person_on_track_detector.detect_person_on_track(image)
168
+ return {"person_on_track_detection": result}
169
+ else:
170
+ caption = local_manager.generate_caption(model_name, image, prompt)
171
+ return {"generated_text": caption}
172
+ except Exception as e:
173
+ return {"error": f"Local processing failed: {str(e)}"}
174
+
175
+ def query_huggingface_api(image, prompt, model_name, api_token):
176
+ """
177
+ Query Hugging Face API with image and prompt
178
+ """
179
+ API_URL = f"https://api-inference.huggingface.co/models/{model_name}"
180
+ headers = {"Authorization": f"Bearer {api_token}"}
181
+
182
+ # Convert image to base64
183
+ img_base64 = image_to_base64(image)
184
+
185
+ # Prepare payload based on model type
186
+ if "blip" in model_name.lower():
187
+ # For BLIP models, send image directly
188
+ buffer = BytesIO()
189
+ image.save(buffer, format="PNG")
190
+ response = requests.post(
191
+ API_URL,
192
+ headers=headers,
193
+ files={"file": buffer.getvalue()}
194
+ )
195
+ else:
196
+ # For other vision-language models
197
+ payload = {
198
+ "inputs": {
199
+ "image": img_base64,
200
+ "text": prompt
201
+ }
202
+ }
203
+ response = requests.post(API_URL, headers=headers, json=payload)
204
+
205
+ if response.status_code == 200:
206
+ return response.json()
207
+ else:
208
+ return {"error": f"API request failed: {response.status_code} - {response.text}"}
209
+
210
+ def analyze_scene_with_ontology(scene_description, use_ontology=True):
211
+ """
212
+ Analyze scene description using ontology-based evaluation
213
+ Returns classification and explanation
214
+ """
215
+ if not use_ontology:
216
+ return {
217
+ "severity": "NONE",
218
+ "severity_icon": "✅",
219
+ "score": 0,
220
+ "explanation": "Ontology-based analysis skipped",
221
+ "ontology_used": False,
222
+ "raw_description": scene_description
223
+ }
224
+
225
+ # Extract relevant information from scene description for ontology
226
+ scene_lower = scene_description.lower().strip() if scene_description else ""
227
+
228
+ # Initialize observation based on scene analysis
229
+ obs = Observation()
230
+
231
+ # Analyze scene for ontology features
232
+ person_words = ['person', 'people', 'man', 'woman', 'boy', 'girl', 'human', 'individual', 'someone']
233
+ track_words = ['track', 'tracks', 'rail', 'rails', 'railway', 'railroad']
234
+ platform_words = ['platform', 'station', 'bahnsteig']
235
+ danger_words = ['fallen', 'lying', 'down', 'accident', 'emergency']
236
+ fire_words = ['fire', 'smoke', 'flames', 'burning']
237
+ crowd_words = ['crowd', 'many people', 'group', 'mehrere personen']
238
+ safe_words = ['no people', 'empty', 'clear', 'safe', 'nobody', 'without people']
239
+
240
+ # Set observation values based on keyword analysis
241
+ person_mentions = sum(1 for word in person_words if word in scene_lower)
242
+ track_mentions = sum(1 for word in track_words if word in scene_lower)
243
+ platform_mentions = sum(1 for word in platform_words if word in scene_lower)
244
+ danger_mentions = sum(1 for word in danger_words if word in scene_lower)
245
+ fire_mentions = sum(1 for word in fire_words if word in scene_lower)
246
+ crowd_mentions = sum(1 for word in crowd_words if word in scene_lower)
247
+ safe_mentions = sum(1 for word in safe_words if word in scene_lower)
248
+
249
+ # Person on track detection (but not if explicitly safe)
250
+ if person_mentions > 0 and track_mentions > 0 and safe_mentions == 0:
251
+ # Check if person is actually on the tracks vs just mentioned
252
+ on_track_indicators = ['on track', 'on the track', 'on rails', 'on the rails', 'standing on', 'walking on']
253
+ on_track_specific = sum(1 for phrase in on_track_indicators if phrase in scene_lower)
254
+ if on_track_specific > 0:
255
+ obs.on_track_person = min(0.8, 0.6 + on_track_specific * 0.1)
256
+ elif person_mentions > 0 and track_mentions > 0:
257
+ # General co-occurrence but less confident - need stronger evidence
258
+ near_indicators = ['near', 'close to', 'next to', 'beside', 'by the']
259
+ near_mentions = sum(1 for phrase in near_indicators if phrase in scene_lower)
260
+ if near_mentions > 0:
261
+ # Person near tracks but not necessarily on them - lower confidence
262
+ obs.on_track_person = min(0.4, 0.25 + near_mentions * 0.05)
263
+ else:
264
+ # Just mention of person and tracks together - very low confidence
265
+ obs.on_track_person = min(0.3, 0.2 + (person_mentions + track_mentions) * 0.02)
266
+
267
+ # Fallen person detection
268
+ if person_mentions > 0 and danger_mentions > 0:
269
+ obs.fallen_person = min(0.7, 0.4 + danger_mentions * 0.1)
270
+
271
+ # Fire/smoke detection
272
+ if fire_mentions > 0:
273
+ obs.smoke_or_fire = min(0.8, 0.5 + fire_mentions * 0.15)
274
+
275
+ # Crowd detection
276
+ if crowd_mentions > 0 and (track_mentions > 0 or platform_mentions > 0):
277
+ obs.crowd_on_track = min(0.7, 0.4 + crowd_mentions * 0.1)
278
+
279
+ # Generic object detection (if no person but something mentioned on tracks)
280
+ if track_mentions > 0 and person_mentions == 0 and any(word in scene_lower for word in ['object', 'item', 'thing', 'debris']):
281
+ obs.object_on_track = 0.6
282
+
283
+ # Evaluate using ontology
284
+ decision = evaluate(obs)
285
+
286
+ # Map severity to icons and colors
287
+ severity_mapping = {
288
+ Severity.NONE: {"icon": "✅", "color": "green"},
289
+ Severity.LOW: {"icon": "🟢", "color": "lightgreen"},
290
+ Severity.MEDIUM: {"icon": "🟠", "color": "orange"},
291
+ Severity.HIGH: {"icon": "⚠️", "color": "red"},
292
+ Severity.CRITICAL: {"icon": "🚨", "color": "darkred"}
293
+ }
294
+
295
+ severity_info = severity_mapping[decision.severity]
296
+
297
+ return {
298
+ "severity": decision.severity.name,
299
+ "severity_icon": severity_info["icon"],
300
+ "severity_color": severity_info["color"],
301
+ "score": decision.score_0_100,
302
+ "labels": [label.value for label in decision.labels],
303
+ "explanations": decision.explanations,
304
+ "fired_rules": decision.fired_rules,
305
+ "ontology_used": True,
306
+ "raw_description": scene_description,
307
+ "observation": obs,
308
+ "decision": decision
309
+ }
310
+
311
+ def main():
312
+ st.set_page_config(
313
+ page_title="Video Frame Analyzer",
314
+ page_icon="🎥",
315
+ layout="wide"
316
+ )
317
+
318
+ st.title("🎥 Video Frame Analyzer with Local AI Models")
319
+ st.markdown("Upload a video, provide a prompt, and analyze each frame using local AI models (CNN or Transformer)")
320
+
321
+ # Load settings and initialize local models
322
+ settings = load_settings()
323
+
324
+ # Initialize local models if enabled
325
+ local_manager = None
326
+ local_models_available = False
327
+
328
+ if LOCAL_MODELS_ENABLED:
329
+ try:
330
+ local_manager = initialize_local_models()
331
+ local_models_available = True
332
+ st.success("🤖 Local AI models initialized successfully!")
333
+ except Exception as e:
334
+ st.warning(f"Local AI models not available: {str(e)}")
335
+ st.info("💡 Install AI packages: `pip install torch torchvision transformers accelerate sentencepiece`")
336
+ local_models_available = False
337
+ else:
338
+ st.info("💡 Local AI models not installed. Install with: `pip install torch torchvision transformers accelerate sentencepiece`")
339
+
340
+ # Sidebar for configuration
341
+ with st.sidebar:
342
+ st.header("Configuration")
343
+
344
+ # Model type selection
345
+ available_options = []
346
+ if local_models_available:
347
+ available_options.append("Local Models")
348
+ if REMOTE_MODELS_ENABLED:
349
+ available_options.append("Remote API")
350
+
351
+ if not available_options:
352
+ available_options = ["Remote API"] # Fallback
353
+
354
+ model_type = st.radio(
355
+ "Model Type",
356
+ available_options,
357
+ help="Choose between local AI models or remote Hugging Face API"
358
+ )
359
+
360
+ if model_type == "Local Models" and local_models_available:
361
+ # Local model selection
362
+ available_local_models = local_manager.get_available_models()
363
+ selected_model = st.selectbox(
364
+ "Select Local Model",
365
+ options=available_local_models,
366
+ help="Choose between CNN (fast) or Transformer (detailed) models"
367
+ )
368
+
369
+ # Show model info
370
+ model_info = local_manager.get_model_info()
371
+ if selected_model in model_info:
372
+ with st.expander("Model Information"):
373
+ st.write(f"**Description:** {model_info[selected_model]['description']}")
374
+ st.write(f"**Strengths:** {model_info[selected_model]['strengths']}")
375
+ st.write(f"**Size:** {model_info[selected_model]['size']}")
376
+
377
+ api_token = None # Not needed for local models
378
+
379
+ else:
380
+ # Remote API configuration
381
+ default_token = settings.get('hugging_face_api_token', '')
382
+ api_token = st.text_input(
383
+ "Hugging Face API Token",
384
+ value=default_token,
385
+ type="password",
386
+ help="Get your token from https://huggingface.co/settings/tokens or save in settings.json"
387
+ )
388
+
389
+ # Remote model selection
390
+ selected_model = st.selectbox(
391
+ "Select Model",
392
+ options=list(AVAILABLE_MODELS.keys()),
393
+ format_func=lambda x: AVAILABLE_MODELS[x]
394
+ )
395
+
396
+ # Frame extraction rate
397
+ fps = st.slider(
398
+ "Frames per second to extract",
399
+ min_value=0.1,
400
+ max_value=5.0,
401
+ value=1.0,
402
+ step=0.1
403
+ )
404
+
405
+ # Ontology settings
406
+ st.subheader("Ontology Analysis")
407
+ use_ontology = st.checkbox(
408
+ "Enable Ontology Analysis",
409
+ value=True,
410
+ help="Use ontology-based classification (NONE/LOW/MEDIUM/HIGH/CRITICAL)"
411
+ )
412
+
413
+ if not use_ontology:
414
+ st.info("🔄 Ontology analysis disabled - showing raw model output only")
415
+
416
+ # Main content area
417
+ col1, col2 = st.columns([1, 1])
418
+
419
+ with col1:
420
+ st.header("Input")
421
+
422
+ # Video upload
423
+ video_file = st.file_uploader(
424
+ "Upload Video",
425
+ type=['mp4', 'avi', 'mov', 'mkv'],
426
+ help="Upload a video file to analyze"
427
+ )
428
+
429
+ # Prompt input (conditional based on model)
430
+ if model_type == "Local Models" and local_models_available and selected_model == "Person on Track Detector":
431
+ # Person on Track Detector works automatically
432
+ st.info("🤖 Person on Track Detector works automatically - no prompt needed!")
433
+ prompt = "automatic" # Set automatic prompt
434
+ else:
435
+ # Regular models need user prompt
436
+ prompt = st.text_area(
437
+ "Analysis Prompt",
438
+ placeholder="Describe what you see in the image...",
439
+ help="Enter the prompt to analyze each frame"
440
+ )
441
+
442
+ # Process button
443
+ process_button = st.button("Process Video", type="primary")
444
+
445
+ with col2:
446
+ st.header("Results")
447
+ results_container = st.container()
448
+
449
+ # Processing logic
450
+ if process_button and video_file and (prompt or (model_type == "Local Models" and selected_model == "Person on Track Detector")) and (api_token or model_type == "Local Models"):
451
+ with st.spinner("Processing video..."):
452
+ # Extract frames
453
+ frames = extract_frames_from_video(video_file, fps)
454
+
455
+ if not frames:
456
+ st.error("No frames could be extracted from the video")
457
+ return
458
+
459
+ st.success(f"Extracted {len(frames)} frames from video")
460
+
461
+ # Process each frame
462
+ results = []
463
+ progress_bar = st.progress(0)
464
+
465
+ for i, frame_data in enumerate(frames):
466
+ with st.spinner(f"Analyzing frame {i+1}/{len(frames)}..."):
467
+ # Process frame based on model type
468
+ if model_type == "Local Models" and local_models_available:
469
+ result = process_image_locally(
470
+ frame_data['frame'],
471
+ prompt,
472
+ selected_model,
473
+ local_manager
474
+ )
475
+ else:
476
+ result = query_huggingface_api(
477
+ frame_data['frame'],
478
+ prompt,
479
+ selected_model,
480
+ api_token
481
+ )
482
+
483
+ # Extract scene description for ontology analysis
484
+ scene_description = ""
485
+ if 'person_on_track_detection' in result:
486
+ # For person detection results, use the analysis text
487
+ scene_description = result['person_on_track_detection'].get('detailed_analysis', {}).get('scene_description', '')
488
+ elif 'generated_text' in result:
489
+ scene_description = result['generated_text']
490
+ elif isinstance(result, list) and len(result) > 0 and 'generated_text' in result[0]:
491
+ scene_description = result[0]['generated_text']
492
+
493
+ # Apply ontology analysis
494
+ ontology_analysis = analyze_scene_with_ontology(scene_description, use_ontology)
495
+
496
+ results.append({
497
+ 'frame_number': frame_data['frame_number'],
498
+ 'timestamp': frame_data['timestamp'],
499
+ 'image': frame_data['frame'],
500
+ 'result': result,
501
+ 'ontology_analysis': ontology_analysis
502
+ })
503
+
504
+ progress_bar.progress((i + 1) / len(frames))
505
+
506
+ # Display results
507
+ with results_container:
508
+ st.subheader("Analysis Results")
509
+
510
+ for result_data in results:
511
+ ontology = result_data['ontology_analysis']
512
+ severity_icon = ontology.get('severity_icon', '✅')
513
+ severity = ontology.get('severity', 'NONE')
514
+
515
+ # Create expander title with severity indicator
516
+ expander_title = f"{severity_icon} {severity} - Frame {result_data['frame_number']} (t={result_data['timestamp']:.1f}s)"
517
+
518
+ with st.expander(expander_title):
519
+ col_img, col_text = st.columns([1, 2])
520
+
521
+ with col_img:
522
+ st.image(
523
+ result_data['image'],
524
+ caption=f"Frame {result_data['frame_number']}",
525
+ use_container_width=True
526
+ )
527
+
528
+ with col_text:
529
+ # Display ontology analysis first if enabled
530
+ if ontology.get('ontology_used', False):
531
+ # Severity display with color
532
+ severity_color = ontology.get('severity_color', 'green')
533
+ st.markdown(f"**Safety Assessment:** :{severity_color}[{severity_icon} {severity}]")
534
+
535
+ # Score display
536
+ if ontology.get('score', 0) > 0:
537
+ st.metric("Risk Score", f"{ontology['score']}/100")
538
+
539
+ # Show explanations if available
540
+ if ontology.get('explanations'):
541
+ st.write("**Ontology Analysis:**")
542
+ for explanation in ontology['explanations']:
543
+ st.write(f"• {explanation}")
544
+
545
+ # Show fired rules if available
546
+ if ontology.get('fired_rules'):
547
+ with st.expander("Technical Details"):
548
+ st.write("**Triggered Rules:**")
549
+ for rule in ontology['fired_rules']:
550
+ st.code(rule)
551
+
552
+ if ontology.get('labels'):
553
+ st.write("**Detected Hazard Labels:**")
554
+ for label in ontology['labels']:
555
+ st.code(label)
556
+
557
+ st.divider()
558
+
559
+ # Display original model results
560
+ st.write("**Model Output:**")
561
+ if 'error' in result_data['result']:
562
+ st.error(f"Error: {result_data['result']['error']}")
563
+ elif 'person_on_track_detection' in result_data['result']:
564
+ # Handle person-on-track detection results
565
+ detection = result_data['result']['person_on_track_detection']
566
+
567
+ people_count = detection.get('people_count', 0)
568
+ confidence = detection.get('confidence', 0)
569
+ analysis = detection.get('analysis', 'No analysis')
570
+ person_on_track = detection.get('person_on_track', False)
571
+
572
+ st.write(f"**Detection Analysis:** {analysis}")
573
+
574
+ # Show metrics
575
+ col1, col2 = st.columns(2)
576
+ with col1:
577
+ st.metric("👥 People Detected", people_count)
578
+ with col2:
579
+ st.metric("📊 Model Confidence", f"{confidence:.0%}")
580
+ else:
581
+ if 'generated_text' in result_data['result']:
582
+ # Handle direct generated_text response (local models)
583
+ st.write(f"*{result_data['result']['generated_text']}*")
584
+ elif isinstance(result_data['result'], list) and len(result_data['result']) > 0:
585
+ # Handle list responses (common for captioning models)
586
+ if 'generated_text' in result_data['result'][0]:
587
+ st.write(f"*{result_data['result'][0]['generated_text']}*")
588
+ else:
589
+ st.json(result_data['result'][0])
590
+ else:
591
+ st.json(result_data['result'])
592
+
593
+ elif process_button:
594
+ if not video_file:
595
+ st.error("Please upload a video file")
596
+ if not prompt and not (model_type == "Local Models" and selected_model == "Person on Track Detector"):
597
+ st.error("Please enter an analysis prompt")
598
+ if not api_token and model_type == "Remote API":
599
+ st.error("Please provide your Hugging Face API token for remote models")
600
+ if model_type == "Local Models" and not local_models_available:
601
+ st.error("Local models failed to initialize. Check your installation.")
602
+
603
+ # Instructions
604
+ with st.expander("How to use"):
605
+ st.markdown("""
606
+ ## Local AI Models (Recommended)
607
+ 1. **Upload a video**: Choose a video file (MP4, AVI, MOV, or MKV)
608
+ 2. **Select model type**: Choose "Local Models" for offline processing
609
+ 3. **Choose AI model**:
610
+ - **CNN (BLIP)**: Fast, good for object detection (~1.2GB)
611
+ - **Transformer (ViT-GPT2)**: Detailed descriptions (~1.8GB)
612
+ 4. **Enter a prompt**: Describe what you want the AI to analyze
613
+ 5. **Adjust frame rate**: Set frames per second to extract (default: 1 fps)
614
+ 6. **Click Process**: Frames are processed locally on your machine
615
+
616
+ ## Remote API Models (Optional)
617
+ 1. **Get API token**: Visit [Hugging Face Settings](https://huggingface.co/settings/tokens)
618
+ 2. **Select "Remote API"** in model type
619
+ 3. **Enter token** and select remote model
620
+
621
+ ## Video Support Features
622
+ - **Automatic corruption repair**: Handles videos with corrupted moov atoms
623
+ - **FFmpeg integration**: Auto-repairs problematic video files
624
+ - **Multiple formats**: MP4, AVI, MOV, MKV support
625
+
626
+ ## Requirements
627
+ - **Python packages**: torch, transformers, accelerate (see requirements.txt)
628
+ - **Optional**: FFmpeg for video repair (download from https://ffmpeg.org)
629
+ - **Storage**: ~3GB for both local models
630
+
631
+ ## Example Prompts
632
+ - "Describe what you see in this image"
633
+ - "Count the number of people in this scene"
634
+ - "What objects are visible in this frame?"
635
+ - "Describe the emotions and actions in this scene"
636
+ - "What is the main activity happening here?"
637
+ """)
638
+
639
+ if __name__ == "__main__":
640
+ main()
app_refactored.py ADDED
@@ -0,0 +1,236 @@
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Main Streamlit application for video frame analysis with ontology-based risk assessment
4
+ Refactored for better code organization and maintainability
5
+ """
6
+ import streamlit as st
7
+ import json
8
+ from dotenv import load_dotenv
9
+
10
+ # Import our modular components
11
+ from video_processing import extract_frames_from_video
12
+ from ontology_integration import analyze_scene_with_ontology, extract_scene_description
13
+ from model_processing import process_frame
14
+ from ui_components import (
15
+ render_sidebar_config,
16
+ render_input_section,
17
+ render_prompt_section,
18
+ render_process_button,
19
+ render_results_header,
20
+ render_frame_result,
21
+ render_validation_errors,
22
+ render_instructions
23
+ )
24
+
25
+ # Try to import local models, fall back gracefully if not available
26
+ try:
27
+ from local_models import get_local_model_manager
28
+ LOCAL_MODELS_AVAILABLE = True
29
+ except ImportError as e:
30
+ LOCAL_MODELS_AVAILABLE = False
31
+ print(f"Local models not available: {e}")
32
+ def get_local_model_manager():
33
+ return None
34
+
35
+ # Load environment variables
36
+ load_dotenv()
37
+
38
+
39
+ def load_settings():
40
+ """Load settings from JSON file"""
41
+ try:
42
+ with open('settings.json', 'r') as f:
43
+ return json.load(f)
44
+ except FileNotFoundError:
45
+ return {}
46
+
47
+
48
+ @st.cache_resource
49
+ def initialize_local_models():
50
+ """Initialize local model manager"""
51
+ return get_local_model_manager()
52
+
53
+
54
+ def initialize_app():
55
+ """Initialize the Streamlit application"""
56
+ st.set_page_config(
57
+ page_title="Video Frame Analyzer with Ontology",
58
+ page_icon="🎥",
59
+ layout="wide"
60
+ )
61
+
62
+ st.title("🎥 Video Frame Analyzer with Ontology-Based Risk Assessment")
63
+ st.markdown("Upload a video and analyze frames using AI models with ontology-based safety classification")
64
+
65
+
66
+ def setup_local_models():
67
+ """Setup local models and return availability status"""
68
+ local_manager = None
69
+ local_models_available = False
70
+
71
+ if LOCAL_MODELS_AVAILABLE:
72
+ try:
73
+ local_manager = initialize_local_models()
74
+ local_models_available = True
75
+ st.success("🤖 Local AI models initialized successfully!")
76
+ except Exception as e:
77
+ st.warning(f"Local AI models not available: {str(e)}")
78
+ st.info("💡 Install AI packages: `pip install torch torchvision transformers accelerate sentencepiece`")
79
+ local_models_available = False
80
+ else:
81
+ st.info("💡 Local AI models not installed. Install with: `pip install torch torchvision transformers accelerate sentencepiece`")
82
+
83
+ return local_manager, local_models_available
84
+
85
+
86
+ def process_video_frames(video_file, config, local_manager=None):
87
+ """
88
+ Process all frames in the video and return results
89
+ """
90
+ # Extract frames
91
+ frames = extract_frames_from_video(video_file, config["fps"])
92
+
93
+ if not frames:
94
+ st.error("No frames could be extracted from the video")
95
+ return []
96
+
97
+ st.success(f"Extracted {len(frames)} frames from video")
98
+
99
+ # Process each frame
100
+ results = []
101
+ progress_bar = st.progress(0)
102
+
103
+ # Add prompt to config for processing
104
+ processing_config = config.copy()
105
+ processing_config["prompt"] = config.get("prompt", "")
106
+
107
+ for i, frame_data in enumerate(frames):
108
+ with st.spinner(f"Analyzing frame {i+1}/{len(frames)}..."):
109
+ # Process frame with selected model
110
+ result = process_frame(frame_data, processing_config, local_manager)
111
+
112
+ # Extract scene description for ontology analysis
113
+ scene_description = extract_scene_description(result)
114
+
115
+ # Apply ontology analysis
116
+ ontology_analysis = analyze_scene_with_ontology(scene_description, config["use_ontology"])
117
+
118
+ results.append({
119
+ 'frame_number': frame_data['frame_number'],
120
+ 'timestamp': frame_data['timestamp'],
121
+ 'image': frame_data['frame'],
122
+ 'result': result,
123
+ 'ontology_analysis': ontology_analysis
124
+ })
125
+
126
+ progress_bar.progress((i + 1) / len(frames))
127
+
128
+ return results
129
+
130
+
131
+ def validate_inputs(video_file, prompt, config, local_models_available):
132
+ """
133
+ Validate all required inputs
134
+ """
135
+ model_type = config["model_type"]
136
+ selected_model = config["selected_model"]
137
+ api_token = config["api_token"]
138
+
139
+ # Check basic requirements
140
+ if not video_file:
141
+ return False
142
+
143
+ # Check prompt requirements
144
+ if not prompt and not (model_type == "Local Models" and selected_model == "Person on Track Detector"):
145
+ return False
146
+
147
+ # Check API token for remote models
148
+ if not api_token and model_type == "Remote API":
149
+ return False
150
+
151
+ # Check local models availability
152
+ if model_type == "Local Models" and not local_models_available:
153
+ return False
154
+
155
+ return True
156
+
157
+
158
+ def main():
159
+ """Main application entry point"""
160
+ # Initialize application
161
+ initialize_app()
162
+
163
+ # Load settings and setup models
164
+ settings = load_settings()
165
+ local_manager, local_models_available = setup_local_models()
166
+
167
+ # Create main layout
168
+ col1, col2 = st.columns([1, 1])
169
+
170
+ with col1:
171
+ # Render sidebar configuration
172
+ config = render_sidebar_config(settings, local_models_available, local_manager)
173
+
174
+ # Render input section
175
+ input_data = render_input_section()
176
+ video_file = input_data["video_file"]
177
+
178
+ # Render prompt section
179
+ prompt = render_prompt_section(config)
180
+
181
+ # Render process button
182
+ process_button = render_process_button()
183
+
184
+ with col2:
185
+ # Render results section
186
+ results_container = render_results_header()
187
+
188
+ # Main processing logic
189
+ if process_button:
190
+ if validate_inputs(video_file, prompt, config, local_models_available):
191
+ # Add prompt to config for processing
192
+ config["prompt"] = prompt
193
+
194
+ with st.spinner("Processing video..."):
195
+ # Process video frames
196
+ results = process_video_frames(video_file, config, local_manager)
197
+
198
+ # Display results
199
+ if results:
200
+ with results_container:
201
+ st.subheader("Analysis Results")
202
+
203
+ # Display summary statistics
204
+ severity_counts = {}
205
+ for result in results:
206
+ severity = result['ontology_analysis'].get('severity', 'NONE')
207
+ severity_counts[severity] = severity_counts.get(severity, 0) + 1
208
+
209
+ if config["use_ontology"] and severity_counts:
210
+ st.write("**Summary:**")
211
+ summary_cols = st.columns(len(severity_counts))
212
+ for i, (severity, count) in enumerate(severity_counts.items()):
213
+ icon_map = {
214
+ 'NONE': '✅', 'LOW': '🟢', 'MEDIUM': '🟠',
215
+ 'HIGH': '⚠️', 'CRITICAL': '🚨'
216
+ }
217
+ with summary_cols[i]:
218
+ st.metric(f"{icon_map.get(severity, '❓')} {severity}", count)
219
+ st.divider()
220
+
221
+ # Display individual frame results
222
+ for result_data in results:
223
+ render_frame_result(result_data)
224
+ else:
225
+ # Show validation errors
226
+ render_validation_errors(
227
+ video_file, prompt, config["api_token"],
228
+ config["model_type"], local_models_available, config["selected_model"]
229
+ )
230
+
231
+ # Render instructions
232
+ render_instructions()
233
+
234
+
235
+ if __name__ == "__main__":
236
+ main()
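+ # Launch the app with: streamlit run app.py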
model_processing.py ADDED
@@ -0,0 +1,97 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Model processing utilities for local and remote AI models
4
+ """
5
+ import requests
6
+ import base64
7
+ from io import BytesIO
8
+ from PIL import Image
9
+ from typing import Dict, Any, Optional
10
+
11
+
12
+ def image_to_base64(image: Image.Image) -> str:
13
+ """Convert PIL image to base64 string"""
14
+ buffer = BytesIO()
15
+ image.save(buffer, format="PNG")
16
+ img_str = base64.b64encode(buffer.getvalue()).decode()
17
+ return img_str
18
+
19
+
20
+ def process_image_locally(image: Image.Image, prompt: str, model_name: str, local_manager) -> Dict[str, Any]:
21
+ """
22
+ Process image using local models
23
+ """
24
+ try:
25
+ if model_name == "Person on Track Detector":
26
+ # Special handling for person-on-track detection
27
+ result = local_manager.person_on_track_detector.detect_person_on_track(image)
28
+ return {"person_on_track_detection": result}
29
+ else:
30
+ caption = local_manager.generate_caption(model_name, image, prompt)
31
+ return {"generated_text": caption}
32
+ except Exception as e:
33
+ return {"error": f"Local processing failed: {str(e)}"}
34
+
35
+
36
+ def query_huggingface_api(image: Image.Image, prompt: str, model_name: str, api_token: str) -> Dict[str, Any]:
37
+ """
38
+ Query Hugging Face API with image and prompt
39
+ """
40
+ API_URL = f"https://api-inference.huggingface.co/models/{model_name}"
41
+ headers = {"Authorization": f"Bearer {api_token}"}
42
+
43
+ # Convert image to base64
44
+ img_base64 = image_to_base64(image)
45
+
46
+ # Prepare payload based on model type
47
+ if "blip" in model_name.lower():
48
+ # For BLIP models, send image directly
49
+ buffer = BytesIO()
50
+ image.save(buffer, format="PNG")
51
+ response = requests.post(
52
+ API_URL,
53
+ headers=headers,
54
+ files={"file": buffer.getvalue()}
55
+ )
56
+ else:
57
+ # For other vision-language models
58
+ payload = {
59
+ "inputs": {
60
+ "image": img_base64,
61
+ "text": prompt
62
+ }
63
+ }
64
+ response = requests.post(API_URL, headers=headers, json=payload)
65
+
66
+ if response.status_code == 200:
67
+ return response.json()
68
+ else:
69
+ return {"error": f"API request failed: {response.status_code} - {response.text}"}
70
+
71
+
72
+ def process_frame(frame_data: Dict, config: Dict[str, Any], local_manager=None) -> Dict[str, Any]:
73
+ """
74
+ Process a single frame using the configured model
75
+ """
76
+ model_type = config["model_type"]
77
+ selected_model = config["selected_model"]
78
+ prompt = config.get("prompt", "")
79
+ api_token = config.get("api_token")
80
+
81
+ # Process frame based on model type
82
+ if model_type == "Local Models" and local_manager:
83
+ result = process_image_locally(
84
+ frame_data['frame'],
85
+ prompt,
86
+ selected_model,
87
+ local_manager
88
+ )
89
+ else:
90
+ result = query_huggingface_api(
91
+ frame_data['frame'],
92
+ prompt,
93
+ selected_model,
94
+ api_token
95
+ )
96
+
97
+ return result
ontology_integration.py ADDED
@@ -0,0 +1,144 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Ontology integration module for scene analysis and risk assessment
4
+ """
5
+ from ontology_eval import Observation, evaluate, Severity
6
+ from typing import Dict, Any, Optional
7
+
8
+
9
+ def analyze_scene_with_ontology(scene_description: str, use_ontology: bool = True) -> Dict[str, Any]:
10
+ """
11
+ Analyze scene description using ontology-based evaluation
12
+ Returns classification and explanation
13
+ """
14
+ if not use_ontology:
15
+ return {
16
+ "severity": "NONE",
17
+ "severity_icon": "✅",
18
+ "score": 0,
19
+ "explanation": "Ontology-based analysis skipped",
20
+ "ontology_used": False,
21
+ "raw_description": scene_description
22
+ }
23
+
24
+ # Extract relevant information from scene description for ontology
25
+ scene_lower = scene_description.lower().strip() if scene_description else ""
26
+
27
+ # Initialize observation based on scene analysis
28
+ obs = _extract_ontology_features(scene_lower)
29
+
30
+ # Evaluate using ontology
31
+ decision = evaluate(obs)
32
+
33
+ # Map severity to icons and colors
34
+ severity_mapping = {
35
+ Severity.NONE: {"icon": "✅", "color": "green"},
36
+ Severity.LOW: {"icon": "🟢", "color": "green"},  # st.markdown's :color[] syntax only supports a fixed palette
37
+ Severity.MEDIUM: {"icon": "🟠", "color": "orange"},
38
+ Severity.HIGH: {"icon": "⚠️", "color": "red"},
39
+ Severity.CRITICAL: {"icon": "🚨", "color": "red"}
40
+ }
41
+
42
+ severity_info = severity_mapping[decision.severity]
43
+
44
+ return {
45
+ "severity": decision.severity.name,
46
+ "severity_icon": severity_info["icon"],
47
+ "severity_color": severity_info["color"],
48
+ "score": decision.score_0_100,
49
+ "labels": [label.value for label in decision.labels],
50
+ "explanations": decision.explanations,
51
+ "fired_rules": decision.fired_rules,
52
+ "ontology_used": True,
53
+ "raw_description": scene_description,
54
+ "observation": obs,
55
+ "decision": decision
56
+ }
57
+
58
+
59
+ def _extract_ontology_features(scene_lower: str) -> Observation:
60
+ """
61
+ Extract ontology-relevant features from scene description
62
+ """
63
+ # Initialize observation
64
+ obs = Observation()
65
+
66
+ # Define keyword categories
67
+ person_words = ['person', 'people', 'man', 'woman', 'boy', 'girl', 'human', 'individual', 'someone']
68
+ track_words = ['track', 'tracks', 'rail', 'rails', 'railway', 'railroad']
69
+ platform_words = ['platform', 'station', 'bahnsteig']
70
+ danger_words = ['fallen', 'lying', 'down', 'accident', 'emergency']
71
+ fire_words = ['fire', 'smoke', 'flames', 'burning']
72
+ crowd_words = ['crowd', 'many people', 'group', 'mehrere personen']
73
+ safe_words = ['no people', 'empty', 'clear', 'safe', 'nobody', 'without people']
74
+
75
+ # Count keyword mentions
76
+ person_mentions = sum(1 for word in person_words if word in scene_lower)
77
+ track_mentions = sum(1 for word in track_words if word in scene_lower)
78
+ platform_mentions = sum(1 for word in platform_words if word in scene_lower)
79
+ danger_mentions = sum(1 for word in danger_words if word in scene_lower)
80
+ fire_mentions = sum(1 for word in fire_words if word in scene_lower)
81
+ crowd_mentions = sum(1 for word in crowd_words if word in scene_lower)
82
+ safe_mentions = sum(1 for word in safe_words if word in scene_lower)
83
+
84
+ # Person on track detection (but not if explicitly safe)
85
+ if person_mentions > 0 and track_mentions > 0 and safe_mentions == 0:
86
+ obs.on_track_person = _calculate_person_on_track_confidence(scene_lower, person_mentions, track_mentions)
87
+
88
+ # Fallen person detection
89
+ if person_mentions > 0 and danger_mentions > 0:
90
+ obs.fallen_person = min(0.7, 0.4 + danger_mentions * 0.1)
91
+
92
+ # Fire/smoke detection
93
+ if fire_mentions > 0:
94
+ obs.smoke_or_fire = min(0.8, 0.5 + fire_mentions * 0.15)
95
+
96
+ # Crowd detection
97
+ if crowd_mentions > 0 and (track_mentions > 0 or platform_mentions > 0):
98
+ obs.crowd_on_track = min(0.7, 0.4 + crowd_mentions * 0.1)
99
+
100
+ # Generic object detection (if no person but something mentioned on tracks)
101
+ if track_mentions > 0 and person_mentions == 0 and any(word in scene_lower for word in ['object', 'item', 'thing', 'debris']):
102
+ obs.object_on_track = 0.6
103
+
104
+ return obs
105
+
106
+
107
+ def _calculate_person_on_track_confidence(scene_lower: str, person_mentions: int, track_mentions: int) -> float:
108
+ """
109
+ Calculate confidence for person on track detection based on specific indicators
110
+ """
111
+ # Check for specific on-track indicators
112
+ on_track_indicators = ['on track', 'on the track', 'on rails', 'on the rails', 'standing on', 'walking on']
113
+ on_track_specific = sum(1 for phrase in on_track_indicators if phrase in scene_lower)
114
+
115
+ if on_track_specific > 0:
116
+ return min(0.8, 0.6 + on_track_specific * 0.1)
117
+ else:
118
+ # Check for proximity indicators
119
+ near_indicators = ['near', 'close to', 'next to', 'beside', 'by the']
120
+ near_mentions = sum(1 for phrase in near_indicators if phrase in scene_lower)
121
+
122
+ if near_mentions > 0:
123
+ # Person near tracks but not necessarily on them - lower confidence
124
+ return min(0.4, 0.25 + near_mentions * 0.05)
125
+ else:
126
+ # Just mention of person and tracks together - very low confidence
127
+ return min(0.3, 0.2 + (person_mentions + track_mentions) * 0.02)
128
+
129
+
130
+ def extract_scene_description(result: Dict[str, Any]) -> str:
131
+ """
132
+ Extract scene description from various model result formats
133
+ """
134
+ scene_description = ""
135
+
136
+ if 'person_on_track_detection' in result:
137
+ # For person detection results, use the analysis text
138
+ scene_description = result['person_on_track_detection'].get('detailed_analysis', {}).get('scene_description', '')
139
+ elif 'generated_text' in result:
140
+ scene_description = result['generated_text']
141
+ elif isinstance(result, list) and len(result) > 0 and 'generated_text' in result[0]:
142
+ scene_description = result[0]['generated_text']
143
+
144
+ return scene_description
ui_components.py ADDED
@@ -0,0 +1,320 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ UI components for the Streamlit application
4
+ """
5
+ import streamlit as st
6
+ from typing import Dict, List, Any, Optional
7
+ # Note: local_models is not imported here; the local model manager is passed in as an argument, which keeps app.py's optional-import fallback working when the AI packages are missing
8
+
9
+
10
+ # Available Hugging Face models for remote API
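+ # Keys are Hugging Face model IDs passed to the Inference API; values are display names for the selectbox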
11
+ AVAILABLE_MODELS = {
12
+ "microsoft/kosmos-2-patch14-224": "Kosmos-2",
13
+ "Salesforce/blip-image-captioning-large": "BLIP Image Captioning",
14
+ "microsoft/DialoGPT-medium": "DialoGPT",
15
+ "microsoft/git-large-coco": "GIT Large COCO",
16
+ "nlpconnect/vit-gpt2-image-captioning": "ViT-GPT2"
17
+ }
18
+
19
+
20
+ def render_sidebar_config(settings: Dict, local_models_available: bool, local_manager: Optional[Any]) -> Dict[str, Any]:
21
+ """
22
+ Render the sidebar configuration panel
23
+ Returns configuration settings
24
+ """
25
+ with st.sidebar:
26
+ st.header("Configuration")
27
+
28
+ # Model type selection
29
+ available_options = []
30
+ if local_models_available:
31
+ available_options.append("Local Models")
32
+ available_options.append("Remote API")
33
+
34
+ model_type = st.radio(
35
+ "Model Type",
36
+ available_options,
37
+ help="Choose between local AI models or remote Hugging Face API"
38
+ )
39
+
40
+ # Model selection based on type
41
+ if model_type == "Local Models" and local_models_available:
42
+ selected_model, api_token = _render_local_model_config(local_manager)
43
+ else:
44
+ selected_model, api_token = _render_remote_model_config(settings)
45
+
46
+ # Frame extraction rate
47
+ fps = st.slider(
48
+ "Frames per second to extract",
49
+ min_value=0.1,
50
+ max_value=5.0,
51
+ value=1.0,
52
+ step=0.1
53
+ )
54
+
55
+ # Ontology settings
56
+ st.subheader("Ontology Analysis")
57
+ use_ontology = st.checkbox(
58
+ "Enable Ontology Analysis",
59
+ value=True,
60
+ help="Use ontology-based classification (NONE/LOW/MEDIUM/HIGH/CRITICAL)"
61
+ )
62
+
63
+ if not use_ontology:
64
+ st.info("🔄 Ontology analysis disabled - showing raw model output only")
65
+
66
+ return {
67
+ "model_type": model_type,
68
+ "selected_model": selected_model,
69
+ "api_token": api_token,
70
+ "fps": fps,
71
+ "use_ontology": use_ontology
72
+ }
73
+
74
+
75
+ def _render_local_model_config(local_manager) -> tuple:
76
+ """Render local model configuration"""
77
+ available_local_models = local_manager.get_available_models()
78
+ selected_model = st.selectbox(
79
+ "Select Local Model",
80
+ options=available_local_models,
81
+ help="Choose between CNN (fast) or Transformer (detailed) models"
82
+ )
83
+
84
+ # Show model info
85
+ model_info = local_manager.get_model_info()
86
+ if selected_model in model_info:
87
+ with st.expander("Model Information"):
88
+ st.write(f"**Description:** {model_info[selected_model]['description']}")
89
+ st.write(f"**Strengths:** {model_info[selected_model]['strengths']}")
90
+ st.write(f"**Size:** {model_info[selected_model]['size']}")
91
+
92
+ return selected_model, None # No API token needed for local models
93
+
94
+
95
+ def _render_remote_model_config(settings: Dict) -> tuple:
96
+ """Render remote API model configuration"""
97
+ default_token = settings.get('hugging_face_api_token', '')
98
+ api_token = st.text_input(
99
+ "Hugging Face API Token",
100
+ value=default_token,
101
+ type="password",
102
+ help="Get your token from https://huggingface.co/settings/tokens or save in settings.json"
103
+ )
104
+
105
+ selected_model = st.selectbox(
106
+ "Select Model",
107
+ options=list(AVAILABLE_MODELS.keys()),
108
+ format_func=lambda x: AVAILABLE_MODELS[x]
109
+ )
110
+
111
+ return selected_model, api_token
112
+
113
+
114
+ def render_input_section() -> Dict[str, Any]:
115
+ """
116
+ Render the input section for video upload and prompts
117
+ Returns input data
118
+ """
119
+ st.header("Input")
120
+
121
+ # Video upload
122
+ video_file = st.file_uploader(
123
+ "Upload Video",
124
+ type=['mp4', 'avi', 'mov', 'mkv'],
125
+ help="Upload a video file to analyze"
126
+ )
127
+
128
+ return {
129
+ "video_file": video_file
130
+ }
131
+
132
+
133
+ def render_prompt_section(config: Dict[str, Any]) -> str:
134
+ """
135
+ Render prompt input section based on model configuration
136
+ """
137
+ model_type = config["model_type"]
138
+ selected_model = config["selected_model"]
139
+
140
+ # Prompt input (conditional based on model)
141
+ if (model_type == "Local Models" and
142
+ selected_model == "Person on Track Detector"):
143
+ # Person on Track Detector works automatically
144
+ st.info("🤖 Person on Track Detector works automatically - no prompt needed!")
145
+ return "automatic"
146
+ else:
147
+ # Regular models need user prompt
148
+ return st.text_area(
149
+ "Analysis Prompt",
150
+ placeholder="Describe what you see in the image...",
151
+ help="Enter the prompt to analyze each frame"
152
+ )
153
+
154
+
155
+ def render_process_button() -> bool:
156
+ """Render the process button"""
157
+ return st.button("Process Video", type="primary")
158
+
159
+
160
+ def render_results_header():
161
+ """Render the results section header"""
162
+ st.header("Results")
163
+ return st.container()
164
+
165
+
166
+ def render_frame_result(result_data: Dict[str, Any]):
167
+ """
168
+ Render a single frame result with ontology analysis
169
+ """
170
+ ontology = result_data['ontology_analysis']
171
+ severity_icon = ontology.get('severity_icon', '✅')
172
+ severity = ontology.get('severity', 'NONE')
173
+
174
+ # Create expander title with severity indicator
175
+ expander_title = f"{severity_icon} {severity} - Frame {result_data['frame_number']} (t={result_data['timestamp']:.1f}s)"
176
+
177
+ with st.expander(expander_title):
178
+ col_img, col_text = st.columns([1, 2])
179
+
180
+ with col_img:
181
+ st.image(
182
+ result_data['image'],
183
+ caption=f"Frame {result_data['frame_number']}",
184
+ use_container_width=True
185
+ )
186
+
187
+ with col_text:
188
+ # Display ontology analysis first if enabled
189
+ if ontology.get('ontology_used', False):
190
+ _render_ontology_analysis(ontology)
191
+ st.divider()
192
+
193
+ # Display original model results
194
+ _render_model_output(result_data['result'])
195
+
196
+
197
+ def _render_ontology_analysis(ontology: Dict[str, Any]):
198
+ """Render ontology analysis section"""
199
+ severity = ontology.get('severity', 'NONE')
200
+ severity_icon = ontology.get('severity_icon', '✅')
201
+ severity_color = ontology.get('severity_color', 'green')
202
+
203
+ # Severity display with color
204
+ st.markdown(f"**Safety Assessment:** :{severity_color}[{severity_icon} {severity}]")
205
+
206
+ # Score display
207
+ if ontology.get('score', 0) > 0:
208
+ st.metric("Risk Score", f"{ontology['score']}/100")
209
+
210
+ # Show explanations if available
211
+ if ontology.get('explanations'):
212
+ st.write("**Ontology Analysis:**")
213
+ for explanation in ontology['explanations']:
214
+ st.write(f"• {explanation}")
215
+
216
+ # Show fired rules if available
217
+ if ontology.get('fired_rules'):
218
+ with st.expander("Technical Details"):
219
+ st.write("**Triggered Rules:**")
220
+ for rule in ontology['fired_rules']:
221
+ st.code(rule)
222
+
223
+ if ontology.get('labels'):
224
+ st.write("**Detected Hazard Labels:**")
225
+ for label in ontology['labels']:
226
+ st.code(label)
227
+
228
+
229
+ def _render_model_output(result: Dict[str, Any]):
230
+ """Render original model output section"""
231
+ st.write("**Model Output:**")
232
+
233
+ if 'error' in result:
234
+ st.error(f"Error: {result['error']}")
235
+ elif 'person_on_track_detection' in result:
236
+ _render_person_detection_result(result['person_on_track_detection'])
237
+ else:
238
+ _render_general_model_result(result)
239
+
240
+
241
+ def _render_person_detection_result(detection: Dict[str, Any]):
242
+ """Render person on track detection specific results"""
243
+ people_count = detection.get('people_count', 0)
244
+ confidence = detection.get('confidence', 0)
245
+ analysis = detection.get('analysis', 'No analysis')
246
+
247
+ st.write(f"**Detection Analysis:** {analysis}")
248
+
249
+ # Show metrics
250
+ col1, col2 = st.columns(2)
251
+ with col1:
252
+ st.metric("👥 People Detected", people_count)
253
+ with col2:
254
+ st.metric("📊 Model Confidence", f"{confidence:.0%}")
255
+
256
+
257
+ def _render_general_model_result(result: Dict[str, Any]):
258
+ """Render general model results (captioning, etc.)"""
259
+ if 'generated_text' in result:
260
+ st.write(f"*{result['generated_text']}*")
261
+ elif isinstance(result, list) and len(result) > 0:
262
+ if 'generated_text' in result[0]:
263
+ st.write(f"*{result[0]['generated_text']}*")
264
+ else:
265
+ st.json(result[0])
266
+ else:
267
+ st.json(result)
268
+
269
+
270
+ def render_validation_errors(video_file, prompt, api_token, model_type, local_models_available, selected_model):
271
+ """
272
+ Render validation error messages
273
+ """
274
+ if not video_file:
275
+ st.error("Please upload a video file")
276
+ if not prompt and not (model_type == "Local Models" and selected_model == "Person on Track Detector"):
277
+ st.error("Please enter an analysis prompt")
278
+ if not api_token and model_type == "Remote API":
279
+ st.error("Please provide your Hugging Face API token for remote models")
280
+ if model_type == "Local Models" and not local_models_available:
281
+ st.error("Local models failed to initialize. Check your installation.")
282
+
283
+
284
+ def render_instructions():
285
+ """Render the instructions section"""
286
+ with st.expander("How to use"):
287
+ st.markdown("""
288
+ ## Local AI Models (Recommended)
289
+ 1. **Upload a video**: Choose a video file (MP4, AVI, MOV, or MKV)
290
+ 2. **Select model type**: Choose "Local Models" for offline processing
291
+ 3. **Choose AI model**:
292
+ - **CNN (BLIP)**: Fast, good for object detection (~1.2GB)
293
+ - **Transformer (ViT-GPT2)**: Detailed descriptions (~1.8GB)
294
+ 4. **Enter a prompt**: Describe what you want the AI to analyze
295
+ 5. **Enable/Disable Ontology**: Toggle ontology-based risk assessment
296
+ 6. **Adjust frame rate**: Set frames per second to extract (default: 1 fps)
297
+ 7. **Click Process**: Frames are processed locally on your machine
298
+
299
+ ## Ontology Analysis
300
+ - **✅ NONE**: No safety concerns detected
301
+ - **🟢 LOW**: Minor safety considerations
302
+ - **🟠 MEDIUM**: Moderate safety risk
303
+ - **⚠️ HIGH**: Significant safety risk
304
+ - **🚨 CRITICAL**: Immediate safety hazard
305
+
306
+ ## Remote API Models (Optional)
307
+ 1. **Get API token**: Visit [Hugging Face Settings](https://huggingface.co/settings/tokens)
308
+ 2. **Select "Remote API"** in model type
309
+ 3. **Enter token** and select remote model
310
+
311
+ ## Video Support Features
312
+ - **Automatic corruption repair**: Handles videos with corrupted moov atoms
313
+ - **FFmpeg integration**: Auto-repairs problematic video files
314
+ - **Multiple formats**: MP4, AVI, MOV, MKV support
315
+
316
+ ## Requirements
317
+ - **Python packages**: torch, transformers, accelerate (see requirements.txt)
318
+ - **Optional**: FFmpeg for video repair (download from https://ffmpeg.org)
319
+ - **Storage**: ~3GB for both local models
320
+ """)
video_processing.py ADDED
@@ -0,0 +1,112 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Video processing utilities for frame extraction and repair
4
+ """
5
+ import cv2
6
+ import os
7
+ import tempfile
8
+ import subprocess
9
+ import streamlit as st
10
+ from PIL import Image
11
+ from typing import List, Dict
12
+
13
+
14
+ def repair_video_with_ffmpeg(input_path: str, output_path: str) -> bool:
15
+ """
16
+ Repair corrupted video by moving moov atom to the beginning
17
+ """
18
+ try:
19
+ # Try to fix the video using FFmpeg
20
+ cmd = [
21
+ 'ffmpeg',
22
+ '-i', input_path,
23
+ '-c', 'copy',
24
+ '-movflags', 'faststart',
25
+ '-avoid_negative_ts', 'make_zero',
26
+ '-y', # Overwrite output file
27
+ output_path
28
+ ]
29
+
30
+ result = subprocess.run(
31
+ cmd,
32
+ capture_output=True,
33
+ text=True,
34
+ timeout=300 # 5 minute timeout
35
+ )
36
+
37
+ return result.returncode == 0
38
+ except (subprocess.TimeoutExpired, FileNotFoundError):
39
+ return False
40
+
41
+
42
+ def extract_frames_from_video(video_file, fps: float = 1) -> List[Dict]:
43
+ """
44
+ Extract frames from video at specified FPS (default 1 frame per second)
45
+ Automatically handles corrupted videos by attempting repair with FFmpeg
46
+ """
47
+ frames = []
48
+
49
+ with tempfile.NamedTemporaryFile(delete=False, suffix='.mp4') as tmp_file:
50
+ tmp_file.write(video_file.read())
51
+ tmp_file_path = tmp_file.name
52
+
53
+ repaired_path = None
54
+
55
+ try:
56
+ # First attempt: try to open video directly
57
+ cap = cv2.VideoCapture(tmp_file_path)
58
+
59
+ # Check if video opened successfully and has frames
60
+ if not cap.isOpened() or cap.get(cv2.CAP_PROP_FRAME_COUNT) == 0:
61
+ cap.release()
62
+
63
+ # Second attempt: try to repair the video with FFmpeg
64
+ st.warning("Video appears corrupted (moov atom issue). Attempting repair...")
65
+
66
+ with tempfile.NamedTemporaryFile(delete=False, suffix='_repaired.mp4') as repaired_file:
67
+ repaired_path = repaired_file.name
68
+
69
+ if repair_video_with_ffmpeg(tmp_file_path, repaired_path):
70
+ st.success("Video repair successful! Processing frames...")
71
+ cap = cv2.VideoCapture(repaired_path)
72
+ else:
73
+ st.error("Failed to repair video. FFmpeg may not be installed or video is severely corrupted.")
74
+ return frames
75
+
76
+ # Extract video properties
77
+ video_fps = cap.get(cv2.CAP_PROP_FPS)
78
+ if video_fps <= 0:
79
+ video_fps = 30 # Default fallback FPS
80
+
81
+ frame_interval = int(video_fps / fps) if video_fps > fps else 1
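+ # e.g. a 30 fps video sampled at the default 1 fps keeps every 30th frame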
82
+
83
+ frame_count = 0
84
+ extracted_count = 0
85
+
86
+ while True:
87
+ ret, frame = cap.read()
88
+ if not ret:
89
+ break
90
+
91
+ if frame_count % frame_interval == 0:
92
+ frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
93
+ pil_image = Image.fromarray(frame_rgb)
94
+ frames.append({
95
+ 'frame': pil_image,
96
+ 'timestamp': frame_count / video_fps,
97
+ 'frame_number': extracted_count
98
+ })
99
+ extracted_count += 1
100
+
101
+ frame_count += 1
102
+
103
+ cap.release()
104
+
105
+ finally:
106
+ # Clean up temporary files
107
+ if os.path.exists(tmp_file_path):
108
+ os.unlink(tmp_file_path)
109
+ if repaired_path and os.path.exists(repaired_path):
110
+ os.unlink(repaired_path)
111
+
112
+ return frames