Bryceeee committed (verified)
Commit 78e8dd4 · Parent(s): 338bd2f

Upload 18 files
README_SPACES.md ADDED
@@ -0,0 +1,130 @@
---
title: CSRC Car Manual RAG System
emoji: 🚗
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
---

# CSRC Car Manual RAG System

An intelligent RAG (Retrieval-Augmented Generation) system for querying car manual documents using OpenAI and vector stores.

## 🚀 Features

- **RAG-based Q&A**: Ask questions about car manual content
- **Vector Store**: Fast and accurate document retrieval
- **Knowledge Graph**: Visualize document relationships
- **Personalized Learning**: Adaptive learning paths (optional)
- **Scenario Contextualization**: Context-aware responses (optional)

## 📋 Setup Instructions

### 1. Clone or Upload to Hugging Face Spaces

- **Option A**: Create a new Space on Hugging Face and upload files
- **Option B**: Connect your GitHub repository to Spaces

### 2. Set Environment Variables (Secrets)

Go to **Settings > Secrets** in your Space and add:

```
OPENAI_API_KEY=your-openai-api-key-here
```

⚠️ **Important**: Never commit API keys to the repository. Always use Spaces Secrets.
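For local development (using `main.py`), the same key is read from the environment, optionally populated from a `.env` file via `python-dotenv` (see `src/config.py`). The lookup logic boils down to this sketch; `require_api_key` is a hypothetical helper, not part of the codebase:

```python
def require_api_key(env) -> str:
    """Return the OpenAI API key from an environment mapping, or fail with a hint."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY not found - set it in Spaces Secrets, "
            "or in a local .env file for development"
        )
    return key

# In the real app the mapping is os.environ, populated either by
# Spaces Secrets or by python-dotenv's load_dotenv().
```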
### 3. Upload PDF Files

Ensure your PDF files are in the `car_manual/` directory:

```
car_manual/
├── Function of Active Distance Assist DISTRONIC.pdf
├── Function of Active Lane Change Assist.pdf
├── Function of Active Steering Assist.pdf
└── Function of Active Stop-and-Go Assist.pdf
```

### 4. Wait for Build

Spaces will automatically:
- Install dependencies from `requirements.txt`
- Run `app.py`
- Start the Gradio interface

## 📁 Project Structure

```
.
├── app.py               # Hugging Face Spaces entry point
├── main.py              # Local development entry point
├── requirements.txt     # Python dependencies
├── src/                 # Core modules
├── modules/             # Feature modules
├── car_manual/          # PDF files directory
├── config/              # Configuration files
└── output/              # Output directory (auto-created)
```

## 🔧 Configuration

### Required

- **OPENAI_API_KEY**: Your OpenAI API key (set in Spaces Secrets)

### Optional

- **PDF Files**: Place in the `car_manual/` directory
- **Vector Store**: Automatically created on first run

## 📖 Usage

1. Wait for the Space to build (check the logs)
2. Open the Gradio interface
3. Enter your question in the input field
4. Get answers with source citations

## 🐛 Troubleshooting

### Error: OPENAI_API_KEY not found

- Go to Settings > Secrets
- Add `OPENAI_API_KEY` with your actual API key
- Restart the Space

### Error: No PDF files found

- Ensure PDF files are in the `car_manual/` directory
- Check file permissions
- Verify file names (case-sensitive)

### Build Fails

- Check the logs for error messages
- Verify `requirements.txt` is correct
- Ensure all Python dependencies are compatible

## 📝 Notes

- The vector store is created automatically on first run
- The vector store ID is saved in `config/vector_store_config.json`
- First initialization may take time (uploading PDFs to OpenAI)
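For reference, the saved config is a small JSON file. Based on the fields written by `Config.save_vector_store_id` in `src/config.py`, its shape is roughly as follows (the `id` and timestamp values here are placeholders):

```
{
  "id": "vs_...",
  "name": "mercedes_manual_store",
  "created_at": "..."
}
```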
## 🔗 Links

- [OpenAI API Keys](https://platform.openai.com/api-keys)
- [Hugging Face Spaces Documentation](https://huggingface.co/docs/hub/spaces)

## 📄 License

MIT License
app.py ADDED
@@ -0,0 +1,224 @@
"""
Hugging Face Spaces Entry Point for CSRC Car Manual RAG System
This is the entry point for Hugging Face Spaces deployment.

Note: For local development, use main.py instead.
"""
import os
import sys
from pathlib import Path

# Detect if running in Hugging Face Spaces
IS_SPACES = os.getenv("SPACE_ID") is not None or os.getenv("HF_SPACE") is not None

# Add the current directory to the Python path for the Spaces environment
sys.path.insert(0, str(Path(__file__).parent))

from openai import OpenAI
from src.config import Config
from src.vector_store import VectorStoreManager
from src.rag_query import RAGQueryEngine
from src.question_generator import QuestionGenerator
from src.knowledge_graph import KnowledgeGraphGenerator
from src.gradio_interface import GradioInterfaceBuilder

# Import personalized learning if available
try:
    from modules.personalized_learning import UserProfilingSystem, LearningPathGenerator, AdaptiveLearningEngine
    PERSONALIZED_LEARNING_AVAILABLE = True
except ImportError:
    PERSONALIZED_LEARNING_AVAILABLE = False
    print("⚠️ Personalized learning modules not available")

# Import proactive learning if available
try:
    from modules.proactive_learning import ProactiveLearningEngine
    PROACTIVE_LEARNING_AVAILABLE = True
except ImportError:
    PROACTIVE_LEARNING_AVAILABLE = False
    print("⚠️ Proactive learning modules not available")

# Import scenario contextualization if available
try:
    from modules.scenario_contextualization.database.scenario_database import ScenarioDatabase
    from modules.scenario_contextualization.integration.feature_extractor import ADASFeatureExtractor
    from modules.scenario_contextualization.retrieval.scenario_retriever import ScenarioRetriever
    from modules.scenario_contextualization.formatting.constructive_formatter import ConstructiveFormatter
    from modules.scenario_contextualization.integration.enhanced_rag_engine import EnhancedRAGEngine
    SCENARIO_CONTEXTUALIZATION_AVAILABLE = True
except ImportError as e:
    SCENARIO_CONTEXTUALIZATION_AVAILABLE = False
    print(f"⚠️ Scenario contextualization modules not available: {e}")


def initialize_system(config: Config) -> dict:
    """Initialize the RAG system components"""
    # Initialize the OpenAI client
    if not config.openai_api_key:
        raise ValueError(
            "OPENAI_API_KEY not found! Please set it in Hugging Face Spaces Secrets. "
            "Go to Settings > Secrets and add OPENAI_API_KEY"
        )

    client = OpenAI(api_key=config.openai_api_key)

    # Initialize the vector store manager
    vector_store_manager = VectorStoreManager(client)

    # Get or create the vector store
    vector_store_id = config.get_vector_store_id()

    if not vector_store_id:
        print("📦 Creating new vector store...")
        pdf_files = config.get_pdf_files()

        if not pdf_files:
            raise ValueError(f"No PDF files found in {config.car_manual_dir}")

        vector_store_details = vector_store_manager.create_vector_store(config.vector_store_name)
        if not vector_store_details:
            raise RuntimeError("Failed to create vector store")

        vector_store_id = vector_store_details["id"]
        config.save_vector_store_id(vector_store_id, config.vector_store_name)

        # Upload files
        upload_stats = vector_store_manager.upload_pdf_files(pdf_files, vector_store_id)
        if upload_stats["successful_uploads"] == 0:
            raise RuntimeError("Failed to upload any files")
    else:
        print(f"✅ Using existing vector store: {vector_store_id}")

    # Initialize the RAG query engine
    rag_engine = RAGQueryEngine(client, vector_store_id, config.model)

    # Initialize the question generator
    question_generator = QuestionGenerator(client, rag_engine)

    # Initialize the knowledge graph generator
    knowledge_graph = KnowledgeGraphGenerator(client, vector_store_id, str(config.output_dir))

    # Initialize personalized learning (if available)
    user_profiling = None
    learning_path_generator = None
    adaptive_engine = None

    if PERSONALIZED_LEARNING_AVAILABLE:
        try:
            user_profiling = UserProfilingSystem()
            learning_path_generator = LearningPathGenerator(user_profiling, config.available_topics)
            adaptive_engine = AdaptiveLearningEngine(user_profiling, learning_path_generator)
            print("✅ Personalized Learning System initialized!")
        except Exception as e:
            print(f"⚠️ Error initializing Personalized Learning System: {e}")

    # Initialize proactive learning (if available)
    proactive_engine = None
    if PROACTIVE_LEARNING_AVAILABLE and user_profiling:
        try:
            proactive_engine = ProactiveLearningEngine(
                client, rag_engine, user_profiling, adaptive_engine, config.available_topics
            )
            print("✅ Proactive Learning Assistance initialized!")
        except Exception as e:
            print(f"⚠️ Error initializing Proactive Learning Assistance: {e}")

    # Initialize scenario contextualization (if available)
    enhanced_rag_engine = None
    if SCENARIO_CONTEXTUALIZATION_AVAILABLE:
        try:
            scenario_database = ScenarioDatabase()
            feature_extractor = ADASFeatureExtractor(use_llm=False, client=client)
            scenario_retriever = ScenarioRetriever(
                scenario_database=scenario_database,
                scenario_vector_store_id=None,
                client=client
            )
            formatter = ConstructiveFormatter()
            enhanced_rag_engine = EnhancedRAGEngine(
                base_rag_engine=rag_engine,
                scenario_retriever=scenario_retriever,
                feature_extractor=feature_extractor,
                formatter=formatter
            )
            print("✅ Scenario Contextualization initialized!")
        except Exception as e:
            print(f"⚠️ Error initializing Scenario Contextualization: {e}")
            import traceback
            traceback.print_exc()

    return {
        "client": client,
        "vector_store_manager": vector_store_manager,
        "rag_engine": rag_engine,
        "question_generator": question_generator,
        "knowledge_graph": knowledge_graph,
        "user_profiling": user_profiling,
        "learning_path_generator": learning_path_generator,
        "adaptive_engine": adaptive_engine,
        "proactive_engine": proactive_engine,
        "enhanced_rag_engine": enhanced_rag_engine,
        "config": config
    }


def create_app():
    """Create and return the Gradio app for Hugging Face Spaces"""
    print("=" * 60)
    print("🚗 CSRC Car Manual RAG System - Hugging Face Spaces")
    print("=" * 60)

    # Load configuration and initialize the system.
    # Config() itself raises if OPENAI_API_KEY is missing, so it must be
    # inside the try block for the error interface to be shown.
    try:
        config = Config()
        components = initialize_system(config)
    except Exception as e:
        print(f"❌ Error initializing system: {e}")
        import gradio as gr

        # Create an error interface
        error_msg = f"""
# ❌ Initialization Error

**Error:** {str(e)}

**Possible solutions:**
1. Check if OPENAI_API_KEY is set in Spaces Secrets (Settings > Secrets)
2. Ensure PDF files are in the `car_manual/` directory
3. Check the logs for more details
"""

        def error_display():
            return error_msg

        error_interface = gr.Interface(
            fn=error_display,
            inputs=None,
            outputs=gr.Markdown(),
            title="CSRC Car Manual RAG System",
            description="An error occurred during initialization. Please check the logs."
        )
        return error_interface

    # Build the Gradio interface
    print("\n🌐 Building Gradio interface...")
    interface_builder = GradioInterfaceBuilder(
        rag_engine=components["rag_engine"],
        question_generator=components["question_generator"],
        knowledge_graph=components["knowledge_graph"],
        config=components["config"],
        user_profiling=components["user_profiling"],
        adaptive_engine=components["adaptive_engine"],
        proactive_engine=components["proactive_engine"]
    )

    demo = interface_builder.create_interface()
    return demo


# Create the app for Hugging Face Spaces
# Spaces will automatically detect Gradio and run this
demo = create_app()
requirements.txt ADDED
@@ -0,0 +1,30 @@
# CSRC Car Manual RAG System - Requirements
# Core dependencies
openai>=1.0.0
PyPDF2>=3.0.0
pandas>=2.0.0
tqdm>=4.65.0

# Visualization
matplotlib>=3.7.0
networkx>=3.0
scikit-learn>=1.3.0

# Web Interface
gradio>=4.0.0

# Evaluation (optional)
ragas>=0.3.0
sentence-transformers>=2.2.0
nltk>=3.8.0

# Utilities
numpy>=1.24.0
python-dotenv>=1.0.0
src/__init__.py ADDED
@@ -0,0 +1,12 @@
"""
CSRC Car Manual RAG System - Modular Package
"""

__version__ = "1.0.0"
src/__pycache__/__init__.cpython-312.pyc ADDED
Binary file (256 Bytes).
src/__pycache__/config.cpython-312.pyc ADDED
Binary file (4.48 kB).
src/__pycache__/gradio_interface.cpython-312.pyc ADDED
Binary file (38.9 kB).
src/__pycache__/knowledge_graph.cpython-312.pyc ADDED
Binary file (17.2 kB).
src/__pycache__/question_generator.cpython-312.pyc ADDED
Binary file (7.84 kB).
src/__pycache__/rag_query.cpython-312.pyc ADDED
Binary file (4.78 kB).
src/__pycache__/vector_store.cpython-312.pyc ADDED
Binary file (6.32 kB).
src/config.py ADDED
@@ -0,0 +1,79 @@
"""
Configuration management for the RAG system
"""
import os
import json
from datetime import datetime
from typing import Optional
from pathlib import Path
from dotenv import load_dotenv

# Load environment variables from a .env file if it exists
load_dotenv()


class Config:
    """Centralized configuration management"""

    def __init__(self):
        self.base_dir = Path(__file__).parent.parent
        self.car_manual_dir = self.base_dir / "car_manual"
        self.output_dir = self.base_dir / "output"
        self.user_data_dir = self.base_dir / "user_data"
        self.config_file = self.base_dir / "config" / "vector_store_config.json"

        # Create necessary directories (including config/, which
        # save_vector_store_id writes into)
        self.car_manual_dir.mkdir(exist_ok=True)
        self.output_dir.mkdir(exist_ok=True)
        self.user_data_dir.mkdir(exist_ok=True)
        self.config_file.parent.mkdir(exist_ok=True)

        # OpenAI settings
        self.openai_api_key = os.getenv("OPENAI_API_KEY")
        if not self.openai_api_key:
            # Provide a helpful error message
            raise ValueError(
                "OPENAI_API_KEY not found!\n\n"
                "Please set your OpenAI API key using one of these methods:\n"
                "1. Create a .env file in the project root with: OPENAI_API_KEY=your-key-here\n"
                "2. Set environment variable: export OPENAI_API_KEY=your-key-here (Linux/Mac) or "
                "$env:OPENAI_API_KEY='your-key-here' (Windows PowerShell)\n"
                "3. Set environment variable: set OPENAI_API_KEY=your-key-here (Windows CMD)\n\n"
                "You can copy .env.example to .env and add your key there."
            )
        self.model = "gpt-4o-mini"
        self.vector_store_name = "mercedes_manual_store"

        # Available topics
        self.available_topics = [
            "Function of Active Distance Assist DISTRONIC",
            "Function of Active Lane Change Assist",
            "Function of Active Steering Assist",
            "Function of Active Stop-and-Go Assist"
        ]

    def get_vector_store_id(self) -> Optional[str]:
        """Load the vector store ID from the config file"""
        if self.config_file.exists():
            try:
                with open(self.config_file, 'r') as f:
                    config = json.load(f)
                return config.get('id')
            except Exception as e:
                print(f"Error loading vector store config: {e}")
        return None

    def save_vector_store_id(self, vector_store_id: str, name: Optional[str] = None):
        """Save the vector store ID to the config file"""
        config = {
            'id': vector_store_id,
            'name': name or self.vector_store_name,
            # Record the creation time directly; stat()-ing the config file
            # here would fail on the first save, before the file exists.
            'created_at': datetime.now().isoformat()
        }
        with open(self.config_file, 'w') as f:
            json.dump(config, f, indent=2)

    def get_pdf_files(self) -> list:
        """Get the list of PDF files in the car_manual directory"""
        pdf_files = list(self.car_manual_dir.glob("*.pdf"))
        return [str(f) for f in pdf_files]
src/evaluation.py ADDED
@@ -0,0 +1,226 @@
"""
RAG Evaluation Module
Comprehensive evaluation system for RAG-based Q&A
"""
import pandas as pd
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import nltk
from typing import Dict, List
from datetime import datetime

# Download required NLTK data
try:
    nltk.download('punkt', quiet=True)
    nltk.download('stopwords', quiet=True)
except Exception:
    pass


class ComprehensiveRAGEvaluator:
    """Comprehensive evaluation system for RAG-based car manual Q&A"""

    def __init__(self, rag_system, client):
        self.rag_system = rag_system
        self.client = client
        self.sentence_model = SentenceTransformer('all-MiniLM-L6-v2')
        self.evaluation_results = {}

    def evaluate_answer_quality(self, question: str, generated_answer: str,
                                expected_answer: str, retrieved_contexts: List[str]) -> Dict:
        """
        Comprehensive answer quality evaluation

        Args:
            question: The question asked
            generated_answer: Answer generated by the RAG system
            expected_answer: Expected correct answer
            retrieved_contexts: Contexts retrieved for the answer

        Returns:
            Dictionary of quality metrics
        """
        metrics = {}

        # 1. Semantic similarity to the expected answer
        gen_embedding = self.sentence_model.encode([generated_answer])
        exp_embedding = self.sentence_model.encode([expected_answer])
        metrics['semantic_similarity'] = cosine_similarity(gen_embedding, exp_embedding)[0][0]

        # 2. Answer relevance to the question
        q_embedding = self.sentence_model.encode([question])
        a_embedding = self.sentence_model.encode([generated_answer])
        metrics['answer_relevance'] = cosine_similarity(q_embedding, a_embedding)[0][0]

        # 3. Faithfulness (grounding in retrieved context)
        metrics['faithfulness'] = self._calculate_faithfulness(generated_answer, retrieved_contexts)

        # 4. Completeness assessment
        metrics['completeness'] = self._assess_completeness(question, generated_answer, expected_answer)

        # 5. Safety appropriateness
        metrics['safety_appropriateness'] = self._check_safety_appropriateness(question, generated_answer)

        # 6. Technical accuracy
        metrics['technical_accuracy'] = self._assess_technical_accuracy(generated_answer, retrieved_contexts)

        # 7. Clarity and actionability
        metrics['clarity'] = self._assess_clarity(generated_answer)
        metrics['actionability'] = self._assess_actionability(question, generated_answer)

        return metrics

    def _calculate_faithfulness(self, answer: str, contexts: List[str]) -> float:
        """Calculate how well the answer is grounded in the retrieved contexts"""
        if not contexts:
            return 0.0

        answer_sentences = nltk.sent_tokenize(answer)
        supported_sentences = 0

        for sentence in answer_sentences:
            sentence_embedding = self.sentence_model.encode([sentence])
            max_similarity = 0

            for context in contexts:
                context_embedding = self.sentence_model.encode([context])
                similarity = cosine_similarity(sentence_embedding, context_embedding)[0][0]
                max_similarity = max(max_similarity, similarity)

            if max_similarity > 0.7:
                supported_sentences += 1

        return supported_sentences / len(answer_sentences) if answer_sentences else 0.0

    def _assess_completeness(self, question: str, generated_answer: str, expected_answer: str) -> float:
        """Assess whether the generated answer covers all aspects of the expected answer"""
        expected_words = set(expected_answer.lower().split())
        generated_words = set(generated_answer.lower().split())

        stop_words = set(['the', 'a', 'an', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by'])
        expected_words -= stop_words
        generated_words -= stop_words

        if not expected_words:
            return 1.0

        overlap = len(expected_words.intersection(generated_words))
        return overlap / len(expected_words)

    def _check_safety_appropriateness(self, question: str, answer: str) -> float:
        """Check whether safety-critical information is handled appropriately"""
        safety_keywords = ['brake', 'airbag', 'emergency', 'warning', 'danger', 'caution', 'safety', 'speed', 'steering']

        question_lower = question.lower()
        answer_lower = answer.lower()

        is_safety_related = any(keyword in question_lower for keyword in safety_keywords)

        if not is_safety_related:
            return 1.0

        safety_indicators = ['warning', 'caution', 'important', 'ensure', 'never', 'always', 'must']
        has_safety_language = any(indicator in answer_lower for indicator in safety_indicators)

        return 1.0 if has_safety_language else 0.5

    def _assess_technical_accuracy(self, answer: str, contexts: List[str]) -> float:
        """Assess technical accuracy based on context alignment"""
        if not contexts:
            return 0.5

        answer_embedding = self.sentence_model.encode([answer])
        context_embeddings = self.sentence_model.encode(contexts)

        similarities = cosine_similarity(answer_embedding, context_embeddings)[0]
        return np.mean(similarities)

    def _assess_clarity(self, answer: str) -> float:
        """Assess the clarity of the answer"""
        sentences = nltk.sent_tokenize(answer)

        if not sentences:
            return 0.0

        avg_sentence_length = np.mean([len(sentence.split()) for sentence in sentences])
        length_score = min(1.0, 15.0 / avg_sentence_length) if avg_sentence_length > 0 else 0.0

        structure_indicators = ['step', 'first', 'second', 'then', 'next', 'finally', '1.', '2.']
        has_structure = any(indicator in answer.lower() for indicator in structure_indicators)
        structure_score = 1.0 if has_structure else 0.7

        return (length_score + structure_score) / 2

    def _assess_actionability(self, question: str, answer: str) -> float:
        """Assess whether the answer provides actionable information"""
        question_lower = question.lower()
        answer_lower = answer.lower()

        if 'how to' in question_lower or 'how do' in question_lower:
            action_indicators = ['press', 'turn', 'select', 'push', 'pull', 'set', 'adjust', 'follow', 'ensure']
            has_actions = any(indicator in answer_lower for indicator in action_indicators)
            return 1.0 if has_actions else 0.3

        return 0.8

    def generate_evaluation_report(self) -> str:
        """Generate a comprehensive evaluation report"""
        if not self.evaluation_results:
            return "No evaluation results available. Run evaluation first."

        df = pd.DataFrame(self.evaluation_results)

        # Overall metrics
        overall_metrics = {
            'semantic_similarity': df['semantic_similarity'].mean(),
            'answer_relevance': df['answer_relevance'].mean(),
            'faithfulness': df['faithfulness'].mean(),
            'completeness': df['completeness'].mean(),
            'safety_appropriateness': df['safety_appropriateness'].mean(),
            'technical_accuracy': df['technical_accuracy'].mean(),
            'clarity': df['clarity'].mean(),
            'actionability': df['actionability'].mean()
        }

        # Performance by question type
        type_performance = df.groupby('question_type')[list(overall_metrics.keys())].mean()

        # Performance by difficulty
        difficulty_performance = df.groupby('difficulty')[list(overall_metrics.keys())].mean()

        # Generate the report
        report = f"""
# RAG System Comprehensive Evaluation Report
Generated on: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}

## Overall Performance Metrics
{'-' * 40}
{'Metric':<25} {'Score':<10} {'Interpretation':<30}
{'-' * 40}
{'Semantic Similarity':<25} {overall_metrics['semantic_similarity']:.3f} {'Answer matches expected content'}
{'Answer Relevance':<25} {overall_metrics['answer_relevance']:.3f} {'Answer addresses the question'}
{'Faithfulness':<25} {overall_metrics['faithfulness']:.3f} {'Answer is grounded in context'}
{'Completeness':<25} {overall_metrics['completeness']:.3f} {'Answer covers all aspects'}
{'Safety Appropriateness':<25} {overall_metrics['safety_appropriateness']:.3f} {'Safety info handled properly'}
{'Technical Accuracy':<25} {overall_metrics['technical_accuracy']:.3f} {'Technically correct information'}
{'Clarity':<25} {overall_metrics['clarity']:.3f} {'Clear and understandable'}
{'Actionability':<25} {overall_metrics['actionability']:.3f} {'Provides actionable guidance'}
{'-' * 40}

## Performance by Question Type
{type_performance.round(3)}

## Performance by Difficulty Level
{difficulty_performance.round(3)}
"""

        return report
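As a quick sanity check, the word-overlap completeness metric from `_assess_completeness` can be reproduced as a standalone function (hypothetical `completeness` helper, same stop-word list as above):

```python
STOP_WORDS = {'the', 'a', 'an', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by'}

def completeness(generated: str, expected: str) -> float:
    """Fraction of (non-stop-word) expected tokens that appear in the generated answer."""
    expected_words = set(expected.lower().split()) - STOP_WORDS
    generated_words = set(generated.lower().split()) - STOP_WORDS
    if not expected_words:
        return 1.0
    return len(expected_words & generated_words) / len(expected_words)
```

A score of 1.0 means every content word of the expected answer appears somewhere in the generated one; word order and phrasing are ignored.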
src/gradio_interface.py ADDED
@@ -0,0 +1,745 @@
1
+ """
2
+ Gradio Interface Module
3
+ Creates the main Gradio web interface for the RAG system
4
+ """
5
+ import gradio as gr
6
+ from typing import Optional
7
+ from .rag_query import RAGQueryEngine
8
+ from .question_generator import QuestionGenerator
9
+ from .knowledge_graph import KnowledgeGraphGenerator
10
+ from .config import Config
11
+
12
+ # Import cold start onboarding functions if available
13
+ try:
14
+ from modules.cold_start_onboarding import check_and_show_onboarding
15
+ COLD_START_AVAILABLE = True
16
+ except ImportError:
17
+ COLD_START_AVAILABLE = False
18
+ def check_and_show_onboarding(user_profiling, user_id):
19
+ """Fallback function if module not available"""
20
+ if not user_profiling:
21
+ return False
22
+ return user_profiling.is_cold_start(user_id)
23
+
24
+
25
+ class GradioInterfaceBuilder:
26
+ """Builds the Gradio interface for the RAG system"""
27
+
28
+ def __init__(self, rag_engine: RAGQueryEngine, question_generator: QuestionGenerator,
29
+ knowledge_graph: KnowledgeGraphGenerator, config: Config,
30
+ user_profiling=None, adaptive_engine=None, proactive_engine=None, enhanced_rag_engine=None):
31
+ self.rag_engine = rag_engine
32
+ self.question_generator = question_generator
33
+ self.knowledge_graph = knowledge_graph
34
+ self.config = config
35
+ self.user_profiling = user_profiling
36
+ self.adaptive_engine = adaptive_engine
37
+ self.proactive_engine = proactive_engine
38
+ self.enhanced_rag_engine = enhanced_rag_engine # Enhanced RAG engine (scenario feature)
39
+
40
+ def create_interface(self):
41
+ """Create the main Gradio interface"""
42
+ with gr.Blocks(title="Mercedes E-class ADAS Manual Interface") as demo:
43
+ gr.Markdown("# πŸš— Mercedes E-class ADAS Manual Interface")
44
+ gr.Markdown("Ask questions, explore knowledge maps, and test your understanding!")
45
+
46
+ # Create tabs with proper order: Setup -> Ask Questions -> Knowledge Map -> Test -> Personalized Learning
47
+ # Ask Questions is set as default selected tab
48
+ with gr.Tabs(selected=1 if self.user_profiling else 0) as tabs:
49
+ # Tab 1: Setup (Cold Start/Onboarding) - only shown if user_profiling is available
50
+ if self.user_profiling:
51
+ with gr.TabItem("Setup"):
52
+ self._create_onboarding_tab()
53
+
54
+ # Tab 2: Ask Questions (Default selected tab)
55
+ with gr.TabItem("Ask Questions"):
56
+ self._create_qa_tab()
57
+
58
+ # Tab 3: Knowledge Map
59
+ with gr.TabItem("Knowledge Map"):
60
+ self._create_knowledge_map_tab()
61
+
62
+ # Tab 4: Test Your Knowledge
63
+ with gr.TabItem("Test Your Knowledge"):
64
+ self._create_test_tab()
65
+
66
+ # Tab 5: Personalized Learning Path (if available)
67
+ if self.adaptive_engine:
68
+ with gr.TabItem("Personalized Learning"):
69
+ self._create_learning_path_tab()
70
+
71
+ return demo
72
+
73
+ def _create_qa_tab(self):
74
+ """Create the Q&A tab"""
75
+ gr.Markdown("Ask questions about your car's advanced driver assistance systems")
76
+
77
+ # User ID input (if user profiling is available)
78
+ user_id_input = None
79
+ if self.user_profiling:
80
+ with gr.Row():
81
+ user_id_input = gr.Textbox(
82
+ label="User ID",
83
+ placeholder="Enter your user ID (e.g., default_user)",
84
+ value="default_user",
85
+ scale=3
86
+ )
87
+ load_suggestions_btn = gr.Button("πŸ’‘ Get Suggestions", variant="secondary", scale=1)
88
+
89
+ # Prompt suggestions area
90
+ suggestions_container = gr.Column(visible=False)
91
+ suggestions_display = None
92
+ refresh_suggestions_btn = None
93
+ cancel_suggestions_btn = None
94
+ regenerate_suggestions_btn = None
95
+ if self.proactive_engine:
96
+ with suggestions_container:
97
+ gr.Markdown("### 💡 Suggested Questions for You:")
98
+ suggestions_display = gr.HTML()
99
+ with gr.Row():
100
+ refresh_suggestions_btn = gr.Button("🔄 Refresh Suggestions", variant="secondary", size="sm")
101
+ cancel_suggestions_btn = gr.Button("⏹️ Stop", variant="stop", size="sm")
102
+ regenerate_suggestions_btn = gr.Button("🔄 Regenerate", variant="secondary", size="sm")
103
+
104
+ with gr.Row():
105
+ query_input = gr.Textbox(
106
+ lines=2,
107
+ placeholder="Enter your question here...",
108
+ label="Your Question"
109
+ )
110
+
111
+ with gr.Row():
112
+ submit_btn = gr.Button("Get Answer", variant="primary")
113
+ cancel_answer_btn = gr.Button("⏹️ Stop", variant="stop")
114
+ regenerate_answer_btn = gr.Button("🔄 Regenerate", variant="secondary")
115
+
116
+ with gr.Column():
117
+ answer_output = gr.Markdown(label="Answer")
118
+ footnotes_output = gr.Markdown(label="Sources")
119
+
120
+ # Scenario contextualization area (collapsible)
121
+ scenarios_container = gr.Column(visible=False)
122
+ with scenarios_container:
123
+ scenarios_header = gr.Markdown("### 🎯 Related Scenarios")
124
+ scenarios_display = gr.HTML()
125
+
126
+ # Follow-up questions area
127
+ followup_container = gr.Column(visible=False)
128
+ cancel_followup_btn = None
129
+ regenerate_followup_btn = None
130
+ if self.proactive_engine:
131
+ with followup_container:
132
+ gr.Markdown("### 💡 Want to learn more? Try these questions:")
133
+ followup_questions_display = gr.HTML()
134
+ with gr.Row():
135
+ cancel_followup_btn = gr.Button("⏹️ Stop", variant="stop", size="sm")
136
+ regenerate_followup_btn = gr.Button("🔄 Regenerate", variant="secondary", size="sm")
137
+ else:
138
+ followup_questions_display = gr.HTML()
139
+
140
+ def process_query(query, user_id="default_user"):
141
+ """Process query and generate follow-up questions"""
142
+ # Use enhanced RAG engine if available, otherwise use standard
143
+ if self.enhanced_rag_engine:
144
+ try:
145
+ enhanced_answer = self.enhanced_rag_engine.query(query, user_id=user_id)
146
+ answer = enhanced_answer.answer
147
+ footnotes = enhanced_answer.sources
148
+ scenarios_html = enhanced_answer.scenarios_html
149
+ show_scenarios = enhanced_answer.scenario_count > 0
150
+ except Exception as e:
151
+ print(f"⚠️ Error in enhanced RAG engine: {e}, falling back to standard")
152
+ answer, footnotes = self.rag_engine.query(query)
153
+ scenarios_html = ""
154
+ show_scenarios = False
155
+ else:
156
+ answer, footnotes = self.rag_engine.query(query)
157
+ scenarios_html = ""
158
+ show_scenarios = False
159
+
160
+ # Update user profile with question
161
+ if self.user_profiling and user_id:
162
+ try:
163
+ self.user_profiling.update_from_question(user_id, query)
164
+ except Exception as e:
165
+ print(f"Error updating user profile: {e}")
166
+
167
+ # Generate follow-up questions
168
+ followup_html = ""
169
+ followup_visible = False
170
+ if self.proactive_engine and user_id:
171
+ try:
172
+ followup_questions = self.proactive_engine.get_follow_up_questions(
173
+ user_id, answer, max_questions=5
174
+ )
175
+ if followup_questions:
176
+ followup_visible = True
177
+ followup_html = "<div style='margin-top: 15px;'>"
178
+ for i, q_data in enumerate(followup_questions, 1):
179
+ question = q_data.get("question", "")
180
+ bloom_level = q_data.get("bloom_level", "")
181
+ # Escape quotes for JavaScript
182
+ question_escaped = question.replace("'", "\\'").replace('"', '\\"')
183
+ followup_html += f"""
184
+ <div style='margin: 10px 0; padding: 12px; background-color: #f5f5f5; border-radius: 5px; border-left: 3px solid #4CAF50; display: flex; justify-content: space-between; align-items: center;'>
185
+ <div style='flex: 1;'>
186
+ <div style='font-weight: 500; margin-bottom: 4px;'>{question}</div>
187
+ <small style='color: #666;'>Bloom Level: {bloom_level.title()}</small>
188
+ </div>
189
+ <button onclick="document.querySelector('textarea[label=\\'Your Question\\']').value='{question_escaped}'; this.style.backgroundColor='#4CAF50'; this.style.color='white';"
190
+ style='margin-left: 15px; padding: 8px 16px; background-color: #2196F3; color: white; border: none; border-radius: 3px; cursor: pointer; white-space: nowrap;'>
191
+ Use
192
+ </button>
193
+ </div>
194
+ """
195
+ followup_html += "</div>"
196
+ except Exception as e:
197
+ print(f"Error generating follow-up questions: {e}")
198
+
199
+ # Prepare return values
200
+ outputs = [answer, footnotes]
201
+
202
+ # Add scenarios output
203
+ if self.enhanced_rag_engine:
204
+ outputs.append(gr.update(visible=show_scenarios))
205
+ outputs.append(scenarios_html if scenarios_html else "")
206
+
207
+ # Add follow-up questions output
208
+ if self.proactive_engine:
209
+ outputs.append(gr.update(visible=followup_visible))
210
+ outputs.append(followup_html)
211
+
212
+ return tuple(outputs)
213
+
214
+ def load_suggestions(user_id="default_user"):
215
+ """Load prompt suggestions"""
216
+ if not self.proactive_engine or not user_id:
217
+ return gr.update(visible=False), ""
218
+
219
+ try:
220
+ suggestions = self.proactive_engine.get_prompt_suggestions(user_id, max_suggestions=5)
221
+ if not suggestions:
222
+ return gr.update(visible=False), ""
223
+
224
+ suggestions_html = "<div style='margin-top: 10px;'>"
225
+ for i, suggestion in enumerate(suggestions, 1):
226
+ question = suggestion.get("question", "")
227
+ reason = suggestion.get("reason", "")
228
+ priority = suggestion.get("priority", "low")
229
+ priority_color = {"high": "#f44336", "medium": "#ff9800", "low": "#4CAF50"}.get(priority, "#666")
230
+
231
+ # Escape quotes for JavaScript
232
+ question_escaped = question.replace("'", "\\'").replace('"', '\\"')
233
+ suggestions_html += f"""
234
+ <div style='margin: 10px 0; padding: 12px; background-color: #f9f9f9; border-radius: 5px; border-left: 4px solid {priority_color};'>
235
+ <div style='display: flex; justify-content: space-between; align-items: start;'>
236
+ <div style='flex: 1;'>
237
+ <strong style='color: #333;'>{i}. {question}</strong>
238
+ <br><small style='color: #666;'>{reason}</small>
239
+ </div>
240
+ <button onclick="document.querySelector('textarea[label=\\'Your Question\\']').value='{question_escaped}'; this.style.backgroundColor='#4CAF50'; this.style.color='white';"
241
+ style='margin-left: 10px; padding: 8px 15px; background-color: #2196F3; color: white; border: none; border-radius: 3px; cursor: pointer; white-space: nowrap;'>
242
+ Use
243
+ </button>
244
+ </div>
245
+ </div>
246
+ """
247
+ suggestions_html += "</div>"
248
+
249
+ return gr.update(visible=True), suggestions_html
250
+ except Exception as e:
251
+ print(f"Error loading suggestions: {e}")
252
+ return gr.update(visible=False), ""
253
+
254
+ # Set up event handlers
255
+ if self.proactive_engine and user_id_input and suggestions_display:
256
+ # Suggestions event handlers
257
+ suggestions_event = load_suggestions_btn.click(
258
+ load_suggestions,
259
+ inputs=[user_id_input],
260
+ outputs=[suggestions_container, suggestions_display]
261
+ )
262
+ if refresh_suggestions_btn:
263
+ refresh_suggestions_btn.click(
264
+ load_suggestions,
265
+ inputs=[user_id_input],
266
+ outputs=[suggestions_container, suggestions_display]
267
+ )
268
+ if regenerate_suggestions_btn:
269
+ regenerate_suggestions_btn.click(
270
+ load_suggestions,
271
+ inputs=[user_id_input],
272
+ outputs=[suggestions_container, suggestions_display]
273
+ )
274
+ if cancel_suggestions_btn:
275
+ cancel_suggestions_btn.click(fn=None, cancels=suggestions_event)
276
+
277
+ # Build outputs list for query
278
+ outputs_list = [answer_output, footnotes_output]
279
+ if self.enhanced_rag_engine:
280
+ outputs_list.extend([scenarios_container, scenarios_display])
281
+ outputs_list.extend([followup_container, followup_questions_display])
282
+
283
+ # Query event handlers
284
+ query_event = submit_btn.click(
285
+ process_query,
286
+ inputs=[query_input, user_id_input],
287
+ outputs=outputs_list
288
+ )
289
+ regenerate_answer_btn.click(
290
+ process_query,
291
+ inputs=[query_input, user_id_input],
292
+ outputs=outputs_list
293
+ )
294
+ if cancel_answer_btn:
295
+ cancel_answer_btn.click(fn=None, cancels=query_event)
296
+
297
+ # Follow-up questions event handlers (regenerate only, cancel is handled by query cancel)
298
+ if self.proactive_engine and regenerate_followup_btn:
299
+ def regenerate_followup(query, user_id, answer_text):
300
+ """Regenerate follow-up questions based on the current answer"""
301
+ if not self.proactive_engine or not user_id or not answer_text:
302
+ return gr.update(visible=False), ""
303
+
304
+ try:
305
+ followup_questions = self.proactive_engine.get_follow_up_questions(
306
+ user_id, answer_text, max_questions=5
307
+ )
308
+ if followup_questions:
309
+ followup_html = "<div style='margin-top: 15px;'>"
310
+ for i, q_data in enumerate(followup_questions, 1):
311
+ question = q_data.get("question", "")
312
+ bloom_level = q_data.get("bloom_level", "")
313
+ question_escaped = question.replace("'", "\\'").replace('"', '\\"')
314
+ followup_html += f"""
315
+ <div style='margin: 10px 0; padding: 12px; background-color: #f5f5f5; border-radius: 5px; border-left: 3px solid #4CAF50; display: flex; justify-content: space-between; align-items: center;'>
316
+ <div style='flex: 1;'>
317
+ <div style='font-weight: 500; margin-bottom: 4px;'>{question}</div>
318
+ <small style='color: #666;'>Bloom Level: {bloom_level.title()}</small>
319
+ </div>
320
+ <button onclick="document.querySelector('textarea[label=\\'Your Question\\']').value='{question_escaped}'; this.style.backgroundColor='#4CAF50'; this.style.color='white';"
321
+ style='margin-left: 15px; padding: 8px 16px; background-color: #2196F3; color: white; border: none; border-radius: 3px; cursor: pointer; white-space: nowrap;'>
322
+ Use
323
+ </button>
324
+ </div>
325
+ """
326
+ followup_html += "</div>"
327
+ return gr.update(visible=True), followup_html
328
+ else:
329
+ return gr.update(visible=False), ""
330
+ except Exception as e:
331
+ print(f"Error regenerating follow-up questions: {e}")
332
+ return gr.update(visible=False), ""
333
+
334
+ followup_event = regenerate_followup_btn.click(
335
+ regenerate_followup,
336
+ inputs=[query_input, user_id_input, answer_output],
337
+ outputs=[followup_container, followup_questions_display]
338
+ )
339
+ if cancel_followup_btn:
340
+ cancel_followup_btn.click(fn=None, cancels=followup_event)
341
+ else:
342
+ # Build outputs list (must match process_query return values)
343
+ outputs_list = [answer_output, footnotes_output]
344
+ if self.enhanced_rag_engine:
345
+ outputs_list.extend([scenarios_container, scenarios_display])
346
+ if self.proactive_engine:
347
+ outputs_list.extend([followup_container, followup_questions_display])
348
+
349
+ query_event = submit_btn.click(
350
+ process_query,
351
+ inputs=[query_input],
352
+ outputs=outputs_list
353
+ )
354
+ regenerate_answer_btn.click(
355
+ process_query,
356
+ inputs=[query_input],
357
+ outputs=outputs_list
358
+ )
359
+ if cancel_answer_btn:
360
+ cancel_answer_btn.click(fn=None, cancels=query_event)
361
+
362
+ def _create_knowledge_map_tab(self):
363
+ """Create the knowledge map tab"""
364
+ gr.Markdown("## 📊 Car Manual Knowledge Map")
365
+ gr.Markdown("This visualization shows how different concepts in the car manual are related.")
366
+
367
+ knowledge_map_img = gr.Image(
368
+ value=str(self.config.output_dir / "knowledge_graph.png"),
369
+ label="Knowledge Graph"
370
+ )
371
+
372
+ gr.Markdown("## 🔥 Document Similarity Heatmap")
373
+ gr.Markdown("This heatmap shows how similar different ADAS features are to each other.")
374
+
375
+ similarity_heatmap_img = gr.Image(
376
+ value=str(self.config.output_dir / "similarity_heatmap.png"),
377
+ label="Similarity Heatmap"
378
+ )
379
+
380
+ with gr.Row():
381
+ refresh_btn = gr.Button("🔄 Refresh Visualizations", variant="secondary")
382
+
383
+ def refresh_images():
384
+ graph_path, heatmap_path = self.knowledge_graph.generate_visualizations()
385
+ return graph_path, heatmap_path
386
+
387
+ refresh_btn.click(
388
+ refresh_images,
389
+ inputs=[],
390
+ outputs=[knowledge_map_img, similarity_heatmap_img]
391
+ )
392
+
393
+ def _create_test_tab(self):
394
+ """Create the test tab"""
395
+ gr.Markdown("## πŸ“ Test Your Understanding of Mercedes E-class ADAS")
396
+ gr.Markdown("Select a topic to test your knowledge with multiple-choice questions based on Bloom's taxonomy levels.")
397
+
398
+ topic_files = self.rag_engine.get_files_from_vector_store()
399
+
400
+ with gr.Row():
401
+ test_questions = gr.State(None)
402
+ current_level_idx = gr.State(0)
403
+ selected_topic = gr.State(None)
404
+ test_results = gr.State([])
405
+
406
+ topic_dropdown = gr.Dropdown(
407
+ label="Select a Topic",
408
+ choices=topic_files,
409
+ value=topic_files[0] if topic_files else None,
410
+ interactive=True
411
+ )
412
+
413
+ start_test_btn = gr.Button("Start Test", variant="primary")
414
+
415
+ # Progress indicator
416
+ with gr.Column(visible=False) as progress_container:
417
+ progress_html = gr.HTML()
418
+
419
+ # Test container
420
+ with gr.Column(visible=False) as test_container:
421
+ taxonomy_level = gr.Markdown("Level: Remember")
422
+ level_description = gr.Markdown()
423
+ question_display = gr.Markdown()
424
+
425
+ option_radio = gr.Radio(
426
+ choices=["A", "B", "C", "D"],
427
+ label="Select your answer",
428
+ interactive=True
429
+ )
430
+
431
+ submit_answer_btn = gr.Button("Submit Answer", variant="primary")
432
+ feedback_display = gr.Markdown(visible=False)
433
+ next_question_btn = gr.Button("Next Question", visible=False)
434
+ show_summary_btn = gr.Button("Show Summary", visible=False)
435
+
436
+ # Summary container
437
+ with gr.Column(visible=False) as summary_container:
438
+ summary_topic = gr.Markdown()
439
+ summary_results = gr.HTML()
440
+ summary_recommendation = gr.Markdown()
441
+ restart_btn = gr.Button("Start Another Test")
442
+
443
+ # TODO: wire up the test-flow event handlers (start_test_btn, submit_answer_btn,
444
+ # next_question_btn, show_summary_btn); placeholder structure only
445
+
446
+ def _create_onboarding_tab(self):
447
+ """Create onboarding tab for cold start"""
448
+ gr.Markdown("## 🎯 Welcome! Let's Get Started")
449
+ gr.Markdown("Complete your profile to get a personalized learning experience.")
450
+
451
+ if not self.user_profiling:
452
+ gr.Markdown("⚠️ Personalized Learning System not initialized.")
453
+ return
454
+
455
+ user_id_input = gr.Textbox(
456
+ label="User ID",
457
+ placeholder="Enter your user ID",
458
+ value="default_user"
459
+ )
460
+
461
+ with gr.Accordion("📋 Step 1: Background Information", open=True):
462
+ background_input = gr.Radio(
463
+ label="What's your experience with ADAS systems?",
464
+ choices=[
465
+ ("Beginner - I'm new to ADAS systems", "beginner"),
466
+ ("Intermediate - I know some basics", "intermediate"),
467
+ ("Experienced - I have good knowledge", "experienced")
468
+ ],
469
+ value="beginner"
470
+ )
471
+
472
+ with gr.Accordion("🎨 Step 2: Learning Preferences", open=True):
473
+ learning_style_input = gr.Radio(
474
+ label="How do you prefer to learn?",
475
+ choices=[
476
+ ("Visual - I like diagrams and illustrations", "visual"),
477
+ ("Textual - I prefer reading and explanations", "textual"),
478
+ ("Practical - I learn by doing", "practical"),
479
+ ("Mixed - I like a combination", "mixed")
480
+ ],
481
+ value="mixed"
482
+ )
483
+
484
+ learning_pace_input = gr.Radio(
485
+ label="What's your preferred learning pace?",
486
+ choices=[
487
+ ("Slow - I like to take my time", "slow"),
488
+ ("Medium - Normal pace is fine", "medium"),
489
+ ("Fast - I want to learn quickly", "fast")
490
+ ],
491
+ value="medium"
492
+ )
493
+
494
+ with gr.Accordion("🎯 Step 3: Learning Goals", open=True):
495
+ learning_goals_input = gr.CheckboxGroup(
496
+ label="What are your learning goals?",
497
+ choices=[
498
+ "Understand basic ADAS functions",
499
+ "Learn how to operate ADAS features",
500
+ "Master advanced ADAS capabilities",
501
+ "Troubleshoot ADAS issues",
502
+ "Prepare for certification",
503
+ "General knowledge improvement"
504
+ ],
505
+ value=["Understand basic ADAS functions"]
506
+ )
507
+
508
+ with gr.Accordion("📊 Step 4: Initial Knowledge Assessment", open=True):
509
+ gr.Markdown("Rate your familiarity with each topic (0 = No knowledge, 1 = Expert)")
510
+
511
+ knowledge_sliders = {}
512
+ for topic in self.config.available_topics:
513
+ display_name = topic.replace("Function of ", "").replace(" Assist", "")
514
+ knowledge_sliders[topic] = gr.Slider(
515
+ label=display_name,
516
+ minimum=0.0,
517
+ maximum=1.0,
518
+ value=0.0,
519
+ step=0.1
520
+ )
521
+
522
+ submit_btn = gr.Button("Complete Setup", variant="primary")
523
+ output_result = gr.JSON(label="Setup Result")
524
+
525
+ def submit_onboarding(user_id, background, learning_style, learning_pace,
526
+ learning_goals, *knowledge_values):
527
+ """Submit cold start data"""
528
+ if not self.user_profiling:
529
+ return {"status": "error", "message": "System not initialized"}
530
+
531
+ # Convert knowledge_values tuple to dictionary
532
+ knowledge_survey = {}
533
+ for i, topic in enumerate(self.config.available_topics):
534
+ if i < len(knowledge_values):
535
+ knowledge_survey[topic] = knowledge_values[i]
536
+ else:
537
+ knowledge_survey[topic] = 0.0
538
+
539
+ # Handle tuple values from Radio components
540
+ if isinstance(background, tuple):
541
+ background = background[1] if len(background) > 1 else background[0]
542
+ if isinstance(learning_style, tuple):
543
+ learning_style = learning_style[1] if len(learning_style) > 1 else learning_style[0]
544
+ if isinstance(learning_pace, tuple):
545
+ learning_pace = learning_pace[1] if len(learning_pace) > 1 else learning_pace[0]
546
+
547
+ onboarding_data = {
548
+ 'learning_style': learning_style,
549
+ 'learning_pace': learning_pace,
550
+ 'background_experience': background,
551
+ 'learning_goals': learning_goals if learning_goals else [],
552
+ 'initial_knowledge_survey': knowledge_survey,
553
+ 'initial_assessment_completed': True
554
+ }
555
+
556
+ try:
557
+ self.user_profiling.complete_onboarding(user_id, onboarding_data)
558
+ return {
559
+ "status": "success",
560
+ "message": f"Onboarding completed for {user_id}",
561
+ "profile_summary": self.user_profiling.get_profile_summary(user_id)
562
+ }
563
+ except Exception as e:
564
+ import traceback
565
+ error_details = traceback.format_exc()
566
+ return {"status": "error", "message": f"Error: {str(e)}\nDetails: {error_details}"}
567
+
568
+ inputs = [user_id_input, background_input, learning_style_input,
569
+ learning_pace_input, learning_goals_input] + list(knowledge_sliders.values())
570
+ submit_btn.click(submit_onboarding, inputs=inputs, outputs=output_result)
571
+
572
+ def _create_learning_path_tab(self):
573
+ """Create personalized learning path tab"""
574
+ gr.Markdown("## 🗺️ Your Personalized Learning Journey")
575
+ gr.Markdown("Get a customized learning path based on your knowledge profile.")
576
+
577
+ if not self.adaptive_engine or not self.user_profiling:
578
+ gr.Markdown("⚠️ Personalized Learning System not initialized.")
579
+ return
580
+
581
+ # User ID input
582
+ with gr.Row():
583
+ user_id_input = gr.Textbox(
584
+ label="User ID",
585
+ placeholder="Enter your user ID",
586
+ value="default_user"
587
+ )
588
+ load_profile_btn = gr.Button("Load My Profile", variant="primary")
589
+
590
+ # User profile display
591
+ with gr.Column(visible=False) as profile_container:
592
+ profile_summary = gr.Markdown()
593
+
594
+ with gr.Row():
595
+ with gr.Column():
596
+ gr.Markdown("### 📊 Knowledge Profile")
597
+ knowledge_level_display = gr.JSON()
598
+
599
+ with gr.Column():
600
+ gr.Markdown("### 📈 Learning Statistics")
601
+ learning_stats = gr.JSON()
602
+
603
+ # Learning path generation
604
+ with gr.Row():
605
+ focus_areas_input = gr.CheckboxGroup(
606
+ label="Focus Areas (Optional)",
607
+ choices=self.config.available_topics,
608
+ value=[],
609
+ interactive=True
610
+ )
611
+ generate_path_btn = gr.Button("Generate Learning Path", variant="primary")
612
+
613
+ # Learning path display
614
+ with gr.Column(visible=False) as path_container:
615
+ gr.Markdown("### 🗺️ Your Learning Path")
616
+ path_progress = gr.HTML()
617
+ path_visualization = gr.HTML()
618
+
619
+ with gr.Row():
620
+ with gr.Column():
621
+ current_node_info = gr.Markdown()
622
+ recommendations_display = gr.JSON()
623
+
624
+ def check_and_show_onboarding_wrapper(user_id):
625
+ """Check if user needs onboarding"""
626
+ if not user_id:
627
+ return False
628
+ return check_and_show_onboarding(self.user_profiling, user_id)
629
+
630
+ def load_user_profile(user_id):
631
+ """Load the user profile"""
632
+ if not self.user_profiling or not user_id:
633
+ return (gr.update(visible=False), "", {}, {}, [])
634
+
635
+ if check_and_show_onboarding_wrapper(user_id):
636
+ return (
637
+ gr.update(visible=False),
638
+ "## ⚠️ Onboarding Required\n\nPlease complete onboarding first.",
639
+ {},
640
+ {},
641
+ self.config.available_topics
642
+ )
643
+
644
+ profile = self.user_profiling.get_or_create_profile(user_id)
645
+ summary = self.user_profiling.get_profile_summary(user_id)
646
+
647
+ summary_text = f"""
648
+ ### 👤 User Profile: {user_id}
649
+
650
+ **Learning Style:** {summary['learning_style'].title()}
651
+ **Learning Pace:** {summary['learning_pace'].title()}
652
+ **Overall Progress:** {summary['overall_progress']:.1%}
653
+ **Total Questions:** {summary['total_questions']}
654
+ **Total Tests:** {summary['total_tests']}
655
+
656
+ **Strong Areas:** {', '.join(summary['strong_areas']) if summary['strong_areas'] else 'None'}
657
+ **Weak Areas:** {', '.join(summary['weak_areas']) if summary['weak_areas'] else 'None'}
658
+ """
659
+
660
+ knowledge_data = summary['knowledge_level'] or {"No topics learned yet": 0.0}
661
+ stats_data = {
662
+ "Total Questions": summary['total_questions'],
663
+ "Total Tests": summary['total_tests'],
664
+ "Preferred Topics": summary['preferred_topics'][:5] if summary['preferred_topics'] else [],
665
+ "Overall Progress": f"{summary['overall_progress']:.1%}"
666
+ }
667
+
668
+ return (
669
+ gr.update(visible=True),
670
+ summary_text,
671
+ knowledge_data,
672
+ stats_data,
673
+ self.config.available_topics
674
+ )
675
+
676
+ def generate_learning_path(user_id, focus_areas):
677
+ """Generate learning paths"""
678
+ if not self.adaptive_engine or not user_id:
679
+ return (gr.update(visible=False), "⚠️ System not initialized.", "", "", {})
680
+
681
+ if check_and_show_onboarding_wrapper(user_id):
682
+ return (gr.update(visible=False), "⚠️ Please complete onboarding first.", "", "", {})
683
+
684
+ path = self.adaptive_engine.create_or_update_path(user_id, focus_areas if focus_areas else None)
685
+
686
+ progress_html = f"""
687
+ <div style="width:100%; background-color:#f0f0f0; border-radius:5px; overflow:hidden; margin:20px 0;">
688
+ <div style="width:{path.completion_percentage*100}%; background-color:#4CAF50; height:30px; border-radius:5px; display:flex; align-items:center; justify-content:center; color:white; font-weight:bold;">
689
+ {path.completion_percentage*100:.1f}% Complete
690
+ </div>
691
+ </div>
692
+ <p><strong>Total Nodes:</strong> {len(path.nodes)} | <strong>Completed:</strong> {sum(1 for n in path.nodes if n.status == 'completed')} | <strong>Estimated Time:</strong> {path.estimated_total_time} minutes</p>
693
+ """
694
+
695
+ path_html = "<div style='margin:20px 0;'><h4>Learning Path:</h4><div style='display:flex; flex-direction:column; gap:10px;'>"
696
+ for i, node in enumerate(path.nodes):
697
+ status_color = {"completed": "#4CAF50", "in_progress": "#2196F3", "pending": "#9E9E9E", "skipped": "#FF9800"}.get(node.status, "#9E9E9E")
698
+ is_current = i == path.current_node_index
699
+ highlight = "border: 3px solid #FF5722; padding: 10px;" if is_current else "padding: 10px;"
700
+ path_html += f"""
701
+ <div style='{highlight} background-color:white; border-left: 5px solid {status_color}; border-radius:5px; margin:5px 0;'>
702
+ <div style='display:flex; justify-content:space-between; align-items:center;'>
703
+ <div><strong>{node.topic}</strong> - {node.bloom_level.title()} ({node.content_type})<br><small>Difficulty: {node.difficulty:.2f} | Time: {node.estimated_time} min</small></div>
704
+ <div style='color:{status_color}; font-weight:bold;'>{node.status.title()}</div>
705
+ </div>
706
+ </div>
707
+ """
708
+ path_html += "</div></div>"
709
+
710
+ if path.current_node_index < len(path.nodes):
711
+ current_node = path.nodes[path.current_node_index]
712
+ current_node_info_text = f"""
713
+ ### Current Learning Node
714
+
715
+ **Topic:** {current_node.topic}
716
+ **Bloom Level:** {current_node.bloom_level.title()}
717
+ **Content Type:** {current_node.content_type.title()}
718
+ **Difficulty:** {current_node.difficulty:.2f}
719
+ **Estimated Time:** {current_node.estimated_time} minutes
720
+ """
721
+ else:
722
+ current_node_info_text = "### Learning Path Complete! 🎉"
723
+
724
+ recommendations = self.adaptive_engine.get_recommendations(user_id)
725
+
726
+ return (
727
+ gr.update(visible=True),
728
+ progress_html,
729
+ path_html,
730
+ current_node_info_text,
731
+ recommendations
732
+ )
733
+
734
+ load_profile_btn.click(
735
+ load_user_profile,
736
+ inputs=[user_id_input],
737
+ outputs=[profile_container, profile_summary, knowledge_level_display, learning_stats, focus_areas_input]
738
+ )
739
+
740
+ generate_path_btn.click(
741
+ generate_learning_path,
742
+ inputs=[user_id_input, focus_areas_input],
743
+ outputs=[path_container, path_progress, path_visualization, current_node_info, recommendations_display]
744
+ )
745
+
src/knowledge_graph.py ADDED
@@ -0,0 +1,323 @@
1
+ """
2
+ Knowledge Graph Visualization Module
3
+ Creates knowledge maps and similarity heatmaps from document relationships
4
+ """
5
+ import matplotlib.pyplot as plt
6
+ import networkx as nx
7
+ import numpy as np
8
+ import os
9
+ import re
10
+ import json
11
+ from typing import Tuple, Optional, Dict, List
12
+ from openai import OpenAI
13
+ from pathlib import Path
14
+
15
+
16
+ class KnowledgeGraphGenerator:
17
+ """Generates knowledge graphs and visualizations"""
18
+
19
+ def __init__(self, client: OpenAI, vector_store_id: str, output_dir: str = "output"):
20
+ self.client = client
21
+ self.vector_store_id = vector_store_id
22
+ self.output_dir = Path(output_dir)
23
+ self.output_dir.mkdir(exist_ok=True)
24
+
25
+ def get_files_from_vector_store(self) -> List[str]:
26
+ """Get list of files from vector store"""
27
+ try:
28
+ query = "List all documents in the manual"
29
+ response = self.client.responses.create(
30
+ input=query,
31
+ model="gpt-4o-mini",
32
+ tools=[{
33
+ "type": "file_search",
34
+ "vector_store_ids": [self.vector_store_id],
35
+ "max_num_results": 25
36
+ }]
37
+ )
38
+
39
+ file_list = []
40
+ if response and hasattr(response.output[1].content[0], 'annotations'):
41
+ annotations = response.output[1].content[0].annotations
42
+ file_list = list(set([annotation.filename for annotation in annotations]))
43
+ file_list = [f.replace('.pdf', '') for f in file_list]
44
+ file_list.sort()
45
+
46
+ return file_list
47
+ except Exception as e:
48
+ print(f"❌ Error getting files: {str(e)}")
49
+ return []
50
+
51
+ def extract_topics_from_content(self, file_list: List[str]) -> Tuple[Dict[str, List[str]], List[str]]:
52
+ """Extract topics from document content using GPT"""
53
+ all_topics = set()
54
+ file_topics = {}
55
+ file_descriptions = {}
56
+
57
+ print("📖 Getting content descriptions for each file...")
58
+
59
+ # Get descriptions for each file
60
+ for file in file_list:
61
+ try:
62
+ query = f"What is the main purpose and key concepts covered in the document titled '{file}'? Be brief and focused on technical concepts."
63
+ response = self.client.responses.create(
64
+ input=query,
65
+ model="gpt-4o-mini",
66
+ tools=[{
67
+ "type": "file_search",
68
+ "vector_store_ids": [self.vector_store_id]
69
+ }]
70
+ )
71
+
72
+ if response and hasattr(response.output[1], 'content'):
73
+ description = response.output[1].content[0].text
74
+ file_descriptions[file] = description
75
+ print(f" ✓ Got description for {file}")
76
+ else:
77
+ file_descriptions[file] = f"Information about {file}"
78
+ except Exception as e:
79
+ print(f" ⚠️ Error getting description for {file}: {e}")
80
+ file_descriptions[file] = f"Information about {file}"
81
+
82
+ # Extract topics from descriptions
83
+ prompt = "Extract key technical concepts (single words or short phrases) from these document descriptions. Focus on functional concepts, components, and technologies.\n\n"
84
+
85
+ for file, desc in file_descriptions.items():
86
+ prompt += f"Document: {file}\nDescription: {desc}\n\n"
87
+
88
+ prompt += "\nFor each document, list 3-5 key technical concepts. Format as a JSON object where keys are document names and values are arrays of concepts."
89
+
90
+ try:
91
+ response = self.client.chat.completions.create(
92
+ model="gpt-4o",
93
+ messages=[
94
+ {"role": "system", "content": "You extract key technical concepts from document descriptions in a structured way."},
95
+ {"role": "user", "content": prompt}
96
+ ],
97
+ temperature=0.3
98
+ )
99
+
100
+ topics_text = response.choices[0].message.content
101
+ json_match = re.search(r'\{.*\}', topics_text, re.DOTALL)
102
+
103
+ if json_match:
104
+ try:
105
+ file_topics = json.loads(json_match.group(0))
106
+ for topics in file_topics.values():
107
+ all_topics.update(topics)
108
+ print(f"✅ Successfully extracted topics for {len(file_topics)} documents")
109
+ except json.JSONDecodeError:
110
+ print("⚠️ Error parsing JSON response, using fallback")
111
+ file_topics = self._create_fallback_topics(file_list)
112
+ else:
113
+ file_topics = self._create_fallback_topics(file_list)
114
+ except Exception as e:
115
+ print(f"⚠️ Error extracting topics: {e}, using fallback")
116
+ file_topics = self._create_fallback_topics(file_list)
117
+
118
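The brace-matching fallback above (regex search, then `json.loads`) can be exercised on its own. A minimal sketch, where the model reply text and document name are invented:

```python
import json
import re

# Invented model reply: JSON wrapped in conversational prose, the case the
# fallback in extract_topics_from_content anticipates.
reply = 'Sure! Here are the concepts:\n{"Doc A": ["radar", "braking"]}\nHope that helps.'

# Grab the outermost {...} span; re.DOTALL lets '.' cross newlines.
match = re.search(r'\{.*\}', reply, re.DOTALL)
topics = json.loads(match.group(0)) if match else {}
```

Note that the greedy `.*` spans from the first `{` to the last `}`, so a reply containing several separate JSON snippets would still need stricter parsing.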
+ # Ensure all files have topics
119
+ for file in file_list:
120
+ if file not in file_topics or not file_topics[file]:
121
+ words = [word for word in re.findall(r'\b[A-Za-z]{3,}\b', file)
122
+ if word.lower() not in ['the', 'and', 'for', 'with', 'function', 'of']]
123
+ file_topics[file] = words if words else ["Topic"]
124
+
125
+ return file_topics, list(all_topics)
126
+
127
+ def _create_fallback_topics(self, file_list: List[str]) -> Dict[str, List[str]]:
128
+ """Create fallback topics from filenames"""
129
+ file_topics = {}
130
+ for file in file_list:
131
+ words = [word for word in re.findall(r'\b[A-Za-z]{3,}\b', file)
132
+ if word.lower() not in ['the', 'and', 'for', 'with', 'function', 'of']]
133
+ file_topics[file] = words if words else ["Topic"]
134
+ return file_topics
135
+
136
+ def analyze_document_relationships(self, file_list: List[str],
137
+ file_topics: Dict[str, List[str]]) -> np.ndarray:
138
+ """Analyze relationships between documents based on topics"""
139
+ n = len(file_list)
140
+ similarity_matrix = np.zeros((n, n))
141
+
142
+ # Create topic vectors
143
+ all_topics = set()
144
+ for topics in file_topics.values():
145
+ all_topics.update(topics)
146
+ topic_list = list(all_topics)
147
+
148
+ # Create binary vectors for each document
149
+ topic_vectors = {}
150
+ for file in file_list:
151
+ vector = np.zeros(len(topic_list))
152
+ for i, topic in enumerate(topic_list):
153
+ if topic in file_topics.get(file, []):
154
+ vector[i] = 1
155
+ topic_vectors[file] = vector
156
+
157
+ # Calculate cosine similarity
158
+ for i, file1 in enumerate(file_list):
159
+ for j, file2 in enumerate(file_list):
160
+ if i == j:
161
+ similarity_matrix[i][j] = 1.0
162
+ else:
163
+ vec1 = topic_vectors.get(file1, np.zeros(len(topic_list)))
164
+ vec2 = topic_vectors.get(file2, np.zeros(len(topic_list)))
165
+
166
+ dot_product = np.dot(vec1, vec2)
167
+ norm1 = np.linalg.norm(vec1)
168
+ norm2 = np.linalg.norm(vec2)
169
+
170
+ if norm1 > 0 and norm2 > 0:
171
+ similarity_matrix[i][j] = dot_product / (norm1 * norm2)
172
+
173
+ return similarity_matrix
174
+
175
+ def create_knowledge_graph(self, file_list: List[str], file_topics: Dict[str, List[str]],
176
+ similarity_matrix: np.ndarray) -> nx.Graph:
177
+ """Create knowledge graph from documents and topics"""
178
+ G = nx.Graph()
179
+
180
+ # Add document nodes
181
+ for file in file_list:
182
+ G.add_node(file, type='document', size=700)
183
+
184
+ # Add topic nodes and connections
185
+ for file, topics in file_topics.items():
186
+ for topic in topics:
187
+ if topic not in G:
188
+ G.add_node(topic, type='topic', size=500)
189
+ G.add_edge(file, topic, weight=3)
190
+
191
+ # Add edges between similar documents
192
+ for i, file1 in enumerate(file_list):
193
+ for j, file2 in enumerate(file_list):
194
+ if i < j:
195
+ sim = similarity_matrix[i][j]
196
+ if sim > 0.25:
197
+ G.add_edge(file1, file2, weight=sim * 5)
198
+
199
+ return G
200
+
201
+ def save_knowledge_graph(self, G: nx.Graph) -> str:
202
+ """Save knowledge graph visualization"""
203
+ plt.figure(figsize=(16, 12))
204
+
205
+ pos = nx.kamada_kawai_layout(G)
206
+
207
+ document_nodes = [n for n, attr in G.nodes(data=True) if attr.get('type') == 'document']
208
+ topic_nodes = [n for n, attr in G.nodes(data=True) if attr.get('type') == 'topic']
209
+
210
+ edge_widths = [G[u][v].get('weight', 1) * 0.6 for u, v in G.edges()]
211
+
212
+ nx.draw_networkx_nodes(G, pos, nodelist=document_nodes, node_color='#5B9BD5',
213
+ node_size=800, alpha=0.8)
214
+ nx.draw_networkx_nodes(G, pos, nodelist=topic_nodes, node_color='#70AD47',
215
+ node_size=600, alpha=0.8)
216
+ nx.draw_networkx_edges(G, pos, width=edge_widths, alpha=0.7, edge_color='#A5A5A5')
217
+
218
+ # Create labels
219
+ doc_labels = {}
220
+ for node in document_nodes:
221
+ if len(node) > 20:
222
+ shortened = re.sub(r'(?:Function|Operating|Setting|Activating|Deactivating) of ', '', node)
223
+ shortened = re.sub(r' Assist', '', shortened)
224
+ if len(shortened) > 20:
225
+ shortened = shortened[:18] + '...'
226
+ doc_labels[node] = shortened
227
+ else:
228
+ doc_labels[node] = node
229
+
230
+ # Draw labels
231
+ for node, label in doc_labels.items():
232
+ x, y = pos[node]
233
+ plt.text(x, y, label, fontsize=9, fontweight='bold',
234
+ ha='center', va='center',
235
+ bbox=dict(facecolor='white', alpha=0.7, edgecolor='none', boxstyle='round,pad=0.3'))
236
+
237
+ for node in topic_nodes:
238
+ x, y = pos[node]
239
+ plt.text(x, y, node, fontsize=8, ha='center', va='center',
240
+ bbox=dict(facecolor='#E8F4E5', alpha=0.9, edgecolor='none', boxstyle='round,pad=0.2'))
241
+
242
+ plt.title("System Knowledge Map", fontsize=18)
243
+ plt.axis('off')
244
+ plt.tight_layout()
245
+
246
+ output_path = self.output_dir / "knowledge_graph.png"
247
+ plt.savefig(output_path, dpi=300, bbox_inches='tight')
248
+ plt.close()
249
+
250
+ print(f"βœ… Knowledge graph saved to {output_path}")
251
+ return str(output_path)
252
+
253
+ def save_similarity_heatmap(self, matrix: np.ndarray, labels: List[str]) -> str:
254
+ """Save similarity heatmap"""
255
+ plt.figure(figsize=(12, 10))
256
+
257
+ plt.imshow(matrix, cmap='Blues')
258
+ plt.colorbar(label='Similarity')
259
+
260
+ # Shorten labels
261
+ shortened_labels = []
262
+ for label in labels:
263
+ if len(label) > 15:
264
+ shortened = re.sub(r'(?:Function|Operating|Setting|Activating|Deactivating) of ', '', label)
265
+ shortened = re.sub(r' Assist', '', shortened)
266
+ if len(shortened) > 15:
267
+ shortened = shortened[:13] + '...'
268
+ shortened_labels.append(shortened)
269
+ else:
270
+ shortened_labels.append(label)
271
+
272
+ plt.xticks(range(len(labels)), shortened_labels, rotation=45, ha='right')
273
+ plt.yticks(range(len(labels)), shortened_labels)
274
+
275
+ # Add similarity values
276
+ for i in range(len(labels)):
277
+ for j in range(len(labels)):
278
+ if i != j:
279
+ plt.text(j, i, f'{matrix[i, j]:.2f}',
280
+ ha="center", va="center",
281
+ color="white" if matrix[i, j] > 0.5 else "black")
282
+
283
+ plt.title("Document Similarity Heatmap", fontsize=16)
284
+ plt.tight_layout()
285
+
286
+ output_path = self.output_dir / "similarity_heatmap.png"
287
+ plt.savefig(output_path, dpi=300, bbox_inches='tight')
288
+ plt.close()
289
+
290
+ print(f"βœ… Similarity heatmap saved to {output_path}")
291
+ return str(output_path)
292
+
293
+ def generate_visualizations(self) -> Tuple[Optional[str], Optional[str]]:
294
+ """Generate both knowledge graph and heatmap visualizations"""
295
+ print("πŸ”„ Generating knowledge graph visualizations...")
296
+
297
+ file_list = self.get_files_from_vector_store()
298
+ if not file_list:
299
+ print("⚠️ No files found. Cannot create knowledge map.")
300
+ return None, None
301
+
302
+ print("πŸ“Š Extracting topics from content...")
303
+ file_topics, all_topics = self.extract_topics_from_content(file_list)
304
+
305
+ print("πŸ”— Analyzing document relationships...")
306
+ similarity_matrix = self.analyze_document_relationships(file_list, file_topics)
307
+
308
+ print("🎨 Creating knowledge graph...")
309
+ G = self.create_knowledge_graph(file_list, file_topics, similarity_matrix)
310
+
311
+ print("πŸ’Ύ Saving visualizations...")
312
+ graph_path = self.save_knowledge_graph(G)
313
+ heatmap_path = self.save_similarity_heatmap(similarity_matrix, file_list)
314
+
315
+ print("βœ… Dynamic visualizations complete!")
316
+ return graph_path, heatmap_path
317
+
318
+
319
+
320
+
321
+
322
+
323
+
src/question_generator.py ADDED
@@ -0,0 +1,161 @@
"""
Question Generation Module
Generates multiple-choice questions based on Bloom's taxonomy
"""
import json
from typing import Dict, List
from openai import OpenAI


class QuestionGenerator:
    """Generates educational questions based on Bloom's taxonomy"""

    def __init__(self, client: OpenAI, rag_query_engine):
        self.client = client
        self.rag_query_engine = rag_query_engine

        self.blooms_levels = {
            "remember": "generate questions that test basic recall of facts and information",
            "understand": "generate questions that test explanation and interpretation of concepts",
            "apply": "generate questions that test application of knowledge in practical situations",
            "analyze": "generate questions that test analysis of relationships and structure",
            "evaluate": "generate questions that test evaluation and judgment based on criteria",
            "create": "generate questions that test creation of new ideas or solutions"
        }

    def generate_questions(self, topic_file: str) -> Dict[str, Dict]:
        """
        Generate multiple-choice questions for all Bloom's taxonomy levels

        Args:
            topic_file: Name of the topic file (PDF)

        Returns:
            Dictionary mapping Bloom's level to question data
        """
        topic_clean = topic_file.replace('.pdf', '')

        # Retrieve content about this topic
        file_content_query = f"What are the key points covered in the document '{topic_clean}'?"
        content_response, _ = self.rag_query_engine.query(file_content_query)

        # Build the prompt
        prompt = self._build_question_prompt(topic_clean, content_response)

        try:
            response = self.client.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": "You are an expert in creating educational assessment materials for automotive systems."},
                    {"role": "user", "content": prompt}
                ],
                temperature=0.4,
                response_format={"type": "json_object"}
            )

            response_text = response.choices[0].message.content

            try:
                questions_data = json.loads(response_text)
                return self._validate_and_format_questions(questions_data, topic_clean)
            except json.JSONDecodeError as e:
                print(f"⚠️ Error parsing JSON: {e}")
                return self._create_fallback_questions(topic_clean)
        except Exception as e:
            print(f"❌ Error generating questions: {e}")
            return self._create_fallback_questions(topic_clean)

    def _build_question_prompt(self, topic_clean: str, content_response: str) -> str:
        """Build the prompt for question generation"""
        prompt = f"""You are a tester writing multiple-choice questions for system users based on the input car manuals.
Make the questions neither tricky nor too easy; the users' understanding of the system is your top priority.

Create 1 multiple-choice question based on the manual file about '{topic_clean}' for each of the following levels of Bloom's taxonomy:
- Remember: {self.blooms_levels['remember']}
- Understand: {self.blooms_levels['understand']}
- Apply: {self.blooms_levels['apply']}
- Analyze: {self.blooms_levels['analyze']}
- Evaluate: {self.blooms_levels['evaluate']}
- Create: {self.blooms_levels['create']}

Find the most important and insightful content for each question. Note where in the manual file the right answer is located.
Keep the questions and explanations separate (i.e., write all the explanations at the end).
Do not generate questions whose answer options are merely different numbers; test users' concepts and understanding of the vehicle system.
Make sure no question has two plausibly correct answers.

Each question must have one definitive right answer. Work slowly and carefully.

Here is the content from the manual:
{content_response}

Output your response as a clean JSON object with these fields for each question:
- level (string): the Bloom's taxonomy level
- question_text (string): the full question text
- options (array): four answer choices as strings
- correct_option_index (integer): index of the correct answer (0-3)
- explanation (string): explanation of why the correct answer is right

Example JSON format:
{{
    "questions": [
        {{
            "level": "remember",
            "question_text": "What does DISTRONIC stand for?",
            "options": ["Distance Control", "Dynamic Intelligent Speed Tronic", "Direct Intelligence Control", "Digital Road Navigation Intelligence Control"],
            "correct_option_index": 1,
            "explanation": "DISTRONIC stands for Dynamic Intelligent Speed Tronic as stated in section 3.2 of the manual."
        }}
    ]
}}
"""
        return prompt

    def _validate_and_format_questions(self, questions_data: Dict, topic_clean: str) -> Dict[str, Dict]:
        """Validate and format questions, ensuring all levels are present"""
        expected_levels = ["remember", "understand", "apply", "analyze", "evaluate", "create"]
        question_dict = {}

        for q in questions_data.get("questions", []):
            level = q.get("level", "").lower()
            if level in expected_levels:
                question_dict[level] = q

        # Fill missing levels with fallback questions
        for level in expected_levels:
            if level not in question_dict:
                print(f"⚠️ Missing question for level: {level}")
                question_dict[level] = {
                    "level": level,
                    "question_text": f"Question for {level} level could not be generated.",
                    "options": ["Option A", "Option B", "Option C", "Option D"],
                    "correct_option_index": 0,
                    "explanation": "Please try again or select a different topic."
                }

        return question_dict

    def _create_fallback_questions(self, topic_name: str) -> Dict[str, Dict]:
        """Create fallback questions when generation fails"""
        fallback = {}
        for level in ["remember", "understand", "apply", "analyze", "evaluate", "create"]:
            fallback[level] = {
                "level": level,
                "question_text": f"What is a key feature of {topic_name}?",
                "options": [
                    f"Option A about {topic_name}",
                    f"Option B about {topic_name}",
                    f"Option C about {topic_name}",
                    f"Option D about {topic_name}"
                ],
                "correct_option_index": 0,
                "explanation": f"This is a fallback question for the {level} level. Please try again or select a different topic."
            }
        return fallback
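The level backfill in `_validate_and_format_questions` boils down to a set difference against the six expected Bloom levels. A minimal standalone sketch with hypothetical question data:

```python
# Expected Bloom's taxonomy levels, mirroring the module above
EXPECTED_LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]

def coverage(questions):
    """Return (levels present, levels missing) for a list of generated questions."""
    present = {q.get("level", "").lower() for q in questions} & set(EXPECTED_LEVELS)
    missing = set(EXPECTED_LEVELS) - present
    return present, missing

# Hypothetical model output that skipped four levels
sample = [{"level": "Remember"}, {"level": "apply"}]
present, missing = coverage(sample)
print(sorted(missing))  # ['analyze', 'create', 'evaluate', 'understand']
```

Lowercasing before the comparison is what lets the module accept "Remember" and "remember" alike; each missing level then gets a placeholder question.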
src/rag_query.py ADDED
@@ -0,0 +1,118 @@
"""
RAG Query Module
Handles querying the RAG system and extracting answers with sources
"""
from typing import List, Tuple
from openai import OpenAI


class RAGQueryEngine:
    """Handles RAG queries with source attribution"""

    def __init__(self, client: OpenAI, vector_store_id: str, model: str = "gpt-4o-mini"):
        self.client = client
        self.vector_store_id = vector_store_id
        self.model = model

    def get_response_from_vectorstore(self, query: str):
        """
        Get a response from the vector store using the OpenAI Responses API

        Args:
            query: User query

        Returns:
            Response object, or None on failure
        """
        try:
            response = self.client.responses.create(
                input=query,
                model=self.model,
                tools=[{
                    "type": "file_search",
                    "vector_store_ids": [self.vector_store_id],
                }]
            )

            # Check that the response has the expected structure
            if response and hasattr(response.output[1], 'content'):
                return response
            else:
                print("⚠️ Invalid response structure")
                return None
        except Exception as e:
            print(f"❌ Error during API call: {e}")
            return None

    def query(self, query: str) -> Tuple[str, str]:
        """
        Query the RAG model and return the answer with its sources

        Args:
            query: User query

        Returns:
            Tuple of (answer_text, footnotes)
        """
        response = self.get_response_from_vectorstore(query)

        if not response:
            return "That question is outside my area of expertise.", ""

        # Extract the answer text
        answer_text = response.output[1].content[0].text

        # Extract the source files
        footnotes = ""
        if hasattr(response.output[1].content[0], 'annotations'):
            annotations = response.output[1].content[0].annotations
            if annotations:
                # Get unique source files
                source_files = list(set(result.filename for result in annotations))

                # Format the footnotes
                footnotes = "\n\nπŸ“š **Sources:**\n"
                for i, filename in enumerate(source_files, 1):
                    # Drop the ".pdf" extension for a cleaner display
                    clean_name = filename.replace('.pdf', '')
                    footnotes += f"{i}. {clean_name}\n"

        return answer_text, footnotes

    def get_files_from_vector_store(self) -> List[str]:
        """
        Get the list of files in the vector store

        Returns:
            List of filenames
        """
        try:
            query = "List all documents about Mercedes E-class ADAS features"
            response = self.get_response_from_vectorstore(query)

            file_list = []
            if response and hasattr(response.output[1].content[0], 'annotations'):
                annotations = response.output[1].content[0].annotations
                file_list = list(set(annotation.filename for annotation in annotations))
                file_list.sort()

            # Fall back to the default list if nothing was found
            if not file_list:
                file_list = [
                    "Function of Active Distance Assist DISTRONIC.pdf",
                    "Function of Active Lane Change Assist.pdf",
                    "Function of Active Steering Assist.pdf",
                    "Function of Active Stop-and-Go Assist.pdf"
                ]

            return file_list
        except Exception as e:
            print(f"❌ Error getting files: {e}")
            return []
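The footnote formatting in `query` can be sketched on its own. Note that `list(set(...))` in the module yields an arbitrary order; this sketch sorts the filenames for a stable numbering, which is a design choice of the sketch, not the module's exact behavior:

```python
def format_footnotes(filenames):
    """Deduplicate source filenames and render a Sources footnote block."""
    unique = sorted(set(filenames))  # sorted for stable numbering; set order is arbitrary
    lines = ["\nπŸ“š **Sources:**"]
    for i, filename in enumerate(unique, 1):
        # Drop the ".pdf" extension for a cleaner display
        lines.append(f"{i}. {filename.removesuffix('.pdf')}")
    return "\n".join(lines)

print(format_footnotes(["Function of Active Steering Assist.pdf",
                        "Function of Active Lane Change Assist.pdf",
                        "Function of Active Lane Change Assist.pdf"]))
```

`str.removesuffix` (Python 3.9+) only strips a trailing ".pdf", whereas the module's `replace('.pdf', '')` would also remove the substring mid-name.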
src/vector_store.py ADDED
@@ -0,0 +1,145 @@
"""
Vector Store Management Module
Handles creation, file upload, and management of OpenAI vector stores
"""
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Optional

from openai import OpenAI
from tqdm import tqdm


class VectorStoreManager:
    """Manages OpenAI vector store operations"""

    def __init__(self, client: OpenAI):
        self.client = client

    def create_vector_store(self, store_name: str) -> Optional[Dict]:
        """
        Create a vector store on OpenAI's servers

        Args:
            store_name: Name for the vector store

        Returns:
            Dictionary with vector store details, or None on failure
        """
        try:
            vector_store = self.client.vector_stores.create(name=store_name)
            details = {
                "id": vector_store.id,
                "name": vector_store.name,
                "created_at": vector_store.created_at,
                "file_count": vector_store.file_counts.completed
            }
            print(f"βœ… Vector store created: {details}")
            return details
        except Exception as e:
            print(f"❌ Error creating vector store: {e}")
            return None

    def upload_single_pdf(self, file_path: str, vector_store_id: str) -> Dict:
        """
        Upload a single PDF file to the vector store

        Args:
            file_path: Path to the PDF file
            vector_store_id: ID of the vector store

        Returns:
            Dictionary with the upload status
        """
        file_name = os.path.basename(file_path)
        try:
            # Upload the file
            with open(file_path, 'rb') as f:
                file_response = self.client.files.create(
                    file=f,
                    purpose="assistants"
                )

            # Attach it to the vector store
            self.client.vector_stores.files.create(
                vector_store_id=vector_store_id,
                file_id=file_response.id
            )
            return {"file": file_name, "status": "success"}
        except Exception as e:
            print(f"❌ Error uploading {file_name}: {e}")
            return {"file": file_name, "status": "failed", "error": str(e)}

    def upload_pdf_files(self, pdf_files: List[str], vector_store_id: str,
                         max_workers: int = 10) -> Dict:
        """
        Upload multiple PDF files to the vector store in parallel

        Args:
            pdf_files: List of PDF file paths
            vector_store_id: ID of the vector store
            max_workers: Maximum number of parallel workers

        Returns:
            Dictionary with upload statistics
        """
        stats = {
            "total_files": len(pdf_files),
            "successful_uploads": 0,
            "failed_uploads": 0,
            "errors": []
        }

        if not pdf_files:
            print("⚠️ No PDF files to upload")
            return stats

        print(f"πŸ“€ Uploading {len(pdf_files)} PDF files in parallel...")

        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            futures = {
                executor.submit(self.upload_single_pdf, file_path, vector_store_id): file_path
                for file_path in pdf_files
            }

            for future in tqdm(as_completed(futures),
                               total=len(pdf_files), desc="Uploading"):
                result = future.result()
                if result["status"] == "success":
                    stats["successful_uploads"] += 1
                else:
                    stats["failed_uploads"] += 1
                    stats["errors"].append(result)

        print(f"βœ… Upload complete: {stats['successful_uploads']}/{stats['total_files']} successful")
        return stats

    def search_vector_store(self, query: str, vector_store_id: str,
                            max_results: int = 10):
        """
        Search the vector store directly

        Args:
            query: Search query
            vector_store_id: ID of the vector store
            max_results: Maximum number of results (not passed to the API call below)

        Returns:
            Search results, or None on failure
        """
        try:
            search_results = self.client.vector_stores.search(
                vector_store_id=vector_store_id,
                query=query
            )
            return search_results
        except Exception as e:
            print(f"❌ Error searching vector store: {e}")
            return None
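The parallel-upload bookkeeping in `upload_pdf_files` follows a standard submit/`as_completed` pattern. Here is a self-contained sketch with a stand-in for `upload_single_pdf` (no API calls; the "bad.pdf" failure is simulated):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_upload(name):
    """Stand-in for upload_single_pdf: fail on one file to exercise both paths."""
    if name == "bad.pdf":
        return {"file": name, "status": "failed", "error": "simulated"}
    return {"file": name, "status": "success"}

files = ["a.pdf", "b.pdf", "bad.pdf"]
stats = {"successful_uploads": 0, "failed_uploads": 0, "errors": []}

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(fake_upload, f) for f in files]
    for fut in as_completed(futures):          # results arrive in completion order
        result = fut.result()
        if result["status"] == "success":
            stats["successful_uploads"] += 1
        else:
            stats["failed_uploads"] += 1
            stats["errors"].append(result)

print(stats["successful_uploads"], stats["failed_uploads"])  # 2 1
```

Because results are consumed in completion order, per-file failures are recorded without aborting the rest of the batch, which is exactly why the module tallies stats instead of raising.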