# ContextFlow Architecture: Complete System Overview

## Table of Contents

1. [System Vision](#1-system-vision)
2. [High-Level Architecture](#2-high-level-architecture)
3. [Frontend Layer](#3-frontend-layer)
4. [Backend Layer](#4-backend-layer)
5. [Agent Network](#5-agent-network)
6. [Reinforcement Learning Pipeline](#6-reinforcement-learning-pipeline)
7. [Data Flow](#7-data-flow)
8. [API Design](#8-api-design)
9. [Multi-Modal Detection](#9-multi-modal-detection)
10. [Privacy & Security](#10-privacy--security)
11. [Deployment Architecture](#11-deployment-architecture)

---

## 1. System Vision

**ContextFlow** is an AI-powered learning intelligence engine that predicts when learners will get confused *before* it happens, enabling proactive intervention in educational settings.

### Core Problem Solved

- Traditional learning systems are **reactive**: they respond after confusion occurs
- ContextFlow is **proactive**: it predicts confusion and intervenes before disengagement

### Key Innovations

1. **Predictive AI** - RL-based doubt prediction
2. **Gesture Control** - Hands-free learning assistance
3. **Multi-Agent Orchestration** - 9 specialized agents working in concert
4. **Privacy-First** - Face blur for classroom deployment

---

## 2. High-Level Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                              USERS                              │
│        Students          Teachers          Researchers          │
└────────────────────────────────┬────────────────────────────────┘
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                       PRESENTATION LAYER                        │
│                                                                 │
│   React Frontend (Vite)                                         │
│     Tabs:  Learn │ LLM Flow │ Gestures │ Predict │ ...          │
│                                                                 │
│   MediaPipe Camera Feed (Gesture + Face)                        │
│     Hand Detection │ Face Blur                                  │
└────────────────────────────────┬────────────────────────────────┘
                                 │  REST API (JSON)
                                 │  WebSocket (optional)
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                      BACKEND LAYER (Flask)                      │
│                                                                 │
│   API Gateway (Flask Blueprints)                                │
│     /api/session/*  /api/predict/*  /api/gesture/*  /api/*      │
│                                 │                               │
│                                 ▼                               │
│   STUDY ORCHESTRATOR (Central Coordinator)                      │
│     Agent Registry:                                             │
│       DoubtPredictor │ Behavioral │ Gesture │ Recall            │
│       KnowledgeGraph │ PeerLearn  │ LLMOrch │ Prompt            │
│                                 │                               │
│                                 ▼                               │
│     Q-Network │ Behavioral │ Gesture │ Recall │ LLM Orch        │
└────────────────────────────────┬────────────────────────────────┘
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                           DATA LAYER                            │
│                                                                 │
│   Checkpoint      Session      Knowledge      Real Data         │
│   (RL model,      State        Graph          Collection        │
│    .pkl)          (JSON)       (NetworkX)                       │
└─────────────────────────────────────────────────────────────────┘
```

---
## 3. Frontend Layer

### 3.1 Technology Stack

| Component  | Technology   | Purpose             |
|------------|--------------|---------------------|
| Framework  | React 18     | UI components       |
| Build Tool | Vite         | Fast development    |
| Styling    | Tailwind CSS | Responsive design   |
| Icons      | Lucide React | Consistent icons    |
| Camera     | MediaPipe    | Hand/face detection |

### 3.2 Application Structure

```
frontend/src/
├── App.jsx                # Main application (9 tabs)
├── main.jsx               # Entry point
├── index.css              # Global styles
├── BrowserLLMLauncher.js  # AI chat launcher
└── MediaPipeProcessor.js  # Camera + gesture processing
```

### 3.3 Tab Interface

| Tab | Purpose |
|-----|---------|
| **Learn** | Dashboard with predictions, reviews, gamification |
| **LLM Flow** | Browser-based AI launcher (no API keys) |
| **Gestures** | Train custom hand gestures |
| **Predict** | RL doubt prediction visualization |
| **Behavior** | Behavioral signal tracking |
| **Peer** | Social learning insights |
| **Stats** | Learning statistics |
| **Gamify** | Fish/XP rewards system |
| **Settings** | AI provider configuration |

### 3.4 BrowserLLMLauncher.js

Opens AI chats directly in the browser without API keys:

```javascript
// Opens chat.openai.com with pre-filled context
openAIChat(context, model = 'gpt-4') {
  const url = `https://chat.openai.com/?q=${encodeURIComponent(context)}`;
  window.open(url, '_blank');
}
```

### 3.5 MediaPipeProcessor.js

Handles real-time camera processing:

```
         ┌─────────────────┐
         │   Camera Feed   │
         └────────┬────────┘
                  │
        ┌─────────┴──────────┐
        ▼                    ▼
┌─────────────────┐  ┌─────────────────┐
│  Hand Landmark  │  │    Face Mesh    │
│    Detection    │  │    Detection    │
│   (21 points)   │  │   (468 points)  │
└────────┬────────┘  └────────┬────────┘
         │                    │
         ▼                    ▼
┌─────────────────┐  ┌─────────────────┐
│     Gesture     │  │    Face Blur    │
│   Recognition   │  │    (Privacy)    │
└────────┬────────┘  └─────────────────┘
         │
         ▼
┌─────────────────┐
│   Backend API   │
│  /api/gesture/  │
└─────────────────┘
```

---

## 4. Backend Layer

### 4.1 Technology Stack

| Component | Technology  | Purpose             |
|-----------|-------------|---------------------|
| Framework | Flask       | REST API            |
| Async     | asyncio     | Non-blocking I/O    |
| ML        | PyTorch     | RL model            |
| Data      | NumPy       | Feature extraction  |
| Graphs    | NetworkX    | Knowledge graphs    |
| Storage   | JSON/SQLite | Session persistence |

### 4.2 Flask Application Structure

```
backend/
├── run.py                            # Application entry point
├── app/
│   ├── __init__.py                   # Flask app factory
│   ├── config.py                     # Configuration
│   ├── api/
│   │   ├── __init__.py
│   │   └── main.py                   # All API routes (889 lines)
│   └── agents/
│       ├── __init__.py
│       ├── study_orchestrator.py     # Central coordinator
│       ├── doubt_predictor.py        # RL prediction
│       ├── behavioral_agent.py       # Signal processing
│       ├── hand_gesture_agent.py     # MediaPipe integration
│       ├── recall_agent.py           # Spaced repetition
│       ├── knowledge_graph_agent.py  # Concept mapping
│       ├── peer_learning_agent.py    # Social learning
│       ├── llm_orchestrator_agent.py # Multi-AI
│       ├── gesture_action_agent.py   # Gesture → action
│       └── prompt_agent.py           # Prompt templates
```

### 4.3 Flask App Factory

```python
def create_app():
    app = Flask(__name__)

    # Load config
    app.config.from_object('app.config.Config')

    # Register blueprints
    from app.api.main import api
    app.register_blueprint(api, url_prefix='/api')

    # Initialize agents
    init_agents()

    return app
```
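The factory keeps agent setup out of module import time. For orientation, here is a minimal sketch of the entry point that would pair with it; the real `run.py` may differ, and the port is taken from the development setup in Section 11.1:

```python
# Hypothetical run.py pairing with create_app(); the actual entry point may differ
from app import create_app

app = create_app()

if __name__ == '__main__':
    # Port 5001 matches the development setup in Section 11.1
    app.run(host='0.0.0.0', port=5001, debug=True)
```

---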
## 5. Agent Network

### 5.1 Agent Overview

```
┌─────────────────────────────────────────────────────────────┐
│                     STUDY ORCHESTRATOR                      │
│                    (Central Coordinator)                    │
│                                                             │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐    │
│  │    Doubt    │     │ Behavioral  │     │    Hand     │    │
│  │  Predictor  │◀────│    Agent    │────▶│   Gesture   │    │
│  │    Agent    │     │             │     │    Agent    │    │
│  └──────┬──────┘     └─────────────┘     └──────┬──────┘    │
│         │                                       │           │
│         ▼                                       ▼           │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐    │
│  │  Knowledge  │     │   Recall    │     │     LLM     │    │
│  │    Graph    │◀────│    Agent    │────▶│ Orchestrator│    │
│  │    Agent    │     │             │     │             │    │
│  └─────────────┘     └─────────────┘     └─────────────┘    │
│                                                             │
│  ┌─────────────┐     ┌─────────────┐                        │
│  │    Peer     │     │   Gesture   │                        │
│  │  Learning   │     │   Action    │                        │
│  │    Agent    │     │   Mapper    │                        │
│  └─────────────┘     └─────────────┘                        │
└─────────────────────────────────────────────────────────────┘
```

### 5.2 StudyOrchestrator (Central Coordinator)

The orchestrator manages the learning lifecycle:

```python
class StudyOrchestrator:
    def __init__(self, user_id: str):
        self.user_id = user_id

        # Initialize all agents
        self.doubt_predictor = DoubtPredictorAgent(user_id)
        self.behavioral_agent = BehavioralAgent(user_id)
        self.gesture_agent = HandGestureAgent(user_id)
        self.recall_agent = RecallAgent(user_id)
        self.knowledge_graph = KnowledgeGraphAgent(user_id)
        self.peer_agent = PeerLearningAgent(user_id)

        # State management
        self.state = OrchestratorState()
```

**Session Lifecycle:**

1. **PRE_LEARNING** - Load predictions, check recalls, get peer insights
2. **ACTIVE_LEARNING** - Monitor signals, update predictions, capture doubts
3. **REVIEW** - Trigger spaced repetition, update knowledge graph
4. **POST_LEARNING** - Sync data, update gamification, generate summary

### 5.3 DoubtPredictorAgent (RL Core)

Predicts confusion before it happens:

```python
class DoubtPredictorAgent:
    def __init__(self, user_id: str, config: dict = None):
        self.user_id = user_id
        self.model = self._load_checkpoint()
        self.feature_extractor = FeatureExtractor()

    def predict_doubts(self, context: dict, top_k: int = 5):
        # 1. Extract 64-dim state vector
        state = self.feature_extractor.extract_state(context)

        # 2. Get Q-values from RL model
        q_values = self.model.predict(state)

        # 3. Return top-k predictions
        return self._format_predictions(q_values, top_k)
```
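For orientation, a hypothetical call into the predictor; the context keys mirror what `FeatureExtractor.extract_state` consumes in Section 6.4, and the sample values echo the API payloads in Section 8:

```python
# Hypothetical usage of DoubtPredictorAgent; keys follow Section 6.4, values Section 8
agent = DoubtPredictorAgent(user_id="student123")

predictions = agent.predict_doubts(
    context={
        "topic": "Neural Networks",
        "progress": 0.5,            # fraction of material covered (0.0-1.0)
        "confusion_signals": 0.7,   # aggregate behavioral confusion score
        "gesture_signals": {},      # raw gesture features, if any
        "time_spent": 600,          # seconds in the current session
    },
    top_k=5,
)
```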
### 5.4 BehavioralAgent

Processes raw behavioral signals. The original snippet referenced an undefined `weighted_sum(signals, weights)`; the version below computes the weighted average directly, with the `tab_switches` and `back_button` signals drawn from the behavioral feature list in Section 9.2:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class BehavioralSignal:
    mouse_hesitation: float   # Pause frequency (normalized 0-1)
    scroll_reversals: float   # Back-and-forth scrolling (normalized 0-1)
    time_on_page: float       # Dwell time (normalized 0-1)
    tab_switches: float       # Context switching (normalized 0-1)
    back_button: float        # Back-button usage (normalized 0-1)
    eye_tracking: Tuple[float, float] = (0.0, 0.0)  # Gaze coordinates
    click_frequency: int = 0

    def calculate_confusion_score(self) -> float:
        # Weighted average of the normalized signals
        weights = {
            'mouse_hesitation': 0.3,
            'scroll_reversals': 0.25,
            'time_on_page': 0.2,
            'tab_switches': 0.15,
            'back_button': 0.1,
        }
        return sum(getattr(self, name) * w for name, w in weights.items())
```

### 5.5 HandGestureAgent

MediaPipe integration for gesture recognition:

```
    Camera Frame
         │
         ▼
┌─────────────────┐
│ MediaPipe Hands │
│ (21 landmarks)  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Gesture Template│
│    Matching     │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   Confidence    │──▶ Recognized Gesture
│   Score (0-1)   │
└─────────────────┘
```

**Pre-built Gestures:**

| Gesture | Description |
|---------|-------------|
| pinch | Thumb + index |
| swipe_up | 2-finger up |
| swipe_down | 2-finger down |
| swipe_right | 2-finger right |
| swipe_left | 2-finger left |
| point | Index extended |
| wave | Open palm wave |
| thumbs_up | 👍 confirmation |
| thumbs_down | 👎 rejection |
| fist | Closed hand |

### 5.6 RecallAgent

SM-2-based spaced repetition:

```python
class RecallCard:
    front: str          # Question
    back: str           # Answer
    interval: int       # Days until next review
    ease_factor: float  # Difficulty (default 2.5)
    repetitions: int    # Consecutive successful reviews

def schedule_review(card: RecallCard, quality: int):
    if quality >= 3:  # Correct
        if card.repetitions == 0:
            card.interval = 1
        elif card.repetitions == 1:
            card.interval = 6
        else:
            card.interval = round(card.interval * card.ease_factor)
        card.repetitions += 1
    else:  # Incorrect
        card.repetitions = 0
        card.interval = 1

    # Update ease factor (standard SM-2 formula)
    card.ease_factor += 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    card.ease_factor = max(1.3, card.ease_factor)
```

### 5.7 KnowledgeGraphAgent

Concept mapping with NetworkX:

```python
import networkx as nx
from datetime import datetime

class KnowledgeGraphAgent:
    def __init__(self, user_id: str):
        self.graph = nx.MultiDiGraph()

    def add_doubt_to_graph(self, doubt: dict):
        # Create node
        self.graph.add_node(
            doubt['concept'],
            type='concept',
            topic=doubt['topic'],
            timestamp=datetime.now()
        )

        # Connect to prerequisites
        for prereq in doubt.get('prerequisites', []):
            self.graph.add_edge(prereq, doubt['concept'], type='prerequisite')

        # Connect to related concepts
        for related in doubt.get('related', []):
            self.graph.add_edge(doubt['concept'], related, type='related')

    def find_learning_path(self, from_topic: str, to_topic: str):
        try:
            return nx.shortest_path(self.graph, from_topic, to_topic)
        except nx.NetworkXNoPath:
            return []
```

### 5.8 LLMOrchestrator

Multi-provider AI integration:

```python
import asyncio

class LLMOrchestrator:
    SUPPORTED_PROVIDERS = {
        'chatgpt': LLMProvider.CHATGPT,
        'gemini': LLMProvider.GEMINI,
        'claude': LLMProvider.CLAUDE,
        'deepseek': LLMProvider.DEEPSEEK,
        'ollama': LLMProvider.OLLAMA,
        'groq': LLMProvider.GROQ
    }

    async def query_parallel(self, request: LLMRequest):
        tasks = []
        for provider in request.providers:
            task = self._query_provider(provider, request)
            tasks.append(task)

        # Execute all queries concurrently
        responses = await asyncio.gather(*tasks, return_exceptions=True)
        return [r for r in responses if not isinstance(r, Exception)]
```

### 5.9 GestureActionMapper

Maps gestures to system actions:

```python
class GestureAction(Enum):
    QUERY_MULTI_LLM = "query_multi_llm"
    QUERY_CHATGPT = "query_chatgpt"
    QUERY_GEMINI = "query_gemini"
    TRIGGER_RL_LOOP = "trigger_rl_loop"
    CAPTURE_CONTENT = "capture_content"
    PAUSE_SESSION = "pause_session"
    RESUME_SESSION = "resume_session"

class GestureActionMapper:
    def __init__(self):
        self.action_rules = {
            GestureAction.QUERY_MULTI_LLM: {
                "trigger": {"finger_count": 2, "swipe": "right"}
            },
            GestureAction.PAUSE_SESSION: {
                "trigger": {"gesture": "open_palm"}
            },
            GestureAction.RESUME_SESSION: {
                "trigger": {"gesture": "thumbs_up"}
            }
        }
```
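The mapper stores declarative trigger rules rather than hard-coded branches. A minimal sketch of how a recognized gesture event could be resolved against `action_rules` (the `resolve` helper is illustrative, not part of the source):

```python
# Hypothetical resolver over GestureActionMapper.action_rules
def resolve(mapper: GestureActionMapper, event: dict):
    """Return the first action whose trigger fields all match the event."""
    for action, rule in mapper.action_rules.items():
        trigger = rule["trigger"]
        if all(event.get(key) == value for key, value in trigger.items()):
            return action
    return None

# e.g. an open-palm event pauses the session
event = {"gesture": "open_palm"}
assert resolve(GestureActionMapper(), event) == GestureAction.PAUSE_SESSION
```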
"trigger_rl_loop" CAPTURE_CONTENT = "capture_content" PAUSE_SESSION = "pause_session" RESUME_SESSION = "resume_session" class GestureActionMapper: def __init__(self): self.action_rules = { GestureAction.QUERY_MULTI_LLM: { "trigger": {"finger_count": 2, "swipe": "right"} }, GestureAction.PAUSE_SESSION: { "trigger": {"gesture": "open_palm"} }, GestureAction.RESUME_SESSION: { "trigger": {"gesture": "thumbs_up"} } } ``` ### 5.10 PeerLearningAgent Social learning insights: ```python class PeerLearningAgent: def get_peer_insights(self, topic: str): # Aggregate insights from "similar" students insights = [] # Find students who learned this topic similar_students = self._find_similar_students(topic) for student in similar_students: # What confused them? insights.extend(student.difficult_concepts) # Return aggregated insights return self._aggregate_insights(insights) ``` --- ## 6. Reinforcement Learning Pipeline ### 6.1 Problem Formulation **State Space (64 dimensions):** ``` ┌────────────────────────────────────────────────────────────────┐ │ Topic Embedding (32) │ Progress │ Confusion (16) │ Gesture (14) │ Time │ │ TF-IDF of topic │ 0.0-1.0 │ Behavioral │ Hand │ 0-1 │ │ │ │ signals │ signals │ │ └────────────────────────────────────────────────────────────────┘ ``` **Action Space (10 doubt types):** 1. `what_is_backpropagation` 2. `why_gradient_descent` 3. `how_overfitting_works` 4. `explain_regularization` 5. `what_loss_function` 6. `how_optimization_works` 7. `explain_learning_rate` 8. `what_regularization` 9. `how_batch_norm_works` 10. `explain_softmax` **Reward Function:** | Event | Reward | |-------|--------| | Correct prediction | +1.0 | | Helpful explanation | +0.5 | | Engagement maintained | +0.3 | | False positive | -0.5 | | Missed confusion | -1.0 | ### 6.2 Q-Network Architecture ```python class QNetwork(nn.Module): def __init__(self, state_dim=64, action_dim=10, hidden_dim=128): super().__init__() self.fc1 = nn.Linear(state_dim, hidden_dim) # 64 → 128 self.fc2 = nn.Linear(hidden_dim, hidden_dim) # 128 → 128 self.fc3 = nn.Linear(hidden_dim, action_dim) # 128 → 10 def forward(self, x): x = F.relu(self.fc1(x)) # ReLU activation x = F.relu(self.fc2(x)) return self.fc3(x) # Q-values for each action ``` ### 6.3 Training Algorithm (GRPO) ```python class DoubtPredictionRL: def train(self, epochs=10, batch_size=32): for epoch in range(epochs): for batch in self.dataloader: # 1. Get current Q-values q_values = self.q_network(batch.states) # 2. Compute targets (GRPO-style) with torch.no_grad(): next_q = self.target_network(batch.next_states).max(1)[0] targets = batch.rewards + self.gamma * next_q * (~batch.dones) # 3. Compute loss and update loss = self.loss_fn(q_values.gather(1, batch.actions), targets) loss.backward() self.optimizer.step() # 4. Update target network self.update_target_network() # 5. 
### 6.4 Feature Extraction

```python
import numpy as np

class FeatureExtractor:
    STATE_DIM = 64

    def extract_state(self, context: dict) -> np.ndarray:
        # Topic embedding (32 dims)
        topic_emb = self._extract_topic_embedding(context['topic'])

        # Progress (1 dim)
        progress = np.array([context['progress']])

        # Confusion signals (16 dims)
        confusion = self._extract_confusion_signals(context['confusion_signals'])

        # Gesture signals (14 dims)
        gestures = self._extract_gesture_signals(context['gesture_signals'])

        # Time spent (1 dim, normalized against a 30-minute session)
        time_spent = np.array([context['time_spent'] / 1800])

        # Concatenate into the 64-dim state vector
        return np.concatenate([topic_emb, progress, confusion, gestures, time_spent])
```

---

## 7. Data Flow

### 7.1 Learning Session Flow

```
┌─────────────────────────────────────────────────────────────┐
│                     USER STARTS SESSION                     │
└──────────────────────────────┬──────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                 ORCHESTRATOR.START_SESSION()                │
│   1. Create new LearningSession                             │
│   2. Load RL model checkpoint                               │
│   3. Build learning context                                 │
└──────────────────────────────┬──────────────────────────────┘
                               │
           ┌───────────────────┼───────────────────┐
           ▼                   ▼                   ▼
   ┌───────────────┐   ┌───────────────┐   ┌───────────────┐
   │     Doubt     │   │  Behavioral   │   │     Peer      │
   │   Predictor   │   │     Agent     │   │   Learning    │
   │               │   │               │   │     Agent     │
   │    Predict    │   │    Analyze    │   │      Get      │
   │    doubts     │   │    signals    │   │   insights    │
   └───────┬───────┘   └───────┬───────┘   └───────┬───────┘
           └───────────────────┼───────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                  RETURN INITIAL PREDICTIONS                 │
│   - Top 5 predicted doubts                                  │
│   - Pending reviews                                         │
│   - Peer insights                                           │
└─────────────────────────────────────────────────────────────┘
```

### 7.2 Behavioral Signal Flow

```
┌─────────────────────────────────────────────────────────────┐
│                      REAL-TIME SIGNALS                      │
│                                                             │
│     Mouse         Scroll        Gesture        Time on      │
│    movement       pattern       camera          page        │
└──────────────────────────────┬──────────────────────────────┘
                               ▼
                 ┌───────────────────────────┐
                 │     BEHAVIORAL AGENT      │
                 │                           │
                 │  calculate_confusion_     │
                 │  score(signals)           │
                 │                           │
                 │  Returns: 0.0 - 1.0       │
                 └─────────────┬─────────────┘
                               ▼
                 ┌───────────────────────────┐
                 │      DOUBT PREDICTOR      │
                 │                           │
                 │  If score > 0.5:          │
                 │    Re-predict doubts      │
                 │    Trigger intervention   │
                 └───────────────────────────┘
```

### 7.3 Gesture-to-Action Flow

```
                        Camera Frame
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    MEDIAPIPE PROCESSING                     │
│                                                             │
│    Hand Landmark Detection         Face Mesh                │
│    (21 points)                     (468 points)             │
└──────────────┬───────────────────────────┬──────────────────┘
               ▼                           ▼
┌──────────────────────────┐   ┌──────────────────────────┐
│ GESTURE TEMPLATE MATCHING│   │   FACE BLUR (Privacy)    │
│                          │   │                          │
│ Compare landmarks to     │   │ Blur regions around the  │
│ known gestures           │   │ facial keypoints         │
└──────────────┬───────────┘   └──────────────────────────┘
               ▼
┌──────────────────────────┐
│   GESTURE RECOGNIZED     │──▶ Backend /api/gesture/recognize
│   {                      │
│     "gesture": "pinch",  │
│     "confidence": 0.92   │
│   }                      │
└──────────────┬───────────┘
               ▼
┌─────────────────────────────────────┐
│        GESTURE ACTION MAPPER        │
│                                     │
│  pinch       ──▶ TRIGGER_AI_HELP    │
│  swipe_right ──▶ LAUNCH_BROWSER_CHAT│
│  open_palm   ──▶ PAUSE_SESSION      │
│  thumbs_up   ──▶ MARK_UNDERSTOOD    │
└─────────────────────────────────────┘
```
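The monitoring loop in Section 7.2 reduces to a few lines of glue. The sketch below is an assumption about the wiring, not the source implementation; only the 0.5 threshold and the agent interfaces come from the document:

```python
CONFUSION_THRESHOLD = 0.5  # matches the "If score > 0.5" rule in Section 7.2

def on_signals(orchestrator: StudyOrchestrator,
               signal: BehavioralSignal,
               context: dict):
    # Hypothetical glue: score incoming signals, re-predict when confusion is high
    score = signal.calculate_confusion_score()
    if score > CONFUSION_THRESHOLD:
        # Re-run the RL predictor with fresh confusion evidence
        context['confusion_signals'] = score
        return orchestrator.doubt_predictor.predict_doubts(context)
    return None
```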
---

## 8. API Design

### 8.1 API Structure

| Category | Endpoints |
|----------|-----------|
| Session | `/session/start`, `/session/update`, `/session/end`, `/session/insights` |
| Prediction | `/predict/doubts`, `/recommendations` |
| Behavior | `/behavior/track`, `/behavior/heatmap` |
| Graph | `/graph/add`, `/graph/query`, `/graph/path` |
| Review | `/review/due`, `/review/complete`, `/review/stats` |
| Peer | `/peer/insights`, `/peer/doubts`, `/peer/trending` |
| Gesture | `/gesture/list`, `/gesture/recognize`, `/gesture/training/*` |
| LLM | `/llm/query`, `/llm/gesture-action`, `/llm/rl/*` |

### 8.2 Session API

```python
# POST /api/session/start
{
    "user_id": "student123",
    "topic": "Machine Learning",
    "subtopic": "Neural Networks"
}

# Response
{
    "session_id": "session_1699999999.123",
    "topic": "Machine Learning",
    "predictions": [
        {
            "doubt": "how_overfitting_works",
            "confidence": 0.85,
            "explanation": "Student showing signs of confusion...",
            "priority": 1
        }
    ],
    "pending_reviews": 5,
    "peer_insights_count": 3
}
```

### 8.3 Doubt Prediction API

```python
# POST /api/predict/doubts
{
    "context": {
        "topic": "Neural Networks",
        "progress": 0.5,
        "confusion_signals": 0.7
    }
}

# Response
{
    "predictions": [
        {
            "doubt": "how_overfitting_works",
            "confidence": 0.85,
            "explanation": "...",
            "priority": 1,
            "estimated_time": "10 min",
            "prerequisites": ["regularization", "bias-variance"]
        }
    ]
}
```

---

## 9. Multi-Modal Detection

### 9.1 Supported Modalities

```
┌─────────────────────────────────────────────────────────────┐
│                     MULTI-MODAL FUSION                      │
│                                                             │
│   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐       │
│   │    Audio    │   │  Biometric  │   │ Behavioral  │       │
│   │             │   │             │   │             │       │
│   │ Speech rate │   │ Heart rate  │   │ Mouse moves │       │
│   │ Hesitations │   │ GSR         │   │ Scroll      │       │
│   │ Pauses      │   │ Eye tracking│   │ Key presses │       │
│   └──────┬──────┘   └──────┬──────┘   └──────┬──────┘       │
│          └─────────────────┼─────────────────┘              │
│                            ▼                                │
│              ┌─────────────────────────┐                    │
│              │     WEIGHTED FUSION     │                    │
│              │                         │                    │
│              │  audio_weight:      0.2 │                    │
│              │  biometric_weight:  0.3 │                    │
│              │  behavioral_weight: 0.5 │                    │
│              └────────────┬────────────┘                    │
│                           ▼                                 │
│              ┌─────────────────────────┐                    │
│              │    UNIFIED CONFUSION    │                    │
│              │         SCORE           │                    │
│              │        0.0 - 1.0        │                    │
│              └─────────────────────────┘                    │
└─────────────────────────────────────────────────────────────┘
```

### 9.2 Feature Extraction by Modality

**Audio (7 features):**

- Speech rate (WPM)
- Pause frequency
- Pause duration
- Pitch variation
- Volume level
- Hesitation count
- Question markers

**Biometric (6 features):**

- Heart rate (BPM)
- Heart rate variability
- Skin conductance (GSR)
- Skin temperature
- Eye blink rate
- Eye open duration

**Behavioral (8 features):**

- Mouse hesitation
- Scroll reversals
- Time on page
- Click frequency
- Back button usage
- Tab switches
- Copy attempts
- Search usage
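The fusion step in Section 9.1 is a weighted average. A minimal sketch, assuming each modality has already been reduced to a confusion score in [0, 1]; the weights come from the diagram, while the per-modality reduction is an assumption:

```python
import numpy as np

# Weights from the Section 9.1 diagram
MODALITY_WEIGHTS = {"audio": 0.2, "biometric": 0.3, "behavioral": 0.5}

def fuse_confusion(scores: dict) -> float:
    """Weighted average of per-modality confusion scores, clipped to [0, 1]."""
    fused = sum(MODALITY_WEIGHTS[m] * scores.get(m, 0.0) for m in MODALITY_WEIGHTS)
    return float(np.clip(fused, 0.0, 1.0))

print(fuse_confusion({"audio": 0.4, "biometric": 0.6, "behavioral": 0.8}))  # 0.66
```

---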
## 10. Privacy & Security

### 10.1 Face Blur Implementation

```python
import cv2
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh

class FaceBlurProcessor:
    def __init__(self):
        self.face_mesh = mp_face_mesh.FaceMesh(
            static_image_mode=False,
            max_num_faces=1,
            refine_landmarks=True
        )

    def blur_face(self, frame):
        # Detect face landmarks (MediaPipe expects RGB input)
        results = self.face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

        if results.multi_face_landmarks:
            # Get face region
            face_region = self._get_face_region(frame, results)

            # Apply Gaussian blur
            blurred = cv2.GaussianBlur(face_region, (51, 51), 0)

            # Replace face region
            frame = self._replace_region(frame, blurred, results)

        return frame
```

### 10.2 Data Privacy

| Data Type | Storage | Privacy |
|-----------|---------|---------|
| Video frames | None | Processed in-memory only |
| Face images | None | Auto-blurred |
| Hand landmarks | Optional | Anonymized |
| Session data | Local JSON | User-owned |
| Model weights | HuggingFace | Open |

---

## 11. Deployment Architecture

### 11.1 Development Setup

```
┌─────────────────────────────────────────────────────────────┐
│                         DEVELOPMENT                         │
│                                                             │
│   Terminal 1:                  Terminal 2:                  │
│   ┌─────────────────┐          ┌─────────────────┐          │
│   │ cd backend      │          │ cd frontend     │          │
│   │ python run.py   │          │ npm run dev     │          │
│   │                 │          │                 │          │
│   │ Flask :5001     │          │ Vite :5173      │          │
│   └────────┬────────┘          └────────┬────────┘          │
└────────────┼────────────────────────────┼───────────────────┘
             │                            │
             ▼                            ▼
┌─────────────────────────────────────────────────────────────┐
│                     BROWSER (localhost)                     │
│                                                             │
│   Frontend (:5173) <────── Proxy ──────> Backend (:5001)    │
└─────────────────────────────────────────────────────────────┘
```

### 11.2 Production Setup

```
                      ┌─────────────────┐
                      │  Load Balancer  │
                      └────────┬────────┘
                               │
        ┌──────────────────────┼──────────────────────┐
        ▼                      ▼                      ▼
┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│ Flask Worker  │      │ Flask Worker  │      │ Flask Worker  │
│    (:5001)    │      │    (:5001)    │      │    (:5001)    │
└───────┬───────┘      └───────┬───────┘      └───────┬───────┘
        └──────────────────────┼──────────────────────┘
                               ▼
                      ┌─────────────────┐
                      │   Redis Cache   │
                      └────────┬────────┘
                               ▼
                      ┌─────────────────┐
                      │   PostgreSQL    │
                      └─────────────────┘
```

### 11.3 HuggingFace Model Hosting

```
┌─────────────────────────────────────────────────────────────┐
│                       HuggingFace Hub                       │
│                                                             │
│   namish10/contextflow-rl                                   │
│                                                             │
│   checkpoint.pkl           ← Trained RL model               │
│   train_rl.py              ← Training script                │
│   feature_extractor.py     ← State extraction               │
│   online_learning.py       ← Continuous learning            │
│   data_collector.py        ← Real data collection           │
│   multimodal_detection.py  ← Audio/biometric fusion         │
│   demo.ipynb               ← Interactive demo               │
│   RESEARCH_PAPER.md        ← Full documentation             │
│                                                             │
│   app/       (9 agents + API)                               │
│   frontend/  (React UI)                                     │
└─────────────────────────────────────────────────────────────┘
```
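Assuming the repository layout shown above, the checkpoint can be fetched with the standard `huggingface_hub` client; treat the exact filenames as indicative:

```python
# Download the trained RL checkpoint from the Hub (repo id from the diagram above)
import pickle
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download(
    repo_id="namish10/contextflow-rl",
    filename="checkpoint.pkl",
)

with open(checkpoint_path, "rb") as f:
    checkpoint = pickle.load(f)
```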
---

## Summary

ContextFlow is a comprehensive system combining:

1. **Predictive AI** - RL-based doubt prediction before confusion occurs
2. **Multi-Agent Architecture** - 9 specialized agents coordinated by an orchestrator
3. **Gesture Recognition** - Privacy-first MediaPipe hand detection
4. **Multi-Modal Sensing** - Audio + biometric + behavioral fusion
5. **Browser-Based AI** - Direct AI chat launching without API keys
6. **Continuous Learning** - Online learning from user feedback

The system is production-ready, with working endpoints across all eight API categories (Section 8.1), the complete agent network, and a trained RL model available on HuggingFace.