# ContextFlow Architecture: Complete System Overview

## Table of Contents

1. [System Vision](#1-system-vision)
2. [High-Level Architecture](#2-high-level-architecture)
3. [Frontend Layer](#3-frontend-layer)
4. [Backend Layer](#4-backend-layer)
5. [Agent Network](#5-agent-network)
6. [Reinforcement Learning Pipeline](#6-reinforcement-learning-pipeline)
7. [Data Flow](#7-data-flow)
8. [API Design](#8-api-design)
9. [Multi-Modal Detection](#9-multi-modal-detection)
10. [Privacy & Security](#10-privacy--security)
11. [Deployment Architecture](#11-deployment-architecture)


---

## 1. System Vision

**ContextFlow** is an AI-powered learning intelligence engine that predicts when learners will become confused *before* it happens, enabling proactive intervention in educational settings.

### Core Problem Solved

- Traditional learning systems are **reactive**: they respond after confusion occurs.
- ContextFlow is **proactive**: it predicts confusion and intervenes before disengagement.

### Key Innovations

1. **Predictive AI** - RL-based doubt prediction
2. **Gesture Control** - Hands-free learning assistance
3. **Multi-Agent Orchestration** - 9 specialized agents working in concert
4. **Privacy-First** - Face blur for classroom deployment

---

## 2. High-Level Architecture

```
+---------------------------------------------------------------------+
|                               USERS                                 |
|        Students            Teachers            Researchers          |
+----------------------------------+----------------------------------+
                                   |
                                   v
+---------------------------------------------------------------------+
|                         PRESENTATION LAYER                          |
|                                                                     |
|   React Frontend (Vite)                                             |
|     Tabs: Learn | LLM Flow | Gestures | Predict | ...               |
|     MediaPipe Camera Feed: Hand Detection + Face Blur               |
+----------------------------------+----------------------------------+
                                   |
                                   |  REST API (JSON)
                                   |  WebSocket (optional)
                                   v
+---------------------------------------------------------------------+
|                        BACKEND LAYER (Flask)                        |
|                                                                     |
|   API Gateway (Flask Blueprints)                                    |
|     /api/session/*   /api/predict/*   /api/gesture/*   /api/*       |
|                                   |                                 |
|                                   v                                 |
|   STUDY ORCHESTRATOR (Central Coordinator)                          |
|     Agent Registry:                                                 |
|       DoubtPredictor | Behavioral | Gesture | Recall                |
|       KnowledgeGraph | PeerLearn  | LLMOrch | Prompt                |
|                                   |                                 |
|       Q-Network   BehavioralAgent   GestureAgent   RecallAgent      |
|       LLMOrchestrator                                               |
+----------------------------------+----------------------------------+
                                   |
                                   v
+---------------------------------------------------------------------+
|                             DATA LAYER                              |
|   Checkpoint         Session       Knowledge      Real Data         |
|   (RL model, .pkl)   State, JSON   Graph,         Collection        |
|                                    NetworkX                         |
+---------------------------------------------------------------------+
```

---

## 3. Frontend Layer

### 3.1 Technology Stack

| Component | Technology | Purpose |
|-----------|------------|---------|
| Framework | React 18 | UI components |
| Build Tool | Vite | Fast development |
| Styling | Tailwind CSS | Responsive design |
| Icons | Lucide React | Consistent icons |
| Camera | MediaPipe | Hand/face detection |

### 3.2 Application Structure

```
frontend/src/
├── App.jsx                 # Main application (9 tabs)
├── main.jsx                # Entry point
├── index.css               # Global styles
├── BrowserLLMLauncher.js   # AI chat launcher
└── MediaPipeProcessor.js   # Camera + gesture processing
```

### 3.3 Tab Interface

| Tab | Purpose |
|-----|---------|
| **Learn** | Dashboard with predictions, reviews, gamification |
| **LLM Flow** | Browser-based AI launcher (no API keys) |
| **Gestures** | Train custom hand gestures |
| **Predict** | RL doubt prediction visualization |
| **Behavior** | Behavioral signal tracking |
| **Peer** | Social learning insights |
| **Stats** | Learning statistics |
| **Gamify** | Fish/XP rewards system |
| **Settings** | AI provider configuration |

### 3.4 BrowserLLMLauncher.js

Opens AI chats directly in the browser, without API keys:

```javascript
// Opens chat.openai.com with the learning context pre-filled as the query
openAIChat(context) {
  const url = `https://chat.openai.com/?q=${encodeURIComponent(context)}`;
  window.open(url, '_blank');
}
```

### 3.5 MediaPipeProcessor.js

Handles real-time camera processing:

```
               Camera Feed
                    |
          +---------+---------+
          v                   v
+------------------+ +------------------+
|  Hand Landmark   | |    Face Mesh     |
|    Detection     | |    Detection     |
|   (21 points)    | |   (468 points)   |
+--------+---------+ +--------+---------+
         |                    |
         v                    v
+------------------+ +------------------+
|     Gesture      | |    Face Blur     |
|   Recognition    | |    (Privacy)     |
+--------+---------+ +------------------+
         |
         v
+------------------+
|   Backend API    |
|  /api/gesture/   |
+------------------+
```

---

## 4. Backend Layer

### 4.1 Technology Stack

| Component | Technology | Purpose |
|-----------|------------|---------|
| Framework | Flask | REST API |
| Async | asyncio | Non-blocking I/O |
| ML | PyTorch | RL model |
| Data | NumPy | Feature extraction |
| Graphs | NetworkX | Knowledge graphs |
| Storage | JSON/SQLite | Session persistence |

### 4.2 Flask Application Structure

```
backend/
├── run.py                             # Application entry point
└── app/
    ├── __init__.py                    # Flask app factory
    ├── config.py                      # Configuration
    ├── api/
    │   ├── __init__.py
    │   └── main.py                    # All API routes (889 lines)
    └── agents/
        ├── __init__.py
        ├── study_orchestrator.py      # Central coordinator
        ├── doubt_predictor.py         # RL prediction
        ├── behavioral_agent.py        # Signal processing
        ├── hand_gesture_agent.py      # MediaPipe integration
        ├── recall_agent.py            # Spaced repetition
        ├── knowledge_graph_agent.py   # Concept mapping
        ├── peer_learning_agent.py     # Social learning
        ├── llm_orchestrator_agent.py  # Multi-AI
        ├── gesture_action_agent.py    # Gesture -> action
        └── prompt_agent.py            # Prompt templates
```

### 4.3 Flask App Factory

```python
def create_app():
    app = Flask(__name__)

    # Load config
    app.config.from_object('app.config.Config')

    # Register blueprints
    from app.api.main import api
    app.register_blueprint(api, url_prefix='/api')

    # Initialize the agent network
    init_agents()

    return app
```

---

## 5. Agent Network

### 5.1 Agent Overview

```
+-------------------------------------------------------------+
|                     STUDY ORCHESTRATOR                      |
|                   (Central Coordinator)                     |
|                                                             |
|  +-----------+     +------------+     +--------------+      |
|  |   Doubt   |<----| Behavioral |---->|     Hand     |      |
|  | Predictor |     |   Agent    |     |   Gesture    |      |
|  |   Agent   |     |            |     |    Agent     |      |
|  +-----+-----+     +------------+     +------+-------+      |
|        |                                     |              |
|        v                                     v              |
|  +-----------+     +------------+     +--------------+      |
|  | Knowledge |<----|   Recall   |---->|     LLM      |      |
|  |   Graph   |     |   Agent    |     | Orchestrator |      |
|  |   Agent   |     |            |     |              |      |
|  +-----------+     +------------+     +--------------+      |
|                                                             |
|  +-----------+     +------------+                           |
|  |   Peer    |     |  Gesture   |                           |
|  | Learning  |     |   Action   |                           |
|  |   Agent   |     |   Mapper   |                           |
|  +-----------+     +------------+                           |
+-------------------------------------------------------------+
```

### 5.2 StudyOrchestrator (Central Coordinator)

The orchestrator manages the learning lifecycle:

```python
class StudyOrchestrator:
    def __init__(self, user_id: str):
        self.user_id = user_id

        # Initialize all agents
        self.doubt_predictor = DoubtPredictorAgent(user_id)
        self.behavioral_agent = BehavioralAgent(user_id)
        self.gesture_agent = HandGestureAgent(user_id)
        self.recall_agent = RecallAgent(user_id)
        self.knowledge_graph = KnowledgeGraphAgent(user_id)
        self.peer_agent = PeerLearningAgent(user_id)

        # State management
        self.state = OrchestratorState()
```

**Session Lifecycle:**

1. **PRE_LEARNING** - Load predictions, check recalls, get peer insights
2. **ACTIVE_LEARNING** - Monitor signals, update predictions, capture doubts
3. **REVIEW** - Trigger spaced repetition, update knowledge graph
4. **POST_LEARNING** - Sync data, update gamification, generate summary
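
The four lifecycle phases above amount to a small state machine. The sketch below is illustrative only; `SessionPhase` and `advance` are assumed names, not the orchestrator's actual state class:

```python
from enum import Enum, auto

class SessionPhase(Enum):
    PRE_LEARNING = auto()
    ACTIVE_LEARNING = auto()
    REVIEW = auto()
    POST_LEARNING = auto()

# Legal forward transitions between phases
_NEXT = {
    SessionPhase.PRE_LEARNING: SessionPhase.ACTIVE_LEARNING,
    SessionPhase.ACTIVE_LEARNING: SessionPhase.REVIEW,
    SessionPhase.REVIEW: SessionPhase.POST_LEARNING,
}

def advance(phase: SessionPhase) -> SessionPhase:
    """Move to the next lifecycle phase; POST_LEARNING is terminal."""
    if phase not in _NEXT:
        raise ValueError(f"{phase.name} is a terminal phase")
    return _NEXT[phase]
```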

### 5.3 DoubtPredictorAgent (RL Core)

Predicts confusion before it happens:

```python
class DoubtPredictorAgent:
    def __init__(self, user_id: str, config: dict = None):
        self.user_id = user_id
        self.model = self._load_checkpoint()
        self.feature_extractor = FeatureExtractor()

    def predict_doubts(self, context: dict, top_k: int = 5):
        # 1. Extract 64-dim state vector
        state = self.feature_extractor.extract_state(context)

        # 2. Get Q-values from RL model
        q_values = self.model.predict(state)

        # 3. Return top-k predictions
        return self._format_predictions(q_values, top_k)
```

### 5.4 BehavioralAgent

Processes raw behavioral signals into a single confusion score:

```python
from dataclasses import dataclass

@dataclass
class BehavioralSignal:
    mouse_hesitation: float  # Pause frequency (normalized 0-1)
    scroll_reversals: float  # Back-and-forth scrolling (normalized 0-1)
    time_on_page: float      # Time on page (normalized 0-1)
    tab_switches: float      # Context switches (normalized 0-1)
    back_button: float       # Back-button usage (normalized 0-1)

    def calculate_confusion_score(self) -> float:
        # Weighted average of the normalized signals
        weights = {
            'mouse_hesitation': 0.3,
            'scroll_reversals': 0.25,
            'time_on_page': 0.2,
            'tab_switches': 0.15,
            'back_button': 0.1,
        }
        return sum(getattr(self, name) * w for name, w in weights.items())
```

### 5.5 HandGestureAgent

MediaPipe integration for gesture recognition:

```
    Camera Frame
         |
         v
+------------------+
|  MediaPipe Hands |
|  (21 landmarks)  |
+--------+---------+
         |
         v
+------------------+
| Gesture Template |
|     Matching     |
+--------+---------+
         |
         v
+------------------+
|   Confidence     | ---> Recognized Gesture
|   Score (0-1)    |
+------------------+
```

**Pre-built Gestures:**

| Gesture | Description |
|---------|-------------|
| pinch | Thumb + index |
| swipe_up | 2-finger up |
| swipe_down | 2-finger down |
| swipe_right | 2-finger right |
| swipe_left | 2-finger left |
| point | Index extended |
| wave | Open palm wave |
| thumbs_up | 👍 confirmation |
| thumbs_down | 👎 rejection |
| fist | Closed hand |
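
Template matching against these gestures can be approximated by comparing the live landmark vector to stored templates with cosine similarity. This is a minimal sketch under that assumption; `match_gesture` and the flattened landmark vector are illustrative names, not the agent's actual API:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_gesture(landmarks: np.ndarray, templates: dict, threshold: float = 0.9):
    """Return (name, confidence) of the best-matching template, or (None, score)."""
    best_name, best_score = None, 0.0
    for name, template in templates.items():
        score = cosine_similarity(landmarks, template)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```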

### 5.6 RecallAgent

SM-2 based spaced repetition:

```python
from dataclasses import dataclass

@dataclass
class RecallCard:
    front: str                # Question
    back: str                 # Answer
    interval: int = 0         # Days until next review
    ease_factor: float = 2.5  # Difficulty (SM-2 default)
    repetitions: int = 0      # Consecutive successful reviews

def schedule_review(card: RecallCard, quality: int):
    if quality >= 3:  # Correct
        if card.repetitions == 0:
            card.interval = 1
        elif card.repetitions == 1:
            card.interval = 6
        else:
            card.interval = round(card.interval * card.ease_factor)
        card.repetitions += 1
    else:  # Incorrect
        card.repetitions = 0
        card.interval = 1

    # Update ease factor (clamped at SM-2's minimum of 1.3)
    card.ease_factor += 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    card.ease_factor = max(1.3, card.ease_factor)
```
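
Answering correctly on every review walks the interval through the classic SM-2 progression (1 day, 6 days, then interval × ease factor). A self-contained sketch of just that interval arithmetic, assuming a constant ease factor for simplicity (in full SM-2 the ease factor itself drifts with each answer's quality):

```python
def sm2_intervals(n_reviews: int, ease_factor: float = 2.5) -> list:
    """First n review intervals (in days) for a card answered correctly every time."""
    intervals = []
    interval = 0
    for rep in range(n_reviews):
        if rep == 0:
            interval = 1
        elif rep == 1:
            interval = 6
        else:
            interval = round(interval * ease_factor)
        intervals.append(interval)
    return intervals

# First five intervals at the default ease factor: 1, 6, 15, 38, 95 days
```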

### 5.7 KnowledgeGraphAgent

Concept mapping with NetworkX:

```python
from datetime import datetime

import networkx as nx

class KnowledgeGraphAgent:
    def __init__(self, user_id: str):
        self.graph = nx.MultiDiGraph()

    def add_doubt_to_graph(self, doubt: dict):
        # Create node
        self.graph.add_node(
            doubt['concept'],
            type='concept',
            topic=doubt['topic'],
            timestamp=datetime.now()
        )

        # Connect to prerequisites
        for prereq in doubt.get('prerequisites', []):
            self.graph.add_edge(prereq, doubt['concept'], type='prerequisite')

        # Connect to related concepts
        for related in doubt.get('related', []):
            self.graph.add_edge(doubt['concept'], related, type='related')

    def find_learning_path(self, from_topic: str, to_topic: str):
        try:
            return nx.shortest_path(self.graph, from_topic, to_topic)
        except (nx.NetworkXNoPath, nx.NodeNotFound):
            return []
```
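
For example, chaining two prerequisite edges produces a path that `nx.shortest_path` can recover; the concept names below are purely illustrative:

```python
import networkx as nx

# Illustrative prerequisite chain: gradient descent -> backprop -> optimization
graph = nx.MultiDiGraph()
graph.add_edge('gradient_descent', 'backpropagation', type='prerequisite')
graph.add_edge('backpropagation', 'optimization', type='prerequisite')

path = nx.shortest_path(graph, 'gradient_descent', 'optimization')
# path == ['gradient_descent', 'backpropagation', 'optimization']
```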

### 5.8 LLMOrchestrator

Multi-provider AI integration:

```python
class LLMOrchestrator:
    SUPPORTED_PROVIDERS = {
        'chatgpt': LLMProvider.CHATGPT,
        'gemini': LLMProvider.GEMINI,
        'claude': LLMProvider.CLAUDE,
        'deepseek': LLMProvider.DEEPSEEK,
        'ollama': LLMProvider.OLLAMA,
        'groq': LLMProvider.GROQ
    }

    async def query_parallel(self, request: LLMRequest):
        tasks = []
        for provider in request.providers:
            task = self._query_provider(provider, request)
            tasks.append(task)

        # Execute all queries concurrently; keep successful responses only
        responses = await asyncio.gather(*tasks, return_exceptions=True)
        return [r for r in responses if not isinstance(r, Exception)]
```
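
The fan-out/fan-in pattern behind `query_parallel` can be demonstrated with stub providers; `slow_provider` and `failing_provider` below are stand-ins, not real integrations:

```python
import asyncio

async def slow_provider(name: str, prompt: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return f"{name}: answer to {prompt!r}"

async def failing_provider(name: str, prompt: str) -> str:
    raise ConnectionError(f"{name} unreachable")

async def query_parallel(prompt: str) -> list:
    tasks = [
        slow_provider("chatgpt", prompt),
        failing_provider("gemini", prompt),
        slow_provider("claude", prompt),
    ]
    # return_exceptions=True turns failures into values instead of cancelling the batch
    responses = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in responses if not isinstance(r, Exception)]

answers = asyncio.run(query_parallel("what is overfitting?"))
# Two providers succeed; the failing one is filtered out
```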

### 5.9 GestureActionMapper

Maps gestures to system actions:

```python
from enum import Enum

class GestureAction(Enum):
    QUERY_MULTI_LLM = "query_multi_llm"
    QUERY_CHATGPT = "query_chatgpt"
    QUERY_GEMINI = "query_gemini"
    TRIGGER_RL_LOOP = "trigger_rl_loop"
    CAPTURE_CONTENT = "capture_content"
    PAUSE_SESSION = "pause_session"
    RESUME_SESSION = "resume_session"

class GestureActionMapper:
    def __init__(self):
        self.action_rules = {
            GestureAction.QUERY_MULTI_LLM: {
                "trigger": {"finger_count": 2, "swipe": "right"}
            },
            GestureAction.PAUSE_SESSION: {
                "trigger": {"gesture": "open_palm"}
            },
            GestureAction.RESUME_SESSION: {
                "trigger": {"gesture": "thumbs_up"}
            }
        }
```
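
Resolving an incoming gesture event against such rules amounts to a subset match: an action fires when every key in its trigger appears in the event with the same value. A minimal, self-contained sketch (the rule table mirrors the one above but is not the mapper's real data):

```python
# Illustrative rule table: action name -> trigger conditions
RULES = {
    "query_multi_llm": {"finger_count": 2, "swipe": "right"},
    "pause_session": {"gesture": "open_palm"},
    "resume_session": {"gesture": "thumbs_up"},
}

def resolve_action(event: dict, rules: dict = RULES):
    """Return the first action whose trigger is a subset of the event."""
    for action, trigger in rules.items():
        if all(event.get(k) == v for k, v in trigger.items()):
            return action
    return None
```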

### 5.10 PeerLearningAgent

Social learning insights:

```python
class PeerLearningAgent:
    def get_peer_insights(self, topic: str):
        # Aggregate insights from "similar" students
        insights = []

        # Find students who learned this topic
        similar_students = self._find_similar_students(topic)

        for student in similar_students:
            # What confused them?
            insights.extend(student.difficult_concepts)

        # Return aggregated insights
        return self._aggregate_insights(insights)
```

---

## 6. Reinforcement Learning Pipeline

### 6.1 Problem Formulation

**State Space (64 dimensions):**

```
+--------------------+----------+----------------+--------------+----------+
| Topic Embedding    | Progress | Confusion (16) | Gesture (14) | Time (1) |
| (32)               | (1)      |                |              |          |
| TF-IDF of topic    | 0.0-1.0  | Behavioral     | Hand         | 0-1      |
|                    |          | signals        | signals      |          |
+--------------------+----------+----------------+--------------+----------+
```

**Action Space (10 doubt types):**

1. `what_is_backpropagation`
2. `why_gradient_descent`
3. `how_overfitting_works`
4. `explain_regularization`
5. `what_loss_function`
6. `how_optimization_works`
7. `explain_learning_rate`
8. `what_regularization`
9. `how_batch_norm_works`
10. `explain_softmax`

**Reward Function:**

| Event | Reward |
|-------|--------|
| Correct prediction | +1.0 |
| Helpful explanation | +0.5 |
| Engagement maintained | +0.3 |
| False positive | -0.5 |
| Missed confusion | -1.0 |
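
As a sketch, the reward table can be encoded as a lookup applied to session events. The event names come from the table above; the function itself is illustrative, not the trainer's actual code:

```python
# Reward per event type, per the table above
REWARDS = {
    "correct_prediction": 1.0,
    "helpful_explanation": 0.5,
    "engagement_maintained": 0.3,
    "false_positive": -0.5,
    "missed_confusion": -1.0,
}

def episode_reward(events: list) -> float:
    """Total reward accumulated over one learning episode."""
    return sum(REWARDS[event] for event in events)
```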

### 6.2 Q-Network Architecture

```python
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    def __init__(self, state_dim=64, action_dim=10, hidden_dim=128):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, hidden_dim)   # 64 -> 128
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)  # 128 -> 128
        self.fc3 = nn.Linear(hidden_dim, action_dim)  # 128 -> 10

    def forward(self, x):
        x = F.relu(self.fc1(x))  # ReLU activation
        x = F.relu(self.fc2(x))
        return self.fc3(x)       # Q-values for each action
```

### 6.3 Training Algorithm (GRPO)

```python
class DoubtPredictionRL:
    def train(self, epochs=10, batch_size=32):
        for epoch in range(epochs):
            for batch in self.dataloader:
                # 1. Q-values for the actions actually taken
                q_values = self.q_network(batch.states)
                q_taken = q_values.gather(1, batch.actions.unsqueeze(1)).squeeze(1)

                # 2. Compute TD targets from the frozen target network
                with torch.no_grad():
                    next_q = self.target_network(batch.next_states).max(1)[0]
                    targets = batch.rewards + self.gamma * next_q * (~batch.dones)

                # 3. Compute loss and update
                self.optimizer.zero_grad()
                loss = self.loss_fn(q_taken, targets)
                loss.backward()
                self.optimizer.step()

            # 4. Sync the target network once per epoch
            self.update_target_network()

            # 5. Decay epsilon (exploration)
            self.epsilon *= self.epsilon_decay
```

### 6.4 Feature Extraction

```python
import numpy as np

class FeatureExtractor:
    STATE_DIM = 64  # 32 + 1 + 16 + 14 + 1

    def extract_state(self, context: dict) -> np.ndarray:
        # Topic embedding (32 dims)
        topic_emb = self._extract_topic_embedding(context['topic'])

        # Progress (1 dim)
        progress = np.array([context['progress']])

        # Confusion signals (16 dims)
        confusion = self._extract_confusion_signals(context['confusion_signals'])

        # Gesture signals (14 dims)
        gestures = self._extract_gesture_signals(context['gesture_signals'])

        # Time spent (1 dim), normalized to a 30-minute session
        time_spent = np.array([context['time_spent'] / 1800])

        # Concatenate into the 64-dim state vector
        return np.concatenate([topic_emb, progress, confusion, gestures, time_spent])
```
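
A quick sanity check that the five blocks concatenate to the advertised 64 dimensions, using placeholder zero vectors in place of the private `_extract_*` helpers:

```python
import numpy as np

# Placeholder blocks with the documented widths
topic_emb = np.zeros(32)   # TF-IDF topic embedding
progress = np.zeros(1)     # course progress
confusion = np.zeros(16)   # behavioral confusion signals
gestures = np.zeros(14)    # hand-gesture signals
time_spent = np.zeros(1)   # normalized session time

state = np.concatenate([topic_emb, progress, confusion, gestures, time_spent])
# state.shape == (64,)
```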

---

## 7. Data Flow

### 7.1 Learning Session Flow

```
+------------------------------------------------------------------+
|                       USER STARTS SESSION                        |
+--------------------------------+---------------------------------+
                                 |
                                 v
+------------------------------------------------------------------+
|                  ORCHESTRATOR.START_SESSION()                    |
|  1. Create new LearningSession                                   |
|  2. Load RL model checkpoint                                     |
|  3. Build learning context                                       |
+--------------------------------+---------------------------------+
                                 |
            +--------------------+--------------------+
            v                    v                    v
     +------------+       +------------+       +------------+
     |   Doubt    |       | Behavioral |       |    Peer    |
     | Predictor  |       |   Agent    |       |  Learning  |
     |            |       |            |       |   Agent    |
     |  Predict   |       |  Analyze   |       |    Get     |
     |   doubts   |       |  signals   |       |  insights  |
     +------+-----+       +------+-----+       +------+-----+
            |                    |                    |
            +--------------------+--------------------+
                                 v
+------------------------------------------------------------------+
|                   RETURN INITIAL PREDICTIONS                     |
|  - Top 5 predicted doubts                                        |
|  - Pending reviews                                               |
|  - Peer insights                                                 |
+------------------------------------------------------------------+
```

### 7.2 Behavioral Signal Flow

```
+------------------------------------------------------------------+
|                        REAL-TIME SIGNALS                         |
|                                                                  |
|  +----------+   +----------+   +----------+   +----------+       |
|  |  Mouse   |   |  Scroll  |   | Gesture  |   |   Time   |       |
|  | Movement |   | Pattern  |   |  Camera  |   | On Page  |       |
|  +----+-----+   +----+-----+   +----+-----+   +----+-----+       |
+-------+--------------+--------------+--------------+-------------+
        |              |              |              |
        +--------------+------+-------+--------------+
                              v
                 +------------------------+
                 |    BEHAVIORAL AGENT    |
                 |                        |
                 | calculate_confusion_   |
                 | score(signals)         |
                 |                        |
                 | Returns: 0.0 - 1.0     |
                 +-----------+------------+
                             |
                             v
                 +------------------------+
                 |    DOUBT PREDICTOR     |
                 |                        |
                 | If score > 0.5:        |
                 |   Re-predict doubts    |
                 |   Trigger intervention |
                 +------------------------+
```

### 7.3 Gesture-to-Action Flow

```
+------------------------------------------------------------------+
|                          CAMERA FRAME                            |
+-------------------------------+----------------------------------+
                                |
                                v
+------------------------------------------------------------------+
|                      MEDIAPIPE PROCESSING                        |
|                                                                  |
|   +----------------------+        +----------------------+       |
|   |    Hand Landmark     |        |      Face Mesh       |       |
|   |      Detection       |        |     (468 points)     |       |
|   |     (21 points)      |        |                      |       |
|   +-----------+----------+        +-----------+----------+       |
+---------------+--------------------------------+-----------------+
                |                                |
                v                                v
   +----------------------+        +----------------------+
   |   GESTURE TEMPLATE   |        |      FACE BLUR       |
   |       MATCHING       |        |      (Privacy)       |
   |                      |        |                      |
   |  Compare landmarks   |        |  Blur regions with   |
   |  to known gestures   |        |  facial keypoints    |
   +-----------+----------+        +----------------------+
               |
               v
   +--------------------------+
   |   GESTURE RECOGNIZED     | ---> Backend /api/gesture/recognize
   |                          |
   |   { "gesture": "pinch",  |
   |     "confidence": 0.92 } |
   +-----------+--------------+
               |
               v
   +--------------------------+
   |   GESTURE ACTION MAPPER  |
   |                          |
   |   pinch       -----------+--> TRIGGER_AI_HELP
   |   swipe_right -----------+--> LAUNCH_BROWSER_CHAT
   |   open_palm   -----------+--> PAUSE_SESSION
   |   thumbs_up   -----------+--> MARK_UNDERSTOOD
   +--------------------------+
```

---

## 8. API Design

### 8.1 API Structure

| Category | Endpoints |
|----------|-----------|
| Session | `/session/start`, `/session/update`, `/session/end`, `/session/insights` |
| Prediction | `/predict/doubts`, `/recommendations` |
| Behavior | `/behavior/track`, `/behavior/heatmap` |
| Graph | `/graph/add`, `/graph/query`, `/graph/path` |
| Review | `/review/due`, `/review/complete`, `/review/stats` |
| Peer | `/peer/insights`, `/peer/doubts`, `/peer/trending` |
| Gesture | `/gesture/list`, `/gesture/recognize`, `/gesture/training/*` |
| LLM | `/llm/query`, `/llm/gesture-action`, `/llm/rl/*` |

### 8.2 Session API

```python
# POST /api/session/start
{
    "user_id": "student123",
    "topic": "Machine Learning",
    "subtopic": "Neural Networks"
}

# Response
{
    "session_id": "session_1699999999.123",
    "topic": "Machine Learning",
    "predictions": [
        {
            "doubt": "how_overfitting_works",
            "confidence": 0.85,
            "explanation": "Student showing signs of confusion...",
            "priority": 1
        }
    ],
    "pending_reviews": 5,
    "peer_insights_count": 3
}
```

### 8.3 Doubt Prediction API

```python
# POST /api/predict/doubts
{
    "context": {
        "topic": "Neural Networks",
        "progress": 0.5,
        "confusion_signals": 0.7
    }
}

# Response
{
    "predictions": [
        {
            "doubt": "how_overfitting_works",
            "confidence": 0.85,
            "explanation": "...",
            "priority": 1,
            "estimated_time": "10 min",
            "prerequisites": ["regularization", "bias-variance"]
        }
    ]
}
```

---

## 9. Multi-Modal Detection

### 9.1 Supported Modalities

```
+------------------------------------------------------------------+
|                        MULTI-MODAL FUSION                        |
|                                                                  |
|   +--------------+    +--------------+    +--------------+       |
|   |    Audio     |    |  Biometric   |    |  Behavioral  |       |
|   |              |    |              |    |              |       |
|   | Speech rate  |    | Heart rate   |    | Mouse moves  |       |
|   | Hesitations  |    | GSR          |    | Scroll       |       |
|   | Pauses       |    | Eye tracking |    | Key presses  |       |
|   +------+-------+    +------+-------+    +------+-------+       |
|          |                   |                   |               |
|          +-------------------+-------------------+               |
|                              v                                   |
|                 +---------------------------+                    |
|                 |      WEIGHTED FUSION      |                    |
|                 |                           |                    |
|                 |  audio_weight:      0.2   |                    |
|                 |  biometric_weight:  0.3   |                    |
|                 |  behavioral_weight: 0.5   |                    |
|                 +-------------+-------------+                    |
|                               |                                  |
|                               v                                  |
|                 +---------------------------+                    |
|                 |     UNIFIED CONFUSION     |                    |
|                 |          SCORE            |                    |
|                 |        0.0 - 1.0          |                    |
|                 +---------------------------+                    |
+------------------------------------------------------------------+
```
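
The weighted fusion stage reduces to a dot product of per-modality scores and the weights shown above; `fuse_confusion` below is a minimal sketch, not the production fusion code:

```python
# Modality weights from the diagram above
WEIGHTS = {"audio": 0.2, "biometric": 0.3, "behavioral": 0.5}

def fuse_confusion(scores: dict) -> float:
    """Combine per-modality confusion scores (each 0-1) into a unified score."""
    fused = sum(WEIGHTS[m] * scores.get(m, 0.0) for m in WEIGHTS)
    return min(1.0, max(0.0, fused))
```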

### 9.2 Feature Extraction by Modality

**Audio (7 features):**
- Speech rate (WPM)
- Pause frequency
- Pause duration
- Pitch variation
- Volume level
- Hesitation count
- Question markers

**Biometric (6 features):**
- Heart rate (BPM)
- Heart rate variability
- Skin conductance (GSR)
- Skin temperature
- Eye blink rate
- Eye open duration

**Behavioral (8 features):**
- Mouse hesitation
- Scroll reversals
- Time on page
- Click frequency
- Back button usage
- Tab switches
- Copy attempts
- Search usage

---

## 10. Privacy & Security

### 10.1 Face Blur Implementation

```python
import cv2
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh

class FaceBlurProcessor:
    def __init__(self):
        self.face_mesh = mp_face_mesh.FaceMesh(
            static_image_mode=False,
            max_num_faces=1,
            refine_landmarks=True
        )

    def blur_face(self, frame):
        # Detect face landmarks (MediaPipe expects RGB input)
        results = self.face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

        if results.multi_face_landmarks:
            # Get face region
            face_region = self._get_face_region(frame, results)

            # Apply Gaussian blur
            blurred = cv2.GaussianBlur(face_region, (51, 51), 0)

            # Replace face region
            frame = self._replace_region(frame, blurred, results)

        return frame
```

### 10.2 Data Privacy

| Data Type | Storage | Privacy |
|-----------|---------|---------|
| Video frames | None | Processed in-memory only |
| Face images | None | Auto-blurred |
| Hand landmarks | Optional | Anonymized |
| Session data | Local JSON | User-owned |
| Model weights | HuggingFace | Open |

---

## 11. Deployment Architecture

### 11.1 Development Setup

```
+------------------------------------------------------------------+
|                          DEVELOPMENT                             |
|                                                                  |
|   Terminal 1:                 Terminal 2:                        |
|   +-----------------+         +-----------------+                |
|   | cd backend      |         | cd frontend     |                |
|   | python run.py   |         | npm run dev     |                |
|   |                 |         |                 |                |
|   | Flask :5001     |         | Vite :5173      |                |
|   +--------+--------+         +--------+--------+                |
+------------+---------------------------+-------------------------+
             |                           |
             +-------------+-------------+
                           |
                           v
+------------------------------------------------------------------+
|                       BROWSER (localhost)                        |
|                                                                  |
|   Frontend (:5173)  <------ Proxy ------>  Backend (:5001)       |
+------------------------------------------------------------------+
```
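
The dev-server proxy implied above would typically live in `vite.config.js`. This fragment is an assumed setup for illustration (the repository's actual config is not shown in this document):

```javascript
// vite.config.js -- forward /api requests from :5173 to the Flask backend on :5001
import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';

export default defineConfig({
  plugins: [react()],
  server: {
    port: 5173,
    proxy: {
      '/api': {
        target: 'http://localhost:5001',
        changeOrigin: true,
      },
    },
  },
});
```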

### 11.2 Production Setup

```
                  +-----------------+
                  |  Load Balancer  |
                  +--------+--------+
                           |
        +------------------+------------------+
        |                  |                  |
        v                  v                  v
+---------------+  +---------------+  +---------------+
| Flask Worker  |  | Flask Worker  |  | Flask Worker  |
|   (:5001)     |  |   (:5001)     |  |   (:5001)     |
+-------+-------+  +-------+-------+  +-------+-------+
        |                  |                  |
        +------------------+------------------+
                           |
                           v
                  +-----------------+
                  |   Redis Cache   |
                  +--------+--------+
                           |
                           v
                  +-----------------+
                  |   PostgreSQL    |
                  +-----------------+
```

### 11.3 HuggingFace Model Hosting

```
+------------------------------------------------------------------+
|                         HuggingFace Hub                          |
|                                                                  |
|   namish10/contextflow-rl                                        |
|                                                                  |
|     checkpoint.pkl           -> Trained RL model                 |
|     train_rl.py              -> Training script                  |
|     feature_extractor.py     -> State extraction                 |
|     online_learning.py       -> Continuous learning              |
|     data_collector.py        -> Real data collection             |
|     multimodal_detection.py  -> Audio/biometric fusion           |
|     demo.ipynb               -> Interactive demo                 |
|     RESEARCH_PAPER.md        -> Full documentation               |
|                                                                  |
|     app/       (9 agents + API)                                  |
|     frontend/  (React UI)                                        |
+------------------------------------------------------------------+
```

---

## Summary

ContextFlow is a comprehensive system combining:

1. **Predictive AI** - RL-based doubt prediction before confusion occurs
2. **Multi-Agent Architecture** - 9 specialized agents coordinated by a central orchestrator
3. **Gesture Recognition** - Privacy-first MediaPipe hand detection
4. **Multi-Modal Sensing** - Audio + biometric + behavioral fusion
5. **Browser-Based AI** - Direct AI chat launching without API keys
6. **Continuous Learning** - Online learning from user feedback

The system is production-ready, with all API endpoint categories working, the complete agent network in place, and the trained RL model available on HuggingFace.