Jaimodiji committed
Commit c001f24 · 1 Parent(s): 54cf1de

Upload folder using huggingface_hub
This view is limited to 50 files because the commit contains too many changes.
.dockerignore ADDED
@@ -0,0 +1,16 @@
+ data_repo/
+ output/
+ processed/
+ uploads/
+ database.db
+ backup.zip
+ venv/
+ __pycache__/
+ *.pyc
+ .git/
+ .gitignore
+ .idea/
+ .claude/
+ backups/
+ tmp/
+ *.log
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ arial.ttf filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,21 @@
+ .env
+ .next/
+ tmp/
+ __pycache__/
+ backup/
+ bak2/
+ *.zip
+ changes_db.txt
+ database.db
+ output/
+ processed/
+ templates/json_upload.html.bak
+ uploads/
+ venv/
+ x
+ {%
+ client_secret.json
+ backups/
+ nohup.out
+ request.json
+ data_repo/
Dockerfile ADDED
@@ -0,0 +1,40 @@
+ FROM python:3.11-slim
+
+ # Install system dependencies for OpenCV and HF
+ RUN apt-get update && apt-get install -y \
+     git \
+     git-lfs \
+     curl \
+     libgl1-mesa-glx \
+     libglib2.0-0 \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Install production dependencies
+ RUN pip install --no-cache-dir huggingface_hub[cli] gunicorn eventlet
+
+ WORKDIR /app
+
+ # Copy requirements and install
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy the rest of the application
+ COPY . .
+
+ # Ensure scripts are executable
+ RUN chmod +x entrypoint.sh
+
+ # HF Spaces default user is 1000
+ RUN useradd -m -u 1000 user
+ RUN chown -R user:user /app
+ USER user
+ ENV HOME=/home/user
+ ENV PATH=/home/user/.local/bin:$PATH
+
+ # Environment variables
+ ENV PORT=7680
+ ENV PYTHONUNBUFFERED=1
+
+ EXPOSE 7680
+
+ CMD ["./entrypoint.sh"]
GEMINI.md ADDED
@@ -0,0 +1,91 @@
+ # Report Generator Documentation
+
+ This document provides an overview of the Report Generator application, its features, and how the code is structured.
+
+ ## Core Functionality
+
+ The primary purpose of this application is to streamline the process of creating analysis reports from PDF documents or images. It is particularly useful for analyzing test papers or other documents containing questions.
+
+ ### Workflow
+
+ 1. **Upload:** The user can start by uploading either a single PDF file or multiple image files.
+    * If a PDF is uploaded, the application splits it into individual pages, which are treated as images.
+ 2. **Cropping:** The user is then taken to a cropping interface where they can draw boxes around specific areas of interest on each page (e.g., individual questions).
+ 3. **Data Entry:** After cropping, the user enters details for each cropped image, including:
+    * Question Number
+    * Status (Correct, Wrong, Unattempted)
+    * Marked Answer
+    * Correct Answer
+ 4. **PDF Generation:** Finally, the user provides metadata for the report (Subject, Tags, Notes) and generates a consolidated PDF report. This report can be filtered to include all questions or only specific statuses (e.g., "Wrong Only").
+
+ ### Key Features
+
+ * **PDF & Image Upload:** Supports both PDF and multiple image uploads.
+ * **Multi-Box Cropping:** An intuitive interface to select multiple questions from a single page.
+ * **NVIDIA NIM OCR:** Optionally, the application can use the NVIDIA NIM OCR API to automatically extract question numbers from the cropped images, saving manual entry time. This requires setting the `NVIDIA_API_KEY` environment variable.
+ * **Session Management:** Each upload creates a session, which can be persisted to prevent automatic deletion.
+ * **PDF Management:** Generated PDFs are stored and can be managed through a dedicated PDF Manager.
+
+ ## PDF Management
+
+ A key feature of this application is the ability to track and manage the final generated PDFs.
+
+ * **Metadata:** Each generated PDF is stored with the following metadata:
+   * **Subject (Mandatory):** The main subject of the report.
+   * **Tags (Optional):** Comma-separated tags for easy filtering.
+   * **Notes (Optional):** A text area for additional details.
+   * **Source File:** The name of the original PDF or images used to create the report.
+   * **Creation Date:** The date and time the PDF was generated.
+ * **Persistence:** Like sessions, generated PDFs can be marked as "Persisted" to prevent them from being automatically deleted.
+ * **Auto-Deletion:** A cleanup job runs periodically to delete old, non-persisted session data and generated PDFs (defaulting to older than 1 day).
+ * **PDF Manager Dashboard:** A dedicated dashboard at `/pdf_manager` allows users to:
+   * View all generated PDFs.
+   * Search and filter PDFs by subject, tags, or notes.
+   * Download any generated PDF.
+   * Toggle the persistence status of a PDF.
+   * Manually delete a PDF.
+
+ ## Code Structure
+
+ The application is built using Flask, a Python web framework.
+
+ ### Backend (`app.py`)
+
+ This file contains the core logic of the application.
+
+ * **Database Setup (`setup_database`):** Initializes the SQLite database and creates the necessary tables (`sessions`, `images`, `questions`, `generated_pdfs`). It also handles schema migrations, such as adding new columns.
+ * **Cleanup (`cleanup_old_data`):** Contains the logic for deleting old, non-persisted data.
+ * **Flask Routes:**
+   * `/` & `/v2`: Main landing pages for choosing upload type.
+   * `/upload_pdf` & `/upload_images`: Handle the file uploads and create new sessions.
+   * `/cropv2/<session_id>/<image_index>`: Displays the cropping interface.
+   * `/process_crop_v2`: Processes the cropping data and saves the cropped images.
+   * `/question_entry_v2/<session_id>`: The main data entry page.
+   * `/save_questions`: Saves the question data to the database.
+   * `/generate_pdf`: Generates the final PDF report and saves its metadata.
+   * `/dashboard`: Displays the session management dashboard.
+   * `/pdf_manager`: Displays the new PDF management dashboard.
+   * `/delete_session/<session_id>` & `/toggle_persist/<session_id>`: Handle session deletion and persistence.
+   * `/delete_generated_pdf/<pdf_id>` & `/toggle_persist_generated_pdf/<pdf_id>`: Handle generated PDF deletion and persistence.
+   * `/extract_question_number` & `/extract_all_question_numbers`: (Optional) Routes for the NVIDIA NIM OCR functionality.
+
+ ### Frontend (`templates/`)
+
+ The frontend is built with HTML templates using the Jinja2 templating engine and Bootstrap for styling.
+
+ * **`base.html`:** The base template that other templates extend.
+ * **`main.html`:** The main entry point, allowing users to choose between PDF and image upload.
+ * **`indexv2.html` & `image_upload.html`:** The upload forms.
+ * **`cropv2.html`:** The cropping interface.
+ * **`question_entry_v2.html`:** The form for entering question details and generating the final PDF.
+ * **`dashboard.html`:** The dashboard for managing upload sessions.
+ * **`pdf_manager.html`:** The new dashboard for managing the final generated PDFs.
+
+ ### Database (`database.db`)
+
+ A SQLite database is used for data storage.
+
+ * **`sessions`:** Stores information about each upload session.
+ * **`images`:** Stores information about each page/image, including original and cropped versions.
+ * **`questions`:** Stores the data for each question.
+ * **`generated_pdfs`:** Stores the metadata for each final generated PDF.
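The `generated_pdfs` table described above (subject mandatory; tags, notes, source file, creation date, persistence flag) can be sketched with `sqlite3`. The column names here are illustrative guesses inferred from the documentation, not the actual schema created by `setup_database`:

```python
import sqlite3

# Hypothetical schema for the generated_pdfs table; the real DDL lives
# in setup_database(), which this sketch does not reproduce.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS generated_pdfs (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        subject     TEXT NOT NULL,                     -- mandatory
        tags        TEXT,                              -- optional, comma-separated
        notes       TEXT,                              -- optional free text
        source_file TEXT,                              -- original PDF/images
        created_at  TEXT DEFAULT CURRENT_TIMESTAMP,
        persisted   INTEGER DEFAULT 0                  -- 1 = exempt from auto-deletion
    )
""")
conn.execute("INSERT INTO generated_pdfs (subject, tags) VALUES (?, ?)",
             ("Physics", "mechanics,kinematics"))
row = conn.execute("SELECT subject, persisted FROM generated_pdfs").fetchone()
```

The `persisted` flag is what the cleanup job would consult before deleting rows older than one day.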
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 DocuPDF
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md CHANGED
@@ -1,10 +1,164 @@
- ---
- title: Report Generator
- emoji: 📚
- colorFrom: green
- colorTo: gray
- sdk: docker
- pinned: false
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # DocuPDF: Smart Scanner & PDF Generator
+
+ DocuPDF is a web-based utility that transforms images of documents—like test papers, notes, or book pages—into a polished, organized, and enhanced PDF. It provides a powerful 8-point perspective cropping tool, image enhancement controls, and detailed metadata entry, all within your browser.
+
+ ## ✨ Key Features
+
+ ### Dual Input Support
+ - **PDF Upload**: Upload a PDF document and extract individual pages as images
+ - **Image Upload**: Upload multiple image files directly (PNG, JPG, JPEG, GIF, BMP)
+
+ ### Advanced Cropping
+ - Powerful 8-point perspective correction tool
+ - Draw multiple crop boxes on a single page/image
+ - Touch-friendly UI with magnifying loupe for precision
+
+ ### Image Enhancement
+ - Fine-tune scans with adjustments for:
+   - Brightness
+   - Contrast
+   - Gamma
+
+ ### Metadata Management
+ - Add structured metadata to each question:
+   - Question Number
+   - Subject
+   - Status (Correct/Wrong/Unattempted)
+   - Marked Solution
+   - Actual Solution
+   - Time Taken
+
+ ### Smart Features
+ - Automatic question number extraction using NVIDIA NIM OCR (when an API key is provided)
+ - Dashboard for session management
+ - Custom PDF generation with filtering options
+
+ ### Modern UI
+ - Responsive design that works on desktop and mobile
+ - Dark theme for comfortable extended use
+ - Keyboard shortcuts for power users
+
+ ## 🔧 Tech Stack
+
+ - **Backend**: Flask (Python)
+ - **Image Processing**: OpenCV, Pillow, PyMuPDF
+ - **Frontend**: HTML5, Bootstrap 5, JavaScript
+ - **Database**: SQLite
+ - **OCR**: NVIDIA NIM API (optional)
+
+ ## ⚙️ Installation & Setup
+
+ ### Prerequisites
+ - Python 3.7+
+ - pip package installer
+
+ ### Step-by-Step Guide
+
+ 1. **Clone the Repository**
+    ```bash
+    git clone <repository-url>
+    cd Report-Generator
+    ```
+
+ 2. **Create and Activate a Virtual Environment (Recommended)**
+    ```bash
+    # Create the environment
+    python -m venv venv
+
+    # Activate it
+    # On Windows:
+    .\venv\Scripts\activate
+    # On macOS/Linux:
+    source venv/bin/activate
+    ```
+
+ 3. **Install Dependencies**
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+ 4. **Run the Application**
+    ```bash
+    python app.py
+    ```
+
+ 5. **Open in Browser**
+    Navigate to `http://127.0.0.1:1302` in your web browser.
+
+ ### Environment Variables (Optional)
+
+ To enable the automatic question number extraction feature, set the `NVIDIA_API_KEY` environment variable:
+
+ ```bash
+ # On Linux/macOS:
+ export NVIDIA_API_KEY="your-api-key-here"
+
+ # On Windows:
+ set NVIDIA_API_KEY=your-api-key-here
+
+ # Or create a .env file with:
+ NVIDIA_API_KEY=your-api-key-here
+ ```
+
+ If you don't set this variable, the application will still work, but the automatic question number extraction feature will be disabled.
+
+ ## 📖 How to Use
+
+ ### Workflow Options
+
+ 1. **PDF Workflow**:
+    - Upload a PDF document
+    - Each page is converted to an image
+    - Crop and enhance individual pages
+    - Enter question details
+    - Generate final PDF
+
+ 2. **Image Workflow**:
+    - Upload multiple image files directly
+    - Crop and enhance individual images
+    - Enter question details
+    - Generate final PDF
+
+ ### Step 1: Choose Input Method
+ - Select either PDF upload or multiple image upload from the main page
+
+ ### Step 2: Crop & Enhance
+ - For each page/image, draw crop boxes around questions
+ - Use the sliders to adjust brightness, contrast, and gamma
+ - Save and continue to the next page/image
+
+ ### Step 3: Enter Details
+ - Fill in metadata for each extracted question
+ - Use productivity features like "Same Subject for All"
+ - Extract question numbers automatically (if the NVIDIA API is configured)
+
+ ### Step 4: Generate & Download
+ - Choose PDF name and layout options
+ - Filter questions by status if needed
+ - Generate and download your final document
+
+ ## 📁 Project Structure
+
+ ```
+ /
+ ├── uploads/           # Stores original user uploads (temporary)
+ ├── processed/         # Stores cropped & enhanced images (temporary)
+ ├── output/            # Stores the final generated PDFs
+ ├── templates/         # Contains all HTML files
+ ├── app.py             # The main Flask application logic
+ ├── strings.py         # Route constants and string definitions
+ ├── requirements.txt   # List of Python dependencies
+ └── README.md          # This file
+ ```
+
+ ## 🧪 Testing
+
+ Run the test suite to verify functionality:
+
+ ```bash
+ python test.py
+ ```
+
+ ## 📄 License
+
+ This project is licensed under the MIT License.
api_key_manager.py ADDED
@@ -0,0 +1,265 @@
+ """
+ Unified API Key Manager with automatic failover and rotation.
+
+ This module manages multiple API keys for each service and automatically
+ switches to backup keys when one fails due to rate limiting or errors.
+ """
+
+ import os
+ import time
+ from typing import List, Dict, Optional, Tuple
+ from dataclasses import dataclass, field
+ from datetime import datetime, timedelta
+ import threading
+ import logging
+
+ logger = logging.getLogger(__name__)
+
+ @dataclass
+ class APIKeyStatus:
+     """Tracks the status of an individual API key."""
+     key: str
+     service: str
+     last_used: Optional[datetime] = None
+     failure_count: int = 0
+     last_failure: Optional[datetime] = None
+     is_blocked: bool = False
+     blocked_until: Optional[datetime] = None
+     total_requests: int = 0
+     successful_requests: int = 0
+
+     def mark_success(self):
+         """Mark a successful API call."""
+         self.last_used = datetime.now()
+         self.total_requests += 1
+         self.successful_requests += 1
+         self.failure_count = 0  # Reset failure count on success
+         self.is_blocked = False
+         self.blocked_until = None
+
+     def mark_failure(self, block_duration_minutes: int = 5):
+         """Mark a failed API call and potentially block the key."""
+         self.last_used = datetime.now()
+         self.last_failure = datetime.now()
+         self.total_requests += 1
+         self.failure_count += 1
+
+         # Block key after 3 consecutive failures
+         if self.failure_count >= 3:
+             self.is_blocked = True
+             self.blocked_until = datetime.now() + timedelta(minutes=block_duration_minutes)
+             logger.warning(f"API key for {self.service} blocked until {self.blocked_until} after {self.failure_count} failures")
+
+     def is_available(self) -> bool:
+         """Check if this key is available for use."""
+         if not self.is_blocked:
+             return True
+
+         # Check if block has expired
+         if self.blocked_until and datetime.now() > self.blocked_until:
+             self.is_blocked = False
+             self.blocked_until = None
+             self.failure_count = 0
+             logger.info(f"API key for {self.service} unblocked after cooldown period")
+             return True
+
+         return False
+
+     def get_success_rate(self) -> float:
+         """Calculate success rate percentage."""
+         if self.total_requests == 0:
+             return 100.0
+         return (self.successful_requests / self.total_requests) * 100
+
+
+ class APIKeyManager:
+     """
+     Manages multiple API keys for different services with automatic failover.
+
+     Supports multiple keys per service and automatically rotates to backup keys
+     when one fails or hits rate limits.
+     """
+
+     def __init__(self):
+         self.keys: Dict[str, List[APIKeyStatus]] = {}
+         self.current_index: Dict[str, int] = {}
+         self.lock = threading.Lock()
+         self._load_keys_from_env()
+
+     def _load_keys_from_env(self):
+         """Load API keys from environment variables."""
+
+         # NVIDIA API Keys
+         nvidia_keys = self._get_keys_from_env('NVIDIA_API_KEY')
+         if nvidia_keys:
+             self.register_service('nvidia', nvidia_keys)
+
+         # Gemini API Keys
+         gemini_keys = self._get_keys_from_env('GEMINI_API_KEY')
+         google_keys = self._get_keys_from_env('GOOGLE_API_KEY')
+         all_gemini_keys = gemini_keys + google_keys
+         if all_gemini_keys:
+             self.register_service('gemini', all_gemini_keys)
+
+         # OpenRouter API Keys (for Nova)
+         openrouter_keys = self._get_keys_from_env('OPENROUTER_API_KEY')
+         if openrouter_keys:
+             self.register_service('openrouter', openrouter_keys)
+
+         logger.info(f"Loaded API keys: NVIDIA={len(nvidia_keys)}, Gemini={len(all_gemini_keys)}, OpenRouter={len(openrouter_keys)}")
+
+     def _get_keys_from_env(self, base_name: str) -> List[str]:
+         """
+         Get API keys from environment variables.
+         Loads keys in order:
+         1. BASE_NAME (as index 0)
+         2. BASE_NAME_1, BASE_NAME_2, BASE_NAME_3, etc. (as indices 1, 2, 3...)
+
+         Example:
+         - GEMINI_API_KEY → index 0
+         - GEMINI_API_KEY_1 → index 1
+         - GEMINI_API_KEY_2 → index 2
+         """
+         keys = []
+
+         # First, try base key (index 0)
+         base_key = os.environ.get(base_name)
+         if base_key:
+             keys.append(base_key)
+
+         # Then try numbered keys (1-10)
+         for i in range(1, 11):
+             numbered_key = os.environ.get(f"{base_name}_{i}")
+             if numbered_key:
+                 keys.append(numbered_key)
+
+         # Remove duplicates while preserving order
+         seen = set()
+         unique_keys = []
+         for key in keys:
+             if key not in seen:
+                 seen.add(key)
+                 unique_keys.append(key)
+
+         return unique_keys
+
+     def register_service(self, service: str, api_keys: List[str]):
+         """Register multiple API keys for a service."""
+         with self.lock:
+             self.keys[service] = [
+                 APIKeyStatus(key=key, service=service)
+                 for key in api_keys
+             ]
+             self.current_index[service] = 0
+             logger.info(f"Registered {len(api_keys)} API key(s) for service: {service}")
+
+     def get_key(self, service: str) -> Optional[Tuple[str, int]]:
+         """
+         Get an available API key for the specified service.
+         Returns (api_key, key_index) or (None, -1) if no keys available.
+         """
+         with self.lock:
+             if service not in self.keys or not self.keys[service]:
+                 logger.warning(f"No API keys registered for service: {service}")
+                 return None, -1
+
+             service_keys = self.keys[service]
+             start_index = self.current_index[service]
+
+             # Try to find an available key, starting from current index
+             for attempt in range(len(service_keys)):
+                 current_idx = (start_index + attempt) % len(service_keys)
+                 key_status = service_keys[current_idx]
+
+                 if key_status.is_available():
+                     self.current_index[service] = current_idx
+                     logger.debug(f"Using API key {current_idx + 1}/{len(service_keys)} for {service}")
+                     return key_status.key, current_idx
+
+             # All keys are blocked
+             logger.error(f"All API keys for {service} are currently blocked or unavailable")
+             return None, -1
+
+     def mark_success(self, service: str, key_index: int):
+         """Mark an API call as successful."""
+         with self.lock:
+             if service in self.keys and 0 <= key_index < len(self.keys[service]):
+                 self.keys[service][key_index].mark_success()
+                 logger.debug(f"API key {key_index + 1} for {service} marked as successful")
+
+                 # Move to next key for load balancing (round-robin)
+                 self.current_index[service] = (key_index + 1) % len(self.keys[service])
+
+     def mark_failure(self, service: str, key_index: int, block_duration_minutes: int = 5):
+         """Mark an API call as failed and potentially block the key."""
+         with self.lock:
+             if service in self.keys and 0 <= key_index < len(self.keys[service]):
+                 self.keys[service][key_index].mark_failure(block_duration_minutes)
+                 logger.warning(f"API key {key_index + 1} for {service} marked as failed")
+
+                 # Move to next key immediately
+                 self.current_index[service] = (key_index + 1) % len(self.keys[service])
+
+     def get_service_status(self, service: str) -> Dict:
+         """Get status information for a service."""
+         with self.lock:
+             if service not in self.keys:
+                 return {
+                     'service': service,
+                     'available': False,
+                     'total_keys': 0,
+                     'available_keys': 0,
+                     'blocked_keys': 0
+                 }
+
+             service_keys = self.keys[service]
+             available_keys = sum(1 for k in service_keys if k.is_available())
+             blocked_keys = sum(1 for k in service_keys if k.is_blocked)
+
+             return {
+                 'service': service,
+                 'available': available_keys > 0,
+                 'total_keys': len(service_keys),
+                 'available_keys': available_keys,
+                 'blocked_keys': blocked_keys,
+                 'keys': [
+                     {
+                         'index': i,
+                         'is_available': k.is_available(),
+                         'is_blocked': k.is_blocked,
+                         'failure_count': k.failure_count,
+                         'total_requests': k.total_requests,
+                         'success_rate': round(k.get_success_rate(), 2),
+                         'blocked_until': k.blocked_until.isoformat() if k.blocked_until else None
+                     }
+                     for i, k in enumerate(service_keys)
+                 ]
+             }
+
+     def get_all_services_status(self) -> Dict[str, Dict]:
+         """Get status for all registered services."""
+         return {
+             service: self.get_service_status(service)
+             for service in self.keys.keys()
+         }
+
+     def reset_service(self, service: str):
+         """Reset all keys for a service (unblock and clear stats)."""
+         with self.lock:
+             if service in self.keys:
+                 for key_status in self.keys[service]:
+                     key_status.is_blocked = False
+                     key_status.blocked_until = None
+                     key_status.failure_count = 0
+                 logger.info(f"Reset all keys for service: {service}")
+
+
+ # Global singleton instance
+ _api_key_manager = None
+
+ def get_api_key_manager() -> APIKeyManager:
+     """Get the global API key manager instance."""
+     global _api_key_manager
+     if _api_key_manager is None:
+         _api_key_manager = APIKeyManager()
+     return _api_key_manager
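The rotation-and-cooldown behavior that `api_key_manager.py` implements can be sketched in isolation. This is a simplified, self-contained model of the same pattern (round-robin over keys, skipping any key blocked after repeated failures); the repo's class additionally tracks success rates, loads keys from the environment, and guards state with a lock:

```python
from datetime import datetime, timedelta

class RotatingKeyPool:
    """Simplified sketch of the failover pattern in APIKeyManager."""

    def __init__(self, keys, max_failures=3, block_minutes=5):
        self.keys = list(keys)
        self.failures = [0] * len(keys)
        self.blocked_until = [None] * len(keys)
        self.index = 0
        self.max_failures = max_failures
        self.block = timedelta(minutes=block_minutes)

    def _available(self, i, now):
        until = self.blocked_until[i]
        if until is not None and now < until:
            return False          # still in cooldown
        if until is not None:     # cooldown expired: unblock and reset
            self.blocked_until[i] = None
            self.failures[i] = 0
        return True

    def get_key(self, now=None):
        """Return (key, index), skipping blocked keys; (None, -1) if none left."""
        now = now or datetime.now()
        for attempt in range(len(self.keys)):
            i = (self.index + attempt) % len(self.keys)
            if self._available(i, now):
                self.index = i
                return self.keys[i], i
        return None, -1

    def mark_failure(self, i, now=None):
        """Count a failure; block the key after max_failures in a row."""
        now = now or datetime.now()
        self.failures[i] += 1
        if self.failures[i] >= self.max_failures:
            self.blocked_until[i] = now + self.block
        self.index = (i + 1) % len(self.keys)  # rotate away immediately
```

For example, after three consecutive failures on the first key, `get_key` starts handing out the second one until the cooldown expires.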
app.py ADDED
@@ -0,0 +1,93 @@
+ import os
+ import sys
+ from flask import Flask
+ from flask_cors import CORS
+ from flask_socketio import SocketIO
+ from datetime import datetime, date
+ from dotenv import load_dotenv
+
+ # Load environment variables from .env file
+ load_dotenv()
+
+ # Ensure the current directory is in the Python path
+ sys.path.append(os.path.abspath(os.path.dirname(__file__)))
+
+ from database import setup_database
+
+ socketio = SocketIO()
+
+ def humanize_datetime(dt_str):
+     """Converts a datetime string to a human-friendly format."""
+     if not dt_str:
+         return ""
+     try:
+         # Split the string at the decimal point to handle microseconds
+         dt = datetime.fromisoformat(dt_str.split('.')[0])
+         today = date.today()
+         if dt.date() == today:
+             return "Today"
+         elif dt.date() == date.fromordinal(today.toordinal() - 1):
+             return "Yesterday"
+         else:
+             return dt.strftime('%b %d, %Y')
+     except (ValueError, TypeError):
+         return dt_str  # Return original string if parsing fails
+
+ def create_app():
+     app = Flask(__name__)
+     CORS(app)
+     socketio.init_app(app, cors_allowed_origins="*")
+
+     # Register custom Jinja2 filters
+     app.jinja_env.filters['humanize'] = humanize_datetime
+     app.jinja_env.filters['chr'] = chr
+
+     # Configuration
+     app.config['SECRET_KEY'] = os.urandom(24)
+     app.config['MAX_CONTENT_LENGTH'] = 16 * 1024 * 1024 * 4096
+     app.config['UPLOAD_FOLDER'] = 'uploads'
+     app.config['PROCESSED_FOLDER'] = 'processed'
+     app.config['OUTPUT_FOLDER'] = 'output'
+     app.config['TEMP_FOLDER'] = 'tmp'
+
+     # Ensure instance folders exist
+     for folder in [app.config['UPLOAD_FOLDER'], app.config['PROCESSED_FOLDER'], app.config['OUTPUT_FOLDER'], app.config['TEMP_FOLDER']]:
+         os.makedirs(folder, exist_ok=True)
+
+     with app.app_context():
+         setup_database()
+
+     # Setup Login Manager
+     from user_auth import setup_login_manager
+     setup_login_manager(app)
+
+     # Register Blueprints
+     from routes import main_bp
+     from json_processor import json_bp
+     from neetprep import neetprep_bp
+     from classifier_routes import classifier_bp
+     from dashboard import dashboard_bp
+     from image_routes import image_bp
+     from auth_routes import auth_bp
+     from settings_routes import settings_bp
+     from subjective_routes import subjective_bp
+     from camera_routes import camera_bp
+     from drive_routes import drive_bp
+     from qtab_routes import qtab_bp
+
+     app.register_blueprint(main_bp)
+     app.register_blueprint(json_bp)
+     app.register_blueprint(neetprep_bp)
+     app.register_blueprint(classifier_bp)
+     app.register_blueprint(dashboard_bp)
+     app.register_blueprint(image_bp)
+     app.register_blueprint(auth_bp)
+     app.register_blueprint(settings_bp)
+     app.register_blueprint(subjective_bp)
+     app.register_blueprint(camera_bp)
+     app.register_blueprint(drive_bp)
+     app.register_blueprint(qtab_bp)
+
+     return app
+
+ app = create_app()
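The `humanize_datetime` Jinja2 filter defined in `app.py` above can be exercised on its own. This standalone version keeps the same behavior, with the "yesterday" comparison written via `timedelta` instead of `date.fromordinal` (the two are equivalent):

```python
from datetime import datetime, date, timedelta

def humanize_datetime(dt_str):
    """Render an ISO datetime string as 'Today', 'Yesterday', or 'Mon DD, YYYY'."""
    if not dt_str:
        return ""
    try:
        # Split at the decimal point to discard microseconds before parsing
        dt = datetime.fromisoformat(dt_str.split('.')[0])
    except (ValueError, TypeError):
        return dt_str  # fall back to the raw string on parse failure
    today = date.today()
    if dt.date() == today:
        return "Today"
    if dt.date() == today - timedelta(days=1):
        return "Yesterday"
    return dt.strftime('%b %d, %Y')
```

So `humanize_datetime("2020-01-02T03:04:05.123456")` renders as `Jan 02, 2020`, while a timestamp from the current day renders as `Today`.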
auth_routes.py ADDED
@@ -0,0 +1,60 @@
+ from flask import Blueprint, render_template, request, redirect, url_for, flash
+ from flask_login import login_user, logout_user, login_required
+ from user_auth import User
+ from werkzeug.security import check_password_hash
+ from urllib.parse import urlparse
+
+ auth_bp = Blueprint('auth', __name__)
+
+ @auth_bp.route('/login', methods=['GET', 'POST'])
+ def login():
+     if request.method == 'POST':
+         username = request.form.get('username')
+         password = request.form.get('password')
+         remember = True if request.form.get('remember') else False
+
+         user = User.get_by_username(username)
+
+         if not user or not check_password_hash(user.password_hash, password):
+             flash('Please check your login details and try again.')
+             return redirect(url_for('auth.login'))
+
+         login_user(user, remember=remember)
+
+         next_page = request.form.get('next')
+         # Security: Only redirect to local paths
+         if next_page and urlparse(next_page).netloc == '':
+             return redirect(next_page)
+
+         return redirect(url_for('dashboard.dashboard'))
+
+     return render_template('login.html')
+
+ @auth_bp.route('/register', methods=['GET', 'POST'])
+ def register():
+     if request.method == 'POST':
+         username = request.form.get('username')
+         email = request.form.get('email')
+         password = request.form.get('password')
+
+         # Check if user already exists
+         if User.get_by_username(username):
+             flash('Username already exists.')
+             return redirect(url_for('auth.register'))
+
+         # Create new user
+         user = User.create(username, email, password)
+         if user:
+             login_user(user)
+             return redirect(url_for('dashboard.dashboard'))
+         else:
+             flash('An error occurred during registration.')
+             return redirect(url_for('auth.register'))
+
+     return render_template('register.html')
+
+ @auth_bp.route('/logout')
+ @login_required
+ def logout():
+     logout_user()
+     return redirect(url_for('main.index'))
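The `netloc` check in the login route above is the standard guard against open redirects in a `next` parameter. A standalone sketch of that check, written slightly stricter than the committed code by also rejecting scheme-only URLs such as `javascript:` (an addition of this sketch, not something the diff does):

```python
from urllib.parse import urlparse

def is_safe_redirect(target):
    """True only for same-site relative paths; rejects absolute URLs and schemes."""
    if not target:
        return False
    parts = urlparse(target)
    # An attacker-controlled absolute URL has a netloc ("evil.com") or a
    # scheme ("javascript:"); a safe local path like "/dashboard" has neither.
    return parts.netloc == '' and parts.scheme == ''
```

With this guard, `/dashboard` is accepted while `https://evil.com/x` and `javascript:alert(1)` are both refused.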
backup.py ADDED
@@ -0,0 +1,80 @@
+ import sqlite3
+ import json
+ import os
+ import shutil
+ import zipfile
+
+ def backup_database_and_files(db_path='database.db', backup_dir='backup', zip_filename='backup.zip'):
+     """
+     Exports all tables from the SQLite database to JSON files, backs up associated files,
+     and creates a zip archive of the backup.
+
+     :param db_path: Path to the SQLite database file.
+     :param backup_dir: Directory to save the backup.
+     :param zip_filename: Name of the output zip file.
+     """
+     if os.path.exists(backup_dir):
+         shutil.rmtree(backup_dir)
+     os.makedirs(backup_dir)
+
+     # 1. Backup the database to JSON files
+     conn = None
+     try:
+         conn = sqlite3.connect(db_path)
+         conn.row_factory = sqlite3.Row
+         cursor = conn.cursor()
+
+         cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
+         tables = [row[0] for row in cursor.fetchall()]
+
+         for table_name in tables:
+             print(f"Backing up table: {table_name}")
+             cursor.execute(f"SELECT * FROM {table_name}")
+             rows = cursor.fetchall()
+             data = [dict(row) for row in rows]
+
+             backup_file_path = os.path.join(backup_dir, f"{table_name}.json")
+             with open(backup_file_path, 'w') as f:
+                 json.dump(data, f, indent=4)
+
+             print(f"Successfully backed up {table_name} to {backup_file_path}")
+
+     except sqlite3.Error as e:
+         print(f"Database error: {e}")
+         return
+     finally:
+         if conn:
+             conn.close()
+
+     # 2. Backup associated files
+     file_dirs_to_backup = ['output', 'processed', 'uploads']
+     for dir_name in file_dirs_to_backup:
+         source_dir = dir_name
+         dest_dir = os.path.join(backup_dir, dir_name)
+
+         if os.path.exists(source_dir):
+             print(f"Backing up directory: {source_dir}")
+             shutil.copytree(source_dir, dest_dir)
+             print(f"Successfully backed up {source_dir} to {dest_dir}")
+         else:
+             print(f"Directory not found, skipping: {source_dir}")
+
+     # 3. Create a zip archive of the backup directory
+     print(f"\nCreating zip archive: {zip_filename}")
+     with zipfile.ZipFile(zip_filename, 'w', zipfile.ZIP_DEFLATED) as zipf:
+         for root, dirs, files in os.walk(backup_dir):
+             for file in files:
+                 file_path = os.path.join(root, file)
+                 arcname = os.path.relpath(file_path, backup_dir)
+                 zipf.write(file_path, arcname)
+
+     print(f"Successfully created {zip_filename}")
+
+     # 4. Clean up the backup directory
+     shutil.rmtree(backup_dir)
+     print(f"Cleaned up backup directory: {backup_dir}")
+
+     print("\nBackup complete!")
+
+ if __name__ == '__main__':
+     backup_database_and_files()
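backup.py only ships the dump half of the story: each table becomes a `<table>.json` file holding a list of row dicts. A restore counterpart is not part of this commit; the round-trip below is a hypothetical sketch of both halves, exercised against throwaway in-memory databases so the JSON format is concrete:

```python
import json
import os
import sqlite3
import tempfile

# Dump a table the way backup.py does (a JSON list of row dicts), then
# restore it into a fresh database. restore_table is hypothetical -- the
# commit only ships the dump side.
def dump_table(conn, table_name, backup_dir):
    conn.row_factory = sqlite3.Row
    rows = [dict(r) for r in conn.execute(f"SELECT * FROM {table_name}")]
    path = os.path.join(backup_dir, f"{table_name}.json")
    with open(path, "w") as f:
        json.dump(rows, f, indent=4)
    return path

def restore_table(conn, table_name, backup_dir):
    with open(os.path.join(backup_dir, f"{table_name}.json")) as f:
        rows = json.load(f)
    for row in rows:
        cols = ", ".join(row)                      # dict keys = column names
        marks = ", ".join("?" for _ in row)
        conn.execute(f"INSERT INTO {table_name} ({cols}) VALUES ({marks})",
                     list(row.values()))
    conn.commit()

with tempfile.TemporaryDirectory() as backup_dir:
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE sessions (id TEXT, original_filename TEXT)")
    src.execute("INSERT INTO sessions VALUES ('s1', 'scan.pdf')")
    src.commit()
    dump_table(src, "sessions", backup_dir)

    dst = sqlite3.connect(":memory:")
    dst.execute("CREATE TABLE sessions (id TEXT, original_filename TEXT)")
    restore_table(dst, "sessions", backup_dir)
    restored = dst.execute("SELECT id, original_filename FROM sessions").fetchone()
    print(restored)  # ('s1', 'scan.pdf')
```

Note the restore assumes the target schema already exists; backup.py does not serialize `CREATE TABLE` statements.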
camera_routes.py ADDED
@@ -0,0 +1,96 @@
+ from flask import Blueprint, render_template, request, jsonify
+ from flask_login import login_required, current_user
+ from flask_socketio import emit, join_room
+ from app import socketio
+ from werkzeug.utils import secure_filename
+ from database import get_db_connection
+ import os
+ import uuid
+
+ camera_bp = Blueprint('camera', __name__)
+
+ @camera_bp.route('/camera_web')
+ @login_required
+ def camera_web():
+     return render_template('camera_web.html')
+
+ @camera_bp.route('/camera_mobile')
+ @login_required
+ def camera_mobile():
+     # camera_id can be used to select a specific camera if needed; defaults to the back camera
+     return render_template('camera_mobile.html')
+
+ # --- WebRTC Signaling ---
+
+ @socketio.on('join')
+ def handle_join(data):
+     room = data.get('room', 'stream_room')
+     join_room(room)
+     print(f"Client joined room: {room}")
+     emit('user_joined', {'message': 'A user has joined'}, room=room)
+
+ @socketio.on('offer')
+ def handle_offer(data):
+     room = data.get('room', 'stream_room')
+     print("Received offer")
+     emit('offer', data['offer'], room=room, include_self=False)
+
+ @socketio.on('answer')
+ def handle_answer(data):
+     room = data.get('room', 'stream_room')
+     print("Received answer")
+     emit('answer', data['answer'], room=room, include_self=False)
+
+ @socketio.on('candidate')
+ def handle_candidate(data):
+     room = data.get('room', 'stream_room')
+     print("Received candidate")
+     emit('candidate', data['candidate'], room=room, include_self=False)
+
+ @socketio.on('remote_capture')
+ def handle_remote_capture(data):
+     room = data.get('room', 'stream_room')
+     print("Received remote capture request")
+     emit('trigger_capture', {}, room=room, include_self=False)
+
+ @camera_bp.route('/camera/upload_captured_image', methods=['POST'])
+ @login_required
+ def upload_captured_image():
+     if 'image' not in request.files:
+         return jsonify({'error': 'No image file provided'}), 400
+
+     file = request.files['image']
+     if file.filename == '':
+         return jsonify({'error': 'No selected file'}), 400
+
+     if file:
+         session_id = str(uuid.uuid4())
+         original_filename = secure_filename(file.filename) or f"captured_image_{session_id}.png"
+
+         # Save to UPLOAD_FOLDER or TEMP_FOLDER
+         # For captured images, TEMP_FOLDER is suitable, then processed further
+         save_path = os.path.join(os.getcwd(), 'tmp', original_filename)  # Using tmp folder relative to CWD
+         os.makedirs(os.path.dirname(save_path), exist_ok=True)  # Ensure the tmp folder exists
+         file.save(save_path)
+
+         conn = get_db_connection()
+         try:
+             conn.execute(
+                 'INSERT INTO sessions (id, original_filename, name, user_id, session_type) VALUES (?, ?, ?, ?, ?)',
+                 (session_id, original_filename, original_filename, current_user.id, 'image_capture')
+             )
+             # Insert the image into the images table
+             conn.execute(
+                 'INSERT INTO images (session_id, image_index, filename, original_name, image_type) VALUES (?, ?, ?, ?, ?)',
+                 (session_id, 0, original_filename, original_filename, 'original')
+             )
+             conn.commit()
+         except Exception as e:
+             conn.rollback()
+             return jsonify({'error': f'Database error: {str(e)}'}), 500
+         finally:
+             conn.close()
+
+         return jsonify({'success': True, 'session_id': session_id, 'filename': original_filename})
+
+     return jsonify({'error': 'Image capture failed'}), 500
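The `secure_filename(...) or f"captured_image_{session_id}.png"` fallback in `upload_captured_image` matters because `secure_filename` can return an empty string when a name contains no safe characters. The sketch below illustrates that pattern with a simplified stand-in sanitizer (hypothetical, not werkzeug's actual implementation):

```python
import re
import uuid

# Stand-in sanitizer roughly mimicking secure_filename: neutralize path
# separators, keep only safe characters, trim stray dots/underscores.
# Hypothetical helper, shown only to illustrate the `or` fallback.
def sanitize(name):
    name = name.replace("/", "_").replace("\\", "_")
    name = re.sub(r"[^A-Za-z0-9_.-]", "", name)
    return name.strip("._")

def pick_filename(raw_name, session_id):
    # Mirrors: secure_filename(file.filename) or f"captured_image_{session_id}.png"
    return sanitize(raw_name) or f"captured_image_{session_id}.png"

session_id = str(uuid.uuid4())
print(pick_filename("photo 1.png", session_id))  # photo1.png
print(pick_filename("///", session_id))          # captured_image_<uuid>.png
```

Without the fallback, a hostile or all-unsafe filename would produce an empty save path under `tmp/`.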
classifier_routes.py ADDED
@@ -0,0 +1,282 @@
+ from flask import Blueprint, jsonify, current_app, render_template, request
+ from flask_login import login_required, current_user
+ from utils import get_db_connection
+ import os
+ import time
+ import json
+ from processing import resize_image_if_needed, call_nim_ocr_api
+ from gemini_classifier import classify_questions_with_gemini
+ from gemma_classifier import GemmaClassifier
+ from nova_classifier import classify_questions_with_nova
+
+ classifier_bp = Blueprint('classifier_bp', __name__)
+
+ # Instantiate classifiers
+ gemma_classifier = GemmaClassifier()
+
+ @classifier_bp.route('/classified/edit')
+ @login_required
+ def edit_classified_questions():
+     """Renders the page for editing classified questions."""
+     conn = get_db_connection()
+
+     AVAILABLE_SUBJECTS = ["Biology", "Chemistry", "Physics", "Mathematics"]
+
+     # Security: Fetch questions belonging to the current user
+     questions_from_db = conn.execute("""
+         SELECT q.id, q.question_text, q.chapter, q.subject, q.tags
+         FROM questions q
+         JOIN sessions s ON q.session_id = s.id
+         WHERE s.user_id = ? AND q.subject IS NOT NULL AND q.chapter IS NOT NULL
+         ORDER BY q.id
+     """, (current_user.id,)).fetchall()
+
+     questions = []
+     for q in questions_from_db:
+         q_dict = dict(q)
+         plain_text = q_dict['question_text'] or ''  # Already plain text from OCR; guard against NULL
+         q_dict['question_text_plain'] = (plain_text[:100] + '...') if len(plain_text) > 100 else plain_text
+         questions.append(q_dict)
+
+     # Suggestions should also be user-specific
+     chapters = conn.execute('SELECT DISTINCT q.chapter FROM questions q JOIN sessions s ON q.session_id = s.id WHERE s.user_id = ? AND q.chapter IS NOT NULL ORDER BY q.chapter', (current_user.id,)).fetchall()
+     tags_query = conn.execute('SELECT DISTINCT q.tags FROM questions q JOIN sessions s ON q.session_id = s.id WHERE s.user_id = ? AND q.tags IS NOT NULL AND q.tags != \'\'', (current_user.id,)).fetchall()
+     all_tags = set()
+     for row in tags_query:
+         tags = [tag.strip() for tag in row['tags'].split(',')]
+         all_tags.update(tags)
+
+     conn.close()
+     return render_template('classified_edit.html',
+                            questions=questions,
+                            chapters=[c['chapter'] for c in chapters],
+                            all_tags=sorted(list(all_tags)),
+                            available_subjects=AVAILABLE_SUBJECTS)
+
+ @classifier_bp.route('/classified/update_question/<int:question_id>', methods=['POST'])
+ @login_required
+ def update_classified_question(question_id):
+     """Handles updating a question's metadata."""
+     data = request.json
+     new_chapter = data.get('chapter')
+     new_subject = data.get('subject')
+
+     if not new_chapter or not new_subject:
+         return jsonify({'error': 'Chapter and Subject cannot be empty.'}), 400
+
+     try:
+         conn = get_db_connection()
+         # Security: Check ownership before update
+         question_owner = conn.execute("SELECT s.user_id FROM questions q JOIN sessions s ON q.session_id = s.id WHERE q.id = ?", (question_id,)).fetchone()
+         if not question_owner or question_owner['user_id'] != current_user.id:
+             conn.close()
+             return jsonify({'error': 'Unauthorized'}), 403
+
+         conn.execute(
+             'UPDATE questions SET chapter = ?, subject = ? WHERE id = ?',
+             (new_chapter, new_subject, question_id)
+         )
+         conn.commit()
+         conn.close()
+         return jsonify({'success': True})
+     except Exception as e:
+         current_app.logger.error(f"Error updating question {question_id}: {repr(e)}")
+         return jsonify({'error': str(e)}), 500
+
+ @classifier_bp.route('/classified/delete_question/<int:question_id>', methods=['DELETE'])
+ @login_required
+ def delete_classified_question(question_id):
+     """Handles deleting a classified question."""
+     try:
+         conn = get_db_connection()
+         # Security: Check ownership before delete
+         question_owner = conn.execute("SELECT s.user_id FROM questions q JOIN sessions s ON q.session_id = s.id WHERE q.id = ?", (question_id,)).fetchone()
+         if not question_owner or question_owner['user_id'] != current_user.id:
+             conn.close()
+             return jsonify({'error': 'Unauthorized'}), 403
+
+         # Update the question to remove classification
+         conn.execute('UPDATE questions SET subject = NULL, chapter = NULL WHERE id = ?', (question_id,))
+
+         conn.commit()
+         conn.close()
+         return jsonify({'success': True})
+     except Exception as e:
+         current_app.logger.error(f"Error deleting question {question_id}: {repr(e)}")
+         return jsonify({'error': str(e)}), 500
+
+ @classifier_bp.route('/classified/delete_many', methods=['POST'])
+ @login_required
+ def delete_many_classified_questions():
+     """Handles bulk deleting classified questions."""
+     data = request.json
+     question_ids = data.get('ids', [])
+
+     if not question_ids:
+         return jsonify({'error': 'No question IDs provided.'}), 400
+
+     try:
+         conn = get_db_connection()
+         # Security: Filter IDs to only those owned by the user
+         placeholders = ','.join('?' for _ in question_ids)
+         owned_q_ids_rows = conn.execute(f"""
+             SELECT q.id FROM questions q
+             JOIN sessions s ON q.session_id = s.id
+             WHERE q.id IN ({placeholders}) AND s.user_id = ?
+         """, (*question_ids, current_user.id)).fetchall()
+
+         owned_q_ids = [row['id'] for row in owned_q_ids_rows]
+
+         if not owned_q_ids:
+             conn.close()
+             return jsonify({'success': True, 'message': 'No owned questions to delete.'})
+
+         update_placeholders = ','.join('?' for _ in owned_q_ids)
+         conn.execute(f'UPDATE questions SET subject = NULL, chapter = NULL WHERE id IN ({update_placeholders})', owned_q_ids)
+
+         conn.commit()
+         conn.close()
+         return jsonify({'success': True})
+     except Exception as e:
+         current_app.logger.error(f"Error deleting questions: {repr(e)}")
+         return jsonify({'error': str(e)}), 500
+
+ from rich.table import Table
+ from rich.console import Console
+
+ @classifier_bp.route('/extract_and_classify_all/<session_id>', methods=['POST'])
+ @login_required
+ def extract_and_classify_all(session_id):
+     try:
+         conn = get_db_connection()
+         # Security: Check ownership of the session
+         session_owner = conn.execute('SELECT user_id FROM sessions WHERE id = ?', (session_id,)).fetchone()
+         if not session_owner or session_owner['user_id'] != current_user.id:
+             conn.close()
+             return jsonify({'error': 'Unauthorized'}), 403
+
+         images = conn.execute(
+             "SELECT id, processed_filename FROM images WHERE session_id = ? AND image_type = 'cropped' ORDER BY id",
+             (session_id,)
+         ).fetchall()
+
+         if not images:
+             conn.close()
+             return jsonify({'error': 'No cropped images found in session'}), 404
+
+         current_app.logger.info(f"Found {len(images)} images to process for user {current_user.id}.")
+
+         question_texts = []
+         image_ids = []
+         for image in images:
+             image_id = image['id']
+             processed_filename = image['processed_filename']
+
+             if not processed_filename:
+                 continue
+
+             image_path = os.path.join(current_app.config['PROCESSED_FOLDER'], processed_filename)
+             if not os.path.exists(image_path):
+                 continue
+
+             image_bytes = resize_image_if_needed(image_path)
+             ocr_result = call_nim_ocr_api(image_bytes)
+
+             current_app.logger.info(f"NVIDIA OCR Result for image {image_id}: {ocr_result}")
+
+             if not ocr_result.get('data') or not ocr_result['data'][0].get('text_detections'):
+                 current_app.logger.error(f"NVIDIA OCR result for image {image_id} does not contain 'text_detections' key. Full response: {ocr_result}")
+                 continue
+
+             text = " ".join(item['text_prediction']['text'] for item in ocr_result['data'][0]['text_detections'])
+
+             conn.execute('UPDATE questions SET question_text = ? WHERE image_id = ?', (text, image_id))
+             current_app.logger.info(f"Updated question_text for image_id: {image_id}")
+             question_texts.append(text)
+             image_ids.append(image_id)
+
+         conn.commit()
+
+         # --- Batch Processing and Classification ---
+         batch_size = 7  # Default batch size
+         total_questions = len(question_texts)
+         num_batches = (total_questions + batch_size - 1) // batch_size
+         total_update_count = 0
+
+         for i in range(num_batches):
+             start_index = i * batch_size
+             end_index = start_index + batch_size
+
+             batch_texts = question_texts[start_index:end_index]
+             batch_image_ids = image_ids[start_index:end_index]
+
+             if not batch_texts:
+                 continue
+
+             current_app.logger.info(f"Processing Batch {i+1}/{num_batches}...")
+
+             # Choose classifier based on user preference
+             classifier_model = getattr(current_user, 'classifier_model', 'gemini')
+
+             if classifier_model == 'nova':
+                 current_app.logger.info(f"Using Nova classifier for user {current_user.id}")
+                 classification_result = classify_questions_with_nova(batch_texts, start_index=start_index)
+                 model_name = "Nova"
+             elif classifier_model == 'gemma':
+                 current_app.logger.info(f"Using Gemma classifier for user {current_user.id}")
+                 classification_result = gemma_classifier.classify(batch_texts, start_index=start_index)
+                 model_name = "Gemma"
+             else:
+                 current_app.logger.info(f"Using Gemini classifier for user {current_user.id}")
+                 classification_result = classify_questions_with_gemini(batch_texts, start_index=start_index)
+                 model_name = "Gemini"
+
+             # Log the result to the terminal
+             current_app.logger.info(f"--- Classification Result ({model_name}) for Batch {i+1} ---")
+             current_app.logger.info(json.dumps(classification_result, indent=2))
+             current_app.logger.info("---------------------------------------------")
+
+             if not classification_result or not classification_result.get('data'):
+                 current_app.logger.error(f'{model_name} classifier did not return valid data for batch {i+1}.')
+                 continue  # Move to the next batch
+
+             # --- Immediate DB Update for the Batch ---
+             batch_update_count = 0
+             for item in classification_result.get('data', []):
+                 item_index_global = item.get('index')  # This is the global index (e.g., 1 to 14)
+                 if item_index_global is not None:
+                     # Find the corresponding local index in our full list
+                     try:
+                         # The item_index_global is 1-based, our list is 0-based
+                         local_list_index = item_index_global - 1
+                         # Find the image_id for that question
+                         matched_id = image_ids[local_list_index]
+                     except IndexError:
+                         current_app.logger.error(f"Classifier returned an out-of-bounds index: {item_index_global}")
+                         continue
+
+                     new_subject = item.get('subject')
+                     new_chapter = item.get('chapter_title')
+
+                     if new_subject and new_subject != 'Unclassified' and new_chapter and new_chapter != 'Unclassified':
+                         conn.execute('UPDATE questions SET subject = ?, chapter = ? WHERE image_id = ?', (new_subject, new_chapter, matched_id))
+                         batch_update_count += 1
+                     elif new_subject and new_subject != 'Unclassified':
+                         conn.execute('UPDATE questions SET subject = ?, chapter = ? WHERE image_id = ?', (new_subject, 'Unclassified', matched_id))
+                         batch_update_count += 1
+
+             conn.commit()
+             total_update_count += batch_update_count
+             current_app.logger.info(f"Batch {i+1} processed. Updated {batch_update_count} questions in the database.")
+
+             if i < num_batches - 1:
+                 current_app.logger.info("Waiting 5 seconds before next batch...")
+                 time.sleep(5)
+
+         conn.close()
+
+         return jsonify({'success': True, 'message': f'Successfully extracted and classified {total_questions} questions. Updated {total_update_count} entries in the database.'})
+
+     except Exception as e:
+         current_app.logger.error(f'Failed to extract and classify questions: {str(e)}', exc_info=True)
+         return jsonify({'error': f'Failed to extract and classify questions: {str(e)}'}), 500
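`extract_and_classify_all` slices the OCR'd texts into ceil-divided batches of 7 and maps each classifier result's 1-based global `index` back onto the parallel `image_ids` list. That bookkeeping, isolated into a standalone sketch:

```python
# Minimal sketch of the batching arithmetic in extract_and_classify_all:
# ceil-division batch count, slicing, and mapping a classifier's 1-based
# global index back onto the parallel image_ids list.
question_texts = [f"q{i}" for i in range(1, 15)]  # 14 questions
image_ids = list(range(101, 115))                 # parallel DB ids

batch_size = 7
total = len(question_texts)
num_batches = (total + batch_size - 1) // batch_size  # ceil(14 / 7)

batches = []
for i in range(num_batches):
    start = i * batch_size
    batches.append(question_texts[start:start + batch_size])

# A classifier result referring to global question 9 (1-based) maps to:
matched_id = image_ids[9 - 1]

print(num_batches)                       # 2
print(len(batches[0]), len(batches[1]))  # 7 7
print(matched_id)                        # 109
```

Because the classifier is handed `start_index` and returns global indices, the lookup goes through the full `image_ids` list rather than the batch slice.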
cleanup.py ADDED
@@ -0,0 +1,286 @@
+
+ import sqlite3
+ import os
+ from datetime import datetime, timedelta
+ from rich.console import Console
+ from rich.table import Table
+
+ # --- Configuration ---
+ DB_PATH = 'database.db'
+ UPLOAD_FOLDER = 'uploads'
+ PROCESSED_FOLDER = 'processed'
+ OUTPUT_FOLDER = 'output'
+ OLDER_THAN_DAYS = 5
+ DRY_RUN = True  # Set to False to perform actual deletion
+
+ # --- Immunity Reasons ---
+ REASON_PERSISTED = "Persisted"
+ REASON_NEETPREP = "NeetPrep/JSON"
+ REASON_CLASSIFIED = "Classified"
+ REASON_RECENT = "Too Recent"
+
+ def get_db_connection():
+     """Establishes a connection to the SQLite database."""
+     conn = sqlite3.connect(DB_PATH)
+     conn.row_factory = sqlite3.Row
+     return conn
+
+ def is_classified_session(conn, session_id):
+     """Checks if a session contains any classified questions."""
+     if not session_id:
+         return False
+     cursor = conn.cursor()
+     cursor.execute("""
+         SELECT 1 FROM questions
+         WHERE session_id = ? AND subject IS NOT NULL AND chapter IS NOT NULL
+         LIMIT 1
+     """, (session_id,))
+     return cursor.fetchone() is not None
+
+ def show_disk_usage_report(console):
+     """Calculates and displays a report of disk usage by category."""
+     console.print("\n[bold cyan]Disk Usage Report[/bold cyan]")
+
+     def sizeof_fmt(num, suffix="B"):
+         """Formats a size in bytes to a human-readable string."""
+         for unit in ["", "K", "M", "G", "T", "P", "E", "Z"]:
+             if abs(num) < 1024.0:
+                 return f"{num:3.1f}{unit}{suffix}"
+             num /= 1024.0
+         return f"{num:.1f}Y{suffix}"
+
+     # --- Summary Report ---
+     usage_data = {}
+     folders_to_scan = {
+         "Uploaded Originals": UPLOAD_FOLDER,
+         "Processed Images": PROCESSED_FOLDER,
+         "Generated PDFs": OUTPUT_FOLDER,
+     }
+
+     for category, folder in folders_to_scan.items():
+         total_size = 0
+         file_count = 0
+         try:
+             for dirpath, _, filenames in os.walk(folder):
+                 for f in filenames:
+                     fp = os.path.join(dirpath, f)
+                     if not os.path.islink(fp):
+                         try:
+                             total_size += os.path.getsize(fp)
+                             file_count += 1
+                         except FileNotFoundError:
+                             pass
+         except FileNotFoundError:
+             pass
+         usage_data[category] = {"size": total_size, "count": file_count}
+
+     summary_table = Table(title="Disk Space Usage by Category")
+     summary_table.add_column("Category", style="cyan")
+     summary_table.add_column("File Count", style="magenta", justify="right")
+     summary_table.add_column("Total Size", style="green", justify="right")
+
+     total_size_all = 0
+     total_count_all = 0
+     for category, data in usage_data.items():
+         summary_table.add_row(category, str(data["count"]), sizeof_fmt(data["size"]))
+         total_size_all += data["size"]
+         total_count_all += data["count"]
+
+     summary_table.add_section()
+     summary_table.add_row("Total", f"[bold]{total_count_all}[/bold]", f"[bold]{sizeof_fmt(total_size_all)}[/bold]")
+
+     console.print(summary_table)
+
+     # --- Detailed Breakdown for Uploaded Originals ---
+     console.print("\n[bold]Breakdown of 'Uploaded Originals':[/bold]")
+
+     conn = get_db_connection()
+     sessions = conn.execute('SELECT id, original_filename FROM sessions').fetchall()
+
+     session_sizes = []
+     with console.status("[cyan]Calculating size per session...[/cyan]"):
+         for session in sessions:
+             session_id = session['id']
+             images = conn.execute("SELECT filename FROM images WHERE session_id = ? AND image_type = 'original'", (session_id,)).fetchall()
+
+             total_size = 0
+             file_count = 0
+             for img in images:
+                 if not img['filename']: continue
+                 try:
+                     fp = os.path.join(UPLOAD_FOLDER, img['filename'])
+                     if not os.path.islink(fp):
+                         total_size += os.path.getsize(fp)
+                         file_count += 1
+                 except FileNotFoundError:
+                     pass  # File may not exist, that's okay
+
+             if file_count > 0:
+                 session_sizes.append({
+                     "id": session_id,
+                     "name": session['original_filename'],
+                     "size": total_size,
+                     "count": file_count
+                 })
+
+     # Sort sessions by size, descending
+     session_sizes.sort(key=lambda x: x['size'], reverse=True)
+
+     breakdown_table = Table(show_header=True, header_style="bold magenta")
+     breakdown_table.add_column("Session ID", style="dim", min_width=15)
+     breakdown_table.add_column("Original Filename", style="cyan", min_width=30)
+     breakdown_table.add_column("File Count", style="magenta", justify="right")
+     breakdown_table.add_column("Total Size", style="green", justify="right")
+
+     for session_data in session_sizes:
+         breakdown_table.add_row(
+             session_data['id'],
+             session_data['name'],
+             str(session_data['count']),
+             sizeof_fmt(session_data['size'])
+         )
+
+     console.print(breakdown_table)
+     conn.close()
+
+
+
+ def main():
+     """Main function to identify and clean up old data."""
+     console = Console()
+     console.print("[bold cyan]Starting Cleanup Process...[/bold cyan]")
+     console.print(f"Mode: [bold {'yellow' if DRY_RUN else 'red'}]{'DRY RUN' if DRY_RUN else 'DELETION ENABLED'}[/]")
+     console.print(f"Looking for items older than {OLDER_THAN_DAYS} days.")
+
+     show_disk_usage_report(console)
+
+     conn = get_db_connection()
+     cutoff_date = datetime.now() - timedelta(days=OLDER_THAN_DAYS)
+
+     sessions_to_delete = []
+     pdfs_to_delete = []
+
+     # --- 1. Identify Sessions to Delete ---
+     all_sessions = conn.execute('SELECT id, created_at, original_filename, persist FROM sessions').fetchall()
+
+     with console.status("[cyan]Analyzing sessions...[/cyan]") as status:
+         for session in all_sessions:
+             session_id = session['id']
+             reason = ""
+
+             created_at = datetime.fromisoformat(session['created_at'])
+
+             if created_at > cutoff_date:
+                 reason = REASON_RECENT
+             elif session['persist'] == 1:
+                 reason = REASON_PERSISTED
+             elif session['original_filename'] and ('.json' in session['original_filename'].lower() or 'neetprep' in session['original_filename'].lower()):
+                 reason = REASON_NEETPREP
+             elif is_classified_session(conn, session_id):
+                 reason = REASON_CLASSIFIED
+
+             if not reason:
+                 sessions_to_delete.append(session)
+         status.update(f"[cyan]Analyzed {len(all_sessions)} sessions. Found {len(sessions_to_delete)} candidates for deletion.[/cyan]")
+
+     # --- 2. Identify Generated PDFs to Delete ---
+     all_pdfs = conn.execute('SELECT id, session_id, filename, created_at, persist, source_filename, notes FROM generated_pdfs').fetchall()
+
+     with console.status("[cyan]Analyzing generated PDFs...[/cyan]") as status:
+         for pdf in all_pdfs:
+             reason = ""
+
+             created_at = datetime.fromisoformat(pdf['created_at'])
+
+             if created_at > cutoff_date:
+                 reason = REASON_RECENT
+             elif pdf['persist'] == 1:
+                 reason = REASON_PERSISTED
+             elif pdf['source_filename'] and ('.json' in pdf['source_filename'].lower() or 'neetprep' in pdf['source_filename'].lower()):
+                 reason = REASON_NEETPREP
+             elif pdf['notes'] and 'json upload' in pdf['notes'].lower():
+                 reason = REASON_NEETPREP
+             elif is_classified_session(conn, pdf['session_id']):
+                 reason = REASON_CLASSIFIED
+
+             if not reason:
+                 pdfs_to_delete.append(pdf)
+         status.update(f"[cyan]Analyzed {len(all_pdfs)} PDFs. Found {len(pdfs_to_delete)} candidates for deletion.[/cyan]")
+
+     # --- 3. Display Findings ---
+     table = Table(title="Items Marked for Deletion", show_header=True, header_style="bold magenta")
+     table.add_column("Type", style="dim", min_width=10)
+     table.add_column("ID / Filename", style="cyan", min_width=30)
+     table.add_column("Created At", style="green", min_width=20)
+     table.add_column("Age (Days)", style="yellow", min_width=10)
+     table.add_column("Details", min_width=30)
+
+     if not sessions_to_delete and not pdfs_to_delete:
+         console.print("\n[bold green]No items found to delete. Everything is up to date.[/bold green]")
+         conn.close()
+         return
+
+     for session in sessions_to_delete:
+         age = (datetime.now() - datetime.fromisoformat(session['created_at'])).days
+         table.add_row("Session", session['id'], session['created_at'], str(age), session['original_filename'])
+
+     for pdf in pdfs_to_delete:
+         age = (datetime.now() - datetime.fromisoformat(pdf['created_at'])).days
+         table.add_row("Generated PDF", pdf['filename'], pdf['created_at'], str(age), f"Source: {pdf['source_filename']}")
+
+     console.print(table)
+
+     if DRY_RUN:
+         console.print("\n[bold yellow]This was a DRY RUN. No files or database records were deleted.[/bold yellow]")
+         console.print("To run the deletion, change the [code]DRY_RUN[/code] flag to [code]False[/code] in the script.")
+     else:
+         # --- 4. Perform Deletion ---
+         console.print("\n[bold red]PERFORMING DELETION...[/bold red]")
+
+         # Delete Sessions and associated files
+         for session in sessions_to_delete:
+             session_id = session['id']
+             console.print(f"Deleting session [cyan]{session_id}[/cyan]...")
+             images_to_delete = conn.execute('SELECT filename, processed_filename FROM images WHERE session_id = ?', (session_id,)).fetchall()
+             for img in images_to_delete:
+                 if img['filename']:
+                     f_path = os.path.join(UPLOAD_FOLDER, img['filename'])
+                     try:
+                         os.remove(f_path)
+                         console.print(f"  - Deleted upload: [dim]{f_path}[/dim]")
+                     except OSError as e:
+                         console.print(f"  - [red]Error deleting {f_path}: {e}[/red]")
+                 if img['processed_filename']:
+                     f_path = os.path.join(PROCESSED_FOLDER, img['processed_filename'])
+                     try:
+                         os.remove(f_path)
+                         console.print(f"  - Deleted processed: [dim]{f_path}[/dim]")
+                     except OSError as e:
+                         console.print(f"  - [red]Error deleting {f_path}: {e}[/red]")
+
+             conn.execute('DELETE FROM questions WHERE session_id = ?', (session_id,))
+             conn.execute('DELETE FROM images WHERE session_id = ?', (session_id,))
+             conn.execute('DELETE FROM sessions WHERE id = ?', (session_id,))
+             console.print(f"  - Deleted DB records for session {session_id}")
+
+         # Delete Generated PDFs and their files
+         for pdf in pdfs_to_delete:
+             pdf_id, pdf_filename = pdf['id'], pdf['filename']
+             console.print(f"Deleting generated PDF [cyan]{pdf_filename}[/cyan]...")
+             f_path = os.path.join(OUTPUT_FOLDER, pdf_filename)
+             try:
+                 os.remove(f_path)
+                 console.print(f"  - Deleted file: [dim]{f_path}[/dim]")
+             except OSError as e:
+                 console.print(f"  - [red]Error deleting {f_path}: {e}[/red]")
+
+             conn.execute('DELETE FROM generated_pdfs WHERE id = ?', (pdf_id,))
+             console.print(f"  - Deleted DB record for PDF {pdf_id}")
+
+         conn.commit()
+         console.print("\n[bold green]Deletion complete.[/bold green]")
+
+     conn.close()
+
+ if __name__ == "__main__":
+     main()
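The age gate in `main()` treats anything newer than `OLDER_THAN_DAYS` as immune (`REASON_RECENT`); only items past the cutoff that also clear the persist/NeetPrep/classified checks become deletion candidates. The date arithmetic, pulled out into a testable predicate (a sketch, not code from this commit):

```python
from datetime import datetime, timedelta

OLDER_THAN_DAYS = 5

# Isolated sketch of cleanup.py's age test: an item is old enough to be a
# deletion candidate only when its created_at timestamp is at or before
# the cutoff. cleanup.py expresses the same test as `created_at > cutoff`
# setting REASON_RECENT; this is the inverse.
def is_old_enough(created_at_iso, now=None, days=OLDER_THAN_DAYS):
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    return datetime.fromisoformat(created_at_iso) <= cutoff

now = datetime(2024, 1, 10, 12, 0, 0)
print(is_old_enough("2024-01-01 09:30:00", now=now))  # True  (9 days old)
print(is_old_enough("2024-01-08 09:30:00", now=now))  # False (2 days old)
```

Note `datetime.fromisoformat` accepts SQLite's default `CURRENT_TIMESTAMP` format (`YYYY-MM-DD HH:MM:SS`), which is why cleanup.py can parse `created_at` directly.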
cli.py ADDED
@@ -0,0 +1,354 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ import os
+ import re
+ import shutil
+ import sqlite3
+ import sys
+ import uuid
+ from datetime import datetime, timedelta
+
+ import click
+ import fitz  # PyMuPDF
+ import requests
+ from rich.progress import (
+     BarColumn,
+     Progress,
+     SpinnerColumn,
+     TextColumn,
+     TimeElapsedColumn,
+     TimeRemainingColumn,
+ )
+ from werkzeug.utils import secure_filename
+
+ # --- Configuration ---
+ from utils import get_db_connection
+
+ SCRIPT_DIR = os.path.dirname(os.path.realpath(__file__))
+ UPLOAD_FOLDER = os.path.join(SCRIPT_DIR, 'uploads')
+ PROCESSED_FOLDER = os.path.join(SCRIPT_DIR, 'processed')
+ OUTPUT_FOLDER = os.path.join(SCRIPT_DIR, 'output')
+
+ os.makedirs(UPLOAD_FOLDER, exist_ok=True)
+ os.makedirs(PROCESSED_FOLDER, exist_ok=True)
+ os.makedirs(OUTPUT_FOLDER, exist_ok=True)
+
+
+ # --- Core Logic Functions (mirrored from app.py) ---
+ def setup_database_cli():
+     """Initializes the database and creates/updates tables as needed."""
+     conn = get_db_connection()
+     cursor = conn.cursor()
+     click.echo("Creating/updating tables...")
+
+     cursor.execute("CREATE TABLE IF NOT EXISTS sessions (id TEXT PRIMARY KEY, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, original_filename TEXT, persist INTEGER DEFAULT 0, subject TEXT, tags TEXT, notes TEXT);")
+     cursor.execute("CREATE TABLE IF NOT EXISTS images (id INTEGER PRIMARY KEY AUTOINCREMENT, session_id TEXT NOT NULL, image_index INTEGER NOT NULL, filename TEXT NOT NULL, original_name TEXT NOT NULL, processed_filename TEXT, image_type TEXT DEFAULT 'original', FOREIGN KEY (session_id) REFERENCES sessions (id));")
+     cursor.execute("CREATE TABLE IF NOT EXISTS questions (id INTEGER PRIMARY KEY AUTOINCREMENT, session_id TEXT NOT NULL, image_id INTEGER NOT NULL, question_number TEXT, subject TEXT, status TEXT, marked_solution TEXT, actual_solution TEXT, time_taken TEXT, tags TEXT, FOREIGN KEY (session_id) REFERENCES sessions (id), FOREIGN KEY (image_id) REFERENCES images (id));")
+     cursor.execute("CREATE TABLE IF NOT EXISTS folders (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL, parent_id INTEGER, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, FOREIGN KEY (parent_id) REFERENCES folders (id) ON DELETE CASCADE);")
+     cursor.execute("CREATE TABLE IF NOT EXISTS generated_pdfs (id INTEGER PRIMARY KEY AUTOINCREMENT, session_id TEXT NOT NULL, filename TEXT NOT NULL, subject TEXT NOT NULL, tags TEXT, notes TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, source_filename TEXT, folder_id INTEGER, persist INTEGER DEFAULT 0, FOREIGN KEY (session_id) REFERENCES sessions (id), FOREIGN KEY (folder_id) REFERENCES folders (id) ON DELETE SET NULL);")
+     cursor.execute("CREATE TABLE IF NOT EXISTS neetprep_questions (id TEXT PRIMARY KEY, question_text TEXT, options TEXT, correct_answer_index INTEGER, level TEXT, topic TEXT, subject TEXT, last_fetched_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP);")
+     cursor.execute("CREATE TABLE IF NOT EXISTS neetprep_processed_attempts (attempt_id TEXT PRIMARY KEY, processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP);")
+
+     # Add columns to pre-existing tables if they don't exist yet
+     try:
+         cursor.execute("SELECT subject FROM sessions LIMIT 1")
+     except sqlite3.OperationalError:
+         cursor.execute("ALTER TABLE sessions ADD COLUMN subject TEXT")
+     try:
+         cursor.execute("SELECT tags FROM sessions LIMIT 1")
+     except sqlite3.OperationalError:
+         cursor.execute("ALTER TABLE sessions ADD COLUMN tags TEXT")
+     try:
+         cursor.execute("SELECT notes FROM sessions LIMIT 1")
+     except sqlite3.OperationalError:
+         cursor.execute("ALTER TABLE sessions ADD COLUMN notes TEXT")
+
+     try:
+         cursor.execute("SELECT tags FROM questions LIMIT 1")
+     except sqlite3.OperationalError:
+         cursor.execute("ALTER TABLE questions ADD COLUMN tags TEXT")
+
+     click.echo("Tables created successfully.")
+     conn.commit()
+     conn.close()
+
+ def cleanup_old_data_cli():
+     """Removes sessions, files, and PDFs older than 1 day, unless persisted."""
+     conn = get_db_connection()
+     cutoff = datetime.now() - timedelta(days=1)
+     click.echo(f"Starting cleanup for items older than {cutoff.strftime('%Y-%m-%d %H:%M:%S')}:")
+
+     old_sessions = conn.execute('SELECT id FROM sessions WHERE created_at < ? AND persist = 0', (cutoff,)).fetchall()
+     click.echo(f"Found {len(old_sessions)} old, non-persisted sessions to delete.")
+     for session in old_sessions:
+         session_id = session['id']
+         images_to_delete = conn.execute('SELECT filename, processed_filename FROM images WHERE session_id = ?', (session_id,)).fetchall()
+         for img in images_to_delete:
+             if img['filename']:
+                 upload_path = os.path.join(UPLOAD_FOLDER, img['filename'])
+                 if os.path.exists(upload_path):
+                     os.remove(upload_path)
+             if img['processed_filename']:
+                 processed_path = os.path.join(PROCESSED_FOLDER, img['processed_filename'])
+                 if os.path.exists(processed_path):
+                     os.remove(processed_path)
+         conn.execute('DELETE FROM questions WHERE session_id = ?', (session_id,))
+         conn.execute('DELETE FROM images WHERE session_id = ?', (session_id,))
+         conn.execute('DELETE FROM sessions WHERE id = ?', (session_id,))
+
+     old_pdfs = conn.execute('SELECT id, filename FROM generated_pdfs WHERE created_at < ? AND persist = 0', (cutoff,)).fetchall()
+     click.echo(f"Found {len(old_pdfs)} old, non-persisted generated PDFs to delete.")
+     for pdf in old_pdfs:
+         pdf_path = os.path.join(OUTPUT_FOLDER, pdf['filename'])
+         if os.path.exists(pdf_path):
+             os.remove(pdf_path)
+         conn.execute('DELETE FROM generated_pdfs WHERE id = ?', (pdf['id'],))
+
+     conn.commit()
+     conn.close()
+
+ def _get_local_pdf_path(path_or_url):
+     """
+     Takes a path or URL. If it's a URL, downloads it to the UPLOAD_FOLDER.
+     Returns (local_path, original_filename, is_temp_file).
+     """
+     is_url = path_or_url.lower().startswith(('http://', 'https://'))
+     if is_url:
+         click.echo(f"Downloading from URL: {path_or_url}")
+         try:
+             if "drive.google.com" in path_or_url:
+                 file_id = path_or_url.split('/')[-2]
+                 download_url = f'https://drive.google.com/uc?export=download&id={file_id}'
+                 response = requests.get(download_url, stream=True)
+                 response.raise_for_status()
+                 content_disposition = response.headers.get('content-disposition')
+                 if content_disposition:
+                     filenames = re.findall('filename="(.+)"', content_disposition)
+                     original_name = secure_filename(filenames[0]) if filenames else f"{uuid.uuid4()}.pdf"
+                 else:
+                     original_name = f"{uuid.uuid4()}.pdf"
+             elif path_or_url.lower().endswith('.pdf'):
+                 response = requests.get(path_or_url, stream=True)
+                 response.raise_for_status()
+                 original_name = secure_filename(path_or_url.split('/')[-1]) or f"{uuid.uuid4()}.pdf"
+             else:
+                 raise ValueError("URL is not a recognized Google Drive or direct .pdf link.")
+
+             local_path = os.path.join(UPLOAD_FOLDER, f"temp_{original_name}")
+             with open(local_path, 'wb') as f:
+                 for chunk in response.iter_content(chunk_size=8192):
+                     f.write(chunk)
+
+             return local_path, original_name, True
+         except Exception as e:
+             click.secho(f"Error downloading file: {e}", fg="red", err=True)
+             return None, None, False
+     else:
+         if not os.path.exists(path_or_url):
+             click.secho(f"Error: File not found at {path_or_url}", fg="red", err=True)
+             return None, None, False
+         return path_or_url, secure_filename(os.path.basename(path_or_url)), False
+
+ # --- CLI Group ---
+ @click.group()
+ def cli():
+     """A CLI for managing the Report Generator application."""
+     pass
+
+ # --- CLI Commands ---
+ @cli.command()
+ def db_init():
+     """Initializes or updates the database schema."""
+     click.secho("Initializing database schema...", fg="yellow")
+     setup_database_cli()
+     click.secho("Database schema is up to date.", fg="green")
+
+ @cli.command()
+ def db_cleanup():
+     """Cleans up old, non-persisted data."""
+     click.secho("Starting cleanup of old data...", fg="yellow")
+     cleanup_old_data_cli()
+     click.secho("Cleanup finished.", fg="green")
+
+ @cli.command('add-question')
+ @click.option('--session-id', required=True, type=click.STRING)
+ @click.option('--image-path', required=True, type=click.Path(exists=True))
+ @click.option('--q-num', type=click.STRING)
+ @click.option('--status', type=click.Choice(['Correct', 'Wrong', 'Unattempted']))
+ @click.option('--marked-ans', type=click.STRING)
+ @click.option('--correct-ans', type=click.STRING)
+ @click.option('--subject', type=click.STRING)
+ @click.option('--time', type=click.STRING)
+ def add_question(session_id, image_path, q_num, status, marked_ans, correct_ans, subject, time):
+     """Adds a single question with metadata to the database."""
+     setup_database_cli()  # Ensure database tables exist
+     conn = None
+     try:
+         conn = get_db_connection()
+         cursor = conn.cursor()
+
+         # 1. Copy image to processed folder
+         original_filename = secure_filename(os.path.basename(image_path))
+         processed_filename = f"processed_{session_id}_{str(uuid.uuid4())[:8]}_{original_filename}"
+         processed_path = os.path.join(PROCESSED_FOLDER, processed_filename)
+         shutil.copy(image_path, processed_path)
+
+         # 2. Create a new image record.
+         # Find the next available image_index for the session.
+         cursor.execute("SELECT MAX(image_index) FROM images WHERE session_id = ?", (session_id,))
+         max_index = cursor.fetchone()[0]
+         new_index = (max_index if max_index is not None else -1) + 1
+
+         cursor.execute(
+             'INSERT INTO images (session_id, image_index, filename, original_name, processed_filename, image_type) VALUES (?, ?, ?, ?, ?, ?)',
+             (session_id, new_index, original_filename, original_filename, processed_filename, 'cropped')
+         )
+         image_id = cursor.lastrowid
+
+         # 3. Create a new question record
+         cursor.execute(
+             'INSERT INTO questions (session_id, image_id, question_number, status, marked_solution, actual_solution, subject, time_taken) VALUES (?, ?, ?, ?, ?, ?, ?, ?)',
+             (session_id, image_id, q_num, status, marked_ans, correct_ans, subject, time)
+         )
+
+         conn.commit()
+         click.secho(f"Successfully added question {q_num} (Image ID: {image_id}) to session {session_id}.", fg="green")
+
+     except Exception as e:
+         click.secho(f"Error adding question: {e}", fg="red", err=True)
+         raise click.Abort()
+     finally:
+         if conn:
+             conn.close()
+
+ @cli.command('upload')
+ @click.argument('pdf_paths', type=click.STRING)
+ @click.option('--simple-progress', is_flag=True, help='Print simple percentage progress to stdout.')
+ @click.option('--final', is_flag=True, help='Mark the PDF as a final version and add to generated_pdfs table.')
+ @click.option('--subject', type=click.STRING, help='Subject for the final PDF.')
+ @click.option('--tags', type=click.STRING, help='Tags for the final PDF (comma-separated).')
+ @click.option('--notes', type=click.STRING, help='Notes for the final PDF.')
+ @click.option('--log', is_flag=True, help='Log all output to cli.log.')
+ def upload(pdf_paths, simple_progress, final, subject, tags, notes, log):
+     """
+     A CLI tool to upload a large PDF directly to the application's database.
+
+     PDF_PATHS: a comma-separated list of full paths to the PDF files to upload, or Google Drive URLs.
+     """
+     setup_database_cli()  # Ensure database tables exist
+     if log:
+         try:
+             log_f = open('cli.log', 'a')
+             sys.stdout = log_f
+             sys.stderr = log_f
+         except Exception as e:
+             click.secho(f"Error opening log file: {e}", fg="red", err=True)
+             raise click.Abort()
+
+     click.echo(f"--- Log entry: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')} ---")
+     click.echo(f"Arguments: pdf_paths={pdf_paths}, simple_progress={simple_progress}, final={final}, subject={subject}, tags={tags}, notes={notes}, log={log}")
+     click.echo("---" * 20)
+
+     files_to_process = [p.strip() for p in pdf_paths.split(',')]
+
+     for pdf_path_or_url in files_to_process:
+         click.secho(f"--- Processing: {click.style(pdf_path_or_url, bold=True)} ---", fg="yellow")
+
+         local_pdf_path, original_filename, is_temp = _get_local_pdf_path(pdf_path_or_url)
+
+         if not local_pdf_path:
+             continue
+
+         try:
+             if final:
+                 if not subject:
+                     click.secho("Error: --subject is required when using --final.", fg="red", err=True)
+                     raise click.Abort()
+
+                 session_id = str(uuid.uuid4())
+                 conn = get_db_connection()
+                 cursor = conn.cursor()
+                 cursor.execute('INSERT INTO sessions (id, original_filename) VALUES (?, ?)',
+                                (session_id, original_filename))
+
+                 output_filename = original_filename
+                 output_path = os.path.join(OUTPUT_FOLDER, output_filename)
+
+                 if os.path.exists(output_path):
+                     timestamp = datetime.now().strftime('%Y%m%d%H%M%S')
+                     output_filename = f"{timestamp}_{original_filename}"
+                     output_path = os.path.join(OUTPUT_FOLDER, output_filename)
+                     click.secho(f"Warning: File '{original_filename}' already exists. Saving as '{output_filename}'.", fg="yellow")
+
+                 shutil.copy(local_pdf_path, output_path)
+
+                 cursor.execute(
+                     'INSERT INTO generated_pdfs (session_id, filename, subject, tags, notes, source_filename) VALUES (?, ?, ?, ?, ?, ?)',
+                     (session_id, output_filename, subject, tags, notes, original_filename)
+                 )
+                 conn.commit()
+                 conn.close()
+                 click.secho(f"Successfully added final PDF '{original_filename}' to the database.", fg="green")
+
+             else:  # Standard page-extraction mode
+                 click.echo(f"Processing PDF: {click.style(original_filename, bold=True)}")
+                 session_id = str(uuid.uuid4())
+                 doc = fitz.open(local_pdf_path)
+                 num_pages = len(doc)
+                 if num_pages == 0:
+                     click.secho("Warning: This PDF has 0 pages. Nothing to process.", fg="yellow")
+                     doc.close()
+                     continue
+
+                 click.echo(f"PDF contains {num_pages} pages to process.")
+                 conn = get_db_connection()
+                 cursor = conn.cursor()
+                 cursor.execute('INSERT INTO sessions (id, original_filename) VALUES (?, ?)',
+                                (session_id, original_filename))
+                 click.echo(f"Created session: {click.style(session_id, fg='cyan')}")
+
+                 images_to_insert = []
+
+                 if simple_progress:
+                     for i, page in enumerate(doc):
+                         pix = page.get_pixmap(dpi=150)
+                         page_filename = f"{session_id}_page_{i}.png"
+                         page_path = os.path.join(UPLOAD_FOLDER, page_filename)
+                         pix.save(page_path)
+                         images_to_insert.append(
+                             (session_id, i, page_filename, f"Page {i + 1}", 'original')
+                         )
+                         percentage = int(((i + 1) / num_pages) * 100)
+                         sys.stdout.write(f"{percentage}\n")
+                         sys.stdout.flush()
+                 else:
+                     progress = Progress(
+                         SpinnerColumn(),
+                         TextColumn("[progress.description]{task.description}"),
+                         BarColumn(bar_width=None),
+                         TextColumn("[progress.percentage]{task.percentage:>3.0f}%"),
+                         TextColumn("• Page {task.completed}/{task.total}"),
+                         TextColumn("• Elapsed:"), TimeElapsedColumn(),
+                         TextColumn("• Remaining:"), TimeRemainingColumn(),
+                     )
+                     with progress:
+                         task = progress.add_task("[green]Extracting pages...", total=num_pages)
+                         for i, page in enumerate(doc):
+                             pix = page.get_pixmap(dpi=150)
+                             page_filename = f"{session_id}_page_{i}.png"
+                             page_path = os.path.join(UPLOAD_FOLDER, page_filename)
+                             pix.save(page_path)
+                             images_to_insert.append(
+                                 (session_id, i, page_filename, f"Page {i + 1}", 'original')
+                             )
+                             progress.update(task, advance=1)
+
+                 click.echo("\nInserting image records into the database...")
+                 cursor.executemany(
+                     'INSERT INTO images (session_id, image_index, filename, original_name, image_type) VALUES (?, ?, ?, ?, ?)',
+                     images_to_insert
+                 )
+                 conn.commit()
+                 conn.close()
+                 click.secho(f"Successfully committed {len(images_to_insert)} records to the database.", fg="green")
+                 doc.close()
+
+         except Exception as e:
+             click.secho(f"An unexpected error occurred while processing {original_filename}: {e}", fg="red", err=True)
+
+         finally:
+             if is_temp and os.path.exists(local_pdf_path):
+                 os.remove(local_pdf_path)
+
+         click.secho(f"\n✅ All done! Upload complete for '{original_filename}'.", fg="green", bold=True)
+
+ if __name__ == '__main__':
+     cli()
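
The `try`/`except sqlite3.OperationalError` fallback that `setup_database_cli` uses to add missing columns can be exercised in isolation. This is a minimal sketch against an in-memory database; the `ensure_column` helper is illustrative, not part of the application:

```python
import sqlite3

def ensure_column(conn, table, column, decl):
    """Add `column` to `table` only if a probe SELECT shows it is missing."""
    try:
        conn.execute(f"SELECT {column} FROM {table} LIMIT 1")
    except sqlite3.OperationalError:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY)")
ensure_column(conn, "sessions", "subject", "TEXT")
ensure_column(conn, "sessions", "subject", "TEXT")  # second call is a no-op
columns = [row[1] for row in conn.execute("PRAGMA table_info(sessions)")]
# columns is now ['id', 'subject']
```

Because the probe SELECT fails only when the column is absent, the migration is idempotent and safe to run on every startup, which is why the CLI commands call `setup_database_cli()` first.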
clients.json ADDED
@@ -0,0 +1,49 @@
+ {
+   "wooderin": {
+     "name": "charges-ready",
+     "client_id": "wooderin",
+     "sessions": {
+       "3_11_2025__10_18_18_pm.wooderin": {
+         "tasks": {
+           "338db4f6-bf6a-4171-9856-da4f64c00019": {
+             "task_id": "338db4f6-bf6a-4171-9856-da4f64c00019",
+             "progress": {
+               "audio": {
+                 "type": "audio",
+                 "total": 2132,
+                 "current": 2132,
+                 "percentage": 100.0,
+                 "segment_num": 2101,
+                 "success": true,
+                 "failed_segments": [],
+                 "timestamp": "2025-11-03 16:51:00"
+               },
+               "video": {
+                 "type": "video",
+                 "total": 2131,
+                 "current": 2131,
+                 "percentage": 100.0,
+                 "segment_num": 2125,
+                 "success": true,
+                 "failed_segments": [],
+                 "timestamp": "2025-11-03 16:51:49"
+               }
+             },
+             "status": "completed",
+             "name": "Oscillations_01__Feel_of_SHM__Rescheduled_at_0830_AM__NO_DPP_",
+             "out_dir": "/data/data/com.termux/files/home/pwdlv3/webdl",
+             "id": "68b7ea1c18f010ab787a2fc9",
+             "batch_name": "678b4cf5a3a368218a2b16e7",
+             "topic_name": null,
+             "lecture_url": null,
+             "client_id": "wooderin",
+             "session_id": "3_11_2025__10_18_18_pm.wooderin",
+             "url": "/get-file/338db4f6-bf6a-4171-9856-da4f64c00019/Oscillations_01__Feel_of_SHM__Rescheduled_at_0830_AM__NO_DPP_"
+           }
+         },
+         "name": "invitations-cancel",
+         "timestamp": "2025-11-03 22:18:19"
+       }
+     }
+   }
+ }
color_rm.md ADDED
@@ -0,0 +1,53 @@
+ # Color Removal Tool (`color_rm`)
+
+ This document provides an overview of the Color Removal Tool, explaining its functionality, interface, and workflow.
+
+ ## Overview
+
+ The Color Removal Tool is a feature designed to process images by selectively removing colors. The core idea is to keep specific colors in an image while removing all others. This is particularly useful for cleaning up scanned documents, highlighting specific content, or creating stylized images.
+
+ The tool operates on the principle of "colors to keep": you select the colors you want to preserve, and the tool removes everything else.
+
+ ## Workflow
+
+ The process of using the Color Removal Tool is as follows:
+
+ 1. **Upload:** You start by uploading a PDF or a set of images. If a PDF is uploaded, it is automatically split into individual pages, which are treated as images.
+
+ 2. **Interface:** You are then taken to the color removal interface. Here, you can navigate through the pages of your document one by one.
+
+ 3. **Color Selection:**
+    * **From Image:** Click directly on the image to select a color you want to keep. The selected color will appear as a swatch in the "Selected Colors" list.
+    * **Manual Picker:** Use the color picker to choose a color manually and add it to the list.
+
+ 4. **Adjust Settings:**
+    * **Match Strictness:** This slider controls how closely a color in the image must match a selected color to be kept. A stricter setting means only very similar colors will be kept; a looser setting keeps a wider range of shades.
+    * **Background Fill:** You can choose what the removed parts of the image are filled with: black, white, or transparent.
+    * **Region Selection:** You can choose to apply the color removal effect only to a specific rectangular region of the image.
+
+ 5. **Preview:** Click the "Preview" button to see the result of your selections and settings applied to the current image.
+
+ 6. **Save & Continue:**
+    * **Save:** Saves the processed image for the current page and automatically moves to the next page.
+    * **Apply to All Pages:** Processes all pages in the session using the current settings (colors, strictness, etc.). This is a batch operation and may take some time.
+
+ 7. **Generate PDF:** Once you have processed the desired pages, you can generate a PDF of the results. You can choose to include all pages or a specific range of pages.
+
+ ## Processing Logic
+
+ The color matching is not based on simple RGB values. Instead, it uses the **CIELAB color space**. This color space is designed to approximate human vision, so the "distance" between two colors in it is more perceptually uniform.
+
+ When you select a color, the tool calculates its LAB value. Then, for each pixel in the image, it calculates the difference (Delta E) between the pixel's color and the selected colors. If the difference is below the "Match Strictness" threshold for any of the selected colors, the pixel is kept. Otherwise, it is removed.
+
+ ## Keyboard Shortcuts
+
+ The following keyboard shortcuts are available in the color removal interface for a faster workflow:
+
+ * `ArrowRight`: Go to the next page.
+ * `ArrowLeft`: Go to the previous page.
+ * `Enter`: Save the current page and go to the next one.
+ * `p` or `P`: Toggle the preview.
+ * `r` or `R`: Reset the current image to its original state (you will be asked to confirm).
+ * `Delete` or `Backspace`: Remove the last selected color.
+
+ **Note:** Shortcuts are disabled when you are typing in an input field (e.g., the page number input).
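
The Delta E test described in the Processing Logic section can be sketched as follows, using the simple CIE76 formula (plain Euclidean distance in Lab space); the tool may use a different Delta E variant, and `keep_mask` is an illustrative helper, not the application's actual API:

```python
import numpy as np

def keep_mask(lab_image, lab_keep_colors, threshold):
    """Return a boolean mask of pixels whose CIE76 Delta E to any
    selected color is below `threshold`.

    lab_image:       (H, W, 3) array of pixels in CIELAB.
    lab_keep_colors: (N, 3) array of selected colors in CIELAB.
    """
    colors = np.asarray(lab_keep_colors, dtype=float)
    # Pairwise differences, shape (H, W, N, 3)
    diffs = lab_image[:, :, None, :] - colors[None, None, :, :]
    # CIE76 Delta E is the Euclidean distance in Lab, shape (H, W, N)
    delta_e = np.sqrt((diffs ** 2).sum(axis=-1))
    # A pixel is kept if it is close enough to ANY selected color
    return (delta_e < threshold).any(axis=-1)

# Example: a 1x2 Lab image; the first pixel is near-white, the second is not.
img = np.array([[[100.0, 0.0, 0.0], [50.0, 40.0, 30.0]]])
mask = keep_mask(img, [[99.0, 1.0, 0.0]], threshold=10)
# mask[0, 0] is True (Delta E ≈ 1.4), mask[0, 1] is False (Delta E ≈ 69)
```

Pixels where the mask is False would then be replaced with the chosen background fill (black, white, or transparent).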
config.py ADDED
@@ -0,0 +1,18 @@
+ import os
+
+ class Config:
+     MAX_CONTENT_LENGTH = 16 * 1024 * 1024 * 4096  # 64 GiB request size cap
+     UPLOAD_FOLDER = 'uploads'
+     PROCESSED_FOLDER = 'processed'
+     OUTPUT_FOLDER = 'output'
+     DATABASE = 'database.db'
+     NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY")
+     NIM_API_URL = "https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-ocr-v1"
+     NIM_HEADERS = {
+         "Authorization": f"Bearer {NVIDIA_API_KEY}",
+         "Accept": "application/json",
+         "Content-Type": "application/json",
+     }
+     MODEL_MAX_WIDTH = 500
+     MODEL_MAX_HEIGHT = 500
+     NVIDIA_NIM_AVAILABLE = bool(NVIDIA_API_KEY)
dash ADDED
@@ -0,0 +1,724 @@
+ {% extends "base.html" %}
+
+ {% block title %}Subjective Questions Manager{% endblock %}
+
+ {% block head %}
+ <script src="https://cdn.jsdelivr.net/npm/@editorjs/editorjs@latest"></script>
+ <script src="https://cdn.jsdelivr.net/npm/@editorjs/header@latest"></script>
+ <script src="https://cdn.jsdelivr.net/npm/@editorjs/list@latest"></script>
+ <script src="https://cdn.jsdelivr.net/npm/@editorjs/raw@latest"></script>
+ <script src="https://cdn.jsdelivr.net/npm/@editorjs/simple-image@latest"></script>
+ <style>
+     .codex-editor__redactor { padding-bottom: 50px !important; }
+     .ce-block__content { max-width: 100%; }
+
+     /* Dark Mode Editor Fixes */
+     #editorjs {
+         color: #e0e0e0; /* Light gray text for better readability than pure white */
+     }
+     #editorjs ::selection {
+         background-color: #0d6efd; /* Bootstrap Primary Blue */
+         color: white;
+     }
+     /* EditorJS toolbars are light-themed by default; keep their text dark
+        so icons and menus stay readable on their white backgrounds. */
+     .ce-toolbar__content, .ce-block__content, .ce-inline-toolbar {
+         color: black;
+     }
+     /* EditorJS has no native dark mode, so with the bg-dark container we
+        force the text inside blocks to stay light. */
+     .ce-block__content, .ce-header {
+         color: #e0e0e0 !important;
+     }
+
+     /* Selection fix for the dark container */
+     .codex-editor ::selection {
+         background-color: rgba(13, 110, 253, 0.5);
+         color: #ffffff;
+     }
+ </style>
+ {% endblock %}
+
+ {% block content %}
+ <div class="container mt-4">
+
+     <!-- Toolbar -->
+     <div class="d-flex justify-content-between align-items-center mb-3">
+         <div>
+             <h2 class="mb-0">Subjective Questions</h2>
+             <nav aria-label="breadcrumb">
+                 <ol class="breadcrumb mb-0">
+                     <li class="breadcrumb-item"><a href="{{ url_for('subjective.list_questions') }}">Home</a></li>
+                     {% for crumb in breadcrumbs %}
+                     <li class="breadcrumb-item"><a href="{{ url_for('subjective.list_questions', folder_path=crumb.path) }}">{{ crumb.name }}</a></li>
+                     {% endfor %}
+                 </ol>
+             </nav>
+         </div>
+         <div class="btn-group">
+             <button type="button" class="btn btn-success" onclick="openAddQuestionModal()">
+                 <i class="bi bi-plus-circle"></i> Add Question
+             </button>
+             <button type="button" class="btn btn-primary" data-bs-toggle="modal" data-bs-target="#createFolderModal">
+                 <i class="bi bi-folder-plus"></i> New Folder
+             </button>
+             <button type="button" class="btn btn-warning text-dark" id="moveBtn" disabled data-bs-toggle="modal" data-bs-target="#moveModal">
+                 <i class="bi bi-arrow-right-square"></i> Move Selected
+             </button>
+             <button type="button" class="btn btn-outline-secondary" data-bs-toggle="modal" data-bs-target="#reorderTopicsModal">
+                 <i class="bi bi-sort-down"></i> Reorder
+             </button>
+             <a href="{{ url_for('subjective.print_folder', folder_id=current_folder_id if current_folder_id else 'root') }}" target="_blank" class="btn btn-secondary">
+                 <i class="bi bi-printer"></i> Print PDF
+             </a>
+             <a href="{{ url_for('subjective.generator') }}" class="btn btn-info">
+                 <i class="bi bi-magic"></i> Generator
+             </a>
+         </div>
+     </div>
+
+     <!-- Flash Messages -->
+     {% with messages = get_flashed_messages(with_categories=true) %}
+         {% if messages %}
+             {% for category, message in messages %}
+             <div class="alert alert-{{ category }} alert-dismissible fade show" role="alert">
+                 {{ message }}
+                 <button type="button" class="btn-close" data-bs-dismiss="alert" aria-label="Close"></button>
+             </div>
+             {% endfor %}
+         {% endif %}
+     {% endwith %}
+
+     <!-- Folders Section -->
+     {% if subfolders %}
+     <div class="row row-cols-2 row-cols-md-4 row-cols-lg-6 g-3 mb-4">
+         {% for folder in subfolders %}
+         <div class="col">
+             <div class="card h-100 shadow-sm text-center folder-card position-relative bg-dark text-white border-secondary">
+                 <!-- Selection Checkbox -->
+                 <div class="position-absolute top-0 end-0 p-2">
+                     <input type="checkbox" class="form-check-input folder-checkbox" value="{{ folder.id }}">
+                 </div>
+
+                 <a href="{{ url_for('subjective.list_questions', folder_path=(breadcrumbs|map(attribute='name')|join('/') + '/' + folder.name if breadcrumbs else folder.name)) }}" class="text-decoration-none text-white d-flex flex-column align-items-center justify-content-center h-100 pt-4">
+                     <div class="card-body w-100">
+                         <i class="bi bi-folder-fill text-primary display-4"></i>
+                         <h6 class="card-title mt-2" style="white-space: normal; overflow-wrap: break-word;">{{ folder.name }}</h6>
+                     </div>
+                 </a>
+             </div>
+         </div>
+         {% endfor %}
+     </div>
+     {% endif %}
+
+     <!-- Questions Section -->
+     {% if grouped_questions %}
+         {% for topic, questions_list in grouped_questions.items() %}
+         <div class="mb-5 topic-group">
+             <div class="topic-header p-2 rounded text-white mb-2 shadow-sm d-flex justify-content-between align-items-center">
+                 <h5 class="mb-0 fw-bold"><i class="bi bi-bookmarks me-2"></i>{{ topic }}</h5>
+                 <div class="d-flex align-items-center">
+                     <button class="btn btn-sm btn-outline-light me-2 border-0" title="Add Question to this Topic" onclick="addQuestionToTopic('{{ topic }}', this)">
+                         <i class="bi bi-plus-circle"></i>
+                     </button>
+                     <button class="btn btn-sm btn-outline-light me-2 border-0" title="Rename Topic" onclick="renameTopic('{{ topic }}')">
+                         <i class="bi bi-pencil"></i>
+                     </button>
+                     <button class="btn btn-sm btn-outline-light me-2 border-0" title="Delete Topic" onclick="deleteTopic('{{ topic }}')">
+                         <i class="bi bi-trash"></i>
+                     </button>
+                     <span class="badge bg-light text-dark me-3">{{ questions_list|length }} Questions</span>
+                     <div class="form-check">
+                         <input class="form-check-input group-select-all" type="checkbox" title="Select all in this topic">
+                     </div>
+                 </div>
+             </div>
+             <div class="card shadow-sm border-0">
+                 <div class="table-responsive">
+                     <table class="table table-hover mb-0 align-middle">
+                         <thead class="table-light">
+                             <tr>
+                                 <th style="width: 40px;"></th>
+                                 <th style="width: 60px;">#</th>
+                                 <th>Question</th>
+                                 <th style="width: 150px;" class="text-end">Actions</th>
+                             </tr>
+                         </thead>
+                         <tbody>
+                             {% for q in questions_list %}
+                             <tr>
+                                 <td><input type="checkbox" class="question-checkbox" value="{{ q.id }}"></td>
+                                 <td><span class="badge number-badge">{{ q.question_number_within_topic }}</span></td>
+                                 <td><div class="text-break" style="max-width: 600px;">{{ q.question_html | safe }}</div></td>
+                                 <td class="text-end">
+                                     <button class="btn btn-sm btn-outline-primary border-0" onclick='editQuestion("{{ q.id }}", "{{ q.question_topic }}", "{{ q.question_number_within_topic }}", `{{ q.question_html|safe }}`, `{{ q.question_json if q.question_json else "null" }}`)'>
+                                         <i class="bi bi-pencil-square"></i>
+                                     </button>
+                                     <button class="btn btn-sm btn-outline-danger border-0" onclick="deleteQuestion('{{ q.id }}')">
+                                         <i class="bi bi-trash"></i>
+                                     </button>
+                                 </td>
+                             </tr>
+                             {% endfor %}
+                         </tbody>
+                     </table>
+                 </div>
+             </div>
+         </div>
+         {% endfor %}
+     {% elif not subfolders %}
+         <div class="alert alert-info text-center p-5">
+             <h4>No Content</h4>
+             <p>This folder is empty. Generate new questions or create subfolders.</p>
+             <a href="{{ url_for('subjective.generator') }}" class="btn btn-primary">Generate Questions</a>
+         </div>
+     {% endif %}
+ </div>
+
+ <!-- Create Folder Modal -->
+ <div class="modal fade" id="createFolderModal" tabindex="-1">
+     <div class="modal-dialog">
+         <div class="modal-content">
+             <div class="modal-header">
+                 <h5 class="modal-title">Create New Folder</h5>
+                 <button type="button" class="btn-close" data-bs-dismiss="modal"></button>
+             </div>
+             <div class="modal-body">
+                 <form id="createFolderForm">
+                     <div class="mb-3">
+                         <label for="folderName" class="form-label">Folder Name</label>
+                         <input type="text" class="form-control" id="folderName" required>
+                     </div>
+                 </form>
+             </div>
+             <div class="modal-footer">
+                 <button type="button" class="btn btn-secondary" data-bs-dismiss="modal">Cancel</button>
+                 <button type="button" class="btn btn-primary" onclick="createFolder()">Create</button>
+             </div>
+         </div>
+     </div>
+ </div>
+
+ <!-- Move Items Modal -->
+ <div class="modal fade" id="moveModal" tabindex="-1">
+     <div class="modal-dialog modal-lg">
+         <div class="modal-content bg-dark text-white">
+             <div class="modal-header">
+                 <h5 class="modal-title">Move Selected Items</h5>
+                 <button type="button" class="btn-close" data-bs-dismiss="modal"></button>
+             </div>
+             <div class="modal-body">
+                 <p>Select destination folder:</p>
+                 <div id="folder-tree-move" class="list-group" style="max-height: 300px; overflow-y: auto;"></div>
+                 <hr>
+                 <div class="input-group mb-3">
+                     <input type="text" class="form-control" placeholder="New subfolder name" id="new-subfolder-name">
+                     <button class="btn btn-outline-secondary" type="button" id="create-subfolder-btn">Create Subfolder</button>
+                 </div>
+             </div>
+             <div class="modal-footer">
+                 <button type="button" class="btn btn-secondary" data-bs-dismiss="modal">Cancel</button>
+                 <button type="button" class="btn btn-primary" onclick="moveItems()">Move Here</button>
+             </div>
+         </div>
+     </div>
+ </div>
+
+ {% include 'reorder_modal.html' %}
+
+ <!-- Add/Edit Question Modal -->
+ <div class="modal fade" id="questionModal" tabindex="-1">
+     <div class="modal-dialog">
+         <div class="modal-content">
+             <div class="modal-header">
+                 <h5 class="modal-title" id="questionModalTitle">Add Question</h5>
+                 <button type="button" class="btn-close" data-bs-dismiss="modal"></button>
+             </div>
+             <div class="modal-body">
+                 <input type="hidden" id="q_id">
+                 <div class="mb-3">
+                     <label class="form-label">Topic</label>
+                     <input type="text" class="form-control" id="q_topic" list="topicList">
+                     <datalist id="topicList">
+                         {% for t in grouped_questions.keys() %}
+                         <option value="{{ t }}">
+                         {% endfor %}
+                     </datalist>
+                 </div>
+                 <div class="mb-3">
+                     <label class="form-label">Question Number</label>
+                     <input type="text" class="form-control" id="q_number">
+                 </div>
+                 <div class="mb-3">
+                     <label class="form-label">Question Content</label>
+                     <div id="editorjs" class="border rounded p-3 bg-dark text-white" style="min-height: 300px;"></div>
+                 </div>
+             </div>
+             <div class="modal-footer">
+                 <button type="button" class="btn btn-secondary" data-bs-dismiss="modal">Cancel</button>
+                 <button type="button" class="btn btn-primary" onclick="saveQuestion()">Save</button>
+             </div>
+         </div>
+     </div>
+ </div>
+
+ <!-- Rename Topic Modal -->
+ <div class="modal fade" id="renameTopicModal" tabindex="-1">
269
+ <div class="modal-dialog">
270
+ <div class="modal-content">
271
+ <div class="modal-header">
272
+ <h5 class="modal-title">Rename Topic</h5>
273
+ <button type="button" class="btn-close" data-bs-dismiss="modal"></button>
274
+ </div>
275
+ <div class="modal-body">
276
+ <input type="hidden" id="old_topic_name">
277
+ <div class="mb-3">
278
+ <label class="form-label">New Topic Name</label>
279
+ <input type="text" class="form-control" id="new_topic_name">
280
+ </div>
281
+ </div>
282
+ <div class="modal-footer">
283
+ <button type="button" class="btn btn-secondary" data-bs-dismiss="modal">Cancel</button>
284
+ <button type="button" class="btn btn-primary" onclick="performRenameTopic()">Rename</button>
285
+ </div>
286
+ </div>
287
+ </div>
288
+ </div>
289
+
290
+ <script>
291
+ let editor;
292
+
293
+ function initEditor(data = {}) {
294
+ if (editor) {
295
+ try {
296
+ editor.destroy();
297
+ } catch (e) {}
298
+ editor = null;
299
+ }
300
+
301
+ // Ensure data is an object
302
+ if (typeof data === 'string') {
303
+ try { data = JSON.parse(data); } catch(e) { data = {}; }
304
+ }
305
+
306
+ editor = new EditorJS({
307
+ holder: 'editorjs',
308
+ data: data,
309
+ tools: {
310
+ header: window.Header,
311
+ list: window.NestedList || window.List || window.EditorjsList,
312
+ raw: window.RawTool,
313
+ image: window.SimpleImage
314
+ },
315
+ placeholder: 'Type your question here...'
316
+ });
317
+ }
318
+
319
+ // Color generation for topics
320
+ function stringToHslColor(str, s, l) {
321
+ let hash = 0;
322
+ for (let i = 0; i < str.length; i++) {
323
+ hash = str.charCodeAt(i) + ((hash << 5) - hash);
324
+ }
325
+ const h = hash % 360;
326
+ return 'hsl(' + h + ', ' + s + '%, ' + l + '%)';
327
+ }
328
+
329
+ document.addEventListener('DOMContentLoaded', function() {
330
+ const folderTree = {{ folder_tree|tojson }};
331
+ let selectedFolderId = null;
332
+
333
+ // Apply colors to Topic Headers
334
+ document.querySelectorAll('.topic-header').forEach(header => {
335
+ // Use the text of the h5 element which contains the topic name
336
+ const topic = header.querySelector('h5').innerText.trim();
337
+ header.style.backgroundColor = stringToHslColor(topic, 70, 40);
338
+ });
339
+
340
+ // Apply consistent style to number badges
341
+ document.querySelectorAll('.number-badge').forEach(badge => {
342
+ badge.classList.remove('bg-secondary');
343
+ badge.style.backgroundColor = '#6c757d';
344
+ badge.style.color = 'white';
345
+ });
346
+
347
+ // Selection Logic
348
+ const selectAll = document.getElementById('selectAll');
349
+ const questionCheckboxes = document.querySelectorAll('.question-checkbox');
350
+ const folderCheckboxes = document.querySelectorAll('.folder-checkbox');
351
+ const moveBtn = document.getElementById('moveBtn');
352
+
353
+ function updateMoveBtn() {
354
+ const anyQuestionChecked = Array.from(questionCheckboxes).some(cb => cb.checked);
355
+ const anyFolderChecked = Array.from(folderCheckboxes).some(cb => cb.checked);
356
+ if (moveBtn) moveBtn.disabled = !(anyQuestionChecked || anyFolderChecked);
357
+ }
358
+
359
+ // Group Select All (Questions only)
360
+ document.querySelectorAll('.group-select-all').forEach(groupCb => {
361
+ groupCb.addEventListener('change', function() {
362
+ const group = this.closest('.topic-group');
363
+ const groupCheckboxes = group.querySelectorAll('.question-checkbox');
364
+ groupCheckboxes.forEach(cb => cb.checked = this.checked);
365
+ updateMoveBtn();
366
+ });
367
+ });
368
+
369
+ if (selectAll) {
370
+ selectAll.addEventListener('change', function() {
371
+ const isChecked = selectAll.checked;
372
+ questionCheckboxes.forEach(cb => cb.checked = isChecked);
373
+ folderCheckboxes.forEach(cb => cb.checked = isChecked);
374
+ // Also toggle group select alls for visual consistency
375
+ document.querySelectorAll('.group-select-all').forEach(cb => cb.checked = isChecked);
376
+ updateMoveBtn();
377
+ });
378
+ }
379
+
380
+ questionCheckboxes.forEach(cb => cb.addEventListener('change', updateMoveBtn));
381
+ folderCheckboxes.forEach(cb => cb.addEventListener('change', updateMoveBtn));
382
+
383
+ // Folder Tree Logic
384
+ function renderFolderTree(folders, container, level = 0) {
385
+ const ul = document.createElement('ul');
386
+ ul.className = 'list-group';
387
+ if (level > 0) ul.style.display = 'none';
388
+ container.appendChild(ul);
389
+ folders.forEach(folder => {
390
+ const li = document.createElement('li');
391
+ li.className = 'list-group-item list-group-item-action bg-dark text-white';
392
+ li.style.paddingLeft = `${level * 20 + 12}px`;
393
+ li.dataset.folderId = folder.id;
394
+ let icon = folder.children && folder.children.length > 0 ? '<i class="bi bi-chevron-right"></i>' : '<i class="bi bi-folder"></i>';
395
+ li.innerHTML = `${icon} ${folder.name}`;
396
+ li.style.cursor = 'pointer';
397
+ li.addEventListener('click', e => {
398
+ e.stopPropagation();
399
+ document.querySelectorAll('#folder-tree-move .list-group-item').forEach(item => item.classList.remove('active'));
400
+ li.classList.add('active');
401
+ selectedFolderId = folder.id;
402
+ const childUl = li.querySelector('ul');
403
+ if (childUl) {
404
+ const iconEl = li.querySelector('i');
405
+ if (childUl.style.display === 'none') {
406
+ childUl.style.display = 'block';
407
+ iconEl.classList.replace('bi-chevron-right', 'bi-chevron-down');
408
+ } else {
409
+ childUl.style.display = 'none';
410
+ iconEl.classList.replace('bi-chevron-down', 'bi-chevron-right');
411
+ }
412
+ }
413
+ });
414
+ ul.appendChild(li);
415
+ if (folder.children) renderFolderTree(folder.children, li, level + 1);
416
+ });
417
+ }
418
+
419
+ if (moveBtn) {
420
+ moveBtn.addEventListener('click', () => {
421
+ const folderTreeContainer = document.getElementById('folder-tree-move');
422
+ folderTreeContainer.innerHTML = '';
423
+ const rootItem = document.createElement('a');
424
+ rootItem.href = '#';
425
+ rootItem.className = 'list-group-item list-group-item-action bg-dark text-white';
426
+ rootItem.dataset.folderId = 'null';
427
+ rootItem.innerHTML = `<i class="bi bi-house-door"></i> Root`;
428
+ rootItem.addEventListener('click', e => {
429
+ e.preventDefault();
430
+ document.querySelectorAll('#folder-tree-move .list-group-item').forEach(item => item.classList.remove('active'));
431
+ rootItem.classList.add('active');
432
+ selectedFolderId = null;
433
+ });
434
+ folderTreeContainer.appendChild(rootItem);
435
+ renderFolderTree(folderTree, folderTreeContainer);
436
+ });
437
+ }
438
+
439
+ document.getElementById('create-subfolder-btn').addEventListener('click', () => {
440
+ const newFolderName = document.getElementById('new-subfolder-name').value;
441
+ if (!newFolderName) return alert('Please enter a name for the new subfolder.');
442
+
443
+ fetch("{{ url_for('subjective.create_folder') }}", {
444
+ method: "POST",
445
+ headers: {
446
+ "Content-Type": "application/json",
447
+ "X-CSRFToken": "{{ csrf_token() if csrf_token else '' }}"
448
+ },
449
+ body: JSON.stringify({
450
+ name: newFolderName,
451
+ parent_id: selectedFolderId
452
+ })
453
+ })
454
+ .then(res => res.json())
455
+ .then(data => {
456
+ if (data.success) {
457
+ document.getElementById('new-subfolder-name').value = '';
458
+ const newFolder = { id: data.id, name: data.name, parent_id: data.parent_id, children: [] };
459
+ function addFolderToTree(folders, parentId, newFolderItem) {
460
+ if (!parentId) { folders.push(newFolderItem); return true; }
461
+ for (let folder of folders) {
462
+ if (folder.id === parentId) {
463
+ if (!folder.children) folder.children = [];
464
+ folder.children.push(newFolderItem);
465
+ return true;
466
+ }
467
+ if (folder.children && addFolderToTree(folder.children, parentId, newFolderItem)) return true;
468
+ }
469
+ return false;
470
+ }
471
+ addFolderToTree(folderTree, selectedFolderId, newFolder);
472
+ const folderTreeContainer = document.getElementById('folder-tree-move');
473
+ folderTreeContainer.innerHTML = '';
474
+ const rootItem = document.createElement('a');
475
+ rootItem.href = '#';
476
+ rootItem.className = 'list-group-item list-group-item-action bg-dark text-white';
477
+ if (selectedFolderId === null) rootItem.classList.add('active');
478
+ rootItem.dataset.folderId = 'null';
479
+ rootItem.innerHTML = `<i class="bi bi-house-door"></i> Root`;
480
+ rootItem.addEventListener('click', e => {
481
+ e.preventDefault();
482
+ document.querySelectorAll('#folder-tree-move .list-group-item').forEach(item => item.classList.remove('active'));
483
+ rootItem.classList.add('active');
484
+ selectedFolderId = null;
485
+ });
486
+ folderTreeContainer.appendChild(rootItem);
487
+ renderFolderTree(folderTree, folderTreeContainer);
488
+ } else {
489
+ alert('Error: ' + data.error);
490
+ }
491
+ });
492
+ });
493
+
494
+ // Expose moveItems globally
495
+ window.moveItems = function() {
496
+ const questionIds = Array.from(document.querySelectorAll('.question-checkbox:checked')).map(cb => cb.value);
497
+ const folderIds = Array.from(document.querySelectorAll('.folder-checkbox:checked')).map(cb => cb.value);
498
+
499
+ if (questionIds.length === 0 && folderIds.length === 0) return;
500
+
501
+ fetch("{{ url_for('subjective.move_items') }}", {
502
+ method: "POST",
503
+ headers: {
504
+ "Content-Type": "application/json",
505
+ "X-CSRFToken": "{{ csrf_token() if csrf_token else '' }}"
506
+ },
507
+ body: JSON.stringify({
508
+ question_ids: questionIds,
509
+ folder_ids: folderIds,
510
+ target_folder_id: selectedFolderId
511
+ })
512
+ })
513
+ .then(res => res.json())
514
+ .then(data => {
515
+ if (data.success) {
516
+ location.reload();
517
+ } else {
518
+ alert('Error: ' + data.error);
519
+ }
520
+ });
521
+ };
522
+ });
523
+
524
+ function createFolder() {
525
+ const name = document.getElementById('folderName').value;
526
+ if (!name) return;
527
+
528
+ fetch("{{ url_for('subjective.create_folder') }}", {
529
+ method: "POST",
530
+ headers: {
531
+ "Content-Type": "application/json",
532
+ "X-CSRFToken": "{{ csrf_token() if csrf_token else '' }}"
533
+ },
534
+ body: JSON.stringify({
535
+ name: name,
536
+ parent_id: {{ current_folder_id if current_folder_id else 'null' }}
537
+ })
538
+ })
539
+ .then(res => res.json())
540
+ .then(data => {
541
+ if (data.success) {
542
+ location.reload();
543
+ } else {
544
+ alert('Error: ' + data.error);
545
+ }
546
+ });
547
+ }
548
+
549
+ // Manual Question Management Functions
550
+ let questionModal;
551
+ let renameTopicModal;
552
+
553
+ function addQuestionToTopic(topic, btn) {
554
+ // Find the parent topic group container
555
+ const topicGroup = btn.closest('.topic-group');
556
+ let maxNum = 0;
557
+
558
+ // Find all question number badges in this group
559
+ const badges = topicGroup.querySelectorAll('.number-badge');
560
+ badges.forEach(badge => {
561
+ // Extract number, handling potential non-numeric prefixes if any (though strictly they are just numbers in DB usually)
562
+ // Using regex to extract first sequence of digits
563
+ const text = badge.innerText;
564
+ const match = text.match(/(\d+)/);
565
+ if (match) {
566
+ const num = parseInt(match[0], 10);
567
+ if (num > maxNum) maxNum = num;
568
+ }
569
+ });
570
+
571
+ openAddQuestionModal(topic, maxNum + 1);
572
+ }
573
+
574
+ function openAddQuestionModal(prefillTopic = "", prefillNumber = "") {
575
+ if (!questionModal) questionModal = new bootstrap.Modal(document.getElementById('questionModal'));
576
+ document.getElementById('questionModalTitle').innerText = "Add Question";
577
+ document.getElementById('q_id').value = "";
578
+
579
+ // Prefill or clear
580
+ document.getElementById('q_topic').value = prefillTopic;
581
+ document.getElementById('q_number').value = prefillNumber;
582
+
583
+ initEditor();
584
+
585
+ questionModal.show();
586
+ }
587
+
588
+ function editQuestion(id, topic, number, html, jsonStr) {
589
+ if (!questionModal) questionModal = new bootstrap.Modal(document.getElementById('questionModal'));
590
+ document.getElementById('questionModalTitle').innerText = "Edit Question";
591
+ document.getElementById('q_id').value = id;
592
+ document.getElementById('q_topic').value = topic;
593
+ document.getElementById('q_number').value = number;
594
+
595
+ let data = {};
596
+ if (jsonStr && jsonStr !== 'None' && jsonStr !== 'null') {
597
+ try { data = JSON.parse(jsonStr); } catch(e) { console.error(e); }
598
+ } else if (html) {
599
+ // Migration: Convert legacy HTML to a Paragraph block
600
+ data = {
601
+ time: Date.now(),
602
+ blocks: [{
603
+ type: "paragraph",
604
+ data: { text: html }
605
+ }]
606
+ };
607
+ }
608
+
609
+ initEditor(data);
610
+ questionModal.show();
611
+ }
612
+
613
+ async function saveQuestion() {
614
+ const id = document.getElementById('q_id').value;
615
+ const topic = document.getElementById('q_topic').value;
616
+ const number = document.getElementById('q_number').value;
617
+ const folderId = {{ current_folder_id if current_folder_id else 'null' }};
618
+
619
+ const outputData = await editor.save();
620
+ const jsonStr = JSON.stringify(outputData);
621
+
622
+ // Generate HTML for legacy/display support (simple conversion)
623
+ let html = "";
624
+ outputData.blocks.forEach(block => {
625
+ if (block.type === 'header') html += `<h${block.data.level}>${block.data.text}</h${block.data.level}>`;
626
+ else if (block.type === 'paragraph') html += `<p>${block.data.text}</p>`;
627
+ else if (block.type === 'list') {
628
+ html += block.data.style === 'ordered' ? '<ol>' : '<ul>';
629
+ block.data.items.forEach(i => html += `<li>${i}</li>`);
630
+ html += block.data.style === 'ordered' ? '</ol>' : '</ul>';
631
+ }
632
+ else if (block.type === 'image' || block.type === 'simple-image') {
633
+ const url = block.data.url;
634
+ const caption = block.data.caption || '';
635
+ html += `<div class="text-center my-2"><img src="${url}" class="img-fluid rounded" style="max-height: 300px;" alt="${caption}"><br><small class="text-muted">${caption}</small></div>`;
636
+ }
637
+ else if (block.type === 'raw') html += block.data.html;
638
+ });
639
+
640
+ const url = id ? `/subjective/question/update/${id}` : '/subjective/question/add';
641
+ const payload = id ? { topic, number, html, json: jsonStr } : { topic, number, html, json: jsonStr, folder_id: folderId };
642
+
643
+ fetch(url, {
644
+ method: 'POST',
645
+ headers: {
646
+ 'Content-Type': 'application/json',
647
+ 'X-CSRFToken': "{{ csrf_token() if csrf_token else '' }}"
648
+ },
649
+ body: JSON.stringify(payload)
650
+ })
651
+ .then(res => res.json())
652
+ .then(data => {
653
+ if (data.success) {
654
+ location.reload();
655
+ } else {
656
+ alert("Error: " + data.error);
657
+ }
658
+ });
659
+ }
660
+
661
+ function deleteQuestion(id) {
662
+ if(!confirm("Are you sure you want to delete this question?")) return;
663
+ fetch(`/subjective/question/delete/${id}`, {
664
+ method: 'DELETE',
665
+ headers: {
666
+ 'X-CSRFToken': "{{ csrf_token() if csrf_token else '' }}"
667
+ }
668
+ })
669
+ .then(res => res.json())
670
+ .then(data => {
671
+ if (data.success) location.reload();
672
+ else alert("Error: " + data.error);
673
+ });
674
+ }
675
+
676
+ function renameTopic(oldName) {
677
+ if (!renameTopicModal) renameTopicModal = new bootstrap.Modal(document.getElementById('renameTopicModal'));
678
+ document.getElementById('old_topic_name').value = oldName;
679
+ document.getElementById('new_topic_name').value = oldName;
680
+ renameTopicModal.show();
681
+ }
682
+
683
+ function performRenameTopic() {
684
+ const oldTopic = document.getElementById('old_topic_name').value;
685
+ const newTopic = document.getElementById('new_topic_name').value;
686
+ const folderId = {{ current_folder_id if current_folder_id else 'null' }};
687
+
688
+ if (!newTopic) return;
689
+
690
+ fetch('/subjective/topic/rename', {
691
+ method: 'POST',
692
+ headers: {
693
+ 'Content-Type': 'application/json',
694
+ 'X-CSRFToken': "{{ csrf_token() if csrf_token else '' }}"
695
+ },
696
+ body: JSON.stringify({ old_topic: oldTopic, new_topic: newTopic, folder_id: folderId })
697
+ })
698
+ .then(res => res.json())
699
+ .then(data => {
700
+ if (data.success) location.reload();
701
+ else alert("Error: " + data.error);
702
+ });
703
+ }
704
+
705
+ function deleteTopic(topic) {
706
+ if(!confirm(`Are you sure you want to delete ALL questions in topic "${topic}"?`)) return;
707
+ const folderId = {{ current_folder_id if current_folder_id else 'null' }};
708
+
709
+ fetch('/subjective/topic/delete', {
710
+ method: 'POST',
711
+ headers: {
712
+ 'Content-Type': 'application/json',
713
+ 'X-CSRFToken': "{{ csrf_token() if csrf_token else '' }}"
714
+ },
715
+ body: JSON.stringify({ topic: topic, folder_id: folderId })
716
+ })
717
+ .then(res => res.json())
718
+ .then(data => {
719
+ if (data.success) location.reload();
720
+ else alert("Error: " + data.error);
721
+ });
722
+ }
723
+ </script>
724
+ {% endblock %}
dashboard.py ADDED
@@ -0,0 +1,253 @@
+
+ from flask import Blueprint, render_template, request, redirect, url_for, flash, jsonify
+ from flask_login import login_required, current_user
+ from database import get_db_connection
+ import os
+ from flask import current_app
+
+ dashboard_bp = Blueprint('dashboard', __name__)
+
+ def get_session_size(session_id, user_id):
+ """Calculate the total size of files associated with a session."""
+ # Use rich for a tabular size breakdown if it is installed
+ try:
+ from rich.console import Console
+ from rich.table import Table
+ console = Console()
+ rich_available = True
+ except ImportError:
+ # Rich not available; fall back to plain logging
+ console = None
+ rich_available = False
+
+ current_app.logger.info(f"Calculating size for session_id: {session_id}")
+
+ total_size = 0
+ breakdown = []
+
+ conn = get_db_connection()
+
+ # Get all images associated with the session
+ images = conn.execute("""
+ SELECT filename, processed_filename, image_type
+ FROM images
+ WHERE session_id = ?
+ """, (session_id,)).fetchall()
+
+ # Add sizes of original and processed images
+ for image in images:
+ # Add original file size (in upload folder)
+ if image['filename']:
+ file_path = os.path.join(current_app.config['UPLOAD_FOLDER'], image['filename'])
+ if os.path.exists(file_path):
+ size = os.path.getsize(file_path)
+ total_size += size
+ current_app.logger.info(f" Original image {image['filename']}: {size} bytes")
+ breakdown.append(("Original Image", image['filename'], size))
+ else:
+ current_app.logger.info(f" Original image file not found: {file_path}")
+
+ # Add processed/cropped image size (in processed folder)
+ if image['processed_filename']:
+ file_path = os.path.join(current_app.config['PROCESSED_FOLDER'], image['processed_filename'])
+ if os.path.exists(file_path):
+ size = os.path.getsize(file_path)
+ total_size += size
+ current_app.logger.info(f" Processed image {image['processed_filename']}: {size} bytes")
+ breakdown.append(("Processed Image", image['processed_filename'], size))
+ else:
+ current_app.logger.info(f" Processed image file not found: {file_path}")
+
+ # Add size of original PDF file if it exists
+ session_info = conn.execute("SELECT original_filename FROM sessions WHERE id = ?", (session_id,)).fetchone()
+ if session_info and session_info['original_filename']:
+ # Try to find the original PDF in the upload folder with the session ID prefix
+ pdf_filename = f"{session_id}_{session_info['original_filename']}"
+ pdf_path = os.path.join(current_app.config['UPLOAD_FOLDER'], pdf_filename)
+ if os.path.exists(pdf_path):
+ size = os.path.getsize(pdf_path)
+ total_size += size
+ current_app.logger.info(f" Original PDF {pdf_filename}: {size} bytes")
+ breakdown.append(("Original PDF", pdf_filename, size))
+ else:
+ current_app.logger.info(f" Original PDF file not found: {pdf_path}")
+
+ # Add size of any generated PDFs for this session
+ generated_pdfs = conn.execute("""
+ SELECT filename
+ FROM generated_pdfs
+ WHERE session_id = ?
+ """, (session_id,)).fetchall()
+
+ for pdf in generated_pdfs:
+ if pdf['filename']:
+ pdf_path = os.path.join(current_app.config['OUTPUT_FOLDER'], pdf['filename'])
+ if os.path.exists(pdf_path):
+ size = os.path.getsize(pdf_path)
+ total_size += size
+ current_app.logger.info(f" Generated PDF {pdf['filename']}: {size} bytes")
+ breakdown.append(("Generated PDF", pdf['filename'], size))
+ else:
+ current_app.logger.info(f" Generated PDF file not found: {pdf_path}")
+
+ current_app.logger.info(f"Total size for session {session_id}: {total_size} bytes")
+
+ # Create a rich table to show breakdown if rich is available
+ if rich_available and console:
+ table = Table(title=f"Session {session_id} Size Breakdown")
+ table.add_column("File Type", style="cyan")
+ table.add_column("Filename", style="magenta")
+ table.add_column("Size (bytes)", style="green")
+
+ for file_type, filename, size in breakdown:
+ table.add_row(file_type, filename, str(size))
+
+ if breakdown:
+ console.print(table)
+ else:
+ console.print(f"[yellow]No files found for session {session_id}[/yellow]")
+
+ conn.close()
+ return total_size
+
+
+ def format_file_size(size_bytes):
+ """Convert bytes to a human-readable string."""
+ if size_bytes == 0:
+ return "0 B"
+
+ size_names = ["B", "KB", "MB", "GB"]
+ import math
+ # Clamp the index so sizes above the largest unit don't overflow the list
+ i = min(int(math.floor(math.log(size_bytes, 1024))), len(size_names) - 1)
+ p = math.pow(1024, i)
+ s = round(size_bytes / p, 2)
+ return f"{s} {size_names[i]}"
+
+
+ @dashboard_bp.route('/dashboard')
+ @login_required
+ def dashboard():
+ # Check if size parameter is passed
+ show_size = request.args.get('size', type=int)
+
+ conn = get_db_connection()
+ sessions_rows = conn.execute("""
+ SELECT s.id, s.created_at, s.original_filename, s.persist, s.name, s.session_type,
+ COUNT(CASE WHEN i.image_type = 'original' THEN 1 END) as page_count,
+ COUNT(CASE WHEN i.image_type = 'cropped' THEN 1 END) as question_count
+ FROM sessions s
+ LEFT JOIN images i ON s.id = i.session_id
+ WHERE s.user_id = ?
+ GROUP BY s.id, s.created_at, s.original_filename, s.persist, s.name, s.session_type
+ ORDER BY s.created_at DESC
+ """, (current_user.id,)).fetchall()
+
+ sessions = []
+ for session in sessions_rows:
+ session_dict = dict(session)
+
+ # Calculate total size for this session only if requested
+ if show_size:
+ session_size = get_session_size(session_dict['id'], current_user.id)
+ session_dict['total_size'] = session_size
+ session_dict['total_size_formatted'] = format_file_size(session_size)
+
+ sessions.append(session_dict)
+
+ conn.close()
+
+ return render_template('dashboard.html', sessions=sessions, show_size=bool(show_size))
+
+ @dashboard_bp.route('/sessions/batch_delete', methods=['POST'])
+ @login_required
+ def batch_delete_sessions():
+ data = request.json
+ session_ids = data.get('ids', [])
+
+ if not session_ids:
+ return jsonify({'error': 'No session IDs provided'}), 400
+
+ try:
+ conn = get_db_connection()
+ for session_id in session_ids:
+ # Security Check: Ensure the session belongs to the current user
+ session_owner = conn.execute('SELECT user_id FROM sessions WHERE id = ?', (session_id,)).fetchone()
+ if not session_owner or session_owner['user_id'] != current_user.id:
+ # Silently skip or log an error, but don't delete
+ current_app.logger.warning(f"User {current_user.id} attempted to delete unauthorized session {session_id}.")
+ continue
+
+ # Delete associated files
+ images_to_delete = conn.execute('SELECT filename, processed_filename FROM images WHERE session_id = ?', (session_id,)).fetchall()
+ for img in images_to_delete:
+ if img['filename']:
+ try:
+ os.remove(os.path.join(current_app.config['UPLOAD_FOLDER'], img['filename']))
+ except OSError:
+ pass
+ if img['processed_filename']:
+ try:
+ os.remove(os.path.join(current_app.config['PROCESSED_FOLDER'], img['processed_filename']))
+ except OSError:
+ pass
+
+ # Delete from database
+ conn.execute('DELETE FROM questions WHERE session_id = ?', (session_id,))
+ conn.execute('DELETE FROM images WHERE session_id = ?', (session_id,))
+ conn.execute('DELETE FROM sessions WHERE id = ?', (session_id,))
+
+ conn.commit()
+ conn.close()
+
+ return jsonify({'success': True})
+ except Exception as e:
+ return jsonify({'error': str(e)}), 500
+
+
+ @dashboard_bp.route('/sessions/reduce_space/<session_id>', methods=['POST'])
+ @login_required
+ def reduce_space(session_id):
+ """Truncate original page images to reduce disk space."""
+ try:
+ conn = get_db_connection()
+
+ # Security Check: Ensure the session belongs to the current user
+ session_owner = conn.execute('SELECT user_id FROM sessions WHERE id = ?', (session_id,)).fetchone()
+ if not session_owner or session_owner['user_id'] != current_user.id:
+ current_app.logger.warning(f"User {current_user.id} attempted to reduce space for unauthorized session {session_id}.")
+ return jsonify({'error': 'Unauthorized access to session'}), 403
+
+ # Get all original images associated with the session
+ images = conn.execute("""
+ SELECT filename
+ FROM images
+ WHERE session_id = ? AND image_type = 'original'
+ """, (session_id,)).fetchall()
+
+ # Truncate original images to reduce space
+ truncated_count = 0
+ for image in images:
+ if image['filename']:
+ file_path = os.path.join(current_app.config['UPLOAD_FOLDER'], image['filename'])
+ if os.path.exists(file_path):
+ try:
+ # Truncate the file to 0 bytes
+ with open(file_path, 'w') as f:
+ f.truncate(0)
+ truncated_count += 1
+ except OSError as e:
+ current_app.logger.error(f"Error truncating file {file_path}: {str(e)}")
+
+ conn.close()
+
+ return jsonify({
+ 'success': True,
+ 'truncated_count': truncated_count,
+ 'message': f'Successfully reduced space by truncating {truncated_count} original page images'
+ })
+ except Exception as e:
+ current_app.logger.error(f"Error in reduce space: {str(e)}")
+ return jsonify({'error': str(e)}), 500
database.py ADDED
@@ -0,0 +1,466 @@
import os
import sqlite3
from datetime import datetime, timedelta
from flask import current_app
from utils import get_db_connection


def setup_database():
    """Initializes the database and creates/updates tables as needed."""
    conn = get_db_connection()
    cursor = conn.cursor()

    cursor.execute("""
        CREATE TABLE IF NOT EXISTS users (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            username TEXT NOT NULL UNIQUE,
            email TEXT NOT NULL UNIQUE,
            password_hash TEXT NOT NULL,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );
    """)

    # Create sessions table
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS sessions (
            id TEXT PRIMARY KEY,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            original_filename TEXT,
            persist INTEGER DEFAULT 0,
            name TEXT,
            user_id INTEGER,
            session_type TEXT DEFAULT 'standard'
        );
    """)

    # Create images table
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS images (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id TEXT NOT NULL,
            image_index INTEGER NOT NULL,
            filename TEXT NOT NULL,
            original_name TEXT NOT NULL,
            processed_filename TEXT,
            image_type TEXT DEFAULT 'original',
            box_id TEXT,
            FOREIGN KEY (session_id) REFERENCES sessions (id)
        );
    """)

    # Create questions table
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS questions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id TEXT NOT NULL,
            image_id INTEGER NOT NULL,
            question_number TEXT,
            subject TEXT,
            status TEXT,
            marked_solution TEXT,
            actual_solution TEXT,
            time_taken TEXT,
            tags TEXT,
            FOREIGN KEY (session_id) REFERENCES sessions (id),
            FOREIGN KEY (image_id) REFERENCES images (id)
        );
    """)

    # Create folders table
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS folders (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL,
            parent_id INTEGER,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (parent_id) REFERENCES folders (id) ON DELETE CASCADE
        );
    """)

    # Create generated_pdfs table
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS generated_pdfs (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            session_id TEXT NOT NULL,
            filename TEXT NOT NULL,
            subject TEXT NOT NULL,
            tags TEXT,
            notes TEXT,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            source_filename TEXT,
            folder_id INTEGER,
            persist INTEGER DEFAULT 0,
            FOREIGN KEY (session_id) REFERENCES sessions (id),
            FOREIGN KEY (folder_id) REFERENCES folders (id) ON DELETE SET NULL
        );
    """)

    # Create neetprep_questions table
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS neetprep_questions (
            id TEXT PRIMARY KEY,
            question_text TEXT,
            options TEXT,
            correct_answer_index INTEGER,
            level TEXT,
            topic TEXT,
            subject TEXT,
            last_fetched_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );
    """)

    # Create neetprep_processed_attempts table
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS neetprep_processed_attempts (
            attempt_id TEXT PRIMARY KEY,
            processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );
    """)

    # Create subjective_folders table
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS subjective_folders (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL,
            parent_id INTEGER,
            user_id INTEGER,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (parent_id) REFERENCES subjective_folders (id) ON DELETE CASCADE,
            FOREIGN KEY (user_id) REFERENCES users (id)
        );
    """)

    # Create subjective_questions table
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS subjective_questions (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            question_topic TEXT NOT NULL,
            question_html TEXT NOT NULL,
            question_number_within_topic TEXT,
            folder_id INTEGER,
            topic_order INTEGER DEFAULT 0,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (user_id) REFERENCES users (id),
            FOREIGN KEY (folder_id) REFERENCES subjective_folders (id) ON DELETE SET NULL
        );
    """)

    # Create drive_sources table
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS drive_sources (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL,
            url TEXT NOT NULL,
            local_path TEXT,
            last_synced TIMESTAMP,
            user_id INTEGER,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (user_id) REFERENCES users (id)
        );
    """)

    # Create pdf_access_history table
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS pdf_access_history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            file_id TEXT NOT NULL,
            filename TEXT NOT NULL,
            source_type TEXT DEFAULT 'drive_api',
            opened_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (user_id) REFERENCES users (id)
        );
    """)

    # Create qtab_folders table for question table organization
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS qtab_folders (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL,
            parent_id INTEGER,
            user_id INTEGER,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (parent_id) REFERENCES qtab_folders (id) ON DELETE CASCADE,
            FOREIGN KEY (user_id) REFERENCES users (id)
        );
    """)

    # Create qtab_images table for question-answer extraction
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS qtab_images (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            folder_id INTEGER,
            filename TEXT NOT NULL,
            original_name TEXT NOT NULL,
            result_json TEXT,
            status TEXT DEFAULT 'pending',
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
            FOREIGN KEY (user_id) REFERENCES users (id),
            FOREIGN KEY (folder_id) REFERENCES qtab_folders (id) ON DELETE SET NULL
        );
    """)

    # --- Migrations ---
    # Each block probes for a column with a SELECT and adds it if the probe fails.
    try:
        cursor.execute("SELECT topic_order FROM subjective_questions LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE subjective_questions ADD COLUMN topic_order INTEGER DEFAULT 0")

    try:
        cursor.execute("SELECT folder_id FROM subjective_questions LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE subjective_questions ADD COLUMN folder_id INTEGER REFERENCES subjective_folders(id) ON DELETE SET NULL")

    try:
        cursor.execute("SELECT tags FROM questions LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE questions ADD COLUMN tags TEXT")

    try:
        cursor.execute("SELECT image_type FROM images LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE images ADD COLUMN image_type TEXT DEFAULT 'original'")

    try:
        cursor.execute("SELECT original_filename FROM sessions LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE sessions ADD COLUMN original_filename TEXT")

    try:
        cursor.execute("SELECT persist FROM sessions LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE sessions ADD COLUMN persist INTEGER DEFAULT 0")

    try:
        cursor.execute("SELECT name FROM sessions LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE sessions ADD COLUMN name TEXT")

    try:
        cursor.execute("SELECT persist FROM generated_pdfs LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE generated_pdfs ADD COLUMN persist INTEGER DEFAULT 0")

    try:
        cursor.execute("SELECT folder_id FROM generated_pdfs LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE generated_pdfs ADD COLUMN folder_id INTEGER REFERENCES folders(id) ON DELETE SET NULL")

    try:
        cursor.execute("SELECT question_text FROM questions LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE questions ADD COLUMN question_text TEXT")

    try:
        cursor.execute("SELECT chapter FROM questions LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE questions ADD COLUMN chapter TEXT")

    # --- Multi-user Migrations ---
    try:
        cursor.execute("SELECT user_id FROM sessions LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE sessions ADD COLUMN user_id INTEGER REFERENCES users(id)")

    try:
        cursor.execute("SELECT user_id FROM generated_pdfs LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE generated_pdfs ADD COLUMN user_id INTEGER REFERENCES users(id)")

    try:
        cursor.execute("SELECT user_id FROM folders LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE folders ADD COLUMN user_id INTEGER REFERENCES users(id)")

    try:
        cursor.execute("SELECT neetprep_enabled FROM users LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE users ADD COLUMN neetprep_enabled INTEGER DEFAULT 1")

    try:
        cursor.execute("SELECT dpi FROM users LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE users ADD COLUMN dpi INTEGER DEFAULT 100")

    try:
        cursor.execute("SELECT color_rm_dpi FROM users LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE users ADD COLUMN color_rm_dpi INTEGER DEFAULT 200")

    try:
        cursor.execute("SELECT box_id FROM images LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE images ADD COLUMN box_id TEXT")

    try:
        cursor.execute("SELECT session_type FROM sessions LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE sessions ADD COLUMN session_type TEXT DEFAULT 'standard'")

    try:
        cursor.execute("SELECT v2_default FROM users LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE users ADD COLUMN v2_default INTEGER DEFAULT 0")

    try:
        cursor.execute("SELECT magnifier_enabled FROM users LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE users ADD COLUMN magnifier_enabled INTEGER DEFAULT 1")

    try:
        cursor.execute("SELECT source_type FROM drive_sources LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE drive_sources ADD COLUMN source_type TEXT DEFAULT 'folder'")

    try:
        cursor.execute("SELECT google_token FROM users LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE users ADD COLUMN google_token TEXT")

    try:
        cursor.execute("SELECT question_json FROM subjective_questions LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE subjective_questions ADD COLUMN question_json TEXT")

    try:
        cursor.execute("SELECT classifier_model FROM users LIMIT 1")
    except sqlite3.OperationalError:
        cursor.execute("ALTER TABLE users ADD COLUMN classifier_model TEXT DEFAULT 'gemini'")

    conn.commit()
    conn.close()


def cleanup_old_data():
    """Removes sessions, files, and PDFs older than 1 day, unless persisted."""
    print("Running cleanup of old data...")
    conn = get_db_connection()
    cutoff = datetime.now() - timedelta(days=1)

    old_sessions = conn.execute('SELECT id FROM sessions WHERE created_at < ? AND persist = 0', (cutoff,)).fetchall()

    for session in old_sessions:
        session_id = session['id']
        print(f"Deleting old session: {session_id}")

        images_to_delete = conn.execute('SELECT filename, processed_filename FROM images WHERE session_id = ?', (session_id,)).fetchall()
        for img in images_to_delete:
            if img['filename']:
                try:
                    os.remove(os.path.join(current_app.config['UPLOAD_FOLDER'], img['filename']))
                except OSError:
                    pass
            if img['processed_filename']:
                try:
                    os.remove(os.path.join(current_app.config['PROCESSED_FOLDER'], img['processed_filename']))
                except OSError:
                    pass

        conn.execute('DELETE FROM questions WHERE session_id = ?', (session_id,))
        conn.execute('DELETE FROM images WHERE session_id = ?', (session_id,))
        conn.execute('DELETE FROM sessions WHERE id = ?', (session_id,))

    old_pdfs = conn.execute('SELECT id, filename FROM generated_pdfs WHERE created_at < ? AND persist = 0', (cutoff,)).fetchall()
    for pdf in old_pdfs:
        pdf_id, pdf_filename = pdf['id'], pdf['filename']
        print(f"Deleting old generated PDF: {pdf_filename}")
        try:
            os.remove(os.path.join(current_app.config['OUTPUT_FOLDER'], pdf_filename))
        except OSError:
            pass
        conn.execute('DELETE FROM generated_pdfs WHERE id = ?', (pdf_id,))

    # Remove orphaned output files that are no longer referenced in the database
    db_filenames = {row['filename'] for row in conn.execute('SELECT filename FROM generated_pdfs').fetchall()}
    for filename in os.listdir(current_app.config['OUTPUT_FOLDER']):
        if filename not in db_filenames:
            file_path = os.path.join(current_app.config['OUTPUT_FOLDER'], filename)
            file_mtime = datetime.fromtimestamp(os.path.getmtime(file_path))
            if file_mtime < cutoff:
                print(f"Deleting old, orphaned PDF: {filename}")
                try:
                    os.remove(file_path)
                except OSError:
                    pass

    conn.commit()
    conn.close()
    print("Cleanup finished.")


def get_folder_tree(user_id=None):
    """Returns the folder hierarchy as a nested list of dicts."""
    conn = get_db_connection()
    if user_id:
        folders = conn.execute('SELECT id, name, parent_id FROM folders WHERE user_id = ? ORDER BY name', (user_id,)).fetchall()
    else:
        # Fallback for old behavior or admin views
        folders = conn.execute('SELECT id, name, parent_id FROM folders ORDER BY name').fetchall()
    conn.close()

    folder_map = {f['id']: dict(f) for f in folders}
    tree = []

    for folder_id, folder in folder_map.items():
        if folder['parent_id']:
            parent = folder_map.get(folder['parent_id'])
            if parent:
                if 'children' not in parent:
                    parent['children'] = []
                parent['children'].append(folder)
        else:
            tree.append(folder)

    return tree


def get_subjective_folder_tree(user_id):
    """Returns the subjective-question folder hierarchy for a user."""
    conn = get_db_connection()
    folders = conn.execute('SELECT id, name, parent_id FROM subjective_folders WHERE user_id = ? ORDER BY name', (user_id,)).fetchall()
    conn.close()

    folder_map = {f['id']: dict(f) for f in folders}
    tree = []

    for folder_id, folder in folder_map.items():
        if folder['parent_id']:
            parent = folder_map.get(folder['parent_id'])
            if parent:
                if 'children' not in parent:
                    parent['children'] = []
                parent['children'].append(folder)
        else:
            tree.append(folder)

    return tree


def get_qtab_folder_tree(user_id):
    """Returns the qtab folder hierarchy for a user."""
    conn = get_db_connection()
    folders = conn.execute('SELECT id, name, parent_id FROM qtab_folders WHERE user_id = ? ORDER BY name', (user_id,)).fetchall()
    conn.close()

    folder_map = {f['id']: dict(f) for f in folders}
    tree = []

    for folder_id, folder in folder_map.items():
        if folder['parent_id']:
            parent = folder_map.get(folder['parent_id'])
            if parent:
                if 'children' not in parent:
                    parent['children'] = []
                parent['children'].append(folder)
        else:
            tree.append(folder)

    return tree


def get_all_descendant_folder_ids(conn, folder_id, user_id=None):
    """Recursively gets all descendant folder IDs for a given folder, scoped to a user."""
    if user_id:
        children = conn.execute('SELECT id FROM folders WHERE parent_id = ? AND user_id = ?', (folder_id, user_id)).fetchall()
    else:
        children = conn.execute('SELECT id FROM folders WHERE parent_id = ?', (folder_id,)).fetchall()

    folder_ids = [f['id'] for f in children]
    for child_id in folder_ids:
        folder_ids.extend(get_all_descendant_folder_ids(conn, child_id, user_id))
    return folder_ids
docs/API_KEY_MANAGER_GUIDE.md ADDED
@@ -0,0 +1,303 @@
# API Key Manager Guide

## Overview

The API Key Manager provides automatic failover and rotation across multiple API keys for the same service. If one API key fails due to rate limiting or errors, the system automatically switches to a backup key.

## Features

✅ **Automatic Failover** - Switches to backup keys when one fails
✅ **Load Balancing** - Rotates through keys using round-robin
✅ **Failure Tracking** - Blocks keys after consecutive failures
✅ **Auto-Recovery** - Unblocks keys after a cooldown period
✅ **Success Rate Monitoring** - Tracks the performance of each key
✅ **Thread-Safe** - Can be used in multi-threaded environments

## Configuration

### Setting Up Multiple API Keys

You can configure multiple API keys for each service using environment variables:

#### Method 1: Numbered Keys (Recommended)
```bash
# Gemini API Keys
export GEMINI_API_KEY_1="your-first-gemini-key"
export GEMINI_API_KEY_2="your-second-gemini-key"
export GEMINI_API_KEY_3="your-third-gemini-key"

# NVIDIA API Keys
export NVIDIA_API_KEY_1="your-first-nvidia-key"
export NVIDIA_API_KEY_2="your-second-nvidia-key"

# OpenRouter API Keys (for Nova)
export OPENROUTER_API_KEY_1="your-first-openrouter-key"
export OPENROUTER_API_KEY_2="your-second-openrouter-key"
```

#### Method 2: Single Key (Backward Compatible)
```bash
export GEMINI_API_KEY="your-gemini-key"
export NVIDIA_API_KEY="your-nvidia-key"
export OPENROUTER_API_KEY="your-openrouter-key"
```

#### Method 3: Mixed (Both Work Together)
```bash
# These will all be combined into the pool
export GEMINI_API_KEY="key-1"
export GEMINI_API_KEY_1="key-2"
export GEMINI_API_KEY_2="key-3"
# Result: 3 keys total (duplicates are automatically removed)
```
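The numbered-key convention can be read from the environment with a loop along these lines. This is an illustrative sketch only; `load_service_keys` is a hypothetical name, not the actual loader in `api_key_manager.py`:

```python
import os

def load_service_keys(prefix):
    """Collect PREFIX plus PREFIX_1, PREFIX_2, ... from the environment, deduplicated."""
    keys = []
    base = os.environ.get(prefix)
    if base:
        keys.append(base)
    i = 1
    while True:
        key = os.environ.get(f"{prefix}_{i}")
        if not key:
            break
        keys.append(key)
        i += 1
    # dict.fromkeys preserves insertion order while dropping duplicates
    return list(dict.fromkeys(keys))

# Mirrors the "Method 3" example above
os.environ.update({
    "GEMINI_API_KEY": "key-1",
    "GEMINI_API_KEY_1": "key-2",
    "GEMINI_API_KEY_2": "key-3",
})
print(load_service_keys("GEMINI_API_KEY"))
# ['key-1', 'key-2', 'key-3']
```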
### Supported Services

| Service | Environment Variable Pattern | Used For |
|---------|------------------------------|----------|
| `nvidia` | `NVIDIA_API_KEY` or `NVIDIA_API_KEY_1`, `NVIDIA_API_KEY_2`, etc. | OCR processing |
| `gemini` | `GEMINI_API_KEY`, `GOOGLE_API_KEY`, or numbered variants | Question classification, Q&A extraction |
| `openrouter` | `OPENROUTER_API_KEY` or `OPENROUTER_API_KEY_1`, etc. | Amazon Nova classification |

## How It Works

### 1. Key Rotation
Keys are automatically rotated using round-robin:
```
Request 1 → Key 1
Request 2 → Key 2
Request 3 → Key 3
Request 4 → Key 1 (back to start)
```
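This round-robin selection can be sketched in a few lines. The class below is a hypothetical illustration of the idea, not the actual `api_key_manager.py` implementation:

```python
import threading

class RoundRobinPool:
    """Hypothetical sketch of thread-safe round-robin key selection."""

    def __init__(self, keys):
        self._keys = list(keys)
        self._next = 0
        self._lock = threading.Lock()  # safe under concurrent requests

    def get_key(self):
        with self._lock:
            key, index = self._keys[self._next], self._next
            self._next = (self._next + 1) % len(self._keys)
        return key, index

pool = RoundRobinPool(["key-1", "key-2", "key-3"])
print([pool.get_key()[0] for _ in range(4)])
# ['key-1', 'key-2', 'key-3', 'key-1']
```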
### 2. Failure Handling
When a key fails:
- The failure count is incremented
- After **3 consecutive failures**, the key is **blocked for 5 minutes**
- The system automatically switches to the next available key
- After the cooldown period, the key is automatically unblocked
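A minimal sketch of this block-and-cooldown logic, under the assumption that the real manager tracks more state than shown here (`KeyState` is a hypothetical name):

```python
from datetime import datetime, timedelta

class KeyState:
    """Hypothetical sketch of per-key failure tracking with a cooldown."""

    def __init__(self):
        self.failure_count = 0
        self.blocked_until = None

    def mark_failure(self, block_duration_minutes=5, now=None):
        now = now or datetime.now()
        self.failure_count += 1
        if self.failure_count >= 3:  # threshold described above
            self.blocked_until = now + timedelta(minutes=block_duration_minutes)

    def mark_success(self):
        self.failure_count = 0
        self.blocked_until = None

    def is_available(self, now=None):
        now = now or datetime.now()
        if self.blocked_until and now < self.blocked_until:
            return False
        self.blocked_until = None  # cooldown elapsed, auto-unblock
        return True
```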
### 3. Success Tracking
When a key succeeds:
- The success count is incremented
- The failure count is reset to 0
- The key is marked as available
- The system rotates to the next key for load balancing

## Usage in Code

### Automatic (Already Integrated)

The API Key Manager is already integrated into:
- ✅ `gemini_classifier.py` - Gemini question classification
- ✅ `nova_classifier.py` - Nova question classification
- ✅ `processing.py` - NVIDIA OCR API

**No code changes needed!** Just set up multiple API keys and the system handles the rest.

### Manual Usage (Advanced)

If you need to add API key management to other modules:

```python
from api_key_manager import get_api_key_manager

# Get the manager instance
manager = get_api_key_manager()

# Get an API key
api_key, key_index = manager.get_key('gemini')

if api_key:
    try:
        # Make your API call
        response = make_api_call(api_key)

        # Mark as successful
        manager.mark_success('gemini', key_index)
    except Exception:
        # Mark as failed (will block after 3 failures)
        manager.mark_failure('gemini', key_index)
else:
    print("No API keys available!")
```

## Monitoring

### Get Service Status

```python
from api_key_manager import get_api_key_manager

manager = get_api_key_manager()

# Get status for one service
status = manager.get_service_status('gemini')
print(f"Available keys: {status['available_keys']}/{status['total_keys']}")
print(f"Blocked keys: {status['blocked_keys']}")

# Get status for all services
all_status = manager.get_all_services_status()
for service, info in all_status.items():
    print(f"{service}: {info['available_keys']}/{info['total_keys']} keys available")
```

### Example Output
```json
{
  "service": "gemini",
  "available": true,
  "total_keys": 3,
  "available_keys": 2,
  "blocked_keys": 1,
  "keys": [
    {
      "index": 0,
      "is_available": true,
      "is_blocked": false,
      "failure_count": 0,
      "total_requests": 15,
      "success_rate": 100.0,
      "blocked_until": null
    },
    {
      "index": 1,
      "is_available": true,
      "is_blocked": false,
      "failure_count": 0,
      "total_requests": 12,
      "success_rate": 100.0,
      "blocked_until": null
    },
    {
      "index": 2,
      "is_available": false,
      "is_blocked": true,
      "failure_count": 3,
      "total_requests": 8,
      "success_rate": 62.5,
      "blocked_until": "2025-12-08T04:30:00.000000"
    }
  ]
}
```

## Configuration Options

### Block Duration

By default, keys are blocked for **5 minutes** after 3 failures. You can customize this:

```python
# Block for 10 minutes instead
manager.mark_failure('gemini', key_index, block_duration_minutes=10)
```

### Failure Threshold

The failure threshold is currently hardcoded to **3 consecutive failures**. This is defined in `api_key_manager.py` in the `mark_failure()` method:

```python
if self.failure_count >= 3:
    self.is_blocked = True
```

## Troubleshooting

### Problem: "No API keys available"

**Cause:** All keys are blocked, or no keys are configured.

**Solution:**
1. Check that the environment variables are set correctly
2. Wait for the cooldown period (5 minutes)
3. Manually reset the service:
   ```python
   manager.reset_service('gemini')
   ```

### Problem: Keys getting blocked frequently

**Cause:** Rate limiting or invalid API keys.

**Solution:**
1. Check API key validity
2. Verify rate limits with your API provider
3. Add more API keys to distribute load
4. Increase the block duration to avoid rapid retries

### Problem: Not using multiple keys even though they're configured

**Cause:** The keys may not be loading from the environment.

**Solution:** Verify how many keys were actually loaded:
```python
manager = get_api_key_manager()
status = manager.get_service_status('gemini')
print(f"Total keys loaded: {status['total_keys']}")
```

## Best Practices

1. **Use at least 2-3 keys per service** for better reliability
2. **Monitor success rates** to identify problematic keys
3. **Stagger API requests** to avoid hitting rate limits
4. **Keep backup keys from different accounts** if possible
5. **Test keys periodically** to ensure they're still valid

## Logging

The API Key Manager logs important events:

```
INFO: Loaded API keys: NVIDIA=2, Gemini=3, OpenRouter=2
INFO: Registered 3 API key(s) for service: gemini
DEBUG: Using API key 1/3 for gemini
DEBUG: API key 1 for gemini marked as successful
WARNING: API key 2 for gemini marked as failed
WARNING: API key for gemini blocked until 2025-12-08 04:30:00 after 3 failures
INFO: API key for gemini unblocked after cooldown period
```

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                      API Key Manager                        │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Service: nvidia     Service: gemini    Service: openrouter │
│  ┌──────────────┐    ┌──────────────┐   ┌──────────────┐    │
│  │ Key 1 ✓      │    │ Key 1 ✓      │   │ Key 1 ✓      │    │
│  │ Key 2 ✓      │    │ Key 2 ✓      │   │ Key 2 ✓      │    │
│  └──────────────┘    │ Key 3 ✗      │   └──────────────┘    │
│                      │  (blocked)   │                       │
│                      └──────────────┘                       │
│                                                             │
│  Features:                                                  │
│  • Round-robin rotation                                     │
│  • Automatic failover                                       │
│  • Failure tracking                                         │
│  • Auto-recovery after cooldown                             │
│                                                             │
└─────────────────────────────────────────────────────────────┘
         │                    │                    │
         ▼                    ▼                    ▼
   processing.py    gemini_classifier.py    nova_classifier.py
   (NVIDIA OCR)         (Gemini AI)           (Amazon Nova)
```

## Future Enhancements

Potential improvements for the API Key Manager:

- [ ] Web dashboard for monitoring key status
- [ ] Configurable failure threshold per service
- [ ] Exponential backoff for blocked keys
- [ ] API key health checks
- [ ] Cost tracking per key
- [ ] Rate limit detection and adaptive throttling
- [ ] Database persistence for key statistics
- [ ] Email alerts when all keys are blocked
- [ ] Integration with settings page for user-visible status
docs/API_USAGE_LIST.md ADDED
@@ -0,0 +1,167 @@
# API Usage Inventory

This document lists all external API integrations in the application.

## 1. NVIDIA NIM API
**Purpose:** OCR (Optical Character Recognition) for extracting text from images

**API Key Required:** `NVIDIA_API_KEY`

**Endpoints:**
- OCR: `https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-ocr-v1`
- Chat/Parser: `https://integrate.api.nvidia.com/v1/chat/completions`

**Used In:**
- `config.py` - Configuration setup
- `processing.py` - `call_nim_ocr_api()` function for OCR
- `routes.py` - Question number extraction and redaction features
- `redact.py` - Picture redaction in images
- `test.sh` - Testing script
- `templates/question_entry_v2.html` - Frontend OCR feature

**Features:**
- Automatic question number extraction from cropped images
- Text detection and OCR processing
- Image redaction for removing pictures from scanned documents

---

## 2. Google Gemini API
**Purpose:** AI-powered question classification and question-answer extraction

**API Key Required:** `GEMINI_API_KEY` or `GOOGLE_API_KEY`

**Endpoints:**
- `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-lite:generateContent`

**Used In:**
- `gemini_classifier.py` - `classify_questions_with_gemini()` - Classifies questions into subjects and chapters
- `gemini_subjective.py` - Subjective question generation
- `qtab_routes.py` - `process_image_for_questions()` - Extracts question-answer pairs from images
- `classifier_routes.py` - Auto-classification of cropped questions (via user setting)
- `neetprep.py` - NeetPrep question classification (via user setting)

**Features:**
- Automatic question classification by subject and NCERT chapter
- Question-answer pair extraction from answer key images
- Subjective question generation
- Model: `gemini-2.0-flash-lite`

---

## 3. OpenRouter API (Amazon Nova)
**Purpose:** Alternative AI model for question classification

**API Key Required:** `OPENROUTER_API_KEY`

**Endpoints:**
- `https://openrouter.ai/api/v1/chat/completions`

**Used In:**
- `nova_classifier.py` - `classify_questions_with_nova()` - Classifies questions using the Amazon Nova model
- `classifier_routes.py` - Auto-classification (when the user selects the Nova model)
- `neetprep.py` - NeetPrep question classification (when the user selects the Nova model)
- `test.py` - Testing script for the Nova API

**Features:**
- Question classification by subject and NCERT chapter
- Alternative to the Gemini classifier
- Model: `amazon/nova-2-lite-v1:free`
- User-selectable via the Settings page

---

## 4. NeetPrep GraphQL API
**Purpose:** Fetch questions and test attempts from the NeetPrep platform

**API Key Required:** None (uses session/headers)

**Endpoints:**
- `https://www.neetprep.com/graphql`

**Used In:**
- `neetprep.py` - `run_hardcoded_query()` function

**Features:**
- Fetch user test attempts
- Get incorrect question IDs
- Retrieve question details (text, options, correct answer, level, topics)
- Batch processing of questions

**GraphQL Queries:**
- `GetAttempts` - Fetch test attempts
- `GetIncorrectIds` - Get incorrect question IDs from attempts
- `GetQuestionDetails` - Retrieve full question data

---

## 5. External Resource Downloads
**Purpose:** Download files from external sources

**Used In:**
- `routes.py` - Download PDFs and images from URLs
- `cli.py` - CLI download functionality
- `utils.py` - Download the Arial font from GitHub
- `json_processor.py` / `json_processor_v3.py` - Download images from URLs for PDF generation

**Endpoints (Examples):**
- GitHub: `https://github.com/kavin808/arial.ttf/raw/refs/heads/master/arial.ttf`
- User-provided PDF/image URLs

---

## API Key Summary

| Environment Variable | Required For | Used By |
|---------------------|--------------|---------|
| `NVIDIA_API_KEY` | NVIDIA NIM OCR | processing.py, routes.py, redact.py, config.py |
| `GEMINI_API_KEY` or `GOOGLE_API_KEY` | Google Gemini AI | gemini_classifier.py, gemini_subjective.py, qtab_routes.py |
| `OPENROUTER_API_KEY` | Amazon Nova via OpenRouter | nova_classifier.py, test.py |

---

## User-Configurable API Settings

### Classifier Model Selection
**Location:** Settings page (`templates/settings.html`)

**Database Field:** `users.classifier_model`

**Options:**
1. **Gemini Classifier (Default)** - Uses the Google Gemini API
2. **Amazon Nova Lite** - Uses the OpenRouter API

**Affects:**
- `classifier_routes.py` - Auto-classification of cropped questions
- `neetprep.py` - NeetPrep question classification

Users can choose their preferred AI model for question classification based on:
- API key availability
- Model performance preferences
- Cost considerations

---

## Rate Limiting & Timeouts

### Configured Timeouts:
- NVIDIA NIM OCR: 300 seconds (5 minutes)
- Gemini API: 300 seconds (5 minutes)
- Nova API: 300 seconds (5 minutes)
- NeetPrep GraphQL: 30 seconds
- Font download: 30 seconds

### Batch Processing:
- Classifier batch size: 7 questions per batch
- Wait time between batches:
  - Classifier routes: 5 seconds
  - NeetPrep: 6 seconds
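The batching pattern above can be sketched as follows. This is an illustrative sketch: `classify_batch` is a placeholder for a caller-supplied function, not an actual function from the codebase:

```python
import time

def chunked(items, size):
    """Yield successive fixed-size batches from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def classify_in_batches(questions, classify_batch, batch_size=7, wait_seconds=5):
    """Classify questions batch by batch, pausing between batches to respect rate limits."""
    results = []
    batches = list(chunked(questions, batch_size))
    for n, batch in enumerate(batches):
        results.extend(classify_batch(batch))
        if n < len(batches) - 1:  # no need to wait after the final batch
            time.sleep(wait_seconds)
    return results
```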
---

## Notes

1. **API Key Storage:** All API keys are stored as environment variables, not in the database
2. **Error Handling:** All API calls include error handling with logging
3. **Fallback Behavior:** If APIs are unavailable, features gracefully disable with user notifications
4. **Security:** API keys are never exposed in templates or client-side code
docs/API_v3.md ADDED
@@ -0,0 +1,197 @@
1
+ # JSON Upload API v3.0
2
+
3
+ ## Endpoint
4
+ `POST /json_upload_v3`
5
+
6
+ ## Description
7
+ This endpoint allows PWDLV3 (or any compatible client) to submit test data in a standardized JSON v3.0 format to the Report-Generator. The data includes test metadata, configuration for PDF generation, and detailed information about each question, including image URLs. Report-Generator will validate the schema, download images in parallel, store the data, and optionally generate a PDF or provide an edit URL.
8
+
9
+ ## Headers
10
+ * `Content-Type: application/json` (Required)
11
+ * `Authorization: Bearer <token>` (Optional, if authentication is enabled on Report-Generator)
12
+
13
+ ## Request Body
14
+ The request body must be a JSON object conforming to the following schema:
15
+
16
+ ```json
17
+ {
18
+ "type": "object",
19
+ "properties": {
20
+ "version": {
21
+ "type": "string",
22
+ "const": "3.0",
23
+ "description": "API version, must be '3.0'"
24
+ },
25
+ "source": {
26
+ "type": "string",
27
+ "description": "Source of the data, e.g., 'pwdlv3'",
28
+ "default": "manual"
29
+ },
30
+ "test_name": {
31
+ "type": "string",
32
+ "description": "Name of the test"
33
+ },
34
+ "test_id": {
35
+ "type": "string",
36
+ "description": "Unique ID of the test from the source system"
37
+ },
38
+ "test_mapping_id": {
39
+ "type": "string",
40
+ "description": "Unique ID for mapping purposes, often same as test_id or a derivative"
41
+ },
42
+ "metadata": {
43
+ "type": "object",
44
+ "patternProperties": {
45
+ ".*": { "type": "string" }
46
+ },
47
+ "description": "Arbitrary key-value metadata for the session"
48
+ },
49
+ "config": {
50
+ "type": "object",
51
+ "properties": {
52
+ "statuses_to_include": {
53
+ "type": "array",
54
+ "items": { "type": "string", "enum": ["wrong", "unattempted", "correct"] },
55
+ "description": "Question statuses to include in generated reports"
56
+ },
57
+ "layout": {
58
+ "type": "object",
59
+ "properties": {
60
+ "images_per_page": { "type": "integer", "minimum": 1 },
61
+ "orientation": { "type": "string", "enum": ["portrait", "landscape"] }
62
+ },
63
+ "required": ["images_per_page", "orientation"]
64
+ }
65
+ },
66
+ "required": ["statuses_to_include", "layout"]
67
+ },
68
+ "questions": {
69
+ "type": "array",
70
+ "items": {
71
+ "type": "object",
72
+ "properties": {
73
+ "question_number": { "type": "string", "description": "Display number for the question" },
74
+ "image_url": { "type": "string", "format": "uri", "description": "URL of the question image" },
75
+ "status": { "type": "string", "enum": ["wrong", "unattempted", "correct"], "description": "User's attempt status" },
76
+ "marked_solution": { "type": "string", "description": "User's marked option/answer" },
77
+ "correct_solution": { "type": "string", "description": "Correct option/answer" },
78
+ "subject": { "type": "string", "description": "Subject of the question" },
79
+ "chapter": { "type": "string", "description": "Chapter of the question" },
80
+ "topic": { "type": "string", "description": "Topic of the question" },
81
+ "time_taken": { "type": "integer", "minimum": 0, "description": "Time taken by user in seconds" }
82
+ },
83
+ "required": ["question_number", "image_url", "status", "marked_solution", "correct_solution", "subject", "time_taken"]
84
+ },
85
+ "minItems": 1
86
+ },
87
+ "view": {
88
+ "type": "boolean",
89
+ "description": "If true, Report-Generator will auto-generate PDF; if false, returns edit URL."
90
+ }
91
+ },
92
+ "required": ["version", "source", "test_name", "test_id", "test_mapping_id", "config", "questions", "view"]
93
+ }
94
+ ```
95
+
96
+ ## Response
97
+
98
+ ### Success Response (HTTP 200 OK)
99
+ ```json
100
+ {
101
+ "status": "success",
102
+ "message": "Data processed successfully",
103
+ "session_id": "uuid-of-new-session",
104
+ "edit_url": "/question_entry_v2/uuid-of-new-session",
105
+ "pdf_url": "/view_pdf/uuid-of-new-session.pdf" // Only if 'view' was true
106
+ }
107
+ ```
108
+
109
+ ### Error Response (HTTP 400 Bad Request / 500 Internal Server Error)
110
+ ```json
111
+ {
112
+ "status": "error",
113
+ "message": "Detailed error description, e.g., 'Schema validation failed: Missing required field test_id'",
114
+ "errors": [...] // Optional: specific validation errors
115
+ }
116
+ ```
117
+
118
+ ## Examples
119
+
120
+ ### Curl Example: Submit Test Data for Manual Review
121
+ This example sends a minimal payload for a single test, opting for manual review in Report-Generator (i.e., `view: false`).
122
+
123
+ ```bash
124
+ curl -X POST "http://localhost:5000/json_upload_v3" \
125
+ -H "Content-Type: application/json" \
126
+ -d '{
127
+ "version": "3.0",
128
+ "source": "pwdlv3",
129
+ "test_name": "Physics Midterm",
130
+ "test_id": "PHY101-MID-2024",
131
+ "test_mapping_id": "PHY101-MID-2024-STUDENT001",
132
+ "metadata": {
133
+ "student_id": "STU001",
134
+ "attempt_date": "2024-11-01"
135
+ },
136
+ "config": {
137
+ "statuses_to_include": ["wrong", "unattempted"],
138
+ "layout": { "images_per_page": 4, "orientation": "portrait" }
139
+ },
140
+ "questions": [
141
+ {
142
+ "question_number": "1",
143
+ "image_url": "https://example.com/question1.png",
144
+ "status": "wrong",
145
+ "marked_solution": "B",
146
+ "correct_solution": "C",
147
+ "subject": "Physics",
148
+ "time_taken": 90
149
+ },
150
+ {
151
+ "question_number": "2",
152
+ "image_url": "https://example.com/question2.png",
153
+ "status": "unattempted",
154
+ "marked_solution": "",
155
+ "correct_solution": "A",
156
+ "subject": "Physics",
157
+ "time_taken": 0
158
+ }
159
+ ],
160
+ "view": false
161
+ }'
162
+ ```
163
+
164
+ ### Curl Example: Submit Test Data for Auto-PDF Generation
165
+ This example sends a similar payload but instructs Report-Generator to automatically generate and save the PDF report (`view: true`).
166
+
167
+ ```bash
168
+ curl -X POST "http://localhost:5000/json_upload_v3" \
169
+ -H "Content-Type: application/json" \
170
+ -d '{
171
+ "version": "3.0",
172
+ "source": "pwdlv3",
173
+ "test_name": "Chemistry Final",
174
+ "test_id": "CHM202-FIN-2024",
175
+ "test_mapping_id": "CHM202-FIN-2024-STUDENT002",
176
+ "metadata": {
177
+ "student_id": "STU002",
178
+ "attempt_date": "2024-12-05"
179
+ },
180
+ "config": {
181
+ "statuses_to_include": ["wrong", "unattempted"],
182
+ "layout": { "images_per_page": 6, "orientation": "landscape" }
183
+ },
184
+ "questions": [
185
+ {
186
+ "question_number": "1",
187
+ "image_url": "https://example.com/chem_q1.png",
188
+ "status": "wrong",
189
+ "marked_solution": "D",
190
+ "correct_solution": "B",
191
+ "subject": "Chemistry",
192
+ "time_taken": 110
193
+ }
194
+ ],
195
+ "view": true
196
+ }'
197
+ ```
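For programmatic clients, the same request can be built in Python. This sketch only constructs the payload and sanity-checks it locally before sending; `validate_payload` is a hypothetical helper (the server performs the authoritative schema validation), and the commented-out `requests.post` call assumes the `requests` package and the local endpoint used above.

```python
import json

REQUIRED_TOP_LEVEL = {"version", "source", "test_name", "test_id",
                      "test_mapping_id", "config", "questions", "view"}
REQUIRED_QUESTION = {"question_number", "image_url", "status",
                     "marked_solution", "correct_solution", "subject", "time_taken"}

def validate_payload(payload: dict) -> list:
    """Return a list of schema problems (empty list means the payload looks valid)."""
    errors = [f"missing field: {f}" for f in REQUIRED_TOP_LEVEL - payload.keys()]
    if payload.get("version") != "3.0":
        errors.append("version must be '3.0'")
    for i, q in enumerate(payload.get("questions", [])):
        errors += [f"question {i}: missing {f}" for f in REQUIRED_QUESTION - q.keys()]
    return errors

payload = {
    "version": "3.0",
    "source": "pwdlv3",
    "test_name": "Physics Midterm",
    "test_id": "PHY101-MID-2024",
    "test_mapping_id": "PHY101-MID-2024-STUDENT001",
    "config": {"statuses_to_include": ["wrong"],
               "layout": {"images_per_page": 4, "orientation": "portrait"}},
    "questions": [{"question_number": "1",
                   "image_url": "https://example.com/question1.png",
                   "status": "wrong", "marked_solution": "B",
                   "correct_solution": "C", "subject": "Physics",
                   "time_taken": 90}],
    "view": False,
}

assert validate_payload(payload) == []
body = json.dumps(payload)
# import requests
# resp = requests.post("http://localhost:5000/json_upload_v3", data=body,
#                      headers={"Content-Type": "application/json"})
```

Checking required fields client-side gives faster feedback than waiting for the server's 400 response, but it does not replace the server-side validation.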
docs/SETUP_MULTIPLE_API_KEYS.md ADDED
@@ -0,0 +1,330 @@
1
+ # How to Use Multiple API Keys
2
+
3
+ ## Quick Setup Guide
4
+
5
+ ### Step 1: Set Environment Variables
6
+
7
+ You can set multiple API keys. The system loads them in order:
8
+ - `GEMINI_API_KEY` → **Index 0** (base key)
9
+ - `GEMINI_API_KEY_1` → **Index 1**
10
+ - `GEMINI_API_KEY_2` → **Index 2**
11
+ - And so on...
12
+
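A minimal sketch of this loading order (the hypothetical `load_keys` helper below mirrors the ordering described; the app's actual loader lives in `api_key_manager.py`):

```python
import os

def load_keys(service, env=os.environ):
    """Collect keys in index order: SERVICE_API_KEY, then SERVICE_API_KEY_1, _2, ...

    Stops at the first missing numbered index, matching the
    "loaded in order" behavior described above.
    """
    keys = []
    base = env.get(f"{service}_API_KEY")
    if base:
        keys.append(base)  # index 0 (base key)
    i = 1
    while env.get(f"{service}_API_KEY_{i}"):
        keys.append(env[f"{service}_API_KEY_{i}"])
        i += 1
    return keys
```

Because the loader stops at the first gap, keep the numbered suffixes contiguous (`_1`, `_2`, `_3`, ...) rather than skipping indices.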
13
+ #### For Gemini API (Google AI)
14
+
15
+ ```bash
16
+ # Linux/Mac
17
+ export GEMINI_API_KEY="AIzaSyAbc123..." # Index 0 (base)
18
+ export GEMINI_API_KEY_1="AIzaSyDef456..." # Index 1
19
+ export GEMINI_API_KEY_2="AIzaSyGhi789..." # Index 2
20
+ export GEMINI_API_KEY_3="AIzaSyJkl012..." # Index 3
21
+
22
+ # Windows (Command Prompt)
23
+ set GEMINI_API_KEY_1=AIzaSyAbc123...
24
+ set GEMINI_API_KEY_2=AIzaSyDef456...
25
+ set GEMINI_API_KEY_3=AIzaSyGhi789...
26
+
27
+ # Windows (PowerShell)
28
+ $env:GEMINI_API_KEY_1="AIzaSyAbc123..."
29
+ $env:GEMINI_API_KEY_2="AIzaSyDef456..."
30
+ $env:GEMINI_API_KEY_3="AIzaSyGhi789..."
31
+ ```
32
+
33
+ #### For NVIDIA API
34
+
35
+ ```bash
36
+ # Linux/Mac
37
+ export NVIDIA_API_KEY_1="nvapi-abc123..."
38
+ export NVIDIA_API_KEY_2="nvapi-def456..."
39
+ export NVIDIA_API_KEY_3="nvapi-ghi789..."
40
+
41
+ # Windows (Command Prompt)
42
+ set NVIDIA_API_KEY_1=nvapi-abc123...
43
+ set NVIDIA_API_KEY_2=nvapi-def456...
44
+
45
+ # Windows (PowerShell)
46
+ $env:NVIDIA_API_KEY_1="nvapi-abc123..."
47
+ $env:NVIDIA_API_KEY_2="nvapi-def456..."
48
+ ```
49
+
50
+ #### For OpenRouter API (Amazon Nova)
51
+
52
+ ```bash
53
+ # Linux/Mac
54
+ export OPENROUTER_API_KEY_1="sk-or-v1-abc123..."
55
+ export OPENROUTER_API_KEY_2="sk-or-v1-def456..."
56
+ export OPENROUTER_API_KEY_3="sk-or-v1-ghi789..."
57
+
58
+ # Windows (Command Prompt)
59
+ set OPENROUTER_API_KEY_1=sk-or-v1-abc123...
60
+ set OPENROUTER_API_KEY_2=sk-or-v1-def456...
61
+
62
+ # Windows (PowerShell)
63
+ $env:OPENROUTER_API_KEY_1="sk-or-v1-abc123..."
64
+ $env:OPENROUTER_API_KEY_2="sk-or-v1-def456..."
65
+ ```
66
+
67
+ ### Step 2: Using .env File (Recommended - Already Configured!)
68
+
69
+ ✅ **Good news:** The app already has .env support built-in!
70
+
71
+ Just create a `.env` file in your project root:
72
+
73
+ ```bash
74
+ # .env file
75
+ # Gemini API Keys (get from: https://aistudio.google.com/app/apikey)
76
+ GEMINI_API_KEY=AIzaSyAbc123... # Index 0 (base key)
77
+ GEMINI_API_KEY_1=AIzaSyDef456... # Index 1
78
+ GEMINI_API_KEY_2=AIzaSyGhi789... # Index 2
79
+
80
+ # NVIDIA API Keys (get from: https://build.nvidia.com/)
81
+ NVIDIA_API_KEY=nvapi-abc123... # Index 0 (base key)
82
+ NVIDIA_API_KEY_1=nvapi-def456... # Index 1
83
+
84
+ # OpenRouter API Keys (get from: https://openrouter.ai/keys)
85
+ OPENROUTER_API_KEY=sk-or-v1-abc123... # Index 0 (base key)
86
+ OPENROUTER_API_KEY_1=sk-or-v1-def456... # Index 1
87
+ ```
88
+
89
+ **That's it!** Just run the app normally:
90
+
91
+ ```bash
92
+ python3 run.py
93
+ ```
94
+
95
+ The .env file is automatically loaded. No extra steps needed!
96
+
97
+ **Quick Start:**
98
+ ```bash
99
+ # 1. Copy the example file
100
+ cp .env.example .env
101
+
102
+ # 2. Edit .env and add your API keys
103
+ nano .env
104
+
105
+ # 3. Run the app
106
+ python3 run.py
107
+ ```
108
+
109
+ ### Step 3: Verify Keys Are Loaded
110
+
111
+ Run this to check if your keys are loaded correctly:
112
+
113
+ ```python
114
+ python3 -c "
115
+ from api_key_manager import get_api_key_manager
116
+
117
+ manager = get_api_key_manager()
118
+ status = manager.get_all_services_status()
119
+
120
+ for service, info in status.items():
121
+ print(f'{service.upper()}: {info[\"total_keys\"]} key(s) loaded')
122
+ "
123
+ ```
124
+
125
+ Expected output:
126
+ ```
127
+ NVIDIA: 2 key(s) loaded
128
+ GEMINI: 3 key(s) loaded
129
+ OPENROUTER: 2 key(s) loaded
130
+ ```
131
+
132
+ ---
133
+
134
+ ## How It Works
135
+
136
+ ### Automatic Rotation
137
+
138
+ Once you have multiple keys configured, the system automatically:
139
+
140
+ 1. **Rotates** through them (round-robin)
141
+ 2. **Fails over** when one key fails
142
+ 3. **Blocks** keys that fail 3 times (for 5 minutes)
143
+ 4. **Unblocks** keys after cooldown
144
+ 5. **Tracks** success rates for each key
145
+
146
+ ### Example Flow
147
+
148
+ Let's say you have 3 Gemini keys configured:
149
+
150
+ ```
151
+ Request 1 → Uses Key 1 ✓
152
+ Request 2 → Uses Key 2 ✓
153
+ Request 3 → Uses Key 3 ✓
154
+ Request 4 → Uses Key 1 ✓ (rotation back to start)
155
+ Request 5 → Uses Key 2 ✗ (fails - rate limit)
156
+ Request 6 → Uses Key 3 ✓ (automatically switched)
157
+ Request 7 → Uses Key 1 ✓
158
+ Request 8 → Uses Key 2 ✗ (fails again - 2nd failure)
159
+ Request 9 → Uses Key 3 ✓ (automatically switched)
160
+ Request 10 → Uses Key 2 ✗ (fails again - 3rd failure, BLOCKED)
161
+ Request 11 → Uses Key 3 ✓ (Key 2 is skipped)
162
+ Request 12 → Uses Key 1 ✓
163
+ Request 13 → Uses Key 3 ✓ (Key 2 still blocked)
164
+ ... 5 minutes later ...
165
+ Request N → Uses Key 2 ✓ (unblocked and back in rotation)
166
+ ```
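The rotation and blocking behavior above can be sketched roughly like this. The class and method names are hypothetical (the real logic lives in `api_key_manager.py`), but the thresholds match the documented 3-failure / 5-minute cooldown:

```python
import time

class KeyRotator:
    def __init__(self, keys, max_failures=3, cooldown=300):
        self.keys = list(keys)
        self.max_failures = max_failures
        self.cooldown = cooldown  # seconds a blocked key sits out
        self.failures = {k: 0 for k in self.keys}
        self.blocked_until = {k: 0.0 for k in self.keys}
        self._i = 0

    def next_key(self):
        """Round-robin over keys, skipping any still inside their cooldown."""
        now = time.time()
        for _ in range(len(self.keys)):
            key = self.keys[self._i % len(self.keys)]
            self._i += 1
            if self.blocked_until[key] <= now:
                return key
        raise RuntimeError("all keys are currently blocked")

    def report_failure(self, key):
        self.failures[key] += 1
        if self.failures[key] >= self.max_failures:
            self.blocked_until[key] = time.time() + self.cooldown
            self.failures[key] = 0  # reset counter once the key is blocked

    def report_success(self, key):
        self.failures[key] = 0
```

Once a key's `blocked_until` timestamp passes, `next_key` starts handing it out again, which is the "unblocked and back in rotation" step in the flow above.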
167
+
168
+ ---
169
+
170
+ ## Getting API Keys
171
+
172
+ ### Gemini API (Google AI)
173
+ 1. Go to https://aistudio.google.com/app/apikey
174
+ 2. Click "Create API Key"
175
+ 3. Copy the key (starts with `AIzaSy...`)
176
+ 4. **Tip:** Create multiple keys from different Google accounts for more quota
177
+
178
+ ### NVIDIA API
179
+ 1. Go to https://build.nvidia.com/
180
+ 2. Sign in and navigate to API Keys
181
+ 3. Generate a new API key
182
+ 4. Copy the key (starts with `nvapi-...`)
183
+
184
+ ### OpenRouter API
185
+ 1. Go to https://openrouter.ai/keys
186
+ 2. Sign up and create an API key
187
+ 3. Copy the key (starts with `sk-or-v1-...`)
188
+ 4. **Tip:** OpenRouter offers free credits for the Nova model
189
+
190
+ ---
191
+
192
+ ## Common Scenarios
193
+
194
+ ### Scenario 1: Maximize Free Tier Usage
195
+
196
+ If you have multiple Google accounts, create one Gemini API key from each:
197
+
198
+ ```bash
199
+ export GEMINI_API_KEY="key-from-account-1" # Index 0
200
+ export GEMINI_API_KEY_1="key-from-account-2" # Index 1
201
+ export GEMINI_API_KEY_2="key-from-account-3" # Index 2
202
+ export GEMINI_API_KEY_3="key-from-account-4" # Index 3
203
+ ```
204
+
205
+ This gives you 4x the free tier quota!
206
+
207
+ ### Scenario 2: Paid + Free Keys
208
+
209
+ Mix paid and free keys:
210
+
211
+ ```bash
212
+ export GEMINI_API_KEY="paid-key-with-high-quota" # Index 0 - tried first
213
+ export GEMINI_API_KEY_1="free-key-1" # Index 1 - backup
214
+ export GEMINI_API_KEY_2="free-key-2" # Index 2 - backup
215
+ ```
216
+
217
+ The system will rotate through all of them, maximizing your available quota.
218
+
219
+ ### Scenario 3: Single Key (Backward Compatible)
220
+
221
+ If you only have one key, the old method still works:
222
+
223
+ ```bash
224
+ export GEMINI_API_KEY="your-single-key"
225
+ ```
226
+
227
+ The system will use this single key without rotation.
228
+
229
+ ---
230
+
231
+ ## Troubleshooting
232
+
233
+ ### Problem: Keys not being loaded
234
+
235
+ **Check:**
236
+ 1. Environment variables are set in the same terminal/session where you run the app
237
+ 2. Variable names match exactly (case-sensitive)
238
+ 3. No extra spaces in variable values
239
+
240
+ **Test:**
241
+ ```bash
242
+ # Linux/Mac
243
+ echo $GEMINI_API_KEY_1
244
+ echo $GEMINI_API_KEY_2
245
+
246
+ # Windows (Command Prompt)
247
+ echo %GEMINI_API_KEY_1%
248
+ echo %GEMINI_API_KEY_2%
249
+
250
+ # Windows (PowerShell)
251
+ echo $env:GEMINI_API_KEY_1
252
+ echo $env:GEMINI_API_KEY_2
253
+ ```
254
+
255
+ ### Problem: Only first key is being used
256
+
257
+ **Likely cause:** Other keys aren't set properly.
258
+
259
+ **Fix:** Verify all keys are loaded:
260
+ ```python
261
+ import os
262
+ print("Key 1:", os.environ.get('GEMINI_API_KEY_1'))
263
+ print("Key 2:", os.environ.get('GEMINI_API_KEY_2'))
264
+ print("Key 3:", os.environ.get('GEMINI_API_KEY_3'))
265
+ ```
266
+
267
+ ### Problem: All keys get blocked quickly
268
+
269
+ **Causes:**
270
+ - Invalid API keys
271
+ - Insufficient quota/rate limits
272
+ - API service issues
273
+
274
+ **Fix:**
275
+ 1. Verify each key works individually
276
+ 2. Check your quota limits with the API provider
277
+ 3. Add more keys to distribute the load
278
+ 4. Increase wait times between requests
279
+
280
+ ---
281
+
282
+ ## Best Practices
283
+
284
+ ✅ **Use at least 2-3 keys per service** for reliability
285
+ ✅ **Get keys from different accounts** to maximize free tier
286
+ ✅ **Keep backup keys** from different providers if possible
287
+ ✅ **Monitor key usage** to identify which ones work best
288
+ ✅ **Store keys securely** in .env file (add to .gitignore)
289
+ ✅ **Don't commit keys to git** - use environment variables
290
+ ✅ **Rotate keys periodically** for security
291
+
292
+ ---
293
+
294
+ ## Advanced: Persistent Configuration
295
+
296
+ To make environment variables persist across reboots:
297
+
298
+ ### Linux/Mac - Add to ~/.bashrc or ~/.zshrc:
299
+ ```bash
300
+ export GEMINI_API_KEY_1="..."
301
+ export GEMINI_API_KEY_2="..."
302
+ export GEMINI_API_KEY_3="..."
303
+ ```
304
+
305
+ Then reload: `source ~/.bashrc`
306
+
307
+ ### Windows - Use System Environment Variables:
308
+ 1. Search for "Environment Variables" in Start Menu
309
+ 2. Click "Edit system environment variables"
310
+ 3. Click "Environment Variables" button
311
+ 4. Add your keys under "User variables"
312
+
313
+ ---
314
+
315
+ ## Summary
316
+
317
+ **To use multiple API keys:**
318
+
319
+ 1. **Set numbered environment variables:**
320
+ - `GEMINI_API_KEY_1`, `GEMINI_API_KEY_2`, etc.
321
+ - `NVIDIA_API_KEY_1`, `NVIDIA_API_KEY_2`, etc.
322
+ - `OPENROUTER_API_KEY_1`, `OPENROUTER_API_KEY_2`, etc.
323
+
324
+ 2. **That's it!** The system automatically:
325
+ - Loads all keys
326
+ - Rotates through them
327
+ - Handles failures
328
+ - Maximizes availability
329
+
330
+ No code changes needed - just set the environment variables and the API Key Manager handles everything!
drive_routes.py ADDED
@@ -0,0 +1,359 @@
1
+ import os
2
+ import shutil
3
+ import gdown
4
+ from flask import Blueprint, render_template, request, jsonify, current_app, send_from_directory, url_for, redirect, session
5
+ from flask_login import login_required, current_user
6
+ from database import get_db_connection
7
+ from datetime import datetime
8
+ import threading
9
+ import re
10
+ import json
11
+
12
+ # Allow OAuth over HTTP for local testing
13
+ os.environ['OAUTHLIB_INSECURE_TRANSPORT'] = '1'
14
+
15
+ from gdrive_service import get_drive_service, create_flow, list_drive_files, download_file_to_stream, get_file_metadata
16
+
17
+ drive_bp = Blueprint('drive', __name__)
18
+
19
+ DRIVE_SYNC_FOLDER = 'drive_sync'
20
+
21
+ def extract_drive_id(url):
22
+ # Extracts Drive ID (File or Folder) - simplified regex for ~25+ chars
23
+ match = re.search(r'[-\w]{25,}', url)
24
+ return match.group(0) if match else None
25
+
26
+ def get_sync_folder_path(source_name=None):
27
+ base = os.path.join(current_app.config['OUTPUT_FOLDER'], DRIVE_SYNC_FOLDER)
28
+ if not os.path.exists(base):
29
+ os.makedirs(base)
30
+ if source_name:
31
+ path = os.path.join(base, source_name)
32
+ if not os.path.exists(path):
33
+ os.makedirs(path)
34
+ return path
35
+ return base
36
+
37
+ @drive_bp.route('/drive_manager')
38
+ @login_required
39
+ def drive_manager():
40
+ conn = get_db_connection()
41
+ sources = conn.execute('SELECT * FROM drive_sources WHERE user_id = ? ORDER BY created_at DESC', (current_user.id,)).fetchall()
42
+
43
+ # Get last 4 opened PDFs
44
+ recent_pdfs = conn.execute('''
45
+ SELECT file_id, filename, opened_at
46
+ FROM pdf_access_history
47
+ WHERE user_id = ?
48
+ ORDER BY opened_at DESC
49
+ LIMIT 4
50
+ ''', (current_user.id,)).fetchall()
51
+
52
+ conn.close()
53
+
54
+ # Check Drive API Status
55
+ drive_connected = bool(current_user.google_token)
56
+
57
+ return render_template('drive_manager.html',
58
+ sources=[dict(s) for s in sources],
59
+ drive_connected=drive_connected,
60
+ recent_pdfs=[dict(p) for p in recent_pdfs])
61
+
62
+ @drive_bp.route('/drive/connect')
63
+ @login_required
64
+ def connect_drive():
65
+ try:
66
+ redirect_uri = 'http://localhost'
67
+ flow = create_flow(redirect_uri)
68
+ authorization_url, state = flow.authorization_url(
69
+ access_type='offline',
70
+ include_granted_scopes='true')
71
+ session['oauth_state'] = state
72
+ return render_template('drive_connect_manual.html', auth_url=authorization_url)
73
+ except FileNotFoundError:
74
+ return "client_secret.json not found. Please upload it to the app root via Settings.", 404
75
+ except Exception as e:
76
+ return f"Error creating flow: {e}", 500
77
+
78
+ @drive_bp.route('/drive/manual_callback', methods=['POST'])
79
+ @login_required
80
+ def manual_callback():
81
+ state = session.get('oauth_state')
82
+ full_url = request.form.get('full_url')
83
+ if not full_url: return "URL is required", 400
84
+ try:
85
+ redirect_uri = 'http://localhost'
86
+ flow = create_flow(redirect_uri)
87
+ flow.fetch_token(authorization_response=full_url)
88
+ credentials = flow.credentials
89
+ token_json = credentials.to_json()
90
+ conn = get_db_connection()
91
+ conn.execute('UPDATE users SET google_token = ? WHERE id = ?', (token_json, current_user.id))
92
+ conn.commit()
93
+ conn.close()
94
+ current_user.google_token = token_json
95
+ return redirect(url_for('drive.drive_manager'))
96
+ except Exception as e:
97
+ return f"Auth failed: {e}<br><br>Make sure you copied the full URL correctly.", 500
98
+
99
+ @drive_bp.route('/oauth2callback')
100
+ def oauth2callback():
101
+ state = session.get('oauth_state')
102
+ if not state: return "Invalid state", 400
103
+ try:
104
+ redirect_uri = url_for('drive.oauth2callback', _external=True)
105
+ flow = create_flow(redirect_uri)
106
+ flow.fetch_token(authorization_response=request.url)
107
+ credentials = flow.credentials
108
+ token_json = credentials.to_json()
109
+ conn = get_db_connection()
110
+ conn.execute('UPDATE users SET google_token = ? WHERE id = ?', (token_json, current_user.id))
111
+ conn.commit()
112
+ conn.close()
113
+ current_user.google_token = token_json
114
+ return redirect(url_for('drive.drive_manager'))
115
+ except Exception as e:
116
+ return f"Auth failed: {e}", 500
117
+
118
+ @drive_bp.route('/drive/add', methods=['POST'])
119
+ @login_required
120
+ def add_source():
121
+ name = request.form.get('name')
122
+ url = request.form.get('url')
123
+ if not name or not url: return jsonify({'error': 'Name and URL required'}), 400
124
+ conn = get_db_connection()
125
+ try:
126
+ source_type = 'file'
127
+ if '/folders/' in url or 'drive/folders' in url: source_type = 'folder'
128
+ local_path = name.strip().replace(' ', '_')
129
+ conn.execute('INSERT INTO drive_sources (name, url, local_path, user_id, source_type) VALUES (?, ?, ?, ?, ?)',
130
+ (name, url, local_path, current_user.id, source_type))
131
+ conn.commit()
132
+ return jsonify({'success': True})
133
+ except Exception as e:
134
+ return jsonify({'error': str(e)}), 500
135
+ finally:
136
+ conn.close()
137
+
138
+ @drive_bp.route('/drive/delete/<int:id>', methods=['POST'])
139
+ @login_required
140
+ def delete_source(id):
141
+ conn = get_db_connection()
142
+ source = conn.execute('SELECT * FROM drive_sources WHERE id = ?', (id,)).fetchone()
143
+ if not source or source['user_id'] != current_user.id:
144
+ conn.close()
145
+ return jsonify({'error': 'Unauthorized'}), 403
146
+ conn.execute('DELETE FROM drive_sources WHERE id = ?', (id,))
147
+ conn.commit()
148
+ conn.close()
149
+ try:
150
+ path = get_sync_folder_path(source['local_path'])
151
+ if os.path.exists(path): shutil.rmtree(path)
152
+ except Exception as e:
153
+ print(f"Error deleting folder: {e}")
154
+ return jsonify({'success': True})
155
+
156
+ def sync_task(source_id, user_id, app_config):
157
+ import sqlite3
158
+ conn = sqlite3.connect('database.db')
159
+ conn.row_factory = sqlite3.Row
160
+ try:
161
+ source = conn.execute('SELECT * FROM drive_sources WHERE id = ?', (source_id,)).fetchone()
162
+ if not source: return
163
+ output_base = os.path.join(app_config['OUTPUT_FOLDER'], DRIVE_SYNC_FOLDER, source['local_path'])
164
+ if not os.path.exists(output_base): os.makedirs(output_base)
165
+ print(f"Syncing Drive: {source['name']} to {output_base}")
166
+ try:
167
+ gdown.download_folder(url=source['url'], output=output_base, quiet=False, use_cookies=False)
168
+ conn.execute('UPDATE drive_sources SET last_synced = CURRENT_TIMESTAMP WHERE id = ?', (source_id,))
169
+ conn.commit()
170
+ print("Sync complete.")
171
+ except Exception as e:
172
+ print(f"GDown Error: {e}")
173
+ except Exception as e:
174
+ print(f"Sync Task Error: {e}")
175
+ finally:
176
+ conn.close()
177
+
178
+ @drive_bp.route('/drive/sync/<int:id>', methods=['POST'])
179
+ @login_required
180
+ def sync_source(id):
181
+ conn = get_db_connection()
182
+ source = conn.execute('SELECT * FROM drive_sources WHERE id = ?', (id,)).fetchone()
183
+ conn.close()
184
+ if not source or source['user_id'] != current_user.id: return jsonify({'error': 'Unauthorized'}), 403
185
+ thread = threading.Thread(target=sync_task, args=(id, current_user.id, current_app.config.copy()))
186
+ thread.start()
187
+ return jsonify({'success': True, 'message': 'Sync started in background'})
188
+
189
+ @drive_bp.route('/drive/browse/<int:source_id>')
190
+ @drive_bp.route('/drive/browse/<int:source_id>/<path:subpath>')
191
+ @login_required
192
+ def browse_drive(source_id, subpath=''):
193
+ conn = get_db_connection()
194
+ source = conn.execute('SELECT * FROM drive_sources WHERE id = ?', (source_id,)).fetchone()
195
+ conn.close()
196
+
197
+ if not source or source['user_id'] != current_user.id: return "Unauthorized", 403
198
+
199
+ # === API Upgrade Logic ===
200
+ if current_user.google_token and not subpath:
201
+ drive_id = extract_drive_id(source['url'])
202
+ if drive_id:
203
+ # Pass source name as title
204
+ return redirect(url_for('drive.browse_drive_api', folder_id=drive_id, title=source['name']))
205
+ # =========================
206
+
207
+ base_path = get_sync_folder_path(source['local_path'])
208
+ current_path = os.path.join(base_path, subpath)
209
+
210
+ if not os.path.exists(current_path):
211
+ if source['source_type'] == 'file': pass
212
+ else: return "Path not found (Not synced yet). Click Sync Now in Manager.", 404
213
+
214
+ items = []
215
+ if os.path.exists(current_path):
216
+ try:
217
+ for entry in os.scandir(current_path):
218
+ is_dir = entry.is_dir()
219
+ file_type = 'file'
220
+ if is_dir: file_type = 'folder'
221
+ elif entry.name.lower().endswith('.pdf'): file_type = 'pdf'
222
+ elif entry.name.lower().endswith(('.png', '.jpg', '.jpeg')): file_type = 'image'
223
+
224
+ items.append({
225
+ 'name': entry.name,
226
+ 'type': file_type,
227
+ 'path': os.path.join(subpath, entry.name).strip('/')
228
+ })
229
+ except Exception as e: return f"Error listing files: {e}", 500
230
+
231
+ items.sort(key=lambda x: (x['type'] != 'folder', x['name'].lower()))
232
+
233
+ if not items and source['source_type'] == 'file':
234
+ items.append({'name': 'Tap to Download & View', 'type': 'pdf', 'path': 'document.pdf'})
235
+
236
+ breadcrumbs = []
237
+ if subpath:
238
+ parts = subpath.split('/')
239
+ built = ''
240
+ for part in parts:
241
+ built = os.path.join(built, part).strip('/')
242
+ breadcrumbs.append({'name': part, 'path': built})
243
+
244
+ return render_template('drive_browser.html', source=source, items=items, breadcrumbs=breadcrumbs, current_subpath=subpath)
245
+
246
+ @drive_bp.route('/drive/file/<int:source_id>/<path:filepath>')
247
+ @login_required
248
+ def view_drive_file(source_id, filepath):
249
+ conn = get_db_connection()
250
+ source = conn.execute('SELECT * FROM drive_sources WHERE id = ?', (source_id,)).fetchone()
251
+ conn.close()
252
+ if not source or source['user_id'] != current_user.id: return "Unauthorized", 403
253
+ base_path = get_sync_folder_path(source['local_path'])
254
+ full_path = os.path.join(base_path, filepath)
255
+
256
+ if not os.path.exists(full_path) and source['source_type'] == 'file':
257
+ try:
258
+ if not os.path.exists(base_path): os.makedirs(base_path)
259
+ gdown.download(url=source['url'], output=full_path, quiet=False, fuzzy=True)
260
+ except Exception as e: return f"Error downloading file: {e}", 500
261
+
262
+ if not os.path.exists(full_path): return "File not found.", 404
263
+ if full_path.lower().endswith('.pdf'):
264
+ file_url = url_for('drive.serve_drive_file', source_id=source_id, filepath=os.path.basename(full_path))
265
+ return render_template('pdfjs_viewer.html', pdf_url=file_url, pdf_title=os.path.basename(full_path))
266
+ return send_from_directory(os.path.dirname(full_path), os.path.basename(full_path))
267
+
268
+ @drive_bp.route('/drive/raw/<int:source_id>/<path:filepath>')
269
+ @login_required
270
+ def serve_drive_file(source_id, filepath):
271
+ conn = get_db_connection()
272
+ source = conn.execute('SELECT * FROM drive_sources WHERE id = ?', (source_id,)).fetchone()
273
+ conn.close()
274
+ if not source or source['user_id'] != current_user.id: return "Unauthorized", 403
275
+ base_path = get_sync_folder_path(source['local_path'])
276
+    return send_from_directory(base_path, filepath)
+
+@drive_bp.route('/drive/api/list')
+@drive_bp.route('/drive/api/list/<folder_id>')
+@login_required
+def api_list_files(folder_id='root'):
+    service = get_drive_service(current_user)
+    if not service: return jsonify({'error': 'Not connected'}), 401
+    files, next_token = list_drive_files(service, folder_id)
+    file_list = []
+    for f in files:
+        is_folder = f['mimeType'] == 'application/vnd.google-apps.folder'
+        icon = 'folder-fill text-warning' if is_folder else 'file-earmark-text text-secondary'
+        if f['mimeType'] == 'application/pdf': icon = 'file-earmark-pdf-fill text-danger'
+        elif 'image' in f['mimeType']: icon = 'file-earmark-image-fill text-info'
+        file_list.append({
+            'id': f['id'],
+            'name': f['name'],
+            'type': 'folder' if is_folder else 'file',
+            'mimeType': f['mimeType'],
+            'icon': icon,
+            'size': f.get('size')
+        })
+    return jsonify({'files': file_list, 'next_token': next_token})
+
+@drive_bp.route('/drive/api/browse/<folder_id>')
+@login_required
+def browse_drive_api(folder_id):
+    service = get_drive_service(current_user)
+    if not service: return redirect(url_for('drive.drive_manager'))
+    title = request.args.get('title', 'My Drive')
+    files, next_token = list_drive_files(service, folder_id)
+    items = []
+    for f in files:
+        is_folder = f['mimeType'] == 'application/vnd.google-apps.folder'
+        f_type = 'folder' if is_folder else ('pdf' if f['mimeType'] == 'application/pdf' else 'file')
+        if 'image' in f['mimeType']: f_type = 'image'
+        items.append({
+            'name': f['name'],
+            'type': f_type,
+            'path': f['id'],
+            'is_api': True
+        })
+    return render_template('drive_browser.html', source={'id': 'api', 'name': title}, items=items, breadcrumbs=[], is_api=True)
+
+@drive_bp.route('/drive/api/open/<file_id>')
+@login_required
+def api_open_file(file_id):
+    service = get_drive_service(current_user)
+    if not service: return "Not connected", 401
+    try:
+        meta = get_file_metadata(service, file_id)
+        if not meta: return "File not found", 404
+        filename = meta['name']
+        cache_dir = os.path.join(current_app.config['UPLOAD_FOLDER'], 'drive_cache')
+        if not os.path.exists(cache_dir): os.makedirs(cache_dir)
+        from werkzeug.utils import secure_filename
+        safe_name = secure_filename(filename)
+        file_path = os.path.join(cache_dir, safe_name)
+        if not os.path.exists(file_path):
+            with open(file_path, 'wb') as f:
+                download_file_to_stream(service, file_id, f)
+        if safe_name.lower().endswith('.pdf'):
+            # Log PDF access to history
+            conn = get_db_connection()
+            conn.execute('''
+                INSERT INTO pdf_access_history (user_id, file_id, filename, source_type, opened_at)
+                VALUES (?, ?, ?, ?, CURRENT_TIMESTAMP)
+            ''', (current_user.id, file_id, filename, 'drive_api'))
+            conn.commit()
+            conn.close()
+
+            file_url = url_for('drive.serve_cache_file', filename=safe_name)
+            return render_template('pdfjs_viewer.html', pdf_url=file_url, pdf_title=filename)
+        if safe_name.lower().endswith(('.png', '.jpg', '.jpeg')):
+            return send_from_directory(cache_dir, safe_name)
+        return "File downloaded but type not supported for viewing.", 200
+    except Exception as e:
+        return f"Error opening file: {e}", 500
+
+@drive_bp.route('/drive/cache/<filename>')
+@login_required
+def serve_cache_file(filename):
+    cache_dir = os.path.join(current_app.config['UPLOAD_FOLDER'], 'drive_cache')
+    return send_from_directory(cache_dir, filename)
entrypoint.sh ADDED
@@ -0,0 +1,48 @@
+#!/bin/bash
+
+# Ensure we are in the app directory
+cd /app
+
+# 1. Initialize local structure
+python3 hf_sync.py init
+
+# 2. Try to download existing data
+if [ -n "$DATASET_REPO_ID" ]; then
+    python3 hf_sync.py download
+else
+    echo "DATASET_REPO_ID not set, skipping initial download."
+fi
+
+# 3. Set up symlinks
+# Remove existing files/directories if any
+rm -rf database.db output processed uploads
+
+# Ensure data_repo has what we need
+mkdir -p data_repo/output data_repo/processed data_repo/uploads
+
+# Create symlinks
+ln -sf data_repo/database.db database.db
+ln -sf data_repo/output output
+ln -sf data_repo/processed processed
+ln -sf data_repo/uploads uploads
+
+# 4. Start periodic background upload
+if [ -n "$DATASET_REPO_ID" ] && [ -n "$HF_TOKEN" ]; then
+    (
+        while true; do
+            sleep 3600  # Every hour
+            echo "Performing scheduled backup to HF Datasets..."
+            python3 hf_sync.py upload
+        done
+    ) &
+else
+    echo "DATASET_REPO_ID or HF_TOKEN not set, periodic backup disabled."
+fi
+
+# 5. Start the application
+echo "Starting application on port 7680..."
+# gunicorn and eventlet are installed in the image already; reinstalling here is
+# a fallback in case the base image changed. eventlet provides SocketIO support.
+pip install eventlet gunicorn
+exec gunicorn --bind 0.0.0.0:7680 --worker-class eventlet -w 1 app:app
gdrive_service.py ADDED
@@ -0,0 +1,76 @@
+
+import os
+import json
+from google.oauth2.credentials import Credentials
+from google_auth_oauthlib.flow import Flow
+from googleapiclient.discovery import build
+from flask import current_app, url_for, session
+from utils import get_db_connection
+
+# Scopes required for the Drive API
+SCOPES = ['https://www.googleapis.com/auth/drive.readonly', 'https://www.googleapis.com/auth/drive.metadata.readonly']
+
+def get_drive_service(user):
+    """
+    Returns a build('drive', 'v3', credentials=creds) service object
+    if the user has valid tokens. Returns None otherwise.
+    """
+    if not user.google_token:
+        return None
+
+    try:
+        token_info = json.loads(user.google_token)
+        creds = Credentials.from_authorized_user_info(token_info, SCOPES)
+        return build('drive', 'v3', credentials=creds)
+    except Exception as e:
+        current_app.logger.error(f"Error building Drive service: {e}")
+        return None
+
+def create_flow(redirect_uri):
+    """Creates an OAuth2 Flow object from client_secret.json in the app root."""
+    # The user must provide client_secret.json; fail with a clear message if it is missing.
+    client_secrets_file = os.path.join(current_app.root_path, 'client_secret.json')
+    if not os.path.exists(client_secrets_file):
+        raise FileNotFoundError("client_secret.json not found. Please upload it to the root directory.")
+
+    flow = Flow.from_client_secrets_file(
+        client_secrets_file,
+        scopes=SCOPES,
+        redirect_uri=redirect_uri
+    )
+    return flow
+
+def list_drive_files(service, folder_id='root', page_token=None):
+    """Lists files in a specific Drive folder."""
+    try:
+        results = service.files().list(
+            q=f"'{folder_id}' in parents and trashed = false",
+            pageSize=50,
+            pageToken=page_token,
+            fields="nextPageToken, files(id, name, mimeType, iconLink, webViewLink, size, modifiedTime)",
+            orderBy="folder,name"
+        ).execute()
+        return results.get('files', []), results.get('nextPageToken')
+    except Exception as e:
+        current_app.logger.error(f"Drive API List Error: {e}")
+        return [], None
+
+def get_file_metadata(service, file_id):
+    try:
+        return service.files().get(fileId=file_id, fields="id, name, mimeType, size").execute()
+    except Exception:
+        return None
+
+def download_file_to_stream(service, file_id, stream):
+    """Downloads file content to a writable stream."""
+    from googleapiclient.http import MediaIoBaseDownload
+
+    request = service.files().get_media(fileId=file_id)
+    downloader = MediaIoBaseDownload(stream, request)
+    done = False
+    while not done:
+        status, done = downloader.next_chunk()
gemini_classification_prompt.txt ADDED
@@ -0,0 +1,208 @@
+
+**System Role:** You are a question classifier for NEET/JEE exams, specialized in mapping questions to their corresponding subjects and chapters from the NCERT syllabus.
+
+Your task is to analyze each question, first classify it into the most relevant subject, and then identify the most relevant chapter(s) from the official syllabus structures provided below.
+
+**Available Subjects (Use these exact titles):**
+- Biology
+- Chemistry
+- Physics
+- Mathematics
+
+**Syllabus Chapters (Use these exact titles for the respective subjects):**
+
+---
+**1. BIOLOGY (Common for NEET & JEE)**
+
+**Class XI**
+1. The Living World
+2. Biological Classification
+3. Plant Kingdom
+4. Animal Kingdom
+5. Morphology of Flowering Plants
+6. Anatomy of Flowering Plants
+7. Structural Organisation in Animals
+8. Cell: The Unit of Life
+9. Biomolecules
+10. Cell Cycle and Cell Division
+11. Photosynthesis in Higher Plants
+12. Respiration in Plants
+13. Plant Growth and Development
+14. Breathing and Exchange of Gases
+15. Body Fluids and Circulation
+16. Excretory Products and their Elimination
+17. Locomotion and Movement
+18. Neural Control and Coordination
+19. Chemical Coordination and Integration
+20. Sexual Reproduction in Flowering Plants
+21. Human Reproduction
+22. Reproductive Health
+23. Principles of Inheritance and Variation
+24. Molecular Basis of Inheritance
+25. Evolution
+26. Health and Disease
+27. Improvement in Food Production
+28. Microbes in Human Welfare
+29. Biotechnology - Principles and Processes
+30. Biotechnology and Its Applications
+31. Organisms and Populations
+32. Ecosystem
+33. Biodiversity and Its Conservation
+
+---
+**2. CHEMISTRY (Common for NEET & JEE)**
+
+**Class XI**
+1. Some Basic Concepts of Chemistry
+2. Structure of Atom
+3. Classification of Elements and Periodicity in Properties
+4. Chemical Bonding and Molecular Structure
+5. States of Matter: Gases and Liquids
+6. Thermodynamics
+7. Equilibrium
+8. Redox Reactions
+9. Hydrogen
+10. The s-Block Elements
+11. The p-Block Elements (Group 13 and 14)
+12. Organic Chemistry – Some Basic Principles and Techniques (GOC)
+13. Hydrocarbons
+14. Environmental Chemistry
+
+**Class XII**
+1. The Solid State
+2. Solutions
+3. Electrochemistry
+4. Chemical Kinetics
+5. Surface Chemistry
+6. General Principles and Processes of Isolation of Elements (Metallurgy)
+7. The p-Block Elements (Group 15 to 18)
+8. The d- and f- Block Elements
+9. Coordination Compounds
+10. Haloalkanes and Haloarenes
+11. Alcohols, Phenols and Ethers
+12. Aldehydes, Ketones and Carboxylic Acids
+13. Amines
+14. Biomolecules
+15. Polymers
+16. Chemistry in Everyday Life
+
+---
+**3. PHYSICS (Common for NEET & JEE)**
+
+**Class XI**
+1. Units and Measurements
+2. Motion in a Straight Line
+3. Motion in a Plane
+4. Laws of Motion
+5. Work, Energy and Power
+6. System of Particles and Rotational Motion
+7. Gravitation
+8. Mechanical Properties of Solids
+9. Mechanical Properties of Fluids
+10. Thermal Properties of Matter
+11. Thermodynamics
+12. Kinetic Theory
+13. Oscillations
+14. Waves
+
+**Class XII**
+1. Electric Charges and Fields
+2. Electrostatic Potential and Capacitance
+3. Current Electricity
+4. Moving Charges and Magnetism
+5. Magnetism and Matter
+6. Electromagnetic Induction
+7. Alternating Current
+8. Electromagnetic Waves
+9. Ray Optics and Optical Instruments
+10. Wave Optics
+11. Dual Nature of Radiation and Matter
+12. Atoms
+13. Nuclei
+14. Semiconductor Electronics: Materials, Devices and Simple Circuits
+15. Communication Systems
+
+---
+**4. MATHEMATICS (For JEE Only)**
+
+**Class XI**
+1. Sets
+2. Relations and Functions
+3. Trigonometric Functions
+4. Principle of Mathematical Induction
+5. Complex Numbers and Quadratic Equations
+6. Linear Inequalities
+7. Permutations and Combinations
+8. Binomial Theorem
+9. Sequences and Series
+10. Straight Lines
+11. Conic Sections
+12. Introduction to Three Dimensional Geometry
+13. Limits and Derivatives
+14. Mathematical Reasoning
+15. Statistics
+16. Probability
+
+**Class XII**
+1. Relations and Functions
+2. Inverse Trigonometric Functions
+3. Matrices
+4. Determinants
+5. Continuity and Differentiability
+6. Application of Derivatives
+7. Integrals
+8. Application of Integrals
+9. Differential Equations
+10. Vector Algebra
+11. Three Dimensional Geometry
+12. Linear Programming
+13. Probability
+
+---
+
+**Classification Guidelines:**
+
+1. **Primary Classification**: Identify the single most relevant subject, and then the most relevant chapter(s) within that subject, that directly addresses the question's core concept.
+2. **Multi-Chapter Questions**: If a question explicitly spans 2-3 distinct chapters, include all relevant chapters.
+3. **Confidence Scoring** (0.0 to 1.0):
+    * **1.0**: Perfect match
+    * **0.8-0.9**: Strong match
+    * **0.5-0.7**: Moderate match
+    * **Below 0.5**: Avoid unless unavoidable.
+4. **Non-Syllabus Questions**: If a question is not from any of the provided subjects/chapters, set `subject` to 'Unclassified' and `chapter_title` to 'Unclassified'.
+
+**Critical Requirements:**
+
+- Use ONLY the subject titles exactly as listed above, or 'Unclassified'.
+- Use ONLY the chapter titles exactly as listed above, or 'Unclassified'.
+- Preserve the original question text completely.
+- Output ONLY valid JSON.
+- The "index" field MUST match the question number shown in the input (e.g., if the question is numbered "8.", then "index": 8).
+
+**Output JSON Schema:**
+
+```json
+{
+  "data": [
+    {
+      "index": 1,
+      "subject": "<exact subject title from list or 'Unclassified'>",
+      "chapter_index": <chapter number or 0>,
+      "chapter_title": "<exact chapter title from list or 'Unclassified'>",
+      "original_question_text": "<complete original question with all formatting>",
+      "confidence": <0.0 to 1.0>
+    }
+  ],
+  "success": [true]
+}
+```
+
+Now classify the following question(s):
+```
+8. first-order reaction has half-life 200 s. Time required for the amount of reactant to become one- eighth of its initial value is: (A) 200 s (B) 400 s (C) 600 s (D) 800 s
+9. Which hormone helps in internode/petiole elongation in deep water rice plants to keep leaves/ upper parts of the shoot above water? (A) Gibberellins (B) Zeatin (C) ABA (D) Ethylene
+10. Given below are two statements: The interphase nucleus has highly extended and elaborate nucleoprotein fibres called chromatin which contains DNA and some basic proteins called histones, some non-histone proteins and also RNA. Statement I: Statement II: A haploid set of chromosomes in humans contains 3.3 X 109 bp which is approximately two metre long thread of DNA distributed among its twenty three chromosomes. In the light of the above statements, choose the most appropriate answer from the options given below: (A) Statement is correct but Statement Il is incorrect. (B) Statement is incorrect but Statement ll is correct. (C) Both Statement and Statement ll are correct. (D) Both Statement and Statement ll are incorrect.
+11. A force F = - (yi + +x)) N acts on a particle moving in the xy plane. Starting from the origin, the particle is taken along the positive x-axis to the point (a, 0) m and then parallel to the y-axis to the point (a, a) m. The total work done (in joules) by the force is: (A) ka² (B) ka² (C) -2ka² (D) Zero
+12. On electrolysis of dilute nitric acid using platinum electrodes, the product obtained at the anode is: (A) H2 gas (B) O2 gas (C) NO2 gas (D) N2 gas
+13. Identify the incorrectly matched pair: (A) Petiole is modified for photosynthesis Australian acacia (B) Leaves modified into spines Cactus (C) Stem modified into a fleshy, cylindrical photosynthetic structure Opuntia (D) Stem modified into thorns Citrus
+```
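As a sanity check on the schema above, here is an illustrative (hand-written, not model-generated) response for question 8: one-eighth of the initial amount means three half-lives, 3 × 200 s = 600 s, so the question maps to Class XII Chemistry, chapter 4 (Chemical Kinetics). The confidence value is an assumed example:

```json
{
  "data": [
    {
      "index": 8,
      "subject": "Chemistry",
      "chapter_index": 4,
      "chapter_title": "Chemical Kinetics",
      "original_question_text": "first-order reaction has half-life 200 s. Time required for the amount of reactant to become one- eighth of its initial value is: (A) 200 s (B) 400 s (C) 600 s (D) 800 s",
      "confidence": 0.95
    }
  ],
  "success": [true]
}
```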
gemini_classifier.py ADDED
@@ -0,0 +1,278 @@
+import os
+import json
+import requests
+import sys
+from typing import List, Optional, Dict, Any
+from api_key_manager import get_api_key_manager
+
+def classify_questions_with_gemini(questions: List[str], start_index: int = 0) -> Optional[Dict[Any, Any]]:
+    """
+    Classifies a single batch of questions using the Gemini API.
+    `questions` should be a list of strings representing one batch.
+    `start_index` is the overall starting index for this batch (e.g., 0, 7, 14...).
+    """
+    # Get API key from the manager
+    manager = get_api_key_manager()
+    api_key, key_index = manager.get_key('gemini')
+
+    if not api_key:
+        raise ValueError("No available Gemini API keys. Please set GEMINI_API_KEY or GOOGLE_API_KEY environment variable.")
+
+    # Construct the input text with the current batch of questions.
+    # The model expects 1-based indexing in the prompt.
+    input_text = "\n".join([f"{j + start_index + 1}. {q}" for j, q in enumerate(questions)])
+
+    prompt = f"""
+**System Role:** You are a question classifier for NEET/JEE exams, specialized in mapping questions to their corresponding subjects and chapters from the NCERT syllabus.
+
+Your task is to analyze each question, first classify it into the most relevant subject, and then identify the most relevant chapter(s) from the official syllabus structures provided below.
+
+**Available Subjects (Use these exact titles):**
+- Biology
+- Chemistry
+- Physics
+- Mathematics
+
+**Syllabus Chapters (Use these exact titles for the respective subjects):**
+
+---
+**1. BIOLOGY (Common for NEET & JEE)**
+
+**Class XI**
+1. The Living World
+2. Biological Classification
+3. Plant Kingdom
+4. Animal Kingdom
+5. Morphology of Flowering Plants
+6. Anatomy of Flowering Plants
+7. Structural Organisation in Animals
+8. Cell: The Unit of Life
+9. Biomolecules
+10. Cell Cycle and Cell Division
+11. Photosynthesis in Higher Plants
+12. Respiration in Plants
+13. Plant Growth and Development
+14. Breathing and Exchange of Gases
+15. Body Fluids and Circulation
+16. Excretory Products and their Elimination
+17. Locomotion and Movement
+18. Neural Control and Coordination
+19. Chemical Coordination and Integration
+20. Sexual Reproduction in Flowering Plants
+21. Human Reproduction
+22. Reproductive Health
+23. Principles of Inheritance and Variation
+24. Molecular Basis of Inheritance
+25. Evolution
+26. Health and Disease
+27. Improvement in Food Production
+28. Microbes in Human Welfare
+29. Biotechnology - Principles and Processes
+30. Biotechnology and Its Applications
+31. Organisms and Populations
+32. Ecosystem
+33. Biodiversity and Its Conservation
+
+---
+**2. CHEMISTRY (Common for NEET & JEE)**
+
+**Class XI**
+1. Some Basic Concepts of Chemistry
+2. Structure of Atom
+3. Classification of Elements and Periodicity in Properties
+4. Chemical Bonding and Molecular Structure
+5. States of Matter: Gases and Liquids
+6. Thermodynamics
+7. Equilibrium
+8. Redox Reactions
+9. Hydrogen
+10. The s-Block Elements
+11. The p-Block Elements (Group 13 and 14)
+12. Organic Chemistry – Some Basic Principles and Techniques (GOC)
+13. Hydrocarbons
+14. Environmental Chemistry
+
+**Class XII**
+1. The Solid State
+2. Solutions
+3. Electrochemistry
+4. Chemical Kinetics
+5. Surface Chemistry
+6. General Principles and Processes of Isolation of Elements (Metallurgy)
+7. The p-Block Elements (Group 15 to 18)
+8. The d- and f- Block Elements
+9. Coordination Compounds
+10. Haloalkanes and Haloarenes
+11. Alcohols, Phenols and Ethers
+12. Aldehydes, Ketones and Carboxylic Acids
+13. Amines
+14. Biomolecules
+15. Polymers
+16. Chemistry in Everyday Life
+
+---
+**3. PHYSICS (Common for NEET & JEE)**
+
+**Class XI**
+1. Units and Measurements
+2. Motion in a Straight Line
+3. Motion in a Plane
+4. Laws of Motion
+5. Work, Energy and Power
+6. System of Particles and Rotational Motion
+7. Gravitation
+8. Mechanical Properties of Solids
+9. Mechanical Properties of Fluids
+10. Thermal Properties of Matter
+11. Thermodynamics
+12. Kinetic Theory
+13. Oscillations
+14. Waves
+
+**Class XII**
+1. Electric Charges and Fields
+2. Electrostatic Potential and Capacitance
+3. Current Electricity
+4. Moving Charges and Magnetism
+5. Magnetism and Matter
+6. Electromagnetic Induction
+7. Alternating Current
+8. Electromagnetic Waves
+9. Ray Optics and Optical Instruments
+10. Wave Optics
+11. Dual Nature of Radiation and Matter
+12. Atoms
+13. Nuclei
+14. Semiconductor Electronics: Materials, Devices and Simple Circuits
+15. Communication Systems
+
+---
+**4. MATHEMATICS (For JEE Only)**
+
+**Class XI**
+1. Sets
+2. Relations and Functions
+3. Trigonometric Functions
+4. Principle of Mathematical Induction
+5. Complex Numbers and Quadratic Equations
+6. Linear Inequalities
+7. Permutations and Combinations
+8. Binomial Theorem
+9. Sequences and Series
+10. Straight Lines
+11. Conic Sections
+12. Introduction to Three Dimensional Geometry
+13. Limits and Derivatives
+14. Mathematical Reasoning
+15. Statistics
+16. Probability
+
+**Class XII**
+1. Relations and Functions
+2. Inverse Trigonometric Functions
+3. Matrices
+4. Determinants
+5. Continuity and Differentiability
+6. Application of Derivatives
+7. Integrals
+8. Application of Integrals
+9. Differential Equations
+10. Vector Algebra
+11. Three Dimensional Geometry
+12. Linear Programming
+13. Probability
+
+---
+
+**Classification Guidelines:**
+
+1. **Primary Classification**: Identify the single most relevant subject, and then the most relevant chapter(s) within that subject, that directly addresses the question's core concept.
+2. **Multi-Chapter Questions**: If a question explicitly spans 2-3 distinct chapters, include all relevant chapters.
+3. **Confidence Scoring** (0.0 to 1.0):
+    * **1.0**: Perfect match
+    * **0.8-0.9**: Strong match
+    * **0.5-0.7**: Moderate match
+    * **Below 0.5**: Avoid unless unavoidable.
+4. **Non-Syllabus Questions**: If a question is not from any of the provided subjects/chapters, set `subject` to 'Unclassified' and `chapter_title` to 'Unclassified'.
+
+**Critical Requirements:**
+
+- Use ONLY the subject titles exactly as listed above, or 'Unclassified'.
+- Use ONLY the chapter titles exactly as listed above, or 'Unclassified'.
+- Preserve the original question text completely.
+- Output ONLY valid JSON.
+- The "index" field MUST match the question number shown in the input (e.g., if the question is numbered "8.", then "index": 8).
+
+**Output JSON Schema:**
+
+```json
+{{
+  "data": [
+    {{
+      "index": 1,
+      "subject": "<exact subject title from list or 'Unclassified'>",
+      "chapter_index": <chapter number or 0>,
+      "chapter_title": "<exact chapter title from list or 'Unclassified'>",
+      "original_question_text": "<complete original question with all formatting>",
+      "confidence": <0.0 to 1.0>
+    }}
+  ],
+  "success": [true]
+}}
+```
+
+Now classify the following question(s):
+```
+{input_text}
+```
+"""
+    with open('gemini_classification_prompt.txt', 'w') as f:
+        f.write(prompt)
+
+    url = f"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key={api_key}"
+    headers = {'Content-Type': 'application/json'}
+
+    request_body = {
+        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
+        "generationConfig": {
+            "responseMimeType": "application/json",
+        }
+    }
+
+    print(f"Sending batch to Gemini API with {len(questions)} questions.")
+    print(f"Sending request to Gemini API. Body: {json.dumps(request_body, indent=2)}")  # Full logging enabled
+
+    try:
+        response = requests.post(url, headers=headers, json=request_body, timeout=300)
+        response.raise_for_status()
+
+        print(f"Received raw response from Gemini: {response.text}")  # Full logging enabled
+
+        # Parse the response JSON
+        response_json = response.json()
+
+        # Check if the response has valid content and parts
+        candidate = response_json.get('candidates', [{}])[0]
+        content = candidate.get('content', {})
+        parts = content.get('parts', [])
+
+        if not parts:
+            print("Error: Model generated thoughts but no output text.")
+            manager.mark_failure('gemini', key_index)
+            return None
+        else:
+            text = parts[0]['text']
+            batch_result = json.loads(text)
+            manager.mark_success('gemini', key_index)
+            return batch_result
+
+    except requests.exceptions.RequestException as e:
+        print(f"Error during Gemini API call: {repr(e)}", file=sys.stderr)
+        # Note: `is not None` matters here; a 4xx/5xx Response object is falsy.
+        print(f"Response body: {e.response.text if e.response is not None else 'N/A'}", file=sys.stderr)
+        manager.mark_failure('gemini', key_index)
+        return None
+    except (json.JSONDecodeError, KeyError, IndexError) as e:
+        print(f"Error parsing Gemini response: {repr(e)}", file=sys.stderr)
+        print(f"Raw response text: {response.text if 'response' in locals() else 'N/A'}", file=sys.stderr)
+        manager.mark_failure('gemini', key_index)
+        return None
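The 1-based batch numbering that `classify_questions_with_gemini` builds into `input_text` is easy to get wrong by one, so here is a standalone sketch of the same expression (the helper name `number_batch` is illustrative, not part of the module):

```python
def number_batch(questions, start_index=0):
    # Mirrors the classifier's expression: the question at overall
    # index 7 is labelled "8.", the next "9.", and so on.
    return "\n".join(f"{j + start_index + 1}. {q}" for j, q in enumerate(questions))

print(number_batch(["First question", "Second question"], start_index=7))
# prints:
# 8. First question
# 9. Second question
```

This matches the "index" requirement in the prompt: the model must echo back the number shown next to each question.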
gemini_subjective.py ADDED
@@ -0,0 +1,114 @@
+import os
+import json
+import requests
+import sys
+import base64
+from typing import List, Optional, Dict, Any
+
+def generate_subjective_questions(image_path: str) -> Optional[Dict[Any, Any]]:
+    """
+    Transcribes and structures subjective questions from an image using the Gemini API.
+    """
+    api_key = os.environ.get("GEMINI_API_KEY")
+    if not api_key:
+        print("Error: GEMINI_API_KEY environment variable is not set.", file=sys.stderr)
+        return None
+
+    # Read and encode the image
+    try:
+        with open(image_path, "rb") as image_file:
+            encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
+    except Exception as e:
+        print(f"Error reading image file: {e}", file=sys.stderr)
+        return None
+
+    model_id = "gemini-flash-latest"
+    url = f"https://generativelanguage.googleapis.com/v1beta/models/{model_id}:generateContent?key={api_key}"
+    headers = {'Content-Type': 'application/json'}
+
+    prompt_text = """
+    Analyze the provided image. It contains a list of subjective questions (handwritten or printed).
+
+    Task:
+    1. **Transcribe** each question exactly as written.
+    2. **Identify the Topic:** Determine the subject or topic for each question (e.g., "Ascomycetes", "Thermodynamics"). If the header specifies a topic, use that.
+    3. **Structure:** Return the data in the specified JSON format.
+    4. **Numbering:** Use the question number found in the image.
+
+    If the image contains multiple questions, extract all of them.
+    """
+
+    request_body = {
+        "contents": [
+            {
+                "role": "user",
+                "parts": [
+                    {
+                        "inline_data": {
+                            "mime_type": "image/jpeg",  # Assumed JPEG; the API accepts common image/* types
+                            "data": encoded_string
+                        }
+                    },
+                    {
+                        "text": prompt_text
+                    }
+                ]
+            }
+        ],
+        "generationConfig": {
+            "responseMimeType": "application/json",
+            "responseSchema": {
+                "type": "object",
+                "properties": {
+                    "success": {"type": "boolean"},
+                    "data": {
+                        "type": "array",
+                        "items": {
+                            "type": "object",
+                            "properties": {
+                                "question_topic": {"type": "string"},
+                                "question_html": {"type": "string"},
+                                "question_number_within_topic": {"type": "string"}
+                            },
+                            "required": ["question_topic", "question_html", "question_number_within_topic"]
+                        }
+                    }
+                },
+                "required": ["success", "data"]
+            }
+        }
+    }
+
+    try:
+        response = requests.post(url, headers=headers, json=request_body, timeout=120)
+        response.raise_for_status()
+
+        response_json = response.json()
+
+        # Extract text from the first candidate
+        candidate = response_json.get('candidates', [{}])[0]
+        content = candidate.get('content', {})
+        parts = content.get('parts', [])
+
+        if not parts:
+            print("Error: Gemini generated no content.")
+            return None
+
+        text = parts[0]['text']
+        return json.loads(text)
+
+    except requests.exceptions.RequestException as e:
+        print(f"Error during Gemini API call: {e}", file=sys.stderr)
+        # `is not None` matters: an error Response object is falsy.
+        if e.response is not None:
+            print(f"Response: {e.response.text}", file=sys.stderr)
+        return None
+    except json.JSONDecodeError as e:
+        print(f"Error parsing JSON response: {e}", file=sys.stderr)
+        print(f"Raw text: {text if 'text' in locals() else 'N/A'}", file=sys.stderr)
+        return None
+
+if __name__ == "__main__":
+    # Test the function with a sample image (hypothetical path)
+    result = generate_subjective_questions("sample_questions.jpg")
+    if result:
+        print(json.dumps(result, indent=2))
gemma_classifier.py ADDED
@@ -0,0 +1,297 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ import os
+ import json
+ import requests
+ import sys
+ from typing import List, Dict, Any, Optional
+ from api_key_manager import get_api_key_manager
+ 
+ class GemmaClassifier:
+     def __init__(self):
+         # API key will be fetched dynamically via get_api_key_manager
+         pass
+ 
+     def classify(self, questions: List[str], start_index: int = 0) -> Optional[Dict[str, Any]]:
+         """
+         Classifies a list of questions using the NVIDIA NIM Gemma API.
+         `questions` should be a list of strings representing the questions to classify.
+         `start_index` is the overall starting index for this batch (e.g., 0, 7, 14...).
+         """
+         manager = get_api_key_manager()
+         api_key, key_index = manager.get_key('nvidia')
+ 
+         if not api_key:
+             raise ValueError("No available NVIDIA API keys. Please set NVIDIA_API_KEY environment variable.")
+ 
+         full_prompt = self._generate_gemma_prompt(questions=questions, start_index=start_index)
+ 
+         url = "https://integrate.api.nvidia.com/v1/chat/completions"
+         headers = {
+             "Authorization": f"Bearer {api_key}",
+             "Content-Type": "application/json"
+         }
+ 
+         payload = {
+             "model": "google/gemma-3n-e4b-it",
+             "messages": [{"role": "user", "content": full_prompt}],
+             "temperature": 0.2,
+             "max_tokens": 2048,
+             "stream": False
+         }
+ 
+         print(f"Sending batch to NVIDIA NIM Gemma API with {len(questions)} questions.")
+ 
+         try:
+             response = requests.post(url, headers=headers, json=payload, timeout=300)
+             response.raise_for_status()
+ 
+             response_json = response.json()
+ 
+             if 'choices' in response_json and len(response_json['choices']) > 0:
+                 first_choice = response_json['choices'][0]
+                 if 'message' in first_choice and 'content' in first_choice['message']:
+                     model_output_content = first_choice['message']['content']
+ 
+                     if model_output_content.startswith("```json") and model_output_content.endswith("```"):
+                         model_output_content = model_output_content[7:-3].strip()
+ 
+                     try:
+                         batch_result = json.loads(model_output_content)
+                         manager.mark_success('nvidia', key_index)
+                         return batch_result
+                     except json.JSONDecodeError as e:
+                         print(f"Error decoding JSON from model output: {e}", file=sys.stderr)
+                         print(f"Model output content: {model_output_content}", file=sys.stderr)
+                         manager.mark_failure('nvidia', key_index)
+                         return None
+                 else:
+                     print("Error: 'message' or 'content' not found in NVIDIA NIM Gemma response choice.", file=sys.stderr)
+                     manager.mark_failure('nvidia', key_index)
+                     return None
+             else:
+                 print("Error: 'choices' not found or empty in NVIDIA NIM Gemma response.", file=sys.stderr)
+                 manager.mark_failure('nvidia', key_index)
+                 return None
+ 
+         except requests.exceptions.RequestException as e:
+             print(f"Error during NVIDIA NIM Gemma API call: {repr(e)}", file=sys.stderr)
+             if e.response is not None:
+                 print(f"Response status code: {e.response.status_code}", file=sys.stderr)
+                 print(f"Response body: {e.response.text}", file=sys.stderr)
+             manager.mark_failure('nvidia', key_index)
+             return None
+         except Exception as e:
+             print(f"An unexpected error occurred: {e}", file=sys.stderr)
+             manager.mark_failure('nvidia', key_index)
+             return None
+ 
+     def _generate_gemma_prompt(self, questions: List[str], start_index: int) -> str:
+         """
+         Generates the detailed prompt for the Gemma classifier, similar to gemini_classifier.py.
+         """
+         input_text = "\n".join([f"{j + start_index + 1}. {q}" for j, q in enumerate(questions)])
+ 
+         prompt = f"""
+ **System Role:** You are a question classifier for NEET/JEE exams, specialized in mapping questions to their corresponding subjects and chapters from the NCERT syllabus.
+ 
+ Your task is to analyze each question, first classify it into the most relevant subject, and then identify the most relevant chapter(s) from the official syllabus structures provided below.
+ 
+ **Available Subjects (Use these exact titles):**
+ - Biology
+ - Chemistry
+ - Physics
+ - Mathematics
+ 
+ **Syllabus Chapters (Use these exact titles for the respective subjects):**
+ 
+ ---
+ **1. BIOLOGY (Common for NEET & JEE)**
+ 
+ **Class XI**
+ 1. The Living World
+ 2. Biological Classification
+ 3. Plant Kingdom
+ 4. Animal Kingdom
+ 5. Morphology of Flowering Plants
+ 6. Anatomy of Flowering Plants
+ 7. Structural Organisation in Animals
+ 8. Cell: The Unit of Life
+ 9. Biomolecules
+ 10. Cell Cycle and Cell Division
+ 11. Photosynthesis in Higher Plants
+ 12. Respiration in Plants
+ 13. Plant Growth and Development
+ 14. Breathing and Exchange of Gases
+ 15. Body Fluids and Circulation
+ 16. Excretory Products and their Elimination
+ 17. Locomotion and Movement
+ 18. Neural Control and Coordination
+ 19. Chemical Coordination and Integration
+ 20. Sexual Reproduction in Flowering Plants
+ 21. Human Reproduction
+ 22. Reproductive Health
+ 23. Principles of Inheritance and Variation
+ 24. Molecular Basis of Inheritance
+ 25. Evolution
+ 26. Health and Disease
+ 27. Improvement in Food Production
+ 28. Microbes in Human Welfare
+ 29. Biotechnology - Principles and Processes
+ 30. Biotechnology and Its Applications
+ 31. Organisms and Populations
+ 32. Ecosystem
+ 33. Biodiversity and Its Conservation
+ 
+ ---
+ **2. CHEMISTRY (Common for NEET & JEE)**
+ 
+ **Class XI**
+ 1. Some Basic Concepts of Chemistry
+ 2. Structure of Atom
+ 3. Classification of Elements and Periodicity in Properties
+ 4. Chemical Bonding and Molecular Structure
+ 5. States of Matter: Gases and Liquids
+ 6. Thermodynamics
+ 7. Equilibrium
+ 8. Redox Reactions
+ 9. Hydrogen
+ 10. The s-Block Elements
+ 11. The p-Block Elements (Group 13 and 14)
+ 12. Organic Chemistry – Some Basic Principles and Techniques (GOC)
+ 13. Hydrocarbons
+ 14. Environmental Chemistry
+ 
+ **Class XII**
+ 1. The Solid State
+ 2. Solutions
+ 3. Electrochemistry
+ 4. Chemical Kinetics
+ 5. Surface Chemistry
+ 6. General Principles and Processes of Isolation of Elements (Metallurgy)
+ 7. The p-Block Elements (Group 15 to 18)
+ 8. The d- and f- Block Elements
+ 9. Coordination Compounds
+ 10. Haloalkanes and Haloarenes
+ 11. Alcohols, Phenols and Ethers
+ 12. Aldehydes, Ketones and Carboxylic Acids
+ 13. Amines
+ 14. Biomolecules
+ 15. Polymers
+ 16. Chemistry in Everyday Life
+ 
+ ---
+ **3. PHYSICS (Common for NEET & JEE)**
+ 
+ **Class XI**
+ 1. Units and Measurements
+ 2. Motion in a Straight Line
+ 3. Motion in a Plane
+ 4. Laws of Motion
+ 5. Work, Energy and Power
+ 6. System of Particles and Rotational Motion
+ 7. Gravitation
+ 8. Mechanical Properties of Solids
+ 9. Mechanical Properties of Fluids
+ 10. Thermal Properties of Matter
+ 11. Thermodynamics
+ 12. Kinetic Theory
+ 13. Oscillations
+ 14. Waves
+ 
+ **Class XII**
+ 1. Electric Charges and Fields
+ 2. Electrostatic Potential and Capacitance
+ 3. Current Electricity
+ 4. Moving Charges and Magnetism
+ 5. Magnetism and Matter
+ 6. Electromagnetic Induction
+ 7. Alternating Current
+ 8. Electromagnetic Waves
+ 9. Ray Optics and Optical Instruments
+ 10. Wave Optics
+ 11. Dual Nature of Radiation and Matter
+ 12. Atoms
+ 13. Nuclei
+ 14. Semiconductor Electronics: Materials, Devices and Simple Circuits
+ 15. Communication Systems
+ 
+ ---
+ **4. MATHEMATICS (For JEE Only)**
+ 
+ **Class XI**
+ 1. Sets
+ 2. Relations and Functions
+ 3. Trigonometric Functions
+ 4. Principle of Mathematical Induction
+ 5. Complex Numbers and Quadratic Equations
+ 6. Linear Inequalities
+ 7. Permutations and Combinations
+ 8. Binomial Theorem
+ 9. Sequences and Series
+ 10. Straight Lines
+ 11. Conic Sections
+ 12. Introduction to Three Dimensional Geometry
+ 13. Limits and Derivatives
+ 14. Mathematical Reasoning
+ 15. Statistics
+ 16. Probability
+ 
+ **Class XII**
+ 1. Relations and Functions
+ 2. Inverse Trigonometric Functions
+ 3. Matrices
+ 4. Determinants
+ 5. Continuity and Differentiability
+ 6. Application of Derivatives
+ 7. Integrals
+ 8. Application of Integrals
+ 9. Differential Equations
+ 10. Vector Algebra
+ 11. Three Dimensional Geometry
+ 12. Linear Programming
+ 13. Probability
+ 
+ ---
+ 
+ **Classification Guidelines:**
+ 
+ 1. **Primary Classification**: Identify the single most relevant subject, and then the most relevant chapter(s) within that subject, that directly addresses the question's core concept.
+ 2. **Multi-Chapter Questions**: If a question explicitly spans 2-3 distinct chapters, include all relevant chapters.
+ 3. **Confidence Scoring** (0.0 to 1.0):
+    * **1.0**: Perfect match
+    * **0.8-0.9**: Strong match
+    * **0.5-0.7**: Moderate match
+    * **Below 0.5**: Avoid unless unavoidable.
+ 4. **Non-Syllabus Questions**: If a question is not from any of the provided subjects/chapters, set `subject` to 'Unclassified' and `chapter_title` to 'Unclassified'.
+ 
+ **Critical Requirements:**
+ 
+ - Use ONLY the subject titles exactly as listed above, or 'Unclassified'.
+ - Use ONLY the chapter titles exactly as listed above, or 'Unclassified'.
+ - Preserve the original question text completely.
+ - Output ONLY valid JSON.
+ - The "index" field MUST match the question number shown in the input (e.g., if the question is numbered "8.", then "index": 8).
+ 
+ **Output JSON Schema:**
+ 
+ ```json
+ {{
+   "data": [
+     {{
+       "index": 1,
+       "subject": "<exact subject title from list or 'Unclassified'>",
+       "chapter_index": <chapter number or 0>,
+       "chapter_title": "<exact chapter title from list or 'Unclassified'>",
+       "original_question_text": "<complete original question with all formatting>",
+       "confidence": <0.0 to 1.0>
+     }}
+   ],
+   "success": [true]
+ }}
+ ```
+ 
+ Now classify the following question(s):
+ ```
+ {input_text}
+ ```
+ """
+         return prompt
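
The `classify` method takes one batch of questions plus a running `start_index`, and the prompt numbers each question as `start_index + 1`, `start_index + 2`, and so on. A minimal driver sketch of that batching contract (the batch size of 7 and the helper name are illustrative assumptions, not part of this commit):

```python
def batch_questions(questions, batch_size=7):
    """Yield (start_index, batch) pairs matching GemmaClassifier.classify's contract.

    Each batch must carry the overall offset rather than restarting at zero,
    so the "index" fields in the model's JSON line up across batches.
    """
    for start in range(0, len(questions), batch_size):
        yield start, questions[start:start + batch_size]

# Example: 10 questions split into batches of 7 and 3
batches = list(batch_questions([f"Q{i}" for i in range(10)]))
# batches[0] == (0, ["Q0", ..., "Q6"]); batches[1] == (7, ["Q7", "Q8", "Q9"])
```

Each `(start, batch)` pair would then be passed as `classifier.classify(batch, start_index=start)`.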
hf_sync.py ADDED
@@ -0,0 +1,77 @@
+ import os
+ import subprocess
+ import sys
+ import shutil
+ 
+ # Configuration
+ REPO_ID = os.environ.get("DATASET_REPO_ID")
+ HF_TOKEN = os.environ.get("HF_TOKEN")
+ 
+ def run_command(command):
+     print(f"Running: {' '.join(command)}")
+     # Ensure HF_TOKEN is in environment for the command
+     env = os.environ.copy()
+     if HF_TOKEN:
+         env["HF_TOKEN"] = HF_TOKEN
+ 
+     result = subprocess.run(command, capture_output=True, text=True, env=env)
+     if result.returncode != 0:
+         print(f"Error: {result.stderr}")
+     else:
+         print(f"Output: {result.stdout}")
+     return result.returncode == 0
+ 
+ def download():
+     if not REPO_ID:
+         print("DATASET_REPO_ID not set, skipping download.")
+         return
+ 
+     print(f"Downloading data from {REPO_ID}...")
+     # hf download REPO_ID --repo-type dataset --local-dir data_repo
+     # Using --local-dir-use-symlinks False to avoid issues in some environments
+     success = run_command(["hf", "download", REPO_ID, "--repo-type", "dataset", "--local-dir", "data_repo", "--local-dir-use-symlinks", "False"])
+     if success:
+         print("Download successful.")
+     else:
+         print("Download failed or repository is empty.")
+ 
+ def upload():
+     if not REPO_ID:
+         print("DATASET_REPO_ID not set, skipping upload.")
+         return
+     if not HF_TOKEN:
+         print("HF_TOKEN not set, skipping upload.")
+         return
+ 
+     print(f"Uploading data to {REPO_ID}...")
+     # hf upload REPO_ID data_repo / --repo-type dataset
+     # We upload the contents of data_repo to the root of the dataset
+     success = run_command(["hf", "upload", REPO_ID, "data_repo", ".", "--repo-type", "dataset"])
+     if success:
+         print("Upload successful.")
+     else:
+         print("Upload failed.")
+ 
+ def init_local():
+     """Ensure data_repo has the necessary structure if download failed or it's new."""
+     os.makedirs("data_repo/output", exist_ok=True)
+     os.makedirs("data_repo/processed", exist_ok=True)
+     os.makedirs("data_repo/uploads", exist_ok=True)
+     # database.db will be created by the app if it doesn't exist,
+     # but we should ensure it's in data_repo.
+     # We'll handle this in entrypoint.sh by symlinking.
+ 
+ if __name__ == "__main__":
+     if len(sys.argv) < 2:
+         print("Usage: python hf_sync.py [download|upload|init]")
+         sys.exit(1)
+ 
+     action = sys.argv[1]
+     if action == "download":
+         download()
+     elif action == "upload":
+         upload()
+     elif action == "init":
+         init_local()
+     else:
+         print(f"Unknown action: {action}")
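
`run_command` copies the parent environment and overlays `HF_TOKEN` before invoking the CLI, so the token is inherited by the child process rather than appearing on the command line. The pattern in isolation (function and parameter names here are illustrative, not the script's API):

```python
import os
import subprocess
import sys

def run_with_env(command, extra_env=None):
    """Run a command with the parent environment plus selected overrides."""
    env = os.environ.copy()          # inherit PATH, HOME, etc.
    env.update(extra_env or {})      # overlay secrets such as HF_TOKEN
    result = subprocess.run(command, capture_output=True, text=True, env=env)
    return result.returncode == 0, result.stdout

# The child process sees the injected variable without it being a CLI argument
ok, out = run_with_env(
    [sys.executable, "-c", "import os; print(os.environ['HF_TOKEN'])"],
    extra_env={"HF_TOKEN": "dummy-token"},
)
```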
image_routes.py ADDED
@@ -0,0 +1,24 @@
+ from flask import Blueprint, send_from_directory, current_app
+ 
+ image_bp = Blueprint('image_bp', __name__)
+ 
+ @image_bp.route('/processed/<path:filename>')
+ def serve_processed_image(filename):
+     current_app.logger.info(f"Serving processed image: {filename}")
+     return send_from_directory(current_app.config['PROCESSED_FOLDER'], filename)
+ 
+ @image_bp.route('/tmp/<path:filename>')
+ def serve_tmp_image(filename):
+     current_app.logger.info(f"Serving temporary image: {filename}")
+     return send_from_directory(current_app.config['TEMP_FOLDER'], filename)
+ 
+ # Proxy routes for /neetprep/processed and /neetprep/tmp
+ @image_bp.route('/neetprep/processed/<path:filename>')
+ def serve_neetprep_processed_image(filename):
+     current_app.logger.info(f"Serving /neetprep/processed image: {filename}")
+     return send_from_directory(current_app.config['PROCESSED_FOLDER'], filename)
+ 
+ @image_bp.route('/neetprep/tmp/<path:filename>')
+ def serve_neetprep_tmp_image(filename):
+     current_app.logger.info(f"Serving /neetprep/tmp image: {filename}")
+     return send_from_directory(current_app.config['TEMP_FOLDER'], filename)
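
The four routes are two pairs of aliases: `/neetprep/processed` and `/neetprep/tmp` proxy to the same folders as `/processed` and `/tmp`. A standalone sketch of that prefix-to-folder mapping (the dict and function names are hypothetical, for illustration only):

```python
# Hypothetical mapping mirroring the blueprint's four routes: several
# URL prefixes resolve to the same configured folder.
ROUTE_FOLDERS = {
    "/processed": "PROCESSED_FOLDER",
    "/tmp": "TEMP_FOLDER",
    "/neetprep/processed": "PROCESSED_FOLDER",
    "/neetprep/tmp": "TEMP_FOLDER",
}

def resolve_folder(path):
    """Return the config key for the longest matching route prefix, or None."""
    for prefix in sorted(ROUTE_FOLDERS, key=len, reverse=True):
        if path.startswith(prefix + "/"):
            return ROUTE_FOLDERS[prefix]
    return None
```

Matching longest prefix first keeps `/neetprep/tmp/...` from being shadowed by a shorter route.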
iperf3_client.sh ADDED
@@ -0,0 +1,55 @@
+ #!/bin/bash
+ #
+ # This script runs iperf3 in client mode to test network speed.
+ #
+ 
+ # Check if iperf3 is installed
+ if ! command -v iperf3 &> /dev/null
+ then
+     echo "iperf3 could not be found. Please install it."
+     echo "For Debian/Ubuntu, use: sudo apt install iperf3"
+     echo "For CentOS/RHEL, use: sudo yum install iperf3"
+     echo "For macOS (with Homebrew), use: brew install iperf3"
+     echo "For Windows, download from iperf.fr"
+     exit 1
+ fi
+ 
+ # Check for server IP argument
+ if [ -z "$1" ]
+ then
+     echo "Usage: $0 <server_ip_address>"
+     exit 1
+ fi
+ 
+ SERVER_IP=$1
+ PORT=5201
+ PARALLEL_STREAMS=4
+ 
+ echo ""
+ echo "------------------------------------------------------------------"
+ echo " iperf3 Client"
+ echo "------------------------------------------------------------------"
+ echo " Server IP: ${SERVER_IP}"
+ echo " Port: ${PORT}"
+ echo "------------------------------------------------------------------"
+ 
+ echo ""
+ echo "Running standard test (client to server)..."
+ iperf3 -c ${SERVER_IP} -p ${PORT}
+ 
+ echo ""
+ echo "Running reverse test (server to client)..."
+ iperf3 -c ${SERVER_IP} -p ${PORT} -R
+ 
+ echo ""
+ echo "Running test with ${PARALLEL_STREAMS} parallel streams (client to server)..."
+ iperf3 -c ${SERVER_IP} -p ${PORT} -P ${PARALLEL_STREAMS}
+ 
+ echo ""
+ echo "Running reverse test with ${PARALLEL_STREAMS} parallel streams (server to client)..."
+ iperf3 -c ${SERVER_IP} -p ${PORT} -P ${PARALLEL_STREAMS} -R
+ 
+ echo ""
+ echo "------------------------------------------------------------------"
+ echo " Test complete."
+ echo "------------------------------------------------------------------"
iperf3_server.sh ADDED
@@ -0,0 +1,31 @@
+ #!/bin/bash
+ #
+ # This script starts an iperf3 server and displays the server's IP address.
+ #
+ 
+ # Check if iperf3 is installed
+ if ! command -v iperf3 &> /dev/null
+ then
+     echo "iperf3 could not be found. Please install it."
+     echo "For Debian/Ubuntu, use: sudo apt install iperf3"
+     echo "For CentOS/RHEL, use: sudo yum install iperf3"
+     exit 1
+ fi
+ 
+ # Get the server's IP address
+ IP_ADDRESS=$(hostname -I | awk '{print $1}')
+ PORT=5201
+ 
+ echo ""
+ echo "------------------------------------------------------------------"
+ echo " iperf3 Server"
+ echo "------------------------------------------------------------------"
+ echo " Server IP: ${IP_ADDRESS}"
+ echo " Port: ${PORT}"
+ echo ""
+ echo "Starting iperf3 server..."
+ echo "Press Ctrl+C to stop the server."
+ echo "------------------------------------------------------------------"
+ 
+ # Start the iperf3 server
+ iperf3 -s -p ${PORT}
json_processor.py ADDED
@@ -0,0 +1,483 @@
1
+ from flask import Blueprint, render_template, request, jsonify, current_app, redirect, url_for
2
+ from utils import get_db_connection
3
+ from PIL import Image, ImageDraw
4
+ import os
5
+ from utils import get_or_download_font
6
+ import json
7
+ import imgkit
8
+ from bs4 import BeautifulSoup
9
+ import re
10
+ import uuid
11
+ import requests
12
+ import base64
13
+ import html
14
+ import sys
15
+ from concurrent.futures import ThreadPoolExecutor, as_completed
16
+ from json_processor_v3 import JSONProcessorV3
17
+
18
+ json_bp = Blueprint('json_bp', __name__)
19
+
20
+ # --- SCHEMAS ---
21
+ SCHEMA_V2_1 = {
22
+ "version": "2.1",
23
+ }
24
+
25
+ SCHEMA_V2 = {
26
+ # To be defined by the user
27
+ }
28
+
29
+ SCHEMAS = {
30
+ "2.1": SCHEMA_V2_1,
31
+ "2": SCHEMA_V2,
32
+ }
33
+
34
+
35
+ # --- JSON PROCESSOR CLASS ---
36
+ class JSONProcessor:
37
+ def __init__(self, json_data):
38
+ self.data = json_data
39
+ self.version = self._detect_version()
40
+
41
+ def _detect_version(self):
42
+ if self.data and "version" in self.data:
43
+ return str(self.data["version"])
44
+ if self.data and "data" in self.data and "root" in self.data["data"]:
45
+ return "original"
46
+ if self.data and "root" in self.data:
47
+ return "original"
48
+ return None
49
+
50
+ def process(self, statuses=None):
51
+ if self.version == "2.1":
52
+ return self._process_v2_1()
53
+ elif self.version == "2":
54
+ return self._process_v2()
55
+ elif self.version == "original":
56
+ return self._process_original(statuses=statuses)
57
+ else:
58
+ raise ValueError(f"Unsupported or unknown JSON version: {self.version}")
59
+
60
+ def _process_v2_1(self):
61
+ def safe_int(value):
62
+ try:
63
+ return int(value)
64
+ except (ValueError, TypeError):
65
+ return None
66
+
67
+ processed_questions = []
68
+ statuses_to_include = self.data.get("config", {}).get("statuses_to_include", ["wrong", "unattempted"])
69
+ for q in self.data.get("questions", []):
70
+ status = q.get("status")
71
+ if status in statuses_to_include:
72
+ options = q.get("options", [])
73
+ user_answer = "N/A"
74
+ if q.get('source') == 'classified':
75
+ user_answer = q.get('user_answer_index')
76
+ else:
77
+ user_answer_index = safe_int(q.get("user_answer_index"))
78
+ if user_answer_index is not None and user_answer_index < len(options):
79
+ user_answer = options[user_answer_index]
80
+
81
+ correct_answer = "N/A"
82
+ if q.get('source') == 'classified':
83
+ correct_answer = q.get('correct_answer_index')
84
+ else:
85
+ correct_answer_index = safe_int(q.get("correct_answer_index"))
86
+ if correct_answer_index is not None and correct_answer_index < len(options):
87
+ correct_answer = options[correct_answer_index]
88
+
89
+ processed_questions.append({
90
+ "question": q.get("question_text"),
91
+ "yourAnswer": user_answer,
92
+ "correctAnswer": correct_answer,
93
+ "status": status,
94
+ "custom_fields": q.get("custom_fields", {})
95
+ })
96
+ return {
97
+ "test_name": self.data.get("test_name", "Unnamed Test"),
98
+ "questions": processed_questions,
99
+ "font_size": self.data.get("config", {}).get("font_size", 24),
100
+ "metadata": self.data.get("metadata", {}),
101
+ "config": self.data.get("config", {})
102
+ }
103
+
104
+ def _process_v2(self):
105
+ raise NotImplementedError("Processing for JSON schema v2 is not yet implemented. Please provide the schema.")
106
+
107
+ def _process_original(self, statuses=None):
108
+ data_root = self.data
109
+ if 'data' in self.data and 'root' in self.data['data']:
110
+ data_root = self.data['data']
111
+
112
+ questions_data = data_root.get("root", {}).get("_testAttempt4d9rq8", {}).get("test", {}).get("_questions4dxVsH", {}).get("edges", [])
113
+ user_answers = data_root.get("root", {}).get("_testAttempt4d9rq8", {}).get("userAnswers", {})
114
+
115
+ selected_statuses = statuses if statuses is not None else self.data.get('statuses', ['wrong', 'unattempted'])
116
+
117
+ processed_questions = []
118
+ for edge in questions_data:
119
+ node = edge.get("node", {})
120
+ question_id_encoded = node.get("id", "")
121
+ try:
122
+ question_id = base64.b64decode(question_id_encoded).decode('utf-8').split(':')[1]
123
+ except (IndexError, ValueError, TypeError):
124
+ continue
125
+
126
+ question_text = node.get("question", "")
127
+ question_text = fix_font_family_in_html(question_text)
128
+
129
+ options = node.get("options", [])
130
+ correct_option_index = node.get("correctOptionIndex")
131
+ user_answer_index_str = user_answers.get(question_id)
132
+ user_answer_index = int(user_answer_index_str) if user_answer_index_str is not None else None
133
+
134
+ status = "unattempted"
135
+ if user_answer_index is not None:
136
+ status = "correct" if user_answer_index == correct_option_index else "wrong"
137
+
138
+ if status in selected_statuses:
139
+ user_answer = "N/A"
140
+ if user_answer_index is not None and user_answer_index < len(options):
141
+ user_answer = options[user_answer_index]
142
+
143
+ correct_answer = "N/A"
144
+ if correct_option_index is not None and correct_option_index < len(options):
145
+ correct_answer = options[correct_option_index]
146
+
147
+ processed_questions.append({
148
+ "question": question_text,
149
+ "yourAnswer": user_answer,
150
+ "correctAnswer": correct_answer,
151
+ "status": status
152
+ })
153
+
154
+ test_name = self.data.get('test_name')
155
+ if not test_name:
156
+ try:
157
+ test_name = data_root['root']['_testAttempt4d9rq8']['test']['name']
158
+ except KeyError:
159
+ test_name = 'Uploaded Test'
160
+
161
+ return {
162
+ "test_name": test_name,
163
+ "questions": processed_questions,
164
+ "font_size": self.data.get('font_size', 24)
165
+ }
166
+
167
+ def html_to_image_worker(item, session_id, font_size, processed_folder, original_filename, index):
168
+ """Worker function to convert a single HTML question to an image."""
169
+ question_html = item.get('question')
170
+ if not question_html:
171
+ question_html = "<p>Question text not provided.</p>"
172
+
173
+ soup = BeautifulSoup(question_html, 'html.parser')
174
+ for img in soup.find_all('img'):
175
+ img_src = img.get('src')
176
+ if img_src:
177
+ if img_src.startswith('http'):
178
+ try:
179
+ response = requests.get(img_src)
180
+ if response.status_code == 200:
181
+ img_b64 = base64.b64encode(response.content).decode('utf-8')
182
+ img['src'] = f"data:image/png;base64,{img_b64}"
183
+ except Exception as e:
184
+ current_app.logger.error(f"Could not embed image {img_src}: {e}")
185
+ elif os.path.exists(img_src):
186
+ with open(img_src, 'rb') as f:
187
+ img_b64 = base64.b64encode(f.read()).decode('utf-8')
188
+ img['src'] = f"data:image/jpeg;base64,{img_b64}"
189
+
190
+ question_html = str(soup)
191
+
192
+ style = f"<style>body {{ font-size: {font_size}px; }}</style>"
193
+ question_html = style + question_html
194
+
195
+ processed_filename = f"processed_{session_id}_page0_crop{index}.jpg"
196
+ image_path = os.path.join(processed_folder, processed_filename)
197
+
198
+ try:
199
+ imgkit.from_string(question_html, image_path)
200
+ except Exception:
201
+ image_font = get_or_download_font(font_size=font_size)
202
+ soup = BeautifulSoup(question_html, 'html.parser')
203
+ question_text = soup.get_text()
204
+ image = Image.new('RGB', (800, 600), 'white')
205
+ draw = ImageDraw.Draw(image)
206
+ final_y = draw_multiline_text(draw, question_text, (20, 20), image_font, 760, 'black')
207
+ image = image.crop((0, 0, 800, final_y + 20))
208
+ image.save(image_path, 'JPEG')
209
+
210
+ return {
211
+ 'processed_filename': processed_filename,
212
+ 'original_filename': original_filename,
213
+ 'item': item,
214
+ 'index': index
215
+ }
216
+
217
+ from flask_login import login_required, current_user
218
+
219
+ def _process_json_and_generate_pdf(raw_data, user_id):
220
+ """
221
+ Helper function to process JSON data, generate images, and create a PDF.
222
+ This is called by both the /json_upload route and directly from other modules.
223
+ """
224
+ from utils import get_or_download_font, create_a4_pdf_from_images
225
+
226
+ conn = get_db_connection()
227
+ try:
228
+ if not raw_data:
229
+ return {'error': 'No JSON payload received.'}, 400
230
+
231
+ processor = JSONProcessor(raw_data)
232
+ processed_data = processor.process()
233
+
234
+ test_name = processed_data.get("test_name")
235
+ processed_questions = processed_data.get("questions")
236
+ font_size = processed_data.get("font_size")
237
+ metadata = processed_data.get("metadata", {})
238
+ tags = metadata.get("tags", "programmatic")
239
+ layout = processed_data.get("config", {}).get("layout", {})
240
+
241
+ images_per_page = int(layout.get('images_per_page', 4))
242
+ orientation = layout.get('orientation', 'portrait')
243
+ grid_rows = int(layout.get('grid_rows')) if layout.get('grid_rows') else None
244
+ grid_cols = int(layout.get('grid_cols')) if layout.get('grid_cols') else None
245
+ practice_mode = layout.get('practice_mode', 'none')
246
+
247
+ session_id = str(uuid.uuid4())
248
+ conn.execute('INSERT INTO sessions (id, original_filename, user_id) VALUES (?, ?, ?)', (session_id, f"{test_name}.json", user_id))
249
+
250
+ original_filename = f"{session_id}_dummy_original.png"
251
+ conn.execute(
252
+ 'INSERT INTO images (session_id, image_index, filename, original_name, image_type) VALUES (?, ?, ?, ?, ?)',
253
+ (session_id, 0, original_filename, 'JSON Upload', 'original')
254
+ )
255
+
256
+ with ThreadPoolExecutor(max_workers=10) as executor:
257
+ list(executor.map(
258
+ lambda p: html_to_image_worker(*p),
259
+ [(item, session_id, font_size, current_app.config['PROCESSED_FOLDER'], original_filename, i) for i, item in enumerate(processed_questions)]
260
+ ))
261
+
262
+ for i, item in enumerate(processed_questions):
263
+ processed_filename = f"processed_{session_id}_page0_crop{i}.jpg"
264
+ image_insert_result = conn.execute(
265
+ 'INSERT INTO images (session_id, image_index, filename, original_name, processed_filename, image_type) VALUES (?, ?, ?, ?, ?, ?)',
266
+ (session_id, i + 1, original_filename, f"Question {i+1}", processed_filename, 'cropped')
267
+ )
268
+ image_id = image_insert_result.lastrowid
269
+ conn.execute(
270
+ 'INSERT INTO questions (session_id, image_id, question_number, status, marked_solution, actual_solution) VALUES (?, ?, ?, ?, ?, ?)',
271
+ (session_id, image_id, str(i + 1), item.get('status'), item.get('yourAnswer'), item.get('correctAnswer'))
272
+ )
273
+
274
+ conn.commit()
275
+
276
+ if raw_data.get('view') is True:
277
+ query = "SELECT q.*, i.processed_filename FROM questions q JOIN images i ON q.image_id = i.id WHERE q.session_id = ? ORDER BY i.id"
278
+ all_questions = [dict(row) for row in conn.execute(query, (session_id,)).fetchall()]
279
+ if not all_questions:
280
+ return {'error': 'No questions were processed to generate a PDF.'}, 400
281
+
282
+ from datetime import datetime
283
+ from werkzeug.utils import secure_filename
284
+ pdf_filename = f"{secure_filename(test_name)}_{session_id[:8]}.pdf"
285
+
286
+ create_a4_pdf_from_images(
287
+ image_info=all_questions, base_folder=current_app.config['PROCESSED_FOLDER'], output_filename=pdf_filename,
288
+ images_per_page=images_per_page, output_folder=current_app.config['OUTPUT_FOLDER'],
289
+ orientation=orientation, grid_rows=grid_rows, grid_cols=grid_cols, practice_mode=practice_mode
290
+ )
291
+ conn.execute(
292
+ 'INSERT INTO generated_pdfs (session_id, filename, subject, tags, notes, source_filename, user_id) VALUES (?, ?, ?, ?, ?, ?, ?)',
293
+ (session_id, pdf_filename, test_name, tags, 'Generated automatically via JSON upload.', f"{test_name}.json", user_id)
294
+ )
295
+ conn.commit()
296
+ return {'success': True, 'view_url': url_for('main.view_pdf', filename=pdf_filename, _external=True)}, 200
297
+ else:
298
+ return {'success': True, 'edit_url': url_for('main.question_entry_v2', session_id=session_id, test_name=test_name, _external=True)}, 200
299
+
300
+ except Exception as e:
301
+ if conn:
302
+ conn.rollback()
303
+ current_app.logger.error(f"Error in _process_json_and_generate_pdf: {repr(e)}")
304
+ return {'error': str(e)}, 500
305
+ finally:
306
+ if conn:
307
+ conn.close()
308
+
309
+ @json_bp.route('/json_upload', methods=['GET', 'POST'])
310
+ @login_required
311
+ def json_upload():
312
+ if request.method == 'POST':
313
+ result, status_code = _process_json_and_generate_pdf(request.json, current_user.id)
314
+ return jsonify(result), status_code
315
+ return render_template('json_upload.html')
316
+
317
+
318
+ def draw_multiline_text(draw, text, position, font, max_width, fill):
319
+ x, y = position
320
+ lines = text.split('\n')
321
+ wrapped_lines = []
322
+ for line in lines:
323
+ if font.getlength(line) <= max_width:
+ wrapped_lines.append(line)
+ else:
+ current_line = ''
+ for word in line.split(' '):
+ if font.getlength(current_line + word + ' ') <= max_width:
+ current_line += word + ' '
+ else:
+ wrapped_lines.append(current_line)
+ current_line = word + ' '
+ wrapped_lines.append(current_line)
+
+ line_height = font.getbbox('A')[3] - font.getbbox('A')[1] if hasattr(font, 'getbbox') else font.getsize('A')[1]
+ for line in wrapped_lines:
+ draw.text((x, y), line, fill=fill, font=font)
+ y += line_height + 5
+ return y
+
+ def fix_font_family_in_html(html_string):
+ if not html_string:
+ return html_string
+
+ html_string = html.unescape(html_string)
+ pattern = r'font-family:\s*"([^"]+(?:,\s*"[^"]+"\s*)*)"'
+
+ def replace_font_family(match):
+ font_value = match.group(1)
+ font_value = font_value.replace('"', "'")
+ return f"font-family:'{font_value}'"
+
+ html_string = re.sub(pattern, replace_font_family, html_string)
+ html_string = re.sub(r'"', "'", html_string)
+
+ return html_string
+
+
+ @json_bp.route('/process_json', methods=['POST'])
+ def process_json():
+ request_data = request.json
+ data_to_process = request_data.get('data', request_data)
+ selected_statuses = request_data.get('statuses', ['wrong', 'unattempted'])
+
+ try:
+ processor = JSONProcessor(data_to_process)
+ processed_data = processor.process(statuses=selected_statuses)
+ return jsonify({'success': True, 'questions': processed_data.get('questions')})
+ except Exception as e:
+ current_app.logger.error(f"Error in process_json: {repr(e)}")
+ return jsonify({'success': False, 'error': str(e)})
+
+
+ @json_bp.route('/save_processed_json', methods=['POST'])
+ @login_required
+ def save_processed_json():
+ from app import get_db_connection
+ questions_data = request.form.get('questions_data')
+ test_name = request.form.get('test_name')
+ font_size = int(request.form.get('font_size', 24))
+
+ try:
+ questions = json.loads(questions_data)
+ except json.JSONDecodeError as e:
+ try:
+ fixed_data = questions_data.replace('"', "'")
387
+ fixed_data = re.sub(r'font-family:"([^"]+)"', lambda m: f"font-family:'{m.group(1).replace('"', "'")}'", fixed_data)
388
+ questions = json.loads(fixed_data)
389
+ except Exception as inner_e:
390
+ current_app.logger.error(f"Initial JSONDecodeError: {e}")
391
+ current_app.logger.error(f"Could not fix JSON data. Error: {inner_e}")
392
+ current_app.logger.error(f"Problematic JSON data (raw): {repr(questions_data)}")
393
+ return jsonify({'error': 'Invalid JSON data received.'}), 400
394
+
395
+ session_id = str(uuid.uuid4())
396
+ conn = get_db_connection()
397
+
398
+ try:
399
+ conn.execute('INSERT INTO sessions (id, original_filename, user_id) VALUES (?, ?, ?)', (session_id, 'JSON Upload', current_user.id))
400
+
401
+ original_filename = f"{session_id}_dummy_original.png"
402
+ conn.execute(
403
+ 'INSERT INTO images (session_id, image_index, filename, original_name, image_type) VALUES (?, ?, ?, ?, ?)',
404
+ (session_id, 0, original_filename, 'JSON Upload', 'original')
405
+ )
406
+
407
+ for i, item in enumerate(questions):
408
+ question_html = item.get('question')
409
+ your_answer = item.get('yourAnswer')
410
+ correct_answer = item.get('correctAnswer')
411
+
412
+ if not question_html:
413
+ question_html = "<p>Question text was not provided.</p>"
414
+
415
+ soup = BeautifulSoup(question_html, 'html.parser')
416
+ for img in soup.find_all('img'):
417
+ img_src = img.get('src')
418
+ if img_src and img_src.startswith('http'):
419
+ try:
420
+ response = requests.get(img_src)
421
+ if response.status_code == 200:
422
+ img_b64 = base64.b64encode(response.content).decode('utf-8')
423
+ img['src'] = f"data:image/png;base64,{img_b64}"
424
+ except Exception as e:
425
+ current_app.logger.error(f"Could not embed image {img_src}: {e}")
426
+
427
+ question_html = str(soup)
428
+
429
+ style = f"<style>body {{ font-size: {font_size}px; }}</style>"
430
+ question_html = style + question_html
431
+
432
+ processed_filename = f"processed_{session_id}_page0_crop{i}.jpg"
433
+ image_path = os.path.join(current_app.config['PROCESSED_FOLDER'], processed_filename)
434
+
435
+ try:
436
+ imgkit.from_string(question_html, image_path)
437
+ except Exception as e:
438
+ image_font = get_or_download_font(font_size=font_size)
439
+ soup = BeautifulSoup(question_html, 'html.parser')
440
+ question_text = soup.get_text()
441
+ image = Image.new('RGB', (800, 600), 'white')
442
+ draw = ImageDraw.Draw(image)
443
+ final_y = draw_multiline_text(draw, question_text, (20, 20), image_font, 760, 'black')
444
+ image = image.crop((0, 0, 800, final_y + 20))
445
+ image.save(image_path, 'JPEG')
446
+
447
+ image_insert_result = conn.execute(
448
+ 'INSERT INTO images (session_id, image_index, filename, original_name, processed_filename, image_type) VALUES (?, ?, ?, ?, ?, ?)',
449
+ (session_id, i + 1, original_filename, f"Question {i+1}", processed_filename, 'cropped')
450
+ )
451
+ image_id = image_insert_result.lastrowid
452
+
453
+ status = item.get('status')
454
+ conn.execute(
455
+ 'INSERT INTO questions (session_id, image_id, question_number, status, marked_solution, actual_solution) VALUES (?, ?, ?, ?, ?, ?)',
456
+ (session_id, image_id, str(i + 1), status, your_answer, correct_answer)
457
+ )
458
+
459
+ conn.commit()
460
+ return redirect(url_for('main.question_entry_v2', session_id=session_id, test_name=test_name))
461
+ except Exception as e:
462
+ conn.rollback()
463
+ current_app.logger.error(f"Error in save_processed_json: {repr(e)}")
464
+ return jsonify({'error': str(e)}), 500
465
+ finally:
466
+ conn.close()
467
+
468
+ @json_bp.route('/json_upload_v3', methods=['POST'])
469
+ def json_upload_v3():
470
+ if not request.json:
471
+ return jsonify({'error': 'No JSON payload received.'}), 400
472
+
473
+ processor_v3 = JSONProcessorV3(request.json)
474
+ try:
475
+ # Pass a user_id, for now a default. In a real app, this might come from an API key.
476
+ result = processor_v3.process(user_id=45)
477
+ return jsonify(result), 200
478
+ except ValueError as e:
479
+ current_app.logger.error(f"JSON v3.0 processing error: {e}")
480
+ return jsonify({'error': str(e)}), 400
481
+ except Exception as e:
482
+ current_app.logger.error(f"Unhandled error during JSON v3.0 processing: {e}")
483
+ return jsonify({'error': 'An internal server error occurred.'}), 500
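As a standalone illustration of the greedy word-wrap used in `draw_multiline_text` above, here is a minimal sketch that replaces Pillow's `font.getlength` with a hypothetical fixed per-character width (`char_width` is an assumption for the example, not part of the diff):

```python
def wrap_line(line, char_width, max_width):
    # Greedy wrap mirroring draw_multiline_text: accumulate words until the
    # approximated rendered width (len(text) * char_width stands in for
    # font.getlength) would overflow max_width, then start a new line.
    wrapped, current = [], ''
    for word in line.split(' '):
        if len(current + word + ' ') * char_width <= max_width:
            current += word + ' '
        else:
            wrapped.append(current)
            current = word + ' '
    wrapped.append(current)
    return wrapped

print(wrap_line("one two three four", 10, 100))  # ['one two ', 'three ', 'four ']
```

Like the original, this keeps the trailing space each word append introduces; a renderer that strips trailing whitespace makes that harmless.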
json_processor_v3.py ADDED
@@ -0,0 +1,248 @@
+ import json
+ import os
+ import requests
+ from concurrent.futures import ThreadPoolExecutor, as_completed
+ from jsonschema import validate, ValidationError
+ import uuid
+ from flask import current_app, url_for
+ from werkzeug.utils import secure_filename
+ import sqlite3
+ import sys
+
+ # Ensure the parent directory is on the Python path for local imports
+ sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
+
+ from database import get_db_connection
+ from utils import create_a4_pdf_from_images
+
+
+ # JSON v3.0 Schema for validation
+ JSON_V3_SCHEMA = {
+ "type": "object",
+ "properties": {
+ "version": {"type": "string", "const": "3.0"},
+ "source": {"type": "string"},
+ "test_name": {"type": "string"},
+ "test_id": {"type": "string"},
+ "test_mapping_id": {"type": "string"},
+ "metadata": {"type": "object"},
+ "config": {
+ "type": "object",
+ "properties": {
+ "statuses_to_include": {"type": "array", "items": {"type": "string"}},
+ "layout": {
+ "type": "object",
+ "properties": {
+ "images_per_page": {"type": "integer"},
+ "orientation": {"type": "string"}
+ },
+ "required": ["images_per_page", "orientation"]
+ }
+ },
+ "required": ["statuses_to_include", "layout"]
+ },
+ "questions": {
+ "type": "array",
+ "items": {
+ "type": "object",
+ "properties": {
+ "question_number": {"type": "string"},
+ "image_url": {"type": "string", "format": "uri"},
+ "status": {"type": "string"},
+ "marked_solution": {"type": "string"},
+ "correct_solution": {"type": "string"},
+ "subject": {"type": "string"},
+ "chapter": {"type": "string"},
+ "topic": {"type": "string"},
+ "time_taken": {"type": "integer"}
+ },
+ "required": ["question_number", "image_url", "status", "marked_solution", "correct_solution", "subject", "time_taken"]
+ }
+ },
+ "view": {"type": "boolean"}
+ },
+ "required": ["version", "source", "test_name", "test_id", "test_mapping_id", "config", "questions", "view"]
+ }
+
+ class JSONProcessorV3:
+ def __init__(self, data=None):
+ self.data = data
+
+ def validate(self):
+ """Validates the JSON data against the v3.0 schema."""
+ try:
+ validate(instance=self.data, schema=JSON_V3_SCHEMA)
+ return True
+ except ValidationError as e:
+ raise ValueError(f"Schema validation failed: {e.message}")
+
+ def download_image_from_url(self, url, save_path, timeout=30):
+ """Downloads an image from a URL and saves it to a path."""
+ try:
+ response = requests.get(url, timeout=timeout)
+ response.raise_for_status()
+ with open(save_path, 'wb') as f:
+ f.write(response.content)
+ return save_path
+ except requests.exceptions.RequestException as e:
+ print(f"Error downloading image from {url}: {e}") # Keep print for tests
+ if current_app:
+ current_app.logger.error(f"Error downloading image from {url}: {e}")
+ return None
+
+ def download_images_parallel(self, questions, output_dir, session_id, max_workers=10):
+ """Downloads all images in parallel and returns a map of question number to local path."""
+ image_paths = {}
+ with ThreadPoolExecutor(max_workers=max_workers) as executor:
+ future_to_question = {
+ executor.submit(
+ self.download_image_from_url,
+ q['image_url'],
+ os.path.join(output_dir, f"{session_id}_q_{q['question_number']}.png")
+ ): q for q in questions if q.get('image_url')
+ }
+
+ for future in as_completed(future_to_question):
+ question = future_to_question[future]
+ url = question['image_url']
+ try:
+ path = future.result()
+ if path:
+ image_paths[question['question_number']] = path
+ current_app.logger.info(f"Successfully downloaded image from {url}")
+ else:
+ current_app.logger.error(f"Failed to download image from {url}")
+ except Exception as e:
+ current_app.logger.error(f"Error processing image for question {question.get('question_number')} from {url}: {e}")
+ return image_paths
+
+ def process(self, user_id=1): # Default user_id for now, replace with actual user
+ """Main processing logic for the v3.0 payload, including DB insertion and PDF generation."""
+ if not self.data:
+ raise ValueError("No data provided to process.")
+
+ current_app.logger.info("Starting processing of JSON v3.0 payload.")
+ current_app.logger.info(f"Test Name: {self.data.get('test_name')}")
+ current_app.logger.info(f"Test ID: {self.data.get('test_id')}")
+ current_app.logger.info(f"Metadata: {self.data.get('metadata')}")
+
+ if not self.validate():
+ raise ValueError("Schema validation failed.")
+
+ conn = get_db_connection()
+ try:
+ test_name = self.data['test_name']
+ test_id = self.data['test_id']
+ test_mapping_id = self.data['test_mapping_id']
+ questions_payload = self.data['questions']
+ view_mode = self.data.get('view', False)
+ metadata = json.dumps(self.data.get('metadata', {})) # Store metadata as JSON string
+
+ config = self.data.get('config', {})
+ layout = config.get('layout', {})
+ images_per_page = layout.get('images_per_page', 4)
+ orientation = layout.get('orientation', 'portrait')
+
+ session_id = str(uuid.uuid4())
+ original_filename = f"{test_name}.json" # Name of the JSON file that was uploaded
+
+ conn.execute(
+ 'INSERT INTO sessions (id, original_filename, user_id, test_id, test_mapping_id, source, metadata) VALUES (?, ?, ?, ?, ?, ?, ?)',
+ (session_id, original_filename, user_id, test_id, test_mapping_id, self.data.get('source', 'manual'), metadata)
+ )
+
+ processed_folder = current_app.config.get('PROCESSED_FOLDER', 'processed')
+ os.makedirs(processed_folder, exist_ok=True)
+
+ current_app.logger.info(f"Downloading images for test {test_id} to {processed_folder}")
+ image_path_map = self.download_images_parallel(questions_payload, processed_folder, session_id)
+
+ image_records = []
+ question_records = []
+
+ for i, q_data in enumerate(questions_payload):
+ question_number = q_data['question_number']
+
+ # Check if image was downloaded
+ processed_filename = None
+ local_image_path = image_path_map.get(question_number)
+ if local_image_path:
+ processed_filename = os.path.basename(local_image_path)
+
+ # Insert into images table
+ image_insert_result = conn.execute(
+ 'INSERT INTO images (session_id, image_index, filename, original_name, processed_filename, image_type) VALUES (?, ?, ?, ?, ?, ?)',
+ (session_id, i + 1, q_data.get('image_url', ''), f"Question {question_number}", processed_filename, 'cropped' if processed_filename else 'original_url_only')
+ )
+ image_id = image_insert_result.lastrowid
+
+ # Insert into questions table
+ question_records.append((
+ session_id, image_id, question_number, q_data['status'],
+ q_data['marked_solution'], q_data['correct_solution'],
+ q_data.get('subject'), q_data.get('chapter'), q_data.get('topic'), q_data.get('time_taken')
+ ))
+
+ conn.executemany(
+ 'INSERT INTO questions (session_id, image_id, question_number, status, marked_solution, actual_solution, subject, chapter, topic, time_taken) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)',
+ question_records
+ )
+
+ conn.commit()
+
+ response_data = {
+ "status": "success",
+ "message": "JSON v3.0 processed successfully."
+ }
+
+ if view_mode:
+ query = "SELECT q.*, i.processed_filename FROM questions q JOIN images i ON q.image_id = i.id WHERE q.session_id = ? ORDER BY i.id"
+ all_questions = [dict(row) for row in conn.execute(query, (session_id,)).fetchall()]
+
+ if not all_questions:
+ conn.rollback()
+ raise ValueError('No questions found for PDF generation.')
+
+ pdf_output_folder = current_app.config.get('OUTPUT_FOLDER', 'output')
+ os.makedirs(pdf_output_folder, exist_ok=True)
+
+ pdf_filename = f"{secure_filename(test_name)}_{session_id[:8]}.pdf"
+
+ create_a4_pdf_from_images(
+ image_info=all_questions, base_folder=processed_folder, output_filename=pdf_filename,
+ images_per_page=images_per_page, output_folder=pdf_output_folder,
+ orientation=orientation
+ )
+
+ conn.execute(
+ 'INSERT INTO generated_pdfs (session_id, filename, subject, tags, notes, source_filename, user_id) VALUES (?, ?, ?, ?, ?, ?, ?)',
+ (session_id, pdf_filename, test_name, test_mapping_id, 'Generated automatically via JSON v3.0 upload.', original_filename, user_id)
+ )
+ conn.commit()
+ response_data['view_url'] = url_for('main.view_pdf', filename=pdf_filename, _external=True)
+ response_data['message'] = "PDF auto-generated and saved."
+ else:
+ response_data['edit_url'] = url_for('main.question_entry_v2', session_id=session_id, test_name=test_name, _external=True)
+ response_data['message'] = "Session created for manual review."
+
+ return response_data
+
+ except ValueError as e:
+ if conn:
+ conn.rollback()
+ current_app.logger.error(f"JSON v3.0 processing error: {e}")
+ raise # Re-raise to be caught by the endpoint
+ except sqlite3.Error as e:
+ if conn:
+ conn.rollback()
+ current_app.logger.error(f"Database error during JSON v3.0 processing: {e}")
+ raise ValueError(f"Database error: {e}")
+ except Exception as e:
+ if conn:
+ conn.rollback()
+ current_app.logger.error(f"Unhandled error during JSON v3.0 processing: {e}")
+ raise ValueError(f"An unexpected error occurred: {e}")
+ finally:
+ if conn:
+ conn.close()
+
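The fan-out/fan-in pattern in `download_images_parallel` can be sketched without network or Flask dependencies; `fetch` below is a hypothetical stand-in for `download_image_from_url` (the failure rule and paths are invented for the example):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(url):
    # Stub: pretend URLs starting with "bad" fail, others yield a local path.
    if url.startswith("bad"):
        return None
    return f"/tmp/{url.rsplit('/', 1)[-1]}.png"

def download_parallel(questions, max_workers=4):
    paths = {}
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        futures = {ex.submit(fetch, q["image_url"]): q
                   for q in questions if q.get("image_url")}
        for fut in as_completed(futures):
            q = futures[fut]
            path = fut.result()
            if path:  # failed downloads are simply omitted, as in the class above
                paths[q["question_number"]] = path
    return paths

qs = [{"question_number": "1", "image_url": "http://x/a"},
      {"question_number": "2", "image_url": "bad://y/b"}]
print(download_parallel(qs))  # {'1': '/tmp/a.png'}
```

Keeping the future-to-question dict is what lets the consumer report per-question failures even though completion order is nondeterministic.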
migrations/add_v3_fields.sql ADDED
@@ -0,0 +1,17 @@
+ -- Add new columns to the 'questions' table
+ ALTER TABLE questions ADD COLUMN topic TEXT;
+ ALTER TABLE questions ADD COLUMN time_taken INTEGER;
+ ALTER TABLE questions ADD COLUMN difficulty TEXT;
+ ALTER TABLE questions ADD COLUMN source TEXT DEFAULT 'manual';
+ ALTER TABLE questions ADD COLUMN test_id TEXT;
+ ALTER TABLE questions ADD COLUMN test_mapping_id TEXT;
+
+ -- Add new columns to the 'sessions' table
+ ALTER TABLE sessions ADD COLUMN test_id TEXT;
+ ALTER TABLE sessions ADD COLUMN test_mapping_id TEXT;
+ ALTER TABLE sessions ADD COLUMN source TEXT DEFAULT 'manual';
+ ALTER TABLE sessions ADD COLUMN metadata TEXT;
+
+ -- Create indexes for performance
+ CREATE INDEX IF NOT EXISTS idx_questions_test_mapping_id ON questions (test_mapping_id);
+ CREATE INDEX IF NOT EXISTS idx_sessions_test_mapping_id ON sessions (test_mapping_id);
migrations/migrate.py ADDED
@@ -0,0 +1,117 @@
+ import sqlite3
+ import os
+ import shutil
+ from datetime import datetime
+
+ DATABASE_PATH = 'database.db'
+ MIGRATIONS_DIR = 'migrations'
+ BACKUP_DIR = 'backups'
+
+ def backup_database():
+ """Creates a timestamped backup of the database."""
+ if not os.path.exists(BACKUP_DIR):
+ os.makedirs(BACKUP_DIR)
+
+ timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
+ backup_filename = f"database_backup_{timestamp}.db"
+ backup_path = os.path.join(BACKUP_DIR, backup_filename)
+
+ try:
+ shutil.copy2(DATABASE_PATH, backup_path)
+ print(f"Successfully created database backup at: {backup_path}")
+ return backup_path
+ except FileNotFoundError:
+ print(f"Warning: Database file not found at {DATABASE_PATH}. Cannot create backup.")
+ return None
+
+ def apply_migration(migration_file):
+ """Applies a single SQL migration file to the database."""
+ try:
+ conn = sqlite3.connect(DATABASE_PATH)
+ cursor = conn.cursor()
+
+ with open(migration_file, 'r') as f:
+ sql_script = f.read()
+
+ # Split script into individual statements
+ statements = [s.strip() for s in sql_script.split(';') if s.strip()]
+
+ print(f"Applying migration: {migration_file}...")
+ for statement in statements:
+ try:
+ cursor.execute(statement)
+ print(f" Executed: {statement[:80]}...")
+ except sqlite3.OperationalError as e:
+ # This is a common error if the column already exists. We can treat it as a warning.
+ if "duplicate column name" in str(e):
+ print(f" Warning: {e}. Skipping statement.")
+ else:
+ raise # Re-raise other operational errors
+
+ conn.commit()
+ conn.close()
+ print(f"Successfully applied migration: {migration_file}")
+ except sqlite3.Error as e:
+ print(f"Error applying migration {migration_file}: {e}")
+ return False
+ return True
+
+ def verify_migration():
+ """Verifies that the new columns exist in the tables."""
+ print("\nVerifying migration...")
+ try:
+ conn = sqlite3.connect(DATABASE_PATH)
+ cursor = conn.cursor()
+
+ # Verify 'questions' table
+ cursor.execute("PRAGMA table_info(questions);")
+ questions_columns = [row[1] for row in cursor.fetchall()]
+ expected_q_cols = ['topic', 'time_taken', 'difficulty', 'source', 'test_id', 'test_mapping_id']
+ missing_q_cols = [col for col in expected_q_cols if col not in questions_columns]
+ if not missing_q_cols:
+ print("✅ 'questions' table verification successful.")
+ else:
+ print(f"❌ 'questions' table verification failed. Missing columns: {missing_q_cols}")
+
+ # Verify 'sessions' table
+ cursor.execute("PRAGMA table_info(sessions);")
+ sessions_columns = [row[1] for row in cursor.fetchall()]
+ expected_s_cols = ['test_id', 'test_mapping_id', 'source', 'metadata']
+ missing_s_cols = [col for col in expected_s_cols if col not in sessions_columns]
+ if not missing_s_cols:
+ print("✅ 'sessions' table verification successful.")
+ else:
+ print(f"❌ 'sessions' table verification failed. Missing columns: {missing_s_cols}")
+
+ conn.close()
+
+ return not missing_q_cols and not missing_s_cols
+
+ except sqlite3.Error as e:
+ print(f"Error during verification: {e}")
+ return False
+
+ def main():
+ """Main function to run the migration process."""
+ print("--- Starting Database Migration ---")
+
+ backup_path = backup_database()
+ if not backup_path and os.path.exists(DATABASE_PATH):
+ print("Aborting migration due to backup failure.")
+ return
+
+ migration_file = os.path.join(MIGRATIONS_DIR, 'add_v3_fields.sql')
+ if not os.path.exists(migration_file):
+ print(f"Error: Migration file not found at {migration_file}")
+ return
+
+ if apply_migration(migration_file):
+ verify_migration()
+ else:
+ print("\nMigration failed. Please check the errors above.")
+ print("You may need to restore from the backup if the database is in an inconsistent state.")
+
+ print("--- Migration Process Finished ---")
+
+ if __name__ == "__main__":
+ main()
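The "duplicate column name" tolerance in `apply_migration` is what makes re-running the migration idempotent; a minimal sketch of the same pattern against an in-memory database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questions (id INTEGER PRIMARY KEY)")

# The second statement simulates re-applying an already-applied migration.
statements = ["ALTER TABLE questions ADD COLUMN topic TEXT"] * 2
for stmt in statements:
    try:
        conn.execute(stmt)
    except sqlite3.OperationalError as e:
        # Only tolerate the already-applied case; re-raise anything else.
        if "duplicate column name" not in str(e):
            raise

cols = [row[1] for row in conn.execute("PRAGMA table_info(questions)")]
print(cols)  # ['id', 'topic']
```

SQLite's `ALTER TABLE ... ADD COLUMN` has no `IF NOT EXISTS` clause, so catching this specific `OperationalError` is the usual workaround.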
migrations/migrate_subjective.py ADDED
@@ -0,0 +1,64 @@
+ import sqlite3
+ import json
+ import time
+
+ DATABASE = 'database.db'
+
+ def get_db_connection():
+ conn = sqlite3.connect(DATABASE)
+ conn.row_factory = sqlite3.Row
+ return conn
+
+ def migrate_subjective_questions():
+ print("Starting migration of subjective questions...")
+ conn = get_db_connection()
+
+ try:
+ # Fetch all subjective questions
+ questions = conn.execute('SELECT * FROM subjective_questions').fetchall()
+
+ updated_count = 0
+
+ for q in questions:
+ q_id = q['id']
+ q_html = q['question_html']
+ q_json = q['question_json']
+
+ # Check if json is empty or None
+ if not q_json or q_json.strip() == '':
+ print(f"Migrating Question ID: {q_id}")
+
+ # Create EditorJS block structure
+ editor_js_data = {
+ "time": int(time.time() * 1000),
+ "blocks": [
+ {
+ "type": "paragraph",
+ "data": {
+ "text": q_html
+ }
+ }
+ ],
+ "version": "2.22.2" # Using a standard version
+ }
+
+ json_string = json.dumps(editor_js_data)
+
+ # Update the record
+ conn.execute(
+ 'UPDATE subjective_questions SET question_json = ? WHERE id = ?',
+ (json_string, q_id)
+ )
+ updated_count += 1
+
+ conn.commit()
+ print(f"Migration completed. Updated {updated_count} questions.")
+
+ except Exception as e:
+ conn.rollback()
+ print(f"Error during migration: {e}")
+ finally:
+ conn.close()
+
+ if __name__ == "__main__":
+ migrate_subjective_questions()
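The EditorJS document the migration above writes can be factored into a small helper for reuse and testing (`html_to_editorjs` is a hypothetical name, not part of the diff):

```python
import json
import time

def html_to_editorjs(html):
    # Wrap raw HTML in a single-paragraph EditorJS document, mirroring
    # the structure built inside migrate_subjective_questions.
    return json.dumps({
        "time": int(time.time() * 1000),
        "blocks": [{"type": "paragraph", "data": {"text": html}}],
        "version": "2.22.2",
    })

doc = json.loads(html_to_editorjs("<b>Define osmosis.</b>"))
print(doc["blocks"][0]["data"]["text"])  # <b>Define osmosis.</b>
```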
neetprep.py ADDED
@@ -0,0 +1,467 @@
+ from flask import Blueprint, render_template, request, jsonify, current_app, url_for
+ from flask_login import login_required, current_user
+ from utils import get_db_connection
+ import requests
+ import time
+ import os
+ import json
+ import sys
+ from concurrent.futures import ThreadPoolExecutor, as_completed
+ from bs4 import BeautifulSoup
+ import math
+ import imgkit
+
+ from gemini_classifier import classify_questions_with_gemini
+ from nova_classifier import classify_questions_with_nova
+ from json_processor import _process_json_and_generate_pdf
18
+
19
+ neetprep_bp = Blueprint('neetprep_bp', __name__)
20
+
21
+ # ... (Constants and GraphQL queries remain the same) ...
22
+ ENDPOINT_URL = "https://www.neetprep.com/graphql"
23
+ USER_ID = "VXNlcjozNTY5Mzcw="
24
+
25
+ HEADERS = {
26
+ 'accept': '*/*',
27
+ 'content-type': 'application/json',
28
+ 'origin': 'https://www.neetprep.com',
29
+ 'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/141.0.0.0 Safari/537.36',
30
+ }
31
+
32
+ # --- Queries ---
33
+ query_template_step1 = 'query GetAttempts {{ testAttempts( limit: {limit}, offset: {offset}, where: {{ userId: "{userId}" }} ) {{ id completed }} }}'
34
+ query_template_step2 = 'query GetIncorrectIds {{ incorrectQuestions( testAttemptId: "{attemptId}", first: 200 ) {{ id }} }}'
35
+ query_template_step3 = '''
36
+ query GetQuestionDetails {{
37
+ question(id: "{questionId}") {{
38
+ id
39
+ question
40
+ options
41
+ correctOptionIndex
42
+ level
43
+ topics(first: 1) {{
44
+ edges {{
45
+ node {{
46
+ name
47
+ subjects(first: 1) {{
48
+ edges {{
49
+ node {{ name }}
50
+ }}
51
+ }}
52
+ }}
53
+ }}
54
+ }}
55
+ }}
56
+ }}
57
+ '''
58
+
59
+ def fetch_question_details(q_id):
60
+ """Worker function to fetch details for a single question."""
61
+ result = run_hardcoded_query(query_template_step3, questionId=q_id)
62
+ if result and 'data' in result and 'question' in result['data'] and result['data']['question']:
63
+ return result['data']['question']
64
+ return None
65
+
66
+ @neetprep_bp.route('/neetprep')
67
+ @login_required
68
+ def index():
69
+ """Renders the main NeetPrep UI with topics and counts."""
70
+ conn = get_db_connection()
71
+ selected_subject = request.args.get('subject', 'All')
72
+ AVAILABLE_SUBJECTS = ["All", "Biology", "Chemistry", "Physics", "Mathematics"]
73
+
74
+ neetprep_topic_counts = {}
75
+ unclassified_count = 0
76
+ if current_user.neetprep_enabled:
77
+ # Get NeetPrep question counts per topic, filtered by subject
78
+ if selected_subject != 'All':
79
+ neetprep_topics_query = 'SELECT topic, COUNT(*) as count FROM neetprep_questions WHERE subject = ? GROUP BY topic'
80
+ neetprep_topics = conn.execute(neetprep_topics_query, (selected_subject,)).fetchall()
81
+ else:
82
+ neetprep_topics_query = 'SELECT topic, COUNT(*) as count FROM neetprep_questions GROUP BY topic'
83
+ neetprep_topics = conn.execute(neetprep_topics_query).fetchall()
84
+ neetprep_topic_counts = {row['topic']: row['count'] for row in neetprep_topics}
85
+ unclassified_count = conn.execute("SELECT COUNT(*) as count FROM neetprep_questions WHERE topic = 'Unclassified'").fetchone()['count']
86
+
87
+
88
+ # Get classified question counts per chapter for the current user, filtered by subject
89
+ query_params = [current_user.id]
90
+ base_query = """
91
+ SELECT q.chapter, COUNT(*) as count
92
+ FROM questions q
93
+ JOIN sessions s ON q.session_id = s.id
94
+ WHERE s.user_id = ? AND q.subject IS NOT NULL AND q.chapter IS NOT NULL
95
+ """
96
+ if selected_subject != 'All':
97
+ base_query += " AND q.subject = ? "
98
+ query_params.append(selected_subject)
99
+
100
+ base_query += " GROUP BY q.chapter"
101
+
102
+ classified_chapters = conn.execute(base_query, query_params).fetchall()
103
+ classified_chapter_counts = {row['chapter']: row['count'] for row in classified_chapters}
104
+
105
+ # Combine the topics
106
+ all_topics = set(neetprep_topic_counts.keys()) | set(classified_chapter_counts.keys())
107
+
108
+ combined_topics = []
109
+ for topic in sorted(list(all_topics)):
110
+ combined_topics.append({
111
+ 'topic': topic,
112
+ 'neetprep_count': neetprep_topic_counts.get(topic, 0),
113
+ 'my_questions_count': classified_chapter_counts.get(topic, 0)
114
+ })
115
+
116
+ conn.close()
117
+ return render_template('neetprep.html',
118
+ topics=combined_topics,
119
+ unclassified_count=unclassified_count,
120
+ available_subjects=AVAILABLE_SUBJECTS,
121
+ selected_subject=selected_subject,
122
+ neetprep_enabled=current_user.neetprep_enabled)
123
+
124
+ @neetprep_bp.route('/neetprep/sync', methods=['POST'])
125
+ @login_required
126
+ def sync_neetprep_data():
127
+ data = request.json
128
+ force_sync = data.get('force', False)
129
+ print(f"NeetPrep sync started by user {current_user.id}. Force sync: {force_sync}")
130
+
131
+ try:
132
+ conn = get_db_connection()
133
+
134
+ if force_sync:
135
+ print("Force sync enabled. Clearing processed attempts and questions tables.")
136
+ conn.execute('DELETE FROM neetprep_processed_attempts')
137
+ conn.execute('DELETE FROM neetprep_questions')
138
+ conn.commit()
139
+
140
+ processed_attempts_rows = conn.execute('SELECT attempt_id FROM neetprep_processed_attempts').fetchall()
141
+ processed_attempt_ids = {row['attempt_id'] for row in processed_attempts_rows}
142
+
143
+ all_attempt_ids = []
144
+ offset = 0
145
+ limit = 100
146
+ print("Fetching test attempts from NeetPrep API...")
147
+ while True:
148
+ result = run_hardcoded_query(query_template_step1, limit=limit, offset=offset, userId=USER_ID)
149
+ if not result or 'data' not in result or not result['data'].get('testAttempts'):
150
+ break
151
+ attempts = result['data']['testAttempts']
152
+ if not attempts: break
153
+ all_attempt_ids.extend([a['id'] for a in attempts if a.get('completed')])
154
+ offset += limit
155
+ time.sleep(0.2)
156
+
157
+ new_attempts = [aid for aid in all_attempt_ids if aid not in processed_attempt_ids]
158
+ print(f"Found {len(new_attempts)} new attempts to process.")
159
+ if not new_attempts:
160
+ conn.close()
161
+ return jsonify({'status': 'No new test attempts to sync. Everything is up-to-date.'}), 200
162
+
163
+ incorrect_question_ids = set()
164
+ print("Fetching incorrect question IDs for new attempts...")
165
+ for attempt_id in new_attempts:
166
+ result = run_hardcoded_query(query_template_step2, attemptId=attempt_id)
167
+ if result and 'data' in result and result['data'].get('incorrectQuestions'):
168
+ for q in result['data']['incorrectQuestions']:
169
+ incorrect_question_ids.add(q['id'])
170
+ time.sleep(0.2)
171
+
172
+ existing_question_ids_rows = conn.execute('SELECT id FROM neetprep_questions').fetchall()
173
+ existing_question_ids = {row['id'] for row in existing_question_ids_rows}
174
+ new_question_ids = list(incorrect_question_ids - existing_question_ids)
175
+ print(f"Found {len(new_question_ids)} new unique incorrect questions to fetch details for.")
176
+
177
+ if not new_question_ids:
178
+ for attempt_id in new_attempts:
179
+ conn.execute('INSERT INTO neetprep_processed_attempts (attempt_id) VALUES (?)', (attempt_id,))
180
+ conn.commit()
181
+ conn.close()
182
+ return jsonify({'status': 'Sync complete. No new questions found, but attempts log updated.'}), 200
183
+
184
+ questions_to_insert = []
185
+ total_new = len(new_question_ids)
186
+ completed = 0
187
+ print(f"Fetching details for {total_new} questions...")
188
+ with ThreadPoolExecutor(max_workers=10) as executor:
189
+ future_to_qid = {executor.submit(fetch_question_details, qid): qid for qid in new_question_ids}
190
+ for future in as_completed(future_to_qid):
191
+ q_data = future.result()
192
+ if q_data:
193
+ topic_name = "Unclassified"
194
+ try:
195
+ topic_name = q_data['topics']['edges'][0]['node']['name']
196
+ except (IndexError, TypeError, KeyError): pass
197
+
198
+ questions_to_insert.append((q_data.get('id'), q_data.get('question'), json.dumps(q_data.get('options', [])), q_data.get('correctOptionIndex'), q_data.get('level', 'N/A'), topic_name, "Unclassified"))
199
+
200
+ completed += 1
201
+ percentage = int((completed / total_new) * 100)
202
+ sys.stdout.write(f'\rSync Progress: {completed}/{total_new} ({percentage}%)')
203
+ sys.stdout.flush()
204
+
205
+ print("\nAll questions fetched.")
206
+
207
+ if questions_to_insert:
208
+ conn.executemany("INSERT INTO neetprep_questions (id, question_text, options, correct_answer_index, level, topic, subject) VALUES (?, ?, ?, ?, ?, ?, ?)", questions_to_insert)
209
+
210
+ for attempt_id in new_attempts:
211
+ conn.execute('INSERT INTO neetprep_processed_attempts (attempt_id) VALUES (?)', (attempt_id,))
212
+
213
+ conn.commit()
214
+ conn.close()
215
+
216
+ return jsonify({'status': f'Sync complete. Added {len(questions_to_insert)} new questions.'}), 200
217
+
218
+ except Exception as e:
219
+ current_app.logger.error(f"Error during NeetPrep sync: {repr(e)}")
220
+ if 'conn' in locals() and conn:
221
+ conn.close()
222
+ return jsonify({'error': f"A critical error occurred during sync: {repr(e)}"}), 500
223
+
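The sync route above fetches question details concurrently with a thread pool and reports progress as futures complete. A minimal self-contained sketch of that pattern (the `fetch_question_details` stub here is a stand-in for the real GraphQL call):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_question_details(qid):
    # Stand-in for the real network fetch; returns a dict, or None on failure.
    return {'id': qid, 'question': f'Question {qid}'}

def fetch_all(question_ids, max_workers=10):
    results = []
    total = len(question_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_qid = {executor.submit(fetch_question_details, qid): qid
                         for qid in question_ids}
        # as_completed yields futures in completion order, not submission order.
        for completed, future in enumerate(as_completed(future_to_qid), start=1):
            q_data = future.result()
            if q_data:
                results.append(q_data)
            print(f'\rSync Progress: {completed}/{total}', end='')
    print()
    return results
```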
224
+ @neetprep_bp.route('/neetprep/classify', methods=['POST'])
225
+ @login_required
226
+ def classify_unclassified_questions():
227
+ """Classifies all questions marked as 'Unclassified' in batches."""
228
+ print("Starting classification of 'Unclassified' questions.")
229
+ conn = get_db_connection()
230
+ unclassified_questions = conn.execute("SELECT id, question_text FROM neetprep_questions WHERE topic = 'Unclassified'").fetchall()
231
+ total_to_classify = len(unclassified_questions)
232
+
233
+ if total_to_classify == 0:
234
+ conn.close()
235
+ return jsonify({'status': 'No unclassified questions to process.'})
236
+
237
+ batch_size = 10
238
+ num_batches = math.ceil(total_to_classify / batch_size)
239
+ total_classified_count = 0
240
+
241
+ print(f"Found {total_to_classify} questions. Processing in {num_batches} batches of {batch_size}.")
242
+
243
+ for i in range(num_batches):
244
+ batch_start_time = time.time()
245
+ start_index = i * batch_size
246
+ end_index = start_index + batch_size
247
+
248
+ batch = unclassified_questions[start_index:end_index]
249
+
250
+ question_texts = [q['question_text'] for q in batch]
251
+ question_ids = [q['id'] for q in batch]
252
+
253
+ print(f"\nProcessing Batch {i+1}/{num_batches}...")
254
+
255
+ try:
256
+ # Choose classifier based on user preference
257
+ classifier_model = getattr(current_user, 'classifier_model', 'gemini')
258
+
259
+ if classifier_model == 'nova':
260
+ print("Classifying with Nova API...")
261
+ classification_result = classify_questions_with_nova(question_texts, start_index=0)
262
+ model_name = "Nova"
263
+ else:
264
+ print("Classifying with Gemini API...")
265
+ classification_result = classify_questions_with_gemini(question_texts, start_index=0)
266
+ model_name = "Gemini"
267
+
268
+ if not classification_result or not classification_result.get('data'):
269
+ print(f"Batch {i+1} failed: {model_name} API did not return valid data.")
270
+ continue
271
+
272
+ update_count_in_batch = 0
273
+ for item in classification_result.get('data', []):
274
+ item_index = item.get('index')
275
+ if item_index is not None and 1 <= item_index <= len(question_ids):
276
+ # The item['index'] is 1-based, so we convert to 0-based
277
+ matched_id = question_ids[item_index - 1]
278
+ new_topic = item.get('chapter_title')
279
+ if new_topic:
280
+ conn.execute('UPDATE neetprep_questions SET topic = ? WHERE id = ?', (new_topic, matched_id))
281
+ update_count_in_batch += 1
282
+
283
+ conn.commit()
284
+ total_classified_count += update_count_in_batch
285
+ print(f"Batch {i+1} complete. Classified {update_count_in_batch} questions.")
286
+
287
+ # Wait before the next batch
288
+ if i < num_batches - 1:
289
+ print("Waiting 6 seconds before next batch...")
290
+ time.sleep(6)
291
+
292
+ except Exception as e:
293
+ print(f"\nAn error occurred during batch {i+1}: {repr(e)}")
294
+ continue
295
+
296
+ conn.close()
297
+ print(f"\nClassification finished. In total, {total_classified_count} questions were updated.")
298
+ return jsonify({'status': f'Classification complete. Updated {total_classified_count} of {total_to_classify} questions.'})
299
+
300
+
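The classify loop above maps each 1-based `index` returned by the model back to the question ID at position `index - 1` in the batch, skipping out-of-range or topic-less items. A small sketch of that mapping, using a hypothetical classifier payload:

```python
def apply_classifications(question_ids, classification_result):
    """Return {question_id: chapter_title} for items whose 1-based index is valid."""
    updates = {}
    for item in classification_result.get('data', []):
        item_index = item.get('index')
        # The model reports 1-based indices; convert to 0-based list positions.
        if item_index is not None and 1 <= item_index <= len(question_ids):
            new_topic = item.get('chapter_title')
            if new_topic:
                updates[question_ids[item_index - 1]] = new_topic
    return updates

result = {'data': [
    {'index': 1, 'chapter_title': 'Waves'},
    {'index': 3, 'chapter_title': 'Wave Optics'},
    {'index': 9, 'chapter_title': 'Out of range'},  # silently skipped
]}
```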
301
+ from rich.table import Table
302
+ from rich.console import Console
303
+
304
+ @neetprep_bp.route('/neetprep/generate', methods=['POST'])
305
+ @login_required
306
+ def generate_neetprep_pdf():
307
+ if request.is_json:
308
+ data = request.json
309
+ else:
310
+ data = request.form
311
+
312
+ pdf_type = data.get('type')
313
+ topics_str = data.get('topics')
314
+ topics = json.loads(topics_str) if topics_str and topics_str != '[]' else []
315
+
316
+ conn = get_db_connection()
317
+ all_questions = []
318
+
319
+ # Only fetch NeetPrep questions if the feature is enabled for the user
320
+ if current_user.neetprep_enabled:
321
+ if pdf_type == 'quiz' and topics:
322
+ placeholders = ', '.join('?' for _ in topics)
323
+ neetprep_questions_from_db = conn.execute(f"SELECT * FROM neetprep_questions WHERE topic IN ({placeholders})", topics).fetchall()
324
+ for q in neetprep_questions_from_db:
325
+ try:
326
+ html_content = f"""<html><head><meta charset="utf-8"></head><body>{q['question_text']}</body></html>"""
327
+ img_path = os.path.join(current_app.config['TEMP_FOLDER'], f"neetprep_{q['id']}.jpg")
328
+ imgkit.from_string(html_content, img_path, options={'width': 800})
329
+ all_questions.append({
330
+ 'image_path': img_path,
331
+ 'details': {'id': q['id'], 'options': json.loads(q['options']), 'correct_answer_index': q['correct_answer_index'], 'user_answer_index': None, 'source': 'neetprep', 'topic': q['topic'], 'subject': q['subject']}
332
+ })
333
+ except Exception as e:
334
+ current_app.logger.error(f"Failed to convert NeetPrep question {q['id']} to image: {e}")
335
+
336
+ elif pdf_type == 'all':
337
+ neetprep_questions_from_db = conn.execute("SELECT * FROM neetprep_questions").fetchall()
338
+ for q in neetprep_questions_from_db:
339
+ all_questions.append({"id": q['id'], "question_text": q['question_text'], "options": json.loads(q['options']), "correct_answer_index": q['correct_answer_index'], "user_answer_index": None, "status": "wrong", "source": "neetprep", "custom_fields": {"difficulty": q['level'], "topic": q['topic'], "subject": q['subject']}})
340
+
341
+ elif pdf_type == 'selected' and topics:
342
+ placeholders = ', '.join('?' for _ in topics)
343
+ neetprep_questions_from_db = conn.execute(f"SELECT * FROM neetprep_questions WHERE topic IN ({placeholders})", topics).fetchall()
344
+ for q in neetprep_questions_from_db:
345
+ all_questions.append({"id": q['id'], "question_text": q['question_text'], "options": json.loads(q['options']), "correct_answer_index": q['correct_answer_index'], "user_answer_index": None, "status": "wrong", "source": "neetprep", "custom_fields": {"difficulty": q['level'], "topic": q['topic'], "subject": q['subject']}})
346
+
347
+ # Always fetch the user's own classified questions if topics are selected or if it's a quiz
348
+ if topics or pdf_type == 'quiz':
349
+ # If no topics were selected, this block must not fall through to fetching everything.
350
+ if not topics:
351
+ # Topics are mandatory for both 'quiz' and 'selected'.
352
+ if pdf_type in ['quiz', 'selected']:
353
+ conn.close()
354
+ return jsonify({'error': 'No topics selected.'}), 400
355
+ else:
356
+ placeholders = ', '.join('?' for _ in topics)
357
+ classified_questions_from_db = conn.execute(f"""
358
+ SELECT q.* FROM questions q JOIN sessions s ON q.session_id = s.id
359
+ WHERE q.chapter IN ({placeholders}) AND s.user_id = ?
360
+ """, (*topics, current_user.id)).fetchall()
361
+ for q in classified_questions_from_db:
362
+ image_info = conn.execute("SELECT processed_filename FROM images WHERE id = ?", (q['image_id'],)).fetchone()
363
+ if image_info and image_info['processed_filename']:
364
+ if pdf_type == 'quiz':
365
+ all_questions.append({'image_path': os.path.join(current_app.config['PROCESSED_FOLDER'], image_info['processed_filename']),'details': {'id': q['id'], 'options': [], 'correct_answer_index': q['actual_solution'], 'user_answer_index': q['marked_solution'], 'source': 'classified', 'topic': q['chapter'], 'subject': q['subject']}})
366
+ else:
367
+ all_questions.append({"id": q['id'], "question_text": f"<img src=\"{os.path.join(current_app.config['PROCESSED_FOLDER'], image_info['processed_filename'])}\" />", "options": [], "correct_answer_index": q['actual_solution'], "user_answer_index": q['marked_solution'], "status": q['status'], "source": "classified", "custom_fields": {"subject": q['subject'], "chapter": q['chapter'], "question_number": q['question_number']}})
368
+
369
+ # For 'all' type, also include user's classified questions
370
+ if pdf_type == 'all':
371
+ classified_questions_from_db = conn.execute("""
372
+ SELECT q.* FROM questions q JOIN sessions s ON q.session_id = s.id
373
+ WHERE s.user_id = ? AND q.subject IS NOT NULL AND q.chapter IS NOT NULL
374
+ """, (current_user.id,)).fetchall()
375
+ for q in classified_questions_from_db:
376
+ image_info = conn.execute("SELECT processed_filename FROM images WHERE id = ?", (q['image_id'],)).fetchone()
377
+ if image_info and image_info['processed_filename']:
378
+ all_questions.append({"id": q['id'], "question_text": f"<img src=\"{os.path.join(current_app.config['PROCESSED_FOLDER'], image_info['processed_filename'])}\" />", "options": [], "correct_answer_index": q['actual_solution'], "user_answer_index": q['marked_solution'], "status": q['status'], "source": "classified", "custom_fields": {"subject": q['subject'], "chapter": q['chapter'], "question_number": q['question_number']}})
379
+
380
+ conn.close()
381
+
382
+ if not all_questions:
383
+ return jsonify({'error': 'No questions found for the selected criteria.'}), 404
384
+
385
+ if pdf_type == 'quiz':
386
+ return render_template('quiz_v2.html', questions=all_questions)
387
+
388
+ test_name = "All Incorrect Questions"
389
+ if pdf_type == 'selected':
390
+ test_name = f"Incorrect Questions - {', '.join(topics)}"
391
+
392
+ final_json_output = {
393
+ "version": "2.1", "test_name": test_name,
394
+ "config": { "font_size": 22, "auto_generate_pdf": False, "layout": data.get('layout', {}) },
395
+ "metadata": { "source_book": "NeetPrep & Classified", "student_id": USER_ID, "tags": ", ".join(topics) },
396
+ "questions": all_questions, "view": True
397
+ }
398
+
399
+ try:
400
+ result, status_code = _process_json_and_generate_pdf(final_json_output, current_user.id)
401
+ if status_code != 200:
402
+ return jsonify(result), status_code
403
+
404
+ if result.get('success'):
405
+ return jsonify({'success': True, 'pdf_url': result.get('view_url')})
406
+ else:
407
+ return jsonify({'error': result.get('error', 'Failed to generate PDF via internal call.')}), 500
408
+ except Exception as e:
409
+ current_app.logger.error(f"Failed to call _process_json_and_generate_pdf: {repr(e)}")
410
+ return jsonify({'error': str(e)}), 500
411
+
412
+ @neetprep_bp.route('/neetprep/edit')
413
+ @login_required
414
+ def edit_neetprep_questions():
415
+ """Renders the page for editing NeetPrep questions."""
416
+ conn = get_db_connection()
417
+ topics = conn.execute('SELECT DISTINCT topic FROM neetprep_questions ORDER BY topic').fetchall()
418
+ questions = conn.execute('SELECT id, question_text, topic, subject FROM neetprep_questions ORDER BY id').fetchall()
419
+
420
+ questions_plain = []
421
+ for q in questions:
422
+ q_dict = dict(q)
423
+ soup = BeautifulSoup(q_dict['question_text'], 'html.parser')
424
+ plain_text = soup.get_text(strip=True)
425
+ q_dict['question_text_plain'] = (plain_text[:100] + '...') if len(plain_text) > 100 else plain_text
426
+ questions_plain.append(q_dict)
427
+
428
+ conn.close()
429
+ return render_template('neetprep_edit.html', questions=questions_plain, topics=[t['topic'] for t in topics])
430
+
431
+ @neetprep_bp.route('/neetprep/update_question/<question_id>', methods=['POST'])
432
+ @login_required
433
+ def update_neetprep_question(question_id):
434
+ """Handles updating a question's metadata."""
435
+ # This route modifies global neetprep data. In a real multi-user app,
436
+ # this should be restricted to admin users. For now, @login_required is a basic protection.
437
+ data = request.json
438
+ new_topic = data.get('topic')
439
+ new_subject = data.get('subject')
440
+
441
+ if not new_topic or not new_subject:
442
+ return jsonify({'error': 'Topic and Subject cannot be empty.'}), 400
443
+
444
+ try:
445
+ conn = get_db_connection()
446
+ conn.execute(
447
+ 'UPDATE neetprep_questions SET topic = ?, subject = ? WHERE id = ?',
448
+ (new_topic, new_subject, question_id)
449
+ )
450
+ conn.commit()
451
+ conn.close()
452
+ return jsonify({'success': True})
453
+ except Exception as e:
454
+ current_app.logger.error(f"Error updating question {question_id}: {repr(e)}")
455
+ return jsonify({'error': str(e)}), 500
456
+
457
+ def run_hardcoded_query(query_template, **kwargs):
458
+ """Helper function to run a GraphQL query."""
459
+ final_query = query_template.format(**kwargs)
460
+ payload = {'query': final_query, 'variables': {}}
461
+ try:
462
+ response = requests.post(ENDPOINT_URL, headers=HEADERS, json=payload, timeout=30)
463
+ response.raise_for_status()
464
+ return response.json()
465
+ except requests.exceptions.RequestException as e:
466
+ current_app.logger.error(f"NeetPrep API Request Error: {repr(e)}")
467
+ return None
nova_classifier.py ADDED
@@ -0,0 +1,296 @@
1
+ import os
2
+ import json
3
+ import requests
4
+ import sys
5
+ from typing import List, Optional, Dict, Any
6
+ from api_key_manager import get_api_key_manager
7
+
8
+ def classify_questions_with_nova(questions: List[str], start_index: int = 0) -> Optional[Dict[Any, Any]]:
9
+ """
10
+ Classifies a single batch of questions using Amazon Nova via OpenRouter API.
11
+ `questions` should be a list of strings representing one batch.
12
+ `start_index` is the overall starting index for this batch (e.g., 0, 7, 14...).
13
+ """
14
+ # Get API key from the manager
15
+ manager = get_api_key_manager()
16
+ api_key, key_index = manager.get_key('openrouter')
17
+
18
+ if not api_key:
19
+ raise ValueError("No available OpenRouter API keys. Please set OPENROUTER_API_KEY environment variable.")
20
+
21
+ # Construct the input text with the current batch of questions
22
+ # The model expects 1-based indexing in the prompt.
23
+ input_text = "\n".join([f"{j + start_index + 1}. {q}" for j, q in enumerate(questions)])
24
+
25
+ prompt = f"""
26
+ **System Role:** You are a question classifier for NEET/JEE exams, specialized in mapping questions to their corresponding subjects and chapters from the NCERT syllabus.
27
+
28
+ Your task is to analyze each question, first classify it into the most relevant subject, and then identify the most relevant chapter(s) from the official syllabus structures provided below.
29
+
30
+ **Available Subjects (Use these exact titles):**
31
+ - Biology
32
+ - Chemistry
33
+ - Physics
34
+ - Mathematics
35
+
36
+ **Syllabus Chapters (Use these exact titles for the respective subjects):**
37
+
38
+ ---
39
+ **1. BIOLOGY (Common for NEET & JEE)**
40
+
41
+ **Class XI**
42
+ 1. The Living World
43
+ 2. Biological Classification
44
+ 3. Plant Kingdom
45
+ 4. Animal Kingdom
46
+ 5. Morphology of Flowering Plants
47
+ 6. Anatomy of Flowering Plants
48
+ 7. Structural Organisation in Animals
49
+ 8. Cell: The Unit of Life
50
+ 9. Biomolecules
51
+ 10. Cell Cycle and Cell Division
52
+ 11. Photosynthesis in Higher Plants
53
+ 12. Respiration in Plants
54
+ 13. Plant Growth and Development
55
+ 14. Breathing and Exchange of Gases
56
+ 15. Body Fluids and Circulation
57
+ 16. Excretory Products and their Elimination
58
+ 17. Locomotion and Movement
59
+ 18. Neural Control and Coordination
60
+ 19. Chemical Coordination and Integration
61
+ 20. Sexual Reproduction in Flowering Plants
62
+ 21. Human Reproduction
63
+ 22. Reproductive Health
64
+ 23. Principles of Inheritance and Variation
65
+ 24. Molecular Basis of Inheritance
66
+ 25. Evolution
67
+ 26. Health and Disease
68
+ 27. Improvement in Food Production
69
+ 28. Microbes in Human Welfare
70
+ 29. Biotechnology - Principles and Processes
71
+ 30. Biotechnology and Its Applications
72
+ 31. Organisms and Populations
73
+ 32. Ecosystem
74
+ 33. Biodiversity and Its Conservation
75
+
76
+ ---
77
+ **2. CHEMISTRY (Common for NEET & JEE)**
78
+
79
+ **Class XI**
80
+ 1. Some Basic Concepts of Chemistry
81
+ 2. Structure of Atom
82
+ 3. Classification of Elements and Periodicity in Properties
83
+ 4. Chemical Bonding and Molecular Structure
84
+ 5. States of Matter: Gases and Liquids
85
+ 6. Thermodynamics
86
+ 7. Equilibrium
87
+ 8. Redox Reactions
88
+ 9. Hydrogen
89
+ 10. The s-Block Elements
90
+ 11. The p-Block Elements (Group 13 and 14)
91
+ 12. Organic Chemistry – Some Basic Principles and Techniques (GOC)
92
+ 13. Hydrocarbons
93
+ 14. Environmental Chemistry
94
+
95
+ **Class XII**
96
+ 1. The Solid State
97
+ 2. Solutions
98
+ 3. Electrochemistry
99
+ 4. Chemical Kinetics
100
+ 5. Surface Chemistry
101
+ 6. General Principles and Processes of Isolation of Elements (Metallurgy)
102
+ 7. The p-Block Elements (Group 15 to 18)
103
+ 8. The d- and f- Block Elements
104
+ 9. Coordination Compounds
105
+ 10. Haloalkanes and Haloarenes
106
+ 11. Alcohols, Phenols and Ethers
107
+ 12. Aldehydes, Ketones and Carboxylic Acids
108
+ 13. Amines
109
+ 14. Biomolecules
110
+ 15. Polymers
111
+ 16. Chemistry in Everyday Life
112
+
113
+ ---
114
+ **3. PHYSICS (Common for NEET & JEE)**
115
+
116
+ **Class XI**
117
+ 1. Units and Measurements
118
+ 2. Motion in a Straight Line
119
+ 3. Motion in a Plane
120
+ 4. Laws of Motion
121
+ 5. Work, Energy and Power
122
+ 6. System of Particles and Rotational Motion
123
+ 7. Gravitation
124
+ 8. Mechanical Properties of Solids
125
+ 9. Mechanical Properties of Fluids
126
+ 10. Thermal Properties of Matter
127
+ 11. Thermodynamics
128
+ 12. Kinetic Theory
129
+ 13. Oscillations
130
+ 14. Waves
131
+
132
+ **Class XII**
133
+ 1. Electric Charges and Fields
134
+ 2. Electrostatic Potential and Capacitance
135
+ 3. Current Electricity
136
+ 4. Moving Charges and Magnetism
137
+ 5. Magnetism and Matter
138
+ 6. Electromagnetic Induction
139
+ 7. Alternating Current
140
+ 8. Electromagnetic Waves
141
+ 9. Ray Optics and Optical Instruments
142
+ 10. Wave Optics
143
+ 11. Dual Nature of Radiation and Matter
144
+ 12. Atoms
145
+ 13. Nuclei
146
+ 14. Semiconductor Electronics: Materials, Devices and Simple Circuits
147
+ 15. Communication Systems
148
+
149
+ ---
150
+ **4. MATHEMATICS (For JEE Only)**
151
+
152
+ **Class XI**
153
+ 1. Sets
154
+ 2. Relations and Functions
155
+ 3. Trigonometric Functions
156
+ 4. Principle of Mathematical Induction
157
+ 5. Complex Numbers and Quadratic Equations
158
+ 6. Linear Inequalities
159
+ 7. Permutations and Combinations
160
+ 8. Binomial Theorem
161
+ 9. Sequences and Series
162
+ 10. Straight Lines
163
+ 11. Conic Sections
164
+ 12. Introduction to Three Dimensional Geometry
165
+ 13. Limits and Derivatives
166
+ 14. Mathematical Reasoning
167
+ 15. Statistics
168
+ 16. Probability
169
+
170
+ **Class XII**
171
+ 1. Relations and Functions
172
+ 2. Inverse Trigonometric Functions
173
+ 3. Matrices
174
+ 4. Determinants
175
+ 5. Continuity and Differentiability
176
+ 6. Application of Derivatives
177
+ 7. Integrals
178
+ 8. Application of Integrals
179
+ 9. Differential Equations
180
+ 10. Vector Algebra
181
+ 11. Three Dimensional Geometry
182
+ 12. Linear Programming
183
+ 13. Probability
184
+
185
+ ---
186
+
187
+ **Classification Guidelines:**
188
+
189
+ 1. **Primary Classification**: Identify the single most relevant subject, and then the most relevant chapter(s) within that subject, that directly addresses the question's core concept.
190
+ 2. **Multi-Chapter Questions**: If a question explicitly spans 2-3 distinct chapters, include all relevant chapters.
191
+ 3. **Confidence Scoring** (0.0 to 1.0):
192
+ * **1.0**: Perfect match
193
+ * **0.8-0.9**: Strong match
194
+ * **0.5-0.7**: Moderate match
195
+ * **Below 0.5**: Avoid unless unavoidable.
196
+ 4. **Non-Syllabus Questions**: If a question is not from any of the provided subjects/chapters, set `subject` to 'Unclassified' and `chapter_title` to 'Unclassified'.
197
+
198
+ **Critical Requirements:**
199
+
200
+ - Use ONLY the subject titles exactly as listed above, or 'Unclassified'.
201
+ - Use ONLY the chapter titles exactly as listed above, or 'Unclassified'.
202
+ - Preserve the original question text completely.
203
+ - Output ONLY valid JSON.
204
+ - The "index" field MUST match the question number shown in the input (e.g., if the question is numbered "8.", then "index": 8).
205
+
206
+ **Output JSON Schema:**
207
+
208
+ ```json
209
+ {{
210
+ "data": [
211
+ {{
212
+ "index": 1,
213
+ "subject": "<exact subject title from list or 'Unclassified'>",
214
+ "chapter_index": <chapter number or 0>,
215
+ "chapter_title": "<exact chapter title from list or 'Unclassified'>",
216
+ "original_question_text": "<complete original question with all formatting>",
217
+ "confidence": <0.0 to 1.0>
218
+ }}
219
+ ],
220
+ "success": [true]
221
+ }}
222
+ ```
223
+
224
+ Now classify the following question(s):
225
+ ```
226
+ {input_text}
227
+ ```
228
+
229
+ Output ONLY the JSON response, nothing else.
230
+ """
231
+
232
+ url = "https://openrouter.ai/api/v1/chat/completions"
233
+ headers = {
234
+ "Authorization": f"Bearer {api_key}",
235
+ "Content-Type": "application/json",
236
+ }
237
+
238
+ request_body = {
239
+ "model": "amazon/nova-2-lite-v1:free",
240
+ "messages": [
241
+ {"role": "user", "content": prompt}
242
+ ],
243
+ }
244
+
245
+ print(f"Sending batch to Nova API with {len(questions)} questions.")
246
+ print(f"Sending request to Nova API. Body: {json.dumps(request_body, indent=2)}") # Full logging enabled
247
+
248
+ try:
249
+ response = requests.post(url, headers=headers, json=request_body, timeout=300)
250
+ response.raise_for_status()
251
+
252
+ print(f"Received raw response from Nova: {response.text}") # Full logging enabled
253
+
254
+ # Parse the response JSON
255
+ response_json = response.json()
256
+
257
+ # Extract the content from Nova's response
258
+ choices = response_json.get('choices', [])
259
+ if not choices:
260
+ print("Error: Nova API returned no choices.")
261
+ return None
262
+
263
+ content = choices[0].get('message', {}).get('content', '')
264
+ if not content:
265
+ print("Error: Nova API returned empty content.")
266
+ return None
267
+
268
+ # Nova often wraps JSON in markdown code blocks, so we need to extract it
269
+ content = content.strip()
270
+
271
+ # Remove markdown code block markers if present
272
+ if content.startswith('```json'):
273
+ content = content[7:] # Remove ```json
274
+ elif content.startswith('```'):
275
+ content = content[3:] # Remove ```
276
+
277
+ if content.endswith('```'):
278
+ content = content[:-3] # Remove closing ```
279
+
280
+ content = content.strip()
281
+
282
+ # Parse the JSON from the content
283
+ batch_result = json.loads(content)
284
+ manager.mark_success('openrouter', key_index)
285
+ return batch_result
286
+
287
+ except requests.exceptions.RequestException as e:
288
+ print(f"Error during Nova API call: {repr(e)}", file=sys.stderr)
289
+ print(f"Response body: {e.response.text if e.response else 'N/A'}", file=sys.stderr)
290
+ manager.mark_failure('openrouter', key_index)
291
+ return None
292
+ except (json.JSONDecodeError, KeyError, IndexError) as e:
293
+ print(f"Error parsing Nova response: {repr(e)}", file=sys.stderr)
294
+ print(f"Raw response text: {response.text if 'response' in locals() else 'N/A'}", file=sys.stderr)
295
+ manager.mark_failure('openrouter', key_index)
296
+ return None
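The fence-stripping steps above (strip whitespace, drop a leading json or plain code-fence marker, drop the trailing fence, then parse) can be isolated into one helper; this is a sketch of the same logic:

```python
import json

def extract_json_payload(content):
    """Strip optional Markdown code-fence markers and parse the JSON inside."""
    content = content.strip()
    if content.startswith('```json'):
        content = content[7:]   # drop the ```json marker
    elif content.startswith('```'):
        content = content[3:]   # drop a bare ``` marker
    if content.endswith('```'):
        content = content[:-3]  # drop the closing fence
    return json.loads(content.strip())
```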
pre-migration-report.md ADDED
@@ -0,0 +1,368 @@
1
+ # Pre-Migration Report: Single-User to Multi-User Architecture
2
+
3
+ This document outlines the necessary changes to migrate the DocuPDF application from a single-user to a multi-user architecture. The migration is designed to be completed in phases, ensuring that existing data is preserved and correctly associated with the primary user.
4
+
5
+ ---
6
+
7
+ ## Phase 1: User Authentication Foundation
8
+
9
+ This phase introduces the core concepts of users and authentication.
10
+
11
+ ### 1.1 New `users` Table
12
+
13
+ A new table will be created to store user credentials.
14
+
15
+ ```sql
16
+ -- file: database.py (addition)
17
+ CREATE TABLE IF NOT EXISTS users (
18
+ id INTEGER PRIMARY KEY AUTOINCREMENT,
19
+ username TEXT NOT NULL UNIQUE,
20
+ email TEXT NOT NULL UNIQUE,
21
+ password_hash TEXT NOT NULL,
22
+ created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
23
+ );
24
+ ```
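The schema above can be exercised directly with the standard `sqlite3` module. This sketch (in-memory database, not the application's `get_db_connection`) shows the `UNIQUE` constraint rejecting a duplicate username:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('''
    CREATE TABLE IF NOT EXISTS users (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        username TEXT NOT NULL UNIQUE,
        email TEXT NOT NULL UNIQUE,
        password_hash TEXT NOT NULL,
        created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
''')
conn.execute('INSERT INTO users (username, email, password_hash) VALUES (?, ?, ?)',
             ('alice', 'alice@example.com', 'hash1'))
try:
    # Same username, different email: the UNIQUE(username) constraint fires.
    conn.execute('INSERT INTO users (username, email, password_hash) VALUES (?, ?, ?)',
                 ('alice', 'other@example.com', 'hash2'))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```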
25
+
26
+ ### 1.2 New File: `user_manager.py`
27
+
28
+ A new file will handle user session management, password hashing, and provide the user model required by Flask-Login.
29
+
30
+ ```python
31
+ # file: user_manager.py (new file)
32
+ from flask_login import LoginManager, UserMixin
33
+ from werkzeug.security import generate_password_hash, check_password_hash
34
+ from utils import get_db_connection
35
+
36
+ class User(UserMixin):
37
+ def __init__(self, id, username, email, password_hash):
38
+ self.id = id
39
+ self.username = username
40
+ self.email = email
41
+ self.password_hash = password_hash
42
+
43
+ @staticmethod
44
+ def get(user_id):
45
+ conn = get_db_connection()
46
+ user_row = conn.execute('SELECT * FROM users WHERE id = ?', (user_id,)).fetchone()
47
+ conn.close()
48
+ if user_row:
49
+ return User(user_row['id'], user_row['username'], user_row['email'], user_row['password_hash'])
50
+ return None
51
+
52
+ @staticmethod
53
+ def get_by_username(username):
54
+ conn = get_db_connection()
55
+ user_row = conn.execute('SELECT * FROM users WHERE username = ?', (username,)).fetchone()
56
+ conn.close()
57
+ if user_row:
58
+ return User(user_row['id'], user_row['username'], user_row['email'], user_row['password_hash'])
59
+ return None
60
+
61
+ def setup_login_manager(app):
62
+ login_manager = LoginManager()
63
+ login_manager.init_app(app)
64
+ login_manager.login_view = 'user_auth.login' # New blueprint for user auth
65
+
66
+ @login_manager.user_loader
67
+ def load_user(user_id):
68
+ return User.get(user_id)
69
+
70
+ # (Additional functions for creating users, etc. will be added here)
71
+ ```
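The plan relies on werkzeug's `generate_password_hash` / `check_password_hash`. The salted-hash idea those helpers implement can be sketched with the standard library alone (PBKDF2 here is an illustration, not werkzeug's exact storage format):

```python
import hashlib
import hmac
import os

def hash_password(password, iterations=100_000):
    # A random per-password salt means identical passwords hash differently.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, iterations)
    return f'{iterations}${salt.hex()}${digest.hex()}'

def verify_password(stored, password):
    iterations, salt_hex, digest_hex = stored.split('$')
    digest = hashlib.pbkdf2_hmac('sha256', password.encode(),
                                 bytes.fromhex(salt_hex), int(iterations))
    # Constant-time comparison to avoid timing leaks.
    return hmac.compare_digest(digest.hex(), digest_hex)
```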
72
+
73
+ ### 1.3 Application Setup (`app.py`)
74
+
75
+ The main `app.py` will be updated to initialize the `LoginManager` and register the new authentication blueprint.
76
+
77
+ ```python
78
+ # file: app.py (changes)
79
+ # Current
80
+ def create_app():
81
+ app = Flask(__name__)
82
+ # ...
83
+ # Register Blueprints
84
+ from routes import main_bp
85
+ # ...
86
+ app.register_blueprint(main_bp)
87
+ return app
88
+
89
+ # After
90
+ from flask_login import LoginManager
91
+
92
+ def create_app():
93
+ app = Flask(__name__)
94
+ app.config['SECRET_KEY'] = os.urandom(24) # Important for session security; use a fixed value in production or sessions reset on every restart
95
+ # ...
96
+
97
+ # Setup Login Manager
98
+ from user_manager import setup_login_manager
99
+ setup_login_manager(app)
100
+
101
+ # Register Blueprints
102
+ from routes import main_bp
103
+ from user_auth_routes import auth_bp # New blueprint for login/register
104
+ # ...
105
+ app.register_blueprint(main_bp)
106
+ app.register_blueprint(auth_bp)
107
+ return app
108
+ ```
109
+
110
+ ### 1.4 New Templates: `login.html` and `register.html`
111
+
112
+ New HTML templates will be created for the user login and registration forms. These will be standard forms with fields for username, password, and email.
113
+
114
+ ---
115
+
116
+ ## Phase 2: Database and Data Segregation
117
+
118
+ This phase links all application data to specific users.
119
+
120
+ ### 2.1 Database Schema Changes
121
+
122
+ The following tables will be altered to include a `user_id` foreign key.
123
+
124
+ ```sql
125
+ -- file: database.py (migrations)
126
+
127
+ -- Add user_id to sessions
128
+ ALTER TABLE sessions ADD COLUMN user_id INTEGER REFERENCES users(id);
129
+
130
+ -- Add user_id to generated_pdfs
131
+ ALTER TABLE generated_pdfs ADD COLUMN user_id INTEGER REFERENCES users(id);
132
+
133
+ -- Add user_id to folders
134
+ ALTER TABLE folders ADD COLUMN user_id INTEGER REFERENCES users(id);
135
+ ```
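SQLite's `ALTER TABLE ... ADD COLUMN` fails if the column already exists, so the migration code should check `PRAGMA table_info` first to stay idempotent. A hedged sketch of that guard (table and column names follow the report; the in-memory database is illustrative):

```python
import sqlite3

def add_column_if_missing(conn, table, column_ddl):
    """Apply ALTER TABLE ... ADD COLUMN only when the column is absent."""
    column_name = column_ddl.split()[0]
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f'PRAGMA table_info({table})')}
    if column_name not in existing:
        conn.execute(f'ALTER TABLE {table} ADD COLUMN {column_ddl}')

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (id INTEGER PRIMARY KEY)')
conn.execute('CREATE TABLE sessions (id TEXT PRIMARY KEY)')
add_column_if_missing(conn, 'sessions', 'user_id INTEGER REFERENCES users(id)')
# Running the migration a second time is a harmless no-op.
add_column_if_missing(conn, 'sessions', 'user_id INTEGER REFERENCES users(id)')
```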
136
+
137
+ ### 2.2 Backend Logic Changes (Code-by-Code)
138
+
139
+ All routes and functions that interact with user-specific data must be updated. This will be done by using the `current_user` object provided by Flask-Login after a user logs in. All routes will also be protected with the `@login_required` decorator.
140
+
141
+ #### **`routes.py`**
142
+
143
+ ##### `v2_upload()`
144
+
145
+ **Current:**
146
+ ```python
147
+ @main_bp.route('/v2/upload', methods=['POST'])
148
+ def v2_upload():
149
+ session_id = str(uuid.uuid4())
150
+ # ...
151
+ conn = get_db_connection()
152
+ conn.execute('INSERT INTO sessions (id, original_filename, name) VALUES (?, ?, ?)', (session_id, original_filename, original_filename))
153
+ # ...
154
+ ```
155
+
156
+ **After:**
157
+ ```python
158
+ from flask_login import login_required, current_user
159
+
160
+ @main_bp.route('/v2/upload', methods=['POST'])
161
+ @login_required
162
+ def v2_upload():
163
+ session_id = str(uuid.uuid4())
164
+ # ...
165
+ conn = get_db_connection()
166
+ conn.execute('INSERT INTO sessions (id, original_filename, name, user_id) VALUES (?, ?, ?, ?)',
167
+ (session_id, original_filename, original_filename, current_user.id))
168
+ # ...
169
+ ```
170
+
171
+ ##### `question_entry_v2(session_id)`
172
+
173
+ **Current:**
174
+ ```python
175
+ @main_bp.route('/question_entry_v2/<session_id>')
176
+ def question_entry_v2(session_id):
177
+ conn = get_db_connection()
178
+ session_data = conn.execute(
179
+ 'SELECT original_filename, subject, tags, notes FROM sessions WHERE id = ?', (session_id,)
180
+ ).fetchone()
181
+ #...
182
+ ```
183
+
184
+ **After:**
185
+ ```python
186
+ from flask_login import login_required, current_user
187
+
188
+ @main_bp.route('/question_entry_v2/<session_id>')
189
+ @login_required
190
+ def question_entry_v2(session_id):
191
+ conn = get_db_connection()
192
+ # Add user_id check to prevent unauthorized access
193
+ session_data = conn.execute(
194
+ 'SELECT original_filename, subject, tags, notes FROM sessions WHERE id = ? AND user_id = ?',
195
+ (session_id, current_user.id)
196
+ ).fetchone()
197
+ if not session_data:
198
+ return "Unauthorized", 403
199
+ #...
200
+ ```
201
+ *(Note: This pattern of adding `@login_required` and `AND user_id = ?` to queries will be repeated for almost every route in `routes.py`, `dashboard.py`, `json_processor.py`, etc. The examples above illustrate the core change.)*
202
+
203
+ #### **`dashboard.py`**
204
+
205
+ ##### `dashboard()`
206
+
207
+ **Current:**
208
+ ```python
209
+ @dashboard_bp.route('/dashboard')
210
+ def dashboard():
211
+ conn = get_db_connection()
212
+ sessions_rows = conn.execute("""
213
+ SELECT s.id, ...
214
+ FROM sessions s
215
+ ...
216
+ """).fetchall()
217
+ #...
218
+ ```
219
+
220
+ **After:**
221
+ ```python
222
+ from flask_login import login_required, current_user
223
+
224
+ @dashboard_bp.route('/dashboard')
225
+ @login_required
226
+ def dashboard():
227
+ conn = get_db_connection()
228
+ sessions_rows = conn.execute("""
229
+ SELECT s.id, ...
230
+ FROM sessions s
231
+ LEFT JOIN images i ON s.id = i.session_id
232
+ WHERE s.user_id = ?
233
+ GROUP BY s.id, ...
234
+ ORDER BY s.created_at DESC
235
+ """, (current_user.id,)).fetchall()
236
+ #...
237
+ ```
238
+
239
+ ---
240
+
241
+ ## Phase 3: Security and UI
242
+
243
+ This phase focuses on the user-facing elements and securing file access.
244
+
245
+ ### 3.1 UI Navigation (`_nav_links.html`)
246
+
247
+ The navigation links will be updated to show context-aware links for login, registration, and logout.
248
+
249
+ **Current:**
250
+ ```html
251
+ <!-- file: templates/_nav_links.html -->
252
+ <div class="navbar-nav ms-auto">
253
+ <a class="nav-link" href="/dashboard">...</a>
254
+ <a class="nav-link" href="{{ url_for('neetprep_bp.index') }}">...</a>
255
+ ...
256
+ </div>
257
+ ```
258
+
259
+ **After:**
260
+ ```html
261
+ <!-- file: templates/_nav_links.html -->
262
+ <div class="navbar-nav ms-auto">
263
+ {% if current_user.is_authenticated %}
264
+ <li class="nav-item">
265
+ <span class="navbar-text">Welcome, {{ current_user.username }}</span>
266
+ </li>
267
+ <a class="nav-link" href="/dashboard">...</a>
268
+ <a class="nav-link" href="{{ url_for('neetprep_bp.index') }}">...</a>
269
+ ...
270
+ <a class="nav-link" href="{{ url_for('user_auth.logout') }}">
271
+ <i class="bi bi-box-arrow-right me-1"></i> Logout
272
+ </a>
273
+ {% else %}
274
+ <a class="nav-link" href="{{ url_for('user_auth.login') }}">
275
+ <i class="bi bi-box-arrow-in-right me-1"></i> Login
276
+ </a>
277
+ <a class="nav-link" href="{{ url_for('user_auth.register') }}">
278
+ <i class="bi bi-person-plus-fill me-1"></i> Register
279
+ </a>
280
+ {% endif %}
281
+ </div>
282
+ ```
283
+
284
+ ### 3.2 Secure File Access (`routes.py`)
285
+
286
+ Routes that serve files directly must check for ownership before sending the file.
287
+
288
+ ##### `download_file(filename)`
289
+
290
+ **Current:**
291
+ ```python
292
+ # file: routes.py
293
+ @main_bp.route('/download/<filename>')
294
+ def download_file(filename):
295
+ return send_file(os.path.join(current_app.config['OUTPUT_FOLDER'], filename), as_attachment=True)
296
+ ```
297
+
298
+ **After:**
299
+ ```python
300
+ # file: routes.py
301
+ from flask_login import login_required, current_user
302
+
303
+ @main_bp.route('/download/<filename>')
304
+ @login_required
305
+ def download_file(filename):
306
+ conn = get_db_connection()
307
+ # Check if the requested file belongs to the current user
308
+ pdf_owner = conn.execute(
309
+ 'SELECT user_id FROM generated_pdfs WHERE filename = ?', (filename,)
310
+ ).fetchone()
311
+ conn.close()
312
+
313
+ if pdf_owner and pdf_owner['user_id'] == current_user.id:
314
+ return send_file(os.path.join(current_app.config['OUTPUT_FOLDER'], filename), as_attachment=True)
315
+ else:
316
+ return "Unauthorized", 403
317
+ ```
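One caveat: `filename` comes straight from the URL, so even with the ownership check it is safer to reject any path components before joining it onto `OUTPUT_FOLDER`. A minimal stdlib-only sketch of such a guard (the helper name and its exact placement in `download_file` are illustrative, not part of the codebase):

```python
import os

def is_safe_filename(filename: str) -> bool:
    """Reject path traversal: accept only a bare basename.

    A name survives only if it equals its own basename (no slashes),
    is not empty or a dot entry, and contains no backslashes.
    """
    return (
        filename == os.path.basename(filename)
        and filename not in ("", ".", "..")
        and "\\" not in filename
    )

ok = is_safe_filename("report.pdf")
bad = is_safe_filename("../../etc/passwd")
```

Flask's `send_from_directory` performs a similar check internally, so switching `send_file` to `send_from_directory` would be an equivalent fix.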
318
+
319
+ ---
320
+
321
+ ## Data Migration Script (Conceptual)
322
+
323
+ A one-time script will be created to migrate the existing data.
324
+
325
+ ```python
326
+ # file: migrate_to_multiuser.py (conceptual)
327
+ import sqlite3
328
+ from werkzeug.security import generate_password_hash
329
+
330
+ def migrate():
331
+ conn = sqlite3.connect('database.db')
332
+ cursor = conn.cursor()
333
+
334
+ # 1. Create a default user (credentials should be provided securely)
335
+ default_username = 'admin' # Or your preferred username
336
+ default_password = 'your_secure_password'
337
+ password_hash = generate_password_hash(default_password)
338
+
339
+ try:
340
+ cursor.execute(
341
+ "INSERT INTO users (username, email, password_hash) VALUES (?, ?, ?)",
342
+ (default_username, 'admin@local.host', password_hash)
343
+ )
344
+ user_id = cursor.lastrowid
345
+ print(f"Created default user '{default_username}' with ID {user_id}")
346
+ except sqlite3.IntegrityError:
347
+ print("Default user already exists.")
348
+ user_id = cursor.execute("SELECT id FROM users WHERE username = ?", (default_username,)).fetchone()[0]
349
+
350
+ # 2. Add user_id columns (This should be done via ALTER TABLE statements first)
351
+ # ...
352
+
353
+ # 3. Assign all existing data to the default user
354
+ tables_to_update = ['sessions', 'generated_pdfs', 'folders']
355
+ for table in tables_to_update:
356
+ try:
357
+ cursor.execute(f"UPDATE {table} SET user_id = ? WHERE user_id IS NULL", (user_id,))
358
+ print(f"Assigned {cursor.rowcount} records in '{table}' to user {user_id}")
359
+ except sqlite3.OperationalError as e:
360
+ print(f"Could not update table {table}. Maybe user_id column doesn't exist? Error: {e}")
361
+
362
+ conn.commit()
363
+ conn.close()
364
+ print("Data migration complete.")
365
+
366
+ if __name__ == '__main__':
367
+ migrate()
368
+ ```
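Step 2 of the migration script is left as a comment; in SQLite the column additions would look roughly like the sketch below. It is guarded with `PRAGMA table_info`, since `ALTER TABLE ... ADD COLUMN` fails if the column already exists (helper name is illustrative):

```python
import sqlite3

def add_user_id_column(conn, table):
    """Add a nullable user_id column if the table does not have one yet."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if "user_id" not in cols:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN user_id INTEGER")

# Demonstration with an in-memory database:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (id TEXT PRIMARY KEY)")
add_user_id_column(conn, "sessions")
add_user_id_column(conn, "sessions")  # idempotent: second call is a no-op
cols = [row[1] for row in conn.execute("PRAGMA table_info(sessions)")]
```

Running this for each table in `tables_to_update` before step 3 makes the `UPDATE ... WHERE user_id IS NULL` assignments succeed.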
processing.py ADDED
@@ -0,0 +1,305 @@
1
+
2
+ import os
3
+ import base64
4
+ import io
5
+ import re
6
+ import json
7
+ import requests
8
+ import cv2
9
+ import numpy as np
10
+ from PIL import Image
11
+ from flask import current_app
12
+ from api_key_manager import get_api_key_manager
13
+
14
+ # --- NVIDIA NIM Configuration ---
15
+ NIM_API_URL = "https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-ocr-v1"
16
+
17
+ def resize_image_if_needed(image_path: str) -> bytes:
18
+ """Resizes an image to a maximum of 500x500 pixels and returns bytes."""
19
+ with Image.open(image_path) as image:
20
+ MAX_SIZE = 500
21
+ width, height = image.size
22
+
23
+ if width > height:
24
+ new_width = min(width, MAX_SIZE)
25
+ new_height = int(height * (new_width / width))
26
+ else:
27
+ new_height = min(height, MAX_SIZE)
28
+ new_width = int(width * (new_height / height))
29
+
30
+ if new_width > MAX_SIZE:
31
+ new_width = MAX_SIZE
32
+ new_height = int(height * (new_width / width))
33
+ if new_height > MAX_SIZE:
34
+ new_height = MAX_SIZE
35
+ new_width = int(width * (new_height / height))
36
+
37
+ resized_image = image.resize((new_width, new_height), Image.Resampling.LANCZOS)
38
+
39
+ if resized_image.mode == 'RGBA':
40
+ resized_image = resized_image.convert('RGB')
41
+
42
+ img_byte_arr = io.BytesIO()
43
+ resized_image.save(img_byte_arr, format='JPEG', quality=85, optimize=True)
44
+ image_bytes = img_byte_arr.getvalue()
45
+
46
+ base64_size = len(base64.b64encode(image_bytes).decode('utf-8'))
47
+ if base64_size > 180000:
48
+ quality = max(50, int(85 * (180000 / base64_size)))
49
+ img_byte_arr = io.BytesIO()
50
+ resized_image.save(img_byte_arr, format='JPEG', quality=quality, optimize=True)
51
+ image_bytes = img_byte_arr.getvalue()
52
+
53
+ return image_bytes
54
+
55
+ def call_nim_ocr_api(image_bytes: bytes):
56
+ """Calls the NVIDIA NIM API to perform OCR on an image."""
57
+ # Get API key from the manager
58
+ manager = get_api_key_manager()
59
+ api_key, key_index = manager.get_key('nvidia')
60
+
61
+ if not api_key:
62
+ raise Exception("No available NVIDIA API keys. Please set NVIDIA_API_KEY environment variable.")
63
+
64
+ NIM_HEADERS = {
65
+ "Authorization": f"Bearer {api_key}",
66
+ "Accept": "application/json",
67
+ "Content-Type": "application/json",
68
+ }
69
+
70
+ base64_encoded_data = base64.b64encode(image_bytes)
71
+ base64_string = base64_encoded_data.decode('utf-8')
72
+
73
+ if len(base64_string) > 180000:
74
+ raise Exception("Image too large. To upload larger images, use the assets API.")
75
+
76
+ image_url = f"data:image/png;base64,{base64_string}"
77
+
78
+ payload = {
79
+ "input": [
80
+ {
81
+ "type": "image_url",
82
+ "url": image_url
83
+ }
84
+ ]
85
+ }
86
+
87
+ try:
88
+ response = requests.post(NIM_API_URL, headers=NIM_HEADERS, json=payload, timeout=300)
89
+ response.raise_for_status()
90
+ result = response.json()
91
+ manager.mark_success('nvidia', key_index)
92
+ return result
93
+ except requests.exceptions.RequestException as e:
94
+ manager.mark_failure('nvidia', key_index)
95
+ error_detail = str(e)
96
+ if e.response is not None:
97
+ try:
98
+ error_detail = e.response.json().get("error", e.response.text)
99
+ except json.JSONDecodeError:
100
+ error_detail = e.response.text
101
+ raise Exception(f"NIM API Error: {error_detail}")
102
+
103
+ def extract_question_number_from_ocr_result(ocr_result: dict) -> str:
104
+ """Extracts the question number from the OCR result."""
105
+ try:
106
+ if "data" in ocr_result and len(ocr_result["data"]) > 0:
107
+ text_detections = ocr_result["data"][0].get("text_detections", [])
108
+ content = " ".join([detection["text_prediction"]["text"] for detection in text_detections])
109
+ else:
110
+ content = str(ocr_result)
111
+
112
+ match = re.search(r'^\s*(\d+)', content)
113
+ if match:
114
+ return match.group(1)
115
+
116
+ match = re.search(r'(?:^|\s)(?:[Qq][\.:]?\s*|QUESTION\s+)(\d+)', content, re.IGNORECASE)
117
+ if match:
118
+ return match.group(1)
119
+
120
+ match = re.search(r'^\s*(\d+)[\.\)]', content)
121
+ if match:
122
+ return match.group(1)
123
+
124
+ return ""
125
+ except (KeyError, IndexError, TypeError):
126
+ return ""
127
+
128
+ def crop_image_perspective(image_path, points):
129
+ if len(points) < 4: return cv2.imread(image_path)
130
+ img = cv2.imread(image_path)
131
+ if img is None: raise ValueError("Could not read the image file.")
132
+ height, width = img.shape[:2]
133
+ def clamp(val): return max(0.0, min(1.0, val))
134
+ src_points = np.array([[clamp(p.get('x', 0.0)) * width, clamp(p.get('y', 0.0)) * height] for p in points[:4]], dtype=np.float32)
135
+ (tl, tr, br, bl) = src_points
136
+ width_top, width_bottom = np.linalg.norm(tr - tl), np.linalg.norm(br - bl)
137
+ max_width = int(max(width_top, width_bottom))
138
+ height_right, height_left = np.linalg.norm(tr - br), np.linalg.norm(tl - bl)
139
+ max_height = int(max(height_right, height_left))
140
+ if max_width == 0 or max_height == 0: return img
141
+ dst_points = np.array([[0, 0], [max_width - 1, 0], [max_width - 1, max_height - 1], [0, max_height - 1]], dtype=np.float32)
142
+ matrix = cv2.getPerspectiveTransform(src_points, dst_points)
143
+ return cv2.warpPerspective(img, matrix, (max_width, max_height))
144
+
145
+ def create_pdf_from_full_images(image_paths, output_filename, resolution=300.0):
146
+ """
147
+ Creates a PDF from a list of full-page images, preserving image quality
148
+ by creating pages of the same size as the images.
149
+ """
150
+ if not image_paths:
151
+ return False
152
+
153
+ try:
154
+ pdf_pages = []
155
+ for image_path in image_paths:
156
+ try:
157
+ with Image.open(image_path) as img:
158
+ # Ensure image is in a format that can be saved to PDF
159
+ img = img.convert('RGB')
160
+
161
+ # Create a new image with a white background of the same size.
162
+ # This avoids issues with alpha channels and ensures consistency.
163
+ page = Image.new('RGB', img.size, 'white')
164
+ page.paste(img, (0, 0))
165
+ pdf_pages.append(page)
166
+ except Exception as e:
167
+ print(f"Error opening or processing image {image_path}: {e}")
168
+
169
+ if not pdf_pages:
170
+ return False
171
+
172
+ # Save the first page and append the rest
173
+ pdf_pages[0].save(
174
+ output_filename,
175
+ "PDF",
176
+ save_all=True,
177
+ append_images=pdf_pages[1:],
178
+ resolution=resolution
179
+ )
180
+ return True
181
+ except Exception as e:
182
+ print(f"Error saving final PDF: {e}")
183
+ return False
184
+
185
+ def remove_color_from_image(image_path, target_colors, threshold, bg_mode, region_box=None):
186
+ """
187
+ Removes specific colors from an image using CIELAB Delta E distance.
188
+ Uses manual RGB->Lab conversion to strictly match frontend JS logic (Standard CIELAB).
189
+ """
190
+ # Read image (OpenCV loads as BGR)
191
+ img = cv2.imread(image_path, cv2.IMREAD_UNCHANGED)
192
+ if img is None:
193
+ raise ValueError(f"Could not read image: {image_path}")
194
+
195
+ # Ensure a 4-channel BGRA image (IMREAD_UNCHANGED may return grayscale)
196
+ if img.ndim == 2:
+ img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGRA)
197
+ elif img.shape[2] == 3:
+ img = cv2.cvtColor(img, cv2.COLOR_BGR2BGRA)
198
+
199
+ # 1. PREPARE IMAGE (BGR -> RGB -> Normalized Float)
200
+ # We work on a copy for calculation
201
+ img_bgr = img[:, :, :3]
202
+ img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
203
+
204
+ # Normalize to 0-1 so the math matches the frontend JS, which
205
+ # normalizes the same way (r = rgb[0] / 255) before converting to Lab.
208
+ rgb_norm = img_rgb.astype(np.float32) / 255.0
209
+
210
+ # 2. RGB to XYZ (Vectorized)
211
+ # Formula matches JS: r = (r > 0.04045) ? ...
212
+ mask_linear = rgb_norm > 0.04045
213
+ rgb_linear = np.where(mask_linear, np.power((rgb_norm + 0.055) / 1.055, 2.4), rgb_norm / 12.92)
214
+
215
+ R, G, B = rgb_linear[:,:,0], rgb_linear[:,:,1], rgb_linear[:,:,2]
216
+
217
+ X = R * 0.4124 + G * 0.3576 + B * 0.1805
218
+ Y = R * 0.2126 + G * 0.7152 + B * 0.0722
219
+ Z = R * 0.0193 + G * 0.1192 + B * 0.9505
220
+
221
+ # Scale XYZ
222
+ X /= 0.95047
223
+ Y /= 1.00000
224
+ Z /= 1.08883
225
+
226
+ # 3. XYZ to Lab
227
+ # Formula: x = (x > 0.008856) ? ...
228
+ xyz_stack = np.stack([X, Y, Z], axis=-1)
229
+ mask_xyz = xyz_stack > 0.008856
230
+ f_xyz = np.where(mask_xyz, np.power(xyz_stack, 1/3), (7.787 * xyz_stack) + 16/116)
231
+
232
+ fx, fy, fz = f_xyz[:,:,0], f_xyz[:,:,1], f_xyz[:,:,2]
233
+
234
+ L_chn = (116.0 * fy) - 16.0
235
+ a_chn = 500.0 * (fx - fy)
236
+ b_chn = 200.0 * (fy - fz)
237
+
238
+ # 4. CALCULATE DISTANCE
239
+ # Threshold mapping matches frontend
240
+ max_delta_e = 110.0 - (float(threshold) * 100.0)
241
+ max_dist_sq = max_delta_e ** 2
242
+
243
+ final_keep_mask = np.zeros(L_chn.shape, dtype=bool)
244
+
245
+ if target_colors:
246
+ # Convert Targets (RGB -> Lab) using same math
247
+ # Since targets are few, we can do simple loop or small array
248
+ for c in target_colors:
249
+ # Normalize
250
+ r, g, b = c['r']/255.0, c['g']/255.0, c['b']/255.0
251
+
252
+ # Linearize
253
+ r = ((r + 0.055) / 1.055) ** 2.4 if r > 0.04045 else r / 12.92
254
+ g = ((g + 0.055) / 1.055) ** 2.4 if g > 0.04045 else g / 12.92
255
+ b = ((b + 0.055) / 1.055) ** 2.4 if b > 0.04045 else b / 12.92
256
+
257
+ # XYZ
258
+ x = (r * 0.4124 + g * 0.3576 + b * 0.1805) / 0.95047
259
+ y = (r * 0.2126 + g * 0.7152 + b * 0.0722) / 1.00000
260
+ z = (r * 0.0193 + g * 0.1192 + b * 0.9505) / 1.08883
261
+
262
+ # Lab
263
+ fx = x ** (1/3) if x > 0.008856 else (7.787 * x) + 16/116
264
+ fy = y ** (1/3) if y > 0.008856 else (7.787 * y) + 16/116
265
+ fz = z ** (1/3) if z > 0.008856 else (7.787 * z) + 16/116
266
+
267
+ tL = (116.0 * fy) - 16.0
268
+ ta = 500.0 * (fx - fy)
269
+ tb = 200.0 * (fy - fz)
270
+
271
+ # Dist
272
+ dist_sq = (L_chn - tL)**2 + (a_chn - ta)**2 + (b_chn - tb)**2
273
+ final_keep_mask |= (dist_sq <= max_dist_sq)
274
+
275
+ # Handle Region Box
276
+ if region_box:
277
+ h, w = img.shape[:2]
278
+ rx = int(region_box['x'] * w)
279
+ ry = int(region_box['y'] * h)
280
+ rw = int(region_box['w'] * w)
281
+ rh = int(region_box['h'] * h)
282
+
283
+ # Mask is TRUE everywhere EXCEPT the region (Keep outside)
284
+ region_protection_mask = np.ones(L_chn.shape, dtype=bool)
285
+ # Ensure coords are within bounds
286
+ ry = max(0, ry); rx = max(0, rx)
287
+ if rw > 0 and rh > 0:
288
+ region_protection_mask[ry:ry+rh, rx:rx+rw] = False
289
+
290
+ final_keep_mask |= region_protection_mask
291
+
292
+ # Apply Mask to Image
293
+ result = img.copy()
294
+
295
+ if bg_mode == 'black':
296
+ bg_color = [0, 0, 0, 255]
297
+ elif bg_mode == 'white':
298
+ bg_color = [255, 255, 255, 255]
299
+ else: # transparent
300
+ bg_color = [0, 0, 0, 0]
301
+
302
+ remove_mask = ~final_keep_mask
303
+ result[remove_mask] = bg_color
304
+
305
+ return result
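As a sanity check on the conversion used in `remove_color_from_image`, the same sRGB → XYZ (D65) → CIELAB constants in scalar form give L*≈100, a*≈0, b*≈0 for pure white and L*≈0 for black, which the vectorized code should reproduce pixel-wise:

```python
def rgb_to_lab(r, g, b):
    """Scalar sRGB (0-255) to CIELAB, same constants as the vectorized code."""
    def lin(c):
        # sRGB gamma expansion
        c /= 255.0
        return ((c + 0.055) / 1.055) ** 2.4 if c > 0.04045 else c / 12.92

    r, g, b = lin(r), lin(g), lin(b)
    # Linear RGB -> XYZ, scaled by the D65 white point
    x = (r * 0.4124 + g * 0.3576 + b * 0.1805) / 0.95047
    y = (r * 0.2126 + g * 0.7152 + b * 0.0722) / 1.00000
    z = (r * 0.0193 + g * 0.1192 + b * 0.9505) / 1.08883

    def f(t):
        return t ** (1 / 3) if t > 0.008856 else (7.787 * t) + 16 / 116

    fx, fy, fz = f(x), f(y), f(z)
    return (116.0 * fy) - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)

white = rgb_to_lab(255, 255, 255)
black = rgb_to_lab(0, 0, 0)
```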
qtab_routes.py ADDED
@@ -0,0 +1,509 @@
1
+ from flask import Blueprint, render_template, request, jsonify, redirect, url_for, flash, current_app, send_from_directory
2
+ from flask_login import login_required, current_user
3
+ from database import get_db_connection, get_qtab_folder_tree
4
+ from werkzeug.utils import secure_filename
5
+ import json
6
+ import os
7
+ import base64
8
+ from datetime import datetime, timezone
9
+ from google import genai
10
+ from google.genai import types
11
+ import logging
12
+
13
+ qtab_bp = Blueprint('qtab', __name__)
14
+
15
+ logger = logging.getLogger(__name__)
16
+
17
+ def extract_json_from_response(response_text):
18
+ """Extract JSON from Gemini response, handling code blocks."""
19
+ try:
20
+ if "```json" in response_text:
21
+ json_text = response_text.split("```json")[1].split("```")[0]
22
+ else:
23
+ json_text = response_text
24
+ return json.loads(json_text)
25
+ except Exception as e:
26
+ logger.error(f"JSON extraction error: {str(e)}")
27
+ return {
28
+ "status": "error",
29
+ "message": "Failed to parse response JSON",
30
+ "error": str(e)
31
+ }
32
+
33
+ def process_image_for_questions(image_path, username):
34
+ """Process an image to extract question-answer pairs using Gemini."""
35
+ try:
36
+ current_time = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
37
+
38
+ # Read image file
39
+ with open(image_path, 'rb') as f:
40
+ image_bytes = f.read()
41
+
42
+ image_base64 = base64.b64encode(image_bytes).decode("utf-8")
43
+
44
+ # Initialize GenAI client
45
+ api_key = os.getenv("GEMINI_API_KEY")
46
+ if not api_key:
47
+ return {
48
+ "status": "error",
49
+ "message": "GEMINI_API_KEY not configured"
50
+ }
51
+
52
+ client = genai.Client(api_key=api_key)
53
+ model = "gemini-flash-latest"
54
+
55
+ prompt = f"""Current Date and Time (UTC): {current_time}
56
+ Current User's Login: {username}
57
+
58
+ You are given an image file. Your task is to:
59
+
60
+ Extract question numbers and their corresponding answers, forming pairs.
61
+
62
+ Group them under appropriate sections, such as "Question Paper 1", "Section A", "Part B", etc., if such headers are present in the image.
63
+
64
+ If no sections are present, just list the question–answer pairs normally.
65
+
66
+ If the image is faulty, unclear, or does not contain extractable question-answer data, return a clear error in JSON.
67
+
68
+ Do not output anything except a valid JSON object.
69
+
70
+ Output Format:
71
+
72
+ If sections are detected (try your level best to do so):
73
+
74
+ {{
75
+ "status": "success",
76
+ "data": {{
77
+ "Section 1": [
78
+ {{"question_number": "1", "answer": "B"}},
79
+ {{"question_number": "2", "answer": "C"}}
80
+ ],
81
+ "Section 2": [
82
+ {{"question_number": "1", "answer": "A"}},
83
+ {{"question_number": "2", "answer": "D"}}
84
+ ]
85
+ }}
86
+ }}
87
+
88
+ If no sections are detected:
89
+
90
+ {{
91
+ "status": "success",
92
+ "data": [
93
+ {{"question_number": "1", "answer": "B"}},
94
+ {{"question_number": "2", "answer": "C"}}
95
+ ]
96
+ }}
97
+
98
+ If the image is faulty or data cannot be extracted:
99
+
100
+ {{
101
+ "status": "error",
102
+ "message": "Image is unreadable or does not contain question-answer data."
103
+ }}
104
+
105
+ Ensure the output is strictly in JSON format with no additional explanations or text."""
106
+
107
+ result = client.models.generate_content(
108
+ model=model,
109
+ contents=[
110
+ types.Content(
111
+ role="user",
112
+ parts=[
113
+ types.Part.from_text(text=prompt),
114
+ types.Part.from_bytes(
115
+ data=base64.b64decode(image_base64),
116
+ mime_type="image/jpeg"
117
+ ),
118
+ ],
119
+ )
120
+ ],
121
+ )
122
+
123
+ parsed_result = extract_json_from_response(result.text)
124
+ parsed_result.update({
125
+ "metadata": {
126
+ "processed_at": current_time,
127
+ "processed_by": username
128
+ }
129
+ })
130
+ return parsed_result
131
+
132
+ except Exception as e:
133
+ logger.error(f"Image processing error: {str(e)}")
134
+ return {
135
+ "status": "error",
136
+ "message": "Failed to process image.",
137
+ "error": str(e),
138
+ "metadata": {
139
+ "processed_at": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S"),
140
+ "processed_by": username
141
+ }
142
+ }
143
+
144
+
145
+ @qtab_bp.route('/qtab')
146
+ @qtab_bp.route('/qtab/<path:folder_path>')
147
+ @login_required
148
+ def qtab_list(folder_path=''):
149
+ """Display the question table interface with folder navigation."""
150
+ conn = get_db_connection()
151
+
152
+ # Folder Navigation Logic (same as subjective_list)
153
+ folder_id = None
154
+ breadcrumbs = []
155
+
156
+ if folder_path:
157
+ parts = folder_path.split('/')
158
+ parent_id = None
159
+ for i, part in enumerate(parts):
160
+ res = conn.execute(
161
+ "SELECT id FROM qtab_folders WHERE name = ? AND user_id = ? AND (parent_id = ? OR (? IS NULL AND parent_id IS NULL))",
162
+ (part, current_user.id, parent_id, parent_id)
163
+ ).fetchone()
164
+ if not res:
165
+ conn.close()
166
+ flash('Folder not found.', 'danger')
167
+ return redirect(url_for('qtab.qtab_list'))
168
+ parent_id = res['id']
169
+ breadcrumbs.append({'name': part, 'path': '/'.join(parts[:i+1])})
170
+ folder_id = parent_id
171
+
172
+ # Fetch Subfolders
173
+ if folder_id:
174
+ subfolders = conn.execute(
175
+ 'SELECT * FROM qtab_folders WHERE parent_id = ? AND user_id = ? ORDER BY name',
176
+ (folder_id, current_user.id)
177
+ ).fetchall()
178
+ else:
179
+ subfolders = conn.execute(
180
+ 'SELECT * FROM qtab_folders WHERE parent_id IS NULL AND user_id = ? ORDER BY name',
181
+ (current_user.id,)
182
+ ).fetchall()
183
+
184
+ # Fetch images in this folder from the qtab_images table
185
+ if folder_id:
186
+ images = conn.execute(
187
+ 'SELECT * FROM qtab_images WHERE folder_id = ? AND user_id = ? ORDER BY created_at DESC',
188
+ (folder_id, current_user.id)
189
+ ).fetchall()
190
+ else:
191
+ images = conn.execute(
192
+ 'SELECT * FROM qtab_images WHERE folder_id IS NULL AND user_id = ? ORDER BY created_at DESC',
193
+ (current_user.id,)
194
+ ).fetchall()
195
+
196
+ conn.close()
197
+
198
+ # Convert to dicts for template compatibility
199
+ subfolders = [dict(row) for row in subfolders]
200
+ images = [dict(row) for row in images]
201
+
202
+ folder_tree = get_qtab_folder_tree(current_user.id)
203
+
204
+ return render_template(
205
+ 'qtab_list.html',
206
+ images=images,
207
+ subfolders=subfolders,
208
+ breadcrumbs=breadcrumbs,
209
+ current_folder_id=folder_id,
210
+ folder_tree=folder_tree
211
+ )
212
+
213
+
214
+ @qtab_bp.route('/qtab/upload', methods=['POST'])
215
+ @login_required
216
+ def qtab_upload():
217
+ """Upload and process images for question extraction."""
218
+ if 'image' not in request.files:
219
+ return jsonify({"status": "error", "message": "No image file provided."}), 400
220
+
221
+ image = request.files['image']
222
+ if image.filename == '':
223
+ return jsonify({"status": "error", "message": "No selected file."}), 400
224
+
225
+ folder_id = request.form.get('folder_id')
226
+ if folder_id == 'null' or folder_id == '':
227
+ folder_id = None
228
+
229
+ try:
230
+ # Save the uploaded image
231
+ filename = secure_filename(image.filename)
232
+ timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
233
+ saved_filename = f"qtab_{current_user.id}_{timestamp}_{filename}"
234
+
235
+ # Create qtab folder if it doesn't exist
236
+ qtab_folder = os.path.join(current_app.config['UPLOAD_FOLDER'], 'qtab')
237
+ os.makedirs(qtab_folder, exist_ok=True)
238
+
239
+ file_path = os.path.join(qtab_folder, saved_filename)
240
+ image.save(file_path)
241
+
242
+ # Process the image with Gemini
243
+ result = process_image_for_questions(file_path, current_user.username)
244
+
245
+ # Store in database
246
+ conn = get_db_connection()
247
+ conn.execute(
248
+ '''INSERT INTO qtab_images
249
+ (user_id, folder_id, filename, original_name, result_json, status)
250
+ VALUES (?, ?, ?, ?, ?, ?)''',
251
+ (current_user.id, folder_id, saved_filename, filename,
252
+ json.dumps(result), result.get('status', 'error'))
253
+ )
254
+ conn.commit()
255
+ image_id = conn.execute('SELECT last_insert_rowid()').fetchone()[0]
256
+ conn.close()
257
+
258
+ result['image_id'] = image_id
259
+ return jsonify(result)
260
+
261
+ except Exception as e:
262
+ logger.error(f"Upload error: {str(e)}")
263
+ return jsonify({
264
+ "status": "error",
265
+ "message": "An error occurred.",
266
+ "error": str(e),
267
+ "metadata": {
268
+ "processed_at": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S"),
269
+ "processed_by": current_user.username
270
+ }
271
+ }), 500
272
+
273
+
274
+ @qtab_bp.route('/qtab/image/<int:image_id>')
275
+ @login_required
276
+ def qtab_get_image(image_id):
277
+ """Get details of a specific qtab image."""
278
+ conn = get_db_connection()
279
+ image = conn.execute(
280
+ 'SELECT * FROM qtab_images WHERE id = ? AND user_id = ?',
281
+ (image_id, current_user.id)
282
+ ).fetchone()
283
+ conn.close()
284
+
285
+ if not image:
286
+ return jsonify({'error': 'Image not found'}), 404
287
+
288
+ return jsonify(dict(image))
289
+
290
+
291
+ @qtab_bp.route('/qtab/image/<int:image_id>/delete', methods=['DELETE'])
292
+ @login_required
293
+ def qtab_delete_image(image_id):
294
+ """Delete a qtab image."""
295
+ conn = get_db_connection()
296
+
297
+ # Check ownership
298
+ image = conn.execute(
299
+ 'SELECT filename FROM qtab_images WHERE id = ? AND user_id = ?',
300
+ (image_id, current_user.id)
301
+ ).fetchone()
302
+
303
+ if not image:
304
+ conn.close()
305
+ return jsonify({'error': 'Image not found or unauthorized'}), 404
306
+
307
+ # Delete file
308
+ qtab_folder = os.path.join(current_app.config['UPLOAD_FOLDER'], 'qtab')
309
+ file_path = os.path.join(qtab_folder, image['filename'])
310
+ try:
311
+ if os.path.exists(file_path):
312
+ os.remove(file_path)
313
+ except OSError as e:
314
+ logger.error(f"Error deleting file: {e}")
315
+
316
+ # Delete from database
317
+ conn.execute('DELETE FROM qtab_images WHERE id = ?', (image_id,))
318
+ conn.commit()
319
+ conn.close()
320
+
321
+ return jsonify({'success': True})
322
+
323
+
324
+ @qtab_bp.route('/qtab/image/<int:image_id>/rename', methods=['PUT'])
325
+ @login_required
326
+ def qtab_rename_image(image_id):
327
+ """Rename a qtab image."""
328
+ data = request.json
329
+ new_name = data.get('name', '').strip()
330
+
331
+ if not new_name:
332
+ return jsonify({'error': 'Name is required'}), 400
333
+
334
+ conn = get_db_connection()
335
+
336
+ # Check ownership
337
+ image = conn.execute(
338
+ 'SELECT id FROM qtab_images WHERE id = ? AND user_id = ?',
339
+ (image_id, current_user.id)
340
+ ).fetchone()
341
+
342
+ if not image:
343
+ conn.close()
344
+ return jsonify({'error': 'Image not found or unauthorized'}), 404
345
+
346
+ try:
347
+ conn.execute(
348
+ 'UPDATE qtab_images SET original_name = ? WHERE id = ?',
349
+ (new_name, image_id)
350
+ )
351
+ conn.commit()
352
+ conn.close()
353
+ return jsonify({'success': True})
354
+ except Exception as e:
355
+ conn.rollback()
356
+ conn.close()
357
+ return jsonify({'error': str(e)}), 500
358
+
359
+
360
+ @qtab_bp.route('/qtab/create_folder', methods=['POST'])
361
+ @login_required
362
+ def qtab_create_folder():
363
+ """Create a new folder in qtab_folders table."""
364
+ data = request.json
365
+ folder_name = data.get('name', '').strip()
366
+ parent_id = data.get('parent_id')
367
+
368
+ if not folder_name:
369
+ return jsonify({'error': 'Folder name is required'}), 400
370
+
371
+ if parent_id == 'null' or parent_id == '':
372
+ parent_id = None
373
+
374
+ conn = get_db_connection()
375
+ try:
376
+ # Check if folder with same name exists at this level
377
+ if parent_id:
378
+ existing = conn.execute(
379
+ 'SELECT id FROM qtab_folders WHERE name = ? AND parent_id = ? AND user_id = ?',
380
+ (folder_name, parent_id, current_user.id)
381
+ ).fetchone()
382
+ else:
383
+ existing = conn.execute(
384
+ 'SELECT id FROM qtab_folders WHERE name = ? AND parent_id IS NULL AND user_id = ?',
385
+ (folder_name, current_user.id)
386
+ ).fetchone()
387
+
388
+ if existing:
389
+ conn.close()
390
+ return jsonify({'error': 'Folder with this name already exists'}), 400
391
+
392
+ conn.execute(
393
+ 'INSERT INTO qtab_folders (name, parent_id, user_id) VALUES (?, ?, ?)',
394
+ (folder_name, parent_id, current_user.id)
395
+ )
396
+ conn.commit()
397
+ folder_id = conn.execute('SELECT last_insert_rowid()').fetchone()[0]
398
+ conn.close()
399
+
400
+ return jsonify({'success': True, 'folder_id': folder_id})
401
+     except Exception as e:
+         conn.rollback()
+         conn.close()
+         return jsonify({'error': str(e)}), 500
+
+
+ @qtab_bp.route('/qtab/move_images', methods=['POST'])
+ @login_required
+ def qtab_move_images():
+     """Move images to a different folder."""
+     data = request.json
+     image_ids = data.get('image_ids', [])
+     target_folder_id = data.get('target_folder_id')
+
+     if not image_ids:
+         return jsonify({'error': 'No images selected'}), 400
+
+     if target_folder_id in ('null', ''):
+         target_folder_id = None
+
+     conn = get_db_connection()
+     try:
+         # Verify target folder ownership if not root
+         if target_folder_id:
+             owner = conn.execute(
+                 'SELECT user_id FROM qtab_folders WHERE id = ?',
+                 (target_folder_id,)
+             ).fetchone()
+             if not owner or owner['user_id'] != current_user.id:
+                 return jsonify({'error': 'Unauthorized target folder'}), 403
+
+         # Move images
+         placeholders = ', '.join('?' * len(image_ids))
+         conn.execute(
+             f'UPDATE qtab_images SET folder_id = ? WHERE id IN ({placeholders}) AND user_id = ?',
+             (target_folder_id, *image_ids, current_user.id)
+         )
+
+         conn.commit()
+         return jsonify({'success': True})
+     except Exception as e:
+         conn.rollback()
+         return jsonify({'error': str(e)}), 500
+     finally:
+         conn.close()
+
+
+ @qtab_bp.route('/qtab/serve/<filename>')
+ @login_required
+ def serve_qtab_image(filename):
+     """Serve qtab images with security check."""
+     conn = get_db_connection()
+
+     # Verify ownership
+     image = conn.execute(
+         'SELECT user_id FROM qtab_images WHERE filename = ?',
+         (filename,)
+     ).fetchone()
+     conn.close()
+
+     if not image or image['user_id'] != current_user.id:
+         return "Unauthorized", 403
+
+     qtab_folder = os.path.join(current_app.config['UPLOAD_FOLDER'], 'qtab')
+     return send_from_directory(qtab_folder, filename)
+
+
+ @qtab_bp.route('/qtab/exam/<int:image_id>')
+ @login_required
+ def qtab_exam_mode(image_id):
+     """Exam mode - one question at a time with keyboard navigation."""
+     conn = get_db_connection()
+
+     # Get the image and verify ownership
+     image = conn.execute(
+         'SELECT * FROM qtab_images WHERE id = ? AND user_id = ?',
+         (image_id, current_user.id)
+     ).fetchone()
+
+     if not image:
+         conn.close()
+         flash('Image not found or unauthorized', 'danger')
+         return redirect(url_for('qtab.qtab_list'))
+
+     # Parse the result JSON to get all questions
+     result_json = json.loads(image['result_json']) if image['result_json'] else {}
+
+     # Get all images in the same folder for navigation
+     if image['folder_id']:
+         all_images = conn.execute(
+             'SELECT id, original_name, status FROM qtab_images WHERE folder_id = ? AND user_id = ? ORDER BY created_at',
+             (image['folder_id'], current_user.id)
+         ).fetchall()
+     else:
+         all_images = conn.execute(
+             'SELECT id, original_name, status FROM qtab_images WHERE folder_id IS NULL AND user_id = ? ORDER BY created_at',
+             (current_user.id,)
+         ).fetchall()
+
+     conn.close()
+
+     return render_template(
+         'qtab_exam.html',
+         image=dict(image),
+         result_json=result_json,
+         all_images=[dict(img) for img in all_images],
+         current_index=next((i for i, img in enumerate(all_images) if img['id'] == image_id), 0)
+     )
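The `move_images` route builds its `IN` clause dynamically while still binding every value as a parameter. A minimal standalone sketch of that placeholder technique, using an illustrative in-memory table rather than the app's real database:

```python
import sqlite3

# In-memory stand-in for the app's qtab_images table (illustrative schema/data)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE qtab_images (id INTEGER, folder_id INTEGER, user_id INTEGER)")
conn.executemany("INSERT INTO qtab_images VALUES (?, ?, ?)",
                 [(1, None, 7), (2, None, 7), (3, None, 9)])

image_ids = [1, 2]
# '?' * 2 == '??'; joining with ', ' yields '?, ?'
placeholders = ', '.join('?' * len(image_ids))
conn.execute(
    f"UPDATE qtab_images SET folder_id = ? WHERE id IN ({placeholders}) AND user_id = ?",
    (42, *image_ids, 7),
)
rows = conn.execute("SELECT id, folder_id FROM qtab_images ORDER BY id").fetchall()
print(rows)  # [(1, 42), (2, 42), (3, None)] -- image 3 was not selected
```

Only the placeholder string is interpolated into the SQL; the IDs themselves travel through the parameter tuple, which is what keeps the query injection-safe.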
redact.py ADDED
@@ -0,0 +1,218 @@
+ # main_redaction_processor.py
+
+ # Required packages: pip install requests Pillow
+ import os
+ import requests
+ from PIL import Image, ImageDraw
+ import io
+ import base64
+ import json
+
+ # --- Configuration ---
+ # API endpoints should remain constant
+ INVOKE_URL_OCR = "https://ai.api.nvidia.com/v1/cv/nvidia/nemoretriever-ocr-v1"
+ INVOKE_URL_PARSER = "https://integrate.api.nvidia.com/v1/chat/completions"
+
+ # Define a max pixel count for the parser model to avoid sending overly large images.
+ MAX_PIXELS_FOR_PARSER = 1024 * 1024  # 1 megapixel
+
+ # --- Internal Helper Functions ---
+
+ def _get_average_color_from_regions(image: Image.Image, regions: list[tuple]):
+     """Calculates the average RGB color from a list of regions in an image."""
+     total_r, total_g, total_b = 0, 0, 0
+     pixel_count = 0
+     img_width, img_height = image.size
+     if image.mode == 'RGBA':
+         image = image.convert('RGB')
+     pixels = image.load()
+     for region in regions:
+         x1, y1, x2, y2 = [max(0, int(c)) for c in region]
+         x2 = min(img_width, x2)
+         y2 = min(img_height, y2)
+         for x in range(x1, x2):
+             for y in range(y1, y2):
+                 r, g, b = pixels[x, y]
+                 total_r += r
+                 total_g += g
+                 total_b += b
+                 pixel_count += 1
+     if pixel_count == 0:
+         return (0, 0, 0)
+     return (total_r // pixel_count, total_g // pixel_count, total_b // pixel_count)
+
+
+ def _detect_pictures_with_parser(image_to_process: Image.Image, api_key: str):
+     """Sends an image to the NemoRetriever Parser model to detect 'Picture' elements."""
+     headers = {"Authorization": f"Bearer {api_key}", "Accept": "application/json"}
+     buffered = io.BytesIO()
+     image_to_process.save(buffered, format="PNG")
+     b64_str = base64.b64encode(buffered.getvalue()).decode("ascii")
+
+     content = f'<img src="data:image/png;base64,{b64_str}" />'
+     tool_name = "markdown_bbox"
+     payload = {
+         "model": "nvidia/nemoretriever-parse",
+         "messages": [{"role": "user", "content": content}],
+         "tools": [{"type": "function", "function": {"name": tool_name}}],
+         "tool_choice": {"type": "function", "function": {"name": tool_name}},
+         "max_tokens": 2048,
+     }
+
+     response = requests.post(INVOKE_URL_PARSER, headers=headers, json=payload, timeout=120)
+     response.raise_for_status()
+     response_json = response.json()
+
+     picture_bboxes = []
+     tool_calls = response_json.get('choices', [{}])[0].get('message', {}).get('tool_calls', [])
+     if tool_calls:
+         arguments_str = tool_calls[0].get('function', {}).get('arguments', '[]')
+         parsed_arguments = json.loads(arguments_str)
+         if parsed_arguments and isinstance(parsed_arguments, list):
+             for element in parsed_arguments[0]:
+                 if element.get("type") == "Picture" and element.get("bbox"):
+                     picture_bboxes.append(element["bbox"])
+     return picture_bboxes
+
+
+ def _redact_text_in_image(input_image: Image.Image, api_key: str):
+     """Sends a (cropped) image to the OCR model and returns a redacted version."""
+     headers = {"Authorization": f"Bearer {api_key}", "Accept": "application/json"}
+     buffered = io.BytesIO()
+     input_image.save(buffered, format="PNG")
+     image_b64 = base64.b64encode(buffered.getvalue()).decode()
+
+     payload = {"input": [{"type": "image_url", "url": f"data:image/png;base64,{image_b64}"}]}
+     try:
+         response = requests.post(INVOKE_URL_OCR, headers=headers, json=payload, timeout=60)
+         response.raise_for_status()
+         response_json = response.json()
+     except requests.exceptions.RequestException:
+         return input_image
+
+     image_with_redactions = input_image.copy()
+     draw = ImageDraw.Draw(image_with_redactions)
+     img_width, img_height = image_with_redactions.size
+     radius = max(1, int(((img_width**2 + img_height**2)**0.5) / 100))
+
+     try:
+         detections = response_json['data'][0]['text_detections']
+         for detection in detections:
+             bbox = detection.get("bounding_box")
+             if bbox and bbox.get("points"):
+                 points = bbox["points"]
+                 p1 = (points[0]['x'] * img_width, points[0]['y'] * img_height)
+                 p3 = (points[2]['x'] * img_width, points[2]['y'] * img_height)
+                 # Sample a thin border around the detected text to estimate the background color
+                 sample_regions = [
+                     (p1[0], p1[1] - radius, p3[0], p1[1]),
+                     (p1[0], p3[1], p3[0], p3[1] + radius),
+                     (p1[0] - radius, p1[1], p1[0], p3[1]),
+                     (p3[0], p1[1], p3[0] + radius, p3[1]),
+                 ]
+                 redaction_color = _get_average_color_from_regions(image_with_redactions, sample_regions)
+                 draw.rectangle([p1, p3], fill=redaction_color)
+         return image_with_redactions
+     except (KeyError, IndexError, TypeError):
+         return input_image
+
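Before detection, the main function below shrinks large images so their pixel count fits the 1 MP budget while preserving aspect ratio, via `scale = (budget / area) ** 0.5`. The same arithmetic in isolation (the helper name is mine, not part of the file):

```python
MAX_PIXELS = 1024 * 1024  # same 1 MP budget as MAX_PIXELS_FOR_PARSER

def analysis_dims(width, height, max_pixels=MAX_PIXELS):
    """Scale dimensions so width * height <= max_pixels, keeping aspect ratio."""
    if width * height <= max_pixels:
        return (width, height)
    # Area scales with the square of the linear factor, hence the square root
    scale = (max_pixels / (width * height)) ** 0.5
    return (int(width * scale), int(height * scale))

print(analysis_dims(800, 600))    # (800, 600) -- already under budget, untouched
print(analysis_dims(4000, 3000))  # downscaled so the new area fits the budget
```

Because `int()` truncates, the resulting area always lands at or just under the budget rather than slightly over it.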
+
+ # --- Main Public Function ---
+
+ def redact_pictures_in_image(image_source: str, api_key: str, callback: callable = None) -> Image.Image:
+     """
+     Analyzes an image to find pictures, then redacts text within those pictures.
+
+     Args:
+         image_source (str): The source of the image. Can be a local file path
+                             or a base64 encoded string.
+         api_key (str): Your NVIDIA API key.
+         callback (callable, optional): A function to call with progress updates.
+                                        Defaults to None. The function should accept
+                                        a single string argument.
+
+     Returns:
+         Image.Image: A PIL Image object with the text inside pictures redacted.
+     """
+
+     def _progress(message: str):
+         if callback:
+             callback(message)
+
+     _progress("Step 1: Loading image...")
+     try:
+         if os.path.exists(image_source):
+             input_image = Image.open(image_source).convert("RGB")
+         else:
+             image_bytes = base64.b64decode(image_source)
+             input_image = Image.open(io.BytesIO(image_bytes)).convert("RGB")
+     except Exception as e:
+         raise ValueError(f"Invalid image_source: not a valid file path or base64 string. Error: {e}")
+
+     # --- Resize if necessary for analysis ---
+     image_to_analyze = input_image
+     original_width, original_height = input_image.size
+     if (original_width * original_height) > MAX_PIXELS_FOR_PARSER:
+         _progress("Image is large, resizing for initial analysis...")
+         scale = (MAX_PIXELS_FOR_PARSER / (original_width * original_height)) ** 0.5
+         new_dims = (int(original_width * scale), int(original_height * scale))
+         image_to_analyze = input_image.resize(new_dims, Image.Resampling.LANCZOS)
+
+     # --- Detect Pictures ---
+     _progress("Step 2: Detecting 'Picture' elements...")
+     try:
+         picture_bboxes = _detect_pictures_with_parser(image_to_analyze, api_key)
+     except requests.exceptions.RequestException as e:
+         _progress(f"API Error during picture detection: {e}")
+         raise  # Re-raise the exception after reporting progress
+
+     if not picture_bboxes:
+         _progress("No 'Picture' elements were found. Returning original image.")
+         return input_image
+
+     _progress(f"Step 3: Found {len(picture_bboxes)} 'Picture' element(s). Redacting text...")
+     final_image = input_image.copy()
+
+     # --- Crop, Redact, and Paste ---
+     for i, box in enumerate(picture_bboxes):
+         _progress(f"  - Processing picture {i + 1} of {len(picture_bboxes)}...")
+         x1 = int(box["xmin"] * original_width)
+         y1 = int(box["ymin"] * original_height)
+         x2 = int(box["xmax"] * original_width)
+         y2 = int(box["ymax"] * original_height)
+
+         # Crop from the original, high-resolution image
+         cropped_element = input_image.crop((x1, y1, x2, y2))
+
+         redacted_crop = _redact_text_in_image(cropped_element, api_key)
+
+         # Paste the redacted, high-resolution crop back
+         final_image.paste(redacted_crop, (x1, y1))
+
+     _progress("Step 4: Redaction process complete.")
+     return final_image
+
+
+ # --- Example Usage ---
+ if __name__ == "__main__":
+
+     # Define a simple callback function to print progress to the console.
+     def print_progress(message: str):
+         print(f"[PROGRESS] {message}")
+
+     # 1. Get API key from environment variable
+     my_api_key = os.getenv("NVIDIA_API_KEY")
+     if not my_api_key:
+         print("ERROR: Please set the NVIDIA_API_KEY environment variable.")
+     else:
+         # 2. Define the path to your input image
+         # (replace with your actual image file)
+         input_image_path = "yolox1.png"  # Make sure this image exists
+
+         if not os.path.exists(input_image_path):
+             print(f"ERROR: Input image not found at '{input_image_path}'")
+         else:
+             print("--- Running Redaction on Image Path ---")
+             try:
+                 # 3. Call the main function with the image path and callback
+                 redacted_image = redact_pictures_in_image(
+                     image_source=input_image_path,
+                     api_key=my_api_key,
+                     callback=print_progress
+                 )
+
+                 # 4. Save the result
+                 output_path = "redacted_output.png"
+                 redacted_image.save(output_path)
+                 print(f"\nSuccessfully saved redacted image to '{output_path}'")
+
+             except Exception as e:
+                 print(f"\nAn error occurred: {e}")
+
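The parser returns normalized `xmin`/`ymin`/`xmax`/`ymax` coordinates, which `redact_pictures_in_image` maps back onto the original high-resolution image before cropping. That conversion in isolation (the helper name and sample values are mine, for illustration only):

```python
def scale_bbox(box, width, height):
    """Map a normalized bbox dict onto integer pixel coordinates."""
    return (int(box["xmin"] * width), int(box["ymin"] * height),
            int(box["xmax"] * width), int(box["ymax"] * height))

# A parser box covering the middle of an 800x600 image
crop_box = scale_bbox({"xmin": 0.25, "ymin": 0.1, "xmax": 0.75, "ymax": 0.9}, 800, 600)
print(crop_box)  # (200, 60, 600, 540)
```

Because the normalized coordinates are resolution-independent, the detection can run on the downscaled analysis copy while the crop and paste happen on the full-resolution original.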
requirements.txt ADDED
@@ -0,0 +1,19 @@
+ # requirements.txt
+
+ Flask
+ Flask-Cors
+ numpy
+ opencv-python
+ Pillow
+ requests
+ gunicorn
+ PyMuPDF
+ tqdm
+ rich
+
+ imgkit
+ beautifulsoup4
+ google-genai
+ python-dotenv
+ flask-socketio
+ simple-websocket
resize.py ADDED
@@ -0,0 +1,180 @@
+ import fitz  # PyMuPDF
+ import sys
+
+ def expand_pdf_for_notes(input_pdf, output_pdf, bg_color=(1, 1, 1), mode='notes_only', stitch_direction='horizontal', add_space=True, pattern=None, pattern_color=(0.8, 0.8, 0.8)):
+     """
+     Expand or rearrange a PDF for note-taking.
+
+     Args:
+         input_pdf (str): Path to input PDF file.
+         output_pdf (str): Path to output PDF file.
+         bg_color (tuple): RGB color for the notes area background.
+         mode (str): The processing mode: 'notes_only', 'split', or 'stitch'.
+         stitch_direction (str): For 'stitch' mode, how to rearrange columns ('horizontal' or 'vertical').
+         add_space (bool): If True, add space for notes.
+         pattern (str): Name of the pattern to draw ('grid', 'dots').
+         pattern_color (tuple): RGB color for the pattern.
+     """
+     doc = fitz.open(input_pdf)
+     new_doc = fitz.open()
+
+     for page_num in range(len(doc)):
+         page = doc[page_num]
+         orig_rect = page.rect
+         orig_width = orig_rect.width
+         orig_height = orig_rect.height
+
+         left_half_clip = fitz.Rect(0, 0, orig_width / 2, orig_height)
+         right_half_clip = fitz.Rect(orig_width / 2, 0, orig_width, orig_height)
+
+         if mode == 'split':
+             # Create a new page for the left half
+             new_page_width = orig_width / 2 if not add_space else orig_width
+             left_page = new_doc.new_page(width=new_page_width, height=orig_height)
+             left_page.show_pdf_page(fitz.Rect(0, 0, orig_width / 2, orig_height), doc, page_num, clip=left_half_clip)
+             if add_space:
+                 notes_rect = fitz.Rect(orig_width / 2, 0, orig_width, orig_height)
+                 left_page.draw_rect(notes_rect, color=None, fill=bg_color)
+                 if pattern:
+                     _draw_pattern(new_doc, left_page, notes_rect, pattern, pattern_color)
+
+             # Create a new page for the right half
+             right_page = new_doc.new_page(width=new_page_width, height=orig_height)
+             right_page.show_pdf_page(fitz.Rect(0, 0, orig_width / 2, orig_height), doc, page_num, clip=right_half_clip)
+             if add_space:
+                 notes_rect = fitz.Rect(orig_width / 2, 0, orig_width, orig_height)
+                 right_page.draw_rect(notes_rect, color=None, fill=bg_color)
+                 if pattern:
+                     _draw_pattern(new_doc, right_page, notes_rect, pattern, pattern_color)
+
+         elif mode == 'stitch':
+             if stitch_direction == 'horizontal':
+                 new_width = orig_width
+                 if add_space:
+                     new_width *= 2
+                 new_page = new_doc.new_page(width=new_width, height=orig_height)
+                 new_page.show_pdf_page(fitz.Rect(0, 0, orig_width / 2, orig_height), doc, page_num, clip=left_half_clip)
+                 new_page.show_pdf_page(fitz.Rect(orig_width / 2, 0, orig_width, orig_height), doc, page_num, clip=right_half_clip)
+                 if add_space:
+                     notes_rect = fitz.Rect(orig_width, 0, new_width, orig_height)
+                     new_page.draw_rect(notes_rect, color=None, fill=bg_color)
+                     if pattern:
+                         _draw_pattern(new_doc, new_page, notes_rect, pattern, pattern_color)
+
+             else:  # vertical
+                 new_width = orig_width / 2
+                 if add_space:
+                     new_width = orig_width
+                 new_height = orig_height * 2
+                 new_page = new_doc.new_page(width=new_width, height=new_height)
+                 new_page.show_pdf_page(fitz.Rect(0, 0, orig_width / 2, orig_height), doc, page_num, clip=left_half_clip)
+                 new_page.show_pdf_page(fitz.Rect(0, orig_height, orig_width / 2, new_height), doc, page_num, clip=right_half_clip)
+                 if add_space:
+                     notes_rect = fitz.Rect(orig_width / 2, 0, new_width, new_height)
+                     new_page.draw_rect(notes_rect, color=None, fill=bg_color)
+                     if pattern:
+                         _draw_pattern(new_doc, new_page, notes_rect, pattern, pattern_color)
+
+         elif mode == 'notes_only':
+             if add_space:
+                 new_page = new_doc.new_page(width=orig_width * 2, height=orig_height)
+                 right_rect = fitz.Rect(orig_width, 0, orig_width * 2, orig_height)
+                 new_page.draw_rect(right_rect, color=None, fill=bg_color)
+                 if pattern:
+                     _draw_pattern(new_doc, new_page, right_rect, pattern, pattern_color)
+                 new_page.show_pdf_page(fitz.Rect(0, 0, orig_width, orig_height), doc, page_num)
+             else:
+                 new_doc.insert_pdf(doc, from_page=page_num, to_page=page_num)
+
+         else:  # Default to copying the page if mode is unknown
+             new_doc.insert_pdf(doc, from_page=page_num, to_page=page_num)
+
+     new_doc.save(output_pdf)
+     total_pages = len(new_doc)
+     new_doc.close()
+     doc.close()
+     print(f"✓ Successfully created: {output_pdf}")
+     print(f"  Pages processed: {total_pages}")
+
+ def _draw_pattern(doc, page, rect, pattern, color):
+     if pattern == 'grid':
+         _draw_grid(page, rect, color=color)
+     elif pattern == 'dots':
+         _draw_dots(doc, page, rect, color=color)
+
+ def _draw_grid(page, rect, spacing=20, color=(0.8, 0.8, 0.8)):
+     # Draw vertical lines
+     for x in range(int(rect.x0), int(rect.x1), spacing):
+         page.draw_line(fitz.Point(x, rect.y0), fitz.Point(x, rect.y1), color=color, width=0.5)
+     # Draw horizontal lines
+     for y in range(int(rect.y0), int(rect.y1), spacing):
+         page.draw_line(fitz.Point(rect.x0, y), fitz.Point(rect.x1, y), color=color, width=0.5)
+
+ def _draw_dots(doc, page, rect, spacing=20, radius=1, color=(0.8, 0.8, 0.8)):
+     """Creates a tileable dot pattern by stamping a single-dot page across the area."""
+     # Build a small one-dot source page once
+     stamp_doc = fitz.open()
+     stamp_page = stamp_doc.new_page(width=spacing, height=spacing)
+
+     # Draw a single dot in the corner of the stamp page
+     stamp_page.draw_circle(fitz.Point(radius, radius), radius, color=color, fill=color)
+
+     # Tile the stamp across the target rectangle
+     for x in range(int(rect.x0), int(rect.x1), spacing):
+         for y in range(int(rect.y0), int(rect.y1), spacing):
+             page.show_pdf_page(fitz.Rect(x, y, x + spacing, y + spacing), stamp_doc, 0)
+     stamp_doc.close()
+
+
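The page-size bookkeeping in `expand_pdf_for_notes` is easy to lose track of across modes. For the 'stitch' branch specifically, the resulting page dimensions can be sketched as a pure function (the helper name is mine; A4 points are used as sample input):

```python
def stitched_page_size(orig_w, orig_h, stitch_direction, add_space):
    """Page size produced by 'stitch' mode, mirroring the branch logic above."""
    if stitch_direction == 'horizontal':
        # Both halves sit side by side at original width; notes double it
        return (orig_w * 2 if add_space else orig_w, orig_h)
    # vertical: halves are stacked, doubling the height; notes restore full width
    return (orig_w if add_space else orig_w / 2, orig_h * 2)

print(stitched_page_size(595, 842, 'horizontal', True))  # (1190, 842)
print(stitched_page_size(595, 842, 'vertical', False))   # (297.5, 1684)
```

Seen this way, the invariant is that the content always occupies an `orig_w x orig_h` footprint; `add_space` only grows the page along the axis opposite the stacking direction.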
+ def main():
+     """Main function with command-line interface"""
+     import argparse
+     parser = argparse.ArgumentParser(description="Expand or rearrange a PDF for note-taking.")
+     parser.add_argument("input_pdf", help="Path to input PDF file.")
+     parser.add_argument("output_pdf", nargs='?', help="Path to output PDF file.")
+     parser.add_argument("--mode", choices=['notes_only', 'split', 'stitch'], default='notes_only', help="Processing mode.")
+     parser.add_argument("--stitch-direction", choices=['horizontal', 'vertical'], default='horizontal', help="Direction for 'stitch' mode.")
+     parser.add_argument("--no-space", action='store_true', help="Don't add extra space for notes.")
+     parser.add_argument("--bg", default='white', help="Background color (white, lightgray, cream).")
+
+     args = parser.parse_args()
+
+     output_pdf = args.output_pdf
+     if not output_pdf:
+         suffix = f'_{args.mode}'
+         if args.mode == 'stitch':
+             suffix += f'_{args.stitch_direction[:4]}'
+         if not args.no_space:
+             suffix += '_notes'
+         suffix += '.pdf'
+         output_pdf = args.input_pdf.replace('.pdf', suffix)
+
+     bg_colors = {
+         'white': (1, 1, 1),
+         'lightgray': (0.95, 0.95, 0.95),
+         'cream': (1, 0.99, 0.94),
+     }
+     bg_color = bg_colors.get(args.bg, (1, 1, 1))
+
+     try:
+         expand_pdf_for_notes(
+             args.input_pdf,
+             output_pdf,
+             bg_color=bg_color,
+             mode=args.mode,
+             stitch_direction=args.stitch_direction,
+             add_space=not args.no_space
+         )
+     except Exception as e:
+         print(f"Error: {e}")
+         sys.exit(1)
+
+
+ if __name__ == "__main__":
+     main()
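When no output path is given, `main()` derives one by appending mode-dependent suffixes before the `.pdf` extension. The same rules as a standalone function (the helper name is mine, lifted straight from the CLI logic):

```python
def default_output_name(input_pdf, mode, stitch_direction="horizontal", add_space=True):
    """Derive an output filename the way the resize.py CLI does."""
    suffix = f"_{mode}"
    if mode == "stitch":
        # Only the first four letters of the direction are kept, e.g. 'vert'
        suffix += f"_{stitch_direction[:4]}"
    if add_space:
        suffix += "_notes"
    # suffix ends in '.pdf', so replacing the original extension inserts it
    return input_pdf.replace(".pdf", suffix + ".pdf")

print(default_output_name("slides.pdf", "stitch", "vertical"))           # slides_stitch_vert_notes.pdf
print(default_output_name("slides.pdf", "notes_only", add_space=False))  # slides_notes_only.pdf
```

Note that `str.replace` rewrites every occurrence of `.pdf`, so an input path containing `.pdf` mid-name would be mangled; for typical filenames the single trailing extension is the only match.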
restore.py ADDED
@@ -0,0 +1,103 @@
+ import sqlite3
+ import json
+ import os
+ import shutil
+ import zipfile
+ from database import setup_database
+
+ def restore_from_backup(zip_filename='backup.zip', tmp_dir='tmp_restore'):
+     """
+     Restores the application state from a zip backup.
+
+     :param zip_filename: Name of the backup zip file.
+     :param tmp_dir: Temporary directory to extract the backup.
+     """
+     if not os.path.exists(zip_filename):
+         print(f"Backup file not found: {zip_filename}")
+         return
+
+     # Confirmation prompt
+     confirm = input("This will wipe all existing data. Are you sure you want to continue? (y/n): ")
+     if confirm.lower() != 'y':
+         print("Restore operation cancelled.")
+         return
+
+     # 1. Clean existing data
+     print("Cleaning existing data...")
+     for dir_to_clean in ['instance', 'output', 'processed', 'uploads']:
+         if os.path.exists(dir_to_clean):
+             shutil.rmtree(dir_to_clean)
+         os.makedirs(dir_to_clean)
+
+     # 2. Recreate database schema
+     print("Setting up new database schema...")
+     setup_database()
+
+     # 3. Extract the backup
+     if os.path.exists(tmp_dir):
+         shutil.rmtree(tmp_dir)
+     os.makedirs(tmp_dir)
+
+     print(f"Extracting {zip_filename} to {tmp_dir}...")
+     with zipfile.ZipFile(zip_filename, 'r') as zipf:
+         zipf.extractall(tmp_dir)
+
+     # 4. Restore database from JSON files
+     conn = None
+     try:
+         conn = sqlite3.connect('instance/database.db')
+         cursor = conn.cursor()
+
+         json_files = [f for f in os.listdir(tmp_dir) if f.endswith('.json')]
+         for json_file in json_files:
+             table_name = os.path.splitext(json_file)[0]
+             file_path = os.path.join(tmp_dir, json_file)
+
+             print(f"Restoring table: {table_name}")
+             with open(file_path, 'r') as f:
+                 data = json.load(f)
+
+             if not data:
+                 continue
+
+             columns = data[0].keys()
+             placeholders = ', '.join(['?' for _ in columns])
+             query = f"INSERT INTO {table_name} ({', '.join(columns)}) VALUES ({placeholders})"
+
+             for row in data:
+                 values = [row.get(col) for col in columns]
+                 cursor.execute(query, values)
+
+         conn.commit()
+         print("Database restore complete.")
+
+     except sqlite3.Error as e:
+         print(f"Database error during restore: {e}")
+     finally:
+         if conn:
+             conn.close()
+
+     # 5. Restore file directories
+     print("Restoring file directories...")
+     for dir_name in ['output', 'processed', 'uploads']:
+         source_dir = os.path.join(tmp_dir, dir_name)
+         dest_dir = dir_name
+         if os.path.exists(source_dir):
+             # Copy contents, not the directory itself
+             for item in os.listdir(source_dir):
+                 s = os.path.join(source_dir, item)
+                 d = os.path.join(dest_dir, item)
+                 if os.path.isdir(s):
+                     shutil.copytree(s, d, dirs_exist_ok=True)
+                 else:
+                     shutil.copy2(s, d)
+             print(f"Restored directory: {dir_name}")
+
+     # 6. Clean up temporary directory
+     shutil.rmtree(tmp_dir)
+     print(f"Cleaned up temporary directory: {tmp_dir}")
+
+     print("\nRestore complete!")
+
+ if __name__ == '__main__':
+     restore_from_backup()
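The restore loop rebuilds each `INSERT` statement from the keys of the first JSON row, assuming every dumped row shares the same columns. The same reconstruction against an illustrative in-memory table and hand-written JSON:

```python
import json
import sqlite3

# Illustrative rows as they might appear in one of the backup's JSON files
rows = json.loads('[{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]')

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")

# Same reconstruction as the restore loop: columns come from the first row,
# with one '?' placeholder per column.
columns = list(rows[0].keys())
placeholders = ', '.join('?' for _ in columns)
query = f"INSERT INTO users ({', '.join(columns)}) VALUES ({placeholders})"
for row in rows:
    # row.get(col) yields None (SQL NULL) for any column a later row omits
    conn.execute(query, [row.get(col) for col in columns])
conn.commit()

restored = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(restored)  # [(1, 'alice'), (2, 'bob')]
```

Because the table and column names are interpolated while the values are bound as parameters, this pattern is safe as long as the JSON filenames and keys come from the trusted backup itself.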