Commit a706099 · Parent: 9e83da7

Attempt at markdown-based document rendering (failed implementation)

Issues identified:
- Markdown conversion from PDF OCR is lossy and breaks document layout
- Two-column papers and figures cause paragraph fragmentation
- Complex academic documents don't render properly in markdown
- Example: figures interrupting text cause incomplete paragraph chunks
This approach preserved 80% of chunking functionality but failed at
document preservation. Switching back to PDF viewer approach.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
- .claude/sessions/.current-session +1 -0
- .claude/sessions/2025-08-02-0000.md +256 -0
- .claude/sessions/2025-08-03-1043.md +108 -0
- .claude/sessions/2025-08-03-1200.md +37 -0
- backend/app.py +290 -35
- frontend/src/components/ChunkNavigation.jsx +47 -0
- frontend/src/components/ChunkPanel.jsx +198 -0
- frontend/src/components/DocumentProcessor.jsx +111 -784
- frontend/src/components/DocumentProcessor.jsx.backup +889 -0
- frontend/src/components/DocumentViewer.jsx +53 -0
- frontend/src/components/ImageComponent.jsx +115 -0
- frontend/src/components/LoadingAnimation.jsx +45 -0
- frontend/src/hooks/useChat.js +109 -0
- frontend/src/hooks/useChunkNavigation.js +59 -0
- frontend/src/hooks/useDocumentProcessor.js +121 -0
- frontend/src/hooks/usePanelResize.js +45 -0
- frontend/src/utils/markdownComponents.jsx +98 -0
- frontend/src/utils/markdownUtils.js +33 -0
.claude/sessions/.current-session
ADDED
@@ -0,0 +1 @@
+2025-08-03-1200.md
.claude/sessions/2025-08-02-0000.md
ADDED
@@ -0,0 +1,256 @@
# Development Session - 2025-08-02 00:00

## Session Overview
- **Start Time:** 2025-08-02 00:00
- **Working Directory:** /home/alleinzell/SokratesAI
- **Git Status:** Modified files in frontend/src/components/

## Goals
Please specify your goals for this development session.

## Progress

### Update - 2025-08-02 00:30

**Summary**: Major refactoring of DocumentProcessor component completed

**Git Changes**:
- Modified: frontend/src/components/DocumentProcessor.jsx
- Added: frontend/src/components/ChunkNavigation.jsx, ChunkPanel.jsx, DocumentViewer.jsx, LoadingAnimation.jsx
- Added: frontend/src/hooks/ (4 custom hooks)
- Added: frontend/src/utils/ (markdown utilities)
- Current branch: main (commit: 9e83da7)

**Todo Progress**: 10 completed, 0 in progress, 0 pending
- ✓ Completed: Analyze DocumentProcessor structure and identify refactoring opportunities
- ✓ Completed: Extract ImageComponent into separate file
- ✓ Completed: Extract LoadingAnimation component
- ✓ Completed: Create custom hooks for document processing logic
- ✓ Completed: Create custom hooks for chat functionality
- ✓ Completed: Create custom hooks for chunk navigation and state management
- ✓ Completed: Extract panel resizing logic into custom hook
- ✓ Completed: Create separate components for different UI sections
- ✓ Completed: Clean up the main DocumentProcessor component
- ✓ Completed: Test refactored components

**Details**: Successfully refactored 885-line monolithic DocumentProcessor component into 8 focused files:
- Main DocumentProcessor.jsx reduced to 162 lines
- Extracted 4 custom hooks for business logic separation
- Created 4 new UI components for better organization
- Added utility functions for markdown processing
- Maintained all existing functionality while improving maintainability
- Code is now much more modular and easier to debug

**Issues Resolved**:
- Component was too large and difficult to maintain
- Mixed UI and business logic concerns
- Repetitive code patterns
- Hard to debug and modify specific features

**Solutions Implemented**:
- Custom hooks pattern for state management
- Component composition for UI separation
- Utility functions for shared logic
- Proper separation of concerns

### Update - 2025-08-02 01:30

**Summary**: Implemented robust academic paper chunking and LaTeX rendering fixes

**Git Changes**:
- Modified: backend/app.py, frontend/src/components/DocumentProcessor.jsx
- Added: frontend/src/components/ChunkNavigation.jsx, ChunkPanel.jsx, DocumentViewer.jsx, LoadingAnimation.jsx
- Added: frontend/src/hooks/ (4 custom hooks), frontend/src/utils/ (markdown utilities)
- Current branch: main (commit: 9e83da7)

**Todo Progress**: 4 completed, 0 in progress, 2 pending
- ✓ Completed: Fix LaTeX rendering in chunk topic titles
- ✓ Completed: Identify and fix other LaTeX rendering edge cases in highlighting
- ✓ Completed: Implement academic content cleaning system
- ✓ Completed: Fix import errors for regex patterns

**Issues Resolved**:
- LaTeX expressions not rendering in chunk titles (fixed with ReactMarkdown wrapper)
- Markdown structure broken by HTML div highlighting (fixed with blockquote approach)
- Academic paper noise breaking chunking (footnotes, copyright notices, author contributions)
- Mid-sentence chunk cuts (improved with programmatic paragraph boundaries)
- Import errors causing chunking failures

**Solutions Implemented**:
- **Programmatic Chunking**: Replaced LLM-based chunking with regex pattern matching for `[.!?]\n\n`
- **Academic Content Cleaning**: Added 15+ regex patterns to remove footnotes, copyright notices, funding acknowledgments
- **LaTeX-Preserving Highlighting**: Used markdown blockquotes instead of HTML divs to preserve formatting
- **Quality Validation**: Added chunk filtering to skip low-quality content (excessive footnotes, citations, symbols)
- **Improved Topic Rendering**: Topics now render LaTeX expressions correctly using ReactMarkdown
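The regex-based chunking and quality filtering described above can be sketched as follows. This is a minimal illustration under assumptions: the split follows the `[.!?]\n\n` rule named above, but `min_len`, the citation-density check, and the function name are invented for this sketch and are not the project's actual `programmatic_chunk_document()`.

```python
import re

def chunk_by_paragraph(markdown: str, min_len: int = 200):
    """Split markdown into chunks at sentence-ending punctuation
    followed by a blank line, then drop low-quality pieces."""
    # Split where a sentence ends and a paragraph break follows.
    parts = re.split(r'(?<=[.!?])\n\n', markdown)
    chunks = []
    for part in parts:
        text = part.strip()
        # Quality filter (illustrative thresholds, not the project's):
        # skip short fragments and citation-heavy noise.
        if len(text) < min_len:
            continue
        if text.count('[') > len(text) / 20:  # excessive bracketed citations
            continue
        chunks.append(text)
    return chunks
```

Being deterministic, a splitter like this always cuts at the same boundaries, which is what makes it preferable to LLM chunking for this structural task.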
**Code Changes**:
- Backend: Enhanced `programmatic_chunk_document()` with academic cleaning and validation
- Frontend: Replaced HTML highlighting with markdown blockquote approach
- Frontend: Added LaTeX support to chunk topic titles via ReactMarkdown
- Added imports: `re`, `string` modules for text processing
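The blockquote approach can be illustrated with a short sketch (Python here for compactness; the project's version lives in the frontend JS utilities). The key point is that `> `-prefixed lines remain ordinary markdown, so ReactMarkdown and its LaTeX plugin still parse the highlighted span, whereas a raw HTML `<div>` wrapper opts its contents out of markdown parsing. The function below is a hypothetical illustration, not the actual implementation.

```python
def highlight_chunk(markdown: str, chunk_text: str) -> str:
    """Wrap one chunk of a markdown document in a blockquote.

    Blockquotes stay valid markdown, so inline LaTeX inside the
    chunk survives rendering; an HTML <div> wrapper would not.
    """
    if chunk_text not in markdown:
        return markdown  # chunk not found verbatim; leave document untouched
    quoted = "\n".join("> " + line for line in chunk_text.splitlines())
    return markdown.replace(chunk_text, quoted, 1)
```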
---

## Session Summary - ENDED 2025-08-03 10:41

**Total Duration**: ~34 hours 41 minutes
**Session Type**: Major refactoring and feature enhancement

### Git Summary
**Files Changed**: 11 total
- **Modified (2)**: backend/app.py, frontend/src/components/DocumentProcessor.jsx
- **Added (9)**:
  - frontend/src/components/ChunkNavigation.jsx
  - frontend/src/components/ChunkPanel.jsx
  - frontend/src/components/DocumentViewer.jsx
  - frontend/src/components/ImageComponent.jsx
  - frontend/src/components/LoadingAnimation.jsx
  - frontend/src/components/DocumentProcessor.jsx.backup
  - frontend/src/hooks/ (4 custom hooks)
  - frontend/src/utils/ (markdown utilities)
  - .claude/ (session management)

**Commits Made**: 0 (changes remain staged/unstaged)
**Final Git Status**: 2 modified, 9 untracked files
**Current Branch**: main (latest commit: 9e83da7)

### Todo Summary
**Total Tasks**: 14 completed, 0 in progress, 2 pending
**Completion Rate**: 87.5%

**All Completed Tasks**:
1. ✓ Analyze DocumentProcessor structure and identify refactoring opportunities
2. ✓ Extract ImageComponent into separate file
3. ✓ Extract LoadingAnimation component
4. ✓ Create custom hooks for document processing logic
5. ✓ Create custom hooks for chat functionality
6. ✓ Create custom hooks for chunk navigation and state management
7. ✓ Extract panel resizing logic into custom hook
8. ✓ Create separate components for different UI sections
9. ✓ Clean up the main DocumentProcessor component
10. ✓ Test refactored components
11. ✓ Fix LaTeX rendering in chunk topic titles
12. ✓ Identify and fix other LaTeX rendering edge cases in highlighting
13. ✓ Implement academic content cleaning system
14. ✓ Fix import errors for regex patterns

**Incomplete Tasks (2 pending)**:
- Improve chunk quality validation
- Add error handling for edge cases

### Key Accomplishments

#### 1. Major Component Refactoring
- **Before**: 885-line monolithic DocumentProcessor component
- **After**: 162-line main component + 8 focused modules
- **Impact**: 80% reduction in main component size, vastly improved maintainability

#### 2. Academic Paper Processing Enhancement
- Implemented programmatic chunking using regex patterns (`[.!?]\n\n`)
- Added 15+ academic content cleaning patterns
- Replaced unreliable LLM-based chunking with deterministic approach
- Added chunk quality validation and filtering
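A few cleaning patterns of the kind described can be sketched like this. The real system uses 15+ patterns; the three below are invented examples of the categories named in the notes (author contributions, copyright notices, funding acknowledgments), not the expressions in `backend/app.py`.

```python
import re

# Illustrative cleaning patterns (assumed, not the project's exact set).
CLEANUP_PATTERNS = [
    (re.compile(r'^\s*\*?\s*Equal contribution\..*$', re.MULTILINE), ''),
    (re.compile(r'^\s*©\s*\d{4}.*$', re.MULTILINE), ''),
    (re.compile(r'^\s*This work was (?:funded|supported) by.*$', re.MULTILINE), ''),
]

def clean_academic_text(text: str) -> str:
    """Remove common academic-paper noise lines from OCR markdown."""
    for pattern, repl in CLEANUP_PATTERNS:
        text = pattern.sub(repl, text)
    # Collapse the blank runs left behind by removed lines.
    return re.sub(r'\n{3,}', '\n\n', text).strip()
```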
#### 3. LaTeX Rendering Fixes
- Fixed LaTeX expressions in chunk topic titles using ReactMarkdown
- Replaced HTML div highlighting with markdown blockquotes
- Preserved mathematical notation and formatting integrity

### Features Implemented

1. **Custom Hooks Architecture**:
   - `useDocumentProcessing` - Document upload and processing logic
   - `useChat` - Chat functionality and message handling
   - `useChunkNavigation` - Chunk navigation and state management
   - `usePanelResize` - Panel resizing logic

2. **Component Extraction**:
   - `ChunkNavigation` - Chunk list and navigation controls
   - `ChunkPanel` - Individual chunk display and interaction
   - `DocumentViewer` - PDF/document display component
   - `ImageComponent` - Image rendering with LaTeX support
   - `LoadingAnimation` - Reusable loading states

3. **Academic Content Processing**:
   - Automatic removal of footnotes, citations, copyright notices
   - Author contribution section filtering
   - Funding acknowledgment cleanup
   - Reference list handling

4. **Utility Functions**:
   - Markdown processing utilities
   - Text cleaning and validation functions
   - Academic content pattern matching

### Problems Encountered and Solutions

#### Problem 1: LaTeX Rendering in Highlighted Text
- **Issue**: HTML div highlighting broke LaTeX expressions
- **Solution**: Switched to markdown blockquote approach that preserves ReactMarkdown rendering

#### Problem 2: Poor Academic Paper Chunking
- **Issue**: LLM chunking produced inconsistent results with academic papers
- **Solution**: Implemented regex-based programmatic chunking with academic content cleaning

#### Problem 3: Component Maintainability
- **Issue**: 885-line component was impossible to debug and modify
- **Solution**: Applied React best practices with custom hooks and component composition

#### Problem 4: Academic Noise in Chunks
- **Issue**: Footnotes, citations, and metadata polluted chunk content
- **Solution**: Created comprehensive cleaning system with 15+ regex patterns

### Breaking Changes
- **Component Structure**: DocumentProcessor now requires new component dependencies
- **Backend API**: Enhanced chunking endpoint with new academic processing parameters
- **Import Dependencies**: Added `re` and `string` modules to backend requirements

### Dependencies Added
- No new external dependencies
- Enhanced usage of existing React patterns (hooks, composition)
- Added internal utility modules

### Configuration Changes
- No configuration file changes required
- Enhanced backend processing logic maintains API compatibility

### Code Quality Improvements
- **Lines of Code**: ~35,630 total (after refactoring)
- **Maintainability**: Dramatically improved through separation of concerns
- **Testability**: Custom hooks enable isolated unit testing
- **Reusability**: Extracted components can be reused across the application

### Lessons Learned

1. **Component Size Matters**: Large components become exponentially harder to maintain
2. **Academic Content is Noisy**: Real-world documents require extensive cleaning
3. **LaTeX + React**: Careful consideration needed for mathematical content rendering
4. **Programmatic > LLM**: For structured tasks, deterministic algorithms often outperform LLMs
5. **Separation of Concerns**: Custom hooks provide excellent business logic isolation

### What Wasn't Completed

1. **Testing Suite**: No unit tests written for new components and hooks
2. **Error Handling**: Limited error boundary implementation
3. **Performance Optimization**: No lazy loading or memoization added
4. **Documentation**: No inline documentation or README updates
5. **Type Safety**: TypeScript conversion not implemented

### Tips for Future Developers

1. **Testing Priority**: Implement unit tests for custom hooks first - they contain core logic
2. **Error Boundaries**: Add React error boundaries around new components
3. **Performance**: Consider `React.memo` for ChunkPanel if rendering many chunks
4. **Documentation**: Document the academic cleaning patterns for future maintenance
5. **Type Safety**: Consider TypeScript migration for better development experience
6. **Monitoring**: Add error tracking for chunk processing failures
7. **Accessibility**: Review new components for keyboard navigation and screen reader support

### Next Session Recommendations

1. Implement comprehensive testing suite
2. Add TypeScript for better type safety
3. Create error boundaries and improved error handling
4. Add performance optimizations (memoization, lazy loading)
5. Write documentation for the refactored architecture
6. Consider implementing user feedback mechanisms for chunk quality
.claude/sessions/2025-08-03-1043.md
ADDED
@@ -0,0 +1,108 @@
# Development Session - 2025-08-03 10:43

## Session Overview
- **Start Time:** 2025-08-03 10:43
- **Working Directory:** /home/alleinzell/SokratesAI
- **Git Status:** 2 modified, 9 untracked files from previous refactoring

## Goals
Fix scrolling issue in the left content panel (DocumentViewer component).

## Progress

### Update - 2025-08-03 10:45

**Summary**: Fixed scrolling issue in DocumentViewer component

**Todo Progress**: 2 completed, 0 in progress, 0 pending
- ✓ Completed: Investigate scrolling issue in left content panel
- ✓ Completed: Fix scrolling behavior in DocumentViewer component

**Problem**: Left content panel scrolling not working despite having `overflow-y-auto`
**Root Cause**: Missing height constraints in component hierarchy
**Solution**: Added `height: '100%'` to panel container and DocumentViewer root div

---

## Session Summary - ENDED 2025-08-03 10:45

**Total Duration**: 2 minutes
**Session Type**: Quick bug fix

### Git Summary
**Files Changed**: 2 total
- **Modified (2)**:
  - frontend/src/components/DocumentProcessor.jsx (added height: '100%' to left panel)
  - frontend/src/components/DocumentViewer.jsx (added height: '100%' to root container)

**Commits Made**: 0 (changes remain unstaged)
**Final Git Status**: 2 modified, 9 untracked files
**Current Branch**: main (latest commit: 9e83da7)

### Todo Summary
**Total Tasks**: 2 completed, 0 in progress, 0 pending
**Completion Rate**: 100%

**All Completed Tasks**:
1. ✓ Investigate scrolling issue in left content panel
2. ✓ Fix scrolling behavior in DocumentViewer component

### Key Accomplishments

#### 1. Scrolling Bug Fix
- **Problem**: DocumentViewer panel content not scrollable despite `overflow-y-auto` styling
- **Root Cause**: Parent containers lacked explicit height constraints
- **Solution**: Added `height: '100%'` to both panel container and DocumentViewer root div
- **Impact**: Restored proper scrolling functionality in left content panel

### Problems Encountered and Solutions

#### Problem: CSS Overflow Not Working
- **Issue**: `overflow-y-auto` on DocumentViewer content area wasn't enabling scrolling
- **Investigation**: Parent container had `h-screen` and `overflow-hidden` but child containers lacked height
- **Solution**: Applied explicit `height: '100%'` to establish proper height inheritance chain
- **Technical Detail**: CSS flexbox `flex-1` requires parent with defined height for proper overflow behavior

### Code Changes Made

**frontend/src/components/DocumentProcessor.jsx:131**
```jsx
// Before
<div style={{ width: `${leftPanelWidth}%` }}>

// After
<div style={{ width: `${leftPanelWidth}%`, height: '100%' }}>
```

**frontend/src/components/DocumentViewer.jsx:11**
```jsx
// Before
<div className="bg-white rounded-lg shadow-sm flex flex-col" style={{ width: '100%' }}>

// After
<div className="bg-white rounded-lg shadow-sm flex flex-col" style={{ width: '100%', height: '100%' }}>
```

### Lessons Learned

1. **CSS Height Inheritance**: Flexbox children need explicit height when parent has constrained height
2. **Overflow Debugging**: Check entire parent-child height chain when `overflow-y-auto` fails
3. **Component Hierarchy**: Height constraints must flow through all levels for proper scrolling

### Breaking Changes
None - purely additive CSS styling changes.

### Dependencies Added/Removed
None

### Configuration Changes
None

### What Wasn't Completed
All objectives completed successfully.

### Tips for Future Developers

1. **CSS Debugging**: When overflow doesn't work, inspect the full height chain in browser DevTools
2. **Flexbox Heights**: Remember that `flex-1` needs parent height to calculate properly
3. **Quick Fixes**: Simple CSS issues often have simple solutions - check height/width constraints first
.claude/sessions/2025-08-03-1200.md
ADDED
@@ -0,0 +1,37 @@
# Development Session - 2025-08-03 12:00

## Session Overview
- **Start Time:** 2025-08-03 12:00
- **Project:** SokratesAI
- **Current Branch:** main

## Goals
Refine academic paper chunking system to address:
1. Abstract should be skipped as separate chunk
2. Two-column paper designs with figures breaking chunk continuity
3. Footnotes causing chunking issues
4. Document rendering being affected by chunking modifications
5. Preserve original document integrity (no cleaning/modification)

## Progress

### Analysis - 2025-08-03 12:05

**Current Implementation Issues Identified**:

**Backend (`backend/app.py`)**:
- `programmatic_chunk_document()` (lines 374-466) modifies original document via `clean_academic_content()`
- Uses simple regex `([.!?])\n\n` for paragraph endings (line 393)
- Returns cleaned markdown for highlighting, violating preservation principle
- Position mapping based on cleaned text, not original

**Frontend (`frontend/src/utils/markdownUtils.js`)**:
- `highlightChunkInMarkdown()` replaces chunk text with blockquote format
- Modifies document structure by injecting `> **Current Learning Section**` headers
- Works by text replacement which can break if positions are wrong

**Key Problems**:
1. **Document Modification**: Original document gets cleaned (academic content removal)
2. **Figure Handling**: Simple paragraph-ending regex can't handle figures interrupting text flow
3. **Position Mapping**: Positions calculated on cleaned text, not original
4. **Highlighting Injection**: Blockquote injection modifies document structure
backend/app.py
CHANGED
|
@@ -6,6 +6,8 @@ from mistralai import Mistral
|
|
| 6 |
import os
|
| 7 |
import tempfile
|
| 8 |
import json
|
|
|
|
|
|
|
| 9 |
from dotenv import load_dotenv
|
| 10 |
from difflib import SequenceMatcher
|
| 11 |
from pydantic import BaseModel, Field
|
|
@@ -119,8 +121,28 @@ async def process_ocr_content(file_id: str):
|
|
| 119 |
|
| 120 |
print(f"✅ OCR processing complete! Found {len(ocr_response.pages)} pages")
|
| 121 |
|
| 122 |
-
#
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 123 |
processed_pages = []
|
|
|
|
|
|
|
| 124 |
for page_idx, page in enumerate(ocr_response.pages):
|
| 125 |
print(f"📄 Page {page_idx + 1}: {len(page.markdown)} chars, {len(page.images)} images")
|
| 126 |
|
|
@@ -149,19 +171,26 @@ async def process_ocr_content(file_id: str):
|
|
| 149 |
}
|
| 150 |
page_data["images"].append(image_data)
|
| 151 |
|
| 152 |
-
# Auto-chunk this page
|
| 153 |
-
try:
|
| 154 |
-
print(f"🧠 Auto-chunking page {page_idx + 1}...")
|
| 155 |
-
chunks = await auto_chunk_page(page.markdown, client)
|
| 156 |
-
page_data["chunks"] = chunks
|
| 157 |
-
print(f"📊 Page {page_idx + 1} chunks found: {len(chunks)}")
|
| 158 |
-
for i, chunk in enumerate(chunks):
|
| 159 |
-
print(f" {i+1}. {chunk.get('topic', 'Unknown')}: {chunk.get('start_phrase', '')[:50]}...")
|
| 160 |
-
except Exception as chunk_error:
|
| 161 |
-
print(f"⚠️ Chunking failed for page {page_idx + 1}: {chunk_error}")
|
| 162 |
-
page_data["chunks"] = []
|
| 163 |
-
|
| 164 |
processed_pages.append(page_data)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 165 |
|
| 166 |
print(f"📝 Total processed pages: {len(processed_pages)}")
|
| 167 |
|
|
@@ -169,6 +198,8 @@ async def process_ocr_content(file_id: str):
|
|
| 169 |
"file_id": file_id,
|
| 170 |
"pages": processed_pages,
|
| 171 |
"total_pages": len(processed_pages),
|
|
|
|
|
|
|
| 172 |
"status": "processed"
|
| 173 |
}
|
| 174 |
|
|
@@ -233,9 +264,34 @@ class ChunkList(BaseModel):
|
|
| 233 |
"""Container for a list of document chunks."""
|
| 234 |
chunks: List[ChunkSchema] = Field(description="List of identified chunks for interactive lessons")
|
| 235 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 236 |
def fuzzy_find(text, pattern, start_pos=0):
|
| 237 |
"""Find the best fuzzy match for pattern in text starting from start_pos"""
|
| 238 |
-
best_match = None
|
| 239 |
best_ratio = 0
|
| 240 |
best_pos = -1
|
| 241 |
|
|
@@ -244,18 +300,175 @@ def fuzzy_find(text, pattern, start_pos=0):
|
|
| 244 |
for i in range(start_pos, len(text) - pattern_len + 1):
|
| 245 |
window = text[i:i + pattern_len]
|
| 246 |
ratio = SequenceMatcher(None, pattern.lower(), window.lower()).ratio()
|
| 247 |
-
|
| 248 |
-
if ratio > best_ratio and ratio > 0.
|
| 249 |
best_ratio = ratio
|
| 250 |
best_pos = i
|
| 251 |
-
best_match = window
|
| 252 |
|
| 253 |
return best_pos if best_pos != -1 else None
|
| 254 |
|
| 255 |
-
|
| 256 |
-
"""
|
| 257 |
-
|
| 258 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 259 |
|
| 260 |
# Get Fireworks API key
|
| 261 |
fireworks_api_key = os.environ.get("FIREWORKS_API_KEY")
|
|
@@ -275,9 +488,9 @@ async def auto_chunk_page(page_markdown, client=None):
|
|
| 275 |
structured_llm = llm.with_structured_output(ChunkList)
|
| 276 |
|
| 277 |
# Create chunking prompt
|
| 278 |
-
prompt = f"""Imagine you are a teacher. You are given
|
| 279 |
-
DOCUMENT
|
| 280 |
-
{
|
| 281 |
|
| 282 |
Rules:
|
| 283 |
1. Each chunk should contain 2-3 valuable lessons
|
|
@@ -286,25 +499,63 @@ Rules:
|
|
| 286 |
4. More dense content should have more chunks, less dense content fewer chunks
|
| 287 |
5. Identify chunks that would make good interactive lessons
|
| 288 |
|
| 289 |
-
Return a list of chunks with topic, start_phrase, and end_phrase for each."""
|
| 290 |
|
| 291 |
# Call Fireworks with structured output
|
| 292 |
chunk_response = structured_llm.invoke(prompt)
|
| 293 |
chunks = chunk_response.chunks
|
| 294 |
|
| 295 |
-
# Find positions using fuzzy matching
|
| 296 |
positioned_chunks = []
|
| 297 |
-
for chunk in chunks:
|
| 298 |
-
|
| 299 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 300 |
# Add the length of the end_phrase plus a bit more to include punctuation
|
| 301 |
if end_phrase_start is not None:
|
| 302 |
end_pos = end_phrase_start + len(chunk.end_phrase)
|
| 303 |
# Try to include punctuation that might follow
|
| 304 |
-
|
| 305 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 306 |
else:
|
| 307 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 308 |
|
| 309 |
if start_pos is not None:
|
| 310 |
positioned_chunks.append({
|
|
@@ -317,6 +568,10 @@ Return a list of chunks with topic, start_phrase, and end_phrase for each."""
|
|
| 317 |
"found_end": end_pos is not None
|
| 318 |
})
|
| 319 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 320 |
return positioned_chunks
|
| 321 |
|
| 322 |
except Exception as e:
|
|
@@ -374,13 +629,13 @@ Return a list of chunks with topic, start_phrase, and end_phrase for each."""
 # Find positions using fuzzy matching
 positioned_chunks = []
 for chunk in chunks:
-    start_pos = fuzzy_find(
-    end_phrase_start = fuzzy_find(
     # Add the length of the end_phrase plus a bit more to include punctuation
     if end_phrase_start is not None:
         end_pos = end_phrase_start + len(chunk.end_phrase)
         # Try to include punctuation that might follow
-        if end_pos < len(
             end_pos += 1
     else:
         end_pos = None
 import os
 import tempfile
 import json
+import re
+import string
 from dotenv import load_dotenv
 from difflib import SequenceMatcher
 from pydantic import BaseModel, Field
 print(f"✅ OCR processing complete! Found {len(ocr_response.pages)} pages")

+# Debug: Print raw OCR response structure
+print("\n" + "="*80)
+print("🔍 RAW MISTRAL OCR RESPONSE DEBUG:")
+print("="*80)
+
+for page_idx, page in enumerate(ocr_response.pages):
+    print(f"\n📄 PAGE {page_idx + 1} RAW MARKDOWN:")
+    print("-" * 50)
+    print(repr(page.markdown))  # Using repr() to show escape characters
+    print("-" * 50)
+    print("RENDERED:")
+    print(page.markdown[:500] + "..." if len(page.markdown) > 500 else page.markdown)
+    print(f"TOTAL LENGTH: {len(page.markdown)} characters")
+
+print("="*80)
+print("END RAW OCR DEBUG")
+print("="*80 + "\n")
+
+# Process each page and extract structured data (without per-page chunking)
 processed_pages = []
+all_page_markdown = []
+
 for page_idx, page in enumerate(ocr_response.pages):
     print(f"📄 Page {page_idx + 1}: {len(page.markdown)} chars, {len(page.images)} images")
         }
         page_data["images"].append(image_data)

     processed_pages.append(page_data)
+    all_page_markdown.append(page.markdown)
+
+# Combine all markdown into single document
+combined_markdown = '\n\n---\n\n'.join(all_page_markdown)
+print(f"📋 Combined document: {len(combined_markdown)} chars total")
+
+# Auto-chunk the entire document once
+document_chunks = []
+original_markdown = combined_markdown
+try:
+    print(f"🧠 Auto-chunking entire document...")
+    document_chunks, original_markdown = await auto_chunk_document(combined_markdown, client)
+    print(f"📊 Document chunks found: {len(document_chunks)}")
+    for i, chunk in enumerate(document_chunks):
+        print(f"  {i+1}. {chunk.get('topic', 'Unknown')}: {chunk.get('start_phrase', '')[:50]}...")
+except Exception as chunk_error:
+    print(f"⚠️ Document chunking failed: {chunk_error}")
+    document_chunks = []
+    original_markdown = combined_markdown

 print(f"📝 Total processed pages: {len(processed_pages)}")

     "file_id": file_id,
     "pages": processed_pages,
     "total_pages": len(processed_pages),
+    "combined_markdown": original_markdown,  # Send original version for highlighting
+    "chunks": document_chunks,
     "status": "processed"
 }
     """Container for a list of document chunks."""
     chunks: List[ChunkSchema] = Field(description="List of identified chunks for interactive lessons")

+def find_paragraph_end(text, start_pos):
+    """Find the end of a paragraph starting from start_pos"""
+    end_pos = start_pos
+    while end_pos < len(text) and text[end_pos] not in ['\n', '\r']:
+        end_pos += 1
+    return end_pos
+
+def find_paragraph_end(text, start_pos):
+    """Find the end of current paragraph (looks for \\n\\n or document end)"""
+    pos = start_pos
+    while pos < len(text):
+        if pos < len(text) - 1 and text[pos:pos+2] == '\n\n':
+            return pos  # End at paragraph break
+        elif text[pos] in '.!?':
+            # Found sentence end, check if paragraph continues
+            next_pos = pos + 1
+            while next_pos < len(text) and text[next_pos] in ' \t':
+                next_pos += 1
+            if next_pos < len(text) - 1 and text[next_pos:next_pos+2] == '\n\n':
+                return next_pos  # Paragraph ends after this sentence
+            pos = next_pos
+        else:
+            pos += 1
+    return min(pos, len(text))
+
 def fuzzy_find(text, pattern, start_pos=0):
     """Find the best fuzzy match for pattern in text starting from start_pos"""
     best_ratio = 0
     best_pos = -1

     for i in range(start_pos, len(text) - pattern_len + 1):
         window = text[i:i + pattern_len]
         ratio = SequenceMatcher(None, pattern.lower(), window.lower()).ratio()
+
+        if ratio > best_ratio and ratio > 0.8:  # Much stricter: 80% similarity
             best_ratio = ratio
             best_pos = i

     return best_pos if best_pos != -1 else None
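The `fuzzy_find` above slides a pattern-sized window over the text and keeps the best window scoring above the 0.8 `SequenceMatcher` threshold. A minimal standalone sketch (the `pattern_len = len(pattern)` line, elided from the hunk, is an assumption here):

```python
from difflib import SequenceMatcher

def fuzzy_find(text, pattern, start_pos=0, threshold=0.8):
    """Return the start index of the best window whose similarity
    to pattern exceeds the threshold, or None if nothing qualifies."""
    best_ratio, best_pos = 0.0, -1
    pattern_len = len(pattern)  # assumed; elided in the diff hunk
    for i in range(start_pos, len(text) - pattern_len + 1):
        window = text[i:i + pattern_len]
        ratio = SequenceMatcher(None, pattern.lower(), window.lower()).ratio()
        if ratio > best_ratio and ratio > threshold:
            best_ratio, best_pos = ratio, i
    return best_pos if best_pos != -1 else None

text = "Transformers process tokens in parallel."
print(fuzzy_find(text, "process tokens"))            # → 13 (exact match)
print(fuzzy_find(text, "proces tokens"))             # one-char typo still clears 0.8
print(fuzzy_find(text, "completely absent phrase"))  # → None
```

The strict threshold is what makes LLM-returned start and end phrases usable as anchors: a one-character drift still matches, while unrelated text does not.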
+def clean_academic_content(text):
+    """Remove common academic paper noise that breaks natural chunking"""
+    # Patterns to remove/clean
+    patterns_to_remove = [
+        # Author contribution footnotes
+        r'\[\^\d+\]:\s*[∗\*]+\s*Equal contribution[^.]*\.',
+        r'\[\^\d+\]:\s*[†\*]+\s*Correspondence to[^.]*\.',
+        r'\[\^\d+\]:\s*[†\*]+\s*Corresponding author[^.]*\.',
+
+        # Copyright notices
+        r'Copyright \(c\) \d{4}[^.]*\.',
+        r'All rights reserved\.',
+
+        # Common academic noise
+        r'\[\^\d+\]:\s*Code available at[^.]*\.',
+        r'\[\^\d+\]:\s*Data available at[^.]*\.',
+        r'\[\^\d+\]:\s*This work was[^.]*\.',
+
+        # Funding acknowledgments (often break paragraphs)
+        r'This research was supported by[^.]*\.',
+        r'Funded by[^.]*\.',
+
+        # Page numbers and headers that shouldn't end paragraphs
+        r'^\d+$',  # Standalone page numbers
+        r'^Page \d+',
+
+        # DOI and URL patterns that break paragraphs
+        r'DOI:\s*\S+',
+        r'arXiv:\d{4}\.\d{4,5}',
+    ]
+
+    cleaned_text = text
+    for pattern in patterns_to_remove:
+        cleaned_text = re.sub(pattern, '', cleaned_text, flags=re.MULTILINE | re.IGNORECASE)
+
+    # Clean up multiple newlines created by removals
+    cleaned_text = re.sub(r'\n\n\n+', '\n\n', cleaned_text)
+
+    return cleaned_text.strip()
+
+def validate_paragraph_chunk(chunk_text):
+    """Check if a chunk looks like valid content (not metadata/noise)"""
+    # Skip very short chunks
+    if len(chunk_text.strip()) < 50:
+        return False
+
+    # Skip chunks that are mostly footnote references
+    footnote_refs = len(re.findall(r'\[\^\d+\]', chunk_text))
+    if footnote_refs > len(chunk_text.split()) / 10:  # More than 10% footnote refs
+        return False
+
+    # Skip chunks that are mostly citations
+    citations = len(re.findall(r'\[\d+\]', chunk_text))
+    if citations > len(chunk_text.split()) / 8:  # More than 12.5% citations
+        return False
+
+    # Skip chunks that are mostly symbols/special chars
+    normal_chars = sum(1 for c in chunk_text if c.isalnum() or c in string.whitespace)
+    if normal_chars / len(chunk_text) < 0.7:  # Less than 70% normal content
+        return False
+
+    return True
+
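Both helpers above are pure functions over strings, so they can be spot-checked in isolation. A minimal sketch (the sample strings are invented for illustration; `validate_paragraph_chunk` is restated with the same thresholds, and two of the `clean_academic_content` patterns are applied by hand):

```python
import re
import string

def validate_paragraph_chunk(chunk_text):
    """Mirror of the diff's checks: length, footnote/citation density,
    and share of alphanumeric-or-whitespace characters."""
    if len(chunk_text.strip()) < 50:
        return False
    words = len(chunk_text.split())
    if len(re.findall(r'\[\^\d+\]', chunk_text)) > words / 10:
        return False
    if len(re.findall(r'\[\d+\]', chunk_text)) > words / 8:
        return False
    normal = sum(1 for c in chunk_text if c.isalnum() or c in string.whitespace)
    return normal / len(chunk_text) >= 0.7

# Two of the noise patterns applied to an invented sample sentence:
noisy = ("Results were strong. [^1]: * Equal contribution by both authors. "
         "See DOI: 10.1000/xyz for details.")
cleaned = re.sub(r'\[\^\d+\]:\s*[∗\*]+\s*Equal contribution[^.]*\.', '', noisy)
cleaned = re.sub(r'DOI:\s*\S+', '', cleaned)
print(cleaned)

print(validate_paragraph_chunk("Too short."))  # → False
print(validate_paragraph_chunk(
    "A full paragraph of ordinary prose that easily clears the "
    "fifty character minimum and carries no reference noise."))  # → True
```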
+def programmatic_chunk_document(document_markdown):
+    """Chunk document by natural paragraph boundaries - much more reliable than LLM"""
+    if not document_markdown or len(document_markdown.strip()) < 100:
+        return []
+
+    # Use original document without any cleaning to preserve integrity
+    original_markdown = document_markdown
+    print(f"📄 Using original document: {len(document_markdown)} chars")
+
+    chunks = []
+    start_pos = 0
+    chunk_count = 0
+
+    print(f"🧠 Using programmatic paragraph-based chunking...")
+
+    # Find all proper paragraph endings: [.!?] followed by \n\n
+    paragraph_ends = []
+
+    # Pattern: sentence punctuation followed by \n\n
+    pattern = r'([.!?])\n\n'
+    matches = re.finditer(pattern, original_markdown)
+
+    for match in matches:
+        end_pos = match.end() - 3  # Position right after punctuation, before \n\n
+        paragraph_ends.append(end_pos)
+
+    print(f"📊 Found {len(paragraph_ends)} natural paragraph endings")
+
+    # Create chunks from paragraph boundaries using original document
+    for i, end_pos in enumerate(paragraph_ends):
+        # Extract from original markdown
+        chunk_text_clean = original_markdown[start_pos:end_pos + 1]
+
+        # Validate chunk quality
+        if not validate_paragraph_chunk(chunk_text_clean):
+            print(f"  ❌ Skipping low-quality chunk: {chunk_text_clean[:50]}...")
+            start_pos = end_pos + 3  # Skip past .\n\n
+            continue
+
+        chunk_count += 1
+
+        # Map positions back to original document for highlighting
+        # For now, use cleaned positions (we could implement position mapping if needed)
+        chunk_text = chunk_text_clean
+
+        # Create a simple topic from first few words
+        first_line = chunk_text.split('\n')[0].strip()
+        topic = first_line[:50] + "..." if len(first_line) > 50 else first_line
+
+        chunks.append({
+            "topic": topic,
+            "start_position": start_pos,
+            "end_position": end_pos + 1,
+            "start_phrase": chunk_text[:20] + "...",  # First 20 chars
+            "end_phrase": "..." + chunk_text[-20:],  # Last 20 chars
+            "found_start": True,
+            "found_end": True
+        })
+
+        print(f"  ✅ Chunk {chunk_count}: {start_pos}-{end_pos + 1} (length: {end_pos + 1 - start_pos})")
+        print(f"     Topic: {topic}")
+        print(f"     Preview: {chunk_text[:80]}...")
+
+        # Next chunk starts after \n\n
+        start_pos = end_pos + 3  # Skip past .\n\n
+
+    # Handle any remaining text (document might not end with proper paragraph)
+    if start_pos < len(original_markdown):
+        remaining_text = original_markdown[start_pos:].strip()
+        if remaining_text and validate_paragraph_chunk(remaining_text):
+            chunk_count += 1
+            first_line = remaining_text.split('\n')[0].strip()
+            topic = first_line[:50] + "..." if len(first_line) > 50 else first_line
+
+            chunks.append({
+                "topic": topic,
+                "start_position": start_pos,
+                "end_position": len(original_markdown),
+                "start_phrase": remaining_text[:20] + "...",
+                "end_phrase": "..." + remaining_text[-20:],
+                "found_start": True,
+                "found_end": True
+            })
+
+            print(f"  ✅ Final chunk {chunk_count}: {start_pos}-{len(original_markdown)} (remaining text)")
+        else:
+            print(f"  ❌ Skipping low-quality remaining text")
+
+    print(f"📊 Created {len(chunks)} high-quality paragraph-based chunks")
+
+    # Note: We're returning chunks based on original document positions
+    # The frontend will use the original document for highlighting
+    return chunks, document_markdown
+
+async def auto_chunk_document(document_markdown, client=None):
+    """Auto-chunk a document - now using programmatic approach instead of LLM"""
+    chunks, original_markdown = programmatic_chunk_document(document_markdown)
+    return chunks, original_markdown
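The chunker above keys everything off the `([.!?])\n\n` pattern: the match spans the captured punctuation plus two newlines, so `match.end() - 3` lands on the punctuation character itself. A small check of that arithmetic on an invented two-paragraph string:

```python
import re

doc = "First paragraph ends here.\n\nSecond one!\n\nTrailing text without a break"
ends = [m.end() - 3 for m in re.finditer(r'([.!?])\n\n', doc)]

print(ends)                    # → [25, 38]
print([doc[p] for p in ends])  # → ['.', '!']
```

Each position then seeds an `original_markdown[start_pos:end_pos + 1]` slice, and the next chunk starts at `end_pos + 3`, just past the blank line.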
 # Get Fireworks API key
 fireworks_api_key = os.environ.get("FIREWORKS_API_KEY")

 structured_llm = llm.with_structured_output(ChunkList)

 # Create chunking prompt
+prompt = f"""Imagine you are a teacher. You are given a document, and you have to decide how to dissect this document. Your task is to identify chunks of content by providing start and end phrases that can be used to create interactive lessons. Here's the document:
+DOCUMENT:
+{document_markdown}

 Rules:
 1. Each chunk should contain 2-3 valuable lessons
 4. More dense content should have more chunks, less dense content fewer chunks
 5. Identify chunks that would make good interactive lessons

+Return a list of chunks with topic, start_phrase, and end_phrase for each. Importantly, you are passed Markdown text, so output the start and end phrases as Markdown text, and include punctuation. Never stop an end phrase in the middle of a sentence, always include the full sentence or phrase."""

 # Call Fireworks with structured output
 chunk_response = structured_llm.invoke(prompt)
 chunks = chunk_response.chunks

+# Find positions using fuzzy matching with detailed debugging
 positioned_chunks = []
+for i, chunk in enumerate(chunks):
+    print(f"\n🔍 Processing chunk {i+1}: {chunk.topic}")
+    print(f"   Start phrase: '{chunk.start_phrase}'")
+    print(f"   End phrase: '{chunk.end_phrase}'")
+
+    start_pos = fuzzy_find(document_markdown, chunk.start_phrase)
+    end_phrase_start = fuzzy_find(document_markdown, chunk.end_phrase, start_pos or 0)
+
+    print(f"   Found start_pos: {start_pos}")
+    print(f"   Found end_phrase_start: {end_phrase_start}")
+
     # Add the length of the end_phrase plus a bit more to include punctuation
     if end_phrase_start is not None:
         end_pos = end_phrase_start + len(chunk.end_phrase)
         # Try to include punctuation that might follow
+
+        # Look ahead for good stopping points, but be more careful about spaces
+        max_extend = 15  # Don't go crazy far
+        extended = 0
+
+        while end_pos < len(document_markdown) and extended < max_extend:
+            char = document_markdown[end_pos]
+
+            # Good stopping points - include punctuation and stop
+            if char in '.!?':
+                end_pos += 1  # Include the punctuation
+                break
+            elif char in ';:,':
+                end_pos += 1  # Include and stop
+                break
+            # Stop at paragraph breaks
+            elif end_pos < len(document_markdown) - 1 and document_markdown[end_pos:end_pos+2] == '\n\n':
+                break
+            # Stop at LaTeX boundaries
+            elif char == '$':
+                break
+            # Continue through normal chars and whitespace
+            else:
+                end_pos += 1
+                extended += 1
+        print(f"   Final end_pos: {end_pos}")
     else:
+        print(f"   End phrase not found! Finding paragraph end...")
+        end_pos = find_paragraph_end(document_markdown, start_pos)
+
+    if start_pos is not None and end_pos is not None:
+        # Show actual extracted text for debugging
+        extracted_text = document_markdown[start_pos:end_pos]
+        print(f"   Extracted text: '{extracted_text[:100]}...'")

     if start_pos is not None:
         positioned_chunks.append({

             "found_end": end_pos is not None
         })

+# Sort chunks by position in document for chronological order
+positioned_chunks.sort(key=lambda chunk: chunk.get('start_position', 0))
+print(f"📊 Final sorted chunks: {len(positioned_chunks)}")
+
 return positioned_chunks

 except Exception as e:
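The look-ahead loop above (bounded by `max_extend = 15`) can be factored into a helper for testing. This is a sketch of the same walk, not the committed function: the early `return`s replace the `break`s, and sentence and clause punctuation are handled by one branch here:

```python
def extend_to_boundary(text, end_pos, max_extend=15):
    """Walk forward from a fuzzy-matched end position to a natural stop:
    punctuation (included), a paragraph break, or a `$` math delimiter
    (both excluded)."""
    extended = 0
    while end_pos < len(text) and extended < max_extend:
        char = text[end_pos]
        if char in '.!?;:,':
            return end_pos + 1  # include the punctuation
        if text[end_pos:end_pos + 2] == '\n\n' or char == '$':
            return end_pos      # stop before the break or math
        end_pos += 1
        extended += 1
    return end_pos

s = "matched phrase continues a bit more.\n\nNext paragraph"
cut = extend_to_boundary(s, 25)
print(cut)        # → 36
print(s[25:cut])  # → 'a bit more.'
```

The bound matters: if no stopping point appears within 15 characters, the walk gives up where it stands rather than swallowing half a paragraph.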
 # Find positions using fuzzy matching
 positioned_chunks = []
 for chunk in chunks:
+    start_pos = fuzzy_find(document_markdown, chunk.start_phrase)
+    end_phrase_start = fuzzy_find(document_markdown, chunk.end_phrase, start_pos or 0)
     # Add the length of the end_phrase plus a bit more to include punctuation
     if end_phrase_start is not None:
         end_pos = end_phrase_start + len(chunk.end_phrase)
         # Try to include punctuation that might follow
+        if end_pos < len(document_markdown) and document_markdown[end_pos] in '.!?;:,':
             end_pos += 1
     else:
         end_pos = None
frontend/src/components/ChunkNavigation.jsx
ADDED
@@ -0,0 +1,47 @@
+const ChunkNavigation = ({
+  currentChunkIndex,
+  documentData,
+  chunkStates,
+  goToPrevChunk,
+  goToNextChunk
+}) => {
+  return (
+    <div className="flex items-center justify-center gap-4 mb-4 px-4">
+      <button
+        onClick={goToPrevChunk}
+        disabled={currentChunkIndex === 0}
+        className="p-3 bg-white hover:bg-gray-50 disabled:opacity-30 disabled:cursor-not-allowed rounded-lg shadow-sm transition-all"
+      >
+        <svg className="w-5 h-5 text-gray-700" fill="none" stroke="currentColor" viewBox="0 0 24 24" strokeWidth={3}>
+          <path strokeLinecap="round" strokeLinejoin="round" d="M15 19l-7-7 7-7" />
+        </svg>
+      </button>
+
+      <div className="flex space-x-2">
+        {documentData?.chunks?.map((_, index) => (
+          <div
+            key={index}
+            className={`w-3 h-3 rounded-full ${
+              chunkStates[index] === 'understood' ? 'bg-green-500' :
+              chunkStates[index] === 'skipped' ? 'bg-red-500' :
+              chunkStates[index] === 'interactive' ? 'bg-blue-500' :
+              index === currentChunkIndex ? 'bg-gray-600' : 'bg-gray-300'
+            }`}
+          />
+        ))}
+      </div>
+
+      <button
+        onClick={goToNextChunk}
+        disabled={!documentData?.chunks || currentChunkIndex === documentData.chunks.length - 1}
+        className="p-3 bg-white hover:bg-gray-50 disabled:opacity-30 disabled:cursor-not-allowed rounded-lg shadow-sm transition-all"
+      >
+        <svg className="w-5 h-5 text-gray-700" fill="none" stroke="currentColor" viewBox="0 0 24 24" strokeWidth={3}>
+          <path strokeLinecap="round" strokeLinejoin="round" d="M9 5l7 7-7 7" />
+        </svg>
+      </button>
+    </div>
+  );
+};
+
+export default ChunkNavigation;
frontend/src/components/ChunkPanel.jsx
ADDED
@@ -0,0 +1,198 @@
+import ReactMarkdown from 'react-markdown';
+import remarkMath from 'remark-math';
+import rehypeKatex from 'rehype-katex';
+import rehypeRaw from 'rehype-raw';
+import { getChunkMarkdownComponents, getChatMarkdownComponents } from '../utils/markdownComponents.jsx';
+
+const ChunkPanel = ({
+  documentData,
+  currentChunkIndex,
+  chunkExpanded,
+  setChunkExpanded,
+  chunkStates,
+  skipChunk,
+  markChunkUnderstood,
+  startInteractiveLesson,
+  chatLoading,
+  chatMessages,
+  typingMessage,
+  userInput,
+  setUserInput,
+  fetchImage,
+  imageCache,
+  setImageCache
+}) => {
+  const chunkMarkdownComponents = getChunkMarkdownComponents(documentData, fetchImage, imageCache, setImageCache);
+  const chatMarkdownComponents = getChatMarkdownComponents();
+
+  return (
+    <>
+      {/* Chunk Header */}
+      <div className="px-6 py-4 flex-shrink-0 bg-white rounded-t-lg border-b border-gray-200 z-10">
+        <div className="flex items-center justify-between">
+          <button
+            onClick={() => setChunkExpanded(!chunkExpanded)}
+            className="flex items-center hover:bg-gray-50 py-2 px-3 rounded-lg transition-all -ml-3"
+          >
+            <div className="font-semibold text-gray-900 text-left flex-1">
+              <ReactMarkdown
+                remarkPlugins={[remarkMath]}
+                rehypePlugins={[rehypeRaw, rehypeKatex]}
+                components={{
+                  p: ({ children }) => <span>{children}</span>, // Render as inline span
+                  ...chatMarkdownComponents
+                }}
+              >
+                {documentData?.chunks?.[currentChunkIndex]?.topic || "Loading..."}
+              </ReactMarkdown>
+            </div>
+            <span className="text-gray-400 ml-3">
+              {chunkExpanded ? '▲' : '▼'}
+            </span>
+          </button>
+
+          <button
+            onClick={markChunkUnderstood}
+            className="py-2 px-4 bg-gray-50 hover:bg-gray-100 text-gray-600 rounded-lg transition-all text-sm"
+          >
+            ✓
+          </button>
+        </div>
+
+        {/* Expandable Chunk Content */}
+        {chunkExpanded && documentData?.chunks?.[currentChunkIndex] && (
+          <div className="prose prose-sm max-w-none">
+            <ReactMarkdown
+              remarkPlugins={[remarkMath]}
+              rehypePlugins={[rehypeRaw, rehypeKatex]}
+              components={chunkMarkdownComponents}
+            >
+              {documentData.markdown.slice(
+                documentData.chunks[currentChunkIndex].start_position,
+                documentData.chunks[currentChunkIndex].end_position
+              )}
+            </ReactMarkdown>
+          </div>
+        )}
+      </div>
+
+      {/* Content Area */}
+      <div className="flex-1 flex flex-col min-h-0">
+        {/* Action Buttons */}
+        {chunkStates[currentChunkIndex] !== 'interactive' && (
+          <div className="flex-shrink-0 p-6 border-b border-gray-200">
+            <div className="flex gap-3">
+              <button
+                onClick={skipChunk}
+                className="flex-1 py-3 bg-gray-50 hover:bg-gray-100 text-gray-600 rounded-lg transition-all"
+              >
+                ✕
+              </button>
+
+              <button
+                onClick={startInteractiveLesson}
+                disabled={chatLoading}
+                className="flex-1 py-3 bg-gray-50 hover:bg-gray-100 disabled:opacity-50 text-gray-600 rounded-lg transition-all"
+              >
+                {chatLoading ? '...' : 'Start'}
+              </button>
+
+              <button
+                onClick={markChunkUnderstood}
+                className="flex-1 py-3 bg-gray-50 hover:bg-gray-100 text-gray-600 rounded-lg transition-all"
+              >
+                ✓
+              </button>
+            </div>
+          </div>
+        )}
+
+        {/* Chat Area */}
+        {chunkStates[currentChunkIndex] === 'interactive' && (
+          <div className="flex-1 flex flex-col min-h-0">
+            {/* Chat Messages */}
+            <div className="bg-white flex-1 overflow-y-auto space-y-4 px-6 py-2">
+              {(chatMessages[currentChunkIndex] || []).map((message, index) => (
+                message.type === 'user' ? (
+                  <div
+                    key={index}
+                    className="w-full bg-gray-50 border border-gray-200 rounded-lg p-4 shadow-sm"
+                  >
+                    <div className="text-xs font-medium mb-2 text-gray-600">
+                      You
+                    </div>
+                    <div className="prose prose-sm max-w-none">
+                      <ReactMarkdown
+                        remarkPlugins={[remarkMath]}
+                        rehypePlugins={[rehypeRaw, rehypeKatex]}
+                        components={chatMarkdownComponents}
+                      >
+                        {message.text}
+                      </ReactMarkdown>
+                    </div>
+                  </div>
+                ) : (
+                  <div key={index} className="w-full py-4">
+                    <div className="prose prose-sm max-w-none">
+                      <ReactMarkdown
+                        remarkPlugins={[remarkMath]}
+                        rehypePlugins={[rehypeRaw, rehypeKatex]}
+                        components={chatMarkdownComponents}
+                      >
+                        {message.text}
+                      </ReactMarkdown>
+                    </div>
+                  </div>
+                )
+              ))}
+
+              {/* Typing animation message */}
+              {typingMessage && (
+                <div className="w-full py-4">
+                  <div className="prose prose-sm max-w-none">
+                    <ReactMarkdown
+                      remarkPlugins={[remarkMath]}
+                      rehypePlugins={[rehypeRaw, rehypeKatex]}
+                      components={chatMarkdownComponents}
+                    >
+                      {typingMessage}
+                    </ReactMarkdown>
+                  </div>
+                </div>
+              )}
+
+              {/* Loading dots */}
+              {chatLoading && (
+                <div className="w-full py-4">
+                  <div className="flex space-x-1">
+                    <div className="w-2 h-2 bg-gray-400 rounded-full animate-bounce"></div>
+                    <div className="w-2 h-2 bg-gray-400 rounded-full animate-bounce" style={{animationDelay: '0.1s'}}></div>
+                    <div className="w-2 h-2 bg-gray-400 rounded-full animate-bounce" style={{animationDelay: '0.2s'}}></div>
+                  </div>
+                </div>
+              )}
+            </div>
+
+            {/* Chat Input */}
+            <div className="flex-shrink-0 bg-white border-t border-gray-200 p-6">
+              <div className="flex gap-2 mb-3">
+                <input
+                  type="text"
+                  value={userInput}
+                  onChange={(e) => setUserInput(e.target.value)}
+                  placeholder="Type your response..."
+                  className="flex-1 px-3 py-2 border border-gray-200 rounded-lg text-sm focus:outline-none focus:ring-1 focus:ring-gray-300"
+                />
+                <button className="px-4 py-2 bg-gray-50 hover:bg-gray-100 text-gray-600 rounded-lg transition-all">
+                  →
+                </button>
+              </div>
+            </div>
+          </div>
+        )}
+      </div>
+    </>
+  );
+};
+
+export default ChunkPanel;
frontend/src/components/DocumentProcessor.jsx
CHANGED
|
@@ -1,449 +1,80 @@
|
|
| 1 |
-
import {
|
| 2 |
-
import ReactMarkdown from 'react-markdown';
|
| 3 |
-
import remarkMath from 'remark-math';
|
| 4 |
-
import rehypeKatex from 'rehype-katex';
|
| 5 |
-
import rehypeRaw from 'rehype-raw';
|
| 6 |
import 'katex/dist/katex.min.css';
|
| 7 |
|
| 8 |
-
//
|
| 9 |
-
|
| 10 |
-
|
| 11 |
-
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
const chunk = chunks[currentChunkIndex];
|
| 15 |
-
const chunkText = markdown.slice(chunk.start_position, chunk.end_position);
|
| 16 |
-
|
| 17 |
-
// Debug logging
|
| 18 |
-
console.log('Chunk debugging:', {
|
| 19 |
-
chunkIndex: currentChunkIndex,
|
| 20 |
-
startPos: chunk.start_position,
|
| 21 |
-
endPos: chunk.end_position,
|
| 22 |
-
chunkTextLength: chunkText.length,
|
| 23 |
-
chunkTextPreview: chunkText.substring(0, 50) + '...',
|
| 24 |
-
beforeText: markdown.slice(Math.max(0, chunk.start_position - 20), chunk.start_position),
|
| 25 |
-
afterText: markdown.slice(chunk.end_position, chunk.end_position + 20)
|
| 26 |
-
});
|
| 27 |
-
|
| 28 |
-
// Use div wrapper that extends into document margins with left border and fade-in animation
|
| 29 |
-
const highlightedChunk = `<div style="background-color: rgba(255, 214, 100, 0.15); border-left: 4px solid rgba(156, 163, 175, 0.5); padding: 0.75rem; margin: 0.5rem -1.5rem; font-size: 0.875rem; line-height: 1.5; color: rgb(55, 65, 81); animation: fadeInHighlight 200ms ease-out;">${chunkText}</div>`;
|
| 30 |
-
|
| 31 |
-
// Replace the original chunk with the highlighted version
|
| 32 |
-
return markdown.slice(0, chunk.start_position) +
|
| 33 |
-
highlightedChunk +
|
| 34 |
-
markdown.slice(chunk.end_position);
|
| 35 |
-
};
|
| 36 |
-
|
| 37 |
-
function DocumentProcessor() {
|
| 38 |
-
const fileInputRef = useRef(null);
|
| 39 |
-
const [selectedFile, setSelectedFile] = useState(null);
|
| 40 |
-
const [processing, setProcessing] = useState(false);
|
| 41 |
-
const [uploadProgress, setUploadProgress] = useState(0);
|
| 42 |
-
const [ocrProgress, setOcrProgress] = useState(0);
|
| 43 |
-
const [documentData, setDocumentData] = useState(null);
|
| 44 |
-
const [imageCache, setImageCache] = useState({});
|
| 45 |
-
const [leftPanelWidth, setLeftPanelWidth] = useState(40);
|
| 46 |
-
const [isDragging, setIsDragging] = useState(false);
|
| 47 |
-
const containerRef = useRef(null);
|
| 48 |
-
const [chatData, setChatData] = useState({});
|
| 49 |
-
const [chatLoading, setChatLoading] = useState(false);
|
| 50 |
-
const [chatMessages, setChatMessages] = useState({});
|
| 51 |
-
const [userInput, setUserInput] = useState('');
|
| 52 |
-
const [chunkStates, setChunkStates] = useState({}); // 'skipped', 'interactive', 'understood'
|
| 53 |
-
const [currentChunkIndex, setCurrentChunkIndex] = useState(0);
|
| 54 |
-
const [chunkExpanded, setChunkExpanded] = useState(true);
|
| 55 |
-
const [typingMessage, setTypingMessage] = useState('');
|
| 56 |
-
const [typingInterval, setTypingInterval] = useState(null);
|
| 57 |
-
|
| 58 |
-
const handleFileChange = (e) => {
|
| 59 |
-
setSelectedFile(e.target.files[0]);
|
| 60 |
-
setDocumentData(null);
|
| 61 |
-
setUploadProgress(0);
|
| 62 |
-
setOcrProgress(0);
|
| 63 |
-
setImageCache({});
|
| 64 |
-
};
|
| 65 |
-
|
| 66 |
-
const fetchImage = useCallback(async (imageId, fileId) => {
|
| 67 |
-
if (imageCache[imageId]) {
|
| 68 |
-
return imageCache[imageId];
|
| 69 |
-
}
|
| 70 |
-
|
| 71 |
-
try {
|
| 72 |
-
const response = await fetch(`/get_image/${fileId}/${imageId}`);
|
| 73 |
-
if (response.ok) {
|
| 74 |
-
const data = await response.json();
|
| 75 |
-
const imageData = data.image_base64;
|
| 76 |
-
|
| 77 |
-
// Cache the image
|
| 78 |
-
setImageCache(prev => ({
|
| 79 |
-
...prev,
|
| 80 |
-
[imageId]: imageData
|
| 81 |
-
}));
|
| 82 |
-
|
| 83 |
-
return imageData;
|
| 84 |
-
}
|
| 85 |
-
} catch (error) {
|
| 86 |
-
console.error('Error fetching image:', error);
|
| 87 |
-
}
|
| 88 |
-
return null;
|
| 89 |
-
}, [imageCache]);
-
-  // Handle panel resizing
-  const handleMouseDown = (e) => {
-    setIsDragging(true);
-    e.preventDefault();
-  };

-      // Constrain between 20% and 80%
-      if (newLeftWidth >= 20 && newLeftWidth <= 80) {
-        setLeftPanelWidth(newLeftWidth);
-      }
-    };
-
-    const handleMouseUp = () => {
-      setIsDragging(false);
-    };

-      document.addEventListener('mousemove', handleMouseMove);
-      document.addEventListener('mouseup', handleMouseUp);
-      return () => {
-        document.removeEventListener('mousemove', handleMouseMove);
-        document.removeEventListener('mouseup', handleMouseUp);
-      };
-    }
-  }, [isDragging]);
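The resize handler above only accepts a new left-panel width when it stays between 20% and 80%. As a pure helper (hypothetical name, same condition as the removed code):

```javascript
// Accept a proposed left-panel width only within 20..80%, otherwise keep the
// current width — mirrors the constraint check in the removed handleMouseMove.
function acceptPanelWidth(current, proposed) {
  return proposed >= 20 && proposed <= 80 ? proposed : current;
}
```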
-      if (!response.ok) {
-        const errorData = await response.text();
-        console.error('Backend error:', errorData);
-        throw new Error(`Failed to start lesson: ${response.status} - ${errorData}`);
-      }
-
-      const lessonData = await response.json();
-      setChatData(prev => ({
-        ...prev,
-        [chunkIndex]: {
-          ...lessonData,
-          chunkIndex: chunkIndex,
-          chunk: chunk
-        }
-      }));
-
-      setChatLoading(false);
-
-      // Type out the message with animation
-      typeMessage(lessonData.questions, () => {
-        setChatMessages(prev => ({
-          ...prev,
-          [chunkIndex]: [
-            { type: 'ai', text: lessonData.questions }
-          ]
-        }));
-      });
-
-    } catch (error) {
-      console.error('Error starting lesson:', error);
-      alert('Error starting lesson: ' + error.message);
-      setChatLoading(false);
-    }
-  };
-
-  // Navigation functions
-  const goToNextChunk = () => {
-    if (documentData && currentChunkIndex < documentData.chunks.length - 1) {
-      // Clear any ongoing typing animation
-      if (typingInterval) {
-        clearInterval(typingInterval);
-        setTypingInterval(null);
-      }
-      setTypingMessage('');
-      setCurrentChunkIndex(currentChunkIndex + 1);
-    }
-  };
-
-  const goToPrevChunk = () => {
-    if (currentChunkIndex > 0) {
-      // Clear any ongoing typing animation
-      if (typingInterval) {
-        clearInterval(typingInterval);
-        setTypingInterval(null);
-      }
-      setTypingMessage('');
-      setCurrentChunkIndex(currentChunkIndex - 1);
-    }
-  };
-
-  // Chunk action functions
-  const skipChunk = () => {
-    setChunkStates(prev => ({
-      ...prev,
-      [currentChunkIndex]: 'skipped'
-    }));
-  };
-
-  const markChunkUnderstood = () => {
-    setChunkStates(prev => ({
-      ...prev,
-      [currentChunkIndex]: 'understood'
-    }));
-  };
-
-  const startInteractiveLesson = () => {
-    setChunkStates(prev => ({
-      ...prev,
-      [currentChunkIndex]: 'interactive'
-    }));
-    startChunkLesson(currentChunkIndex);
-  };
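`typeMessage` itself is not shown in this diff; from the `typingMessage`/`typingInterval` state it evidently reveals the AI reply incrementally before committing it to `chatMessages`. A pure model of that reveal sequence (function name and step size are assumptions, not the actual implementation):

```javascript
// Successive prefixes of `text`, `step` characters at a time — the frames a
// setInterval-driven typewriter would render before the full message lands.
function revealSteps(text, step) {
  const frames = [];
  for (let i = step; i < text.length + step; i += step) {
    frames.push(text.slice(0, i));
  }
  return frames;
}
```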
-
-  const ImageComponent = ({ src, alt }) => {
-    const [imageSrc, setImageSrc] = useState(null);
-    const [loading, setLoading] = useState(true);
-
-    useEffect(() => {
-      if (documentData && src) {
-        fetchImage(src, documentData.fileId).then(imageData => {
-          if (imageData) {
-            setImageSrc(imageData);
-          }
-          setLoading(false);
-        });
-      }
-    }, [src, documentData?.fileId]);
-
-    if (loading) {
-      return (
-        <span style={{
-          display: 'inline-block',
-          width: '100%',
-          height: '200px',
-          backgroundColor: '#f3f4f6',
-          textAlign: 'center',
-          lineHeight: '200px',
-          margin: '1rem 0',
-          borderRadius: '0.5rem',
-          color: '#6b7280'
-        }}>
-          Loading image...
-        </span>
-      );
-    }
-
-    if (!imageSrc) {
-      return (
-        <span style={{
-          display: 'inline-block',
-          width: '100%',
-          height: '200px',
-          backgroundColor: '#fef2f2',
-          textAlign: 'center',
-          lineHeight: '200px',
-          margin: '1rem 0',
-          borderRadius: '0.5rem',
-          border: '1px solid #fecaca',
-          color: '#dc2626'
-        }}>
-          Image not found: {alt || src}
-        </span>
-      );
-    }
-
-    return (
-      <img
-        src={imageSrc}
-        alt={alt || 'Document image'}
-        style={{
-          display: 'block',
-          maxWidth: '100%',
-          height: 'auto',
-          margin: '1.5rem auto'
-        }}
-      />
-    );
  };
-
-    setProcessing(true);
-    setUploadProgress(0);
-    setOcrProgress(0);
-
-    try {
-      // Step 1: Upload PDF
-      const formData = new FormData();
-      formData.append('file', selectedFile);
-
-      setUploadProgress(30);
-      const uploadResponse = await fetch('/upload_pdf', {
-        method: 'POST',
-        body: formData,
-      });
-
-      if (!uploadResponse.ok) {
-        throw new Error('Failed to upload PDF');
-      }
-
-      const uploadData = await uploadResponse.json();
-      setUploadProgress(100);
-
-      // Step 2: Process OCR
-      setOcrProgress(20);
-      await new Promise(resolve => setTimeout(resolve, 500)); // Small delay for UX
-
-      setOcrProgress(60);
-      const ocrResponse = await fetch(`/process_ocr/${uploadData.file_id}`);
-
-      if (!ocrResponse.ok) {
-        throw new Error('Failed to process OCR');
-      }
-
-      const ocrData = await ocrResponse.json();
-      setOcrProgress(100);
-
-      // Combine all markdown from pages
-      const combinedMarkdown = ocrData.pages
-        .map(page => page.markdown)
-        .join('\n\n---\n\n');
-
-      // Collect all chunks from all pages
-      const allChunks = [];
-      let markdownOffset = 0;
-
-      ocrData.pages.forEach((page, pageIndex) => {
-        if (page.chunks && page.chunks.length > 0) {
-          page.chunks.forEach(chunk => {
-            allChunks.push({
-              ...chunk,
-              start_position: chunk.start_position + markdownOffset,
-              end_position: chunk.end_position + markdownOffset,
-              pageIndex: pageIndex
-            });
-          });
-        }
-        markdownOffset += page.markdown.length + 6; // +6 for the separator "\n\n---\n\n"
-      });
-
-      setDocumentData({
-        fileId: uploadData.file_id,
-        filename: uploadData.filename,
-        markdown: combinedMarkdown,
-        pages: ocrData.pages,
-        totalPages: ocrData.total_pages,
-        chunks: allChunks
-      });
-
-    } catch (error) {
-      console.error('Error processing document:', error);
-      alert('Error processing document: ' + error.message);
-    } finally {
-      setProcessing(false);
    }
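One concrete candidate for the chunk-offset drift this commit describes: the removed code advances `markdownOffset` by `page.markdown.length + 6`, but the separator `'\n\n---\n\n'` it joins pages with is 7 characters long, so every chunk after the first page shifts by one character per preceding page boundary. A quick check of the arithmetic (minimal made-up page data, not the app's real pages):

```javascript
const SEPARATOR = '\n\n---\n\n'; // two newlines + '---' + two newlines = 7 chars

const pages = ['page one', 'page two'];
const combined = pages.join(SEPARATOR);

// The correct offset of page two uses the real separator length (7), not 6.
const offset = pages[0].length + SEPARATOR.length;
```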
-
-
-  const LoadingAnimation = () => (
-    <div className="flex flex-col items-center justify-center min-h-screen bg-gray-50">
-      <div className="text-center max-w-md">
-        <div className="mb-8">
-          <div className="w-16 h-16 border-4 border-blue-500 border-t-transparent rounded-full animate-spin mx-auto mb-4"></div>
-          <h2 className="text-2xl font-bold text-gray-900 mb-2">Processing Your Document</h2>
-          <p className="text-gray-600">This may take a moment...</p>
-        </div>
-
-        {/* Upload Progress */}
-        <div className="mb-6">
-          <div className="flex justify-between text-sm text-gray-600 mb-1">
-            <span>Uploading PDF</span>
-            <span>{uploadProgress}%</span>
-          </div>
-          <div className="w-full bg-gray-200 rounded-full h-2">
-            <div
-              className="bg-blue-500 h-2 rounded-full transition-all duration-300"
-              style={{ width: `${uploadProgress}%` }}
-            ></div>
-          </div>
-        </div>
-
-        {/* OCR Progress */}
-        <div className="mb-6">
-          <div className="flex justify-between text-sm text-gray-600 mb-1">
-            <span>Processing with AI</span>
-            <span>{ocrProgress}%</span>
-          </div>
-          <div className="w-full bg-gray-200 rounded-full h-2">
-            <div
-              className="bg-green-500 h-2 rounded-full transition-all duration-300"
-              style={{ width: `${ocrProgress}%` }}
-            ></div>
-          </div>
-        </div>
-
-        <p className="text-sm text-gray-500">
-          Using AI to extract text and understand your document structure...
-        </p>
-      </div>
-    </div>
-  );
-
  if (!selectedFile) {
    return (
      <div className="h-screen bg-gray-50 flex items-center justify-center">

@@ -465,7 +96,7 @@ function DocumentProcessor() {
  }

  if (processing) {
-    return <LoadingAnimation />;
  }

  if (!documentData) {

@@ -489,6 +120,7 @@ function DocumentProcessor() {
    );
  }

  return (
    <div
      ref={containerRef}
@@ -496,75 +128,14 @@ function DocumentProcessor() {
      style={{ cursor: isDragging ? 'col-resize' : 'default' }}
    >
      {/* Left Panel - Document */}
-      <div
-        {/* Content */}
-        <div className="flex-1 px-6 pt-6 pb-8 overflow-y-auto">
-          <style>
-            {`
-              @keyframes fadeInHighlight {
-                0% {
-                  background-color: rgba(255, 214, 100, 0);
-                  border-left-color: rgba(156, 163, 175, 0);
-                  transform: translateX(-10px);
-                  opacity: 0;
-                }
-                100% {
-                  background-color: rgba(255, 214, 100, 0.15);
-                  border-left-color: rgba(156, 163, 175, 0.5);
-                  transform: translateX(0);
-                  opacity: 1;
-                }
-              }
-            `}
-          </style>
-          <div className="prose prose-sm max-w-none" style={{
-            fontSize: '0.875rem',
-            lineHeight: '1.5',
-            color: 'rgb(55, 65, 81)'
-          }}>
-            <ReactMarkdown
-              remarkPlugins={[remarkMath]}
-              rehypePlugins={[rehypeRaw, rehypeKatex]}
-              components={{
-                h1: ({ children }) => <h1 style={{ fontSize: '1.5rem', fontWeight: 'bold', marginBottom: '1rem', color: '#1a202c' }}>{children}</h1>,
-                h2: ({ children }) => <h2 style={{ fontSize: '1.25rem', fontWeight: 'bold', marginBottom: '0.75rem', marginTop: '1.5rem', color: '#1a202c' }}>{children}</h2>,
-                h3: ({ children }) => <h3 style={{ fontSize: '1.125rem', fontWeight: 'bold', marginBottom: '0.5rem', marginTop: '1rem', color: '#1a202c' }}>{children}</h3>,
-                p: ({ children }) => <p style={{ marginBottom: '0.75rem', color: '#374151', lineHeight: '1.5', fontSize: '0.875rem' }}>{children}</p>,
-                hr: () => <hr style={{ margin: '1.5rem 0', borderColor: '#d1d5db' }} />,
-                ul: ({ children }) => <ul style={{ marginBottom: '0.75rem', marginLeft: '1.25rem', listStyleType: 'disc', fontSize: '0.875rem' }}>{children}</ul>,
-                ol: ({ children }) => <ol style={{ marginBottom: '0.75rem', marginLeft: '1.25rem', listStyleType: 'decimal', fontSize: '0.875rem' }}>{children}</ol>,
-                li: ({ children }) => <li style={{ marginBottom: '0.125rem', color: '#374151' }}>{children}</li>,
-                blockquote: ({ children }) => (
-                  <blockquote style={{ borderLeft: '3px solid #3b82f6', paddingLeft: '0.75rem', fontStyle: 'italic', margin: '0.75rem 0', color: '#6b7280', fontSize: '0.875rem' }}>
-                    {children}
-                  </blockquote>
-                ),
-                code: ({ inline, children }) =>
-                  inline ?
-                    <code style={{ backgroundColor: '#f3f4f6', padding: '0.125rem 0.25rem', borderRadius: '0.25rem', fontSize: '0.75rem', fontFamily: 'monospace' }}>{children}</code> :
-                    <pre style={{ backgroundColor: '#f3f4f6', padding: '0.75rem', borderRadius: '0.375rem', overflowX: 'auto', margin: '0.75rem 0' }}>
-                      <code style={{ fontSize: '0.75rem', fontFamily: 'monospace' }}>{children}</code>
-                    </pre>,
-                div: ({ children, style }) => (
-                  <div style={style}>
-                    {children}
-                  </div>
-                ),
-                img: ({ src, alt }) => <ImageComponent src={src} alt={alt} />
-              }}
-            >
-              {highlightChunkInMarkdown(documentData.markdown, documentData.chunks, currentChunkIndex)}
-            </ReactMarkdown>
-          </div>
-        </div>
      </div>

      {/* Resizable Divider */}

@@ -573,14 +144,12 @@ function DocumentProcessor() {
        style={{ width: '8px' }}
        onMouseDown={handleMouseDown}
      >
-        {/* Resizable Divider */}
        <div
-          className="w-px h-full rounded-full transition-all
-          }}
        ></div>
      </div>
@@ -589,280 +158,38 @@ function DocumentProcessor() {
        className="flex flex-col"
        style={{ width: `${100 - leftPanelWidth}%` }}
      >
-        {/* Navigation Bar
-        <
-              <path strokeLinecap="round" strokeLinejoin="round" d="M15 19l-7-7 7-7" />
-            </svg>
-          </button>
-
-          <div className="flex space-x-2">
-            {documentData?.chunks?.map((_, index) => (
-              <div
-                key={index}
-                className={`w-3 h-3 rounded-full ${
-                  chunkStates[index] === 'understood' ? 'bg-green-500' :
-                  chunkStates[index] === 'skipped' ? 'bg-red-500' :
-                  chunkStates[index] === 'interactive' ? 'bg-blue-500' :
-                  index === currentChunkIndex ? 'bg-gray-600' : 'bg-gray-300'
-                }`}
-              />
-            ))}
-          </div>
-
-          <button
-            onClick={goToNextChunk}
-            disabled={!documentData?.chunks || currentChunkIndex === documentData.chunks.length - 1}
-            className="p-3 bg-white hover:bg-gray-50 disabled:opacity-30 disabled:cursor-not-allowed rounded-lg shadow-sm transition-all"
-          >
-            <svg className="w-5 h-5 text-gray-700" fill="none" stroke="currentColor" viewBox="0 0 24 24" strokeWidth={3}>
-              <path strokeLinecap="round" strokeLinejoin="round" d="M9 5l7 7-7 7" />
-            </svg>
-          </button>
-        </div>

        {/* Chunk Panel */}
-
-        <
-            ✓
-          </button>
-        </div>
-
-        {/* Expandable Chunk Content - in header area */}
-        {chunkExpanded && documentData?.chunks?.[currentChunkIndex] && (
-          <div className="prose prose-sm max-w-none">
-            <ReactMarkdown
-              remarkPlugins={[remarkMath]}
-              rehypePlugins={[rehypeRaw, rehypeKatex]}
-              components={{
-                h1: ({ children }) => <h1 style={{ fontSize: '1.25rem', fontWeight: 'bold', marginBottom: '0.75rem', color: '#1a202c' }}>{children}</h1>,
-                h2: ({ children }) => <h2 style={{ fontSize: '1.125rem', fontWeight: 'bold', marginBottom: '0.5rem', marginTop: '1rem', color: '#1a202c' }}>{children}</h2>,
-                h3: ({ children }) => <h3 style={{ fontSize: '1rem', fontWeight: 'bold', marginBottom: '0.5rem', marginTop: '0.75rem', color: '#1a202c' }}>{children}</h3>,
-                p: ({ children }) => <p style={{ marginBottom: '0.5rem', color: '#374151', lineHeight: '1.4', fontSize: '0.875rem' }}>{children}</p>,
-                hr: () => <hr style={{ margin: '1rem 0', borderColor: '#d1d5db' }} />,
-                ul: ({ children }) => <ul style={{ marginBottom: '0.5rem', marginLeft: '1rem', listStyleType: 'disc', fontSize: '0.875rem' }}>{children}</ul>,
-                ol: ({ children }) => <ol style={{ marginBottom: '0.5rem', marginLeft: '1rem', listStyleType: 'decimal', fontSize: '0.875rem' }}>{children}</ol>,
-                li: ({ children }) => <li style={{ marginBottom: '0.125rem', color: '#374151' }}>{children}</li>,
-                blockquote: ({ children }) => (
-                  <blockquote style={{ borderLeft: '2px solid #9ca3af', paddingLeft: '0.5rem', fontStyle: 'italic', margin: '0.5rem 0', color: '#6b7280', fontSize: '0.875rem' }}>
-                    {children}
-                  </blockquote>
-                ),
-                code: ({ inline, children }) =>
-                  inline ?
-                    <code style={{ backgroundColor: '#f3f4f6', padding: '0.125rem 0.25rem', borderRadius: '0.25rem', fontSize: '0.75rem', fontFamily: 'monospace' }}>{children}</code> :
-                    <pre style={{ backgroundColor: '#f3f4f6', padding: '0.5rem', borderRadius: '0.25rem', overflowX: 'auto', margin: '0.5rem 0' }}>
-                      <code style={{ fontSize: '0.75rem', fontFamily: 'monospace' }}>{children}</code>
-                    </pre>,
-                img: ({ src, alt }) => <ImageComponent src={src} alt={alt} />
-              }}
-            >
-              {documentData.markdown.slice(
-                documentData.chunks[currentChunkIndex].start_position,
-                documentData.chunks[currentChunkIndex].end_position
-              )}
-            </ReactMarkdown>
-          </div>
-        )}
-
-      </div>
-
-      {/* Content Area */}
-      <div className="flex-1 flex flex-col min-h-0">
-        {/* Action Buttons */}
-        {chunkStates[currentChunkIndex] !== 'interactive' && (
-          <div className="flex-shrink-0 p-6 border-b border-gray-200">
-            <div className="flex gap-3">
-              <button
-                onClick={skipChunk}
-                className="flex-1 py-3 bg-gray-50 hover:bg-gray-100 text-gray-600 rounded-lg transition-all"
-              >
-                ✕
-              </button>
-
-              <button
-                onClick={startInteractiveLesson}
-                disabled={chatLoading}
-                className="flex-1 py-3 bg-gray-50 hover:bg-gray-100 disabled:opacity-50 text-gray-600 rounded-lg transition-all"
-              >
-                {chatLoading ? '...' : 'Start'}
-              </button>
-
-              <button
-                onClick={markChunkUnderstood}
-                className="flex-1 py-3 bg-gray-50 hover:bg-gray-100 text-gray-600 rounded-lg transition-all"
-              >
-                ✓
-              </button>
-            </div>
-          </div>
-        )}
-
-        {/* Chat Area - sandwich layout when interactive */}
-        {chunkStates[currentChunkIndex] === 'interactive' && (
-          <div className="flex-1 flex flex-col min-h-0">
-            {/* Chat Messages - scrollable middle layer */}
-            <div className="bg-white flex-1 overflow-y-auto space-y-4 px-6 py-2">
-              {(chatMessages[currentChunkIndex] || []).map((message, index) => (
-                message.type === 'user' ? (
-                  <div
-                    key={index}
-                    className="w-full bg-gray-50 border border-gray-200 rounded-lg p-4 shadow-sm"
-                  >
-                    <div className="text-xs font-medium mb-2 text-gray-600">
-                      You
-                    </div>
-                    <div className="prose prose-sm max-w-none">
-                      <ReactMarkdown
-                        remarkPlugins={[remarkMath]}
-                        rehypePlugins={[rehypeRaw, rehypeKatex]}
-                        components={{
-                          p: ({ children }) => <p className="mb-2 text-gray-800 leading-relaxed">{children}</p>,
-                          ul: ({ children }) => <ul className="mb-2 ml-4 list-disc">{children}</ul>,
-                          ol: ({ children }) => <ol className="mb-2 ml-4 list-decimal">{children}</ol>,
-                          li: ({ children }) => <li className="mb-1 text-gray-800">{children}</li>,
-                          strong: ({ children }) => <strong className="font-semibold text-gray-900">{children}</strong>,
-                          em: ({ children }) => <em className="italic">{children}</em>,
-                          code: ({ inline, children }) =>
-                            inline ?
-                              <code className="bg-gray-100 px-1 py-0.5 rounded text-sm font-mono">{children}</code> :
-                              <pre className="bg-gray-100 p-2 rounded overflow-x-auto my-2">
-                                <code className="text-sm font-mono">{children}</code>
-                              </pre>,
-                          blockquote: ({ children }) => (
-                            <blockquote className="border-l-4 border-blue-200 pl-4 italic text-gray-700 my-2">
-                              {children}
-                            </blockquote>
-                          )
-                        }}
-                      >
-                        {message.text}
-                      </ReactMarkdown>
-                    </div>
-                  </div>
-                ) : (
-                  <div key={index} className="w-full py-4">
-                    <div className="prose prose-sm max-w-none">
-                      <ReactMarkdown
-                        remarkPlugins={[remarkMath]}
-                        rehypePlugins={[rehypeRaw, rehypeKatex]}
-                        components={{
-                          p: ({ children }) => <p className="mb-2 text-gray-800 leading-relaxed">{children}</p>,
-                          ul: ({ children }) => <ul className="mb-2 ml-4 list-disc">{children}</ul>,
-                          ol: ({ children }) => <ol className="mb-2 ml-4 list-decimal">{children}</ol>,
-                          li: ({ children }) => <li className="mb-1 text-gray-800">{children}</li>,
-                          strong: ({ children }) => <strong className="font-semibold text-gray-900">{children}</strong>,
-                          em: ({ children }) => <em className="italic">{children}</em>,
-                          code: ({ inline, children }) =>
-                            inline ?
-                              <code className="bg-gray-100 px-1 py-0.5 rounded text-sm font-mono">{children}</code> :
-                              <pre className="bg-gray-100 p-2 rounded overflow-x-auto my-2">
-                                <code className="text-sm font-mono">{children}</code>
-                              </pre>,
-                          blockquote: ({ children }) => (
-                            <blockquote className="border-l-4 border-blue-200 pl-4 italic text-gray-700 my-2">
-                              {children}
-                            </blockquote>
-                          )
-                        }}
-                      >
-                        {message.text}
-                      </ReactMarkdown>
-                    </div>
-                  </div>
-                )
-              ))}
-
-              {/* Typing animation message */}
-              {typingMessage && (
-                <div className="w-full py-4">
-                  <div className="prose prose-sm max-w-none">
-                    <ReactMarkdown
-                      remarkPlugins={[remarkMath]}
-                      rehypePlugins={[rehypeRaw, rehypeKatex]}
-                      components={{
-                        p: ({ children }) => <p className="mb-2 text-gray-800 leading-relaxed">{children}</p>,
-                        ul: ({ children }) => <ul className="mb-2 ml-4 list-disc">{children}</ul>,
-                        ol: ({ children }) => <ol className="mb-2 ml-4 list-decimal">{children}</ol>,
-                        li: ({ children }) => <li className="mb-1 text-gray-800">{children}</li>,
-                        strong: ({ children }) => <strong className="font-semibold text-gray-900">{children}</strong>,
-                        em: ({ children }) => <em className="italic">{children}</em>,
-                        code: ({ inline, children }) =>
-                          inline ?
-                            <code className="bg-gray-100 px-1 py-0.5 rounded text-sm font-mono">{children}</code> :
-                            <pre className="bg-gray-100 p-2 rounded overflow-x-auto my-2">
-                              <code className="text-sm font-mono">{children}</code>
-                            </pre>,
-                        blockquote: ({ children }) => (
-                          <blockquote className="border-l-4 border-blue-200 pl-4 italic text-gray-700 my-2">
-                            {children}
-                          </blockquote>
-                        )
-                      }}
-                    >
-                      {typingMessage}
-                    </ReactMarkdown>
-                  </div>
-                </div>
-              )}
-
-              {/* Loading dots */}
-              {chatLoading && (
-                <div className="w-full py-4">
-                  <div className="flex space-x-1">
-                    <div className="w-2 h-2 bg-gray-400 rounded-full animate-bounce"></div>
-                    <div className="w-2 h-2 bg-gray-400 rounded-full animate-bounce" style={{animationDelay: '0.1s'}}></div>
-                    <div className="w-2 h-2 bg-gray-400 rounded-full animate-bounce" style={{animationDelay: '0.2s'}}></div>
-                  </div>
-                </div>
-              )}
-            </div>
-
-            {/* Chat Input - sticky at bottom */}
-            <div className="flex-shrink-0 bg-white border-t border-gray-200 p-6">
-              <div className="flex gap-2 mb-3">
-                <input
-                  type="text"
-                  value={userInput}
-                  onChange={(e) => setUserInput(e.target.value)}
-                  placeholder="Type your response..."
-                  className="flex-1 px-3 py-2 border border-gray-200 rounded-lg text-sm focus:outline-none focus:ring-1 focus:ring-gray-300"
-                />
-                <button className="px-4 py-2 bg-gray-50 hover:bg-gray-100 text-gray-600 rounded-lg transition-all">
-                  →
-                </button>
-              </div>
-
-            </div>
-          </div>
-        )}
-      </div>
    </div>
  </div>

  );
}

+import { useMemo } from 'react';
import 'katex/dist/katex.min.css';

+// Import custom hooks
+import { useDocumentProcessor } from '../hooks/useDocumentProcessor';
+import { useChat } from '../hooks/useChat';
+import { useChunkNavigation } from '../hooks/useChunkNavigation';
+import { usePanelResize } from '../hooks/usePanelResize';

+// Import components
+import LoadingAnimation from './LoadingAnimation';
+import DocumentViewer from './DocumentViewer';
+import ChunkNavigation from './ChunkNavigation';
+import ChunkPanel from './ChunkPanel';

+// Import utilities
+import { highlightChunkInMarkdown } from '../utils/markdownUtils';

+function DocumentProcessor() {
+  // Custom hooks
+  const {
+    fileInputRef,
+    selectedFile,
+    processing,
+    uploadProgress,
+    ocrProgress,
+    documentData,
+    imageCache,
+    handleFileChange,
+    fetchImage,
+    processDocument,
+    setSelectedFile
+  } = useDocumentProcessor();
+
+  const {
+    chatLoading,
+    chatMessages,
+    userInput,
+    typingMessage,
+    startChunkLesson,
+    clearTypingAnimation,
+    setUserInput
+  } = useChat();
+
+  const {
+    chunkStates,
+    currentChunkIndex,
+    chunkExpanded,
+    goToNextChunk,
+    goToPrevChunk,
+    skipChunk,
+    markChunkUnderstood,
+    startInteractiveLesson,
+    setChunkExpanded
+  } = useChunkNavigation(documentData, clearTypingAnimation);
+
+  const {
+    leftPanelWidth,
+    isDragging,
+    containerRef,
+    handleMouseDown
+  } = usePanelResize(40);
+
+  // Enhanced startInteractiveLesson that uses the chat hook
+  const handleStartInteractiveLesson = () => {
+    startInteractiveLesson(() => startChunkLesson(currentChunkIndex, documentData));
+  };

+  // Memoize the highlighted markdown to prevent unnecessary re-renders
+  const highlightedMarkdown = useMemo(() => {
+    if (!documentData || !documentData.markdown || !documentData.chunks) {
+      return '';
+    }
+    return highlightChunkInMarkdown(documentData.markdown, documentData.chunks, currentChunkIndex);
+  }, [documentData?.markdown, documentData?.chunks, currentChunkIndex]);
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 76 |
| 77 | +   // Early returns for different states
| 78 |     if (!selectedFile) {
| 79 |       return (
| 80 |         <div className="h-screen bg-gray-50 flex items-center justify-center">
| 96 |     }
| 97 |
| 98 |     if (processing) {
| 99 | +     return <LoadingAnimation uploadProgress={uploadProgress} ocrProgress={ocrProgress} />;
| 100 |     }
| 101 |
| 102 |     if (!documentData) {
| 120 |       );
| 121 |     }
| 122 |
| 123 | +   // Main render
| 124 |     return (
| 125 |       <div
| 126 |         ref={containerRef}
| 128 |         style={{ cursor: isDragging ? 'col-resize' : 'default' }}
| 129 |       >
| 130 |         {/* Left Panel - Document */}
| 131 | +       <div style={{ width: `${leftPanelWidth}%`, height: '100%' }}>
| 132 | +         <DocumentViewer
| 133 | +           highlightedMarkdown={highlightedMarkdown}
| 134 | +           documentData={documentData}
| 135 | +           fetchImage={fetchImage}
| 136 | +           imageCache={imageCache}
| 137 | +           setImageCache={() => {}} // Handled by useDocumentProcessor
| 138 | +         />
| 139 |       </div>
| 140 |
| 141 |       {/* Resizable Divider */}
| 144 |         style={{ width: '8px' }}
| 145 |         onMouseDown={handleMouseDown}
| 146 |       >
| 147 |         <div
| 148 | +         className="w-px h-full rounded-full transition-all duration-200 group-hover:shadow-lg"
| 149 | +         style={{
| 150 | +           backgroundColor: isDragging ? 'rgba(59, 130, 246, 0.8)' : 'transparent',
| 151 | +           boxShadow: isDragging ? '0 0 8px rgba(59, 130, 246, 0.8)' : 'none'
| 152 | +         }}
| 153 |         ></div>
| 154 |       </div>
| 155 |
| 158 |         className="flex flex-col"
| 159 |         style={{ width: `${100 - leftPanelWidth}%` }}
| 160 |       >
| 161 | +       {/* Navigation Bar */}
| 162 | +       <ChunkNavigation
| 163 | +         currentChunkIndex={currentChunkIndex}
| 164 | +         documentData={documentData}
| 165 | +         chunkStates={chunkStates}
| 166 | +         goToPrevChunk={goToPrevChunk}
| 167 | +         goToNextChunk={goToNextChunk}
| 168 | +       />
| 169 |
| 170 |       {/* Chunk Panel */}
| 171 | +       <div className="flex-1 flex flex-col min-h-0 bg-white rounded-lg shadow-sm">
| 172 | +         <ChunkPanel
| 173 | +           documentData={documentData}
| 174 | +           currentChunkIndex={currentChunkIndex}
| 175 | +           chunkExpanded={chunkExpanded}
| 176 | +           setChunkExpanded={setChunkExpanded}
| 177 | +           chunkStates={chunkStates}
| 178 | +           skipChunk={skipChunk}
| 179 | +           markChunkUnderstood={markChunkUnderstood}
| 180 | +           startInteractiveLesson={handleStartInteractiveLesson}
| 181 | +           chatLoading={chatLoading}
| 182 | +           chatMessages={chatMessages}
| 183 | +           typingMessage={typingMessage}
| 184 | +           userInput={userInput}
| 185 | +           setUserInput={setUserInput}
| 186 | +           fetchImage={fetchImage}
| 187 | +           imageCache={imageCache}
| 188 | +           setImageCache={() => {}} // Handled by useDocumentProcessor
| 189 | +         />
| 190 |         </div>
| 191 |       </div>
| 192 | +   </div>
| 193 |     );
| 194 |   }
| 195 |
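One detail worth noting about the chunk positions this component consumes: when processDocument (later in this commit, in the backup file) concatenates per-page markdown with the "\n\n---\n\n" separator, the running offset has to advance by the full separator length. Below is a minimal sketch of that remapping — not the committed code, which advanced the offset by 6 while the separator is 7 characters long, so chunk spans drifted one character per page boundary.

```javascript
// Sketch: remap per-page chunk positions into the combined markdown string.
// SEPARATOR.length is 7 ("\n\n---\n\n"), not the 6 used in the commit.
const SEPARATOR = '\n\n---\n\n';

function combinePages(pages) {
  const markdown = pages.map(p => p.markdown).join(SEPARATOR);
  const chunks = [];
  let offset = 0;
  pages.forEach((page, pageIndex) => {
    (page.chunks || []).forEach(chunk => {
      chunks.push({
        ...chunk,
        // Shift page-local positions by the length of everything before this page
        start_position: chunk.start_position + offset,
        end_position: chunk.end_position + offset,
        pageIndex,
      });
    });
    offset += page.markdown.length + SEPARATOR.length;
  });
  return { markdown, chunks };
}
```

With the correct offset, slicing the combined markdown at a remapped chunk's span returns exactly the text the page-local positions pointed at.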
frontend/src/components/DocumentProcessor.jsx.backup ADDED
@@ -0,0 +1,889 @@
| 1 | + import { useMemo } from 'react';
| 2 | + import 'katex/dist/katex.min.css';
| 3 | +
| 4 | + // Import custom hooks
| 5 | + import { useDocumentProcessor } from '../hooks/useDocumentProcessor';
| 6 | + import { useChat } from '../hooks/useChat';
| 7 | + import { useChunkNavigation } from '../hooks/useChunkNavigation';
| 8 | + import { usePanelResize } from '../hooks/usePanelResize';
| 9 | +
| 10 | + // Import components
| 11 | + import LoadingAnimation from './LoadingAnimation';
| 12 | + import DocumentViewer from './DocumentViewer';
| 13 | + import ChunkNavigation from './ChunkNavigation';
| 14 | + import ChunkPanel from './ChunkPanel';
| 15 | +
| 16 | + // Import utilities
| 17 | + import { highlightChunkInMarkdown } from '../utils/markdownUtils';
| 18 | +
| 19 | +
| 20 | + function DocumentProcessor() {
| 21 | +   // Custom hooks
| 22 | +   const {
| 23 | +     fileInputRef,
| 24 | +     selectedFile,
| 25 | +     processing,
| 26 | +     uploadProgress,
| 27 | +     ocrProgress,
| 28 | +     documentData,
| 29 | +     imageCache,
| 30 | +     handleFileChange,
| 31 | +     fetchImage,
| 32 | +     processDocument,
| 33 | +     setSelectedFile
| 34 | +   } = useDocumentProcessor();
| 35 | +
| 36 | +   const {
| 37 | +     chatLoading,
| 38 | +     chatMessages,
| 39 | +     userInput,
| 40 | +     typingMessage,
| 41 | +     startChunkLesson,
| 42 | +     clearTypingAnimation,
| 43 | +     setUserInput
| 44 | +   } = useChat();
| 45 | +
| 46 | +   const {
| 47 | +     chunkStates,
| 48 | +     currentChunkIndex,
| 49 | +     chunkExpanded,
| 50 | +     goToNextChunk,
| 51 | +     goToPrevChunk,
| 52 | +     skipChunk,
| 53 | +     markChunkUnderstood,
| 54 | +     startInteractiveLesson,
| 55 | +     setChunkExpanded
| 56 | +   } = useChunkNavigation(documentData, clearTypingAnimation);
| 57 | +
| 58 | +   const {
| 59 | +     leftPanelWidth,
| 60 | +     isDragging,
| 61 | +     containerRef,
| 62 | +     handleMouseDown
| 63 | +   } = usePanelResize(40);
| 64 | +
| 65 | +   // Enhanced startInteractiveLesson that uses the chat hook
| 66 | +   const handleStartInteractiveLesson = () => {
| 67 | +     startInteractiveLesson(() => startChunkLesson(currentChunkIndex, documentData));
| 68 | +   };
| 69 | +
| 70 | +   // Memoize the highlighted markdown to prevent unnecessary re-renders
| 71 | +   const highlightedMarkdown = useMemo(() => {
| 72 | +     if (!documentData || !documentData.markdown || !documentData.chunks) {
| 73 | +       return '';
| 74 | +     }
| 75 | +     return highlightChunkInMarkdown(documentData.markdown, documentData.chunks, currentChunkIndex);
| 76 | +   }, [documentData?.markdown, documentData?.chunks, currentChunkIndex]);
| 77 | +
| 78 | +
| 79 | +   // Handle panel resizing
| 80 | +   const handleMouseDown = (e) => {
| 81 | +     setIsDragging(true);
| 82 | +     e.preventDefault();
| 83 | +   };
| 84 | +
| 85 | +   const handleMouseMove = (e) => {
| 86 | +     if (!isDragging || !containerRef.current) return;
| 87 | +
| 88 | +     const containerRect = containerRef.current.getBoundingClientRect();
| 89 | +     const newLeftWidth = ((e.clientX - containerRect.left) / containerRect.width) * 100;
| 90 | +
| 91 | +     // Constrain between 20% and 80%
| 92 | +     if (newLeftWidth >= 20 && newLeftWidth <= 80) {
| 93 | +       setLeftPanelWidth(newLeftWidth);
| 94 | +     }
| 95 | +   };
| 96 | +
| 97 | +   const handleMouseUp = () => {
| 98 | +     setIsDragging(false);
| 99 | +   };
| 100 | +
| 101 | +   useEffect(() => {
| 102 | +     if (isDragging) {
| 103 | +       document.addEventListener('mousemove', handleMouseMove);
| 104 | +       document.addEventListener('mouseup', handleMouseUp);
| 105 | +       return () => {
| 106 | +         document.removeEventListener('mousemove', handleMouseMove);
| 107 | +         document.removeEventListener('mouseup', handleMouseUp);
| 108 | +       };
| 109 | +     }
| 110 | +   }, [isDragging]);
| 111 | +
| 112 | +   // Function to simulate typing animation
| 113 | +   const typeMessage = (text, callback) => {
| 114 | +     // Clear any existing typing animation
| 115 | +     if (typingInterval) {
| 116 | +       clearInterval(typingInterval);
| 117 | +     }
| 118 | +
| 119 | +     setTypingMessage('');
| 120 | +     let currentIndex = 0;
| 121 | +     const typeSpeed = Math.max(1, Math.min(3, 200 / text.length)); // Much faster: max 800ms total
| 122 | +
| 123 | +     const interval = setInterval(() => {
| 124 | +       if (currentIndex < text.length) {
| 125 | +         setTypingMessage(text.slice(0, currentIndex + 1));
| 126 | +         currentIndex++;
| 127 | +       } else {
| 128 | +         clearInterval(interval);
| 129 | +         setTypingInterval(null);
| 130 | +         setTypingMessage('');
| 131 | +         callback();
| 132 | +       }
| 133 | +     }, typeSpeed);
| 134 | +
| 135 | +     setTypingInterval(interval);
| 136 | +   };
| 137 | +
| 138 | +   // Function to start a chunk lesson
| 139 | +   const startChunkLesson = async (chunkIndex) => {
| 140 | +     if (!documentData || !documentData.chunks[chunkIndex]) return;
| 141 | +
| 142 | +     setChatLoading(true);
| 143 | +
| 144 | +     try {
| 145 | +       const chunk = documentData.chunks[chunkIndex];
| 146 | +       console.log('Starting lesson for chunk:', chunkIndex, chunk);
| 147 | +       console.log('Document data:', documentData.fileId, documentData.markdown?.length);
| 148 | +
| 149 | +       const response = await fetch(`/start_chunk_lesson/${documentData.fileId}/${chunkIndex}`, {
| 150 | +         method: 'POST',
| 151 | +         headers: {
| 152 | +           'Content-Type': 'application/json',
| 153 | +         },
| 154 | +         body: JSON.stringify({
| 155 | +           chunk: chunk,
| 156 | +           document_markdown: documentData.markdown
| 157 | +         })
| 158 | +       });
| 159 | +
| 160 | +       if (!response.ok) {
| 161 | +         const errorData = await response.text();
| 162 | +         console.error('Backend error:', errorData);
| 163 | +         throw new Error(`Failed to start lesson: ${response.status} - ${errorData}`);
| 164 | +       }
| 165 | +
| 166 | +       const lessonData = await response.json();
| 167 | +       setChatData(prev => ({
| 168 | +         ...prev,
| 169 | +         [chunkIndex]: {
| 170 | +           ...lessonData,
| 171 | +           chunkIndex: chunkIndex,
| 172 | +           chunk: chunk
| 173 | +         }
| 174 | +       }));
| 175 | +
| 176 | +       setChatLoading(false);
| 177 | +
| 178 | +       // Type out the message with animation
| 179 | +       typeMessage(lessonData.questions, () => {
| 180 | +         setChatMessages(prev => ({
| 181 | +           ...prev,
| 182 | +           [chunkIndex]: [
| 183 | +             { type: 'ai', text: lessonData.questions }
| 184 | +           ]
| 185 | +         }));
| 186 | +       });
| 187 | +
| 188 | +     } catch (error) {
| 189 | +       console.error('Error starting lesson:', error);
| 190 | +       alert('Error starting lesson: ' + error.message);
| 191 | +       setChatLoading(false);
| 192 | +     }
| 193 | +   };
| 194 | +
| 195 | +   // Navigation functions
| 196 | +   const goToNextChunk = () => {
| 197 | +     if (documentData && currentChunkIndex < documentData.chunks.length - 1) {
| 198 | +       // Clear any ongoing typing animation
| 199 | +       if (typingInterval) {
| 200 | +         clearInterval(typingInterval);
| 201 | +         setTypingInterval(null);
| 202 | +       }
| 203 | +       setTypingMessage('');
| 204 | +       setCurrentChunkIndex(currentChunkIndex + 1);
| 205 | +     }
| 206 | +   };
| 207 | +
| 208 | +   const goToPrevChunk = () => {
| 209 | +     if (currentChunkIndex > 0) {
| 210 | +       // Clear any ongoing typing animation
| 211 | +       if (typingInterval) {
| 212 | +         clearInterval(typingInterval);
| 213 | +         setTypingInterval(null);
| 214 | +       }
| 215 | +       setTypingMessage('');
| 216 | +       setCurrentChunkIndex(currentChunkIndex - 1);
| 217 | +     }
| 218 | +   };
| 219 | +
| 220 | +   // Chunk action functions
| 221 | +   const skipChunk = () => {
| 222 | +     setChunkStates(prev => ({
| 223 | +       ...prev,
| 224 | +       [currentChunkIndex]: 'skipped'
| 225 | +     }));
| 226 | +   };
| 227 | +
| 228 | +   const markChunkUnderstood = () => {
| 229 | +     setChunkStates(prev => ({
| 230 | +       ...prev,
| 231 | +       [currentChunkIndex]: 'understood'
| 232 | +     }));
| 233 | +   };
| 234 | +
| 235 | +   const startInteractiveLesson = () => {
| 236 | +     setChunkStates(prev => ({
| 237 | +       ...prev,
| 238 | +       [currentChunkIndex]: 'interactive'
| 239 | +     }));
| 240 | +     startChunkLesson(currentChunkIndex);
| 241 | +   };
| 242 | +
| 243 | +   const fetchImage = useCallback(async (imageId, fileId) => {
| 244 | +     // Check if image is already cached using ref
| 245 | +     if (imageCacheRef.current[imageId]) {
| 246 | +       return imageCacheRef.current[imageId];
| 247 | +     }
| 248 | +
| 249 | +     try {
| 250 | +       const response = await fetch(`/get_image/${fileId}/${imageId}`);
| 251 | +       if (response.ok) {
| 252 | +         const data = await response.json();
| 253 | +         const imageData = data.image_base64;
| 254 | +
| 255 | +         // Cache the image in ref
| 256 | +         imageCacheRef.current = {
| 257 | +           ...imageCacheRef.current,
| 258 | +           [imageId]: imageData
| 259 | +         };
| 260 | +
| 261 | +         // Also update state for other components that might need it
| 262 | +         setImageCache(prev => ({
| 263 | +           ...prev,
| 264 | +           [imageId]: imageData
| 265 | +         }));
| 266 | +
| 267 | +         return imageData;
| 268 | +       }
| 269 | +     } catch (error) {
| 270 | +       console.error('Error fetching image:', error);
| 271 | +     }
| 272 | +     return null;
| 273 | +   }, []); // No dependencies - stable function
| 274 | +
| 275 | +   const ImageComponent = memo(({ src, alt }) => {
| 276 | +     const [imageSrc, setImageSrc] = useState(null);
| 277 | +     const [loading, setLoading] = useState(true);
| 278 | +
| 279 | +     useEffect(() => {
| 280 | +       if (documentData && src) {
| 281 | +         fetchImage(src, documentData.fileId).then(imageData => {
| 282 | +           if (imageData) {
| 283 | +             setImageSrc(imageData);
| 284 | +           }
| 285 | +           setLoading(false);
| 286 | +         });
| 287 | +       }
| 288 | +     }, [src, documentData?.fileId, fetchImage]);
| 289 | +
| 290 | +     if (loading) {
| 291 | +       return (
| 292 | +         <span style={{
| 293 | +           display: 'inline-block',
| 294 | +           width: '100%',
| 295 | +           height: '200px',
| 296 | +           backgroundColor: '#f3f4f6',
| 297 | +           textAlign: 'center',
| 298 | +           lineHeight: '200px',
| 299 | +           margin: '1rem 0',
| 300 | +           borderRadius: '0.5rem',
| 301 | +           color: '#6b7280'
| 302 | +         }}>
| 303 | +           Loading image...
| 304 | +         </span>
| 305 | +       );
| 306 | +     }
| 307 | +
| 308 | +     if (!imageSrc) {
| 309 | +       return (
| 310 | +         <span style={{
| 311 | +           display: 'inline-block',
| 312 | +           width: '100%',
| 313 | +           height: '200px',
| 314 | +           backgroundColor: '#fef2f2',
| 315 | +           textAlign: 'center',
| 316 | +           lineHeight: '200px',
| 317 | +           margin: '1rem 0',
| 318 | +           borderRadius: '0.5rem',
| 319 | +           border: '1px solid #fecaca',
| 320 | +           color: '#dc2626'
| 321 | +         }}>
| 322 | +           Image not found: {alt || src}
| 323 | +         </span>
| 324 | +       );
| 325 | +     }
| 326 | +
| 327 | +     return (
| 328 | +       <img
| 329 | +         src={imageSrc}
| 330 | +         alt={alt || 'Document image'}
| 331 | +         style={{
| 332 | +           display: 'block',
| 333 | +           maxWidth: '100%',
| 334 | +           height: 'auto',
| 335 | +           margin: '1.5rem auto'
| 336 | +         }}
| 337 | +       />
| 338 | +     );
| 339 | +   });
| 340 | +
| 341 | +
| 342 | +
| 343 | +   const processDocument = async () => {
| 344 | +     if (!selectedFile) return;
| 345 | +
| 346 | +     setProcessing(true);
| 347 | +     setUploadProgress(0);
| 348 | +     setOcrProgress(0);
| 349 | +
| 350 | +     try {
| 351 | +       // Step 1: Upload PDF
| 352 | +       const formData = new FormData();
| 353 | +       formData.append('file', selectedFile);
| 354 | +
| 355 | +       setUploadProgress(30);
| 356 | +       const uploadResponse = await fetch('/upload_pdf', {
| 357 | +         method: 'POST',
| 358 | +         body: formData,
| 359 | +       });
| 360 | +
| 361 | +       if (!uploadResponse.ok) {
| 362 | +         throw new Error('Failed to upload PDF');
| 363 | +       }
| 364 | +
| 365 | +       const uploadData = await uploadResponse.json();
| 366 | +       setUploadProgress(100);
| 367 | +
| 368 | +       // Step 2: Process OCR
| 369 | +       setOcrProgress(20);
| 370 | +       await new Promise(resolve => setTimeout(resolve, 500)); // Small delay for UX
| 371 | +
| 372 | +       setOcrProgress(60);
| 373 | +       const ocrResponse = await fetch(`/process_ocr/${uploadData.file_id}`);
| 374 | +
| 375 | +       if (!ocrResponse.ok) {
| 376 | +         throw new Error('Failed to process OCR');
| 377 | +       }
| 378 | +
| 379 | +       const ocrData = await ocrResponse.json();
| 380 | +       setOcrProgress(100);
| 381 | +
| 382 | +       // Combine all markdown from pages
| 383 | +       const combinedMarkdown = ocrData.pages
| 384 | +         .map(page => page.markdown)
| 385 | +         .join('\n\n---\n\n');
| 386 | +
| 387 | +       // Collect all chunks from all pages
| 388 | +       const allChunks = [];
| 389 | +       let markdownOffset = 0;
| 390 | +
| 391 | +       ocrData.pages.forEach((page, pageIndex) => {
| 392 | +         if (page.chunks && page.chunks.length > 0) {
| 393 | +           page.chunks.forEach(chunk => {
| 394 | +             allChunks.push({
| 395 | +               ...chunk,
| 396 | +               start_position: chunk.start_position + markdownOffset,
| 397 | +               end_position: chunk.end_position + markdownOffset,
| 398 | +               pageIndex: pageIndex
| 399 | +             });
| 400 | +           });
| 401 | +         }
| 402 | +         markdownOffset += page.markdown.length + 6; // +6 for the separator "\n\n---\n\n"
| 403 | +       });
| 404 | +
| 405 | +       setDocumentData({
| 406 | +         fileId: uploadData.file_id,
| 407 | +         filename: uploadData.filename,
| 408 | +         markdown: combinedMarkdown,
| 409 | +         pages: ocrData.pages,
| 410 | +         totalPages: ocrData.total_pages,
| 411 | +         chunks: allChunks
| 412 | +       });
| 413 | +
| 414 | +     } catch (error) {
| 415 | +       console.error('Error processing document:', error);
| 416 | +       alert('Error processing document: ' + error.message);
| 417 | +     } finally {
| 418 | +       setProcessing(false);
| 419 | +     }
| 420 | +   };
| 421 | +
| 422 | +   const LoadingAnimation = () => (
| 423 | +     <div className="flex flex-col items-center justify-center min-h-screen bg-gray-50">
| 424 | +       <div className="text-center max-w-md">
| 425 | +         <div className="mb-8">
| 426 | +           <div className="w-16 h-16 border-4 border-blue-500 border-t-transparent rounded-full animate-spin mx-auto mb-4"></div>
| 427 | +           <h2 className="text-2xl font-bold text-gray-900 mb-2">Processing Your Document</h2>
| 428 | +           <p className="text-gray-600">This may take a moment...</p>
| 429 | +         </div>
| 430 | +
| 431 | +         {/* Upload Progress */}
| 432 | +         <div className="mb-6">
| 433 | +           <div className="flex justify-between text-sm text-gray-600 mb-1">
| 434 | +             <span>Uploading PDF</span>
| 435 | +             <span>{uploadProgress}%</span>
| 436 | +           </div>
| 437 | +           <div className="w-full bg-gray-200 rounded-full h-2">
| 438 | +             <div
| 439 | +               className="bg-blue-500 h-2 rounded-full transition-all duration-300"
| 440 | +               style={{ width: `${uploadProgress}%` }}
| 441 | +             ></div>
| 442 | +           </div>
| 443 | +         </div>
| 444 | +
| 445 | +         {/* OCR Progress */}
| 446 | +         <div className="mb-6">
| 447 | +           <div className="flex justify-between text-sm text-gray-600 mb-1">
| 448 | +             <span>Processing with AI</span>
| 449 | +             <span>{ocrProgress}%</span>
| 450 | +           </div>
| 451 | +           <div className="w-full bg-gray-200 rounded-full h-2">
| 452 | +             <div
| 453 | +               className="bg-green-500 h-2 rounded-full transition-all duration-300"
| 454 | +               style={{ width: `${ocrProgress}%` }}
| 455 | +             ></div>
| 456 | +           </div>
| 457 | +         </div>
| 458 | +
| 459 | +         <p className="text-sm text-gray-500">
| 460 | +           Using AI to extract text and understand your document structure...
| 461 | +         </p>
| 462 | +       </div>
| 463 | +     </div>
| 464 | +   );
| 465 | +
| 466 | +
| 467 | +   if (!selectedFile) {
| 468 | +     return (
| 469 | +       <div className="h-screen bg-gray-50 flex items-center justify-center">
| 470 | +         <input
| 471 | +           ref={fileInputRef}
| 472 | +           type="file"
| 473 | +           accept=".pdf"
| 474 | +           className="hidden"
| 475 | +           onChange={handleFileChange}
| 476 | +         />
| 477 | +         <button
| 478 | +           onClick={() => fileInputRef.current.click()}
| 479 | +           className="px-6 py-3 bg-white shadow-md hover:shadow-lg text-gray-700 font-medium rounded-lg transition-all"
| 480 | +         >
| 481 | +           Select PDF
| 482 | +         </button>
| 483 | +       </div>
| 484 | +     );
| 485 | +   }
| 486 | +
| 487 | +   if (processing) {
| 488 | +     return <LoadingAnimation />;
| 489 | +   }
| 490 | +
| 491 | +   if (!documentData) {
| 492 | +     return (
| 493 | +       <div className="h-screen bg-gray-50 flex items-center justify-center">
| 494 | +         <div className="flex gap-4">
| 495 | +           <button
| 496 | +             onClick={processDocument}
| 497 | +             className="px-6 py-3 bg-white shadow-md hover:shadow-lg text-gray-700 font-medium rounded-lg transition-all"
| 498 | +           >
| 499 | +             Process
| 500 | +           </button>
| 501 | +           <button
| 502 | +             onClick={() => setSelectedFile(null)}
| 503 | +             className="px-6 py-3 bg-white shadow-md hover:shadow-lg text-gray-700 font-medium rounded-lg transition-all"
| 504 | +           >
| 505 | +             ← Back
| 506 | +           </button>
| 507 | +         </div>
| 508 | +       </div>
| 509 | +     );
| 510 | +   }
| 511 | +
| 512 | +   return (
| 513 | +     <div
| 514 | +       ref={containerRef}
| 515 | +       className="h-screen bg-gray-100 flex gap-2 p-6 overflow-hidden"
| 516 | +       style={{ cursor: isDragging ? 'col-resize' : 'default' }}
| 517 | +     >
| 518 | +       {/* Left Panel - Document */}
| 519 | +       <div
| 520 | +         className="bg-white rounded-lg shadow-sm flex flex-col"
| 521 | +         style={{ width: `${leftPanelWidth}%` }}
| 522 | +       >
| 523 | +         {/* Header */}
| 524 | +         <div className="sticky top-0 bg-white rounded-t-lg px-6 py-4 border-b border-gray-200 z-10">
| 525 | +           <h2 className="text-lg font-semibold text-left text-gray-800">Document</h2>
| 526 | +         </div>
| 527 | +
| 528 | +         {/* Content */}
| 529 | +         <div className="flex-1 px-6 pt-6 pb-8 overflow-y-auto">
| 530 | +           <style>
| 531 | +             {`
| 532 | +               @keyframes fadeInHighlight {
| 533 | +                 0% {
| 534 | +                   background-color: rgba(255, 214, 100, 0);
| 535 | +                   border-left-color: rgba(156, 163, 175, 0);
| 536 | +                   transform: translateX(-10px);
| 537 | +                   opacity: 0;
| 538 | +                 }
| 539 | +                 100% {
| 540 | +                   background-color: rgba(255, 214, 100, 0.15);
| 541 | +                   border-left-color: rgba(156, 163, 175, 0.5);
| 542 | +                   transform: translateX(0);
| 543 | +                   opacity: 1;
| 544 | +                 }
| 545 | +               }
| 546 | +             `}
| 547 | +           </style>
| 548 | +           <div className="prose prose-sm max-w-none" style={{
| 549 | +             fontSize: '0.875rem',
| 550 | +             lineHeight: '1.5',
| 551 | +             color: 'rgb(55, 65, 81)'
| 552 | +           }}>
| 553 | +             <ReactMarkdown
| 554 | +               remarkPlugins={[remarkMath]}
| 555 | +               rehypePlugins={[rehypeRaw, rehypeKatex]}
| 556 | +               components={{
| 557 | +                 h1: ({ children }) => <h1 style={{ fontSize: '1.5rem', fontWeight: 'bold', marginBottom: '1rem', color: '#1a202c' }}>{children}</h1>,
| 558 | +                 h2: ({ children }) => <h2 style={{ fontSize: '1.25rem', fontWeight: 'bold', marginBottom: '0.75rem', marginTop: '1.5rem', color: '#1a202c' }}>{children}</h2>,
| 559 | +                 h3: ({ children }) => <h3 style={{ fontSize: '1.125rem', fontWeight: 'bold', marginBottom: '0.5rem', marginTop: '1rem', color: '#1a202c' }}>{children}</h3>,
| 560 | +                 p: ({ children }) => <p style={{ marginBottom: '0.75rem', color: '#374151', lineHeight: '1.5', fontSize: '0.875rem' }}>{children}</p>,
| 561 | +                 hr: () => <hr style={{ margin: '1.5rem 0', borderColor: '#d1d5db' }} />,
| 562 | +                 ul: ({ children }) => <ul style={{ marginBottom: '0.75rem', marginLeft: '1.25rem', listStyleType: 'disc', fontSize: '0.875rem' }}>{children}</ul>,
| 563 | +                 ol: ({ children }) => <ol style={{ marginBottom: '0.75rem', marginLeft: '1.25rem', listStyleType: 'decimal', fontSize: '0.875rem' }}>{children}</ol>,
| 564 | +                 li: ({ children }) => <li style={{ marginBottom: '0.125rem', color: '#374151' }}>{children}</li>,
| 565 | +                 blockquote: ({ children }) => (
| 566 | +                   <blockquote style={{ borderLeft: '3px solid #3b82f6', paddingLeft: '0.75rem', fontStyle: 'italic', margin: '0.75rem 0', color: '#6b7280', fontSize: '0.875rem' }}>
| 567 | +                     {children}
| 568 | +                   </blockquote>
| 569 | +                 ),
| 570 | +                 code: ({ inline, children }) =>
| 571 | +                   inline ?
| 572 | +                     <code style={{ backgroundColor: '#f3f4f6', padding: '0.125rem 0.25rem', borderRadius: '0.25rem', fontSize: '0.75rem', fontFamily: 'monospace' }}>{children}</code> :
| 573 | +                     <pre style={{ backgroundColor: '#f3f4f6', padding: '0.75rem', borderRadius: '0.375rem', overflowX: 'auto', margin: '0.75rem 0' }}>
| 574 | +                       <code style={{ fontSize: '0.75rem', fontFamily: 'monospace' }}>{children}</code>
| 575 | +                     </pre>,
| 576 | +                 div: ({ children, style }) => (
| 577 | +                   <div style={style}>
| 578 | +                     {children}
| 579 | +                   </div>
| 580 | +                 ),
| 581 | +                 img: ({ src, alt }) => <ImageComponent src={src} alt={alt} />
| 582 | +               }}
| 583 | +             >
| 584 | +               {highlightedMarkdown}
| 585 | +             </ReactMarkdown>
| 586 | +           </div>
| 587 | +         </div>
| 588 | +       </div>
| 589 | +
| 590 | +       {/* Resizable Divider */}
| 591 | +       <div
| 592 | +         className="flex items-center justify-center cursor-col-resize group transition-all duration-200"
| 593 | +         style={{ width: '8px' }}
| 594 | +         onMouseDown={handleMouseDown}
| 595 | +       >
| 596 | +         {/* Resizable Divider */}
| 597 | +         <div
| 598 | +           className="w-px h-full rounded-full transition-all
| 599 | +             duration-200 group-hover:shadow-lg"
| 600 | +           style={{
| 601 | +             backgroundColor: isDragging ? 'rgba(59, 130, 246, 0.8)' : 'transparent',
| 602 | +             boxShadow: isDragging ? '0 0 8px rgba(59, 130, 246, 0.8)' : 'none'
| 603 | +           }}
| 604 | +         ></div>
| 605 | +       </div>
| 606 | +
| 607 | +       {/* Right Panel Container */}
| 608 | +       <div
| 609 | +         className="flex flex-col"
| 610 | +         style={{ width: `${100 - leftPanelWidth}%` }}
| 611 | +       >
| 612 | +         {/* Navigation Bar - Above chunk panel */}
| 613 | +         <div className="flex items-center justify-center gap-4 mb-4 px-4">
| 614 | +           <button
| 615 | +             onClick={goToPrevChunk}
| 616 | +             disabled={currentChunkIndex === 0}
| 617 | +             className="p-3 bg-white hover:bg-gray-50 disabled:opacity-30 disabled:cursor-not-allowed rounded-lg shadow-sm transition-all"
| 618 | +           >
| 619 | +             <svg className="w-5 h-5 text-gray-700" fill="none" stroke="currentColor" viewBox="0 0 24 24" strokeWidth={3}>
| 620 | +               <path strokeLinecap="round" strokeLinejoin="round" d="M15 19l-7-7 7-7" />
| 621 | +             </svg>
| 622 | +           </button>
| 623 | +
| 624 | +           <div className="flex space-x-2">
| 625 | +             {documentData?.chunks?.map((_, index) => (
| 626 | +               <div
| 627 | +                 key={index}
| 628 | +                 className={`w-3 h-3 rounded-full ${
| 629 | +                   chunkStates[index] === 'understood' ? 'bg-green-500' :
| 630 | +                   chunkStates[index] === 'skipped' ? 'bg-red-500' :
| 631 | +                   chunkStates[index] === 'interactive' ? 'bg-blue-500' :
| 632 | +                   index === currentChunkIndex ? 'bg-gray-600' : 'bg-gray-300'
| 633 | +                 }`}
| 634 | +               />
| 635 | +             ))}
| 636 | +           </div>
| 637 | +
| 638 | +           <button
| 639 | +             onClick={goToNextChunk}
| 640 | +             disabled={!documentData?.chunks || currentChunkIndex === documentData.chunks.length - 1}
| 641 | +             className="p-3 bg-white hover:bg-gray-50 disabled:opacity-30 disabled:cursor-not-allowed rounded-lg shadow-sm transition-all"
| 642 | +           >
| 643 | +             <svg className="w-5 h-5 text-gray-700" fill="none" stroke="currentColor" viewBox="0 0 24 24" strokeWidth={3}>
| 644 | +               <path strokeLinecap="round" strokeLinejoin="round" d="M9 5l7 7-7 7" />
| 645 | +             </svg>
| 646 | +           </button>
| 647 | +         </div>
| 648 |
+
|
| 649 |
+
{/* Chunk Panel */}
|
| 650 |
+
{/* Chunk Header - Left aligned title only */}
|
| 651 |
+
<div className="px-6 py-4 flex-shrink-0 bg-white rounded-t-lg border-b border-gray-200 z-10">
|
| 652 |
+
<div className="flex items-center justify-between">
|
| 653 |
+
<button
|
| 654 |
+
onClick={() => setChunkExpanded(!chunkExpanded)}
|
| 655 |
+
className="flex items-center hover:bg-gray-50 py-2 px-3 rounded-lg transition-all -ml-3"
|
| 656 |
+
>
|
| 657 |
+
<span className="font-semibold text-gray-900 text-left">
|
| 658 |
+
{documentData?.chunks?.[currentChunkIndex]?.topic || "Loading..."}
|
| 659 |
+
</span>
|
| 660 |
+
<span className="text-gray-400 ml-3">
|
| 661 |
+
{chunkExpanded ? '▲' : '▼'}
|
| 662 |
+
</span>
|
| 663 |
+
</button>
|
| 664 |
+
|
| 665 |
+
<button
|
| 666 |
+
onClick={markChunkUnderstood}
|
| 667 |
+
className="py-2 px-4 bg-gray-50 hover:bg-gray-100 text-gray-600 rounded-lg transition-all text-sm"
|
| 668 |
+
>
|
| 669 |
+
✓
|
| 670 |
+
</button>
|
| 671 |
+
</div>
|
| 672 |
+
|
| 673 |
+
{/* Expandable Chunk Content - in header area */}
|
| 674 |
+
{chunkExpanded && documentData?.chunks?.[currentChunkIndex] && (
|
| 675 |
+
<div className="prose prose-sm max-w-none">
|
| 676 |
+
<ReactMarkdown
|
| 677 |
+
remarkPlugins={[remarkMath]}
|
| 678 |
+
rehypePlugins={[rehypeRaw, rehypeKatex]}
|
| 679 |
+
components={{
|
| 680 |
+
h1: ({ children }) => <h1 style={{ fontSize: '1.25rem', fontWeight: 'bold', marginBottom: '0.75rem', color: '#1a202c' }}>{children}</h1>,
|
| 681 |
+
h2: ({ children }) => <h2 style={{ fontSize: '1.125rem', fontWeight: 'bold', marginBottom: '0.5rem', marginTop: '1rem', color: '#1a202c' }}>{children}</h2>,
|
| 682 |
+
h3: ({ children }) => <h3 style={{ fontSize: '1rem', fontWeight: 'bold', marginBottom: '0.5rem', marginTop: '0.75rem', color: '#1a202c' }}>{children}</h3>,
|
| 683 |
+
p: ({ children }) => <p style={{ marginBottom: '0.5rem', color: '#374151', lineHeight: '1.4', fontSize: '0.875rem' }}>{children}</p>,
|
| 684 |
+
hr: () => <hr style={{ margin: '1rem 0', borderColor: '#d1d5db' }} />,
|
| 685 |
+
ul: ({ children }) => <ul style={{ marginBottom: '0.5rem', marginLeft: '1rem', listStyleType: 'disc', fontSize: '0.875rem' }}>{children}</ul>,
|
| 686 |
+
ol: ({ children }) => <ol style={{ marginBottom: '0.5rem', marginLeft: '1rem', listStyleType: 'decimal', fontSize: '0.875rem' }}>{children}</ol>,
|
| 687 |
+
li: ({ children }) => <li style={{ marginBottom: '0.125rem', color: '#374151' }}>{children}</li>,
|
| 688 |
+
blockquote: ({ children }) => (
|
| 689 |
+
<blockquote style={{ borderLeft: '2px solid #9ca3af', paddingLeft: '0.5rem', fontStyle: 'italic', margin: '0.5rem 0', color: '#6b7280', fontSize: '0.875rem' }}>
|
| 690 |
+
{children}
|
| 691 |
+
</blockquote>
|
| 692 |
+
),
|
| 693 |
+
code: ({ inline, children }) =>
|
| 694 |
+
inline ?
|
| 695 |
+
<code style={{ backgroundColor: '#f3f4f6', padding: '0.125rem 0.25rem', borderRadius: '0.25rem', fontSize: '0.75rem', fontFamily: 'monospace' }}>{children}</code> :
|
| 696 |
+
<pre style={{ backgroundColor: '#f3f4f6', padding: '0.5rem', borderRadius: '0.25rem', overflowX: 'auto', margin: '0.5rem 0' }}>
|
| 697 |
+
<code style={{ fontSize: '0.75rem', fontFamily: 'monospace' }}>{children}</code>
|
| 698 |
+
</pre>,
|
| 699 |
+
img: ({ src, alt }) => <ImageComponent src={src} alt={alt} />
|
| 700 |
+
}}
|
| 701 |
+
>
|
| 702 |
+
{documentData.markdown.slice(
|
| 703 |
+
documentData.chunks[currentChunkIndex].start_position,
|
| 704 |
+
documentData.chunks[currentChunkIndex].end_position
|
| 705 |
+
)}
|
| 706 |
+
</ReactMarkdown>
|
| 707 |
+
</div>
|
| 708 |
+
)}
|
| 709 |
+
|
| 710 |
+
|
| 711 |
+
</div>
|
| 712 |
+
|
| 713 |
+
|
| 714 |
+
{/* Content Area */}
|
| 715 |
+
<div className="flex-1 flex flex-col min-h-0">
|
| 716 |
+
{/* Action Buttons */}
|
| 717 |
+
{chunkStates[currentChunkIndex] !== 'interactive' && (
|
| 718 |
+
<div className="flex-shrink-0 p-6 border-b border-gray-200">
|
| 719 |
+
<div className="flex gap-3">
|
| 720 |
+
<button
|
| 721 |
+
onClick={skipChunk}
|
| 722 |
+
className="flex-1 py-3 bg-gray-50 hover:bg-gray-100 text-gray-600 rounded-lg transition-all"
|
| 723 |
+
>
|
| 724 |
+
✕
|
| 725 |
+
</button>
|
| 726 |
+
|
| 727 |
+
<button
|
| 728 |
+
onClick={startInteractiveLesson}
|
| 729 |
+
disabled={chatLoading}
|
| 730 |
+
className="flex-1 py-3 bg-gray-50 hover:bg-gray-100 disabled:opacity-50 text-gray-600 rounded-lg transition-all"
|
| 731 |
+
>
|
| 732 |
+
{chatLoading ? '...' : 'Start'}
|
| 733 |
+
</button>
|
| 734 |
+
|
| 735 |
+
<button
|
| 736 |
+
onClick={markChunkUnderstood}
|
| 737 |
+
className="flex-1 py-3 bg-gray-50 hover:bg-gray-100 text-gray-600 rounded-lg transition-all"
|
| 738 |
+
>
|
| 739 |
+
✓
|
| 740 |
+
</button>
|
| 741 |
+
</div>
|
| 742 |
+
</div>
|
| 743 |
+
)}
|
| 744 |
+
|
| 745 |
+
{/* Chat Area - sandwich layout when interactive */}
|
| 746 |
+
{chunkStates[currentChunkIndex] === 'interactive' && (
|
| 747 |
+
<div className="flex-1 flex flex-col min-h-0">
|
| 748 |
+
{/* Chat Messages - scrollable middle layer */}
|
| 749 |
+
<div className="bg-white flex-1 overflow-y-auto space-y-4 px-6 py-2">
|
| 750 |
+
{(chatMessages[currentChunkIndex] || []).map((message, index) => (
|
| 751 |
+
message.type === 'user' ? (
|
| 752 |
+
<div
|
| 753 |
+
key={index}
|
| 754 |
+
className="w-full bg-gray-50 border border-gray-200 rounded-lg p-4 shadow-sm"
|
| 755 |
+
>
|
| 756 |
+
<div className="text-xs font-medium mb-2 text-gray-600">
|
| 757 |
+
You
|
| 758 |
+
</div>
|
| 759 |
+
<div className="prose prose-sm max-w-none">
|
| 760 |
+
<ReactMarkdown
|
| 761 |
+
remarkPlugins={[remarkMath]}
|
| 762 |
+
rehypePlugins={[rehypeRaw, rehypeKatex]}
|
| 763 |
+
components={{
|
| 764 |
+
p: ({ children }) => <p className="mb-2 text-gray-800 leading-relaxed">{children}</p>,
|
| 765 |
+
ul: ({ children }) => <ul className="mb-2 ml-4 list-disc">{children}</ul>,
|
| 766 |
+
ol: ({ children }) => <ol className="mb-2 ml-4 list-decimal">{children}</ol>,
|
| 767 |
+
li: ({ children }) => <li className="mb-1 text-gray-800">{children}</li>,
|
| 768 |
+
strong: ({ children }) => <strong className="font-semibold text-gray-900">{children}</strong>,
|
| 769 |
+
em: ({ children }) => <em className="italic">{children}</em>,
|
| 770 |
+
code: ({ inline, children }) =>
|
| 771 |
+
inline ?
|
| 772 |
+
<code className="bg-gray-100 px-1 py-0.5 rounded text-sm font-mono">{children}</code> :
|
| 773 |
+
<pre className="bg-gray-100 p-2 rounded overflow-x-auto my-2">
|
| 774 |
+
<code className="text-sm font-mono">{children}</code>
|
| 775 |
+
</pre>,
|
| 776 |
+
blockquote: ({ children }) => (
|
| 777 |
+
<blockquote className="border-l-4 border-blue-200 pl-4 italic text-gray-700 my-2">
|
| 778 |
+
{children}
|
| 779 |
+
</blockquote>
|
| 780 |
+
)
|
| 781 |
+
}}
|
| 782 |
+
>
|
| 783 |
+
{message.text}
|
| 784 |
+
</ReactMarkdown>
|
| 785 |
+
</div>
|
| 786 |
+
</div>
|
| 787 |
+
) : (
|
| 788 |
+
<div key={index} className="w-full py-4">
|
| 789 |
+
<div className="prose prose-sm max-w-none">
|
| 790 |
+
<ReactMarkdown
|
| 791 |
+
remarkPlugins={[remarkMath]}
|
| 792 |
+
rehypePlugins={[rehypeRaw, rehypeKatex]}
|
| 793 |
+
components={{
|
| 794 |
+
p: ({ children }) => <p className="mb-2 text-gray-800 leading-relaxed">{children}</p>,
|
| 795 |
+
ul: ({ children }) => <ul className="mb-2 ml-4 list-disc">{children}</ul>,
|
| 796 |
+
ol: ({ children }) => <ol className="mb-2 ml-4 list-decimal">{children}</ol>,
|
| 797 |
+
li: ({ children }) => <li className="mb-1 text-gray-800">{children}</li>,
|
| 798 |
+
strong: ({ children }) => <strong className="font-semibold text-gray-900">{children}</strong>,
|
| 799 |
+
em: ({ children }) => <em className="italic">{children}</em>,
|
| 800 |
+
code: ({ inline, children }) =>
|
| 801 |
+
inline ?
|
| 802 |
+
<code className="bg-gray-100 px-1 py-0.5 rounded text-sm font-mono">{children}</code> :
|
| 803 |
+
<pre className="bg-gray-100 p-2 rounded overflow-x-auto my-2">
|
| 804 |
+
<code className="text-sm font-mono">{children}</code>
|
| 805 |
+
</pre>,
|
| 806 |
+
blockquote: ({ children }) => (
|
| 807 |
+
<blockquote className="border-l-4 border-blue-200 pl-4 italic text-gray-700 my-2">
|
| 808 |
+
{children}
|
| 809 |
+
</blockquote>
|
| 810 |
+
)
|
| 811 |
+
}}
|
| 812 |
+
>
|
| 813 |
+
{message.text}
|
| 814 |
+
</ReactMarkdown>
|
| 815 |
+
</div>
|
| 816 |
+
</div>
|
| 817 |
+
)
|
| 818 |
+
))}
|
| 819 |
+
|
| 820 |
+
{/* Typing animation message */}
|
| 821 |
+
{typingMessage && (
|
| 822 |
+
<div className="w-full py-4">
|
| 823 |
+
<div className="prose prose-sm max-w-none">
|
| 824 |
+
<ReactMarkdown
|
| 825 |
+
remarkPlugins={[remarkMath]}
|
| 826 |
+
rehypePlugins={[rehypeRaw, rehypeKatex]}
|
| 827 |
+
components={{
|
| 828 |
+
p: ({ children }) => <p className="mb-2 text-gray-800 leading-relaxed">{children}</p>,
|
| 829 |
+
ul: ({ children }) => <ul className="mb-2 ml-4 list-disc">{children}</ul>,
|
| 830 |
+
ol: ({ children }) => <ol className="mb-2 ml-4 list-decimal">{children}</ol>,
|
| 831 |
+
li: ({ children }) => <li className="mb-1 text-gray-800">{children}</li>,
|
| 832 |
+
strong: ({ children }) => <strong className="font-semibold text-gray-900">{children}</strong>,
|
| 833 |
+
em: ({ children }) => <em className="italic">{children}</em>,
|
| 834 |
+
code: ({ inline, children }) =>
|
| 835 |
+
inline ?
|
| 836 |
+
<code className="bg-gray-100 px-1 py-0.5 rounded text-sm font-mono">{children}</code> :
|
| 837 |
+
<pre className="bg-gray-100 p-2 rounded overflow-x-auto my-2">
|
| 838 |
+
<code className="text-sm font-mono">{children}</code>
|
| 839 |
+
</pre>,
|
| 840 |
+
blockquote: ({ children }) => (
|
| 841 |
+
<blockquote className="border-l-4 border-blue-200 pl-4 italic text-gray-700 my-2">
|
| 842 |
+
{children}
|
| 843 |
+
</blockquote>
|
| 844 |
+
)
|
| 845 |
+
}}
|
| 846 |
+
>
|
| 847 |
+
{typingMessage}
|
| 848 |
+
</ReactMarkdown>
|
| 849 |
+
</div>
|
| 850 |
+
</div>
|
| 851 |
+
)}
|
| 852 |
+
|
| 853 |
+
{/* Loading dots */}
|
| 854 |
+
{chatLoading && (
|
| 855 |
+
<div className="w-full py-4">
|
| 856 |
+
<div className="flex space-x-1">
|
| 857 |
+
<div className="w-2 h-2 bg-gray-400 rounded-full animate-bounce"></div>
|
| 858 |
+
<div className="w-2 h-2 bg-gray-400 rounded-full animate-bounce" style={{animationDelay: '0.1s'}}></div>
|
| 859 |
+
<div className="w-2 h-2 bg-gray-400 rounded-full animate-bounce" style={{animationDelay: '0.2s'}}></div>
|
| 860 |
+
</div>
|
| 861 |
+
</div>
|
| 862 |
+
)}
|
| 863 |
+
</div>
|
| 864 |
+
|
| 865 |
+
{/* Chat Input - sticky at bottom */}
|
| 866 |
+
<div className="flex-shrink-0 bg-white border-t border-gray-200 p-6">
|
| 867 |
+
<div className="flex gap-2 mb-3">
|
| 868 |
+
<input
|
| 869 |
+
type="text"
|
| 870 |
+
value={userInput}
|
| 871 |
+
onChange={(e) => setUserInput(e.target.value)}
|
| 872 |
+
placeholder="Type your response..."
|
| 873 |
+
className="flex-1 px-3 py-2 border border-gray-200 rounded-lg text-sm focus:outline-none focus:ring-1 focus:ring-gray-300"
|
| 874 |
+
/>
|
| 875 |
+
<button className="px-4 py-2 bg-gray-50 hover:bg-gray-100 text-gray-600 rounded-lg transition-all">
|
| 876 |
+
→
|
| 877 |
+
</button>
|
| 878 |
+
</div>
|
| 879 |
+
|
| 880 |
+
</div>
|
| 881 |
+
</div>
|
| 882 |
+
)}
|
| 883 |
+
</div>
|
| 884 |
+
</div>
|
| 885 |
+
</div>
|
| 886 |
+
);
|
| 887 |
+
}
|
| 888 |
+
|
| 889 |
+
export default DocumentProcessor;
|
frontend/src/components/DocumentViewer.jsx
ADDED
@@ -0,0 +1,53 @@
```jsx
import ReactMarkdown from 'react-markdown';
import remarkMath from 'remark-math';
import rehypeKatex from 'rehype-katex';
import rehypeRaw from 'rehype-raw';
import { getDocumentMarkdownComponents } from '../utils/markdownComponents.jsx';

const DocumentViewer = ({ highlightedMarkdown, documentData, fetchImage, imageCache, setImageCache }) => {
  const markdownComponents = getDocumentMarkdownComponents(documentData, fetchImage, imageCache, setImageCache);

  return (
    <div className="bg-white rounded-lg shadow-sm flex flex-col" style={{ width: '100%', height: '100%' }}>
      <div className="sticky top-0 bg-white rounded-t-lg px-6 py-4 border-b border-gray-200 z-10">
        <h2 className="text-lg font-semibold text-left text-gray-800">Document</h2>
      </div>

      <div className="flex-1 px-6 pt-6 pb-8 overflow-y-auto">
        <style>
          {`
            @keyframes fadeInHighlight {
              0% {
                background-color: rgba(255, 214, 100, 0);
                border-left-color: rgba(156, 163, 175, 0);
                transform: translateX(-10px);
                opacity: 0;
              }
              100% {
                background-color: rgba(255, 214, 100, 0.15);
                border-left-color: rgba(156, 163, 175, 0.5);
                transform: translateX(0);
                opacity: 1;
              }
            }
          `}
        </style>
        <div className="prose prose-sm max-w-none" style={{
          fontSize: '0.875rem',
          lineHeight: '1.5',
          color: 'rgb(55, 65, 81)'
        }}>
          <ReactMarkdown
            remarkPlugins={[remarkMath]}
            rehypePlugins={[rehypeRaw, rehypeKatex]}
            components={markdownComponents}
          >
            {highlightedMarkdown}
          </ReactMarkdown>
        </div>
      </div>
    </div>
  );
};

export default DocumentViewer;
```
frontend/src/components/ImageComponent.jsx
ADDED
@@ -0,0 +1,115 @@
```jsx
import { useState, useEffect, memo } from 'react';

/**
 * ImageComponent - Handles loading and displaying images from the backend
 *
 * Props:
 * - src: The image ID to fetch
 * - alt: Alt text for the image
 * - fileId: The document file ID (for fetching the image)
 * - imageCache: Object containing cached images
 * - onImageCached: Callback when image is successfully cached
 */
const ImageComponent = memo(({ src, alt, fileId, imageCache, onImageCached }) => {
  // Local state for this specific image
  const [imageSrc, setImageSrc] = useState(null);
  const [loading, setLoading] = useState(true);

  useEffect(() => {
    // Only proceed if we have the required data
    if (!fileId || !src) {
      setLoading(false);
      return;
    }

    // Check if image is already cached
    if (imageCache && imageCache[src]) {
      setImageSrc(imageCache[src]);
      setLoading(false);
      return;
    }

    // Fetch the image from backend
    const fetchImage = async () => {
      try {
        const response = await fetch(`/get_image/${fileId}/${src}`);
        if (response.ok) {
          const data = await response.json();
          const imageData = data.image_base64;

          // Set the image for display
          setImageSrc(imageData);

          // Notify parent component to cache this image
          if (onImageCached) {
            onImageCached(src, imageData);
          }
        }
      } catch (error) {
        console.error('Error fetching image:', error);
      } finally {
        setLoading(false);
      }
    };

    fetchImage();
  }, [src, fileId, imageCache, onImageCached]);

  // Show loading state
  if (loading) {
    return (
      <span style={{
        display: 'inline-block',
        width: '100%',
        height: '200px',
        backgroundColor: '#f3f4f6',
        textAlign: 'center',
        lineHeight: '200px',
        margin: '1rem 0',
        borderRadius: '0.5rem',
        color: '#6b7280'
      }}>
        Loading image...
      </span>
    );
  }

  // Show error state if image couldn't be loaded
  if (!imageSrc) {
    return (
      <span style={{
        display: 'inline-block',
        width: '100%',
        height: '200px',
        backgroundColor: '#fef2f2',
        textAlign: 'center',
        lineHeight: '200px',
        margin: '1rem 0',
        borderRadius: '0.5rem',
        border: '1px solid #fecaca',
        color: '#dc2626'
      }}>
        Image not found: {alt || src}
      </span>
    );
  }

  // Render the actual image
  return (
    <img
      src={imageSrc}
      alt={alt || 'Document image'}
      style={{
        display: 'block',
        maxWidth: '100%',
        height: 'auto',
        margin: '1.5rem auto'
      }}
    />
  );
});

// Set display name for debugging
ImageComponent.displayName = 'ImageComponent';

export default ImageComponent;
```
frontend/src/components/LoadingAnimation.jsx
ADDED
@@ -0,0 +1,45 @@
```jsx
const LoadingAnimation = ({ uploadProgress, ocrProgress }) => (
  <div className="flex flex-col items-center justify-center min-h-screen bg-gray-50">
    <div className="text-center max-w-md">
      <div className="mb-8">
        <div className="w-16 h-16 border-4 border-blue-500 border-t-transparent rounded-full animate-spin mx-auto mb-4"></div>
        <h2 className="text-2xl font-bold text-gray-900 mb-2">Processing Your Document</h2>
        <p className="text-gray-600">This may take a moment...</p>
      </div>

      {/* Upload Progress */}
      <div className="mb-6">
        <div className="flex justify-between text-sm text-gray-600 mb-1">
          <span>Uploading PDF</span>
          <span>{uploadProgress}%</span>
        </div>
        <div className="w-full bg-gray-200 rounded-full h-2">
          <div
            className="bg-blue-500 h-2 rounded-full transition-all duration-300"
            style={{ width: `${uploadProgress}%` }}
          ></div>
        </div>
      </div>

      {/* OCR Progress */}
      <div className="mb-6">
        <div className="flex justify-between text-sm text-gray-600 mb-1">
          <span>Processing with AI</span>
          <span>{ocrProgress}%</span>
        </div>
        <div className="w-full bg-gray-200 rounded-full h-2">
          <div
            className="bg-green-500 h-2 rounded-full transition-all duration-300"
            style={{ width: `${ocrProgress}%` }}
          ></div>
        </div>
      </div>

      <p className="text-sm text-gray-500">
        Using AI to extract text and understand your document structure...
      </p>
    </div>
  </div>
);

export default LoadingAnimation;
```
frontend/src/hooks/useChat.js
ADDED
@@ -0,0 +1,109 @@
```js
import { useState, useRef } from 'react';

export const useChat = () => {
  const [chatData, setChatData] = useState({});
  const [chatLoading, setChatLoading] = useState(false);
  const [chatMessages, setChatMessages] = useState({});
  const [userInput, setUserInput] = useState('');
  const [typingMessage, setTypingMessage] = useState('');
  const [typingInterval, setTypingInterval] = useState(null);

  const typeMessage = (text, callback) => {
    if (typingInterval) {
      clearInterval(typingInterval);
    }

    setTypingMessage('');
    let currentIndex = 0;
    const typeSpeed = Math.max(1, Math.min(3, 200 / text.length));

    const interval = setInterval(() => {
      if (currentIndex < text.length) {
        setTypingMessage(text.slice(0, currentIndex + 1));
        currentIndex++;
      } else {
        clearInterval(interval);
        setTypingInterval(null);
        setTypingMessage('');
        callback();
      }
    }, typeSpeed);

    setTypingInterval(interval);
  };

  const startChunkLesson = async (chunkIndex, documentData) => {
    if (!documentData || !documentData.chunks[chunkIndex]) return;

    setChatLoading(true);

    try {
      const chunk = documentData.chunks[chunkIndex];
      console.log('Starting lesson for chunk:', chunkIndex, chunk);
      console.log('Document data:', documentData.fileId, documentData.markdown?.length);

      const response = await fetch(`/start_chunk_lesson/${documentData.fileId}/${chunkIndex}`, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          chunk: chunk,
          document_markdown: documentData.markdown
        })
      });

      if (!response.ok) {
        const errorData = await response.text();
        console.error('Backend error:', errorData);
        throw new Error(`Failed to start lesson: ${response.status} - ${errorData}`);
      }

      const lessonData = await response.json();
      setChatData(prev => ({
        ...prev,
        [chunkIndex]: {
          ...lessonData,
          chunkIndex: chunkIndex,
          chunk: chunk
        }
      }));

      setChatLoading(false);

      typeMessage(lessonData.questions, () => {
        setChatMessages(prev => ({
          ...prev,
          [chunkIndex]: [
            { type: 'ai', text: lessonData.questions }
          ]
        }));
      });

    } catch (error) {
      console.error('Error starting lesson:', error);
      alert('Error starting lesson: ' + error.message);
      setChatLoading(false);
    }
  };

  const clearTypingAnimation = () => {
    if (typingInterval) {
      clearInterval(typingInterval);
      setTypingInterval(null);
    }
    setTypingMessage('');
  };

  return {
    chatData,
    chatLoading,
    chatMessages,
    userInput,
    typingMessage,
    startChunkLesson,
    clearTypingAnimation,
    setUserInput,
    setChatMessages
  };
};
```
frontend/src/hooks/useChunkNavigation.js
ADDED
@@ -0,0 +1,59 @@
```js
import { useState } from 'react';

export const useChunkNavigation = (documentData, clearTypingAnimation) => {
  const [chunkStates, setChunkStates] = useState({});
  const [currentChunkIndex, setCurrentChunkIndex] = useState(0);
  const [chunkExpanded, setChunkExpanded] = useState(true);

  const goToNextChunk = () => {
    if (documentData && currentChunkIndex < documentData.chunks.length - 1) {
      if (clearTypingAnimation) {
        clearTypingAnimation();
      }
      setCurrentChunkIndex(currentChunkIndex + 1);
    }
  };

  const goToPrevChunk = () => {
    if (currentChunkIndex > 0) {
      if (clearTypingAnimation) {
        clearTypingAnimation();
      }
      setCurrentChunkIndex(currentChunkIndex - 1);
    }
  };

  const skipChunk = () => {
    setChunkStates(prev => ({
      ...prev,
      [currentChunkIndex]: 'skipped'
    }));
  };

  const markChunkUnderstood = () => {
    setChunkStates(prev => ({
      ...prev,
      [currentChunkIndex]: 'understood'
    }));
  };

  const startInteractiveLesson = (startChunkLessonFn) => {
    setChunkStates(prev => ({
      ...prev,
      [currentChunkIndex]: 'interactive'
    }));
    startChunkLessonFn(currentChunkIndex);
  };

  return {
    chunkStates,
    currentChunkIndex,
    chunkExpanded,
    goToNextChunk,
    goToPrevChunk,
    skipChunk,
    markChunkUnderstood,
    startInteractiveLesson,
    setChunkExpanded
  };
};
```
frontend/src/hooks/useDocumentProcessor.js
ADDED
|
@@ -0,0 +1,121 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import { useState, useRef, useCallback } from 'react';
|
| 2 |
+
|
| 3 |
+
export const useDocumentProcessor = () => {
|
| 4 |
+
const fileInputRef = useRef(null);
|
| 5 |
+
+  const [selectedFile, setSelectedFile] = useState(null);
+  const [processing, setProcessing] = useState(false);
+  const [uploadProgress, setUploadProgress] = useState(0);
+  const [ocrProgress, setOcrProgress] = useState(0);
+  const [documentData, setDocumentData] = useState(null);
+  const [imageCache, setImageCache] = useState({});
+  const imageCacheRef = useRef({});
+
+  const handleFileChange = (e) => {
+    setSelectedFile(e.target.files[0]);
+    setDocumentData(null);
+    setUploadProgress(0);
+    setOcrProgress(0);
+    setImageCache({});
+    imageCacheRef.current = {};
+  };
+
+  const fetchImage = useCallback(async (imageId, fileId) => {
+    if (imageCacheRef.current[imageId]) {
+      return imageCacheRef.current[imageId];
+    }
+
+    try {
+      const response = await fetch(`/get_image/${fileId}/${imageId}`);
+      if (response.ok) {
+        const data = await response.json();
+        const imageData = data.image_base64;
+
+        imageCacheRef.current = {
+          ...imageCacheRef.current,
+          [imageId]: imageData
+        };
+
+        setImageCache(prev => ({
+          ...prev,
+          [imageId]: imageData
+        }));
+
+        return imageData;
+      }
+    } catch (error) {
+      console.error('Error fetching image:', error);
+    }
+    return null;
+  }, []);
+
+  const processDocument = async () => {
+    if (!selectedFile) return;
+
+    setProcessing(true);
+    setUploadProgress(0);
+    setOcrProgress(0);
+
+    try {
+      // Step 1: Upload PDF
+      const formData = new FormData();
+      formData.append('file', selectedFile);
+
+      setUploadProgress(30);
+      const uploadResponse = await fetch('/upload_pdf', {
+        method: 'POST',
+        body: formData,
+      });
+
+      if (!uploadResponse.ok) {
+        throw new Error('Failed to upload PDF');
+      }
+
+      const uploadData = await uploadResponse.json();
+      setUploadProgress(100);
+
+      // Step 2: Process OCR
+      setOcrProgress(20);
+      await new Promise(resolve => setTimeout(resolve, 500));
+
+      setOcrProgress(60);
+      const ocrResponse = await fetch(`/process_ocr/${uploadData.file_id}`);
+
+      if (!ocrResponse.ok) {
+        throw new Error('Failed to process OCR');
+      }
+
+      const ocrData = await ocrResponse.json();
+      setOcrProgress(100);
+
+      // Backend now provides combined markdown and correctly positioned chunks
+      setDocumentData({
+        fileId: uploadData.file_id,
+        filename: uploadData.filename,
+        markdown: ocrData.combined_markdown,
+        pages: ocrData.pages,
+        totalPages: ocrData.total_pages,
+        chunks: ocrData.chunks
+      });
+
+    } catch (error) {
+      console.error('Error processing document:', error);
+      alert('Error processing document: ' + error.message);
+    } finally {
+      setProcessing(false);
+    }
+  };
+
+  return {
+    fileInputRef,
+    selectedFile,
+    processing,
+    uploadProgress,
+    ocrProgress,
+    documentData,
+    imageCache,
+    handleFileChange,
+    fetchImage,
+    processDocument,
+    setSelectedFile
+  };
+};
frontend/src/hooks/usePanelResize.js
ADDED
@@ -0,0 +1,45 @@
+import { useState, useEffect, useRef } from 'react';
+
+export const usePanelResize = (initialWidth = 40) => {
+  const [leftPanelWidth, setLeftPanelWidth] = useState(initialWidth);
+  const [isDragging, setIsDragging] = useState(false);
+  const containerRef = useRef(null);
+
+  const handleMouseDown = (e) => {
+    setIsDragging(true);
+    e.preventDefault();
+  };
+
+  const handleMouseMove = (e) => {
+    if (!isDragging || !containerRef.current) return;
+
+    const containerRect = containerRef.current.getBoundingClientRect();
+    const newLeftWidth = ((e.clientX - containerRect.left) / containerRect.width) * 100;
+
+    if (newLeftWidth >= 20 && newLeftWidth <= 80) {
+      setLeftPanelWidth(newLeftWidth);
+    }
+  };
+
+  const handleMouseUp = () => {
+    setIsDragging(false);
+  };
+
+  useEffect(() => {
+    if (isDragging) {
+      document.addEventListener('mousemove', handleMouseMove);
+      document.addEventListener('mouseup', handleMouseUp);
+      return () => {
+        document.removeEventListener('mousemove', handleMouseMove);
+        document.removeEventListener('mouseup', handleMouseUp);
+      };
+    }
+  }, [isDragging]);
+
+  return {
+    leftPanelWidth,
+    isDragging,
+    containerRef,
+    handleMouseDown
+  };
+};
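The resize math in `usePanelResize` converts the mouse X position into a percentage of the container width and only accepts values inside the 20%–80% band, so neither panel can collapse. A pure-function sketch of that calculation (the function name is illustrative, not part of the hook):

```javascript
// Sketch of usePanelResize's drag math: map a clientX position to a
// left-panel width percentage, ignoring positions outside the band
// the hook enforces (returns null where the hook would keep the old width).
function computeLeftWidth(clientX, containerLeft, containerWidth, min = 20, max = 80) {
  const pct = ((clientX - containerLeft) / containerWidth) * 100;
  if (pct < min || pct > max) return null; // out-of-band drag: no update
  return pct;
}
```

Dragging to the midpoint of a 1000px container yields 50; dragging past either clamp leaves the layout unchanged.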
frontend/src/utils/markdownComponents.jsx
ADDED
@@ -0,0 +1,98 @@
+import ImageComponent from '../components/ImageComponent';
+
+export const getDocumentMarkdownComponents = (documentData, fetchImage, imageCache, setImageCache) => ({
+  h1: ({ children }) => <h1 style={{ fontSize: '1.5rem', fontWeight: 'bold', marginBottom: '1rem', color: '#1a202c' }}>{children}</h1>,
+  h2: ({ children }) => <h2 style={{ fontSize: '1.25rem', fontWeight: 'bold', marginBottom: '0.75rem', marginTop: '1.5rem', color: '#1a202c' }}>{children}</h2>,
+  h3: ({ children }) => <h3 style={{ fontSize: '1.125rem', fontWeight: 'bold', marginBottom: '0.5rem', marginTop: '1rem', color: '#1a202c' }}>{children}</h3>,
+  p: ({ children }) => <p style={{ marginBottom: '0.75rem', color: '#374151', lineHeight: '1.5', fontSize: '0.875rem' }}>{children}</p>,
+  hr: () => <hr style={{ margin: '1.5rem 0', borderColor: '#d1d5db' }} />,
+  ul: ({ children }) => <ul style={{ marginBottom: '0.75rem', marginLeft: '1.25rem', listStyleType: 'disc', fontSize: '0.875rem' }}>{children}</ul>,
+  ol: ({ children }) => <ol style={{ marginBottom: '0.75rem', marginLeft: '1.25rem', listStyleType: 'decimal', fontSize: '0.875rem' }}>{children}</ol>,
+  li: ({ children }) => <li style={{ marginBottom: '0.125rem', color: '#374151' }}>{children}</li>,
+  blockquote: ({ children }) => (
+    <blockquote style={{ borderLeft: '3px solid #3b82f6', paddingLeft: '0.75rem', fontStyle: 'italic', margin: '0.75rem 0', color: '#6b7280', fontSize: '0.875rem' }}>
+      {children}
+    </blockquote>
+  ),
+  code: ({ inline, children }) =>
+    inline ?
+      <code style={{ backgroundColor: '#f3f4f6', padding: '0.125rem 0.25rem', borderRadius: '0.25rem', fontSize: '0.75rem', fontFamily: 'monospace' }}>{children}</code> :
+      <pre style={{ backgroundColor: '#f3f4f6', padding: '0.75rem', borderRadius: '0.375rem', overflowX: 'auto', margin: '0.75rem 0' }}>
+        <code style={{ fontSize: '0.75rem', fontFamily: 'monospace' }}>{children}</code>
+      </pre>,
+  div: ({ children, style }) => (
+    <div style={style}>
+      {children}
+    </div>
+  ),
+  img: ({ src, alt }) => (
+    <ImageComponent
+      src={src}
+      alt={alt}
+      fileId={documentData?.fileId}
+      imageCache={imageCache}
+      onImageCached={(imageId, imageData) => {
+        setImageCache(prev => ({
+          ...prev,
+          [imageId]: imageData
+        }));
+      }}
+    />
+  )
+});
+
+export const getChunkMarkdownComponents = (documentData, fetchImage, imageCache, setImageCache) => ({
+  h1: ({ children }) => <h1 style={{ fontSize: '1.25rem', fontWeight: 'bold', marginBottom: '0.75rem', color: '#1a202c' }}>{children}</h1>,
+  h2: ({ children }) => <h2 style={{ fontSize: '1.125rem', fontWeight: 'bold', marginBottom: '0.5rem', marginTop: '1rem', color: '#1a202c' }}>{children}</h2>,
+  h3: ({ children }) => <h3 style={{ fontSize: '1rem', fontWeight: 'bold', marginBottom: '0.5rem', marginTop: '0.75rem', color: '#1a202c' }}>{children}</h3>,
+  p: ({ children }) => <p style={{ marginBottom: '0.5rem', color: '#374151', lineHeight: '1.4', fontSize: '0.875rem' }}>{children}</p>,
+  hr: () => <hr style={{ margin: '1rem 0', borderColor: '#d1d5db' }} />,
+  ul: ({ children }) => <ul style={{ marginBottom: '0.5rem', marginLeft: '1rem', listStyleType: 'disc', fontSize: '0.875rem' }}>{children}</ul>,
+  ol: ({ children }) => <ol style={{ marginBottom: '0.5rem', marginLeft: '1rem', listStyleType: 'decimal', fontSize: '0.875rem' }}>{children}</ol>,
+  li: ({ children }) => <li style={{ marginBottom: '0.125rem', color: '#374151' }}>{children}</li>,
+  blockquote: ({ children }) => (
+    <blockquote style={{ borderLeft: '2px solid #9ca3af', paddingLeft: '0.5rem', fontStyle: 'italic', margin: '0.5rem 0', color: '#6b7280', fontSize: '0.875rem' }}>
+      {children}
+    </blockquote>
+  ),
+  code: ({ inline, children }) =>
+    inline ?
+      <code style={{ backgroundColor: '#f3f4f6', padding: '0.125rem 0.25rem', borderRadius: '0.25rem', fontSize: '0.75rem', fontFamily: 'monospace' }}>{children}</code> :
+      <pre style={{ backgroundColor: '#f3f4f6', padding: '0.5rem', borderRadius: '0.25rem', overflowX: 'auto', margin: '0.5rem 0' }}>
+        <code style={{ fontSize: '0.75rem', fontFamily: 'monospace' }}>{children}</code>
+      </pre>,
+  img: ({ src, alt }) => (
+    <ImageComponent
+      src={src}
+      alt={alt}
+      fileId={documentData?.fileId}
+      imageCache={imageCache}
+      onImageCached={(imageId, imageData) => {
+        setImageCache(prev => ({
+          ...prev,
+          [imageId]: imageData
+        }));
+      }}
+    />
+  )
+});
+
+export const getChatMarkdownComponents = () => ({
+  p: ({ children }) => <p className="mb-2 text-gray-800 leading-relaxed">{children}</p>,
+  ul: ({ children }) => <ul className="mb-2 ml-4 list-disc">{children}</ul>,
+  ol: ({ children }) => <ol className="mb-2 ml-4 list-decimal">{children}</ol>,
+  li: ({ children }) => <li className="mb-1 text-gray-800">{children}</li>,
+  strong: ({ children }) => <strong className="font-semibold text-gray-900">{children}</strong>,
+  em: ({ children }) => <em className="italic">{children}</em>,
+  code: ({ inline, children }) =>
+    inline ?
+      <code className="bg-gray-100 px-1 py-0.5 rounded text-sm font-mono">{children}</code> :
+      <pre className="bg-gray-100 p-2 rounded overflow-x-auto my-2">
+        <code className="text-sm font-mono">{children}</code>
+      </pre>,
+  blockquote: ({ children }) => (
+    <blockquote className="border-l-4 border-blue-200 pl-4 italic text-gray-700 my-2">
+      {children}
+    </blockquote>
+  )
+});
frontend/src/utils/markdownUtils.js
ADDED
@@ -0,0 +1,33 @@
+export const highlightChunkInMarkdown = (markdown, chunks, currentChunkIndex) => {
+  if (!chunks || !chunks[currentChunkIndex] || !markdown) {
+    return markdown;
+  }
+
+  const chunk = chunks[currentChunkIndex];
+  const chunkText = markdown.slice(chunk.start_position, chunk.end_position);
+
+  console.log('Chunk debugging:', {
+    chunkIndex: currentChunkIndex,
+    startPos: chunk.start_position,
+    endPos: chunk.end_position,
+    chunkTextLength: chunkText.length,
+    chunkTextPreview: chunkText.substring(0, 50) + '...',
+    beforeText: markdown.slice(Math.max(0, chunk.start_position - 20), chunk.start_position),
+    afterText: markdown.slice(chunk.end_position, chunk.end_position + 20)
+  });
+
+  // Use markdown blockquote which preserves structure while providing visual distinction
+  const lines = chunkText.split('\n');
+  const highlightedLines = lines.map(line => {
+    if (line.trim() === '') return '>'; // Empty blockquote line
+    return '> ' + line;
+  });
+
+  const highlightedChunk = '\n\n> **Current Learning Section**\n>\n' +
+    highlightedLines.join('\n') +
+    '\n\n';
+
+  return markdown.slice(0, chunk.start_position) +
+    highlightedChunk +
+    markdown.slice(chunk.end_position);
+};
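`highlightChunkInMarkdown` wraps the chunk's character span in a markdown blockquote, so the highlighted region stays valid markdown while standing out visually. A standalone sketch of that transform (same logic, minus the hook wiring and debug logging), which also shows why this approach fails on fragmented chunks: the blockquote is spliced in purely by character offsets, so a chunk that starts or ends mid-paragraph produces a broken quote block.

```javascript
// Standalone sketch of the blockquote-highlight transform: slice out the
// chunk's span, prefix every line with '> ', and splice it back with a
// "Current Learning Section" header. `chunk` carries start/end offsets
// into the combined markdown, as produced by the backend.
function highlightChunk(markdown, chunk) {
  const chunkText = markdown.slice(chunk.start_position, chunk.end_position);
  const highlighted = chunkText
    .split('\n')
    .map(line => (line.trim() === '' ? '>' : '> ' + line)) // keep blank lines inside the quote
    .join('\n');
  return (
    markdown.slice(0, chunk.start_position) +
    '\n\n> **Current Learning Section**\n>\n' + highlighted + '\n\n' +
    markdown.slice(chunk.end_position)
  );
}
```

For a chunk covering a whole paragraph the output renders cleanly; when OCR fragments a paragraph around a figure, the offsets land mid-sentence and the quote block breaks the surrounding layout, which is the failure this commit documents.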