# PDF Inspector - Test Plan

## Overview

This test plan outlines verification steps for the PDF Inspector application using the provided example documents. Since all currently included examples are **untagged** documents, this plan focuses on verifying the "Untagged" detection logic, the fallback heuristics (math detection, reading order), and error handling.

## Test Environment

- **URL**: http://127.0.0.1:7860
- **Browsers**: Chrome / Safari / Firefox (any modern browser)

---

## 1. Test Case: Untagged Document Detection

**Target Document**: `test_document.pdf`

| Step | Action | Expected Result | Pass/Fail |
|------|--------|-----------------|-----------|
| 1.1 | Select `test_document.pdf` from Examples. | File loads into the input box. | |
| 1.2 | Click **Analyze** button. | Analysis completes; "Analysis Results" image appears. | |
| 1.3 | Check Summary Report. | **Alert**: "⚠️ Accessibility Alert: Untagged Document" is visible. | |
| 1.4 | Go to **Advanced Analysis** tab. | Tab opens. | |
| 1.5 | Open **4. Structure Tree Visualizer** and click **Extract**. | **Result**: "## No Structure Tree Found" message. | |

**Success Criteria**: The application correctly identifies the document as untagged and prevents structure-dependent tools from crashing.

---

## 2. Test Case: Math & Visual Block Detection

**Target Document**: `18.1 Notes.pdf` (Handwritten/Math Slides)

| Step | Action | Expected Result | Pass/Fail |
|------|--------|-----------------|-----------|
| 2.1 | Select `18.1 Notes.pdf` from Examples. | File loads. | |
| 2.2 | Click **Analyze** button. | Analysis completes (~1-2 seconds). | |
| 2.3 | Inspect "Page overlay" image. | - **Red Boxes**: Detected around text blocks.<br>- **Math Highlight**: Math formulas (e.g., integrals, sums) should have specific bounding boxes. | |
| 2.4 | Check Summary Report. | **Alert**: "Untagged Document".<br>**Stats**: Should show > 0 "Math-like blocks detected". | |

**Success Criteria**: The heuristic regex-based math detection works on the text extracted from the slides.

---

## 3. Test Case: Screen Reader Simulation (Untagged Fallback)

**Target Document**: `logic.pdf` (Academic Text)

| Step | Action | Expected Result | Pass/Fail |
|------|--------|-----------------|-----------|
| 3.1 | Select `logic.pdf`. | File loads. | |
| 3.2 | Click **Analyze**. | Analysis completes. | |
| 3.3 | Go to **Advanced Analysis** -> **2. Screen Reader Simulator**. | Accordion opens. | |
| 3.4 | Set **Reading Order** to "Raw" or "TBLR". | Settings accepted. | |
| 3.5 | Click **Generate Transcript**. | **Result**: Transcript appears in the textbox.<br>**Header**: "⚠️ Simulated from visual order (PDF not tagged)".<br>**Content**: Contains readable text (e.g., "A Logical Interpretation..."). | |

**Success Criteria**: The simulator successfully uses the fallback logic (visual ordering) instead of crashing when no structure tree is present.

---

## 4. Test Case: Feature Availability Check (Negative Testing)

**Target Document**: Any of the above

| Step | Action | Expected Result | Pass/Fail |
|------|--------|-----------------|-----------|
| 4.1 | Open **5. Block-to-Tag Mapping**. | Accordion opens. | |
| 4.2 | Click **Map Blocks to Tags**. | **Result**: "## No Mappings Found" (because there are no tags). | |
| 4.3 | Open **3. Paragraph Detection** and click **Analyze**. | **Result**: Visual paragraphs are detected (green boxes), but the **Semantic Tags** count is 0. | |

**Success Criteria**: Features requiring tags explicitly state that tags are missing rather than showing empty/broken UIs.

### 1.6 Landscape / Rotated Documents

- **Why**: Ensure overlays align correctly on rotated pages.
- **Test**:
  - Load a PDF with landscape pages (or 90-degree rotation).
  - Verify that the blue/red bounding boxes align with the text.
  - Verify that "reading order" flows logically (e.g., starting from the top-left of the *visual* page).

## Known Limitations / Expected Behavior

* **Untagged Alerts**: All examples provided are untagged; the alert is **expected behavior**.
* **Reading Order**: Without tags, reading order is a guess. Columns might be read left-to-right across the page in "Raw" mode.
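
The column behavior noted under **Reading Order** can be reproduced with a small sketch. It assumes text blocks are available as bounding boxes with a top-left origin; the `Block` type, the `tblr_order` function, and the `row_tolerance` parameter are hypothetical names for illustration, not PDF Inspector's actual implementation. A plain top-to-bottom, left-to-right sort interleaves the lines of a two-column page:

```python
from typing import NamedTuple


class Block(NamedTuple):
    """Hypothetical text block: bounding box with a top-left origin."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def tblr_order(blocks: list[Block], row_tolerance: float = 5.0) -> list[Block]:
    """Order blocks top-to-bottom, then left-to-right within each row.

    A block joins the current row when its top edge is within
    row_tolerance of the top edge of the first block in that row.
    """
    rows: list[list[Block]] = []
    for block in sorted(blocks, key=lambda b: b.y0):
        if rows and abs(block.y0 - rows[-1][0].y0) <= row_tolerance:
            rows[-1].append(block)
        else:
            rows.append([block])
    # Flatten the rows, sorting each one by its left edge.
    return [b for row in rows for b in sorted(row, key=lambda b: b.x0)]


# Two-column page: the intended reading order is L1, L2, R1, R2.
page = [
    Block("L1", 50, 100, 280, 115),
    Block("L2", 50, 120, 280, 135),
    Block("R1", 320, 100, 550, 115),
    Block("R2", 320, 120, 550, 135),
]

print([b.text for b in tblr_order(page)])  # ['L1', 'R1', 'L2', 'R2']
```

When testing multi-column documents, a transcript that alternates between columns in this way is consistent with the limitation described above rather than a new defect.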