# PDF Debugging Workflow

This guide details how to use the PDF Inspector tool to diagnose and remediate common PDF accessibility issues.

## 1. Initial Compatibility Check

**Goal**: Determine whether the document requires major remediation before detailed analysis.

1. **Upload the PDF**: Use the file uploader or select an example from the list.
2. **Run Single Page Analysis**: Click "Analyze".
3. **Check for Alerts**: Look for the "Accessibility Alert" box at the top of the summary.
   * **Untagged Document**: If you see this, the document lacks the "Structure Tree" required for screen readers.
     * *Remediation*: Open the source file (Word/PowerPoint) and "Save as PDF" with tags enabled, or use Adobe Acrobat Pro's "Autotag" feature.
   * **Scanned Page**: If you see this, the page is an image with no selectable text.
     * *Remediation*: Perform Optical Character Recognition (OCR) using Adobe Acrobat or a similar tool.
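
This guide doesn't describe how the tool computes these alerts. The following is a minimal sketch of how the two checks could work. It assumes the document catalog has already been parsed into a dictionary (for example, by a PDF library) and that each page's extracted text and image coverage are already known. `PageInfo`, `compatibility_alerts`, and both thresholds are hypothetical.

```python
# Hypothetical sketch, not the PDF Inspector's actual implementation.
from dataclasses import dataclass


@dataclass
class PageInfo:
    text: str              # text extracted from the page
    image_coverage: float  # fraction of the page area covered by images (0.0 to 1.0)


def compatibility_alerts(catalog: dict, page: PageInfo,
                         min_chars: int = 20, min_coverage: float = 0.9) -> list:
    """Return alert labels for one page, mirroring the two checks above."""
    alerts = []
    # A tagged PDF's document catalog points to its structure tree via /StructTreeRoot.
    if "/StructTreeRoot" not in catalog:
        alerts.append("Untagged Document")
    # Heuristic: (almost) no extractable text plus a near-full-page image suggests a scan.
    visible_chars = sum(1 for ch in page.text if not ch.isspace())
    if visible_chars < min_chars and page.image_coverage >= min_coverage:
        alerts.append("Scanned Page")
    return alerts


print(compatibility_alerts({"/Type": "/Catalog"}, PageInfo(text="", image_coverage=1.0)))
# ['Untagged Document', 'Scanned Page']
```

Keeping the two checks independent matches the guide's remediation split. Tagging is a document-level fix, while OCR is applied to the specific pages that are scans.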

## 2. Detailed Single-Page Inspection

**Goal**: Verify reading order and content types on a specific page.

1. **Visual Inspection**: Look at the "Analysis Results" image.
   * **Red Boxes**: Indicate detected text blocks.
   * **Numbers**: Show the reading order.
2. **Verify Reading Order**:
   * Does the order (1, 2, 3...) follow the logical flow of the document?
   * *Issue*: If a multi-column layout is read left-to-right across the page instead of down each column in turn, the reading order is broken.
   * *Fix*: This usually requires manual retagging in Acrobat (Order panel).
3. **Check for Artifacts**:
   * Are headers/footers marked as text blocks? (They should generally be marked as artifacts so that screen readers ignore them.)
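
The reading-order check and the artifact check can both be approximated from block coordinates. The sketch below is a heuristic and relies on several assumptions:

* the page has two columns;
* each block is a bounding box in PDF points, with y increasing upward;
* the header/footer band is 8% of the page height.

`Block`, `column_switches`, and `margin_blocks` are hypothetical names, not part of the tool.

```python
# Hypothetical heuristics, not the PDF Inspector's actual implementation.
from dataclasses import dataclass


@dataclass
class Block:
    order: int   # reading-order number shown on the overlay
    x0: float    # left edge, PDF points
    y0: float    # bottom edge (PDF y grows upward)
    x1: float    # right edge
    y1: float    # top edge
    text: str = ""


def column_switches(blocks, page_width):
    """Count jumps between the left and right column when following the reading order.

    On a correctly ordered two-column page this is at most 1 (left column, then right);
    higher values suggest the page is being read straight across both columns."""
    mid = page_width / 2
    columns = [0 if (b.x0 + b.x1) / 2 < mid else 1
               for b in sorted(blocks, key=lambda b: b.order)]
    return sum(1 for a, b in zip(columns, columns[1:]) if a != b)


def margin_blocks(blocks, page_height, margin=0.08):
    """Return blocks lying entirely in the top or bottom margin band (header/footer candidates)."""
    top_band = page_height * (1 - margin)
    bottom_band = page_height * margin
    return [b for b in blocks if b.y0 >= top_band or b.y1 <= bottom_band]
```

Blocks returned by `margin_blocks` that still appear as numbered text blocks are candidates for artifacting. A full-width element such as a title is assigned to a column by its center, so it can add a spurious switch.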

## 3. Advanced Diagnostics

**Goal**: Deep dive into specific issues using the "Advanced Analysis" tab.

### Content Stream Inspector

* **Use when**: Text looks correct visually but copies or reads incorrectly (e.g., "fi" ligature issues).
* **Action**: Select a block and click "Extract Operators".
* **Look for**: `Tj` or `TJ` operators showing garbled characters or unusual spacing adjustments.
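
A content stream is a sequence of operands, each group followed by an operator. Listing the text-showing operators means tokenizing the stream and keeping the operands of `Tj`, `TJ`, `'`, and `"`. The sketch below is a minimal, standard-library-only tokenizer with hypothetical function names. It assumes the stream has already been decompressed (filters such as FlateDecode removed) and contains no inline images (`BI`/`ID`/`EI`).

Strings are returned as raw bytes because they are character codes in the font's encoding, not Unicode. That is why a font without a correct `/ToUnicode` map extracts as garbled text. In `TJ` arrays, the numbers are position adjustments in thousandths of a text-space unit, and large values often stand in for word spaces.

```python
# Minimal content-stream tokenizer: a hypothetical sketch, not the tool's code.
WHITESPACE = set(b" \t\r\n\f\x00")
DELIMITERS = set(b"()<>[]{}/%")
ESCAPES = {ord("n"): 10, ord("r"): 13, ord("t"): 9, ord("b"): 8, ord("f"): 12}
TEXT_SHOW_OPS = {"Tj", "TJ", "'", '"'}  # operators that paint text


def tokenize(data: bytes):
    """Yield (kind, value) pairs: 'string', 'name', 'number', 'delim', or 'op'."""
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c in WHITESPACE:
            i += 1
        elif c == ord("%"):                        # comment runs to end of line
            while i < n and data[i] not in b"\r\n":
                i += 1
        elif c == ord("("):                        # literal string; parentheses may nest
            depth, i, buf = 1, i + 1, bytearray()
            while i < n:
                ch = data[i]
                if ch == ord("\\") and i + 1 < n:  # escape sequence
                    nxt = data[i + 1]
                    if nxt in ESCAPES:
                        buf.append(ESCAPES[nxt])
                        i += 2
                    elif 48 <= nxt <= 55:          # \ddd octal, up to three digits
                        j = i + 1
                        while j < n and j < i + 4 and 48 <= data[j] <= 55:
                            j += 1
                        buf.append(int(data[i + 1:j], 8) & 0xFF)
                        i = j
                    else:                          # \( \) \\ (and unknown escapes)
                        buf.append(nxt)
                        i += 2
                    continue
                i += 1
                if ch == ord("("):
                    depth += 1
                elif ch == ord(")"):
                    depth -= 1
                    if depth == 0:
                        break
                buf.append(ch)
            yield ("string", bytes(buf))
        elif data[i:i + 2] in (b"<<", b">>"):      # dictionary delimiters (e.g. BDC properties)
            yield ("delim", data[i:i + 2].decode())
            i += 2
        elif c == ord("<"):                        # hex string
            end = data.index(b">", i)
            digits = bytes(ch for ch in data[i + 1:end] if ch not in WHITESPACE)
            if len(digits) % 2:
                digits += b"0"                     # odd digit count: pad the final digit with 0
            yield ("string", bytes.fromhex(digits.decode()))
            i = end + 1
        elif c in b"[]":
            yield ("delim", chr(c))
            i += 1
        else:                                      # name, number, or operator
            j = i + 1
            while j < n and data[j] not in WHITESPACE and data[j] not in DELIMITERS:
                j += 1
            word = data[i:j].decode("latin-1")
            i = j
            try:
                token = ("number", float(word))
            except ValueError:
                token = ("name" if word.startswith("/") else "op", word)
            yield token


def text_show_operations(data: bytes):
    """Return (operator, operands) for each text-showing operator, in stream order."""
    results, stack, saved = [], [], []
    for kind, value in tokenize(data):
        if kind == "delim" and value == "[":
            saved.append(stack)                    # start collecting an array
            stack = []
        elif kind == "delim" and value == "]":
            array, stack = stack, saved.pop()
            stack.append(array)
        elif kind == "op":
            if value in TEXT_SHOW_OPS:
                results.append((value, stack))
            stack = []                             # every operator consumes its operands
        else:
            stack.append(value)
    return results


stream = b"BT /F1 12 Tf 72 700 Td [(Hello) -250 (W) 30 (orld)] TJ <66> Tj ET"
print(text_show_operations(stream))
# [('TJ', [[b'Hello', -250.0, b'W', 30.0, b'orld']]), ('Tj', [b'f'])]
```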

### Screen Reader Simulator

* **Use when**: You want to "hear" what a user hears.
* **Action**: Select "NVDA" and click "Generate Transcript".
* **Check**:
  * Are headings announced as "Heading Level X"?
  * Is alt text read for images?
  * Is the reading order intelligible?
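
A transcript like this can be approximated by walking the tagged elements in reading order and announcing each one by type. The sketch assumes the structure tree has already been flattened into a list of elements. Its wording only imitates NVDA and is not NVDA's exact output. `Element` and `transcript` are hypothetical.

```python
# Hypothetical sketch: wording imitates, but does not reproduce, NVDA's announcements.
from dataclasses import dataclass
from typing import List, Optional

HEADINGS = {"H1", "H2", "H3", "H4", "H5", "H6"}


@dataclass
class Element:
    tag: str                   # standard structure type, e.g. "H1", "P", "Figure"
    text: str = ""
    alt: Optional[str] = None  # the element's /Alt text, if any


def transcript(elements: List[Element]) -> List[str]:
    """Return one announcement per element, in the order given (assumed reading order)."""
    lines = []
    for el in elements:
        if el.tag in HEADINGS:
            lines.append(f"heading level {el.tag[1]}, {el.text}")
        elif el.tag == "Figure":
            lines.append(f"graphic, {el.alt}" if el.alt else "graphic (no alt text)")
        else:
            lines.append(el.text)
    return lines


doc = [Element("H1", "Annual Report"), Element("Figure", alt="Revenue by quarter"),
       Element("Figure"), Element("P", "Revenue grew in every quarter.")]
print("\n".join(transcript(doc)))
# heading level 1, Annual Report
# graphic, Revenue by quarter
# graphic (no alt text)
# Revenue grew in every quarter.
```

A bold paragraph styled to look like a heading would come out here as plain text with no "heading level" prefix. That is exactly what the first check is meant to catch.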

### Paragraph Detection

* **Use when**: Text seems run-on or broken into too many fragments.
* **Action**: Click "Analyze Paragraphs".
* **Check**:
  * **Visual vs. Semantic**: Compare the paragraphs detected from the visual layout with the document's `<P>` tags. A large discrepancy suggests the tags don't match the layout, which can confuse users navigating by paragraph.
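
One simple way to derive "visual" paragraphs is to start a new paragraph wherever the vertical gap between consecutive lines is noticeably larger than the line height. This approach is an assumption, not taken from the tool. The hypothetical `compare_paragraphs` then flags a page when that count differs from the number of `<P>` tags by more than a tolerance. The default tolerance of 25% is an arbitrary choice.

```python
# Hypothetical gap-based heuristic, not the tool's documented algorithm.
from typing import List, Tuple


def visual_paragraph_count(baselines: List[float], line_height: float,
                           gap_factor: float = 1.5) -> int:
    """Count paragraphs from line baselines listed top to bottom (PDF y decreases downward).

    A new paragraph starts wherever the drop between consecutive baselines
    exceeds gap_factor * line_height."""
    if not baselines:
        return 0
    count = 1
    for prev, cur in zip(baselines, baselines[1:]):
        if prev - cur > gap_factor * line_height:
            count += 1
    return count


def compare_paragraphs(baselines: List[float], line_height: float,
                       p_tag_count: int, tolerance: float = 0.25) -> Tuple[int, bool]:
    """Return (visual paragraph count, flagged), flagging large visual/semantic mismatches."""
    visual = visual_paragraph_count(baselines, line_height)
    flagged = abs(visual - p_tag_count) > tolerance * max(visual, 1)
    return visual, flagged


lines = [700, 686, 672, 640, 626, 594]  # three paragraphs: 14 pt leading, larger gaps between
print(compare_paragraphs(lines, 14, p_tag_count=3))  # (3, False)
print(compare_paragraphs(lines, 14, p_tag_count=6))  # (3, True): one <P> per line, fragmented
```

On multi-column pages, the lines would need to be grouped by column first. Otherwise baselines from different columns interleave and the gaps are meaningless.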

### Structure Tree Visualizer

* **Use when**: The document is tagged, but navigation is broken.
* **Action**: Click "Extract Structure Tree".
* **Check**:
  * Hierarchy depth.
  * Correct nesting (e.g., `L` -> `LI` -> `LBody`).
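
Both checks can be expressed as one recursive walk that records the maximum depth and tests each child against its parent's allowed child types. The rules below cover only lists and follow ISO 32000-1: an `L` holds an optional `Caption` followed by `LI` items, and an `LI` holds `Lbl` and/or `LBody`. A real validator would cover many more element types. `Node`, `ALLOWED_CHILDREN`, and `check_tree` are hypothetical.

```python
# Hypothetical validator covering only list elements.
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    tag: str
    children: tuple = ()


# Allowed child types for list elements, following ISO 32000-1.
ALLOWED_CHILDREN = {
    "L": {"Caption", "LI"},   # optional caption, then list items
    "LI": {"Lbl", "LBody"},   # label and/or list body
}


def check_tree(node: Node, depth: int = 1, path: str = "") -> Tuple[int, List[str]]:
    """Return (maximum depth, nesting problems) for the subtree rooted at node."""
    here = f"{path}/{node.tag}"
    max_depth, problems = depth, []
    allowed = ALLOWED_CHILDREN.get(node.tag)
    for child in node.children:
        if allowed is not None and child.tag not in allowed:
            problems.append(f"{here}: <{child.tag}> is not a valid child of <{node.tag}>")
        child_depth, child_problems = check_tree(child, depth + 1, here)
        max_depth = max(max_depth, child_depth)
        problems.extend(child_problems)
    return max_depth, problems


good = Node("Document", (Node("H1"), Node("L", (Node("LI", (Node("Lbl"), Node("LBody"))),))))
bad = Node("Document", (Node("L", (Node("P"),)),))
print(check_tree(good))  # (4, [])
print(check_tree(bad))   # (3, ['/Document/L: <P> is not a valid child of <L>'])
```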

## 4. Batch Analysis for Large Documents

**Goal**: Identify problematic pages in a long report.

1. **Go to the Batch Analysis tab**.
2. **Run Batch**: Analyze 50-100 pages.
3. **Review the Report**:
   * **Issues Found**: Look for "Scanned Pages" or "Garbled Text".
   * **Page List**: Use the list of page numbers to target your remediation efforts.
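
Turning per-page findings into a remediation list means inverting a page -> issues mapping into issue -> pages. The sketch assumes the batch results are available as a dictionary keyed by page number. The issue labels mirror the report's wording, but the data shape and the function names are hypothetical.

```python
# Hypothetical data shape: {page_number: [issue labels]} from a batch run.
from collections import defaultdict
from typing import Dict, List


def pages_by_issue(results: Dict[int, List[str]]) -> Dict[str, List[int]]:
    """Invert per-page findings into {issue: sorted page numbers}."""
    index = defaultdict(list)
    for page in sorted(results):
        for issue in results[page]:
            index[issue].append(page)
    return dict(index)


def to_ranges(pages: List[int]) -> str:
    """Collapse a sorted list of distinct pages such as [3, 4, 5, 9] into '3-5, 9'."""
    parts, start = [], None
    for i, page in enumerate(pages):
        if start is None:
            start = page
        if i + 1 == len(pages) or pages[i + 1] != page + 1:
            parts.append(str(start) if start == page else f"{start}-{page}")
            start = None
    return ", ".join(parts)


batch = {1: [], 2: ["Scanned Pages"], 3: ["Scanned Pages", "Garbled Text"],
         4: ["Scanned Pages"], 7: ["Garbled Text"]}
for issue, pages in pages_by_issue(batch).items():
    print(f"{issue}: pages {to_ranges(pages)}")
# Scanned Pages: pages 2-4
# Garbled Text: pages 3, 7
```

Grouping by issue fits the remediation step well: all scanned pages can go through one OCR pass, and all pages with garbled text can be checked for the same font problem.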

## Summary Checklist

- [ ] Document is tagged (`/StructTreeRoot` exists)
- [ ] Text is selectable (not an image/scan)
- [ ] Reading order is logical (columns handled correctly)
- [ ] Images have alt text (or are marked as artifacts)
- [ ] Headings use heading tags (`<H1>`, `<H2>`), not just bold text
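
The last checklist item is the easiest to miss by eye. A rough heuristic flags short blocks that are bold or noticeably larger than body text but are not tagged as headings. This heuristic is an assumption, not taken from the tool. Expect false positives (for example, bold run-in labels), so treat the output as a review list. `TextBlock`, `fake_headings`, and the thresholds are hypothetical.

```python
# Hypothetical heuristic; thresholds are arbitrary starting points.
from dataclasses import dataclass
from typing import List

HEADING_TAGS = {"H", "H1", "H2", "H3", "H4", "H5", "H6"}


@dataclass
class TextBlock:
    tag: str          # structure tag the block is mapped to, e.g. "P" or "H2"
    text: str
    font_size: float  # points
    bold: bool


def fake_headings(blocks: List[TextBlock], body_size: float,
                  max_words: int = 12, size_ratio: float = 1.15) -> List[str]:
    """Return the text of short bold/enlarged blocks that are not tagged as headings."""
    flagged = []
    for b in blocks:
        styled = b.bold or b.font_size >= body_size * size_ratio
        short = len(b.text.split()) <= max_words
        if styled and short and b.tag not in HEADING_TAGS:
            flagged.append(b.text)
    return flagged


page = [TextBlock("H1", "Annual Report", 24, True),
        TextBlock("P", "Key Findings", 14, True),
        TextBlock("P", "Revenue grew in every region, driven by new customers.", 11, False)]
print(fake_headings(page, body_size=11))  # ['Key Findings']
```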