PDF Debugging Workflow
This guide details how to use the PDF Inspector tool to diagnose and remediate common PDF accessibility issues.
1. Initial Compatibility Check
Goal: Determine if the document requires major remediation before detailed analysis.
- Upload the PDF: Use the file uploader or select an example from the list.
- Run Single Page Analysis: Click "Analyze".
- Check for Alerts: Look for the "Accessibility Alert" box at the top of the summary.
- Untagged Document: If you see this, the document lacks the "Structure Tree" required for screen readers.
- Remediation: Open the source file (Word/PPT) and "Save as PDF" with tags enabled, or use Adobe Acrobat Pro's "Autotag" feature.
- Scanned Page: If you see this, the page is an image with no selectable text.
- Remediation: Perform Optical Character Recognition (OCR) using Adobe Acrobat or a similar tool.
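The two alerts above can be approximated with a rough byte-level check. This is only a sketch of the idea, not how the PDF Inspector itself works: the function names `quick_pdf_checks` and `alerts` are hypothetical, and scanning raw bytes will miss entries stored in compressed object streams, so a real PDF parser is the safer choice for anything beyond a quick triage.

```python
def quick_pdf_checks(pdf_bytes: bytes) -> dict:
    """Rough byte-level heuristics; not a substitute for a real PDF parser."""
    return {
        # A tagged PDF's catalog references a structure tree root.
        "has_struct_tree": b"/StructTreeRoot" in pdf_bytes,
        # Font resources are normally present when a page carries real text.
        "has_fonts": b"/Font" in pdf_bytes,
        # Image XObjects with no fonts at all hint at a scanned page.
        "has_images": b"/Subtype /Image" in pdf_bytes
        or b"/Subtype/Image" in pdf_bytes,
    }


def alerts(checks: dict) -> list:
    """Translate the heuristic flags into the alert names used in this guide."""
    found = []
    if not checks["has_struct_tree"]:
        found.append("Untagged Document")
    if checks["has_images"] and not checks["has_fonts"]:
        found.append("Scanned Page")
    return found
```

Typical use would be `alerts(quick_pdf_checks(open("report.pdf", "rb").read()))`, where `report.pdf` stands in for your own file.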
2. Detailed Single-Page Inspection
Goal: Verify reading order and content types on a specific page.
- Visual Inspection: Look at the "Analysis Results" image.
- Red Boxes: Indicate detected text blocks.
- Numbers: Show the reading order.
- Verify Reading Order:
- Does the order (1, 2, 3...) follow the logical flow of the document?
- Issue: If columns are read left-to-right across the page instead of down the column, the reading order is broken.
- Fix: This usually requires manual retagging in Acrobat (Order panel).
- Check for Artifacts:
- Are headers/footers marked as text blocks? (They should generally be marked as artifacts so that screen readers ignore them.)
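The column check above can also be done programmatically once block positions are known. The sketch below assumes a simple two-column page and a hypothetical `Block` record holding the reading-order number and bounding box of each red box. It counts how often consecutive blocks jump between the left and right half of the page.

```python
from dataclasses import dataclass


@dataclass
class Block:
    order: int  # reading-order number shown on the overlay
    x0: float
    y0: float
    x1: float
    y1: float


def column_switches(blocks, page_width):
    """Count jumps between the left and right page halves in reading order."""
    mid = page_width / 2
    ordered = sorted(blocks, key=lambda b: b.order)
    # Classify each block by the horizontal position of its center.
    sides = ["L" if (b.x0 + b.x1) / 2 < mid else "R" for b in ordered]
    return sum(1 for a, b in zip(sides, sides[1:]) if a != b)
```

On a well-ordered two-column page the count is 1 (one move from the left column to the right). A count that grows with the number of rows suggests the text is being read across the page. Full-width blocks such as titles are not handled separately here.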
3. Advanced Diagnostics
Goal: Deep dive into specific issues using the "Advanced Analysis" tab.
Content Stream Inspector
- Use when: Text looks correct visually but copies weirdly or reads wrong (e.g., "fi" ligature issues).
- Action: Select a block and click "Extract Operators".
- Look for:
- TJ or Tj operators showing garbled characters or strange spacing adjustments.
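To make "strange spacing adjustments" concrete, here is a minimal sketch that pulls Tj and TJ operators out of an already-decoded content stream and flags TJ arrays containing large numeric adjustments. It is an illustration under assumptions: the function name and the `max_adjust` threshold of 500 are arbitrary choices, and the patterns handle only literal `(...)` strings without nested parentheses or hex strings.

```python
import re

STRING = r"\((?:\\.|[^\\()])*\)"  # literal string, escapes allowed, no nesting
NUM = r"-?\d+(?:\.\d+)?"          # integer or decimal adjustment
TJ_SIMPLE = re.compile(rf"({STRING})\s*Tj\b")
TJ_ARRAY = re.compile(rf"\[((?:\s*(?:{STRING}|{NUM}))*)\s*\]\s*TJ\b")


def find_text_operators(stream: str, max_adjust: float = 500.0):
    """Return (operator, operand, flagged) tuples for Tj/TJ in a decoded stream."""
    results = []
    for m in TJ_SIMPLE.finditer(stream):
        results.append(("Tj", m.group(1), False))
    for m in TJ_ARRAY.finditer(stream):
        body = m.group(1)
        # Drop the strings first so digits inside text are not read as adjustments.
        numbers = [float(n) for n in re.findall(NUM, re.sub(STRING, " ", body))]
        flagged = any(abs(n) > max_adjust for n in numbers)
        results.append(("TJ", body.strip(), flagged))
    return results
```

TJ adjustments are expressed in thousandths of a text-space unit, so a value such as -1200 inside a word is a visible gap that often shows up as a stray space when the text is copied.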
Screen Reader Simulator
- Use when: You want to "hear" what a user hears.
- Action: Select "NVDA" and click "Generate Transcript".
- Check:
- Are headings announced as "Heading Level X"?
- Is alt text read for images?
- Is the reading order intelligible?
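The checks above can be pictured with a toy transcript generator. This is a simplified sketch, not NVDA's actual output and not the tool's implementation: it assumes a hypothetical flat list of `(tag, text, alt)` tuples taken from the structure tree in reading order.

```python
import re


def transcript(elements):
    """Turn (tag, text, alt) tuples into announcement lines, in order."""
    lines = []
    for tag, text, alt in elements:
        heading = re.fullmatch(r"H([1-6])", tag)
        if heading:
            lines.append(f"Heading Level {heading.group(1)}: {text}")
        elif tag == "Figure":
            # Missing alt text is called out so it is easy to spot.
            lines.append(f"Graphic: {alt}" if alt else "Graphic: (no alt text)")
        else:
            lines.append(text)
    return lines
```

A heading tagged as plain `P` would appear without the "Heading Level" prefix, which is the symptom to look for in the transcript.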
Paragraph Detection
- Use when: Text seems run-on or broken into too many fragments.
- Action: Click "Analyze Paragraphs".
- Check:
- Visual vs. Semantic: Large discrepancies suggest the <P> tags don't match the visual layout, which can confuse users navigating by paragraph.
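One way to quantify the visual vs. semantic comparison is sketched below. It assumes text lines are available as `(top, bottom)` pairs in page order with y growing downward, and that a vertical gap larger than `gap_factor` times the line height starts a new visual paragraph. Both function names and the 0.6 default are illustrative assumptions.

```python
def count_visual_paragraphs(lines, gap_factor=0.6):
    """Count paragraphs by looking for large vertical gaps between lines."""
    if not lines:
        return 0
    count = 1
    for (top_a, bottom_a), (top_b, _) in zip(lines, lines[1:]):
        line_height = bottom_a - top_a
        if top_b - bottom_a > gap_factor * line_height:
            count += 1
    return count


def paragraph_mismatch(visual, semantic):
    """Relative discrepancy between visual and tagged paragraph counts (0 to 1)."""
    return abs(visual - semantic) / max(visual, semantic, 1)
```

For example, two visual paragraphs tagged as five separate <P> elements give a mismatch of 0.6, which points to over-fragmented tagging.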
Structure Tree Visualizer
- Use when: The document is tagged, but navigation is broken.
- Action: Click "Extract Structure Tree".
- Check:
- Hierarchy depth.
- Correct nesting (e.g., L -> LI -> LBody).
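Both checks can be expressed as a small recursive walk. The sketch below assumes the extracted tree is represented as nested `(tag, children)` tuples, which is a hypothetical format chosen for illustration, and its rule table covers only the list tags mentioned above.

```python
# Which parent tags each list-related tag is allowed to sit under.
ALLOWED_PARENTS = {"LI": {"L"}, "Lbl": {"LI"}, "LBody": {"LI"}}


def check_tree(node, parent=None, depth=1, path=""):
    """Return (max_depth, problems) for a (tag, children) structure tree."""
    tag, children = node
    here = f"{path}/{tag}"
    problems = []
    allowed = ALLOWED_PARENTS.get(tag)
    if allowed is not None and parent not in allowed:
        problems.append(f"{here}: <{tag}> under <{parent}>")
    max_depth = depth
    for child in children:
        child_depth, child_problems = check_tree(child, tag, depth + 1, here)
        max_depth = max(max_depth, child_depth)
        problems.extend(child_problems)
    return max_depth, problems
```

Each problem string carries the path to the offending element, which makes it easier to locate in the Tags panel during remediation.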
4. Batch Analysis for Large Documents
Goal: Identify problematic pages in a long report.
- Go to Batch Analysis Tab.
- Run Batch: Analyze 50-100 pages.
- Review the Report:
- Issues Found: Look for "Scanned Pages" or "Garbled Text".
- Page List: Use the list of page numbers to target your remediation efforts.
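If you export or copy the per-page results, grouping them by issue gives the page list directly. This is a minimal sketch assuming a hypothetical input of `(page_number, [issue, ...])` pairs. The tool's actual report format may differ.

```python
from collections import defaultdict


def summarize(page_results):
    """Group page numbers by issue name."""
    by_issue = defaultdict(list)
    for page, issues in page_results:
        for issue in issues:
            by_issue[issue].append(page)
    return {issue: sorted(pages) for issue, pages in by_issue.items()}
```

Pages with no issues simply do not appear in the result, so the output doubles as a remediation to-do list.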
Summary Checklist
- Document is Tagged (/StructTreeRoot exists)
- Text is selectable (not an image/scan)
- Reading order is logical (columns handled correctly)
- Images have Alt Text (or are marked as artifacts)
- Headings use Heading tags (<H1>, <H2>), not just bold text.