
	# PDF Debugging Workflow

	This guide details how to use the PDF Inspector tool to diagnose and remediate common PDF accessibility issues.

	## 1. Initial Compatibility Check
	Goal: Determine if the document requires major remediation before detailed analysis.

	1. Upload the PDF: Use the file uploader or select an example from the list.
	2. Run Single Page Analysis: Click "Analyze".
	3. Check for Alerts: Look for the "Accessibility Alert" box at the top of the summary.
	   * Untagged Document: If you see this, the document lacks the "Structure Tree" required for screen readers.
	     * Remediation: Open the source file (Word/PPT) and "Save as PDF" with tags enabled, or use Adobe Acrobat Pro's "Autotag" feature.
	   * Scanned Page: If you see this, the page is an image with no selectable text.
	     * Remediation: Perform Optical Character Recognition (OCR) using Adobe Acrobat or a similar tool.
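The two alerts above reduce to a small decision rule. The following is a minimal sketch of that rule, not the tool's actual implementation: the `PageFacts` fields, the `triage` function, and both thresholds are hypothetical names and values chosen for illustration, and the facts themselves are assumed to come from whatever PDF library you use.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageFacts:
    """Facts about one page, assumed to be extracted by a PDF library."""
    has_struct_tree: bool   # document catalog contains /StructTreeRoot
    text_chars: int         # number of extractable characters on the page
    image_coverage: float   # fraction of the page area covered by images (0.0 to 1.0)


def triage(facts: PageFacts, min_chars: int = 20,
           min_image_coverage: float = 0.8) -> List[str]:
    """Return alert names in the spirit of the "Accessibility Alert" box.

    The thresholds are illustrative assumptions, not values taken from the tool.
    """
    alerts = []
    if not facts.has_struct_tree:
        alerts.append("Untagged Document")
    # Almost no text plus a page-sized image suggests a scan that needs OCR.
    if facts.text_chars < min_chars and facts.image_coverage >= min_image_coverage:
        alerts.append("Scanned Page")
    return alerts


if __name__ == "__main__":
    print(triage(PageFacts(has_struct_tree=False, text_chars=0, image_coverage=0.95)))
```

A page that triggers both alerts needs OCR first, then tagging, because tags have nothing to attach to until the text exists.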

	## 2. Detailed Single-Page Inspection
	Goal: Verify reading order and content types on a specific page.

	1. Visual Inspection: Look at the "Analysis Results" image.
	   * Red Boxes: Indicate detected text blocks.
	   * Numbers: Show the reading order.
	2. Verify Reading Order:
	   * Does the order (1, 2, 3...) follow the logical flow of the document?
	   * Issue: If columns are read left-to-right across the page instead of down each column, the reading order is broken.
	   * Fix: This usually requires manual retagging in Acrobat (Order panel).
	3. Check for Artifacts:
	   * Are headers/footers marked as text blocks? They should generally be marked as artifacts so that screen readers ignore them.
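The column check in step 2 can be approximated in code. This is a rough sketch under stated assumptions: each block is a hypothetical `(number, x_left, y_top)` tuple with `y` growing downward, columns are detected only by similar left edges, and `column_gap` is an arbitrary value. Full-width headings or irregular layouts would need a more careful approach.

```python
from typing import List, Tuple

# (reading_order_number, x_left, y_top); y grows downward. Hypothetical shape.
Block = Tuple[int, float, float]


def column_major_order(blocks: List[Block], column_gap: float = 50.0) -> List[int]:
    """Return block numbers read down each column, columns left to right."""
    columns: List[List[Block]] = []
    for block in sorted(blocks, key=lambda b: b[1]):  # sweep left to right
        if columns and block[1] - columns[-1][0][1] < column_gap:
            columns[-1].append(block)   # left edge close to this column's first block
        else:
            columns.append([block])     # far enough right to start a new column
    ordered: List[int] = []
    for column in columns:
        ordered.extend(n for n, _, _ in sorted(column, key=lambda b: b[2]))
    return ordered


def reading_order_matches(blocks: List[Block]) -> bool:
    """True when the numbered order already follows the columns."""
    return column_major_order(blocks) == sorted(n for n, _, _ in blocks)


if __name__ == "__main__":
    # Two columns numbered across the page (1, 2 on the first row): broken order.
    across = [(1, 50, 100), (2, 320, 100), (3, 50, 300), (4, 320, 300)]
    print(column_major_order(across), reading_order_matches(across))
```

When the function returns `False`, the returned column-major list gives the order to aim for while retagging.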

	## 3. Advanced Diagnostics
	Goal: Investigate specific issues in depth using the "Advanced Analysis" tab.

	### Content Stream Inspector
	* Use when: Text looks correct visually but copies incorrectly or is read aloud wrong (e.g., "fi" ligature issues).
	* Action: Select a block and click "Extract Operators".
	* Look for: `TJ` or `Tj` operators showing garbled characters or strange spacing adjustments.
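To make "strange spacing adjustments" concrete: in a `TJ` array, strings alternate with numbers expressed in thousandths of a text space unit, and each number is subtracted from the current position, so a large negative value opens a wide gap in horizontal text. The sketch below is a simplified illustration, not the tool's parser. It assumes literal strings without nested parentheses, ignores hex strings, and uses hypothetical names and an arbitrary `limit`.

```python
import re
import unicodedata
from typing import List, Tuple, Union

# A literal string "(...)" without nested parentheses, or a number.
_TJ_TOKEN = re.compile(r"\(((?:\\.|[^\\()])*)\)|(-?(?:\d+\.?\d*|\.\d+))")

Token = Tuple[str, Union[str, float]]


def parse_tj_array(array_body: str) -> List[Token]:
    """Split the inside of a TJ array into ("text", str) and ("adjust", float)."""
    tokens: List[Token] = []
    for match in _TJ_TOKEN.finditer(array_body):
        if match.group(1) is not None:
            tokens.append(("text", match.group(1)))
        else:
            tokens.append(("adjust", float(match.group(2))))
    return tokens


def suspicious_adjustments(tokens: List[Token], limit: float = 500.0) -> List[float]:
    """Return adjustments whose magnitude exceeds the (assumed) limit."""
    return [v for kind, v in tokens if kind == "adjust" and abs(v) > limit]


def expand_ligatures(text: str) -> str:
    """Replace compatibility characters such as U+FB01 with plain letters."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    tokens = parse_tj_array("(Hel) -20 (lo) -1200 (World)")
    print(tokens)
    print(suspicious_adjustments(tokens))
    print(expand_ligatures("\ufb01le"))
```

Note that `expand_ligatures` only helps when extraction yields the Unicode ligature character. If the font lacks a usable character mapping, the extracted text is wrong at the source and needs to be fixed in the PDF itself.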

	### Screen Reader Simulator
	* Use when: You want to "hear" what a user hears.
	* Action: Select "NVDA" and click "Generate Transcript".
	* Check:
	  * Are headings announced as "Heading Level X"?
	  * Is alt text read for images?
	  * Is the reading order intelligible?
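The three checks above can be illustrated with a toy transcript generator. This is only an approximation for reasoning about tags: the `(tag, text, alt_text)` element shape and the announcement wording are assumptions, and neither reproduces NVDA's real output or the tool's simulator.

```python
from typing import List, Optional, Tuple

# (structure tag, text content, alt text); a hypothetical, flattened element shape.
Element = Tuple[str, str, Optional[str]]


def transcript(elements: List[Element]) -> List[str]:
    """Produce one announcement per element, in the order given."""
    lines: List[str] = []
    for tag, text, alt in elements:
        if len(tag) == 2 and tag[0] == "H" and tag[1].isdigit():
            lines.append(f"Heading Level {tag[1]}: {text}")
        elif tag == "Figure":
            # A missing alt text is made explicit so it stands out in review.
            lines.append(f"Graphic: {alt}" if alt else "Graphic: (no alt text)")
        else:
            lines.append(text)
    return lines


if __name__ == "__main__":
    page = [
        ("H1", "Annual Report", None),
        ("P", "Revenue grew.", None),
        ("Figure", "", None),
    ]
    for line in transcript(page):
        print(line)
```

A heading that was styled as bold text but tagged `P` shows up here as a plain line with no "Heading Level" prefix, which is the symptom to look for in the real transcript.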

	### Paragraph Detection
	* Use when: Text seems run-on or broken into too many fragments.
	* Action: Click "Analyze Paragraphs".
	* Check:
	  * Visual vs. Semantic: Large discrepancies suggest the `<P>` tags don't match the visual layout, which can confuse users navigating by paragraph.
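One way to think about the visual versus semantic comparison is to count paragraphs by vertical gaps and compare that count with the number of `<P>` tags. The sketch below assumes a single column of lines with known top coordinates. The function names, `gap_factor`, and `tolerance` are illustrative assumptions rather than the tool's actual settings.

```python
from typing import List


def count_visual_paragraphs(line_tops: List[float], line_height: float,
                            gap_factor: float = 1.5) -> int:
    """Count paragraphs by treating an unusually large line gap as a break."""
    if not line_tops:
        return 0
    tops = sorted(line_tops)
    paragraphs = 1
    for previous, current in zip(tops, tops[1:]):
        if current - previous > gap_factor * line_height:
            paragraphs += 1
    return paragraphs


def paragraph_mismatch(visual: int, semantic: int, tolerance: float = 0.25) -> bool:
    """True when the counts differ by more than `tolerance` of the larger one."""
    largest = max(visual, semantic)
    if largest == 0:
        return False
    return abs(visual - semantic) / largest > tolerance


if __name__ == "__main__":
    tops = [100, 112, 124, 160, 172]        # one wide gap between 124 and 160
    visual = count_visual_paragraphs(tops, line_height=12)
    print(visual, paragraph_mismatch(visual, semantic=5))
```

Here two visual paragraphs against five `<P>` tags indicates over-fragmented tagging, for example one tag per line.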

	### Structure Tree Visualizer
	* Use when: The document is tagged, but navigation is broken.
	* Action: Click "Extract Structure Tree".
	* Check:
	  * Hierarchy depth.
	  * Correct nesting (e.g., `L` -> `LI` -> `LBody`).
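Both checks above can be run mechanically once the tree is extracted. The following sketch assumes a hypothetical `(tag, children)` tuple representation and a small `ALLOWED_CHILDREN` table based on the standard list structure types (`L` holds `LI` and an optional `Caption`; `LI` holds `Lbl` and `LBody`). Extend the table if your documents nest lists differently.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[str, list]  # (tag, children); a hypothetical tree shape

# Assumed parent -> permitted child tags, covering only list structures.
ALLOWED_CHILDREN: Dict[str, Set[str]] = {
    "L": {"LI", "Caption"},
    "LI": {"Lbl", "LBody"},
}


def max_depth(node: Node) -> int:
    """Depth of the tree, counting the root as level 1."""
    _, children = node
    return 1 + max((max_depth(child) for child in children), default=0)


def nesting_errors(node: Node, path: str = "") -> List[str]:
    """List every child that its parent tag is not expected to contain."""
    tag, children = node
    here = f"{path}/{tag}"
    errors: List[str] = []
    allowed = ALLOWED_CHILDREN.get(tag)
    for child in children:
        if allowed is not None and child[0] not in allowed:
            errors.append(f"{here}: unexpected child {child[0]}")
        errors.extend(nesting_errors(child, here))
    return errors


if __name__ == "__main__":
    bad = ("Document", [("L", [("P", [])])])
    print(max_depth(bad), nesting_errors(bad))
```

A paragraph placed directly inside `L`, as in the example, is a common result of autotagging a manually formatted list.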

	## 4. Batch Analysis for Large Documents
	Goal: Identify problematic pages in a long report.

	1. Go to the Batch Analysis tab.
	2. Run Batch: Analyze 50-100 pages.
	3. Review the Report:
	   * Issues Found: Look for "Scanned Pages" or "Garbled Text".
	   * Page List: Use the list of page numbers to target your remediation efforts.
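For long reports, it helps to collapse the page list into ranges before planning remediation. This sketch assumes the batch results are available as a hypothetical mapping from page number to a list of issue names. The tool's real report format may differ.

```python
from typing import Dict, List, Optional


def flagged_pages(results: Dict[int, List[str]], issue: str) -> List[int]:
    """Return the sorted page numbers whose issue list contains `issue`."""
    return sorted(page for page, issues in results.items() if issue in issues)


def compress_ranges(pages: List[int]) -> str:
    """Render page numbers compactly, e.g. [3, 4, 5, 9] -> "3-5, 9"."""
    parts: List[str] = []
    start: Optional[int] = None
    previous: Optional[int] = None
    for page in sorted(set(pages)):
        if previous is not None and page == previous + 1:
            previous = page              # still inside the current run
            continue
        if start is not None:            # close the run that just ended
            parts.append(str(start) if start == previous else f"{start}-{previous}")
        start = previous = page
    if start is not None:
        parts.append(str(start) if start == previous else f"{start}-{previous}")
    return ", ".join(parts)


if __name__ == "__main__":
    results = {
        1: [],
        2: ["Scanned Pages"],
        3: ["Scanned Pages", "Garbled Text"],
        7: ["Garbled Text"],
    }
    print(compress_ranges(flagged_pages(results, "Scanned Pages")))
    print(compress_ranges(flagged_pages(results, "Garbled Text")))
```

Contiguous ranges of scanned pages usually point to an inserted scan (an appendix or a signed form) that can be run through OCR in one pass.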

	## Summary Checklist
	- [ ] Document is Tagged (`/StructTreeRoot` exists)
	- [ ] Text is selectable (not an image/scan)
	- [ ] Reading order is logical (columns handled correctly)
	- [ ] Images have Alt Text (or are marked as artifacts)
	- [ ] Headings use Heading tags (`<H1>`, `<H2>`), not just bold text
