# Comprehensive Testing Plan & Test Cases
## DocGenie Synthetic Document Generation API
**Document Version**: 1.0
**Date**: March 4, 2026
**Project**: DocGenie - AI-Powered Synthetic Document Dataset Generator
---
## Table of Contents
1. [Testing Overview](#testing-overview)
2. [Functional Testing](#functional-testing)
- [Unit Testing](#unit-testing)
- [Integration Testing](#integration-testing)
- [System Testing](#system-testing)
3. [Non-Functional Testing](#non-functional-testing)
- [Performance Testing](#performance-testing)
- [Security Testing](#security-testing)
- [Reliability Testing](#reliability-testing)
- [Scalability Testing](#scalability-testing)
- [Usability Testing](#usability-testing)
4. [Test Environment Setup](#test-environment-setup)
5. [Testing Tools & Frameworks](#testing-tools--frameworks)
6. [Test Execution Plan](#test-execution-plan)
7. [Success Criteria & Metrics](#success-criteria--metrics)
8. [Risk Assessment](#risk-assessment)
---
## Testing Overview
### Purpose
This document outlines the comprehensive testing strategy for DocGenie API, ensuring quality, reliability, and performance of the synthetic document generation system across all 19 pipeline stages.
### Scope
- API endpoints testing (`/generate`, `/generate/pdf`, `/generate/async`)
- 19-stage pipeline validation
- External service integrations (Claude API, RunPod handwriting service)
- Database operations (Supabase)
- Background job processing (Redis Queue)
- Error handling and recovery mechanisms
### Testing Approach
- **Test-Driven Development (TDD)**: Write tests before implementation where applicable
- **Continuous Integration**: Automated test execution on every commit
- **Coverage Target**: Minimum 80% code coverage for critical paths
- **Risk-Based Testing**: Prioritize high-risk components (LLM integration, handwriting service)
---
## Functional Testing
### A.1 Unit Testing
Unit tests verify individual functions and methods in isolation. Target: 85% code coverage.
#### **A.1.1 Seed Image Processing (Stage 01)**
**Module**: `api/utils.py::download_seed_images()`
| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-SEED-001 | Download valid image URL | Valid HTTPS URL (JPEG) | Base64-encoded image string | High |
| UT-SEED-002 | Download PNG format | Valid PNG URL | Base64-encoded PNG | High |
| UT-SEED-003 | Handle 503 timeout error | URL returning 503 | Retry 3 times, eventual success | Critical |
| UT-SEED-004 | Handle 502 bad gateway | URL returning 502 | Retry with exponential backoff | High |
| UT-SEED-005 | Handle 404 not found | Invalid URL | Raise HTTPException(400) | High |
| UT-SEED-006 | Handle connection timeout | Slow/unresponsive server | Retry then raise exception | Medium |
| UT-SEED-007 | Validate image format | Non-image URL (HTML) | Raise validation error | Medium |
| UT-SEED-008 | Handle oversized images | >10MB image | Process or reject gracefully | Low |
| UT-SEED-009 | Test retry backoff timing | Mock 503 responses | Delays: 2s, 4s, 8s | Medium |
| UT-SEED-010 | Test max retries exhausted | Persistent 503 errors | Raise exception after 3 attempts | High |
**Test Implementation**:
```python
# test_seed_download.py
import httpx
import pytest
from unittest.mock import AsyncMock, Mock, patch

from api.utils import download_seed_images


@pytest.mark.asyncio
async def test_download_valid_image():
    url = "https://example.com/test.jpg"
    with patch("httpx.AsyncClient") as mock_client:
        mock_response = Mock()
        mock_response.content = b"\xff\xd8\xff\xe0"  # JPEG magic bytes
        # get() is awaited inside download_seed_images, so mock it as async
        mock_client.return_value.__aenter__.return_value.get = AsyncMock(
            return_value=mock_response
        )
        result = await download_seed_images([url])
        assert len(result) == 1
        assert isinstance(result[0], str)  # base64 string


@pytest.mark.asyncio
async def test_download_503_retry():
    url = "https://example.com/test.jpg"
    with patch("httpx.AsyncClient") as mock_client:
        # First two calls raise 503; the third succeeds
        error_503 = httpx.HTTPStatusError("503", request=Mock(), response=Mock())
        responses = [
            Mock(status_code=503, raise_for_status=Mock(side_effect=error_503)),
            Mock(status_code=503, raise_for_status=Mock(side_effect=error_503)),
            Mock(content=b"\xff\xd8\xff\xe0", raise_for_status=Mock()),
        ]
        mock_get = AsyncMock(side_effect=responses)
        mock_client.return_value.__aenter__.return_value.get = mock_get
        result = await download_seed_images([url])
        assert len(result) == 1
        assert mock_get.call_count == 3
```
#### **A.1.2 HTML Processing (Stage 03)**
**Module**: `api/utils.py::extract_html_documents_from_response()`
| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-HTML-001 | Extract single HTML | LLM response with 1 HTML | List with 1 HTML document | High |
| UT-HTML-002 | Extract multiple HTMLs | Response with 3 HTMLs | List with 3 documents | High |
| UT-HTML-003 | Extract ground truth | HTML with `<script id="GT">` | GT JSON extracted, script removed | Critical |
| UT-HTML-004 | Handle malformed HTML | Invalid HTML tags | Parse with BeautifulSoup recovery | Medium |
| UT-HTML-005 | Handle missing DOCTYPE | HTML without DOCTYPE | Add DOCTYPE or flag error | Low |
| UT-HTML-006 | Validate CSS presence | HTML without `<style>` | Raise validation error | High |
| UT-HTML-007 | Extract handwriting markers | HTML with `class="handwritten"` | Identify 5 handwriting elements | High |
| UT-HTML-008 | Extract visual elements | HTML with `data-placeholder` | Identify 3 visual elements | High |
| UT-HTML-009 | Handle empty response | Empty string from LLM | Return empty list | Medium |
| UT-HTML-010 | Prettify minified HTML | Single-line HTML | Multi-line formatted HTML | Low |
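UT-HTML-003 is the critical case here: the ground-truth JSON must be pulled out of a `<script id="GT">` block and the block stripped from the returned HTML. A minimal stdlib sketch of that contract (the real `extract_html_documents_from_response()` presumably parses with BeautifulSoup; the regex here is only for illustration):

```python
import json
import re

# Matches a <script id="GT">...</script> block and captures its JSON body.
GT_SCRIPT = re.compile(
    r'<script\s+id="GT"[^>]*>(.*?)</script>', re.DOTALL | re.IGNORECASE
)

def extract_ground_truth(html: str):
    """Return (html_without_gt_script, gt_dict_or_None)."""
    match = GT_SCRIPT.search(html)
    if match is None:
        return html, None
    gt = json.loads(match.group(1))
    cleaned = html[:match.start()] + html[match.end():]
    return cleaned, gt

doc = '<html><body><p>Invoice</p><script id="GT">{"total": "42.00"}</script></body></html>'
cleaned, gt = extract_ground_truth(doc)
print(gt["total"])           # 42.00
print("<script" in cleaned)  # False
```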
#### **A.1.3 PDF Rendering (Stage 04)**
**Module**: `api/utils.py::render_html_to_pdf()`
| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-PDF-001 | Render A4 document | HTML with A4 page size | PDF 210×297mm | High |
| UT-PDF-002 | Render Letter size | HTML with Letter page | PDF 215.9×279.4mm | Medium |
| UT-PDF-003 | Extract geometries | HTML with handwriting | Geometries JSON with rects | Critical |
| UT-PDF-004 | Handle custom fonts | HTML with @font-face | PDF with embedded fonts | Low |
| UT-PDF-005 | Preserve CSS styling | HTML with colors/borders | PDF matches visual style | Medium |
| UT-PDF-006 | Handle images in HTML | HTML with <img> tags | Images embedded in PDF | Low |
| UT-PDF-007 | Extract text coordinates | HTML with paragraphs | Accurate bbox coordinates | High |
| UT-PDF-008 | Handle landscape orientation | HTML with landscape CSS | PDF in landscape mode | Low |
| UT-PDF-009 | Validate page dimensions | Various page sizes | Dimensions match CSS @page | High |
| UT-PDF-010 | Handle Playwright errors | Browser crash scenario | Retry or graceful failure | Medium |
#### **A.1.4 Bbox Extraction (Stage 05)**
**Module**: `api/utils.py::extract_bboxes_from_rendered_pdf()`
| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-BBOX-001 | Extract word bboxes | Standard PDF | List of word-level bboxes | Critical |
| UT-BBOX-002 | Extract char bboxes | Same PDF | List of char-level bboxes | High |
| UT-BBOX-003 | Handle multi-line text | PDF with paragraphs | Correct block/line grouping | High |
| UT-BBOX-004 | Filter whitespace | PDF with spaces/tabs | No whitespace-only bboxes | Medium |
| UT-BBOX-005 | Handle special characters | PDF with ©, ®, ™ | Characters properly extracted | Medium |
| UT-BBOX-006 | Handle non-Latin scripts | PDF with Chinese/Arabic | Correct unicode extraction | Low |
| UT-BBOX-007 | Validate coordinates | Extracted bboxes | All coords within page bounds | High |
| UT-BBOX-008 | Handle empty PDF | PDF with no text | Return empty list | Low |
| UT-BBOX-009 | Handle rotated text | PDF with rotation | Bboxes account for rotation | Low |
| UT-BBOX-010 | Parse bbox strings | "0_0_0 Hello 10 20 50 30" | OCRBox object with correct fields | High |
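UT-BBOX-010 implies a line format of `block_line_word text x0 y0 x1 y1`. The sketch below shows the parsing contract under that assumption; the real `OCRBox` lives in the API codebase and its fields may differ, and single-token text is assumed.

```python
from dataclasses import dataclass

@dataclass
class OCRBox:
    block: int
    line: int
    word: int
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

def parse_bbox_line(raw: str) -> OCRBox:
    """Parse 'block_line_word text x0 y0 x1 y1' into an OCRBox."""
    index, text, x0, y0, x1, y1 = raw.split()
    block, line, word = (int(part) for part in index.split("_"))
    return OCRBox(block, line, word, text, float(x0), float(y0), float(x1), float(y1))

box = parse_bbox_line("0_0_0 Hello 10 20 50 30")
print(box.text, box.x1)  # Hello 50.0
```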
#### **A.1.5 Handwriting Region Extraction (Stage 07)**
**Module**: `api/utils.py::process_stage3_complete()` - handwriting section
| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-HW-001 | Filter by handwriting_ratio | 10 regions, ratio=0.3 | ~3 regions selected | Critical |
| UT-HW-002 | Parse author IDs | `class="handwritten author1"` | author_id="author1" | High |
| UT-HW-003 | Match to word bboxes | Geometry + bboxes | Correct bbox mapping | Critical |
| UT-HW-004 | Handle signature class | `class="handwritten signature"` | is_signature=True | Medium |
| UT-HW-005 | DPI coordinate conversion | Browser coords (96 DPI) | PDF coords (72 DPI) with 0.75 scale | High |
| UT-HW-006 | Handle overlapping regions | 2 regions, same text | Prevent duplicate bbox usage | Medium |
| UT-HW-007 | Validate rect boundaries | Geometries with rect | Check bboxes within rect threshold | High |
| UT-HW-008 | Test seed reproducibility | Same seed, same input | Identical region selection | High |
| UT-HW-009 | Handle zero ratio | ratio=0.0 | No regions selected | Medium |
| UT-HW-010 | Handle full ratio | ratio=1.0 | All regions selected | Medium |
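UT-HW-005 checks the coordinate conversion from browser space to PDF space: browsers lay out CSS pixels at 96 per inch while PDF points are 72 per inch, so every browser coordinate is scaled by 72/96 = 0.75. A minimal sketch of that math (the production helper lives in `api/utils.py`):

```python
BROWSER_DPI = 96  # CSS pixels per inch in the rendering browser
PDF_DPI = 72      # PDF points per inch
SCALE = PDF_DPI / BROWSER_DPI  # 0.75

def browser_rect_to_pdf(rect):
    """Convert an (x0, y0, x1, y1) rect from CSS pixels to PDF points."""
    return tuple(round(v * SCALE, 2) for v in rect)

print(browser_rect_to_pdf((100, 200, 300, 400)))  # (75.0, 150.0, 225.0, 300.0)
```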
#### **A.1.6 Handwriting Service Integration**
**Module**: `api/utils.py::call_handwriting_service_batch()`
| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-HWSVC-001 | Batch request format | 10 texts with metadata | Correct RunPod JSON format | Critical |
| UT-HWSVC-002 | Handle sync response | Immediate completion | Parse output.images[] | High |
| UT-HWSVC-003 | Handle IN_PROGRESS | Delayed completion | Poll status endpoint | Critical |
| UT-HWSVC-004 | Status polling timeout | Job exceeds 30 polls | Raise timeout exception | High |
| UT-HWSVC-005 | Handle FAILED status | RunPod job failure | Raise exception with error | High |
| UT-HWSVC-006 | Parse image results | Batch response | Map hw_id to image_base64 | Critical |
| UT-HWSVC-007 | Calculate dynamic timeout | 50 texts | Timeout = 50×20+30 = 1030s | Medium |
| UT-HWSVC-008 | Handle network errors | Connection timeout | Retry up to max_retries | High |
| UT-HWSVC-009 | Validate authorization | Missing API key | Request includes Bearer token | Medium |
| UT-HWSVC-010 | Test exponential backoff | Status polling | Delays: 5s, 6s, 7s... up to 10s | Low |
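Two numeric contracts from the table above can be sketched directly. The constants (20 s per text plus a 30 s base for UT-HWSVC-007; a 5 s initial poll delay growing by 1 s and capped at 10 s for UT-HWSVC-010) come from the test cases; the function names are illustrative, not the real `api/utils.py` identifiers.

```python
def dynamic_timeout(num_texts: int) -> int:
    """Per-batch timeout in seconds: 20 s per text plus a 30 s base."""
    return num_texts * 20 + 30

def poll_delay(attempt: int) -> int:
    """Backoff for status polling: 5 s, 6 s, 7 s ... capped at 10 s."""
    return min(5 + attempt, 10)

print(dynamic_timeout(50))                # 1030
print([poll_delay(i) for i in range(7)])  # [5, 6, 7, 8, 9, 10, 10]
```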
#### **A.1.7 Visual Element Generation (Stage 10)**
**Module**: `api/utils.py::generate_visual_element_images()`
| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-VE-001 | Select logo prefab | type="logo" | Random logo from prefabs/ | High |
| UT-VE-002 | Select photo prefab | type="photo" | Random photo image | High |
| UT-VE-003 | Generate barcode | type="barcode" | EAN-13 barcode image | Medium |
| UT-VE-004 | Generate QR code | type="qr_code", content="URL" | QR code image | Medium |
| UT-VE-005 | Test seed reproducibility | Same seed, same type | Identical prefab selection | High |
| UT-VE-006 | Handle missing prefabs | type with no files | Fallback or error | Medium |
| UT-VE-007 | Load SVG prefabs | SVG logo file | Convert to PNG | Low |
| UT-VE-008 | Filter by requested types | types=["logo","signature"] | Only matching types generated | High |
| UT-VE-009 | Normalize type synonyms | "chart" → "figure" | Consistent type mapping | Medium |
| UT-VE-010 | Return base64 encoding | All image types | Valid base64 strings | High |
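UT-VE-005 requires that the same seed always selects the same prefab. One way to satisfy that, sketched below with placeholder file names: derive a dedicated `random.Random` from the seed and element type instead of touching the global RNG, and sort the candidates first because directory listing order is not stable across machines.

```python
import random

def select_prefab(prefabs, seed: int, element_type: str):
    """Reproducibly pick one prefab for a given seed and element type."""
    rng = random.Random(f"{seed}:{element_type}")  # isolated, type-specific stream
    return rng.choice(sorted(prefabs))             # sorted: stable candidate order

logos = ["acme.png", "globex.svg", "initech.png"]  # placeholder prefab names
first = select_prefab(logos, seed=42, element_type="logo")
second = select_prefab(logos, seed=42, element_type="logo")
print(first == second)  # True: same seed, same choice
```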
#### **A.1.8 PDF Modification (Stages 12-13)**
**Module**: `api/utils.py::process_stage3_complete()` - insertion sections
| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-PDFMOD-001 | Whiteout text regions | 5 word bboxes | White rectangles drawn | High |
| UT-PDFMOD-002 | Insert handwriting image | Image + bbox | Image at correct position | Critical |
| UT-PDFMOD-003 | Apply random offsets | Word bbox | Position offset within limits | Medium |
| UT-PDFMOD-004 | Resize with aspect ratio | Wide/tall images | Scaled to fit bbox | High |
| UT-PDFMOD-005 | Insert visual element | Logo + rect | Centered in bbox | High |
| UT-PDFMOD-006 | Handle rotation | Element with rotation=45 | Rotated image insertion | Low |
| UT-PDFMOD-007 | Save intermediate PDF | After handwriting | _with_handwriting.pdf created | Medium |
| UT-PDFMOD-008 | Save final PDF | After visual elements | _final.pdf created | High |
| UT-PDFMOD-009 | Scale factor application | 3x upscale | High-res image quality | Medium |
| UT-PDFMOD-010 | Handle insertion errors | Invalid image data | Log error, continue | Medium |
#### **A.1.9 OCR Processing (Stage 15)**
**Module**: `api/utils.py::run_paddle_ocr()`
| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-OCR-001 | OCR English text | English document image | Accurate word recognition | Critical |
| UT-OCR-002 | OCR with handwriting | Mixed typed/handwritten | Both text types detected | High |
| UT-OCR-003 | Extract word bboxes | Document image | List of word-level bboxes | Critical |
| UT-OCR-004 | Calculate confidence | OCR results | Confidence score per word | High |
| UT-OCR-005 | Handle low quality | Blurry/noisy image | Reasonable accuracy (>70%) | Medium |
| UT-OCR-006 | Handle rotated text | 90° rotated document | Correct orientation detection | Low |
| UT-OCR-007 | Multi-language support | Document with German text | lang="de" parameter works | Medium |
| UT-OCR-008 | Handle empty image | Blank white image | Empty results list | Low |
| UT-OCR-009 | DPI configuration | Various DPI settings | Consistent accuracy | Medium |
| UT-OCR-010 | Return image dimensions | Any image | width, height in pixels | High |
#### **A.1.10 Bbox Normalization (Stage 16)**
**Module**: `api/utils.py::normalize_bboxes()`
| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-NORM-001 | Normalize to [0,1] | Pixel bboxes, image dims | Normalized coordinates | Critical |
| UT-NORM-002 | Handle out-of-bounds | x1 > image_width | Clipped to [0, 1] | High |
| UT-NORM-003 | Preserve text data | Bboxes with text field | Text preserved in output | High |
| UT-NORM-004 | Create segment bboxes | Word-level bboxes | Aggregated segment bboxes | Medium |
| UT-NORM-005 | Handle zero dimensions | Image with width=0 | Raise validation error | Low |
| UT-NORM-006 | Round to precision | Float coordinates | 6 decimal places | Low |
| UT-NORM-007 | Maintain bbox order | Ordered input list | Same order in output | Medium |
| UT-NORM-008 | Handle negative coords | bbox with x0=-5 | Clipped to 0 | Medium |
| UT-NORM-009 | Validate bbox format | Various input formats | Consistent output schema | High |
| UT-NORM-010 | Handle empty list | No bboxes | Return empty list | Low |
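The UT-NORM-* rows pin down the coordinate math: divide pixel coordinates by the image dimensions, clip into [0, 1] (UT-NORM-002/008), round to six decimal places (UT-NORM-006), and reject zero dimensions (UT-NORM-005). A sketch of just that math; `normalize_bboxes()` in `api/utils.py` is the real implementation:

```python
def normalize_bbox(bbox, width, height):
    """Normalize one (x0, y0, x1, y1) pixel bbox into [0, 1] coordinates."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")  # UT-NORM-005
    x0, y0, x1, y1 = bbox
    clip = lambda v: min(max(v, 0.0), 1.0)  # UT-NORM-002/008
    return tuple(
        round(clip(value / dim), 6)  # UT-NORM-006: six decimal places
        for value, dim in ((x0, width), (y0, height), (x1, width), (y1, height))
    )

print(normalize_bbox((-5, 20, 1050, 30), width=1000, height=800))
# (0.0, 0.025, 1.0, 0.0375)
```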
#### **A.1.11 Dataset Export (Stage 19)**
**Module**: `api/utils.py::export_to_msgpack()`
| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-EXPORT-001 | Create msgpack file | Complete document data | Valid .msgpack file | Critical |
| UT-EXPORT-002 | Encode image bytes | PNG image | Binary image in msgpack | High |
| UT-EXPORT-003 | Store normalized bboxes | Normalized coordinates | Bboxes in [0,1] range | High |
| UT-EXPORT-004 | Store ground truth | GT JSON | GT dict in msgpack | High |
| UT-EXPORT-005 | Store metadata | Document metadata | Metadata dict in msgpack | Medium |
| UT-EXPORT-006 | Validate msgpack format | Generated file | Readable by msgpack.load() | Critical |
| UT-EXPORT-007 | Handle large files | 10MB+ image | Compression applied | Low |
| UT-EXPORT-008 | Store words list | OCR words | Ordered word list | High |
| UT-EXPORT-009 | Handle missing fields | Partial data | Fill with null/defaults | Medium |
| UT-EXPORT-010 | Return file path | Export operation | Absolute path to .msgpack | Medium |
#### **A.1.12 Validation Functions**
**Module**: `api/utils.py::validate_*()`
| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-VAL-001 | Validate HTML structure | Valid HTML5 | (True, None) | High |
| UT-VAL-002 | Detect missing DOCTYPE | HTML without DOCTYPE | (False, "Missing DOCTYPE") | Medium |
| UT-VAL-003 | Detect missing CSS | HTML without <style> | (False, "Missing CSS") | High |
| UT-VAL-004 | Validate PDF file | Valid PDF | (True, None) | High |
| UT-VAL-005 | Detect corrupt PDF | Truncated PDF file | (False, "Corrupt PDF") | High |
| UT-VAL-006 | Validate bbox count | 100 bboxes, min=50 | (True, None) | Medium |
| UT-VAL-007 | Detect insufficient bboxes | 10 bboxes, min=50 | (False, "Insufficient bboxes") | Medium |
| UT-VAL-008 | Validate bbox coordinates | Valid bboxes | (True, None) | High |
| UT-VAL-009 | Detect invalid coordinates | x0 > x1 | (False, "Invalid bbox") | High |
| UT-VAL-010 | Validate page count | Multi-page PDF | (False, "Expected 1 page") | Medium |
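The `validate_*()` helpers return a `(bool, message)` pair, as the expected outputs above show. A hedged sketch of the bbox-coordinate check (UT-VAL-008/009); the exact message strings in `api/utils.py` may differ.

```python
def validate_bbox_coordinates(bboxes):
    """Return (True, None) or (False, reason) for a list of (x0, y0, x1, y1)."""
    for i, (x0, y0, x1, y1) in enumerate(bboxes):
        if x0 > x1 or y0 > y1:
            return False, f"Invalid bbox at index {i}: ({x0}, {y0}, {x1}, {y1})"
    return True, None

print(validate_bbox_coordinates([(0, 0, 10, 10)]))         # (True, None)
ok, msg = validate_bbox_coordinates([(12, 0, 10, 10)])     # x0 > x1
print(ok)  # False
```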
**Total Unit Tests**: 120+ test cases
---
### A.2 Integration Testing
Integration tests verify interactions between multiple components. Target: Complete workflow coverage.
#### **A.2.1 Pipeline Stage Integration**
**Purpose**: Verify data flow between consecutive pipeline stages
| Test Case ID | Test Name | Components | Test Scenario | Priority |
|--------------|-----------|------------|---------------|----------|
| IT-PIPE-001 | Stages 01-03 integration | Seed download → LLM → HTML extraction | Download seeds, call LLM, extract HTML successfully | Critical |
| IT-PIPE-002 | Stages 03-05 integration | HTML extraction → PDF render → Bbox extraction | Clean HTML renders to PDF, bboxes extracted | Critical |
| IT-PIPE-003 | Stages 07-09 integration | HW extraction → Service call | HW regions trigger service batch request | Critical |
| IT-PIPE-004 | Stages 09-12 integration | HW generation → Insertion | Generated images inserted at correct positions | Critical |
| IT-PIPE-005 | Stages 14-15 integration | Image render → OCR | Final image passed to OCR successfully | High |
| IT-PIPE-006 | Stages 15-16 integration | OCR → Normalization | OCR bboxes normalized with correct dimensions | High |
| IT-PIPE-007 | Stages 07-13 complete | Full Stage 3 | Handwriting + visual elements end-to-end | Critical |
| IT-PIPE-008 | Stages 14-19 complete | Full Stages 4-5 | OCR → export complete workflow | High |
| IT-PIPE-009 | Stages 01-19 minimal | End-to-end minimal | No handwriting/VE, basic generation | Critical |
| IT-PIPE-010 | Stages 01-19 full | End-to-end full features | All features enabled, complete dataset | Critical |
#### **A.2.2 External Service Integration**
**Purpose**: Verify interactions with external APIs and services
| Test Case ID | Test Name | Services | Test Scenario | Priority |
|--------------|-----------|----------|---------------|----------|
| IT-EXT-001 | Claude API integration | Claude Messages API | Send prompt, receive valid response | Critical |
| IT-EXT-002 | Claude error handling | Claude API | Handle rate limits (429) gracefully | High |
| IT-EXT-003 | Claude retry logic | Claude API | Automatic retry on transient errors | High |
| IT-EXT-004 | RunPod sync integration | RunPod /runsync | Send batch, receive images | Critical |
| IT-EXT-005 | RunPod async integration | RunPod /run + status | Queue job, poll until completion | High |
| IT-EXT-006 | RunPod auth | RunPod API | Bearer token authentication works | Medium |
| IT-EXT-007 | Supabase storage | Supabase storage API | Upload/download seed images | Medium |
| IT-EXT-008 | Supabase database | Supabase DB | Store generation metadata | Medium |
| IT-EXT-009 | Redis Queue | RQ worker | Enqueue async job, process in background | High |
| IT-EXT-010 | Google Drive | Drive API (optional) | Export to Google Drive if configured | Low |
#### **A.2.3 Database Operations**
**Purpose**: Verify database interactions (Supabase)
| Test Case ID | Test Name | Operations | Test Scenario | Priority |
|--------------|-----------|------------|---------------|----------|
| IT-DB-001 | Insert generation record | INSERT | New generation logged in DB | High |
| IT-DB-002 | Update generation status | UPDATE | Status changes reflected | High |
| IT-DB-003 | Query by task ID | SELECT | Retrieve generation by ID | High |
| IT-DB-004 | Store metadata | INSERT | Complete metadata stored | Medium |
| IT-DB-005 | Handle connection errors | Network failure | Retry or graceful degradation | High |
| IT-DB-006 | Transaction rollback | Error mid-transaction | Data consistency maintained | Medium |
| IT-DB-007 | Concurrent updates | Multiple workers | No race conditions | Medium |
| IT-DB-008 | Pagination | Large result sets | Efficient pagination | Low |
| IT-DB-009 | Search functionality | Full-text search | Search by doc_type, language | Low |
| IT-DB-010 | Data retention | Cleanup old data | Archive/delete after N days | Low |
#### **A.2.4 API Endpoint Integration**
**Purpose**: Test complete request/response cycles through endpoints
| Test Case ID | Test Name | Endpoint | Test Scenario | Priority |
|--------------|-----------|----------|---------------|----------|
| IT-API-001 | GET /health | Health check | Returns 200 with system status | Critical |
| IT-API-002 | POST /generate | Legacy endpoint | Returns JSON with complete data | High |
| IT-API-003 | POST /generate/pdf | Sync PDF endpoint | Returns ZIP file download | Critical |
| IT-API-004 | POST /generate/async | Async endpoint | Returns task ID | Critical |
| IT-API-005 | GET /generate/async/status/{id} | Status check | Returns current job status | Critical |
| IT-API-006 | GET /generate/async/result/{id} | Result download | Returns ZIP when complete | High |
| IT-API-007 | Request validation | All endpoints | Invalid params rejected with 400 | High |
| IT-API-008 | Authentication | Protected endpoints | Requires valid API key | High |
| IT-API-009 | Rate limiting | All endpoints | Enforces rate limits | Medium |
| IT-API-010 | CORS headers | All endpoints | Correct CORS configuration | Medium |
#### **A.2.5 Background Worker Integration**
**Purpose**: Test async job processing via Redis Queue
| Test Case ID | Test Name | Components | Test Scenario | Priority |
|--------------|-----------|------------|---------------|----------|
| IT-WORKER-001 | Job enqueue | API → RQ | Job added to queue successfully | Critical |
| IT-WORKER-002 | Job processing | Worker → Pipeline | Worker picks up and processes job | Critical |
| IT-WORKER-003 | Job status updates | Worker → DB | Status updated throughout processing | High |
| IT-WORKER-004 | Job failure handling | Worker error | Failed job logged, error reported | High |
| IT-WORKER-005 | Job retry | Transient failure | Failed job retried up to max attempts | High |
| IT-WORKER-006 | Job timeout | Long-running job | Timeout enforced, job killed | Medium |
| IT-WORKER-007 | Result storage | Worker → Storage | Results saved to correct location | High |
| IT-WORKER-008 | Queue priority | Multiple jobs | High priority jobs processed first | Low |
| IT-WORKER-009 | Worker scaling | Multiple workers | Jobs distributed across workers | Medium |
| IT-WORKER-010 | Worker health | Worker crash | Replaced automatically, jobs reassigned | High |
**Total Integration Tests**: 50+ test cases
---
### A.3 System Testing
System tests verify end-to-end workflows from the user's perspective. Target: all user journeys covered.
#### **A.3.1 Complete Generation Workflows**
| Test Case ID | Test Name | Workflow | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|----------|---------------|------------------|----------|
| ST-GEN-001 | Basic document generation | Minimal config | Generate 1 English invoice, no handwriting/VE | PDF + metadata returned in <60s | Critical |
| ST-GEN-002 | Handwriting generation | Enable handwriting | Generate document with handwriting | Handwriting visible in PDF | Critical |
| ST-GEN-003 | Visual elements | Enable VE | Generate document with logo + barcode | Elements visible in PDF | High |
| ST-GEN-004 | Full feature set | All features enabled | Generate with HW + VE + OCR + analysis | Complete dataset ZIP | Critical |
| ST-GEN-005 | Multi-document batch | num_solutions=5 | Generate 5 documents from 3 seeds | 5 complete documents | High |
| ST-GEN-006 | Reproducible generation | Same seed value | Generate twice with seed=42 | Identical outputs | High |
| ST-GEN-007 | Multi-language | language="german" | Generate German document | Correct language output | Medium |
| ST-GEN-008 | Various doc types | doc_type variations | Test invoice, receipt, form, letter | All types work | High |
| ST-GEN-009 | Different GT formats | gt_type="kie" / "qa" | Test both GT formats | Correct GT structure | High |
| ST-GEN-010 | Custom seed images | User-provided URLs | Generate from user's images | Images influence output | High |
#### **A.3.2 Error Handling Workflows**
| Test Case ID | Test Name | Error Condition | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|-----------------|---------------|------------------|----------|
| ST-ERR-001 | Invalid seed URL | 404 not found | Submit invalid image URL | HTTP 400 with clear error message | High |
| ST-ERR-002 | LLM API failure | Claude API down | Submit request during outage | HTTP 503 with retry-after | Critical |
| ST-ERR-003 | Handwriting service failure | RunPod timeout | Enable handwriting, service fails | HTTP 500, generation stopped | Critical |
| ST-ERR-004 | Invalid parameters | Missing required field | Omit doc_type parameter | HTTP 422 with validation details | High |
| ST-ERR-005 | Rate limit exceeded | Too many requests | Submit 100 concurrent requests | HTTP 429 with retry info | High |
| ST-ERR-006 | Payload too large | Huge request | Submit 50 seed image URLs | HTTP 413 payload too large | Medium |
| ST-ERR-007 | Malformed JSON | Invalid JSON | Submit broken JSON request | HTTP 400 with parse error | High |
| ST-ERR-008 | Authentication failure | Missing/invalid API key | Request without auth | HTTP 401 unauthorized | High |
| ST-ERR-009 | Database connection loss | DB unavailable | Submit during DB outage | Graceful degradation or 503 | Medium |
| ST-ERR-010 | Disk space exhausted | No storage space | Generate large batch | HTTP 507 insufficient storage | Low |
#### **A.3.3 Async Processing Workflows**
| Test Case ID | Test Name | Workflow | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|----------|---------------|------------------|----------|
| ST-ASYNC-001 | Submit async job | POST /generate/async | Submit batch job | Receive task ID immediately | Critical |
| ST-ASYNC-002 | Check pending status | GET status before completion | Poll status endpoint | Returns "pending" or "processing" | High |
| ST-ASYNC-003 | Check completed status | GET status after completion | Poll status after 5 minutes | Returns "completed" | Critical |
| ST-ASYNC-004 | Download results | GET result/{id} | Download after completion | Returns ZIP file | Critical |
| ST-ASYNC-005 | Check failed status | Job fails during processing | Check status of failed job | Returns "failed" with error details | High |
| ST-ASYNC-006 | Multiple concurrent jobs | Submit 10 jobs | 10 async submissions | All jobs process independently | High |
| ST-ASYNC-007 | Job cancellation | Cancel in-progress job | Submit, then cancel | Job stops, partial results cleaned | Medium |
| ST-ASYNC-008 | Result expiration | Check old results | Access 7-day old result | HTTP 410 gone (expired) | Low |
| ST-ASYNC-009 | Progress updates | Monitor long job | Poll during processing | Progress % increases | Medium |
| ST-ASYNC-010 | Worker restart recovery | Worker crashes mid-job | Kill worker process | Job reassigned, completes | High |
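The polling loop these workflows exercise can be sketched from the client side. `fetch_status` and `sleep` are injected so the loop runs in tests without a live API; the status strings match the workflow table above, and the function name is illustrative.

```python
import time

def wait_for_job(fetch_status, max_polls=30, delay=5, sleep=time.sleep):
    """Poll a status callable until the job reaches a terminal state."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        sleep(delay)
    raise TimeoutError(f"job not finished after {max_polls} polls")

# Simulate a job that is pending twice, processing once, then completed.
states = iter(["pending", "pending", "processing", "completed"])
print(wait_for_job(lambda: next(states), sleep=lambda _: None))  # completed
```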
#### **A.3.4 Data Quality Workflows**
| Test Case ID | Test Name | Quality Check | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|---------------|---------------|------------------|----------|
| ST-QUAL-001 | OCR accuracy | Compare OCR to ground truth | Generate doc, compare OCR text to GT | >90% accuracy | High |
| ST-QUAL-002 | Bbox alignment | Visual inspection | Generate doc with debug viz | Bboxes align with text | High |
| ST-QUAL-003 | Handwriting quality | Visual realism | Generate handwritten doc | Handwriting looks realistic | Medium |
| ST-QUAL-004 | Visual element placement | Correct positioning | Generate with logo + barcode | Elements at correct positions | High |
| ST-QUAL-005 | GT completeness | All GT fields present | Generate KIE document | All expected GT fields extracted | High |
| ST-QUAL-006 | Dataset format validity | msgpack validation | Export dataset | PyTorch can load msgpack | High |
| ST-QUAL-007 | Image resolution | Check output image | Render final image | Minimum 220 DPI quality | Medium |
| ST-QUAL-008 | PDF compliance | PDF/A validation | Generate PDF | Valid PDF/A format | Low |
| ST-QUAL-009 | Metadata accuracy | Check metadata fields | Generate document | Metadata matches actual data | High |
| ST-QUAL-010 | Reproducibility | Same input → same output | Generate 3 times with seed | All outputs identical | High |
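ST-QUAL-001 compares OCR output against the ground truth. A simple word-level accuracy sketch using `difflib`; the production check may use a different metric (for example, character error rate), so treat this scoring and the >90% threshold as illustrative.

```python
from difflib import SequenceMatcher

def word_accuracy(ocr_text: str, gt_text: str) -> float:
    """Fraction of ground-truth words matched in order by the OCR output."""
    ocr_words, gt_words = ocr_text.split(), gt_text.split()
    if not gt_words:
        return 1.0 if not ocr_words else 0.0
    matched = sum(
        block.size
        for block in SequenceMatcher(None, ocr_words, gt_words).get_matching_blocks()
    )
    return matched / len(gt_words)

score = word_accuracy("Invoice Total: 42.00 USD", "Invoice Total: 42,00 USD")
print(score)  # 0.75: three of four words match
```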
#### **A.3.5 Performance Workflows**
| Test Case ID | Test Name | Performance Metric | Test Scenario | Target Performance | Priority |
|--------------|-----------|-------------------|---------------|---------------------|----------|
| ST-PERF-001 | Basic generation time | Time to completion | Generate minimal document | <60 seconds | High |
| ST-PERF-002 | Handwriting generation time | Time with HW | Generate with 20 HW words | <300 seconds | High |
| ST-PERF-003 | Batch generation time | Multiple documents | Generate 10 documents | <15 minutes | Medium |
| ST-PERF-004 | API response time | Endpoint latency | Submit request | <500ms to return task ID | High |
| ST-PERF-005 | Status check latency | Status endpoint | Check job status | <100ms response time | Medium |
| ST-PERF-006 | Concurrent requests | Load handling | 50 concurrent requests | All complete successfully | High |
| ST-PERF-007 | Large payload | Big request | 8 seed images, 10 solutions | Processes without timeout | Medium |
| ST-PERF-008 | Memory usage | Resource consumption | Generate 100 documents | <8GB RAM per worker | Medium |
| ST-PERF-009 | Disk I/O | Storage performance | Rapid sequential generations | No I/O bottleneck | Low |
| ST-PERF-010 | Network bandwidth | Data transfer | Download large result ZIP | Download completes in <60s | Low |
**Total System Tests**: 50+ test cases
---
## Non-Functional Testing
### B.1 Performance Testing
Purpose: Verify system performance under various load conditions.
#### **B.1.1 Load Testing**
**Tool**: Apache JMeter / Locust
| Test Case ID | Test Name | Load Profile | Metrics | Acceptance Criteria | Priority |
|--------------|-----------|--------------|---------|---------------------|----------|
| NFT-LOAD-001 | Normal load | 10 concurrent users, 1 hour | Throughput, response time | Avg response <5s, 0 errors | Critical |
| NFT-LOAD-002 | Peak load | 50 concurrent users, 30 min | Throughput, error rate | <5% error rate, response <15s | Critical |
| NFT-LOAD-003 | Sustained load | 25 concurrent users, 4 hours | CPU, memory, throughput | Stable resource usage, no leaks | High |
| NFT-LOAD-004 | Ramp-up load | 1→100 users over 30 min | System behavior | Graceful scaling or degradation | High |
| NFT-LOAD-005 | Spike load | Sudden 0→100 users | Response time spike | Recovers within 2 minutes | Medium |
**Test Script Example (Locust)**:
```python
# locustfile.py
from locust import HttpUser, task, between

class DocGenieUser(HttpUser):
    wait_time = between(5, 15)

    @task(3)
    def generate_basic_document(self):
        payload = {
            "seed_images": ["https://example.com/seed1.jpg"],
            "prompt_params": {
                "language": "english",
                "doc_type": "invoice",
                "num_solutions": 1,
                "enable_handwriting": False,
                "enable_visual_elements": False,
            },
        }
        self.client.post("/generate", json=payload, timeout=120)

    @task(1)
    def check_async_status(self):
        # Submit an async job first so there is a real task ID to poll.
        resp = self.client.post("/generate/async", json={"prompt_params": {"doc_type": "invoice"}})
        task_id = resp.json().get("task_id")
        if task_id:
            self.client.get(f"/generate/async/status/{task_id}")
```
#### **B.1.2 Stress Testing**
**Purpose**: Determine the system's breaking point
| Test Case ID | Test Name | Stress Condition | Metrics | Acceptance Criteria | Priority |
|--------------|-----------|------------------|---------|---------------------|----------|
| NFT-STRESS-001 | User overload | 200+ concurrent users | Max capacity | Identifies max users before failure | High |
| NFT-STRESS-002 | Memory stress | Generate 1000 docs without cleanup | Memory usage | OOM protection, graceful failure | High |
| NFT-STRESS-003 | CPU stress | Complex documents, no throttling | CPU utilization | System remains responsive | Medium |
| NFT-STRESS-004 | Disk stress | Fill 95% of disk space | I/O performance | Handles low disk gracefully | Medium |
| NFT-STRESS-005 | Network stress | Simulate slow network | Timeout handling | Appropriate timeouts, retries | Medium |
#### **B.1.3 Endurance Testing (Soak Testing)**
**Purpose**: Detect memory leaks and performance degradation over time
| Test Case ID | Test Name | Duration | Load | Metrics | Acceptance Criteria | Priority |
|--------------|-----------|----------|------|---------|---------------------|----------|
| NFT-ENDUR-001 | 24-hour test | 24 hours | 10 concurrent users | Memory, CPU over time | No memory leaks, stable performance | High |
| NFT-ENDUR-002 | 7-day test | 7 days | 5 concurrent users | All resources | System stable, no degradation | Medium |
| NFT-ENDUR-003 | Weekend load | 48 hours | Variable load | Error rate | <1% errors throughout | Medium |
#### **B.1.4 Scalability Testing**
**Purpose**: Verify horizontal and vertical scaling
| Test Case ID | Test Name | Scaling Type | Test Scenario | Acceptance Criteria | Priority |
|--------------|-----------|--------------|---------------|---------------------|----------|
| NFT-SCALE-001 | Horizontal scaling | Add workers | 1→5 workers, measure throughput | Linear throughput increase | High |
| NFT-SCALE-002 | Vertical scaling | Increase CPU/RAM | 2→8 cores, 4→16GB RAM | Performance improvement | Medium |
| NFT-SCALE-003 | Auto-scaling | Dynamic load | Trigger auto-scale rules | Scales up/down automatically | Medium |
| NFT-SCALE-004 | Database scaling | Database load | High concurrent DB ops | No DB bottleneck | High |
| NFT-SCALE-005 | Storage scaling | Large datasets | Generate 10,000 documents | Storage handles volume | Low |
#### **B.1.5 Benchmark Testing**
**Purpose**: Establish performance baselines
| Test Case ID | Component | Benchmark | Target | Priority |
|--------------|-----------|-----------|--------|----------|
| NFT-BENCH-001 | Seed download | 1 image (1MB) | <2 seconds | High |
| NFT-BENCH-002 | LLM call | 1 prompt (standard) | <30 seconds | Critical |
| NFT-BENCH-003 | PDF rendering | 1 A4 page | <3 seconds | High |
| NFT-BENCH-004 | Bbox extraction | 500 words | <2 seconds | Medium |
| NFT-BENCH-005 | Handwriting service | 10 words batch | <200 seconds | Critical |
| NFT-BENCH-006 | Visual element generation | 5 elements | <5 seconds | Medium |
| NFT-BENCH-007 | OCR processing | 1 A4 page (300 DPI) | <5 seconds | High |
| NFT-BENCH-008 | Msgpack export | 1 document | <2 seconds | Medium |
| NFT-BENCH-009 | Complete pipeline (minimal) | End-to-end | <60 seconds | Critical |
| NFT-BENCH-010 | Complete pipeline (full) | End-to-end with HW | <300 seconds | Critical |
---
### B.2 Security Testing
Purpose: Identify vulnerabilities and ensure data protection.
#### **B.2.1 Authentication & Authorization Testing**
| Test Case ID | Test Name | Security Control | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|------------------|---------------|------------------|----------|
| NFT-SEC-001 | API key validation | Authentication | Request without API key | HTTP 401 Unauthorized | Critical |
| NFT-SEC-002 | Invalid API key | Authentication | Request with wrong key | HTTP 401 Unauthorized | Critical |
| NFT-SEC-003 | Expired API key | Token expiration | Request with expired key | HTTP 401 with renewal info | High |
| NFT-SEC-004 | API key rotation | Key management | Rotate keys, test old key | Old key rejected | Medium |
| NFT-SEC-005 | Role-based access | Authorization | User tries admin endpoint | HTTP 403 Forbidden | High |
| NFT-SEC-006 | Resource ownership | Authorization | User accesses other's job | HTTP 403 Forbidden | High |
| NFT-SEC-007 | JWT validation | Token security | Tampered JWT token | Signature validation fails | High |
| NFT-SEC-008 | Session hijacking | Session security | Stolen session token | Token invalidated after detection | Medium |
| NFT-SEC-009 | Brute force protection | Rate limiting | 100 failed auth attempts | Account locked, IP blocked | High |
| NFT-SEC-010 | Multi-factor auth | MFA | Admin login without MFA | MFA required | Low |
#### **B.2.2 Input Validation & Injection Testing**
| Test Case ID | Test Name | Vulnerability | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|---------------|---------------|------------------|----------|
| NFT-SEC-011 | SQL injection | Injection | Inject SQL in parameters | Parameterized queries prevent injection | Critical |
| NFT-SEC-012 | XSS attack | Cross-site scripting | Inject `<script>` in doc_type | Input sanitized, script not executed | High |
| NFT-SEC-013 | Command injection | OS command injection | Inject shell commands | Commands not executed | Critical |
| NFT-SEC-014 | Path traversal | Directory traversal | `../../etc/passwd` in filename | Access denied | Critical |
| NFT-SEC-015 | SSRF attack | Server-side request forgery | seed_image URL to internal IP | Internal IPs blocked | High |
| NFT-SEC-016 | XXE attack | XML external entity | Upload XML with external entity | External entities disabled | Medium |
| NFT-SEC-017 | LLM prompt injection | Prompt manipulation | Inject ignore instructions | Prompt sandboxing prevents escape | High |
| NFT-SEC-018 | Buffer overflow | Memory safety | Send 10MB+ parameter | Request rejected, no crash | Medium |
| NFT-SEC-019 | Unicode attack | Unicode bypass | Unicode normalization tricks | Normalized before processing | Low |
| NFT-SEC-020 | Regex DoS | ReDoS | Complex regex in input | Timeout protection active | Medium |
#### **B.2.3 Data Protection Testing**
| Test Case ID | Test Name | Protection Mechanism | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|---------------------|---------------|------------------|----------|
| NFT-SEC-021 | Data encryption at rest | Storage encryption | Check stored files | Files encrypted on disk | High |
| NFT-SEC-022 | Data encryption in transit | TLS/HTTPS | Inspect network traffic | All traffic over HTTPS | Critical |
| NFT-SEC-023 | API key exposure | Secret management | Check logs, errors | API keys never logged | Critical |
| NFT-SEC-024 | PII handling | Data privacy | Generate docs with PII | PII not stored beyond retention | High |
| NFT-SEC-025 | Data sanitization | Data cleanup | Delete job after 7 days | All data removed | High |
| NFT-SEC-026 | Backup encryption | Backup security | Check backup files | Backups encrypted | Medium |
| NFT-SEC-027 | Secure headers | HTTP headers | Check response headers | Security headers present | High |
| NFT-SEC-028 | CORS policy | Cross-origin security | Request from unauthorized origin | CORS policy blocks request | High |
| NFT-SEC-029 | Cookie security | Cookie flags | Check cookie attributes | HttpOnly, Secure, SameSite set | Medium |
| NFT-SEC-030 | Sensitive data in URLs | URL security | Check for secrets in URLs | No sensitive data in query params | High |
#### **B.2.4 Dependency & Supply Chain Security**
| Test Case ID | Test Name | Security Aspect | Test Method | Expected Outcome | Priority |
|--------------|-----------|-----------------|-------------|------------------|----------|
| NFT-SEC-031 | Vulnerable dependencies | CVE scanning | Run `pip-audit` | No high/critical vulnerabilities | High |
| NFT-SEC-032 | Outdated packages | Package versions | Check `requirements.txt` | All packages recent (<6 months) | Medium |
| NFT-SEC-033 | Malicious packages | Supply chain | Verify package checksums | Checksums match official registry | High |
| NFT-SEC-034 | License compliance | Legal compliance | Check package licenses | All licenses compatible | Low |
| NFT-SEC-035 | Container security | Docker image | Scan with Trivy | No critical image vulnerabilities | High |
**Security Testing Tools**:
- **OWASP ZAP**: Automated security scanning
- **Burp Suite**: Manual penetration testing
- **pip-audit**: Python dependency vulnerability scanning
- **Trivy**: Container image scanning
- **Bandit**: Python code security linter
---
### B.3 Reliability Testing
Purpose: Verify system stability and fault tolerance.
#### **B.3.1 Fault Tolerance Testing**
| Test Case ID | Test Name | Fault Condition | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|-----------------|---------------|------------------|----------|
| NFT-REL-001 | Database failover | Primary DB failure | Kill primary DB instance | Failover to standby, no downtime | Critical |
| NFT-REL-002 | Worker crash recovery | Worker process crash | Kill worker mid-job | Job reassigned, completes | High |
| NFT-REL-003 | Network partition | Network split | Simulate network partition | System detects, retries | High |
| NFT-REL-004 | External API failure | Claude API down | LLM service unavailable | Graceful error, retry queue | Critical |
| NFT-REL-005 | Handwriting service failure | RunPod timeout | Service exceeds timeout | Exception raised, clear error | Critical |
| NFT-REL-006 | Disk full | No storage space | Fill disk to 100% | Rejects new jobs, alerts sent | High |
| NFT-REL-007 | Redis failure | Queue unavailable | Redis server down | Async jobs fail with clear error | High |
| NFT-REL-008 | Load balancer failure | LB goes down | Kill load balancer | Requests reach servers via backup | Medium |
| NFT-REL-009 | DNS resolution failure | DNS timeout | DNS server unreachable | Falls back to IP or cached DNS | Low |
| NFT-REL-010 | Partial service degradation | Some features down | VE prefabs missing | Skips VE, completes other features | Medium |
#### **B.3.2 Data Integrity Testing**
| Test Case ID | Test Name | Integrity Check | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|-----------------|---------------|------------------|----------|
| NFT-REL-011 | Transaction atomicity | Database transactions | Simulate crash mid-transaction | Either all or no changes applied | High |
| NFT-REL-012 | Data corruption detection | Checksum validation | Corrupt file on disk | Corruption detected, file rejected | High |
| NFT-REL-013 | Concurrent write safety | Race conditions | Multiple writes to same resource | Writes serialized by lock, or deterministic last-write-wins | High |
| NFT-REL-014 | Duplicate prevention | Idempotency | Submit same request twice | Duplicate detected, not processed | Medium |
| NFT-REL-015 | Backup restoration | Backup recovery | Restore from backup | Data fully restored, consistent | High |
#### **B.3.3 Recovery Testing**
| Test Case ID | Test Name | Recovery Scenario | Test Procedure | Expected Outcome | Priority |
|--------------|-----------|-------------------|----------------|------------------|----------|
| NFT-REL-016 | Crash recovery | Server crash | Kill server, restart | Server recovers, in-flight jobs resume | Critical |
| NFT-REL-017 | Database restore | DB corruption | Restore from backup | System operational with latest data | High |
| NFT-REL-018 | Disaster recovery | Complete site failure | Failover to DR site | Service restored within RTO (4 hours) | Critical |
| NFT-REL-019 | Job queue recovery | Redis crash | Redis restart with persistence | Queued jobs not lost | High |
| NFT-REL-020 | Config recovery | Bad config deployment | Deploy bad config | Rollback to previous config | Medium |
---
### B.4 Scalability Testing
Purpose: Verify system can handle growth in load and data.
#### **B.4.1 Capacity Testing**
| Test Case ID | Test Name | Capacity Metric | Test Scenario | Target Capacity | Priority |
|--------------|-----------|-----------------|---------------|-----------------|----------|
| NFT-SCAL-001 | Max concurrent users | User capacity | Gradually increase users | Support 100+ concurrent users | High |
| NFT-SCAL-002 | Max documents per hour | Throughput | Generate continuously | Process 500+ docs/hour | High |
| NFT-SCAL-003 | Max queue depth | Job queue | Enqueue 10,000 jobs | Queue handles all jobs | Medium |
| NFT-SCAL-004 | Max dataset size | Storage | Generate large dataset | Handle 1TB+ datasets | Low |
| NFT-SCAL-005 | Max file size | Upload limit | Upload large seed image | Accept up to 10MB images | Medium |
#### **B.4.2 Elasticity Testing**
| Test Case ID | Test Name | Scaling Behavior | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|------------------|---------------|------------------|----------|
| NFT-SCAL-006 | Scale-up | Add resources | Increase from 2→10 workers | Linear throughput increase | High |
| NFT-SCAL-007 | Scale-down | Remove resources | Decrease from 10→2 workers | Graceful job completion | High |
| NFT-SCAL-008 | Auto-scale up | Load increase | Load triggers scale-up | New instances launched | Medium |
| NFT-SCAL-009 | Auto-scale down | Load decrease | Low load triggers scale-down | Excess instances terminated | Medium |
| NFT-SCAL-010 | Burst scaling | Sudden spike | 0→100 requests instantly | Scale-up handles burst | High |
---
### B.5 Usability Testing
Purpose: Verify API ease of use and developer experience.
#### **B.5.1 API Documentation Testing**
| Test Case ID | Test Name | Documentation Aspect | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|---------------------|---------------|------------------|----------|
| NFT-USAB-001 | API docs completeness | All endpoints documented | Review /docs | All endpoints, params documented | High |
| NFT-USAB-002 | Example accuracy | Code examples | Test all code examples | Examples work without modification | High |
| NFT-USAB-003 | Error messages clarity | Error documentation | Check error responses | Errors have clear messages, codes | High |
| NFT-USAB-004 | OpenAPI spec validity | Swagger/OpenAPI | Validate spec | Spec passes OpenAPI validation | Medium |
| NFT-USAB-005 | Interactive docs | Try-it-out feature | Use /docs to test | Can test endpoints in browser | Medium |
#### **B.5.2 Developer Experience Testing**
| Test Case ID | Test Name | DX Aspect | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|-----------|---------------|------------------|----------|
| NFT-USAB-006 | SDK availability | Client libraries | Check for Python/JS SDKs | SDKs available, documented | Low |
| NFT-USAB-007 | Quick start guide | Getting started | Follow quick start | Working request in <10 minutes | High |
| NFT-USAB-008 | API versioning | Version management | Check version headers | Versions clearly indicated | Medium |
| NFT-USAB-009 | Changelog maintenance | Release notes | Review changelog | All changes documented | Low |
| NFT-USAB-010 | Deprecation notices | Breaking changes | Check deprecated features | Clear deprecation warnings | Medium |
---
### B.6 Compatibility Testing
Purpose: Verify system works across different environments.
#### **B.6.1 Browser Compatibility** (for API docs)
| Test Case ID | Browser | Version | Expected Outcome |
|--------------|---------|---------|------------------|
| NFT-COMPAT-001 | Chrome | Latest | /docs fully functional |
| NFT-COMPAT-002 | Firefox | Latest | /docs fully functional |
| NFT-COMPAT-003 | Safari | Latest | /docs fully functional |
| NFT-COMPAT-004 | Edge | Latest | /docs fully functional |
#### **B.6.2 Platform Compatibility**
| Test Case ID | Platform | Test Scenario | Expected Outcome | Priority |
|--------------|----------|---------------|------------------|----------|
| NFT-COMPAT-005 | Docker | Deploy in container | Runs without issues | Critical |
| NFT-COMPAT-006 | Railway | Deploy to Railway | Successful deployment | High |
| NFT-COMPAT-007 | AWS | Deploy to ECS/Lambda | Runs on AWS | Medium |
| NFT-COMPAT-008 | GCP | Deploy to Cloud Run | Runs on GCP | Low |
| NFT-COMPAT-009 | Azure | Deploy to App Service | Runs on Azure | Low |
#### **B.6.3 Python Version Compatibility**
| Test Case ID | Python Version | Test Scenario | Expected Outcome | Priority |
|--------------|----------------|---------------|------------------|----------|
| NFT-COMPAT-010 | Python 3.11 | Run full test suite | All tests pass | Critical |
| NFT-COMPAT-011 | Python 3.10 | Run full test suite | All tests pass | High |
| NFT-COMPAT-012 | Python 3.12 | Run full test suite | All tests pass | Medium |
---
### B.7 Maintainability Testing
Purpose: Verify system is easy to maintain and debug.
#### **B.7.1 Logging & Monitoring**
| Test Case ID | Test Name | Aspect | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|--------|---------------|------------------|----------|
| NFT-MAINT-001 | Log completeness | Logging | Check logs during generation | All stages logged | High |
| NFT-MAINT-002 | Log levels | Log filtering | Filter by ERROR, INFO, DEBUG | Correct levels used | Medium |
| NFT-MAINT-003 | Structured logging | Log format | Parse log entries | JSON-formatted, parseable | High |
| NFT-MAINT-004 | Error traceability | Error tracking | Trace error through logs | Request ID tracks full flow | High |
| NFT-MAINT-005 | Metrics collection | Monitoring | Check Prometheus metrics | Key metrics exported | High |
| NFT-MAINT-006 | Health checks | Monitoring | Call /health endpoint | Returns detailed status | Critical |
| NFT-MAINT-007 | Alert configuration | Alerting | Trigger alert condition | Alert fired, notification sent | Medium |
| NFT-MAINT-008 | Dashboard usability | Visualization | View Grafana dashboards | Clear, actionable insights | Medium |
#### **B.7.2 Code Quality**
| Test Case ID | Test Name | Quality Metric | Tool | Acceptance Criteria | Priority |
|--------------|-----------|----------------|------|---------------------|----------|
| NFT-MAINT-009 | Code coverage | Test coverage | pytest-cov | >80% coverage | High |
| NFT-MAINT-010 | Code complexity | Cyclomatic complexity | radon | CC <10 per function | Medium |
| NFT-MAINT-011 | Code duplication | DRY principle | pylint | <5% duplicated code | Low |
| NFT-MAINT-012 | Code style | PEP 8 compliance | flake8 | No style violations | Medium |
| NFT-MAINT-013 | Type hints | Type coverage | mypy | >90% type hints | Medium |
| NFT-MAINT-014 | Security linting | Vulnerability scan | bandit | No high-severity issues | High |
---
## Test Environment Setup
### Test Environments
| Environment | Purpose | Configuration | Access |
|-------------|---------|---------------|--------|
| **Local Dev** | Development testing | Local Docker Compose | Developers |
| **CI/CD** | Automated testing | GitHub Actions runners | Automated |
| **Staging** | Pre-production testing | Mirrors production | QA team |
| **Production** | Live system | Full infrastructure | Ops team |
### Test Data Management
**Seed Image Dataset**:
- **Source**: Curated test set of 50 diverse seed images
- **Location**: `tests/fixtures/seed_images/`
- **Categories**: Invoice samples, receipt samples, form samples, letter samples
- **Licensing**: Public domain or test-licensed images
**Test Parameters**:
```yaml
# tests/fixtures/test_params.yaml
test_cases:
  minimal:
    language: "english"
    doc_type: "invoice"
    num_solutions: 1
    enable_handwriting: false
    enable_visual_elements: false
  full_features:
    language: "english"
    doc_type: "medical_form"
    num_solutions: 2
    enable_handwriting: true
    handwriting_ratio: 0.3
    enable_visual_elements: true
    visual_element_types: ["logo", "signature", "barcode"]
    enable_ocr: true
    enable_dataset_export: true
```
**Mock Services**:
- **Mock Claude API**: Returns predefined HTML responses for testing
- **Mock RunPod API**: Returns test handwriting images, simulates delays
- **Mock Supabase**: In-memory database for testing
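With external clients injected as dependencies, the mocks above reduce to standard `unittest.mock` stubs. The `generate_html` stage and `complete` method below are hypothetical stand-ins for the real Claude integration; the pattern, not the names, is the point.

```python
from unittest.mock import MagicMock

def generate_html(llm_client, prompt: str) -> str:
    """Hypothetical pipeline stage that asks an injected LLM client for HTML."""
    response = llm_client.complete(prompt)
    if not response.strip().startswith("<"):
        raise ValueError("LLM did not return HTML")
    return response

# Unit test with a mocked client -- no network, deterministic output.
mock_llm = MagicMock()
mock_llm.complete.return_value = "<html><body>Invoice #001</body></html>"

html = generate_html(mock_llm, "Generate an invoice")
mock_llm.complete.assert_called_once_with("Generate an invoice")
```

A record/replay variant (capturing real Claude responses once, replaying them in CI) gives the same determinism with more realistic payloads.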
---
## Testing Tools & Frameworks
### Test Frameworks
| Tool | Purpose | Usage |
|------|---------|-------|
| **pytest** | Unit & integration testing | `pytest tests/` |
| **pytest-asyncio** | Async test support | Async function testing |
| **pytest-cov** | Code coverage | `pytest --cov=api` |
| **httpx** | HTTP client testing | API request mocking |
| **respx** | HTTP mock library | Mock external APIs |
| **pytest-mock** | Mocking framework | Mock functions, classes |
| **Faker** | Test data generation | Generate realistic data |
### Load Testing Tools
| Tool | Purpose | Usage |
|------|---------|-------|
| **Locust** | Load & stress testing | `locust -f locustfile.py` |
| **Apache JMeter** | Performance testing | GUI-based test scenarios |
| **k6** | Cloud-native load testing | Scripted load tests |
### Security Testing Tools
| Tool | Purpose | Usage |
|------|---------|-------|
| **OWASP ZAP** | Security scanning | Automated vulnerability scan |
| **Burp Suite** | Penetration testing | Manual security testing |
| **pip-audit** | Dependency scanning | `pip-audit -r requirements.txt` |
| **Bandit** | Code security linting | `bandit -r api/` |
| **Trivy** | Container scanning | `trivy image docgenie-api:latest` |
### Monitoring & Observability
| Tool | Purpose | Usage |
|------|---------|-------|
| **Prometheus** | Metrics collection | Scrape /metrics endpoint |
| **Grafana** | Metrics visualization | Dashboard creation |
| **ELK Stack** | Log aggregation | Centralized logging |
| **Sentry** | Error tracking | Automatic error reporting |
---
## Test Execution Plan
### Phase 1: Unit Testing (Week 1-2)
**Objective**: Achieve 80%+ code coverage
**Tasks**:
1. Write unit tests for all utility functions (`api/utils.py`)
2. Test all pipeline stages individually (Stages 01-19)
3. Mock external dependencies (Claude API, RunPod, Supabase)
4. Achieve minimum 80% code coverage
5. Set up CI/CD pipeline for automated testing
**Deliverables**:
- 120+ unit test cases passing
- Coverage report >80%
- CI/CD pipeline configured
### Phase 2: Integration Testing (Week 3)
**Objective**: Verify component interactions
**Tasks**:
1. Test pipeline stage integrations (01-03, 03-05, 07-09, etc.)
2. Test external service integrations (Claude, RunPod, Supabase)
3. Test database operations (CRUD, transactions)
4. Test API endpoint workflows
5. Test background worker integration
**Deliverables**:
- 50+ integration test cases passing
- All critical workflows tested
- Service mocks validated
### Phase 3: System Testing (Week 4)
**Objective**: End-to-end workflow validation
**Tasks**:
1. Test complete generation workflows (minimal, full features)
2. Test error handling scenarios
3. Test async processing workflows
4. Test data quality and accuracy
5. Test performance benchmarks
**Deliverables**:
- 50+ system test cases passing
- All user journeys tested
- Performance baselines established
### Phase 4: Non-Functional Testing (Week 5-6)
**Objective**: Verify performance, security, reliability
**Tasks**:
1. **Performance**: Load, stress, endurance, scalability tests
2. **Security**: Penetration testing, vulnerability scanning
3. **Reliability**: Fault tolerance, recovery testing
4. **Usability**: Documentation review, DX testing
**Deliverables**:
- Load test report (normal, peak, sustained)
- Security audit report
- Reliability test report
- Performance benchmarks
### Phase 5: Regression Testing (Ongoing)
**Objective**: Prevent defect reintroduction
**Tasks**:
1. Run full test suite on every commit (CI/CD)
2. Add tests for every bug fix
3. Update tests for new features
4. Maintain >80% code coverage
**Frequency**: Continuous (automated on every PR/commit)
---
## Success Criteria & Metrics
### Test Completion Criteria
| Criteria | Target | Critical |
|----------|--------|----------|
| Unit test coverage | >80% | Yes |
| Integration tests passing | 100% | Yes |
| System tests passing | 100% | Yes |
| Load test: Normal load | 0% errors | Yes |
| Load test: Peak load | <5% errors | Yes |
| Security: Critical vulnerabilities | 0 | Yes |
| Security: High vulnerabilities | <5 | Yes |
| Performance: Basic generation | <60s | Yes |
| Performance: Handwriting generation | <300s | Yes |
| Uptime SLA | >99.5% | No |
### Quality Metrics
**Code Quality**:
- Code coverage: >80%
- Cyclomatic complexity: <10
- Code duplication: <5%
- Type hint coverage: >90%
**Performance**:
- API response time (P95): <500ms
- Document generation (minimal): <60s
- Document generation (with handwriting): <300s
- Throughput: >500 docs/hour
**Reliability**:
- Uptime: >99.5%
- MTBF (Mean Time Between Failures): >720 hours (30 days)
- MTTR (Mean Time To Recover): <30 minutes
- Error rate: <1%
**Security**:
- Zero critical vulnerabilities
- <5 high-severity vulnerabilities
- Dependency update cadence: <30 days behind
---
## Risk Assessment
### High-Risk Areas
| Component | Risk Level | Mitigation Strategy | Priority |
|-----------|------------|---------------------|----------|
| Claude API integration | **HIGH** | Retry logic, fallback prompts, rate limiting | Critical |
| RunPod handwriting service | **HIGH** | Timeout handling, batch optimization, error raising | Critical |
| PDF rendering (Playwright) | **MEDIUM** | Headless browser stability, resource limits | High |
| OCR accuracy | **MEDIUM** | Multiple OCR engine options, confidence thresholds | High |
| Async job processing | **MEDIUM** | Worker health checks, job retry mechanisms | High |
| Database transactions | **MEDIUM** | ACID compliance, connection pooling | High |
| File storage | **LOW** | Disk space monitoring, cleanup policies | Medium |
### Test Risk Mitigation
| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| External API unavailable during tests | High | Medium | Use mocks, record/replay mode |
| Test data corruption | Medium | Low | Version control test fixtures |
| Test environment instability | High | Medium | Docker isolation, reproducible builds |
| Long test execution time | Low | High | Parallel execution, selective testing |
| Flaky tests | Medium | Medium | Retry logic, better assertions |
---
## Test Reporting
### Test Reports
**Daily Reports** (Automated):
- Test execution summary (pass/fail counts)
- Code coverage trends
- Failed test details
- Performance benchmark comparison
**Weekly Reports** (Manual):
- Test progress against plan
- New defects discovered
- Defect resolution rate
- Risk updates
**Release Reports** (Per Release):
- Complete test execution summary
- All test case results
- Performance test results
- Security scan results
- Known issues and limitations
### Defect Tracking
**Defect Workflow**:
1. **Report**: Tester creates defect in issue tracker
2. **Triage**: Team prioritizes defect (P0-Critical, P1-High, P2-Medium, P3-Low)
3. **Assign**: Developer assigned to fix
4. **Fix**: Developer implements fix
5. **Verify**: Tester verifies fix
6. **Close**: Defect closed, regression test added
**Defect Metrics**:
- Defect discovery rate
- Defect resolution rate
- Defect escape rate (to production)
- Mean time to resolve (MTTR)
---
## Continuous Improvement
### Test Optimization
**Quarterly Reviews**:
- Review test coverage (identify gaps)
- Remove obsolete tests
- Update test data
- Optimize test execution time
- Review test environment stability
**Automation Goals**:
- Automate 100% of unit tests
- Automate 90% of integration tests
- Automate 70% of system tests
- Automate 50% of non-functional tests
---
## Appendix
### Test Case Template
```markdown
## Test Case ID: [ID]
**Test Name**: [Descriptive name]
**Component**: [Module/Component under test]
**Test Type**: [Unit/Integration/System/Non-Functional]
**Priority**: [Critical/High/Medium/Low]
**Prerequisites**:
- [List any setup required]
**Test Steps**:
1. [Step 1]
2. [Step 2]
3. [Step 3]
**Test Data**:
- [Input data required]
**Expected Result**:
- [What should happen]
**Actual Result**:
- [What actually happened - filled during execution]
**Status**: [Pass/Fail/Blocked/Not Run]
**Notes**:
- [Any additional observations]
```
### Glossary
- **API**: Application Programming Interface
- **CI/CD**: Continuous Integration/Continuous Deployment
- **DPI**: Dots Per Inch
- **GT**: Ground Truth
- **HW**: Handwriting
- **KIE**: Key Information Extraction
- **LLM**: Large Language Model
- **MTBF**: Mean Time Between Failures
- **MTTR**: Mean Time To Recover
- **OCR**: Optical Character Recognition
- **P95**: 95th Percentile
- **SLA**: Service Level Agreement
- **VE**: Visual Element
---
**Document Control**:
- **Author**: DocGenie QA Team
- **Reviewers**: Development Team, Product Manager
- **Approval**: Project Lead
- **Next Review Date**: [3 months from approval]
---
**END OF DOCUMENT**