# Comprehensive Testing Plan & Test Cases
## DocGenie Synthetic Document Generation API

**Document Version**: 1.0  
**Date**: March 4, 2026  
**Project**: DocGenie - AI-Powered Synthetic Document Dataset Generator

---

## Table of Contents
1. [Testing Overview](#testing-overview)
2. [Functional Testing](#functional-testing)
   - [Unit Testing](#a1-unit-testing)
   - [Integration Testing](#a2-integration-testing)
   - [System Testing](#a3-system-testing)
3. [Non-Functional Testing](#non-functional-testing)
   - [Performance Testing](#b1-performance-testing)
   - [Security Testing](#security-testing)
   - [Reliability Testing](#reliability-testing)
   - [Scalability Testing](#scalability-testing)
   - [Usability Testing](#usability-testing)
4. [Test Environment Setup](#test-environment-setup)
5. [Testing Tools & Frameworks](#testing-tools--frameworks)
6. [Test Execution Plan](#test-execution-plan)
7. [Success Criteria & Metrics](#success-criteria--metrics)
8. [Risk Assessment](#risk-assessment)

---

## Testing Overview

### Purpose
This document outlines the testing strategy for the DocGenie API. It covers the quality, reliability, and performance of the synthetic document generation system across all 19 pipeline stages.

### Scope
- API endpoints testing (`/generate`, `/generate/pdf`, `/generate/async`)
- 19-stage pipeline validation
- External service integrations (Claude API, RunPod handwriting service)
- Database operations (Supabase)
- Background job processing (Redis Queue)
- Error handling and recovery mechanisms

### Testing Approach
- **Test-Driven Development (TDD)**: Write tests before implementation where applicable
- **Continuous Integration**: Automated test execution on every commit
- **Coverage Target**: Minimum 80% code coverage for critical paths
- **Risk-Based Testing**: Prioritize high-risk components (LLM integration, handwriting service)

---

## Functional Testing

### A.1 Unit Testing

Unit tests verify individual functions and methods in isolation. Target: 85% code coverage.

#### **A.1.1 Seed Image Processing (Stage 01)**

**Module**: `api/utils.py::download_seed_images()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-SEED-001 | Download valid image URL | Valid HTTPS URL (JPEG) | Base64-encoded image string | High |
| UT-SEED-002 | Download PNG format | Valid PNG URL | Base64-encoded PNG | High |
| UT-SEED-003 | Handle 503 service unavailable | URL returning 503 | Retry up to 3 times, eventual success | Critical |
| UT-SEED-004 | Handle 502 bad gateway | URL returning 502 | Retry with exponential backoff | High |
| UT-SEED-005 | Handle 404 not found | Invalid URL | Raise HTTPException(400) | High |
| UT-SEED-006 | Handle connection timeout | Slow/unresponsive server | Retry then raise exception | Medium |
| UT-SEED-007 | Validate image format | Non-image URL (HTML) | Raise validation error | Medium |
| UT-SEED-008 | Handle oversized images | >10MB image | Process or reject gracefully | Low |
| UT-SEED-009 | Test retry backoff timing | Mock 503 responses | Delays: 2s, 4s, 8s | Medium |
| UT-SEED-010 | Test max retries exhausted | Persistent 503 errors | Raise exception after 3 attempts | High |

**Test Implementation**:
```python
# test_seed_download.py
from unittest.mock import AsyncMock, Mock, patch

import httpx
import pytest

from api.utils import download_seed_images


def make_503_response():
    """Mock response whose raise_for_status() raises like a real 503."""
    error = httpx.HTTPStatusError("503", request=Mock(), response=Mock(status_code=503))
    return Mock(status_code=503, raise_for_status=Mock(side_effect=error))


@pytest.mark.asyncio
async def test_download_valid_image():
    url = "https://example.com/test.jpg"
    with patch('httpx.AsyncClient') as mock_client:
        mock_response = Mock()
        mock_response.content = b'\xff\xd8\xff\xe0'  # JPEG header
        # client.get() is awaited, so it has to be an AsyncMock
        mock_get = AsyncMock(return_value=mock_response)
        mock_client.return_value.__aenter__.return_value.get = mock_get

        result = await download_seed_images([url])
        assert len(result) == 1
        assert isinstance(result[0], str)  # base64 string


@pytest.mark.asyncio
async def test_download_503_retry():
    url = "https://example.com/test.jpg"
    with patch('httpx.AsyncClient') as mock_client:
        # First two calls: 503, third call: success
        responses = [
            make_503_response(),
            make_503_response(),
            Mock(content=b'\xff\xd8\xff\xe0', raise_for_status=Mock()),
        ]
        mock_get = AsyncMock(side_effect=responses)
        mock_client.return_value.__aenter__.return_value.get = mock_get

        # Note: real backoff delays apply unless the sleep call is patched too
        result = await download_seed_images([url])
        assert len(result) == 1
        assert mock_get.call_count == 3
```

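The backoff timing in UT-SEED-009 can be checked without real waiting if the sleep function is injectable. The following is a minimal sketch, not the actual `download_seed_images()` logic: `retry_with_backoff` and `run_always_failing` are hypothetical names, and it assumes that "3 retries" means three retries after the first attempt, which yields the delays 2s, 4s, 8s.

```python
import asyncio


async def retry_with_backoff(operation, max_retries=3, base_delay=2.0, sleep=asyncio.sleep):
    """Await operation(); after each failure wait base_delay * 2**n, then retry.

    Hypothetical helper: max_retries counts retries after the first attempt,
    so persistent failures produce max_retries waits before the final error.
    """
    for attempt in range(max_retries + 1):
        try:
            return await operation()
        except ConnectionError:
            if attempt == max_retries:
                raise  # retries exhausted (compare UT-SEED-010)
            await sleep(base_delay * 2 ** attempt)


def run_always_failing():
    """Drive the helper with a fake sleep and return the recorded delays."""
    delays = []

    async def fake_sleep(seconds):
        delays.append(seconds)  # record the delay instead of waiting

    async def always_fails():
        raise ConnectionError("simulated 503")

    try:
        asyncio.run(retry_with_backoff(always_fails, sleep=fake_sleep))
    except ConnectionError:
        pass
    return delays
```

Injecting `sleep` keeps the timing test fast and deterministic; the same idea applies if the real code sleeps through `asyncio.sleep` and the test patches it.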
#### **A.1.2 HTML Processing (Stage 03)**

**Module**: `api/utils.py::extract_html_documents_from_response()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-HTML-001 | Extract single HTML | LLM response with 1 HTML | List with 1 HTML document | High |
| UT-HTML-002 | Extract multiple HTMLs | Response with 3 HTMLs | List with 3 documents | High |
| UT-HTML-003 | Extract ground truth | HTML with `<script id="GT">` | GT JSON extracted, script removed | Critical |
| UT-HTML-004 | Handle malformed HTML | Invalid HTML tags | Parse with BeautifulSoup recovery | Medium |
| UT-HTML-005 | Handle missing DOCTYPE | HTML without DOCTYPE | Add DOCTYPE or flag error | Low |
| UT-HTML-006 | Validate CSS presence | HTML without `<style>` | Raise validation error | High |
| UT-HTML-007 | Extract handwriting markers | HTML with `class="handwritten"` | Identify 5 handwriting elements | High |
| UT-HTML-008 | Extract visual elements | HTML with `data-placeholder` | Identify 3 visual elements | High |
| UT-HTML-009 | Handle empty response | Empty string from LLM | Return empty list | Medium |
| UT-HTML-010 | Prettify minified HTML | Single-line HTML | Multi-line formatted HTML | Low |

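UT-HTML-003 expects the ground truth to be pulled out of the `<script id="GT">` element and the script to be removed from the HTML. A minimal sketch of that behaviour follows, using only the standard library. It assumes the script body is plain JSON, and `split_ground_truth` is a hypothetical name, not the function in `api/utils.py`.

```python
import json
import re

# Matches <script ... id="GT" ...> ... </script>, attributes in any order.
GT_SCRIPT_RE = re.compile(
    r'<script\b[^>]*\bid=["\']GT["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def split_ground_truth(html):
    """Return (html_without_gt_script, gt_dict); gt_dict is None if absent."""
    match = GT_SCRIPT_RE.search(html)
    if match is None:
        return html, None
    ground_truth = json.loads(match.group(1))  # assumes a JSON payload
    cleaned = html[:match.start()] + html[match.end():]
    return cleaned, ground_truth
```

A regex is enough for a well-formed script tag; malformed markup (UT-HTML-004) would more likely need a tolerant parser such as BeautifulSoup.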
#### **A.1.3 PDF Rendering (Stage 04)**

**Module**: `api/utils.py::render_html_to_pdf()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-PDF-001 | Render A4 document | HTML with A4 page size | PDF 210×297mm | High |
| UT-PDF-002 | Render Letter size | HTML with Letter page | PDF 215.9×279.4mm | Medium |
| UT-PDF-003 | Extract geometries | HTML with handwriting | Geometries JSON with rects | Critical |
| UT-PDF-004 | Handle custom fonts | HTML with @font-face | PDF with embedded fonts | Low |
| UT-PDF-005 | Preserve CSS styling | HTML with colors/borders | PDF matches visual style | Medium |
| UT-PDF-006 | Handle images in HTML | HTML with `<img>` tags | Images embedded in PDF | Low |
| UT-PDF-007 | Extract text coordinates | HTML with paragraphs | Accurate bbox coordinates | High |
| UT-PDF-008 | Handle landscape orientation | HTML with landscape CSS | PDF in landscape mode | Low |
| UT-PDF-009 | Validate page dimensions | Various page sizes | Dimensions match CSS @page | High |
| UT-PDF-010 | Handle Playwright errors | Browser crash scenario | Retry or graceful failure | Medium |

#### **A.1.4 Bbox Extraction (Stage 05)**

**Module**: `api/utils.py::extract_bboxes_from_rendered_pdf()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-BBOX-001 | Extract word bboxes | Standard PDF | List of word-level bboxes | Critical |
| UT-BBOX-002 | Extract char bboxes | Same PDF | List of char-level bboxes | High |
| UT-BBOX-003 | Handle multi-line text | PDF with paragraphs | Correct block/line grouping | High |
| UT-BBOX-004 | Filter whitespace | PDF with spaces/tabs | No whitespace-only bboxes | Medium |
| UT-BBOX-005 | Handle special characters | PDF with ©, ®, ™ | Characters properly extracted | Medium |
| UT-BBOX-006 | Handle non-Latin scripts | PDF with Chinese/Arabic | Correct unicode extraction | Low |
| UT-BBOX-007 | Validate coordinates | Extracted bboxes | All coords within page bounds | High |
| UT-BBOX-008 | Handle empty PDF | PDF with no text | Return empty list | Low |
| UT-BBOX-009 | Handle rotated text | PDF with rotation | Bboxes account for rotation | Low |
| UT-BBOX-010 | Parse bbox strings | "0_0_0 Hello 10 20 50 30" | OCRBox object with correct fields | High |

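UT-BBOX-010 parses a line such as `"0_0_0 Hello 10 20 50 30"` into an `OCRBox`. The sketch below shows one way this could work. The field layout (ID, text, then `x0 y0 x1 y1`) is an assumption read off the example string, and this `OCRBox` dataclass is a stand-in for the project's own class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OCRBox:
    """Stand-in for the project's OCRBox; field names are assumptions."""
    box_id: str
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def parse_bbox_line(line):
    """Parse '<id> <text> <x0> <y0> <x1> <y1>' into an OCRBox."""
    box_id, rest = line.strip().split(" ", 1)
    # Split from the right so text containing spaces stays in one piece.
    text, x0, y0, x1, y1 = rest.rsplit(" ", 4)
    return OCRBox(box_id, text, float(x0), float(y0), float(x1), float(y1))
```

Splitting the four coordinates from the right means a multi-word text field does not shift the numeric columns.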
#### **A.1.5 Handwriting Region Extraction (Stage 07)**

**Module**: `api/utils.py::process_stage3_complete()` - handwriting section

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-HW-001 | Filter by handwriting_ratio | 10 regions, ratio=0.3 | ~3 regions selected | Critical |
| UT-HW-002 | Parse author IDs | `class="handwritten author1"` | author_id="author1" | High |
| UT-HW-003 | Match to word bboxes | Geometry + bboxes | Correct bbox mapping | Critical |
| UT-HW-004 | Handle signature class | `class="handwritten signature"` | is_signature=True | Medium |
| UT-HW-005 | DPI coordinate conversion | Browser coords (96 DPI) | PDF coords (72 DPI) with 0.75 scale | High |
| UT-HW-006 | Handle overlapping regions | 2 regions, same text | Prevent duplicate bbox usage | Medium |
| UT-HW-007 | Validate rect boundaries | Geometries with rect | Check bboxes within rect threshold | High |
| UT-HW-008 | Test seed reproducibility | Same seed, same input | Identical region selection | High |
| UT-HW-009 | Handle zero ratio | ratio=0.0 | No regions selected | Medium |
| UT-HW-010 | Handle full ratio | ratio=1.0 | All regions selected | Medium |

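Two of the cases above are pure arithmetic and easy to pin down in code: the 96 DPI to 72 DPI conversion (UT-HW-005) and seeded region selection (UT-HW-001, UT-HW-008 to UT-HW-010). This is a hedged sketch with hypothetical function names. It assumes the ratio is applied as a rounded count; the real pipeline may instead sample each region independently.

```python
import random

BROWSER_DPI = 96
PDF_DPI = 72
SCALE = PDF_DPI / BROWSER_DPI  # 0.75


def browser_to_pdf(rect):
    """Scale an (x0, y0, x1, y1) rect from browser pixels to PDF points."""
    return tuple(value * SCALE for value in rect)


def select_regions(regions, ratio, seed):
    """Pick round(len(regions) * ratio) regions, reproducibly for a seed."""
    rng = random.Random(seed)  # private RNG, global random state untouched
    count = round(len(regions) * ratio)
    chosen = sorted(rng.sample(range(len(regions)), count))
    return [regions[i] for i in chosen]  # keep document order
```

Using a dedicated `random.Random(seed)` instance avoids interference from other code that touches the global random state, which is what makes the reproducibility test meaningful.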
#### **A.1.6 Handwriting Service Integration**

**Module**: `api/utils.py::call_handwriting_service_batch()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-HWSVC-001 | Batch request format | 10 texts with metadata | Correct RunPod JSON format | Critical |
| UT-HWSVC-002 | Handle sync response | Immediate completion | Parse output.images[] | High |
| UT-HWSVC-003 | Handle IN_PROGRESS | Delayed completion | Poll status endpoint | Critical |
| UT-HWSVC-004 | Status polling timeout | Job exceeds 30 polls | Raise timeout exception | High |
| UT-HWSVC-005 | Handle FAILED status | RunPod job failure | Raise exception with error | High |
| UT-HWSVC-006 | Parse image results | Batch response | Map hw_id to image_base64 | Critical |
| UT-HWSVC-007 | Calculate dynamic timeout | 50 texts | Timeout = 50×20+30 = 1030s | Medium |
| UT-HWSVC-008 | Handle network errors | Connection timeout | Retry up to max_retries | High |
| UT-HWSVC-009 | Validate authorization | Configured API key | Request includes Bearer token | Medium |
| UT-HWSVC-010 | Test incremental polling backoff | Status polling | Delays: 5s, 6s, 7s, ... capped at 10s | Low |

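The timeout formula in UT-HWSVC-007 and the polling delays in UT-HWSVC-010 can be written down directly. The sketch below uses hypothetical helper names and takes its constants from the table: 20s per text plus 30s overhead, delays starting at 5s and growing by 1s up to a 10s cap, and at most 30 polls (UT-HWSVC-004).

```python
def batch_timeout_seconds(num_texts, per_text=20, overhead=30):
    """Dynamic timeout: per_text seconds for each text plus fixed overhead."""
    return num_texts * per_text + overhead


def poll_delays(max_polls=30, initial=5, step=1, cap=10):
    """Yield the wait before each status poll: 5, 6, 7, ... capped at 10."""
    for attempt in range(max_polls):
        yield min(initial + attempt * step, cap)
```

For example, `batch_timeout_seconds(50)` gives the 1030s quoted in the table, and the first delays from `poll_delays()` are 5, 6, 7, 8, 9, 10, 10.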
#### **A.1.7 Visual Element Generation (Stage 10)**

**Module**: `api/utils.py::generate_visual_element_images()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-VE-001 | Select logo prefab | type="logo" | Random logo from prefabs/ | High |
| UT-VE-002 | Select photo prefab | type="photo" | Random photo image | High |
| UT-VE-003 | Generate barcode | type="barcode" | EAN-13 barcode image | Medium |
| UT-VE-004 | Generate QR code | type="qr_code", content="URL" | QR code image | Medium |
| UT-VE-005 | Test seed reproducibility | Same seed, same type | Identical prefab selection | High |
| UT-VE-006 | Handle missing prefabs | type with no files | Fallback or error | Medium |
| UT-VE-007 | Load SVG prefabs | SVG logo file | Convert to PNG | Low |
| UT-VE-008 | Filter by requested types | types=["logo","signature"] | Only matching types generated | High |
| UT-VE-009 | Normalize type synonyms | "chart" → "figure" | Consistent type mapping | Medium |
| UT-VE-010 | Return base64 encoding | All image types | Valid base64 strings | High |

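Type normalization (UT-VE-009) and filtering by requested types (UT-VE-008) interact: a request for `"figure"` should also match an element tagged `"chart"`. A small sketch under stated assumptions follows. Only the `chart` to `figure` mapping comes from the table; the function names and the element dict shape are hypothetical.

```python
# Only "chart" -> "figure" is taken from UT-VE-009; extend as needed.
TYPE_SYNONYMS = {"chart": "figure"}


def normalize_type(raw_type):
    """Lower-case a visual element type and map synonyms to one name."""
    key = raw_type.strip().lower()
    return TYPE_SYNONYMS.get(key, key)


def filter_elements(elements, requested_types):
    """Keep elements whose normalized type is among the requested types."""
    wanted = {normalize_type(t) for t in requested_types}
    return [e for e in elements if normalize_type(e["type"]) in wanted]
```

Normalizing both sides of the comparison keeps the filter independent of which synonym the caller or the LLM output happens to use.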
#### **A.1.8 PDF Modification (Stages 12-13)**

**Module**: `api/utils.py::process_stage3_complete()` - insertion sections

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-PDFMOD-001 | Whiteout text regions | 5 word bboxes | White rectangles drawn | High |
| UT-PDFMOD-002 | Insert handwriting image | Image + bbox | Image at correct position | Critical |
| UT-PDFMOD-003 | Apply random offsets | Word bbox | Position offset within limits | Medium |
| UT-PDFMOD-004 | Resize with aspect ratio | Wide/tall images | Scaled to fit bbox | High |
| UT-PDFMOD-005 | Insert visual element | Logo + rect | Centered in bbox | High |
| UT-PDFMOD-006 | Handle rotation | Element with rotation=45 | Rotated image insertion | Low |
| UT-PDFMOD-007 | Save intermediate PDF | After handwriting | `_with_handwriting.pdf` created | Medium |
| UT-PDFMOD-008 | Save final PDF | After visual elements | `_final.pdf` created | High |
| UT-PDFMOD-009 | Scale factor application | 3x upscale | High-res image quality | Medium |
| UT-PDFMOD-010 | Handle insertion errors | Invalid image data | Log error, continue | Medium |

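UT-PDFMOD-004 and UT-PDFMOD-005 both depend on fitting an image into a bbox without distorting it. The geometry can be sketched as below. `fit_and_center` is a hypothetical helper, and it assumes the target is an axis-aligned `(x0, y0, x1, y1)` rectangle.

```python
def fit_and_center(img_w, img_h, bbox):
    """Scale an image to fit bbox, keep aspect ratio, center the result.

    Returns the placement rectangle (x0, y0, x1, y1) inside bbox.
    """
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    x0, y0, x1, y1 = bbox
    box_w, box_h = x1 - x0, y1 - y0
    scale = min(box_w / img_w, box_h / img_h)  # the tighter side limits the fit
    new_w, new_h = img_w * scale, img_h * scale
    left = x0 + (box_w - new_w) / 2
    top = y0 + (box_h - new_h) / 2
    return (left, top, left + new_w, top + new_h)
```

A wide 200×100 image placed into a 100×100 box is scaled by 0.5 and lands at `(0.0, 25.0, 100.0, 75.0)`, leaving equal margins above and below.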
#### **A.1.9 OCR Processing (Stage 15)**

**Module**: `api/utils.py::run_paddle_ocr()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-OCR-001 | OCR English text | English document image | Accurate word recognition | Critical |
| UT-OCR-002 | OCR with handwriting | Mixed typed/handwritten | Both text types detected | High |
| UT-OCR-003 | Extract word bboxes | Document image | List of word-level bboxes | Critical |
| UT-OCR-004 | Calculate confidence | OCR results | Confidence score per word | High |
| UT-OCR-005 | Handle low quality | Blurry/noisy image | Reasonable accuracy (>70%) | Medium |
| UT-OCR-006 | Handle rotated text | 90° rotated document | Correct orientation detection | Low |
| UT-OCR-007 | Multi-language support | Document with German text | lang="de" parameter works | Medium |
| UT-OCR-008 | Handle empty image | Blank white image | Empty results list | Low |
| UT-OCR-009 | DPI configuration | Various DPI settings | Consistent accuracy | Medium |
| UT-OCR-010 | Return image dimensions | Any image | width, height in pixels | High |

#### **A.1.10 Bbox Normalization (Stage 16)**

**Module**: `api/utils.py::normalize_bboxes()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-NORM-001 | Normalize to [0,1] | Pixel bboxes, image dims | Normalized coordinates | Critical |
| UT-NORM-002 | Handle out-of-bounds | x1 > image_width | Clipped to [0, 1] | High |
| UT-NORM-003 | Preserve text data | Bboxes with text field | Text preserved in output | High |
| UT-NORM-004 | Create segment bboxes | Word-level bboxes | Aggregated segment bboxes | Medium |
| UT-NORM-005 | Handle zero dimensions | Image with width=0 | Raise validation error | Low |
| UT-NORM-006 | Round to precision | Float coordinates | 6 decimal places | Low |
| UT-NORM-007 | Maintain bbox order | Ordered input list | Same order in output | Medium |
| UT-NORM-008 | Handle negative coords | bbox with x0=-5 | Clipped to 0 | Medium |
| UT-NORM-009 | Validate bbox format | Various input formats | Consistent output schema | High |
| UT-NORM-010 | Handle empty list | No bboxes | Return empty list | Low |

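The normalization cases above (clipping, rounding to 6 decimals, rejecting zero dimensions) fit in a few lines. This is an illustrative sketch rather than the body of `normalize_bboxes()`. It handles one bbox at a time and assumes an `(x0, y0, x1, y1)` tuple in pixels.

```python
def normalize_bbox(bbox, width, height, precision=6):
    """Map a pixel bbox to [0, 1] coordinates, clipped and rounded."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")  # UT-NORM-005
    divisors = (width, height, width, height)
    normalized = []
    for value, divisor in zip(bbox, divisors):
        clipped = min(max(value / divisor, 0.0), 1.0)  # UT-NORM-002, UT-NORM-008
        normalized.append(round(clipped, precision))   # UT-NORM-006
    return tuple(normalized)
```

Clipping after division handles negative and out-of-bounds coordinates with the same expression.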
#### **A.1.11 Dataset Export (Stage 19)**

**Module**: `api/utils.py::export_to_msgpack()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-EXPORT-001 | Create msgpack file | Complete document data | Valid .msgpack file | Critical |
| UT-EXPORT-002 | Encode image bytes | PNG image | Binary image in msgpack | High |
| UT-EXPORT-003 | Store normalized bboxes | Normalized coordinates | Bboxes in [0,1] range | High |
| UT-EXPORT-004 | Store ground truth | GT JSON | GT dict in msgpack | High |
| UT-EXPORT-005 | Store metadata | Document metadata | Metadata dict in msgpack | Medium |
| UT-EXPORT-006 | Validate msgpack format | Generated file | Readable by msgpack.load() | Critical |
| UT-EXPORT-007 | Handle large files | 10MB+ image | Compression applied | Low |
| UT-EXPORT-008 | Store words list | OCR words | Ordered word list | High |
| UT-EXPORT-009 | Handle missing fields | Partial data | Fill with null/defaults | Medium |
| UT-EXPORT-010 | Return file path | Export operation | Absolute path to .msgpack | Medium |

#### **A.1.12 Validation Functions**

**Module**: `api/utils.py::validate_*()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-VAL-001 | Validate HTML structure | Valid HTML5 | (True, None) | High |
| UT-VAL-002 | Detect missing DOCTYPE | HTML without DOCTYPE | (False, "Missing DOCTYPE") | Medium |
| UT-VAL-003 | Detect missing CSS | HTML without `<style>` | (False, "Missing CSS") | High |
| UT-VAL-004 | Validate PDF file | Valid PDF | (True, None) | High |
| UT-VAL-005 | Detect corrupt PDF | Truncated PDF file | (False, "Corrupt PDF") | High |
| UT-VAL-006 | Validate bbox count | 100 bboxes, min=50 | (True, None) | Medium |
| UT-VAL-007 | Detect insufficient bboxes | 10 bboxes, min=50 | (False, "Insufficient bboxes") | Medium |
| UT-VAL-008 | Validate bbox coordinates | Valid bboxes | (True, None) | High |
| UT-VAL-009 | Detect invalid coordinates | x0 > x1 | (False, "Invalid bbox") | High |
| UT-VAL-010 | Validate page count | Multi-page PDF | (False, "Expected 1 page") | Medium |

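The validators in this table share an `(ok, error_message)` return convention. Two of them are sketched below to make that contract concrete. The function names are hypothetical stand-ins for the `validate_*()` family, and the error strings are copied from UT-VAL-007 and UT-VAL-009.

```python
def validate_bbox(bbox):
    """Return (True, None) for a well-ordered bbox, else (False, reason)."""
    x0, y0, x1, y1 = bbox
    if x0 > x1 or y0 > y1:
        return (False, "Invalid bbox")
    return (True, None)


def validate_bbox_count(bboxes, minimum):
    """Return (True, None) if at least `minimum` bboxes are present."""
    if len(bboxes) < minimum:
        return (False, "Insufficient bboxes")
    return (True, None)
```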
**Total Unit Tests**: 120+ test cases

---

### A.2 Integration Testing

Integration tests verify interactions between multiple components. Target: Complete workflow coverage.

#### **A.2.1 Pipeline Stage Integration**

**Purpose**: Verify data flow between consecutive pipeline stages

| Test Case ID | Test Name | Components | Test Scenario | Priority |
|--------------|-----------|------------|---------------|----------|
| IT-PIPE-001 | Stages 01-03 integration | Seed download → LLM → HTML extraction | Download seeds, call LLM, extract HTML successfully | Critical |
| IT-PIPE-002 | Stages 03-05 integration | HTML extraction → PDF render → Bbox extraction | Clean HTML renders to PDF, bboxes extracted | Critical |
| IT-PIPE-003 | Stages 07-09 integration | HW extraction → Service call | HW regions trigger service batch request | Critical |
| IT-PIPE-004 | Stages 09-12 integration | HW generation → Insertion | Generated images inserted at correct positions | Critical |
| IT-PIPE-005 | Stages 14-15 integration | Image render → OCR | Final image passed to OCR successfully | High |
| IT-PIPE-006 | Stages 15-16 integration | OCR → Normalization | OCR bboxes normalized with correct dimensions | High |
| IT-PIPE-007 | Stages 07-13 complete | Full Stage 3 | Handwriting + visual elements end-to-end | Critical |
| IT-PIPE-008 | Stages 14-19 complete | Full Stages 4-5 | OCR → export complete workflow | High |
| IT-PIPE-009 | Stages 01-19 minimal | End-to-end minimal | No handwriting/VE, basic generation | Critical |
| IT-PIPE-010 | Stages 01-19 full | End-to-end full features | All features enabled, complete dataset | Critical |

#### **A.2.2 External Service Integration**

**Purpose**: Verify interactions with external APIs and services

| Test Case ID | Test Name | Services | Test Scenario | Priority |
|--------------|-----------|----------|---------------|----------|
| IT-EXT-001 | Claude API integration | Claude Messages API | Send prompt, receive valid response | Critical |
| IT-EXT-002 | Claude error handling | Claude API | Handle rate limits (429) gracefully | High |
| IT-EXT-003 | Claude retry logic | Claude API | Automatic retry on transient errors | High |
| IT-EXT-004 | RunPod sync integration | RunPod /runsync | Send batch, receive images | Critical |
| IT-EXT-005 | RunPod async integration | RunPod /run + status | Queue job, poll until completion | High |
| IT-EXT-006 | RunPod auth | RunPod API | Bearer token authentication works | Medium |
| IT-EXT-007 | Supabase storage | Supabase storage API | Upload/download seed images | Medium |
| IT-EXT-008 | Supabase database | Supabase DB | Store generation metadata | Medium |
| IT-EXT-009 | Redis Queue | RQ worker | Enqueue async job, process in background | High |
| IT-EXT-010 | Google Drive | Drive API (optional) | Export to Google Drive if configured | Low |

#### **A.2.3 Database Operations**

**Purpose**: Verify database interactions (Supabase)

| Test Case ID | Test Name | Operations | Test Scenario | Priority |
|--------------|-----------|------------|---------------|----------|
| IT-DB-001 | Insert generation record | INSERT | New generation logged in DB | High |
| IT-DB-002 | Update generation status | UPDATE | Status changes reflected | High |
| IT-DB-003 | Query by task ID | SELECT | Retrieve generation by ID | High |
| IT-DB-004 | Store metadata | INSERT | Complete metadata stored | Medium |
| IT-DB-005 | Handle connection errors | Network failure | Retry or graceful degradation | High |
| IT-DB-006 | Transaction rollback | Error mid-transaction | Data consistency maintained | Medium |
| IT-DB-007 | Concurrent updates | Multiple workers | No race conditions | Medium |
| IT-DB-008 | Pagination | Large result sets | Efficient pagination | Low |
| IT-DB-009 | Search functionality | Full-text search | Search by doc_type, language | Low |
| IT-DB-010 | Data retention | Cleanup old data | Archive/delete after N days | Low |

#### **A.2.4 API Endpoint Integration**

**Purpose**: Test complete request/response cycles through endpoints

| Test Case ID | Test Name | Endpoint | Test Scenario | Priority |
|--------------|-----------|----------|---------------|----------|
| IT-API-001 | GET /health | Health check | Returns 200 with system status | Critical |
| IT-API-002 | POST /generate | Legacy endpoint | Returns JSON with complete data | High |
| IT-API-003 | POST /generate/pdf | Sync PDF endpoint | Returns ZIP file download | Critical |
| IT-API-004 | POST /generate/async | Async endpoint | Returns task ID | Critical |
| IT-API-005 | GET /generate/async/status/{id} | Status check | Returns current job status | Critical |
| IT-API-006 | GET /generate/async/result/{id} | Result download | Returns ZIP when complete | High |
| IT-API-007 | Request validation | All endpoints | Invalid params rejected with 4xx (422 for schema validation, per ST-ERR-004) | High |
| IT-API-008 | Authentication | Protected endpoints | Requires valid API key | High |
| IT-API-009 | Rate limiting | All endpoints | Enforces rate limits | Medium |
| IT-API-010 | CORS headers | All endpoints | Correct CORS configuration | Medium |

#### **A.2.5 Background Worker Integration**

**Purpose**: Test async job processing via Redis Queue

| Test Case ID | Test Name | Components | Test Scenario | Priority |
|--------------|-----------|------------|---------------|----------|
| IT-WORKER-001 | Job enqueue | API → RQ | Job added to queue successfully | Critical |
| IT-WORKER-002 | Job processing | Worker → Pipeline | Worker picks up and processes job | Critical |
| IT-WORKER-003 | Job status updates | Worker → DB | Status updated throughout processing | High |
| IT-WORKER-004 | Job failure handling | Worker error | Failed job logged, error reported | High |
| IT-WORKER-005 | Job retry | Transient failure | Failed job retried up to max attempts | High |
| IT-WORKER-006 | Job timeout | Long-running job | Timeout enforced, job killed | Medium |
| IT-WORKER-007 | Result storage | Worker → Storage | Results saved to correct location | High |
| IT-WORKER-008 | Queue priority | Multiple jobs | High priority jobs processed first | Low |
| IT-WORKER-009 | Worker scaling | Multiple workers | Jobs distributed across workers | Medium |
| IT-WORKER-010 | Worker health | Worker crash | Replaced automatically, jobs reassigned | High |

**Total Integration Tests**: 50+ test cases

---

### A.3 System Testing

System tests verify end-to-end workflows from the user's perspective. Target: All user journeys covered.

#### **A.3.1 Complete Generation Workflows**

| Test Case ID | Test Name | Workflow | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|----------|---------------|------------------|----------|
| ST-GEN-001 | Basic document generation | Minimal config | Generate 1 English invoice, no handwriting/VE | PDF + metadata returned in <60s | Critical |
| ST-GEN-002 | Handwriting generation | Enable handwriting | Generate document with handwriting | Handwriting visible in PDF | Critical |
| ST-GEN-003 | Visual elements | Enable VE | Generate document with logo + barcode | Elements visible in PDF | High |
| ST-GEN-004 | Full feature set | All features enabled | Generate with HW + VE + OCR + analysis | Complete dataset ZIP | Critical |
| ST-GEN-005 | Multi-document batch | num_solutions=5 | Generate 5 documents from 3 seeds | 5 complete documents | High |
| ST-GEN-006 | Reproducible generation | Same seed value | Generate twice with seed=42 | Identical outputs | High |
| ST-GEN-007 | Multi-language | language="german" | Generate German document | Correct language output | Medium |
| ST-GEN-008 | Various doc types | doc_type variations | Test invoice, receipt, form, letter | All types work | High |
| ST-GEN-009 | Different GT formats | gt_type="kie" / "qa" | Test both GT formats | Correct GT structure | High |
| ST-GEN-010 | Custom seed images | User-provided URLs | Generate from user's images | Images influence output | High |

#### **A.3.2 Error Handling Workflows**

| Test Case ID | Test Name | Error Condition | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|-----------------|---------------|------------------|----------|
| ST-ERR-001 | Invalid seed URL | 404 not found | Submit invalid image URL | HTTP 400 with clear error message | High |
| ST-ERR-002 | LLM API failure | Claude API down | Submit request during outage | HTTP 503 with retry-after | Critical |
| ST-ERR-003 | Handwriting service failure | RunPod timeout | Enable handwriting, service fails | HTTP 500, generation stopped | Critical |
| ST-ERR-004 | Invalid parameters | Missing required field | Omit doc_type parameter | HTTP 422 with validation details | High |
| ST-ERR-005 | Rate limit exceeded | Too many requests | Submit 100 concurrent requests | HTTP 429 with retry info | High |
| ST-ERR-006 | Payload too large | Huge request | Submit 50 seed image URLs | HTTP 413 payload too large | Medium |
| ST-ERR-007 | Malformed JSON | Invalid JSON | Submit broken JSON request | HTTP 400 with parse error | High |
| ST-ERR-008 | Authentication failure | Missing/invalid API key | Request without auth | HTTP 401 unauthorized | High |
| ST-ERR-009 | Database connection loss | DB unavailable | Submit during DB outage | Graceful degradation or 503 | Medium |
| ST-ERR-010 | Disk space exhausted | No storage space | Generate large batch | HTTP 507 insufficient storage | Low |

#### **A.3.3 Async Processing Workflows**

| Test Case ID | Test Name | Workflow | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|----------|---------------|------------------|----------|
| ST-ASYNC-001 | Submit async job | POST /generate/async | Submit batch job | Receive task ID immediately | Critical |
| ST-ASYNC-002 | Check pending status | GET status before completion | Poll status endpoint | Returns "pending" or "processing" | High |
| ST-ASYNC-003 | Check completed status | GET status after completion | Poll status after 5 minutes | Returns "completed" | Critical |
| ST-ASYNC-004 | Download results | GET result/{id} | Download after completion | Returns ZIP file | Critical |
| ST-ASYNC-005 | Check failed status | Job fails during processing | Check status of failed job | Returns "failed" with error details | High |
| ST-ASYNC-006 | Multiple concurrent jobs | Submit 10 jobs | 10 async submissions | All jobs process independently | High |
| ST-ASYNC-007 | Job cancellation | Cancel in-progress job | Submit, then cancel | Job stops, partial results cleaned | Medium |
| ST-ASYNC-008 | Result expiration | Check old results | Access 7-day-old result | HTTP 410 gone (expired) | Low |
| ST-ASYNC-009 | Progress updates | Monitor long job | Poll during processing | Progress % increases | Medium |
| ST-ASYNC-010 | Worker restart recovery | Worker crashes mid-job | Kill worker process | Job reassigned, completes | High |

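Several of the async cases (ST-ASYNC-002 to ST-ASYNC-005) boil down to polling the status endpoint until a terminal state is reached. A transport-agnostic sketch of that loop follows. `wait_for_job` is a hypothetical helper and `get_status` is any callable that returns the status string for a task ID, for example a thin wrapper around `GET /generate/async/status/{id}`. The status names are the ones listed in the table.

```python
import time

TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(get_status, task_id, interval=5.0, timeout=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(task_id) until it reports a terminal state.

    Raises TimeoutError if no terminal state is seen within `timeout` seconds.
    """
    deadline = clock() + timeout
    while True:
        status = get_status(task_id)
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job {task_id} still '{status}' after {timeout}s")
        sleep(interval)
```

Passing `sleep` and `clock` as parameters lets a test simulate a five-minute job in milliseconds.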
#### **A.3.4 Data Quality Workflows**

| Test Case ID | Test Name | Quality Check | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|---------------|---------------|------------------|----------|
| ST-QUAL-001 | OCR accuracy | Compare OCR to ground truth | Generate doc, compare OCR text to GT | >90% accuracy | High |
| ST-QUAL-002 | Bbox alignment | Visual inspection | Generate doc with debug viz | Bboxes align with text | High |
| ST-QUAL-003 | Handwriting quality | Visual realism | Generate handwritten doc | Handwriting looks realistic | Medium |
| ST-QUAL-004 | Visual element placement | Correct positioning | Generate with logo + barcode | Elements at correct positions | High |
| ST-QUAL-005 | GT completeness | All GT fields present | Generate KIE document | All expected GT fields extracted | High |
| ST-QUAL-006 | Dataset format validity | msgpack validation | Export dataset | Exported msgpack loads in a PyTorch data pipeline | High |
| ST-QUAL-007 | Image resolution | Check output image | Render final image | Minimum 220 DPI quality | Medium |
| ST-QUAL-008 | PDF compliance | PDF/A validation | Generate PDF | Valid PDF/A format | Low |
| ST-QUAL-009 | Metadata accuracy | Check metadata fields | Generate document | Metadata matches actual data | High |
| ST-QUAL-010 | Reproducibility | Same input → same output | Generate 3 times with seed | All outputs identical | High |

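ST-QUAL-001 sets a ">90% accuracy" target without fixing the metric. One possible definition is word-level accuracy: the share of ground-truth words that also appear, in order, in the OCR output. The sketch below implements that definition with the standard library. It is an assumption for illustration, not the metric the project necessarily uses.

```python
from difflib import SequenceMatcher


def word_accuracy(gt_words, ocr_words):
    """Fraction of ground-truth words matched in order by the OCR output."""
    if not gt_words:
        return 1.0 if not ocr_words else 0.0
    matcher = SequenceMatcher(None, gt_words, ocr_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(gt_words)
```

With ground truth `["a", "b", "c", "d"]` and OCR output `["a", "b", "x", "d"]`, three of four words match, so the score is 0.75. A character-level metric such as edit distance would be stricter about partial misreads.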
#### **A.3.5 Performance Workflows**

| Test Case ID | Test Name | Performance Metric | Test Scenario | Target Performance | Priority |
|--------------|-----------|-------------------|---------------|---------------------|----------|
| ST-PERF-001 | Basic generation time | Time to completion | Generate minimal document | <60 seconds | High |
| ST-PERF-002 | Handwriting generation time | Time with HW | Generate with 20 HW words | <300 seconds | High |
| ST-PERF-003 | Batch generation time | Multiple documents | Generate 10 documents | <15 minutes | Medium |
| ST-PERF-004 | API response time | Endpoint latency | Submit request | <500ms to return task ID | High |
| ST-PERF-005 | Status check latency | Status endpoint | Check job status | <100ms response time | Medium |
| ST-PERF-006 | Concurrent requests | Load handling | 50 concurrent requests | All complete successfully | High |
| ST-PERF-007 | Large payload | Big request | 8 seed images, 10 solutions | Processes without timeout | Medium |
| ST-PERF-008 | Memory usage | Resource consumption | Generate 100 documents | <8GB RAM per worker | Medium |
| ST-PERF-009 | Disk I/O | Storage performance | Rapid sequential generations | No I/O bottleneck | Low |
| ST-PERF-010 | Network bandwidth | Data transfer | Download large result ZIP | Download completes in <60s | Low |

**Total System Tests**: 50+ test cases

---

## Non-Functional Testing

### B.1 Performance Testing

**Purpose**: Verify system performance under various load conditions.

#### **B.1.1 Load Testing**

**Tool**: Apache JMeter / Locust

| Test Case ID | Test Name | Load Profile | Metrics | Acceptance Criteria | Priority |
|--------------|-----------|--------------|---------|---------------------|----------|
| NFT-LOAD-001 | Normal load | 10 concurrent users, 1 hour | Throughput, response time | Avg response <5s, 0 errors | Critical |
| NFT-LOAD-002 | Peak load | 50 concurrent users, 30 min | Throughput, error rate | <5% error rate, response <15s | Critical |
| NFT-LOAD-003 | Sustained load | 25 concurrent users, 4 hours | CPU, memory, throughput | Stable resource usage, no leaks | High |
| NFT-LOAD-004 | Ramp-up load | 1→100 users over 30 min | System behavior | Graceful scaling or degradation | High |
| NFT-LOAD-005 | Spike load | Sudden 0→100 users | Response time spike | Recovers within 2 minutes | Medium |

**Test Script Example (Locust)**:
```python
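# locustfile.py
from locust import HttpUser, task, between

# Minimal request body shared by the sync and async tasks
BASIC_PAYLOAD = {
    "seed_images": ["https://example.com/seed1.jpg"],
    "prompt_params": {
        "language": "english",
        "doc_type": "invoice",
        "num_solutions": 1,
        "enable_handwriting": False,
        "enable_visual_elements": False,
    },
}


class DocGenieUser(HttpUser):
    # Each simulated user pauses 5-15 seconds between tasks
    wait_time = between(5, 15)

    def on_start(self):
        # No async job has been submitted yet for this user
        self.task_id = None

    @task(3)
    def generate_basic_document(self):
        # Synchronous generation, weighted 3:1 against status polling
        self.client.post("/generate", json=BASIC_PAYLOAD, timeout=120)

    @task(1)
    def check_async_status(self):
        if self.task_id is None:
            # Submit one async job per user so there is a task ID to poll.
            # Assumption: the response JSON carries the ID in a "task_id" field.
            response = self.client.post("/generate/async", json=BASIC_PAYLOAD)
            if 200 <= response.status_code < 300:
                self.task_id = response.json().get("task_id")
            return
        # Group all status URLs under one entry in the Locust statistics
        self.client.get(
            f"/generate/async/status/{self.task_id}",
            name="/generate/async/status/[task_id]",
        )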
```

#### **B.1.2 Stress Testing**

**Purpose**: Determine system breaking point

| Test Case ID | Test Name | Stress Condition | Metrics | Acceptance Criteria | Priority |
|--------------|-----------|------------------|---------|---------------------|----------|
| NFT-STRESS-001 | User overload | 200+ concurrent users | Max capacity | Identifies max users before failure | High |
| NFT-STRESS-002 | Memory stress | Generate 1000 docs without cleanup | Memory usage | OOM protection, graceful failure | High |
| NFT-STRESS-003 | CPU stress | Complex documents, no throttling | CPU utilization | System remains responsive | Medium |
| NFT-STRESS-004 | Disk stress | Fill 95% of disk space | I/O performance | Handles low disk gracefully | Medium |
| NFT-STRESS-005 | Network stress | Simulate slow network | Timeout handling | Appropriate timeouts, retries | Medium |

#### **B.1.3 Endurance Testing (Soak Testing)**

**Purpose**: Detect memory leaks and performance degradation over time

| Test Case ID | Test Name | Duration | Load | Metrics | Acceptance Criteria | Priority |
|--------------|-----------|----------|------|---------|---------------------|----------|
| NFT-ENDUR-001 | 24-hour test | 24 hours | 10 concurrent users | Memory, CPU over time | No memory leaks, stable performance | High |
| NFT-ENDUR-002 | 7-day test | 7 days | 5 concurrent users | All resources | System stable, no degradation | Medium |
| NFT-ENDUR-003 | Weekend load | 48 hours | Variable load | Error rate | <1% errors throughout | Medium |

#### **B.1.4 Scalability Testing**

**Purpose**: Verify horizontal and vertical scaling

| Test Case ID | Test Name | Scaling Type | Test Scenario | Acceptance Criteria | Priority |
|--------------|-----------|--------------|---------------|---------------------|----------|
| NFT-SCALE-001 | Horizontal scaling | Add workers | 1→5 workers, measure throughput | Linear throughput increase | High |
| NFT-SCALE-002 | Vertical scaling | Increase CPU/RAM | 2→8 cores, 4→16GB RAM | Performance improvement | Medium |
| NFT-SCALE-003 | Auto-scaling | Dynamic load | Trigger auto-scale rules | Scales up/down automatically | Medium |
| NFT-SCALE-004 | Database scaling | Database load | High concurrent DB ops | No DB bottleneck | High |
| NFT-SCALE-005 | Storage scaling | Large datasets | Generate 10,000 documents | Storage handles volume | Low |

#### **B.1.5 Benchmark Testing**

**Purpose**: Establish performance baselines

| Test Case ID | Component | Benchmark | Target | Priority |
|--------------|-----------|-----------|--------|----------|
| NFT-BENCH-001 | Seed download | 1 image (1MB) | <2 seconds | High |
| NFT-BENCH-002 | LLM call | 1 prompt (standard) | <30 seconds | Critical |
| NFT-BENCH-003 | PDF rendering | 1 A4 page | <3 seconds | High |
| NFT-BENCH-004 | Bbox extraction | 500 words | <2 seconds | Medium |
| NFT-BENCH-005 | Handwriting service | 10 words batch | <200 seconds | Critical |
| NFT-BENCH-006 | Visual element generation | 5 elements | <5 seconds | Medium |
| NFT-BENCH-007 | OCR processing | 1 A4 page (300 DPI) | <5 seconds | High |
| NFT-BENCH-008 | Msgpack export | 1 document | <2 seconds | Medium |
| NFT-BENCH-009 | Complete pipeline (minimal) | End-to-end | <60 seconds | Critical |
| NFT-BENCH-010 | Complete pipeline (full) | End-to-end with HW | <300 seconds | Critical |

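The benchmark targets above can be checked with a small timing wrapper. The following is a minimal sketch rather than the project's actual harness: the `benchmark` helper and the `BENCHMARK_TARGETS_S` mapping are hypothetical names, only three of the ten targets are copied in, and `time.sleep` stands in for the real stage call.

```python
import time
from contextlib import contextmanager

# Targets in seconds, copied from the benchmark table (subset only)
BENCHMARK_TARGETS_S = {
    "NFT-BENCH-001": 2,   # seed download
    "NFT-BENCH-003": 3,   # PDF rendering
    "NFT-BENCH-009": 60,  # complete pipeline (minimal)
}


@contextmanager
def benchmark(case_id, results):
    """Time the enclosed block and store (elapsed_seconds, passed) under case_id."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        results[case_id] = (elapsed, elapsed < BENCHMARK_TARGETS_S[case_id])


results = {}
with benchmark("NFT-BENCH-003", results):
    time.sleep(0.01)  # stand-in for the real PDF rendering call
```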
---

### B.2 Security Testing

Purpose: Identify vulnerabilities and ensure data protection.

#### **B.2.1 Authentication & Authorization Testing**

| Test Case ID | Test Name | Security Control | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|------------------|---------------|------------------|----------|
| NFT-SEC-001 | API key validation | Authentication | Request without API key | HTTP 401 Unauthorized | Critical |
| NFT-SEC-002 | Invalid API key | Authentication | Request with wrong key | HTTP 401 Unauthorized | Critical |
| NFT-SEC-003 | Expired API key | Token expiration | Request with expired key | HTTP 401 with renewal info | High |
| NFT-SEC-004 | API key rotation | Key management | Rotate keys, test old key | Old key rejected | Medium |
| NFT-SEC-005 | Role-based access | Authorization | User tries admin endpoint | HTTP 403 Forbidden | High |
| NFT-SEC-006 | Resource ownership | Authorization | User accesses other's job | HTTP 403 Forbidden | High |
| NFT-SEC-007 | JWT validation | Token security | Tampered JWT token | Signature validation fails | High |
| NFT-SEC-008 | Session hijacking | Session security | Stolen session token | Token invalidated after detection | Medium |
| NFT-SEC-009 | Brute force protection | Rate limiting | 100 failed auth attempts | Account locked, IP blocked | High |
| NFT-SEC-010 | Multi-factor auth | MFA | Admin login without MFA | MFA required | Low |

#### **B.2.2 Input Validation & Injection Testing**

| Test Case ID | Test Name | Vulnerability | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|---------------|---------------|------------------|----------|
| NFT-SEC-011 | SQL injection | Injection | Inject SQL in parameters | Parameterized queries prevent injection | Critical |
| NFT-SEC-012 | XSS attack | Cross-site scripting | Inject `<script>` in doc_type | Input sanitized, script not executed | High |
| NFT-SEC-013 | Command injection | OS command injection | Inject shell commands | Commands not executed | Critical |
| NFT-SEC-014 | Path traversal | Directory traversal | `../../etc/passwd` in filename | Access denied | Critical |
| NFT-SEC-015 | SSRF attack | Server-side request forgery | seed_image URL to internal IP | Internal IPs blocked | High |
| NFT-SEC-016 | XXE attack | XML external entity | Upload XML with external entity | External entities disabled | Medium |
| NFT-SEC-017 | LLM prompt injection | Prompt manipulation | Inject "ignore previous instructions" text into prompt parameters | Prompt sandboxing prevents escape | High |
| NFT-SEC-018 | Buffer overflow | Memory safety | Send 10MB+ parameter | Request rejected, no crash | Medium |
| NFT-SEC-019 | Unicode attack | Unicode bypass | Unicode normalization tricks | Normalized before processing | Low |
| NFT-SEC-020 | Regex DoS | ReDoS | Complex regex in input | Timeout protection active | Medium |

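For NFT-SEC-015, the check under test could look roughly like the sketch below. It is an illustration under stated assumptions, not the API's actual validation: `is_blocked_seed_url` is a hypothetical helper, and it only inspects IP literals and the name `localhost`. A production check would also need to resolve hostnames and validate the resolved addresses, since a public DNS name can point at an internal IP.

```python
import ipaddress
from urllib.parse import urlparse


def is_blocked_seed_url(url: str) -> bool:
    """Return True if the URL should be rejected before any download is attempted."""
    parsed = urlparse(url)
    # Only plain web URLs with a host are acceptable
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return True
    try:
        ip = ipaddress.ip_address(parsed.hostname)
    except ValueError:
        # Not an IP literal; this sketch does no DNS resolution
        return parsed.hostname.lower() == "localhost"
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved
```

A test for this case can then assert that loopback, RFC 1918, and cloud metadata addresses (for example `http://169.254.169.254/`) are rejected while a public URL such as `https://example.com/seed1.jpg` is accepted.
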
#### **B.2.3 Data Protection Testing**

| Test Case ID | Test Name | Protection Mechanism | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|---------------------|---------------|------------------|----------|
| NFT-SEC-021 | Data encryption at rest | Storage encryption | Check stored files | Files encrypted on disk | High |
| NFT-SEC-022 | Data encryption in transit | TLS/HTTPS | Inspect network traffic | All traffic over HTTPS | Critical |
| NFT-SEC-023 | API key exposure | Secret management | Check logs, errors | API keys never logged | Critical |
| NFT-SEC-024 | PII handling | Data privacy | Generate docs with PII | PII not stored beyond retention | High |
| NFT-SEC-025 | Data sanitization | Data cleanup | Delete job after 7 days | All data removed | High |
| NFT-SEC-026 | Backup encryption | Backup security | Check backup files | Backups encrypted | Medium |
| NFT-SEC-027 | Secure headers | HTTP headers | Check response headers | Security headers present | High |
| NFT-SEC-028 | CORS policy | Cross-origin security | Request from unauthorized origin | CORS policy blocks request | High |
| NFT-SEC-029 | Cookie security | Cookie flags | Check cookie attributes | HttpOnly, Secure, SameSite set | Medium |
| NFT-SEC-030 | Sensitive data in URLs | URL security | Check for secrets in URLs | No sensitive data in query params | High |

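NFT-SEC-027 does not name the headers it expects, so the sketch below assumes a common baseline. The `REQUIRED_HEADERS` tuple and the `missing_security_headers` helper are hypothetical and should be adjusted to whatever the project's security policy actually requires.

```python
# Assumed baseline; the test plan itself does not list specific headers
REQUIRED_HEADERS = (
    "Strict-Transport-Security",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Content-Security-Policy",
)


def missing_security_headers(headers):
    """Return the required headers absent from a response (names are case-insensitive)."""
    present = {name.lower() for name in headers}
    return [name for name in REQUIRED_HEADERS if name.lower() not in present]


sample_headers = {
    "content-type": "application/json",
    "x-content-type-options": "nosniff",
    "Strict-Transport-Security": "max-age=63072000",
}
missing = missing_security_headers(sample_headers)
```

In a real test, `sample_headers` would be replaced by the headers of a live response, and the test would assert that the returned list is empty.
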
#### **B.2.4 Dependency & Supply Chain Security**

| Test Case ID | Test Name | Security Aspect | Test Method | Expected Outcome | Priority |
|--------------|-----------|-----------------|-------------|------------------|----------|
| NFT-SEC-031 | Vulnerable dependencies | CVE scanning | Run `pip-audit` | No high/critical vulnerabilities | High |
| NFT-SEC-032 | Outdated packages | Package versions | Check `requirements.txt` | All packages recent (<6 months) | Medium |
| NFT-SEC-033 | Malicious packages | Supply chain | Verify package checksums | Checksums match official registry | High |
| NFT-SEC-034 | License compliance | Legal compliance | Check package licenses | All licenses compatible | Low |
| NFT-SEC-035 | Container security | Docker image | Scan with Trivy | No critical image vulnerabilities | High |

**Security Testing Tools**:
- **OWASP ZAP**: Automated security scanning
- **Burp Suite**: Manual penetration testing
- **pip-audit**: Python dependency vulnerability scanning
- **Trivy**: Container image scanning
- **Bandit**: Python code security linter

---

### B.3 Reliability Testing

Purpose: Verify system stability and fault tolerance.

#### **B.3.1 Fault Tolerance Testing**

| Test Case ID | Test Name | Fault Condition | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|-----------------|---------------|------------------|----------|
| NFT-REL-001 | Database failover | Primary DB failure | Kill primary DB instance | Failover to standby, no downtime | Critical |
| NFT-REL-002 | Worker crash recovery | Worker process crash | Kill worker mid-job | Job reassigned, completes | High |
| NFT-REL-003 | Network partition | Network split | Simulate network partition | System detects, retries | High |
| NFT-REL-004 | External API failure | Claude API down | LLM service unavailable | Graceful error, retry queue | Critical |
| NFT-REL-005 | Handwriting service failure | RunPod timeout | Service exceeds timeout | Exception raised, clear error | Critical |
| NFT-REL-006 | Disk full | No storage space | Fill disk to 100% | Rejects new jobs, alerts sent | High |
| NFT-REL-007 | Redis failure | Queue unavailable | Redis server down | Async jobs fail with clear error | High |
| NFT-REL-008 | Load balancer failure | LB goes down | Kill load balancer | Requests reach servers via backup load balancer | Medium |
| NFT-REL-009 | DNS resolution failure | DNS timeout | DNS server unreachable | Falls back to IP or cached DNS | Low |
| NFT-REL-010 | Partial service degradation | Some features down | VE prefabs missing | Skips VE, completes other features | Medium |

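Several of the cases above (NFT-REL-003, NFT-REL-004) expect transient failures to be retried. One way to exercise that behaviour without a real outage is to inject a dependency that fails a fixed number of times. The sketch below assumes retry with exponential backoff; `call_with_retry` and `flaky_llm_call` are hypothetical names and do not describe the project's actual retry logic.

```python
import time


def call_with_retry(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func, retrying on ConnectionError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...


# Simulated dependency that fails twice, then recovers
attempts = {"count": 0}


def flaky_llm_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated outage")
    return "<html>ok</html>"


delays = []  # record requested delays instead of actually sleeping
result = call_with_retry(flaky_llm_call, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the test fast and lets it assert on the backoff schedule directly.
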
#### **B.3.2 Data Integrity Testing**

| Test Case ID | Test Name | Integrity Check | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|-----------------|---------------|------------------|----------|
| NFT-REL-011 | Transaction atomicity | Database transactions | Simulate crash mid-transaction | Either all or no changes applied | High |
| NFT-REL-012 | Data corruption detection | Checksum validation | Corrupt file on disk | Corruption detected, file rejected | High |
| NFT-REL-013 | Concurrent write safety | Race conditions | Multiple writes to same resource | Last write wins or locking prevents conflicting writes | High |
| NFT-REL-014 | Duplicate prevention | Idempotency | Submit same request twice | Duplicate detected, not processed | Medium |
| NFT-REL-015 | Backup restoration | Backup recovery | Restore from backup | Data fully restored, consistent | High |

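NFT-REL-012 can be illustrated with a checksum recorded at write time and verified before use. This is a minimal sketch assuming SHA-256 digests; the helper names and the file name are hypothetical, and the plan does not state which checksum the system actually uses.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, expected_hex: str) -> bool:
    return sha256_of(path) == expected_hex


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "document.pdf"
    artifact.write_bytes(b"%PDF-1.7 example")
    recorded = sha256_of(artifact)           # stored alongside the job record
    intact_before = is_intact(artifact, recorded)
    artifact.write_bytes(b"%PDF-1.7 exampl3")  # simulate on-disk corruption
    intact_after = is_intact(artifact, recorded)
```

The test then asserts that the file verifies before the simulated corruption and is rejected afterwards.
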
#### **B.3.3 Recovery Testing**

| Test Case ID | Test Name | Recovery Scenario | Test Procedure | Expected Outcome | Priority |
|--------------|-----------|-------------------|----------------|------------------|----------|
| NFT-REL-016 | Crash recovery | Server crash | Kill server, restart | Server recovers, in-flight jobs resume | Critical |
| NFT-REL-017 | Database restore | DB corruption | Restore from backup | System operational with latest data | High |
| NFT-REL-018 | Disaster recovery | Complete site failure | Failover to DR site | Service restored within RTO (4 hours) | Critical |
| NFT-REL-019 | Job queue recovery | Redis crash | Redis restart with persistence | Queued jobs not lost | High |
| NFT-REL-020 | Config recovery | Bad config deployment | Deploy bad config | Rollback to previous config | Medium |

---

### B.4 Scalability Testing

Purpose: Verify system can handle growth in load and data.

#### **B.4.1 Capacity Testing**

| Test Case ID | Test Name | Capacity Metric | Test Scenario | Target Capacity | Priority |
|--------------|-----------|-----------------|---------------|-----------------|----------|
| NFT-SCAL-001 | Max concurrent users | User capacity | Gradually increase users | Support 100+ concurrent users | High |
| NFT-SCAL-002 | Max documents per hour | Throughput | Generate continuously | Process 500+ docs/hour | High |
| NFT-SCAL-003 | Max queue depth | Job queue | Enqueue 10,000 jobs | Queue handles all jobs | Medium |
| NFT-SCAL-004 | Max dataset size | Storage | Generate large dataset | Handle 1TB+ datasets | Low |
| NFT-SCAL-005 | Max file size | Upload limit | Upload large seed image | Accept up to 10MB images | Medium |

#### **B.4.2 Elasticity Testing**

| Test Case ID | Test Name | Scaling Behavior | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|------------------|---------------|------------------|----------|
| NFT-SCAL-006 | Scale-up | Add resources | Increase from 2→10 workers | Linear throughput increase | High |
| NFT-SCAL-007 | Scale-down | Remove resources | Decrease from 10→2 workers | Graceful job completion | High |
| NFT-SCAL-008 | Auto-scale up | Load increase | Load triggers scale-up | New instances launched | Medium |
| NFT-SCAL-009 | Auto-scale down | Load decrease | Low load triggers scale-down | Excess instances terminated | Medium |
| NFT-SCAL-010 | Burst scaling | Sudden spike | 0→100 requests instantly | Scale-up handles burst | High |

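"Linear throughput increase" in NFT-SCAL-006 needs a measurable definition. One option, sketched below, is scaling efficiency: measured throughput divided by the throughput that perfect linear scaling would give. The function name and the sample numbers are hypothetical, and any pass threshold (for example 0.8) is an assumption the team would have to agree on.

```python
def scaling_efficiency(base_workers, base_throughput, scaled_workers, scaled_throughput):
    """Ratio of measured throughput to ideal linear throughput (1.0 = perfectly linear)."""
    ideal = base_throughput * scaled_workers / base_workers
    return scaled_throughput / ideal


# Hypothetical measurement: 2 workers -> 100 docs/hour, 10 workers -> 450 docs/hour.
# Ideal at 10 workers is 500 docs/hour, so efficiency is 450 / 500 = 0.9.
efficiency = scaling_efficiency(2, 100, 10, 450)
```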
---

### B.5 Usability Testing

Purpose: Verify API ease of use and developer experience.

#### **B.5.1 API Documentation Testing**

| Test Case ID | Test Name | Documentation Aspect | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|---------------------|---------------|------------------|----------|
| NFT-USAB-001 | API docs completeness | All endpoints documented | Review /docs | All endpoints, params documented | High |
| NFT-USAB-002 | Example accuracy | Code examples | Test all code examples | Examples work without modification | High |
| NFT-USAB-003 | Error messages clarity | Error documentation | Check error responses | Errors have clear messages, codes | High |
| NFT-USAB-004 | OpenAPI spec validity | Swagger/OpenAPI | Validate spec | Spec passes OpenAPI validation | Medium |
| NFT-USAB-005 | Interactive docs | Try-it-out feature | Use /docs to test | Can test endpoints in browser | Medium |

#### **B.5.2 Developer Experience Testing**

| Test Case ID | Test Name | DX Aspect | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|-----------|---------------|------------------|----------|
| NFT-USAB-006 | SDK availability | Client libraries | Check for Python/JS SDKs | SDKs available, documented | Low |
| NFT-USAB-007 | Quick start guide | Getting started | Follow quick start | Working request in <10 minutes | High |
| NFT-USAB-008 | API versioning | Version management | Check version headers | Versions clearly indicated | Medium |
| NFT-USAB-009 | Changelog maintenance | Release notes | Review changelog | All changes documented | Low |
| NFT-USAB-010 | Deprecation notices | Breaking changes | Check deprecated features | Clear deprecation warnings | Medium |

---

### B.6 Compatibility Testing

Purpose: Verify system works across different environments.

#### **B.6.1 Browser Compatibility** (for API docs)

| Test Case ID | Browser | Version | Expected Outcome |
|--------------|---------|---------|------------------|
| NFT-COMPAT-001 | Chrome | Latest | /docs fully functional |
| NFT-COMPAT-002 | Firefox | Latest | /docs fully functional |
| NFT-COMPAT-003 | Safari | Latest | /docs fully functional |
| NFT-COMPAT-004 | Edge | Latest | /docs fully functional |

#### **B.6.2 Platform Compatibility**

| Test Case ID | Platform | Test Scenario | Expected Outcome | Priority |
|--------------|----------|---------------|------------------|----------|
| NFT-COMPAT-005 | Docker | Deploy in container | Runs without issues | Critical |
| NFT-COMPAT-006 | Railway | Deploy to Railway | Successful deployment | High |
| NFT-COMPAT-007 | AWS | Deploy to ECS/Lambda | Runs on AWS | Medium |
| NFT-COMPAT-008 | GCP | Deploy to Cloud Run | Runs on GCP | Low |
| NFT-COMPAT-009 | Azure | Deploy to App Service | Runs on Azure | Low |

#### **B.6.3 Python Version Compatibility**

| Test Case ID | Python Version | Test Scenario | Expected Outcome | Priority |
|--------------|----------------|---------------|------------------|----------|
| NFT-COMPAT-010 | Python 3.11 | Run full test suite | All tests pass | Critical |
| NFT-COMPAT-011 | Python 3.10 | Run full test suite | All tests pass | High |
| NFT-COMPAT-012 | Python 3.12 | Run full test suite | All tests pass | Medium |

---

### B.7 Maintainability Testing

Purpose: Verify system is easy to maintain and debug.

#### **B.7.1 Logging & Monitoring**

| Test Case ID | Test Name | Aspect | Test Scenario | Expected Outcome | Priority |
|--------------|-----------|--------|---------------|------------------|----------|
| NFT-MAINT-001 | Log completeness | Logging | Check logs during generation | All stages logged | High |
| NFT-MAINT-002 | Log levels | Log filtering | Filter by ERROR, INFO, DEBUG | Correct levels used | Medium |
| NFT-MAINT-003 | Structured logging | Log format | Parse log entries | JSON-formatted, parseable | High |
| NFT-MAINT-004 | Error traceability | Error tracking | Trace error through logs | Request ID tracks full flow | High |
| NFT-MAINT-005 | Metrics collection | Monitoring | Check Prometheus metrics | Key metrics exported | High |
| NFT-MAINT-006 | Health checks | Monitoring | Call /health endpoint | Returns detailed status | Critical |
| NFT-MAINT-007 | Alert configuration | Alerting | Trigger alert condition | Alert fired, notification sent | Medium |
| NFT-MAINT-008 | Dashboard usability | Visualization | View Grafana dashboards | Clear, actionable insights | Medium |

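NFT-MAINT-003 and NFT-MAINT-004 can be automated by parsing captured log output. The sketch below assumes one JSON object per line and a hypothetical set of required fields, including `request_id`; the real field names depend on the project's logging configuration.

```python
import json

# Assumed field names; adjust to the actual log schema
REQUIRED_FIELDS = {"timestamp", "level", "request_id", "message"}


def invalid_log_lines(lines):
    """Return (line_number, reason) for each line that is not a complete JSON log entry."""
    problems = []
    for number, line in enumerate(lines, start=1):
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        if not isinstance(entry, dict):
            problems.append((number, "not a JSON object"))
            continue
        missing = REQUIRED_FIELDS - set(entry)
        if missing:
            problems.append((number, "missing: " + ", ".join(sorted(missing))))
    return problems


sample_lines = [
    '{"timestamp": "2026-03-04T10:00:00Z", "level": "INFO", "request_id": "abc", "message": "stage 01 done"}',
    "plain text line",
    '{"level": "ERROR", "message": "boom"}',
]
problems = invalid_log_lines(sample_lines)
```

A passing run of the real test would expect an empty list for logs captured during a full generation.
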
#### **B.7.2 Code Quality**

| Test Case ID | Test Name | Quality Metric | Tool | Acceptance Criteria | Priority |
|--------------|-----------|----------------|------|---------------------|----------|
| NFT-MAINT-009 | Code coverage | Test coverage | pytest-cov | >80% coverage | High |
| NFT-MAINT-010 | Code complexity | Cyclomatic complexity | radon | CC <10 per function | Medium |
| NFT-MAINT-011 | Code duplication | DRY principle | pylint | <5% duplicated code | Low |
| NFT-MAINT-012 | Code style | PEP 8 compliance | flake8 | No style violations | Medium |
| NFT-MAINT-013 | Type hints | Type coverage | mypy | >90% type hints | Medium |
| NFT-MAINT-014 | Security linting | Vulnerability scan | bandit | No high-severity issues | High |

---

## Test Environment Setup

### Test Environments

| Environment | Purpose | Configuration | Access |
|-------------|---------|---------------|--------|
| **Local Dev** | Development testing | Local Docker Compose | Developers |
| **CI/CD** | Automated testing | GitHub Actions runners | Automated |
| **Staging** | Pre-production testing | Mirrors production | QA team |
| **Production** | Live system | Full infrastructure | Ops team |

### Test Data Management

**Seed Image Dataset**:
- **Source**: Curated test set of 50 diverse seed images
- **Location**: `tests/fixtures/seed_images/`
- **Categories**: Invoice samples, receipt samples, form samples, letter samples
- **Licensing**: Public domain or test-licensed images

**Test Parameters**:
```yaml
# tests/fixtures/test_params.yaml
test_cases:
  minimal:
    language: "english"
    doc_type: "invoice"
    num_solutions: 1
    enable_handwriting: false
    enable_visual_elements: false
  
  full_features:
    language: "english"
    doc_type: "medical_form"
    num_solutions: 2
    enable_handwriting: true
    handwriting_ratio: 0.3
    enable_visual_elements: true
    visual_element_types: ["logo", "signature", "barcode"]
    enable_ocr: true
    enable_dataset_export: true
```

**Mock Services**:
- **Mock Claude API**: Returns predefined HTML responses for testing
- **Mock RunPod API**: Returns test handwriting images, simulates delays
- **Mock Supabase**: In-memory database for testing

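As an illustration of the mock Claude API, the sketch below uses the standard library's `unittest.mock` to return a predefined HTML response. The `generate_html` stage and the client's `complete` method are hypothetical stand-ins, not the project's actual interfaces.

```python
from unittest.mock import Mock

CANNED_HTML = "<html><body><h1>Invoice</h1></body></html>"


def generate_html(llm_client, prompt):
    """Hypothetical pipeline stage: request HTML from the LLM client, reject empty output."""
    html = llm_client.complete(prompt)
    if not html.strip():
        raise ValueError("LLM returned an empty document")
    return html


# The mock replaces the real client and returns the predefined response
mock_client = Mock()
mock_client.complete.return_value = CANNED_HTML

html = generate_html(mock_client, "Generate an invoice")
mock_client.complete.assert_called_once_with("Generate an invoice")
```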
---

## Testing Tools & Frameworks

### Test Frameworks

| Tool | Purpose | Usage |
|------|---------|-------|
| **pytest** | Unit & integration testing | `pytest tests/` |
| **pytest-asyncio** | Async test support | Async function testing |
| **pytest-cov** | Code coverage | `pytest --cov=api` |
| **httpx** | HTTP client for tests | Send requests to the API under test |
| **respx** | HTTP mock library | Mock external APIs |
| **pytest-mock** | Mocking framework | Mock functions, classes |
| **Faker** | Test data generation | Generate realistic data |

### Load Testing Tools

| Tool | Purpose | Usage |
|------|---------|-------|
| **Locust** | Load & stress testing | `locust -f locustfile.py` |
| **Apache JMeter** | Performance testing | GUI-based test scenarios |
| **k6** | Cloud-native load testing | Scripted load tests |

### Security Testing Tools

| Tool | Purpose | Usage |
|------|---------|-------|
| **OWASP ZAP** | Security scanning | Automated vulnerability scan |
| **Burp Suite** | Penetration testing | Manual security testing |
| **pip-audit** | Dependency scanning | `pip-audit -r requirements.txt` |
| **Bandit** | Code security linting | `bandit -r api/` |
| **Trivy** | Container scanning | `trivy image docgenie-api:latest` |

### Monitoring & Observability

| Tool | Purpose | Usage |
|------|---------|-------|
| **Prometheus** | Metrics collection | Scrape /metrics endpoint |
| **Grafana** | Metrics visualization | Dashboard creation |
| **ELK Stack** | Log aggregation | Centralized logging |
| **Sentry** | Error tracking | Automatic error reporting |

---

## Test Execution Plan

### Phase 1: Unit Testing (Week 1-2)
**Objective**: Achieve 80%+ code coverage

**Tasks**:
1. Write unit tests for all utility functions (`api/utils.py`)
2. Test all pipeline stages individually (Stages 01-19)
3. Mock external dependencies (Claude API, RunPod, Supabase)
4. Achieve minimum 80% code coverage
5. Set up CI/CD pipeline for automated testing

**Deliverables**:
- 120+ unit test cases passing
- Coverage report >80%
- CI/CD pipeline configured

### Phase 2: Integration Testing (Week 3)
**Objective**: Verify component interactions

**Tasks**:
1. Test pipeline stage integrations (01-03, 03-05, 07-09, etc.)
2. Test external service integrations (Claude, RunPod, Supabase)
3. Test database operations (CRUD, transactions)
4. Test API endpoint workflows
5. Test background worker integration

**Deliverables**:
- 50+ integration test cases passing
- All critical workflows tested
- Service mocks validated

### Phase 3: System Testing (Week 4)
**Objective**: End-to-end workflow validation

**Tasks**:
1. Test complete generation workflows (minimal, full features)
2. Test error handling scenarios
3. Test async processing workflows
4. Test data quality and accuracy
5. Test performance benchmarks

**Deliverables**:
- 50+ system test cases passing
- All user journeys tested
- Performance baselines established

### Phase 4: Non-Functional Testing (Week 5-6)
**Objective**: Verify performance, security, reliability

**Tasks**:
1. **Performance**: Load, stress, endurance, scalability tests
2. **Security**: Penetration testing, vulnerability scanning
3. **Reliability**: Fault tolerance, recovery testing
4. **Usability**: Documentation review, DX testing

**Deliverables**:
- Load test report (normal, peak, sustained)
- Security audit report
- Reliability test report
- Performance benchmarks

### Phase 5: Regression Testing (Ongoing)
**Objective**: Prevent defect reintroduction

**Tasks**:
1. Run full test suite on every commit (CI/CD)
2. Add tests for every bug fix
3. Update tests for new features
4. Maintain >80% code coverage

**Frequency**: Continuous (automated on every PR/commit)

---

## Success Criteria & Metrics

### Test Completion Criteria

| Criteria | Target | Critical |
|----------|--------|----------|
| Unit test coverage | >80% | Yes |
| Integration tests passing | 100% | Yes |
| System tests passing | 100% | Yes |
| Load test: Normal load | 0% errors | Yes |
| Load test: Peak load | <5% errors | Yes |
| Security: Critical vulnerabilities | 0 | Yes |
| Security: High vulnerabilities | <5 | Yes |
| Performance: Basic generation | <60s | Yes |
| Performance: Handwriting generation | <300s | Yes |
| Uptime SLA | >99.5% | No |

### Quality Metrics

**Code Quality**:
- Code coverage: >80%
- Cyclomatic complexity: <10
- Code duplication: <5%
- Type hint coverage: >90%

**Performance**:
- API response time (P95): <500ms
- Document generation (minimal): <60s
- Document generation (with handwriting): <300s
- Throughput: >500 docs/hour

**Reliability**:
- Uptime: >99.5%
- MTBF (Mean Time Between Failures): >720 hours (30 days)
- MTTR (Mean Time To Recover): <30 minutes
- Error rate: <1%

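As a consistency check, the MTBF and MTTR targets can be compared with the uptime target using the standard steady-state availability formula, MTBF / (MTBF + MTTR). The sketch below evaluates it at the boundary values from the list above; the function name is illustrative.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: fraction of time the system is operational."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Boundary values from the reliability targets: MTBF 720 h, MTTR 30 min (0.5 h)
implied = availability(720, 0.5)  # about 0.9993, above the 99.5% uptime target
```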
**Security**:
- Zero critical vulnerabilities
- <5 high-severity vulnerabilities
- Dependency update cadence: <30 days behind

---

## Risk Assessment

### High-Risk Areas

| Component | Risk Level | Mitigation Strategy | Priority |
|-----------|------------|---------------------|----------|
| Claude API integration | **HIGH** | Retry logic, fallback prompts, rate limiting | Critical |
| RunPod handwriting service | **HIGH** | Timeout handling, batch optimization, error raising | Critical |
| PDF rendering (Playwright) | **MEDIUM** | Headless browser stability, resource limits | High |
| OCR accuracy | **MEDIUM** | Multiple OCR engine options, confidence thresholds | High |
| Async job processing | **MEDIUM** | Worker health checks, job retry mechanisms | High |
| Database transactions | **MEDIUM** | ACID compliance, connection pooling | High |
| File storage | **LOW** | Disk space monitoring, cleanup policies | Medium |

### Test Risk Mitigation

| Risk | Impact | Probability | Mitigation |
|------|--------|-------------|------------|
| External API unavailable during tests | High | Medium | Use mocks, record/replay mode |
| Test data corruption | Medium | Low | Version control test fixtures |
| Test environment instability | High | Medium | Docker isolation, reproducible builds |
| Long test execution time | Low | High | Parallel execution, selective testing |
| Flaky tests | Medium | Medium | Retry logic, better assertions |

---

## Test Reporting

### Test Reports

**Daily Reports** (Automated):
- Test execution summary (pass/fail counts)
- Code coverage trends
- Failed test details
- Performance benchmark comparison

**Weekly Reports** (Manual):
- Test progress against plan
- New defects discovered
- Defect resolution rate
- Risk updates

**Release Reports** (Per Release):
- Complete test execution summary
- All test case results
- Performance test results
- Security scan results
- Known issues and limitations

### Defect Tracking

**Defect Workflow**:
1. **Report**: Tester creates defect in issue tracker
2. **Triage**: Team prioritizes defect (P0-Critical, P1-High, P2-Medium, P3-Low)
3. **Assign**: Developer assigned to fix
4. **Fix**: Developer implements fix
5. **Verify**: Tester verifies fix
6. **Close**: Defect closed, regression test added

**Defect Metrics**:
- Defect discovery rate
- Defect resolution rate
- Defect escape rate (to production)
- Mean time to resolve a defect (not to be confused with MTTR, Mean Time To Recover)

---

## Continuous Improvement

### Test Optimization

**Quarterly Reviews**:
- Review test coverage (identify gaps)
- Remove obsolete tests
- Update test data
- Optimize test execution time
- Review test environment stability

**Automation Goals**:
- Automate 100% of unit tests
- Automate 90% of integration tests
- Automate 70% of system tests
- Automate 50% of non-functional tests

---

## Appendix

### Test Case Template

```markdown
## Test Case ID: [ID]

**Test Name**: [Descriptive name]

**Component**: [Module/Component under test]

**Test Type**: [Unit/Integration/System/Non-Functional]

**Priority**: [Critical/High/Medium/Low]

**Prerequisites**:
- [List any setup required]

**Test Steps**:
1. [Step 1]
2. [Step 2]
3. [Step 3]

**Test Data**:
- [Input data required]

**Expected Result**:
- [What should happen]

**Actual Result**:
- [What actually happened - filled during execution]

**Status**: [Pass/Fail/Blocked/Not Run]

**Notes**:
- [Any additional observations]
```

### Glossary

- **API**: Application Programming Interface
- **CI/CD**: Continuous Integration/Continuous Deployment
- **DPI**: Dots Per Inch
- **GT**: Ground Truth
- **HW**: Handwriting
- **KIE**: Key Information Extraction
- **LLM**: Large Language Model
- **MTBF**: Mean Time Between Failures
- **MTTR**: Mean Time To Recover
- **OCR**: Optical Character Recognition
- **P95**: 95th Percentile
- **SLA**: Service Level Agreement
- **VE**: Visual Element

---

**Document Control**:
- **Author**: DocGenie QA Team
- **Reviewers**: Development Team, Product Manager
- **Approval**: Project Lead
- **Next Review Date**: [3 months from approval]
