# Comprehensive Testing Plan & Test Cases
## DocGenie Synthetic Document Generation API

**Document Version**: 1.0
**Date**: March 4, 2026
**Project**: DocGenie - AI-Powered Synthetic Document Dataset Generator

---

## Table of Contents
1. [Testing Overview](#testing-overview)
2. [Functional Testing](#functional-testing)
   - [Unit Testing](#unit-testing)
   - [Integration Testing](#integration-testing)
   - [System Testing](#system-testing)
3. [Non-Functional Testing](#non-functional-testing)
   - [Performance Testing](#performance-testing)
   - [Security Testing](#security-testing)
   - [Reliability Testing](#reliability-testing)
   - [Scalability Testing](#scalability-testing)
   - [Usability Testing](#usability-testing)
4. [Test Environment Setup](#test-environment-setup)
5. [Testing Tools & Frameworks](#testing-tools--frameworks)
6. [Test Execution Plan](#test-execution-plan)
7. [Success Criteria & Metrics](#success-criteria--metrics)
8. [Risk Assessment](#risk-assessment)

---

## Testing Overview

### Purpose
This document outlines the comprehensive testing strategy for the DocGenie API, ensuring the quality, reliability, and performance of the synthetic document generation system across all 19 pipeline stages.

### Scope
- API endpoint testing (`/generate`, `/generate/pdf`, `/generate/async`)
- 19-stage pipeline validation
- External service integrations (Claude API, RunPod handwriting service)
- Database operations (Supabase)
- Background job processing (Redis Queue)
- Error handling and recovery mechanisms

### Testing Approach
- **Test-Driven Development (TDD)**: Write tests before implementation where applicable
- **Continuous Integration**: Automated test execution on every commit
- **Coverage Target**: Minimum 80% code coverage for critical paths
- **Risk-Based Testing**: Prioritize high-risk components (LLM integration, handwriting service)

---

## Functional Testing

### A.1 Unit Testing

Unit tests verify individual functions and methods in isolation. Target: 85% code coverage.

#### **A.1.1 Seed Image Processing (Stage 01)**

**Module**: `api/utils.py::download_seed_images()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-SEED-001 | Download valid image URL | Valid HTTPS URL (JPEG) | Base64-encoded image string | High |
| UT-SEED-002 | Download PNG format | Valid PNG URL | Base64-encoded PNG | High |
| UT-SEED-003 | Handle 503 timeout error | URL returning 503 | Retry 3 times, eventual success | Critical |
| UT-SEED-004 | Handle 502 bad gateway | URL returning 502 | Retry with exponential backoff | High |
| UT-SEED-005 | Handle 404 not found | Invalid URL | Raise HTTPException(400) | High |
| UT-SEED-006 | Handle connection timeout | Slow/unresponsive server | Retry, then raise exception | Medium |
| UT-SEED-007 | Validate image format | Non-image URL (HTML) | Raise validation error | Medium |
| UT-SEED-008 | Handle oversized images | >10MB image | Process or reject gracefully | Low |
| UT-SEED-009 | Test retry backoff timing | Mock 503 responses | Delays: 2s, 4s, 8s | Medium |
| UT-SEED-010 | Test max retries exhausted | Persistent 503 errors | Raise exception after 3 attempts | High |

**Test Implementation**:
```python
# test_seed_download.py
import httpx
import pytest
from unittest.mock import patch, Mock, AsyncMock

from api.utils import download_seed_images

@pytest.mark.asyncio
async def test_download_valid_image():
    url = "https://example.com/test.jpg"
    with patch('httpx.AsyncClient') as mock_client:
        mock_response = Mock()
        mock_response.content = b'\xff\xd8\xff\xe0'  # JPEG magic bytes
        # get() is awaited by the client code, so it must be an AsyncMock
        mock_client.return_value.__aenter__.return_value.get = AsyncMock(return_value=mock_response)

        result = await download_seed_images([url])
        assert len(result) == 1
        assert isinstance(result[0], str)  # base64 string

@pytest.mark.asyncio
async def test_download_503_retry():
    url = "https://example.com/test.jpg"
    with patch('httpx.AsyncClient') as mock_client:
        # First two calls return 503; the third succeeds
        responses = [
            Mock(status_code=503, raise_for_status=Mock(side_effect=httpx.HTTPStatusError("503", request=Mock(), response=Mock()))),
            Mock(status_code=503, raise_for_status=Mock(side_effect=httpx.HTTPStatusError("503", request=Mock(), response=Mock()))),
            Mock(content=b'\xff\xd8\xff\xe0', raise_for_status=Mock())
        ]
        mock_get = AsyncMock(side_effect=responses)
        mock_client.return_value.__aenter__.return_value.get = mock_get

        result = await download_seed_images([url])
        assert len(result) == 1
        assert mock_get.call_count == 3
```

#### **A.1.2 HTML Processing (Stage 03)**

**Module**: `api/utils.py::extract_html_documents_from_response()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-HTML-001 | Extract single HTML | LLM response with 1 HTML | List with 1 HTML document | High |
| UT-HTML-002 | Extract multiple HTMLs | Response with 3 HTMLs | List with 3 documents | High |
| UT-HTML-003 | Extract ground truth | HTML with `<script id="GT">` | GT JSON extracted, script removed | Critical |
| UT-HTML-004 | Handle malformed HTML | Invalid HTML tags | Parse with BeautifulSoup recovery | Medium |
| UT-HTML-005 | Handle missing DOCTYPE | HTML without DOCTYPE | Add DOCTYPE or flag error | Low |
| UT-HTML-006 | Validate CSS presence | HTML without `<style>` | Raise validation error | High |
| UT-HTML-007 | Extract handwriting markers | HTML with `class="handwritten"` | Identify 5 handwriting elements | High |
| UT-HTML-008 | Extract visual elements | HTML with `data-placeholder` | Identify 3 visual elements | High |
| UT-HTML-009 | Handle empty response | Empty string from LLM | Return empty list | Medium |
| UT-HTML-010 | Prettify minified HTML | Single-line HTML | Multi-line formatted HTML | Low |
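
UT-HTML-003 expects the `<script id="GT">` block to be parsed out and then stripped from the document. A minimal sketch of that behaviour, assuming the GT payload is JSON; the real extractor reportedly uses BeautifulSoup, while `extract_ground_truth` here is a hypothetical regex-based stand-in kept dependency-free:

```python
import json
import re

def extract_ground_truth(html: str):
    """Return (html_without_gt_script, gt_dict_or_None).

    Sketch only: assumes the GT payload is JSON inside <script id="GT">.
    """
    match = re.search(r'<script[^>]*id="GT"[^>]*>(.*?)</script>', html, flags=re.DOTALL)
    if match is None:
        return html, None
    gt = json.loads(match.group(1))                       # parse the GT payload
    cleaned = html[:match.start()] + html[match.end():]   # remove the script tag
    return cleaned, gt

doc = '<html><body><p>Invoice #17</p><script id="GT">{"total": "42.00"}</script></body></html>'
cleaned, gt = extract_ground_truth(doc)
```

A unit test would then assert both that the GT dict round-trips and that no `<script>` remains in the cleaned HTML.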

#### **A.1.3 PDF Rendering (Stage 04)**

**Module**: `api/utils.py::render_html_to_pdf()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-PDF-001 | Render A4 document | HTML with A4 page size | PDF 210×297mm | High |
| UT-PDF-002 | Render Letter size | HTML with Letter page | PDF 215.9×279.4mm | Medium |
| UT-PDF-003 | Extract geometries | HTML with handwriting | Geometries JSON with rects | Critical |
| UT-PDF-004 | Handle custom fonts | HTML with `@font-face` | PDF with embedded fonts | Low |
| UT-PDF-005 | Preserve CSS styling | HTML with colors/borders | PDF matches visual style | Medium |
| UT-PDF-006 | Handle images in HTML | HTML with `<img>` tags | Images embedded in PDF | Low |
| UT-PDF-007 | Extract text coordinates | HTML with paragraphs | Accurate bbox coordinates | High |
| UT-PDF-008 | Handle landscape orientation | HTML with landscape CSS | PDF in landscape mode | Low |
| UT-PDF-009 | Validate page dimensions | Various page sizes | Dimensions match CSS `@page` | High |
| UT-PDF-010 | Handle Playwright errors | Browser crash scenario | Retry or graceful failure | Medium |
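
The page-size assertions in UT-PDF-001/002 reduce to a unit conversion: CSS millimetres versus PDF points (1 pt = 1/72 in). A small helper a test suite could use to check rendered dimensions; the 1 pt tolerance is an assumption:

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72.0  # PDF user-space units

def mm_to_points(mm: float) -> float:
    """Convert millimetres to PDF points (1 pt = 1/72 inch)."""
    return mm / MM_PER_INCH * POINTS_PER_INCH

def page_size_matches(width_pt, height_pt, expected_mm, tol_pt=1.0):
    """Compare a rendered page's point dimensions to an expected mm size."""
    exp_w, exp_h = (mm_to_points(v) for v in expected_mm)
    return abs(width_pt - exp_w) <= tol_pt and abs(height_pt - exp_h) <= tol_pt

A4_MM = (210, 297)          # UT-PDF-001
LETTER_MM = (215.9, 279.4)  # UT-PDF-002
```

For example, a standard A4 PDF page reports roughly 595.28 × 841.89 pt, and US Letter exactly 612 × 792 pt.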

#### **A.1.4 Bbox Extraction (Stage 05)**

**Module**: `api/utils.py::extract_bboxes_from_rendered_pdf()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-BBOX-001 | Extract word bboxes | Standard PDF | List of word-level bboxes | Critical |
| UT-BBOX-002 | Extract char bboxes | Same PDF | List of char-level bboxes | High |
| UT-BBOX-003 | Handle multi-line text | PDF with paragraphs | Correct block/line grouping | High |
| UT-BBOX-004 | Filter whitespace | PDF with spaces/tabs | No whitespace-only bboxes | Medium |
| UT-BBOX-005 | Handle special characters | PDF with ©, ®, ™ | Characters properly extracted | Medium |
| UT-BBOX-006 | Handle non-Latin scripts | PDF with Chinese/Arabic | Correct Unicode extraction | Low |
| UT-BBOX-007 | Validate coordinates | Extracted bboxes | All coords within page bounds | High |
| UT-BBOX-008 | Handle empty PDF | PDF with no text | Return empty list | Low |
| UT-BBOX-009 | Handle rotated text | PDF with rotation | Bboxes account for rotation | Low |
| UT-BBOX-010 | Parse bbox strings | `"0_0_0 Hello 10 20 50 30"` | OCRBox object with correct fields | High |
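
UT-BBOX-010's string format (`"0_0_0 Hello 10 20 50 30"`) implies a `block_line_word` id triple, the text, and four coordinates. A hedged parser sketch; the real `OCRBox` fields and field order may differ from this assumed layout:

```python
from dataclasses import dataclass

@dataclass
class OCRBox:
    block: int
    line: int
    word: int
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

def parse_bbox_line(raw: str) -> OCRBox:
    """Parse 'B_L_W text x0 y0 x1 y1' (layout inferred from UT-BBOX-010)."""
    ident, rest = raw.split(" ", 1)
    # rsplit keeps multi-word text intact: only the last 4 tokens are coords
    parts = rest.rsplit(" ", 4)
    text = parts[0]
    x0, y0, x1, y1 = (float(v) for v in parts[1:])
    block, line, word = (int(v) for v in ident.split("_"))
    return OCRBox(block, line, word, text, x0, y0, x1, y1)

box = parse_bbox_line("0_0_0 Hello 10 20 50 30")
```

Using `rsplit` makes the parser robust to texts containing spaces, e.g. `"1_2_3 New York 5 6 7 8"`.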

#### **A.1.5 Handwriting Region Extraction (Stage 07)**

**Module**: `api/utils.py::process_stage3_complete()` - handwriting section

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-HW-001 | Filter by handwriting_ratio | 10 regions, ratio=0.3 | ~3 regions selected | Critical |
| UT-HW-002 | Parse author IDs | `class="handwritten author1"` | author_id="author1" | High |
| UT-HW-003 | Match to word bboxes | Geometry + bboxes | Correct bbox mapping | Critical |
| UT-HW-004 | Handle signature class | `class="handwritten signature"` | is_signature=True | Medium |
| UT-HW-005 | DPI coordinate conversion | Browser coords (96 DPI) | PDF coords (72 DPI) with 0.75 scale | High |
| UT-HW-006 | Handle overlapping regions | 2 regions, same text | Prevent duplicate bbox usage | Medium |
| UT-HW-007 | Validate rect boundaries | Geometries with rect | Check bboxes within rect threshold | High |
| UT-HW-008 | Test seed reproducibility | Same seed, same input | Identical region selection | High |
| UT-HW-009 | Handle zero ratio | ratio=0.0 | No regions selected | Medium |
| UT-HW-010 | Handle full ratio | ratio=1.0 | All regions selected | Medium |
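
UT-HW-005's 96 → 72 DPI conversion and the seeded ratio filter (UT-HW-001/008/009/010) are easy to pin down in isolation. A sketch under the assumption that regions are sampled uniformly; `select_regions` is a hypothetical stand-in for the selection logic inside `process_stage3_complete()`:

```python
import random

BROWSER_DPI = 96               # CSS pixels reported by the browser
PDF_DPI = 72                   # PDF user-space points
SCALE = PDF_DPI / BROWSER_DPI  # 0.75, as in UT-HW-005

def browser_rect_to_pdf(rect):
    """Scale an (x0, y0, x1, y1) browser-pixel rect into PDF points."""
    return tuple(v * SCALE for v in rect)

def select_regions(regions, ratio, seed):
    """Seeded selection of round(len * ratio) regions (UT-HW-001/008/009/010)."""
    rng = random.Random(seed)
    count = round(len(regions) * ratio)
    return sorted(rng.sample(regions, count))
```

Seeding `random.Random(seed)` rather than the module-level RNG is what makes the selection reproducible across runs.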

#### **A.1.6 Handwriting Service Integration**

**Module**: `api/utils.py::call_handwriting_service_batch()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-HWSVC-001 | Batch request format | 10 texts with metadata | Correct RunPod JSON format | Critical |
| UT-HWSVC-002 | Handle sync response | Immediate completion | Parse output.images[] | High |
| UT-HWSVC-003 | Handle IN_PROGRESS | Delayed completion | Poll status endpoint | Critical |
| UT-HWSVC-004 | Status polling timeout | Job exceeds 30 polls | Raise timeout exception | High |
| UT-HWSVC-005 | Handle FAILED status | RunPod job failure | Raise exception with error | High |
| UT-HWSVC-006 | Parse image results | Batch response | Map hw_id to image_base64 | Critical |
| UT-HWSVC-007 | Calculate dynamic timeout | 50 texts | Timeout = 50×20+30 = 1030s | Medium |
| UT-HWSVC-008 | Handle network errors | Connection timeout | Retry up to max_retries | High |
| UT-HWSVC-009 | Validate authorization | Configured API key | Request includes Bearer token | Medium |
| UT-HWSVC-010 | Test incremental poll backoff | Status polling | Delays: 5s, 6s, 7s... capped at 10s | Low |
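
UT-HWSVC-007's arithmetic (50 × 20 + 30 = 1030 s) can be pinned down in a one-line helper. The 20 s per text and 30 s base are taken from that table row and are assumptions about the production constants:

```python
PER_TEXT_SECONDS = 20  # assumed per-text generation budget (UT-HWSVC-007)
BASE_SECONDS = 30      # assumed fixed request overhead

def dynamic_timeout(num_texts: int) -> int:
    """Timeout for a handwriting batch: linear in the number of texts."""
    return num_texts * PER_TEXT_SECONDS + BASE_SECONDS
```

Keeping the constants named makes the unit test self-documenting when the budget changes.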

#### **A.1.7 Visual Element Generation (Stage 10)**

**Module**: `api/utils.py::generate_visual_element_images()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-VE-001 | Select logo prefab | type="logo" | Random logo from prefabs/ | High |
| UT-VE-002 | Select photo prefab | type="photo" | Random photo image | High |
| UT-VE-003 | Generate barcode | type="barcode" | EAN-13 barcode image | Medium |
| UT-VE-004 | Generate QR code | type="qr_code", content="URL" | QR code image | Medium |
| UT-VE-005 | Test seed reproducibility | Same seed, same type | Identical prefab selection | High |
| UT-VE-006 | Handle missing prefabs | type with no files | Fallback or error | Medium |
| UT-VE-007 | Load SVG prefabs | SVG logo file | Convert to PNG | Low |
| UT-VE-008 | Filter by requested types | types=["logo","signature"] | Only matching types generated | High |
| UT-VE-009 | Normalize type synonyms | "chart" → "figure" | Consistent type mapping | Medium |
| UT-VE-010 | Return base64 encoding | All image types | Valid base64 strings | High |
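
UT-VE-005/006/009 combine into one small behaviour: normalize the type, look up the prefab inventory, and pick deterministically under a seed. A sketch with a hypothetical in-memory inventory (real prefabs live as files under `prefabs/`); `pick_prefab` and the file names are illustrative:

```python
import random

# Hypothetical prefab inventory; the real files live under prefabs/ on disk.
PREFABS = {
    "logo": ["logo_01.png", "logo_02.png", "logo_03.png"],
    "photo": ["photo_01.png", "photo_02.png"],
    "figure": ["figure_01.png"],
}
TYPE_SYNONYMS = {"chart": "figure"}  # UT-VE-009

def pick_prefab(element_type: str, seed: int) -> str:
    """Deterministically choose a prefab file for a visual-element type."""
    element_type = TYPE_SYNONYMS.get(element_type, element_type)
    files = PREFABS.get(element_type)
    if not files:
        raise KeyError(f"no prefabs available for type {element_type!r}")  # UT-VE-006
    return random.Random(seed).choice(files)
```

The same seed and type always yield the same file, which is what UT-VE-005 asserts.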

#### **A.1.8 PDF Modification (Stages 12-13)**

**Module**: `api/utils.py::process_stage3_complete()` - insertion sections

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-PDFMOD-001 | Whiteout text regions | 5 word bboxes | White rectangles drawn | High |
| UT-PDFMOD-002 | Insert handwriting image | Image + bbox | Image at correct position | Critical |
| UT-PDFMOD-003 | Apply random offsets | Word bbox | Position offset within limits | Medium |
| UT-PDFMOD-004 | Resize with aspect ratio | Wide/tall images | Scaled to fit bbox | High |
| UT-PDFMOD-005 | Insert visual element | Logo + rect | Centered in bbox | High |
| UT-PDFMOD-006 | Handle rotation | Element with rotation=45 | Rotated image insertion | Low |
| UT-PDFMOD-007 | Save intermediate PDF | After handwriting | `_with_handwriting.pdf` created | Medium |
| UT-PDFMOD-008 | Save final PDF | After visual elements | `_final.pdf` created | High |
| UT-PDFMOD-009 | Scale factor application | 3x upscale | High-res image quality | Medium |
| UT-PDFMOD-010 | Handle insertion errors | Invalid image data | Log error, continue | Medium |

#### **A.1.9 OCR Processing (Stage 15)**

**Module**: `api/utils.py::run_paddle_ocr()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-OCR-001 | OCR English text | English document image | Accurate word recognition | Critical |
| UT-OCR-002 | OCR with handwriting | Mixed typed/handwritten | Both text types detected | High |
| UT-OCR-003 | Extract word bboxes | Document image | List of word-level bboxes | Critical |
| UT-OCR-004 | Calculate confidence | OCR results | Confidence score per word | High |
| UT-OCR-005 | Handle low quality | Blurry/noisy image | Reasonable accuracy (>70%) | Medium |
| UT-OCR-006 | Handle rotated text | 90° rotated document | Correct orientation detection | Low |
| UT-OCR-007 | Multi-language support | Document with German text | lang="de" parameter works | Medium |
| UT-OCR-008 | Handle empty image | Blank white image | Empty results list | Low |
| UT-OCR-009 | DPI configuration | Various DPI settings | Consistent accuracy | Medium |
| UT-OCR-010 | Return image dimensions | Any image | width, height in pixels | High |

#### **A.1.10 Bbox Normalization (Stage 16)**

**Module**: `api/utils.py::normalize_bboxes()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-NORM-001 | Normalize to [0,1] | Pixel bboxes, image dims | Normalized coordinates | Critical |
| UT-NORM-002 | Handle out-of-bounds | x1 > image_width | Clipped to [0, 1] | High |
| UT-NORM-003 | Preserve text data | Bboxes with text field | Text preserved in output | High |
| UT-NORM-004 | Create segment bboxes | Word-level bboxes | Aggregated segment bboxes | Medium |
| UT-NORM-005 | Handle zero dimensions | Image with width=0 | Raise validation error | Low |
| UT-NORM-006 | Round to precision | Float coordinates | 6 decimal places | Low |
| UT-NORM-007 | Maintain bbox order | Ordered input list | Same order in output | Medium |
| UT-NORM-008 | Handle negative coords | bbox with x0=-5 | Clipped to 0 | Medium |
| UT-NORM-009 | Validate bbox format | Various input formats | Consistent output schema | High |
| UT-NORM-010 | Handle empty list | No bboxes | Return empty list | Low |
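
The scaling and clipping rules in UT-NORM-001/002/005/006/008 can be captured in a small reference implementation. This is a sketch with assumed dict field names (`x0`/`y0`/`x1`/`y1`), not the actual `normalize_bboxes()` signature:

```python
def normalize_pixel_bboxes(bboxes, width, height, precision=6):
    """Scale pixel-space boxes into [0, 1], clipping out-of-range values.

    Sketch of UT-NORM-001/002/005/006/008; field names are assumptions.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")  # UT-NORM-005

    def clip(value):
        return min(max(value, 0.0), 1.0)

    return [
        {
            "text": box.get("text"),  # preserved per UT-NORM-003
            "x0": round(clip(box["x0"] / width), precision),
            "y0": round(clip(box["y0"] / height), precision),
            "x1": round(clip(box["x1"] / width), precision),
            "y1": round(clip(box["y1"] / height), precision),
        }
        for box in bboxes
    ]
```

Order is preserved by construction (UT-NORM-007), and an empty input trivially yields an empty list (UT-NORM-010).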

#### **A.1.11 Dataset Export (Stage 19)**

**Module**: `api/utils.py::export_to_msgpack()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-EXPORT-001 | Create msgpack file | Complete document data | Valid .msgpack file | Critical |
| UT-EXPORT-002 | Encode image bytes | PNG image | Binary image in msgpack | High |
| UT-EXPORT-003 | Store normalized bboxes | Normalized coordinates | Bboxes in [0,1] range | High |
| UT-EXPORT-004 | Store ground truth | GT JSON | GT dict in msgpack | High |
| UT-EXPORT-005 | Store metadata | Document metadata | Metadata dict in msgpack | Medium |
| UT-EXPORT-006 | Validate msgpack format | Generated file | Readable by msgpack.load() | Critical |
| UT-EXPORT-007 | Handle large files | 10MB+ image | Compression applied | Low |
| UT-EXPORT-008 | Store words list | OCR words | Ordered word list | High |
| UT-EXPORT-009 | Handle missing fields | Partial data | Fill with null/defaults | Medium |
| UT-EXPORT-010 | Return file path | Export operation | Absolute path to .msgpack | Medium |
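
UT-EXPORT-009 expects absent fields to be filled with null/defaults before serialisation. A stdlib-only sketch of that record assembly step; the field names are illustrative assumptions, and the actual msgpack encoding is deliberately left out:

```python
# Assumed record schema for illustration; the real export schema may differ.
REQUIRED_FIELDS = ("image", "bboxes", "ground_truth", "metadata", "words")

def build_export_record(**data):
    """Assemble a per-document record, defaulting missing fields to None.

    Unknown fields are rejected so schema drift fails loudly in tests.
    """
    unknown = set(data) - set(REQUIRED_FIELDS)
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    return {field: data.get(field) for field in REQUIRED_FIELDS}
```

The resulting dict would then be passed to the msgpack serialiser; UT-EXPORT-006's round-trip check covers that half.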

#### **A.1.12 Validation Functions**

**Module**: `api/utils.py::validate_*()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-VAL-001 | Validate HTML structure | Valid HTML5 | (True, None) | High |
| UT-VAL-002 | Detect missing DOCTYPE | HTML without DOCTYPE | (False, "Missing DOCTYPE") | Medium |
| UT-VAL-003 | Detect missing CSS | HTML without `<style>` | (False, "Missing CSS") | High |
| UT-VAL-004 | Validate PDF file | Valid PDF | (True, None) | High |
| UT-VAL-005 | Detect corrupt PDF | Truncated PDF file | (False, "Corrupt PDF") | High |
| UT-VAL-006 | Validate bbox count | 100 bboxes, min=50 | (True, None) | Medium |
| UT-VAL-007 | Detect insufficient bboxes | 10 bboxes, min=50 | (False, "Insufficient bboxes") | Medium |
| UT-VAL-008 | Validate bbox coordinates | Valid bboxes | (True, None) | High |
| UT-VAL-009 | Detect invalid coordinates | x0 > x1 | (False, "Invalid bbox") | High |
| UT-VAL-010 | Validate page count | Multi-page PDF | (False, "Expected 1 page") | Medium |

**Total Unit Tests**: 120+ test cases

---

### A.2 Integration Testing

Integration tests verify interactions between multiple components. Target: complete workflow coverage.

#### **A.2.1 Pipeline Stage Integration**

**Purpose**: Verify data flow between consecutive pipeline stages

| Test Case ID | Test Name | Components | Test Scenario | Priority |
|--------------|-----------|------------|---------------|----------|
| IT-PIPE-001 | Stages 01-03 integration | Seed download → LLM → HTML extraction | Download seeds, call LLM, extract HTML successfully | Critical |
| IT-PIPE-002 | Stages 03-05 integration | HTML extraction → PDF render → Bbox extraction | Clean HTML renders to PDF, bboxes extracted | Critical |
| IT-PIPE-003 | Stages 07-09 integration | HW extraction → Service call | HW regions trigger service batch request | Critical |
| IT-PIPE-004 | Stages 09-12 integration | HW generation → Insertion | Generated images inserted at correct positions | Critical |
| IT-PIPE-005 | Stages 14-15 integration | Image render → OCR | Final image passed to OCR successfully | High |
| IT-PIPE-006 | Stages 15-16 integration | OCR → Normalization | OCR bboxes normalized with correct dimensions | High |
| IT-PIPE-007 | Stages 07-13 complete | Full Stage 3 | Handwriting + visual elements end-to-end | Critical |
| IT-PIPE-008 | Stages 14-19 complete | Full Stages 4-5 | OCR → export complete workflow | High |
| IT-PIPE-009 | Stages 01-19 minimal | End-to-end minimal | No handwriting/VE, basic generation | Critical |
| IT-PIPE-010 | Stages 01-19 full | End-to-end full features | All features enabled, complete dataset | Critical |

#### **A.2.2 External Service Integration**

**Purpose**: Verify interactions with external APIs and services

| Test Case ID | Test Name | Services | Test Scenario | Priority |
|--------------|-----------|----------|---------------|----------|
| IT-EXT-001 | Claude API integration | Claude Messages API | Send prompt, receive valid response | Critical |
| IT-EXT-002 | Claude error handling | Claude API | Handle rate limits (429) gracefully | High |
| IT-EXT-003 | Claude retry logic | Claude API | Automatic retry on transient errors | High |
| IT-EXT-004 | RunPod sync integration | RunPod /runsync | Send batch, receive images | Critical |
| IT-EXT-005 | RunPod async integration | RunPod /run + status | Queue job, poll until completion | High |
| IT-EXT-006 | RunPod auth | RunPod API | Bearer token authentication works | Medium |
| IT-EXT-007 | Supabase storage | Supabase storage API | Upload/download seed images | Medium |
| IT-EXT-008 | Supabase database | Supabase DB | Store generation metadata | Medium |
| IT-EXT-009 | Redis Queue | RQ worker | Enqueue async job, process in background | High |
| IT-EXT-010 | Google Drive | Drive API (optional) | Export to Google Drive if configured | Low |

#### **A.2.3 Database Operations**

**Purpose**: Verify database interactions (Supabase)

| Test Case ID | Test Name | Operations | Test Scenario | Priority |
|--------------|-----------|------------|---------------|----------|
| IT-DB-001 | Insert generation record | INSERT | New generation logged in DB | High |
| IT-DB-002 | Update generation status | UPDATE | Status changes reflected | High |
| IT-DB-003 | Query by task ID | SELECT | Retrieve generation by ID | High |
| IT-DB-004 | Store metadata | INSERT | Complete metadata stored | Medium |
| IT-DB-005 | Handle connection errors | Network failure | Retry or graceful degradation | High |
| IT-DB-006 | Transaction rollback | Error mid-transaction | Data consistency maintained | Medium |
| IT-DB-007 | Concurrent updates | Multiple workers | No race conditions | Medium |
| IT-DB-008 | Pagination | Large result sets | Efficient pagination | Low |
| IT-DB-009 | Search functionality | Full-text search | Search by doc_type, language | Low |
| IT-DB-010 | Data retention | Cleanup old data | Archive/delete after N days | Low |

#### **A.2.4 API Endpoint Integration**

**Purpose**: Test complete request/response cycles through endpoints

| Test Case ID | Test Name | Endpoint | Test Scenario | Priority |
|--------------|-----------|----------|---------------|----------|
| IT-API-001 | GET /health | Health check | Returns 200 with system status | Critical |
| IT-API-002 | POST /generate | Legacy endpoint | Returns JSON with complete data | High |
| IT-API-003 | POST /generate/pdf | Sync PDF endpoint | Returns ZIP file download | Critical |
| IT-API-004 | POST /generate/async | Async endpoint | Returns task ID | Critical |
| IT-API-005 | GET /generate/async/status/{id} | Status check | Returns current job status | Critical |
| IT-API-006 | GET /generate/async/result/{id} | Result download | Returns ZIP when complete | High |
| IT-API-007 | Request validation | All endpoints | Invalid params rejected with 400 | High |
| IT-API-008 | Authentication | Protected endpoints | Requires valid API key | High |
| IT-API-009 | Rate limiting | All endpoints | Enforces rate limits | Medium |
| IT-API-010 | CORS headers | All endpoints | Correct CORS configuration | Medium |
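
IT-API-007's expectation that invalid params are rejected can be pinned down with a plain validation helper, independently of the web framework. Field names and limits here are illustrative assumptions, not the real request schema:

```python
def validate_generate_request(body: dict) -> list[str]:
    """Collect validation errors for a generation request.

    Sketch only: `doc_type`, `num_solutions`, and `seed_image_urls` are
    assumed field names; the limits are placeholders.
    """
    errors = []
    if not body.get("doc_type"):
        errors.append("doc_type is required")
    n = body.get("num_solutions", 1)
    if not isinstance(n, int) or not 1 <= n <= 50:
        errors.append("num_solutions must be an integer in [1, 50]")
    for url in body.get("seed_image_urls", []):
        if not url.startswith("https://"):
            errors.append(f"seed image URL must be HTTPS: {url}")
    return errors
```

An endpoint-level integration test would assert that a non-empty error list maps to an HTTP 400/422 response body.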
| |
| #### **A.2.5 Background Worker Integration** |
| |
| **Purpose**: Test async job processing via Redis Queue |
| |
| | Test Case ID | Test Name | Components | Test Scenario | Priority | |
| |--------------|-----------|------------|---------------|----------| |
| | IT-WORKER-001 | Job enqueue | API → RQ | Job added to queue successfully | Critical | |
| | IT-WORKER-002 | Job processing | Worker → Pipeline | Worker picks up and processes job | Critical | |
| | IT-WORKER-003 | Job status updates | Worker → DB | Status updated throughout processing | High | |
| | IT-WORKER-004 | Job failure handling | Worker error | Failed job logged, error reported | High | |
| | IT-WORKER-005 | Job retry | Transient failure | Failed job retried up to max attempts | High | |
| | IT-WORKER-006 | Job timeout | Long-running job | Timeout enforced, job killed | Medium | |
| | IT-WORKER-007 | Result storage | Worker → Storage | Results saved to correct location | High | |
| | IT-WORKER-008 | Queue priority | Multiple jobs | High priority jobs processed first | Low | |
| | IT-WORKER-009 | Worker scaling | Multiple workers | Jobs distributed across workers | Medium | |
| | IT-WORKER-010 | Worker health | Worker crash | Replaced automatically, jobs reassigned | High | |
| |
| **Total Integration Tests**: 50+ test cases |
| |
| --- |
| |
| ### A.3 System Testing |
| |
| System tests verify end-to-end workflows from user perspective. Target: All user journeys covered. |
| |
| #### **A.3.1 Complete Generation Workflows** |
| |
| | Test Case ID | Test Name | Workflow | Test Scenario | Expected Outcome | Priority | |
| |--------------|-----------|----------|---------------|------------------|----------| |
| | ST-GEN-001 | Basic document generation | Minimal config | Generate 1 English invoice, no handwriting/VE | PDF + metadata returned in <60s | Critical | |
| | ST-GEN-002 | Handwriting generation | Enable handwriting | Generate document with handwriting | Handwriting visible in PDF | Critical | |
| | ST-GEN-003 | Visual elements | Enable VE | Generate document with logo + barcode | Elements visible in PDF | High | |
| | ST-GEN-004 | Full feature set | All features enabled | Generate with HW + VE + OCR + analysis | Complete dataset ZIP | Critical | |
| | ST-GEN-005 | Multi-document batch | num_solutions=5 | Generate 5 documents from 3 seeds | 5 complete documents | High | |
| | ST-GEN-006 | Reproducible generation | Same seed value | Generate twice with seed=42 | Identical outputs | High | |
| | ST-GEN-007 | Multi-language | language="german" | Generate German document | Correct language output | Medium | |
| | ST-GEN-008 | Various doc types | doc_type variations | Test invoice, receipt, form, letter | All types work | High | |
| | ST-GEN-009 | Different GT formats | gt_type="kie" / "qa" | Test both GT formats | Correct GT structure | High | |
| | ST-GEN-010 | Custom seed images | User-provided URLs | Generate from user's images | Images influence output | High | |
| |
| #### **A.3.2 Error Handling Workflows** |
| |
| | Test Case ID | Test Name | Error Condition | Test Scenario | Expected Outcome | Priority | |
| |--------------|-----------|-----------------|---------------|------------------|----------| |
| | ST-ERR-001 | Invalid seed URL | 404 not found | Submit invalid image URL | HTTP 400 with clear error message | High | |
| | ST-ERR-002 | LLM API failure | Claude API down | Submit request during outage | HTTP 503 with retry-after | Critical | |
| | ST-ERR-003 | Handwriting service failure | RunPod timeout | Enable handwriting, service fails | HTTP 500, generation stopped | Critical | |
| | ST-ERR-004 | Invalid parameters | Missing required field | Omit doc_type parameter | HTTP 422 with validation details | High | |
| | ST-ERR-005 | Rate limit exceeded | Too many requests | Submit 100 concurrent requests | HTTP 429 with retry info | High | |
| | ST-ERR-006 | Payload too large | Huge request | Submit 50 seed image URLs | HTTP 413 payload too large | Medium | |
| | ST-ERR-007 | Malformed JSON | Invalid JSON | Submit broken JSON request | HTTP 400 with parse error | High | |
| | ST-ERR-008 | Authentication failure | Missing/invalid API key | Request without auth | HTTP 401 unauthorized | High | |
| | ST-ERR-009 | Database connection loss | DB unavailable | Submit during DB outage | Graceful degradation or 503 | Medium | |
| | ST-ERR-010 | Disk space exhausted | No storage space | Generate large batch | HTTP 507 insufficient storage | Low | |
| |
| #### **A.3.3 Async Processing Workflows** |
| |
| | Test Case ID | Test Name | Workflow | Test Scenario | Expected Outcome | Priority | |
| |--------------|-----------|----------|---------------|------------------|----------| |
| | ST-ASYNC-001 | Submit async job | POST /generate/async | Submit batch job | Receive task ID immediately | Critical | |
| | ST-ASYNC-002 | Check pending status | GET status before completion | Poll status endpoint | Returns "pending" or "processing" | High | |
| | ST-ASYNC-003 | Check completed status | GET status after completion | Poll status after 5 minutes | Returns "completed" | Critical | |
| | ST-ASYNC-004 | Download results | GET result/{id} | Download after completion | Returns ZIP file | Critical | |
| | ST-ASYNC-005 | Check failed status | Job fails during processing | Check status of failed job | Returns "failed" with error details | High | |
| | ST-ASYNC-006 | Multiple concurrent jobs | Submit 10 jobs | 10 async submissions | All jobs process independently | High | |
| | ST-ASYNC-007 | Job cancellation | Cancel in-progress job | Submit, then cancel | Job stops, partial results cleaned | Medium | |
| | ST-ASYNC-008 | Result expiration | Check old results | Access 7-day old result | HTTP 410 gone (expired) | Low | |
| | ST-ASYNC-009 | Progress updates | Monitor long job | Poll during processing | Progress % increases | Medium | |
| | ST-ASYNC-010 | Worker restart recovery | Worker crashes mid-job | Kill worker process | Job reassigned, completes | High | |
| |
| #### **A.3.4 Data Quality Workflows** |
| |
| | Test Case ID | Test Name | Quality Check | Test Scenario | Expected Outcome | Priority | |
| |--------------|-----------|---------------|---------------|------------------|----------| |
| | ST-QUAL-001 | OCR accuracy | Compare OCR to ground truth | Generate doc, compare OCR text to GT | >90% accuracy | High | |
| | ST-QUAL-002 | Bbox alignment | Visual inspection | Generate doc with debug viz | Bboxes align with text | High | |
| | ST-QUAL-003 | Handwriting quality | Visual realism | Generate handwritten doc | Handwriting looks realistic | Medium | |
| | ST-QUAL-004 | Visual element placement | Correct positioning | Generate with logo + barcode | Elements at correct positions | High | |
| | ST-QUAL-005 | GT completeness | All GT fields present | Generate KIE document | All expected GT fields extracted | High | |
| | ST-QUAL-006 | Dataset format validity | msgpack validation | Export dataset | PyTorch can load msgpack | High | |
| | ST-QUAL-007 | Image resolution | Check output image | Render final image | Minimum 220 DPI quality | Medium | |
| | ST-QUAL-008 | PDF compliance | PDF/A validation | Generate PDF | Valid PDF/A format | Low | |
| | ST-QUAL-009 | Metadata accuracy | Check metadata fields | Generate document | Metadata matches actual data | High | |
| | ST-QUAL-010 | Reproducibility | Same input → same output | Generate 3 times with seed | All outputs identical | High | |
| |
| #### **A.3.5 Performance Workflows** |
| |
| | Test Case ID | Test Name | Performance Metric | Test Scenario | Target Performance | Priority | |
| |--------------|-----------|-------------------|---------------|---------------------|----------| |
| | ST-PERF-001 | Basic generation time | Time to completion | Generate minimal document | <60 seconds | High | |
| | ST-PERF-002 | Handwriting generation time | Time with HW | Generate with 20 HW words | <300 seconds | High | |
| | ST-PERF-003 | Batch generation time | Multiple documents | Generate 10 documents | <15 minutes | Medium | |
| | ST-PERF-004 | API response time | Endpoint latency | Submit request | <500ms to return task ID | High | |
| | ST-PERF-005 | Status check latency | Status endpoint | Check job status | <100ms response time | Medium | |
| | ST-PERF-006 | Concurrent requests | Load handling | 50 concurrent requests | All complete successfully | High | |
| | ST-PERF-007 | Large payload | Big request | 8 seed images, 10 solutions | Processes without timeout | Medium | |
| | ST-PERF-008 | Memory usage | Resource consumption | Generate 100 documents | <8GB RAM per worker | Medium | |
| | ST-PERF-009 | Disk I/O | Storage performance | Rapid sequential generations | No I/O bottleneck | Low | |
| | ST-PERF-010 | Network bandwidth | Data transfer | Download large result ZIP | Download completes in <60s | Low | |
| |
| **Total System Tests**: 50+ test cases |
| |
| --- |
| |
| ## Non-Functional Testing |
| |
| ### B.1 Performance Testing |
| |
| Purpose: Verify system performance under various load conditions. |
| |
| #### **B.1.1 Load Testing** |
| |
| **Tool**: Apache JMeter / Locust |
| |
| | Test Case ID | Test Name | Load Profile | Metrics | Acceptance Criteria | Priority | |
| |--------------|-----------|--------------|---------|---------------------|----------| |
| | NFT-LOAD-001 | Normal load | 10 concurrent users, 1 hour | Throughput, response time | Avg response <5s, 0 errors | Critical | |
| | NFT-LOAD-002 | Peak load | 50 concurrent users, 30 min | Throughput, error rate | <5% error rate, response <15s | Critical | |
| | NFT-LOAD-003 | Sustained load | 25 concurrent users, 4 hours | CPU, memory, throughput | Stable resource usage, no leaks | High | |
| | NFT-LOAD-004 | Ramp-up load | 1→100 users over 30 min | System behavior | Graceful scaling or degradation | High | |
| | NFT-LOAD-005 | Spike load | Sudden 0→100 users | Response time spike | Recovers within 2 minutes | Medium | |
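| 
| For NFT-LOAD-005, Locust's `LoadTestShape` hook can drive the sudden 0→100 spike programmatically instead of manual ramping; a short sketch (stage durations and rates here are illustrative, not mandated by the plan): |
| 
| ```python |
| from locust import LoadTestShape |
| 
| 
| class SpikeShape(LoadTestShape): |
|     """Idle for 60s, spike to 100 users, then drop back to observe recovery.""" |
| 
|     stages = [ |
|         (60, 0, 1),     # (end_time_s, user_count, spawn_rate) |
|         (180, 100, 100),  # instant spike: spawn all 100 users at once |
|         (300, 0, 100),    # release load; system should recover in <2 min |
|     ] |
| 
|     def tick(self): |
|         run_time = self.get_run_time() |
|         for end_time, users, rate in self.stages: |
|             if run_time < end_time: |
|                 return (users, rate) |
|         return None  # stop the test |
| ``` |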
| |
| **Test Script Example (Locust)**: |
| ```python |
| # locustfile.py |
| from locust import HttpUser, task, between |
| 
| 
| class DocGenieUser(HttpUser): |
|     wait_time = between(5, 15) |
|     task_id = None  # set after the first successful submission |
| 
|     @task(3) |
|     def generate_basic_document(self): |
|         payload = { |
|             "seed_images": ["https://example.com/seed1.jpg"], |
|             "prompt_params": { |
|                 "language": "english", |
|                 "doc_type": "invoice", |
|                 "num_solutions": 1, |
|                 "enable_handwriting": False, |
|                 "enable_visual_elements": False, |
|             }, |
|         } |
|         response = self.client.post("/generate", json=payload, timeout=120) |
|         if response.ok: |
|             # Remember the task ID so the status task has a real job to poll |
|             self.task_id = response.json().get("task_id") |
| 
|     @task(1) |
|     def check_async_status(self): |
|         if self.task_id is None: |
|             return  # nothing submitted yet by this simulated user |
|         self.client.get(f"/generate/async/status/{self.task_id}") |
| ``` |
| |
| #### **B.1.2 Stress Testing** |
| |
| **Purpose**: Determine system breaking point |
| |
| | Test Case ID | Test Name | Stress Condition | Metrics | Acceptance Criteria | Priority | |
| |--------------|-----------|------------------|---------|---------------------|----------| |
| | NFT-STRESS-001 | User overload | 200+ concurrent users | Max capacity | Identifies max users before failure | High | |
| | NFT-STRESS-002 | Memory stress | Generate 1000 docs without cleanup | Memory usage | OOM protection, graceful failure | High | |
| | NFT-STRESS-003 | CPU stress | Complex documents, no throttling | CPU utilization | System remains responsive | Medium | |
| | NFT-STRESS-004 | Disk stress | Fill 95% of disk space | I/O performance | Handles low disk gracefully | Medium | |
| | NFT-STRESS-005 | Network stress | Simulate slow network | Timeout handling | Appropriate timeouts, retries | Medium | |
| |
| #### **B.1.3 Endurance Testing (Soak Testing)** |
| |
| **Purpose**: Detect memory leaks and performance degradation over time |
| |
| | Test Case ID | Test Name | Duration | Load | Metrics | Acceptance Criteria | Priority | |
| |--------------|-----------|----------|------|---------|---------------------|----------| |
| | NFT-ENDUR-001 | 24-hour test | 24 hours | 10 concurrent users | Memory, CPU over time | No memory leaks, stable performance | High | |
| | NFT-ENDUR-002 | 7-day test | 7 days | 5 concurrent users | All resources | System stable, no degradation | Medium | |
| | NFT-ENDUR-003 | Weekend load | 48 hours | Variable load | Error rate | <1% errors throughout | Medium | |
| |
| #### **B.1.4 Scalability Testing** |
| |
| **Purpose**: Verify horizontal and vertical scaling |
| |
| | Test Case ID | Test Name | Scaling Type | Test Scenario | Acceptance Criteria | Priority | |
| |--------------|-----------|--------------|---------------|---------------------|----------| |
| | NFT-SCALE-001 | Horizontal scaling | Add workers | 1→5 workers, measure throughput | Linear throughput increase | High | |
| | NFT-SCALE-002 | Vertical scaling | Increase CPU/RAM | 2→8 cores, 4→16GB RAM | Performance improvement | Medium | |
| | NFT-SCALE-003 | Auto-scaling | Dynamic load | Trigger auto-scale rules | Scales up/down automatically | Medium | |
| | NFT-SCALE-004 | Database scaling | Database load | High concurrent DB ops | No DB bottleneck | High | |
| | NFT-SCALE-005 | Storage scaling | Large datasets | Generate 10,000 documents | Storage handles volume | Low | |
| |
| #### **B.1.5 Benchmark Testing** |
| |
| **Purpose**: Establish performance baselines |
| |
| | Test Case ID | Component | Benchmark | Target | Priority | |
| |--------------|-----------|-----------|--------|----------| |
| | NFT-BENCH-001 | Seed download | 1 image (1MB) | <2 seconds | High | |
| | NFT-BENCH-002 | LLM call | 1 prompt (standard) | <30 seconds | Critical | |
| | NFT-BENCH-003 | PDF rendering | 1 A4 page | <3 seconds | High | |
| | NFT-BENCH-004 | Bbox extraction | 500 words | <2 seconds | Medium | |
| | NFT-BENCH-005 | Handwriting service | 10 words batch | <200 seconds | Critical | |
| | NFT-BENCH-006 | Visual element generation | 5 elements | <5 seconds | Medium | |
| | NFT-BENCH-007 | OCR processing | 1 A4 page (300 DPI) | <5 seconds | High | |
| | NFT-BENCH-008 | Msgpack export | 1 document | <2 seconds | Medium | |
| | NFT-BENCH-009 | Complete pipeline (minimal) | End-to-end | <60 seconds | Critical | |
| | NFT-BENCH-010 | Complete pipeline (full) | End-to-end with HW | <300 seconds | Critical | |
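| 
| Baselines like these can be captured with a tiny best-of-N timing helper before graduating to a tool like pytest-benchmark; a sketch (the exported stage function below is a stand-in, not the real msgpack exporter): |
| 
| ```python |
| import time |
| 
| 
| def benchmark(fn, *args, repeats: int = 5) -> float: |
|     """Return the best-of-N wall-clock time for one call, in seconds.""" |
|     best = float("inf") |
|     for _ in range(repeats): |
|         start = time.perf_counter() |
|         fn(*args) |
|         best = min(best, time.perf_counter() - start) |
|     return best |
| 
| 
| def export_stub(doc: dict) -> bytes: |
|     # Stand-in for the msgpack export stage (NFT-BENCH-008) |
|     return repr(doc).encode() |
| 
| 
| elapsed = benchmark(export_stub, {"words": ["invoice"] * 100}) |
| assert elapsed < 2.0, "NFT-BENCH-008 target: <2 seconds" |
| ``` |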
| |
| --- |
| |
| ### B.2 Security Testing |
| |
| Purpose: Identify vulnerabilities and ensure data protection. |
| |
| #### **B.2.1 Authentication & Authorization Testing** |
| |
| | Test Case ID | Test Name | Security Control | Test Scenario | Expected Outcome | Priority | |
| |--------------|-----------|------------------|---------------|------------------|----------| |
| | NFT-SEC-001 | API key validation | Authentication | Request without API key | HTTP 401 Unauthorized | Critical | |
| | NFT-SEC-002 | Invalid API key | Authentication | Request with wrong key | HTTP 401 Unauthorized | Critical | |
| | NFT-SEC-003 | Expired API key | Token expiration | Request with expired key | HTTP 401 with renewal info | High | |
| | NFT-SEC-004 | API key rotation | Key management | Rotate keys, test old key | Old key rejected | Medium | |
| | NFT-SEC-005 | Role-based access | Authorization | User tries admin endpoint | HTTP 403 Forbidden | High | |
| | NFT-SEC-006 | Resource ownership | Authorization | User accesses other's job | HTTP 403 Forbidden | High | |
| | NFT-SEC-007 | JWT validation | Token security | Tampered JWT token | Signature validation fails | High | |
| | NFT-SEC-008 | Session hijacking | Session security | Stolen session token | Token invalidated after detection | Medium | |
| | NFT-SEC-009 | Brute force protection | Rate limiting | 100 failed auth attempts | Account locked, IP blocked | High | |
| | NFT-SEC-010 | Multi-factor auth | MFA | Admin login without MFA | MFA required | Low | |
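| 
| The first four checks reduce to a small decision table over the presented key. A hedged sketch of that logic (the `X-API-Key` header name and in-memory key sets are assumptions; real keys would live in a secret store): |
| 
| ```python |
| VALID_KEYS = {"test-key-1"}          # assumption: loaded from a secret store |
| REVOKED_KEYS = {"rotated-old-key"}   # rejected after rotation (NFT-SEC-004) |
| 
| 
| def authenticate(headers: dict) -> tuple[int, str]: |
|     """Map request headers to an HTTP status, mirroring NFT-SEC-001/002/004.""" |
|     key = headers.get("X-API-Key") |
|     if key is None: |
|         return 401, "Missing API key" |
|     if key in REVOKED_KEYS or key not in VALID_KEYS: |
|         return 401, "Invalid or revoked API key" |
|     return 200, "OK" |
| 
| 
| assert authenticate({}) == (401, "Missing API key")           # NFT-SEC-001 |
| assert authenticate({"X-API-Key": "wrong"})[0] == 401          # NFT-SEC-002 |
| assert authenticate({"X-API-Key": "test-key-1"})[0] == 200 |
| ``` |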
| |
| #### **B.2.2 Input Validation & Injection Testing** |
| |
| | Test Case ID | Test Name | Vulnerability | Test Scenario | Expected Outcome | Priority | |
| |--------------|-----------|---------------|---------------|------------------|----------| |
| | NFT-SEC-011 | SQL injection | Injection | Inject SQL in parameters | Parameterized queries prevent injection | Critical | |
| | NFT-SEC-012 | XSS attack | Cross-site scripting | Inject `<script>` in doc_type | Input sanitized, script not executed | High | |
| | NFT-SEC-013 | Command injection | OS command injection | Inject shell commands | Commands not executed | Critical | |
| | NFT-SEC-014 | Path traversal | Directory traversal | `../../etc/passwd` in filename | Access denied | Critical | |
| | NFT-SEC-015 | SSRF attack | Server-side request forgery | seed_image URL to internal IP | Internal IPs blocked | High | |
| | NFT-SEC-016 | XXE attack | XML external entity | Upload XML with external entity | External entities disabled | Medium | |
| | NFT-SEC-017 | LLM prompt injection | Prompt manipulation | Inject "ignore previous instructions" text | Prompt sandboxing prevents escape | High | |
| | NFT-SEC-018 | Buffer overflow | Memory safety | Send 10MB+ parameter | Request rejected, no crash | Medium | |
| | NFT-SEC-019 | Unicode attack | Unicode bypass | Unicode normalization tricks | Normalized before processing | Low | |
| | NFT-SEC-020 | Regex DoS | ReDoS | Complex regex in input | Timeout protection active | Medium | |
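| 
| NFT-SEC-015 hinges on resolving the seed-image host before fetching. A sketch of such a guard (production code would also pin the resolved IP for the actual download to defeat DNS rebinding, and re-check after redirects): |
| 
| ```python |
| import ipaddress |
| import socket |
| from urllib.parse import urlparse |
| 
| 
| def is_seed_url_allowed(url: str) -> bool: |
|     """Reject seed_image URLs that resolve to internal addresses (NFT-SEC-015).""" |
|     parsed = urlparse(url) |
|     if parsed.scheme not in ("http", "https") or parsed.hostname is None: |
|         return False |
|     try: |
|         addr = ipaddress.ip_address(socket.gethostbyname(parsed.hostname)) |
|     except (socket.gaierror, ValueError): |
|         return False |
|     # Block loopback, RFC 1918, link-local (cloud metadata), reserved ranges |
|     return not (addr.is_private or addr.is_loopback |
|                 or addr.is_link_local or addr.is_reserved) |
| 
| 
| assert not is_seed_url_allowed("http://127.0.0.1/admin") |
| assert not is_seed_url_allowed("http://169.254.169.254/latest/meta-data/") |
| assert not is_seed_url_allowed("file:///etc/passwd") |
| ``` |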
| |
| #### **B.2.3 Data Protection Testing** |
| |
| | Test Case ID | Test Name | Protection Mechanism | Test Scenario | Expected Outcome | Priority | |
| |--------------|-----------|---------------------|---------------|------------------|----------| |
| | NFT-SEC-021 | Data encryption at rest | Storage encryption | Check stored files | Files encrypted on disk | High | |
| | NFT-SEC-022 | Data encryption in transit | TLS/HTTPS | Inspect network traffic | All traffic over HTTPS | Critical | |
| | NFT-SEC-023 | API key exposure | Secret management | Check logs, errors | API keys never logged | Critical | |
| | NFT-SEC-024 | PII handling | Data privacy | Generate docs with PII | PII not stored beyond retention | High | |
| | NFT-SEC-025 | Data sanitization | Data cleanup | Delete job after 7 days | All data removed | High | |
| | NFT-SEC-026 | Backup encryption | Backup security | Check backup files | Backups encrypted | Medium | |
| | NFT-SEC-027 | Secure headers | HTTP headers | Check response headers | Security headers present | High | |
| | NFT-SEC-028 | CORS policy | Cross-origin security | Request from unauthorized origin | CORS policy blocks request | High | |
| | NFT-SEC-029 | Cookie security | Cookie flags | Check cookie attributes | HttpOnly, Secure, SameSite set | Medium | |
| | NFT-SEC-030 | Sensitive data in URLs | URL security | Check for secrets in URLs | No sensitive data in query params | High | |
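| 
| NFT-SEC-023 is simplest to enforce with a logging filter that masks key-shaped values before any handler sees them; a sketch (the regex covers only `api_key=` / `api-key:` shapes and would need extending for other secret formats): |
| 
| ```python |
| import logging |
| import re |
| 
| SECRET_PATTERN = re.compile(r"(api[_-]?key\s*[=:]\s*)(\S+)", re.IGNORECASE) |
| 
| 
| class RedactingFilter(logging.Filter): |
|     """Mask API-key-shaped values before a record reaches any handler.""" |
| 
|     def filter(self, record: logging.LogRecord) -> bool: |
|         record.msg = SECRET_PATTERN.sub(r"\1[REDACTED]", str(record.msg)) |
|         return True |
| 
| 
| logging.getLogger("docgenie").addFilter(RedactingFilter()) |
| 
| redacted = SECRET_PATTERN.sub(r"\1[REDACTED]", "retrying with api_key=sk-abc123") |
| assert redacted == "retrying with api_key=[REDACTED]" |
| ``` |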
| |
| #### **B.2.4 Dependency & Supply Chain Security** |
| |
| | Test Case ID | Test Name | Security Aspect | Test Method | Expected Outcome | Priority | |
| |--------------|-----------|-----------------|-------------|------------------|----------| |
| | NFT-SEC-031 | Vulnerable dependencies | CVE scanning | Run `pip-audit` | No high/critical vulnerabilities | High | |
| | NFT-SEC-032 | Outdated packages | Package versions | Check `requirements.txt` | All packages recent (<6 months) | Medium | |
| | NFT-SEC-033 | Malicious packages | Supply chain | Verify package checksums | Checksums match official registry | High | |
| | NFT-SEC-034 | License compliance | Legal compliance | Check package licenses | All licenses compatible | Low | |
| | NFT-SEC-035 | Container security | Docker image | Scan with Trivy | No critical image vulnerabilities | High | |
| |
| **Security Testing Tools**: |
| - **OWASP ZAP**: Automated security scanning |
| - **Burp Suite**: Manual penetration testing |
| - **pip-audit**: Python dependency vulnerability scanning |
| - **Trivy**: Container image scanning |
| - **Bandit**: Python code security linter |
| |
| --- |
| |
| ### B.3 Reliability Testing |
| |
| Purpose: Verify system stability and fault tolerance. |
| |
| #### **B.3.1 Fault Tolerance Testing** |
| |
| | Test Case ID | Test Name | Fault Condition | Test Scenario | Expected Outcome | Priority | |
| |--------------|-----------|-----------------|---------------|------------------|----------| |
| | NFT-REL-001 | Database failover | Primary DB failure | Kill primary DB instance | Failover to standby, no downtime | Critical | |
| | NFT-REL-002 | Worker crash recovery | Worker process crash | Kill worker mid-job | Job reassigned, completes | High | |
| | NFT-REL-003 | Network partition | Network split | Simulate network partition | System detects, retries | High | |
| | NFT-REL-004 | External API failure | Claude API down | LLM service unavailable | Graceful error, retry queue | Critical | |
| | NFT-REL-005 | Handwriting service failure | RunPod timeout | Service exceeds timeout | Exception raised, clear error | Critical | |
| | NFT-REL-006 | Disk full | No storage space | Fill disk to 100% | Rejects new jobs, alerts sent | High | |
| | NFT-REL-007 | Redis failure | Queue unavailable | Redis server down | Async jobs fail with clear error | High | |
| | NFT-REL-008 | Load balancer failure | LB goes down | Kill load balancer | Requests reach servers via backup | Medium | |
| | NFT-REL-009 | DNS resolution failure | DNS timeout | DNS server unreachable | Falls back to IP or cached DNS | Low | |
| | NFT-REL-010 | Partial service degradation | Some features down | VE prefabs missing | Skips VE, completes other features | Medium | |
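| 
| The retry behavior expected by NFT-REL-004 can be verified against a deliberately flaky callable; a sketch with exponential backoff (delays shortened here for test speed): |
| 
| ```python |
| import time |
| 
| 
| def call_with_retry(fn, attempts: int = 3, base_delay: float = 0.01): |
|     """Retry a flaky external call with exponential backoff (NFT-REL-004).""" |
|     for attempt in range(attempts): |
|         try: |
|             return fn() |
|         except ConnectionError: |
|             if attempt == attempts - 1: |
|                 raise  # out of retries: surface the error to the caller |
|             time.sleep(base_delay * 2 ** attempt) |
| 
| 
| calls = {"n": 0} |
| 
| 
| def flaky_llm_call(): |
|     calls["n"] += 1 |
|     if calls["n"] < 3: |
|         raise ConnectionError("Claude API unavailable") |
|     return "<html>...</html>" |
| 
| 
| assert call_with_retry(flaky_llm_call) == "<html>...</html>" |
| assert calls["n"] == 3  # failed twice, succeeded on the third attempt |
| ``` |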
| |
| #### **B.3.2 Data Integrity Testing** |
| |
| | Test Case ID | Test Name | Integrity Check | Test Scenario | Expected Outcome | Priority | |
| |--------------|-----------|-----------------|---------------|------------------|----------| |
| | NFT-REL-011 | Transaction atomicity | Database transactions | Simulate crash mid-transaction | Either all or no changes applied | High | |
| | NFT-REL-012 | Data corruption detection | Checksum validation | Corrupt file on disk | Corruption detected, file rejected | High | |
| | NFT-REL-013 | Concurrent write safety | Race conditions | Multiple writes to same resource | Locking or last-write-wins, no corruption | High | |
| | NFT-REL-014 | Duplicate prevention | Idempotency | Submit same request twice | Duplicate detected, not processed | Medium | |
| | NFT-REL-015 | Backup restoration | Backup recovery | Restore from backup | Data fully restored, consistent | High | |
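| 
| NFT-REL-014's duplicate detection can be keyed on a hash of the canonicalized payload; a sketch (a real deployment would persist seen keys in Redis with a TTL rather than in process memory): |
| 
| ```python |
| import hashlib |
| import json |
| 
| _seen_requests: set[str] = set() |
| 
| 
| def submit_job(payload: dict) -> str: |
|     """Return 'duplicate' for a payload already submitted (NFT-REL-014).""" |
|     # Sorted keys make the hash insensitive to JSON field ordering |
|     key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest() |
|     if key in _seen_requests: |
|         return "duplicate" |
|     _seen_requests.add(key) |
|     return "accepted" |
| 
| 
| assert submit_job({"doc_type": "invoice", "num_solutions": 1}) == "accepted" |
| # Same request with fields reordered is still recognized as a duplicate |
| assert submit_job({"num_solutions": 1, "doc_type": "invoice"}) == "duplicate" |
| ``` |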
| |
| #### **B.3.3 Recovery Testing** |
| |
| | Test Case ID | Test Name | Recovery Scenario | Test Procedure | Expected Outcome | Priority | |
| |--------------|-----------|-------------------|----------------|------------------|----------| |
| | NFT-REL-016 | Crash recovery | Server crash | Kill server, restart | Server recovers, in-flight jobs resume | Critical | |
| | NFT-REL-017 | Database restore | DB corruption | Restore from backup | System operational with latest data | High | |
| | NFT-REL-018 | Disaster recovery | Complete site failure | Failover to DR site | Service restored within RTO (4 hours) | Critical | |
| | NFT-REL-019 | Job queue recovery | Redis crash | Redis restart with persistence | Queued jobs not lost | High | |
| | NFT-REL-020 | Config recovery | Bad config deployment | Deploy bad config | Rollback to previous config | Medium | |
| |
| --- |
| |
| ### B.4 Scalability Testing |
| |
| Purpose: Verify system can handle growth in load and data. |
| |
| #### **B.4.1 Capacity Testing** |
| |
| | Test Case ID | Test Name | Capacity Metric | Test Scenario | Target Capacity | Priority | |
| |--------------|-----------|-----------------|---------------|-----------------|----------| |
| | NFT-SCAL-001 | Max concurrent users | User capacity | Gradually increase users | Support 100+ concurrent users | High | |
| | NFT-SCAL-002 | Max documents per hour | Throughput | Generate continuously | Process 500+ docs/hour | High | |
| | NFT-SCAL-003 | Max queue depth | Job queue | Enqueue 10,000 jobs | Queue handles all jobs | Medium | |
| | NFT-SCAL-004 | Max dataset size | Storage | Generate large dataset | Handle 1TB+ datasets | Low | |
| | NFT-SCAL-005 | Max file size | Upload limit | Upload large seed image | Accept up to 10MB images | Medium | |
| |
| #### **B.4.2 Elasticity Testing** |
| |
| | Test Case ID | Test Name | Scaling Behavior | Test Scenario | Expected Outcome | Priority | |
| |--------------|-----------|------------------|---------------|------------------|----------| |
| | NFT-SCAL-006 | Scale-up | Add resources | Increase from 2→10 workers | Linear throughput increase | High | |
| | NFT-SCAL-007 | Scale-down | Remove resources | Decrease from 10→2 workers | Graceful job completion | High | |
| | NFT-SCAL-008 | Auto-scale up | Load increase | Load triggers scale-up | New instances launched | Medium | |
| | NFT-SCAL-009 | Auto-scale down | Load decrease | Low load triggers scale-down | Excess instances terminated | Medium | |
| | NFT-SCAL-010 | Burst scaling | Sudden spike | 0→100 requests instantly | Scale-up handles burst | High | |
| |
| --- |
| |
| ### B.5 Usability Testing |
| |
| Purpose: Verify API ease of use and developer experience. |
| |
| #### **B.5.1 API Documentation Testing** |
| |
| | Test Case ID | Test Name | Documentation Aspect | Test Scenario | Expected Outcome | Priority | |
| |--------------|-----------|---------------------|---------------|------------------|----------| |
| | NFT-USAB-001 | API docs completeness | All endpoints documented | Review /docs | All endpoints, params documented | High | |
| | NFT-USAB-002 | Example accuracy | Code examples | Test all code examples | Examples work without modification | High | |
| | NFT-USAB-003 | Error messages clarity | Error documentation | Check error responses | Errors have clear messages, codes | High | |
| | NFT-USAB-004 | OpenAPI spec validity | Swagger/OpenAPI | Validate spec | Spec passes OpenAPI validation | Medium | |
| | NFT-USAB-005 | Interactive docs | Try-it-out feature | Use /docs to test | Can test endpoints in browser | Medium | |
| |
| #### **B.5.2 Developer Experience Testing** |
| |
| | Test Case ID | Test Name | DX Aspect | Test Scenario | Expected Outcome | Priority | |
| |--------------|-----------|-----------|---------------|------------------|----------| |
| | NFT-USAB-006 | SDK availability | Client libraries | Check for Python/JS SDKs | SDKs available, documented | Low | |
| | NFT-USAB-007 | Quick start guide | Getting started | Follow quick start | Working request in <10 minutes | High | |
| | NFT-USAB-008 | API versioning | Version management | Check version headers | Versions clearly indicated | Medium | |
| | NFT-USAB-009 | Changelog maintenance | Release notes | Review changelog | All changes documented | Low | |
| | NFT-USAB-010 | Deprecation notices | Breaking changes | Check deprecated features | Clear deprecation warnings | Medium | |
| |
| --- |
| |
| ### B.6 Compatibility Testing |
| |
| Purpose: Verify system works across different environments. |
| |
| #### **B.6.1 Browser Compatibility** (for API docs) |
| |
| | Test Case ID | Browser | Version | Expected Outcome | |
| |--------------|---------|---------|------------------| |
| | NFT-COMPAT-001 | Chrome | Latest | /docs fully functional | |
| | NFT-COMPAT-002 | Firefox | Latest | /docs fully functional | |
| | NFT-COMPAT-003 | Safari | Latest | /docs fully functional | |
| | NFT-COMPAT-004 | Edge | Latest | /docs fully functional | |
| |
| #### **B.6.2 Platform Compatibility** |
| |
| | Test Case ID | Platform | Test Scenario | Expected Outcome | Priority | |
| |--------------|----------|---------------|------------------|----------| |
| | NFT-COMPAT-005 | Docker | Deploy in container | Runs without issues | Critical | |
| | NFT-COMPAT-006 | Railway | Deploy to Railway | Successful deployment | High | |
| | NFT-COMPAT-007 | AWS | Deploy to ECS/Lambda | Runs on AWS | Medium | |
| | NFT-COMPAT-008 | GCP | Deploy to Cloud Run | Runs on GCP | Low | |
| | NFT-COMPAT-009 | Azure | Deploy to App Service | Runs on Azure | Low | |
| |
| #### **B.6.3 Python Version Compatibility** |
| |
| | Test Case ID | Python Version | Test Scenario | Expected Outcome | Priority | |
| |--------------|----------------|---------------|------------------|----------| |
| | NFT-COMPAT-010 | Python 3.11 | Run full test suite | All tests pass | Critical | |
| | NFT-COMPAT-011 | Python 3.10 | Run full test suite | All tests pass | High | |
| | NFT-COMPAT-012 | Python 3.12 | Run full test suite | All tests pass | Medium | |
| |
| --- |
| |
| ### B.7 Maintainability Testing |
| |
| Purpose: Verify system is easy to maintain and debug. |
| |
| #### **B.7.1 Logging & Monitoring** |
| |
| | Test Case ID | Test Name | Aspect | Test Scenario | Expected Outcome | Priority | |
| |--------------|-----------|--------|---------------|------------------|----------| |
| | NFT-MAINT-001 | Log completeness | Logging | Check logs during generation | All stages logged | High | |
| | NFT-MAINT-002 | Log levels | Log filtering | Filter by ERROR, INFO, DEBUG | Correct levels used | Medium | |
| | NFT-MAINT-003 | Structured logging | Log format | Parse log entries | JSON-formatted, parseable | High | |
| | NFT-MAINT-004 | Error traceability | Error tracking | Trace error through logs | Request ID tracks full flow | High | |
| | NFT-MAINT-005 | Metrics collection | Monitoring | Check Prometheus metrics | Key metrics exported | High | |
| | NFT-MAINT-006 | Health checks | Monitoring | Call /health endpoint | Returns detailed status | Critical | |
| | NFT-MAINT-007 | Alert configuration | Alerting | Trigger alert condition | Alert fired, notification sent | Medium | |
| | NFT-MAINT-008 | Dashboard usability | Visualization | View Grafana dashboards | Clear, actionable insights | Medium | |
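| 
| NFT-MAINT-003 and NFT-MAINT-004 together imply one JSON object per log line carrying a propagated request ID; a sketch of such a formatter (attaching `request_id` via `extra=` is an assumption about how the API wires it up): |
| 
| ```python |
| import json |
| import logging |
| 
| 
| class JsonFormatter(logging.Formatter): |
|     """Emit one JSON object per log line so ELK can parse it (NFT-MAINT-003).""" |
| 
|     def format(self, record: logging.LogRecord) -> str: |
|         return json.dumps({ |
|             "level": record.levelname, |
|             "logger": record.name, |
|             "message": record.getMessage(), |
|             # assumption: request_id is attached via `extra=` for traceability |
|             "request_id": getattr(record, "request_id", None), |
|         }) |
| 
| 
| record = logging.LogRecord("docgenie", logging.INFO, __file__, 1, |
|                            "stage 07 complete", None, None) |
| line = JsonFormatter().format(record) |
| assert json.loads(line)["message"] == "stage 07 complete" |
| ``` |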
| |
| #### **B.7.2 Code Quality** |
| |
| | Test Case ID | Test Name | Quality Metric | Tool | Acceptance Criteria | Priority | |
| |--------------|-----------|----------------|------|---------------------|----------| |
| | NFT-MAINT-009 | Code coverage | Test coverage | pytest-cov | >80% coverage | High | |
| | NFT-MAINT-010 | Code complexity | Cyclomatic complexity | radon | CC <10 per function | Medium | |
| | NFT-MAINT-011 | Code duplication | DRY principle | pylint | <5% duplicated code | Low | |
| | NFT-MAINT-012 | Code style | PEP 8 compliance | flake8 | No style violations | Medium | |
| | NFT-MAINT-013 | Type hints | Type coverage | mypy | >90% type hints | Medium | |
| | NFT-MAINT-014 | Security linting | Vulnerability scan | bandit | No high-severity issues | High | |
| |
| --- |
| |
| ## Test Environment Setup |
| |
| ### Test Environments |
| |
| | Environment | Purpose | Configuration | Access | |
| |-------------|---------|---------------|--------| |
| | **Local Dev** | Development testing | Local Docker Compose | Developers | |
| | **CI/CD** | Automated testing | GitHub Actions runners | Automated | |
| | **Staging** | Pre-production testing | Mirrors production | QA team | |
| | **Production** | Live system | Full infrastructure | Ops team | |
| |
| ### Test Data Management |
| |
| **Seed Image Dataset**: |
| - **Source**: Curated test set of 50 diverse seed images |
| - **Location**: `tests/fixtures/seed_images/` |
| - **Categories**: Invoice samples, receipt samples, form samples, letter samples |
| - **Licensing**: Public domain or test-licensed images |
| |
| **Test Parameters**: |
| ```yaml |
| # tests/fixtures/test_params.yaml |
| test_cases: |
| minimal: |
| language: "english" |
| doc_type: "invoice" |
| num_solutions: 1 |
| enable_handwriting: false |
| enable_visual_elements: false |
| |
| full_features: |
| language: "english" |
| doc_type: "medical_form" |
| num_solutions: 2 |
| enable_handwriting: true |
| handwriting_ratio: 0.3 |
| enable_visual_elements: true |
| visual_element_types: ["logo", "signature", "barcode"] |
| enable_ocr: true |
| enable_dataset_export: true |
| ``` |
| |
| **Mock Services**: |
| - **Mock Claude API**: Returns predefined HTML responses for testing |
| - **Mock RunPod API**: Returns test handwriting images, simulates delays |
| - **Mock Supabase**: In-memory database for testing |
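| 
| With dependency injection, the mock Claude client can be as small as a `MagicMock` with a canned response; a sketch (the `complete` method name and the injected-client shape are assumptions about the real wrapper): |
| 
| ```python |
| from unittest.mock import MagicMock |
| 
| 
| def render_solution(llm_client, prompt: str) -> str: |
|     """Pipeline step that asks the LLM for document HTML (client is injected).""" |
|     return llm_client.complete(prompt) |
| 
| 
| # The mock stands in for the real Claude client in unit/integration tests |
| mock_claude = MagicMock() |
| mock_claude.complete.return_value = "<html><body>Invoice #001</body></html>" |
| 
| html = render_solution(mock_claude, "Generate an English invoice") |
| assert html.startswith("<html>") |
| mock_claude.complete.assert_called_once_with("Generate an English invoice") |
| ``` |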
| |
| --- |
| |
| ## Testing Tools & Frameworks |
| |
| ### Test Frameworks |
| |
| | Tool | Purpose | Usage | |
| |------|---------|-------| |
| | **pytest** | Unit & integration testing | `pytest tests/` | |
| | **pytest-asyncio** | Async test support | Async function testing | |
| | **pytest-cov** | Code coverage | `pytest --cov=api` | |
| | **httpx** | HTTP client testing | In-process API test requests | |
| | **respx** | HTTP mock library | Mock external APIs | |
| | **pytest-mock** | Mocking framework | Mock functions, classes | |
| | **Faker** | Test data generation | Generate realistic data | |
| |
| ### Load Testing Tools |
| |
| | Tool | Purpose | Usage | |
| |------|---------|-------| |
| | **Locust** | Load & stress testing | `locust -f locustfile.py` | |
| | **Apache JMeter** | Performance testing | GUI-based test scenarios | |
| | **k6** | Cloud-native load testing | Scripted load tests | |
| |
| ### Security Testing Tools |
| |
| | Tool | Purpose | Usage | |
| |------|---------|-------| |
| | **OWASP ZAP** | Security scanning | Automated vulnerability scan | |
| | **Burp Suite** | Penetration testing | Manual security testing | |
| | **pip-audit** | Dependency scanning | `pip-audit -r requirements.txt` | |
| | **Bandit** | Code security linting | `bandit -r api/` | |
| | **Trivy** | Container scanning | `trivy image docgenie-api:latest` | |
| |
| ### Monitoring & Observability |
| |
| | Tool | Purpose | Usage | |
| |------|---------|-------| |
| | **Prometheus** | Metrics collection | Scrape /metrics endpoint | |
| | **Grafana** | Metrics visualization | Dashboard creation | |
| | **ELK Stack** | Log aggregation | Centralized logging | |
| | **Sentry** | Error tracking | Automatic error reporting | |
| |
| --- |
| |
| ## Test Execution Plan |
| |
| ### Phase 1: Unit Testing (Week 1-2) |
| **Objective**: Achieve 80%+ code coverage |
| |
| **Tasks**: |
| 1. Write unit tests for all utility functions (`api/utils.py`) |
| 2. Test all pipeline stages individually (Stages 01-19) |
| 3. Mock external dependencies (Claude API, RunPod, Supabase) |
| 4. Achieve minimum 80% code coverage |
| 5. Set up CI/CD pipeline for automated testing |
| |
| **Deliverables**: |
| - 120+ unit test cases passing |
| - Coverage report >80% |
| - CI/CD pipeline configured |
| |
| ### Phase 2: Integration Testing (Week 3) |
| **Objective**: Verify component interactions |
| |
| **Tasks**: |
| 1. Test pipeline stage integrations (01-03, 03-05, 07-09, etc.) |
| 2. Test external service integrations (Claude, RunPod, Supabase) |
| 3. Test database operations (CRUD, transactions) |
| 4. Test API endpoint workflows |
| 5. Test background worker integration |
| |
| **Deliverables**: |
| - 50+ integration test cases passing |
| - All critical workflows tested |
| - Service mocks validated |
| |
| ### Phase 3: System Testing (Week 4) |
| **Objective**: End-to-end workflow validation |
| |
| **Tasks**: |
| 1. Test complete generation workflows (minimal, full features) |
| 2. Test error handling scenarios |
| 3. Test async processing workflows |
| 4. Test data quality and accuracy |
| 5. Test performance benchmarks |
| |
| **Deliverables**: |
| - 50+ system test cases passing |
| - All user journeys tested |
| - Performance baselines established |
| |
| ### Phase 4: Non-Functional Testing (Week 5-6) |
| **Objective**: Verify performance, security, reliability |
| |
| **Tasks**: |
| 1. **Performance**: Load, stress, endurance, scalability tests |
| 2. **Security**: Penetration testing, vulnerability scanning |
| 3. **Reliability**: Fault tolerance, recovery testing |
| 4. **Usability**: Documentation review, DX testing |
| |
| **Deliverables**: |
| - Load test report (normal, peak, sustained) |
| - Security audit report |
| - Reliability test report |
| - Performance benchmarks |
| |
| ### Phase 5: Regression Testing (Ongoing) |
| **Objective**: Prevent defect reintroduction |
| |
| **Tasks**: |
| 1. Run full test suite on every commit (CI/CD) |
| 2. Add tests for every bug fix |
| 3. Update tests for new features |
| 4. Maintain >80% code coverage |
| |
| **Frequency**: Continuous (automated on every PR/commit) |
| |
| --- |
| |
| ## Success Criteria & Metrics |
| |
| ### Test Completion Criteria |
| |
| | Criteria | Target | Critical | |
| |----------|--------|----------| |
| | Unit test coverage | >80% | Yes | |
| | Integration tests passing | 100% | Yes | |
| | System tests passing | 100% | Yes | |
| | Load test: Normal load | 0% errors | Yes | |
| | Load test: Peak load | <5% errors | Yes | |
| | Security: Critical vulnerabilities | 0 | Yes | |
| | Security: High vulnerabilities | <5 | Yes | |
| | Performance: Basic generation | <60s | Yes | |
| | Performance: Handwriting generation | <300s | Yes | |
| | Uptime SLA | >99.5% | No | |
| |
| ### Quality Metrics |
| |
| **Code Quality**: |
| - Code coverage: >80% |
| - Cyclomatic complexity: <10 |
| - Code duplication: <5% |
| - Type hint coverage: >90% |
| |
| **Performance**: |
| - API response time (P95): <500ms |
| - Document generation (minimal): <60s |
| - Document generation (with handwriting): <300s |
| - Throughput: >500 docs/hour |
| |
| **Reliability**: |
| - Uptime: >99.5% |
| - MTBF (Mean Time Between Failures): >720 hours (30 days) |
| - MTTR (Mean Time To Recover): <30 minutes |
| - Error rate: <1% |
| |
| **Security**: |
| - Zero critical vulnerabilities |
| - <5 high-severity vulnerabilities |
| - Dependency update cadence: <30 days behind |
| |
| --- |
| |
| ## Risk Assessment |
| |
| ### High-Risk Areas |
| |
| | Component | Risk Level | Mitigation Strategy | Priority | |
| |-----------|------------|---------------------|----------| |
| | Claude API integration | **HIGH** | Retry logic, fallback prompts, rate limiting | Critical | |
| | RunPod handwriting service | **HIGH** | Timeout handling, batch optimization, error raising | Critical | |
| | PDF rendering (Playwright) | **MEDIUM** | Headless browser stability, resource limits | High | |
| | OCR accuracy | **MEDIUM** | Multiple OCR engine options, confidence thresholds | High | |
| | Async job processing | **MEDIUM** | Worker health checks, job retry mechanisms | High | |
| | Database transactions | **MEDIUM** | ACID compliance, connection pooling | High | |
| | File storage | **LOW** | Disk space monitoring, cleanup policies | Medium | |
| |
| ### Test Risk Mitigation |
| |
| | Risk | Impact | Probability | Mitigation | |
| |------|--------|-------------|------------| |
| | External API unavailable during tests | High | Medium | Use mocks, record/replay mode | |
| | Test data corruption | Medium | Low | Version control test fixtures | |
| | Test environment instability | High | Medium | Docker isolation, reproducible builds | |
| | Long test execution time | Low | High | Parallel execution, selective testing | |
| | Flaky tests | Medium | Medium | Retry logic, better assertions | |
| |
| --- |
| |
| ## Test Reporting |
| |
| ### Test Reports |
| |
| **Daily Reports** (Automated): |
| - Test execution summary (pass/fail counts) |
| - Code coverage trends |
| - Failed test details |
| - Performance benchmark comparison |
| |
| **Weekly Reports** (Manual): |
| - Test progress against plan |
| - New defects discovered |
| - Defect resolution rate |
| - Risk updates |
| |
| **Release Reports** (Per Release): |
| - Complete test execution summary |
| - All test case results |
| - Performance test results |
| - Security scan results |
| - Known issues and limitations |
| |
| ### Defect Tracking |
| |
| **Defect Workflow**: |
| 1. **Report**: Tester creates defect in issue tracker |
| 2. **Triage**: Team prioritizes defect (P0-Critical, P1-High, P2-Medium, P3-Low) |
| 3. **Assign**: Developer assigned to fix |
| 4. **Fix**: Developer implements fix |
| 5. **Verify**: Tester verifies fix |
| 6. **Close**: Defect closed, regression test added |
| |
| **Defect Metrics**: |
| - Defect discovery rate |
| - Defect resolution rate |
| - Defect escape rate (to production) |
| - Mean time to resolve (MTTR) |
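The four metrics above can be computed from defect records roughly as follows. The record fields (`opened`, `resolved`, `found_in`) and the per-day rate units are assumptions for illustration; the actual schema lives in the issue tracker.

```python
from datetime import datetime

def defect_metrics(defects, period_days=7):
    """Compute the four defect-tracking metrics over a reporting period.

    Each defect is a dict with `opened`/`resolved` datetimes (`resolved`
    is None while open) and `found_in` ("test" or "production") --
    an assumed schema.
    """
    resolved = [d for d in defects if d["resolved"] is not None]
    escaped = [d for d in defects if d["found_in"] == "production"]
    mttr_hours = (
        sum((d["resolved"] - d["opened"]).total_seconds() for d in resolved)
        / 3600 / len(resolved)
    ) if resolved else 0.0
    return {
        "discovery_rate": len(defects) / period_days,    # defects found per day
        "resolution_rate": len(resolved) / period_days,  # defects resolved per day
        "escape_rate": len(escaped) / len(defects) if defects else 0.0,
        "mttr_hours": round(mttr_hours, 2),
    }

metrics = defect_metrics([
    {"opened": datetime(2026, 3, 1, 9), "resolved": datetime(2026, 3, 2, 9),
     "found_in": "test"},
    {"opened": datetime(2026, 3, 3, 9), "resolved": None,
     "found_in": "production"},
])
# metrics["mttr_hours"] == 24.0, metrics["escape_rate"] == 0.5
```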
| |
| --- |
| |
| ## Continuous Improvement |
| |
| ### Test Optimization |
| |
| **Quarterly Reviews**: |
| - Review test coverage (identify gaps) |
| - Remove obsolete tests |
| - Update test data |
| - Optimize test execution time |
| - Review test environment stability |
| |
| **Automation Goals**: |
| - Automate 100% of unit tests |
| - Automate 90% of integration tests |
| - Automate 70% of system tests |
| - Automate 50% of non-functional tests |
| |
| --- |
| |
| ## Appendix |
| |
| ### Test Case Template |
| |
| ```markdown |
| ## Test Case ID: [ID] |
| |
| **Test Name**: [Descriptive name] |
| |
| **Component**: [Module/Component under test] |
| |
| **Test Type**: [Unit/Integration/System/Non-Functional] |
| |
| **Priority**: [Critical/High/Medium/Low] |
| |
| **Prerequisites**: |
| - [List any setup required] |
| |
| **Test Steps**: |
| 1. [Step 1] |
| 2. [Step 2] |
| 3. [Step 3] |
| |
| **Test Data**: |
| - [Input data required] |
| |
| **Expected Result**: |
| - [What should happen] |
| |
| **Actual Result**: |
| - [What actually happened - filled during execution] |
| |
| **Status**: [Pass/Fail/Blocked/Not Run] |
| |
| **Notes**: |
| - [Any additional observations] |
| ``` |
| |
| ### Glossary |
| |
| - **API**: Application Programming Interface |
| - **CI/CD**: Continuous Integration/Continuous Deployment |
| - **DPI**: Dots Per Inch |
| - **GT**: Ground Truth |
| - **HW**: Handwriting |
| - **KIE**: Key Information Extraction |
| - **LLM**: Large Language Model |
| - **MTBF**: Mean Time Between Failures |
- **MTTR**: Mean Time To Resolve (for defects) / Mean Time To Recover (for service availability)
| - **OCR**: Optical Character Recognition |
| - **P95**: 95th Percentile |
| - **SLA**: Service Level Agreement |
| - **VE**: Visual Element |
| |
| --- |
| |
| **Document Control**: |
| - **Author**: DocGenie QA Team |
| - **Reviewers**: Development Team, Product Manager |
| - **Approval**: Project Lead |
| - **Next Review Date**: [3 months from approval] |
| |
| --- |
| |
| **END OF DOCUMENT** |
| |