# Comprehensive Testing Plan & Test Cases

## DocGenie Synthetic Document Generation API

**Document Version**: 1.0

**Date**: March 4, 2026

**Project**: DocGenie - AI-Powered Synthetic Document Dataset Generator

---

## Table of Contents

1. [Testing Overview](#testing-overview)
2. [Functional Testing](#functional-testing)
   - [Unit Testing](#unit-testing)
   - [Integration Testing](#integration-testing)
   - [System Testing](#system-testing)
3. [Non-Functional Testing](#non-functional-testing)
   - [Performance Testing](#performance-testing)
   - [Security Testing](#security-testing)
   - [Reliability Testing](#reliability-testing)
   - [Scalability Testing](#scalability-testing)
   - [Usability Testing](#usability-testing)
4. [Test Environment Setup](#test-environment-setup)
5. [Testing Tools & Frameworks](#testing-tools--frameworks)
6. [Test Execution Plan](#test-execution-plan)
7. [Success Criteria & Metrics](#success-criteria--metrics)
8. [Risk Assessment](#risk-assessment)

---

## Testing Overview

### Purpose

This document outlines the testing strategy for the DocGenie API. Its goal is to ensure the quality, reliability, and performance of the synthetic document generation system across all 19 pipeline stages.

### Scope

- API endpoint testing (`/generate`, `/generate/pdf`, `/generate/async`)
- 19-stage pipeline validation
- External service integrations (Claude API, RunPod handwriting service)
- Database operations (Supabase)
- Background job processing (Redis Queue)
- Error handling and recovery mechanisms

### Testing Approach

- **Test-Driven Development (TDD)**: Write tests before implementation where applicable
- **Continuous Integration**: Run the automated test suite on every commit
- **Coverage Target**: Minimum 80% code coverage for critical paths
- **Risk-Based Testing**: Prioritize high-risk components (LLM integration, handwriting service)

---

## Functional Testing

### A.1 Unit Testing

Unit tests verify individual functions and methods in isolation. Target: 85% code coverage.
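Retry tests such as UT-SEED-004 and UT-SEED-009 depend on real delays, which makes them slow and hard to assert on. One way to keep them isolated is to make the delay function injectable so a test can record the backoff delays instead of sleeping through them. The sketch below is a hypothetical helper, not DocGenie's implementation: `retry_with_backoff`, `TransientError`, and the `sleep` parameter are illustrative names. It assumes that "retry 3 times" in UT-SEED-003 means three retries after the initial attempt, which produces the 2s, 4s, 8s delays listed in UT-SEED-009.

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as HTTP 502/503."""


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await `operation`, retrying on TransientError with exponential backoff.

    Waits base_delay * 2**n before retry n+1, so the defaults give delays of
    2s, 4s and 8s, and at most 1 + max_retries calls in total.
    """
    retries = 0
    while True:
        try:
            return await operation()
        except TransientError:
            if retries >= max_retries:
                raise  # retries exhausted: propagate the last error
            await sleep(base_delay * 2 ** retries)
            retries += 1


async def _demo() -> None:
    recorded: List[float] = []

    async def fake_sleep(seconds: float) -> None:
        recorded.append(seconds)  # record the delay instead of waiting

    async def always_503() -> bytes:
        raise TransientError("503 Service Unavailable")

    try:
        await retry_with_backoff(always_503, sleep=fake_sleep)
    except TransientError:
        print("gave up after delays:", recorded)


if __name__ == "__main__":
    asyncio.run(_demo())
```

Running the module prints `gave up after delays: [2.0, 4.0, 8.0]`. Injecting `sleep` instead of patching `asyncio.sleep` globally keeps the test independent of how the production code imports its sleep function.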
#### **A.1.1 Seed Image Processing (Stage 01)**

**Module**: `api/utils.py::download_seed_images()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-SEED-001 | Download valid image URL | Valid HTTPS URL (JPEG) | Base64-encoded image string | High |
| UT-SEED-002 | Download PNG format | Valid PNG URL | Base64-encoded PNG | High |
| UT-SEED-003 | Handle 503 Service Unavailable | URL returning 503 | Retry up to 3 times, eventual success | Critical |
| UT-SEED-004 | Handle 502 Bad Gateway | URL returning 502 | Retry with exponential backoff | High |
| UT-SEED-005 | Handle 404 Not Found | URL returning 404 | Raise HTTPException(400) | High |
| UT-SEED-006 | Handle connection timeout | Slow/unresponsive server | Retry, then raise exception | Medium |
| UT-SEED-007 | Validate image format | Non-image URL (HTML) | Raise validation error | Medium |
| UT-SEED-008 | Handle oversized images | >10MB image | Process or reject gracefully | Low |
| UT-SEED-009 | Test retry backoff timing | Mocked 503 responses | Delays: 2s, 4s, 8s | Medium |
| UT-SEED-010 | Test max retries exhausted | Persistent 503 errors | Raise exception after 3 attempts | High |

**Test Implementation**:

```python
# test_seed_download.py
from unittest.mock import AsyncMock, Mock, patch

import httpx
import pytest

from api.utils import download_seed_images


def _http_error(status_code: int) -> httpx.HTTPStatusError:
    # Attach a response with a real status code so retry logic that inspects
    # `exc.response.status_code` sees 503 rather than a bare Mock attribute.
    return httpx.HTTPStatusError(
        str(status_code), request=Mock(), response=Mock(status_code=status_code)
    )


@pytest.mark.asyncio
async def test_download_valid_image():
    url = "https://example.com/test.jpg"
    with patch('httpx.AsyncClient') as mock_client:
        mock_response = Mock(status_code=200)
        mock_response.content = b'\xff\xd8\xff\xe0'  # JPEG header
        # `await client.get(...)` needs an awaitable, so `get` must be an AsyncMock
        mock_client.return_value.__aenter__.return_value.get = AsyncMock(return_value=mock_response)

        result = await download_seed_images([url])

        assert len(result) == 1
        assert isinstance(result[0], str)  # base64 string


@pytest.mark.asyncio
async def test_download_503_retry():
    url = "https://example.com/test.jpg"
    # Patching asyncio.sleep skips the real backoff delays; this assumes the
    # retry logic awaits `asyncio.sleep` between attempts.
    with patch('httpx.AsyncClient') as mock_client, \
            patch('asyncio.sleep', new_callable=AsyncMock):
        # First two calls: 503, third call: success
        responses = [
            Mock(status_code=503, raise_for_status=Mock(side_effect=_http_error(503))),
            Mock(status_code=503, raise_for_status=Mock(side_effect=_http_error(503))),
            Mock(status_code=200, content=b'\xff\xd8\xff\xe0', raise_for_status=Mock()),
        ]
        mock_get = AsyncMock(side_effect=responses)
        mock_client.return_value.__aenter__.return_value.get = mock_get

        result = await download_seed_images([url])

        assert len(result) == 1
        assert mock_get.call_count == 3
```

#### **A.1.2 HTML Processing (Stage 03)**

**Module**: `api/utils.py::extract_html_documents_from_response()`

| Test Case ID | Test Name | Input | Expected Output | Priority |
|--------------|-----------|-------|-----------------|----------|
| UT-HTML-001 | Extract single HTML document | LLM response with 1 HTML document | List with 1 HTML document | High |
| UT-HTML-002 | Extract multiple HTML documents | Response with 3 HTML documents | List with 3 documents | High |
| UT-HTML-003 | Extract ground truth | HTML with `