File size: 3,706 Bytes
715cb16
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
# CLI Redaction Test Suite

This test suite provides comprehensive testing for the `cli_redact.py` script based on all the examples shown in the CLI epilog.

## Overview

The test suite includes tests for:

1. **PDF Redaction Examples**
   - Default settings (local OCR)
   - Text extraction only (no redaction)
   - Text extraction with whole page redaction
   - Redaction with allow lists
   - Limited pages with custom fuzzy matching
   - Custom deny/allow/whole page lists
   - Image redaction

2. **Tabular Anonymisation Examples**
   - CSV anonymisation with specific columns
   - Different anonymisation strategies
   - Word document anonymisation

3. **AWS Services Examples**
   - Textract and Comprehend redaction
   - Signature extraction
   - Layout extraction

4. **Duplicate Detection Examples**
   - Duplicate pages in OCR files
   - Line-level duplicate detection
   - Tabular duplicate detection

5. **Textract Batch Operations**
   - Submit documents for analysis
   - Retrieve results by job ID
   - List recent jobs

## Running the Tests

### Method 1: Run the test suite directly
```bash
cd test
python test.py
```

### Method 2: Use the convenience script
```bash
cd test
python run_tests.py
```

### Method 3: Run with unittest
```bash
cd test
python -m unittest test.test.TestCLIRedactExamples -v
```

## Test Behavior

- **File Dependencies**: Tests will be skipped if required example files are not found in the `example_data/` directory
- **AWS Tests**: AWS-related tests may fail if credentials are not configured, but this is expected
- **Temporary Output**: All tests use temporary output directories that are cleaned up automatically
- **Timeout**: Each test has a 10-minute timeout to prevent hanging

## Test Structure

The test suite uses Python's `unittest` framework with the following structure:

- `TestCLIRedactExamples`: Main test class containing all test methods
- `run_cli_redact()`: Helper function that executes the CLI script with specified parameters
- `run_all_tests()`: Main function that runs all tests and provides a summary

## Example Output

```
================================================================================
DOCUMENT REDACTION CLI TEST SUITE
================================================================================
This test suite runs through all the examples from the CLI epilog.
Tests will be skipped if required example files are not found.
AWS-related tests may fail if credentials are not configured.
================================================================================

Test setup complete. Script: /path/to/cli_redact.py
Example data directory: /path/to/example_data
Temp output directory: /tmp/test_output_xyz

=== Testing PDF redaction with default settings ===
✅ PDF redaction with default settings passed

=== Testing PDF text extraction only ===
✅ PDF text extraction only passed

...

================================================================================
TEST SUMMARY
================================================================================
Tests run: 20
Failures: 0
Errors: 0
Skipped: 2

Overall result: ✅ PASSED
================================================================================
```

## Requirements

- Python 3.6+
- All dependencies for the main CLI script
- Example data files in the `example_data/` directory (for full test coverage)
- AWS credentials (for AWS-related tests)

## Notes

- Tests are designed to be robust and will skip gracefully if files are missing
- AWS tests are marked as completed even if they fail due to missing credentials
- The test suite provides detailed output for debugging
- All temporary files are cleaned up automatically