# Dots.OCR API Testing
This directory contains scripts for testing the Dots.OCR API endpoint, ranging from a quick post-deployment check to full response validation.
## Test Scripts
### 1. `test_api_endpoint.py` - Comprehensive API Testing
The main testing script that provides full API validation capabilities.
**Features:**
- Health check validation
- Single and multiple image testing
- ROI (Region of Interest) testing
- Field extraction validation
- Response structure validation
- Performance metrics
- Detailed error reporting
**Usage:**
```bash
# Basic test with default settings
python test_api_endpoint.py
# Test with custom API URL
python test_api_endpoint.py --url https://your-api.example.com
# Test with ROI
python test_api_endpoint.py --roi '{"x1": 0.1, "y1": 0.1, "x2": 0.9, "y2": 0.9}'
# Test with specific expected fields
python test_api_endpoint.py --expected-fields document_number surname given_names
# Verbose output
python test_api_endpoint.py --verbose
# Custom timeout
python test_api_endpoint.py --timeout 60
```
**Options:**
- `--url`: API base URL (default: http://localhost:7860)
- `--timeout`: Request timeout in seconds (default: 30)
- `--roi`: ROI coordinates as JSON string
- `--expected-fields`: List of expected field names to validate
- `--verbose`: Enable verbose logging
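
The options above map naturally onto an `argparse` parser. The following is a minimal sketch of what such a parser could look like, not the script's actual code; the function name `build_parser` and the choice to decode `--roi` with `json.loads` are assumptions made for illustration.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative sketch only)."""
    parser = argparse.ArgumentParser(description="Dots.OCR API endpoint tests")
    parser.add_argument("--url", default="http://localhost:7860", help="API base URL")
    parser.add_argument("--timeout", type=int, default=30, help="Request timeout in seconds")
    # Assumption: the ROI JSON string is decoded into a dict at parse time.
    parser.add_argument("--roi", type=json.loads, default=None, help="ROI coordinates as JSON string")
    parser.add_argument("--expected-fields", nargs="+", default=None, help="Expected field names")
    parser.add_argument("--verbose", action="store_true", help="Enable verbose logging")
    return parser


# Parse an explicit argument list so the example runs without a command line.
args = build_parser().parse_args(
    ["--roi", '{"x1": 0.1, "y1": 0.1, "x2": 0.9, "y2": 0.9}',
     "--expected-fields", "document_number", "surname"]
)
print(args.url, args.timeout, args.roi["x2"], args.expected_fields)
```

With `type=json.loads`, a malformed `--roi` value is rejected by `argparse` with a usage error instead of failing later during the request.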
### 2. `quick_test.py` - Quick Validation
A simple script for quick API validation after deployment.
**Usage:**
```bash
# Test local API
python quick_test.py
# Test remote API
python quick_test.py https://your-api.example.com
```
## Test Configuration
### `test_config.json`
Configuration file for test parameters and thresholds.
**Configuration sections:**
- `api_endpoints`: Different API URLs for various environments
- `test_images`: List of test image files
- `expected_fields`: Fields that should be extracted
- `roi_test_cases`: Different ROI configurations to test
- `performance_thresholds`: Performance validation criteria
- `test_timeout`: Default timeout for requests
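
A test runner might load this file and fail early when a documented section is absent. The sketch below assumes only the section names listed above; the value types inside each section are not described here, so the code does not inspect them. The helper names `missing_sections` and `load_config` are hypothetical.

```python
import json
from pathlib import Path

# Section names come from the list above; their contents are not assumed.
REQUIRED_SECTIONS = (
    "api_endpoints",
    "test_images",
    "expected_fields",
    "roi_test_cases",
    "performance_thresholds",
    "test_timeout",
)


def missing_sections(config: dict) -> list[str]:
    """Return the documented sections that are absent from a parsed config."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


def load_config(path: str = "test_config.json") -> dict:
    """Load the config file and raise if a documented section is missing."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = missing_sections(config)
    if missing:
        raise KeyError(f"{path} is missing sections: {', '.join(missing)}")
    return config
```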
## Test Images
The following test images are used for validation:
- `tom_id_card_front.jpg` - Front of Dutch ID card
- `tom_id_card_back.jpg` - Back of Dutch ID card
## Testing Scenarios
### 1. Basic Functionality Test
```bash
python test_api_endpoint.py
```
Tests basic API functionality with default settings.
### 2. ROI Testing
```bash
python test_api_endpoint.py --roi '{"x1": 0.25, "y1": 0.25, "x2": 0.75, "y2": 0.75}'
```
Tests Region of Interest cropping functionality.
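
The ROI values in these examples appear to be fractions of the image size. Under that assumption, a normalized ROI can be validated and converted to a pixel box as sketched below. How the server actually crops is not described in this README, and `roi_to_pixel_box` is a hypothetical helper.

```python
def roi_to_pixel_box(roi: dict, width: int, height: int) -> tuple[int, int, int, int]:
    """Convert a normalized ROI (values in [0, 1]) to a (left, top, right, bottom) pixel box."""
    x1, y1, x2, y2 = (float(roi[key]) for key in ("x1", "y1", "x2", "y2"))
    # Reject boxes that are empty, inverted, or outside the image.
    if not (0.0 <= x1 < x2 <= 1.0 and 0.0 <= y1 < y2 <= 1.0):
        raise ValueError(f"invalid ROI: {roi}")
    return (round(x1 * width), round(y1 * height), round(x2 * width), round(y2 * height))


# The ROI from the command above, applied to a 1000x600 image.
print(roi_to_pixel_box({"x1": 0.25, "y1": 0.25, "x2": 0.75, "y2": 0.75}, 1000, 600))
# → (250, 150, 750, 450)
```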
### 3. Field Validation Test
```bash
python test_api_endpoint.py --expected-fields document_number surname given_names nationality
```
Tests that specific fields are extracted correctly.
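
Conceptually, field validation compares the expected names against what a page returned. The sketch below assumes a hypothetical response shape in which each field name maps to a dict with `value` and `confidence` keys; the real response structure may differ, and `check_expected_fields` is not part of the test scripts.

```python
def check_expected_fields(extracted: dict, expected: list[str], min_confidence: float = 0.0) -> dict:
    """Split expected field names into found, missing, and low-confidence groups."""
    report = {"found": [], "missing": [], "low_confidence": []}
    for name in expected:
        field = extracted.get(name)
        if field is None or not field.get("value"):
            # Absent fields and empty values both count as missing.
            report["missing"].append(name)
        elif field.get("confidence", 0.0) < min_confidence:
            report["low_confidence"].append(name)
        else:
            report["found"].append(name)
    return report


# Hypothetical extracted fields for one page.
page_fields = {
    "document_number": {"value": "NLD123456789", "confidence": 0.90},
    "surname": {"value": "MULDER", "confidence": 0.90},
}
print(check_expected_fields(page_fields, ["document_number", "surname", "nationality"]))
```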
### 4. Performance Test
```bash
python test_api_endpoint.py --timeout 60 --verbose
```
Runs the tests with a 60-second timeout and verbose logging, which helps when investigating slow responses.
## Expected Results
### Successful Test Output
```
🔍 Checking API health...
✅ API is healthy: {'status': 'healthy', 'version': '1.0.0', 'model_loaded': True}
🚀 Starting API tests with 2 images...
✅ tom_id_card_front.jpg: 2.45s
✅ tom_id_card_back.jpg: 1.23s
📊 Test Results:
Total images: 2
Successful: 2
Failed: 0
Success rate: 100.0%
Average processing time: 1.84s
🎉 All tests completed successfully!
```
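
The summary figures in this output can be reproduced from the per-image timings. The sketch below shows one way to compute them. It assumes a hypothetical result record with `success` and `seconds` keys, and that the average is taken over successful images only; the script may aggregate differently.

```python
def summarize(results: list[dict]) -> dict:
    """Aggregate per-image results into the totals shown in the test summary."""
    total = len(results)
    successes = [r for r in results if r["success"]]
    times = [r["seconds"] for r in successes]
    return {
        "total": total,
        "successful": len(successes),
        "failed": total - len(successes),
        "success_rate": 100.0 * len(successes) / total if total else 0.0,
        # Assumption: failed images do not contribute to the average.
        "average_time": sum(times) / len(times) if times else 0.0,
    }


summary = summarize([
    {"image": "tom_id_card_front.jpg", "success": True, "seconds": 2.45},
    {"image": "tom_id_card_back.jpg", "success": True, "seconds": 1.23},
])
print(f"Success rate: {summary['success_rate']:.1f}%")                 # Success rate: 100.0%
print(f"Average processing time: {summary['average_time']:.2f}s")      # Average processing time: 1.84s
```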
### Field Extraction Example
```
Page 1: 11 fields extracted
document_number: NLD123456789 (confidence: 0.90)
surname: MULDER (confidence: 0.90)
given_names: THOMAS JAN (confidence: 0.90)
nationality: NLD (confidence: 0.95)
date_of_birth: 15-03-1990 (confidence: 0.90)
gender: M (confidence: 0.95)
```
## Troubleshooting
### Common Issues
1. **Connection Refused**
   - Check if the API is running
   - Verify the correct URL and port
   - Check firewall settings
2. **Timeout Errors**
   - Increase timeout with `--timeout` parameter
   - Check API performance and resource usage
3. **Missing Fields**
   - Verify test images contain the expected text
   - Check field extraction patterns in the code
   - Review API logs for processing errors
4. **Validation Errors**
   - Check API response format
   - Verify model is loaded correctly
   - Review error logs for details
### Debug Mode
Enable verbose logging for detailed debugging:
```bash
python test_api_endpoint.py --verbose
```
## Integration with CI/CD
The test scripts can be integrated into CI/CD pipelines:
```yaml
# Example GitHub Actions step
- name: Test API Endpoint
  run: |
    python scripts/test_api_endpoint.py --url ${{ env.API_URL }} --timeout 60
```
## Performance Monitoring
The scripts provide performance metrics that can be used for monitoring:
- Processing time per image
- Success rate
- Field extraction accuracy
- Response validation results
These metrics can be integrated with monitoring systems like Prometheus or DataDog.
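
As one possible integration, test metrics could be rendered in the Prometheus text exposition format and handed to whatever collection mechanism the monitoring setup provides. This is a sketch under stated assumptions: the metric names and the `dots_ocr_test` prefix are hypothetical, and the test scripts are not known to emit this format themselves.

```python
def to_prometheus_text(metrics: dict[str, float], prefix: str = "dots_ocr_test") -> str:
    """Render metrics as gauges in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(metrics.items()):
        metric = f"{prefix}_{name}"
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric names; values taken from a two-image test run.
print(to_prometheus_text({"success_rate": 1.0, "avg_processing_seconds": 1.84}))
```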
## 🚀 Production API Testing
### Current Production Endpoint
- **URL**: https://algoryn-dots-ocr-idcard.hf.space
- **Health Check**: https://algoryn-dots-ocr-idcard.hf.space/health
- **API Docs**: https://algoryn-dots-ocr-idcard.hf.space/docs
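
A health probe against the `/health` path can be sketched with the standard library alone. The `status` and `model_loaded` keys are taken from the sample output under Expected Results; treating the API as ready only when both look good is an assumption, and `is_healthy` and `check_health` are hypothetical helpers.

```python
import json
import urllib.request


def is_healthy(payload: dict) -> bool:
    """Treat the API as ready only if it reports healthy and the model is loaded."""
    return payload.get("status") == "healthy" and payload.get("model_loaded") is True


def check_health(base_url: str, timeout: float = 30.0) -> bool:
    """GET <base_url>/health and evaluate the JSON body (performs a network request)."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as response:
        return is_healthy(json.load(response))


# Offline check against the payload shown in the sample output.
print(is_healthy({"status": "healthy", "version": "1.0.0", "model_loaded": True}))  # True
```

Calling `check_health("https://algoryn-dots-ocr-idcard.hf.space")` would run the same check against the production endpoint.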
### Quick Production Test
```bash
# Test production API
./run_tests.sh -e production
# Quick test with curl (no Python dependencies)
./test_production_curl.sh
```
### Staging Environment
- **Staging URL**: https://algoryn-dots-ocr-idcard-staging.hf.space (to be created)
- **Purpose**: Safe testing before production deployment
### Environment-Specific Testing
```bash
# Test different environments
./run_tests.sh -e local # Local development
./run_tests.sh -e staging # Staging environment
./run_tests.sh -e production # Production environment
```
---
### 5. `test_debug_ocr.sh` - Per-request debug logging via curl
Use this for quick, dependency-light testing of the server-side debug mode that prints OCR snippets, extracted fields, and MRZ details to logs.
**Usage:**
```bash
# Local server (per-request debug on)
./test_debug_ocr.sh -u http://localhost:7860 -f tom_id_card_front.jpg -d
# Hugging Face Space (replace with your Space URL)
./test_debug_ocr.sh -u https://<your-space>.hf.space -f tom_id_card_front.jpg -d \
-r '{"x1":0,"y1":0,"x2":1,"y2":0.5}'
```
You can also enable debug globally on the server with `DOTS_OCR_DEBUG=1`. The script only toggles the request-level flag via `-d`.