tommulder

chore(test): commit Makefile and testing scripts

34c6057 4 months ago

preview code

raw

history blame contribute delete

6.22 kB

	# Dots.OCR API Testing

	This directory contains scripts for testing the Dots.OCR API endpoint, from a quick post-deployment check to full response validation.

	## Test Scripts

	### 1. `test_api_endpoint.py` - Comprehensive API Testing

	The main testing script that provides full API validation capabilities.

	Features:
	- Health check validation
	- Single and multiple image testing
	- ROI (Region of Interest) testing
	- Field extraction validation
	- Response structure validation
	- Performance metrics
	- Detailed error reporting

	Usage:
	```bash
	# Basic test with default settings
	python test_api_endpoint.py

	# Test with custom API URL
	python test_api_endpoint.py --url https://your-api.example.com

	# Test with ROI
	python test_api_endpoint.py --roi '{"x1": 0.1, "y1": 0.1, "x2": 0.9, "y2": 0.9}'

	# Test with specific expected fields
	python test_api_endpoint.py --expected-fields document_number surname given_names

	# Verbose output
	python test_api_endpoint.py --verbose

	# Custom timeout
	python test_api_endpoint.py --timeout 60
	```

	Options:
	- `--url`: API base URL (default: http://localhost:7860)
	- `--timeout`: Request timeout in seconds (default: 30)
	- `--roi`: ROI coordinates as JSON string
	- `--expected-fields`: List of expected field names to validate
	- `--verbose`: Enable verbose logging
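
The `--roi` value is passed as a JSON string. The examples suggest normalized coordinates (fractions of image width and height) with `(x1, y1)` top-left and `(x2, y2)` bottom-right. Under that assumption, a minimal sketch of parsing and sanity-checking such a string before sending it could look like this. `parse_roi` is a hypothetical helper, not code taken from `test_api_endpoint.py`.

```python
import json


def parse_roi(raw: str) -> dict:
    """Parse an ROI string such as '{"x1": 0.1, "y1": 0.1, "x2": 0.9, "y2": 0.9}'.

    Assumes normalized coordinates in [0, 1]; raises ValueError on bad input.
    """
    try:
        roi = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"ROI is not valid JSON: {exc}") from exc
    if not isinstance(roi, dict):
        raise ValueError("ROI must be a JSON object")

    keys = ("x1", "y1", "x2", "y2")
    missing = [k for k in keys if k not in roi]
    if missing:
        raise ValueError(f"ROI is missing keys: {missing}")

    coords = {}
    for k in keys:
        value = roi[k]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"ROI {k} must be a number")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"ROI {k}={value} is outside [0, 1]")
        coords[k] = float(value)

    # The box must have positive width and height.
    if coords["x1"] >= coords["x2"] or coords["y1"] >= coords["y2"]:
        raise ValueError("ROI must satisfy x1 < x2 and y1 < y2")
    return coords
```

Rejecting a malformed ROI locally makes a failing test point at the test input rather than at the API.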

	### 2. `quick_test.py` - Quick Validation

	A simple script for quick API validation after deployment.

	Usage:
	```bash
	# Test local API
	python quick_test.py

	# Test remote API
	python quick_test.py https://your-api.example.com
	```
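
A quick post-deployment check mostly comes down to calling `/health` and inspecting the JSON. Here is a minimal standard-library sketch. It assumes the response carries the `status` and `model_loaded` keys shown under Expected Results. `health_url`, `is_healthy`, and `fetch_health` are hypothetical helpers, and `quick_test.py` may work differently.

```python
import json
import urllib.request


def health_url(base_url: str) -> str:
    """Build the health-check URL from an API base URL (a trailing slash is tolerated)."""
    return base_url.rstrip("/") + "/health"


def is_healthy(payload: dict) -> bool:
    """Healthy only if the API reports 'healthy' AND the model is loaded (assumed keys)."""
    return payload.get("status") == "healthy" and payload.get("model_loaded") is True


def fetch_health(base_url: str = "http://localhost:7860", timeout: float = 30.0) -> dict:
    """GET /health and decode the JSON body.

    Raises urllib.error.URLError (or its subclass HTTPError) if the request fails.
    """
    with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `is_healthy(fetch_health("https://algoryn-dots-ocr-idcard.hf.space"))` checks the production Space. Requiring `model_loaded` as well as `status` catches a server that is up but has not finished loading the model.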

	## Test Configuration

	### `test_config.json`

	Configuration file for test parameters and thresholds.

	Configuration sections:
	- `api_endpoints`: Different API URLs for various environments
	- `test_images`: List of test image files
	- `expected_fields`: Fields that should be extracted
	- `roi_test_cases`: Different ROI configurations to test
	- `performance_thresholds`: Performance validation criteria
	- `test_timeout`: Default timeout for requests
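
The contents of `test_config.json` are not reproduced in this README. The sketch below loads it and falls back to defaults for any missing top-level section, using the section names listed above. The default values mirror the documented CLI defaults (`http://localhost:7860`, 30 s) and the two test images. The shape of each section, such as `api_endpoints` as a name-to-URL map, is an assumption, and `load_test_config` is hypothetical.

```python
import copy
import json
from pathlib import Path

# Section names come from this README; values and shapes are assumptions.
DEFAULT_CONFIG = {
    "api_endpoints": {"local": "http://localhost:7860"},
    "test_images": ["tom_id_card_front.jpg", "tom_id_card_back.jpg"],
    "expected_fields": [],
    "roi_test_cases": [],
    "performance_thresholds": {},
    "test_timeout": 30,
}


def load_test_config(path: str = "test_config.json") -> dict:
    """Load the config file, filling missing top-level sections from DEFAULT_CONFIG.

    The merge is shallow: a section present in the file replaces the default one entirely.
    """
    config = copy.deepcopy(DEFAULT_CONFIG)  # never mutate the module-level defaults
    file = Path(path)
    if file.exists():
        with file.open(encoding="utf-8") as fh:
            config.update(json.load(fh))
    return config
```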

	## Test Images

	The following test images are used for validation:

	- `tom_id_card_front.jpg` - Front of a Dutch ID card
	- `tom_id_card_back.jpg` - Back of a Dutch ID card

	## Testing Scenarios

	### 1. Basic Functionality Test
	```bash
	python test_api_endpoint.py
	```
	Tests basic API functionality with default settings.

	### 2. ROI Testing
	```bash
	python test_api_endpoint.py --roi '{"x1": 0.25, "y1": 0.25, "x2": 0.75, "y2": 0.75}'
	```
	Tests Region of Interest cropping functionality.
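
If the server reads the ROI as fractions of the image size (an assumption, since the server code is not shown here), cropping reduces to scaling and rounding each coordinate. Here is a minimal pure-Python sketch of that arithmetic. The server's own rounding may differ. With a 1000×600 image, the ROI above maps to the pixel box `(250, 150, 750, 450)`.

```python
def roi_to_pixels(roi: dict, width: int, height: int) -> tuple:
    """Convert a normalized ROI to an integer box (left, top, right, bottom).

    x values scale with width and y values with height. The 4-tuple follows the box
    convention of Pillow's Image.crop. round() rounds halves to even in Python 3.
    """
    left = round(roi["x1"] * width)
    top = round(roi["y1"] * height)
    right = round(roi["x2"] * width)
    bottom = round(roi["y2"] * height)
    # Clamp into the image and guarantee a non-empty box of at least 1x1 pixel.
    left = min(max(left, 0), width - 1)
    top = min(max(top, 0), height - 1)
    right = min(max(right, left + 1), width)
    bottom = min(max(bottom, top + 1), height)
    return left, top, right, bottom
```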

	### 3. Field Validation Test
	```bash
	python test_api_endpoint.py --expected-fields document_number surname given_names nationality
	```
	Tests that specific fields are extracted correctly.
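
Checking expected fields is essentially a set difference between the requested names and the names the API returned. The response schema is not documented in this README. This sketch assumes a hypothetical shape in which each entry of `pages` has a `fields` list of objects with `field_name` and `value` keys. Rename the keys to match the real response.

```python
def extracted_field_names(response: dict) -> set:
    """Collect non-empty field names across all pages of a (hypothetical) response."""
    names = set()
    for page in response.get("pages", []):
        for field in page.get("fields", []):
            name = field.get("field_name")
            # A field with an empty value is treated as not extracted.
            if name and field.get("value"):
                names.add(name)
    return names


def missing_fields(response: dict, expected: list) -> list:
    """Return the expected field names that were not extracted, in the order given."""
    found = extracted_field_names(response)
    return [name for name in expected if name not in found]
```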

	### 4. Performance Test
	```bash
	python test_api_endpoint.py --timeout 60 --verbose
	```
	Tests API performance with extended timeout and detailed logging.

	## Expected Results

	### Successful Test Output
	```
	🔍 Checking API health...
	✅ API is healthy: {'status': 'healthy', 'version': '1.0.0', 'model_loaded': True}
	🚀 Starting API tests with 2 images...
	✅ tom_id_card_front.jpg: 2.45s
	✅ tom_id_card_back.jpg: 1.23s
	📊 Test Results:
	Total images: 2
	Successful: 2
	Failed: 0
	Success rate: 100.0%
	Average processing time: 1.84s
	🎉 All tests completed successfully!
	```
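
The summary lines are simple aggregates over per-image results. For the run above, the average is (2.45 + 1.23) / 2 = 1.84 s. This sketch reproduces those numbers and assumes each result is recorded as a hypothetical `ImageResult(filename, ok, seconds)`. The README does not say whether failed requests count toward the average, so this sketch averages successful ones only.

```python
from typing import NamedTuple


class ImageResult(NamedTuple):
    filename: str
    ok: bool
    seconds: float


def summarize(results: list) -> dict:
    """Compute the summary numbers printed at the end of a test run."""
    total = len(results)
    successful = sum(1 for r in results if r.ok)
    ok_times = [r.seconds for r in results if r.ok]
    return {
        "total": total,
        "successful": successful,
        "failed": total - successful,
        "success_rate": 100.0 * successful / total if total else 0.0,
        # Averaged over successful requests only (an assumption; failures may be timeouts).
        "avg_seconds": sum(ok_times) / len(ok_times) if ok_times else 0.0,
    }
```

Formatting `summarize(...)["avg_seconds"]` with `:.2f` for the two results above gives `1.84`.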

	### Field Extraction Example
	```
	Page 1: 11 fields extracted
	document_number: NLD123456789 (confidence: 0.90)
	surname: MULDER (confidence: 0.90)
	given_names: THOMAS JAN (confidence: 0.90)
	nationality: NLD (confidence: 0.95)
	date_of_birth: 15-03-1990 (confidence: 0.90)
	gender: M (confidence: 0.95)
	```
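
Each extracted field comes with a confidence score. If the tests should also flag weak extractions, a threshold check is enough. In this sketch, the 0.9 default is illustrative rather than a value taken from `test_config.json`, and the input is assumed to map each field name to a `(value, confidence)` pair.

```python
def low_confidence_fields(fields: dict, min_confidence: float = 0.9) -> dict:
    """Return {name: confidence} for every field whose confidence is below the threshold."""
    return {
        name: confidence
        for name, (_value, confidence) in fields.items()
        if confidence < min_confidence
    }
```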

	## Troubleshooting

	### Common Issues

	1. Connection Refused
	   - Check that the API is running
	   - Verify the URL and port
	   - Check firewall settings

	2. Timeout Errors
	   - Increase the timeout with the `--timeout` option
	   - Check API performance and resource usage

	3. Missing Fields
	   - Verify that the test images contain the expected text
	   - Check the field extraction patterns in the code
	   - Review the API logs for processing errors

	4. Validation Errors
	   - Check the API response format
	   - Verify that the model is loaded (`model_loaded` in the `/health` response)
	   - Review the error logs for details

	### Debug Mode

	Enable verbose logging for detailed debugging:
	```bash
	python test_api_endpoint.py --verbose
	```

	## Integration with CI/CD

	The test scripts can be integrated into CI/CD pipelines:

	```yaml
	# Example GitHub Actions step
	- name: Test API Endpoint
	  run: |
	    python scripts/test_api_endpoint.py --url ${{ env.API_URL }} --timeout 60
	```
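
A CI step like the one above fails only when the command exits with a non-zero status. This README does not say whether `test_api_endpoint.py` already does that. If it does not, a wrapper could map the run summary to an exit code, as in this sketch. `exit_code` and its 100% default bar are hypothetical.

```python
def exit_code(successful: int, total: int, min_success_rate: float = 100.0) -> int:
    """Return 0 if the run meets the success-rate bar, 1 otherwise.

    A run with zero images counts as a failure so that misconfigured jobs are noticed.
    A wrapper script would end with: sys.exit(exit_code(successful, total))
    """
    if total == 0:
        return 1
    rate = 100.0 * successful / total
    return 0 if rate >= min_success_rate else 1
```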

	## Performance Monitoring

	The scripts provide performance metrics that can be used for monitoring:

	- Processing time per image
	- Success rate
	- Field extraction accuracy
	- Response validation results

	These metrics can be fed into monitoring systems such as Prometheus or Datadog.
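
One way to hand these numbers to Prometheus is the text exposition format. Written to a `.prom` file, it can be picked up by node_exporter's textfile collector, and Datadog's OpenMetrics integration can consume the same format. This is a minimal sketch. The `dots_ocr_test_` metric prefix is invented for illustration, and every metric is exposed as a gauge.

```python
def _escape_help(text: str) -> str:
    # HELP text must have backslashes and newlines escaped in the exposition format.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def to_prometheus(metrics: dict, prefix: str = "dots_ocr_test_") -> str:
    """Render {name: (help_text, value)} as gauges in Prometheus text exposition format."""
    lines = []
    for name, (help_text, value) in metrics.items():
        metric = prefix + name
        lines.append(f"# HELP {metric} {_escape_help(help_text)}")
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {float(value)!r}")
    # The format expects the last line to be terminated by a newline.
    return "\n".join(lines) + "\n"
```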

	## 🚀 Production API Testing

	### Current Production Endpoint
	- URL: https://algoryn-dots-ocr-idcard.hf.space
	- Health Check: https://algoryn-dots-ocr-idcard.hf.space/health
	- API Docs: https://algoryn-dots-ocr-idcard.hf.space/docs

	### Quick Production Test
	```bash
	# Test production API
	./run_tests.sh -e production

	# Quick test with curl (no Python dependencies)
	./test_production_curl.sh
	```

	### Staging Environment
	- Staging URL: https://algoryn-dots-ocr-idcard-staging.hf.space (to be created)
	- Purpose: Safe testing before production deployment

	### Environment-Specific Testing
	```bash
	# Test different environments
	./run_tests.sh -e local # Local development
	./run_tests.sh -e staging # Staging environment
	./run_tests.sh -e production # Production environment
	```
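
`run_tests.sh` is not shown here, but its `-e` option selects a target by name. Here is a sketch of that lookup in Python, using the URLs listed in this README (the staging Space does not exist yet). The `API_URL` override is an assumption borrowed from the CI example, and `resolve_api_url` is hypothetical.

```python
import os

# URLs as documented in this README; the staging Space is still to be created.
ENVIRONMENTS = {
    "local": "http://localhost:7860",
    "staging": "https://algoryn-dots-ocr-idcard-staging.hf.space",
    "production": "https://algoryn-dots-ocr-idcard.hf.space",
}


def resolve_api_url(env_name: str, environ=None) -> str:
    """Map an environment name to its base URL; a non-empty API_URL variable wins."""
    environ = os.environ if environ is None else environ
    override = environ.get("API_URL")
    if override:
        return override
    try:
        return ENVIRONMENTS[env_name]
    except KeyError:
        raise ValueError(
            f"unknown environment {env_name!r}; choose from {sorted(ENVIRONMENTS)}"
        ) from None
```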

	---

	### 3. `test_debug_ocr.sh` - Per-request debug logging via curl

	Use this for quick, dependency-light testing of the server-side debug mode, which prints OCR snippets, extracted fields, and MRZ details to the server logs.

	Usage:
	```bash
	# Local server (per-request debug on)
	./test_debug_ocr.sh -u http://localhost:7860 -f tom_id_card_front.jpg -d

	# Hugging Face Space (replace with your Space URL)
	./test_debug_ocr.sh -u https://<your-space>.hf.space -f tom_id_card_front.jpg -d \
	  -r '{"x1":0,"y1":0,"x2":1,"y2":0.5}'
	```

	You can also enable debug globally on the server with `DOTS_OCR_DEBUG=1`. The script only toggles the request-level flag via `-d`.
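
The server code is not shown here. A natural reading of the two switches is that debug output is on when either the global `DOTS_OCR_DEBUG=1` is set or the request asks for it, which this sketch assumes. Apart from `"1"`, which values the server accepts is unknown, so the truthy spellings below are an assumption, and `debug_enabled` is hypothetical.

```python
import os

# Accepted "on" spellings are an assumption; only DOTS_OCR_DEBUG=1 is documented.
_TRUTHY = {"1", "true", "yes", "on"}


def debug_enabled(request_flag=None, environ=None) -> bool:
    """Return True if debug logging should be on for this request.

    Assumes the global DOTS_OCR_DEBUG switch and the per-request flag are OR-ed together.
    """
    environ = os.environ if environ is None else environ
    global_on = str(environ.get("DOTS_OCR_DEBUG", "")).strip().lower() in _TRUTHY
    request_on = request_flag is not None and str(request_flag).strip().lower() in _TRUTHY
    return global_on or request_on
```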