# Testing Framework - BDR Agent Factory
## Overview
Comprehensive testing strategy for AI capabilities, ensuring quality, compliance, and reliability across all insurance business systems.
---
## Testing Pyramid
```
                β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                β”‚  E2E Tests (5%)  β”‚
                β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
           β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
           β”‚  Integration Tests (15%)   β”‚
           β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
      β”‚        Component Tests (30%)         β”‚
      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
 β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
 β”‚                Unit Tests (50%)                β”‚
 β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
---
## 1. Unit Tests
### Purpose
Test individual capability functions in isolation.
### Coverage Requirements
- **Minimum**: 80% code coverage
- **Target**: 90% code coverage
- **Critical paths**: 100% coverage
### Example: Text Classification Unit Test
```python
import pytest
from bdr_agent_factory.capabilities import TextClassification

class TestTextClassification:
    @pytest.fixture
    def classifier(self):
        return TextClassification(
            model_version="1.0.0",
            classes=["property_damage", "auto_accident", "health_claim"]
        )

    def test_basic_classification(self, classifier):
        """Test basic text classification"""
        result = classifier.classify(
            text="Water damage in basement after storm"
        )
        assert result.predicted_class == "property_damage"
        assert result.confidence > 0.7
        assert len(result.all_scores) == 3

    def test_empty_input(self, classifier):
        """Test handling of empty input"""
        with pytest.raises(ValueError, match="Input text cannot be empty"):
            classifier.classify(text="")

    def test_long_input(self, classifier):
        """Test handling of excessively long input"""
        long_text = "word " * 10000
        with pytest.raises(ValueError, match="Input exceeds maximum length"):
            classifier.classify(text=long_text)

    def test_confidence_threshold(self, classifier):
        """Test confidence threshold filtering"""
        result = classifier.classify(
            text="Ambiguous claim description",
            confidence_threshold=0.95
        )
        if result.confidence < 0.95:
            assert result.predicted_class is None

    def test_explainability(self, classifier):
        """Test explanation generation"""
        result = classifier.classify(
            text="Water damage in basement",
            explain=True
        )
        assert result.explanation is not None
        assert "key_features" in result.explanation
        assert len(result.explanation["key_features"]) > 0
```
---
## 2. Component Tests
### Purpose
Test capability components with their dependencies (models, databases, etc.).
### Example: Fraud Detection Component Test
```python
import pytest
from bdr_agent_factory.capabilities import FraudDetection
from bdr_agent_factory.models import ModelRegistry

class TestFraudDetectionComponent:
    @pytest.fixture
    def fraud_detector(self):
        model = ModelRegistry.load("fraud_detection_v1")
        return FraudDetection(model=model)

    def test_fraud_detection_with_model(self, fraud_detector):
        """Test fraud detection with actual model"""
        claim_data = {
            "claim_amount": 50000,
            "claim_type": "auto_accident",
            "claimant_history": {"previous_claims": 5},
            "incident_details": "Rear-end collision on highway"
        }
        result = fraud_detector.detect(claim_data)
        assert 0.0 <= result.fraud_score <= 1.0
        assert result.risk_level in ["low", "medium", "high"]
        assert result.explanation is not None

    def test_audit_trail_creation(self, fraud_detector):
        """Test that audit trail is created"""
        claim_data = {"claim_amount": 10000}
        result = fraud_detector.detect(
            claim_data,
            audit=True,
            request_id="test_req_123"
        )
        assert result.audit_id is not None
        # Verify audit record was created in database
        audit_record = fraud_detector.get_audit_record(result.audit_id)
        assert audit_record.request_id == "test_req_123"
```
---
## 3. Integration Tests
### Purpose
Test end-to-end capability invocation through API.
### Example: API Integration Test
```python
import pytest
from bdr_agent_factory.test_utils import TestClient

class TestCapabilityAPI:
    @pytest.fixture
    def client(self):
        return TestClient(
            base_url="http://localhost:8000",
            api_key="test_api_key"
        )

    def test_capability_invocation(self, client):
        """Test capability invocation via API"""
        response = client.post(
            "/v1/capabilities/cap_text_classification/invoke",
            json={
                "input": {"text": "Customer reported water damage"},
                "options": {"explain": True, "audit_trail": True}
            }
        )
        assert response.status_code == 200
        data = response.json()
        assert "result" in data
        assert "predicted_class" in data["result"]
        assert "explanation" in data["result"]
        assert "audit_trail" in data

    def test_batch_processing(self, client):
        """Test batch capability invocation"""
        response = client.post(
            "/v1/capabilities/cap_text_classification/batch",
            json={
                "inputs": [
                    {"text": "Claim 1"},
                    {"text": "Claim 2"},
                    {"text": "Claim 3"}
                ]
            }
        )
        assert response.status_code == 202  # Accepted
        data = response.json()
        assert "batch_id" in data
        assert data["status"] == "processing"
        # Poll for completion
        batch_id = data["batch_id"]
        status = client.get_batch_status(batch_id)
        assert status in ["processing", "completed"]

    def test_authentication_required(self, client):
        """Test that authentication is required"""
        client_no_auth = TestClient(base_url="http://localhost:8000")
        response = client_no_auth.post(
            "/v1/capabilities/cap_text_classification/invoke",
            json={"input": {"text": "Test"}}
        )
        assert response.status_code == 401
```
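The batch test above checks status only once, so it can pass while the batch is still in flight. In practice, waiting on a batch needs a bounded polling loop. A minimal sketch, where `get_status` stands in for the `client.get_batch_status` call (the helper name is hypothetical, not part of `TestClient`):

```python
import time

def wait_for_batch(get_status, batch_id, timeout_s=30.0, poll_interval_s=0.5):
    """Poll a status callable until the batch finishes or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status(batch_id)
        if status in ("completed", "failed"):
            return status
        time.sleep(poll_interval_s)
    raise TimeoutError(f"batch {batch_id} still processing after {timeout_s}s")

# Usage with a fake status source that completes on the third poll:
calls = iter(["processing", "processing", "completed"])
assert wait_for_batch(lambda _id: next(calls), "batch_1", poll_interval_s=0.01) == "completed"
```

Using `time.monotonic()` rather than `time.time()` keeps the deadline immune to wall-clock adjustments during the test run.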
---
## 4. End-to-End Tests
### Purpose
Test complete business workflows across multiple systems.
### Example: Claims Processing E2E Test
```python
import pytest
from bdr_agent_factory.test_utils import E2ETestHarness

class TestClaimsProcessingWorkflow:
    @pytest.fixture
    def harness(self):
        return E2ETestHarness(
            systems=["ClaimsGPT", "FraudDetectionAgent"],
            environment="staging"
        )

    def test_complete_claims_workflow(self, harness):
        """Test complete claims processing workflow"""
        # Step 1: Submit claim
        claim = harness.submit_claim({
            "claimant": "John Doe",
            "claim_type": "auto_accident",
            "description": "Rear-end collision on I-5",
            "amount": 5000
        })
        assert claim.id is not None

        # Step 2: Classify claim
        classification = harness.invoke_capability(
            "cap_text_classification",
            input={"text": claim.description}
        )
        assert classification.predicted_class == "auto_accident"

        # Step 3: Fraud detection
        fraud_check = harness.invoke_capability(
            "cap_fraud_detection",
            input={"claim_data": claim.to_dict()}
        )
        assert fraud_check.risk_level in ["low", "medium", "high"]

        # Step 4: Decision
        decision = harness.make_decision(
            claim_id=claim.id,
            classification=classification,
            fraud_check=fraud_check
        )
        assert decision.type in ["approve", "review", "reject"]

        # Step 5: Verify audit trail
        audit_trail = harness.get_audit_trail(claim.id)
        assert len(audit_trail) >= 3  # Classification, fraud check, decision

        # Step 6: Verify compliance
        compliance_check = harness.verify_compliance(
            claim_id=claim.id,
            frameworks=["GDPR", "IFRS17"]
        )
        assert compliance_check.is_compliant is True
```
---
## 5. Performance Tests
### Purpose
Ensure capabilities meet performance SLAs.
### Load Testing
```python
from locust import HttpUser, task, between

class CapabilityLoadTest(HttpUser):
    wait_time = between(1, 3)

    def on_start(self):
        """Authenticate before testing"""
        response = self.client.post("/auth/token", json={
            "client_id": "test_client",
            "client_secret": "test_secret"
        })
        self.token = response.json()["access_token"]

    @task(3)
    def invoke_text_classification(self):
        """Test text classification under load"""
        self.client.post(
            "/v1/capabilities/cap_text_classification/invoke",
            headers={"Authorization": f"Bearer {self.token}"},
            json={"input": {"text": "Sample claim description"}}
        )

    @task(1)
    def invoke_fraud_detection(self):
        """Test fraud detection under load"""
        self.client.post(
            "/v1/capabilities/cap_fraud_detection/invoke",
            headers={"Authorization": f"Bearer {self.token}"},
            json={"input": {"claim_amount": 10000}}
        )

# Run with: locust -f performance_tests.py --users 100 --spawn-rate 10
```
### Performance Benchmarks
```python
import time

from bdr_agent_factory.capabilities import TextClassification

class TestPerformanceBenchmarks:
    def test_latency_p95(self):
        """Test that P95 latency is under 300ms"""
        classifier = TextClassification()
        latencies = []
        for _ in range(100):
            start = time.time()
            classifier.classify(text="Sample text for classification")
            latencies.append((time.time() - start) * 1000)  # convert to ms
        latencies.sort()
        p95_latency = latencies[94]  # 95th of 100 sorted samples
        assert p95_latency < 300, f"P95 latency {p95_latency}ms exceeds 300ms SLA"

    def test_throughput(self):
        """Test minimum throughput of 100 requests/second"""
        classifier = TextClassification()
        start = time.time()
        # Sequential calls: 100 RPS here implies mean latency <= 10 ms.
        for _ in range(100):
            classifier.classify(text="Sample text")
        duration = time.time() - start
        throughput = 100 / duration
        assert throughput >= 100, f"Throughput {throughput} RPS below 100 RPS SLA"
```
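Indexing `latencies[94]` only works when the sample count is exactly 100. A more general percentile helper using only the standard library, sketched here with `statistics.quantiles`, which interpolates between samples when `method="inclusive"` is used:

```python
import statistics

def percentile(samples: list, pct: int) -> float:
    """Return the pct-th percentile (1..99) of samples, with interpolation."""
    # n=100 yields 99 cut points; index pct-1 is the pct-th percentile.
    return statistics.quantiles(samples, n=100, method="inclusive")[pct - 1]

latencies = [12.0, 15.0, 18.0, 20.0, 250.0]
p95 = percentile(latencies, 95)  # interpolates between 20.0 and 250.0
assert p95 < 300, f"P95 latency {p95:.1f}ms exceeds 300ms SLA"
```

This behaves sensibly for any sample count of two or more, so the benchmark loop size can change without breaking the assertion.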
---
## 6. Compliance Tests
### Purpose
Verify adherence to regulatory requirements.
### GDPR Compliance Tests
```python
from bdr_agent_factory.audit import AuditService
from bdr_agent_factory.capabilities import TextClassification
from bdr_agent_factory.compliance import GDPRValidator

class TestGDPRCompliance:
    def test_data_minimization(self):
        """Test that only necessary data is collected"""
        validator = GDPRValidator()
        claim_data = {
            "claim_id": "123",
            "description": "Claim description",
            "amount": 5000
        }
        result = validator.validate_data_minimization(claim_data)
        assert result.is_compliant is True

    def test_right_to_explanation(self):
        """Test that explanations are available"""
        classifier = TextClassification()
        result = classifier.classify(
            text="Sample text",
            explain=True
        )
        assert result.explanation is not None
        assert "key_features" in result.explanation

    def test_data_retention(self):
        """Test that data retention policies are enforced"""
        audit_service = AuditService()
        # Create audit record with retention period
        audit_id = audit_service.create_audit(
            capability_id="cap_test",
            retention_days=2555  # 7 years for GDPR
        )
        audit_record = audit_service.get_audit(audit_id)
        assert audit_record.retention_days == 2555
```
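The 2555-day retention used above maps to a concrete purge date. A small sketch of how a retention job might compute eligibility for deletion (`deletion_due` is hypothetical, not an actual `AuditService` method):

```python
from datetime import date, timedelta

RETENTION_DAYS = 2555  # matches the retention test above

def deletion_due(created: date, retention_days: int = RETENTION_DAYS) -> date:
    """First date on which an audit record may be purged."""
    return created + timedelta(days=retention_days)

# 2555 days spanning two leap years lands two days short of seven
# calendar years:
assert deletion_due(date(2024, 1, 1)) == date(2030, 12, 30)
```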
### IFRS 17 Compliance Tests
```python
from bdr_agent_factory.audit import AuditService

class TestIFRS17Compliance:
    def test_audit_trail_completeness(self):
        """Test complete audit trail for insurance contracts"""
        audit_service = AuditService()
        # Simulate underwriting decision
        audit_id = audit_service.create_audit(
            capability_id="cap_underwriting",
            input_data={"policy_data": "..."},
            output_data={"decision": "approve"},
            compliance_flags={"ifrs17_compliant": True}
        )
        audit_record = audit_service.get_audit(audit_id)
        assert audit_record.compliance_flags["ifrs17_compliant"] is True
        assert audit_record.input_hash is not None
        assert audit_record.output_hash is not None
```
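The `input_hash`/`output_hash` assertions above presuppose a deterministic way to fingerprint payloads. One common approach, sketched with canonical JSON plus SHA-256 (the helper name is illustrative; the actual `AuditService` implementation may differ):

```python
import hashlib
import json

def stable_hash(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so key order never matters."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = stable_hash({"decision": "approve", "score": 0.9})
b = stable_hash({"score": 0.9, "decision": "approve"})
assert a == b        # order-independent
assert len(a) == 64  # hex digest length
```

Canonicalizing before hashing is what makes the hash usable as tamper evidence: two logically identical records always produce the same digest.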
---
## 7. Security Tests
### Purpose
Identify security vulnerabilities.
### Authentication Tests
```python
from bdr_agent_factory.test_utils import TestClient

class TestSecurity:
    def test_sql_injection_prevention(self):
        """Test SQL injection prevention"""
        client = TestClient()
        # Attempt SQL injection
        response = client.post(
            "/v1/capabilities/cap_text_classification/invoke",
            json={"input": {"text": "'; DROP TABLE capabilities; --"}}
        )
        # Should process safely without executing SQL
        assert response.status_code in [200, 400]

    def test_xss_prevention(self):
        """Test XSS prevention"""
        client = TestClient()
        response = client.post(
            "/v1/capabilities/cap_text_classification/invoke",
            json={"input": {"text": "<script>alert('XSS')</script>"}}
        )
        # Should sanitize input
        assert "<script>" not in response.text

    def test_rate_limiting(self):
        """Test rate limiting enforcement"""
        client = TestClient()
        # Make requests exceeding the 100/minute limit
        for i in range(150):
            response = client.post(
                "/v1/capabilities/cap_text_classification/invoke",
                json={"input": {"text": f"Request {i}"}}
            )
            if i >= 100:
                assert response.status_code == 429  # Too Many Requests
```
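For context on what `test_rate_limiting` exercises: a sliding-window limiter like the one below is a typical server-side implementation of a 100-requests/minute policy. This is an illustrative sketch only; the real gateway's algorithm may differ.

```python
import time
from collections import deque
from typing import Optional

class SlidingWindowLimiter:
    """Admit at most `limit` requests per rolling `window_s` seconds."""

    def __init__(self, limit: int = 100, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self.timestamps = deque()  # admission times, oldest first

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Evict admissions that have fallen out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_s:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False  # caller should respond 429

limiter = SlidingWindowLimiter(limit=100, window_s=60.0)
results = [limiter.allow(now=0.0) for _ in range(150)]
assert results[:100] == [True] * 100   # first 100 admitted
assert results[100:] == [False] * 50   # remainder rejected
```

The injectable `now` parameter is what makes the limiter itself unit-testable without sleeping, mirroring the document's advice to keep tests fast and deterministic.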
---
## 8. Test Data Management
### Test Data Sets
```python
# tests/fixtures/test_data.py
CLAIM_DESCRIPTIONS = [
    {
        "text": "Water damage to basement after heavy rain",
        "expected_class": "property_damage",
        "min_confidence": 0.8
    },
    {
        "text": "Rear-end collision on highway",
        "expected_class": "auto_accident",
        "min_confidence": 0.85
    },
    {
        "text": "Slip and fall in grocery store",
        "expected_class": "liability",
        "min_confidence": 0.75
    }
]

FRAUD_SCENARIOS = [
    {
        "claim_amount": 100000,
        "claimant_history": {"previous_claims": 10},
        "expected_risk": "high"
    },
    {
        "claim_amount": 5000,
        "claimant_history": {"previous_claims": 0},
        "expected_risk": "low"
    }
]
```
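Fixture entries like these are easy to break silently when edited by hand. A self-contained schema check that fails fast on malformed test data (a sketch; the real fixtures live in tests/fixtures/test_data.py):

```python
# Required keys mirror the CLAIM_DESCRIPTIONS entries above.
REQUIRED_KEYS = {"text", "expected_class", "min_confidence"}

def validate_cases(cases):
    """Assert every fixture entry is well-formed; return the case count."""
    for i, case in enumerate(cases):
        missing = REQUIRED_KEYS - case.keys()
        assert not missing, f"case {i} missing keys: {missing}"
        assert isinstance(case["text"], str) and case["text"].strip()
        assert 0.0 < case["min_confidence"] <= 1.0, f"case {i}: bad threshold"
    return len(cases)

# Usage against a copy of two fixtures from above:
cases = [
    {"text": "Water damage to basement after heavy rain",
     "expected_class": "property_damage", "min_confidence": 0.8},
    {"text": "Rear-end collision on highway",
     "expected_class": "auto_accident", "min_confidence": 0.85},
]
assert validate_cases(cases) == 2
```

Running a check like this as its own test means a typo in the fixture file shows up as one clear failure instead of several confusing capability-test failures.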
---
## 9. Continuous Integration
### GitHub Actions Workflow
```yaml
# .github/workflows/test.yml
name: Test Suite

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install -r requirements-test.txt
      - name: Run unit tests
        run: pytest tests/unit --cov=bdr_agent_factory --cov-report=xml
      - name: Run integration tests
        run: pytest tests/integration
      - name: Run compliance tests
        run: pytest tests/compliance
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
      - name: Run security scan
        run: bandit -r bdr_agent_factory/
```
---
## 10. Test Reporting
### Coverage Report
```bash
# Generate HTML coverage report
pytest --cov=bdr_agent_factory --cov-report=html
# View report
open htmlcov/index.html
```
### Test Metrics Dashboard
- **Test Coverage**: Target 90%+
- **Test Execution Time**: < 5 minutes for full suite
- **Flaky Test Rate**: < 1%
- **Test Pass Rate**: > 99%
---
## Best Practices
1. **Write tests first** (TDD approach)
2. **Keep tests independent** (no shared state)
3. **Use descriptive test names** (test_should_classify_property_damage_correctly)
4. **Mock external dependencies** (APIs, databases)
5. **Test edge cases** (empty input, max length, special characters)
6. **Maintain test data** (version control test datasets)
7. **Run tests in CI/CD** (automated on every commit)
8. **Monitor test performance** (identify slow tests)
9. **Review test coverage** (ensure critical paths covered)
10. **Update tests with code** (keep tests in sync)
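Practice 4 (mock external dependencies) can be sketched with the standard library's `unittest.mock`. The `FraudDetection` class below is a simplified stand-in for the real capability, included only so the example is self-contained:

```python
from unittest.mock import MagicMock

# Hypothetical stand-in for a capability wrapping a loaded model.
class FraudDetection:
    def __init__(self, model):
        self.model = model

    def detect(self, claim_data):
        return self.model.predict(claim_data)

# Mock the model so the test needs no model registry, file, or network.
mock_model = MagicMock()
mock_model.predict.return_value = {"fraud_score": 0.12, "risk_level": "low"}

detector = FraudDetection(model=mock_model)
result = detector.detect({"claim_amount": 5000})

assert result["risk_level"] == "low"
mock_model.predict.assert_called_once_with({"claim_amount": 5000})
```

The `assert_called_once_with` check verifies the interaction, not just the return value, which is what keeps mocked tests honest.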
---
## Test Execution
```bash
# Run all tests
pytest
# Run specific test category
pytest tests/unit
pytest tests/integration
pytest tests/e2e
# Run with coverage
pytest --cov=bdr_agent_factory
# Run specific test file
pytest tests/unit/test_text_classification.py
# Run specific test
pytest tests/unit/test_text_classification.py::TestTextClassification::test_basic_classification
# Run with verbose output
pytest -v
# Run in parallel (requires the pytest-xdist plugin)
pytest -n auto
```
---
## Support
For testing support:
- Documentation: https://docs.bdragentfactory.com/testing
- Email: qa@bdragentfactory.com