ai-agent / docs /development /testing.md
katospiegel's picture
Deploy develop: FastAPI+React frontend, multi-stage Docker (ai_agent serve)
07c2476 verified
|
Raw
History Blame Contribute Delete
8.92 kB
# Testing (still under development)
The AI Imaging Agent uses pytest for testing. This guide covers running tests and writing new ones.
**Note:** We are still developing some tests for the agent, hence this part is not relevant for now.
## Running Tests
### Basic Usage
```bash
# Run all tests
pytest
# Run specific test file
pytest tests/test_retrieval_pipeline.py
# Run specific test
pytest tests/test_retrieval_pipeline.py::test_basic_retrieval
# Run with verbose output
pytest -v
# Run with coverage
pytest --cov=ai_agent --cov-report=html
```
### Test Categories
Tests are marked by category:
```bash
# Run only unit tests
pytest -m unit
# Run only integration tests
pytest -m integration
# Skip slow tests
pytest -m "not slow"
```
## Test Organization
### Directory Structure
```
tests/
β”œβ”€β”€ data/
β”‚ β”œβ”€β”€ test_data.json # Test cases
β”‚ └── 0002.DCM # Sample DICOM file
β”œβ”€β”€ test_retrieval_pipeline.py # Retrieval tests
β”œβ”€β”€ test_deepwiki_repo_info.py # Repo info tests
β”œβ”€β”€ test_gpt4o_vision.py # VLM tests (integration)
└── __pycache__/
```
### Test File Naming
- `test_*.py`: Test files
- `*_test.py`: Alternative naming (less common)
### Test Function Naming
```python
def test_basic_retrieval():
"""Test basic retrieval functionality."""
pass
def test_edge_case_empty_query():
"""Test handling of empty query."""
pass
def test_integration_full_pipeline():
"""Integration test for complete pipeline."""
pass
```
## Writing Tests
### Unit Test Example
```python
import pytest
from ai_agent.retriever.vector_index import VectorIndex
def test_vector_index_search():
"""Test FAISS vector search."""
# Arrange
index = VectorIndex()
index.load("artifacts/rag_index")
query = "segment lungs CT"
# Act
results = index.search(query, k=5)
# Assert
assert len(results) == 5
assert all(r['score'] > 0 for r in results)
assert 'TotalSegmentator' in [r['name'] for r in results]
```
### Integration Test Example
```python
import pytest
from ai_agent.api.pipeline import RAGImagingPipeline
@pytest.mark.integration
def test_full_pipeline_with_image():
"""Integration test with real image and VLM call."""
# Arrange
pipeline = RAGImagingPipeline(
catalog_path="dataset/catalog.jsonl",
index_dir="artifacts/rag_index"
)
# Act
result = pipeline.recommend(
query="segment lungs",
files=["tests/data/chest_ct.dcm"]
)
# Assert
assert result.status == "complete"
assert len(result.recommendations) > 0
assert result.recommendations[0].accuracy_score > 70
```
### Parametrized Tests
```python
@pytest.mark.parametrize("query,expected_tool", [
("segment brain MRI", "FreeSurfer"),
("segment lungs CT", "TotalSegmentator"),
("classify chest X-ray", "CheXNet"),
])
def test_retrieval_for_queries(query, expected_tool):
"""Test retrieval returns expected tools for various queries."""
index = VectorIndex()
index.load("artifacts/rag_index")
results = index.search(query, k=10)
tool_names = [r['name'] for r in results]
assert expected_tool in tool_names
```
### Fixtures
```python
import pytest
@pytest.fixture
def pipeline():
"""Provide initialized pipeline for tests."""
return RAGImagingPipeline(
catalog_path="dataset/catalog.jsonl",
index_dir="artifacts/rag_index"
)
@pytest.fixture
def sample_dicom():
"""Provide path to sample DICOM file."""
return "tests/data/0002.DCM"
def test_with_fixtures(pipeline, sample_dicom):
"""Test using fixtures."""
result = pipeline.recommend(
query="analyze DICOM",
files=[sample_dicom]
)
assert result is not None
```
<!-- ## Mocking
### Mocking VLM Calls
To avoid API costs during testing:
```python
from unittest.mock import Mock, patch
import pytest
@pytest.fixture
def mock_vlm_response():
"""Mock VLM response."""
return {
"status": "complete",
"recommendations": [
{
"rank": 1,
"name": "TotalSegmentator",
"accuracy_score": 95,
"explanation": "Test explanation",
"reason": "task_match"
}
]
}
def test_with_mocked_vlm(mock_vlm_response):
"""Test pipeline with mocked VLM."""
with patch('ai_agent.agent.agent.Agent.run') as mock_run:
mock_run.return_value = mock_vlm_response
# Test code here
result = pipeline.recommend(query="test", files=[])
assert result["status"] == "complete"
```
### Mocking File Operations
```python
def test_file_validation():
"""Test file validation without real files."""
with patch('os.path.getsize') as mock_size:
mock_size.return_value = 1024 * 1024 # 1 MB
from ai_agent.utils.file_validator import validate_file
is_valid = validate_file("fake.dcm")
assert is_valid
``` -->
## Test Data
### Using Test Cases
Load test cases from JSON:
```python
import json
def load_test_cases():
"""Load test cases from data file."""
with open("tests/data/test_data.json") as f:
return json.load(f)
@pytest.mark.parametrize("test_case", load_test_cases())
def test_from_json(test_case):
"""Test using cases from JSON file."""
query = test_case["query"]
expected = test_case["expected_tool"]
# Test logic here
assert expected in results
```
### Sample Data Files
Keep sample files small:
- **DICOM**: Single slice, low resolution
- **NIfTI**: Small volume (e.g., 64Γ—64Γ—64)
- **Images**: PNG/JPG under 1 MB
<!-- ## Coverage
### Measuring Coverage
```bash
# Run with coverage
pytest --cov=ai_agent
# Generate HTML report
pytest --cov=ai_agent --cov-report=html
# Open report
open htmlcov/index.html # macOS
# or
xdg-open htmlcov/index.html # Linux
```
### Coverage Goals
Aim for:
- **Overall**: >80%
- **Critical paths**: >90% (retrieval, agent, pipeline)
- **Utilities**: >70%
### Coverage Configuration
In `pyproject.toml`:
```toml
[tool.coverage.run]
source = ["src/ai_agent"]
omit = ["tests/*", "*/migrations/*"]
[tool.coverage.report]
precision = 2
show_missing = true
skip_covered = false
``` -->
## Continuous Integration
### GitHub Actions
Tests run automatically on:
- Pull requests
- Pushes to main
### CI Configuration
```yaml
# .github/workflows/test.yml
name: Tests
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.10'
- run: pip install -e ".[dev]"
- run: pytest --cov=ai_agent
```
## Best Practices
### Do's
βœ… **Test edge cases**: Empty inputs, invalid data, etc.
βœ… **Test error handling**: Verify exceptions are caught
βœ… **Use descriptive names**: `test_retrieval_with_empty_query` not `test1`
βœ… **Keep tests isolated**: Each test should be independent
βœ… **Use fixtures**: Avoid repeating setup code
βœ… **Mock expensive operations**: VLM calls, network requests
### Don'ts
❌ **Don't test implementation details**: Test behavior, not internal state
❌ **Don't make tests depend on each other**: Each should run independently
❌ **Don't commit large test files**: Keep test data small
❌ **Don't skip error checking**: Test both success and failure paths
## Performance Testing
### Benchmarking
Use pytest-benchmark:
```python
def test_retrieval_performance(benchmark):
"""Benchmark retrieval speed."""
index = VectorIndex()
index.load("artifacts/rag_index")
result = benchmark(index.search, "segment lungs", k=10)
assert len(result) == 10
```
### Profiling
```bash
# Profile tests
pytest --profile
# Generate SVG profile
pytest --profile-svg
```
## Debugging Tests
### Running in Debug Mode
```python
# Add to test
import pdb; pdb.set_trace()
# Run pytest
pytest tests/test_file.py
```
### Verbose Output
```bash
# Show print statements
pytest -s
# Very verbose
pytest -vv
# Show local variables on failure
pytest -l
```
### Running Single Test
```bash
# Run one test function
pytest tests/test_file.py::test_function_name -v
```
## Next Steps
- Review [Project Structure](structure.md)
- Read [Contributing Guide](contributing.md)
- Explore [Architecture](../architecture/overview.md)