| # Testing (still under development) | |
| The AI Imaging Agent uses pytest for testing. This guide covers running tests and writing new ones. | |
| **Note:** We are still developing some tests for the agent, hence this part is not relevant for now. | |
| ## Running Tests | |
| ### Basic Usage | |
| ```bash | |
| # Run all tests | |
| pytest | |
| # Run specific test file | |
| pytest tests/test_retrieval_pipeline.py | |
| # Run specific test | |
| pytest tests/test_retrieval_pipeline.py::test_basic_retrieval | |
| # Run with verbose output | |
| pytest -v | |
| # Run with coverage | |
| pytest --cov=ai_agent --cov-report=html | |
| ``` | |
| ### Test Categories | |
| Tests are marked by category: | |
| ```bash | |
| # Run only unit tests | |
| pytest -m unit | |
| # Run only integration tests | |
| pytest -m integration | |
| # Skip slow tests | |
| pytest -m "not slow" | |
| ``` | |
| ## Test Organization | |
| ### Directory Structure | |
| ``` | |
| tests/ | |
| βββ data/ | |
| β βββ test_data.json # Test cases | |
| β βββ 0002.DCM # Sample DICOM file | |
| βββ test_retrieval_pipeline.py # Retrieval tests | |
| βββ test_deepwiki_repo_info.py # Repo info tests | |
| βββ test_gpt4o_vision.py # VLM tests (integration) | |
| βββ __pycache__/ | |
| ``` | |
| ### Test File Naming | |
| - `test_*.py`: Test files | |
| - `*_test.py`: Alternative naming (less common) | |
| ### Test Function Naming | |
| ```python | |
| def test_basic_retrieval(): | |
| """Test basic retrieval functionality.""" | |
| pass | |
| def test_edge_case_empty_query(): | |
| """Test handling of empty query.""" | |
| pass | |
| def test_integration_full_pipeline(): | |
| """Integration test for complete pipeline.""" | |
| pass | |
| ``` | |
| ## Writing Tests | |
| ### Unit Test Example | |
| ```python | |
| import pytest | |
| from ai_agent.retriever.vector_index import VectorIndex | |
| def test_vector_index_search(): | |
| """Test FAISS vector search.""" | |
| # Arrange | |
| index = VectorIndex() | |
| index.load("artifacts/rag_index") | |
| query = "segment lungs CT" | |
| # Act | |
| results = index.search(query, k=5) | |
| # Assert | |
| assert len(results) == 5 | |
| assert all(r['score'] > 0 for r in results) | |
| assert 'TotalSegmentator' in [r['name'] for r in results] | |
| ``` | |
| ### Integration Test Example | |
| ```python | |
| import pytest | |
| from ai_agent.api.pipeline import RAGImagingPipeline | |
| @pytest.mark.integration | |
| def test_full_pipeline_with_image(): | |
| """Integration test with real image and VLM call.""" | |
| # Arrange | |
| pipeline = RAGImagingPipeline( | |
| catalog_path="dataset/catalog.jsonl", | |
| index_dir="artifacts/rag_index" | |
| ) | |
| # Act | |
| result = pipeline.recommend( | |
| query="segment lungs", | |
| files=["tests/data/chest_ct.dcm"] | |
| ) | |
| # Assert | |
| assert result.status == "complete" | |
| assert len(result.recommendations) > 0 | |
| assert result.recommendations[0].accuracy_score > 70 | |
| ``` | |
| ### Parametrized Tests | |
| ```python | |
| @pytest.mark.parametrize("query,expected_tool", [ | |
| ("segment brain MRI", "FreeSurfer"), | |
| ("segment lungs CT", "TotalSegmentator"), | |
| ("classify chest X-ray", "CheXNet"), | |
| ]) | |
| def test_retrieval_for_queries(query, expected_tool): | |
| """Test retrieval returns expected tools for various queries.""" | |
| index = VectorIndex() | |
| index.load("artifacts/rag_index") | |
| results = index.search(query, k=10) | |
| tool_names = [r['name'] for r in results] | |
| assert expected_tool in tool_names | |
| ``` | |
| ### Fixtures | |
| ```python | |
| import pytest | |
| @pytest.fixture | |
| def pipeline(): | |
| """Provide initialized pipeline for tests.""" | |
| return RAGImagingPipeline( | |
| catalog_path="dataset/catalog.jsonl", | |
| index_dir="artifacts/rag_index" | |
| ) | |
| @pytest.fixture | |
| def sample_dicom(): | |
| """Provide path to sample DICOM file.""" | |
| return "tests/data/0002.DCM" | |
| def test_with_fixtures(pipeline, sample_dicom): | |
| """Test using fixtures.""" | |
| result = pipeline.recommend( | |
| query="analyze DICOM", | |
| files=[sample_dicom] | |
| ) | |
| assert result is not None | |
| ``` | |
| <!-- ## Mocking | |
| ### Mocking VLM Calls | |
| To avoid API costs during testing: | |
| ```python | |
| from unittest.mock import Mock, patch | |
| import pytest | |
| @pytest.fixture | |
| def mock_vlm_response(): | |
| """Mock VLM response.""" | |
| return { | |
| "status": "complete", | |
| "recommendations": [ | |
| { | |
| "rank": 1, | |
| "name": "TotalSegmentator", | |
| "accuracy_score": 95, | |
| "explanation": "Test explanation", | |
| "reason": "task_match" | |
| } | |
| ] | |
| } | |
| def test_with_mocked_vlm(mock_vlm_response): | |
| """Test pipeline with mocked VLM.""" | |
| with patch('ai_agent.agent.agent.Agent.run') as mock_run: | |
| mock_run.return_value = mock_vlm_response | |
| # Test code here | |
| result = pipeline.recommend(query="test", files=[]) | |
| assert result["status"] == "complete" | |
| ``` | |
| ### Mocking File Operations | |
| ```python | |
| def test_file_validation(): | |
| """Test file validation without real files.""" | |
| with patch('os.path.getsize') as mock_size: | |
| mock_size.return_value = 1024 * 1024 # 1 MB | |
| from ai_agent.utils.file_validator import validate_file | |
| is_valid = validate_file("fake.dcm") | |
| assert is_valid | |
| ``` --> | |
| ## Test Data | |
| ### Using Test Cases | |
| Load test cases from JSON: | |
| ```python | |
| import json | |
| def load_test_cases(): | |
| """Load test cases from data file.""" | |
| with open("tests/data/test_data.json") as f: | |
| return json.load(f) | |
| @pytest.mark.parametrize("test_case", load_test_cases()) | |
| def test_from_json(test_case): | |
| """Test using cases from JSON file.""" | |
| query = test_case["query"] | |
| expected = test_case["expected_tool"] | |
| # Test logic here | |
| assert expected in results | |
| ``` | |
| ### Sample Data Files | |
| Keep sample files small: | |
| - **DICOM**: Single slice, low resolution | |
| - **NIfTI**: Small volume (e.g., 64Γ64Γ64) | |
| - **Images**: PNG/JPG under 1 MB | |
| <!-- ## Coverage | |
| ### Measuring Coverage | |
| ```bash | |
| # Run with coverage | |
| pytest --cov=ai_agent | |
| # Generate HTML report | |
| pytest --cov=ai_agent --cov-report=html | |
| # Open report | |
| open htmlcov/index.html # macOS | |
| # or | |
| xdg-open htmlcov/index.html # Linux | |
| ``` | |
| ### Coverage Goals | |
| Aim for: | |
| - **Overall**: >80% | |
| - **Critical paths**: >90% (retrieval, agent, pipeline) | |
| - **Utilities**: >70% | |
| ### Coverage Configuration | |
| In `pyproject.toml`: | |
| ```toml | |
| [tool.coverage.run] | |
| source = ["src/ai_agent"] | |
| omit = ["tests/*", "*/migrations/*"] | |
| [tool.coverage.report] | |
| precision = 2 | |
| show_missing = true | |
| skip_covered = false | |
| ``` --> | |
| ## Continuous Integration | |
| ### GitHub Actions | |
| Tests run automatically on: | |
| - Pull requests | |
| - Pushes to main | |
| ### CI Configuration | |
| ```yaml | |
| # .github/workflows/test.yml | |
| name: Tests | |
| on: [push, pull_request] | |
| jobs: | |
| test: | |
| runs-on: ubuntu-latest | |
| steps: | |
| - uses: actions/checkout@v4 | |
| - uses: actions/setup-python@v5 | |
| with: | |
| python-version: '3.10' | |
| - run: pip install -e ".[dev]" | |
| - run: pytest --cov=ai_agent | |
| ``` | |
| ## Best Practices | |
| ### Do's | |
| β **Test edge cases**: Empty inputs, invalid data, etc. | |
| β **Test error handling**: Verify exceptions are caught | |
| β **Use descriptive names**: `test_retrieval_with_empty_query` not `test1` | |
| β **Keep tests isolated**: Each test should be independent | |
| β **Use fixtures**: Avoid repeating setup code | |
| β **Mock expensive operations**: VLM calls, network requests | |
| ### Don'ts | |
| β **Don't test implementation details**: Test behavior, not internal state | |
| β **Don't make tests depend on each other**: Each should run independently | |
| β **Don't commit large test files**: Keep test data small | |
| β **Don't skip error checking**: Test both success and failure paths | |
| ## Performance Testing | |
| ### Benchmarking | |
| Use pytest-benchmark: | |
| ```python | |
| def test_retrieval_performance(benchmark): | |
| """Benchmark retrieval speed.""" | |
| index = VectorIndex() | |
| index.load("artifacts/rag_index") | |
| result = benchmark(index.search, "segment lungs", k=10) | |
| assert len(result) == 10 | |
| ``` | |
| ### Profiling | |
| ```bash | |
| # Profile tests | |
| pytest --profile | |
| # Generate SVG profile | |
| pytest --profile-svg | |
| ``` | |
| ## Debugging Tests | |
| ### Running in Debug Mode | |
| ```python | |
| # Add to test | |
| import pdb; pdb.set_trace() | |
| # Run pytest | |
| pytest tests/test_file.py | |
| ``` | |
| ### Verbose Output | |
| ```bash | |
| # Show print statements | |
| pytest -s | |
| # Very verbose | |
| pytest -vv | |
| # Show local variables on failure | |
| pytest -l | |
| ``` | |
| ### Running Single Test | |
| ```bash | |
| # Run one test function | |
| pytest tests/test_file.py::test_function_name -v | |
| ``` | |
| ## Next Steps | |
| - Review [Project Structure](structure.md) | |
| - Read [Contributing Guide](contributing.md) | |
| - Explore [Architecture](../architecture/overview.md) | |