
katospiegel

Deploy develop: FastAPI+React frontend, multi-stage Docker (ai_agent serve)

07c2476 verified 6 days ago


	# Testing (still under development)

	The AI Imaging Agent uses pytest for testing. This guide covers running tests and writing new ones.

Note: Some tests for the agent are still being developed, so parts of this guide do not yet apply to the current test suite.

	## Running Tests

	### Basic Usage

	```bash
	# Run all tests
	pytest

	# Run specific test file
	pytest tests/test_retrieval_pipeline.py

	# Run specific test
	pytest tests/test_retrieval_pipeline.py::test_basic_retrieval

	# Run with verbose output
	pytest -v

	# Run with coverage
	pytest --cov=ai_agent --cov-report=html
	```

	### Test Categories

	Tests are marked by category:

	```bash
	# Run only unit tests
	pytest -m unit

	# Run only integration tests
	pytest -m integration

	# Skip slow tests
	pytest -m "not slow"
	```
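pytest emits a `PytestUnknownMarkWarning` for markers it does not know about, and `--strict-markers` turns that warning into an error. Below is a minimal sketch of registering the three markers above in `tests/conftest.py`. It assumes they are not already declared under `[tool.pytest.ini_options]` in `pyproject.toml`, and the marker descriptions are illustrative:

```python
# tests/conftest.py (assumed location)


def pytest_configure(config):
    """pytest hook: register the custom markers selected with `pytest -m ...`."""
    config.addinivalue_line("markers", "unit: fast, isolated tests")
    config.addinivalue_line("markers", "integration: tests that need artifacts, images or VLM calls")
    config.addinivalue_line("markers", "slow: long-running tests (deselect with -m 'not slow')")
```

Individual tests then opt in with decorators such as `@pytest.mark.unit` or `@pytest.mark.integration`. A `markers = [...]` list in the pytest section of `pyproject.toml` achieves the same thing declaratively.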

	## Test Organization

	### Directory Structure

```
tests/
├── data/
│   ├── test_data.json              # Test cases
│   └── 0002.DCM                    # Sample DICOM file
├── test_retrieval_pipeline.py      # Retrieval tests
├── test_deepwiki_repo_info.py      # Repo info tests
├── test_gpt4o_vision.py            # VLM tests (integration)
└── __pycache__/
```

	### Test File Naming

	- `test_*.py`: Test files
	- `*_test.py`: Alternative naming (less common)

	### Test Function Naming

```python
def test_basic_retrieval():
    """Test basic retrieval functionality."""
    pass


def test_edge_case_empty_query():
    """Test handling of empty query."""
    pass


def test_integration_full_pipeline():
    """Integration test for complete pipeline."""
    pass
```
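By default, pytest collects files that match `python_files = test_*.py *_test.py` and, inside them, functions whose names start with `test` (`python_functions = test`). The sketch below mimics those two default rules with the standard library to show which names get picked up. It is an illustration, not pytest's actual collection code, and it ignores customized settings and test classes:

```python
from fnmatch import fnmatch

# pytest's default discovery settings (overridable in pyproject.toml)
DEFAULT_PYTHON_FILES = ("test_*.py", "*_test.py")
DEFAULT_FUNCTION_PREFIX = "test"


def is_collected_file(filename: str) -> bool:
    """True if a file's basename matches one of the default python_files patterns."""
    return any(fnmatch(filename, pattern) for pattern in DEFAULT_PYTHON_FILES)


def is_collected_function(name: str) -> bool:
    """True if a module-level function name has the default python_functions prefix."""
    return name.startswith(DEFAULT_FUNCTION_PREFIX)
```

This is why a helper such as `load_test_cases` (see Test Data) is not collected as a test, while every `test_*` function is.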

	## Writing Tests

	### Unit Test Example

```python
import pytest
from ai_agent.retriever.vector_index import VectorIndex


def test_vector_index_search():
    """Test FAISS vector search."""
    # Arrange
    index = VectorIndex()
    index.load("artifacts/rag_index")

    query = "segment lungs CT"

    # Act
    results = index.search(query, k=5)

    # Assert
    assert len(results) == 5
    assert all(r['score'] > 0 for r in results)
    assert 'TotalSegmentator' in [r['name'] for r in results]
```
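The example above needs a prebuilt index in `artifacts/rag_index`. One way to keep a unit test independent of that artifact is to run the same Arrange/Act/Assert steps against a tiny in-memory stand-in. This is a minimal sketch: `InMemoryIndex` and its bag-of-words cosine scoring are hypothetical and not part of `ai_agent`. The class only mirrors the `search(query, k)` → list of `{'name', 'score'}` shape that the example above assumes:

```python
import math
from collections import Counter


class InMemoryIndex:
    """Hypothetical VectorIndex stand-in: bag-of-words cosine similarity, no FAISS or artifacts."""

    def __init__(self, tools: dict[str, str]):
        # tools maps a tool name to a short description
        self._vectors = {name: Counter(desc.lower().split()) for name, desc in tools.items()}

    @staticmethod
    def _cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def search(self, query: str, k: int = 5) -> list[dict]:
        q = Counter(query.lower().split())
        scored = [{"name": n, "score": self._cosine(q, v)} for n, v in self._vectors.items()]
        scored.sort(key=lambda r: r["score"], reverse=True)  # best match first
        return scored[:k]


def test_search_ranks_matching_tool_first():
    # Arrange
    index = InMemoryIndex({
        "TotalSegmentator": "segment organs lungs ct",
        "FreeSurfer": "segment brain mri",
    })
    # Act
    results = index.search("segment lungs CT", k=2)
    # Assert
    assert [r["name"] for r in results] == ["TotalSegmentator", "FreeSurfer"]
```

A fake like this only helps if the code under test accepts an index object rather than constructing one internally. The real `VectorIndex` stays covered by the test above.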

	### Integration Test Example

```python
import pytest
from ai_agent.api.pipeline import RAGImagingPipeline


@pytest.mark.integration
def test_full_pipeline_with_image():
    """Integration test with real image and VLM call."""
    # Arrange
    pipeline = RAGImagingPipeline(
        catalog_path="dataset/catalog.jsonl",
        index_dir="artifacts/rag_index",
    )

    # Act
    result = pipeline.recommend(
        query="segment lungs",
        # not in the tests/data listing above; add it or point to 0002.DCM
        files=["tests/data/chest_ct.dcm"],
    )

    # Assert
    assert result.status == "complete"
    assert len(result.recommendations) > 0
    assert result.recommendations[0].accuracy_score > 70
```
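Integration tests like this fail outright when the catalog, the index, or the API credentials are missing, for example on a fresh clone or in CI. Below is a sketch of a prerequisite check that such tests could use to skip instead of failing. The paths come from the example above. `OPENAI_API_KEY` is an assumed variable name for the VLM credentials, and `missing_prerequisites` is a hypothetical helper:

```python
import os
from pathlib import Path

# Assumed prerequisites; adjust to what your integration tests actually need.
REQUIRED_PATHS = ("dataset/catalog.jsonl", "artifacts/rag_index")
REQUIRED_ENV_VARS = ("OPENAI_API_KEY",)


def missing_prerequisites(paths=REQUIRED_PATHS, env_vars=REQUIRED_ENV_VARS) -> list[str]:
    """Return a readable list of missing files/directories and unset environment variables."""
    missing = [f"path {p}" for p in paths if not Path(p).exists()]
    missing += [f"env var {name}" for name in env_vars if not os.environ.get(name)]
    return missing
```

With pytest, the check can gate a test next to its marker, e.g. `@pytest.mark.skipif(bool(missing_prerequisites()), reason="integration prerequisites missing")`.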

	### Parametrized Tests

```python
import pytest
from ai_agent.retriever.vector_index import VectorIndex


@pytest.mark.parametrize("query,expected_tool", [
    ("segment brain MRI", "FreeSurfer"),
    ("segment lungs CT", "TotalSegmentator"),
    ("classify chest X-ray", "CheXNet"),
])
def test_retrieval_for_queries(query, expected_tool):
    """Test retrieval returns expected tools for various queries."""
    index = VectorIndex()
    index.load("artifacts/rag_index")

    results = index.search(query, k=10)
    tool_names = [r['name'] for r in results]

    assert expected_tool in tool_names
```

	### Fixtures

```python
import pytest
from ai_agent.api.pipeline import RAGImagingPipeline


@pytest.fixture
def pipeline():
    """Provide initialized pipeline for tests."""
    return RAGImagingPipeline(
        catalog_path="dataset/catalog.jsonl",
        index_dir="artifacts/rag_index",
    )


@pytest.fixture
def sample_dicom():
    """Provide path to sample DICOM file."""
    return "tests/data/0002.DCM"


def test_with_fixtures(pipeline, sample_dicom):
    """Test using fixtures."""
    result = pipeline.recommend(
        query="analyze DICOM",
        files=[sample_dicom],
    )
    assert result is not None
```

<!-- ## Mocking

### Mocking VLM Calls

To avoid API costs during testing:

```python
from unittest.mock import patch

import pytest


@pytest.fixture
def mock_vlm_response():
    """Mock VLM response."""
    return {
        "status": "complete",
        "recommendations": [
            {
                "rank": 1,
                "name": "TotalSegmentator",
                "accuracy_score": 95,
                "explanation": "Test explanation",
                "reason": "task_match",
            }
        ],
    }


def test_with_mocked_vlm(pipeline, mock_vlm_response):
    """Test pipeline with mocked VLM (uses the `pipeline` fixture)."""
    with patch('ai_agent.agent.agent.Agent.run') as mock_run:
        mock_run.return_value = mock_vlm_response

        result = pipeline.recommend(query="test", files=[])

    assert result["status"] == "complete"
```

### Mocking File Operations

```python
from unittest.mock import patch


def test_file_validation():
    """Test file validation without real files."""
    with patch('os.path.getsize') as mock_size:
        mock_size.return_value = 1024 * 1024  # 1 MB

        from ai_agent.utils.file_validator import validate_file
        is_valid = validate_file("fake.dcm")

    assert is_valid
``` -->

	## Test Data

	### Using Test Cases

	Load test cases from JSON:

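```python
import json
from pathlib import Path

import pytest
from ai_agent.retriever.vector_index import VectorIndex

# Resolve relative to this test module (assumed to live in tests/)
DATA_FILE = Path(__file__).parent / "data" / "test_data.json"


def load_test_cases():
    """Load test cases from the data file."""
    with open(DATA_FILE) as f:
        return json.load(f)


@pytest.mark.parametrize("test_case", load_test_cases())
def test_from_json(test_case):
    """Test retrieval using cases from the JSON file."""
    query = test_case["query"]
    expected = test_case["expected_tool"]

    index = VectorIndex()
    index.load("artifacts/rag_index")
    results = [r['name'] for r in index.search(query, k=10)]

    assert expected in results
```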
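Because `load_test_cases()` runs at collection time, a malformed entry in `test_data.json` shows up as a `KeyError` inside a test rather than as a clear data error. Below is a minimal sketch of a validating loader. It assumes the file is a JSON list of objects with string `query` and `expected_tool` fields, the shape implied by the example above. `validate_test_cases` is a hypothetical helper:

```python
import json
from pathlib import Path

# Keys assumed from the example above; extend if test_data.json carries more fields.
REQUIRED_KEYS = ("query", "expected_tool")


def validate_test_cases(path) -> list:
    """Load a JSON list of test cases and fail fast on malformed entries."""
    cases = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(cases, list):
        raise ValueError(f"{path}: expected a JSON list of test cases")
    for i, case in enumerate(cases):
        if not isinstance(case, dict):
            raise ValueError(f"{path}: case {i} is not a JSON object")
        for key in REQUIRED_KEYS:
            value = case.get(key)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"{path}: case {i} has a missing or empty {key!r}")
    return cases
```

`load_test_cases()` could then return `validate_test_cases("tests/data/test_data.json")`, so that bad data fails collection with a message naming the offending case.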

	### Sample Data Files

	Keep sample files small:

	- DICOM: Single slice, low resolution
	- NIfTI: Small volume (e.g., 64×64×64)
	- Images: PNG/JPG under 1 MB
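The size guidelines above can be enforced automatically. Below is a minimal sketch of a guard test. It assumes sample data lives in `tests/data/` next to the test module and applies the 1 MB image limit as a blanket cap on every file, which is an assumption because DICOM and NIfTI samples have no explicit byte limit here. `find_oversized_files` is a hypothetical helper:

```python
from pathlib import Path

# Assumed blanket cap, borrowed from the "under 1 MB" image guideline.
MAX_SAMPLE_BYTES = 1024 * 1024


def find_oversized_files(data_dir: Path, limit: int = MAX_SAMPLE_BYTES) -> list[Path]:
    """Return every file under data_dir (recursively) larger than limit bytes."""
    return sorted(p for p in data_dir.rglob("*") if p.is_file() and p.stat().st_size > limit)


def test_sample_files_are_small():
    """Fail if any committed sample file exceeds the size limit."""
    data_dir = Path(__file__).parent / "data"
    oversized = find_oversized_files(data_dir)
    assert not oversized, f"Sample files too large: {oversized}"
```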

	<!-- ## Coverage

	### Measuring Coverage

	```bash
	# Run with coverage
	pytest --cov=ai_agent

	# Generate HTML report
	pytest --cov=ai_agent --cov-report=html

	# Open report
	open htmlcov/index.html # macOS
	# or
	xdg-open htmlcov/index.html # Linux
	```

	### Coverage Goals

	Aim for:

	- Overall: >80%
	- Critical paths: >90% (retrieval, agent, pipeline)
	- Utilities: >70%

	### Coverage Configuration

	In `pyproject.toml`:

	```toml
	[tool.coverage.run]
	source = ["src/ai_agent"]
omit = ["tests/*", "*/migrations/*"]

	[tool.coverage.report]
	precision = 2
	show_missing = true
	skip_covered = false
	``` -->

	## Continuous Integration

	### GitHub Actions

	Tests run automatically on:

	- Pull requests
	- Pushes to main

	### CI Configuration

```yaml
# .github/workflows/test.yml
name: Tests

on:
  push:
    branches: [main]
  pull_request:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.10'
      - run: pip install -e ".[dev]"
      - run: pytest --cov=ai_agent
```

	## Best Practices

### Do's

- ✅ Test edge cases: empty inputs, invalid data, etc.
- ✅ Test error handling: verify that exceptions are caught and handled as intended
- ✅ Use descriptive names: `test_retrieval_with_empty_query`, not `test1`
- ✅ Keep tests isolated: each test should be independent
- ✅ Use fixtures: avoid repeating setup code
- ✅ Mock expensive operations: VLM calls, network requests

### Don'ts

- ❌ Don't test implementation details: test behavior, not internal state
- ❌ Don't make tests depend on each other: each should run independently
- ❌ Don't commit large test files: keep test data small
- ❌ Don't skip error checking: test both success and failure paths
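Several of these points (mock expensive operations, test error handling, cover both success and failure paths) fit into one small example. This sketch uses `unittest.mock.Mock` from the standard library. `recommend_with_vlm` and its `vlm_client.complete(...)` interface are hypothetical and are not the agent's real API. Passing the client in as an argument keeps the tests isolated and avoids patching module paths:

```python
from unittest.mock import Mock


def recommend_with_vlm(query: str, vlm_client) -> dict:
    """Hypothetical wrapper: ask a VLM client for tool recommendations, degrading on timeout."""
    try:
        response = vlm_client.complete(prompt=f"Recommend an imaging tool for: {query}")
    except TimeoutError:
        # Error path: report the failure instead of crashing the caller
        return {"status": "error", "recommendations": []}
    return {"status": "complete", "recommendations": response["recommendations"]}


def test_success_path_with_mocked_vlm():
    fake_vlm = Mock()  # stands in for the real, paid VLM client
    fake_vlm.complete.return_value = {"recommendations": [{"rank": 1, "name": "TotalSegmentator"}]}

    result = recommend_with_vlm("segment lungs CT", fake_vlm)

    assert result["status"] == "complete"
    assert result["recommendations"][0]["name"] == "TotalSegmentator"
    fake_vlm.complete.assert_called_once()


def test_failure_path_with_mocked_vlm():
    fake_vlm = Mock()
    fake_vlm.complete.side_effect = TimeoutError  # simulate the VLM call timing out

    result = recommend_with_vlm("segment lungs CT", fake_vlm)

    assert result == {"status": "error", "recommendations": []}
```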

	## Performance Testing

	### Benchmarking

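Use the pytest-benchmark plugin (`pip install pytest-benchmark`), which provides the `benchmark` fixture:

```python
from ai_agent.retriever.vector_index import VectorIndex


def test_retrieval_performance(benchmark):
    """Benchmark retrieval speed."""
    index = VectorIndex()
    index.load("artifacts/rag_index")

    # benchmark() calls index.search repeatedly and returns the result of one call
    result = benchmark(index.search, "segment lungs", k=10)

    assert len(result) == 10
```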

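### Profiling

The `--profile` options below come from the pytest-profiling plugin (`pip install pytest-profiling`). `--profile-svg` also requires Graphviz.
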
	```bash
	# Profile tests
	pytest --profile

	# Generate SVG profile
	pytest --profile-svg
	```
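For a quick look at a single call without installing a plugin, you can use the standard library's `cProfile` and `pstats` directly. This is a minimal sketch: `profile_call` is a hypothetical helper, and in this project `func` might be something like `index.search`:

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, text report of the top entries by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

Inside a test, print the report and run pytest with `-s` so the output is not captured.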

	## Debugging Tests

	### Running in Debug Mode

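```python
# Add to the test where execution should pause
breakpoint()  # Python 3.7+; by default equivalent to `import pdb; pdb.set_trace()`

# Then run pytest as usual (shell):
#   pytest tests/test_file.py
# or let pytest open the debugger on the first failure:
#   pytest tests/test_file.py --pdb
```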

	### Verbose Output

	```bash
	# Show print statements
	pytest -s

	# Very verbose
	pytest -vv

	# Show local variables on failure
	pytest -l
	```

	### Running Single Test

	```bash
	# Run one test function
	pytest tests/test_file.py::test_function_name -v
	```

	## Next Steps

	- Review [Project Structure](structure.md)
	- Read [Contributing Guide](contributing.md)
	- Explore [Architecture](../architecture/overview.md)