# Testing Guide

> **Last Updated**: 2025-12-06

This guide covers testing strategy, patterns, and best practices for DeepBoner.

## Quick Reference

```bash
# Run all tests
make test

# Run with coverage
make test-cov

# Run specific file
uv run pytest tests/unit/utils/test_config.py -v

# Run specific test
uv run pytest tests/unit/utils/test_config.py::TestSettings::test_default -v

# Run by marker
uv run pytest -m unit           # Unit tests only
uv run pytest -m integration    # Integration tests only
uv run pytest -m "not slow"     # Skip slow tests
```
## Test Organization

```
tests/
├── conftest.py        # Shared fixtures
├── unit/              # Unit tests (mocked, fast)
│   ├── orchestrators/
│   ├── agents/
│   ├── clients/
│   ├── tools/
│   ├── services/
│   ├── utils/
│   ├── prompts/
│   ├── agent_factory/
│   ├── config/
│   ├── graph/
│   └── mcp/
├── integration/       # Integration tests (real APIs)
└── e2e/               # End-to-end tests
```
### Directory Mapping

Tests mirror the `src/` structure:

- `src/tools/pubmed.py` → `tests/unit/tools/test_pubmed.py`
- `src/utils/config.py` → `tests/unit/utils/test_config.py`
## Test Markers

### Available Markers

| Marker | Purpose | Example |
|--------|---------|---------|
| `@pytest.mark.unit` | Unit tests (mocked) | Most tests |
| `@pytest.mark.integration` | Real API calls | API testing |
| `@pytest.mark.slow` | Long-running tests | Full pipeline |
| `@pytest.mark.e2e` | End-to-end tests | Complete flows |
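Custom markers must be registered with pytest, or each use emits a warning (and fails under `--strict-markers`). A minimal registration sketch, assuming pytest is configured through `pyproject.toml`:

```toml
[tool.pytest.ini_options]
markers = [
    "unit: unit tests with mocked dependencies",
    "integration: tests that call real external APIs",
    "slow: long-running tests",
    "e2e: end-to-end tests covering complete flows",
]
```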
### Using Markers

```python
import pytest


@pytest.mark.unit
def test_search_returns_results():
    """Unit test with mocked API."""
    pass


@pytest.mark.integration
def test_pubmed_real_api():
    """Integration test with real PubMed API."""
    pass
```

### Running by Marker

```bash
uv run pytest -m unit                # Only unit tests
uv run pytest -m "not integration"   # Skip integration tests
uv run pytest -m "unit or slow"      # Unit OR slow tests
```
## Test Fixtures

### Core Fixtures (conftest.py)

#### `mock_httpx_client`

Mocks httpx for HTTP testing:

```python
def test_pubmed_search(mock_httpx_client):
    mock_httpx_client.get("https://eutils.ncbi.nlm.nih.gov/...").respond(
        200,
        json={"esearchresult": {"idlist": ["12345"]}},
    )

    tool = PubMedTool()
    result = tool.search("test query")
    assert len(result.evidence) > 0
```
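The fixture definition is not reproduced here; a minimal sketch of how it could be built on respx (the exact wiring in `tests/conftest.py` may differ):

```python
# tests/conftest.py -- hypothetical definition built on respx
import pytest
import respx


@pytest.fixture
def mock_httpx_client():
    """Yield a respx router that intercepts all httpx requests."""
    with respx.mock(assert_all_called=False) as router:
        yield router
```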
#### `mock_llm_response`

Mocks LLM completions:

```python
def test_judge_evaluates(mock_llm_response):
    mock_llm_response("The evidence is sufficient.")

    judge = JudgeAgent()
    assessment = judge.assess(evidence)
    assert assessment.sufficient
```
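One way such a fixture could be defined, assuming the agents route completions through a single patchable helper (the `src.clients.llm.complete` target below is an assumption, not the project's confirmed API):

```python
# tests/conftest.py -- hypothetical definition
import pytest


@pytest.fixture
def mock_llm_response(mocker):
    """Return a callable that stubs LLM completions with a fixed reply."""
    def _set_response(text: str):
        return mocker.patch("src.clients.llm.complete", return_value=text)
    return _set_response
```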
#### `sample_evidence`

Provides test evidence data:

```python
def test_synthesis(sample_evidence):
    report = synthesizer.create_report(sample_evidence)
    assert report.title
```
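A definition sketch that reuses the `make_evidence` factory shown under Test Data below:

```python
# tests/conftest.py
import pytest

from tests.factories import make_evidence


@pytest.fixture
def sample_evidence():
    """Small, deterministic evidence set for synthesis tests."""
    return [
        make_evidence(content="Finding A", relevance=0.9),
        make_evidence(content="Finding B", relevance=0.6),
    ]
```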
### Creating Fixtures

```python
# tests/conftest.py
@pytest.fixture
def mock_search_handler(mocker):
    """Mock SearchHandler for unit tests."""
    handler = mocker.Mock(spec=SearchHandler)
    handler.search_all.return_value = SearchResult(
        query="test",
        evidence=[],
        sources_searched=["pubmed"],
        total_found=0,
    )
    return handler
```
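Tests receive the fixture by naming it as a parameter. A usage sketch (the `Orchestrator` constructor below is assumed for illustration):

```python
def test_orchestrator_survives_empty_results(mock_search_handler):
    orchestrator = Orchestrator(search_handler=mock_search_handler)
    orchestrator.run("test query")
    mock_search_handler.search_all.assert_called_once()
```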
## Mocking Patterns

### HTTP Mocking with respx

```python
import pytest
import respx
from httpx import Response


@pytest.mark.unit
def test_api_call():
    with respx.mock:
        respx.get("https://api.example.com/data").mock(
            return_value=Response(200, json={"result": "ok"})
        )
        result = make_api_call()
        assert result == "ok"
```
### General Mocking with pytest-mock

```python
def test_with_mock(mocker):
    # Mock a function
    mock_func = mocker.patch("src.tools.pubmed.fetch_results")
    mock_func.return_value = {"results": []}

    # Mock a class method
    mocker.patch.object(PubMedTool, "search", return_value=[])

    # Mock a property (replaces the class attribute outright)
    mocker.patch.object(Settings, "has_openai_key", True)
```
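When a test needs to assert on property access, `PropertyMock` is the alternative. A sketch, assuming `Settings` lives in `src.utils.config` and exposes `has_openai_key` as a property:

```python
from src.utils.config import Settings  # assumed import path


def test_property_with_property_mock(mocker):
    prop = mocker.patch.object(
        Settings,
        "has_openai_key",
        new_callable=mocker.PropertyMock,
        return_value=True,
    )
    assert Settings().has_openai_key
    prop.assert_called_once()
```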
### Mocking Async Functions

```python
import pytest
from unittest.mock import AsyncMock


@pytest.mark.asyncio
async def test_async_search(mocker):
    mock_search = AsyncMock(return_value=[])
    mocker.patch.object(SearchHandler, "search_all", mock_search)

    handler = SearchHandler()  # instantiate after patching the class
    result = await handler.search_all("query")
    assert result == []
```
## Writing Tests

### Test Structure (AAA Pattern)

```python
def test_search_handler_aggregates_results():
    """Verify search handler combines results from multiple sources."""
    # Arrange
    handler = SearchHandler()
    query = "testosterone therapy"

    # Act
    result = handler.search_all(query)

    # Assert
    assert len(result.evidence) > 0
    assert "pubmed" in result.sources_searched
```
### Test Naming

```python
# Good: describes behavior
def test_judge_returns_continue_when_evidence_insufficient():
    pass


def test_search_raises_rate_limit_error_on_429():
    pass


# Bad: vague
def test_judge():
    pass


def test_search_error():
    pass
```
### Testing Exceptions

```python
import pytest

from src.utils.exceptions import SearchError


def test_search_raises_on_api_failure():
    """Verify SearchError is raised when the API returns an error."""
    with pytest.raises(SearchError) as exc_info:
        search_with_failing_api()

    assert "API returned 500" in str(exc_info.value)
```
### Async Tests

```python
import pytest


@pytest.mark.asyncio
async def test_async_search():
    """Test async search operation."""
    result = await search_handler.search_all("query")
    assert result is not None
```
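`@pytest.mark.asyncio` comes from the pytest-asyncio plugin. If marking every async test gets repetitive, auto mode collects them without explicit markers; a configuration sketch, assuming pytest is configured in `pyproject.toml`:

```toml
[tool.pytest.ini_options]
asyncio_mode = "auto"
```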
## Test Data

### Using Factories

```python
# tests/factories.py
from src.models import Citation, Evidence  # adjust to the project's model module


def make_evidence(
    content: str = "Test content",
    source: str = "pubmed",
    relevance: float = 0.8,
) -> Evidence:
    return Evidence(
        content=content,
        citation=Citation(
            source=source,
            title="Test Paper",
            url="https://test.com",
            date="2024-01-01",
            authors=["Test Author"],
        ),
        relevance=relevance,
        metadata={},
    )
```
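Factories keep individual tests short and make overrides explicit; a usage sketch:

```python
from tests.factories import make_evidence


def test_higher_relevance_evidence_ranks_first():
    evidence = [make_evidence(relevance=0.2), make_evidence(relevance=0.9)]
    ranked = sorted(evidence, key=lambda e: e.relevance, reverse=True)
    assert ranked[0].relevance == 0.9
```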
### Parameterized Tests

```python
import pytest


@pytest.mark.parametrize("query,expected_count", [
    ("testosterone", 10),
    ("estrogen therapy", 5),
    ("very specific rare condition", 0),
])
def test_search_returns_expected_results(query, expected_count, mock_api):
    result = search(query)
    assert len(result.evidence) == expected_count
```
## Coverage

### Running with Coverage

```bash
# Terminal report
make test-cov

# HTML report
uv run pytest --cov=src --cov-report=html
open htmlcov/index.html
```
### Coverage Configuration

From `pyproject.toml`:

```toml
[tool.coverage.run]
source = ["src"]
omit = ["*/__init__.py"]

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "if TYPE_CHECKING:",
    "raise NotImplementedError",
]
```
### Coverage Targets

| Module | Target | Notes |
|--------|--------|-------|
| `utils/` | 90%+ | Core utilities |
| `tools/` | 80%+ | API wrappers |
| `orchestrators/` | 70%+ | Complex logic |
| `agents/` | 70%+ | LLM-dependent |
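pytest-cov can enforce a floor directly: `--cov-fail-under` fails the run when total coverage drops below the given percentage. For example:

```bash
uv run pytest --cov=src --cov-fail-under=80
```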
## CI Integration

Tests run in GitHub Actions:

```yaml
# .github/workflows/ci.yml
- name: Run Tests
  run: uv run pytest --cov=src --cov-report=xml

- name: Upload Coverage
  uses: codecov/codecov-action@v4
```
## Best Practices

### Do

- Write tests before implementation (TDD)
- Use descriptive test names
- Test edge cases and error conditions
- Keep tests fast (mock external dependencies)
- Use fixtures for shared setup
- Test one behavior per test (see the sketch after this list)
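A minimal sketch of one behavior per test: each test asserts a single outcome, so a failure points at exactly one broken behavior. The assertions mirror the stubbed `SearchResult` from the `mock_search_handler` fixture above.

```python
def test_search_reports_total_found(mock_search_handler):
    result = mock_search_handler.search_all("testosterone")
    assert result.total_found == 0


def test_search_records_sources_searched(mock_search_handler):
    result = mock_search_handler.search_all("testosterone")
    assert result.sources_searched == ["pubmed"]
```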
### Don't

- Test implementation details
- Make tests dependent on order
- Use real API keys in tests
- Skip error handling tests
- Leave flaky tests unfixed
## Troubleshooting

### Tests pass locally but fail in CI

1. Check for hardcoded paths
2. Verify timezone handling
3. Look for async timing issues
4. Check environment variables
### Async test hangs

```python
# Add a timeout (requires the pytest-timeout plugin)
@pytest.mark.asyncio
@pytest.mark.timeout(10)
async def test_with_timeout():
    pass
```
### Mock not working

Make sure the patch target is the full dotted path under which the code imports the name:

```python
mocker.patch("src.tools.pubmed.PubMedTool")  # Correct: full import path
mocker.patch("tools.pubmed.PubMedTool")      # Wrong: missing package root
```
---

## Related Documentation

- [Code Style Guide](code-style.md)
- [Contributing Guide](../../CONTRIBUTING.md)
- [Component Inventory](../architecture/component-inventory.md)