Spaces:

SDSC
/

ai-agent

Paused

File size: 8,923 Bytes

07c2476

# Testing (still under development)

The AI Imaging Agent uses pytest for testing. This guide covers running tests and writing new ones.

**Note:** We are still developing some tests for the agent, hence this part is not relevant for now.

## Running Tests

### Basic Usage

```bash

# Run all tests

pytest



# Run specific test file

pytest tests/test_retrieval_pipeline.py



# Run specific test

pytest tests/test_retrieval_pipeline.py::test_basic_retrieval



# Run with verbose output

pytest -v



# Run with coverage

pytest --cov=ai_agent --cov-report=html

```

### Test Categories

Tests are marked by category:

```bash

# Run only unit tests

pytest -m unit



# Run only integration tests

pytest -m integration



# Skip slow tests

pytest -m "not slow"

```

## Test Organization

### Directory Structure

```

tests/

├── data/

│   ├── test_data.json         # Test cases

│   └── 0002.DCM               # Sample DICOM file

├── test_retrieval_pipeline.py # Retrieval tests

├── test_deepwiki_repo_info.py # Repo info tests

├── test_gpt4o_vision.py       # VLM tests (integration)

└── __pycache__/

```

### Test File Naming

- `test_*.py`: Test files
- `*_test.py`: Alternative naming (less common)

### Test Function Naming

```python

def test_basic_retrieval():

    """Test basic retrieval functionality."""

    pass



def test_edge_case_empty_query():

    """Test handling of empty query."""

    pass



def test_integration_full_pipeline():

    """Integration test for complete pipeline."""

    pass

```

## Writing Tests

### Unit Test Example

```python

import pytest

from ai_agent.retriever.vector_index import VectorIndex



def test_vector_index_search():

    """Test FAISS vector search."""

    # Arrange

    index = VectorIndex()

    index.load("artifacts/rag_index")

    

    query = "segment lungs CT"

    

    # Act

    results = index.search(query, k=5)

    

    # Assert

    assert len(results) == 5

    assert all(r['score'] > 0 for r in results)

    assert 'TotalSegmentator' in [r['name'] for r in results]

```

### Integration Test Example

```python

import pytest

from ai_agent.api.pipeline import RAGImagingPipeline



@pytest.mark.integration

def test_full_pipeline_with_image():

    """Integration test with real image and VLM call."""

    # Arrange

    pipeline = RAGImagingPipeline(

        catalog_path="dataset/catalog.jsonl",

        index_dir="artifacts/rag_index"

    )

    

    # Act

    result = pipeline.recommend(

        query="segment lungs",

        files=["tests/data/chest_ct.dcm"]

    )

    

    # Assert

    assert result.status == "complete"

    assert len(result.recommendations) > 0

    assert result.recommendations[0].accuracy_score > 70

```

### Parametrized Tests

```python

@pytest.mark.parametrize("query,expected_tool", [

    ("segment brain MRI", "FreeSurfer"),

    ("segment lungs CT", "TotalSegmentator"),

    ("classify chest X-ray", "CheXNet"),

])

def test_retrieval_for_queries(query, expected_tool):

    """Test retrieval returns expected tools for various queries."""

    index = VectorIndex()

    index.load("artifacts/rag_index")

    

    results = index.search(query, k=10)

    tool_names = [r['name'] for r in results]

    

    assert expected_tool in tool_names

```

### Fixtures

```python

import pytest



@pytest.fixture

def pipeline():

    """Provide initialized pipeline for tests."""

    return RAGImagingPipeline(

        catalog_path="dataset/catalog.jsonl",

        index_dir="artifacts/rag_index"

    )



@pytest.fixture

def sample_dicom():

    """Provide path to sample DICOM file."""

    return "tests/data/0002.DCM"



def test_with_fixtures(pipeline, sample_dicom):

    """Test using fixtures."""

    result = pipeline.recommend(

        query="analyze DICOM",

        files=[sample_dicom]

    )

    assert result is not None

```

<!-- ## Mocking

### Mocking VLM Calls

To avoid API costs during testing:

```python

from unittest.mock import Mock, patch

import pytest



@pytest.fixture

def mock_vlm_response():

    """Mock VLM response."""

    return {

        "status": "complete",

        "recommendations": [

            {

                "rank": 1,

                "name": "TotalSegmentator",

                "accuracy_score": 95,

                "explanation": "Test explanation",

                "reason": "task_match"

            }

        ]

    }



def test_with_mocked_vlm(mock_vlm_response):

    """Test pipeline with mocked VLM."""

    with patch('ai_agent.agent.agent.Agent.run') as mock_run:

        mock_run.return_value = mock_vlm_response

        

        # Test code here

        result = pipeline.recommend(query="test", files=[])

        

        assert result["status"] == "complete"

```

### Mocking File Operations

```python

def test_file_validation():

    """Test file validation without real files."""

    with patch('os.path.getsize') as mock_size:

        mock_size.return_value = 1024 * 1024  # 1 MB

        

        from ai_agent.utils.file_validator import validate_file

        is_valid = validate_file("fake.dcm")

        

        assert is_valid

``` -->



## Test Data



### Using Test Cases



Load test cases from JSON:



```python

import json



def load_test_cases():

    """Load test cases from data file."""

    with open("tests/data/test_data.json") as f:

        return json.load(f)



@pytest.mark.parametrize("test_case", load_test_cases())

def test_from_json(test_case):

    """Test using cases from JSON file."""

    query = test_case["query"]

    expected = test_case["expected_tool"]

    

    # Test logic here

    assert expected in results

```

### Sample Data Files

Keep sample files small:

- **DICOM**: Single slice, low resolution
- **NIfTI**: Small volume (e.g., 64×64×64)
- **Images**: PNG/JPG under 1 MB

<!-- ## Coverage

### Measuring Coverage

```bash

# Run with coverage

pytest --cov=ai_agent



# Generate HTML report

pytest --cov=ai_agent --cov-report=html



# Open report

open htmlcov/index.html  # macOS

# or

xdg-open htmlcov/index.html  # Linux

```

### Coverage Goals

Aim for:

- **Overall**: >80%
- **Critical paths**: >90% (retrieval, agent, pipeline)
- **Utilities**: >70%

### Coverage Configuration

In `pyproject.toml`:

```toml

[tool.coverage.run]

source = ["src/ai_agent"]

omit = ["tests/*", "*/migrations/*"]



[tool.coverage.report]

precision = 2

show_missing = true

skip_covered = false

``` -->



## Continuous Integration



### GitHub Actions



Tests run automatically on:



- Pull requests

- Pushes to main



### CI Configuration



```yaml

# .github/workflows/test.yml

name: Tests



on: [push, pull_request]



jobs:

  test:

    runs-on: ubuntu-latest

    steps:

      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5

        with:

          python-version: '3.10'

      - run: pip install -e ".[dev]"

      - run: pytest --cov=ai_agent

```

## Best Practices

### Do's

✅ **Test edge cases**: Empty inputs, invalid data, etc.  
✅ **Test error handling**: Verify exceptions are caught  
✅ **Use descriptive names**: `test_retrieval_with_empty_query` not `test1`  
✅ **Keep tests isolated**: Each test should be independent  
✅ **Use fixtures**: Avoid repeating setup code  
✅ **Mock expensive operations**: VLM calls, network requests

### Don'ts

❌ **Don't test implementation details**: Test behavior, not internal state  
❌ **Don't make tests depend on each other**: Each should run independently  
❌ **Don't commit large test files**: Keep test data small  
❌ **Don't skip error checking**: Test both success and failure paths  

## Performance Testing

### Benchmarking

Use pytest-benchmark:

```python

def test_retrieval_performance(benchmark):

    """Benchmark retrieval speed."""

    index = VectorIndex()

    index.load("artifacts/rag_index")

    

    result = benchmark(index.search, "segment lungs", k=10)

    

    assert len(result) == 10

```

### Profiling

```bash

# Profile tests

pytest --profile



# Generate SVG profile

pytest --profile-svg

```

## Debugging Tests

### Running in Debug Mode

```python

# Add to test

import pdb; pdb.set_trace()



# Run pytest

pytest tests/test_file.py

```

### Verbose Output

```bash

# Show print statements

pytest -s



# Very verbose

pytest -vv



# Show local variables on failure

pytest -l

```

### Running Single Test

```bash

# Run one test function

pytest tests/test_file.py::test_function_name -v

```

## Next Steps

- Review [Project Structure](structure.md)
- Read [Contributing Guide](contributing.md)
- Explore [Architecture](../architecture/overview.md)