# Development Guide

## Overview

MediGuard AI is a medical biomarker analysis system that uses agentic RAG (Retrieval-Augmented Generation) and multi-agent workflows to provide clinical insights.

## Project Structure

```
Agentic-RagBot/
├── src/
│   ├── agents/           # Agent implementations (biomarker_analyzer, disease_explainer, etc.)
│   ├── services/         # Core services (retrieval, embeddings, opensearch, etc.)
│   ├── routers/          # FastAPI route handlers
│   ├── models/           # Data models
│   ├── schemas/          # Pydantic schemas
│   ├── state.py          # State management
│   ├── workflow.py       # Workflow orchestration
│   ├── main.py           # FastAPI application factory
│   └── settings.py       # Configuration management
├── tests/                # Test suite
├── data/                 # Data files (vector stores, etc.)
└── docs/                 # Documentation
```

## Development Setup

1. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

2. **Environment variables**:
   - Copy `.env.example` to `.env` and configure
   - Key variables:
     - `API__HOST`: Server host (default: 127.0.0.1)
     - `API__PORT`: Server port (default: 8000)
     - `GRADIO_SERVER_NAME`: Gradio host (default: 127.0.0.1)
     - `GRADIO_PORT`: Gradio port (default: 7860)

3. **Running the application**:
   ```bash
   # FastAPI server
   python -m src.main
   
   # Gradio interface
   python -m src.gradio_app
   ```

## Code Quality

### Linting
```bash
# Check code quality
ruff check src/

# Auto-fix issues
ruff check src/ --fix
```

### Security
```bash
# Run security scan
bandit -r src/
```

### Testing
```bash
# Run all tests
pytest tests/

# Run with coverage
pytest tests/ --cov=src --cov-report=term-missing

# Run specific test file
pytest tests/test_agents.py -v
```

## Testing Guidelines

1. **Test structure**:
   - Unit tests for individual components
   - Integration tests for workflows
   - Mock external dependencies (LLMs, databases)

2. **Test coverage**:
   - Current coverage: 58%
   - Target: 70%+
   - Focus on critical paths and business logic

3. **Best practices**:
   - Use descriptive test names
   - Mock external services
   - Test both success and failure cases
   - Keep tests isolated and independent
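
As an illustration of these guidelines, here is a minimal sketch of two isolated tests that mock an LLM client and cover both a success and a failure case. The function `explain_biomarker` and the client method `generate` are hypothetical names used only for this example; they are not taken from the project's code.

```python
from unittest.mock import Mock


def explain_biomarker(llm, name: str, value: float) -> str:
    """Hypothetical function under test: asks an LLM client for an explanation."""
    prompt = f"Explain a {name} value of {value}."
    try:
        return llm.generate(prompt)
    except ConnectionError:
        # Fail soft so the caller can still build a response.
        return "Explanation unavailable."


def test_explain_biomarker_returns_llm_text():
    # The mock stands in for the real LLM, so no network call is made.
    llm = Mock()
    llm.generate.return_value = "Glucose is elevated."

    assert explain_biomarker(llm, "glucose", 180.0) == "Glucose is elevated."
    llm.generate.assert_called_once_with("Explain a glucose value of 180.0.")


def test_explain_biomarker_handles_llm_outage():
    # side_effect makes the mock raise, simulating an unreachable LLM.
    llm = Mock()
    llm.generate.side_effect = ConnectionError("LLM offline")

    assert explain_biomarker(llm, "glucose", 180.0) == "Explanation unavailable."
```

Each test builds its own mock, so the tests stay independent and can run in any order.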

## Architecture

### Multi-Agent Workflow

The system uses a multi-agent architecture with the following agents:

1. **BiomarkerAnalyzer**: Validates and analyzes biomarker values
2. **DiseaseExplainer**: Provides disease pathophysiology explanations
3. **BiomarkerLinker**: Connects biomarkers to disease predictions
4. **ClinicalGuidelines**: Provides evidence-based recommendations
5. **ConfidenceAssessor**: Evaluates prediction reliability
6. **ResponseSynthesizer**: Compiles final response

### State Management

- `GuildState`: Shared state between agents
- `PatientInput`: Input data structure
- `ExplanationSOP`: Standard operating procedures
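
The idea of agents sharing one state object can be sketched as follows. This is a simplified illustration that assumes agents run one after another and each returns the updated state; the actual orchestration in `workflow.py` and the real fields of `GuildState` may differ. The fields, the threshold of 100, and the two stand-in agents are placeholders, not clinical logic.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GuildState:
    """Illustrative shared state; the project's real GuildState may differ."""
    biomarkers: dict[str, float]
    findings: list[str] = field(default_factory=list)
    response: str = ""


# In this sketch an agent is any callable that takes and returns the state.
Agent = Callable[[GuildState], GuildState]


def biomarker_analyzer(state: GuildState) -> GuildState:
    # Placeholder rule: flag any value above an arbitrary threshold.
    for name, value in state.biomarkers.items():
        if value > 100:
            state.findings.append(f"{name} is high ({value})")
    return state


def response_synthesizer(state: GuildState) -> GuildState:
    # Compile the findings collected by earlier agents into one string.
    state.response = "; ".join(state.findings) or "No abnormal findings."
    return state


def run_workflow(state: GuildState, agents: list[Agent]) -> GuildState:
    # Pass the same state through each agent in order.
    for agent in agents:
        state = agent(state)
    return state


result = run_workflow(
    GuildState(biomarkers={"glucose": 180.0, "hdl": 55.0}),
    [biomarker_analyzer, response_synthesizer],
)
print(result.response)  # glucose is high (180.0)
```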

## Configuration

Settings are managed via Pydantic with environment variable support:

```python
from src.settings import get_settings

settings = get_settings()
print(settings.api.host)
```
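
The variable names `API__HOST` and `API__PORT` suggest that a double underscore separates the settings group from the field, which lines up with the `settings.api.host` access above. The following is a standard-library sketch of that mapping, written under that assumption; it is not the project's Pydantic implementation, and `ApiSettings` and `load_api_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass

# Defaults taken from the environment variable list in this guide.
DEFAULT_HOST = "127.0.0.1"
DEFAULT_PORT = 8000


@dataclass(frozen=True)
class ApiSettings:
    host: str = DEFAULT_HOST
    port: int = DEFAULT_PORT


def load_api_settings(env=None) -> ApiSettings:
    """Build ApiSettings from API__HOST / API__PORT, falling back to defaults."""
    env = os.environ if env is None else env
    return ApiSettings(
        host=env.get("API__HOST", DEFAULT_HOST),
        # Environment values are strings, so the port is converted explicitly.
        port=int(env.get("API__PORT", DEFAULT_PORT)),
    )


# Passing a plain dict instead of os.environ keeps the example deterministic.
print(load_api_settings({"API__PORT": "9000"}))
```

Accepting the environment as a parameter also makes the loader easy to unit test without touching the real process environment.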

## Deployment

### Production Considerations

1. **Security**:
   - Bind to specific interfaces (not 0.0.0.0)
   - Use HTTPS in production
   - Configure proper CORS origins

2. **Performance**:
   - Use multiple workers
   - Configure connection pooling
   - Monitor memory usage

3. **Monitoring**:
   - Enable health checks
   - Configure logging
   - Set up metrics collection

## Contributing

1. Fork the repository
2. Create a feature branch
3. Write tests for new functionality
4. Ensure all tests pass
5. Submit a pull request

## Troubleshooting

### Common Issues

1. **Tests failing with import errors**:
   - Check that `PYTHONPATH` includes the project root
   - Ensure all dependencies are installed

2. **Vector store errors**:
   - Check that the `data/vector_stores` directory exists
   - Verify that the embedding model is accessible

3. **LLM connection issues**:
   - Check that Ollama is running
   - Verify that the model has been downloaded
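
Two of these checks can be scripted. The sketch below uses only the standard library and assumes that Ollama listens on its usual default address, `http://localhost:11434`; adjust the URL if your setup differs. The helper names are hypothetical.

```python
import urllib.error
import urllib.request
from pathlib import Path


def vector_store_dir_exists(root: str = ".") -> bool:
    """Return True if <root>/data/vector_stores is an existing directory."""
    return (Path(root) / "data" / "vector_stores").is_dir()


def ollama_is_running(base_url: str = "http://localhost:11434",
                      timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at base_url with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    print("vector store directory:", vector_store_dir_exists())
    print("Ollama reachable:", ollama_is_running())
```

Run it from the project root so the relative `data/vector_stores` path resolves correctly.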

## Performance Optimization

1. **Caching**: Use Redis for frequently accessed data
2. **Async**: Use async/await for I/O operations
3. **Batching**: Process multiple items when possible
4. **Lazy loading**: Load resources only when needed
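
The async, batching, and lazy-loading points can be combined in one small sketch. It uses `asyncio.gather` to run a batch of lookups concurrently and `functools.lru_cache` to build a resource on first use only. The function names are hypothetical, `asyncio.sleep` stands in for real I/O, and the reference ranges are made-up illustrative numbers, not clinical values.

```python
import asyncio
from functools import lru_cache


@lru_cache(maxsize=1)
def get_reference_ranges() -> dict[str, tuple[float, float]]:
    # Lazy loading: built on the first call, then served from the cache.
    return {"glucose": (70.0, 100.0), "hdl": (40.0, 60.0)}


async def fetch_context(name: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a database or HTTP call
    low, high = get_reference_ranges()[name]
    return f"{name}: reference range {low}-{high}"


async def fetch_batch(names: list[str]) -> list[str]:
    # Batching: start all lookups together; results keep the input order.
    return await asyncio.gather(*(fetch_context(n) for n in names))


if __name__ == "__main__":
    for line in asyncio.run(fetch_batch(["glucose", "hdl"])):
        print(line)
```

Because the lookups overlap while waiting on I/O, the batch takes roughly as long as the slowest single lookup rather than the sum of all of them.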

## Security Best Practices

1. Never commit secrets or API keys
2. Use environment variables for configuration
3. Validate all inputs
4. Implement proper error handling
5. Run regular security scans with Bandit
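
For the input validation point, a minimal standard-library sketch is shown below. The function `validate_biomarkers` and its rules (non-empty names, finite non-negative numbers) are assumptions chosen for illustration; the project's Pydantic schemas may enforce different constraints.

```python
import math


def validate_biomarkers(raw: dict) -> dict[str, float]:
    """Return a cleaned copy of raw, or raise ValueError on bad input."""
    if not isinstance(raw, dict) or not raw:
        raise ValueError("biomarkers must be a non-empty mapping")

    clean: dict[str, float] = {}
    for name, value in raw.items():
        if not isinstance(name, str) or not name.strip():
            raise ValueError(f"invalid biomarker name: {name!r}")
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{name}: value must be numeric")
        if not math.isfinite(value) or value < 0:
            raise ValueError(f"{name}: value must be finite and non-negative")
        # Normalize names so "Glucose " and "glucose" map to the same key.
        clean[name.strip().lower()] = float(value)
    return clean


print(validate_biomarkers({"Glucose ": 180}))  # {'glucose': 180.0}
```

Rejecting bad input at the boundary with a clear error message also supports the error-handling point above, since later stages can assume well-formed data.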