# IntegraChat Testing Guide
This guide explains how to test all the new features and improvements in IntegraChat.
## Prerequisites
1. **Install Dependencies**
```bash
pip install -r requirements.txt
```
2. **Environment Setup**
- Create a `.env` file or set environment variables
- Optional: Set up Ollama for LLM testing
- Optional: Set up Supabase for production analytics
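A minimal `.env` sketch for the optional services; the variable names below are assumptions, so check the project's settings code for the exact keys:

```
# Hypothetical variable names -- verify against the project's config module
OLLAMA_BASE_URL=http://localhost:11434
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_KEY=your-service-role-key
```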
## Test Structure
### 1. Unit Tests
Run unit tests for individual components:
```bash
# Run all unit tests
pytest backend/tests/
# Run specific test files
pytest backend/tests/test_analytics_store.py -v
pytest backend/tests/test_enhanced_admin_rules.py -v
pytest backend/tests/test_api_endpoints.py -v
# Run with coverage
pytest backend/tests/ --cov=backend/api --cov-report=html
```
### 2. Integration Tests
Test API endpoints with the FastAPI test client:
```bash
pytest backend/tests/test_api_endpoints.py -v
```
**Note**: Some integration tests may fail if the MCP server or the LLM is not running; that is expected.
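One way to make such failures explicit rather than noisy is to probe for the service before running the test. The helper below is a hypothetical sketch, not part of the project; the port in the comment assumes Ollama's default:

```python
import socket

def service_running(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical usage with pytest (Ollama's default port is 11434):
# @pytest.mark.skipif(not service_running("localhost", 11434),
#                     reason="Ollama not running")
print(service_running("localhost", 11434))
```

Tests guarded this way report as skipped instead of failed, which keeps the suite's signal clean on machines without the optional services.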
### 3. Manual Testing Scripts
Create test data and verify functionality manually:
#### A. Test Analytics Store
```bash
python -c "
from backend.api.storage.analytics_store import AnalyticsStore
import time
store = AnalyticsStore()
# Log tool usage
store.log_tool_usage('test_tenant', 'rag', latency_ms=150, tokens_used=500, success=True)
store.log_tool_usage('test_tenant', 'web', latency_ms=80, success=True)
# Log red-flag violation
store.log_redflag_violation(
'test_tenant',
'rule1',
'.*password.*',
'high',
'password123',
confidence=0.95
)
# Log RAG search
store.log_rag_search('test_tenant', 'test query', hits_count=5, avg_score=0.85, top_score=0.92)
# Log agent query
store.log_agent_query('test_tenant', 'test message', intent='rag', tools_used=['rag', 'llm'], total_tokens=1000)
# Get stats
print('Tool Usage:', store.get_tool_usage_stats('test_tenant'))
print('Violations:', store.get_redflag_violations('test_tenant'))
print('Activity:', store.get_activity_summary('test_tenant'))
print('RAG Quality:', store.get_rag_quality_metrics('test_tenant'))
"
```
#### B. Test Admin Rules with Regex
```bash
python -c "
from backend.api.storage.rules_store import RulesStore
import re
store = RulesStore()
# Add rule with regex pattern
store.add_rule(
'test_tenant',
'Block password queries',
pattern='.*password.*|.*pwd.*',
severity='high',
description='Blocks password-related queries'
)
# Get detailed rules
rules = store.get_rules_detailed('test_tenant')
print('Rules:', rules)
# Test regex matching
pattern = rules[0]['pattern']
regex = re.compile(pattern, re.IGNORECASE)
test_text = 'What is my password?'
match = regex.search(test_text)
print(f'Match for \"{test_text}\": {match is not None}')
"
```
## API Endpoint Testing
### Using curl
#### 1. Test Analytics Endpoints
```bash
# Overview
curl -X GET "http://localhost:8000/analytics/overview?days=30" \
-H "x-tenant-id: test_tenant"
# Tool Usage
curl -X GET "http://localhost:8000/analytics/tool-usage?days=30" \
-H "x-tenant-id: test_tenant"
# RAG Quality
curl -X GET "http://localhost:8000/analytics/rag-quality?days=30" \
-H "x-tenant-id: test_tenant"
# Red Flags
curl -X GET "http://localhost:8000/analytics/redflags?limit=50&days=30" \
-H "x-tenant-id: test_tenant"
```
#### 2. Test Admin Endpoints
```bash
# Add rule with regex and severity
curl -X POST "http://localhost:8000/admin/rules" \
-H "x-tenant-id: test_tenant" \
-H "Content-Type: application/json" \
-d '{
"rule": "Block password queries",
"pattern": ".*password.*",
"severity": "high",
"description": "Blocks password-related queries"
}'
# Get detailed rules
curl -X GET "http://localhost:8000/admin/rules?detailed=true" \
-H "x-tenant-id: test_tenant"
# Get violations
curl -X GET "http://localhost:8000/admin/violations?limit=50&days=30" \
-H "x-tenant-id: test_tenant"
# Get tool logs
curl -X GET "http://localhost:8000/admin/tools/logs?tool_name=rag&days=7" \
-H "x-tenant-id: test_tenant"
```
#### 3. Test Agent Endpoints
```bash
# Agent chat (normal)
curl -X POST "http://localhost:8000/agent/message" \
-H "Content-Type: application/json" \
-d '{
"tenant_id": "test_tenant",
"message": "What is the company policy?",
"temperature": 0.0
}'
# Agent debug
curl -X POST "http://localhost:8000/agent/debug" \
-H "Content-Type: application/json" \
-d '{
"tenant_id": "test_tenant",
"message": "What is the company policy?",
"temperature": 0.0
}'
# Agent plan
curl -X POST "http://localhost:8000/agent/plan" \
-H "Content-Type: application/json" \
-d '{
"tenant_id": "test_tenant",
"message": "What is the company policy?",
"temperature": 0.0
}'
```
### Using Python requests
Create a test script `test_api_manual.py`:
```python
import requests
import json
BASE_URL = "http://localhost:8000"
TENANT_ID = "test_tenant"
headers = {"x-tenant-id": TENANT_ID}
# Test analytics
print("Testing Analytics Endpoints...")
response = requests.get(f"{BASE_URL}/analytics/overview?days=30", headers=headers)
print(f"Overview: {response.status_code} - {json.dumps(response.json(), indent=2)}")
response = requests.get(f"{BASE_URL}/analytics/tool-usage?days=30", headers=headers)
print(f"Tool Usage: {response.status_code} - {json.dumps(response.json(), indent=2)}")
# Test admin rules
print("\nTesting Admin Rules...")
response = requests.post(
f"{BASE_URL}/admin/rules",
headers=headers,
json={
"rule": "Block password queries",
"pattern": ".*password.*",
"severity": "high"
}
)
print(f"Add Rule: {response.status_code} - {json.dumps(response.json(), indent=2)}")
response = requests.get(
f"{BASE_URL}/admin/rules?detailed=true",
headers=headers
)
print(f"Get Rules: {response.status_code} - {json.dumps(response.json(), indent=2)}")
# Test agent endpoints
print("\nTesting Agent Endpoints...")
response = requests.post(
f"{BASE_URL}/agent/plan",
json={
"tenant_id": TENANT_ID,
"message": "What is the company policy?",
"temperature": 0.0
}
)
print(f"Agent Plan: {response.status_code} - {json.dumps(response.json(), indent=2)}")
```
Run it:
```bash
python test_api_manual.py
```
## End-to-End Testing Workflow
### Step 1: Start Backend Services
```bash
# Terminal 1: Start FastAPI backend
cd backend/api
uvicorn main:app --port 8000 --reload
# Terminal 2: Start unified MCP server (rag/web/admin tools)
python backend/mcp_server/server.py
# Optional: Start Ollama for LLM
ollama serve
```
### Step 2: Generate Test Data
Run the analytics and rules tests to populate the database:
```bash
pytest backend/tests/test_analytics_store.py -v
pytest backend/tests/test_enhanced_admin_rules.py -v
```
### Step 3: Test Agent Flow
1. **Add some admin rules:**
```bash
curl -X POST "http://localhost:8000/admin/rules" \
-H "x-tenant-id: test_tenant" \
-H "Content-Type: application/json" \
-d '{"rule": "Block password queries", "pattern": ".*password.*", "severity": "high"}'
```
2. **Send a query that triggers red-flag:**
```bash
curl -X POST "http://localhost:8000/agent/message" \
-H "Content-Type: application/json" \
-d '{"tenant_id": "test_tenant", "message": "What is my password?"}'
```
3. **Check violations were logged:**
```bash
curl -X GET "http://localhost:8000/admin/violations" \
-H "x-tenant-id: test_tenant"
```
4. **Send normal queries and check analytics:**
```bash
curl -X POST "http://localhost:8000/agent/message" \
-H "Content-Type: application/json" \
-d '{"tenant_id": "test_tenant", "message": "What is the company policy?"}'
curl -X GET "http://localhost:8000/analytics/overview" \
-H "x-tenant-id: test_tenant"
```
5. **Use debug endpoint to see reasoning:**
```bash
curl -X POST "http://localhost:8000/agent/debug" \
-H "Content-Type: application/json" \
-d '{"tenant_id": "test_tenant", "message": "What is the company policy?"}'
```
### Step 4: Verify Database
Check that data is being stored:
```bash
# SQLite databases are in data/ directory
sqlite3 data/analytics.db "SELECT * FROM tool_usage_events LIMIT 10;"
sqlite3 data/analytics.db "SELECT * FROM redflag_violations LIMIT 10;"
sqlite3 data/admin_rules.db "SELECT * FROM admin_rules;"
```
## Testing Checklist
### Analytics Store
- [ ] Tool usage logging works
- [ ] Red-flag violations are logged
- [ ] RAG search events are logged with quality metrics
- [ ] Agent query events are logged
- [ ] Stats can be filtered by time
- [ ] Multiple tenants are isolated
### Admin Rules
- [ ] Rules can be added with regex patterns
- [ ] Severity levels work (low/medium/high/critical)
- [ ] Rules without pattern use rule text
- [ ] Disabled rules are not returned
- [ ] Multiple tenants are isolated
- [ ] Regex patterns actually match correctly
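The regex-matching item above can be sanity-checked standalone, without any project imports. The patterns and severities below mirror the examples used earlier in this guide; the `match_rules` helper is a sketch, not the project's real matcher:

```python
import re

# Patterns and severities mirror the examples used earlier in this guide.
rules = [
    {"pattern": r".*password.*|.*pwd.*", "severity": "high"},
    {"pattern": r".*ssn.*", "severity": "critical"},
]

def match_rules(text, rules):
    """Return the rules whose regex matches the text (case-insensitive)."""
    return [r for r in rules if re.search(r["pattern"], text, re.IGNORECASE)]

hits = match_rules("What is my password?", rules)
print([r["severity"] for r in hits])  # → ['high']
```

If a pattern that should match does not, check for unescaped regex metacharacters and confirm the store compiles patterns with `re.IGNORECASE`, as the earlier manual test does.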
### API Endpoints
- [ ] `/analytics/overview` returns correct data
- [ ] `/analytics/tool-usage` returns stats
- [ ] `/analytics/rag-quality` returns metrics
- [ ] `/admin/rules` accepts regex/severity
- [ ] `/admin/violations` returns violations
- [ ] `/admin/tools/logs` returns tool usage
- [ ] `/agent/debug` returns reasoning trace
- [ ] `/agent/plan` returns tool selection plan
- [ ] Missing tenant_id returns 400
### Integration
- [ ] Agent orchestrator logs to analytics
- [ ] Red-flag detector logs violations
- [ ] Tool calls are tracked
- [ ] Multi-step workflows are logged
- [ ] Errors are logged correctly
## Common Issues
### Database Not Found
- Ensure the `data/` directory exists and is writable
- The analytics store creates it automatically on first use
### Tests Fail Due to Missing Services
- Some tests require the MCP server or the LLM to be running
- Mock these services, or skip the affected tests when they are unavailable
- Unit tests should pass without any external services
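Mocking can be sketched with the standard library; the `answer` helper and the `complete` method below are hypothetical stand-ins, not the project's real LLM interface:

```python
from unittest.mock import MagicMock

# Hypothetical stand-in for code that calls an LLM client; the real
# module path and method names in this project will differ.
def answer(llm_client, question: str) -> str:
    return llm_client.complete(question)

mock_llm = MagicMock()
mock_llm.complete.return_value = "mocked answer"

# The code under test runs without any LLM service up.
assert answer(mock_llm, "What is the policy?") == "mocked answer"
mock_llm.complete.assert_called_once_with("What is the policy?")
```

In real tests, `unittest.mock.patch` can swap the client in at its import site so the orchestrator code is exercised unchanged.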
### Import Errors
- Ensure you are running from the project root
- Check that `backend/` is on the Python path (e.g. `PYTHONPATH=. pytest backend/tests/`)
- Install all dependencies: `pip install -r requirements.txt`
## Performance Testing
For large-scale testing:
```python
# Load test analytics store
from backend.api.storage.analytics_store import AnalyticsStore
import time
store = AnalyticsStore()
tenant_id = "load_test_tenant"
start = time.time()
for i in range(1000):
store.log_tool_usage(tenant_id, "rag", latency_ms=100 + i % 50)
elapsed = time.time() - start
print(f"Logged 1000 events in {elapsed:.2f}s ({1000/elapsed:.0f} events/sec)")
# Query performance
start = time.time()
stats = store.get_tool_usage_stats(tenant_id)
elapsed = time.time() - start
print(f"Query took {elapsed*1000:.2f}ms")
```
## Next Steps
1. **Add more test cases** for edge cases
2. **Set up CI/CD** to run tests automatically
3. **Add performance benchmarks** for analytics queries
4. **Create integration test suite** that spins up all services
5. **Add E2E tests** using Playwright or Selenium for frontend
For questions or issues, check the test files in `backend/tests/` or refer to the main README.md.