IntegraChat Testing Guide

This guide explains how to test all the new features and improvements in IntegraChat.

Prerequisites

  1. Install Dependencies

    pip install -r requirements.txt
    
  2. Environment Setup

    • Create a .env file or set environment variables (a quick check is sketched after this list)
    • Optional: Set up Ollama for LLM testing
    • Optional: Set up Supabase for production analytics
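
To sanity-check the setup, the sketch below reports which environment variables are present. The names OLLAMA_HOST, SUPABASE_URL, and SUPABASE_KEY are assumptions for illustration; confirm the names the backend actually reads in its configuration.

# check_env.py: report which (assumed) environment variables are set
import os

# Hypothetical variable names for the optional services; adjust to match
# whatever the backend configuration actually reads.
for name in ("OLLAMA_HOST", "SUPABASE_URL", "SUPABASE_KEY"):
    print(f"{name}: {'set' if os.getenv(name) else 'NOT SET'}")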

Test Structure

1. Unit Tests

Run unit tests for individual components:

# Run all unit tests
pytest backend/tests/

# Run specific test files
pytest backend/tests/test_analytics_store.py -v
pytest backend/tests/test_enhanced_admin_rules.py -v
pytest backend/tests/test_api_endpoints.py -v

# Run with coverage
pytest backend/tests/ --cov=backend/api --cov-report=html

2. Integration Tests

Test API endpoints with the FastAPI test client:

pytest backend/tests/test_api_endpoints.py -v

Note: Some integration tests may fail if the MCP servers or the LLM are not running; this is expected.
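
One way to make those tests self-skipping rather than failing is to gate them on a reachability probe. A minimal sketch, assuming the backend listens on localhost:8000; the _service_up helper is illustrative and not part of the existing test suite:

# conftest.py sketch: skip integration tests when the backend is down
import socket

import pytest

def _service_up(host: str, port: int) -> bool:
    # A plain TCP connect; any failure means the service is treated as down.
    try:
        with socket.create_connection((host, port), timeout=1):
            return True
    except OSError:
        return False

requires_backend = pytest.mark.skipif(
    not _service_up("localhost", 8000),
    reason="FastAPI backend is not running on localhost:8000",
)

Decorate the affected tests with @requires_backend and pytest will report them as skipped instead of failed when the backend is down.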

3. Manual Testing Scripts

Create test data and verify functionality manually:

A. Test Analytics Store

python -c "
from backend.api.storage.analytics_store import AnalyticsStore

store = AnalyticsStore()

# Log tool usage
store.log_tool_usage('test_tenant', 'rag', latency_ms=150, tokens_used=500, success=True)
store.log_tool_usage('test_tenant', 'web', latency_ms=80, success=True)

# Log red-flag violation
store.log_redflag_violation(
    'test_tenant', 
    'rule1', 
    '.*password.*', 
    'high',
    'password123',
    confidence=0.95
)

# Log RAG search
store.log_rag_search('test_tenant', 'test query', hits_count=5, avg_score=0.85, top_score=0.92)

# Log agent query
store.log_agent_query('test_tenant', 'test message', intent='rag', tools_used=['rag', 'llm'], total_tokens=1000)

# Get stats
print('Tool Usage:', store.get_tool_usage_stats('test_tenant'))
print('Violations:', store.get_redflag_violations('test_tenant'))
print('Activity:', store.get_activity_summary('test_tenant'))
print('RAG Quality:', store.get_rag_quality_metrics('test_tenant'))
"

B. Test Admin Rules with Regex

python -c "
from backend.api.storage.rules_store import RulesStore
import re

store = RulesStore()

# Add rule with regex pattern
store.add_rule(
    'test_tenant',
    'Block password queries',
    pattern='.*password.*|.*pwd.*',
    severity='high',
    description='Blocks password-related queries'
)

# Get detailed rules
rules = store.get_rules_detailed('test_tenant')
print('Rules:', rules)

# Test regex matching
pattern = rules[0]['pattern']
regex = re.compile(pattern, re.IGNORECASE)
test_text = 'What is my password?'
match = regex.search(test_text)
print(f'Match for \"{test_text}\": {match is not None}')
"

API Endpoint Testing

Using curl

1. Test Analytics Endpoints

# Overview
curl -X GET "http://localhost:8000/analytics/overview?days=30" \
  -H "x-tenant-id: test_tenant"

# Tool Usage
curl -X GET "http://localhost:8000/analytics/tool-usage?days=30" \
  -H "x-tenant-id: test_tenant"

# RAG Quality
curl -X GET "http://localhost:8000/analytics/rag-quality?days=30" \
  -H "x-tenant-id: test_tenant"

# Red Flags
curl -X GET "http://localhost:8000/analytics/redflags?limit=50&days=30" \
  -H "x-tenant-id: test_tenant"

2. Test Admin Endpoints

# Add rule with regex and severity
curl -X POST "http://localhost:8000/admin/rules" \
  -H "x-tenant-id: test_tenant" \
  -H "Content-Type: application/json" \
  -d '{
    "rule": "Block password queries",
    "pattern": ".*password.*",
    "severity": "high",
    "description": "Blocks password-related queries"
  }'

# Get detailed rules
curl -X GET "http://localhost:8000/admin/rules?detailed=true" \
  -H "x-tenant-id: test_tenant"

# Get violations
curl -X GET "http://localhost:8000/admin/violations?limit=50&days=30" \
  -H "x-tenant-id: test_tenant"

# Get tool logs
curl -X GET "http://localhost:8000/admin/tools/logs?tool_name=rag&days=7" \
  -H "x-tenant-id: test_tenant"

3. Test Agent Endpoints

# Agent chat (normal)
curl -X POST "http://localhost:8000/agent/message" \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "test_tenant",
    "message": "What is the company policy?",
    "temperature": 0.0
  }'

# Agent debug
curl -X POST "http://localhost:8000/agent/debug" \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "test_tenant",
    "message": "What is the company policy?",
    "temperature": 0.0
  }'

# Agent plan
curl -X POST "http://localhost:8000/agent/plan" \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "test_tenant",
    "message": "What is the company policy?",
    "temperature": 0.0
  }'

Using Python requests

Create a test script test_api_manual.py:

import requests
import json

BASE_URL = "http://localhost:8000"
TENANT_ID = "test_tenant"

headers = {"x-tenant-id": TENANT_ID}

# Test analytics
print("Testing Analytics Endpoints...")
response = requests.get(f"{BASE_URL}/analytics/overview?days=30", headers=headers)
print(f"Overview: {response.status_code} - {json.dumps(response.json(), indent=2)}")

response = requests.get(f"{BASE_URL}/analytics/tool-usage?days=30", headers=headers)
print(f"Tool Usage: {response.status_code} - {json.dumps(response.json(), indent=2)}")

# Test admin rules
print("\nTesting Admin Rules...")
response = requests.post(
    f"{BASE_URL}/admin/rules",
    headers=headers,
    json={
        "rule": "Block password queries",
        "pattern": ".*password.*",
        "severity": "high"
    }
)
print(f"Add Rule: {response.status_code} - {json.dumps(response.json(), indent=2)}")

response = requests.get(
    f"{BASE_URL}/admin/rules?detailed=true",
    headers=headers
)
print(f"Get Rules: {response.status_code} - {json.dumps(response.json(), indent=2)}")

# Test agent endpoints
print("\nTesting Agent Endpoints...")
response = requests.post(
    f"{BASE_URL}/agent/plan",
    json={
        "tenant_id": TENANT_ID,
        "message": "What is the company policy?",
        "temperature": 0.0
    }
)
print(f"Agent Plan: {response.status_code} - {json.dumps(response.json(), indent=2)}")

Run it:

python test_api_manual.py

End-to-End Testing Workflow

Step 1: Start Backend Services

# Terminal 1: Start FastAPI backend
cd backend/api
uvicorn main:app --port 8000 --reload

# Terminal 2: Start unified MCP server (rag/web/admin tools)
python backend/mcp_server/server.py

# Optional: Start Ollama for LLM
ollama serve
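
Before generating test data, it can help to wait until the API is actually accepting requests. A small polling sketch, assuming FastAPI's built-in /docs page is served once the app is up:

# wait_for_api.py: poll the backend until it responds or a timeout elapses
import time

import requests

BASE_URL = "http://localhost:8000"

deadline = time.time() + 30  # give the server up to 30 seconds
while time.time() < deadline:
    try:
        # /docs is FastAPI's built-in Swagger page, available once the app is up
        if requests.get(f"{BASE_URL}/docs", timeout=2).status_code == 200:
            print("Backend is up")
            break
    except requests.ConnectionError:
        pass
    time.sleep(1)
else:
    raise SystemExit("Backend did not come up within 30 seconds")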

Step 2: Generate Test Data

Run the analytics and rules tests to populate the database:

pytest backend/tests/test_analytics_store.py -v
pytest backend/tests/test_enhanced_admin_rules.py -v

Step 3: Test Agent Flow

  1. Add some admin rules:

    curl -X POST "http://localhost:8000/admin/rules" \
      -H "x-tenant-id: test_tenant" \
      -H "Content-Type: application/json" \
      -d '{"rule": "Block password queries", "pattern": ".*password.*", "severity": "high"}'
    
  2. Send a query that triggers a red-flag rule:

    curl -X POST "http://localhost:8000/agent/message" \
      -H "Content-Type: application/json" \
      -d '{"tenant_id": "test_tenant", "message": "What is my password?"}'
    
  3. Check that violations were logged:

    curl -X GET "http://localhost:8000/admin/violations" \
      -H "x-tenant-id: test_tenant"
    
  4. Send normal queries and check analytics:

    curl -X POST "http://localhost:8000/agent/message" \
      -H "Content-Type: application/json" \
      -d '{"tenant_id": "test_tenant", "message": "What is the company policy?"}'
    
    curl -X GET "http://localhost:8000/analytics/overview" \
      -H "x-tenant-id: test_tenant"
    
  5. Use debug endpoint to see reasoning:

    curl -X POST "http://localhost:8000/agent/debug" \
      -H "Content-Type: application/json" \
      -d '{"tenant_id": "test_tenant", "message": "What is the company policy?"}'
    

Step 4: Verify Database

Check that data is being stored:

# SQLite databases live in the data/ directory
sqlite3 data/analytics.db "SELECT * FROM tool_usage_events LIMIT 10;"
sqlite3 data/analytics.db "SELECT * FROM redflag_violations LIMIT 10;"
sqlite3 data/admin_rules.db "SELECT * FROM admin_rules;"
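
The same check from Python, using only the standard library; the table names come from the sqlite3 commands above:

# verify_db.py: count rows in the analytics and rules tables
import sqlite3

for db_path, table in [
    ("data/analytics.db", "tool_usage_events"),
    ("data/analytics.db", "redflag_violations"),
    ("data/admin_rules.db", "admin_rules"),
]:
    with sqlite3.connect(db_path) as conn:
        count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        print(f"{db_path}:{table} -> {count} rows")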

Testing Checklist

Analytics Store

  • Tool usage logging works
  • Red-flag violations are logged
  • RAG search events are logged with quality metrics
  • Agent query events are logged
  • Stats can be filtered by time
  • Multiple tenants are isolated (see the sketch after this list)
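
For the tenant-isolation item, a minimal sketch built from the AnalyticsStore calls shown earlier; the exact shape of the returned stats depends on the store, so treat the prints as a starting point for real assertions:

# Tenant isolation sketch: events logged for one tenant must not
# surface in another tenant's stats.
from backend.api.storage.analytics_store import AnalyticsStore

store = AnalyticsStore()
store.log_tool_usage("tenant_a", "rag", latency_ms=100, success=True)
store.log_tool_usage("tenant_b", "web", latency_ms=50, success=True)

print("tenant_a stats:", store.get_tool_usage_stats("tenant_a"))
print("tenant_b stats:", store.get_tool_usage_stats("tenant_b"))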

Admin Rules

  • Rules can be added with regex patterns
  • Severity levels work (low/medium/high/critical)
  • Rules without a pattern fall back to the rule text
  • Disabled rules are not returned
  • Multiple tenants are isolated
  • Regex patterns actually match correctly

API Endpoints

  • /analytics/overview returns correct data
  • /analytics/tool-usage returns stats
  • /analytics/rag-quality returns metrics
  • /admin/rules accepts regex/severity
  • /admin/violations returns violations
  • /admin/tools/logs returns tool usage
  • /agent/debug returns reasoning trace
  • /agent/plan returns tool selection plan
  • Missing tenant_id returns 400 (see the check after this list)
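
A quick check for the last item, run with the backend up. The checklist expects a 400; if the backend instead returns FastAPI's default 422 for a missing required header, adjust the assertion accordingly:

# Missing-tenant check: analytics endpoints should reject requests
# that omit the x-tenant-id header.
import requests

response = requests.get("http://localhost:8000/analytics/overview")  # no header
print(response.status_code)
assert response.status_code == 400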

Integration

  • Agent orchestrator logs to analytics
  • Red-flag detector logs violations
  • Tool calls are tracked
  • Multi-step workflows are logged
  • Errors are logged correctly

Common Issues

Database Not Found

  • Ensure the data/ directory exists
  • The analytics store will create it automatically

Tests Fail Due to Missing Services

  • Some tests require the MCP servers or the LLM to be running
  • Mock these services, or skip the tests when they are unavailable (see the sketch after this list)
  • Unit tests should run without any external services
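
For mocking, pytest's built-in monkeypatch fixture can stand in for a live LLM. A sketch only: the backend.api.orchestrator module and call_llm function named below are hypothetical, so point the patch at the real call site in your code:

# Sketch: replace a (hypothetical) LLM call with a canned response so the
# test runs without Ollama. The patch target is illustrative, not real.
def test_agent_without_llm(monkeypatch):
    from backend.api import orchestrator  # hypothetical module path

    monkeypatch.setattr(orchestrator, "call_llm", lambda prompt, **kw: "canned response")
    # ...exercise the agent here and assert on the canned response...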

Import Errors

  • Ensure you're running from the project root
  • Check that backend/ is on the Python path
  • Install all dependencies: pip install -r requirements.txt

Performance Testing

For large-scale testing:

# Load test analytics store
from backend.api.storage.analytics_store import AnalyticsStore
import time

store = AnalyticsStore()
tenant_id = "load_test_tenant"

start = time.time()
for i in range(1000):
    store.log_tool_usage(tenant_id, "rag", latency_ms=100 + i % 50)
    
elapsed = time.time() - start
print(f"Logged 1000 events in {elapsed:.2f}s ({1000/elapsed:.0f} events/sec)")

# Query performance
start = time.time()
stats = store.get_tool_usage_stats(tenant_id)
elapsed = time.time() - start
print(f"Query took {elapsed*1000:.2f}ms")

Next Steps

  1. Add more test cases for edge cases
  2. Set up CI/CD to run tests automatically
  3. Add performance benchmarks for analytics queries
  4. Create integration test suite that spins up all services
  5. Add E2E tests using Playwright or Selenium for frontend

For questions or issues, check the test files in backend/tests/ or refer to the main README.md.