IntegraChat Testing Guide

This guide explains how to test all the new features and improvements in IntegraChat.

Prerequisites

  1. Install Dependencies

    pip install -r requirements.txt
    
  2. Environment Setup

    • Create a .env file or set environment variables (a quick check is sketched after this list)
    • Optional: Set up Ollama for LLM testing
    • Optional: Set up Supabase for production analytics
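
To sanity-check the setup, the sketch below reports which environment variables are present. The names OLLAMA_HOST, SUPABASE_URL, and SUPABASE_KEY are assumptions for illustration; confirm the names the backend actually reads in its configuration.

# check_env.py: report which (assumed) environment variables are set
import os

# Hypothetical variable names for the optional services; adjust to match
# whatever the backend configuration actually reads.
for name in ("OLLAMA_HOST", "SUPABASE_URL", "SUPABASE_KEY"):
    print(f"{name}: {'set' if os.getenv(name) else 'NOT SET'}")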

Test Structure

1. Unit Tests

Run unit tests for individual components:

# Run all unit tests
pytest backend/tests/

# Run specific test files
pytest backend/tests/test_analytics_store.py -v
pytest backend/tests/test_enhanced_admin_rules.py -v
pytest backend/tests/test_api_endpoints.py -v

# Run with coverage
pytest backend/tests/ --cov=backend/api --cov-report=html

2. Integration Tests

Test API endpoints with the FastAPI test client:

pytest backend/tests/test_api_endpoints.py -v

Note: Some integration tests may fail if the MCP servers or the LLM are not running; this is expected.
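
One way to make those tests self-skipping rather than failing is to gate them on a reachability probe. A minimal sketch, assuming the backend listens on localhost:8000; the _service_up helper is illustrative and not part of the existing test suite:

# conftest.py sketch: skip integration tests when the backend is down
import socket

import pytest

def _service_up(host: str, port: int) -> bool:
    # A plain TCP connect; any failure means the service is treated as down.
    try:
        with socket.create_connection((host, port), timeout=1):
            return True
    except OSError:
        return False

requires_backend = pytest.mark.skipif(
    not _service_up("localhost", 8000),
    reason="FastAPI backend is not running on localhost:8000",
)

Decorate the affected tests with @requires_backend and pytest will report them as skipped instead of failed when the backend is down.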

3. Manual Testing Scripts

Create test data and verify functionality manually:

A. Test Analytics Store

python -c "
from backend.api.storage.analytics_store import AnalyticsStore

store = AnalyticsStore()

# Log tool usage
store.log_tool_usage('test_tenant', 'rag', latency_ms=150, tokens_used=500, success=True)
store.log_tool_usage('test_tenant', 'web', latency_ms=80, success=True)

# Log red-flag violation
store.log_redflag_violation(
    'test_tenant', 
    'rule1', 
    '.*password.*', 
    'high',
    'password123',
    confidence=0.95
)

# Log RAG search
store.log_rag_search('test_tenant', 'test query', hits_count=5, avg_score=0.85, top_score=0.92)

# Log agent query
store.log_agent_query('test_tenant', 'test message', intent='rag', tools_used=['rag', 'llm'], total_tokens=1000)

# Get stats
print('Tool Usage:', store.get_tool_usage_stats('test_tenant'))
print('Violations:', store.get_redflag_violations('test_tenant'))
print('Activity:', store.get_activity_summary('test_tenant'))
print('RAG Quality:', store.get_rag_quality_metrics('test_tenant'))
"

B. Test Admin Rules with Regex

python -c "
from backend.api.storage.rules_store import RulesStore
import re

store = RulesStore()

# Add rule with regex pattern
store.add_rule(
    'test_tenant',
    'Block password queries',
    pattern='.*password.*|.*pwd.*',
    severity='high',
    description='Blocks password-related queries'
)

# Get detailed rules
rules = store.get_rules_detailed('test_tenant')
print('Rules:', rules)

# Test regex matching
pattern = rules[0]['pattern']
regex = re.compile(pattern, re.IGNORECASE)
test_text = 'What is my password?'
match = regex.search(test_text)
print(f'Match for \"{test_text}\": {match is not None}')
"

API Endpoint Testing

Using curl

1. Test Analytics Endpoints

# Overview
curl -X GET "http://localhost:8000/analytics/overview?days=30" \
  -H "x-tenant-id: test_tenant"

# Tool Usage
curl -X GET "http://localhost:8000/analytics/tool-usage?days=30" \
  -H "x-tenant-id: test_tenant"

# RAG Quality
curl -X GET "http://localhost:8000/analytics/rag-quality?days=30" \
  -H "x-tenant-id: test_tenant"

# Red Flags
curl -X GET "http://localhost:8000/analytics/redflags?limit=50&days=30" \
  -H "x-tenant-id: test_tenant"

2. Test Admin Endpoints

# Add rule with regex and severity
curl -X POST "http://localhost:8000/admin/rules" \
  -H "x-tenant-id: test_tenant" \
  -H "Content-Type: application/json" \
  -d '{
    "rule": "Block password queries",
    "pattern": ".*password.*",
    "severity": "high",
    "description": "Blocks password-related queries"
  }'

# Get detailed rules
curl -X GET "http://localhost:8000/admin/rules?detailed=true" \
  -H "x-tenant-id: test_tenant"

# Get violations
curl -X GET "http://localhost:8000/admin/violations?limit=50&days=30" \
  -H "x-tenant-id: test_tenant"

# Get tool logs
curl -X GET "http://localhost:8000/admin/tools/logs?tool_name=rag&days=7" \
  -H "x-tenant-id: test_tenant"

3. Test Agent Endpoints

# Agent chat (normal)
curl -X POST "http://localhost:8000/agent/message" \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "test_tenant",
    "message": "What is the company policy?",
    "temperature": 0.0
  }'

# Agent debug
curl -X POST "http://localhost:8000/agent/debug" \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "test_tenant",
    "message": "What is the company policy?",
    "temperature": 0.0
  }'

# Agent plan
curl -X POST "http://localhost:8000/agent/plan" \
  -H "Content-Type: application/json" \
  -d '{
    "tenant_id": "test_tenant",
    "message": "What is the company policy?",
    "temperature": 0.0
  }'

Using Python requests

Create a test script test_api_manual.py:

import requests
import json

BASE_URL = "http://localhost:8000"
TENANT_ID = "test_tenant"

headers = {"x-tenant-id": TENANT_ID}

# Test analytics
print("Testing Analytics Endpoints...")
response = requests.get(f"{BASE_URL}/analytics/overview?days=30", headers=headers)
print(f"Overview: {response.status_code} - {json.dumps(response.json(), indent=2)}")

response = requests.get(f"{BASE_URL}/analytics/tool-usage?days=30", headers=headers)
print(f"Tool Usage: {response.status_code} - {json.dumps(response.json(), indent=2)}")

# Test admin rules
print("\nTesting Admin Rules...")
response = requests.post(
    f"{BASE_URL}/admin/rules",
    headers=headers,
    json={
        "rule": "Block password queries",
        "pattern": ".*password.*",
        "severity": "high"
    }
)
print(f"Add Rule: {response.status_code} - {json.dumps(response.json(), indent=2)}")

response = requests.get(
    f"{BASE_URL}/admin/rules?detailed=true",
    headers=headers
)
print(f"Get Rules: {response.status_code} - {json.dumps(response.json(), indent=2)}")

# Test agent endpoints
print("\nTesting Agent Endpoints...")
response = requests.post(
    f"{BASE_URL}/agent/plan",
    json={
        "tenant_id": TENANT_ID,
        "message": "What is the company policy?",
        "temperature": 0.0
    }
)
print(f"Agent Plan: {response.status_code} - {json.dumps(response.json(), indent=2)}")

Run it:

python test_api_manual.py

End-to-End Testing Workflow

Step 1: Start Backend Services

# Terminal 1: Start FastAPI backend
cd backend/api
uvicorn main:app --port 8000 --reload

# Terminal 2: Start unified MCP server (rag/web/admin tools)
python backend/mcp_server/server.py

# Optional: Start Ollama for LLM
ollama serve
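
Before generating test data, it can help to wait until the API is actually accepting requests. A small polling sketch, assuming FastAPI's built-in /docs page is served once the app is up:

# wait_for_api.py: poll the backend until it responds or a timeout elapses
import time

import requests

BASE_URL = "http://localhost:8000"

deadline = time.time() + 30  # give the server up to 30 seconds
while time.time() < deadline:
    try:
        # /docs is FastAPI's built-in Swagger page, available once the app is up
        if requests.get(f"{BASE_URL}/docs", timeout=2).status_code == 200:
            print("Backend is up")
            break
    except requests.ConnectionError:
        pass
    time.sleep(1)
else:
    raise SystemExit("Backend did not come up within 30 seconds")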

Step 2: Generate Test Data

Run the analytics and rules tests to populate the database:

pytest backend/tests/test_analytics_store.py -v
pytest backend/tests/test_enhanced_admin_rules.py -v

Step 3: Test Agent Flow

  1. Add some admin rules:

    curl -X POST "http://localhost:8000/admin/rules" \
      -H "x-tenant-id: test_tenant" \
      -H "Content-Type: application/json" \
      -d '{"rule": "Block password queries", "pattern": ".*password.*", "severity": "high"}'
    
  2. Send a query that triggers a red-flag rule:

    curl -X POST "http://localhost:8000/agent/message" \
      -H "Content-Type: application/json" \
      -d '{"tenant_id": "test_tenant", "message": "What is my password?"}'
    
  3. Check that violations were logged:

    curl -X GET "http://localhost:8000/admin/violations" \
      -H "x-tenant-id: test_tenant"
    
  4. Send normal queries and check analytics:

    curl -X POST "http://localhost:8000/agent/message" \
      -H "Content-Type: application/json" \
      -d '{"tenant_id": "test_tenant", "message": "What is the company policy?"}'
    
    curl -X GET "http://localhost:8000/analytics/overview" \
      -H "x-tenant-id: test_tenant"
    
  5. Use debug endpoint to see reasoning:

    curl -X POST "http://localhost:8000/agent/debug" \
      -H "Content-Type: application/json" \
      -d '{"tenant_id": "test_tenant", "message": "What is the company policy?"}'
    

Step 4: Verify Database

Check that data is being stored:

# SQLite databases live in the data/ directory
sqlite3 data/analytics.db "SELECT * FROM tool_usage_events LIMIT 10;"
sqlite3 data/analytics.db "SELECT * FROM redflag_violations LIMIT 10;"
sqlite3 data/admin_rules.db "SELECT * FROM admin_rules;"
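
The same check from Python, using only the standard library; the table names come from the sqlite3 commands above:

# verify_db.py: count rows in the analytics and rules tables
import sqlite3

for db_path, table in [
    ("data/analytics.db", "tool_usage_events"),
    ("data/analytics.db", "redflag_violations"),
    ("data/admin_rules.db", "admin_rules"),
]:
    with sqlite3.connect(db_path) as conn:
        count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
        print(f"{db_path}:{table} -> {count} rows")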

Testing Checklist

Analytics Store

  • Tool usage logging works
  • Red-flag violations are logged
  • RAG search events are logged with quality metrics
  • Agent query events are logged
  • Stats can be filtered by time
  • Multiple tenants are isolated (see the sketch after this list)
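
For the tenant-isolation item, a minimal sketch built from the AnalyticsStore calls shown earlier; the exact shape of the returned stats depends on the store, so treat the prints as a starting point for real assertions:

# Tenant isolation sketch: events logged for one tenant must not
# surface in another tenant's stats.
from backend.api.storage.analytics_store import AnalyticsStore

store = AnalyticsStore()
store.log_tool_usage("tenant_a", "rag", latency_ms=100, success=True)
store.log_tool_usage("tenant_b", "web", latency_ms=50, success=True)

print("tenant_a stats:", store.get_tool_usage_stats("tenant_a"))
print("tenant_b stats:", store.get_tool_usage_stats("tenant_b"))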

Admin Rules

  • Rules can be added with regex patterns
  • Severity levels work (low/medium/high/critical)
  • Rules without a pattern fall back to the rule text
  • Disabled rules are not returned
  • Multiple tenants are isolated
  • Regex patterns actually match correctly

API Endpoints

  • /analytics/overview returns correct data
  • /analytics/tool-usage returns stats
  • /analytics/rag-quality returns metrics
  • /admin/rules accepts regex/severity
  • /admin/violations returns violations
  • /admin/tools/logs returns tool usage
  • /agent/debug returns reasoning trace
  • /agent/plan returns tool selection plan
  • Missing tenant_id returns 400 (see the check after this list)
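
A quick check for the last item, run with the backend up. The checklist expects a 400; if the backend instead returns FastAPI's default 422 for a missing required header, adjust the assertion accordingly:

# Missing-tenant check: analytics endpoints should reject requests
# that omit the x-tenant-id header.
import requests

response = requests.get("http://localhost:8000/analytics/overview")  # no header
print(response.status_code)
assert response.status_code == 400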

Integration

  • Agent orchestrator logs to analytics
  • Red-flag detector logs violations
  • Tool calls are tracked
  • Multi-step workflows are logged
  • Errors are logged correctly

Common Issues

Database Not Found

  • Ensure the data/ directory exists
  • The analytics store will create it automatically

Tests Fail Due to Missing Services

  • Some tests require the MCP servers or the LLM to be running
  • Mock these services, or skip the tests when they are unavailable (see the sketch after this list)
  • Unit tests should run without any external services
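
For mocking, pytest's built-in monkeypatch fixture can stand in for a live LLM. A sketch only: the backend.api.orchestrator module and call_llm function named below are hypothetical, so point the patch at the real call site in your code:

# Sketch: replace a (hypothetical) LLM call with a canned response so the
# test runs without Ollama. The patch target is illustrative, not real.
def test_agent_without_llm(monkeypatch):
    from backend.api import orchestrator  # hypothetical module path

    monkeypatch.setattr(orchestrator, "call_llm", lambda prompt, **kw: "canned response")
    # ...exercise the agent here and assert on the canned response...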

Import Errors

  • Ensure you're running from the project root
  • Check that backend/ is on the Python path
  • Install all dependencies: pip install -r requirements.txt

Performance Testing

For large-scale testing:

# Load test analytics store
from backend.api.storage.analytics_store import AnalyticsStore
import time

store = AnalyticsStore()
tenant_id = "load_test_tenant"

start = time.time()
for i in range(1000):
    store.log_tool_usage(tenant_id, "rag", latency_ms=100 + i % 50)
    
elapsed = time.time() - start
print(f"Logged 1000 events in {elapsed:.2f}s ({1000/elapsed:.0f} events/sec)")

# Query performance
start = time.time()
stats = store.get_tool_usage_stats(tenant_id)
elapsed = time.time() - start
print(f"Query took {elapsed*1000:.2f}ms")

Next Steps

  1. Add more test cases for edge cases
  2. Set up CI/CD to run tests automatically
  3. Add performance benchmarks for analytics queries
  4. Create integration test suite that spins up all services
  5. Add E2E tests using Playwright or Selenium for frontend

For questions or issues, check the test files in backend/tests/ or refer to the main README.md.