# A/B Test Predictor API - Updated Usage Guide

## Overview

The A/B Test Predictor API now accepts **both image inputs and categorical data** directly from API calls. All AI-powered auto-categorization features (Perplexity and Gemini API calls) have been removed for a more streamlined, efficient prediction service.

## What Changed

### ✅ Added
- Direct categorical data input via API
- Simplified prediction endpoint that accepts both images and metadata
- Cleaner JSON response format with confidence scores

### ❌ Removed
- Perplexity API integration (auto-categorization)
- Gemini API integration (pattern detection)
- All external AI API calls
- `requests` dependency
- Unnecessary imports (`base64`, `BytesIO`)

## API Endpoint

### `predict_with_categorical_data`

**Purpose**: Make A/B test predictions with provided images and categorical data.

**Inputs**:
1. `control_image` (numpy array/image): The control version image
2. `variant_image` (numpy array/image): The variant version image
3. `business_model` (string): One of:
   - E-Commerce
   - Lead Generation
   - Other*
   - SaaS

4. `customer_type` (string): One of:
   - B2B
   - B2C
   - Both
   - Other*

5. `conversion_type` (string): One of:
   - Direct Purchase
   - High-Intent Lead Gen
   - Info/Content Lead Gen
   - Location Search
   - Non-Profit/Community
   - Other Conversion

6. `industry` (string): One of:
   - Automotive & Transportation
   - B2B Services
   - B2B Software & Tech
   - Consumer Services
   - Consumer Software & Apps
   - Education
   - Finance, Insurance & Real Estate
   - Food, Hospitality & Travel
   - Health & Wellness
   - Industrial & Manufacturing
   - Media & Entertainment
   - Non-Profit & Government
   - Other
   - Retail & E-commerce

7. `page_type` (string): One of:
   - Awareness & Discovery
   - Consideration & Evaluation
   - Conversion
   - Internal & Navigation
   - Post-Conversion & Other
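
Because the categorical inputs are now supplied by the caller, it can help to check them before calling the endpoint. The following is a minimal sketch, not part of the API: `ALLOWED_VALUES` and `validate_categories` are hypothetical names, and the value sets are copied verbatim from the list above (including the trailing asterisk on `Other*`), so verify them against the labels your deployment actually expects.

```python
# Hypothetical client-side validation; not part of the Predictor API itself.
# Value sets are copied verbatim from the input list in this guide.
ALLOWED_VALUES = {
    "business_model": {"E-Commerce", "Lead Generation", "Other*", "SaaS"},
    "customer_type": {"B2B", "B2C", "Both", "Other*"},
    "conversion_type": {
        "Direct Purchase", "High-Intent Lead Gen", "Info/Content Lead Gen",
        "Location Search", "Non-Profit/Community", "Other Conversion",
    },
    "industry": {
        "Automotive & Transportation", "B2B Services", "B2B Software & Tech",
        "Consumer Services", "Consumer Software & Apps", "Education",
        "Finance, Insurance & Real Estate", "Food, Hospitality & Travel",
        "Health & Wellness", "Industrial & Manufacturing",
        "Media & Entertainment", "Non-Profit & Government", "Other",
        "Retail & E-commerce",
    },
    "page_type": {
        "Awareness & Discovery", "Consideration & Evaluation", "Conversion",
        "Internal & Navigation", "Post-Conversion & Other",
    },
}


def validate_categories(**categories: str) -> None:
    """Raise ValueError if a field is missing, unknown, or has an unlisted value."""
    missing = ALLOWED_VALUES.keys() - categories.keys()
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")
    for field, value in categories.items():
        if field not in ALLOWED_VALUES:
            raise ValueError(f"Unknown field: {field}")
        if value not in ALLOWED_VALUES[field]:
            raise ValueError(f"Invalid {field}: {value!r}")
```

The comparison is exact and case-sensitive, so a value such as `"Saas"` is rejected rather than silently mapped to `"SaaS"`.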

**Output**: JSON object with the following structure:

```json
{
  "predictionResults": {
    "probability": "0.682",
    "modelConfidence": "66.1",
    "trainingDataSamples": 14634,
    "totalPredictions": 1626,
    "correctPredictions": 1074,
    "totalWinPrediction": 667,
    "totalLosePrediction": 959
  },
  "providedCategories": {
    "businessModel": "SaaS",
    "customerType": "B2B",
    "conversionType": "High-Intent Lead Gen",
    "industry": "B2B Software & Tech",
    "pageType": "Awareness & Discovery"
  },
  "processingInfo": {
    "totalProcessingTime": "2.34s",
    "confidenceSource": "B2B Software & Tech | Awareness & Discovery"
  }
}
```

## Response Fields Explained

### predictionResults
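- **probability**: Predicted probability that the variant wins, on a 0-1 scale and returned as a string (values above 0.5 mean the variant is predicted to beat the control)
- **modelConfidence**: Model accuracy as a percentage, returned as a string, based on historical data for this category combination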
- **trainingDataSamples**: Number of training samples used for this category combination
- **totalPredictions**: Total test predictions made for this category combination
- **correctPredictions**: Number of correct predictions for this category combination
- **totalWinPrediction**: Number of actual wins in the historical data
- **totalLosePrediction**: Number of actual losses in the historical data

### providedCategories
- Echo back of the categorical inputs provided by the user

### processingInfo
- **totalProcessingTime**: Time taken for the prediction, in seconds, formatted as a string (for example `"2.34s"`)
- **confidenceSource**: The Industry + Page Type combination used for confidence scoring
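
Since `probability` and `modelConfidence` arrive as strings in the sample response, a caller usually converts them before comparing. The sketch below is one way to do that; `summarize` is a hypothetical helper, and the recomputed accuracy assumes that `modelConfidence` equals `correctPredictions / totalPredictions`, which holds for the sample values in this guide but is not stated as a guarantee.

```python
# The predictionResults part of the sample response shown in this guide.
sample_response = {
    "predictionResults": {
        "probability": "0.682",
        "modelConfidence": "66.1",
        "trainingDataSamples": 14634,
        "totalPredictions": 1626,
        "correctPredictions": 1074,
        "totalWinPrediction": 667,
        "totalLosePrediction": 959,
    }
}


def summarize(response: dict) -> dict:
    """Convert string fields to floats and derive a win/lose call (hypothetical helper)."""
    results = response["predictionResults"]
    probability = float(results["probability"])
    confidence_pct = float(results["modelConfidence"])
    # Assumption: confidence is historical accuracy for this category combination.
    recomputed_pct = 100 * results["correctPredictions"] / results["totalPredictions"]
    return {
        "variant_predicted_to_win": probability > 0.5,
        "probability": probability,
        "model_confidence_pct": confidence_pct,
        "recomputed_accuracy_pct": round(recomputed_pct, 1),
    }


summary = summarize(sample_response)
print(summary)
```

For the sample values, the recomputed accuracy rounds to the reported `modelConfidence` of 66.1.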

## Confidence Scoring

Confidence scores are based on **Industry + Page Type combinations** from historical A/B test data. This gives more reliable confidence metrics than using all 5 categorical features, because each 2-feature combination has more samples behind it (about 160 per combination on average).
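
The lookup described above can be sketched as follows. This is an illustration only: the key format mirrors the `confidenceSource` string in the sample response (`"Industry | Page Type"`), but the table layout, the field names `accuracy_pct` and `total_predictions`, and the helper names are assumptions, since the structure of `confidence_scores.json` is not documented in this guide.

```python
from typing import Optional

# Hypothetical in-memory table; the real layout of confidence_scores.json
# may differ. The single entry reuses numbers from the sample response.
CONFIDENCE_TABLE = {
    "B2B Software & Tech | Awareness & Discovery": {
        "accuracy_pct": 66.1,
        "total_predictions": 1626,
    },
}


def confidence_key(industry: str, page_type: str) -> str:
    """Build a key in the same format as the confidenceSource field."""
    return f"{industry} | {page_type}"


def lookup_confidence(industry: str, page_type: str) -> Optional[dict]:
    """Return the stats for an Industry + Page Type pair, or None if absent."""
    return CONFIDENCE_TABLE.get(confidence_key(industry, page_type))


stats = lookup_confidence("B2B Software & Tech", "Awareness & Discovery")
print(stats)
```

Only `industry` and `page_type` take part in the key; the other three categorical inputs do not affect which confidence entry is used.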

## Example Usage (Python)

```python
import numpy as np
from PIL import Image

# Load your images
control_img = Image.open("control.jpg")
variant_img = Image.open("variant.jpg")

# Convert to numpy arrays
control_array = np.array(control_img)
variant_array = np.array(variant_img)

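# Make prediction (via Gradio interface or direct function call).
# predict_with_categorical_data must already be in scope, e.g. imported from the application code.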
result = predict_with_categorical_data(
    control_image=control_array,
    variant_image=variant_array,
    business_model="SaaS",
    customer_type="B2B",
    conversion_type="High-Intent Lead Gen",
    industry="B2B Software & Tech",
    page_type="Awareness & Discovery"
)

print(f"Win Probability: {result['predictionResults']['probability']}")
print(f"Model Confidence: {result['predictionResults']['modelConfidence']}%")
print(f"Based on {result['predictionResults']['trainingDataSamples']} training samples")
```

## Gradio Interface

The application now has three main tabs:

1. **🎯 API Prediction**: Primary interface for predictions with categorical data
2. **📋 Manual Selection**: Alternative interface with dropdown menus
3. **Batch Prediction from CSV**: For processing multiple tests at once

## Performance

- Average prediction time: 2-4 seconds (GPU-accelerated)
- No external API latency (all processing is local)
- Supports concurrent requests with queue management
- Optimized for 4x L4 GPU setup

## Migration Notes

If you were previously using the auto-categorization feature:

1. You now need to provide categorical data directly
2. The response format has changed slightly (see above)
3. Pattern detection is no longer included in the response
4. Processing is now faster without external API calls

## Need Help?

For questions or issues, refer to:
- `README.md` - General project documentation
- `setup_instructions.md` - Setup and deployment guide
- `confidence_scores.json` - Historical confidence data