# CUGA Profiling
This directory contains tools for profiling CUGA digital sales tasks with different configurations and models, and for extracting performance metrics and LLM call information from Langfuse.
## Directory Structure
```
system_tests/profiling/
β”œβ”€β”€ README.md                 # This file
β”œβ”€β”€ run_experiment.sh         # Main entry point for running experiments
β”œβ”€β”€ serve.sh                  # HTTP server for viewing results
β”œβ”€β”€ bin/                      # Internal scripts
β”‚   β”œβ”€β”€ profile_digital_sales_tasks.py
β”‚   β”œβ”€β”€ run_profiling.sh
β”‚   └── run_experiment.sh
β”œβ”€β”€ config/                   # Configuration files
β”‚   β”œβ”€β”€ default_experiment.yaml   # Default experiment configuration
β”‚   β”œβ”€β”€ fast_vs_accurate.yaml     # Example: Fast vs Accurate comparison
β”‚   └── .secrets.yaml             # Secrets file (git-ignored)
β”œβ”€β”€ experiments/              # Experiment results and comparison HTML
β”‚   └── comparison.html
└── reports/                  # Individual profiling reports
```
## Quick Start
### 1. Set Up Environment Variables
Create a `.env` file in the project root or export these variables:
```bash
export LANGFUSE_PUBLIC_KEY="pk-your-public-key"
export LANGFUSE_SECRET_KEY="sk-your-secret-key"
export LANGFUSE_HOST="https://cloud.langfuse.com" # Optional
```
### 2. Run an Experiment
The simplest way to run experiments is using the configuration files:
```bash
# Run default experiment (fast vs balanced)
./system_tests/profiling/run_experiment.sh
# Run a specific experiment configuration
./system_tests/profiling/run_experiment.sh --config fast_vs_accurate.yaml
# Run and automatically open results in browser
./system_tests/profiling/run_experiment.sh --config default_experiment.yaml --open
```
### 3. View Results
Results are automatically saved to `system_tests/profiling/experiments/` and can be viewed in the HTML dashboard:
```bash
# Start the server (serves experiments directory)
./system_tests/profiling/serve.sh
# Or start and open browser automatically
./system_tests/profiling/serve.sh --open
# Use a different port
./system_tests/profiling/serve.sh --port 3000
```
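If you'd rather not use `serve.sh`, any static file server pointed at the `experiments/` directory works. A minimal Python sketch of the same idea (port 8000 is an assumption; `serve.sh` may use a different default):
```python
# Rough stand-in for serve.sh: serve the experiments directory over HTTP.
# Port 8000 is an assumption; serve.sh may use a different default.
from functools import partial
from http.server import HTTPServer, SimpleHTTPRequestHandler

handler = partial(SimpleHTTPRequestHandler, directory="system_tests/profiling/experiments")
HTTPServer(("0.0.0.0", 8000), handler).serve_forever()
```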
## Configuration Files
Configuration files use YAML format with Dynaconf. They define experiments with multiple runs and comparison settings.
### Example Configuration
```yaml
profiling:
  configs:
    - "settings.openai.toml"
  modes:
    - "fast"
    - "balanced"
  tasks:
    - "test_get_top_account_by_revenue_stream"
  runs: 3

experiment:
  name: "fast_vs_balanced"
  description: "Compare fast and balanced modes"
  runs:
    - name: "fast_mode"
      test_id: "settings.openai.toml:fast:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/fast_{{timestamp}}.json"
      env:
        MODEL_NAME: "Azure/gpt-4o" # Set environment variable
    - name: "balanced_mode"
      test_id: "settings.openai.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/balanced_{{timestamp}}.json"
      env:
        MODEL_NAME: null # Unset environment variable
  comparison:
    generate_html: true
    html_output: "experiments/comparison.html"
    auto_open: false
```
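Because the configs are plain YAML loaded via Dynaconf, you can sanity-check one before launching an experiment. A minimal sketch, assuming the default config file shipped in this directory:
```python
# Minimal sketch: load an experiment config with Dynaconf and list its runs.
# Assumes the default config file from this directory; adjust the path for your own.
from dynaconf import Dynaconf

settings = Dynaconf(
    settings_files=["system_tests/profiling/config/default_experiment.yaml"],
)

experiment = settings.experiment
print(f"Experiment: {experiment.name}")
for run in experiment.runs:
    print(f"  {run.name}: {run.test_id} x{run.iterations} -> {run.output}")
```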
### Configuration Options
#### Profiling Section
- `configs`: List of configuration files to test (e.g., `settings.openai.toml`)
- `modes`: List of CUGA modes (`fast`, `balanced`, `accurate`)
- `tasks`: List of test tasks to run
- `runs`: Number of iterations per configuration
- `output`: Output directory and filename settings
- `langfuse`: Langfuse connection settings (credentials from env vars)
#### Experiment Section
- `name`: Name of the experiment
- `description`: Description of what's being tested
- `runs`: List of experiment runs to execute
  - `name`: Display name for the run
  - `test_id`: Specific test to run (format: `config:mode:task`)
  - `iterations`: Number of times to run this test
  - `output`: Output file path (use `{{timestamp}}` for dynamic naming)
  - `env`: (Optional) Environment variables to set or unset for this run (see the sketch after this list)
    - Set a variable: `VAR_NAME: "value"`
    - Unset a variable: `VAR_NAME: null`
- `comparison`: Settings for generating the comparison HTML
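To make the `output` and `env` semantics concrete, here is a rough, illustrative sketch of how a runner could apply a single run entry; it is not the tool's actual implementation, and the `run` dictionary is hypothetical:
```python
# Illustrative only: how a single experiment run entry might be applied.
# This is NOT the profiling tool's actual implementation; the run dict is hypothetical.
import os
from datetime import datetime

run = {
    "name": "fast_mode",
    "output": "experiments/fast_{{timestamp}}.json",
    "env": {"MODEL_NAME": "Azure/gpt-4o", "SOME_OTHER_VAR": None},
}

# Expand the {{timestamp}} placeholder into a unique filename.
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")  # exact format is an assumption
output_path = run["output"].replace("{{timestamp}}", timestamp)

# Apply per-run environment variables: a string sets the variable, null/None unsets it.
for key, value in run["env"].items():
    if value is None:
        os.environ.pop(key, None)
    else:
        os.environ[key] = str(value)

print(f"{run['name']} -> {output_path}")
```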
## Available Test IDs
Test IDs follow the format: `config:mode:task`
**Configurations:**
- `settings.openai.toml`
- `settings.azure.toml`
- `settings.watsonx.toml`
**Modes:**
- `fast`
- `balanced`
- `accurate`
**Tasks:**
- `test_get_top_account_by_revenue_stream`
- `test_list_my_accounts`
- `test_find_vp_sales_active_high_value_accounts`
To list all available test IDs:
```bash
./system_tests/profiling/bin/run_profiling.sh --list-tests
```
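A test ID is just three colon-separated fields, so it can be split programmatically if needed. A tiny hypothetical helper (not part of the tooling):
```python
# Hypothetical helper (not part of the tooling): split a test ID into its parts.
def parse_test_id(test_id: str) -> dict[str, str]:
    config, mode, task = test_id.split(":", 2)
    return {"config": config, "mode": mode, "task": task}

print(parse_test_id("settings.openai.toml:fast:test_get_top_account_by_revenue_stream"))
```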
## Advanced Usage
### Command Line Interface
You can also use CLI arguments directly:
```bash
# Run specific configuration with CLI args
./system_tests/profiling/bin/run_profiling.sh \
  --configs settings.openai.toml \
  --modes fast,balanced \
  --runs 3
# Run a single test by ID
./system_tests/profiling/bin/run_profiling.sh \
  --test-id settings.openai.toml:fast:test_get_top_account_by_revenue_stream \
  --runs 5
# Use config file but override runs
./system_tests/profiling/bin/run_profiling.sh \
  --config-file default_experiment.yaml \
  --runs 5
```
### Direct Python Usage
```bash
# Run with config file
cd /path/to/project
uv run python system_tests/profiling/bin/profile_digital_sales_tasks.py \
  --config-file default_experiment.yaml
# Run with CLI arguments
uv run python system_tests/profiling/bin/profile_digital_sales_tasks.py \
  --configs settings.openai.toml \
  --modes fast \
  --tasks test_get_top_account_by_revenue_stream \
  --runs 3 \
  --output system_tests/profiling/reports/my_report.json
```
## Output
### Profiling Reports
Individual profiling runs generate JSON reports with:
- **Summary Statistics**: Total tests, success rate, timing
- **Configuration Stats**: Performance per config/mode
- **Langfuse Metrics**: LLM calls, tokens, costs, node timings
- **Detailed Results**: Complete test execution details
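The exact report schema is defined by the profiling script; the keys below are illustrative assumptions, but a report can be inspected like any other JSON file:
```python
# Sketch: inspect a profiling report. The key names are illustrative assumptions;
# open an actual report to see the exact schema.
import json
from pathlib import Path

report = json.loads(Path("system_tests/profiling/reports/my_report.json").read_text())
summary = report.get("summary", {})
print("Total tests :", summary.get("total_tests"))
print("Success rate:", summary.get("success_rate"))
```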
### Comparison HTML
The comparison HTML (`experiments/comparison.html`) provides:
**Interactive Visualizations:**
- πŸ“Š Execution time comparison charts
- πŸ’° Cost analysis across modes
- 🎯 Token usage visualization
- πŸ”„ LLM calls breakdown
- πŸ“Š Execution time variability (Min/Avg/Max with range and std dev)
- ⚑ Time breakdown (generation vs processing)
- πŸ“ˆ Performance radar chart (normalized comparison)
**Detailed Tables:**
- Summary view of all experiments
- Configuration statistics table
- Per-run Langfuse metrics
- Aggregated metrics across runs
**Features:**
- Tab navigation between charts and tables
- Color-coded modes (Fast=green, Balanced=blue, Accurate=orange)
- Interactive tooltips on hover
- Automatic loading of all JSON files in the directory
- Modern, responsive design
## Creating Custom Experiments
1. Create a new YAML file in `system_tests/profiling/config/`:
   ```bash
   cp system_tests/profiling/config/default_experiment.yaml system_tests/profiling/config/my_experiment.yaml
   ```
2. Edit the configuration to match your experiment needs.
3. Run your experiment:
   ```bash
   ./system_tests/profiling/run_experiment.sh --config my_experiment.yaml
   ```
## Tips
- Use `{{timestamp}}` in output paths for unique filenames
- CLI arguments override config file settings
- The HTML comparison automatically picks up new JSON files
- Set credentials in `.env` or `config/.secrets.yaml`
- Use `--open` flag to automatically open results in browser
- Use `env` in experiment runs to set/unset environment variables per run
- Set `env.VAR: null` to explicitly unset an environment variable
## Troubleshooting
### Port Conflicts
The scripts automatically clean up any processes listening on ports 8000, 8001, and 7860.
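If something still seems stuck, a quick way to check those ports from Python (purely illustrative; the scripts already handle this cleanup):
```python
# Illustrative only: check whether the relevant ports are still occupied.
import socket

def port_in_use(port: int) -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        return sock.connect_ex(("127.0.0.1", port)) == 0

for port in (8000, 8001, 7860):
    print(port, "in use" if port_in_use(port) else "free")
```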
### Missing Credentials
Ensure `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` are set:
```bash
# Check if set
echo $LANGFUSE_PUBLIC_KEY
echo $LANGFUSE_SECRET_KEY
# Set temporarily
export LANGFUSE_PUBLIC_KEY="pk-..."
export LANGFUSE_SECRET_KEY="sk-..."
# Or add to .env file
```
### Configuration Not Found
If a config file isn't found, check that:
- The file exists in `system_tests/profiling/config/`
- The filename is correct (case-sensitive)
- You're running from the correct directory
## Examples
### Compare Fast vs Balanced (3 runs each)
```bash
./system_tests/profiling/run_experiment.sh --config default_experiment.yaml
```
### Compare Providers (OpenAI vs Azure vs WatsonX)
```bash
./system_tests/profiling/run_experiment.sh --config providers_comparison.yaml
```
This compares different LLM providers using the same mode (balanced).
### Compare All Modes with OpenAI
Create `system_tests/profiling/config/all_modes.yaml`:
```yaml
experiment:
  name: "all_modes_comparison"
  runs:
    - name: "fast"
      test_id: "settings.openai.toml:fast:test_get_top_account_by_revenue_stream"
      iterations: 5
      output: "experiments/fast_{{timestamp}}.json"
    - name: "balanced"
      test_id: "settings.openai.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 5
      output: "experiments/balanced_{{timestamp}}.json"
    - name: "accurate"
      test_id: "settings.openai.toml:accurate:test_get_top_account_by_revenue_stream"
      iterations: 5
      output: "experiments/accurate_{{timestamp}}.json"
```
Then run:
```bash
./system_tests/profiling/run_experiment.sh --config all_modes.yaml --open
```
### Full Matrix Comparison (Providers Γ— Modes)
```bash
./system_tests/profiling/run_experiment.sh --config full_matrix_comparison.yaml
```
This creates a comprehensive comparison across multiple providers and modes.