CUGA Profiling
This directory contains tools for profiling CUGA digital sales tasks with different configurations and models, extracting performance metrics and LLM call information from Langfuse.
Directory Structure
system_tests/profiling/
├── README.md                        # This file
├── run_experiment.sh                # Main entry point for running experiments
├── serve.sh                         # HTTP server for viewing results
├── bin/                             # Internal scripts
│   ├── profile_digital_sales_tasks.py
│   ├── run_profiling.sh
│   └── run_experiment.sh
├── config/                          # Configuration files
│   ├── default_experiment.yaml      # Default experiment configuration
│   ├── fast_vs_accurate.yaml        # Example: fast vs. accurate comparison
│   └── .secrets.yaml                # Secrets file (git-ignored)
├── experiments/                     # Experiment results and comparison HTML
│   └── comparison.html
└── reports/                         # Individual profiling reports
Quick Start
1. Set Up Environment Variables
Create a .env file in the project root or export these variables:
export LANGFUSE_PUBLIC_KEY="pk-your-public-key"
export LANGFUSE_SECRET_KEY="sk-your-secret-key"
export LANGFUSE_HOST="https://cloud.langfuse.com" # Optional
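The same values in .env form look like this (a sketch; substitute your actual keys):
# .env in the project root
LANGFUSE_PUBLIC_KEY=pk-your-public-key
LANGFUSE_SECRET_KEY=sk-your-secret-key
LANGFUSE_HOST=https://cloud.langfuse.com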
2. Run an Experiment
The simplest way to run experiments is using the configuration files:
# Run default experiment (fast vs balanced)
./system_tests/profiling/run_experiment.sh
# Run a specific experiment configuration
./system_tests/profiling/run_experiment.sh --config fast_vs_accurate.yaml
# Run and automatically open results in browser
./system_tests/profiling/run_experiment.sh --config default_experiment.yaml --open
3. View Results
Results are automatically saved to system_tests/profiling/experiments/ and can be viewed in the HTML dashboard:
# Start the server (serves experiments directory)
./system_tests/profiling/serve.sh
# Or start and open browser automatically
./system_tests/profiling/serve.sh --open
# Use a different port
./system_tests/profiling/serve.sh --port 3000
Configuration Files
Configuration files use YAML format with Dynaconf. They define experiments with multiple runs and comparison settings.
Example Configuration
profiling:
  configs:
    - "settings.openai.toml"
  modes:
    - "fast"
    - "balanced"
  tasks:
    - "test_get_top_account_by_revenue_stream"
  runs: 3

experiment:
  name: "fast_vs_balanced"
  description: "Compare fast and balanced modes"
  runs:
    - name: "fast_mode"
      test_id: "settings.openai.toml:fast:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/fast_{{timestamp}}.json"
      env:
        MODEL_NAME: "Azure/gpt-4o"  # Set environment variable
    - name: "balanced_mode"
      test_id: "settings.openai.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/balanced_{{timestamp}}.json"
      env:
        MODEL_NAME: null  # Unset environment variable

comparison:
  generate_html: true
  html_output: "experiments/comparison.html"
  auto_open: false
Configuration Options
Profiling Section
- configs: List of configuration files to test (e.g., settings.openai.toml)
- modes: List of CUGA modes (fast, balanced, accurate)
- tasks: List of test tasks to run
- runs: Number of iterations per configuration
- output: Output directory and filename settings
- langfuse: Langfuse connection settings (credentials come from environment variables)
Experiment Section
- name: Name of the experiment
- description: Description of what's being tested
- runs: List of experiment runs to execute
  - name: Display name for the run
  - test_id: Specific test to run (format: config:mode:task)
  - iterations: Number of times to run this test
  - output: Output file path (use {{timestamp}} for dynamic naming)
  - env: (Optional) Environment variables to set or unset for this run
    - Set a variable: VAR_NAME: "value"
    - Unset a variable: VAR_NAME: null
- comparison: Settings for generating the comparison HTML
Available Test IDs
Test IDs follow the format: config:mode:task
Configurations:
- settings.openai.toml
- settings.azure.toml
- settings.watsonx.toml

Modes:
- fast
- balanced
- accurate

Tasks:
- test_get_top_account_by_revenue_stream
- test_list_my_accounts
- test_find_vp_sales_active_high_value_accounts
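Combining one entry from each list yields a valid test ID, for example:
# Azure configuration, accurate mode, list-accounts task
settings.azure.toml:accurate:test_list_my_accounts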
To list all available test IDs:
./system_tests/profiling/bin/run_profiling.sh --list-tests
Advanced Usage
Command Line Interface
You can also use CLI arguments directly:
# Run specific configuration with CLI args
./system_tests/profiling/bin/run_profiling.sh \
--configs settings.openai.toml \
--modes fast,balanced \
--runs 3
# Run a single test by ID
./system_tests/profiling/bin/run_profiling.sh \
--test-id settings.openai.toml:fast:test_get_top_account_by_revenue_stream \
--runs 5
# Use config file but override runs
./system_tests/profiling/bin/run_profiling.sh \
--config-file default_experiment.yaml \
--runs 5
Direct Python Usage
# Run with config file
cd /path/to/project
uv run python system_tests/profiling/bin/profile_digital_sales_tasks.py \
--config-file default_experiment.yaml
# Run with CLI arguments
uv run python system_tests/profiling/bin/profile_digital_sales_tasks.py \
--configs settings.openai.toml \
--modes fast \
--tasks test_get_top_account_by_revenue_stream \
--runs 3 \
--output system_tests/profiling/reports/my_report.json
Output
Profiling Reports
Individual profiling runs generate JSON reports with:
- Summary Statistics: Total tests, success rate, timing
- Configuration Stats: Performance per config/mode
- Langfuse Metrics: LLM calls, tokens, costs, node timings
- Detailed Results: Complete test execution details
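As a rough sketch of the report shape (every key name and value below is an assumption for illustration; inspect a generated report for the actual schema):
{
  "summary": { "total_tests": 6, "success_rate": 1.0, "total_duration_seconds": 412.7 },
  "config_stats": { "settings.openai.toml:fast": { "avg_duration_seconds": 58.2 } },
  "langfuse_metrics": { "llm_calls": 42, "total_tokens": 91000, "total_cost_usd": 0.87 },
  "results": [ ... ]
}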
Comparison HTML
The comparison HTML (experiments/comparison.html) provides:
Interactive Visualizations:
- Execution time comparison charts
- Cost analysis across modes
- Token usage visualization
- LLM calls breakdown
- Execution time variability (min/avg/max with range and std dev)
- Time breakdown (generation vs. processing)
- Performance radar chart (normalized comparison)
Detailed Tables:
- Summary view of all experiments
- Configuration statistics table
- Per-run Langfuse metrics
- Aggregated metrics across runs
Features:
- Tab navigation between charts and tables
- Color-coded modes (Fast=green, Balanced=blue, Accurate=orange)
- Interactive tooltips on hover
- Automatic loading of all JSON files in the directory
- Modern, responsive design
Creating Custom Experiments
1. Create a new YAML file in system_tests/profiling/config/:
cp system_tests/profiling/config/default_experiment.yaml system_tests/profiling/config/my_experiment.yaml
2. Edit the configuration to match your experiment needs.
3. Run your experiment:
./system_tests/profiling/run_experiment.sh --config my_experiment.yaml
Tips
- Use {{timestamp}} in output paths for unique filenames
- CLI arguments override config file settings
- The HTML comparison automatically picks up new JSON files
- Set credentials in .env or config/.secrets.yaml
- Use the --open flag to automatically open results in the browser
- Use env in experiment runs to set or unset environment variables per run
- Set env.VAR: null to explicitly unset an environment variable
Troubleshooting
Port Conflicts
The scripts automatically clean up processes on ports 8000, 8001, 7860.
Missing Credentials
Ensure LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY are set:
# Check if set
echo $LANGFUSE_PUBLIC_KEY
echo $LANGFUSE_SECRET_KEY
# Set temporarily
export LANGFUSE_PUBLIC_KEY="pk-..."
export LANGFUSE_SECRET_KEY="sk-..."
# Or add to .env file
Configuration Not Found
If a config file isn't found, check:
- The file exists in system_tests/profiling/config/
- The filename is correct (case-sensitive)
- You're running from the correct directory
Examples
Compare Fast vs Balanced (3 runs each)
./system_tests/profiling/run_experiment.sh --config default_experiment.yaml
Compare Providers (OpenAI vs Azure vs WatsonX)
./system_tests/profiling/run_experiment.sh --config providers_comparison.yaml
This compares different LLM providers using the same mode (balanced).
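A plausible sketch of providers_comparison.yaml (the file is not shown in the directory listing above; run names and output paths are illustrative):
experiment:
  name: "providers_comparison"   # illustrative; the shipped file may differ
  runs:
    - name: "openai"
      test_id: "settings.openai.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/openai_{{timestamp}}.json"
    - name: "azure"
      test_id: "settings.azure.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/azure_{{timestamp}}.json"
    - name: "watsonx"
      test_id: "settings.watsonx.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/watsonx_{{timestamp}}.json"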
Compare All Modes with OpenAI
Create system_tests/profiling/config/all_modes.yaml:
experiment:
  name: "all_modes_comparison"
  runs:
    - name: "fast"
      test_id: "settings.openai.toml:fast:test_get_top_account_by_revenue_stream"
      iterations: 5
      output: "experiments/fast_{{timestamp}}.json"
    - name: "balanced"
      test_id: "settings.openai.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 5
      output: "experiments/balanced_{{timestamp}}.json"
    - name: "accurate"
      test_id: "settings.openai.toml:accurate:test_get_top_account_by_revenue_stream"
      iterations: 5
      output: "experiments/accurate_{{timestamp}}.json"
Then run:
./system_tests/profiling/run_experiment.sh --config all_modes.yaml --open
Full Matrix Comparison (Providers Γ Modes)
./system_tests/profiling/run_experiment.sh --config full_matrix_comparison.yaml
This creates a comprehensive comparison across multiple providers and modes.
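To build a similar matrix by hand, the profiling section takes lists of configs and modes, which appear to be crossed into the full set of combinations (a sketch; the shipped full_matrix_comparison.yaml may be structured differently):
profiling:
  # 3 configs x 3 modes = 9 combinations, each repeated 3 times
  configs:
    - "settings.openai.toml"
    - "settings.azure.toml"
    - "settings.watsonx.toml"
  modes:
    - "fast"
    - "balanced"
    - "accurate"
  tasks:
    - "test_get_top_account_by_revenue_stream"
  runs: 3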