# CUGA Profiling
This directory contains tools for profiling CUGA digital sales tasks with different configurations and models, extracting performance metrics and LLM call information from Langfuse.
## Directory Structure
```
system_tests/profiling/
├── README.md               # This file
├── run_experiment.sh       # Main entry point for running experiments
├── serve.sh                # HTTP server for viewing results
├── bin/                    # Internal scripts
│   ├── profile_digital_sales_tasks.py
│   ├── run_profiling.sh
│   └── run_experiment.sh
├── config/                 # Configuration files
│   ├── default_experiment.yaml   # Default experiment configuration
│   ├── fast_vs_accurate.yaml     # Example: Fast vs Accurate comparison
│   └── .secrets.yaml             # Secrets file (git-ignored)
├── experiments/            # Experiment results and comparison HTML
│   └── comparison.html
└── reports/                # Individual profiling reports
```
## Quick Start
### 1. Set Up Environment Variables
Create a `.env` file in the project root or export these variables:
```bash
export LANGFUSE_PUBLIC_KEY="pk-your-public-key"
export LANGFUSE_SECRET_KEY="sk-your-secret-key"
export LANGFUSE_HOST="https://cloud.langfuse.com" # Optional
```
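If you want to fail fast when credentials are missing, a minimal pre-flight check might look like the following sketch (a hypothetical helper, not part of the profiling scripts):

```python
import os

# Hypothetical pre-flight check; the profiling scripts may validate
# credentials differently.
required = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Missing Langfuse credentials: {', '.join(missing)}")
```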
### 2. Run an Experiment
The simplest way to run experiments is with the configuration files:
```bash
# Run default experiment (fast vs balanced)
./system_tests/profiling/run_experiment.sh

# Run a specific experiment configuration
./system_tests/profiling/run_experiment.sh --config fast_vs_accurate.yaml

# Run and automatically open results in browser
./system_tests/profiling/run_experiment.sh --config default_experiment.yaml --open
```
### 3. View Results
Results are automatically saved to `system_tests/profiling/experiments/` and can be viewed in the HTML dashboard:
```bash
# Start the server (serves experiments directory)
./system_tests/profiling/serve.sh

# Or start and open browser automatically
./system_tests/profiling/serve.sh --open

# Use a different port
./system_tests/profiling/serve.sh --port 3000
```
## Configuration Files
Configuration files are written in YAML and loaded with Dynaconf. They define experiments with multiple runs and comparison settings.
### Example Configuration
```yaml
profiling:
  configs:
    - "settings.openai.toml"
  modes:
    - "fast"
    - "balanced"
  tasks:
    - "test_get_top_account_by_revenue_stream"
  runs: 3

experiment:
  name: "fast_vs_balanced"
  description: "Compare fast and balanced modes"
  runs:
    - name: "fast_mode"
      test_id: "settings.openai.toml:fast:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/fast_{{timestamp}}.json"
      env:
        MODEL_NAME: "Azure/gpt-4o"  # Set environment variable
    - name: "balanced_mode"
      test_id: "settings.openai.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/balanced_{{timestamp}}.json"
      env:
        MODEL_NAME: null  # Unset environment variable

  comparison:
    generate_html: true
    html_output: "experiments/comparison.html"
    auto_open: false
```
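Because the files are loaded with Dynaconf, an experiment file like the one above can also be read programmatically. A minimal sketch, assuming the structure shown above:

```python
from dynaconf import Dynaconf

# Load the experiment file; Dynaconf exposes keys case-insensitively
# as attributes.
settings = Dynaconf(
    settings_files=["system_tests/profiling/config/default_experiment.yaml"],
)

print(settings.experiment.name)  # "fast_vs_balanced"
print(settings.profiling.runs)   # 3
```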
### Configuration Options
#### Profiling Section
- `configs`: List of configuration files to test (e.g., `settings.openai.toml`)
- `modes`: List of CUGA modes (`fast`, `balanced`, `accurate`)
- `tasks`: List of test tasks to run
- `runs`: Number of iterations per configuration
- `output`: Output directory and filename settings
- `langfuse`: Langfuse connection settings (credentials from env vars)
#### Experiment Section
- `name`: Name of the experiment
- `description`: Description of what's being tested
- `runs`: List of experiment runs to execute
  - `name`: Display name for the run
  - `test_id`: Specific test to run (format: `config:mode:task`)
  - `iterations`: Number of times to run this test
  - `output`: Output file path (use `{{timestamp}}` for dynamic naming; see the sketch after this list)
  - `env`: (Optional) Environment variables to set/unset for this run
    - Set a variable: `VAR_NAME: "value"`
    - Unset a variable: `VAR_NAME: null`
- `comparison`: Settings for generating comparison HTML
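As referenced in the list above, `{{timestamp}}` in an output path is expanded at run time. A minimal sketch of how that expansion might work; the exact timestamp format the scripts use is an assumption here:

```python
from datetime import datetime, timezone

def expand_output_path(template: str) -> str:
    """Replace {{timestamp}} with a sortable UTC timestamp.

    Assumes a YYYYmmdd_HHMMSS format; the scripts' actual format
    may differ.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    return template.replace("{{timestamp}}", stamp)

print(expand_output_path("experiments/fast_{{timestamp}}.json"))
# e.g. experiments/fast_20250101_120000.json
```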
## Available Test IDs
Test IDs follow the format `config:mode:task`; a small parsing sketch appears after the lists below.
**Configurations:**
- `settings.openai.toml`
- `settings.azure.toml`
- `settings.watsonx.toml`
**Modes:**
- `fast`
- `balanced`
- `accurate`
**Tasks:**
- `test_get_top_account_by_revenue_stream`
- `test_list_my_accounts`
- `test_find_vp_sales_active_high_value_accounts`
To list all available test IDs:
```bash
./system_tests/profiling/bin/run_profiling.sh --list-tests
```
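When scripting against these IDs, the three parts can be recovered by splitting on the first two colons. A minimal sketch:

```python
def parse_test_id(test_id: str) -> tuple[str, str, str]:
    """Split a test ID of the form config:mode:task into its parts."""
    config, mode, task = test_id.split(":", 2)
    return config, mode, task

parts = parse_test_id(
    "settings.openai.toml:fast:test_get_top_account_by_revenue_stream"
)
print(parts)
# ('settings.openai.toml', 'fast', 'test_get_top_account_by_revenue_stream')
```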
## Advanced Usage
### Command Line Interface
You can also use CLI arguments directly:
```bash
# Run specific configuration with CLI args
./system_tests/profiling/bin/run_profiling.sh \
  --configs settings.openai.toml \
  --modes fast,balanced \
  --runs 3

# Run a single test by ID
./system_tests/profiling/bin/run_profiling.sh \
  --test-id settings.openai.toml:fast:test_get_top_account_by_revenue_stream \
  --runs 5

# Use config file but override runs
./system_tests/profiling/bin/run_profiling.sh \
  --config-file default_experiment.yaml \
  --runs 5
```
### Direct Python Usage
```bash
# Run with config file
cd /path/to/project
uv run python system_tests/profiling/bin/profile_digital_sales_tasks.py \
  --config-file default_experiment.yaml

# Run with CLI arguments
uv run python system_tests/profiling/bin/profile_digital_sales_tasks.py \
  --configs settings.openai.toml \
  --modes fast \
  --tasks test_get_top_account_by_revenue_stream \
  --runs 3 \
  --output system_tests/profiling/reports/my_report.json
```
## Output
### Profiling Reports
Individual profiling runs generate JSON reports with the following sections; a reading sketch follows the list:
- **Summary Statistics**: Total tests, success rate, timing
- **Configuration Stats**: Performance per config/mode
- **Langfuse Metrics**: LLM calls, tokens, costs, node timings
- **Detailed Results**: Complete test execution details
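A rough sketch of pulling the headline numbers out of a report; the key names (`summary`, `total_tests`, `success_rate`) are assumptions rather than a documented schema:

```python
import json
from pathlib import Path

# Assumed keys; check a real report for the actual schema.
report = json.loads(
    Path("system_tests/profiling/reports/my_report.json").read_text()
)
summary = report.get("summary", {})
print("total tests:", summary.get("total_tests"))
print("success rate:", summary.get("success_rate"))
```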
### Comparison HTML
The comparison HTML (`experiments/comparison.html`) provides:
**Interactive Visualizations:**
- Execution time comparison charts
- Cost analysis across modes
- Token usage visualization
- LLM calls breakdown
- Execution time variability (Min/Avg/Max with range and std dev)
- Time breakdown (generation vs processing)
- Performance radar chart (normalized comparison)
**Detailed Tables:**
- Summary view of all experiments
- Configuration statistics table
- Per-run Langfuse metrics
- Aggregated metrics across runs
**Features:**
- Tab navigation between charts and tables
- Color-coded modes (Fast=green, Balanced=blue, Accurate=orange)
- Interactive tooltips on hover
- Automatic loading of all JSON files in the directory
- Modern, responsive design
## Creating Custom Experiments
1. Create a new YAML file in `system_tests/profiling/config/`:

   ```bash
   cp system_tests/profiling/config/default_experiment.yaml system_tests/profiling/config/my_experiment.yaml
   ```

2. Edit the configuration to match your experiment's needs.

3. Run your experiment:

   ```bash
   ./system_tests/profiling/run_experiment.sh --config my_experiment.yaml
   ```
## Tips
- Use `{{timestamp}}` in output paths for unique filenames
- CLI arguments override config file settings
- The HTML comparison automatically picks up new JSON files
- Set credentials in `.env` or `config/.secrets.yaml`
- Use `--open` flag to automatically open results in browser
- Use `env` in experiment runs to set/unset environment variables per run
- Set `env.VAR: null` to explicitly unset an environment variable (see the sketch below)
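The per-run set/unset semantics could be applied like this; a minimal sketch, not the scripts' actual implementation:

```python
import os

def apply_run_env(env: dict[str, str | None]) -> None:
    """Set string values and unset None values, per the convention above."""
    for name, value in env.items():
        if value is None:
            os.environ.pop(name, None)  # VAR: null -> unset
        else:
            os.environ[name] = value    # VAR: "value" -> set

apply_run_env({"MODEL_NAME": "Azure/gpt-4o"})  # set
apply_run_env({"MODEL_NAME": None})            # unset
```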
## Troubleshooting
### Port Conflicts
The scripts automatically clean up processes listening on ports 8000, 8001, and 7860.
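A rough equivalent of that cleanup, assuming `lsof` is on the PATH; the scripts' actual mechanism may differ:

```python
import subprocess

# Find PIDs listening on each port (lsof -t prints bare PIDs) and
# terminate them.
for port in (8000, 8001, 7860):
    result = subprocess.run(
        ["lsof", f"-ti:{port}"], capture_output=True, text=True
    )
    for pid in result.stdout.split():
        subprocess.run(["kill", pid])
```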
### Missing Credentials
Ensure `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` are set:
```bash
# Check if set
echo $LANGFUSE_PUBLIC_KEY
echo $LANGFUSE_SECRET_KEY

# Set temporarily
export LANGFUSE_PUBLIC_KEY="pk-..."
export LANGFUSE_SECRET_KEY="sk-..."

# Or add them to your .env file
```
### Configuration Not Found
If a config file isn't found, check:
- The file exists in `system_tests/profiling/config/`
- The filename is correct (case-sensitive)
- You're running from the correct directory
## Examples
### Compare Fast vs Balanced (3 runs each)
```bash
./system_tests/profiling/run_experiment.sh --config default_experiment.yaml
```
### Compare Providers (OpenAI vs Azure vs WatsonX)
```bash
./system_tests/profiling/run_experiment.sh --config providers_comparison.yaml
```
This compares different LLM providers using the same mode (balanced).
### Compare All Modes with OpenAI
Create `system_tests/profiling/config/all_modes.yaml`:
```yaml
experiment:
  name: "all_modes_comparison"
  runs:
    - name: "fast"
      test_id: "settings.openai.toml:fast:test_get_top_account_by_revenue_stream"
      iterations: 5
      output: "experiments/fast_{{timestamp}}.json"
    - name: "balanced"
      test_id: "settings.openai.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 5
      output: "experiments/balanced_{{timestamp}}.json"
    - name: "accurate"
      test_id: "settings.openai.toml:accurate:test_get_top_account_by_revenue_stream"
      iterations: 5
      output: "experiments/accurate_{{timestamp}}.json"
```
Then run:
```bash
./system_tests/profiling/run_experiment.sh --config all_modes.yaml --open
```
### Full Matrix Comparison (Providers × Modes)
```bash
./system_tests/profiling/run_experiment.sh --config full_matrix_comparison.yaml
```
This creates a comprehensive comparison across multiple providers and modes.