# CUGA Profiling

This directory contains tools for profiling CUGA digital sales tasks with different configurations and models, extracting performance metrics and LLM call information from Langfuse.

## Directory Structure

```
system_tests/profiling/
├── README.md                    # This file
├── run_experiment.sh            # Main entry point for running experiments
├── serve.sh                     # HTTP server for viewing results
├── bin/                         # Internal scripts
│   ├── profile_digital_sales_tasks.py
│   ├── run_profiling.sh
│   └── run_experiment.sh
├── config/                      # Configuration files
│   ├── default_experiment.yaml  # Default experiment configuration
│   ├── fast_vs_accurate.yaml    # Example: Fast vs Accurate comparison
│   └── .secrets.yaml            # Secrets file (git-ignored)
├── experiments/                 # Experiment results and comparison HTML
│   └── comparison.html
└── reports/                     # Individual profiling reports
```

## Quick Start

### 1. Set Up Environment Variables

Create a `.env` file in the project root or export these variables:

```bash
export LANGFUSE_PUBLIC_KEY="pk-your-public-key"
export LANGFUSE_SECRET_KEY="sk-your-secret-key"
export LANGFUSE_HOST="https://cloud.langfuse.com"  # Optional
```

### 2. Run an Experiment

The simplest way to run experiments is using the configuration files:

```bash
# Run default experiment (fast vs balanced)
./system_tests/profiling/run_experiment.sh

# Run a specific experiment configuration
./system_tests/profiling/run_experiment.sh --config fast_vs_accurate.yaml

# Run and automatically open results in browser
./system_tests/profiling/run_experiment.sh --config default_experiment.yaml --open
```

### 3. View Results

Results are automatically saved to `system_tests/profiling/experiments/` and can be viewed in the HTML dashboard:

```bash
# Start the server (serves experiments directory)
./system_tests/profiling/serve.sh

# Or start and open browser automatically
./system_tests/profiling/serve.sh --open

# Use a different port
./system_tests/profiling/serve.sh --port 3000
```

## Configuration Files

Configuration files use YAML format with Dynaconf. They define experiments with multiple runs and comparison settings.

### Example Configuration

```yaml
profiling:
  configs:
    - "settings.openai.toml"
  modes:
    - "fast"
    - "balanced"
  tasks:
    - "test_get_top_account_by_revenue_stream"
  runs: 3

experiment:
  name: "fast_vs_balanced"
  description: "Compare fast and balanced modes"
  
  runs:
    - name: "fast_mode"
      test_id: "settings.openai.toml:fast:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/fast_{{timestamp}}.json"
      env:
        MODEL_NAME: "Azure/gpt-4o"  # Set environment variable
    
    - name: "balanced_mode"
      test_id: "settings.openai.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/balanced_{{timestamp}}.json"
      env:
        MODEL_NAME: null  # Unset environment variable
  
  comparison:
    generate_html: true
    html_output: "experiments/comparison.html"
    auto_open: false
```

### Configuration Options

#### Profiling Section

- `configs`: List of configuration files to test (e.g., `settings.openai.toml`)
- `modes`: List of CUGA modes (`fast`, `balanced`, `accurate`)
- `tasks`: List of test tasks to run
- `runs`: Number of iterations per configuration
- `output`: Output directory and filename settings
- `langfuse`: Langfuse connection settings (credentials from env vars)

#### Experiment Section

- `name`: Name of the experiment
- `description`: Description of what's being tested
- `runs`: List of experiment runs to execute
  - `name`: Display name for the run
  - `test_id`: Specific test to run (format: `config:mode:task`)
  - `iterations`: Number of times to run this test
  - `output`: Output file path (use `{{timestamp}}` for dynamic naming)
  - `env`: (Optional) Environment variables to set/unset for this run
    - Set a variable: `VAR_NAME: "value"`
    - Unset a variable: `VAR_NAME: null`
- `comparison`: Settings for generating comparison HTML

## Available Test IDs

Test IDs follow the format: `config:mode:task`

**Configurations:**
- `settings.openai.toml`
- `settings.azure.toml`
- `settings.watsonx.toml`

**Modes:**
- `fast`
- `balanced`
- `accurate`

**Tasks:**
- `test_get_top_account_by_revenue_stream`
- `test_list_my_accounts`
- `test_find_vp_sales_active_high_value_accounts`

To list all available test IDs:

```bash
./system_tests/profiling/bin/run_profiling.sh --list-tests
```
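As a quick illustration of the format (this helper is a sketch, not part of the tooling), a test ID splits cleanly into its three components:

```python
def parse_test_id(test_id: str) -> tuple[str, str, str]:
    # A test ID is exactly three colon-separated parts: config:mode:task.
    config, mode, task = test_id.split(":")
    return config, mode, task

print(parse_test_id("settings.openai.toml:fast:test_list_my_accounts"))
# ('settings.openai.toml', 'fast', 'test_list_my_accounts')
```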

## Advanced Usage

### Command Line Interface

You can also use CLI arguments directly:

```bash
# Run specific configuration with CLI args
./system_tests/profiling/bin/run_profiling.sh \
  --configs settings.openai.toml \
  --modes fast,balanced \
  --runs 3

# Run a single test by ID
./system_tests/profiling/bin/run_profiling.sh \
  --test-id settings.openai.toml:fast:test_get_top_account_by_revenue_stream \
  --runs 5

# Use config file but override runs
./system_tests/profiling/bin/run_profiling.sh \
  --config-file default_experiment.yaml \
  --runs 5
```

### Direct Python Usage

```bash
# Run with config file
cd /path/to/project
uv run python system_tests/profiling/bin/profile_digital_sales_tasks.py \
  --config-file default_experiment.yaml

# Run with CLI arguments
uv run python system_tests/profiling/bin/profile_digital_sales_tasks.py \
  --configs settings.openai.toml \
  --modes fast \
  --tasks test_get_top_account_by_revenue_stream \
  --runs 3 \
  --output system_tests/profiling/reports/my_report.json
```

## Output

### Profiling Reports

Individual profiling runs generate JSON reports with:

- **Summary Statistics**: Total tests, success rate, timing
- **Configuration Stats**: Performance per config/mode
- **Langfuse Metrics**: LLM calls, tokens, costs, node timings
- **Detailed Results**: Complete test execution details
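For orientation, a report covering those four areas might look roughly like the sketch below. The field names and values here are hypothetical placeholders, so inspect an actual report in `reports/` for the real schema:

```json
{
  "summary": {"total_tests": 3, "success_rate": 1.0, "total_duration_s": 182.4},
  "config_stats": {"settings.openai.toml:fast": {"avg_duration_s": 60.8}},
  "langfuse_metrics": {"llm_calls": 12, "total_tokens": 45210, "total_cost_usd": 0.31},
  "results": [
    {"test_id": "settings.openai.toml:fast:test_list_my_accounts", "status": "passed"}
  ]
}
```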

### Comparison HTML

The comparison HTML (`experiments/comparison.html`) provides:

**Interactive Visualizations:**
- 📊 Execution time comparison charts
- 💰 Cost analysis across modes
- 🎯 Token usage visualization
- 🔄 LLM calls breakdown
- 📊 Execution time variability (Min/Avg/Max with range and std dev)
- ⚡ Time breakdown (generation vs processing)
- 📈 Performance radar chart (normalized comparison)

**Detailed Tables:**
- Summary view of all experiments
- Configuration statistics table
- Per-run Langfuse metrics
- Aggregated metrics across runs

**Features:**
- Tab navigation between charts and tables
- Color-coded modes (Fast=green, Balanced=blue, Accurate=orange)
- Interactive tooltips on hover
- Automatic loading of all JSON files in the directory
- Modern, responsive design

## Creating Custom Experiments

1. Create a new YAML file in `system_tests/profiling/config/`:

```bash
cp system_tests/profiling/config/default_experiment.yaml system_tests/profiling/config/my_experiment.yaml
```

2. Edit the configuration to match your experiment needs

3. Run your experiment:

```bash
./system_tests/profiling/run_experiment.sh --config my_experiment.yaml
```

## Tips

- Use `{{timestamp}}` in output paths for unique filenames
- CLI arguments override config file settings
- The HTML comparison automatically picks up new JSON files
- Set credentials in `.env` or `config/.secrets.yaml`
- Use `--open` flag to automatically open results in browser
- Use `env` in experiment runs to set/unset environment variables per run
- Set `env.VAR: null` to explicitly unset an environment variable

## Troubleshooting

### Port Conflicts

The scripts automatically clean up processes on ports 8000, 8001, 7860.

### Missing Credentials

Ensure `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` are set:

```bash
# Check if set
echo $LANGFUSE_PUBLIC_KEY
echo $LANGFUSE_SECRET_KEY

# Set temporarily
export LANGFUSE_PUBLIC_KEY="pk-..."
export LANGFUSE_SECRET_KEY="sk-..."

# Or add to .env file
```
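If you prefer the `.env` route, the file lives in the project root and might look like this (values are placeholders):

```bash
# .env in the project root
LANGFUSE_PUBLIC_KEY=pk-your-public-key
LANGFUSE_SECRET_KEY=sk-your-secret-key
LANGFUSE_HOST=https://cloud.langfuse.com  # optional
```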

### Configuration Not Found

If a config file isn't found, check:
- The file exists in `system_tests/profiling/config/`
- The filename is correct (case-sensitive)
- You're running from the correct directory

## Examples

### Compare Fast vs Balanced (3 runs each)

```bash
./system_tests/profiling/run_experiment.sh --config default_experiment.yaml
```

### Compare Providers (OpenAI vs Azure vs WatsonX)

```bash
./system_tests/profiling/run_experiment.sh --config providers_comparison.yaml
```

This compares different LLM providers using the same mode (balanced).

### Compare All Modes with OpenAI

Create `system_tests/profiling/config/all_modes.yaml`:

```yaml
experiment:
  name: "all_modes_comparison"
  runs:
    - name: "fast"
      test_id: "settings.openai.toml:fast:test_get_top_account_by_revenue_stream"
      iterations: 5
      output: "experiments/fast_{{timestamp}}.json"
    - name: "balanced"
      test_id: "settings.openai.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 5
      output: "experiments/balanced_{{timestamp}}.json"
    - name: "accurate"
      test_id: "settings.openai.toml:accurate:test_get_top_account_by_revenue_stream"
      iterations: 5
      output: "experiments/accurate_{{timestamp}}.json"
```

Then run:

```bash
./system_tests/profiling/run_experiment.sh --config all_modes.yaml --open
```

### Full Matrix Comparison (Providers Γ— Modes)

```bash
./system_tests/profiling/run_experiment.sh --config full_matrix_comparison.yaml
```

This creates a comprehensive comparison across multiple providers and modes.