# Profiling Quick Start

## 1. Set Environment Variables

```bash
export LANGFUSE_PUBLIC_KEY="pk-..."
export LANGFUSE_SECRET_KEY="sk-..."
```

Or add them to a `.env` file in the project root.
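A minimal `.env` might look like this (the keys mirror the exports above; values are placeholders — note that `.env` files use plain `KEY=value` lines, no `export`):

```bash
# .env — loaded from the project root; keep this file out of version control
LANGFUSE_PUBLIC_KEY="pk-..."
LANGFUSE_SECRET_KEY="sk-..."
```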

## 2. Run an Experiment

```bash
# Run default experiment (fast vs balanced)
./system_tests/profiling/run_experiment.sh

# Compare different providers
./system_tests/profiling/run_experiment.sh --config providers_comparison.yaml

# Compare all modes for one provider
./system_tests/profiling/run_experiment.sh --config fast_vs_accurate.yaml

# Full matrix: providers × modes
./system_tests/profiling/run_experiment.sh --config full_matrix_comparison.yaml
```

## 3. View Results

```bash
# Start HTTP server and open browser
./system_tests/profiling/serve.sh --open

# Or just start server (visit http://localhost:8080/comparison.html)
./system_tests/profiling/serve.sh
```

## Comparison Types

### Mode Comparison (Same Provider)
Compare fast vs balanced vs accurate modes using the same LLM provider.

Example output files: `fast_20250930.json`, `balanced_20250930.json`, `accurate_20250930.json`

### Provider Comparison (Same Mode)
Compare OpenAI vs Azure vs WatsonX using the same mode (e.g., balanced).

Example output files: `openai_balanced_20250930.json`, `azure_balanced_20250930.json`, `watsonx_balanced_20250930.json`

### Full Matrix Comparison
Compare all combinations of providers and modes (2 providers × 2 modes = 4 experiments).

Example output files: `openai_fast_20250930.json`, `openai_balanced_20250930.json`, `azure_fast_20250930.json`, `azure_balanced_20250930.json`
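The matrix is simply the cross product of the provider list and the mode list. As an illustration (using the provider and mode names from the examples above, not the runner's actual internals), the run labels can be generated like this:

```python
from itertools import product

providers = ["openai", "azure"]
modes = ["fast", "balanced"]

# Each (provider, mode) pair becomes one experiment run label.
labels = [f"{p}_{m}" for p, m in product(providers, modes)]
print(labels)  # ['openai_fast', 'openai_balanced', 'azure_fast', 'azure_balanced']
```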

## Available Scripts

| Script | Purpose |
|--------|---------|
| `run_experiment.sh` | Run profiling experiments with YAML config |
| `serve.sh` | Start HTTP server to view results |
| `bin/run_profiling.sh` | Lower-level profiling script with CLI args |
| `bin/profile_digital_sales_tasks.py` | Core Python profiling tool |

## Configuration Files

Located in `config/`:
- `default_experiment.yaml` - Fast vs Balanced comparison
- `fast_vs_accurate.yaml` - Fast vs Accurate comparison
- `providers_comparison.yaml` - OpenAI vs Azure vs WatsonX (same mode)
- `full_matrix_comparison.yaml` - Full provider × mode matrix
- `.secrets.yaml` - Your Langfuse credentials (git-ignored)

## Example: Provider Comparison

Create or use `config/providers_comparison.yaml`:

```yaml
experiment:
  name: "providers_comparison"
  runs:
    - name: "openai_balanced"
      test_id: "settings.openai.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/openai_balanced_{{timestamp}}.json"
    
    - name: "azure_balanced"
      test_id: "settings.azure.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/azure_balanced_{{timestamp}}.json"
```

Then run:

```bash
./system_tests/profiling/run_experiment.sh --config providers_comparison.yaml
./system_tests/profiling/serve.sh --open
```

## Color Coding in Charts

The comparison HTML automatically color-codes experiments:

**Modes:**
- Fast = Green 🟢
- Balanced = Blue 🔵
- Accurate = Orange 🟠

**Providers:**
- OpenAI = Teal 🟦
- Azure = Azure Blue 💙
- WatsonX = IBM Blue 🔵

**Combined Labels** (e.g., `openai_balanced`) get colors based on provider first, then mode.
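The "provider first, then mode" rule can be sketched as a small lookup. This is an illustration only (the color names mirror the legend above; the actual logic lives in the comparison HTML and may differ):

```python
PROVIDER_COLORS = {"openai": "teal", "azure": "azure-blue", "watsonx": "ibm-blue"}
MODE_COLORS = {"fast": "green", "balanced": "blue", "accurate": "orange"}

def color_for(label: str) -> str:
    """Pick a chart color for a label such as 'openai_balanced' or 'fast'."""
    parts = label.lower().split("_")
    for part in parts:
        if part in PROVIDER_COLORS:   # provider wins when both are present
            return PROVIDER_COLORS[part]
    for part in parts:
        if part in MODE_COLORS:
            return MODE_COLORS[part]
    return "gray"                     # fallback for unrecognized labels

print(color_for("openai_balanced"))  # teal
print(color_for("accurate"))         # orange
```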

## Directory Structure

```
system_tests/profiling/
├── run_experiment.sh          # Main entry point
├── serve.sh                   # View results
├── bin/                       # Internal scripts
├── config/                    # YAML configurations
├── experiments/               # Results + HTML viewer
└── reports/                   # Individual reports
```

## Tips

- 💡 The HTML viewer auto-loads all JSON files in `experiments/`
- 💡 Naming format: `{provider}_{mode}_{timestamp}.json` or `{mode}_{timestamp}.json`
- 💡 CLI args override YAML config settings
- 💡 Use `{{timestamp}}` in output paths for unique files
- 💡 A retry mechanism handles Langfuse propagation delays
- 💡 Stop the server with Ctrl+C

For full documentation, see `README.md`.