# Profiling Quick Start
## 1. Set Environment Variables
```bash
export LANGFUSE_PUBLIC_KEY="pk-..."
export LANGFUSE_SECRET_KEY="sk-..."
```
Or add them to a `.env` file in the project root.
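For reference, a minimal `.env` sketch with the same two keys (values are placeholders):

```bash
# .env in the project root (placeholder values)
LANGFUSE_PUBLIC_KEY="pk-..."
LANGFUSE_SECRET_KEY="sk-..."
```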
## 2. Run an Experiment
```bash
# Run default experiment (fast vs balanced)
./system_tests/profiling/run_experiment.sh
# Compare different providers
./system_tests/profiling/run_experiment.sh --config providers_comparison.yaml
# Compare all modes for one provider
./system_tests/profiling/run_experiment.sh --config fast_vs_accurate.yaml
# Full matrix: providers × modes
./system_tests/profiling/run_experiment.sh --config full_matrix_comparison.yaml
```
## 3. View Results
```bash
# Start HTTP server and open browser
./system_tests/profiling/serve.sh --open
# Or just start server (visit http://localhost:8080/comparison.html)
./system_tests/profiling/serve.sh
```
## Comparison Types
### Mode Comparison (Same Provider)
Compare fast vs balanced vs accurate modes using the same LLM provider.
Example output files: `fast_20250930.json`, `balanced_20250930.json`, `accurate_20250930.json`
### Provider Comparison (Same Mode)
Compare OpenAI vs Azure vs WatsonX using the same mode (e.g., balanced).
Example output files: `openai_balanced_20250930.json`, `azure_balanced_20250930.json`, `watsonx_balanced_20250930.json`
### Full Matrix Comparison
Compare all combinations of providers and modes (2 providers × 2 modes = 4 experiments).
Example output files: `openai_fast_20250930.json`, `openai_balanced_20250930.json`, `azure_fast_20250930.json`, `azure_balanced_20250930.json`
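Because every result file follows the `{provider}_{mode}_{timestamp}.json` convention, you can tally experiments per label straight from the filenames. A small shell sketch (the sample files here are illustrative, not real results):

```bash
# Count experiment files per {provider}_{mode} label by stripping
# the trailing _{timestamp}.json suffix (sample files are illustrative).
mkdir -p experiments
touch experiments/openai_fast_20250930.json \
      experiments/openai_balanced_20250930.json \
      experiments/azure_fast_20250930.json
for f in experiments/*.json; do
  base=${f##*/}            # drop the directory prefix
  echo "${base%_*.json}"   # drop _{timestamp}.json
done | sort | uniq -c
```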
## Available Scripts
| Script | Purpose |
|--------|---------|
| `run_experiment.sh` | Run profiling experiments with YAML config |
| `serve.sh` | Start HTTP server to view results |
| `bin/run_profiling.sh` | Lower-level profiling script with CLI args |
| `bin/profile_digital_sales_tasks.py` | Core Python profiling tool |
## Configuration Files
Located in `config/`:
- `default_experiment.yaml` - Fast vs Balanced comparison
- `fast_vs_accurate.yaml` - Fast vs Accurate comparison
- `providers_comparison.yaml` - OpenAI vs Azure vs WatsonX (same mode)
- `full_matrix_comparison.yaml` - Full provider × mode matrix
- `.secrets.yaml` - Your Langfuse credentials (git-ignored)
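A hypothetical `.secrets.yaml` layout is sketched below; the key names are assumptions mirroring the environment variables above, not taken from the actual file:

```yaml
# Hypothetical layout for config/.secrets.yaml (git-ignored).
# Key names are illustrative; check the actual file or README.md.
langfuse:
  public_key: "pk-..."
  secret_key: "sk-..."
```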
## Example: Provider Comparison
Create or use `config/providers_comparison.yaml`:
```yaml
experiment:
  name: "providers_comparison"
  runs:
    - name: "openai_balanced"
      test_id: "settings.openai.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/openai_balanced_{{timestamp}}.json"
    - name: "azure_balanced"
      test_id: "settings.azure.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/azure_balanced_{{timestamp}}.json"
```
Then run:
```bash
./system_tests/profiling/run_experiment.sh --config providers_comparison.yaml
./system_tests/profiling/serve.sh --open
```
## Color Coding in Charts
The comparison HTML automatically color-codes experiments:
**Modes:**
- Fast = Green 🟢
- Balanced = Blue 🔵
- Accurate = Orange 🟠
**Providers:**
- OpenAI = Teal
- Azure = Azure Blue
- WatsonX = IBM Blue
**Combined Labels** (e.g., `openai_balanced`) get colors based on provider first, then mode.
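The provider-first rule can be sketched as a simple prefix match; the mapping below is assumed from the lists above, not taken from the viewer's source:

```bash
# Resolve a chart color for a label: a provider prefix wins; bare mode
# names fall through to the mode palette (logic assumed, not from source).
label_color() {
  case "$1" in
    openai_*)  echo "teal" ;;
    azure_*)   echo "azure-blue" ;;
    watsonx_*) echo "ibm-blue" ;;
    fast*)     echo "green" ;;
    balanced*) echo "blue" ;;
    accurate*) echo "orange" ;;
    *)         echo "default" ;;
  esac
}
label_color "openai_balanced"   # provider wins -> teal
label_color "fast"              # bare mode -> green
```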
## Directory Structure
```
system_tests/profiling/
├── run_experiment.sh   # Main entry point
├── serve.sh            # View results
├── bin/                # Internal scripts
├── config/             # YAML configurations
├── experiments/        # Results + HTML viewer
└── reports/            # Individual reports
```
## Tips
- 💡 HTML auto-loads all JSON files in `experiments/`
- 💡 Naming format: `{provider}_{mode}_{timestamp}.json` or `{mode}_{timestamp}.json`
- 💡 CLI args override YAML config settings
- 💡 Use `{{timestamp}}` in output paths for unique files
- 💡 Retry mechanism handles Langfuse propagation delays
- 💡 Stop the server with Ctrl+C
For full documentation, see `README.md`.