# Profiling Quick Start
## 1. Set Environment Variables
```bash
export LANGFUSE_PUBLIC_KEY="pk-..."
export LANGFUSE_SECRET_KEY="sk-..."
```
Or add them to a `.env` file in the project root.
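The `.env` alternative could look like this — a sketch assuming a standard `KEY=value` dotenv format (some loaders also accept an `export` prefix; check which one your setup uses):

```bash
# .env (project root) — hypothetical contents; the key values are placeholders
LANGFUSE_PUBLIC_KEY="pk-..."
LANGFUSE_SECRET_KEY="sk-..."
```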
## 2. Run an Experiment
```bash
# Run default experiment (fast vs balanced)
./system_tests/profiling/run_experiment.sh
# Compare different providers
./system_tests/profiling/run_experiment.sh --config providers_comparison.yaml
# Compare all modes for one provider
./system_tests/profiling/run_experiment.sh --config fast_vs_accurate.yaml
# Full matrix: providers × modes
./system_tests/profiling/run_experiment.sh --config full_matrix_comparison.yaml
```
## 3. View Results
```bash
# Start HTTP server and open browser
./system_tests/profiling/serve.sh --open
# Or just start server (visit http://localhost:8080/comparison.html)
./system_tests/profiling/serve.sh
```
## Comparison Types
### Mode Comparison (Same Provider)
Compare fast vs balanced vs accurate modes using the same LLM provider.
Example output files: `fast_20250930.json`, `balanced_20250930.json`, `accurate_20250930.json`
### Provider Comparison (Same Mode)
Compare OpenAI vs Azure vs WatsonX using the same mode (e.g., balanced).
Example output files: `openai_balanced_20250930.json`, `azure_balanced_20250930.json`, `watsonx_balanced_20250930.json`
### Full Matrix Comparison
Compare all combinations of providers and modes (2 providers × 2 modes = 4 experiments).
Example output files: `openai_fast_20250930.json`, `openai_balanced_20250930.json`, `azure_fast_20250930.json`, `azure_balanced_20250930.json`
## Available Scripts
| Script | Purpose |
|--------|---------|
| `run_experiment.sh` | Run profiling experiments with YAML config |
| `serve.sh` | Start HTTP server to view results |
| `bin/run_profiling.sh` | Lower-level profiling script with CLI args |
| `bin/profile_digital_sales_tasks.py` | Core Python profiling tool |
## Configuration Files
Located in `config/`:
- `default_experiment.yaml` - Fast vs Balanced comparison
- `fast_vs_accurate.yaml` - Fast vs Accurate comparison
- `providers_comparison.yaml` - OpenAI vs Azure vs WatsonX (same mode)
- `full_matrix_comparison.yaml` - Full provider × mode matrix
- `.secrets.yaml` - Your Langfuse credentials (git-ignored)
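A possible shape for `.secrets.yaml`, reusing the Langfuse keys from step 1 — purely hypothetical field names, so the actual schema in your repo may differ:

```yaml
# .secrets.yaml — hypothetical layout; field names are assumptions
langfuse:
  public_key: "pk-..."
  secret_key: "sk-..."
```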
## Example: Provider Comparison
Create or use `config/providers_comparison.yaml`:
```yaml
experiment:
  name: "providers_comparison"
  runs:
    - name: "openai_balanced"
      test_id: "settings.openai.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/openai_balanced_{{timestamp}}.json"
    - name: "azure_balanced"
      test_id: "settings.azure.toml:balanced:test_get_top_account_by_revenue_stream"
      iterations: 3
      output: "experiments/azure_balanced_{{timestamp}}.json"
```
Then run:
```bash
./system_tests/profiling/run_experiment.sh --config providers_comparison.yaml
./system_tests/profiling/serve.sh --open
```
## Color Coding in Charts
The comparison HTML automatically color-codes experiments:
**Modes:**
- Fast = Green 🟢
- Balanced = Blue 🔵
- Accurate = Orange 🟠
**Providers:**
- OpenAI = Teal 🟦
- Azure = Azure Blue 💙
- WatsonX = IBM Blue 🔵
**Combined Labels** (e.g., `openai_balanced`) get colors based on provider first, then mode.
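The precedence described above (provider match first, then mode) can be sketched as a simple pattern match — a hypothetical illustration of the rule, not the viewer's actual code:

```bash
# Provider patterns are checked before mode patterns, so a combined
# label like "openai_balanced" takes the provider's color.
label="openai_balanced"
case "$label" in
  openai*)   color="teal" ;;
  azure*)    color="azure-blue" ;;
  watsonx*)  color="ibm-blue" ;;
  fast*)     color="green" ;;
  balanced*) color="blue" ;;
  accurate*) color="orange" ;;
esac
echo "$color"  # prints "teal"
```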
## Directory Structure
```
system_tests/profiling/
β”œβ”€β”€ run_experiment.sh # Main entry point
β”œβ”€β”€ serve.sh # View results
β”œβ”€β”€ bin/ # Internal scripts
β”œβ”€β”€ config/ # YAML configurations
β”œβ”€β”€ experiments/ # Results + HTML viewer
└── reports/ # Individual reports
```
## Tips
- 💡 The HTML viewer auto-loads all JSON files in `experiments/`
- 💡 Naming format: `{provider}_{mode}_{timestamp}.json` or `{mode}_{timestamp}.json`
- 💡 CLI args override YAML config settings
- 💡 Use `{{timestamp}}` in output paths for unique files
- 💡 A retry mechanism handles Langfuse propagation delays
- 💡 Stop the server with Ctrl+C
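Assuming `{{timestamp}}` expands to a `YYYYMMDD` date stamp (an assumption based on sample filenames like `openai_balanced_20250930.json`), the substitution is roughly equivalent to:

```bash
# Hypothetical equivalent of the {{timestamp}} placeholder — the exact
# format is an assumption; only the shape of the output path is from the docs.
ts="$(date +%Y%m%d)"
output="experiments/openai_balanced_${ts}.json"
echo "$output"
```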
For full documentation, see `README.md`.