# SGLang Performance Dashboard

A web-based dashboard for visualizing SGLang nightly test performance metrics.
## Features

- **Performance Trends**: View throughput, latency, and TTFT trends over time
- **Model Comparison**: Compare performance across different models and configurations
- **Filtering**: Filter by GPU configuration, model, variant, and batch size
- **Interactive Charts**: Zoom, pan, and hover for detailed metrics
- **Run History**: View recent benchmark runs with links to GitHub Actions
## Quick Start

### Option 1: Run with Local Server (Recommended)

For live data from GitHub Actions artifacts:
```bash
# Install requirements
pip install requests

# Run the server
python server.py --fetch-on-start

# Visit http://localhost:8000
```
The server provides:

- Automatic fetching of metrics from GitHub
- Caching to reduce API calls
- An `/api/metrics` endpoint for the frontend
### Option 2: Fetch Data Manually

Use the fetch script to download metrics data:
```bash
# Fetch last 30 days of metrics
python fetch_metrics.py --output metrics_data.json

# Fetch a specific run
python fetch_metrics.py --run-id 21338741812 --output single_run.json

# Fetch only scheduled (nightly) runs
python fetch_metrics.py --scheduled-only --days 7
```
### GitHub Token

To download artifacts from GitHub, you need authentication.

Using the `gh` CLI (recommended):

```bash
gh auth login
```

Or using an environment variable:

```bash
export GITHUB_TOKEN=your_token_here
```
Without a token, the dashboard will show run metadata but not detailed benchmark results.
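A fetch script can pick the token up from the environment and attach it to its API requests. Here is a minimal sketch of that pattern; `github_headers` is an illustrative helper, not necessarily how `fetch_metrics.py` is implemented:

```python
import os

def github_headers():
    """Build GitHub API request headers, attaching a bearer token
    if one is available in the GITHUB_TOKEN environment variable."""
    headers = {"Accept": "application/vnd.github+json"}
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        # Authenticated requests get a much higher rate limit
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Unauthenticated requests still work for public run metadata, which is why the dashboard degrades gracefully without a token.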
## Data Structure

The metrics JSON has this structure:
```json
{
  "run_id": "21338741812",
  "run_date": "2026-01-25T22:24:02.090218+00:00",
  "commit_sha": "5cdb391...",
  "branch": "main",
  "results": [
    {
      "gpu_config": "8-gpu-h200",
      "partition": 0,
      "model": "deepseek-ai/DeepSeek-V3.1",
      "variant": "TP8+MTP",
      "benchmarks": [
        {
          "batch_size": 1,
          "input_len": 4096,
          "output_len": 512,
          "latency_ms": 2400.72,
          "input_throughput": 21408.64,
          "output_throughput": 231.74,
          "overall_throughput": 1919.43,
          "ttft_ms": 191.32,
          "acc_length": 3.19
        }
      ]
    }
  ]
}
```
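A consumer of this JSON typically flattens each run into one row per benchmark, which is the shape a chart or filter usually wants. A minimal sketch (`flatten` is an illustrative helper, not part of the dashboard code; the sample is truncated to a few fields):

```python
import json

# A record in the shape shown above, truncated to one benchmark
raw = """
{
  "run_id": "21338741812",
  "results": [
    {
      "gpu_config": "8-gpu-h200",
      "model": "deepseek-ai/DeepSeek-V3.1",
      "variant": "TP8+MTP",
      "benchmarks": [
        {"batch_size": 1, "latency_ms": 2400.72, "overall_throughput": 1919.43}
      ]
    }
  ]
}
"""

def flatten(run):
    """Flatten one run into rows: one per (gpu_config, model, variant, benchmark)."""
    rows = []
    for result in run["results"]:
        for bench in result["benchmarks"]:
            rows.append({
                "run_id": run["run_id"],
                "gpu_config": result["gpu_config"],
                "model": result["model"],
                "variant": result["variant"],
                **bench,  # merge the per-benchmark metrics into the row
            })
    return rows

rows = flatten(json.loads(raw))
print(rows[0]["model"], rows[0]["overall_throughput"])
```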
## Deployment

### GitHub Pages

The dashboard can be deployed to GitHub Pages for public access:
- Copy the dashboard files to `docs/performance_dashboard/`
- Enable GitHub Pages in the repository settings
- Set up a GitHub Action to periodically update the metrics data
### Self-Hosted

For a self-hosted deployment with live data:

- Set up a server running `server.py`
- Configure a cron job or systemd timer to refresh data
- Optionally put it behind nginx or Caddy for SSL
## Metrics Explained

- **Overall Throughput**: Total tokens (input + output) processed per second
- **Input Throughput**: Input tokens processed per second (prefill speed)
- **Output Throughput**: Output tokens generated per second (decode speed)
- **Latency**: End-to-end time to complete the request
- **TTFT**: Time to First Token, the time until the first output token is produced
- **Acc Length**: Acceptance length for speculative decoding (MTP variants)
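The sample record in the Data Structure section is internally consistent with these definitions: with batch size 1, the three throughputs appear to follow from the token counts and the two timing fields. The following is a sanity check on that relationship, not a definitive statement of how the benchmark computes its numbers:

```python
# Values taken from the sample benchmark record above
batch_size, input_len, output_len = 1, 4096, 512
latency_ms, ttft_ms = 2400.72, 191.32

tokens_in = batch_size * input_len
tokens_out = batch_size * output_len

# Prefill speed: input tokens over time-to-first-token
input_throughput = tokens_in / (ttft_ms / 1000)
# Decode speed: output tokens over the remaining (decode) time
output_throughput = tokens_out / ((latency_ms - ttft_ms) / 1000)
# Overall: all tokens over end-to-end latency
overall_throughput = (tokens_in + tokens_out) / (latency_ms / 1000)

# These land within rounding distance of the recorded 21408.64, 231.74, 1919.43
print(round(input_throughput), round(output_throughput), round(overall_throughput))
```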
## Contributing

To add support for new metrics or visualizations:

- Update `fetch_metrics.py` if data collection needs changes
- Modify `app.js` to add new chart types or filters
- Update `index.html` for UI changes
## Troubleshooting

### No data displayed

- Check the browser console for errors
- Verify the GitHub API is accessible
- Try running the server with `python server.py --fetch-on-start`
### API rate limits

- Use a GitHub token for higher rate limits
- The server caches data for 5 minutes
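The 5-minute cache behaviour can be pictured as a simple time-to-live map. This is a minimal sketch of the idea, not how `server.py` is actually implemented; the injectable `clock` parameter is an illustrative testing convenience:

```python
import time

class TTLCache:
    """A minimal time-based cache: entries expire ttl_seconds after insertion."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable for deterministic testing
        self._store = {}     # key -> (expires_at, value)

    def get(self, key):
        """Return the cached value, or None if missing or expired."""
        entry = self._store.get(key)
        if entry and entry[0] > self.clock():
            return entry[1]
        return None

    def put(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)
```

On a cache miss the server would re-fetch from GitHub and `put` the fresh payload, so repeated dashboard loads within the window cost no API calls.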
### Charts not rendering

- Ensure Chart.js is loading from the CDN
- Check for JavaScript errors in the console