KokosDev committed
Commit 490d677 · verified · 1 parent: 5083c57

Deploy nsys-llm-explainer Gradio Space

Files changed (5):
  1. README.md +62 -6
  2. app.py +263 -0
  3. requirements.txt +3 -0
  4. sample_report.json +761 -0
  5. space_utils.py +376 -0
README.md CHANGED
@@ -1,12 +1,68 @@
  ---
- title: Nsys Llm Explainer
- emoji: 📊
- colorFrom: red
- colorTo: purple
  sdk: gradio
- sdk_version: 6.9.0
  app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: nsys-llm-explainer — Instant Nsight Trace Analyzer for Cloud LLM Inference
+ emoji: "📈"
+ colorFrom: blue
+ colorTo: green
  sdk: gradio
+ sdk_version: 4.44.0
  app_file: app.py
  pinned: false
  ---

+ # nsys-llm-explainer — Instant Nsight Trace Analyzer for Cloud LLM Inference
+
+ This folder is a production-ready Hugging Face Space payload for the `nsys-llm-explainer` project.
+
+ It turns an uploaded `trace.sqlite`, `.db`, or `report.json` into:
+
+ - Prioritized findings with evidence and recommendations
+ - Kernel, NCCL, barrier, and launch-latency summaries
+ - NVLink-over-NCCL correlation when GPU metrics are available
+ - Markdown preview of the full report
+ - Downloadable `report.md`, `report.json`, CSV tables, and a zip bundle
+
+ ## Files
+
+ - `app.py`: Gradio app entrypoint
+ - `space_utils.py`: analysis and artifact helpers
+ - `requirements.txt`: Space dependencies
+
+ ## Deploy on Hugging Face Spaces
+
+ 1. Create a new Space using the `Gradio` SDK.
+ 2. Copy the contents of this folder into the Space repository root.
+ 3. Keep `requirements.txt` in place so the Space installs the analyzer package and Gradio runtime.
+ 4. Push the repo. Hugging Face will build the Space automatically.
+ 5. Open the app and upload a `trace.sqlite` or `report.json`.
+
+ ## Duplicate and pin
+
+ If you want a reproducible Space, keep the Git dependency pinned to a release tag in `requirements.txt`.
+
+ If you want the Space to follow the latest `main` branch instead, change:
+
+ ```txt
+ git+https://github.com/KOKOSde/nsys-llm-explainer.git@v0.3.0
+ ```
+
+ to:
+
+ ```txt
+ git+https://github.com/KOKOSde/nsys-llm-explainer.git@main
+ ```
+
+ ## Operational notes
+
+ - The app works with uploaded SQLite exports directly, so there is no need to pre-generate artifacts.
+ - If a trace is missing NCCL or GPU metrics tables, the UI still loads and explains which analyses are unavailable.
+ - For private traces, use a private Space.
+
+ ## Local run
+
+ From this repository root:
+
+ ```bash
+ PYTHONPATH=src python3 spaces/hf_space/app.py
+ ```
+
+ If you are running the folder standalone, first install the dependencies from `requirements.txt`.
app.py ADDED
@@ -0,0 +1,263 @@
+ from __future__ import annotations
+
+ from pathlib import Path
+ from typing import Any, Optional, Sequence, Tuple
+
+ import pandas as pd
+ import gradio as gr
+
+ from space_utils import SpaceBundle, analyze_path, coerce_upload_path, find_local_sample
+
+
+ APP_TITLE = "nsys-llm-explainer — Instant Nsight Trace Analyzer for Cloud LLM Inference"
+
+ CSS = """
+ .gradio-container {
+     background:
+         radial-gradient(circle at top left, rgba(42, 93, 142, 0.35), transparent 30%),
+         radial-gradient(circle at top right, rgba(20, 104, 117, 0.22), transparent 26%),
+         linear-gradient(180deg, #081018 0%, #0b111a 42%, #090e15 100%);
+     color: #e6eef7;
+     font-family: "Aptos", "Segoe UI", sans-serif;
+ }
+
+ .hero-card {
+     border: 1px solid rgba(115, 145, 180, 0.28);
+     border-radius: 22px;
+     background: linear-gradient(135deg, rgba(14, 22, 34, 0.95), rgba(10, 14, 20, 0.92));
+     box-shadow: 0 24px 70px rgba(0, 0, 0, 0.28);
+     padding: 22px 24px;
+     margin-bottom: 16px;
+ }
+
+ .hero-kicker {
+     text-transform: uppercase;
+     letter-spacing: 0.18em;
+     color: #8fb4d9;
+     font-size: 11px;
+     font-weight: 700;
+ }
+
+ .hero-title {
+     margin: 10px 0 10px;
+     font-size: 34px;
+     line-height: 1.05;
+     font-weight: 800;
+     color: #f3f8ff;
+ }
+
+ .hero-subtitle {
+     color: #b2c5d9;
+     font-size: 15px;
+     line-height: 1.6;
+     max-width: 980px;
+ }
+
+ .badge-row {
+     display: flex;
+     flex-wrap: wrap;
+     gap: 8px;
+     margin-top: 16px;
+ }
+
+ .badge {
+     display: inline-flex;
+     align-items: center;
+     padding: 6px 12px;
+     border-radius: 999px;
+     border: 1px solid rgba(137, 171, 207, 0.28);
+     background: rgba(13, 21, 31, 0.82);
+     color: #d8e6f5;
+     font-size: 12px;
+ }
+
+ .upload-card {
+     border: 1px solid rgba(88, 113, 143, 0.26);
+     border-radius: 18px;
+     background: rgba(10, 16, 24, 0.86);
+     padding: 14px;
+     margin-bottom: 14px;
+ }
+
+ .section-title {
+     color: #f4f8fd;
+     font-size: 16px;
+     font-weight: 700;
+     margin: 0 0 10px 0;
+ }
+
+ .gr-markdown, .prose {
+     color: #e8eff7;
+ }
+
+ .wrap-long {
+     white-space: pre-wrap;
+     word-break: break-word;
+ }
+ """
+
+ HEADER = """
+ <div class="hero-card">
+     <div class="hero-kicker">Cloud ML trace intelligence</div>
+     <div class="hero-title">nsys-llm-explainer — Instant Nsight Trace Analyzer for Cloud LLM Inference</div>
+     <div class="hero-subtitle">
+         Upload a `trace.sqlite` or `report.json` and get prioritized findings, NCCL/NVLink correlation, launch storm diagnosis,
+         per-process breakdowns, and downloadable analysis artifacts. The same code path powers the CLI, dashboard, and this Space.
+     </div>
+     <div class="badge-row">
+         <span class="badge">SQLite + report.json input</span>
+         <span class="badge">Evidence-backed findings</span>
+         <span class="badge">CSV + JSON downloads</span>
+         <span class="badge">Built for cloud LLM traces</span>
+     </div>
+ </div>
+ """
+
+
+ def _empty_outputs(message: str) -> Tuple[Any, str, pd.DataFrame, str, str, list[str], pd.DataFrame]:
+     empty_df = pd.DataFrame(columns=["section", "metric", "value"])
+     empty_manifest = pd.DataFrame(columns=["artifact", "purpose", "path"])
+     return (
+         message,
+         message,
+         empty_df,
+         message,
+         message,
+         [],
+         empty_manifest,
+     )
+
+
+ def _bundle_to_outputs(bundle: SpaceBundle) -> Tuple[Any, str, pd.DataFrame, str, str, list[str], pd.DataFrame]:
+     summary_df = pd.DataFrame(bundle.summary_rows)
+     manifest_df = pd.DataFrame(bundle.manifest_rows)
+     bottleneck = next((row["value"] for row in bundle.summary_rows if row.get("metric") == "Top bottleneck"), "No bottleneck summary available")
+     summary_markdown = [
+         "### Quick read",
+         "",
+         "- Source: `{}` (`{}`)".format(bundle.source_path.name, bundle.source_kind),
+         "- {}".format(bundle.report.get("generated_at") or "Generated time unavailable"),
+         "- {}".format(bottleneck),
+         "- Warnings: `{}`".format(len(bundle.report.get("warnings") or [])),
+     ]
+     files = [str(path) for path in bundle.artifact_paths]
+     return (
+         bundle.status_markdown,
+         "\n".join(summary_markdown),
+         summary_df,
+         bundle.findings_markdown,
+         bundle.markdown,
+         files,
+         manifest_df,
+     )
+
+
+ def _resolve_path(uploaded: Any, sample_path: str) -> Optional[Path]:
+     uploaded_path = coerce_upload_path(uploaded)
+     if uploaded_path:
+         return uploaded_path
+     if sample_path:
+         candidate = Path(sample_path)
+         if candidate.exists():
+             return candidate
+     return None
+
+
+ def _run_analysis(uploaded: Any, sample_path: str) -> Tuple[Any, str, pd.DataFrame, str, str, list[str], pd.DataFrame]:
+     path = _resolve_path(uploaded, sample_path)
+     if not path:
+         return _empty_outputs(
+             "Upload a `trace.sqlite`/`.db` file or a `report.json` to generate the report. "
+             "If you are using this Space as a demo, click `Load sample trace` first."
+         )
+     try:
+         bundle = analyze_path(path)
+         return _bundle_to_outputs(bundle)
+     except Exception as exc:
+         message = "Failed to analyze `{}`: `{}`".format(path.name, exc)
+         return _empty_outputs(message)
+
+
+ def _build_demo(sample_path: Optional[Path]) -> gr.Blocks:
+     with gr.Blocks(title=APP_TITLE, css=CSS, theme=gr.themes.Soft(primary_hue="blue", secondary_hue="slate")) as demo:
+         gr.HTML(HEADER)
+         with gr.Row(elem_classes=["upload-card"]):
+             with gr.Column(scale=6):
+                 upload = gr.File(
+                     label="Upload trace or report",
+                     file_count="single",
+                     file_types=[".sqlite", ".db", ".json"],
+                     type="filepath",
+                 )
+             with gr.Column(scale=2, min_width=180):
+                 analyze_btn = gr.Button("Analyze trace", variant="primary")
+             with gr.Column(scale=2, min_width=180):
+                 sample_btn = gr.Button(
+                     "Load sample trace",
+                     variant="secondary",
+                     visible=bool(sample_path),
+                 )
+
+         status = gr.Markdown("Upload a trace or report to begin.")
+         sample_state = gr.State(str(sample_path) if sample_path else "")
+
+         with gr.Tabs():
+             with gr.Tab("Summary"):
+                 gr.Markdown("### Summary")
+                 summary = gr.Markdown(elem_classes=["wrap-long"])
+                 summary_table = gr.Dataframe(
+                     headers=["section", "metric", "value"],
+                     datatype=["str", "str", "str"],
+                     interactive=False,
+                     wrap=True,
+                     label="Key metrics",
+                 )
+             with gr.Tab("Findings"):
+                 findings = gr.Markdown(elem_classes=["wrap-long"])
+             with gr.Tab("Markdown"):
+                 report_markdown = gr.Markdown(elem_classes=["wrap-long"])
+             with gr.Tab("Downloads"):
+                 gr.Markdown(
+                     "### Generated artifacts\n"
+                     "The analysis writes `report.md`, `report.json`, CSV tables, and a zip bundle."
+                 )
+                 manifest = gr.Dataframe(
+                     headers=["artifact", "purpose", "path"],
+                     datatype=["str", "str", "str"],
+                     interactive=False,
+                     wrap=True,
+                     label="Artifact manifest",
+                 )
+                 downloads = gr.File(
+                     label="Download files",
+                     file_count="multiple",
+                     type="filepath",
+                 )
+
+         analyze_btn.click(
+             fn=_run_analysis,
+             inputs=[upload, sample_state],
+             outputs=[status, summary, summary_table, findings, report_markdown, downloads, manifest],
+         )
+         if sample_path:
+             sample_btn.click(
+                 fn=lambda sp: _run_analysis(None, sp),
+                 inputs=[sample_state],
+                 outputs=[status, summary, summary_table, findings, report_markdown, downloads, manifest],
+             )
+         demo.load(
+             fn=lambda sp: _run_analysis(None, sp),
+             inputs=[sample_state],
+             outputs=[status, summary, summary_table, findings, report_markdown, downloads, manifest],
+         )
+     return demo
+
+
+ def main() -> None:
+     demo = _build_demo(find_local_sample())
+     demo.queue()
+     demo.launch()
+
+
+ if __name__ == "__main__":
+     main()
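The `_bundle_to_outputs` helper above assembles a short "Quick read" markdown block from the bundle's summary rows and report dict. A minimal standalone sketch of that assembly, using hypothetical sample inputs in place of a real `SpaceBundle`:

```python
# Standalone sketch of the "Quick read" assembly in _bundle_to_outputs.
# The inputs below are hypothetical sample values, not a real trace.
def quick_read(source_name: str, source_kind: str, report: dict, summary_rows: list) -> str:
    # Pick the "Top bottleneck" row if present, with the same fallback string
    # the app uses when no bottleneck summary is available.
    bottleneck = next(
        (row["value"] for row in summary_rows if row.get("metric") == "Top bottleneck"),
        "No bottleneck summary available",
    )
    lines = [
        "### Quick read",
        "",
        "- Source: `{}` (`{}`)".format(source_name, source_kind),
        "- {}".format(report.get("generated_at") or "Generated time unavailable"),
        "- {}".format(bottleneck),
        "- Warnings: `{}`".format(len(report.get("warnings") or [])),
    ]
    return "\n".join(lines)

md = quick_read(
    "trace.sqlite",
    "sqlite",
    {"generated_at": "2026-03-11T03:52:08+00:00", "warnings": []},
    [{"metric": "Top bottleneck", "value": "NCCL allreduce stalls"}],
)
```

The fallback strings mirror the ones in `app.py`, so the summary tab renders something sensible even for sparse reports.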
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ gradio>=4.44.0
+ pandas>=1.5.0
+ git+https://github.com/KOKOSde/nsys-llm-explainer.git@v0.3.0
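The analyzer's SQL (visible in the bundled sample report) recovers process IDs from Nsight's packed `globalPid`/`globalTid` fields, using either `CAST(x / 16777216 AS INT) % 16777216` or `(x >> 24) & 16777215`. Since 16777216 is 2^24, the two forms are equivalent; a quick sketch with a hypothetical packed value:

```python
PID_SHIFT = 24              # the analyzer's SQL treats bits 24..47 as the PID
PID_MASK = (1 << 24) - 1    # 16777215

def pid_from_global(global_id: int) -> int:
    # Equivalent to the SQL form: CAST(global_id / 16777216 AS INT) % 16777216
    return (global_id >> PID_SHIFT) & PID_MASK

# Hypothetical packed id: pid 111 in the PID bits, a thread id 4242 below them.
packed = (111 << PID_SHIFT) | 4242
assert pid_from_global(packed) == 111
assert pid_from_global(packed) == (packed // 16777216) % 16777216
```

This is why the sample report attributes kernels and runtime calls to pids 111 and 222 even though the tables only store packed global ids.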
sample_report.json ADDED
@@ -0,0 +1,761 @@
+ {
+     "findings": [
+         {
+             "evidence": [
+                 "Top kernel `computeKernel` is 42.6% of total kernel time."
+             ],
+             "recommendation": [
+                 "Focus optimization effort on this kernel first."
+             ],
+             "severity": "medium",
+             "title": "Single kernel is a large share of GPU time"
+         },
+         {
+             "evidence": [
+                 "Top sync-like call `cudaStreamSynchronize` total 0.80 ms across 1 calls.",
+                 "All sync-like calls total 1.50 ms."
+             ],
+             "recommendation": [
+                 "Look for `cudaDeviceSynchronize` / stream waits in your serving loop and remove unnecessary barriers.",
+                 "Prefer async launches and overlap CPU work with GPU execution; avoid per-token synchronization."
+             ],
+             "severity": "medium",
+             "title": "CPU\u2194GPU synchronization detected (runtime API)"
+         }
+     ],
+     "generated_at": "2026-03-11T03:52:08.704882+00:00",
+     "metrics": {
+         "barriers": {
+             "barriers": [
+                 {
+                     "api_name": "cudaStreamSynchronize",
+                     "avg_duration_us": 800.0,
+                     "barrier_kind": "sync_api",
+                     "count": 1,
+                     "max_duration_us": 800.0,
+                     "total_time_ms": 0.8
+                 },
+                 {
+                     "api_name": "cudaDeviceSynchronize",
+                     "avg_duration_us": 700.0,
+                     "barrier_kind": "sync_api",
+                     "count": 1,
+                     "max_duration_us": 700.0,
+                     "total_time_ms": 0.7
+                 },
+                 {
+                     "api_name": "cudaMemcpy",
+                     "avg_duration_us": 600.0,
+                     "barrier_kind": "blocking_memcpy",
+                     "count": 1,
+                     "max_duration_us": 600.0,
+                     "total_time_ms": 0.6
+                 },
+                 {
+                     "api_name": "cpu_launcher_gap",
+                     "avg_duration_us": 200.0,
+                     "barrier_kind": "cpu_launcher_gap",
+                     "count": 1,
+                     "max_duration_us": 200.0,
+                     "total_time_ms": 0.2
+                 }
+             ],
+             "barriers_by_pid": [
+                 {
+                     "api_name": "cudaStreamSynchronize",
+                     "avg_duration_us": 800.0,
+                     "barrier_kind": "sync_api",
+                     "count": 1,
+                     "max_duration_us": 800.0,
+                     "pid": 111,
+                     "total_time_ms": 0.8
+                 },
+                 {
+                     "api_name": "cudaMemcpy",
+                     "avg_duration_us": 600.0,
+                     "barrier_kind": "blocking_memcpy",
+                     "count": 1,
+                     "max_duration_us": 600.0,
+                     "pid": 111,
+                     "total_time_ms": 0.6
+                 },
+                 {
+                     "api_name": "cpu_launcher_gap",
+                     "avg_duration_us": 200.0,
+                     "barrier_kind": "cpu_launcher_gap",
+                     "count": 1,
+                     "max_duration_us": 200.0,
+                     "pid": 111,
+                     "total_time_ms": 0.2
+                 },
+                 {
+                     "api_name": "cudaDeviceSynchronize",
+                     "avg_duration_us": 700.0,
+                     "barrier_kind": "sync_api",
+                     "count": 1,
+                     "max_duration_us": 700.0,
+                     "pid": 222,
+                     "total_time_ms": 0.7
+                 }
+             ],
+             "launcher_gap_threshold_us": 50.0,
+             "notes": [],
+             "pids": [
+                 {
+                     "barrier_event_count": 3,
+                     "pid": 111,
+                     "top_barrier": "cudaStreamSynchronize",
+                     "top_barrier_kind": "sync_api",
+                     "total_barrier_time_ms": 1.6
+                 },
+                 {
+                     "barrier_event_count": 1,
+                     "pid": 222,
+                     "top_barrier": "cudaDeviceSynchronize",
+                     "top_barrier_kind": "sync_api",
+                     "total_barrier_time_ms": 0.7
+                 }
+             ],
+             "present": true,
+             "sql": {
+                 "runtime_barriers": "SELECT (CAST(r.globalTid / 16777216 AS INT) % 16777216) AS pid, s.value AS api_name, r.start AS start_ns, r.end AS end_ns FROM CUPTI_ACTIVITY_KIND_RUNTIME r JOIN StringIds s ON s.id = r.nameId WHERE r.end IS NOT NULL AND r.end > r.start AND (LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ?) ORDER BY pid, start_ns"
+             }
+         },
+         "by_pid": {
+             "kernels": {
+                 "kernel_table": "CUPTI_ACTIVITY_KIND_KERNEL",
+                 "kernels": [
+                     {
+                         "avg_duration_us": 1000.0,
+                         "call_count": 2,
+                         "device_id": 0,
+                         "kernel_name": "computeKernel",
+                         "pct_of_pid_kernel_time": 50.0,
+                         "pct_of_total_kernel_time": 32.78688524590164,
+                         "pid": 111,
+                         "pid_pct_of_total_kernel_time": 65.57377049180327,
+                         "pid_total_kernel_time_ms": 4.0,
+                         "total_time_ms": 2.0
+                     },
+                     {
+                         "avg_duration_us": 2000.0,
+                         "call_count": 1,
+                         "device_id": 0,
+                         "kernel_name": "ncclAllReduceRingKernel",
+                         "pct_of_pid_kernel_time": 50.0,
+                         "pct_of_total_kernel_time": 32.78688524590164,
+                         "pid": 111,
+                         "pid_pct_of_total_kernel_time": 65.57377049180327,
+                         "pid_total_kernel_time_ms": 4.0,
+                         "total_time_ms": 2.0
+                     },
+                     {
+                         "avg_duration_us": 1500.0,
+                         "call_count": 1,
+                         "device_id": 0,
+                         "kernel_name": "ncclBroadcastRingKernel",
+                         "pct_of_pid_kernel_time": 71.42857142857143,
+                         "pct_of_total_kernel_time": 24.59016393442623,
+                         "pid": 222,
+                         "pid_pct_of_total_kernel_time": 34.42622950819672,
+                         "pid_total_kernel_time_ms": 2.1,
+                         "total_time_ms": 1.5
+                     },
+                     {
+                         "avg_duration_us": 600.0,
+                         "call_count": 1,
+                         "device_id": 0,
+                         "kernel_name": "computeKernel",
+                         "pct_of_pid_kernel_time": 28.57142857142857,
+                         "pct_of_total_kernel_time": 9.836065573770492,
+                         "pid": 222,
+                         "pid_pct_of_total_kernel_time": 34.42622950819672,
+                         "pid_total_kernel_time_ms": 2.1,
+                         "total_time_ms": 0.6
+                     }
+                 ],
+                 "notes": [],
+                 "pid_quality": {
+                     "pid0_fraction": 0.0,
+                     "pid0_rows": 0,
+                     "pid_ge_10m_fraction": 0.0,
+                     "pid_ge_10m_rows": 0,
+                     "present": true,
+                     "rows_with_pid": 5
+                 },
+                 "pid_source": "globalPid",
+                 "pids": [
+                     {
+                         "kernel_count": 3,
+                         "pct_of_total_kernel_time": 65.57377049180327,
+                         "pid": 111,
+                         "total_kernel_time_ms": 4.0,
+                         "total_kernel_time_ns": 4000000
+                     },
+                     {
+                         "kernel_count": 2,
+                         "pct_of_total_kernel_time": 34.42622950819672,
+                         "pid": 222,
+                         "total_kernel_time_ms": 2.1,
+                         "total_kernel_time_ns": 2100000
+                     }
+                 ],
+                 "present": true,
+                 "sql": {
+                     "kernels": "SELECT (CAST(k.globalPid / 16777216 AS INT) % 16777216) AS pid, s.value AS kernel_name, k.deviceId AS device_id, COUNT(*) AS call_count, SUM(k.end-k.start) AS total_ns, AVG(k.end-k.start) AS avg_ns FROM CUPTI_ACTIVITY_KIND_KERNEL k JOIN StringIds s ON s.id = k.demangledName WHERE ((CAST(k.globalPid / 16777216 AS INT) % 16777216)) IN (?,?) GROUP BY pid, kernel_name, device_id ORDER BY pid, total_ns DESC",
+                     "pid_quality": "SELECT COUNT(*) AS rows_with_pid, SUM(CASE WHEN (CAST(k.globalPid / 16777216 AS INT) % 16777216) = 0 THEN 1 ELSE 0 END) AS pid0_rows, SUM(CASE WHEN (CAST(k.globalPid / 16777216 AS INT) % 16777216) >= 10000000 THEN 1 ELSE 0 END) AS pid_ge_10m_rows FROM CUPTI_ACTIVITY_KIND_KERNEL k WHERE k.globalPid IS NOT NULL",
+                     "top_pids": "SELECT (CAST(k.globalPid / 16777216 AS INT) % 16777216) AS pid, SUM(k.end-k.start) AS total_ns, COUNT(*) AS kernel_count FROM CUPTI_ACTIVITY_KIND_KERNEL k WHERE k.globalPid IS NOT NULL GROUP BY pid ORDER BY total_ns DESC LIMIT ?"
+                 }
+             },
+             "nvtx": {
+                 "notes": [
+                     "No NVTX ranges found."
+                 ],
+                 "present": false,
+                 "sql": {}
+             },
+             "nvtx_kernel_phases": null,
+             "sync": {
+                 "notes": [],
+                 "pid_source": "globalTid",
+                 "pids": [
+                     {
+                         "pid": 111,
+                         "sync_total_time_ms": 0.8
+                     },
+                     {
+                         "pid": 222,
+                         "sync_total_time_ms": 0.7
+                     }
+                 ],
+                 "present": true,
+                 "runtime_table": "CUPTI_ACTIVITY_KIND_RUNTIME",
+                 "sql": {
+                     "sync_by_pid": "SELECT (CAST(r.globalTid / 16777216 AS INT) % 16777216) AS pid, s.value AS api_name, COUNT(*) AS call_count, SUM(r.end-r.start) AS total_ns, AVG(r.end-r.start) AS avg_ns FROM CUPTI_ACTIVITY_KIND_RUNTIME r JOIN StringIds s ON s.id = r.nameId WHERE ((s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?)) AND (r.globalTid IS NOT NULL) GROUP BY pid, api_name ORDER BY total_ns DESC LIMIT ?"
+                 },
+                 "sync_calls": [
+                     {
+                         "api_name": "cudaStreamSynchronize",
+                         "avg_duration_us": 800.0,
+                         "call_count": 1,
+                         "pid": 111,
+                         "total_time_ms": 0.8
+                     },
+                     {
+                         "api_name": "cudaDeviceSynchronize",
+                         "avg_duration_us": 700.0,
+                         "call_count": 1,
+                         "pid": 222,
+                         "total_time_ms": 0.7
+                     }
+                 ]
+             }
+         },
+         "gpu_idle": {
+             "devices": [
+                 {
+                     "busy_ms": 4.5,
+                     "device_id": 0,
+                     "idle_ms": 1.0,
+                     "idle_pct_of_window": 18.181818181818183,
+                     "window_ms": 5.5
+                 }
+             ],
+             "gaps": [
+                 {
+                     "device_id": 0,
+                     "gap_end_ns": 4000000,
+                     "gap_ms": 1.0,
+                     "gap_start_ns": 3000000
+                 }
+             ],
+             "notes": [],
+             "sql": {
+                 "events": "SELECT start, end, deviceId AS device_id FROM CUPTI_ACTIVITY_KIND_KERNEL ORDER BY device_id, start"
+             },
+             "table": "CUPTI_ACTIVITY_KIND_KERNEL"
+         },
+         "launch_storm": {
+             "is_launch_storm": false,
+             "launches_per_s": 909.0909090909091,
+             "median_kernel_us": 1000.0,
+             "notes": [],
+             "p50_kernel_us": 1000.0,
+             "p90_kernel_us": 1800.0,
+             "p99_kernel_us": 1980.0,
+             "pct_under_10us": 0.0,
+             "pct_under_20us": 0.0,
+             "pct_under_5us": 0.0,
+             "sql": {
+                 "tiny_kernels": "SELECT s.value AS kernel_name, COUNT(*) AS call_count, AVG(k.end-k.start) AS avg_dur_ns FROM CUPTI_ACTIVITY_KIND_KERNEL k JOIN StringIds s ON s.id = k.demangledName WHERE (k.end-k.start) <= ? GROUP BY kernel_name ORDER BY call_count DESC LIMIT ?"
+             },
+             "storm_thresholds": {
+                 "launches_per_s_threshold_1": 50000.0,
+                 "launches_per_s_threshold_2": 100000.0,
+                 "p50_kernel_us_threshold_1": 10.0,
+                 "p50_kernel_us_threshold_2": 20.0
+             },
+             "tiny_kernel_us": 5.0,
+             "tiny_kernels": [],
+             "total_launches": 5,
+             "window_s": 0.0055
+         },
+         "nccl": {
+             "event_count": 2,
+             "notes": [
+                 "Using NCCL kernel names as NCCL windows; collective names may be inferred only from kernel names."
+             ],
+             "ops": [
+                 {
+                     "avg_duration_us": 2000.0,
+                     "compute_overlap_ms": 1.0,
+                     "compute_overlap_pct": 50.0,
+                     "count": 1,
+                     "max_duration_ms": 2.0,
+                     "op_name": "allreduce",
+                     "raw_name_example": "ncclAllReduceRingKernel",
+                     "source": "kernel",
+                     "straggler": "pid:111",
+                     "straggler_max_ms": 2.0,
+                     "straggler_total_ms": 2.0,
+                     "total_time_ms": 2.0
+                 },
+                 {
+                     "avg_duration_us": 1500.0,
+                     "compute_overlap_ms": 0.6,
+                     "compute_overlap_pct": 40.0,
+                     "count": 1,
+                     "max_duration_ms": 1.5,
+                     "op_name": "broadcast",
+                     "raw_name_example": "ncclBroadcastRingKernel",
+                     "source": "kernel",
+                     "straggler": "pid:222",
+                     "straggler_max_ms": 1.5,
+                     "straggler_total_ms": 1.5,
+                     "total_time_ms": 1.5
+                 }
+             ],
+             "pids": [
+                 {
+                     "max_duration_ms": 2.0,
+                     "nccl_event_count": 1,
+                     "pid": 111,
+                     "top_nccl_op": "allreduce",
+                     "total_nccl_time_ms": 2.0
+                 },
+                 {
+                     "max_duration_ms": 1.5,
+                     "nccl_event_count": 1,
+                     "pid": 222,
+                     "top_nccl_op": "broadcast",
+                     "total_nccl_time_ms": 1.5
+                 }
+             ],
+             "present": true,
+             "source": "kernel",
+             "sql": {
+                 "compute_overlap": "SELECT k.start AS start_ns, k.end AS end_ns, s.value AS kernel_name FROM CUPTI_ACTIVITY_KIND_KERNEL k JOIN StringIds s ON s.id = k.demangledName WHERE k.end IS NOT NULL AND k.end > k.start ORDER BY k.start",
+                 "nccl_kernels": "SELECT (CAST(k.globalPid / 16777216 AS INT) % 16777216) AS pid, k.deviceId AS device_id, s.value AS kernel_name, k.start AS start_ns, k.end AS end_ns FROM CUPTI_ACTIVITY_KIND_KERNEL k JOIN StringIds s ON s.id = k.demangledName WHERE k.end IS NOT NULL AND k.end > k.start AND (LOWER(s.value) LIKE ?) ORDER BY k.start",
+                 "nccl_runtime": "SELECT (CAST(r.globalTid / 16777216 AS INT) % 16777216) AS pid, s.value AS api_name, r.start AS start_ns, r.end AS end_ns FROM CUPTI_ACTIVITY_KIND_RUNTIME r JOIN StringIds s ON s.id = r.nameId WHERE r.end IS NOT NULL AND r.end > r.start AND (LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ? OR LOWER(s.value) LIKE ?) ORDER BY r.start"
+             },
+             "windows": [
+                 {
+                     "end_ns": 3000000,
+                     "start_ns": 1000000
+                 },
+                 {
+                     "end_ns": 5500000,
+                     "start_ns": 4000000
+                 }
+             ]
+         },
+         "nvlink_during_nccl": {
+             "capture_instructions": [
+                 "NVLink counters not found in the SQLite export.",
+                 "List supported metric sets first: `nsys profile --gpu-metrics-devices=all --gpu-metrics-set=help`.",
+                 "Then re-capture with GPU Metrics enabled, for example: `sudo nsys profile --trace=nccl,cuda,nvtx,osrt --cuda-trace-scope=process-tree --gpu-metrics-devices=all --gpu-metrics-set=<supported-set> --gpu-metrics-frequency=10000 --cuda-graph-trace=node -o trace <app>`.",
+                 "Export again with SQLite output: `nsys export --type sqlite --output trace.sqlite --force-overwrite=true --lazy=false trace.nsys-rep`."
+             ],
+             "missing_counters": true,
+             "notes": [
+                 "GPU metric tables were not found in this export."
+             ],
+             "present": false,
+             "rows": [],
+             "sql": {}
+         },
+         "nvtx": {
+             "instances": [],
+             "notes": [
+                 "No NVTX table found."
+             ],
+             "ranges": [],
+             "sql": {},
+             "table": null
+         },
+         "nvtx_coverage_warn_threshold": 0.7,
+         "nvtx_kernel_phases": null,
+         "nvtx_kernel_time": {
+             "notes": [
+                 "Need kernel + runtime + NVTX tables for NVTX\u2192kernel attribution."
+             ],
+             "present": false,
+             "ranges": [],
+             "sql": {}
+         },
+         "nvtx_phases": null,
+         "per_pid": {
+             "notes": [],
+             "pid_source": "globalPid",
+             "pids": [
+                 {
+                     "launch_storm": {
+                         "is_launch_storm": false,
+                         "launches_per_s": 1000.0,
+                         "median_kernel_us": 1000.0,
+                         "p50_kernel_us": 1000.0,
+                         "p90_kernel_us": 2000.0,
+                         "p99_kernel_us": 2000.0,
+                         "pct_under_10us": 0.0,
+                         "pct_under_20us": 0.0,
+                         "pct_under_5us": 0.0,
+                         "storm_thresholds": {
+                             "launches_per_s_threshold_1": 50000.0,
+                             "launches_per_s_threshold_2": 100000.0,
+                             "p50_kernel_us_threshold_1": 10.0,
+                             "p50_kernel_us_threshold_2": 20.0
+                         },
+                         "total_launches": 3,
+                         "window_s": 0.003
+                     },
+                     "nvtx": {
+                         "notes": [
+                             "No NVTX table."
+                         ],
+                         "present": false,
+                         "ranges": [],
+                         "table": null
+                     },
+                     "pid": 111,
+                     "sync": {
+                         "notes": [],
+                         "present": true,
+                         "sql": "SELECT s.value AS api_name, COUNT(*) AS call_count, SUM(r.end-r.start) AS total_time_ns, AVG(r.end-r.start) AS avg_time_ns FROM CUPTI_ACTIVITY_KIND_RUNTIME r JOIN StringIds s ON s.id = r.nameId WHERE ((r.globalTid >> 24) & 16777215) = ? AND ((s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?)) GROUP BY api_name ORDER BY total_time_ns DESC LIMIT 50",
+                         "sync_calls": [
+                             {
+                                 "api_name": "cudaStreamSynchronize",
+                                 "avg_duration_us": 800.0,
+                                 "call_count": 1,
+                                 "total_time_ms": 0.8
+                             }
+                         ],
+                         "table": "CUPTI_ACTIVITY_KIND_RUNTIME"
+                     },
+                     "top_kernels": {
+                         "kernels": [
+                             {
+                                 "avg_duration_us": 1000.0,
+                                 "call_count": 2,
+                                 "device_id": 0,
+                                 "kernel_name": "computeKernel",
+                                 "total_time_ms": 2.0
+                             },
+                             {
+                                 "avg_duration_us": 2000.0,
+                                 "call_count": 1,
+                                 "device_id": 0,
+                                 "kernel_name": "ncclAllReduceRingKernel",
+                                 "total_time_ms": 2.0
+                             }
+                         ],
+                         "table": "CUPTI_ACTIVITY_KIND_KERNEL",
+                         "tiny_kernels": []
+                     }
+                 },
+                 {
+                     "launch_storm": {
+                         "is_launch_storm": false,
+                         "launches_per_s": 1333.3333333333333,
+                         "median_kernel_us": 600.0,
+                         "p50_kernel_us": 600.0,
+                         "p90_kernel_us": 1500.0,
+                         "p99_kernel_us": 1500.0,
+                         "pct_under_10us": 0.0,
+                         "pct_under_20us": 0.0,
+                         "pct_under_5us": 0.0,
+                         "storm_thresholds": {
+                             "launches_per_s_threshold_1": 50000.0,
+                             "launches_per_s_threshold_2": 100000.0,
+                             "p50_kernel_us_threshold_1": 10.0,
+                             "p50_kernel_us_threshold_2": 20.0
+                         },
+                         "total_launches": 2,
+                         "window_s": 0.0015
+                     },
+                     "nvtx": {
+                         "notes": [
+                             "No NVTX table."
+                         ],
+                         "present": false,
+                         "ranges": [],
+                         "table": null
+                     },
+                     "pid": 222,
+                     "sync": {
+                         "notes": [],
+                         "present": true,
+                         "sql": "SELECT s.value AS api_name, COUNT(*) AS call_count, SUM(r.end-r.start) AS total_time_ns, AVG(r.end-r.start) AS avg_time_ns FROM CUPTI_ACTIVITY_KIND_RUNTIME r JOIN StringIds s ON s.id = r.nameId WHERE ((r.globalTid >> 24) & 16777215) = ? AND ((s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?)) GROUP BY api_name ORDER BY total_time_ns DESC LIMIT 50",
+                         "sync_calls": [
+                             {
+                                 "api_name": "cudaDeviceSynchronize",
+                                 "avg_duration_us": 700.0,
+                                 "call_count": 1,
+                                 "total_time_ms": 0.7
+                             }
+                         ],
+                         "table": "CUPTI_ACTIVITY_KIND_RUNTIME"
+                     },
+                     "top_kernels": {
+                         "kernels": [
+                             {
+                                 "avg_duration_us": 1500.0,
+                                 "call_count": 1,
+                                 "device_id": 0,
+                                 "kernel_name": "ncclBroadcastRingKernel",
+                                 "total_time_ms": 1.5
+                             },
+                             {
+                                 "avg_duration_us": 600.0,
+                                 "call_count": 1,
+                                 "device_id": 0,
+                                 "kernel_name": "computeKernel",
+                                 "total_time_ms": 0.6
+                             }
+                         ],
+                         "table": "CUPTI_ACTIVITY_KIND_KERNEL",
+                         "tiny_kernels": []
+                     }
+                 }
+             ],
+             "present": true,
+             "sql": {
+                 "top_pids": "SELECT ((k.globalPid >> 24) & 16777215) AS pid, SUM(k.end-k.start) AS total_ns, COUNT(*) AS launches FROM CUPTI_ACTIVITY_KIND_KERNEL k WHERE k.end > k.start AND ((k.globalPid >> 24) & 16777215) IS NOT NULL GROUP BY pid ORDER BY total_ns DESC LIMIT ?"
+             },
+             "top_pids": [
+                 {
+                     "kernel_launches": 3,
+                     "pct_of_total_kernel_time": 65.57377049180327,
+                     "pid": 111,
+                     "total_kernel_time_ms": 4.0,
+                     "total_kernel_time_ns": 4000000
+                 },
+                 {
+                     "kernel_launches": 2,
+                     "pct_of_total_kernel_time": 34.42622950819672,
+                     "pid": 222,
+                     "total_kernel_time_ms": 2.1,
+                     "total_kernel_time_ns": 2100000
+                 }
+             ]
+         },
+         "pid_attribution": {
+             "kernel_pid_count": 2,
+             "kernel_pid_source": "globalPid",
+             "kernel_pids_sample": [
+                 111,
+                 222
+             ],
+             "nvtx_pid_count": 0,
+             "nvtx_pid_source": null,
+             "nvtx_pids_sample": [],
+             "runtime_pid_count": 2,
+             "runtime_pid_source": "globalTid",
+             "runtime_pids_sample": [
+                 111,
+                 222
+             ]
+         },
+         "sync": {
+             "notes": [],
+             "sql": {
+                 "sync_calls": "SELECT s.value AS api_name, COUNT(*) AS call_count, SUM(r.end - r.start) AS total_time_ns, AVG(r.end-r.start) AS avg_time_ns FROM CUPTI_ACTIVITY_KIND_RUNTIME r JOIN StringIds s ON s.id = r.nameId WHERE (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) OR (s.value LIKE ?) GROUP BY api_name ORDER BY total_time_ns DESC LIMIT ?"
+             },
+             "sync_calls": [
+                 {
+                     "api_name": "cudaStreamSynchronize",
+                     "avg_duration_us": 800.0,
+                     "call_count": 1,
+                     "total_time_ms": 0.8
+                 },
+                 {
+                     "api_name": "cudaDeviceSynchronize",
+                     "avg_duration_us": 700.0,
+                     "call_count": 1,
+                     "total_time_ms": 0.7
+                 }
+             ],
+             "table": "CUPTI_ACTIVITY_KIND_RUNTIME"
+         },
+         "top_kernels": {
+             "kernels": [
+                 {
+                     "avg_duration_us": 866.6666666666666,
+                     "call_count": 3,
+                     "device_id": 0,
+                     "kernel_name": "computeKernel",
+                     "max_duration_us": 1000.0,
+                     "min_duration_us": 600.0,
+                     "p50_duration_us": 1000.0,
609
+ "p90_duration_us": 1000.0,
610
+ "pct_total_kernel_time": 42.62295081967213,
611
+ "total_time_ms": 2.6,
612
+ "total_time_ns": 2600000
613
+ },
614
+ {
615
+ "avg_duration_us": 2000.0,
616
+ "call_count": 1,
617
+ "device_id": 0,
618
+ "kernel_name": "ncclAllReduceRingKernel",
619
+ "max_duration_us": 2000.0,
620
+ "min_duration_us": 2000.0,
621
+ "p50_duration_us": 2000.0,
622
+ "p90_duration_us": 2000.0,
623
+ "pct_total_kernel_time": 32.78688524590164,
624
+ "total_time_ms": 2.0,
625
+ "total_time_ns": 2000000
626
+ },
627
+ {
628
+ "avg_duration_us": 1500.0,
629
+ "call_count": 1,
630
+ "device_id": 0,
631
+ "kernel_name": "ncclBroadcastRingKernel",
632
+ "max_duration_us": 1500.0,
633
+ "min_duration_us": 1500.0,
634
+ "p50_duration_us": 1500.0,
635
+ "p90_duration_us": 1500.0,
636
+ "pct_total_kernel_time": 24.59016393442623,
637
+ "total_time_ms": 1.5,
638
+ "total_time_ns": 1500000
639
+ }
640
+ ],
641
+ "notes": [],
642
+ "sql": {
643
+ "agg": "SELECT s.value AS kernel_name, k.deviceId AS device_id, COUNT(*) AS call_count, SUM(k.end - k.start) AS total_time_ns, AVG(k.end - k.start) AS avg_time_ns, MIN(k.end - k.start) AS min_time_ns, MAX(k.end - k.start) AS max_time_ns FROM CUPTI_ACTIVITY_KIND_KERNEL k JOIN StringIds s ON s.id = k.demangledName GROUP BY kernel_name, device_id ORDER BY total_time_ns DESC LIMIT ?",
644
+ "durations": "SELECT (end-start) FROM CUPTI_ACTIVITY_KIND_KERNEL ... ORDER BY",
645
+ "total": "SELECT SUM(end - start) FROM CUPTI_ACTIVITY_KIND_KERNEL"
646
+ },
647
+ "table": "CUPTI_ACTIVITY_KIND_KERNEL",
648
+ "total_kernel_time_ns": 6100000
649
+ }
650
+ },
651
+ "schema": {
652
+ "capabilities": {
653
+ "cuda_graph_table": {
654
+ "present": false,
655
+ "table": null
656
+ },
657
+ "gpu_metrics_table": {
658
+ "present": false,
659
+ "table": null,
660
+ "target_info_table": null
661
+ },
662
+ "has_string_table": true,
663
+ "kernel_table": {
664
+ "has_correlationId": false,
665
+ "has_deviceId": true,
666
+ "has_globalPid": true,
667
+ "has_pid": false,
668
+ "has_processId": false,
669
+ "present": true
670
+ },
671
+ "nvtx_table": {
672
+ "has_end": false,
673
+ "has_globalTid": false,
674
+ "has_text": false,
675
+ "has_textId": false,
676
+ "present": false
677
+ },
678
+ "runtime_table": {
679
+ "has_correlationId": false,
680
+ "has_globalTid": true,
681
+ "has_name": false,
682
+ "has_nameId": true,
683
+ "has_pid": false,
684
+ "has_processId": false,
685
+ "present": true
686
+ }
687
+ },
688
+ "cuda_graph_table": null,
689
+ "gpu_metrics_table": null,
690
+ "kernel_pid_source": "globalPid",
691
+ "kernel_table": "CUPTI_ACTIVITY_KIND_KERNEL",
692
+ "nvtx_pid_source": null,
693
+ "nvtx_table": null,
694
+ "path": "synthetic fixture (raw trace.sqlite not committed)",
695
+ "runtime_pid_source": "globalTid",
696
+ "runtime_table": "CUPTI_ACTIVITY_KIND_RUNTIME",
697
+ "sqlite_version": "3.26.0",
698
+ "string_table": "StringIds",
699
+ "sync_table": "CUPTI_ACTIVITY_KIND_RUNTIME",
700
+ "tables": {
701
+ "CUPTI_ACTIVITY_KIND_KERNEL": {
702
+ "columns": [
703
+ "start",
704
+ "end",
705
+ "deviceId",
706
+ "contextId",
707
+ "streamId",
708
+ "globalPid",
709
+ "demangledName"
710
+ ],
711
+ "types": {
712
+ "contextId": "INT",
713
+ "demangledName": "INT",
714
+ "deviceId": "INT",
715
+ "end": "INT",
716
+ "globalPid": "INT",
717
+ "start": "INT",
718
+ "streamId": "INT"
719
+ }
720
+ },
721
+ "CUPTI_ACTIVITY_KIND_RUNTIME": {
722
+ "columns": [
723
+ "start",
724
+ "end",
725
+ "nameId",
726
+ "globalTid"
727
+ ],
728
+ "types": {
729
+ "end": "INT",
730
+ "globalTid": "INT",
731
+ "nameId": "INT",
732
+ "start": "INT"
733
+ }
734
+ },
735
+ "StringIds": {
736
+ "columns": [
737
+ "id",
738
+ "value"
739
+ ],
740
+ "types": {
741
+ "id": "INTEGER",
742
+ "value": "TEXT"
743
+ }
744
+ }
745
+ },
746
+ "target_info_gpu_metrics_table": null,
747
+ "timestamp_unit_assumed": "ns",
748
+ "timestamp_unit_guess": "ns_likely",
749
+ "timestamp_unit_guess_basis": "kernel_window_ns_ge_1ms"
750
+ },
751
+ "tool": {
752
+ "name": "nsys-llm-explain",
753
+ "version": "0.1.0"
754
+ },
755
+ "trace": {
756
+ "path": "synthetic fixture (raw trace.sqlite not committed)"
757
+ },
758
+ "warnings": [
759
+ "NVLink counters not found. The report cannot correlate NCCL windows with NVLink metrics for this export."
760
+ ]
761
+ }
space_utils.py ADDED
@@ -0,0 +1,376 @@
+ from __future__ import annotations
+
+ import json
+ import sys
+ import tempfile
+ import zipfile
+ from dataclasses import dataclass
+ from pathlib import Path
+ from typing import Any, Dict, List, Mapping, Optional, Sequence, Tuple
+
+
+ def _bootstrap_src_path() -> None:
+     # Make the analyzer package importable from a src/ layout, whether the
+     # module runs from the repo root or from the Space root.
+     here = Path(__file__).resolve()
+     for candidate in (here.parents[2] / "src", here.parents[1] / "src"):
+         if candidate.exists() and str(candidate) not in sys.path:
+             sys.path.insert(0, str(candidate))
+             return
+
+
+ _bootstrap_src_path()
+
+ from nsys_llm_explainer.queries import TraceDB  # type: ignore
+ from nsys_llm_explainer.report import AnalysisOutputs, analyze, render_markdown, write_artifacts  # type: ignore
+
+
+ @dataclass(frozen=True)
+ class SpaceBundle:
+     source_path: Path
+     source_kind: str
+     report: Dict[str, Any]
+     markdown: str
+     artifacts_dir: Path
+     artifact_paths: List[Path]
+     summary_rows: List[Dict[str, str]]
+     manifest_rows: List[Dict[str, str]]
+     findings_markdown: str
+     status_markdown: str
+
+
+ def _coerce_float(value: Any, default: float = 0.0) -> float:
+     try:
+         return float(value)
+     except Exception:
+         return float(default)
+
+
+ def _safe_text(value: Any, default: str = "-") -> str:
+     text = str(value).strip() if value is not None else ""
+     return text if text else default
+
+
+ def _safe_trace_name(report: Mapping[str, Any]) -> str:
+     trace_path = ((report.get("trace") or {}).get("path") or report.get("_source_name") or "")
+     return Path(str(trace_path)).name if trace_path else "unknown"
+
+
+ def _top_kernel_row(report: Mapping[str, Any]) -> Optional[Mapping[str, Any]]:
+     rows = ((report.get("metrics") or {}).get("top_kernels") or {}).get("kernels") or []
+     return rows[0] if rows else None
+
+
+ def _top_nccl_row(report: Mapping[str, Any]) -> Optional[Mapping[str, Any]]:
+     rows = ((report.get("metrics") or {}).get("nccl") or {}).get("ops") or []
+     return rows[0] if rows else None
+
+
+ def _format_ms(value: Any) -> str:
+     return "{:.3f} ms".format(_coerce_float(value))
+
+
+ def _format_us(value: Any) -> str:
+     return "{:.2f} us".format(_coerce_float(value))
+
+
+ def _format_pct(value: Any) -> str:
+     return "{:.1f}%".format(_coerce_float(value))
+
+
+ def _bottleneck_sentence(report: Mapping[str, Any]) -> str:
+     metrics = report.get("metrics") or {}
+     total_gpu_ms = _coerce_float((metrics.get("top_kernels") or {}).get("total_kernel_time_ns")) / 1_000_000.0
+     top_kernel = _top_kernel_row(report)
+     top_nccl = _top_nccl_row(report)
+     if total_gpu_ms > 0.0 and top_nccl:
+         nccl_pct = (_coerce_float(top_nccl.get("total_time_ms")) / total_gpu_ms) * 100.0
+         kernel_pct = _coerce_float(top_kernel.get("pct_total_kernel_time") if top_kernel else 0.0)
+         if nccl_pct >= kernel_pct:
+             return "{} dominates {:.1f}% of GPU time".format(str(top_nccl.get("op_name") or "NCCL"), nccl_pct)
+     if top_kernel:
+         return "{} dominates {:.1f}% of GPU time".format(
+             str(top_kernel.get("kernel_name") or "Top kernel"),
+             _coerce_float(top_kernel.get("pct_total_kernel_time")),
+         )
+     return "No dominant GPU bottleneck detected from available metrics"
+
+
+ def _summary_rows(report: Mapping[str, Any]) -> List[Dict[str, str]]:
+     metrics = report.get("metrics") or {}
+     timeline = metrics.get("timeline") or {}
+     gpu_total_ms = _coerce_float(timeline.get("total_gpu_time_ms"))
+     if gpu_total_ms <= 0:
+         gpu_total_ms = _coerce_float((metrics.get("top_kernels") or {}).get("total_kernel_time_ns")) / 1_000_000.0
+     cpu_total_ms = _coerce_float(timeline.get("total_cpu_time_ms"))
+     if cpu_total_ms <= 0:
+         sync_rows = (metrics.get("sync") or {}).get("sync_calls") or []
+         cpu_total_ms = sum(_coerce_float(row.get("total_time_ms")) for row in sync_rows)
+
+     warnings = report.get("warnings") or []
+     report_version = _safe_text((report.get("tool") or {}).get("version"), default="unknown")
+     top_kernel = _top_kernel_row(report)
+     top_nccl = _top_nccl_row(report)
+     nvlink = (metrics.get("nvlink_during_nccl") or {}).get("rows") or []
+     nvlink_row = nvlink[0] if nvlink else None
+     capability_checks = {
+         "Kernel table": bool((metrics.get("top_kernels") or {}).get("present")),
+         "Runtime table": bool((metrics.get("sync") or {}).get("present")),
+         "NVTX ranges": bool((metrics.get("nvtx") or {}).get("present")),
+         "GPU metrics": bool((metrics.get("nvlink_during_nccl") or {}).get("present")),
+         "Per-process breakdown": bool((metrics.get("per_pid") or {}).get("present")),
+     }
+
+     rows: List[Dict[str, str]] = [
+         {"section": "Overview", "metric": "Trace", "value": _safe_trace_name(report)},
+         {"section": "Overview", "metric": "Tool version", "value": report_version},
+         {"section": "Overview", "metric": "Generated at (UTC)", "value": _safe_text(report.get("generated_at"))},
+         {"section": "Overview", "metric": "Total GPU time", "value": _format_ms(gpu_total_ms)},
+         {"section": "Overview", "metric": "Total CPU time", "value": _format_ms(cpu_total_ms)},
+         {"section": "Overview", "metric": "Top bottleneck", "value": _bottleneck_sentence(report)},
+         {"section": "Overview", "metric": "Warnings", "value": str(len(warnings))},
+     ]
+     if top_kernel:
+         rows.extend(
+             [
+                 {"section": "Evidence", "metric": "Top kernel", "value": _safe_text(top_kernel.get("kernel_name"))},
+                 {"section": "Evidence", "metric": "Top kernel time", "value": _format_ms(top_kernel.get("total_time_ms"))},
+                 {"section": "Evidence", "metric": "Top kernel share", "value": _format_pct(top_kernel.get("pct_total_kernel_time"))},
+             ]
+         )
+     if top_nccl:
+         rows.extend(
+             [
+                 {"section": "Evidence", "metric": "Top NCCL op", "value": _safe_text(top_nccl.get("op_name"))},
+                 {"section": "Evidence", "metric": "Top NCCL time", "value": _format_ms(top_nccl.get("total_time_ms"))},
+                 {"section": "Evidence", "metric": "Top NCCL overlap", "value": _format_pct(top_nccl.get("compute_overlap_pct"))},
+             ]
+         )
+     if nvlink_row:
+         rows.extend(
+             [
+                 {"section": "Evidence", "metric": "NVLink metric(s)", "value": _safe_text(nvlink_row.get("metric_names"))},
+                 {
+                     "section": "Evidence",
+                     "metric": "NVLink during NCCL",
+                     "value": "{:.2f} export units".format(_coerce_float(nvlink_row.get("avg_metric_during_nccl"), 0.0)),
+                 },
+                 {
+                     "section": "Evidence",
+                     "metric": "NVLink outside NCCL",
+                     "value": "{:.2f} export units".format(_coerce_float(nvlink_row.get("avg_metric_outside_nccl"), 0.0)),
+                 },
+                 {
+                     "section": "Evidence",
+                     "metric": "NVLink correlation",
+                     "value": "{:.3f}".format(_coerce_float(nvlink_row.get("nccl_activity_correlation"), 0.0)),
+                 },
+             ]
+         )
+     for label, present in capability_checks.items():
+         rows.append({"section": "Capabilities", "metric": label, "value": "present" if present else "missing"})
+     return rows
+
+
+ def _findings_markdown(report: Mapping[str, Any]) -> str:
+     findings = report.get("findings") or []
+     warnings = report.get("warnings") or []
+
+     lines: List[str] = ["## What to do next", ""]
+     if not findings:
+         lines.append("No findings were generated for this trace.")
+     else:
+         for finding in findings:
+             severity = _safe_text(finding.get("severity"), default="unknown").upper()
+             title = _safe_text(finding.get("title"), default="Untitled finding")
+             lines.append("### [{}] {}".format(severity, title))
+             evidence = finding.get("evidence") or []
+             recommendations = finding.get("recommendation") or finding.get("recommendations") or []
+             if evidence:
+                 lines.append("Evidence:")
+                 for item in evidence:
+                     lines.append("- {}".format(item))
+             if recommendations:
+                 lines.append("Recommendation:")
+                 if isinstance(recommendations, (list, tuple)):
+                     for item in recommendations:
+                         lines.append("- {}".format(item))
+                 else:
+                     lines.append("- {}".format(recommendations))
+             lines.append("")
+     if warnings:
+         lines.append("## Warnings")
+         lines.append("")
+         for warning in warnings:
+             lines.append("- {}".format(warning))
+     return "\n".join(lines).strip()
+
+
+ def _artifact_manifest(out_dir: Path) -> List[Dict[str, str]]:
+     purpose_map = {
+         "report.md": "Human-readable report",
+         "report.json": "Machine-readable report",
+         "kernels.csv": "Top kernels",
+         "barriers.csv": "CPU/GPU barriers",
+         "nccl_ops.csv": "Top NCCL ops",
+         "nccl_rank_skew.csv": "Per-rank NCCL skew",
+         "nccl_by_pid.csv": "NCCL per PID",
+         "nvlink_during_nccl.csv": "NVLink correlation rows",
+         "nvlink_timeseries.csv": "NVLink correlation timeseries",
+         "timeline_events.csv": "Timeline events",
+         "copy_engine_events.csv": "Copy engine events",
+         "launch_latency_rows.csv": "Launch latency rows",
+         "launch_latency_histogram.csv": "Launch latency histogram",
+         "stream_overlap.csv": "Stream overlap summary",
+         "phase_split.csv": "Phase split",
+         "roofline.csv": "Roofline rows",
+         "gpu_idle_gaps.csv": "GPU idle gaps",
+         "kernels_by_pid.csv": "Per-PID kernels",
+         "sync_by_pid.csv": "Per-PID sync calls",
+         "nvtx_by_pid.csv": "Per-PID NVTX ranges",
+         "nvtx_ranges.csv": "NVTX ranges",
+         "bundle.zip": "Download all artifacts as a zip",
+     }
+     rows: List[Dict[str, str]] = []
+     for name, purpose in purpose_map.items():
+         path = out_dir / name
+         if not path.exists():
+             path = out_dir / "tables" / name
+         if path.exists():
+             rows.append({"artifact": name, "purpose": purpose, "path": str(path)})
+     return rows
+
+
+ def _zip_artifacts(out_dir: Path) -> Path:
+     zip_path = out_dir / "bundle.zip"
+     with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
+         for path in sorted(out_dir.rglob("*")):
+             if path.is_file() and path != zip_path:
+                 zf.write(path, arcname=path.relative_to(out_dir).as_posix())
+     return zip_path
+
+
+ def _normalize_report_for_artifacts(report: Mapping[str, Any]) -> Dict[str, Any]:
+     normalized: Dict[str, Any] = dict(report)
+     metrics: Dict[str, Any] = dict(normalized.get("metrics") or {})
+
+     metrics.setdefault("top_kernels", {"present": False, "kernels": []})
+     metrics.setdefault("barriers", {"present": False, "barriers": []})
+     metrics.setdefault("nccl", {"present": False, "ops": [], "rank_rows": [], "pids": []})
+     metrics.setdefault("nvlink_during_nccl", {"present": False, "rows": [], "timeseries": []})
+     metrics.setdefault("timeline", {"present": False, "events": []})
+     metrics.setdefault("copy_engine", {"present": False, "events": []})
+     metrics.setdefault("launch_latency", {"present": False, "rows": [], "histogram": []})
+     metrics.setdefault("stream_overlap", {"present": False, "summary": []})
+     metrics.setdefault("phase_split", {"present": False, "rows": []})
+     metrics.setdefault("roofline", {"present": False, "rows": []})
+     metrics.setdefault("gpu_idle", {"present": False, "gaps": []})
+     metrics.setdefault("nvtx", {"present": False, "ranges": []})
+
+     by_pid = dict(metrics.get("by_pid") or {})
+     by_pid.setdefault("kernels", {"kernels": []})
+     by_pid.setdefault("sync", {"sync_calls": []})
+     by_pid.setdefault("nvtx", {"present": False, "ranges": []})
+     metrics["by_pid"] = by_pid
+
+     normalized["metrics"] = metrics
+     return normalized
+
+
+ def _load_report(path: Path) -> Tuple[str, Dict[str, Any], str]:
+     lower = path.suffix.lower()
+     if lower in (".sqlite", ".db"):
+         db = TraceDB.open(path)
+         try:
+             outputs = analyze(
+                 db,
+                 phase_map_path=None,
+                 kernel_limit=50,
+                 compute_kernel_percentiles=True,
+                 compute_nvtx_kernel_map=True,
+             )
+             return "sqlite", dict(outputs.report), str(outputs.markdown)
+         finally:
+             db.close()
+     if lower == ".json":
+         report = json.loads(path.read_text(encoding="utf-8"))
+         if not isinstance(report, dict):
+             raise ValueError("Input JSON root must be an object.")
+         try:
+             markdown = render_markdown(report)
+         except Exception:
+             markdown = "# Nsight Systems LLM Hotspot Report\n\nJSON loaded, but markdown rendering failed for this input."
+         return "json", report, markdown
+     # Unknown extension: sniff the SQLite magic header, then fall back to JSON.
+     header = path.read_bytes()[:32]
+     if header.startswith(b"SQLite format 3"):
+         db = TraceDB.open(path)
+         try:
+             outputs = analyze(
+                 db,
+                 phase_map_path=None,
+                 kernel_limit=50,
+                 compute_kernel_percentiles=True,
+                 compute_nvtx_kernel_map=True,
+             )
+             return "sqlite", dict(outputs.report), str(outputs.markdown)
+         finally:
+             db.close()
+     report = json.loads(path.read_text(encoding="utf-8"))
+     if not isinstance(report, dict):
+         raise ValueError("Input JSON root must be an object.")
+     try:
+         markdown = render_markdown(report)
+     except Exception:
+         markdown = "# Nsight Systems LLM Hotspot Report\n\nJSON loaded, but markdown rendering failed for this input."
+     return "json", report, markdown
+
+
+ def analyze_path(path: Path) -> SpaceBundle:
+     source_kind, report, markdown = _load_report(path)
+     report = _normalize_report_for_artifacts(report)
+     outputs = AnalysisOutputs(report=report, markdown=markdown)
+     artifacts_dir = Path(tempfile.mkdtemp(prefix="nsys-llm-explainer-space-")) / path.stem
+     write_artifacts(outputs, artifacts_dir)
+     _zip_artifacts(artifacts_dir)
+     artifact_paths = sorted(
+         [p for p in artifacts_dir.rglob("*") if p.is_file()],
+         key=lambda item: item.relative_to(artifacts_dir).as_posix(),
+     )
+     return SpaceBundle(
+         source_path=path,
+         source_kind=source_kind,
+         report=report,
+         markdown=markdown,
+         artifacts_dir=artifacts_dir,
+         artifact_paths=artifact_paths,
+         summary_rows=_summary_rows(report),
+         manifest_rows=_artifact_manifest(artifacts_dir),
+         findings_markdown=_findings_markdown(report),
+         status_markdown="Loaded `{}` as `{}` and wrote artifacts to `{}`.".format(path.name, source_kind, artifacts_dir),
+     )
+
+
+ def find_local_sample() -> Optional[Path]:
+     here = Path(__file__).resolve()
+     candidates = [
+         here.parent / "sample_report.json",
+         here.parents[2] / "examples" / "synthetic" / "report.json",
+         here.parents[2] / "examples" / "a100_vllm" / "report.json",
+         here.parents[1] / "examples" / "synthetic" / "report.json",
+     ]
+     for candidate in candidates:
+         if candidate.exists():
+             return candidate
+     return None
+
+
+ def coerce_upload_path(uploaded: Any) -> Optional[Path]:
+     if uploaded is None:
+         return None
+     if isinstance(uploaded, (str, Path)):
+         path = Path(uploaded)
+         return path if path.exists() else None
+     if isinstance(uploaded, Sequence) and uploaded:
+         first = uploaded[0]
+         if isinstance(first, (str, Path)):
+             path = Path(first)
+             return path if path.exists() else None
+     return None