LLM-Prof Dataset Description

Dataset link: https://huggingface.co/buckets/zk521/LLM-Prof-Dataset/

Overview

The LLM-Prof dataset contains 207 GPU kernel traces collected for the paper "LLM-Prof: A Hierarchical Cross-Stack Performance Profiling Framework for Production LLM Inference Services". The dataset supports cross-layer analysis of LLM inference services at the service, model, and operator levels.

The traces cover three inference frameworks:

| Framework | Cases | Source |
| --- | --- | --- |
| RTP-LLM | 51 | Production MaaS services with real runtime telemetry |
| SGLang | 73 | Controlled benchmark traces across models, GPUs, and workloads |
| vLLM | 83 | Controlled benchmark traces across models, GPUs, and workloads |
| Total | 207 | Production plus controlled benchmark traces |

Dataset Files

The released dataset is organized as three compressed archives:

| Archive | Content |
| --- | --- |
| `original_trace_RTP-LLM_51.tar.gz` | RTP-LLM production traces and related runtime metadata |
| `original_trace_SGLang_73.tar.gz` | SGLang benchmark traces |
| `original_trace_vLLM_83.tar.gz` | vLLM benchmark traces |
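
Since the internal layout of the archives is not described here, a reasonable first step is to list what an archive contains before extracting it. The following is a minimal sketch using Python's standard `tarfile` module. The function name `list_trace_members` and the default suffix filter are assumptions made for illustration; framework-specific trace files may use other extensions, so adjust `suffixes` as needed.

```python
import tarfile


def list_trace_members(archive_path, suffixes=(".json", ".json.gz")):
    """Return the sorted names of regular files whose name ends with one of `suffixes`.

    The suffix filter is an assumption; the archives may also hold
    framework-specific trace files with other extensions.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(
            member.name
            for member in tar.getmembers()
            if member.isfile() and member.name.endswith(suffixes)
        )
```

For example, `list_trace_members("original_trace_SGLang_73.tar.gz")` would return the candidate trace paths inside the SGLang archive, assuming the archive has been downloaded to the working directory.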

All traces use the Chrome Trace Event JSON format and are stored either as gzip-compressed `.json.gz` files or as framework-specific trace files. A typical trace contains GPU kernels, memory copies, CUDA runtime calls, CUDA driver calls, synchronization events, and related metadata.
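
Because a trace may or may not be gzip-compressed, a loader that handles both cases is convenient. The sketch below is one way to do this with the standard library; `load_trace` is a hypothetical helper name, and the code assumes each file holds a single UTF-8 JSON document.

```python
import gzip
import json
from pathlib import Path


def load_trace(path):
    """Load one Chrome Trace Event JSON file, gzip-compressed or plain."""
    path = Path(path)
    # Detect gzip by its two-byte magic number rather than trusting the extension.
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    with opener(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)
```

Large traces can take a noticeable amount of memory when parsed this way, since `json.load` materializes the whole document at once.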

Framework-Specific Data

RTP-LLM

The RTP-LLM subset contains traces from real production services. Each case may include:

- GPU kernel trace files collected from production inference execution.
- `prefill_metrics_with_config.txt`, which records model configuration, hardware information, QPS time series, GPU utilization, token-size information, and service metadata.
- Profiling metadata and optional analysis outputs.

This subset reflects real deployment diversity, including varying business workloads, request rates, token sizes, GPU utilization, model sizes, and hardware configurations.

SGLang and vLLM

The SGLang and vLLM subsets are controlled benchmark traces. They systematically vary:

- Model family and model size.
- Batch size.
- Input length and output length.
- GPU type.
- Inference framework.

These traces provide controlled comparisons across frameworks and hardware platforms, complementing the production diversity of the RTP-LLM subset.

Hardware and Model Coverage

The dataset covers six NVIDIA GPU types:

- A10
- A100
- A800
- H20
- H800
- L20

The evaluated services include 21 model variants from three major model families: Qwen, LLaMA, and BERT-family models. Model sizes range from sub-billion-parameter scale to 70B-scale deployments and include both dense and MoE-style models.

Trace Format

Each trace follows the Chrome Trace Event format. The top-level structure usually contains:

{
  "schemaVersion": 1,
  "deviceProperties": [],
  "traceEvents": [],
  "traceName": "...",
  "displayTimeUnit": "ns"
}
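
Since the structure above is described as typical rather than guaranteed, it can help to check which top-level keys a given trace actually has before analyzing it. The following sketch works on an already-parsed trace dictionary; `summarize_trace` and `EXPECTED_KEYS` are hypothetical names introduced for illustration.

```python
# Top-level keys listed in the structure above.
EXPECTED_KEYS = (
    "schemaVersion",
    "deviceProperties",
    "traceEvents",
    "traceName",
    "displayTimeUnit",
)


def summarize_trace(trace):
    """Summarize the top-level fields of a parsed trace and report missing keys."""
    return {
        "trace_name": trace.get("traceName"),
        "schema_version": trace.get("schemaVersion"),
        "display_time_unit": trace.get("displayTimeUnit"),
        "num_devices": len(trace.get("deviceProperties", [])),
        "num_events": len(trace.get("traceEvents", [])),
        "missing_keys": [key for key in EXPECTED_KEYS if key not in trace],
    }
```

A non-empty `missing_keys` list does not necessarily mean the trace is unusable; it only signals that the file deviates from the usual layout.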

The `traceEvents` array is the main analysis input. Important event categories include:

| Event category | Description |
| --- | --- |
| `kernel` | GPU kernel execution events used for operator-level analysis |
| `gpu_memcpy` | Host-device and device-device memory copy events used for iteration anchoring |
| `gpu_memset` | GPU memory initialization events |
| `cuda_runtime` | CUDA Runtime API calls |
| `cuda_driver` | CUDA Driver API calls |
| `synchronization` | CUDA synchronization events |
| `cuda_event` | CUDA event records |
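
A quick way to get oriented in a trace is to count events and sum durations per category. The sketch below assumes the category is stored in each event's `cat` field, which is the standard Chrome Trace Event field for categories; events without a `cat` (for example metadata events) are grouped under a placeholder label. `summarize_categories` is a hypothetical helper name.

```python
from collections import Counter, defaultdict


def summarize_categories(events):
    """Return (event count, summed `dur`) per `cat` value."""
    counts = Counter()
    durations = defaultdict(float)
    for ev in events:
        cat = ev.get("cat", "<no category>")
        counts[cat] += 1
        # Events without a duration contribute zero.
        durations[cat] += float(ev.get("dur", 0))
    return counts, dict(durations)
```

Calling `summarize_categories(trace["traceEvents"])` on a parsed trace gives a first impression of how much of the recorded activity falls into `kernel`, `gpu_memcpy`, and the other categories.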

For GPU kernel events, important fields include:

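| Field | Description |
| --- | --- |
| `name` | Compiled CUDA kernel name |
| `ts` | Start timestamp (microseconds in the Chrome Trace Event format) |
| `dur` | Duration (microseconds in the Chrome Trace Event format) |
| `tid` | CUDA stream or thread identifier |
| `args.grid` | Grid configuration |
| `args.block` | Block configuration |
| `args.shared memory` | Shared memory usage |
| `args.registers per thread` | Register count per thread |
| `args.est. achieved occupancy %` | Estimated occupancy |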
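
Several of the `args` keys contain spaces and punctuation, so they have to be read with dictionary lookups rather than attribute-style access. The sketch below collects the fields listed above into a small record type. `KernelRecord` and `kernel_records` are hypothetical names, and the code assumes kernel events are identified by `cat == "kernel"`; missing fields are left as `None`.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class KernelRecord:
    name: str
    ts: Any
    dur: Any
    tid: Any
    grid: Any
    block: Any
    shared_memory: Any
    registers_per_thread: Any
    occupancy_pct: Any


def kernel_records(events):
    """Collect the documented fields of every `kernel` event."""
    records = []
    for ev in events:
        if ev.get("cat") != "kernel":
            continue
        args = ev.get("args", {})
        records.append(
            KernelRecord(
                name=ev.get("name", ""),
                ts=ev.get("ts"),
                dur=ev.get("dur"),
                tid=ev.get("tid"),
                grid=args.get("grid"),
                block=args.get("block"),
                shared_memory=args.get("shared memory"),
                registers_per_thread=args.get("registers per thread"),
                occupancy_pct=args.get("est. achieved occupancy %"),
            )
        )
    return records
```

The resulting list can be passed to a data-frame library or sorted directly, for example by `dur`, to find the longest-running kernels.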

Analysis Layers Supported by the Dataset

The dataset supports the three LLM-Prof analysis layers:

| Layer | Purpose | Key metrics |
| --- | --- | --- |
| SEA | Service-level hotspot detection | QPS, FPR, token size, GPU utilization |
| MEA | Model-level iteration analysis | IIPS, MIE, iteration duration |
| OEA | Operator-level bottleneck analysis | operator efficiency, BottleScore, time proportion, Roofline position |
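
As an illustration of the kind of operator-level metric that can be derived from the traces, the sketch below computes a simple time proportion per kernel name. This is an assumed definition (each kernel name's share of the summed `dur` over all `kernel` events), not necessarily the one used by LLM-Prof, and `kernel_time_proportion` is a hypothetical helper name. Kernels that overlap on different streams are counted independently, so the shares describe summed kernel time rather than wall-clock time.

```python
from collections import defaultdict


def kernel_time_proportion(events):
    """Map each kernel name to its share of total kernel duration, largest first."""
    totals = defaultdict(float)
    for ev in events:
        if ev.get("cat") == "kernel":
            totals[ev.get("name", "")] += float(ev.get("dur", 0))
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return {name: total / grand_total for name, total in ranked}
```

The returned dictionary preserves the ranking order, so iterating over it yields the most time-consuming kernels first.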

Notes

- RTP-LLM traces reflect real production workloads and may contain more heterogeneous runtime behavior than controlled benchmarks.
- SGLang and vLLM traces are better suited for controlled model-framework-hardware comparisons.
- Fine-grained operator attribution may be affected by framework-specific kernel fusion, custom kernels, and CUDA Graph-based execution.
- Current profiling support is based on NVIDIA/CUPTI traces.
