---
tags:
- gpu-runtime-prediction
- code-understanding
- regression
- performance-modeling
datasets:
- RajBhope/gpu-runtime-prediction-dataset
language:
- code
library_name: scikit-learn
pipeline_tag: tabular-regression
---

# GPU Runtime Predictor 🚀⚡

Predicts GPU kernel/operation **runtime in milliseconds** given **source code** + **GPU hardware specifications**.

## How It Works

1. **Code Feature Extraction**: Analyzes source code to extract 48 features (tensor dimensions, operation types, complexity indicators)
2. **GPU Feature Encoding**: Uses 12 hardware specs (CUDA cores, memory bandwidth, compute capability, etc.)
3. **ML Prediction**: Ensemble of Gradient Boosted Trees + Random Forest + Neural Network
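
The pipeline above can be sketched in a few lines. The card does not specify how the ensemble combines its three members, so equal-weight averaging of predictions is assumed here; the random feature matrix merely stands in for the 48 code + 12 GPU features.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.neural_network import MLPRegressor

def ensemble_predict(models, X):
    """Average the predictions of several fitted regressors (assumed equal weights)."""
    return np.mean([m.predict(X) for m in models], axis=0)

# Toy data standing in for the 48 code features + 12 GPU features.
rng = np.random.default_rng(0)
X = rng.random((200, 60))
y = X @ rng.random(60)  # synthetic runtime target

models = [
    GradientBoostingRegressor(random_state=0).fit(X, y),
    RandomForestRegressor(random_state=0).fit(X, y),
    MLPRegressor(random_state=0, max_iter=500).fit(X, y),
]
pred = ensemble_predict(models, X)
print(pred.shape)  # one averaged prediction per sample
```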

### Model Comparison

| Model | R² | RMSE | Spearman ρ | MAPE % |
|-------|-----|------|------------|--------|
| **GBR** | 0.9923 | 0.0728 | 0.9264 | 16.5% |
| **RF** | 0.9924 | 0.0724 | 0.9277 | 16.3% |
| **NN** | 0.9932 | 0.0687 | 0.9187 | 17.0% |
| **Ensemble** | 0.9930 | 0.0693 | 0.9272 | 16.3% |
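
The four metrics in the table can be reproduced with scikit-learn and SciPy; the arrays below are placeholders, not the model's actual held-out predictions.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import mean_squared_error, r2_score

# Placeholder values; in practice these are held-out runtimes and predictions.
y_true = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y_pred = np.array([1.1, 1.9, 4.2, 7.5, 16.5])

r2 = r2_score(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
rho, _ = spearmanr(y_true, y_pred)                      # rank correlation
mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100  # percent error

print(f"R²={r2:.4f}  RMSE={rmse:.4f}  ρ={rho:.4f}  MAPE={mape:.1f}%")
```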

### GPU Catalog (12 GPUs)

| GPU | FP32 TFLOPS | Memory BW | VRAM |
|-----|------------|-----------|------|
| NVIDIA T4 | 8.1 | 320 GB/s | 16 GB |
| NVIDIA V100 | 15.7 | 900 GB/s | 32 GB |
| NVIDIA A10G | 31.2 | 600 GB/s | 24 GB |
| NVIDIA A100 40GB | 19.5 | 1555 GB/s | 40 GB |
| NVIDIA A100 80GB | 19.5 | 2039 GB/s | 80 GB |
| NVIDIA L4 | 30.3 | 300 GB/s | 24 GB |
| NVIDIA L40S | 91.6 | 864 GB/s | 48 GB |
| NVIDIA RTX 3090 | 35.6 | 936 GB/s | 24 GB |
| NVIDIA RTX 4090 | 82.6 | 1008 GB/s | 24 GB |
| NVIDIA H100 SXM | 67.0 | 3350 GB/s | 80 GB |
| NVIDIA H100 PCIe | 48.0 | 2039 GB/s | 80 GB |
| NVIDIA RTX A6000 | 38.7 | 768 GB/s | 48 GB |
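
The catalog translates directly into a feature lookup table. Only the three columns shown above are reproduced; the full model uses 12 hardware specs per GPU, which the card does not enumerate.

```python
# (FP32 TFLOPS, memory bandwidth GB/s, VRAM GB) per GPU, copied from the table.
GPU_CATALOG = {
    "NVIDIA T4":        (8.1, 320, 16),
    "NVIDIA V100":      (15.7, 900, 32),
    "NVIDIA A10G":      (31.2, 600, 24),
    "NVIDIA A100 40GB": (19.5, 1555, 40),
    "NVIDIA A100 80GB": (19.5, 2039, 80),
    "NVIDIA L4":        (30.3, 300, 24),
    "NVIDIA L40S":      (91.6, 864, 48),
    "NVIDIA RTX 3090":  (35.6, 936, 24),
    "NVIDIA RTX 4090":  (82.6, 1008, 24),
    "NVIDIA H100 SXM":  (67.0, 3350, 80),
    "NVIDIA H100 PCIe": (48.0, 2039, 80),
    "NVIDIA RTX A6000": (38.7, 768, 48),
}

def gpu_features(name):
    """Return the catalog specs for one GPU as a named dict."""
    tflops, bw_gbs, vram_gb = GPU_CATALOG[name]
    return {"fp32_tflops": tflops, "mem_bw_gbs": bw_gbs, "vram_gb": vram_gb}

print(gpu_features("NVIDIA H100 SXM"))
```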

### 15 Supported Workload Types
matmul, conv2d, attention, transformer_block, linear, layernorm, batchnorm, 
softmax, embedding, elementwise, reduction, pooling, FFT, sort, loss+backward

## Usage

```python
# See the Gradio demo for interactive use
# Or load models directly:
import pickle
with open('model_gbr.pkl', 'rb') as f:
    model = pickle.load(f)
```

## Training

- **Dataset**: [RajBhope/gpu-runtime-prediction-dataset](https://hf.co/datasets/RajBhope/gpu-runtime-prediction-dataset)
- **51,900 samples** = 4,325 workloads × 12 GPUs
- Runtimes were generated with a physics-based roofline performance model
- Based on research from [Regression Language Models](https://arxiv.org/abs/2509.26476) and [HELP](https://arxiv.org/abs/2106.08630)
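
The roofline model used to generate the runtimes bounds execution time by whichever is slower: compute (FLOPs over peak throughput) or memory traffic (bytes over bandwidth). A minimal sketch for an idealized FP32 matmul, using the A100 40GB specs from the catalog above (actual data generation likely includes efficiency factors not shown here):

```python
def roofline_runtime_ms(flops, bytes_moved, peak_tflops, mem_bw_gbs):
    """Roofline estimate: runtime = max(compute time, memory time)."""
    compute_s = flops / (peak_tflops * 1e12)     # time if compute-bound
    memory_s = bytes_moved / (mem_bw_gbs * 1e9)  # time if memory-bound
    return max(compute_s, memory_s) * 1e3        # milliseconds

# Example: 4096³ FP32 matmul on an A100 40GB (19.5 TFLOPS, 1555 GB/s).
n = 4096
flops = 2 * n**3             # one multiply + one add per inner-product term
bytes_moved = 3 * n * n * 4  # read A and B, write C, each exactly once (idealized)
print(roofline_runtime_ms(flops, bytes_moved, 19.5, 1555))
```

This workload is heavily compute-bound, so the estimate is essentially `2n³ / peak FLOPS`; small or element-wise kernels land on the memory-bound side instead.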