You are assessing GPU driver status and AI/ML workload capabilities.
## Your Task
Evaluate the GPU's driver configuration and suitability for AI/ML workloads, including deep learning frameworks, compute capabilities, and performance optimization.
### 1. Driver Status Assessment
- **Installed driver**: Type (proprietary/open-source) and version
- **Driver source**: Distribution package, vendor installer, or compiled
- **Driver status**: Loaded, functioning, errors
- **Kernel module**: Module name and status
- **Driver age**: Release date and recency
- **Latest driver**: Compare installed vs. available
- **Driver compatibility**: Kernel version compatibility
- **Secure boot status**: Impact on driver loading
### 2. Compute Framework Support
- **CUDA availability**: CUDA Toolkit installation status
- **CUDA version**: Installed CUDA version
- **CUDA compatibility**: GPU compute capability vs. CUDA requirements
- **ROCm availability**: For AMD GPUs
- **ROCm version**: Installed ROCm version
- **OpenCL support**: OpenCL runtime and version
- **oneAPI**: Intel oneAPI toolkit status
- **Framework libraries**: cuDNN, cuBLAS, TensorRT, etc.
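A quick way to consolidate these checks from Python (a minimal sketch; the probed paths `/usr/local/cuda*` and `/opt/rocm` are conventional install locations and may differ on a given system):

```python
# Minimal sketch: probe for compute toolchains on the PATH and in common
# install locations. Paths are conventional defaults, not guaranteed.
import shutil
import subprocess
from pathlib import Path

def first_line(cmd):
    """Return the first line of a command's output, or None if it is absent."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=15)
        text = out.stdout or out.stderr
        return text.splitlines()[0] if text else None
    except (subprocess.SubprocessError, OSError):
        return None

print("nvcc:   ", first_line(["nvcc", "--version"]))
print("hipcc:  ", first_line(["hipcc", "--version"]))
print("clinfo: ", first_line(["clinfo"]))
print("CUDA dirs:", [p.name for p in Path("/usr/local").glob("cuda*")])
print("ROCm dir present:", Path("/opt/rocm").exists())
```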
### 3. GPU Compute Capabilities
- **Compute capability**: NVIDIA CUDA compute version (e.g., 8.6, 8.9)
- **Architecture suitability**: Architecture generation for AI/ML
- **Tensor cores**: Presence and version (Gen 1/2/3/4)
- **RT cores**: Ray tracing acceleration (less relevant for ML)
- **Memory bandwidth**: Critical for ML workloads
- **VRAM capacity**: Memory size for model loading
- **FP64/FP32/FP16/INT8**: Precision support
- **TF32**: Tensor Float 32 support (Ampere+)
- **Mixed precision**: Automatic mixed precision capability
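Where PyTorch is already installed, compute capability can be read and mapped to an architecture family directly (a minimal sketch; the mapping table is an illustrative summary, not an exhaustive NVIDIA reference):

```python
# Minimal sketch, assuming PyTorch with a CUDA device: report compute
# capability and a rough architecture/tensor-core mapping.
import torch

ARCH_BY_MAJOR = {
    6: "Pascal (no tensor cores)",
    7: "Volta/Turing (1st/2nd-gen tensor cores on most parts)",
    8: "Ampere/Ada (3rd/4th-gen tensor cores, TF32)",
    9: "Hopper (4th-gen tensor cores, FP8)",
}

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: compute capability {major}.{minor}")
    print("Architecture family:", ARCH_BY_MAJOR.get(major, "unknown/newer"))
    print(f"VRAM: {props.total_memory / 2**30:.1f} GiB")
    print("TF32 plausible:", major >= 8)
    print("BF16 supported:", torch.cuda.is_bf16_supported())
else:
    print("CUDA not available to PyTorch")
```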
### 4. Deep Learning Framework Compatibility
- **PyTorch**: Installation status and CUDA/ROCm support
- **TensorFlow**: Installation and GPU backend
- **JAX**: Google JAX framework support
- **ONNX Runtime**: ONNX with GPU acceleration
- **MXNet**: Apache MXNet support
- **Hugging Face**: Transformers library GPU support
- **Framework versions**: Installed versions and compatibility
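A consolidated framework probe can look like the following (a minimal sketch; missing frameworks are reported rather than raising errors):

```python
# Minimal sketch: try each framework import and report whether a GPU
# backend is visible.
def check_pytorch():
    try:
        import torch
        return (f"PyTorch {torch.__version__}, CUDA available: "
                f"{torch.cuda.is_available()}, built for CUDA {torch.version.cuda}")
    except ImportError:
        return "PyTorch not installed"

def check_tensorflow():
    try:
        import tensorflow as tf
        gpus = tf.config.list_physical_devices("GPU")
        return f"TensorFlow {tf.__version__}, GPUs visible: {len(gpus)}"
    except ImportError:
        return "TensorFlow not installed"

def check_onnxruntime():
    try:
        import onnxruntime as ort
        return f"ONNX Runtime {ort.__version__}, providers: {ort.get_available_providers()}"
    except ImportError:
        return "ONNX Runtime not installed"

for line in (check_pytorch(), check_tensorflow(), check_onnxruntime()):
    print(line)
```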
### 5. AI/ML Library Ecosystem
- **cuDNN**: NVIDIA Deep Neural Network library
- **cuBLAS**: CUDA Basic Linear Algebra Subprograms
- **TensorRT**: High-performance deep learning inference
- **NCCL**: NVIDIA Collective Communications Library (multi-GPU)
- **MIOpen**: AMD GPU-accelerated primitives
- **rocBLAS**: AMD GPU BLAS library
- **oneDNN**: Intel Deep Neural Network library
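Library presence can be checked both through PyTorch's build metadata and through the dynamic linker cache (a minimal sketch; the SONAMEs listed are the conventional library names):

```python
# Minimal sketch: report cuDNN/NCCL versions known to PyTorch and grep the
# ldconfig cache for common NVIDIA/AMD AI libraries.
import subprocess

try:
    import torch
    print("cuDNN (PyTorch build):", torch.backends.cudnn.version())
    try:
        print("NCCL  (PyTorch build):", torch.cuda.nccl.version())
    except Exception:
        print("NCCL  (PyTorch build): unavailable")
except ImportError:
    print("PyTorch not installed; checking the linker cache only")

ldcache = subprocess.run(["ldconfig", "-p"], capture_output=True, text=True).stdout
for soname in ("libcudnn", "libcublas", "libnvinfer", "libnccl", "libMIOpen", "librocblas"):
    hits = [l.strip() for l in ldcache.splitlines() if soname.lower() in l.lower()]
    print(f"{soname}: {'found' if hits else 'not found'}")
```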
### 6. Performance Characteristics
- **Memory bandwidth**: GB/s for data transfer
- **Compute throughput**: TFLOPS for different precisions
  - FP64 (double precision)
  - FP32 (single precision)
  - FP16 (half precision)
  - INT8 (integer quantization)
  - TF32 (Tensor Float 32)
- **Tensor core performance**: Dedicated AI acceleration
- **Sparse tensor support**: Structured sparsity acceleration
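Quoted TFLOPS figures are theoretical peaks; a rough sustained number can be measured with a matmul micro-benchmark (a minimal sketch, assuming PyTorch with a CUDA device; results vary with clocks, thermals, and matrix size, so treat them as indicative):

```python
# Minimal sketch: time large matrix multiplies to estimate sustained
# FP32/FP16 throughput in TFLOPS.
import time
import torch

def matmul_tflops(dtype, n=4096, iters=20):
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    for _ in range(3):                      # warm-up iterations
        a @ b
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return (2 * n**3 * iters) / elapsed / 1e12   # 2*n^3 FLOPs per matmul

if torch.cuda.is_available():
    print(f"FP32 ~{matmul_tflops(torch.float32):.1f} TFLOPS")
    print(f"FP16 ~{matmul_tflops(torch.float16):.1f} TFLOPS")
```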
### 7. Model Size Compatibility
- **VRAM capacity**: Total GPU memory
- **Practical model sizes**: Estimated model capacity
  - Small models: < 1B parameters
  - Medium models: 1B-7B parameters
  - Large models: 7B-70B parameters
  - Very large models: > 70B parameters
- **Batch size implications**: VRAM for different batch sizes
- **Multi-GPU potential**: Scaling across GPUs
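A back-of-envelope estimator for weight footprints helps translate VRAM into practical model sizes (a minimal sketch; the ~16 bytes/parameter figure for mixed-precision Adam training is a common rule of thumb and excludes activations and KV caches):

```python
# Minimal sketch: rough VRAM needed just to hold model weights at different
# precisions, plus a rule-of-thumb figure for full mixed-precision training.
def weight_gib(params_billions, bytes_per_param):
    return params_billions * 1e9 * bytes_per_param / 2**30

for name, params in [("1B", 1), ("7B", 7), ("13B", 13), ("70B", 70)]:
    fp16 = weight_gib(params, 2)
    int8 = weight_gib(params, 1)
    # ~16 bytes/param: FP16 weights+grads plus FP32 master weights and Adam moments
    train = weight_gib(params, 16)
    print(f"{name:>4}: ~{fp16:6.1f} GiB FP16 weights, ~{int8:6.1f} GiB INT8, "
          f"~{train:7.1f} GiB for mixed-precision Adam training")
```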
### 8. Container and Virtualization Support
- **Docker NVIDIA runtime**: nvidia-docker/NVIDIA Container Toolkit
- **Docker ROCm runtime**: ROCm Docker support
- **Podman GPU support**: GPU passthrough capability
- **Kubernetes GPU**: Device plugin support
- **GPU passthrough**: VM GPU assignment capability
- **vGPU support**: Virtual GPU for multi-tenancy
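Container GPU access can be verified end to end with a short probe (a minimal sketch; the CUDA base image tag is the same example used in the commands section below):

```python
# Minimal sketch: check for the NVIDIA container CLI and run nvidia-smi
# inside a CUDA base image to confirm GPU passthrough into containers.
import shutil
import subprocess

print("nvidia-container-cli on PATH:", bool(shutil.which("nvidia-container-cli")))
print("docker on PATH:", bool(shutil.which("docker")))

if shutil.which("docker"):
    result = subprocess.run(
        ["docker", "run", "--rm", "--gpus", "all",
         "nvidia/cuda:11.8.0-base-ubuntu22.04", "nvidia-smi"],
        capture_output=True, text=True,
    )
    print("Container GPU access:", "OK" if result.returncode == 0 else "FAILED")
    print(result.stdout or result.stderr)
```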
### 9. Monitoring and Profiling Tools
- **nvidia-smi**: Real-time monitoring (NVIDIA)
- **rocm-smi**: ROCm system management (AMD)
- **Nsight Systems**: NVIDIA profiling suite
- **Nsight Compute**: CUDA kernel profiler
- **nvtop/radeontop**: Terminal GPU monitoring
- **PyTorch profiler**: Framework-level profiling
- **TensorBoard**: Training visualization
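For programmatic monitoring, the NVML Python bindings (package `nvidia-ml-py`, imported as `pynvml`) expose the same data as `nvidia-smi` (a minimal sketch; NVIDIA only, and the package must be installed separately):

```python
# Minimal sketch: per-GPU utilization, memory, and temperature via NVML.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    if isinstance(name, bytes):            # older bindings return bytes
        name = name.decode()
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    print(f"GPU {i} {name}: {util.gpu}% util, "
          f"{mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB, {temp} C")
pynvml.nvmlShutdown()
```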
### 10. Optimization Features
- **Automatic mixed precision**: AMP support
- **Gradient checkpointing**: Memory optimization
- **Flash Attention**: Optimized attention mechanisms
- **Quantization support**: INT8, INT4 inference
- **Model compilation**: TorchScript, XLA, TensorRT
- **Distributed training**: Multi-GPU training support
- **CUDA graphs**: Kernel launch optimization
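As a concrete reference for the AMP item above, a standard PyTorch mixed-precision loop looks like this (a minimal sketch with placeholder model and data):

```python
# Minimal sketch of automatic mixed precision in PyTorch: autocast runs
# eligible ops in FP16 (using tensor cores where available) and GradScaler
# guards FP16 gradients against underflow.
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(64, 1024, device="cuda")
target = torch.randn(64, 1024, device="cuda")

for _ in range(10):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()   # scale the loss to avoid FP16 gradient underflow
    scaler.step(optimizer)
    scaler.update()
```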
### 11. Workload Suitability Assessment
- **Training capability**: Suitable for training workloads
- **Inference capability**: Suitable for inference
- **Model type suitability**:
  - Computer vision (CNNs)
  - Natural language processing (Transformers)
  - Generative AI (Diffusion models, LLMs)
  - Reinforcement learning
- **Performance tier**: Consumer, Professional, Data Center
### 12. Bottleneck and Limitation Analysis
- **Memory bottlenecks**: VRAM limitations for large models
- **Compute bottlenecks**: GPU power for training speed
- **PCIe bandwidth**: Data transfer limitations
- **Driver limitations**: Missing features or bugs
- **Power throttling**: Thermal or power constraints
- **Multi-GPU scaling**: Efficiency of multi-GPU setup
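PCIe link state and active throttle reasons can be queried directly to support this analysis (a minimal sketch; NVIDIA only):

```python
# Minimal sketch: query PCIe link generation/width, throttle-reason bitmask,
# and memory usage to spot transfer, power, or thermal bottlenecks.
import subprocess

fields = ("pcie.link.gen.current,pcie.link.width.current,"
          "clocks_throttle_reasons.active,memory.total,memory.used")
out = subprocess.run(
    ["nvidia-smi", f"--query-gpu={fields}", "--format=csv"],
    capture_output=True, text=True,
)
print(out.stdout or out.stderr)
```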
## Commands to Use
**GPU and driver detection:**
- `nvidia-smi` (NVIDIA)
- `rocm-smi` (AMD)
- `lspci | grep -iE 'vga|3d|display'` (discrete GPUs often appear as "3D controller")
- `lspci -v | grep -A 20 VGA`
**NVIDIA driver details:**
- `nvidia-smi -q`
- `cat /proc/driver/nvidia/version`
- `modinfo nvidia`
- `nvidia-smi --query-gpu=driver_version --format=csv,noheader`
**AMD driver details:**
- `modinfo amdgpu`
- `rocminfo`
- `/opt/rocm/bin/rocm-smi --showdriverversion`
**CUDA/ROCm installation:**
- `nvcc --version` (CUDA compiler)
- `which nvcc`
- `ls /usr/local/cuda*/`
- `echo $CUDA_HOME`
- `hipcc --version` (ROCm)
- `ls /opt/rocm/`
**Compute capability:**
- `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`
- `nvidia-smi -q | grep "Compute Capability"`
**Libraries check:**
- `ldconfig -p | grep cudnn`
- `ldconfig -p | grep cublas`
- `ldconfig -p | grep tensorrt`
- `ldconfig -p | grep nccl`
- `ls /usr/lib/x86_64-linux-gnu/ | grep -i cuda`
**Python framework check:**
- `python3 -c "import torch; print(f'PyTorch: {torch.__version__}, CUDA: {torch.cuda.is_available()}, Version: {torch.version.cuda}')"`
- `python3 -c "import tensorflow as tf; print(f'TensorFlow: {tf.__version__}, GPU: {tf.config.list_physical_devices(\"GPU\")}')"`
- `python3 -c "import torch; print(f'Tensor Cores: {torch.cuda.get_device_capability()}')"`
**Container runtime:**
- `docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi`
- `which nvidia-container-cli`
- `nvidia-container-cli info`
**OpenCL:**
- `clinfo`
- `clinfo | grep "Device Name"`
**System libraries:**
- `dpkg -l | grep -i cuda`
- `dpkg -l | grep -i nvidia`
- `dpkg -l | grep -i rocm`
**Performance info:**
- `nvidia-smi --query-gpu=name,memory.total,memory.free,driver_version,compute_cap --format=csv`
- `nvidia-smi dmon -s pucvmet` (dynamic monitoring)
## Output Format
### Executive Summary
```
GPU: [model]
Driver: [proprietary/open] v[version] ([status])
Compute: [CUDA/ROCm] v[version] (Compute [capability])
AI/ML Readiness: [Ready/Partial/Not Ready]
Best For: [Training/Inference/Both]
Recommended Frameworks: [PyTorch, TensorFlow, etc.]
```
### Detailed AI/ML Assessment
**Driver Status:**
- Type: [Proprietary/Open Source]
- Version: [version number]
- Release Date: [date]
- Status: [Loaded/Error]
- Kernel Module: [module] ([loaded/not loaded])
- Latest Available: [version]
- Update Recommended: [Yes/No]
- Secure Boot: [Compatible/Issue]
**Compute Framework Availability:**
- CUDA Toolkit: [Installed/Not Installed] - v[version]
- CUDA Driver API: v[version]
- ROCm: [Installed/Not Installed] - v[version]
- OpenCL: [Available/Not Available] - v[version]
- Compute Capability: [X.X] ([architecture name])
**GPU Compute Specifications:**
- Architecture: [Turing/Ampere/Ada/RDNA3/Xe]
- Tensor Cores: [Yes/No] - [Generation]
- CUDA Cores / SPs: [count]
- VRAM: [GB] [memory type]
- Memory Bandwidth: [GB/s]
- Precision Support:
  - FP64: [TFLOPS]
  - FP32: [TFLOPS]
  - FP16: [TFLOPS]
  - INT8: [TOPS]
  - TF32: [Yes/No]
**AI/ML Libraries:**
- cuDNN: [version] ([installed/missing])
- cuBLAS: [version] ([installed/missing])
- TensorRT: [version] ([installed/missing])
- NCCL: [version] ([installed/missing])
- MIOpen: [version] (AMD only)
- rocBLAS: [version] (AMD only)
**Deep Learning Framework Support:**
- PyTorch: [version]
  - CUDA Enabled: [Yes/No]
  - CUDA Version: [version]
  - cuDNN Version: [version]
- TensorFlow: [version]
  - GPU Support: [Yes/No]
  - CUDA Version: [version]
- JAX: [installed/not installed]
- ONNX Runtime: [GPU backend available]
**Container Support:**
- NVIDIA Container Toolkit: [installed/not installed]
- Docker GPU Access: [working/not working]
- Podman GPU Support: [available]
**Model Capacity Estimates:**
- Small Models (< 1B params): [batch size X]
- Medium Models (1B-7B params): [batch size X]
- Large Models (7B-13B params): [batch size X]
- Very Large Models (13B-70B params): [requires multi-GPU or not possible]
Example workload estimates based on [GB] VRAM:
- LLaMA 7B: [inference only/training possible]
- Stable Diffusion: [batch size X]
- BERT Base: [batch size X]
- GPT-2: [batch size X]
**Workload Suitability:**
- Training:
  - Small models: [Excellent/Good/Fair/Poor]
  - Medium models: [rating]
  - Large models: [rating]
- Inference:
  - Real-time: [Excellent/Good/Fair/Poor]
  - Batch: [rating]
  - Low-latency: [rating]
**Use Case Recommendations:**
- Computer Vision (CNNs): [Excellent/Good/Fair/Poor]
- NLP (Transformers): [rating]
- Generative AI (LLMs): [rating]
- Diffusion Models: [rating]
- Reinforcement Learning: [rating]
**Performance Tier:**
- Category: [Consumer/Professional/Data Center]
- Training Performance: [rating]
- Inference Performance: [rating]
- Multi-GPU Scaling: [available/not available]
**Optimization Features Available:**
- Automatic Mixed Precision: [Yes/No]
- Tensor Core Utilization: [Yes/No]
- TensorRT Optimization: [Available]
- Flash Attention: [Supported]
- INT8 Quantization: [Supported]
- Multi-GPU Training: [Possible with [count] GPUs]
**Limitations and Bottlenecks:**
- VRAM Constraint: [assessment]
- Memory Bandwidth: [adequate/limited]
- Compute Throughput: [assessment]
- PCIe Bottleneck: [yes/no]
- Driver Limitations: [any known issues]
- Power/Thermal: [throttling concerns]
**Recommendations:**
1. [Driver update/optimization suggestions]
2. [Framework installation recommendations]
3. [Workload optimization suggestions]
4. [Hardware upgrade path if applicable]
5. [Container/virtualization setup if beneficial]
### AI/ML Readiness Scorecard
```
Driver Setup: [✓/✗/⚠] [details]
CUDA/ROCm Install: [✓/✗/⚠] [details]
Framework Support: [✓/✗/⚠] [details]
Library Ecosystem: [✓/✗/⚠] [details]
Container Runtime: [✓/✗/⚠] [details]
VRAM Capacity: [✓/✗/⚠] [details]
Compute Performance: [✓/✗/⚠] [details]
Overall Readiness: [Ready/Needs Setup/Limited/Not Suitable]
```
### AI-Readable JSON
```json
{
"driver": {
"type": "proprietary|open_source",
"version": "",
"status": "loaded|error",
"latest_available": "",
"update_recommended": false
},
"compute_platform": {
"cuda": {
"installed": false,
"version": "",
"compute_capability": ""
},
"rocm": {
"installed": false,
"version": ""
},
"opencl": {
"available": false,
"version": ""
}
},
"gpu_specs": {
"architecture": "",
"tensor_cores": false,
"vram_gb": 0,
"memory_bandwidth_gbs": 0,
"fp32_tflops": 0,
"fp16_tflops": 0,
"int8_tops": 0,
"tf32_support": false
},
"libraries": {
"cudnn": "",
"cublas": "",
"tensorrt": "",
"nccl": ""
},
"frameworks": {
"pytorch": {
"installed": false,
"version": "",
"cuda_available": false
},
"tensorflow": {
"installed": false,
"version": "",
"gpu_available": false
}
},
"container_support": {
"nvidia_container_toolkit": false,
"docker_gpu_working": false
},
"workload_suitability": {
"training": {
"small_models": "excellent|good|fair|poor",
"medium_models": "",
"large_models": ""
},
"inference": {
"real_time": "",
"batch": ""
}
},
"model_capacity": {
"vram_gb": 0,
"small_model_batch_size": 0,
"llama_7b_possible": false,
"stable_diffusion_batch": 0
},
"optimization_features": {
"amp_support": false,
"tensor_core_utilization": false,
"tensorrt_available": false,
"int8_quantization": false
},
"bottlenecks": {
"vram_limited": false,
"compute_limited": false,
"pcie_bottleneck": false
},
"ai_ml_readiness": "ready|needs_setup|limited|not_suitable"
}
```
## Execution Guidelines
1. **Identify GPU vendor first**: NVIDIA, AMD, or Intel
2. **Check driver installation**: Verify driver is loaded and working
3. **Assess compute platform**: CUDA for NVIDIA, ROCm for AMD
4. **Query compute capability**: Critical for framework compatibility
5. **Check library installation**: cuDNN, TensorRT, etc.
6. **Test framework access**: Try importing PyTorch/TensorFlow with GPU
7. **Evaluate VRAM capacity**: Estimate model sizes
8. **Check container support**: Important for ML workflows
9. **Identify bottlenecks**: VRAM, compute, or driver issues
10. **Provide actionable recommendations**: Setup steps or optimizations
## Important Notes
- NVIDIA GPUs have the most mature AI/ML ecosystem
- CUDA compute capability determines supported features
- cuDNN is critical for deep learning performance
- VRAM is often the primary bottleneck for large models
- Container runtimes simplify framework management
- AMD ROCm support is improving but less mature than CUDA
- Intel GPUs are emerging in AI/ML space
- Tensor cores provide significant speedup for mixed precision
- Driver version must match CUDA toolkit requirements
- Some features require specific GPU generations
- Multi-GPU setups require additional configuration
- Consumer GPUs can be effective for smaller workloads
- Professional/datacenter GPUs offer better reliability and support
Be thorough and practical - provide a clear assessment of AI/ML readiness and actionable next steps.