# CRAYON CUDA Testing Guide for Google Colab T4
## Quick Setup Commands
Run these cells in sequence in Google Colab (with T4 GPU runtime):
```bash
# Cell 1: Check GPU
!nvidia-smi
!nvcc --version
```
```python
# Cell 2: Install PyTorch CUDA
!pip uninstall torch torchvision torchaudio -y
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
import torch
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")
```
```bash
# Cell 3: Install CRAYON with CUDA
!pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ "xerv-crayon[cuda]"
# Verify installation
!python -c "import crayon; print('CRAYON installed')"
```
```python
# Cell 4: Test CUDA functionality
import logging
logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
from crayon.core.vocabulary import CrayonVocab
print("=== CRAYON CUDA Test ===")
# Auto-detection (should pick CUDA)
vocab = CrayonVocab(device="auto")
print(f"Device: {vocab.device}")
# Load profile
vocab.load_profile("lite")
print(f"Profile loaded: {len(vocab)} tokens")
# Test tokenization
text = "Hello, world! This is CUDA-accelerated tokenization."
tokens = vocab.tokenize(text)
print(f"Text: {text}")
print(f"Tokens: {tokens}")
print(f"Count: {len(tokens)}")
```
```python
# Cell 5: Performance benchmark
import time
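def benchmark(vocab, text, runs=5):
    """Return (average seconds per call, token count) over `runs` calls."""
    times = []
    for _ in range(runs):
        # perf_counter has finer resolution than time.time for short intervals
        start = time.perf_counter()
        tokens = vocab.tokenize(text)
        times.append(time.perf_counter() - start)
    avg_time = sum(times) / len(times)
    return avg_time, len(tokens)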
# Test texts
texts = [
    "Hello world",
    "Hello world! " * 10,
    "Hello world! " * 100,
    "Hello world! " * 1000,
]
# CPU comparison
vocab_cpu = CrayonVocab(device="cpu")
vocab_cpu.load_profile("lite")
print("=== Performance Comparison ===")
for i, text in enumerate(texts):
    print(f"\nTest {i+1}: {len(text)} chars")
    # CPU
    cpu_time, cpu_tokens = benchmark(vocab_cpu, text)
    print(f" CPU: {cpu_time:.6f}s ({cpu_tokens} tokens)")
    # CUDA
    cuda_time, cuda_tokens = benchmark(vocab, text)
    print(f" CUDA: {cuda_time:.6f}s ({cuda_tokens} tokens)")
    # Speedup
    speedup = cpu_time / cuda_time if cuda_time > 0 else 0
    print(f" Speedup: {speedup:.2f}x")
```
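The averages above may include one-time start-up cost in the first call, which can skew short runs. A minimal sketch of a steadier measurement, assuming only that the object exposes `tokenize(text)` as in the cells above: it makes a few untimed warm-up calls and then reports the median. `benchmark_median` and its parameters are hypothetical names for illustration, not part of CRAYON.

```python
import statistics
import time


def benchmark_median(tokenize, text, runs=5, warmup=2):
    """Return the median wall-clock seconds of `tokenize(text)` over `runs` calls.

    `tokenize` is any callable; pass `vocab.tokenize` for CRAYON (assumed API,
    as used in the cells above). Warm-up calls are executed but not timed.
    """
    for _ in range(warmup):
        tokenize(text)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        tokenize(text)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in tokenizer so the sketch runs without CRAYON installed.
demo_seconds = benchmark_median(str.split, "Hello world! " * 1000)
print(f"median: {demo_seconds:.6f}s")
```

In the notebook, `benchmark_median(vocab.tokenize, text)` and `benchmark_median(vocab_cpu.tokenize, text)` would replace the two `benchmark` calls. The median is less sensitive than the mean to a single slow run.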
```python
# Cell 6: Batch processing test
batch_texts = [
    "def fibonacci(n): return n if n <= 1 else fibonacci(n-1) + fibonacci(n-2)",
    "class NeuralNetwork(nn.Module): def __init__(self): super().__init__()",
    "import torch; model = torch.nn.Sequential(torch.nn.Linear(10, 5), torch.nn.ReLU())",
] * 50  # Large batch
print(f"Batch size: {len(batch_texts)}")
# CUDA batch
start = time.time()
batch_tokens = vocab.tokenize(batch_texts)
cuda_batch_time = time.time() - start
# CPU batch
start = time.time()
batch_tokens_cpu = vocab_cpu.tokenize(batch_texts)
cpu_batch_time = time.time() - start
print(f"CPU batch: {cpu_batch_time:.4f}s")
print(f"CUDA batch: {cuda_batch_time:.4f}s")
print(f"Speedup: {cpu_batch_time/cuda_batch_time:.2f}x")
```
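The batch cell compares speed only. A speedup means little if the two backends disagree, so it may be worth checking that they return the same tokens. The sketch below assumes that `tokenize` on a list returns one plain Python token sequence per input text; that return shape is an assumption, and `first_mismatch` is a hypothetical helper, not a CRAYON function.

```python
def first_mismatch(cpu_batch, cuda_batch):
    """Return the index of the first differing item, or None if the batches match.

    A length difference is reported at the index where the shorter batch ends.
    """
    for i, (a, b) in enumerate(zip(cpu_batch, cuda_batch)):
        if list(a) != list(b):
            return i
    if len(cpu_batch) != len(cuda_batch):
        return min(len(cpu_batch), len(cuda_batch))
    return None


# Illustrative data only; in the notebook, pass batch_tokens_cpu and batch_tokens.
print(first_mismatch([[1, 2], [3]], [[1, 2], [3]]))  # None
print(first_mismatch([[1, 2], [3]], [[1, 2], [4]]))  # 1
```

In the notebook, `first_mismatch(batch_tokens_cpu, batch_tokens)` returning `None` would indicate that both devices tokenized the batch identically.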
## Expected Results on T4
- **Device Detection**: Should automatically select "cuda"
- **Hardware**: NVIDIA T4, ~16GB VRAM, Compute Capability 7.5
- **Performance**: 2-5x speedup on single texts, 5-10x on batches
- **Memory**: Efficient GPU utilization
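To confirm that the runtime matches the hardware listed above, the GPU name and total memory can be read from `nvidia-smi`. The following is a rough sketch rather than part of CRAYON: `parse_gpu_line` and `query_gpus` are hypothetical helper names, and the function returns an empty list when no GPU tooling is present.

```python
import shutil
import subprocess


def parse_gpu_line(line):
    """Parse one 'name, memory.total, driver_version' CSV row from nvidia-smi."""
    name, memory, driver = [field.strip() for field in line.rsplit(",", 2)]
    return {"name": name, "memory_total": memory, "driver": driver}


def query_gpus():
    """Return a list of GPU info dicts, or [] if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=name,memory.total,driver_version",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return [parse_gpu_line(row) for row in result.stdout.splitlines() if row.strip()]


for gpu in query_gpus():
    print(gpu)
```

An empty result on Colab usually means the notebook is not attached to a GPU runtime.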
## Troubleshooting
If CUDA doesn't work, run this diagnostic:
```python
# Get detailed error information
from crayon.core.vocabulary import CrayonVocab  # needed if Cell 4 has not been run
vocab = CrayonVocab(device="cpu")  # Initialize first
print(vocab._get_cuda_import_error())
```
Common fixes:
1. **PyTorch is a CPU-only build**: Reinstall with the `cu121` wheels (Cell 2)
2. **CUDA_HOME**: Colab usually has this set correctly
3. **GPU runtime**: Ensure a GPU is selected under Runtime > Change runtime type
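The three checks above can be gathered into a single report. This is a minimal sketch using only the standard library plus PyTorch when it is installed; `cuda_environment_report` is a hypothetical helper, not a CRAYON API.

```python
import importlib.util
import os
import shutil


def cuda_environment_report():
    """Collect basic facts relevant to the fixes listed above."""
    report = {
        "CUDA_HOME": os.environ.get("CUDA_HOME"),
        "nvcc_on_path": shutil.which("nvcc") is not None,
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_cuda_build": None,
        "torch_cuda_available": None,
    }
    if report["torch_installed"]:
        try:
            import torch

            # torch.version.cuda is None on CPU-only builds
            report["torch_cuda_build"] = torch.version.cuda
            report["torch_cuda_available"] = torch.cuda.is_available()
        except ImportError:
            # A broken install can be discoverable but still fail to import
            report["torch_installed"] = False
    return report


for key, value in cuda_environment_report().items():
    print(f"{key}: {value}")
```

A `torch_cuda_build` of `None` points to fix 1, while a CUDA build with `torch_cuda_available: False` more likely points to fix 3.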
## Colab-Specific Notes
- **Free T4 GPU**: Sessions are limited to roughly 12 hours and may disconnect earlier
- **Memory**: ~16GB GPU RAM, ~25GB system RAM
- **CUDA**: Colab ships with CUDA 12.2 preinstalled; this guide installs the `cu121` PyTorch wheels for compatibility
- **PyTorch**: Must be CUDA-enabled version
## Alternative: Use Development Version
```bash
# Install directly from GitHub
!pip install git+https://github.com/Electroiscoding/CRAYON.git
# Force CUDA build if needed
!CRAYON_FORCE_CUDA=1 pip install git+https://github.com/Electroiscoding/CRAYON.git
```
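After switching between the TestPyPI release and a GitHub build, it can help to confirm which version is actually installed. A small sketch using the standard library follows; it assumes the GitHub build registers under the same distribution name, `xerv-crayon`, as the pip command in Cell 3, which is not confirmed here. `installed_version` is a hypothetical helper.

```python
from importlib import metadata


def installed_version(dist_name="xerv-crayon"):
    """Return the installed version string of `dist_name`, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print(f"xerv-crayon: {installed_version()}")
```

Restart the Colab runtime after reinstalling so that the new build is the one imported.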
This guide tests the CRAYON improvements made to fix CUDA extension issues and provide better error messaging.