# CRAYON CUDA Testing Guide for Google Colab T4

## Quick Setup Commands

Run these cells in sequence in Google Colab (with the T4 GPU runtime selected):

```bash
# Cell 1: Check GPU
!nvidia-smi
!nvcc --version
```

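The same GPU check can be done from Python when the result needs to be inspected programmatically. The following is a minimal sketch, not part of CRAYON: the helper names `parse_gpu_csv` and `query_gpus` are hypothetical, and it assumes `nvidia-smi` supports the `--query-gpu` and `--format=csv,noheader` options, as current NVIDIA drivers do.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi (hypothetical selection for this sketch)
QUERY_FIELDS = ("name", "memory.total", "driver_version")


def parse_gpu_csv(output):
    """Parse `nvidia-smi --format=csv,noheader` output into a list of dicts."""
    gpus = []
    for line in output.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        if len(values) == len(QUERY_FIELDS):
            gpus.append(dict(zip(QUERY_FIELDS, values)))
    return gpus


def query_gpus():
    """Return one dict per visible GPU, or an empty list if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []  # No driver tools on PATH, most likely a CPU-only runtime
    query = "--query-gpu=" + ",".join(QUERY_FIELDS)
    result = subprocess.run(
        ["nvidia-smi", query, "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


for gpu in query_gpus():
    print(gpu)
```

An empty result suggests that no GPU runtime is attached, which is worth fixing before installing anything else.
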
```python
# Cell 2: Install the CUDA 12.1 build of PyTorch
!pip uninstall torch torchvision torchaudio -y
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

import torch
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")
```

```bash
# Cell 3: Install CRAYON with CUDA
# The package spec is quoted so the shell does not treat [cuda] as a glob pattern
!pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ "xerv-crayon[cuda]"

# Verify installation
!python -c "import crayon; print('CRAYON installed')"
```

```python
# Cell 4: Test CUDA functionality
import logging
logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')

from crayon.core.vocabulary import CrayonVocab

print("=== CRAYON CUDA Test ===")

# Auto-detection (should pick CUDA)
vocab = CrayonVocab(device="auto")
print(f"Device: {vocab.device}")

# Load profile
vocab.load_profile("lite")
print(f"Profile loaded: {len(vocab)} tokens")

# Test tokenization
text = "Hello, world! This is CUDA-accelerated tokenization."
tokens = vocab.tokenize(text)
print(f"Text: {text}")
print(f"Tokens: {tokens}")
print(f"Count: {len(tokens)}")
```

```python
# Cell 5: Performance benchmark
import time

def benchmark(vocab, text, runs=5):
    """Return (average seconds per call, token count) over `runs` timed calls."""
    vocab.tokenize(text)  # Untimed warm-up call so one-time setup is not measured
    times = []
    for _ in range(runs):
        start = time.perf_counter()  # Monotonic, high-resolution clock
        tokens = vocab.tokenize(text)
        times.append(time.perf_counter() - start)
    avg_time = sum(times) / len(times)
    return avg_time, len(tokens)

# Test texts of increasing length
texts = [
    "Hello world",
    "Hello world! " * 10,
    "Hello world! " * 100,
    "Hello world! " * 1000,
]

# CPU comparison
vocab_cpu = CrayonVocab(device="cpu")
vocab_cpu.load_profile("lite")

print("=== Performance Comparison ===")
for i, text in enumerate(texts):
    print(f"\nTest {i+1}: {len(text)} chars")

    # CPU
    cpu_time, cpu_tokens = benchmark(vocab_cpu, text)
    print(f"  CPU:  {cpu_time:.6f}s ({cpu_tokens} tokens)")

    # CUDA
    cuda_time, cuda_tokens = benchmark(vocab, text)
    print(f"  CUDA: {cuda_time:.6f}s ({cuda_tokens} tokens)")

    # Speedup (guarded against a zero denominator)
    speedup = cpu_time / cuda_time if cuda_time > 0 else 0
    print(f"  Speedup: {speedup:.2f}x")
```

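An average over five runs can be skewed by a single slow call, for example the first call after a kernel or cache is initialized. The following is a minimal sketch of a more robust timing helper that discards warm-up calls and reports the median. The name `time_callable` and its parameters are hypothetical and not part of CRAYON. In the notebook, the stand-in workload would be replaced by something like `lambda: vocab.tokenize(text)`.

```python
import statistics
import time


def time_callable(fn, repeats=7, warmup=2):
    """Time `fn` with warm-up calls discarded; return min/median/max in seconds."""
    for _ in range(warmup):
        fn()  # Warm-up calls are executed but not timed
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "repeats": repeats,
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


# Stand-in workload so the sketch runs without CRAYON installed
stats = time_callable(lambda: sum(range(10_000)))
print(f"median {stats['median'] * 1e6:.1f} us, min {stats['min'] * 1e6:.1f} us")
```

Comparing medians rather than means makes the CPU and CUDA numbers less sensitive to occasional stalls on a shared Colab machine.
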
```python
# Cell 6: Batch processing test
import time

batch_texts = [
    "def fibonacci(n): return n if n <= 1 else fibonacci(n-1) + fibonacci(n-2)",
    "class NeuralNetwork(nn.Module): def __init__(self): super().__init__()",
    "import torch; model = torch.nn.Sequential(torch.nn.Linear(10, 5), torch.nn.ReLU())",
] * 50  # Large batch

print(f"Batch size: {len(batch_texts)}")

# CUDA batch
start = time.perf_counter()
batch_tokens = vocab.tokenize(batch_texts)
cuda_batch_time = time.perf_counter() - start

# CPU batch
start = time.perf_counter()
batch_tokens_cpu = vocab_cpu.tokenize(batch_texts)
cpu_batch_time = time.perf_counter() - start

print(f"CPU batch: {cpu_batch_time:.4f}s")
print(f"CUDA batch: {cuda_batch_time:.4f}s")
# Guard against a zero denominator on very fast runs
batch_speedup = cpu_batch_time / cuda_batch_time if cuda_batch_time > 0 else 0
print(f"Speedup: {batch_speedup:.2f}x")
```

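A speedup is only meaningful if both backends produce the same tokens. The following is a minimal sketch of a consistency check. It assumes that batch tokenization returns one token sequence per input text, which the guide does not state explicitly, and the helper name `first_mismatch` is hypothetical.

```python
def first_mismatch(reference, candidate):
    """Return the index of the first differing item, or None if both match."""
    for index, (ref_item, cand_item) in enumerate(zip(reference, candidate)):
        if list(ref_item) != list(cand_item):
            return index
    if len(reference) != len(candidate):
        # One batch is a strict prefix of the other
        return min(len(reference), len(candidate))
    return None


# Stand-in data; in the notebook, pass batch_tokens_cpu and batch_tokens instead
cpu_result = [[1, 2, 3], [4, 5]]
cuda_result = [[1, 2, 3], [4, 6]]
mismatch = first_mismatch(cpu_result, cuda_result)
print("outputs match" if mismatch is None else f"first mismatch at index {mismatch}")
```

If a mismatch is reported, inspecting that single text on both devices is usually quicker than comparing the whole batch.
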
## Expected Results on T4

- **Device Detection**: `device="auto"` should select `"cuda"` automatically
- **Hardware**: NVIDIA T4, ~16 GB VRAM, compute capability 7.5
- **Performance**: 2-5x speedup over CPU on single texts, 5-10x on batches
- **Memory**: Efficient GPU utilization

## Troubleshooting

If CUDA is not detected or fails to load, run this diagnostic:

```python
# Get detailed error information
from crayon.core.vocabulary import CrayonVocab

vocab = CrayonVocab(device="cpu")  # Initialize on CPU first
print(vocab._get_cuda_import_error())
```

Common fixes:
1. **PyTorch is a CPU-only build**: Reinstall from the `cu121` wheel index (Cell 2)
2. **`CUDA_HOME` not set**: Colab usually sets this correctly; check with `!echo $CUDA_HOME`
3. **No GPU runtime**: Select a GPU under Runtime > Change runtime type

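The checks behind these fixes can be gathered into one report. The following is a minimal sketch using only the standard library, plus PyTorch if it is installed. The function name `cuda_environment_report` and the report keys are hypothetical and not part of CRAYON.

```python
import importlib.util
import os
import shutil


def cuda_environment_report():
    """Collect basic facts about the CUDA setup without requiring a GPU."""
    report = {
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        "nvcc_on_path": shutil.which("nvcc") is not None,
        "cuda_home": os.environ.get("CUDA_HOME"),
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "torch_cuda_build": None,
        "torch_cuda_available": None,
    }
    if report["torch_installed"]:
        try:
            import torch

            # torch.version.cuda is None for CPU-only wheels
            report["torch_cuda_build"] = torch.version.cuda
            report["torch_cuda_available"] = torch.cuda.is_available()
        except Exception as exc:  # A broken install should not crash the report
            report["torch_error"] = repr(exc)
    return report


for key, value in cuda_environment_report().items():
    print(f"{key}: {value}")
```

A `torch_cuda_build` of `None` points to fix 1, a missing `cuda_home` to fix 2, and `nvidia_smi_on_path: False` to fix 3.
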
## Colab-Specific Notes

- **Free T4 GPU**: Sessions are limited to ~12 hours and may disconnect
- **Memory**: ~16 GB GPU RAM, ~25 GB system RAM
- **CUDA**: Colab has CUDA 12.2 pre-installed, but this guide installs the `cu121` (CUDA 12.1) PyTorch wheels for compatibility
- **PyTorch**: Must be a CUDA-enabled build

## Alternative: Use Development Version

```bash
# Install directly from GitHub
!pip install git+https://github.com/Electroiscoding/CRAYON.git

# Force CUDA build if needed
!CRAYON_FORCE_CUDA=1 pip install git+https://github.com/Electroiscoding/CRAYON.git
```

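After switching between the TestPyPI release and a GitHub install, it helps to confirm which version is actually active. The following is a minimal sketch using the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical, and it assumes the GitHub build registers under the same distribution name as the release (`xerv-crayon`), which this guide does not confirm.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string for `dist_name`, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Distribution name taken from the pip command in Cell 3 (an assumption for git installs)
print(f"xerv-crayon: {installed_version('xerv-crayon')}")
```

A result of `None` means pip did not register a distribution under that name, so the install step should be rerun or the name checked with `!pip list`.
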
This guide exercises the CRAYON changes that fix CUDA extension issues and improve error messages.