# CRAYON CUDA Testing Guide for Google Colab T4

## Quick Setup Commands

Run these cells in sequence in Google Colab (with T4 GPU runtime):

```bash
# Cell 1: Check GPU
!nvidia-smi
!nvcc --version
```
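The same check can be done from Python, which is handy when later cells should branch on whether a GPU is present. This is a minimal sketch: the helper name `query_gpu` is hypothetical, and it assumes only that `nvidia-smi` is on `PATH` when a GPU runtime is attached.

```python
import shutil
import subprocess

def query_gpu():
    """Return a list of 'name, memory' strings from nvidia-smi, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []  # No NVIDIA driver tools on PATH, so no GPU runtime
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []  # Tool exists but could not talk to a device
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

gpus = query_gpu()
print(gpus if gpus else "No GPU detected: switch the Colab runtime to T4 GPU")
```

An empty list usually means the notebook is still on a CPU runtime.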

```python
# Cell 2: Install PyTorch CUDA
!pip uninstall torch torchvision torchaudio -y
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# If torch was already imported in this session, restart the runtime before running the checks below
import torch
print(f"PyTorch: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")
```

```bash
# Cell 3: Install CRAYON with CUDA
!pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ "xerv-crayon[cuda]"

# Verify installation
!python -c "import crayon; print('CRAYON installed')"
```
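Beyond confirming that the import succeeds, it can help to record which versions were actually installed. The sketch below uses the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical, and the distribution name `xerv-crayon` is taken from the install command above.

```python
from importlib import metadata

def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# Report the two packages this guide depends on
for name in ("xerv-crayon", "torch"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```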

```python
# Cell 4: Test CUDA functionality
import logging
logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')

from crayon.core.vocabulary import CrayonVocab

print("=== CRAYON CUDA Test ===")

# Auto-detection (should pick CUDA)
vocab = CrayonVocab(device="auto")
print(f"Device: {vocab.device}")

# Load profile
vocab.load_profile("lite")
print(f"Profile loaded: {len(vocab)} tokens")

# Test tokenization
text = "Hello, world! This is CUDA-accelerated tokenization."
tokens = vocab.tokenize(text)
print(f"Text: {text}")
print(f"Tokens: {tokens}")
print(f"Count: {len(tokens)}")
```

```python
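# Cell 5: Performance benchmark
import time

def benchmark(vocab, text, runs=5):
    """Return (average seconds per call, token count) over `runs` timed calls."""
    tokens = vocab.tokenize(text)  # Warm-up call, excluded from timing
    times = []
    for _ in range(runs):
        start = time.perf_counter()  # Monotonic, high-resolution timer
        tokens = vocab.tokenize(text)
        times.append(time.perf_counter() - start)
    avg_time = sum(times) / len(times)
    return avg_time, len(tokens)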

# Test texts
texts = [
    "Hello world",
    "Hello world! " * 10,
    "Hello world! " * 100,
    "Hello world! " * 1000,
]

# CPU comparison
vocab_cpu = CrayonVocab(device="cpu")
vocab_cpu.load_profile("lite")

print("=== Performance Comparison ===")
for i, text in enumerate(texts):
    print(f"\nTest {i+1}: {len(text)} chars")

    # CPU
    cpu_time, cpu_tokens = benchmark(vocab_cpu, text)
    print(f"  CPU:  {cpu_time:.6f}s ({cpu_tokens} tokens)")

    # CUDA
    cuda_time, cuda_tokens = benchmark(vocab, text)
    print(f"  CUDA: {cuda_time:.6f}s ({cuda_tokens} tokens)")

    # Speedup
    speedup = cpu_time / cuda_time if cuda_time > 0 else 0
    print(f"  Speedup: {speedup:.2f}x")
```
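Averages over a handful of runs can be skewed by a single slow call, so a median is often a steadier summary. The following is a sketch of such a timing helper, not part of CRAYON: `time_call` and `speedup` are hypothetical names, and `str.split` stands in for the tokenizer so the snippet runs anywhere.

```python
import statistics
import time

def time_call(fn, *args, warmup=1, runs=5):
    """Return the median wall-clock seconds of fn(*args) over `runs` calls."""
    for _ in range(warmup):
        fn(*args)  # Warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def speedup(baseline_s, candidate_s):
    """Return baseline/candidate, or None when the candidate time is zero."""
    return baseline_s / candidate_s if candidate_s > 0 else None

# Stand-in workload; in the notebook, pass vocab_cpu.tokenize and vocab.tokenize instead
text = "Hello world! " * 1000
elapsed = time_call(str.split, text)
print(f"median: {elapsed:.6f}s")
```

Returning `None` for an undefined ratio avoids printing a misleading `0.00x` when a run is too fast to measure.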

```python
# Cell 6: Batch processing test
batch_texts = [
    "def fibonacci(n): return n if n <= 1 else fibonacci(n-1) + fibonacci(n-2)",
    "class NeuralNetwork(nn.Module): def __init__(self): super().__init__()",
    "import torch; model = torch.nn.Sequential(torch.nn.Linear(10, 5), torch.nn.ReLU())",
] * 50  # Large batch

print(f"Batch size: {len(batch_texts)}")

# CUDA batch
start = time.perf_counter()
batch_tokens = vocab.tokenize(batch_texts)
cuda_batch_time = time.perf_counter() - start

# CPU batch
start = time.perf_counter()
batch_tokens_cpu = vocab_cpu.tokenize(batch_texts)
cpu_batch_time = time.perf_counter() - start

print(f"CPU batch:  {cpu_batch_time:.4f}s")
print(f"CUDA batch: {cuda_batch_time:.4f}s")
if cuda_batch_time > 0:  # Guard against division by zero on very fast runs
    print(f"Speedup: {cpu_batch_time/cuda_batch_time:.2f}x")
```
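A speedup only matters if both backends produce the same tokens, so it is worth comparing the two batch results. This sketch assumes, without confirmation from the CRAYON docs, that batch tokenization returns one sequence of token IDs per input text. The helper `first_mismatch` is hypothetical, and toy data stands in for `batch_tokens_cpu` and `batch_tokens`.

```python
def first_mismatch(reference, candidate):
    """Return the index of the first differing item, or None if both sequences match."""
    for i, (ref, cand) in enumerate(zip(reference, candidate)):
        if list(ref) != list(cand):
            return i
    if len(reference) != len(candidate):
        return min(len(reference), len(candidate))  # One output is shorter
    return None

# Toy data standing in for batch_tokens_cpu and batch_tokens
cpu_out = [[1, 2, 3], [4, 5]]
cuda_out = [[1, 2, 3], [4, 6]]
print(first_mismatch(cpu_out, cpu_out))   # None
print(first_mismatch(cpu_out, cuda_out))  # 1
```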

## Expected Results on T4

- **Device Detection**: Should automatically select "cuda"
- **Hardware**: NVIDIA T4, ~16GB VRAM, Compute Capability 7.5
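- **Performance**: Roughly 2-5x speedup on single texts and 5-10x on batches; very short inputs may show less, since per-call overhead dominates
- **Memory**: GPU memory use should stay well within the T4's ~16GB (check with `!nvidia-smi`)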

## Troubleshooting

If CUDA doesn't work, run this diagnostic:

```python
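# Get detailed error information
from crayon.core.vocabulary import CrayonVocab

vocab = CrayonVocab(device="cpu")  # Initialize on CPU first so construction does not depend on CUDA
print(vocab._get_cuda_import_error())  # Private helper; may change between releases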
```

Common fixes:
1. **PyTorch is a CPU-only build**: Reinstall with the `cu121` wheels (Cell 2)
2. **CUDA_HOME**: Colab usually sets this correctly
3. **GPU runtime**: Ensure a GPU is selected under Runtime > Change runtime type
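
The three checks above can be gathered into one report before digging further. This is a rough sketch rather than a CRAYON utility: `cuda_environment_report` is a hypothetical helper, and it treats PyTorch as optional so it still runs when the install failed.

```python
import os
import shutil

def cuda_environment_report():
    """Collect basic facts relevant to the fixes listed above."""
    report = {
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,  # GPU runtime attached?
        "cuda_home": os.environ.get("CUDA_HOME"),                    # None if unset
        "torch_cuda_build": None,
        "torch_cuda_available": False,
    }
    try:
        import torch
    except ImportError:
        return report  # PyTorch missing; leave the defaults
    report["torch_cuda_build"] = torch.version.cuda  # None for CPU-only wheels
    report["torch_cuda_available"] = torch.cuda.is_available()
    return report

for key, value in cuda_environment_report().items():
    print(f"{key}: {value}")
```

If `torch_cuda_build` is `None` while `nvidia_smi_found` is `True`, the likely cause is a CPU-only PyTorch wheel.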

## Colab-Specific Notes

- **Free T4 GPU**: Sessions are limited to ~12 hours and may disconnect earlier
- **Memory**: ~16GB GPU RAM, ~25GB system RAM
- **CUDA**: Colab ships with CUDA 12.2 pre-installed; this guide installs the `cu121` (CUDA 12.1) PyTorch wheels for compatibility
- **PyTorch**: Must be a CUDA-enabled build

## Alternative: Use Development Version

```bash
# Install directly from GitHub
!pip install git+https://github.com/Electroiscoding/CRAYON.git

# Force CUDA build if needed
!CRAYON_FORCE_CUDA=1 pip install git+https://github.com/Electroiscoding/CRAYON.git
```

This guide tests the CRAYON improvements made to fix CUDA extension issues and provide better error messaging.