	# CRAYON CUDA Testing Guide for Google Colab T4

	## Quick Setup Commands

	Run these cells in sequence in Google Colab (with T4 GPU runtime):

	```bash
	# Cell 1: Check GPU
	!nvidia-smi
	!nvcc --version
	```

	```python
	# Cell 2: Install PyTorch CUDA
	!pip uninstall torch torchvision torchaudio -y
	!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

	import torch
	print(f"PyTorch: {torch.__version__}")
	print(f"CUDA available: {torch.cuda.is_available()}")
	print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")
	```
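
A CPU-only PyTorch wheel and a CUDA wheel running without a GPU both report `CUDA available: False`, so it can help to print the CUDA version the wheel was built against as well. The sketch below is one way to do that; the helper name `torch_cuda_report` is hypothetical, and it only uses `torch.__version__`, `torch.version.cuda`, `torch.cuda.is_available()` and `torch.cuda.get_device_name()`.

```python
import importlib.util


def torch_cuda_report():
    """Return a dict describing the installed PyTorch build, or None if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    available = torch.cuda.is_available()
    return {
        "torch_version": torch.__version__,
        # None for CPU-only wheels; a version string such as "12.1" for CUDA wheels.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": available,
        "device_name": torch.cuda.get_device_name(0) if available else None,
    }


report = torch_cuda_report()
print(report if report is not None else "PyTorch is not installed")
```

If `built_with_cuda` is `None`, the wheel itself is CPU-only and a reinstall is needed; if it is set but `cuda_available` is `False`, the runtime most likely has no GPU attached.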

	```bash
	# Cell 3: Install CRAYON with CUDA
	!pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ "xerv-crayon[cuda]"

	# Verify installation
	!python -c "import crayon; print('CRAYON installed')"
	```

	```python
	# Cell 4: Test CUDA functionality
	import logging
	logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')

	from crayon.core.vocabulary import CrayonVocab

	print("=== CRAYON CUDA Test ===")

	# Auto-detection (should pick CUDA)
	vocab = CrayonVocab(device="auto")
	print(f"Device: {vocab.device}")

	# Load profile
	vocab.load_profile("lite")
	print(f"Profile loaded: {len(vocab)} tokens")

	# Test tokenization
	text = "Hello, world! This is CUDA-accelerated tokenization."
	tokens = vocab.tokenize(text)
	print(f"Text: {text}")
	print(f"Tokens: {tokens}")
	print(f"Count: {len(tokens)}")
	```
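
If auto-detection silently falls back to the CPU, the later benchmark cells would compare CPU against CPU. A small guard like the one below can make that failure obvious. This is a sketch under assumptions: `require_device` is a hypothetical helper, and it assumes that `str(vocab.device)` contains the backend name (for example `"cuda"`), which this guide suggests but does not spell out.

```python
from types import SimpleNamespace


def require_device(vocab, expected="cuda"):
    """Raise if the object's `device` attribute does not name the expected backend."""
    # Assumption: str(vocab.device) contains the backend name, e.g. "cuda" or "cpu".
    actual = str(getattr(vocab, "device", "unknown")).lower()
    if expected not in actual:
        raise RuntimeError(f"expected a {expected!r} backend but got {actual!r}")
    return actual


# Stand-in objects so the sketch runs without CRAYON installed.
print(require_device(SimpleNamespace(device="cuda")))
try:
    require_device(SimpleNamespace(device="cpu"))
except RuntimeError as exc:
    print(f"Check failed: {exc}")
```

In Colab, calling `require_device(vocab)` right after Cell 4 would stop the notebook before any misleading timings are produced.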

	```python
	# Cell 5: Performance benchmark
	import time

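	def benchmark(vocab, text, runs=5):
	    """Return (average seconds per call, token count) over `runs` calls."""
	    times = []
	    for _ in range(runs):
	        start = time.perf_counter()  # monotonic, higher resolution than time.time()
	        tokens = vocab.tokenize(text)
	        times.append(time.perf_counter() - start)
	    avg_time = sum(times) / len(times)
	    return avg_time, len(tokens)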

	# Test texts
	texts = [
	    "Hello world",
	    "Hello world! " * 10,
	    "Hello world! " * 100,
	    "Hello world! " * 1000,
	]

	# CPU comparison
	vocab_cpu = CrayonVocab(device="cpu")
	vocab_cpu.load_profile("lite")

	print("=== Performance Comparison ===")
	for i, text in enumerate(texts):
	    print(f"\nTest {i+1}: {len(text)} chars")

	    # CPU
	    cpu_time, cpu_tokens = benchmark(vocab_cpu, text)
	    print(f" CPU: {cpu_time:.6f}s ({cpu_tokens} tokens)")

	    # CUDA
	    cuda_time, cuda_tokens = benchmark(vocab, text)
	    print(f" CUDA: {cuda_time:.6f}s ({cuda_tokens} tokens)")

	    # Speedup
	    speedup = cpu_time / cuda_time if cuda_time > 0 else 0
	    print(f" Speedup: {speedup:.2f}x")
	```
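
Averages over a handful of runs are easily skewed by the first call (lazy initialisation, caches) and, on a GPU, by work that is still queued when the clock stops. The sketch below shows one way to harden the measurement: a discarded warm-up call, the median instead of the mean, and an optional `sync` callback. The helper `time_tokenizer` and its parameters are hypothetical. Whether CRAYON's tokenizer returns before GPU work finishes is not stated in this guide; if it does, `torch.cuda.synchronize` would be a reasonable value for `sync`.

```python
import statistics
import time


def time_tokenizer(tokenize, text, runs=5, warmup=1, sync=None):
    """Return (median_seconds, token_count) for tokenize(text)."""
    for _ in range(warmup):
        tokenize(text)  # discard warm-up runs (lazy init, caches)
    if sync is not None:
        sync()
    samples = []
    tokens = []
    for _ in range(runs):
        start = time.perf_counter()
        tokens = tokenize(text)
        if sync is not None:
            sync()  # wait for queued GPU work before stopping the clock
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), len(tokens)


# Stand-in tokenizer so the sketch runs anywhere; swap in vocab.tokenize in Colab.
median_s, count = time_tokenizer(str.split, "Hello world! " * 100)
print(f"median: {median_s:.6f}s, tokens: {count}")
```

In the notebook, `time_tokenizer(vocab.tokenize, text)` and `time_tokenizer(vocab_cpu.tokenize, text)` can replace the two `benchmark` calls.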

	```python
	# Cell 6: Batch processing test
	batch_texts = [
	    "def fibonacci(n): return n if n <= 1 else fibonacci(n-1) + fibonacci(n-2)",
	    "class NeuralNetwork(nn.Module): def __init__(self): super().__init__()",
	    "import torch; model = torch.nn.Sequential(torch.nn.Linear(10, 5), torch.nn.ReLU())",
	] * 50  # 150 texts in total

	print(f"Batch size: {len(batch_texts)}")

	# CUDA batch
	start = time.time()
	batch_tokens = vocab.tokenize(batch_texts)
	cuda_batch_time = time.time() - start

	# CPU batch
	start = time.time()
	batch_tokens_cpu = vocab_cpu.tokenize(batch_texts)
	cpu_batch_time = time.time() - start

	print(f"CPU batch: {cpu_batch_time:.4f}s")
	print(f"CUDA batch: {cuda_batch_time:.4f}s")
	print(f"Speedup: {cpu_batch_time/cuda_batch_time:.2f}x")
	```
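
A speedup figure only means something if both backends produce the same tokens. The sketch below compares two batch results item by item. It assumes that `tokenize` called on a list returns one token sequence per input text, which this guide implies but does not confirm, and `first_mismatch` is a hypothetical helper name.

```python
def first_mismatch(cpu_results, cuda_results):
    """Return the index of the first differing item, or None if the results agree."""
    if len(cpu_results) != len(cuda_results):
        # Different batch lengths: report the first index missing from the shorter one.
        return min(len(cpu_results), len(cuda_results))
    for index, (cpu_item, cuda_item) in enumerate(zip(cpu_results, cuda_results)):
        if list(cpu_item) != list(cuda_item):
            return index
    return None


# Plain lists stand in for tokenizer output so the sketch runs without CRAYON.
cpu = [[1, 2, 3], [4, 5]]
cuda = [[1, 2, 3], [4, 6]]
print(first_mismatch(cpu, cpu))   # None
print(first_mismatch(cpu, cuda))  # 1
```

After Cell 6, `first_mismatch(batch_tokens_cpu, batch_tokens)` should return `None` before the timings are trusted.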

	## Expected Results on T4

	- Device detection: `device="auto"` should select "cuda"
	- Hardware: NVIDIA T4, ~16 GB VRAM, compute capability 7.5
	- Performance: roughly 2-5x speedup on single texts, 5-10x on batches
	- Memory: Efficient GPU utilization

	## Troubleshooting

	If CUDA doesn't work, run this diagnostic:

	```python
	# Get detailed error information
	from crayon.core.vocabulary import CrayonVocab

	vocab = CrayonVocab(device="cpu")  # Initialize on CPU first so construction succeeds
	print(vocab._get_cuda_import_error())
	```

	Common fixes:
	1. PyTorch is a CPU-only build: reinstall from the `cu121` wheel index (Cell 2)
	2. `CUDA_HOME` not set: Colab usually sets this correctly; check with `!echo $CUDA_HOME`
	3. No GPU runtime: select a GPU under Runtime > Change runtime type
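
The three fixes above can be turned into a quick automated check. The following is a minimal sketch, not part of CRAYON: `cuda_hints` and `torch_sees_cuda` are hypothetical helpers that rely only on the standard library plus `torch.cuda.is_available()`.

```python
import os
import shutil


def cuda_hints(env, has_nvidia_smi, torch_cuda_ok):
    """Map observed facts about the environment to the fixes listed above."""
    hints = []
    if not has_nvidia_smi:
        hints.append("nvidia-smi not found: select a GPU runtime in the Colab settings.")
    if not torch_cuda_ok:
        hints.append("PyTorch cannot use CUDA: reinstall from the cu121 wheel index.")
    if not env.get("CUDA_HOME"):
        hints.append("CUDA_HOME is unset: a source build may not find the CUDA toolkit.")
    return hints


def torch_sees_cuda():
    """Return True only if torch imports and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


found = cuda_hints(os.environ, shutil.which("nvidia-smi") is not None, torch_sees_cuda())
print("\n".join(found) if found else "No obvious environment problems found")
```

Keeping `cuda_hints` free of side effects makes it easy to check the logic with made-up inputs before running it on a real runtime.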

	## Colab-Specific Notes

	- Free T4 GPU: Limited to ~12 hours, may disconnect
	- Memory: ~16GB GPU RAM, ~25GB system RAM
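	- CUDA: Colab has a system CUDA toolkit pre-installed (12.2 at the time of writing); the `cu121` PyTorch wheels bundle their own CUDA 12.1 runtime, which works with the newer driver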
	- PyTorch: Must be CUDA-enabled version

	## Alternative: Use Development Version

	```bash
	# Install directly from GitHub
	!pip install git+https://github.com/Electroiscoding/CRAYON.git

	# Force CUDA build if needed
	!CRAYON_FORCE_CUDA=1 pip install git+https://github.com/Electroiscoding/CRAYON.git
	```
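
After switching between the TestPyPI release and the GitHub build, it is easy to lose track of which one is active. The sketch below reads the installed version from package metadata. It assumes the GitHub build registers under the same distribution name as the release (`xerv-crayon`), which this guide does not confirm; `installed_version` is a hypothetical helper.

```python
from importlib import metadata


def installed_version(distribution="xerv-crayon"):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(f"xerv-crayon version: {version}" if version else "xerv-crayon is not installed")
```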

	This guide tests the CRAYON improvements made to fix CUDA extension issues and provide better error messaging.