# Quick Start Guide

Get up and running with BitLinear in minutes.

## Installation

### Prerequisites

- Python >= 3.8
- PyTorch >= 2.0.0
- (Optional) CUDA toolkit for GPU acceleration

### Install from Source

```bash
# Clone the repository
git clone https://github.com/yourusername/bitlinear.git
cd bitlinear

# Install in development mode (CPU-only)
pip install -e .

# Or with development dependencies
pip install -e ".[dev]"
```

### Install with CUDA Support

```bash
# Set CUDA_HOME if not already set
export CUDA_HOME=/usr/local/cuda  # Linux/Mac
# or, on Windows:
set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8

# Install
pip install -e .
```

## Basic Usage

### Simple Example

```python
import torch
from bitlinear import BitLinear

# Create a BitLinear layer (same interface as nn.Linear)
layer = BitLinear(in_features=512, out_features=1024, bias=True)

# Forward pass
x = torch.randn(32, 128, 512)  # [batch, seq_len, features]
output = layer(x)              # [32, 128, 1024]

print(f"Input shape: {x.shape}")
print(f"Output shape: {output.shape}")
```

### Convert an Existing Model

```python
import torch
import torch.nn as nn
from bitlinear import BitLinear

# Start with a standard Linear layer
linear = nn.Linear(512, 1024)
# ... possibly pre-trained ...
# Convert to BitLinear
bitlinear = BitLinear.from_linear(linear)

# Use as a drop-in replacement
x = torch.randn(16, 512)
output = bitlinear(x)
```

### Multi-Component Ternary Layer

For better approximation quality:

```python
import torch
from bitlinear import MultiTernaryLinear

# k=4 means 4 ternary components (better approximation, 4x compute)
layer = MultiTernaryLinear(
    in_features=512,
    out_features=1024,
    k=4,  # Number of ternary components
    bias=True
)

x = torch.randn(32, 512)
output = layer(x)
```

### Convert an Entire Model

```python
import torch
import torch.nn as nn
from bitlinear import convert_linear_to_bitlinear

# Original model with nn.Linear layers
model = nn.Sequential(
    nn.Linear(512, 1024),
    nn.ReLU(),
    nn.Linear(1024, 512),
    nn.Softmax(dim=-1)
)

# Convert all Linear layers to BitLinear
model_bitlinear = convert_linear_to_bitlinear(model, inplace=False)

# Use as normal
x = torch.randn(16, 512)
output = model_bitlinear(x)
```

## In a Transformer

Replace the attention projection layers:

```python
import torch.nn as nn
from bitlinear import BitLinear

class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        # Replace nn.Linear with BitLinear
        self.q_proj = BitLinear(d_model, d_model)
        self.k_proj = BitLinear(d_model, d_model)
        self.v_proj = BitLinear(d_model, d_model)
        self.out_proj = BitLinear(d_model, d_model)

        # Keep other components unchanged
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(0.1)

    def forward(self, x):
        # Standard Transformer forward pass
        q = self.q_proj(x)
        k = self.k_proj(x)
        v = self.v_proj(x)
        # ... attention computation ...
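        # Illustrative completion (not part of the original example): one
        # way to finish the forward pass with PyTorch 2.x's built-in
        # attention kernel, treating q/k/v as a single head for simplicity.
        import torch.nn.functional as F
        attn = F.scaled_dot_product_attention(q, k, v)
        return self.norm(x + self.dropout(self.out_proj(attn)))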
```

## Memory Savings Example

```python
import torch
import torch.nn as nn
from bitlinear import BitLinear

def count_params(model):
    return sum(p.numel() for p in model.parameters())

def estimate_memory_mb(model):
    total_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    return total_bytes / (1024 ** 2)

# Standard Linear
linear = nn.Linear(2048, 2048)
print(f"Linear parameters: {count_params(linear):,}")
print(f"Linear memory: {estimate_memory_mb(linear):.2f} MB")

# BitLinear
bitlinear = BitLinear(2048, 2048)
print(f"BitLinear parameters: {count_params(bitlinear):,}")
print(f"BitLinear memory: {estimate_memory_mb(bitlinear):.2f} MB")

# Savings
savings = (estimate_memory_mb(linear) - estimate_memory_mb(bitlinear)) / estimate_memory_mb(linear) * 100
print(f"Memory savings: {savings:.1f}%")
```

## Training with BitLinear

### Fine-tuning a Pre-trained Model

```python
import torch
import torch.nn as nn
import torch.optim as optim
from bitlinear import convert_linear_to_bitlinear

# Load a pre-trained model
model = YourModel.from_pretrained('model_name')

# Convert to BitLinear
model = convert_linear_to_bitlinear(model, inplace=True)

# Fine-tune with a standard PyTorch training loop
optimizer = optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(num_epochs):
    for batch in dataloader:
        x, y = batch

        # Forward pass
        output = model(x)
        loss = criterion(output, y)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

### Quantization-Aware Training (QAT)

Train with quantization from scratch:

```python
import torch.nn as nn
import torch.optim as optim
from bitlinear import BitLinear

# Model with BitLinear from the start
model = nn.Sequential(
    BitLinear(784, 512),
    nn.ReLU(),
    BitLinear(512, 256),
    nn.ReLU(),
    BitLinear(256, 10),
)

# Standard training loop
# Gradients will flow through quantization (straight-through estimator)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
# ... train as usual ...
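# For intuition, a sketch of the straight-through estimator (STE) itself.
# This is NOT this library's internal code; `ste_ternarize` and its
# threshold value are hypothetical, for illustration only.
import torch

def ste_ternarize(w, threshold=0.05):
    # Forward: ternarize to {-1, 0, +1}, scaled by the mean magnitude
    scale = w.abs().mean()
    w_t = torch.where(w.abs() > threshold, torch.sign(w), torch.zeros_like(w)) * scale
    # Backward: (w_t - w).detach() contributes no gradient, so gradients
    # flow to w as if quantization were the identity function
    return w + (w_t - w).detach()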
```

## Testing

Run the test suite:

```bash
# Install test dependencies
pip install -e ".[dev]"

# Run all tests
pytest tests/ -v

# Run a specific test file
pytest tests/test_layers.py -v

# Run with coverage
pytest tests/ -v --cov=bitlinear --cov-report=html

# Skip slow tests
pytest tests/ -m "not slow"

# Skip CUDA tests (if no GPU available)
pytest tests/ -m "not cuda"
```

## Examples

Run the included examples:

```bash
# Basic usage
python examples/basic_usage.py

# Transformer example
python examples/transformer_example.py
```

## Troubleshooting

### Import Error

If you get `ModuleNotFoundError: No module named 'bitlinear'`:

```bash
# Make sure you installed the package
pip install -e .

# Or add it to PYTHONPATH
export PYTHONPATH=/path/to/BitLinear:$PYTHONPATH
```

### CUDA Build Failures

If the CUDA build fails:

1. **Check CUDA_HOME:**
   ```bash
   echo $CUDA_HOME  # Should point to your CUDA installation
   ```
2. **Check PyTorch's CUDA version:**
   ```python
   import torch
   print(torch.version.cuda)
   ```
3. **Match CUDA versions:** the PyTorch and system CUDA versions should match.
4. **Fall back to CPU:**
   ```bash
   # Build the CPU-only version
   unset CUDA_HOME
   pip install -e .
   ```

### Tests Failing

All tests are currently marked with `pytest.skip()` because the implementation is not yet complete. This is expected! To implement:

1. Follow `IMPLEMENTATION_GUIDE.md`
2. Start with `bitlinear/quantization.py`
3. Remove `pytest.skip()` as you implement each function
4. Tests should pass as you complete the implementation

## Next Steps

1. **Read the Implementation Guide:** `IMPLEMENTATION_GUIDE.md`
2. **Explore the Project Structure:** `PROJECT_STRUCTURE.md`
3. **Start Implementing:**
   - Begin with `bitlinear/quantization.py`
   - Move to `bitlinear/functional.py`
   - Then `bitlinear/layers.py`
4. **Test as You Go:** Run the tests after implementing each component
5.
   **Try Examples:** Test with `examples/transformer_example.py`

## Getting Help

- **Documentation:** Check the docstrings in each module
- **Issues:** Open an issue on GitHub
- **Examples:** See the `examples/` directory
- **Tests:** Look at `tests/` for usage patterns

## Performance Tips

### Memory Optimization

1. **Use packed weights** (when implemented):
   ```python
   from bitlinear.packing import pack_ternary_base3
   packed, shape = pack_ternary_base3(W_ternary)
   ```
2. **Batch processing:** larger batches are more efficient.
3. **Mixed precision:** combine with `torch.amp` for activation quantization.

### Speed Optimization

1. **Use CUDA:** build with CUDA support for GPU acceleration.
2. **Larger layers:** BitLinear's benefits increase with layer size.
3. **Profile:** use the PyTorch profiler to find bottlenecks:
   ```python
   import torch.profiler as profiler

   with profiler.profile() as prof:
       output = model(x)

   print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
   ```

## Resources

- **Paper:** https://jmlr.org/papers/volume26/24-2050/24-2050.pdf
- **BitNet:** https://arxiv.org/abs/2310.11453
- **PyTorch Quantization:** https://pytorch.org/docs/stable/quantization.html

Happy coding! 🚀
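As a footnote to the packed-weights tip under Performance Tips: base-3 packing works because five ternary digits fit in one byte (3^5 = 243 <= 256), i.e. 1.6 bits per weight versus 32 bits for float32. The helpers below are a hypothetical sketch of that arithmetic on plain Python ints, not the actual `bitlinear.packing` implementation:

```python
def pack5(trits):
    # Pack five ternary values (each in {-1, 0, +1}) into one byte.
    # Shift each trit to {0, 1, 2} and accumulate base-3 digits.
    assert len(trits) == 5
    byte = 0
    for t in reversed(trits):
        byte = byte * 3 + (t + 1)
    return byte  # in 0..242, fits in a uint8

def unpack5(byte):
    # Inverse: peel off base-3 digits and shift back to {-1, 0, +1}.
    trits = []
    for _ in range(5):
        trits.append(byte % 3 - 1)
        byte //= 3
    return trits
```

A vectorized version would apply the same arithmetic to `uint8` tensors in chunks of five.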