# Quick Start Guide
Get up and running with BitLinear in minutes.
## Installation
### Prerequisites
- Python >= 3.8
- PyTorch >= 2.0.0
- (Optional) CUDA toolkit for GPU acceleration
### Install from Source
```bash
# Clone the repository
git clone https://github.com/yourusername/bitlinear.git
cd bitlinear
# Install in development mode (CPU-only)
pip install -e .
# Or with development dependencies
pip install -e ".[dev]"
```
### Install with CUDA Support
```bash
# Set CUDA_HOME if not already set
export CUDA_HOME=/usr/local/cuda  # Linux/Mac
# On Windows (cmd.exe), run this instead; the quotes keep the space in the path intact:
#   set "CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8"
# Install
pip install -e .
```
## Basic Usage
### Simple Example
```python
import torch
from bitlinear import BitLinear
# Create a BitLinear layer (same interface as nn.Linear)
layer = BitLinear(in_features=512, out_features=1024, bias=True)
# Forward pass
x = torch.randn(32, 128, 512) # [batch, seq_len, features]
output = layer(x) # [32, 128, 1024]
print(f"Input shape: {x.shape}")
print(f"Output shape: {output.shape}")
```
### Convert Existing Model
```python
import torch
import torch.nn as nn
from bitlinear import BitLinear
# Start with a standard Linear layer
linear = nn.Linear(512, 1024)
# ... possibly pre-trained ...
# Convert to BitLinear
bitlinear = BitLinear.from_linear(linear)
# Use as drop-in replacement
x = torch.randn(16, 512)
output = bitlinear(x)
```
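To make the conversion step more concrete, here is a minimal, dependency-free sketch of one common ternarization rule: the absmean scheme described for BitNet b1.58 (divide by the mean absolute weight, round, then clip to {-1, 0, +1}). Whether `BitLinear.from_linear` uses exactly this rule is an assumption, and the helper name `ternarize_absmean` is hypothetical, not part of the package.

```python
def ternarize_absmean(weights, eps=1e-8):
    """Quantize a flat list of floats to codes in {-1, 0, +1} plus one scale.

    Hypothetical helper for illustration only; not the bitlinear API.
    """
    # Scale is the mean absolute weight (eps guards against division by zero).
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    ternary = []
    for w in weights:
        # Round w / scale to the nearest integer, then clip to [-1, 1].
        q = round(w / scale)
        ternary.append(max(-1, min(1, q)))
    return ternary, scale


weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]
ternary, scale = ternarize_absmean(weights)
approx = [t * scale for t in ternary]  # dequantized approximation of weights
print("codes:", ternary)
print("scale:", round(scale, 4))
print("approx:", [round(a, 4) for a in approx])
```

Small weights collapse to 0 and large ones saturate at ±1, so a single scale per tensor carries all of the magnitude information.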
### Multi-Component Ternary Layer
For better approximation quality:
```python
import torch
from bitlinear import MultiTernaryLinear
# k=4 means 4 ternary components (better approximation, 4x compute)
layer = MultiTernaryLinear(
    in_features=512,
    out_features=1024,
    k=4,  # Number of ternary components
    bias=True,
)
x = torch.randn(32, 512)
output = layer(x)
```
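One way to picture why more components improve the approximation is a greedy residual fit: ternarize the weights, subtract the result, then ternarize what is left, `k` times. The sketch below is plain Python and assumes an absmean ternarizer; the function names are hypothetical, and the actual `MultiTernaryLinear` decomposition may differ.

```python
def ternarize_absmean(values, eps=1e-8):
    """Codes in {-1, 0, +1} plus a single scale (mean absolute value)."""
    scale = sum(abs(v) for v in values) / len(values) + eps
    codes = [max(-1, min(1, round(v / scale))) for v in values]
    return codes, scale


def multi_ternary_decompose(weights, k):
    """Greedy residual fit: weights ~= sum_i scale_i * codes_i (illustrative)."""
    residual = list(weights)
    components = []
    for _ in range(k):
        codes, scale = ternarize_absmean(residual)
        components.append((codes, scale))
        # Remove what this component explains; the next one fits the remainder.
        residual = [r - scale * c for r, c in zip(residual, codes)]
    return components


def reconstruct(components, n):
    """Sum the scaled ternary components back into a dense vector."""
    out = [0.0] * n
    for codes, scale in components:
        for i, c in enumerate(codes):
            out[i] += scale * c
    return out


def mean_sq_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6, 0.33, -0.75]
errors = []
for k in (1, 2, 4):
    comps = multi_ternary_decompose(weights, k)
    errors.append(mean_sq_error(weights, reconstruct(comps, len(weights))))
    print(f"k={k}: mean squared error = {errors[-1]:.6f}")
```

Each extra component is another ternary matrix applied at inference time, which is where the "4x compute" for `k=4` comes from.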
### Convert Entire Model
```python
import torch
import torch.nn as nn
from bitlinear import convert_linear_to_bitlinear
# Original model with nn.Linear layers
model = nn.Sequential(
    nn.Linear(512, 1024),
    nn.ReLU(),
    nn.Linear(1024, 512),
    nn.Softmax(dim=-1),
)
# Convert all Linear layers to BitLinear
model_bitlinear = convert_linear_to_bitlinear(model, inplace=False)
# Use as normal
x = torch.randn(16, 512)
output = model_bitlinear(x)
```
## In a Transformer
Replace attention projection layers:
```python
import torch.nn as nn
from bitlinear import BitLinear
class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        # Replace nn.Linear with BitLinear
        self.q_proj = BitLinear(d_model, d_model)
        self.k_proj = BitLinear(d_model, d_model)
        self.v_proj = BitLinear(d_model, d_model)
        self.out_proj = BitLinear(d_model, d_model)
        # Keep other components unchanged
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(0.1)

    def forward(self, x):
        # Standard Transformer forward pass
        q = self.q_proj(x)
        k = self.k_proj(x)
        v = self.v_proj(x)
        # ... attention computation ...
```
## Memory Savings Example
```python
import torch
import torch.nn as nn
from bitlinear import BitLinear
def count_params(model):
    return sum(p.numel() for p in model.parameters())

def estimate_memory_mb(model):
    total_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    return total_bytes / (1024 ** 2)

# Standard Linear
linear = nn.Linear(2048, 2048)
print(f"Linear parameters: {count_params(linear):,}")
print(f"Linear memory: {estimate_memory_mb(linear):.2f} MB")
# BitLinear
bitlinear = BitLinear(2048, 2048)
print(f"BitLinear parameters: {count_params(bitlinear):,}")
print(f"BitLinear memory: {estimate_memory_mb(bitlinear):.2f} MB")
# Savings
savings = (estimate_memory_mb(linear) - estimate_memory_mb(bitlinear)) / estimate_memory_mb(linear) * 100
print(f"Memory savings: {savings:.1f}%")
```
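Note that the measurement above counts `numel() * element_size()` for each parameter, so it only reports savings if the layer actually stores its weights in a compact dtype; a layer that keeps full-precision latent weights for training would show little or no difference. As a rough upper bound on what packed ternary storage could achieve, the arithmetic can be sketched as follows. The function and scheme names are illustrative, not part of the package, and bias and scale tensors are ignored.

```python
import math


def weight_storage_bytes(num_weights, scheme):
    """Bytes needed to store num_weights values under a given encoding."""
    if scheme == "fp32":
        return num_weights * 4               # 4 bytes per weight
    if scheme == "2bit":
        return math.ceil(num_weights / 4)    # 4 ternary values per byte
    if scheme == "base3":
        return math.ceil(num_weights / 5)    # 3**5 = 243 <= 256, so 5 per byte
    raise ValueError(f"unknown scheme: {scheme}")


num_weights = 2048 * 2048
sizes = {s: weight_storage_bytes(num_weights, s) for s in ("fp32", "2bit", "base3")}
for scheme, size in sizes.items():
    saving = 100 * (1 - size / sizes["fp32"])
    print(f"{scheme:>5}: {size / 1024 ** 2:6.2f} MB  ({saving:.1f}% smaller than fp32)")
```

Base-3 packing gets close to the information-theoretic cost of a ternary value (log2(3) ≈ 1.58 bits) at 1.6 bits per weight, while 2-bit packing is simpler to decode.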
## Training with BitLinear
### Fine-tuning a Pre-trained Model
```python
import torch
import torch.nn as nn
import torch.optim as optim
from bitlinear import convert_linear_to_bitlinear
# Load pre-trained model (YourModel is a placeholder for your own model class)
model = YourModel.from_pretrained('model_name')
# Convert to BitLinear
model = convert_linear_to_bitlinear(model, inplace=True)
# Fine-tune with standard PyTorch training loop
# (num_epochs and dataloader are assumed to come from your own training setup)
optimizer = optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
for epoch in range(num_epochs):
    for batch in dataloader:
        x, y = batch
        # Forward pass
        output = model(x)
        loss = criterion(output, y)
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```
### Quantization-Aware Training (QAT)
Train with quantization from scratch:
```python
import torch.nn as nn
import torch.optim as optim
from bitlinear import BitLinear
# Model with BitLinear from the start
model = nn.Sequential(
    BitLinear(784, 512),
    nn.ReLU(),
    BitLinear(512, 256),
    nn.ReLU(),
    BitLinear(256, 10),
)
# Standard training loop
# Gradients will flow through quantization (straight-through estimator)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
# ... train as usual ...
```
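The straight-through estimator mentioned in the comment can be illustrated without PyTorch. In the sketch below, a single linear neuron keeps full-precision latent weights, runs its forward pass with ternarized weights, and applies the gradient computed for the quantized weights directly to the latent ones. The absmean quantizer and all function names are assumptions made for illustration; the package's autograd implementation may differ.

```python
def ternarize_absmean(values, eps=1e-8):
    """Codes in {-1, 0, +1} plus a single scale (mean absolute value)."""
    scale = sum(abs(v) for v in values) / len(values) + eps
    codes = [max(-1, min(1, round(v / scale))) for v in values]
    return codes, scale


def ste_step(latent, x, target, lr):
    """One SGD step on a single linear neuron with ternary weights."""
    # Forward: use the quantized weights, not the latent ones.
    codes, scale = ternarize_absmean(latent)
    quantized = [c * scale for c in codes]
    y = sum(q * xi for q, xi in zip(quantized, x))
    loss = (y - target) ** 2
    # Backward: gradient of the loss with respect to the quantized weights.
    grad_quantized = [2.0 * (y - target) * xi for xi in x]
    # Straight-through estimator: treat d(quantized)/d(latent) as 1,
    # so the same gradient is applied to the latent weights.
    grad_latent = grad_quantized
    new_latent = [w - lr * g for w, g in zip(latent, grad_latent)]
    return new_latent, loss, codes


latent = [0.4, -0.3, 0.05]
x = [1.0, 2.0, -1.0]
target = 1.0
history = []
for step in range(5):
    latent, loss, codes = ste_step(latent, x, target, lr=0.05)
    history.append((loss, codes))
    print(f"step {step}: loss={loss:.4f} codes={codes}")
```

Because the ternary codes change in discrete jumps, the loss need not fall monotonically; the latent weights accumulate small updates until a code flips.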
## Testing
Run the test suite:
```bash
# Install test dependencies
pip install -e ".[dev]"
# Run all tests
pytest tests/ -v
# Run specific test file
pytest tests/test_layers.py -v
# Run with coverage
pytest tests/ -v --cov=bitlinear --cov-report=html
# Skip slow tests
pytest tests/ -m "not slow"
# Skip CUDA tests (if no GPU available)
pytest tests/ -m "not cuda"
```
## Examples
Run included examples:
```bash
# Basic usage
python examples/basic_usage.py
# Transformer example
python examples/transformer_example.py
```
## Troubleshooting
### Import Error
If you get `ModuleNotFoundError: No module named 'bitlinear'`:
```bash
# Make sure you installed the package
pip install -e .
# Or add to PYTHONPATH
export PYTHONPATH=/path/to/BitLinear:$PYTHONPATH
```
### CUDA Build Failures
If CUDA build fails:
1. **Check CUDA_HOME:**
```bash
echo $CUDA_HOME # Should point to CUDA installation
```
2. **Check PyTorch CUDA version:**
```python
import torch
print(torch.version.cuda)
```
3. **Match CUDA versions:** The CUDA version PyTorch was built against (`torch.version.cuda`) should match the system toolkit version reported by `nvcc --version`
4. **Fall back to CPU:**
```bash
# Build CPU-only version
unset CUDA_HOME
pip install -e .
```
### Tests Failing
All tests currently call `pytest.skip()` because the implementation is not yet complete, so they are reported as skipped rather than passed. This is expected!
To implement:
1. Follow `IMPLEMENTATION_GUIDE.md`
2. Start with `bitlinear/quantization.py`
3. Remove `pytest.skip()` as you implement each function
4. Tests should pass as you complete the implementation
## Next Steps
1. **Read the Implementation Guide:** `IMPLEMENTATION_GUIDE.md`
2. **Explore the Project Structure:** `PROJECT_STRUCTURE.md`
3. **Start Implementing:**
- Begin with `bitlinear/quantization.py`
- Move to `bitlinear/functional.py`
- Then `bitlinear/layers.py`
4. **Test as You Go:** Run tests after implementing each component
5. **Try Examples:** Test with `examples/transformer_example.py`
## Getting Help
- **Documentation:** Check docstrings in each module
- **Issues:** Open an issue on GitHub
- **Examples:** See `examples/` directory
- **Tests:** Look at `tests/` for usage patterns
## Performance Tips
### Memory Optimization
1. **Use packed weights** (when implemented):
```python
from bitlinear.packing import pack_ternary_base3
packed, shape = pack_ternary_base3(W_ternary)
```
2. **Batch processing:** Larger batches are more efficient
3. **Mixed precision:** Combine with `torch.amp` autocast to run activations in reduced precision (FP16/BF16)
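To show what base-3 packing involves, here is a minimal sketch that stores five ternary values per byte (possible because 3^5 = 243 fits in one byte). The helpers `pack_trits` and `unpack_trits` are hypothetical and operate on flat Python lists; the byte layout and return values of the package's `pack_ternary_base3` may differ.

```python
def pack_trits(trits):
    """Pack values in {-1, 0, +1} into bytes, five values per byte."""
    packed = bytearray()
    for start in range(0, len(trits), 5):
        chunk = trits[start:start + 5]
        value = 0
        # Map -1/0/+1 to base-3 digits 0/1/2, least significant digit first.
        for i, t in enumerate(chunk):
            value += (t + 1) * (3 ** i)
        packed.append(value)  # at most 242, so it always fits in a byte
    return bytes(packed), len(trits)


def unpack_trits(packed, count):
    """Inverse of pack_trits; count drops the padding in the last byte."""
    trits = []
    for byte in packed:
        for _ in range(5):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:count]


trits = [1, 0, -1, -1, 1, 0, 1]
packed, count = pack_trits(trits)
restored = unpack_trits(packed, count)
print(f"{count} ternary values -> {len(packed)} bytes: {list(packed)}")
print("round trip ok:", restored == trits)
```

The original element count has to be stored alongside the bytes, since the last byte is padded when the length is not a multiple of five.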
### Speed Optimization
1. **Use CUDA:** Build with CUDA support for GPU acceleration
2. **Larger layers:** BitLinear benefits increase with layer size
3. **Profile:** Use PyTorch profiler to find bottlenecks
```python
import torch.profiler as profiler
with profiler.profile() as prof:
    output = model(x)
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```
## Resources
- **Paper:** https://jmlr.org/papers/volume26/24-2050/24-2050.pdf
- **BitNet:** https://arxiv.org/abs/2310.11453
- **PyTorch Quantization:** https://pytorch.org/docs/stable/quantization.html
Happy coding! 🚀