# Quick Start Guide

Get up and running with BitLinear in minutes.

## Installation

### Prerequisites

- Python >= 3.8
- PyTorch >= 2.0.0
- (Optional) CUDA toolkit for GPU acceleration
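You can verify the prerequisites before installing; a quick illustrative check:

```python
# Illustrative prerequisite check (adjust the thresholds to your needs).
import sys
import torch

assert sys.version_info >= (3, 8), "Python >= 3.8 required"

# torch.__version__ looks like "2.1.0" or "2.1.0+cu118"
major, minor = (int(v) for v in torch.__version__.split(".")[:2])
assert (major, minor) >= (2, 0), "PyTorch >= 2.0.0 required"

print(f"Python {sys.version_info.major}.{sys.version_info.minor}, "
      f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
```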
### Install from Source

```bash
# Clone the repository
git clone https://github.com/yourusername/bitlinear.git
cd bitlinear

# Install in development mode (CPU-only)
pip install -e .

# Or with development dependencies
pip install -e ".[dev]"
```

### Install with CUDA Support

```bash
# Set CUDA_HOME if not already set
export CUDA_HOME=/usr/local/cuda  # Linux/macOS
# On Windows (cmd): set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8

# Install
pip install -e .
```
## Basic Usage

### Simple Example

```python
import torch
from bitlinear import BitLinear

# Create a BitLinear layer (same interface as nn.Linear)
layer = BitLinear(in_features=512, out_features=1024, bias=True)

# Forward pass
x = torch.randn(32, 128, 512)  # [batch, seq_len, features]
output = layer(x)              # [32, 128, 1024]

print(f"Input shape: {x.shape}")
print(f"Output shape: {output.shape}")
```
### Convert Existing Model

```python
import torch
import torch.nn as nn
from bitlinear import BitLinear

# Start with a standard Linear layer
linear = nn.Linear(512, 1024)
# ... possibly pre-trained ...

# Convert to BitLinear
bitlinear = BitLinear.from_linear(linear)

# Use as a drop-in replacement
x = torch.randn(16, 512)
output = bitlinear(x)
```
### Multi-Component Ternary Layer

For better approximation quality, use several ternary components:

```python
import torch
from bitlinear import MultiTernaryLinear

# k=4 means 4 ternary components (better approximation, 4x compute)
layer = MultiTernaryLinear(
    in_features=512,
    out_features=1024,
    k=4,  # Number of ternary components
    bias=True,
)

x = torch.randn(32, 512)
output = layer(x)
```
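What extra components buy you can be seen with a standalone sketch. The greedy residual fit below is illustrative only, not necessarily the algorithm `MultiTernaryLinear` uses internally; `ternarize` and `multi_ternary_approx` are hypothetical helpers:

```python
import torch

def ternarize(w, threshold_factor=0.75):
    # One ternary component: T in {-1, 0, +1} plus a scale alpha (illustrative).
    delta = threshold_factor * w.abs().mean()
    t = torch.zeros_like(w)
    t[w > delta] = 1.0
    t[w < -delta] = -1.0
    mask = t != 0
    alpha = w[mask].abs().mean() if mask.any() else torch.tensor(0.0)
    return alpha, t

def multi_ternary_approx(w, k=4):
    # Greedy residual fit: w is approximated by sum_i alpha_i * T_i.
    approx = torch.zeros_like(w)
    for _ in range(k):
        alpha, t = ternarize(w - approx)
        approx = approx + alpha * t
    return approx

torch.manual_seed(0)
w = torch.randn(64, 64)
for k in (1, 2, 4):
    err = float((w - multi_ternary_approx(w, k)).norm() / w.norm())
    print(f"k={k}: relative error {err:.3f}")
```

The relative error shrinks as `k` grows, which is the trade-off the `k` parameter exposes: better approximation for proportionally more compute.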
### Convert Entire Model

```python
import torch
import torch.nn as nn
from bitlinear import convert_linear_to_bitlinear

# Original model with nn.Linear layers
model = nn.Sequential(
    nn.Linear(512, 1024),
    nn.ReLU(),
    nn.Linear(1024, 512),
    nn.Softmax(dim=-1),
)

# Convert all Linear layers to BitLinear
model_bitlinear = convert_linear_to_bitlinear(model, inplace=False)

# Use as normal
x = torch.randn(16, 512)
output = model_bitlinear(x)
```
## In a Transformer

Replace the attention projection layers:

```python
import torch.nn as nn
from bitlinear import BitLinear

class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        # Replace nn.Linear with BitLinear
        self.q_proj = BitLinear(d_model, d_model)
        self.k_proj = BitLinear(d_model, d_model)
        self.v_proj = BitLinear(d_model, d_model)
        self.out_proj = BitLinear(d_model, d_model)
        # Keep other components unchanged
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(0.1)

    def forward(self, x):
        # Standard Transformer forward pass
        q = self.q_proj(x)
        k = self.k_proj(x)
        v = self.v_proj(x)
        # ... attention computation ...
```
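The elided attention computation could look like the following self-contained sketch. It uses `nn.Linear` so it runs without `bitlinear` installed; swap the four projections for `BitLinear` once the package is available. `F.scaled_dot_product_attention` requires PyTorch >= 2.0.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniAttention(nn.Module):
    # Sketch of the elided attention step; replace nn.Linear with BitLinear.
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        self.nhead = nhead
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        b, s, d = x.shape
        # Project and split into heads: [batch, nhead, seq, head_dim]
        q = self.q_proj(x).view(b, s, self.nhead, -1).transpose(1, 2)
        k = self.k_proj(x).view(b, s, self.nhead, -1).transpose(1, 2)
        v = self.v_proj(x).view(b, s, self.nhead, -1).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v)
        # Merge heads back and apply the output projection
        attn = attn.transpose(1, 2).reshape(b, s, d)
        return self.out_proj(attn)

x = torch.randn(2, 16, 512)
y = MiniAttention()(x)
print(y.shape)  # torch.Size([2, 16, 512])
```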
## Memory Savings Example

```python
import torch
import torch.nn as nn
from bitlinear import BitLinear

def count_params(model):
    return sum(p.numel() for p in model.parameters())

def estimate_memory_mb(model):
    total_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    return total_bytes / (1024 ** 2)

# Standard Linear
linear = nn.Linear(2048, 2048)
print(f"Linear parameters: {count_params(linear):,}")
print(f"Linear memory: {estimate_memory_mb(linear):.2f} MB")

# BitLinear
bitlinear = BitLinear(2048, 2048)
print(f"BitLinear parameters: {count_params(bitlinear):,}")
print(f"BitLinear memory: {estimate_memory_mb(bitlinear):.2f} MB")

# Savings
savings = (estimate_memory_mb(linear) - estimate_memory_mb(bitlinear)) / estimate_memory_mb(linear) * 100
print(f"Memory savings: {savings:.1f}%")
```
## Training with BitLinear

### Fine-tuning a Pre-trained Model

```python
import torch
import torch.nn as nn
import torch.optim as optim
from bitlinear import convert_linear_to_bitlinear

# Load a pre-trained model (placeholder: substitute your own model class)
model = YourModel.from_pretrained('model_name')

# Convert to BitLinear
model = convert_linear_to_bitlinear(model, inplace=True)

# Fine-tune with a standard PyTorch training loop
# (num_epochs and dataloader are assumed to be defined elsewhere)
optimizer = optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(num_epochs):
    for batch in dataloader:
        x, y = batch

        # Forward pass
        output = model(x)
        loss = criterion(output, y)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```
### Quantization-Aware Training (QAT)

Train with quantization in the loop from scratch:

```python
import torch.nn as nn
import torch.optim as optim
from bitlinear import BitLinear

# Model with BitLinear from the start
model = nn.Sequential(
    BitLinear(784, 512),
    nn.ReLU(),
    BitLinear(512, 256),
    nn.ReLU(),
    BitLinear(256, 10),
)

# Standard training loop.
# Gradients flow through quantization via the straight-through estimator.
optimizer = optim.Adam(model.parameters(), lr=1e-3)
# ... train as usual ...
```
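The straight-through estimator mentioned above can be sketched in a few lines. This is an illustrative stand-alone version, not necessarily BitLinear's internal implementation:

```python
import torch

class TernaryQuantSTE(torch.autograd.Function):
    # Quantize to {-1, 0, +1} in forward; pass gradients straight through in backward.

    @staticmethod
    def forward(ctx, w):
        delta = 0.75 * w.abs().mean()  # illustrative threshold
        q = torch.zeros_like(w)
        q[w > delta] = 1.0
        q[w < -delta] = -1.0
        return q

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: treat the quantizer as identity for the gradient
        return grad_output

w = torch.randn(4, 4, requires_grad=True)
q = TernaryQuantSTE.apply(w)
q.sum().backward()
print(torch.equal(w.grad, torch.ones_like(w)))  # True: gradient passed straight through
```

Without the straight-through trick, the ternary rounding has zero gradient almost everywhere and the full-precision weights would never update.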
## Testing

Run the test suite:

```bash
# Install test dependencies
pip install -e ".[dev]"

# Run all tests
pytest tests/ -v

# Run a specific test file
pytest tests/test_layers.py -v

# Run with coverage
pytest tests/ -v --cov=bitlinear --cov-report=html

# Skip slow tests
pytest tests/ -m "not slow"

# Skip CUDA tests (if no GPU is available)
pytest tests/ -m "not cuda"
```
## Examples

Run the included examples:

```bash
# Basic usage
python examples/basic_usage.py

# Transformer example
python examples/transformer_example.py
```
## Troubleshooting

### Import Error

If you get `ModuleNotFoundError: No module named 'bitlinear'`:

```bash
# Make sure the package is installed
pip install -e .

# Or add it to PYTHONPATH
export PYTHONPATH=/path/to/BitLinear:$PYTHONPATH
```
### CUDA Build Failures

If the CUDA build fails:

1. **Check CUDA_HOME:**

   ```bash
   echo $CUDA_HOME  # Should point to the CUDA installation
   ```

2. **Check the PyTorch CUDA version:**

   ```python
   import torch
   print(torch.version.cuda)
   ```

3. **Match CUDA versions:** the CUDA version PyTorch was built with and the system CUDA toolkit should match.

4. **Fall back to CPU:**

   ```bash
   # Build the CPU-only version
   unset CUDA_HOME
   pip install -e .
   ```
### Tests Failing

All tests are currently marked with `pytest.skip()` because the implementation is not yet complete. This is expected!

To implement:

1. Follow `IMPLEMENTATION_GUIDE.md`
2. Start with `bitlinear/quantization.py`
3. Remove the `pytest.skip()` calls as you implement each function
4. Tests should pass as you complete the implementation
## Next Steps

1. **Read the Implementation Guide:** `IMPLEMENTATION_GUIDE.md`
2. **Explore the Project Structure:** `PROJECT_STRUCTURE.md`
3. **Start Implementing:**
   - Begin with `bitlinear/quantization.py`
   - Move to `bitlinear/functional.py`
   - Then `bitlinear/layers.py`
4. **Test as You Go:** run the tests after implementing each component
5. **Try Examples:** test with `examples/transformer_example.py`

## Getting Help

- **Documentation:** check the docstrings in each module
- **Issues:** open an issue on GitHub
- **Examples:** see the `examples/` directory
- **Tests:** look at `tests/` for usage patterns
## Performance Tips

### Memory Optimization

1. **Use packed weights** (once implemented):

   ```python
   from bitlinear.packing import pack_ternary_base3

   packed, shape = pack_ternary_base3(W_ternary)
   ```

2. **Batch processing:** larger batches are more efficient.
3. **Mixed precision:** combine with `torch.amp` for activation quantization.
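Since `pack_ternary_base3` is not implemented yet, here is an illustrative sketch of the base-3 idea it refers to: five ternary digits fit in one byte, because 3^5 = 243 <= 256. `unpack_ternary_base3` is a hypothetical helper added only for the round-trip check.

```python
import torch

def pack_ternary_base3(t):
    # Pack a ternary tensor into bytes, 5 trits per byte (illustrative sketch).
    flat = t.flatten().to(torch.int64) + 1        # map {-1, 0, 1} -> {0, 1, 2}
    pad = (-flat.numel()) % 5                      # pad to a multiple of 5
    flat = torch.cat([flat, flat.new_zeros(pad)])
    groups = flat.view(-1, 5)
    weights = torch.tensor([1, 3, 9, 27, 81])      # base-3 place values
    packed = (groups * weights).sum(dim=1).to(torch.uint8)  # max 2*121 = 242
    return packed, t.shape

def unpack_ternary_base3(packed, shape):
    # Recover the base-3 digits and undo the {-1, 0, 1} -> {0, 1, 2} shift.
    vals = packed.to(torch.int64)
    digits = []
    for _ in range(5):
        digits.append(vals % 3)
        vals = vals // 3
    flat = torch.stack(digits, dim=1).flatten() - 1
    return flat[: shape.numel()].view(shape).to(torch.float32)

t = torch.randint(-1, 2, (8, 8)).float()
packed, shape = pack_ternary_base3(t)
restored = unpack_ternary_base3(packed, shape)
print(torch.equal(t, restored))                    # round-trip check
print(f"{t.numel()} trits packed into {packed.numel()} bytes")
```

At ~1.6 bits per weight this is close to the log2(3) ≈ 1.585-bit information-theoretic floor for ternary values, versus 32 bits for a float32 weight.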
### Speed Optimization

1. **Use CUDA:** build with CUDA support for GPU acceleration.
2. **Larger layers:** BitLinear's benefits increase with layer size.
3. **Profile:** use the PyTorch profiler to find bottlenecks:

   ```python
   import torch.profiler as profiler

   with profiler.profile() as prof:
       output = model(x)

   # Sort by "cpu_time_total" instead when running without a GPU
   print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
   ```
## Resources

- **Paper:** https://jmlr.org/papers/volume26/24-2050/24-2050.pdf
- **BitNet:** https://arxiv.org/abs/2310.11453
- **PyTorch Quantization:** https://pytorch.org/docs/stable/quantization.html

Happy coding! 🚀