
Quick Start Guide

Get up and running with BitLinear in minutes.

Installation

Prerequisites

  • Python >= 3.8
  • PyTorch >= 2.0.0
  • (Optional) CUDA toolkit for GPU acceleration

Install from Source

# Clone the repository
git clone https://github.com/yourusername/bitlinear.git
cd bitlinear

# Install in development mode (CPU-only)
pip install -e .

# Or with development dependencies
pip install -e ".[dev]"

Install with CUDA Support

# Set CUDA_HOME if not already set
export CUDA_HOME=/usr/local/cuda  # Linux/macOS
# or
set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8  # Windows

# Install
pip install -e .

Basic Usage

Simple Example

import torch
from bitlinear import BitLinear

# Create a BitLinear layer (same interface as nn.Linear)
layer = BitLinear(in_features=512, out_features=1024, bias=True)

# Forward pass
x = torch.randn(32, 128, 512)  # [batch, seq_len, features]
output = layer(x)  # [32, 128, 1024]

print(f"Input shape: {x.shape}")
print(f"Output shape: {output.shape}")

Convert Existing Model

import torch.nn as nn
from bitlinear import BitLinear

# Start with a standard Linear layer
linear = nn.Linear(512, 1024)
# ... possibly pre-trained ...

# Convert to BitLinear
bitlinear = BitLinear.from_linear(linear)

# Use as drop-in replacement
x = torch.randn(16, 512)
output = bitlinear(x)

Multi-Component Ternary Layer

For better approximation quality:

from bitlinear import MultiTernaryLinear

# k=4 means 4 ternary components (better approximation, 4x compute)
layer = MultiTernaryLinear(
    in_features=512,
    out_features=1024,
    k=4,  # Number of ternary components
    bias=True
)

x = torch.randn(32, 512)
output = layer(x)
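Conceptually, a multi-component ternary layer approximates the full-precision weight matrix as a sum of k scaled ternary matrices. The sketch below fits such components by greedy residual ternarization; the threshold heuristic and fitting procedure are illustrative assumptions, not the library's algorithm:

```python
import torch

def ternarize(w):
    # One ternary component: alpha * sign(w) on large-magnitude entries.
    # The 0.75 * mean(|w|) threshold is a common heuristic (an assumption here).
    delta = 0.75 * w.abs().mean()
    t = torch.sign(w) * (w.abs() > delta).float()
    # Least-squares optimal scale for this ternary pattern
    alpha = (w * t).sum() / t.abs().sum().clamp(min=1)
    return alpha, t

def multi_ternary(w, k=4):
    # Greedy residual fitting: each component ternarizes what the
    # previous components failed to capture.
    approx = torch.zeros_like(w)
    for _ in range(k):
        alpha, t = ternarize(w - approx)
        approx += alpha * t
    return approx

torch.manual_seed(0)
w = torch.randn(64, 64)
for k in (1, 2, 4):
    err = (w - multi_ternary(w, k)).norm() / w.norm()
    print(f"k={k}: relative error {err:.3f}")
```

Each extra component fits the residual left by the previous ones, so the relative error shrinks as k grows, at the cost of k times the compute.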

Convert Entire Model

from bitlinear import convert_linear_to_bitlinear
import torch.nn as nn

# Original model with nn.Linear layers
model = nn.Sequential(
    nn.Linear(512, 1024),
    nn.ReLU(),
    nn.Linear(1024, 512),
    nn.Softmax(dim=-1)
)

# Convert all Linear layers to BitLinear
model_bitlinear = convert_linear_to_bitlinear(model, inplace=False)

# Use as normal
x = torch.randn(16, 512)
output = model_bitlinear(x)
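Under the hood, a helper like this typically walks the module tree and swaps each nn.Linear child. The sketch below uses a stand-in subclass instead of BitLinear so it runs without the package installed; the real convert_linear_to_bitlinear behavior is defined by the library:

```python
import torch
import torch.nn as nn

class StubBitLinear(nn.Linear):
    """Stand-in for BitLinear, used here only for illustration."""
    pass

def convert_all_linear(module, make_replacement):
    # Recursively replace every nn.Linear submodule in place.
    for name, child in list(module.named_children()):
        if isinstance(child, nn.Linear):
            setattr(module, name, make_replacement(child))
        else:
            convert_all_linear(child, make_replacement)
    return module

model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
converted = convert_all_linear(
    model,
    lambda lin: StubBitLinear(lin.in_features, lin.out_features, bias=lin.bias is not None),
)
print(converted)
```

A real conversion would also copy the pre-trained weights into the replacement (as BitLinear.from_linear does above), rather than constructing a fresh layer.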

In a Transformer

Replace attention projection layers:

import torch.nn as nn
import torch.nn.functional as F
from bitlinear import BitLinear

class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        self.nhead = nhead
        
        # Replace nn.Linear with BitLinear
        self.q_proj = BitLinear(d_model, d_model)
        self.k_proj = BitLinear(d_model, d_model)
        self.v_proj = BitLinear(d_model, d_model)
        self.out_proj = BitLinear(d_model, d_model)
        
        # Keep other components unchanged
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(0.1)
    
    def forward(self, x):
        # Standard multi-head self-attention with quantized projections
        B, T, C = x.shape
        q = self.q_proj(x).view(B, T, self.nhead, -1).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.nhead, -1).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.nhead, -1).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v)
        attn = attn.transpose(1, 2).reshape(B, T, C)
        return self.norm(x + self.dropout(self.out_proj(attn)))

Memory Savings Example

import torch
import torch.nn as nn
from bitlinear import BitLinear

def count_params(model):
    return sum(p.numel() for p in model.parameters())

def estimate_memory_mb(model):
    total_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    return total_bytes / (1024 ** 2)

# Standard Linear
linear = nn.Linear(2048, 2048)
print(f"Linear parameters: {count_params(linear):,}")
print(f"Linear memory: {estimate_memory_mb(linear):.2f} MB")

# BitLinear
bitlinear = BitLinear(2048, 2048)
print(f"BitLinear parameters: {count_params(bitlinear):,}")
print(f"BitLinear memory: {estimate_memory_mb(bitlinear):.2f} MB")

# Savings
savings = (estimate_memory_mb(linear) - estimate_memory_mb(bitlinear)) / estimate_memory_mb(linear) * 100
print(f"Memory savings: {savings:.1f}%")

Training with BitLinear

Fine-tuning a Pre-trained Model

import torch
import torch.nn as nn
import torch.optim as optim
from bitlinear import convert_linear_to_bitlinear

# Load pre-trained model
model = YourModel.from_pretrained('model_name')

# Convert to BitLinear
model = convert_linear_to_bitlinear(model, inplace=True)

# Fine-tune with standard PyTorch training loop
optimizer = optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(num_epochs):
    for batch in dataloader:
        x, y = batch
        
        # Forward pass
        output = model(x)
        loss = criterion(output, y)
        
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Quantization-Aware Training (QAT)

Train with quantization from scratch:

from bitlinear import BitLinear

# Model with BitLinear from the start
model = nn.Sequential(
    BitLinear(784, 512),
    nn.ReLU(),
    BitLinear(512, 256),
    nn.ReLU(),
    BitLinear(256, 10),
)

# Standard training loop
# Gradients will flow through quantization (straight-through estimator)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
# ... train as usual ...
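The straight-through estimator mentioned above can be illustrated in a few lines: quantize in the forward pass, but route gradients around the non-differentiable rounding. This is a generic sketch of the technique, not BitLinear's exact quantizer:

```python
import torch

def ste_ternary(w):
    # Forward value is the ternary-quantized weight; the detached
    # difference carries no gradient, so backward sees the identity.
    scale = w.abs().mean().clamp(min=1e-8)
    w_q = torch.round((w / scale).clamp(-1, 1)) * scale
    return w + (w_q - w).detach()

w = torch.randn(4, 4, requires_grad=True)
ste_ternary(w).sum().backward()
print(w.grad)  # identity gradient: all ones
```

Because the gradient bypasses the rounding, the optimizer updates the latent full-precision weights, and the quantized values follow along.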

Testing

Run the test suite:

# Install test dependencies
pip install -e ".[dev]"

# Run all tests
pytest tests/ -v

# Run specific test file
pytest tests/test_layers.py -v

# Run with coverage
pytest tests/ -v --cov=bitlinear --cov-report=html

# Skip slow tests
pytest tests/ -m "not slow"

# Skip CUDA tests (if no GPU available)
pytest tests/ -m "not cuda"

Examples

Run included examples:

# Basic usage
python examples/basic_usage.py

# Transformer example
python examples/transformer_example.py

Troubleshooting

Import Error

If you get ModuleNotFoundError: No module named 'bitlinear':

# Make sure you installed the package
pip install -e .

# Or add to PYTHONPATH
export PYTHONPATH=/path/to/BitLinear:$PYTHONPATH

CUDA Build Failures

If CUDA build fails:

  1. Check CUDA_HOME:

    echo $CUDA_HOME  # Should point to CUDA installation
    
  2. Check PyTorch CUDA version:

    import torch
    print(torch.version.cuda)
    
  3. Match CUDA versions: the CUDA version PyTorch was built with (torch.version.cuda) should match the system toolkit

  4. Fall back to CPU:

    # Build CPU-only version
    unset CUDA_HOME
    pip install -e .
    

Tests Failing

All tests currently call pytest.skip() because the implementation is not yet complete. This is expected!

To implement:

  1. Follow IMPLEMENTATION_GUIDE.md
  2. Start with bitlinear/quantization.py
  3. Remove pytest.skip() as you implement each function
  4. Tests should pass as you complete implementation

Next Steps

  1. Read the Implementation Guide: IMPLEMENTATION_GUIDE.md
  2. Explore the Project Structure: PROJECT_STRUCTURE.md
  3. Start Implementing:
    • Begin with bitlinear/quantization.py
    • Move to bitlinear/functional.py
    • Then bitlinear/layers.py
  4. Test as You Go: Run tests after implementing each component
  5. Try Examples: Test with examples/transformer_example.py

Getting Help

  • Documentation: Check docstrings in each module
  • Issues: Open an issue on GitHub
  • Examples: See examples/ directory
  • Tests: Look at tests/ for usage patterns

Performance Tips

Memory Optimization

  1. Use packed weights (when implemented):

    from bitlinear.packing import pack_ternary_base3
    packed, shape = pack_ternary_base3(W_ternary)
    
  2. Batch processing: Larger batches are more efficient

  3. Mixed precision: Combine with torch.amp for activation quantization
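The base-3 idea in tip 1 can be sketched in plain Python: since 3**5 = 243 <= 256, five ternary digits fit in one byte (~1.6 bits per weight). This illustrates the encoding only; it is not the pack_ternary_base3 implementation:

```python
def pack5(ternary):
    # Map -1/0/+1 -> 0/1/2, then encode each group of 5 digits
    # as one base-3 number (big-endian) in a single byte.
    digits = [t + 1 for t in ternary]
    digits += [1] * (-len(digits) % 5)  # pad with zeros (encoded as 1)
    out = bytearray()
    for i in range(0, len(digits), 5):
        val = 0
        for d in digits[i:i + 5]:
            val = val * 3 + d
        out.append(val)
    return bytes(out)

def unpack5(packed, n):
    # Invert the encoding: peel base-3 digits off each byte.
    digits = []
    for byte in packed:
        group = []
        for _ in range(5):
            group.append(byte % 3)
            byte //= 3
        digits.extend(reversed(group))
    return [d - 1 for d in digits[:n]]

w = [1, -1, 0, 0, 1, -1, -1, 0, 1, 0, 1]
packed = pack5(w)
assert unpack5(packed, len(w)) == w
print(f"{len(w)} weights -> {len(packed)} bytes")
```

Compared with 32-bit floats, this is roughly a 20x reduction in weight storage, before accounting for the per-layer scale factors.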

Speed Optimization

  1. Use CUDA: Build with CUDA support for GPU acceleration
  2. Larger layers: BitLinear benefits increase with layer size
  3. Profile: Use the PyTorch profiler to find bottlenecks:

    import torch.profiler as profiler

    with profiler.profile() as prof:
        output = model(x)

    print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))

Happy coding! 🚀