# Quick Start Guide

Get up and running with BitLinear in minutes.

## Installation

### Prerequisites

- Python >= 3.8
- PyTorch >= 2.0.0
- (Optional) CUDA toolkit for GPU acceleration

### Install from Source

```bash
# Clone the repository
git clone https://github.com/yourusername/bitlinear.git
cd bitlinear

# Install in development mode (CPU-only)
pip install -e .

# Or with development dependencies
pip install -e ".[dev]"
```

### Install with CUDA Support

```bash
# Set CUDA_HOME if not already set
export CUDA_HOME=/usr/local/cuda  # Linux/Mac
# or, on Windows:
set CUDA_HOME=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8

# Install
pip install -e .
```

## Basic Usage

### Simple Example

```python
import torch
from bitlinear import BitLinear

# Create a BitLinear layer (same interface as nn.Linear)
layer = BitLinear(in_features=512, out_features=1024, bias=True)

# Forward pass
x = torch.randn(32, 128, 512)  # [batch, seq_len, features]
output = layer(x)              # [32, 128, 1024]

print(f"Input shape: {x.shape}")
print(f"Output shape: {output.shape}")
```

### Convert an Existing Model

```python
import torch
import torch.nn as nn
from bitlinear import BitLinear

# Start with a standard Linear layer
linear = nn.Linear(512, 1024)
# ... possibly pre-trained ...
# Convert to BitLinear
bitlinear = BitLinear.from_linear(linear)

# Use as a drop-in replacement
x = torch.randn(16, 512)
output = bitlinear(x)
```

### Multi-Component Ternary Layer

For better approximation quality:

```python
import torch
from bitlinear import MultiTernaryLinear

# k=4 means 4 ternary components (better approximation, 4x compute)
layer = MultiTernaryLinear(
    in_features=512,
    out_features=1024,
    k=4,  # Number of ternary components
    bias=True
)

x = torch.randn(32, 512)
output = layer(x)
```

### Convert an Entire Model

```python
import torch
import torch.nn as nn
from bitlinear import convert_linear_to_bitlinear

# Original model with nn.Linear layers
model = nn.Sequential(
    nn.Linear(512, 1024),
    nn.ReLU(),
    nn.Linear(1024, 512),
    nn.Softmax(dim=-1)
)

# Convert all Linear layers to BitLinear
model_bitlinear = convert_linear_to_bitlinear(model, inplace=False)

# Use as normal
x = torch.randn(16, 512)
output = model_bitlinear(x)
```

## In a Transformer

Replace the attention projection layers:

```python
import torch.nn as nn
from bitlinear import BitLinear

class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, nhead=8):
        super().__init__()
        # Replace nn.Linear with BitLinear
        self.q_proj = BitLinear(d_model, d_model)
        self.k_proj = BitLinear(d_model, d_model)
        self.v_proj = BitLinear(d_model, d_model)
        self.out_proj = BitLinear(d_model, d_model)

        # Keep other components unchanged
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(0.1)

    def forward(self, x):
        # Standard Transformer forward pass
        q = self.q_proj(x)
        k = self.k_proj(x)
        v = self.v_proj(x)
        # ... attention computation ...
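        # Illustrative completion (not part of the original example): one
        # way to finish the forward pass with PyTorch 2.x's built-in
        # attention kernel, treating q/k/v as a single head for simplicity.
        import torch.nn.functional as F
        attn = F.scaled_dot_product_attention(q, k, v)
        return self.norm(x + self.dropout(self.out_proj(attn)))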
```

## Memory Savings Example

```python
import torch
import torch.nn as nn
from bitlinear import BitLinear

def count_params(model):
    return sum(p.numel() for p in model.parameters())

def estimate_memory_mb(model):
    total_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
    return total_bytes / (1024 ** 2)

# Standard Linear
linear = nn.Linear(2048, 2048)
print(f"Linear parameters: {count_params(linear):,}")
print(f"Linear memory: {estimate_memory_mb(linear):.2f} MB")

# BitLinear
bitlinear = BitLinear(2048, 2048)
print(f"BitLinear parameters: {count_params(bitlinear):,}")
print(f"BitLinear memory: {estimate_memory_mb(bitlinear):.2f} MB")

# Savings
savings = (estimate_memory_mb(linear) - estimate_memory_mb(bitlinear)) / estimate_memory_mb(linear) * 100
print(f"Memory savings: {savings:.1f}%")
```

## Training with BitLinear

### Fine-tuning a Pre-trained Model

```python
import torch
import torch.nn as nn
import torch.optim as optim
from bitlinear import convert_linear_to_bitlinear

# Load a pre-trained model
model = YourModel.from_pretrained('model_name')

# Convert to BitLinear
model = convert_linear_to_bitlinear(model, inplace=True)

# Fine-tune with a standard PyTorch training loop
optimizer = optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

for epoch in range(num_epochs):
    for batch in dataloader:
        x, y = batch

        # Forward pass
        output = model(x)
        loss = criterion(output, y)

        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

### Quantization-Aware Training (QAT)

Train with quantization from scratch:

```python
import torch.nn as nn
import torch.optim as optim
from bitlinear import BitLinear

# Model with BitLinear from the start
model = nn.Sequential(
    BitLinear(784, 512),
    nn.ReLU(),
    BitLinear(512, 256),
    nn.ReLU(),
    BitLinear(256, 10),
)

# Standard training loop
# Gradients will flow through quantization (straight-through estimator)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
# ... train as usual ...
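# For intuition, a sketch of the straight-through estimator (STE) itself.
# This is NOT this library's internal code; `ste_ternarize` and its
# threshold value are hypothetical, for illustration only.
import torch

def ste_ternarize(w, threshold=0.05):
    # Forward: ternarize to {-1, 0, +1}, scaled by the mean magnitude
    scale = w.abs().mean()
    w_t = torch.where(w.abs() > threshold, torch.sign(w), torch.zeros_like(w)) * scale
    # Backward: (w_t - w).detach() contributes no gradient, so gradients
    # flow to w as if quantization were the identity function
    return w + (w_t - w).detach()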
```

## Testing

Run the test suite:

```bash
# Install test dependencies
pip install -e ".[dev]"

# Run all tests
pytest tests/ -v

# Run a specific test file
pytest tests/test_layers.py -v

# Run with coverage
pytest tests/ -v --cov=bitlinear --cov-report=html

# Skip slow tests
pytest tests/ -m "not slow"

# Skip CUDA tests (if no GPU available)
pytest tests/ -m "not cuda"
```

## Examples

Run the included examples:

```bash
# Basic usage
python examples/basic_usage.py

# Transformer example
python examples/transformer_example.py
```

## Troubleshooting

### Import Error

If you get `ModuleNotFoundError: No module named 'bitlinear'`:

```bash
# Make sure you installed the package
pip install -e .

# Or add it to PYTHONPATH
export PYTHONPATH=/path/to/BitLinear:$PYTHONPATH
```

### CUDA Build Failures

If the CUDA build fails:

1. **Check CUDA_HOME:**
   ```bash
   echo $CUDA_HOME  # Should point to your CUDA installation
   ```
2. **Check PyTorch's CUDA version:**
   ```python
   import torch
   print(torch.version.cuda)
   ```
3. **Match CUDA versions:** the PyTorch and system CUDA versions should match.
4. **Fall back to CPU:**
   ```bash
   # Build the CPU-only version
   unset CUDA_HOME
   pip install -e .
   ```

### Tests Failing

All tests are currently marked with `pytest.skip()` because the implementation is not yet complete. This is expected! To implement:

1. Follow `IMPLEMENTATION_GUIDE.md`
2. Start with `bitlinear/quantization.py`
3. Remove `pytest.skip()` as you implement each function
4. Tests should pass as you complete the implementation

## Next Steps

1. **Read the Implementation Guide:** `IMPLEMENTATION_GUIDE.md`
2. **Explore the Project Structure:** `PROJECT_STRUCTURE.md`
3. **Start Implementing:**
   - Begin with `bitlinear/quantization.py`
   - Move to `bitlinear/functional.py`
   - Then `bitlinear/layers.py`
4. **Test as You Go:** Run the tests after implementing each component
5.
   **Try Examples:** Test with `examples/transformer_example.py`

## Getting Help

- **Documentation:** Check the docstrings in each module
- **Issues:** Open an issue on GitHub
- **Examples:** See the `examples/` directory
- **Tests:** Look at `tests/` for usage patterns

## Performance Tips

### Memory Optimization

1. **Use packed weights** (when implemented):
   ```python
   from bitlinear.packing import pack_ternary_base3
   packed, shape = pack_ternary_base3(W_ternary)
   ```
2. **Batch processing:** larger batches are more efficient.
3. **Mixed precision:** combine with `torch.amp` for activation quantization.

### Speed Optimization

1. **Use CUDA:** build with CUDA support for GPU acceleration.
2. **Larger layers:** BitLinear's benefits increase with layer size.
3. **Profile:** use the PyTorch profiler to find bottlenecks:
   ```python
   import torch.profiler as profiler

   with profiler.profile() as prof:
       output = model(x)

   print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
   ```

## Resources

- **Paper:** https://jmlr.org/papers/volume26/24-2050/24-2050.pdf
- **BitNet:** https://arxiv.org/abs/2310.11453
- **PyTorch Quantization:** https://pytorch.org/docs/stable/quantization.html

Happy coding! 🚀
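As a footnote to the packed-weights tip under Performance Tips: base-3 packing works because five ternary digits fit in one byte (3^5 = 243 <= 256), i.e. 1.6 bits per weight versus 32 bits for float32. The helpers below are a hypothetical sketch of that arithmetic on plain Python ints, not the actual `bitlinear.packing` implementation:

```python
def pack5(trits):
    # Pack five ternary values (each in {-1, 0, +1}) into one byte.
    # Shift each trit to {0, 1, 2} and accumulate base-3 digits.
    assert len(trits) == 5
    byte = 0
    for t in reversed(trits):
        byte = byte * 3 + (t + 1)
    return byte  # in 0..242, fits in a uint8

def unpack5(byte):
    # Inverse: peel off base-3 digits and shift back to {-1, 0, +1}.
    trits = []
    for _ in range(5):
        trits.append(byte % 3 - 1)
        byte //= 3
    return trits
```

A vectorized version would apply the same arithmetic to `uint8` tensors in chunks of five.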