# BitLinear Project Structure

Complete directory tree and file descriptions.

```
BitLinear/
│
├── README.md                  # Project overview and quick start
├── LICENSE                    # MIT License
├── setup.py                   # Build system with torch.utils.cpp_extension
├── pyproject.toml             # Tool configurations (pytest, black, mypy)
├── requirements.txt           # Core dependencies
├── requirements-dev.txt       # Development dependencies
├── .gitignore                 # Git ignore rules
├── IMPLEMENTATION_GUIDE.md    # Step-by-step implementation roadmap
│
├── bitlinear/                 # Main package
│   ├── __init__.py            # Package exports
│   ├── layers.py              # BitLinear and MultiTernaryLinear modules
│   ├── functional.py          # Core functional implementations
│   ├── quantization.py        # Ternary quantization utilities
│   ├── packing.py             # Base-3 packing for memory efficiency
│   │
│   └── cpp/                   # C++/CUDA extensions
│       ├── bitlinear.cpp      # PyBind11 bindings and CPU implementation
│       └── bitlinear_kernel.cu  # CUDA kernel implementations
│
├── tests/                     # Test suite
│   ├── __init__.py
│   ├── test_functional.py     # Tests for functional API
│   ├── test_layers.py         # Tests for layer modules
│   └── test_quantization.py   # Tests for quantization and packing
│
└── examples/                  # Usage examples
    ├── basic_usage.py         # Simple usage demonstration
    └── transformer_example.py # Transformer integration example
```

## File Descriptions

### Root Level

- **README.md**: Project overview, installation instructions, quick start guide, and citations
- **LICENSE**: MIT License for open-source distribution
- **setup.py**: Build configuration using PyTorch's cpp_extension; handles CPU/CUDA builds
- **pyproject.toml**: Configuration for pytest, black, mypy, and coverage
- **requirements.txt**: Core runtime dependencies (torch, numpy)
- **requirements-dev.txt**: Development tools (pytest, black, flake8, mypy)
- **.gitignore**: Ignores Python caches, build artifacts, CUDA objects
- **IMPLEMENTATION_GUIDE.md**: Detailed implementation roadmap with phases and best practices

### bitlinear/ (Main Package)

#### Python Modules
- **`__init__.py`**: Package initialization; exports main classes and functions
- **`layers.py`**: nn.Module implementations
  - `BitLinear`: Drop-in replacement for nn.Linear with ternary weights
  - `MultiTernaryLinear`: Sum of k ternary components
  - `convert_linear_to_bitlinear()`: Recursive model conversion utility
- **`functional.py`**: Core functional implementations
  - `bitlinear_python()`: Pure PyTorch ternary matmul with scaling
  - `greedy_ternary_decomposition()`: Iterative residual quantization
  - `multi_ternary_linear_python()`: Multi-component forward pass
  - `activation_quant()`: Activation quantization for full BitNet
- **`quantization.py`**: Quantization utilities
  - `absmax_scale()`: Compute absmax scaling factors
  - `ternary_quantize()`: Quantize to {-1, 0, +1}
  - `weight_to_ternary()`: Full quantization pipeline
  - `quantize_activations_absmax()`: 8-bit activation quantization
  - `dequantize_scale()`: Reverse quantization
- **`packing.py`**: Memory optimization
  - `pack_ternary_base3()`: Pack 5 ternary values per byte
  - `unpack_ternary_base3()`: Unpack base-3 encoded weights
  - `compute_compression_ratio()`: Calculate compression statistics
  - `estimate_memory_savings()`: Memory estimation utilities

#### C++/CUDA Extensions

- **`cpp/bitlinear.cpp`**: C++ interface
  - PyBind11 module definition
  - CPU implementations: `bitlinear_cpu_forward()`, `multi_ternary_cpu_forward()`
  - Device dispatcher (routes to CPU or CUDA)
  - Packing utilities in C++
- **`cpp/bitlinear_kernel.cu`**: CUDA kernels
  - `bitlinear_forward_kernel()`: Optimized ternary matmul kernel
  - `multi_ternary_forward_kernel()`: Fused multi-component kernel
  - Kernel launchers with error handling
  - TODO: Tensor Core optimization

### tests/

Comprehensive test suite using pytest:

- **`test_functional.py`**: Tests for functional API
  - Shape correctness
  - Numerical correctness vs.
nn.Linear
  - Greedy decomposition quality
  - Multi-ternary equivalence
- **`test_layers.py`**: Tests for layer modules
  - Initialization and parameter counts
  - Forward pass shapes
  - Compatibility with nn.Linear
  - Conversion utilities
  - Gradient flow (QAT)
  - Integration with Transformer blocks
- **`test_quantization.py`**: Tests for quantization
  - Absmax scaling (global and per-channel)
  - Ternary quantization values and thresholds
  - Reconstruction quality
  - Base-3 packing roundtrip
  - Compression ratios
  - Memory estimation

### examples/

Demonstration scripts:

- **`basic_usage.py`**: Minimal example showing the basic API
  - Creating BitLinear layers
  - Forward pass
  - Conversion from nn.Linear
- **`transformer_example.py`**: Realistic Transformer example
  - Complete Transformer block implementation
  - Conversion to BitLinear
  - Output comparison
  - Memory savings calculation

## Key Design Patterns

### 1. Progressive Enhancement

- Python baseline → C++ CPU → CUDA GPU
- Each layer is fully functional before the next is added

### 2. Drop-in Compatibility

- Same interface as nn.Linear
- Same initialization arguments
- Same forward signature
- Works with existing PyTorch features

### 3. Modular Testing

- Unit tests for each component
- Integration tests for full pipelines
- Performance benchmarks kept separate

### 4. Extensive Documentation

- Docstrings explain the mathematical operations
- TODO comments mark implementation points
- References to papers for the algorithms
- Type hints for clarity

## Build Targets

### CPU-only (Development)

```bash
pip install -e .
```

### With CUDA (Production)

```bash
CUDA_HOME=/usr/local/cuda pip install -e .
```

### Testing

```bash
pip install -e ".[dev]"
pytest tests/ -v
```

## What's NOT Implemented Yet

All files are **stubs with TODOs**:

- ✅ Structure is complete
- ✅ Interfaces are defined
- ✅ Documentation is written
- ❌ Logic is NOT implemented (by design)
- ❌ Tests will skip/fail until implementation

## Next Steps

Follow IMPLEMENTATION_GUIDE.md:
1. Start with `quantization.py` (`absmax_scale`, `ternary_quantize`)
2. Move to `functional.py` (`bitlinear_python`)
3. Implement `layers.py` (the `BitLinear` module)
4. Test with the examples
5. Add C++/CUDA if needed

## Design Philosophy

**Correctness > Speed > Memory**

1. First make it work (Python)
2. Then make it fast (C++/CUDA)
3. Then make it efficient (packing)

Every component is:

- Well-documented
- Testable
- Modular
- Extensible
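## Sketches of the Core Ideas

Since the stubs are not yet implemented, here are hedged sketches of the central algorithms, in roadmap order. First, ternary weight quantization: the function names `absmax_scale` and `ternary_quantize` match the stubs above, but the specific scaling rule shown (mean absolute value, as used in BitNet b1.58-style quantization) is an assumption; the project may choose plain absmax instead.

```python
import torch

def absmax_scale(w: torch.Tensor) -> torch.Tensor:
    """One plausible per-tensor scaling factor: the mean absolute value,
    clamped away from zero so division is safe."""
    return w.abs().mean().clamp(min=1e-8)

def ternary_quantize(w: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Quantize weights to {-1, 0, +1}: divide by the scale, round,
    and clip to the ternary range."""
    return (w / scale).round().clamp(-1, 1)

# Dequantization is just q * scale, so reconstruction error is |w - q*scale|.
```

Multiplying the ternary tensor back by the scale gives the dequantized approximation, which is what the tests' "reconstruction quality" checks would compare against the original weights.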
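`greedy_ternary_decomposition` is described only as "iterative residual quantization". One plausible reading (an assumption, not necessarily the project's exact algorithm): at each of k steps, quantize the current residual to a ternary mask, fit the least-squares-optimal scale for that mask, subtract the scaled component, and repeat.

```python
import torch

def greedy_ternary_decomposition(w: torch.Tensor, k: int = 2):
    """Approximate w as sum_i alpha_i * T_i with each T_i in {-1, 0, +1},
    greedily quantizing the residual at every step."""
    residual = w.clone()
    alphas, ternaries = [], []
    for _ in range(k):
        scale = residual.abs().mean().clamp(min=1e-8)
        t = (residual / scale).round().clamp(-1, 1)
        # Least-squares optimal scale for this fixed mask: <r, t> / <t, t>.
        alpha = (residual * t).sum() / (t * t).sum().clamp(min=1.0)
        residual = residual - alpha * t
        alphas.append(alpha)
        ternaries.append(t)
    return alphas, ternaries
```

Because each alpha is a least-squares projection, the residual norm is non-increasing in k, which is what a "greedy decomposition quality" test could assert.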
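The `BitLinear` layer itself must be a drop-in replacement for nn.Linear that still supports gradient flow for QAT. A minimal sketch using a straight-through estimator (the STE is a standard QAT trick; whether this project uses exactly this formulation is an assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    """Sketch: same constructor and forward signature as nn.Linear,
    but the forward pass uses ternarized weights."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = self.weight.abs().mean().clamp(min=1e-8)
        w_q = (self.weight / scale).round().clamp(-1, 1) * scale
        # Straight-through estimator: forward sees the quantized weights,
        # backward treats quantization as identity so grads reach self.weight.
        w = self.weight + (w_q - self.weight).detach()
        return F.linear(x, w, self.bias)
```

Subclassing nn.Linear keeps initialization arguments, parameter names, and the forward signature identical, which is what makes `convert_linear_to_bitlinear()`-style recursive swapping straightforward.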
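Finally, packing: `pack_ternary_base3` stores 5 ternary values per byte, which works because 3^5 = 243 ≤ 256. A NumPy sketch of the roundtrip (function names match the stubs; the exact byte layout, here little-endian base-3 digits, is an assumption):

```python
import numpy as np

_POWERS = np.array([1, 3, 9, 27, 81], dtype=np.uint16)

def pack_ternary_base3(trits):
    """Pack a flat array of {-1, 0, +1} into uint8, 5 trits per byte.
    Returns (packed, pad) where pad is the number of padding trits added."""
    t = np.asarray(trits, dtype=np.int8) + 1          # map to {0, 1, 2}
    pad = (-len(t)) % 5
    t = np.concatenate([t, np.zeros(pad, dtype=np.int8)])
    groups = t.reshape(-1, 5).astype(np.uint16)
    return (groups * _POWERS).sum(axis=1).astype(np.uint8), pad

def unpack_ternary_base3(packed, pad):
    """Inverse of pack_ternary_base3: recover base-3 digits, then shift
    back to {-1, 0, +1} and drop the padding."""
    digits = packed.astype(np.uint16)[:, None] // _POWERS % 3
    trits = digits.astype(np.int8).reshape(-1) - 1
    return trits[:len(trits) - pad] if pad else trits
```

Relative to fp32 weights this is a 20x size reduction (5 values per byte vs. 20 bytes), which is the kind of figure `compute_compression_ratio()` would report.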