# BitLinear Project Structure
Complete directory tree and file descriptions.
```
BitLinear/
│
├── README.md                   # Project overview and quick start
├── LICENSE                     # MIT License
├── setup.py                    # Build system with torch.utils.cpp_extension
├── pyproject.toml              # Tool configurations (pytest, black, mypy)
├── requirements.txt            # Core dependencies
├── requirements-dev.txt        # Development dependencies
├── .gitignore                  # Git ignore rules
├── IMPLEMENTATION_GUIDE.md     # Step-by-step implementation roadmap
│
├── bitlinear/                  # Main package
│   ├── __init__.py             # Package exports
│   ├── layers.py               # BitLinear and MultiTernaryLinear modules
│   ├── functional.py           # Core functional implementations
│   ├── quantization.py         # Ternary quantization utilities
│   ├── packing.py              # Base-3 packing for memory efficiency
│   │
│   └── cpp/                    # C++/CUDA extensions
│       ├── bitlinear.cpp       # PyBind11 bindings and CPU implementation
│       └── bitlinear_kernel.cu # CUDA kernel implementations
│
├── tests/                      # Test suite
│   ├── __init__.py
│   ├── test_functional.py      # Tests for functional API
│   ├── test_layers.py          # Tests for layer modules
│   └── test_quantization.py    # Tests for quantization and packing
│
└── examples/                   # Usage examples
    ├── basic_usage.py          # Simple usage demonstration
    └── transformer_example.py  # Transformer integration example
```
## File Descriptions
### Root Level
- **README.md**: Project overview, installation instructions, quick start guide, and citations
- **LICENSE**: MIT License for open-source distribution
- **setup.py**: Build configuration using PyTorch's cpp_extension, handles CPU/CUDA builds
- **pyproject.toml**: Configuration for pytest, black, mypy, and coverage
- **requirements.txt**: Core runtime dependencies (torch, numpy)
- **requirements-dev.txt**: Development tools (pytest, black, flake8, mypy)
- **.gitignore**: Ignores Python cache, build artifacts, CUDA objects
- **IMPLEMENTATION_GUIDE.md**: Detailed implementation roadmap with phases and best practices
### bitlinear/ (Main Package)
#### Python Modules
- **`__init__.py`**: Package initialization, exports main classes and functions
- **`layers.py`**: nn.Module implementations
- `BitLinear`: Drop-in replacement for nn.Linear with ternary weights
- `MultiTernaryLinear`: Sum of k ternary components
- `convert_linear_to_bitlinear()`: Recursive model conversion utility
- **`functional.py`**: Core functional implementations
- `bitlinear_python()`: Pure PyTorch ternary matmul with scaling
- `greedy_ternary_decomposition()`: Iterative residual quantization
- `multi_ternary_linear_python()`: Multi-component forward pass
- `activation_quant()`: Activation quantization for full BitNet
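The core math behind `greedy_ternary_decomposition()` and `multi_ternary_linear_python()` can be sketched in NumPy (signatures are illustrative; the real module operates on PyTorch tensors):

```python
import numpy as np

def ternary_quantize(r: np.ndarray, threshold: float = 0.5):
    # Quantize to {-1, 0, +1}: entries whose magnitude exceeds
    # threshold * mean(|r|) keep their sign, the rest become 0.
    # The scale is the mean magnitude of the surviving entries.
    delta = threshold * np.abs(r).mean()
    q = (np.sign(r) * (np.abs(r) > delta)).astype(np.int8)
    mask = q != 0
    scale = float(np.abs(r[mask]).mean()) if mask.any() else 0.0
    return q, scale

def greedy_ternary_decomposition(w: np.ndarray, k: int = 3):
    # Iterative residual quantization: each step quantizes whatever
    # the previous components failed to capture.
    residual = w.astype(np.float64).copy()
    components = []
    for _ in range(k):
        q, s = ternary_quantize(residual)
        components.append((q, s))
        residual -= s * q
    return components

def multi_ternary_linear_python(x: np.ndarray, components):
    # Forward pass: sum of k scaled ternary matmuls.
    return sum(s * (x @ q.T) for q, s in components)
```

By linearity, summing the k component matmuls is exactly equivalent to a single matmul against the reconstructed weight `sum(s * q)`, which is what the equivalence tests in `test_functional.py` can check.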
- **`quantization.py`**: Quantization utilities
- `absmax_scale()`: Compute absmax scaling factors
- `ternary_quantize()`: Quantize to {-1, 0, +1}
- `weight_to_ternary()`: Full quantization pipeline
- `quantize_activations_absmax()`: 8-bit activation quantization
- `dequantize_scale()`: Reverse quantization
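For the activation side, symmetric absmax quantization to 8 bits can be sketched as follows (a minimal NumPy illustration; the real functions may differ in signature and clipping range):

```python
import numpy as np

def quantize_activations_absmax(x: np.ndarray, bits: int = 8):
    # Symmetric absmax quantization: map [-max|x|, +max|x|] onto the
    # signed integer range, e.g. [-127, 127] for 8 bits.
    qmax = 2 ** (bits - 1) - 1
    # Guard against all-zero inputs to avoid division by zero.
    scale = qmax / max(np.abs(x).max(), 1e-8)
    q = np.clip(np.round(x * scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize_scale(q: np.ndarray, scale: float) -> np.ndarray:
    # Reverse the quantization (exact up to rounding error).
    return q.astype(np.float32) / scale
```

The roundtrip error is bounded by half a quantization step, i.e. at most `0.5 / scale` per element.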
- **`packing.py`**: Memory optimization
- `pack_ternary_base3()`: Pack 5 ternary values per byte
- `unpack_ternary_base3()`: Unpack base-3 encoded weights
- `compute_compression_ratio()`: Calculate compression statistics
- `estimate_memory_savings()`: Memory estimation utilities
#### C++/CUDA Extensions
- **`cpp/bitlinear.cpp`**: C++ interface
- PyBind11 module definition
- CPU implementations: `bitlinear_cpu_forward()`, `multi_ternary_cpu_forward()`
- Device dispatcher (routes to CPU or CUDA)
- Packing utilities in C++
- **`cpp/bitlinear_kernel.cu`**: CUDA kernels
- `bitlinear_forward_kernel()`: Optimized ternary matmul kernel
- `multi_ternary_forward_kernel()`: Fused multi-component kernel
- Kernel launchers with error handling
- TODO: Tensor Core optimization
### tests/
Comprehensive test suite using pytest:
- **`test_functional.py`**: Tests for functional API
- Shape correctness
- Numerical correctness vs. nn.Linear
- Greedy decomposition quality
- Multi-ternary equivalence
- **`test_layers.py`**: Tests for layer modules
- Initialization and parameter counts
- Forward pass shapes
- Compatibility with nn.Linear
- Conversion utilities
- Gradient flow (QAT)
- Integration with Transformer blocks
- **`test_quantization.py`**: Tests for quantization
- Absmax scaling (global and per-channel)
- Ternary quantization values and thresholds
- Reconstruction quality
- Base-3 packing roundtrip
- Compression ratios
- Memory estimation
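A representative unit test in the style described above might look like this (illustrative only; `absmax_scale`'s real signature and channel axis may differ):

```python
import numpy as np

def absmax_scale(w: np.ndarray, per_channel: bool = False):
    # Global: one scale for the whole tensor.
    # Per-channel: one scale per output row (axis 0).
    if per_channel:
        return np.abs(w).max(axis=1, keepdims=True)
    return np.abs(w).max()

def test_absmax_global_and_per_channel():
    w = np.array([[0.5, -2.0], [0.1, 0.3]])
    assert absmax_scale(w) == 2.0
    per_ch = absmax_scale(w, per_channel=True)
    assert per_ch.shape == (2, 1)
    assert per_ch.ravel().tolist() == [2.0, 0.3]
```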
### examples/
Demonstration scripts:
- **`basic_usage.py`**: Minimal example showing basic API
- Creating BitLinear layers
- Forward pass
- Conversion from nn.Linear
- **`transformer_example.py`**: Realistic Transformer example
- Complete Transformer block implementation
- Conversion to BitLinear
- Output comparison
- Memory savings calculation
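The memory-savings calculation in the example can be approximated with a small helper mirroring `estimate_memory_savings()` (hypothetical formula; the real utility may also account for bias and activation buffers):

```python
def estimate_memory_savings(n_params: int, k: int = 1) -> float:
    # float32 baseline: 4 bytes per parameter.
    fp32_bytes = 4 * n_params
    # Base-3 packed ternary: 5 values per byte per component,
    # plus one float32 scale per component.
    packed_bytes = k * ((n_params + 4) // 5 + 4)
    return fp32_bytes / packed_bytes

# A 4096x4096 linear layer with a single ternary component
# compresses roughly 20x.
ratio = estimate_memory_savings(4096 * 4096, k=1)
```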
## Key Design Patterns
### 1. Progressive Enhancement
- Python baseline → C++ CPU → CUDA GPU
- Each layer fully functional before adding next
### 2. Drop-in Compatibility
- Same interface as nn.Linear
- Same initialization arguments
- Same forward signature
- Works with existing PyTorch features
### 3. Modular Testing
- Unit tests for each component
- Integration tests for full pipelines
- Performance benchmarks separate
### 4. Extensive Documentation
- Docstrings explain mathematical operations
- TODO comments mark implementation points
- References to papers for algorithms
- Type hints for clarity
## Build Targets
### CPU-only (Development)
```bash
pip install -e .
```
### With CUDA (Production)
```bash
CUDA_HOME=/usr/local/cuda pip install -e .
```
### Testing
```bash
pip install -e ".[dev]"
pytest tests/ -v
```
## What's NOT Implemented Yet
All files are **stubs with TODOs**:
- ✅ Structure is complete
- ✅ Interfaces are defined
- ✅ Documentation is written
- ❌ Logic is NOT implemented (by design)
- ❌ Tests will skip/fail until implementation
## Next Steps
Follow IMPLEMENTATION_GUIDE.md:
1. Start with `quantization.py` (absmax_scale, ternary_quantize)
2. Move to `functional.py` (bitlinear_python)
3. Implement `layers.py` (BitLinear module)
4. Test with examples
5. Add C++/CUDA if needed
## Design Philosophy
**Correctness > Speed > Memory**
1. First make it work (Python)
2. Then make it fast (C++/CUDA)
3. Then make it efficient (packing)
Every component is:
- Well-documented
- Testable
- Modular
- Extensible