# BitLinear Project Structure

Complete directory tree and file descriptions.
```
BitLinear/
│
├── README.md                  # Project overview and quick start
├── LICENSE                    # MIT License
├── setup.py                   # Build system with torch.utils.cpp_extension
├── pyproject.toml             # Tool configurations (pytest, black, mypy)
├── requirements.txt           # Core dependencies
├── requirements-dev.txt       # Development dependencies
├── .gitignore                 # Git ignore rules
├── IMPLEMENTATION_GUIDE.md    # Step-by-step implementation roadmap
│
├── bitlinear/                 # Main package
│   ├── __init__.py            # Package exports
│   ├── layers.py              # BitLinear and MultiTernaryLinear modules
│   ├── functional.py          # Core functional implementations
│   ├── quantization.py        # Ternary quantization utilities
│   ├── packing.py             # Base-3 packing for memory efficiency
│   │
│   └── cpp/                   # C++/CUDA extensions
│       ├── bitlinear.cpp      # PyBind11 bindings and CPU implementation
│       └── bitlinear_kernel.cu  # CUDA kernel implementations
│
├── tests/                     # Test suite
│   ├── __init__.py
│   ├── test_functional.py     # Tests for functional API
│   ├── test_layers.py         # Tests for layer modules
│   └── test_quantization.py   # Tests for quantization and packing
│
└── examples/                  # Usage examples
    ├── basic_usage.py         # Simple usage demonstration
    └── transformer_example.py # Transformer integration example
```
## File Descriptions

### Root Level

- **README.md**: Project overview, installation instructions, quick start guide, and citations
- **LICENSE**: MIT License for open-source distribution
- **setup.py**: Build configuration using PyTorch's cpp_extension; handles CPU and CUDA builds
- **pyproject.toml**: Configuration for pytest, black, mypy, and coverage
- **requirements.txt**: Core runtime dependencies (torch, numpy)
- **requirements-dev.txt**: Development tools (pytest, black, flake8, mypy)
- **.gitignore**: Ignores Python caches, build artifacts, and CUDA objects
- **IMPLEMENTATION_GUIDE.md**: Detailed implementation roadmap with phases and best practices
### bitlinear/ (Main Package)

#### Python Modules

- **`__init__.py`**: Package initialization; exports the main classes and functions
- **`layers.py`**: `nn.Module` implementations
  - `BitLinear`: Drop-in replacement for `nn.Linear` with ternary weights
  - `MultiTernaryLinear`: Sum of k ternary components
  - `convert_linear_to_bitlinear()`: Recursive model conversion utility
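The core idea behind a ternary layer — keep full-precision shadow weights, quantize them to {-1, 0, +1} with a per-row scale on every forward call — can be sketched without any dependencies. The class name, the absmean scale, and the 0.5 threshold below are illustrative assumptions, not the package's actual code:

```python
class BitLinearSketch:
    """Toy, dependency-free sketch of the quantize-on-forward pattern:
    full-precision weights are kept, and ternarized at each forward call."""

    def __init__(self, weight_rows):
        self.weight = weight_rows  # list of rows, full precision ("shadow" weights)

    def forward(self, x):
        out = []
        for row in self.weight:
            # per-row absmean scale and 0.5 threshold (assumed heuristic)
            alpha = sum(abs(w) for w in row) / len(row)
            t = [0 if abs(w) < 0.5 * alpha else (1 if w > 0 else -1) for w in row]
            # matmul against ternary weights, rescaled by alpha
            out.append(alpha * sum(xi * ti for xi, ti in zip(x, t)))
        return out
```

The real module would do the same thing with tensors and a straight-through estimator so gradients reach the shadow weights.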
- **`functional.py`**: Core functional implementations
  - `bitlinear_python()`: Pure PyTorch ternary matmul with scaling
  - `greedy_ternary_decomposition()`: Iterative residual quantization
  - `multi_ternary_linear_python()`: Multi-component forward pass
  - `activation_quant()`: Activation quantization for full BitNet
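Iterative residual quantization means: quantize the weights to one scaled ternary component, subtract it, then quantize what remains, k times. A minimal sketch of that loop (the threshold and scale heuristics are assumptions for illustration):

```python
def ternary_quantize(values, threshold):
    """Map each value to {-1, 0, +1}, zeroing entries below the threshold."""
    return [0 if abs(v) < threshold else (1 if v > 0 else -1) for v in values]

def greedy_ternary_decomposition(weights, k=3):
    """Approximate weights as a sum of k scaled ternary vectors by
    repeatedly quantizing the remaining residual (greedy, one pass per k)."""
    residual = list(weights)
    components = []
    for _ in range(k):
        m = max(abs(v) for v in residual)
        if m == 0:
            break
        t = ternary_quantize(residual, 0.5 * m)  # assumed threshold heuristic
        kept = [abs(v) for v, q in zip(residual, t) if q != 0]
        if not kept:
            break
        alpha = sum(kept) / len(kept)  # mean magnitude over kept entries
        components.append((alpha, t))
        residual = [r - alpha * q for r, q in zip(residual, t)]
    return components
```

Each extra component shrinks the reconstruction error, so k trades accuracy against memory and compute.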
- **`quantization.py`**: Quantization utilities
  - `absmax_scale()`: Compute absmax scaling factors
  - `ternary_quantize()`: Quantize weights to {-1, 0, +1}
  - `weight_to_ternary()`: Full quantization pipeline
  - `quantize_activations_absmax()`: 8-bit activation quantization
  - `dequantize_scale()`: Reverse the scaling applied during quantization
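The 8-bit absmax scheme divides by the largest magnitude so the range maps onto signed 8-bit integers, then rounds. A dependency-free sketch (function shapes are assumptions; the package's real versions operate on tensors):

```python
def quantize_activations_absmax(x, bits=8):
    """Quantize a list of floats to signed integers using absmax scaling."""
    qmax = 2 ** (bits - 1) - 1                    # 127 for 8-bit
    scale = max(abs(v) for v in x) / qmax or 1.0  # avoid div-by-zero on all-zeros
    q = [max(-qmax, min(qmax, round(v / scale))) for v in x]
    return q, scale

def dequantize_scale(q, scale):
    """Undo the scaling: recover approximate floats from integers."""
    return [v * scale for v in q]
```

The roundtrip error per element is bounded by half the scale step, which is what the reconstruction-quality tests would check.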
- **`packing.py`**: Memory optimization
  - `pack_ternary_base3()`: Pack 5 ternary values per byte
  - `unpack_ternary_base3()`: Unpack base-3 encoded weights
  - `compute_compression_ratio()`: Calculate compression statistics
  - `estimate_memory_savings()`: Memory estimation utilities
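Five ternary values fit in one byte because 3^5 = 243 ≤ 256. A sketch of the base-3 roundtrip (signatures are illustrative, not the package's actual API):

```python
def pack_ternary_base3(trits):
    """Pack ternary values {-1, 0, +1} five at a time into one byte each."""
    packed = bytearray()
    for i in range(0, len(trits), 5):
        byte = 0
        for t in reversed(trits[i:i + 5]):
            byte = byte * 3 + (t + 1)  # map {-1, 0, 1} -> base-3 digits {0, 1, 2}
        packed.append(byte)
    return bytes(packed), len(trits)  # keep the count to drop padding on unpack

def unpack_ternary_base3(packed, n):
    """Recover the original n ternary values from base-3 encoded bytes."""
    trits = []
    for byte in packed:
        for _ in range(5):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:n]
```

Against float32 storage (4 bytes per weight) this is a 20x reduction: 5 weights per byte instead of 1 weight per 4 bytes.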
#### C++/CUDA Extensions

- **`cpp/bitlinear.cpp`**: C++ interface
  - PyBind11 module definition
  - CPU implementations: `bitlinear_cpu_forward()`, `multi_ternary_cpu_forward()`
  - Device dispatcher (routes to CPU or CUDA)
  - Packing utilities in C++
- **`cpp/bitlinear_kernel.cu`**: CUDA kernels
  - `bitlinear_forward_kernel()`: Optimized ternary matmul kernel
  - `multi_ternary_forward_kernel()`: Fused multi-component kernel
  - Kernel launchers with error handling
  - TODO: Tensor Core optimization
### tests/

Comprehensive test suite using pytest:

- **`test_functional.py`**: Tests for the functional API
  - Shape correctness
  - Numerical correctness vs. `nn.Linear`
  - Greedy decomposition quality
  - Multi-ternary equivalence
- **`test_layers.py`**: Tests for layer modules
  - Initialization and parameter counts
  - Forward pass shapes
  - Compatibility with `nn.Linear`
  - Conversion utilities
  - Gradient flow (QAT)
  - Integration with Transformer blocks
- **`test_quantization.py`**: Tests for quantization
  - Absmax scaling (global and per-channel)
  - Ternary quantization values and thresholds
  - Reconstruction quality
  - Base-3 packing roundtrip
  - Compression ratios
  - Memory estimation
### examples/

Demonstration scripts:

- **`basic_usage.py`**: Minimal example showing the basic API
  - Creating BitLinear layers
  - Running a forward pass
  - Converting from `nn.Linear`
- **`transformer_example.py`**: Realistic Transformer example
  - Complete Transformer block implementation
  - Conversion to BitLinear
  - Output comparison
  - Memory savings calculation
## Key Design Patterns

### 1. Progressive Enhancement
- Python baseline → C++ CPU → CUDA GPU
- Each layer is fully functional before the next is added

### 2. Drop-in Compatibility
- Same interface as `nn.Linear`
- Same initialization arguments
- Same forward signature
- Works with existing PyTorch features
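The drop-in pattern can be sketched by subclassing `nn.Linear` and overriding only `forward`, so construction, parameters, and the call signature are inherited unchanged. The absmean scale, 0.5 threshold, and straight-through trick below are illustrative assumptions, not the package's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    """Sketch: nn.Linear's constructor and forward contract, ternary weights."""
    def forward(self, x):
        w = self.weight
        alpha = w.abs().mean(dim=1, keepdim=True)       # per-row scale (assumed)
        t = torch.where(w.abs() < 0.5 * alpha,
                        torch.zeros_like(w), w.sign())  # ternarize
        # straight-through estimator: forward uses alpha * t, backward sees w
        w_q = w + (alpha * t - w).detach()
        return F.linear(x, w_q, self.bias)

layer = BitLinear(16, 8)           # identical signature to nn.Linear(16, 8)
y = layer(torch.randn(4, 16))      # identical forward contract: (4, 8) output
```

Because only `forward` changes, the layer composes with optimizers, `state_dict`, and the rest of the PyTorch ecosystem for free.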
### 3. Modular Testing
- Unit tests for each component
- Integration tests for full pipelines
- Performance benchmarks kept separate

### 4. Extensive Documentation
- Docstrings explain the mathematical operations
- TODO comments mark implementation points
- References to papers for algorithms
- Type hints for clarity
## Build Targets

### CPU-only (Development)
```bash
pip install -e .
```

### With CUDA (Production)
```bash
CUDA_HOME=/usr/local/cuda pip install -e .
```

### Testing
```bash
pip install -e ".[dev]"
pytest tests/ -v
```
## What's NOT Implemented Yet

All files are **stubs with TODOs**:
- ✅ Structure is complete
- ✅ Interfaces are defined
- ✅ Documentation is written
- ❌ Logic is NOT implemented (by design)
- ❌ Tests will skip/fail until implementation
## Next Steps

Follow IMPLEMENTATION_GUIDE.md:
1. Start with `quantization.py` (`absmax_scale`, `ternary_quantize`)
2. Move to `functional.py` (`bitlinear_python`)
3. Implement `layers.py` (the `BitLinear` module)
4. Test with the examples
5. Add C++/CUDA if needed
## Design Philosophy

**Correctness > Speed > Memory**

1. First make it work (Python)
2. Then make it fast (C++/CUDA)
3. Then make it efficient (packing)

Every component is:
- Well-documented
- Testable
- Modular
- Extensible