BitLinear Project Structure
Complete directory tree and file descriptions.
```
BitLinear/
│
├── README.md                  # Project overview and quick start
├── LICENSE                    # MIT License
├── setup.py                   # Build system with torch.utils.cpp_extension
├── pyproject.toml             # Tool configurations (pytest, black, mypy)
├── requirements.txt           # Core dependencies
├── requirements-dev.txt       # Development dependencies
├── .gitignore                 # Git ignore rules
├── IMPLEMENTATION_GUIDE.md    # Step-by-step implementation roadmap
│
├── bitlinear/                 # Main package
│   ├── __init__.py            # Package exports
│   ├── layers.py              # BitLinear and MultiTernaryLinear modules
│   ├── functional.py          # Core functional implementations
│   ├── quantization.py        # Ternary quantization utilities
│   ├── packing.py             # Base-3 packing for memory efficiency
│   │
│   └── cpp/                   # C++/CUDA extensions
│       ├── bitlinear.cpp      # PyBind11 bindings and CPU implementation
│       └── bitlinear_kernel.cu  # CUDA kernel implementations
│
├── tests/                     # Test suite
│   ├── __init__.py
│   ├── test_functional.py     # Tests for functional API
│   ├── test_layers.py         # Tests for layer modules
│   └── test_quantization.py   # Tests for quantization and packing
│
└── examples/                  # Usage examples
    ├── basic_usage.py         # Simple usage demonstration
    └── transformer_example.py # Transformer integration example
```
File Descriptions
Root Level
- README.md: Project overview, installation instructions, quick start guide, and citations
- LICENSE: MIT License for open-source distribution
- setup.py: Build configuration using PyTorch's cpp_extension, handles CPU/CUDA builds
- pyproject.toml: Configuration for pytest, black, mypy, and coverage
- requirements.txt: Core runtime dependencies (torch, numpy)
- requirements-dev.txt: Development tools (pytest, black, flake8, mypy)
- .gitignore: Ignores Python cache, build artifacts, CUDA objects
- IMPLEMENTATION_GUIDE.md: Detailed implementation roadmap with phases and best practices
bitlinear/ (Main Package)
Python Modules
- __init__.py: Package initialization, exports main classes and functions
- layers.py: nn.Module implementations
  - BitLinear: Drop-in replacement for nn.Linear with ternary weights
  - MultiTernaryLinear: Sum of k ternary components
  - convert_linear_to_bitlinear(): Recursive model conversion utility
- functional.py: Core functional implementations
  - bitlinear_python(): Pure PyTorch ternary matmul with scaling
  - greedy_ternary_decomposition(): Iterative residual quantization
  - multi_ternary_linear_python(): Multi-component forward pass
  - activation_quant(): Activation quantization for full BitNet
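The idea behind greedy_ternary_decomposition() can be sketched in a few lines: repeatedly quantize the residual to a scaled ternary matrix and subtract it. The sketch below uses NumPy for illustration (the real module presumably operates on PyTorch tensors), and the 0.7 threshold ratio and least-squares scale are illustrative choices, not necessarily what the project implements.

```python
import numpy as np

def greedy_ternary_decomposition(w, k, threshold_ratio=0.7):
    """Approximate w as sum_i alpha_i * T_i with T_i in {-1, 0, +1},
    quantizing the running residual at each step (illustrative sketch)."""
    residual = w.astype(float).copy()
    alphas, ternaries = [], []
    for _ in range(k):
        # Threshold relative to the residual's mean magnitude
        thr = threshold_ratio * np.mean(np.abs(residual))
        t = np.where(residual > thr, 1.0, np.where(residual < -thr, -1.0, 0.0))
        # Least-squares optimal scale for this ternary pattern
        denom = np.sum(t * t)
        alpha = np.sum(residual * t) / denom if denom > 0 else 0.0
        alphas.append(alpha)
        ternaries.append(t)
        residual -= alpha * t
    return alphas, ternaries

w = np.random.default_rng(0).normal(size=(4, 4))
alphas, ts = greedy_ternary_decomposition(w, k=3)
approx = sum(a * t for a, t in zip(alphas, ts))
# Each added component should reduce the approximation error
assert np.linalg.norm(w - approx) < np.linalg.norm(w)
```

Because each scale is fit by least squares against its ternary pattern, the residual norm is non-increasing as k grows, which is the property the multi-ternary layers rely on.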
- quantization.py: Quantization utilities
  - absmax_scale(): Compute absmax scaling factors
  - ternary_quantize(): Quantize to {-1, 0, +1}
  - weight_to_ternary(): Full quantization pipeline
  - quantize_activations_absmax(): 8-bit activation quantization
  - dequantize_scale(): Reverse quantization
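For intuition, here is a minimal NumPy sketch of one common ternarization recipe (the BitNet b1.58 style: scale by the mean absolute value, then round and clip). The function names mirror the module's API, but the exact formulas used by the project may differ.

```python
import numpy as np

def absmax_scale(w: np.ndarray) -> float:
    """Scaling factor; here the mean absolute value, as in BitNet b1.58."""
    return float(np.mean(np.abs(w))) + 1e-8  # epsilon guards all-zero weights

def ternary_quantize(w: np.ndarray, scale: float) -> np.ndarray:
    """Round w/scale to the nearest value in {-1, 0, +1}."""
    return np.clip(np.round(w / scale), -1, 1)

w = np.array([[0.9, -0.05, -1.2],
              [0.3,  0.0,  -0.4]])
s = ternary_scale = absmax_scale(w)
t = ternary_quantize(w, s)
assert set(np.unique(t)) <= {-1.0, 0.0, 1.0}
w_hat = t * s  # dequantized approximation of the original weights
```

Values smaller than half the scale collapse to zero, which is where the sparsity (and the base-3 packing payoff) comes from.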
- packing.py: Memory optimization
  - pack_ternary_base3(): Pack 5 ternary values per byte
  - unpack_ternary_base3(): Unpack base-3 encoded weights
  - compute_compression_ratio(): Calculate compression statistics
  - estimate_memory_savings(): Memory estimation utilities
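Five ternary values fit in one byte because 3**5 = 243 <= 255. A hedged NumPy sketch of the round trip (the project's actual encoding, digit order, and padding scheme may differ):

```python
import numpy as np

def pack_ternary_base3(t: np.ndarray) -> np.ndarray:
    """Pack a flat {-1, 0, +1} array, 5 values per byte (3**5 = 243 <= 255)."""
    digits = (t.flatten() + 1).astype(np.uint8)       # map {-1,0,1} -> {0,1,2}
    pad = (-len(digits)) % 5
    digits = np.pad(digits, (0, pad))                 # pad to a multiple of 5
    groups = digits.reshape(-1, 5)
    weights = 3 ** np.arange(5, dtype=np.uint16)      # 1, 3, 9, 27, 81
    return (groups * weights).sum(axis=1).astype(np.uint8)

def unpack_ternary_base3(packed: np.ndarray, n: int) -> np.ndarray:
    """Recover the first n ternary values from the packed bytes."""
    vals = packed.astype(np.int16)[:, None] // 3 ** np.arange(5) % 3
    return (vals.flatten()[:n] - 1).astype(np.int8)

t = np.array([-1, 0, 1, 1, -1, 0, 0, 1], dtype=np.int8)
packed = pack_ternary_base3(t)
assert packed.nbytes == 2                             # 8 values -> 2 bytes
assert np.array_equal(unpack_ternary_base3(packed, len(t)), t)
```

Against float32 weights, that is 5 values per byte versus 4 bytes per value, roughly a 20x reduction before accounting for the per-tensor scale.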
C++/CUDA Extensions
cpp/bitlinear.cpp: C++ interface
- PyBind11 module definition
- CPU implementations: bitlinear_cpu_forward(), multi_ternary_cpu_forward()
- Device dispatcher (routes to CPU or CUDA)
- Packing utilities in C++
cpp/bitlinear_kernel.cu: CUDA kernels
- bitlinear_forward_kernel(): Optimized ternary matmul kernel
- multi_ternary_forward_kernel(): Fused multi-component kernel
- Kernel launchers with error handling
- TODO: Tensor Core optimization
tests/
Comprehensive test suite using pytest:
test_functional.py: Tests for functional API
- Shape correctness
- Numerical correctness vs. nn.Linear
- Greedy decomposition quality
- Multi-ternary equivalence
test_layers.py: Tests for layer modules
- Initialization and parameter counts
- Forward pass shapes
- Compatibility with nn.Linear
- Conversion utilities
- Gradient flow (QAT)
- Integration with Transformer blocks
test_quantization.py: Tests for quantization
- Absmax scaling (global and per-channel)
- Ternary quantization values and thresholds
- Reconstruction quality
- Base-3 packing roundtrip
- Compression ratios
- Memory estimation
examples/
Demonstration scripts:
basic_usage.py: Minimal example showing basic API
- Creating BitLinear layers
- Forward pass
- Conversion from nn.Linear
transformer_example.py: Realistic Transformer example
- Complete Transformer block implementation
- Conversion to BitLinear
- Output comparison
- Memory savings calculation
Key Design Patterns
1. Progressive Enhancement
- Python baseline → C++ CPU → CUDA GPU
- Each layer fully functional before adding next
2. Drop-in Compatibility
- Same interface as nn.Linear
- Same initialization arguments
- Same forward signature
- Works with existing PyTorch features
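The drop-in pattern above might look like the following minimal sketch: a subclass of nn.Linear that ternarizes its weight on the fly in forward(), with a straight-through estimator for QAT. This is an illustration of the pattern, not the project's actual layers.py.

```python
import torch
import torch.nn as nn

class BitLinear(nn.Linear):
    """Illustrative sketch: same constructor and forward signature as
    nn.Linear, but the weight is ternarized during the forward pass."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-8)
        w_ternary = (w / scale).round().clamp(-1, 1)
        # Straight-through estimator: use quantized values in the forward
        # pass, but let gradients flow to the latent full-precision weight
        w_q = w + (w_ternary * scale - w).detach()
        return nn.functional.linear(x, w_q, self.bias)

layer = BitLinear(16, 8)              # same arguments as nn.Linear(16, 8)
y = layer(torch.randn(2, 16))
assert y.shape == (2, 8)
```

Because only forward() changes, optimizers, state_dict handling, and device moves all keep working exactly as they do for nn.Linear.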
3. Modular Testing
- Unit tests for each component
- Integration tests for full pipelines
- Performance benchmarks separate
4. Extensive Documentation
- Docstrings explain mathematical operations
- TODO comments mark implementation points
- References to papers for algorithms
- Type hints for clarity
Build Targets
CPU-only (Development)
```bash
pip install -e .
```
With CUDA (Production)
```bash
CUDA_HOME=/usr/local/cuda pip install -e .
```
Testing
```bash
pip install -e ".[dev]"
pytest tests/ -v
```
What's NOT Implemented Yet
All files are stubs with TODOs:
- ✅ Structure is complete
- ✅ Interfaces are defined
- ✅ Documentation is written
- ❌ Logic is NOT implemented (by design)
- ❌ Tests will skip/fail until implementation
Next Steps
Follow IMPLEMENTATION_GUIDE.md:
- Start with quantization.py (absmax_scale, ternary_quantize)
- Move to functional.py (bitlinear_python)
- Implement layers.py (BitLinear module)
- Test with examples
- Add C++/CUDA if needed
Design Philosophy
Correctness > Speed > Memory
- First make it work (Python)
- Then make it fast (C++/CUDA)
- Then make it efficient (packing)
Every component is:
- Well-documented
- Testable
- Modular
- Extensible