# BitLinear Project Structure
Complete directory tree and file descriptions.
```
BitLinear/
│
├── README.md                  # Project overview and quick start
├── LICENSE                    # MIT License
├── setup.py                   # Build system with torch.utils.cpp_extension
├── pyproject.toml             # Tool configurations (pytest, black, mypy)
├── requirements.txt           # Core dependencies
├── requirements-dev.txt       # Development dependencies
├── .gitignore                 # Git ignore rules
├── IMPLEMENTATION_GUIDE.md    # Step-by-step implementation roadmap
│
├── bitlinear/                 # Main package
│   ├── __init__.py            # Package exports
│   ├── layers.py              # BitLinear and MultiTernaryLinear modules
│   ├── functional.py          # Core functional implementations
│   ├── quantization.py        # Ternary quantization utilities
│   ├── packing.py             # Base-3 packing for memory efficiency
│   │
│   └── cpp/                   # C++/CUDA extensions
│       ├── bitlinear.cpp      # PyBind11 bindings and CPU implementation
│       └── bitlinear_kernel.cu  # CUDA kernel implementations
│
├── tests/                     # Test suite
│   ├── __init__.py
│   ├── test_functional.py     # Tests for functional API
│   ├── test_layers.py         # Tests for layer modules
│   └── test_quantization.py   # Tests for quantization and packing
│
└── examples/                  # Usage examples
    ├── basic_usage.py         # Simple usage demonstration
    └── transformer_example.py  # Transformer integration example
```
## File Descriptions
### Root Level
- **README.md**: Project overview, installation instructions, quick start guide, and citations
- **LICENSE**: MIT License for open-source distribution
- **setup.py**: Build configuration using PyTorch's cpp_extension, handles CPU/CUDA builds
- **pyproject.toml**: Configuration for pytest, black, mypy, and coverage
- **requirements.txt**: Core runtime dependencies (torch, numpy)
- **requirements-dev.txt**: Development tools (pytest, black, flake8, mypy)
- **.gitignore**: Ignores Python cache, build artifacts, CUDA objects
- **IMPLEMENTATION_GUIDE.md**: Detailed implementation roadmap with phases and best practices
### bitlinear/ (Main Package)
#### Python Modules
- **`__init__.py`**: Package initialization, exports main classes and functions
- **`layers.py`**: nn.Module implementations
- `BitLinear`: Drop-in replacement for nn.Linear with ternary weights
- `MultiTernaryLinear`: Sum of k ternary components
- `convert_linear_to_bitlinear()`: Recursive model conversion utility
- **`functional.py`**: Core functional implementations
- `bitlinear_python()`: Pure PyTorch ternary matmul with scaling
- `greedy_ternary_decomposition()`: Iterative residual quantization
- `multi_ternary_linear_python()`: Multi-component forward pass
- `activation_quant()`: Activation quantization for full BitNet
- **`quantization.py`**: Quantization utilities
- `absmax_scale()`: Compute absmax scaling factors
- `ternary_quantize()`: Quantize to {-1, 0, +1}
- `weight_to_ternary()`: Full quantization pipeline
- `quantize_activations_absmax()`: 8-bit activation quantization
- `dequantize_scale()`: Reverse quantization
- **`packing.py`**: Memory optimization
- `pack_ternary_base3()`: Pack 5 ternary values per byte
- `unpack_ternary_base3()`: Unpack base-3 encoded weights
- `compute_compression_ratio()`: Calculate compression statistics
- `estimate_memory_savings()`: Memory estimation utilities
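Since these modules are stubs, the following is only a plausible sketch of what `ternary_quantize` and the base-3 pack/unpack pair could look like. The function names come from the list above; the thresholding rule (zero out entries below a fraction of the absmax) and the absmean scale are common choices, not necessarily what the project will implement. Written against numpy, which is listed as a core dependency:

```python
import numpy as np

def ternary_quantize(w: np.ndarray, threshold: float = 0.05):
    """Quantize to {-1, 0, +1} with a single absmax-derived scale (sketch).

    scale = mean(|w|); entries with |w| below threshold * max(|w|) map to 0.
    """
    scale = np.abs(w).mean()
    cutoff = threshold * np.abs(w).max()
    q = np.sign(w) * (np.abs(w) >= cutoff)
    return q.astype(np.int8), scale

def pack_ternary_base3(q: np.ndarray) -> np.ndarray:
    """Pack 5 ternary digits into one byte (3^5 = 243 <= 256)."""
    digits = (q.ravel() + 1).astype(np.uint8)            # {-1,0,1} -> {0,1,2}
    pad = (-len(digits)) % 5                              # pad to a multiple of 5
    digits = np.concatenate([digits, np.zeros(pad, np.uint8)])
    groups = digits.reshape(-1, 5)
    weights = 3 ** np.arange(5, dtype=np.uint16)          # little-endian base-3
    return (groups * weights).sum(axis=1).astype(np.uint8)

def unpack_ternary_base3(packed: np.ndarray, n: int) -> np.ndarray:
    """Inverse of pack_ternary_base3; n is the original element count."""
    vals = packed.astype(np.int16)[:, None] // (3 ** np.arange(5)) % 3
    return (vals.ravel()[:n] - 1).astype(np.int8)
```

Packing 5 values per byte gives 1.6 bits per weight, versus 32 bits for float32, which is where the compression-ratio utilities get their numbers.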
#### C++/CUDA Extensions
- **`cpp/bitlinear.cpp`**: C++ interface
- PyBind11 module definition
- CPU implementations: `bitlinear_cpu_forward()`, `multi_ternary_cpu_forward()`
- Device dispatcher (routes to CPU or CUDA)
- Packing utilities in C++
- **`cpp/bitlinear_kernel.cu`**: CUDA kernels
- `bitlinear_forward_kernel()`: Optimized ternary matmul kernel
- `multi_ternary_forward_kernel()`: Fused multi-component kernel
- Kernel launchers with error handling
- TODO: Tensor Core optimization
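The reference semantics that `multi_ternary_forward_kernel` would fuse, a sum of k scaled ternary matmuls, can be written in a few lines of plain PyTorch. This is a sketch of those semantics only (the kernels themselves are still TODOs), and the argument layout is an assumption:

```python
import torch

def multi_ternary_reference(x, components, scales, bias=None):
    """Reference for the fused kernel: y = x @ (sum_k s_k * T_k)^T + bias.

    components: (k, out, in) tensors with entries in {-1, 0, +1}
    scales:     (k,) float scales, one per ternary component
    """
    # Fold the k scaled ternary matrices into one effective weight matrix.
    # A fused CUDA kernel would instead accumulate per-component partial
    # sums in registers, avoiding the materialized float weight.
    w_eff = (scales.view(-1, 1, 1) * components.float()).sum(dim=0)
    y = x @ w_eff.t()
    return y + bias if bias is not None else y
```

A correctness test for the CPU and CUDA paths can compare their outputs against exactly this function.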
### tests/
Comprehensive test suite using pytest:
- **`test_functional.py`**: Tests for functional API
- Shape correctness
- Numerical correctness vs. nn.Linear
- Greedy decomposition quality
- Multi-ternary equivalence
- **`test_layers.py`**: Tests for layer modules
- Initialization and parameter counts
- Forward pass shapes
- Compatibility with nn.Linear
- Conversion utilities
- Gradient flow (QAT)
- Integration with Transformer blocks
- **`test_quantization.py`**: Tests for quantization
- Absmax scaling (global and per-channel)
- Ternary quantization values and thresholds
- Reconstruction quality
- Base-3 packing roundtrip
- Compression ratios
- Memory estimation
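As a shape for these tests once the stubs are filled in, here is a self-contained pytest-style example checking ternary values and reconstruction quality. The inline `_ternary_quantize` is a stand-in for the (still unimplemented) `bitlinear.quantization.ternary_quantize`, and the error bound is illustrative:

```python
import numpy as np

def _ternary_quantize(w, threshold=0.05):
    # Stand-in for bitlinear.quantization.ternary_quantize (stub).
    scale = np.abs(w).mean()
    q = np.sign(w) * (np.abs(w) >= threshold * np.abs(w).max())
    return q.astype(np.int8), scale

def test_values_and_reconstruction():
    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 64)).astype(np.float32)
    q, s = _ternary_quantize(w)
    # Quantized values must lie in {-1, 0, +1}
    assert set(np.unique(q)) <= {-1, 0, 1}
    # The scaled ternary matrix should be closer to w than the zero matrix
    err = np.linalg.norm(w - s * q) / np.linalg.norm(w)
    assert err < 1.0
```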
### examples/
Demonstration scripts:
- **`basic_usage.py`**: Minimal example showing basic API
- Creating BitLinear layers
- Forward pass
- Conversion from nn.Linear
- **`transformer_example.py`**: Realistic Transformer example
- Complete Transformer block implementation
- Conversion to BitLinear
- Output comparison
- Memory savings calculation
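The basic-usage flow can be sketched as below, assuming `BitLinear` keeps `nn.Linear`'s constructor and forward signature (which the stubs define). The forward body shown, BitNet b1.58-style round-and-clip with a straight-through estimator, is one plausible implementation, not the project's actual code:

```python
import torch
import torch.nn as nn

class BitLinear(nn.Linear):
    """Drop-in nn.Linear with ternary weights (illustrative sketch)."""

    def forward(self, x):
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-8)
        # Round w/scale into {-1, 0, +1}, then rescale; the detach() trick
        # (straight-through estimator) lets gradients reach w during QAT.
        w_q = (w / scale).round().clamp(-1, 1) * scale
        w_q = w + (w_q - w).detach()
        return nn.functional.linear(x, w_q, self.bias)

layer = BitLinear(128, 64)        # same arguments as nn.Linear(128, 64)
y = layer(torch.randn(2, 128))    # same forward signature
```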
## Key Design Patterns
### 1. Progressive Enhancement
- Python baseline → C++ CPU → CUDA GPU
- Each layer fully functional before adding next
### 2. Drop-in Compatibility
- Same interface as nn.Linear
- Same initialization arguments
- Same forward signature
- Works with existing PyTorch features
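Because the interface matches `nn.Linear`, conversion can be a simple recursive walk, which is presumably what `convert_linear_to_bitlinear()` in `layers.py` does. A sketch of that walk, parameterized on the replacement class since the real one is still a stub:

```python
import torch.nn as nn

def convert_linear_to_bitlinear(module: nn.Module, bitlinear_cls) -> nn.Module:
    """Recursively swap every nn.Linear for a BitLinear-style replacement,
    copying weights so the converted model starts from the same parameters."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            new = bitlinear_cls(child.in_features, child.out_features,
                                bias=child.bias is not None)
            new.weight.data.copy_(child.weight.data)
            if child.bias is not None:
                new.bias.data.copy_(child.bias.data)
            setattr(module, name, new)
        else:
            convert_linear_to_bitlinear(child, bitlinear_cls)
    return module
```

Against the real package the call would look like `convert_linear_to_bitlinear(model, BitLinear)`, and the converted model keeps working with optimizers, `state_dict`, and the rest of PyTorch.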
### 3. Modular Testing
- Unit tests for each component
- Integration tests for full pipelines
- Performance benchmarks separate
### 4. Extensive Documentation
- Docstrings explain mathematical operations
- TODO comments mark implementation points
- References to papers for algorithms
- Type hints for clarity
## Build Targets
### CPU-only (Development)
```bash
pip install -e .
```
### With CUDA (Production)
```bash
CUDA_HOME=/usr/local/cuda pip install -e .
```
### Testing
```bash
pip install -e ".[dev]"
pytest tests/ -v
```
## What's NOT Implemented Yet
All files are **stubs with TODOs**:
- ✅ Structure is complete
- ✅ Interfaces are defined
- ✅ Documentation is written
- ❌ Logic is NOT implemented (by design)
- ❌ Tests will skip/fail until implementation
## Next Steps
Follow IMPLEMENTATION_GUIDE.md:
1. Start with `quantization.py` (absmax_scale, ternary_quantize)
2. Move to `functional.py` (bitlinear_python)
3. Implement `layers.py` (BitLinear module)
4. Test with examples
5. Add C++/CUDA if needed
## Design Philosophy
**Correctness > Speed > Memory**
1. First make it work (Python)
2. Then make it fast (C++/CUDA)
3. Then make it efficient (packing)
Every component is:
- Well-documented
- Testable
- Modular
- Extensible