# BitLinear Project Structure
Complete directory tree and file descriptions.
```
BitLinear/
│
├── README.md                   # Project overview and quick start
├── LICENSE                     # MIT License
├── setup.py                    # Build system with torch.utils.cpp_extension
├── pyproject.toml              # Tool configurations (pytest, black, mypy)
├── requirements.txt            # Core dependencies
├── requirements-dev.txt        # Development dependencies
├── .gitignore                  # Git ignore rules
├── IMPLEMENTATION_GUIDE.md     # Step-by-step implementation roadmap
│
├── bitlinear/                  # Main package
│   ├── __init__.py             # Package exports
│   ├── layers.py               # BitLinear and MultiTernaryLinear modules
│   ├── functional.py           # Core functional implementations
│   ├── quantization.py         # Ternary quantization utilities
│   ├── packing.py              # Base-3 packing for memory efficiency
│   │
│   └── cpp/                    # C++/CUDA extensions
│       ├── bitlinear.cpp       # PyBind11 bindings and CPU implementation
│       └── bitlinear_kernel.cu # CUDA kernel implementations
│
├── tests/                      # Test suite
│   ├── __init__.py
│   ├── test_functional.py      # Tests for functional API
│   ├── test_layers.py          # Tests for layer modules
│   └── test_quantization.py    # Tests for quantization and packing
│
└── examples/                   # Usage examples
    ├── basic_usage.py          # Simple usage demonstration
    └── transformer_example.py  # Transformer integration example
```
## File Descriptions
### Root Level
- **README.md**: Project overview, installation instructions, quick start guide, and citations
- **LICENSE**: MIT License for open-source distribution
- **setup.py**: Build configuration using PyTorch's cpp_extension, handles CPU/CUDA builds
- **pyproject.toml**: Configuration for pytest, black, mypy, and coverage
- **requirements.txt**: Core runtime dependencies (torch, numpy)
- **requirements-dev.txt**: Development tools (pytest, black, flake8, mypy)
- **.gitignore**: Ignores Python cache, build artifacts, CUDA objects
- **IMPLEMENTATION_GUIDE.md**: Detailed implementation roadmap with phases and best practices
### bitlinear/ (Main Package)
#### Python Modules
- **`__init__.py`**: Package initialization, exports main classes and functions
- **`layers.py`**: nn.Module implementations
- `BitLinear`: Drop-in replacement for nn.Linear with ternary weights
- `MultiTernaryLinear`: Sum of k ternary components
- `convert_linear_to_bitlinear()`: Recursive model conversion utility
- **`functional.py`**: Core functional implementations
- `bitlinear_python()`: Pure PyTorch ternary matmul with scaling
- `greedy_ternary_decomposition()`: Iterative residual quantization
- `multi_ternary_linear_python()`: Multi-component forward pass
- `activation_quant()`: Activation quantization for full BitNet
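The core math behind `greedy_ternary_decomposition()` and `multi_ternary_linear_python()` can be sketched in NumPy (signatures are illustrative; the real module operates on PyTorch tensors):

```python
import numpy as np

def ternary_quantize(r: np.ndarray, threshold: float = 0.5):
    # Quantize to {-1, 0, +1}: entries whose magnitude exceeds
    # threshold * mean(|r|) keep their sign, the rest become 0.
    # The scale is the mean magnitude of the surviving entries.
    delta = threshold * np.abs(r).mean()
    q = (np.sign(r) * (np.abs(r) > delta)).astype(np.int8)
    mask = q != 0
    scale = float(np.abs(r[mask]).mean()) if mask.any() else 0.0
    return q, scale

def greedy_ternary_decomposition(w: np.ndarray, k: int = 3):
    # Iterative residual quantization: each step quantizes whatever
    # the previous components failed to capture.
    residual = w.astype(np.float64).copy()
    components = []
    for _ in range(k):
        q, s = ternary_quantize(residual)
        components.append((q, s))
        residual -= s * q
    return components

def multi_ternary_linear_python(x: np.ndarray, components):
    # Forward pass: sum of k scaled ternary matmuls.
    return sum(s * (x @ q.T) for q, s in components)
```

By linearity, summing the k component matmuls is exactly equivalent to a single matmul against the reconstructed weight `sum(s * q)`, which is what the equivalence tests in `test_functional.py` can check.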
- **`quantization.py`**: Quantization utilities
- `absmax_scale()`: Compute absmax scaling factors
- `ternary_quantize()`: Quantize to {-1, 0, +1}
- `weight_to_ternary()`: Full quantization pipeline
- `quantize_activations_absmax()`: 8-bit activation quantization
- `dequantize_scale()`: Reverse quantization
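For the activation side, symmetric absmax quantization to 8 bits can be sketched as follows (a minimal NumPy illustration; the real functions may differ in signature and clipping range):

```python
import numpy as np

def quantize_activations_absmax(x: np.ndarray, bits: int = 8):
    # Symmetric absmax quantization: map [-max|x|, +max|x|] onto the
    # signed integer range, e.g. [-127, 127] for 8 bits.
    qmax = 2 ** (bits - 1) - 1
    # Guard against all-zero inputs to avoid division by zero.
    scale = qmax / max(np.abs(x).max(), 1e-8)
    q = np.clip(np.round(x * scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize_scale(q: np.ndarray, scale: float) -> np.ndarray:
    # Reverse the quantization (exact up to rounding error).
    return q.astype(np.float32) / scale
```

The roundtrip error is bounded by half a quantization step, i.e. at most `0.5 / scale` per element.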
- **`packing.py`**: Memory optimization
- `pack_ternary_base3()`: Pack 5 ternary values per byte
- `unpack_ternary_base3()`: Unpack base-3 encoded weights
- `compute_compression_ratio()`: Calculate compression statistics
- `estimate_memory_savings()`: Memory estimation utilities
#### C++/CUDA Extensions
- **`cpp/bitlinear.cpp`**: C++ interface
- PyBind11 module definition
- CPU implementations: `bitlinear_cpu_forward()`, `multi_ternary_cpu_forward()`
- Device dispatcher (routes to CPU or CUDA)
- Packing utilities in C++
- **`cpp/bitlinear_kernel.cu`**: CUDA kernels
- `bitlinear_forward_kernel()`: Optimized ternary matmul kernel
- `multi_ternary_forward_kernel()`: Fused multi-component kernel
- Kernel launchers with error handling
- TODO: Tensor Core optimization
### tests/
Comprehensive test suite using pytest:
- **`test_functional.py`**: Tests for functional API
- Shape correctness
- Numerical correctness vs. nn.Linear
- Greedy decomposition quality
- Multi-ternary equivalence
- **`test_layers.py`**: Tests for layer modules
- Initialization and parameter counts
- Forward pass shapes
- Compatibility with nn.Linear
- Conversion utilities
- Gradient flow (QAT)
- Integration with Transformer blocks
- **`test_quantization.py`**: Tests for quantization
- Absmax scaling (global and per-channel)
- Ternary quantization values and thresholds
- Reconstruction quality
- Base-3 packing roundtrip
- Compression ratios
- Memory estimation
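A representative unit test in the style described above might look like this (illustrative only; `absmax_scale`'s real signature and channel axis may differ):

```python
import numpy as np

def absmax_scale(w: np.ndarray, per_channel: bool = False):
    # Global: one scale for the whole tensor.
    # Per-channel: one scale per output row (axis 0).
    if per_channel:
        return np.abs(w).max(axis=1, keepdims=True)
    return np.abs(w).max()

def test_absmax_global_and_per_channel():
    w = np.array([[0.5, -2.0], [0.1, 0.3]])
    assert absmax_scale(w) == 2.0
    per_ch = absmax_scale(w, per_channel=True)
    assert per_ch.shape == (2, 1)
    assert per_ch.ravel().tolist() == [2.0, 0.3]
```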
### examples/
Demonstration scripts:
- **`basic_usage.py`**: Minimal example showing basic API
- Creating BitLinear layers
- Forward pass
- Conversion from nn.Linear
- **`transformer_example.py`**: Realistic Transformer example
- Complete Transformer block implementation
- Conversion to BitLinear
- Output comparison
- Memory savings calculation
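The memory-savings calculation in the example can be approximated with a small helper mirroring `estimate_memory_savings()` (hypothetical formula; the real utility may also account for bias and activation buffers):

```python
def estimate_memory_savings(n_params: int, k: int = 1) -> float:
    # float32 baseline: 4 bytes per parameter.
    fp32_bytes = 4 * n_params
    # Base-3 packed ternary: 5 values per byte per component,
    # plus one float32 scale per component.
    packed_bytes = k * ((n_params + 4) // 5 + 4)
    return fp32_bytes / packed_bytes

# A 4096x4096 linear layer with a single ternary component
# compresses roughly 20x.
ratio = estimate_memory_savings(4096 * 4096, k=1)
```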
## Key Design Patterns
### 1. Progressive Enhancement
- Python baseline → C++ CPU → CUDA GPU
- Each layer fully functional before adding next
### 2. Drop-in Compatibility
- Same interface as nn.Linear
- Same initialization arguments
- Same forward signature
- Works with existing PyTorch features
### 3. Modular Testing
- Unit tests for each component
- Integration tests for full pipelines
- Performance benchmarks separate
### 4. Extensive Documentation
- Docstrings explain mathematical operations
- TODO comments mark implementation points
- References to papers for algorithms
- Type hints for clarity
## Build Targets
### CPU-only (Development)
```bash
pip install -e .
```
### With CUDA (Production)
```bash
CUDA_HOME=/usr/local/cuda pip install -e .
```
### Testing
```bash
pip install -e ".[dev]"
pytest tests/ -v
```
## What's NOT Implemented Yet
All files are **stubs with TODOs**:
- ✅ Structure is complete
- ✅ Interfaces are defined
- ✅ Documentation is written
- ❌ Logic is NOT implemented (by design)
- ❌ Tests will skip/fail until implementation
## Next Steps
Follow IMPLEMENTATION_GUIDE.md:
1. Start with `quantization.py` (absmax_scale, ternary_quantize)
2. Move to `functional.py` (bitlinear_python)
3. Implement `layers.py` (BitLinear module)
4. Test with examples
5. Add C++/CUDA if needed
## Design Philosophy
**Correctness > Speed > Memory**
1. First make it work (Python)
2. Then make it fast (C++/CUDA)
3. Then make it efficient (packing)
Every component is:
- Well-documented
- Testable
- Modular
- Extensible