# BitLinear Project Structure

Complete directory tree and file descriptions.
```
BitLinear/
│
├── README.md                  # Project overview and quick start
├── LICENSE                    # MIT License
├── setup.py                   # Build system with torch.utils.cpp_extension
├── pyproject.toml             # Tool configurations (pytest, black, mypy)
├── requirements.txt           # Core dependencies
├── requirements-dev.txt       # Development dependencies
├── .gitignore                 # Git ignore rules
├── IMPLEMENTATION_GUIDE.md    # Step-by-step implementation roadmap
│
├── bitlinear/                 # Main package
│   ├── __init__.py            # Package exports
│   ├── layers.py              # BitLinear and MultiTernaryLinear modules
│   ├── functional.py          # Core functional implementations
│   ├── quantization.py        # Ternary quantization utilities
│   ├── packing.py             # Base-3 packing for memory efficiency
│   │
│   └── cpp/                   # C++/CUDA extensions
│       ├── bitlinear.cpp      # PyBind11 bindings and CPU implementation
│       └── bitlinear_kernel.cu  # CUDA kernel implementations
│
├── tests/                     # Test suite
│   ├── __init__.py
│   ├── test_functional.py     # Tests for functional API
│   ├── test_layers.py         # Tests for layer modules
│   └── test_quantization.py   # Tests for quantization and packing
│
└── examples/                  # Usage examples
    ├── basic_usage.py         # Simple usage demonstration
    └── transformer_example.py # Transformer integration example
```
## File Descriptions

### Root Level

- **README.md**: Project overview, installation instructions, quick start guide, and citations
- **LICENSE**: MIT License for open-source distribution
- **setup.py**: Build configuration using PyTorch's cpp_extension; handles CPU and CUDA builds
- **pyproject.toml**: Configuration for pytest, black, mypy, and coverage
- **requirements.txt**: Core runtime dependencies (torch, numpy)
- **requirements-dev.txt**: Development tools (pytest, black, flake8, mypy)
- **.gitignore**: Ignores Python caches, build artifacts, and CUDA objects
- **IMPLEMENTATION_GUIDE.md**: Detailed implementation roadmap with phases and best practices
### bitlinear/ (Main Package)

#### Python Modules

- **`__init__.py`**: Package initialization; exports the main classes and functions
- **`layers.py`**: `nn.Module` implementations
  - `BitLinear`: Drop-in replacement for `nn.Linear` with ternary weights
  - `MultiTernaryLinear`: Sum of k ternary components
  - `convert_linear_to_bitlinear()`: Recursive model conversion utility
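The core idea behind a ternary layer — keep full-precision shadow weights, quantize them to {-1, 0, +1} with a per-row scale on every forward call — can be sketched without any dependencies. The class name, the absmean scale, and the 0.5 threshold below are illustrative assumptions, not the package's actual code:

```python
class BitLinearSketch:
    """Toy, dependency-free sketch of the quantize-on-forward pattern:
    full-precision weights are kept, and ternarized at each forward call."""

    def __init__(self, weight_rows):
        self.weight = weight_rows  # list of rows, full precision ("shadow" weights)

    def forward(self, x):
        out = []
        for row in self.weight:
            # per-row absmean scale and 0.5 threshold (assumed heuristic)
            alpha = sum(abs(w) for w in row) / len(row)
            t = [0 if abs(w) < 0.5 * alpha else (1 if w > 0 else -1) for w in row]
            # matmul against ternary weights, rescaled by alpha
            out.append(alpha * sum(xi * ti for xi, ti in zip(x, t)))
        return out
```

The real module would do the same thing with tensors and a straight-through estimator so gradients reach the shadow weights.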
- **`functional.py`**: Core functional implementations
  - `bitlinear_python()`: Pure PyTorch ternary matmul with scaling
  - `greedy_ternary_decomposition()`: Iterative residual quantization
  - `multi_ternary_linear_python()`: Multi-component forward pass
  - `activation_quant()`: Activation quantization for full BitNet
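Iterative residual quantization means: quantize the weights to one scaled ternary component, subtract it, then quantize what remains, k times. A minimal sketch of that loop (the threshold and scale heuristics are assumptions for illustration):

```python
def ternary_quantize(values, threshold):
    """Map each value to {-1, 0, +1}, zeroing entries below the threshold."""
    return [0 if abs(v) < threshold else (1 if v > 0 else -1) for v in values]

def greedy_ternary_decomposition(weights, k=3):
    """Approximate weights as a sum of k scaled ternary vectors by
    repeatedly quantizing the remaining residual (greedy, one pass per k)."""
    residual = list(weights)
    components = []
    for _ in range(k):
        m = max(abs(v) for v in residual)
        if m == 0:
            break
        t = ternary_quantize(residual, 0.5 * m)  # assumed threshold heuristic
        kept = [abs(v) for v, q in zip(residual, t) if q != 0]
        if not kept:
            break
        alpha = sum(kept) / len(kept)  # mean magnitude over kept entries
        components.append((alpha, t))
        residual = [r - alpha * q for r, q in zip(residual, t)]
    return components
```

Each extra component shrinks the reconstruction error, so k trades accuracy against memory and compute.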
- **`quantization.py`**: Quantization utilities
  - `absmax_scale()`: Compute absmax scaling factors
  - `ternary_quantize()`: Quantize weights to {-1, 0, +1}
  - `weight_to_ternary()`: Full quantization pipeline
  - `quantize_activations_absmax()`: 8-bit activation quantization
  - `dequantize_scale()`: Reverse the scaling applied during quantization
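The 8-bit absmax scheme divides by the largest magnitude so the range maps onto signed 8-bit integers, then rounds. A dependency-free sketch (function shapes are assumptions; the package's real versions operate on tensors):

```python
def quantize_activations_absmax(x, bits=8):
    """Quantize a list of floats to signed integers using absmax scaling."""
    qmax = 2 ** (bits - 1) - 1                    # 127 for 8-bit
    scale = max(abs(v) for v in x) / qmax or 1.0  # avoid div-by-zero on all-zeros
    q = [max(-qmax, min(qmax, round(v / scale))) for v in x]
    return q, scale

def dequantize_scale(q, scale):
    """Undo the scaling: recover approximate floats from integers."""
    return [v * scale for v in q]
```

The roundtrip error per element is bounded by half the scale step, which is what the reconstruction-quality tests would check.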
- **`packing.py`**: Memory optimization
  - `pack_ternary_base3()`: Pack 5 ternary values per byte
  - `unpack_ternary_base3()`: Unpack base-3 encoded weights
  - `compute_compression_ratio()`: Calculate compression statistics
  - `estimate_memory_savings()`: Memory estimation utilities
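Five ternary values fit in one byte because 3^5 = 243 ≤ 256. A sketch of the base-3 roundtrip (signatures are illustrative, not the package's actual API):

```python
def pack_ternary_base3(trits):
    """Pack ternary values {-1, 0, +1} five at a time into one byte each."""
    packed = bytearray()
    for i in range(0, len(trits), 5):
        byte = 0
        for t in reversed(trits[i:i + 5]):
            byte = byte * 3 + (t + 1)  # map {-1, 0, 1} -> base-3 digits {0, 1, 2}
        packed.append(byte)
    return bytes(packed), len(trits)  # keep the count to drop padding on unpack

def unpack_ternary_base3(packed, n):
    """Recover the original n ternary values from base-3 encoded bytes."""
    trits = []
    for byte in packed:
        for _ in range(5):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:n]
```

Against float32 storage (4 bytes per weight) this is a 20x reduction: 5 weights per byte instead of 1 weight per 4 bytes.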
#### C++/CUDA Extensions

- **`cpp/bitlinear.cpp`**: C++ interface
  - PyBind11 module definition
  - CPU implementations: `bitlinear_cpu_forward()`, `multi_ternary_cpu_forward()`
  - Device dispatcher (routes to CPU or CUDA)
  - Packing utilities in C++
- **`cpp/bitlinear_kernel.cu`**: CUDA kernels
  - `bitlinear_forward_kernel()`: Optimized ternary matmul kernel
  - `multi_ternary_forward_kernel()`: Fused multi-component kernel
  - Kernel launchers with error handling
  - TODO: Tensor Core optimization
### tests/

Comprehensive test suite using pytest:

- **`test_functional.py`**: Tests for the functional API
  - Shape correctness
  - Numerical correctness vs. `nn.Linear`
  - Greedy decomposition quality
  - Multi-ternary equivalence
- **`test_layers.py`**: Tests for layer modules
  - Initialization and parameter counts
  - Forward pass shapes
  - Compatibility with `nn.Linear`
  - Conversion utilities
  - Gradient flow (QAT)
  - Integration with Transformer blocks
- **`test_quantization.py`**: Tests for quantization
  - Absmax scaling (global and per-channel)
  - Ternary quantization values and thresholds
  - Reconstruction quality
  - Base-3 packing roundtrip
  - Compression ratios
  - Memory estimation
### examples/

Demonstration scripts:

- **`basic_usage.py`**: Minimal example showing the basic API
  - Creating BitLinear layers
  - Running a forward pass
  - Converting from `nn.Linear`
- **`transformer_example.py`**: Realistic Transformer example
  - Complete Transformer block implementation
  - Conversion to BitLinear
  - Output comparison
  - Memory savings calculation
## Key Design Patterns

### 1. Progressive Enhancement
- Python baseline → C++ CPU → CUDA GPU
- Each layer is fully functional before the next is added

### 2. Drop-in Compatibility
- Same interface as `nn.Linear`
- Same initialization arguments
- Same forward signature
- Works with existing PyTorch features
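The drop-in pattern can be sketched by subclassing `nn.Linear` and overriding only `forward`, so construction, parameters, and the call signature are inherited unchanged. The absmean scale, 0.5 threshold, and straight-through trick below are illustrative assumptions, not the package's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    """Sketch: nn.Linear's constructor and forward contract, ternary weights."""
    def forward(self, x):
        w = self.weight
        alpha = w.abs().mean(dim=1, keepdim=True)       # per-row scale (assumed)
        t = torch.where(w.abs() < 0.5 * alpha,
                        torch.zeros_like(w), w.sign())  # ternarize
        # straight-through estimator: forward uses alpha * t, backward sees w
        w_q = w + (alpha * t - w).detach()
        return F.linear(x, w_q, self.bias)

layer = BitLinear(16, 8)           # identical signature to nn.Linear(16, 8)
y = layer(torch.randn(4, 16))      # identical forward contract: (4, 8) output
```

Because only `forward` changes, the layer composes with optimizers, `state_dict`, and the rest of the PyTorch ecosystem for free.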
### 3. Modular Testing
- Unit tests for each component
- Integration tests for full pipelines
- Performance benchmarks kept separate

### 4. Extensive Documentation
- Docstrings explain the mathematical operations
- TODO comments mark implementation points
- References to papers for algorithms
- Type hints for clarity
## Build Targets

### CPU-only (Development)
```bash
pip install -e .
```

### With CUDA (Production)
```bash
CUDA_HOME=/usr/local/cuda pip install -e .
```

### Testing
```bash
pip install -e ".[dev]"
pytest tests/ -v
```
## What's NOT Implemented Yet

All files are **stubs with TODOs**:
- ✅ Structure is complete
- ✅ Interfaces are defined
- ✅ Documentation is written
- ❌ Logic is NOT implemented (by design)
- ❌ Tests will skip/fail until implementation
## Next Steps

Follow IMPLEMENTATION_GUIDE.md:
1. Start with `quantization.py` (`absmax_scale`, `ternary_quantize`)
2. Move to `functional.py` (`bitlinear_python`)
3. Implement `layers.py` (the `BitLinear` module)
4. Test with the examples
5. Add C++/CUDA if needed
## Design Philosophy

**Correctness > Speed > Memory**

1. First make it work (Python)
2. Then make it fast (C++/CUDA)
3. Then make it efficient (packing)

Every component is:
- Well-documented
- Testable
- Modular
- Extensible