
BitLinear Project Structure

Complete directory tree and file descriptions.

BitLinear/
│
├── README.md                      # Project overview and quick start
├── LICENSE                        # MIT License
├── setup.py                       # Build system with torch.utils.cpp_extension
├── pyproject.toml                 # Tool configurations (pytest, black, mypy)
├── requirements.txt               # Core dependencies
├── requirements-dev.txt           # Development dependencies
├── .gitignore                     # Git ignore rules
├── IMPLEMENTATION_GUIDE.md        # Step-by-step implementation roadmap
│
├── bitlinear/                     # Main package
│   ├── __init__.py               # Package exports
│   ├── layers.py                 # BitLinear and MultiTernaryLinear modules
│   ├── functional.py             # Core functional implementations
│   ├── quantization.py           # Ternary quantization utilities
│   ├── packing.py                # Base-3 packing for memory efficiency
│   │
│   └── cpp/                      # C++/CUDA extensions
│       ├── bitlinear.cpp         # PyBind11 bindings and CPU implementation
│       └── bitlinear_kernel.cu   # CUDA kernel implementations
│
├── tests/                         # Test suite
│   ├── __init__.py
│   ├── test_functional.py        # Tests for functional API
│   ├── test_layers.py            # Tests for layer modules
│   └── test_quantization.py      # Tests for quantization and packing
│
└── examples/                      # Usage examples
    ├── basic_usage.py            # Simple usage demonstration
    └── transformer_example.py    # Transformer integration example

File Descriptions

Root Level

  • README.md: Project overview, installation instructions, quick start guide, and citations
  • LICENSE: MIT License for open-source distribution
  • setup.py: Build configuration using PyTorch's cpp_extension, handles CPU/CUDA builds
  • pyproject.toml: Configuration for pytest, black, mypy, and coverage
  • requirements.txt: Core runtime dependencies (torch, numpy)
  • requirements-dev.txt: Development tools (pytest, black, flake8, mypy)
  • .gitignore: Ignores Python cache, build artifacts, CUDA objects
  • IMPLEMENTATION_GUIDE.md: Detailed implementation roadmap with phases and best practices

bitlinear/ (Main Package)

Python Modules

  • __init__.py: Package initialization, exports main classes and functions

  • layers.py: nn.Module implementations

    • BitLinear: Drop-in replacement for nn.Linear with ternary weights
    • MultiTernaryLinear: Sum of k ternary components
    • convert_linear_to_bitlinear(): Recursive model conversion utility
  • functional.py: Core functional implementations

    • bitlinear_python(): Pure PyTorch ternary matmul with scaling
    • greedy_ternary_decomposition(): Iterative residual quantization
    • multi_ternary_linear_python(): Multi-component forward pass
    • activation_quant(): Activation quantization for full BitNet
  • quantization.py: Quantization utilities

    • absmax_scale(): Compute absmax scaling factors
    • ternary_quantize(): Quantize to {-1, 0, +1}
    • weight_to_ternary(): Full quantization pipeline
    • quantize_activations_absmax(): 8-bit activation quantization
    • dequantize_scale(): Reverse the scaling step to recover approximate real-valued weights
  • packing.py: Memory optimization

    • pack_ternary_base3(): Pack 5 ternary values per byte
    • unpack_ternary_base3(): Unpack base-3 encoded weights
    • compute_compression_ratio(): Calculate compression statistics
    • estimate_memory_savings(): Memory estimation utilities
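The ternary quantization and greedy decomposition ideas above can be sketched in a few lines. This is a NumPy illustration only: the package operates on torch tensors, and the actual thresholding rule and function signatures in quantization.py and functional.py may differ (the 0.75 threshold factor here is a hypothetical choice).

```python
import numpy as np

def ternary_quantize(w, threshold_factor=0.75):
    """Quantize a weight matrix to {-1, 0, +1} with a per-tensor absmax-style scale.

    Entries smaller than threshold_factor * mean(|w|) are zeroed; the rest
    keep their sign. threshold_factor is an illustrative knob, not the
    package's actual default.
    """
    scale = np.abs(w).mean()                 # scaling factor from mean magnitude
    t = threshold_factor * scale
    q = np.sign(w) * (np.abs(w) > t)         # ternary codes in {-1, 0, +1}
    return q.astype(np.int8), scale

def greedy_ternary_decomposition(w, k=3):
    """Approximate w as a sum of k scaled ternary components by repeatedly
    quantizing the residual (iterative residual quantization)."""
    residual = w.astype(np.float64).copy()
    components = []
    for _ in range(k):
        q, scale = ternary_quantize(residual)
        components.append((q, scale))
        residual = residual - scale * q      # shrink what is left to explain
    return components

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
comps = greedy_ternary_decomposition(w, k=3)
approx = sum(s * q for q, s in comps)
err = np.linalg.norm(w - approx) / np.linalg.norm(w)   # relative reconstruction error
```

Each added component quantizes the residual left by the previous ones, so reconstruction error shrinks as k grows, at the cost of k ternary matmuls per forward pass.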

C++/CUDA Extensions

  • cpp/bitlinear.cpp: C++ interface

    • PyBind11 module definition
    • CPU implementations: bitlinear_cpu_forward(), multi_ternary_cpu_forward()
    • Device dispatcher (routes to CPU or CUDA)
    • Packing utilities in C++
  • cpp/bitlinear_kernel.cu: CUDA kernels

    • bitlinear_forward_kernel(): Optimized ternary matmul kernel
    • multi_ternary_forward_kernel(): Fused multi-component kernel
    • Kernel launchers with error handling
    • TODO: Tensor Core optimization

tests/

Comprehensive test suite using pytest:

  • test_functional.py: Tests for functional API

    • Shape correctness
    • Numerical correctness vs. nn.Linear
    • Greedy decomposition quality
    • Multi-ternary equivalence
  • test_layers.py: Tests for layer modules

    • Initialization and parameter counts
    • Forward pass shapes
    • Compatibility with nn.Linear
    • Conversion utilities
    • Gradient flow (QAT)
    • Integration with Transformer blocks
  • test_quantization.py: Tests for quantization

    • Absmax scaling (global and per-channel)
    • Ternary quantization values and thresholds
    • Reconstruction quality
    • Base-3 packing roundtrip
    • Compression ratios
    • Memory estimation
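A base-3 packing roundtrip test might look like the following sketch. The pack/unpack helpers here are simplified stand-ins for illustration, not the actual packing.py implementations, whose signatures may differ:

```python
import numpy as np

def pack_ternary_base3(q):
    """Pack a flat int8 array of {-1, 0, +1} values, 5 per byte (3**5 = 243 <= 256)."""
    digits = q.astype(np.int64) + 1                        # map {-1,0,+1} -> {0,1,2}
    pad = (-len(digits)) % 5                               # zero-pad to a multiple of 5
    digits = np.concatenate([digits, np.zeros(pad, dtype=np.int64)])
    groups = digits.reshape(-1, 5)
    powers = 3 ** np.arange(5)                             # base-3 place values
    packed = (groups * powers).sum(axis=1).astype(np.uint8)
    return packed, len(q)                                  # keep original length for unpacking

def unpack_ternary_base3(packed, n):
    """Recover the first n ternary values from base-3 encoded bytes."""
    vals = packed.astype(np.int64)
    digits = np.empty((len(vals), 5), dtype=np.int64)
    for i in range(5):
        digits[:, i] = vals % 3                            # peel off base-3 digits
        vals = vals // 3
    return (digits.reshape(-1)[:n] - 1).astype(np.int8)    # map back to {-1,0,+1}

def test_base3_roundtrip():
    rng = np.random.default_rng(42)
    q = rng.integers(-1, 2, size=1234).astype(np.int8)
    packed, n = pack_ternary_base3(q)
    assert np.array_equal(unpack_ternary_base3(packed, n), q)
    assert len(packed) == -(-n // 5)                       # ceil(n / 5) bytes used
```

At 5 values per byte versus 4 bytes per float32 weight, the packed form is roughly 20x smaller, which is the kind of ratio compute_compression_ratio() is meant to report.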

examples/

Demonstration scripts:

  • basic_usage.py: Minimal example showing basic API

    • Creating BitLinear layers
    • Forward pass
    • Conversion from nn.Linear
  • transformer_example.py: Realistic Transformer example

    • Complete Transformer block implementation
    • Conversion to BitLinear
    • Output comparison
    • Memory savings calculation
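The core computation these examples demonstrate, quantize once, run a plain matmul, rescale, can be sketched as follows. This is a NumPy stand-in assuming an absmax-style per-tensor scale; the real bitlinear_python() works on torch tensors and its details may differ:

```python
import numpy as np

def bitlinear_forward(x, w):
    """Ternary linear forward pass: quantize w to {-1, 0, +1}, matmul, rescale.

    The 0.75 threshold factor is an illustrative choice, not the package default.
    """
    scale = np.abs(w).mean()                         # per-tensor absmax-style scale
    q = np.sign(w) * (np.abs(w) > 0.75 * scale)      # ternary weights {-1, 0, +1}
    return (x @ q.T) * scale                          # additions-only matmul, one rescale

rng = np.random.default_rng(1)
x = rng.normal(size=(8, 128)).astype(np.float32)     # (batch, in_features)
w = rng.normal(size=(64, 128)).astype(np.float32)    # (out_features, in_features), as nn.Linear

y_ternary = bitlinear_forward(x, w)                  # approximates the dense output
y_dense = x @ w.T                                    # full-precision reference
```

Because q contains only {-1, 0, +1}, the matmul reduces to additions and subtractions, which is what the C++/CUDA backends exploit; the scale restores the output magnitude.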

Key Design Patterns

1. Progressive Enhancement

  • Python baseline β†’ C++ CPU β†’ CUDA GPU
  • Each layer fully functional before adding next

2. Drop-in Compatibility

  • Same interface as nn.Linear
  • Same initialization arguments
  • Same forward signature
  • Works with existing PyTorch features

3. Modular Testing

  • Unit tests for each component
  • Integration tests for full pipelines
  • Performance benchmarks separate

4. Extensive Documentation

  • Docstrings explain mathematical operations
  • TODO comments mark implementation points
  • References to papers for algorithms
  • Type hints for clarity

Build Targets

CPU-only (Development)

pip install -e .

With CUDA (Production)

CUDA_HOME=/usr/local/cuda pip install -e .

Testing

pip install -e ".[dev]"
pytest tests/ -v

What's NOT Implemented Yet

All files are stubs with TODOs:

  • ✅ Structure is complete
  • ✅ Interfaces are defined
  • ✅ Documentation is written
  • ❌ Logic is NOT implemented (by design)
  • ❌ Tests will skip/fail until implementation

Next Steps

Follow IMPLEMENTATION_GUIDE.md:

  1. Start with quantization.py (absmax_scale, ternary_quantize)
  2. Move to functional.py (bitlinear_python)
  3. Implement layers.py (BitLinear module)
  4. Test with examples
  5. Add C++/CUDA if needed

Design Philosophy

Correctness > Speed > Memory

  1. First make it work (Python)
  2. Then make it fast (C++/CUDA)
  3. Then make it efficient (packing)

Every component is:

  • Well-documented
  • Testable
  • Modular
  • Extensible