# Lite LLM Lite LLM is a deterministic, tiered-parameter, hierarchical sparse expert (HSER) language model runtime designed to scale from **1B → 1T parameters** and beyond (up to quadrillion-scale parameter universes) while keeping **active compute bounded** per token. The Github organization hosts the **specification corpus**, **reference implementations**, and **operational tooling** for building and deploying Lite LLM as an enterprise / reference-grade system. Model optimization for the LiteCore Coherent Silicon Photonic Complex Multiply-Accumulate (CSP-cMAC) Unit Cell Hardware focuses on maximizing inference efficiency under tight memory and power constraints by combining compression, quantization, and memory aware execution. LiteCore is a fundamental photonic compute primitive purpose-built for large language model (LLM) inference at quadrillion-parameter scales. LiteCore leverages silicon-on-insulator (SOI) photonics to perform complex-valued multiply-accumulate operations at <1 fJ energy and 1–10 ps latency—representing 500–2,000× energy and 1,000–10,000× latency improvements over state-of-the-art electronic GPUs. --- ## What makes Lite LLM different ### Deterministic by design Lite LLM treats determinism as a first-class requirement: - Stable top‑k routing with seeded tie‑breaking - Deterministic collectives and reproducible distributed execution - Deterministic audit logs and replayable training runs ### Tiered Parameter Architecture (TPA) Parameters are partitioned across storage tiers: - **Hot** (HBM / GPU) - **Warm** (DRAM) - **Cold** (NVMe) - **Archive** (Object Store) Only the TierSet for a request is eligible for routing; everything else has **zero activation probability**. ### Hierarchical Sparse Expert Routing (HSER) Routing is hierarchical: **Tier → Group → Expert** with bounded activation: `k_tier × k_group × k_expert` experts per token per layer. This enables extreme parameter scaling while keeping per-token compute predictable. ### Enterprise runtime focus Lite LLM is not only a model architecture—it is a runtime system: - Distributed execution protocols - Storage hierarchy and prefetching - Secure loading and integrity verification - Multi-tenant isolation, quotas, and compliance readiness --- ## Repositories ### Specifications (authoritative) - `lite-llm-specs` — Enterprise Runtime Engineering Specification Corpus (SPEC‑001…SPEC‑060) - `lite-llm-schemas` — JSON/YAML schemas for manifests, telemetry, policies - `lite-llm-rfcs` — Design proposals and evolution process (RFCs) ### Reference implementations - `lite-llm-runtime` — Rust runtime (routing, caches, dispatch, TierSet engine) - `lite-llm-train` — Training orchestration, checkpointing, determinism harness - `lite-llm-kernels` — Device kernels + safe wrappers (CUDA/HIP/Metal/CPU) - `lite-llm-comm` — Transport abstraction (RDMA / NCCL / QUIC), collectives - `lite-llm-storage` — Shards, manifests, tier placement, streaming + prefetch ### Tooling - `lite-llm-cli` — Operator CLI (inspect checkpoints, tier policies, telemetry) - `lite-llm-observability` — Metrics exporters, dashboards, tracing - `lite-llm-deploy` — Helm charts, Terraform modules, bare‑metal playbooks > The organization may not yet contain all repositories listed above; this is the intended long-term structure. --- ## Getting started ### 1) Read the specs Start with: - **SPEC‑001** Runtime Architecture Overview - **SPEC‑003** Deterministic Routing Engine - **SPEC‑004** Tiered Parameter Architecture (TPA) - **SPEC‑005** Hierarchical Sparse Expert Routing (HSER) - **SPEC‑006** Active Compute Bounding Model - **SPEC‑021…030** Storage hierarchy (hot/warm/cold/archive) - **SPEC‑041…050** Inference runtime (TierSet selection, dispatch, KV cache) ### 2) Implement the contracts The specs are written to be directly implementable: - Deterministic routing + stable sorting - Tier placement policies and shard formats - All‑to‑all dispatch and imbalance handling - Audit logging and integrity verification ### 3) Validate determinism Before performance optimization: - Ensure cross-node routing reproducibility - Validate deterministic collectives - Use the replay engine during training --- ## Contribution We welcome contributions in: - Spec clarifications and testable invariants - Rust runtime modules (memory model, routing, dispatch, caching) - Deterministic training harness and replay tooling - Storage tier orchestration and prefetch algorithms - Security hardening and audit improvements Please read: - `CONTRIBUTING.md` for workflow and standards - `CODE_OF_CONDUCT.md` for community expectations - `SECURITY.md` for vulnerability reporting --- ## Security Lite LLM emphasizes: - Memory-safe runtime design in Rust - Secure checkpoint loading and integrity verification - Encryption at rest for tier storage - Key management and auditability - Sandboxing and capability isolation for extensions See `SECURITY.md` to report vulnerabilities responsibly. --- ## Governance The specification corpus is the **normative authority**. Changes to the corpus should go through the RFC process: 1. Open an RFC in `lite-llm-rfcs` 2. Discuss and iterate 3. Land a spec patch with tests, invariants, and migration notes --- ## License Lite-LLM is distributed under the Dust Open Source License license: other license_name: dosl-iie-1.0 license_link: https://github.com/lite-llm/lite-llm/raw/refs/heads/main/LICENSE --- ## Contact - Security: see `SECURITY.md` - General: open an issue in the relevant repository ---