SOTA ternary-packed versions of 1.58-bit LLMs for efficient on-device inference with vlut.cpp.
Xiangyu Li
XXXXyu
AI & ML interests
On-device and physical AI
Recent Activity
authored a paper 1 day ago
OxyGen: Unified KV Cache Management for Vision-Language-Action Models under Multi-Task Parallelism
commented on a paper 1 day ago
Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices
Organizations
None yet