view article Article No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL +4 Jun 3 • 96
sarvam-m Collection Collection of all variations of the sarvam-m model • 3 items • Updated May 24 • 19
view article Article A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes Aug 17, 2022 • 121
view article Article Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA +3 May 24, 2023 • 171
view article Article Llama 3.1 - 405B, 70B & 8B with multilinguality and long context +6 Jul 23, 2024 • 241