shenzhi-wang/Llama3-8B-Chinese-Chat-GGUF-8bit • Text Generation • 8B • Updated Jul 4, 2024 • 1.25k • 167
LLM in a flash: Efficient Large Language Model Inference with Limited Memory • Paper • 2312.11514 • Published Dec 12, 2023 • 264