ExLlama
About
ExLlama is an extremely optimized GPTQ backend ("loader") for LLaMA models. Because it does not rely on the unoptimized transformers code path, it uses much less VRAM and runs much faster.
Installation:
- Clone the ExLlama repository into your text-generation-webui/repositories folder:
mkdir repositories
cd repositories
git clone https://github.com/turboderp/exllama
- Follow the remaining setup instructions in the official README: https://github.com/turboderp/exllama#exllama
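The clone step can also be wrapped in a small shell function that is safe to re-run. This is a sketch, not part of the official instructions: the function name install_exllama is hypothetical. It uses mkdir -p so an existing repositories folder is not an error, and it skips the clone if an exllama checkout is already present. It covers only the clone. The remaining setup from the ExLlama README still applies.

```shell
#!/bin/sh
# Hypothetical helper: re-runnable version of the clone step above.
# Assumption: $1 is the path to your text-generation-webui checkout.
install_exllama() {
  local webui_dir="$1"
  # -p: do not fail if repositories/ already exists (plain mkdir would).
  mkdir -p "$webui_dir/repositories" || return 1
  # Skip the clone when a checkout is already in place.
  if [ -d "$webui_dir/repositories/exllama" ]; then
    echo "exllama already present in $webui_dir/repositories"
    return 0
  fi
  # Same clone command as in the installation steps.
  (cd "$webui_dir/repositories" && git clone https://github.com/turboderp/exllama)
}
```

For example, run install_exllama ~/text-generation-webui, adjusting the path to wherever your copy of the web UI lives.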
Configure text-generation-webui to use ExLlama, either in the UI or on the command line:
- In the "Model" tab, set "Loader" to "exllama"
- Or specify --loader exllama on the command line
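For the command-line route, a launch might look like the following sketch. Only --loader exllama comes from the instructions above. The rest is an assumption: that the web UI is started with python server.py, that it takes the model folder via --model, and the hypothetical launch_with_exllama helper and its DRY_RUN switch. The helper also refuses to start if the ExLlama checkout from the installation step is missing.

```shell
#!/bin/sh
# Hypothetical helper: start text-generation-webui with the ExLlama loader.
# Assumptions: the entry point is server.py and it accepts --model <name>;
# setting DRY_RUN=1 only prints the command instead of running it.
launch_with_exllama() {
  local webui_dir="$1" model_name="$2"
  # Fail early if the ExLlama clone is not where the loader expects it.
  if [ ! -d "$webui_dir/repositories/exllama" ]; then
    echo "error: $webui_dir/repositories/exllama not found" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Show what would be run.
    echo "python server.py --model $model_name --loader exllama"
  else
    # Run from the web UI folder so relative paths resolve.
    (cd "$webui_dir" && python server.py --model "$model_name" --loader exllama)
  fi
}
```

Checking the install before launching turns a missing clone into an immediate, readable error instead of a failure deep inside model loading.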