---
license: bsd-3-clause
---

# llama-65b-4bit

This model works with my branch of GPTQ-for-LLaMa: https://github.com/catid/GPTQ-for-LLaMa-65B-2GPU

To test it out on two RTX 4090 GPUs with 64 GB of RAM (it might also work with less RAM and a big swap file, but I haven't tested that):

```bash
# Install git and git-lfs
sudo apt install git git-lfs

# Clone the code
git clone https://github.com/catid/GPTQ-for-LLaMa-65B-2GPU
cd GPTQ-for-LLaMa-65B-2GPU

# Clone the model weights
git lfs install
git clone https://huggingface.co/catid/llama-65b-4bit

# Set up a conda environment
conda create -n gptq python=3.10
conda activate gptq

# Install script dependencies
pip install -r requirements.txt

# Work around a protobuf error
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python

# Run a test generation
python llama_inference.py llama-65b-4bit \
    --load llama-65b-4bit/llama65b-4bit-128g.safetensors \
    --groupsize 128 --wbits 4 \
    --text "I woke up with a dent in my forehead. " \
    --max_length 128 --min_length 32
```
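If you want to drive the model from your own Python code instead of the CLI, the sketch below shows the general shape of what `llama_inference.py` does. It is a minimal sketch, not this branch's exact API: it assumes the branch keeps upstream GPTQ-for-LLaMa's `load_quant(model, checkpoint, wbits, groupsize)` helper, so check `llama_inference.py` in the repo for the real entry points before relying on it.

```python
# Minimal sketch of programmatic inference. Assumption: the branch keeps
# upstream GPTQ-for-LLaMa's load_quant helper and signature -- verify
# against llama_inference.py in this repo before relying on it.
import torch
from transformers import AutoTokenizer
from llama import load_quant  # assumption: helper lives in llama.py as upstream

MODEL_DIR = "llama-65b-4bit"
CHECKPOINT = "llama-65b-4bit/llama65b-4bit-128g.safetensors"

# Load the 4-bit quantized weights; wbits=4 and groupsize=128 must match
# how this checkpoint was quantized.
model = load_quant(MODEL_DIR, CHECKPOINT, 4, 128)
model.eval()
model.to("cuda:0")  # this branch splits layers across two GPUs; placement may differ

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
input_ids = tokenizer.encode(
    "I woke up with a dent in my forehead. ", return_tensors="pt"
).to("cuda:0")

with torch.no_grad():
    output = model.generate(
        input_ids,
        do_sample=True,
        min_length=32,
        max_length=128,
        temperature=0.8,
        top_p=0.95,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

If `load_quant` in this branch already spreads the layers across both GPUs, the explicit `.to("cuda:0")` call may be unnecessary or wrong; treat device placement as branch-specific and follow whatever `llama_inference.py` actually does.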