# llama-65b-4bit
This model works with my fork of GPTQ-for-LLaMa: https://github.com/catid/GPTQ-for-LLaMa-65B-2GPU
To test it out on two RTX 4090 GPUs and 64 GB of RAM (it might also work with a large swap file; I haven't tested that):
```bash
# Install git and git-lfs
sudo apt install git git-lfs

# Clone the code
git clone https://github.com/catid/GPTQ-for-LLaMa-65B-2GPU
cd GPTQ-for-LLaMa-65B-2GPU

# Clone the model weights
git lfs install
git clone https://huggingface.co/catid/llama-65b-4bit

# Set up a conda environment
conda create -n gptq python=3.10
conda activate gptq

# Install script dependencies
pip install -r requirements.txt

# Work around a protobuf error
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python

# Run a test
python llama_inference.py llama-65b-4bit --load llama-65b-4bit/llama65b-4bit-128g.safetensors --groupsize 128 --wbits 4 --text "I woke up with a dent in my forehead. " --max_length 128 --min_length 32
```
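As a rough sanity check on why this needs two 24 GB GPUs rather than one, here is a back-of-the-envelope VRAM estimate for the quantized weights. The parameter count and the `--wbits 4 --groupsize 128` values come from the command above; the per-group overhead sizes (an fp16 scale plus a packed 4-bit zero-point per group) are assumptions about the quantization format, not measurements.

```python
# Back-of-the-envelope estimate of quantized weight size for LLaMA-65B.
# Flag values taken from the inference command above; per-group overhead
# (fp16 scale + 4-bit zero-point) is an assumption about the GPTQ format.

PARAMS = 65e9      # LLaMA-65B parameter count
WBITS = 4          # --wbits 4
GROUPSIZE = 128    # --groupsize 128

# Packed 4-bit weights: half a byte per parameter.
weight_bytes = PARAMS * WBITS / 8

# Assumed per-group metadata: 2-byte fp16 scale + 0.5-byte zero-point
# for every group of 128 weights.
overhead_bytes = PARAMS / GROUPSIZE * (2 + 0.5)

total_gb = (weight_bytes + overhead_bytes) / 1e9
print(f"~{total_gb:.1f} GB of quantized weights")
```

Roughly 34 GB of weights alone exceeds a single RTX 4090's 24 GB, but fits comfortably across two cards with headroom left for activations and the KV cache.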
---
license: bsd-3-clause
---