Do you know anything about this error?

by RedXeol - opened May 28, 2023

Discussion

RedXeol

May 28, 2023 • edited May 28, 2023

Hello, sorry to bother you. Do you know anything about a bug that now happens in oobabooga after updating it with bitsandbytes 0.39.0? I can no longer use my RTX 3090 GPU. I thought it was the Windows installer, so I downloaded it again and did a clean install, but the error still persists.

bin A:\LLMs_LOCAL\oobabooga_windows\installer_files\env\lib\site-packages\bitsandbytes\libbitsandbytes_cpu.dll
R:\LLMs_LOCAL\oobabooga_windows\installer_files\env\lib\site-packages\bitsandbytes\cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiply, and GPU quantization are not available.
warn("The installed version of bitsandbytes was compiled without GPU support. "
function 'cadam32bit_grad_fp32' not found
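
The first line of that log shows bitsandbytes loading `libbitsandbytes_cpu.dll`, the CPU-only binary. For anyone debugging the same thing, here is a minimal sketch that lists which native binaries are present in the installed package. It assumes the binaries sit in the package directory and follow the `libbitsandbytes_*` naming seen in the log; the helper names `split_native_libs` and `find_native_libs` are made up for illustration and are not part of bitsandbytes.

```python
import importlib.util
from pathlib import Path


def split_native_libs(filenames):
    """Split bitsandbytes binary names into CPU-only and other builds."""
    cpu, other = [], []
    for name in filenames:
        stem = Path(name).stem  # e.g. "libbitsandbytes_cpu"
        if not stem.startswith("libbitsandbytes"):
            continue  # ignore unrelated files
        (cpu if stem.endswith("_cpu") else other).append(name)
    return sorted(cpu), sorted(other)


def find_native_libs(package="bitsandbytes"):
    """List native binaries in the installed package without importing it."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # package is not installed in this environment
    names = []
    for location in spec.submodule_search_locations:
        names += [p.name for p in Path(location).glob("libbitsandbytes*")]
    return split_native_libs(names)


if __name__ == "__main__":
    result = find_native_libs()
    if result is None:
        print("bitsandbytes is not installed in this environment")
    else:
        cpu, other = result
        print("CPU-only binaries:", cpu)
        # Assumption: anything that is not the "_cpu" build is a GPU build.
        print("Other binaries (assumed GPU builds):", other)
```

Run it with the Python interpreter from the webui's own environment (the `installer_files\env` folder in the paths above). If only a `_cpu` binary shows up, the installed wheel probably has no GPU build for that platform.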

RedXeol changed discussion title from help!!! to Do you know anything about this error? May 28, 2023

TheBloke

Owner May 28, 2023

Hmm, sorry, I'm not quite sure why you're getting that. The text-generation-webui installation should take care of that.

Maybe ask on the text-generation-webui Reddit page or Discord? I'm afraid I've never seen that error before.

RedXeol

May 28, 2023 • edited May 28, 2023

Thank you very much for your answer. I found the problem: it is an incompatibility with the xformers library, which uses PyTorch 2.0.1 and is not yet compatible with the UI. The solution is to not install it.
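
Since the cause turned out to be a version mismatch between packages, a quick report of what is actually installed can save some guessing. The sketch below uses only the standard library's `importlib.metadata` plus an optional `torch.cuda.is_available()` check. The package list is an assumption based on the libraries named in this thread, and the helper names are hypothetical.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def version_report(distributions=("torch", "xformers", "bitsandbytes")):
    """Map each distribution name to its installed version (or None)."""
    return {name: installed_version(name) for name in distributions}


def cuda_available():
    """Return torch.cuda.is_available(), or None if torch cannot be imported."""
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, version in version_report().items():
        print(f"{name}: {version or 'not installed'}")
    print("CUDA visible to PyTorch:", cuda_available())
```

As with any check like this, it only reflects the environment it runs in, so it needs to be launched from the same Python environment the webui uses.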

TheBloke

Owner May 28, 2023

Ahh thanks for letting me know!

texturalnewbie

May 29, 2023

It's working perfectly fine for me on Win 11, RTX 4090 and bitsandbytes 0.39.0. @RedXeol: support for Falcon was just merged into the main branch. Feel free to git pull ;)

dougtaylor

Jun 1, 2023

Red, try this tutorial: https://agi-sphere.com/text-generation-webui-windows
It is highly specific to Windows and text-generation-webui. I nearly pulled my hair out trying to get past it. I got that error on every try until I discovered the pre-built Windows wheels for bitsandbytes.

Good luck!
