Not loading on Text Gen Web UI
I have tried the Q3 and Q4 models. They fail to load with llama.cpp in the text gen web ui. I am on Linux.
I installed the text gen web UI yesterday, so it should be up to date. Other models work fine.
This is the error from the phi-2.Q3_K_M.gguf model:
The error on the command line where server.py is running says:
2023-12-19 15:38:43 ERROR:Failed to load the model.
Traceback (most recent call last):
File "/home/somename/text-generation-webui/modules/ui_model_menu.py", line 210, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xxx/text-generation-webui/modules/models.py", line 89, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/somename/text-generation-webui/modules/models.py", line 259, in llamacpp_loader
model, tokenizer = LlamaCppModel.from_pretrained(model_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/somename/text-generation-webui/modules/llamacpp_model.py", line 91, in from_pretrained
result.model = Llama(**params)
^^^^^^^^^^^^^^^
File "/home/somename/miniconda3/envs/textgen/lib/python3.11/site-packages/llama_cpp_cuda/llama.py", line 957, in __init__
self._n_vocab = self.n_vocab()
^^^^^^^^^^^^^^
File "/home/somename/miniconda3/envs/textgen/lib/python3.11/site-packages/llama_cpp_cuda/llama.py", line 2264, in n_vocab
return self._model.n_vocab()
^^^^^^^^^^^^^^^^^^^^^
File "/home/somename/miniconda3/envs/textgen/lib/python3.11/site-packages/llama_cpp_cuda/llama.py", line 252, in n_vocab
assert self.model is not None
^^^^^^^^^^^^^^^^^^^^^^
AssertionError
Exception ignored in: <function LlamaCppModel.__del__ at 0x7fd3c3130180>
Traceback (most recent call last):
File "/home/somename/text-generation-webui/modules/llamacpp_model.py", line 49, in __del__
del self.model
^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'
========================
The error in the web console says:
Traceback (most recent call last):
File "/home/somename/text-generation-webui/modules/ui_model_menu.py", line 210, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/somename/text-generation-webui/modules/models.py", line 89, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/somename/text-generation-webui/modules/models.py", line 259, in llamacpp_loader
model, tokenizer = LlamaCppModel.from_pretrained(model_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/somename/text-generation-webui/modules/llamacpp_model.py", line 91, in from_pretrained
result.model = Llama(**params)
^^^^^^^^^^^^^^^
File "/home/somename/miniconda3/envs/textgen/lib/python3.11/site-packages/llama_cpp_cuda/llama.py", line 957, in __init__
self._n_vocab = self.n_vocab()
^^^^^^^^^^^^^^
File "/home/somename/miniconda3/envs/textgen/lib/python3.11/site-packages/llama_cpp_cuda/llama.py", line 2264, in n_vocab
return self._model.n_vocab()
^^^^^^^^^^^^^^^^^^^^^
File "/home/somename/miniconda3/envs/textgen/lib/python3.11/site-packages/llama_cpp_cuda/llama.py", line 252, in n_vocab
assert self.model is not None
^^^^^^^^^^^^^^^^^^^^^^
AssertionError
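The assertion in n_vocab appears to mean that llama.cpp returned no model handle at all. That could be because the file is incomplete or not a valid GGUF, or because the bundled llama.cpp build does not recognise the model. To rule out the first cause, here is a minimal sketch that reads the fixed-size header of a GGUF file. It assumes a little-endian file of GGUF version 2 or later (4-byte magic, uint32 version, then two uint64 counts). The function name read_gguf_header is my own and not part of any library.

```python
import struct
import sys

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4-byte magic + uint32 version + two uint64 counts


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF header.

    Assumes a little-endian file of GGUF version 2 or later.
    Raises ValueError if the file is too short or lacks the GGUF magic.
    """
    with open(path, "rb") as f:
        header = f.read(HEADER_SIZE)
    if len(header) < HEADER_SIZE:
        raise ValueError("file is too short to contain a GGUF header")
    if header[:4] != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic={header[:4]!r})")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:])
    return version, tensor_count, kv_count


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_gguf.py phi-2.Q3_K_M.gguf
    print(read_gguf_header(sys.argv[1]))
```

If this prints a sensible version and non-zero counts, the header is intact. That does not prove the whole file is good, but it makes an unsupported model in the installed llama.cpp build the more likely explanation.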
Same
Are there any relevant merge requests or issues?