HPC-Coder-v2 Quantizations
Collection
4 items • Updated
How to use hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF with Transformers:
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF", dtype="auto")How to use hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF", filename="hpc-coder-v2-6.7b-q8_0.gguf", )
output = llm( "Once upon a time,", max_tokens=512, echo=True ) print(output)
How to use hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF with llama.cpp:
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF:Q8_0 # Run inference directly in the terminal: llama-cli -hf hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF:Q8_0
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF:Q8_0 # Run inference directly in the terminal: llama-cli -hf hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF:Q8_0
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF:Q8_0 # Run inference directly in the terminal: ./llama-cli -hf hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF:Q8_0
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF:Q8_0 # Run inference directly in the terminal: ./build/bin/llama-cli -hf hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF:Q8_0
docker model run hf.co/hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF:Q8_0
How to use hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF with Ollama:
ollama run hf.co/hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF:Q8_0
How to use hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF with Unsloth Studio:
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF to start chatting
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF to start chatting
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF to start chatting
How to use hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF with Docker Model Runner:
docker model run hf.co/hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF:Q8_0
How to use hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF with Lemonade:
# Download Lemonade from https://lemonade-server.ai/ lemonade pull hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF:Q8_0
lemonade run user.hpc-coder-v2-6.7b-Q8_0-GGUF-Q8_0
lemonade list
output = llm(
"Once upon a time,",
max_tokens=512,
echo=True
)
print(output)This is the HPC-Coder-v2-6.7b model with 8 bit quantized weights in the GGUF format that can be used with llama.cpp. Refer to the original model card for more details on the model.
See the llama.cpp repo for installation instructions. You can then use the model as:
llama-cli --hf-repo hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF --hf-file hpc-coder-v2-6.7b-q8_0.gguf -r "Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:" --in-prefix "\n" --in-suffix "\n### Response:\n" -c 8096 -p "your prompt here"
8-bit
Base model
hpcgroup/hpc-coder-v2-6.7b
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="hpcgroup/hpc-coder-v2-6.7b-Q8_0-GGUF", filename="hpc-coder-v2-6.7b-q8_0.gguf", )