How to use from llama.cpp

Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf vdpappu/gemma2_coding_assistant_gguf

# Run inference directly in the terminal:
llama-cli -hf vdpappu/gemma2_coding_assistant_gguf

Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf vdpappu/gemma2_coding_assistant_gguf

# Run inference directly in the terminal:
./llama-cli -hf vdpappu/gemma2_coding_assistant_gguf

Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf vdpappu/gemma2_coding_assistant_gguf

# Run inference directly in the terminal:
./build/bin/llama-cli -hf vdpappu/gemma2_coding_assistant_gguf

Use Docker
docker model run hf.co/vdpappu/gemma2_coding_assistant_gguf
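Each server variant above exposes an OpenAI-compatible HTTP API. A minimal request sketch, assuming llama-server's default port 8080 and the standard /v1/chat/completions endpoint (adjust host and port if you started the server with different flags):

```shell
# Query the locally running llama-server via its OpenAI-compatible endpoint
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Write a recursive factorial function in Python."}
        ],
        "max_tokens": 200
      }'
```

The response follows the OpenAI chat-completions JSON shape, so existing OpenAI client libraries can be pointed at the local base URL.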
Usage
from llama_cpp import Llama
from typing import Optional
import time
from huggingface_hub import hf_hub_download
def generate_prompt(input_text: str, instruction: Optional[str] = None) -> str:
    text = f"### Question: {input_text}\n\n### Answer: "
    if instruction:
        text = f"### Instruction: {instruction}\n\n{text}"
    return text
# Set up the parameters
repo_id = "vdpappu/gemma2_coding_assistant_gguf"
filename = "gemma2_coding.gguf"
local_dir = "."
downloaded_file_path = hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)
print(f"File downloaded to: {downloaded_file_path}")
# Load the model
llm = Llama(model_path=downloaded_file_path)  # load the GGUF model
question = "Develop a Python program to clearly understand the concept of recursion."
prompt = generate_prompt(input_text=question)
start = time.time()
output = llm(prompt,
             temperature=0.7,
             top_p=0.9,
             top_k=50,
             repeat_penalty=1.5,
             max_tokens=200,
             stop=["Question:", "<eos>"])
end = time.time()
print(f"Inference time: {end-start:.2f} seconds \n")
print(output['choices'][0]['text'])
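For reference, generate_prompt also accepts an optional instruction that is prepended to the question block. A small standalone sketch of the resulting prompt format (the helper is repeated here so the snippet runs on its own):

```python
from typing import Optional

def generate_prompt(input_text: str, instruction: Optional[str] = None) -> str:
    # Same helper as in the script above: wrap the question in the Q/A template
    text = f"### Question: {input_text}\n\n### Answer: "
    if instruction:
        # An optional instruction is prepended before the question block
        text = f"### Instruction: {instruction}\n\n{text}"
    return text

plain = generate_prompt("Reverse a string in Python.")
guided = generate_prompt("Reverse a string in Python.",
                         instruction="Answer with code only.")
print(plain)
print(guided)
```

The stop sequence "Question:" passed to the model keeps generation from running on into a new self-posed question in this template.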
Install from brew
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf vdpappu/gemma2_coding_assistant_gguf

# Run inference directly in the terminal:
llama-cli -hf vdpappu/gemma2_coding_assistant_gguf