AnonySub628/physics-scienceqa
Viewer • Updated • 810 • 15
How to use vdpappu/gemma2_science_qa_gguf with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="vdpappu/gemma2_science_qa_gguf", filename="gemma2_scienceqa.gguf", )
output = llm( "Once upon a time,", max_tokens=512, echo=True ) print(output)
How to use vdpappu/gemma2_science_qa_gguf with llama.cpp:
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf vdpappu/gemma2_science_qa_gguf # Run inference directly in the terminal: llama-cli -hf vdpappu/gemma2_science_qa_gguf
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf vdpappu/gemma2_science_qa_gguf # Run inference directly in the terminal: llama-cli -hf vdpappu/gemma2_science_qa_gguf
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf vdpappu/gemma2_science_qa_gguf # Run inference directly in the terminal: ./llama-cli -hf vdpappu/gemma2_science_qa_gguf
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf vdpappu/gemma2_science_qa_gguf # Run inference directly in the terminal: ./build/bin/llama-cli -hf vdpappu/gemma2_science_qa_gguf
docker model run hf.co/vdpappu/gemma2_science_qa_gguf
How to use vdpappu/gemma2_science_qa_gguf with Ollama:
ollama run hf.co/vdpappu/gemma2_science_qa_gguf
How to use vdpappu/gemma2_science_qa_gguf with Unsloth Studio:
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for vdpappu/gemma2_science_qa_gguf to start chatting
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for vdpappu/gemma2_science_qa_gguf to start chatting
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for vdpappu/gemma2_science_qa_gguf to start chatting
How to use vdpappu/gemma2_science_qa_gguf with Docker Model Runner:
docker model run hf.co/vdpappu/gemma2_science_qa_gguf
How to use vdpappu/gemma2_science_qa_gguf with Lemonade:
# Download Lemonade from https://lemonade-server.ai/ lemonade pull vdpappu/gemma2_science_qa_gguf
lemonade run user.gemma2_science_qa_gguf-{{QUANT_TAG}}lemonade list
Usage
from llama_cpp import Llama
from typing import Optional
import time
from huggingface_hub import hf_hub_download
def generate_prompt(input_text: str, instruction: Optional[str] = None) -> str:
text = f"### Question: {input_text}\n\n### Answer: "
if instruction:
text = f"### Instruction: {instruction}\n\n{text}"
return text
# Set up the parameters
repo_id = "vdpappu/gemma2_science_qa_gguf"
filename = "gemma2_scienceqa.gguf"
local_dir = "."
downloaded_file_path = hf_hub_download(repo_id=repo_id, filename=filename, local_dir=local_dir)
print(f"File downloaded to: {downloaded_file_path}")
# Load the model
llm = Llama(model_path=downloaded_file_path) #1 is thug
question = "Which is the smoothest? Choose from: concrete sidewalk, sandpaper, paper."
prompt = generate_prompt(input_text=question)
start = time.time()
output = llm(prompt,
temperature=0.7,
top_p=0.9,
top_k=50,
repeat_penalty=1.5,
max_tokens=200,
stop=["Question:","<eos>"])
end = time.time()
print(f"Inference time: {end-start:.2f} seconds \n")
print(output['choices'][0]['text'])
We're not able to determine the quantization variants.