How to use DeepMount00/Minerva-3B-base-RAG with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="DeepMount00/Minerva-3B-base-RAG")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("DeepMount00/Minerva-3B-base-RAG")
model = AutoModelForCausalLM.from_pretrained("DeepMount00/Minerva-3B-base-RAG")

How to use DeepMount00/Minerva-3B-base-RAG with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "DeepMount00/Minerva-3B-base-RAG"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "DeepMount00/Minerva-3B-base-RAG",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
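Because the server exposes an OpenAI-compatible API, the same request can be made from Python using only the standard library. This is a minimal sketch assuming a vLLM server is already running on localhost:8000; the helper names are illustrative, not part of the model card:

```python
import json
from urllib import request

def build_completion_payload(prompt, model="DeepMount00/Minerva-3B-base-RAG",
                             max_tokens=512, temperature=0.5):
    # Mirrors the JSON body of the curl request above.
    return {"model": model, "prompt": prompt,
            "max_tokens": max_tokens, "temperature": temperature}

def complete(prompt, base_url="http://localhost:8000"):
    # POST to the OpenAI-compatible /v1/completions endpoint and
    # return the generated text from the first choice.
    data = json.dumps(build_completion_payload(prompt)).encode("utf-8")
    req = request.Request(f"{base_url}/v1/completions", data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```

The same client should also work against the SGLang server described below by pointing base_url at http://localhost:30000, since both servers expose the same OpenAI-compatible completions endpoint.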
How to use DeepMount00/Minerva-3B-base-RAG with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "DeepMount00/Minerva-3B-base-RAG" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "DeepMount00/Minerva-3B-base-RAG",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'

# Or run the SGLang server in Docker instead of installing via pip:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "DeepMount00/Minerva-3B-base-RAG" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "DeepMount00/Minerva-3B-base-RAG",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'

How to use DeepMount00/Minerva-3B-base-RAG with Docker Model Runner:
docker model run hf.co/DeepMount00/Minerva-3B-base-RAG
Minerva-3B-base-RAG is a specialized question-answering (QA) model obtained by finetuning Minerva-3B-base-v1.0. The finetuning was conducted independently to improve the model's performance on QA tasks, making it well suited for Retrieval-Augmented Generation (RAG) applications.
import transformers
import torch

model_id = "DeepMount00/Minerva-3B-base-RAG"

# Initialize the text-generation pipeline.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

def generate_text(pipeline, context, question):
    # Build the [INST] prompt format the model was finetuned on.
    input_text = f"[INST]Contesto: {context}\nDomanda:{question}\n[/INST]"
    output = pipeline(
        input_text,
        max_new_tokens=512,
    )
    generated_text = output[0]['generated_text']
    # Keep only the answer that follows the [/INST] marker
    # and drop the end-of-text token.
    response_text = generated_text.split("[/INST]", 1)[1].strip()
    return response_text.replace("<end_of_text>", "")
contesto = """La torre degli Asinelli è una delle cosiddette due torri di Bologna, simbolo della città, situate in piazza di porta Ravegnana, all'incrocio tra le antiche strade San Donato (ora via Zamboni), San Vitale, Maggiore e Castiglione. Eretta, secondo la tradizione, fra il 1109 e il 1119 dal nobile Gherardo Asinelli, la torre è alta 97,20 metri, pende verso ovest per 2,23 metri e presenta all'interno una scalinata composta da 498 gradini. Ancora non si può dire con certezza quando e da chi fu costruita la torre degli Asinelli. Si presume che la torre debba il proprio nome a Gherardo Asinelli, il nobile cavaliere di fazione ghibellina al quale se ne attribuisce la costruzione, iniziata secondo una consolidata tradizione l'11 ottobre 1109 e terminata dieci anni dopo, nel 1119."""
domanda = """In che città si trova la torre degli Asinelli?"""
answer = generate_text(pipeline, contesto, domanda)
print(answer)
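The prompt construction and answer cleanup inside generate_text can be factored into two small standalone helpers, which makes them easy to reuse in a larger RAG loop. The function names here are illustrative, not part of the model card:

```python
def build_prompt(context: str, question: str) -> str:
    # Same [INST]Contesto/Domanda template the model was finetuned on.
    return f"[INST]Contesto: {context}\nDomanda:{question}\n[/INST]"

def extract_answer(generated_text: str) -> str:
    # Keep only the text after the [/INST] marker and strip the
    # end-of-text token the model may append.
    answer = generated_text.split("[/INST]", 1)[1].strip()
    return answer.replace("<end_of_text>", "")
```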