How to use DeepMount00/Minerva-3B-base-RAG with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="DeepMount00/Minerva-3B-base-RAG")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("DeepMount00/Minerva-3B-base-RAG")
model = AutoModelForCausalLM.from_pretrained("DeepMount00/Minerva-3B-base-RAG")

How to use DeepMount00/Minerva-3B-base-RAG with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "DeepMount00/Minerva-3B-base-RAG"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "DeepMount00/Minerva-3B-base-RAG",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
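Because the server exposes an OpenAI-compatible API, the same request can be made from Python using only the standard library. This is a minimal sketch assuming a vLLM server is already running on localhost:8000; the helper names are illustrative, not part of the model card:

```python
import json
from urllib import request

def build_completion_payload(prompt, model="DeepMount00/Minerva-3B-base-RAG",
                             max_tokens=512, temperature=0.5):
    # Mirrors the JSON body of the curl request above.
    return {"model": model, "prompt": prompt,
            "max_tokens": max_tokens, "temperature": temperature}

def complete(prompt, base_url="http://localhost:8000"):
    # POST to the OpenAI-compatible /v1/completions endpoint and
    # return the generated text from the first choice.
    data = json.dumps(build_completion_payload(prompt)).encode("utf-8")
    req = request.Request(f"{base_url}/v1/completions", data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["text"]
```

The same client should also work against the SGLang server described below by pointing base_url at http://localhost:30000, since both servers expose the same OpenAI-compatible completions endpoint.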
How to use DeepMount00/Minerva-3B-base-RAG with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "DeepMount00/Minerva-3B-base-RAG" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "DeepMount00/Minerva-3B-base-RAG",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'

# Or run the SGLang server in Docker instead of installing via pip:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "DeepMount00/Minerva-3B-base-RAG" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "DeepMount00/Minerva-3B-base-RAG",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'

How to use DeepMount00/Minerva-3B-base-RAG with Docker Model Runner:
docker model run hf.co/DeepMount00/Minerva-3B-base-RAG
Minerva-3B-base-RAG is a specialized question-answering (QA) model obtained by finetuning Minerva-3B-base-v1.0. The finetuning was conducted independently to improve the model's performance on QA tasks, making it well suited for Retrieval-Augmented Generation (RAG) applications.
import transformers
import torch

model_id = "DeepMount00/Minerva-3B-base-RAG"

# Initialize the text-generation pipeline.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

def generate_text(pipeline, context, question):
    # Build the [INST] prompt format the model was finetuned on.
    input_text = f"[INST]Contesto: {context}\nDomanda:{question}\n[/INST]"
    output = pipeline(
        input_text,
        max_new_tokens=512,
    )
    generated_text = output[0]['generated_text']
    # Keep only the answer that follows the [/INST] marker
    # and drop the end-of-text token.
    response_text = generated_text.split("[/INST]", 1)[1].strip()
    return response_text.replace("<end_of_text>", "")
contesto = """La torre degli Asinelli è una delle cosiddette due torri di Bologna, simbolo della città, situate in piazza di porta Ravegnana, all'incrocio tra le antiche strade San Donato (ora via Zamboni), San Vitale, Maggiore e Castiglione. Eretta, secondo la tradizione, fra il 1109 e il 1119 dal nobile Gherardo Asinelli, la torre è alta 97,20 metri, pende verso ovest per 2,23 metri e presenta all'interno una scalinata composta da 498 gradini. Ancora non si può dire con certezza quando e da chi fu costruita la torre degli Asinelli. Si presume che la torre debba il proprio nome a Gherardo Asinelli, il nobile cavaliere di fazione ghibellina al quale se ne attribuisce la costruzione, iniziata secondo una consolidata tradizione l'11 ottobre 1109 e terminata dieci anni dopo, nel 1119."""
domanda = """In che città si trova la torre degli Asinelli?"""
answer = generate_text(pipeline, contesto, domanda)
print(answer)
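The prompt construction and answer cleanup inside generate_text can be factored into two small standalone helpers, which makes them easy to reuse in a larger RAG loop. The function names here are illustrative, not part of the model card:

```python
def build_prompt(context: str, question: str) -> str:
    # Same [INST]Contesto/Domanda template the model was finetuned on.
    return f"[INST]Contesto: {context}\nDomanda:{question}\n[/INST]"

def extract_answer(generated_text: str) -> str:
    # Keep only the text after the [/INST] marker and strip the
    # end-of-text token the model may append.
    answer = generated_text.split("[/INST]", 1)[1].strip()
    return answer.replace("<end_of_text>", "")
```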