Instructions to use RichardErkhov/deepset_-_roberta-base-squad2-distilled-8bits with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use RichardErkhov/deepset_-_roberta-base-squad2-distilled-8bits with Transformers:

# Use a pipeline as a high-level helper
# (the base model is an extractive question-answering model, not a text generator)
from transformers import pipeline

pipe = pipeline("question-answering", model="RichardErkhov/deepset_-_roberta-base-squad2-distilled-8bits")

# Load model directly
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("RichardErkhov/deepset_-_roberta-base-squad2-distilled-8bits")
model = AutoModelForQuestionAnswering.from_pretrained("RichardErkhov/deepset_-_roberta-base-squad2-distilled-8bits")
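
Because the original deepset model is an extractive question-answering model, a typical call passes a question together with a context passage. The following is a minimal sketch rather than an official snippet: the helper names `build_qa_pipeline`, `ask`, and `main` are hypothetical, and it assumes `transformers`, `accelerate`, and `bitsandbytes` are installed on hardware that can load the 8-bit weights.

```python
MODEL_ID = "RichardErkhov/deepset_-_roberta-base-squad2-distilled-8bits"


def build_qa_pipeline(model_id=MODEL_ID):
    """Create a question-answering pipeline (downloads the model on first use)."""
    # Imported lazily so that `ask` can be used or tested without transformers.
    from transformers import pipeline

    return pipeline("question-answering", model=model_id, tokenizer=model_id)


def ask(qa, question, context):
    """Run one question against one context and return (answer, score)."""
    # A question-answering pipeline returns a dict with the keys
    # "answer", "score", "start", and "end".
    result = qa(question=question, context=context)
    return result["answer"], result["score"]


def main():
    # Assumption: the environment can load bitsandbytes 8-bit checkpoints.
    qa = build_qa_pipeline()
    answer, score = ask(
        qa,
        question="Which model was used as the teacher?",
        context="deepset/roberta-large-squad2 was used as the teacher model.",
    )
    print(answer, score)
```

Calling `main()` loads the model and prints the extracted answer span with its confidence score. `ask` accepts any callable with the same interface, which makes it easy to swap in another question-answering checkpoint.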

Local Apps

vLLM

How to use RichardErkhov/deepset_-_roberta-base-squad2-distilled-8bits with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "RichardErkhov/deepset_-_roberta-base-squad2-distilled-8bits"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RichardErkhov/deepset_-_roberta-base-squad2-distilled-8bits",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/RichardErkhov/deepset_-_roberta-base-squad2-distilled-8bits

SGLang

How to use RichardErkhov/deepset_-_roberta-base-squad2-distilled-8bits with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "RichardErkhov/deepset_-_roberta-base-squad2-distilled-8bits" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RichardErkhov/deepset_-_roberta-base-squad2-distilled-8bits",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "RichardErkhov/deepset_-_roberta-base-squad2-distilled-8bits" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RichardErkhov/deepset_-_roberta-base-squad2-distilled-8bits",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use RichardErkhov/deepset_-_roberta-base-squad2-distilled-8bits with Docker Model Runner:
```
docker model run hf.co/RichardErkhov/deepset_-_roberta-base-squad2-distilled-8bits
```

Quantization made by Richard Erkhov.

Github

Discord

Request more models

roberta-base-squad2-distilled - bnb 8bits

Model creator: https://huggingface.co/deepset/
Original model: https://huggingface.co/deepset/roberta-base-squad2-distilled/

Original model description:

language: en
license: mit
tags:
- exbert
datasets:
- squad_v2
thumbnail: https://thumb.tildacdn.com/tild3433-3637-4830-a533-353833613061/-/resize/720x/-/format/webp/germanquad.jpg
model-index:
- name: deepset/roberta-base-squad2-distilled
  results:
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad_v2
      type: squad_v2
      config: squad_v2
      split: validation
    metrics:
    - type: exact_match
      value: 80.8593
      name: Exact Match
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMzVjNzkxNmNiNDkzNzdiYjJjZGM3ZTViMGJhOGM2ZjFmYjg1MjYxMDM2YzM5NWMwNDIyYzNlN2QwNGYyNDMzZSIsInZlcnNpb24iOjF9.Rgww8tf8D7nF2dh2U_DMrFzmp87k8s7RFibrDXSvQyA66PGWXwjlsd1552lzjHnNV5hvHUM1-h3PTuY_5p64BA
    - type: f1
      value: 84.0104
      name: F1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNTAyZDViNWYzNjA4OWQ5MzgyYmQ2ZDlhNWRhMTIzYTYxYzViMmI4NWE4ZGU5MzVhZTAwNTRlZmRlNWUwMjI0ZSIsInZlcnNpb24iOjF9.Er21BNgJ3jJXLuZtpubTYq9wCwO1i_VLQFwS5ET0e4eAYVVj0aOA40I5FvP5pZac3LjkCnVacxzsFWGCYVmnDA
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad
      type: squad
      config: plain_text
      split: validation
    metrics:
    - type: exact_match
      value: 86.225
      name: Exact Match
    - type: f1
      value: 92.483
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: adversarial_qa
      type: adversarial_qa
      config: adversarialQA
      split: validation
    metrics:
    - type: exact_match
      value: 29.900
      name: Exact Match
    - type: f1
      value: 41.183
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad_adversarial
      type: squad_adversarial
      config: AddOneSent
      split: validation
    metrics:
    - type: exact_match
      value: 79.071
      name: Exact Match
    - type: f1
      value: 84.472
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squadshifts amazon
      type: squadshifts
      config: amazon
      split: test
    metrics:
    - type: exact_match
      value: 70.733
      name: Exact Match
    - type: f1
      value: 83.958
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squadshifts new_wiki
      type: squadshifts
      config: new_wiki
      split: test
    metrics:
    - type: exact_match
      value: 82.011
      name: Exact Match
    - type: f1
      value: 91.092
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squadshifts nyt
      type: squadshifts
      config: nyt
      split: test
    metrics:
    - type: exact_match
      value: 84.203
      name: Exact Match
    - type: f1
      value: 91.521
      name: F1
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squadshifts reddit
      type: squadshifts
      config: reddit
      split: test
    metrics:
    - type: exact_match
      value: 72.029
      name: Exact Match
    - type: f1
      value: 83.454
      name: F1

Overview

Language model: deepset/roberta-base-squad2-distilled
Language: English
Training data: SQuAD 2.0 training set
Eval data: SQuAD 2.0 dev set
Infrastructure: 4x V100 GPU
Published: Dec 8th, 2021

Details

Haystack's distillation feature was used for training, with deepset/roberta-large-squad2 as the teacher model.

Hyperparameters

batch_size = 80
n_epochs = 4
max_seq_len = 384
learning_rate = 3e-5
lr_schedule = LinearWarmup
embeds_dropout_prob = 0.1
temperature = 1.5
distillation_loss_weight = 0.75
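
To illustrate how `temperature` and `distillation_loss_weight` typically interact, here is a minimal sketch of a standard soft-target distillation loss. It is an assumption about the general technique (temperature-scaled KL divergence to the teacher blended with the usual cross-entropy), not a copy of Haystack's implementation. All function names are hypothetical, and the logits in the usage note are made-up toy values.

```python
import math

TEMPERATURE = 1.5                # value from the hyperparameters above
DISTILLATION_LOSS_WEIGHT = 0.75  # value from the hyperparameters above


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(logits, target_index):
    """Standard cross-entropy of the logits against a gold index."""
    return -math.log(softmax(logits)[target_index])


def distillation_loss(student_logits, teacher_logits, target_index,
                      temperature=TEMPERATURE,
                      weight=DISTILLATION_LOSS_WEIGHT):
    """Blend the soft (teacher) loss with the hard (label) loss."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    # Scaling by temperature**2 keeps gradient magnitudes comparable
    # across different temperatures (common practice, assumed here).
    soft_loss = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    hard_loss = cross_entropy(student_logits, target_index)
    return weight * soft_loss + (1.0 - weight) * hard_loss
```

For example, `distillation_loss([2.0, 0.5, -1.0], [3.0, 0.0, -2.0], target_index=0)` scores a toy student against a toy teacher for one start-position prediction. With a weight of 0.75, three quarters of the signal comes from matching the teacher's softened distribution.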

Performance

"exact": 79.8366040596311
"f1": 83.916407079888

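The "exact" and "f1" figures are percentages averaged over the evaluation set. As a hedged illustration of what they measure, the sketch below follows the commonly used SQuAD-style definition (normalize both strings, then compare them exactly or by token overlap). It is not necessarily the exact script used to produce the numbers above, and the function names are hypothetical.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction, truth):
    """Token-level F1 between a predicted and a gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    if not pred_tokens or not truth_tokens:
        # For unanswerable questions (SQuAD 2.0), an empty prediction
        # scores 1.0 only when the gold answer is empty as well.
        return float(pred_tokens == truth_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score is the mean of these per-question values multiplied by 100. When a question has several gold answers, the usual convention is to take the maximum over them.
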
Authors

Timo Möller: timo.moeller@deepset.ai
Julian Risch: julian.risch@deepset.ai
Malte Pietsch: malte.pietsch@deepset.ai
Michel Bartels: michel.bartels@deepset.ai

About us

deepset is the company behind the open-source NLP framework Haystack, which is designed to help you build production-ready NLP systems for question answering, summarization, ranking, and more.

Some of our other work:

Get in touch and join the Haystack community

For more info on Haystack, visit our GitHub repo and Documentation.

We also have a Discord community open to everyone!

Twitter | LinkedIn | Discord | GitHub Discussions | Website

By the way: we're hiring!

Safetensors
Model size: 0.1B params
Tensor type: F32, F16