---
license: cc
datasets:
  - clarin-pl/poquad
language:
  - pl
base_model:
  - radlab/polish-qa-v2
pipeline_tag: question-answering
library_name: transformers
tags:
  - qa
  - poquad
  - quant
  - bitsandbytes
---

## Model Overview

- Model name: radlab/polish-qa-v2-bnb
- Developer: radlab.dev
- Model type: Extractive Question Answering (QA)
- Base model: radlab/polish-qa-v2 (sdadas/polish-roberta-large-v2 fine-tuned for QA)
- Quantization: 8-bit inference-only quantization via bitsandbytes (load_in_8bit=True, double-quantization enabled, qa_outputs excluded from quantization)
- Maximum context size: 512 tokens

## Intended Use

This model is designed for extractive QA on Polish text. Given a question and a context passage, it returns the most relevant span of the context as the answer. It is a bitsandbytes-quantized version of the radlab/polish-qa-v2 model.
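Under the hood, an extractive QA model emits one start logit and one end logit per context token, and the answer is the best-scoring valid span. A minimal sketch of that span selection, using made-up tokens and logits rather than the model's actual outputs:

```python
# Toy illustration of extractive-QA span selection: score every valid
# (start, end) pair (start <= end, bounded length) by summing the start
# and end logits, and return the best span. Logits here are invented.

def best_span(start_logits, end_logits, max_len=30):
    best, best_score = (0, 0), float("-inf")
    for s, s_logit in enumerate(start_logits):
        for e in range(s, min(s + max_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

tokens = ["Pierwsze", "prace", "już", "się", "rozpoczęły"]
start_logits = [0.1, 2.5, 0.0, 0.2, 0.3]
end_logits = [0.0, 0.4, 0.1, 0.2, 3.1]

s, e = best_span(start_logits, end_logits)
print(" ".join(tokens[s:e + 1]))  # prints "prace już się rozpoczęły"
```

The real model does this over subword tokens and maps the chosen span back to character offsets in the context.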

## Limitations

- The model works best with contexts up to 512 tokens. Longer passages should be truncated or split.
- 8-bit quantization reduces memory usage and inference latency but may introduce a slight drop in accuracy compared with the full-precision model.
- Only suitable for inference; it cannot be further fine-tuned while kept in 8-bit mode.
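One way to split an over-long passage is into overlapping windows, running QA on each and keeping the best-scoring answer. A self-contained sketch using whitespace words as a stand-in for tokens (the real tokenizer is subword-based, so the window sizes are illustrative; the transformers QA pipeline can also do this for you via its `max_seq_len` and `doc_stride` arguments):

```python
# Split a long context into overlapping word windows so each chunk stays
# under the model's context limit. Whitespace words approximate tokens
# here only to keep the sketch self-contained.

def split_context(context, max_words=300, stride=100):
    words = context.split()
    windows, start = [], 0
    while True:
        windows.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        start += max_words - stride  # overlap so boundary answers survive

    return windows

long_context = " ".join(f"zdanie{i}" for i in range(700))
chunks = split_context(long_context)
print(len(chunks))  # prints 3
```

Each chunk can then be passed to the pipeline separately, taking the answer with the highest score across chunks.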

## How to Use

```python
from transformers import pipeline

model_path = "radlab/polish-qa-v2-bnb"

qa = pipeline(
    "question-answering",
    model=model_path,
)

question = "Co będzie w budowanym obiekcie?"
context = """Pozwolenie na budowę zostało wydane w marcu. Pierwsze prace przygotowawcze
na terenie przy ul. Wojska Polskiego już się rozpoczęły.
Działkę ogrodzono, pojawił się również monitoring, a także kontenery
dla pracowników budowy. Na ten moment nie jest znana lista sklepów,
które pojawią się w nowym pasażu handlowym."""

result = qa(
    question=question,
    context=context.replace("\n", " ")
)

print(result)
```

Sample output:

```json
{
  "score": 0.32568359375,
  "start": 259,
  "end": 268,
  "answer": "sklepów,"
}
```
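The `start` and `end` fields are character offsets into the (newline-flattened) context string, so the answer text can always be recovered by slicing. A toy illustration with invented offsets, not the model's actual output:

```python
# `start`/`end` in a QA pipeline result are character offsets into the
# context; slicing the context with them recovers the answer. Toy values.
context = "Nowy obiekt to pasaż handlowy przy ul. Wojska Polskiego."
result = {"score": 0.91, "start": 15, "end": 29, "answer": "pasaż handlowy"}

print(context[result["start"]:result["end"]])  # prints "pasaż handlowy"
```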

## Technical Details

- Quantization strategy: BitsAndBytesStrategy (8-bit, double-quant, qa_outputs excluded).
- Loading code (for reference):
```python
from transformers import AutoConfig, BitsAndBytesConfig, AutoModelForQuestionAnswering

original_path = "radlab/polish-qa-v2"  # full-precision base model

config = AutoConfig.from_pretrained(original_path)
bnb_cfg = BitsAndBytesConfig(
    load_in_8bit=True,
    # keep the span-prediction head in full precision
    llm_int8_skip_modules=["qa_outputs"],
)

model = AutoModelForQuestionAnswering.from_pretrained(
    original_path,
    config=config,
    quantization_config=bnb_cfg,
    device_map="auto",
)
```