---
license: cc
datasets:
  - clarin-pl/poquad
language:
  - pl
base_model:
  - radlab/polish-qa-v2
pipeline_tag: question-answering
library_name: transformers
tags:
  - qa
  - poquad
  - quant
  - bitsandbytes
---

## Model Overview

- Model name: radlab/polish-qa-v2-bnb
- Developer: radlab.dev
- Model type: Extractive Question Answering (QA)
- Base model: radlab/polish-qa-v2 (sdadas/polish-roberta-large-v2 fine-tuned for QA)
- Quantization: 8-bit inference-only quantization via bitsandbytes (load_in_8bit=True, double-quantization enabled, qa_outputs excluded from quantization)
- Maximum context size: 512 tokens

## Intended Use

This model is designed for extractive QA on Polish text. Given a question and a context passage, it returns the most relevant span of the context as the answer. It is a bitsandbytes-quantized version of the radlab/polish-qa-v2 model.
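Under the hood, an extractive QA model emits one start logit and one end logit per context token, and the answer is the best-scoring valid span. A minimal sketch of that span selection, using made-up tokens and logits rather than the model's actual outputs:

```python
# Toy illustration of extractive-QA span selection: score every valid
# (start, end) pair (start <= end, bounded length) by summing the start
# and end logits, and return the best span. Logits here are invented.

def best_span(start_logits, end_logits, max_len=30):
    best, best_score = (0, 0), float("-inf")
    for s, s_logit in enumerate(start_logits):
        for e in range(s, min(s + max_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

tokens = ["Pierwsze", "prace", "już", "się", "rozpoczęły"]
start_logits = [0.1, 2.5, 0.0, 0.2, 0.3]
end_logits = [0.0, 0.4, 0.1, 0.2, 3.1]

s, e = best_span(start_logits, end_logits)
print(" ".join(tokens[s:e + 1]))  # prints "prace już się rozpoczęły"
```

The real model does this over subword tokens and maps the chosen span back to character offsets in the context.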

## Limitations

- The model works best with contexts up to 512 tokens. Longer passages should be truncated or split.
- 8-bit quantization reduces memory usage and inference latency but may introduce a slight drop in accuracy compared with the full-precision model.
- Only suitable for inference; it cannot be further fine-tuned while kept in 8-bit mode.
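One way to split an over-long passage is into overlapping windows, running QA on each and keeping the best-scoring answer. A self-contained sketch using whitespace words as a stand-in for tokens (the real tokenizer is subword-based, so the window sizes are illustrative; the transformers QA pipeline can also do this for you via its `max_seq_len` and `doc_stride` arguments):

```python
# Split a long context into overlapping word windows so each chunk stays
# under the model's context limit. Whitespace words approximate tokens
# here only to keep the sketch self-contained.

def split_context(context, max_words=300, stride=100):
    words = context.split()
    windows, start = [], 0
    while True:
        windows.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        start += max_words - stride  # overlap so boundary answers survive

    return windows

long_context = " ".join(f"zdanie{i}" for i in range(700))
chunks = split_context(long_context)
print(len(chunks))  # prints 3
```

Each chunk can then be passed to the pipeline separately, taking the answer with the highest score across chunks.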

## How to Use

```python
from transformers import pipeline

model_path = "radlab/polish-qa-v2-bnb"

qa = pipeline(
    "question-answering",
    model=model_path,
)

question = "Co będzie w budowanym obiekcie?"
context = """Pozwolenie na budowę zostało wydane w marcu. Pierwsze prace przygotowawcze
na terenie przy ul. Wojska Polskiego już się rozpoczęły.
Działkę ogrodzono, pojawił się również monitoring, a także kontenery
dla pracowników budowy. Na ten moment nie jest znana lista sklepów,
które pojawią się w nowym pasażu handlowym."""

result = qa(
    question=question,
    context=context.replace("\n", " ")
)

print(result)
```

Sample output:

```json
{
  "score": 0.32568359375,
  "start": 259,
  "end": 268,
  "answer": "sklepów,"
}
```
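The `start` and `end` fields are character offsets into the (newline-flattened) context string, so the answer text can always be recovered by slicing. A toy illustration with invented offsets, not the model's actual output:

```python
# `start`/`end` in a QA pipeline result are character offsets into the
# context; slicing the context with them recovers the answer. Toy values.
context = "Nowy obiekt to pasaż handlowy przy ul. Wojska Polskiego."
result = {"score": 0.91, "start": 15, "end": 29, "answer": "pasaż handlowy"}

print(context[result["start"]:result["end"]])  # prints "pasaż handlowy"
```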

## Technical Details

- Quantization strategy: BitsAndBytesStrategy (8-bit, double-quant, qa_outputs excluded).
- Loading code (for reference):
```python
from transformers import AutoConfig, BitsAndBytesConfig, AutoModelForQuestionAnswering

original_path = "radlab/polish-qa-v2"  # full-precision base model

config = AutoConfig.from_pretrained(original_path)
bnb_cfg = BitsAndBytesConfig(
    load_in_8bit=True,
    # keep the span-prediction head in full precision
    llm_int8_skip_modules=["qa_outputs"],
)

model = AutoModelForQuestionAnswering.from_pretrained(
    original_path,
    config=config,
    quantization_config=bnb_cfg,
    device_map="auto",
)
```