---
license: cc
datasets:
- clarin-pl/poquad
language:
- pl
base_model:
- radlab/polish-qa-v2
pipeline_tag: question-answering
library_name: transformers
tags:
- qa
- poquad
- quant
- bitsandbytes
---
### Model Overview
- **Model name**: `radlab/polish-qa-v2-bnb`
- **Developer**: [radlab.dev](https://radlab.dev)
- **Model type**: Extractive Question-Answering (QA)
- **Base model**: `radlab/polish-qa-v2` (`sdadas/polish-roberta-large-v2` fine-tuned for QA)
- **Quantization**: 8-bit inference-only quantization via **bitsandbytes** (`load_in_8bit=True`, double-quantization enabled, `qa_outputs` excluded from quantization)
- **Maximum context size**: 512 tokens
### Intended Use
This model is designed for **extractive QA** on Polish text. Given a question and a context passage,
it returns the most relevant span of the context as the answer.
It is the bitsandbytes-quantized version of the `radlab/polish-qa-v2` model.
### Limitations
- The model works best with contexts up to 512 tokens. Longer passages should be truncated or split.
- 8-bit quantization reduces memory usage and inference latency but may introduce a slight drop in accuracy
compared with the full-precision model.
- Only suitable for inference; it cannot be further fine-tuned while kept in 8-bit mode.
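One way to handle passages longer than the 512-token limit is to score overlapping windows of the context and keep the best-scoring answer. A minimal sketch of the window-offset computation (the `window_spans` helper and its defaults are illustrative, not part of this model card):

```python
def window_spans(n_tokens: int, size: int = 512, stride: int = 128):
    """Yield (start, end) token offsets for overlapping windows,
    so each window shares `stride` tokens with the previous one."""
    start = 0
    while True:
        end = min(start + size, n_tokens)
        yield start, end
        if end == n_tokens:
            break
        start += size - stride

# A 1000-token passage is covered by three overlapping windows.
print(list(window_spans(1000)))  # [(0, 512), (384, 896), (768, 1000)]
```

Note that the `transformers` question-answering pipeline can also window long contexts internally via its `doc_stride` and `max_seq_len` arguments.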
### How to Use
```python
from transformers import pipeline

model_path = "radlab/polish-qa-v2-bnb"

qa = pipeline(
    "question-answering",
    model=model_path,
)

question = "Co będzie w budowanym obiekcie?"
context = """Pozwolenie na budowę zostało wydane w marcu. Pierwsze prace przygotowawcze
na terenie przy ul. Wojska Polskiego już się rozpoczęły.
Działkę ogrodzono, pojawił się również monitoring, a także kontenery
dla pracowników budowy. Na ten moment nie jest znana lista sklepów,
które pojawią się w nowym pasażu handlowym."""

result = qa(
    question=question,
    context=context.replace("\n", " ")
)
print(result)
```
**Sample output**
```json
{
  "score": 0.32568359375,
  "start": 259,
  "end": 268,
  "answer": "sklepów,"
}
```
### Technical Details
- **Quantization strategy**: `BitsAndBytesStrategy` (8-bit, double-quant, `qa_outputs` excluded).
- **Loading code (for reference)**
```python
from transformers import AutoConfig, BitsAndBytesConfig, AutoModelForQuestionAnswering

original_path = "radlab/polish-qa-v2"

config = AutoConfig.from_pretrained(original_path)

# 8-bit quantization config; the QA head (`qa_outputs`) is kept in
# full precision via llm_int8_skip_modules.
bnb_cfg = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_skip_modules=["qa_outputs"],
)

model = AutoModelForQuestionAnswering.from_pretrained(
    original_path,
    config=config,
    quantization_config=bnb_cfg,
    device_map="auto",
)
```
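A back-of-envelope estimate of the memory saved by 8-bit weights, assuming roughly 355M parameters (the size of a standard RoBERTa-large; the exact count is not stated in this card):

```python
# Rough weight-memory comparison; parameter count is an assumption.
params = 355_000_000
fp32_mib = params * 4 / 2**20   # full-precision weights (4 bytes each)
int8_mib = params * 1 / 2**20   # 8-bit quantized weights (1 byte each)
print(f"fp32: ~{fp32_mib:.0f} MiB, int8: ~{int8_mib:.0f} MiB")
```

Actual savings are smaller in practice, since excluded modules (here `qa_outputs`) and activations stay in higher precision.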