---
language: et
tags:
  - estonian
  - token-classification
  - quantifier-extraction
  - roberta
  - transformers
license: mit
datasets:
  - custom
metrics:
  - precision
  - recall
  - f1
  - accuracy
---

# Est-RoBERTa for Quantifier Extraction (Estonian)

This model is a fine-tuned version of [`EMBEDDIA/est-roberta`](https://huggingface.co/EMBEDDIA/est-roberta) on a custom dataset for extracting **quantifier constructions** (e.g., "kari koeri", "hunnik raamatuid") in Estonian text.

It performs **token classification** using the BIO labeling scheme with the following labels:

- `O`: Outside
- `B-QUANT`: Beginning of a quantifier expression
- `I-QUANT`: Inside a quantifier expression
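Per-token BIO tags are typically merged back into multi-word spans after prediction. As a minimal sketch (plain Python, using a hypothetical hand-tagged word sequence, not output from this model):

```python
def bio_to_spans(words, tags):
    """Group BIO tags into (start, end) word-index spans, end exclusive."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B-QUANT":        # a new quantifier expression begins
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I-QUANT":      # continue the current expression
            if start is None:       # tolerate a stray I- without a B-
                start = i
        else:                       # "O" closes any open expression
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans

# Hypothetical tagging of a sentence containing "kari koeri"
words = ["Õues", "ootas", "kari", "koeri", "."]
tags  = ["O", "O", "B-QUANT", "I-QUANT", "O"]
print([" ".join(words[s:e]) for s, e in bio_to_spans(words, tags)])
# → ['kari koeri']
```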

## 📊 Training and Evaluation

- Epochs: 12
- Batch size: 8
- Test set: 159 positive cases, 1,000 negative cases
- Precision: 87.05%
- Recall: 94.53%
- F1-score: 90.64%
- Accuracy: 99.88%
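As a quick sanity check (not part of the model card itself), the reported F1-score follows from the precision and recall above, since F1 is their harmonic mean:

```python
precision, recall = 0.8705, 0.9453  # values reported above
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.2%}")  # → 90.64%
```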

## 🏛️ Funding

This work was supported by the Estonian Research Council grant PRG 1978.

## 🔍 Example Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model = AutoModelForTokenClassification.from_pretrained("ahtokiil/est-roberta-quant-extraction_EKI")
tokenizer = AutoTokenizer.from_pretrained("ahtokiil/est-roberta-quant-extraction_EKI")

sentence = "Arsti juures tuli tükk aega oodata."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():  # inference only; no gradients needed
    outputs = model(**inputs)
predictions = torch.argmax(outputs.logits, dim=2)

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
labels = [model.config.id2label[p.item()] for p in predictions[0]]

print(list(zip(tokens, labels)))
```