Finnish Grocery NER + Text Classification Pipeline
spaCy pipeline for Finnish grocery text: named entity recognition (NER) and text classification (textcat) in a single model backed by TurkuNLP/bert-base-finnish-cased-v1.
Both tasks share one transformer encoder (transformer → ner → textcat).
Labels
NER: NOTE, PRODUCT, QUANTITY, UNIT
Textcat categories:
Performance (dev set)
NER
| Metric | Score |
|---|---|
| F1 | 0.999 |
| Precision | 0.999 |
| Recall | 0.999 |
Per-label:
| Label | Precision | Recall | F1 |
|---|---|---|---|
| QUANTITY | 1.000 | 1.000 | 1.000 |
| UNIT | 1.000 | 1.000 | 1.000 |
| PRODUCT | 0.998 | 0.998 | 0.998 |
| NOTE | 0.998 | 0.999 | 0.999 |
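Entity-level scores like these are usually computed by exact matching. A predicted entity counts as a true positive only when its start, end, and label all match a gold entity, and F1 is the harmonic mean 2PR/(P+R). The sketch below is a simplified, stdlib-only illustration of that convention, not spaCy's own `Scorer`. The `prf`/`per_label` helpers and the token offsets are hypothetical. Recomputing F1 from the rounded table values can differ in the last digit: for NOTE, 2·0.998·0.999/(0.998 + 0.999) ≈ 0.9985.

```python
from typing import Dict, Set, Tuple

# Hypothetical span representation: (start_token, end_token, label)
Span = Tuple[int, int, str]


def prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over labelled spans."""
    tp = len(gold & pred)  # boundaries AND label must both match
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def per_label(gold: Set[Span], pred: Set[Span]) -> Dict[str, Tuple[float, float, float]]:
    """Score each label separately, as in the per-label table."""
    labels = {span[2] for span in gold | pred}
    return {
        label: prf({s for s in gold if s[2] == label}, {s for s in pred if s[2] == label})
        for label in sorted(labels)
    }


# Illustrative data for "500 g omenaa": the last span gets the wrong label
gold = {(0, 1, "QUANTITY"), (1, 2, "UNIT"), (2, 3, "PRODUCT")}
pred = {(0, 1, "QUANTITY"), (1, 2, "UNIT"), (2, 3, "NOTE")}
print(prf(gold, pred))
print(per_label(gold, pred))
```

With one mislabelled span out of three, precision, recall, and F1 all come out at 2/3. The mislabelled span also scores zero for both PRODUCT and NOTE in the per-label breakdown.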
Text Classification
| Metric | Score |
|---|---|
| Macro AUC | 0.000 |
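Macro AUC is conventionally the unweighted mean of the per-category ROC AUC scores. Each ROC AUC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one. It therefore lies in [0, 1]: 0.5 corresponds to random ranking, and 0.0 occurs only when every negative outranks every positive. The sketch below computes this by direct pairwise comparison, which is quadratic but fine for illustration. `roc_auc`, `macro_auc`, and the category names are hypothetical. The sketch raises an error when there are no categories, because the average is then undefined.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(random positive scores above random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(per_category: Dict[str, Tuple[List[float], List[int]]]) -> float:
    """Unweighted mean of per-category ROC AUC; undefined with no categories."""
    aucs = [roc_auc(scores, labels) for scores, labels in per_category.values()]
    if not aucs:
        raise ValueError("macro AUC is undefined without any categories")
    return sum(aucs) / len(aucs)


# Hypothetical categories: "A" ranks perfectly, "B" ranks perfectly backwards
print(macro_auc({"A": ([0.9, 0.1], [1, 0]), "B": ([0.1, 0.9], [1, 0])}))  # 0.5
```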
Per-category F1:
| Category | F1 |
|---|---|
Usage
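```python
import spacy

nlp = spacy.load("juusopi/grocery-fi-ner")
doc = nlp("500 g omenaa")

# NER: print each recognised entity with its label
for ent in doc.ents:
    print(ent.text, ent.label_)

# Text classification: doc.cats maps each category to a score
if doc.cats:  # max() would raise ValueError on an empty dict
    best = max(doc.cats, key=doc.cats.get)
    print("Category:", best)
```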
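For downstream use, it can help to collapse the entities of a single grocery line into one record and to make the category choice explicit. The helpers below are a hypothetical post-processing sketch that works on plain Python values: `to_item` takes `(ent.text, ent.label_)` pairs, and `best_category` takes the `doc.cats` dict. The 0.5 threshold is an arbitrary assumption, not a value from this model. The entity pairs in the example are illustrative, not recorded model output.

```python
from typing import Dict, List, Optional, Tuple


def to_item(ents: List[Tuple[str, str]]) -> Dict[str, str]:
    """Collapse (text, label) pairs from one grocery line into a single record.

    Entities that share a label are joined with a space, in order of appearance.
    """
    parts: Dict[str, List[str]] = {}
    for text, label in ents:
        parts.setdefault(label, []).append(text)
    return {label: " ".join(texts) for label, texts in parts.items()}


def best_category(cats: Dict[str, float], threshold: float = 0.5) -> Optional[str]:
    """Return the top-scoring category, or None if there are no scores
    or the top score falls below the (assumed) threshold."""
    if not cats:
        return None
    label = max(cats, key=cats.get)
    return label if cats[label] >= threshold else None


# Illustrative input shaped like [(ent.text, ent.label_) for ent in doc.ents]
example = [("500", "QUANTITY"), ("g", "UNIT"), ("omenaa", "PRODUCT")]
print(to_item(example))  # {'QUANTITY': '500', 'UNIT': 'g', 'PRODUCT': 'omenaa'}
```

With a loaded `doc`, the calls would be `to_item([(ent.text, ent.label_) for ent in doc.ents])` and `best_category(doc.cats)`. Assuming the component is spaCy's standard `textcat`, which treats categories as mutually exclusive, taking the single top category is the natural choice. The threshold only adds a way to reject low-confidence predictions.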
Model details
- Base model: TurkuNLP/bert-base-finnish-cased-v1
- spaCy version: >=3.8.11,<3.9.0
- Pipeline version: 0.0.0