Finnish Grocery NER + Text Classification Pipeline
A spaCy pipeline for Finnish grocery text that performs named entity recognition (NER) and text classification (textcat) in a single model backed by TurkuNLP/bert-base-finnish-cased-v1.
Both tasks share one transformer encoder (transformer → ner → textcat).
Labels
NER:
Textcat categories: BABY, BAKERY, BEVERAGES, CONVENIENCE_FOOD, DAIRY, DRY_GOODS, FROZEN, FRUITS_VEGETABLES, HOUSEHOLD, HYGIENE, MEAT_FISH, PET_SUPPLIES
Performance (dev set)
NER
| Metric | Score |
|---|---|
| F1 | 0.000 |
| Precision | 0.000 |
| Recall | 0.000 |
Per-label:
| Label | Precision | Recall | F1 |
|---|---|---|---|
Text Classification
| Metric | Score |
|---|---|
| Macro AUC | 0.976 |
Per-category F1:
| Category | F1 |
|---|---|
| BABY | 0.979 |
| BAKERY | 0.831 |
| BEVERAGES | 0.881 |
| CONVENIENCE_FOOD | 0.692 |
| DAIRY | 0.830 |
| DRY_GOODS | 0.743 |
| FROZEN | 1.000 |
| FRUITS_VEGETABLES | 0.779 |
| HOUSEHOLD | 0.875 |
| HYGIENE | 0.876 |
| MEAT_FISH | 0.905 |
| PET_SUPPLIES | 0.987 |
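
The per-category scores above can be summarized with a few lines of Python. This is a minimal sketch, not part of the pipeline: it copies the F1 values from the table by hand and computes their unweighted mean, a figure the card itself does not report, along with the weakest category. The variable names are illustrative.

```python
from statistics import mean

# Per-category F1 scores copied by hand from the table above.
per_category_f1 = {
    "BABY": 0.979,
    "BAKERY": 0.831,
    "BEVERAGES": 0.881,
    "CONVENIENCE_FOOD": 0.692,
    "DAIRY": 0.830,
    "DRY_GOODS": 0.743,
    "FROZEN": 1.000,
    "FRUITS_VEGETABLES": 0.779,
    "HOUSEHOLD": 0.875,
    "HYGIENE": 0.876,
    "MEAT_FISH": 0.905,
    "PET_SUPPLIES": 0.987,
}

# Unweighted (macro) mean: every category counts equally,
# regardless of how many dev-set examples it has.
macro_f1 = mean(per_category_f1.values())

# Category with the lowest F1, i.e. where errors are most likely.
weakest = min(per_category_f1, key=per_category_f1.get)

print(f"Unweighted mean F1: {macro_f1:.3f}")
print("Lowest-scoring category:", weakest)
```

Because the mean is unweighted, small categories influence it as much as large ones. A support-weighted average would need the per-category example counts, which are not listed here.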
Usage
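import spacy
# Load the pipeline; the name must resolve to an installed package or a local path
nlp = spacy.load("juusopi/grocery-fi-textcat")
doc = nlp("500 g omenaa")  # "500 g of apples"
# NER: print each entity span with its label
for ent in doc.ents:
    print(ent.text, ent.label_)
# Text classification: pick the highest-scoring category
best = max(doc.cats, key=doc.cats.get)
print("Category:", best)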
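
Taking the single best label discards the remaining scores in `doc.cats`, which is a plain dictionary mapping each category to a float. The sketch below shows one possible way to keep the top few candidates above a cutoff. The helper `top_categories`, its default `k` and `threshold`, and the example scores are all hypothetical and not part of the published pipeline; it runs on a hand-written dictionary, so it does not need the model installed.

```python
from typing import Dict, List, Tuple


def top_categories(
    cats: Dict[str, float], k: int = 3, threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Return up to k (label, score) pairs scoring at least threshold, best first.

    `cats` has the same shape as spaCy's `doc.cats`. The default values for
    `k` and `threshold` are illustrative assumptions, not tuned settings.
    """
    ranked = sorted(cats.items(), key=lambda item: item[1], reverse=True)
    return [(label, score) for label, score in ranked[:k] if score >= threshold]


# Hypothetical scores in the shape of doc.cats (not real model output).
example_cats = {"FRUITS_VEGETABLES": 0.91, "DRY_GOODS": 0.06, "FROZEN": 0.03}

print(top_categories(example_cats))
print(top_categories(example_cats, k=2, threshold=0.0))
```

With a real document, the call would be `top_categories(doc.cats)`. An empty result means no category cleared the cutoff, which a caller could treat as "uncategorized" instead of forcing a low-confidence label.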
Model details
- Base model: TurkuNLP/bert-base-finnish-cased-v1
- spaCy version: >=3.8.11,<3.9.0
- Pipeline version: 0.0.0