Finnish Grocery NER + Text Classification Pipeline

spaCy pipeline for Finnish grocery text: named entity recognition (NER) and text classification (textcat) in a single model backed by TurkuNLP/bert-base-finnish-cased-v1.

Both tasks share one transformer encoder (transformer → ner → textcat).

Labels

NER:

Textcat categories: BABY, BAKERY, BEVERAGES, CONVENIENCE_FOOD, DAIRY, DRY_GOODS, FROZEN, FRUITS_VEGETABLES, HOUSEHOLD, HYGIENE, MEAT_FISH, PET_SUPPLIES

Performance (dev set)

NER

Metric Score
F1 0.000
Precision 0.000
Recall 0.000

Per-label:

Label Precision Recall F1

Text Classification

Metric Score
Macro AUC 0.976

Per-category F1:

Category F1
BABY 0.979
BAKERY 0.831
BEVERAGES 0.881
CONVENIENCE_FOOD 0.692
DAIRY 0.830
DRY_GOODS 0.743
FROZEN 1.000
FRUITS_VEGETABLES 0.779
HOUSEHOLD 0.875
HYGIENE 0.876
MEAT_FISH 0.905
PET_SUPPLIES 0.987
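
The per-category scores can be condensed into a single unweighted (macro) average, which is not reported above. The following is a minimal sketch, assuming each of the twelve categories is weighted equally; the variable names are illustrative and do not come from the model's evaluation code.

```python
from statistics import mean

# Per-category dev-set F1 scores, copied from the list above.
per_category_f1 = {
    "BABY": 0.979,
    "BAKERY": 0.831,
    "BEVERAGES": 0.881,
    "CONVENIENCE_FOOD": 0.692,
    "DAIRY": 0.830,
    "DRY_GOODS": 0.743,
    "FROZEN": 1.000,
    "FRUITS_VEGETABLES": 0.779,
    "HOUSEHOLD": 0.875,
    "HYGIENE": 0.876,
    "MEAT_FISH": 0.905,
    "PET_SUPPLIES": 0.987,
}

# Macro F1: the plain mean over categories, ignoring category size.
macro_f1 = mean(per_category_f1.values())

# The lowest-scoring category is the most likely source of misclassifications.
weakest = min(per_category_f1, key=per_category_f1.get)

print(f"Macro F1: {macro_f1:.3f}")
print("Weakest category:", weakest)
```

With the values listed, the macro average works out to roughly 0.865, and CONVENIENCE_FOOD is the weakest category. A support-weighted average could differ, since the per-category example counts are not given here.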

Usage

import spacy

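# spacy.load expects an installed package name or a local path;
# adjust this string if the pipeline is stored elsewhere on your machine.
nlp = spacy.load("juusopi/grocery-fi-textcat")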
doc = nlp("500 g omenaa")

# NER
for ent in doc.ents:
    print(ent.text, ent.label_)

# Text classification
best = max(doc.cats, key=doc.cats.get)
print("Category:", best)
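
Taking only the single best label hides how close the runner-up categories were. The sketch below ranks the scores instead; it assumes `doc.cats` is a plain mapping from label to score, as in spaCy. The helper name `top_categories`, its parameters, and the example scores are hypothetical and not part of this model.

```python
def top_categories(cats, k=3, threshold=0.0):
    """Return up to k (label, score) pairs with score >= threshold, best first."""
    ranked = sorted(cats.items(), key=lambda item: item[1], reverse=True)
    return [(label, score) for label, score in ranked if score >= threshold][:k]


# Hypothetical scores standing in for doc.cats; real values come from the model.
example_cats = {"FRUITS_VEGETABLES": 0.91, "DRY_GOODS": 0.05, "FROZEN": 0.04}

for label, score in top_categories(example_cats, k=2):
    print(f"{label}: {score:.2f}")
```

In practice, `top_categories(doc.cats)` could replace the `max(...)` call when a caller wants to flag ambiguous items, for example by checking whether the top two scores are close together.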

Model details
