# product-query-ner
Named entity recognition for product search queries. Identifies brand, product category, product name, and origin spans in free-text queries.
Fine-tuned from bltlab/queryner-bert-base-uncased, which was trained on Amazon ESCI queries. This model extends it with domain-specific vocabulary drawn from a European product database — brand names, multilingual product titles, and origin countries.
## Labels
The model predicts the full 17-type label set from the base queryner model. The four types most relevant to product search are:
| Label | HF tag | Example span |
|---|---|---|
| Brand | `B-creator` / `I-creator` | Ecover, Dr. Bronner's |
| Product category | `B-core_product_type` / `I-core_product_type` | washing up liquid, shampoo |
| Product name | `B-product_name` / `I-product_name` | Skin Food, Men 48H Deodorant |
| Origin | `B-origin` / `I-origin` | Germany, Italy |
All other queryner types (modifier, department, UoM, color, material, etc.) are preserved from the base model.
## Usage

### Via evident_decode (recommended)
The evident_decode Python package applies required pre/post-processing on top of the raw
pipeline: apostrophe normalisation, offset-based span reconstruction, and punctuation
collapsing. Using the raw pipeline without these fixes will produce degraded output on
queries containing apostrophes (e.g. Dr. Bronner's) or diacritics (e.g. Spülmittel).
```python
from evident_decode.ner import tag

entities = tag("Dr. Bronner's peppermint soap")
# [{"text": "Dr. Bronner's", "label": "brand"},
#  {"text": "peppermint soap", "label": "product category"}]

entities = tag("organic olive oil from Italy under €15")
# [{"text": "olive oil", "label": "product category"},
#  {"text": "Italy", "label": "origin"}]
```
### Raw transformers pipeline
```python
from transformers import pipeline

ner = pipeline("token-classification",
               model="thepian/product-query-ner",
               aggregation_strategy="simple")

results = ner("Ecover washing up liquid without palm oil")
# [{'entity_group': 'creator', 'word': 'Ecover', ...},
#  {'entity_group': 'core_product_type', 'word': 'washing up liquid', ...}]
```
## Training data
20,203 examples from three sources:
| Source | Examples | Notes |
|---|---|---|
| bltlab/queryner | 9,140 | Amazon ESCI queries; all 17 label types |
| Local domain fixtures | ~1,063 | Hand-annotated product search queries (incl. substitute-frame fixtures) |
| Synthetic DB fixtures | ~10,000 | Template-generated from brand/category/product vocabulary; includes 1,000 substitute-frame (multilingual) |
Synthetic examples are generated by `generate_db_dataset.py` from a European product database. Brand names come from EU-registered brands; product names are extracted from all language variants stored in `product.name` (en, de, fr, it, es, nl, and others). Product names that are exact matches of English category strings are excluded to avoid contradictory training signal.
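As an illustration, a minimal sketch of template-based generation with the category cross-reference filter. The template strings, vocabulary, and the `make_example` helper are hypothetical, not the actual `generate_db_dataset.py` code:

```python
import random

# Hypothetical templates and vocab; the real generator draws these from the DB.
TEMPLATES = ["{brand} {category}", "{brand} {product}", "{category} from {origin}"]
CATEGORIES = {"shampoo", "olive oil"}

def make_example(brands, products, origins, rng=random):
    # Cross-reference filter: drop product names that are exact (case-insensitive)
    # matches of an English category string, to avoid contradictory labels.
    products = [p for p in products if p.lower() not in CATEGORIES]
    template = rng.choice(TEMPLATES)
    return template.format(brand=rng.choice(brands),
                           category=rng.choice(sorted(CATEGORIES)),
                           product=rng.choice(products),
                           origin=rng.choice(origins))

print(make_example(["Ecover"], ["Skin Food", "Shampoo"], ["Italy"]))
```

A product named "Shampoo" is filtered out here, while "Skin Food" survives, which is the contradictory-signal rule described above.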
## Label balance: product name vs. category
The two most commonly confused labels are core_product_type (product category) and product_name
(specific named product). The model's only reliable cue for distinguishing them is positional:
text following a known brand is a candidate for product_name, while standalone noun phrases are
typically core_product_type. This positional signal is structural, not lexical — "Dove shampoo"
and "Dove Skin Food" look identical to the model at the template level.
### Why category dominates in training (~2:1 target)
Real product search queries are category-heavy by a large margin. Most users type "shampoo",
"olive oil", or "washing powder", not "Fuji Green Tea Refreshingly Hydrating Conditioner".
Training data should approximate inference-time distribution; over-representing product_name
creates a mismatch that degrades category precision on the majority of queries.
The base model (bltlab/queryner-bert-base-uncased) was trained on Amazon ESCI queries, which
are also category-heavy. The marginal value of additional core_product_type examples is lower
than the marginal value of product_name examples, but collapsing to 1:1 risks the model
labeling any noun phrase after a brand as product_name — including generic category words like
"shampoo" or "washing up liquid".
Current ratio: ~2.3:1 (core_product_type : product_name). Target: ~2:1.
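The ratio above can be measured directly from BIO-tagged training data by counting `B-` spans per type. A minimal sketch, assuming examples carry a `tags` list of BIO labels (the field name is hypothetical):

```python
from collections import Counter

def label_ratio(examples):
    """Count B- spans per entity type and return category:name ratio."""
    counts = Counter(tag.split("-", 1)[1]
                     for ex in examples
                     for tag in ex["tags"]
                     if tag.startswith("B-"))
    return counts["core_product_type"] / counts["product_name"]

data = [{"tags": ["B-creator", "B-core_product_type", "I-core_product_type"]},
        {"tags": ["B-core_product_type"]},
        {"tags": ["B-creator", "B-product_name"]}]
print(label_ratio(data))  # 2.0
```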
### Why going below 2:1 requires better data, not just more examples
Increasing product_name examples without addressing lexical quality introduces contradictory
signal:
- A product named "Shampoo" and a category called "shampoo" become competing labels for the same string. The model cannot resolve this without knowing whether the token is generic or specific — information that is not present in the query.
- The category cross-reference filter (dropping product names that are exact English category matches) addresses the worst cases, but morphological variants ("Shampoos", "Crème") and multi-language overlaps remain.
To move significantly below 2:1 safely, the product_name training data would need to satisfy:
| Requirement | Why |
|---|---|
| Lexically distinct from category vocabulary | Prevents the model learning a single label for identical strings |
| High word-count names (3+ tokens) | Single and two-token product names are indistinguishable from short category slugs by surface form alone |
| Brand diversity | The positional cue (brand precedes product name) only generalises if many different brands are paired with many different product names — a narrow brand set leads to brand-specific memorisation |
| Multilingual coverage proportional to expected query mix | Training on English product names only means the model will underperform on French/German/Italian queries even though multilingual product names exist in the DB |
| Minimal repetition | A product name seen 20 times with the same brand drowns signal from rarer names |
Until those conditions are met, `product_name_ratio` should stay at 0.25–0.30, with the ~2:1
overall ratio maintained by generating more total synthetic examples rather than by increasing
the ratio.
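To illustrate "scale volume, not ratio": with a fixed ratio parameter, more synthetic data changes the absolute counts but leaves the balance untouched. A simplified sketch (parameter names hypothetical; the remainder is treated as category-weighted examples, ignoring other labels):

```python
def plan_synthetic(total_examples, product_name_ratio=0.28):
    """Split a synthetic batch so the category:name balance is fixed by the
    ratio parameter; increasing total_examples adds data without shifting it."""
    n_name = round(total_examples * product_name_ratio)
    n_category = total_examples - n_name
    return n_category, n_name

print(plan_synthetic(10_000))   # (7200, 2800)
print(plan_synthetic(20_000))   # (14400, 5600) -- same balance, twice the data
```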
## Training procedure
- Base model: `bltlab/queryner-bert-base-uncased`
- Tokenizer: BERT WordPiece; subword tokens after the first in each word are masked (`-100`)
- Max sequence length: 128
- Label set: collected from training data (all 17 queryner types preserved)
- Optimiser: AdamW, weight decay 0.01, warmup ratio 0.1
- Segmented training: brand/product/origin first, then certification O-token signal at lower LR
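The subword masking mentioned above can be sketched in pure Python over precomputed `word_ids` (an illustration of the standard technique, not the repo's training code):

```python
def align_labels(word_ids, word_labels):
    """Map word-level labels onto subword tokens: only the first subword of
    each word keeps its label; later subwords and special tokens get -100,
    which the token-classification loss ignores."""
    labels, prev = [], None
    for wid in word_ids:
        if wid is None:
            labels.append(-100)               # [CLS], [SEP], padding
        elif wid != prev:
            labels.append(word_labels[wid])   # first subword of the word
        else:
            labels.append(-100)               # continuation subword
        prev = wid
    return labels

# "washing up liquid" -> [CLS] washing up li ##quid [SEP]
word_ids = [None, 0, 1, 2, 2, None]
print(align_labels(word_ids, [3, 4, 4]))  # [-100, 3, 4, 4, -100, -100]
```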
Typical segment configuration:

```text
Segment 1: epochs=3, lr=3e-5 (base → domain)
Segment 2: epochs=2, lr=1e-5 (add cert O-token signal)
Segment 3: epochs=2, lr=5e-6 (product name ratio increase)
Segment 4: epochs=2, lr=5e-6 (substitute-frame + multilingual, brand F1 0.698 → 0.897)
```
## Evaluation
Evaluated on 63 held-out domain fixtures (39 general + 24 substitute-frame / multilingual) with exact and partial span matching.
Segment 4 — 2 epochs, lr=5e-6, base=segment 3 checkpoint, 20,203 training examples (incl. substitute-frame):
| Label | P (partial) | R (partial) | F1 (partial) | F1 (exact) |
|---|---|---|---|---|
| brand | 0.929 | 0.867 | 0.897 | 0.897 |
| product category | 0.895 | 0.962 | 0.927 | 0.891 |
| product name | 0.875 | 0.700 | 0.778 | 0.556 |
| origin | 1.000 | 0.917 | 0.957 | 0.957 |
| overall | 0.915 | 0.900 | 0.908 | 0.874 |
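A minimal sketch of what exact vs. partial span matching means in the table above (spans as `(start, end, label)` tuples with half-open character offsets; the actual scoring script may differ):

```python
def exact_match(pred, gold):
    # Identical offsets and label.
    return pred == gold

def partial_match(pred, gold):
    # Same label and overlapping character offsets.
    (ps, pe, pl), (gs, ge, gl) = pred, gold
    return pl == gl and ps < ge and gs < pe

gold = (0, 13, "brand")   # "Dr. Bronner's"
pred = (0, 11, "brand")   # "Dr. Bronner" (truncated span)
print(exact_match(pred, gold), partial_match(pred, gold))  # False True
```

This is why partial F1 is never lower than exact F1: exact matches are a subset of partial matches, and the gap (e.g. 0.778 vs 0.556 for product name) measures boundary errors rather than missed entities.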
Key remaining gaps:
- Ecover brand FN (4 fixtures): `country_code IS NULL` in the DB excludes it from the training data generator.
- Deutschland/Allemagne/Duitsland origin FN: the generator uses English country names only.
Resolved at inference time (not model gaps):
- Apostrophe artifact (`dr. bronner ' s`): handled by the `evident_decode.ner` apostrophe normaliser before tokenisation.
- Umlaut span mismatch (`Spülmittel` → `spulmittel`): resolved by offset-based span reconstruction in `evident_decode.ner`.
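Both inference-time fixes can be sketched as follows (function names and regex are illustrative; the actual `evident_decode.ner` implementation may differ):

```python
import re

def normalize_apostrophes(text: str) -> str:
    # Rejoin possessives the tokenizer split apart: "bronner ' s" -> "bronner's".
    return re.sub(r"\s*'\s*s\b", "'s", text)

def reconstruct_span(query: str, entity: dict) -> str:
    # Slice the original query by character offsets instead of trusting the
    # lowercased, diacritic-stripped `word` field from the pipeline output.
    return query[entity["start"]:entity["end"]]

print(normalize_apostrophes("dr. bronner ' s peppermint soap"))
# dr. bronner's peppermint soap
print(reconstruct_span("Spülmittel kaufen", {"start": 0, "end": 10}))
# Spülmittel
```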
## Quantized variant
An int8 ONNX-quantized variant (104.6 MB vs 415 MB full) is available at
`thepian/product-query-ner-int8`. It uses dynamic int8 quantization via ONNX Runtime
(avx512_vnni) and was validated against the acceptance fixture set (≤1% F1 regression
allowed). Requires `optimum[onnxruntime]` to load.
## Paper
A paper describing the product query decoding architecture (deterministic decoder + NER + NLI intent) is in preparation. Link will be added here on publication.
## Limitations
- Extraction patterns are primarily English; avoidance frames in other languages (`ohne`, `sans`, `senza`) are not NER targets — they are handled by a separate parser
- Multilingual product names are included in training, but evaluation is English-only
- Origin recognition covers ~13 European countries drawn from product records; global coverage is partial
- Barcode and price extraction are not NER tasks — handled by a dedicated parser
## Citation
If you use this model, please cite the base model:
```bibtex
@misc{queryner,
  author    = {Björklund, Love and Ljunglöf, Peter},
  title     = {QueryNER: Named Entity Recognition for Product Search Queries},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/bltlab/queryner-bert-base-uncased}
}
```