# product-query-ner
Named entity recognition for product search queries. Identifies brand, product category, product name, and origin spans in free-text queries.
Fine-tuned from bltlab/queryner-bert-base-uncased, which was trained on Amazon ESCI queries. This model extends it with domain-specific vocabulary drawn from a European product database — brand names, multilingual product titles, and origin countries.
## Labels
The model predicts the full 17-type label set from the base queryner model. The four types most relevant to product search are:
| Label | HF tag | Example span |
|---|---|---|
| Brand | `B-creator` / `I-creator` | Ecover, Dr. Bronner's |
| Product category | `B-core_product_type` / `I-core_product_type` | washing up liquid, shampoo |
| Product name | `B-product_name` / `I-product_name` | Skin Food, Men 48H Deodorant |
| Origin | `B-origin` / `I-origin` | Germany, Italy |
All other queryner types (modifier, department, UoM, color, material, etc.) are preserved from the base model.
## Usage

### Via evident_decode (recommended)
The evident_decode Python package applies required pre/post-processing on top of the raw
pipeline: apostrophe normalisation, offset-based span reconstruction, and punctuation
collapsing. Using the raw pipeline without these fixes will produce degraded output on
queries containing apostrophes (e.g. Dr. Bronner's) or diacritics (e.g. Spülmittel).
```python
from evident_decode.ner import tag

entities = tag("Dr. Bronner's peppermint soap")
# [{"text": "Dr. Bronner's", "label": "brand"},
#  {"text": "peppermint soap", "label": "product category"}]

entities = tag("organic olive oil from Italy under €15")
# [{"text": "olive oil", "label": "product category"},
#  {"text": "Italy", "label": "origin"}]
```
### Raw transformers pipeline
```python
from transformers import pipeline

ner = pipeline("token-classification",
               model="thepian/product-query-ner",
               aggregation_strategy="simple")

results = ner("Ecover washing up liquid without palm oil")
# [{'entity_group': 'creator', 'word': 'Ecover', ...},
#  {'entity_group': 'core_product_type', 'word': 'washing up liquid', ...}]
```
## Training data
20,203 examples from three sources:
| Source | Examples | Notes |
|---|---|---|
| bltlab/queryner | 9,140 | Amazon ESCI queries; all 17 label types |
| Local domain fixtures | ~1,063 | Hand-annotated product search queries (incl. substitute-frame fixtures) |
| Synthetic DB fixtures | ~10,000 | Template-generated from brand/category/product vocabulary; includes 1,000 substitute-frame (multilingual) |
Synthetic examples are generated by `generate_db_dataset.py` from a European product database. Brand names come from EU-registered brands; product names are extracted from all language variants stored in `product.name` (en, de, fr, it, es, nl, and others). Product names that are exact matches of English category strings are excluded to avoid contradictory training signal.
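As an illustration, a minimal sketch of template-based generation with the category cross-reference filter. The template strings, vocabulary, and the `make_example` helper are hypothetical, not the actual `generate_db_dataset.py` code:

```python
import random

# Hypothetical templates and vocab; the real generator draws these from the DB.
TEMPLATES = ["{brand} {category}", "{brand} {product}", "{category} from {origin}"]
CATEGORIES = {"shampoo", "olive oil"}

def make_example(brands, products, origins, rng=random):
    # Cross-reference filter: drop product names that are exact (case-insensitive)
    # matches of an English category string, to avoid contradictory labels.
    products = [p for p in products if p.lower() not in CATEGORIES]
    template = rng.choice(TEMPLATES)
    return template.format(brand=rng.choice(brands),
                           category=rng.choice(sorted(CATEGORIES)),
                           product=rng.choice(products),
                           origin=rng.choice(origins))

print(make_example(["Ecover"], ["Skin Food", "Shampoo"], ["Italy"]))
```

A product named "Shampoo" is filtered out here, while "Skin Food" survives, which is the contradictory-signal rule described above.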
## Label balance: product name vs. category
The two most commonly confused labels are core_product_type (product category) and product_name
(specific named product). The model's only reliable cue for distinguishing them is positional:
text following a known brand is a candidate for product_name, while standalone noun phrases are
typically core_product_type. This positional signal is structural, not lexical — "Dove shampoo"
and "Dove Skin Food" look identical to the model at the template level.
### Why category dominates in training (~2:1 target)
Real product search queries are category-heavy by a large margin. Most users type "shampoo",
"olive oil", or "washing powder", not "Fuji Green Tea Refreshingly Hydrating Conditioner".
Training data should approximate inference-time distribution; over-representing product_name
creates a mismatch that degrades category precision on the majority of queries.
The base model (bltlab/queryner-bert-base-uncased) was trained on Amazon ESCI queries, which
are also category-heavy. The marginal value of additional core_product_type examples is lower
than the marginal value of product_name examples, but collapsing to 1:1 risks the model
labeling any noun phrase after a brand as product_name — including generic category words like
"shampoo" or "washing up liquid".
Current ratio: ~2.3:1 (core_product_type : product_name). Target: ~2:1.
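The ratio above can be measured directly from BIO-tagged training data by counting `B-` spans per type. A minimal sketch, assuming examples carry a `tags` list of BIO labels (the field name is hypothetical):

```python
from collections import Counter

def label_ratio(examples):
    """Count B- spans per entity type and return category:name ratio."""
    counts = Counter(tag.split("-", 1)[1]
                     for ex in examples
                     for tag in ex["tags"]
                     if tag.startswith("B-"))
    return counts["core_product_type"] / counts["product_name"]

data = [{"tags": ["B-creator", "B-core_product_type", "I-core_product_type"]},
        {"tags": ["B-core_product_type"]},
        {"tags": ["B-creator", "B-product_name"]}]
print(label_ratio(data))  # 2.0
```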
### Why going below 2:1 requires better data, not just more examples
Increasing product_name examples without addressing lexical quality introduces contradictory
signal:
- A product named "Shampoo" and a category called "shampoo" become competing labels for the same string. The model cannot resolve this without knowing whether the token is generic or specific — information that is not present in the query.
- The category cross-reference filter (dropping product names that are exact English category matches) addresses the worst cases, but morphological variants ("Shampoos", "Crème") and multi-language overlaps remain.
To move significantly below 2:1 safely, the product_name training data would need to satisfy:
| Requirement | Why |
|---|---|
| Lexically distinct from category vocabulary | Prevents the model learning a single label for identical strings |
| High word-count names (3+ tokens) | Single and two-token product names are indistinguishable from short category slugs by surface form alone |
| Brand diversity | The positional cue (brand precedes product name) only generalises if many different brands are paired with many different product names — a narrow brand set leads to brand-specific memorisation |
| Multilingual coverage proportional to expected query mix | Training on English product names only means the model will underperform on French/German/Italian queries even though multilingual product names exist in the DB |
| Minimal repetition | A product name seen 20 times with the same brand drowns signal from rarer names |
Until those conditions are met, `product_name_ratio` should stay at 0.25–0.30, with the ~2:1
overall ratio maintained by generating more total synthetic examples rather than by increasing
the ratio.
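To illustrate "scale volume, not ratio": with a fixed ratio parameter, more synthetic data changes the absolute counts but leaves the balance untouched. A simplified sketch (parameter names hypothetical; the remainder is treated as category-weighted examples, ignoring other labels):

```python
def plan_synthetic(total_examples, product_name_ratio=0.28):
    """Split a synthetic batch so the category:name balance is fixed by the
    ratio parameter; increasing total_examples adds data without shifting it."""
    n_name = round(total_examples * product_name_ratio)
    n_category = total_examples - n_name
    return n_category, n_name

print(plan_synthetic(10_000))   # (7200, 2800)
print(plan_synthetic(20_000))   # (14400, 5600) -- same balance, twice the data
```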
## Training procedure
- Base model: `bltlab/queryner-bert-base-uncased`
- Tokenizer: BERT WordPiece; subword tokens after the first in each word are masked (`-100`)
- Max sequence length: 128
- Label set: collected from training data (all 17 queryner types preserved)
- Optimiser: AdamW, weight decay 0.01, warmup ratio 0.1
- Segmented training: brand/product/origin first, then certification O-token signal at lower LR
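The subword masking mentioned above can be sketched in pure Python over precomputed `word_ids` (an illustration of the standard technique, not the repo's training code):

```python
def align_labels(word_ids, word_labels):
    """Map word-level labels onto subword tokens: only the first subword of
    each word keeps its label; later subwords and special tokens get -100,
    which the token-classification loss ignores."""
    labels, prev = [], None
    for wid in word_ids:
        if wid is None:
            labels.append(-100)               # [CLS], [SEP], padding
        elif wid != prev:
            labels.append(word_labels[wid])   # first subword of the word
        else:
            labels.append(-100)               # continuation subword
        prev = wid
    return labels

# "washing up liquid" -> [CLS] washing up li ##quid [SEP]
word_ids = [None, 0, 1, 2, 2, None]
print(align_labels(word_ids, [3, 4, 4]))  # [-100, 3, 4, 4, -100, -100]
```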
Typical segment configuration:

```text
Segment 1: epochs=3, lr=3e-5 (base → domain)
Segment 2: epochs=2, lr=1e-5 (add cert O-token signal)
Segment 3: epochs=2, lr=5e-6 (product name ratio increase)
Segment 4: epochs=2, lr=5e-6 (substitute-frame + multilingual, brand F1 0.698 → 0.897)
```
## Evaluation
Evaluated on 63 held-out domain fixtures (39 general + 24 substitute-frame / multilingual) with exact and partial span matching.
Segment 4 — 2 epochs, lr=5e-6, base=segment 3 checkpoint, 20,203 training examples (incl. substitute-frame):
| Label | P (partial) | R (partial) | F1 (partial) | F1 (exact) |
|---|---|---|---|---|
| brand | 0.929 | 0.867 | 0.897 | 0.897 |
| product category | 0.895 | 0.962 | 0.927 | 0.891 |
| product name | 0.875 | 0.700 | 0.778 | 0.556 |
| origin | 1.000 | 0.917 | 0.957 | 0.957 |
| overall | 0.915 | 0.900 | 0.908 | 0.874 |
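A minimal sketch of what exact vs. partial span matching means in the table above (spans as `(start, end, label)` tuples with half-open character offsets; the actual scoring script may differ):

```python
def exact_match(pred, gold):
    # Identical offsets and label.
    return pred == gold

def partial_match(pred, gold):
    # Same label and overlapping character offsets.
    (ps, pe, pl), (gs, ge, gl) = pred, gold
    return pl == gl and ps < ge and gs < pe

gold = (0, 13, "brand")   # "Dr. Bronner's"
pred = (0, 11, "brand")   # "Dr. Bronner" (truncated span)
print(exact_match(pred, gold), partial_match(pred, gold))  # False True
```

This is why partial F1 is never lower than exact F1: exact matches are a subset of partial matches, and the gap (e.g. 0.778 vs 0.556 for product name) measures boundary errors rather than missed entities.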
Key remaining gaps:
- Ecover brand FN (4 fixtures): `country_code IS NULL` in the DB excludes it from the training data generator.
- Deutschland/Allemagne/Duitsland origin FN: the generator uses English country names only.
Resolved at inference time (not model gaps):
- Apostrophe artifact (`dr. bronner ' s`): handled by the `evident_decode.ner` apostrophe normaliser before tokenisation.
- Umlaut span mismatch (`Spülmittel` → `spulmittel`): resolved by offset-based span reconstruction in `evident_decode.ner`.
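Both inference-time fixes can be sketched as follows (function names and regex are illustrative; the actual `evident_decode.ner` implementation may differ):

```python
import re

def normalize_apostrophes(text: str) -> str:
    # Rejoin possessives the tokenizer split apart: "bronner ' s" -> "bronner's".
    return re.sub(r"\s*'\s*s\b", "'s", text)

def reconstruct_span(query: str, entity: dict) -> str:
    # Slice the original query by character offsets instead of trusting the
    # lowercased, diacritic-stripped `word` field from the pipeline output.
    return query[entity["start"]:entity["end"]]

print(normalize_apostrophes("dr. bronner ' s peppermint soap"))
# dr. bronner's peppermint soap
print(reconstruct_span("Spülmittel kaufen", {"start": 0, "end": 10}))
# Spülmittel
```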
## Quantized variant
An int8 ONNX-quantized variant (104.6 MB vs 415 MB full) is available at
`thepian/product-query-ner-int8`. It uses dynamic int8 quantization via ONNX Runtime
(avx512_vnni) and was validated against the acceptance fixture set (≤1% F1 regression
allowed). Requires `optimum[onnxruntime]` to load.
## Paper
A paper describing the product query decoding architecture (deterministic decoder + NER + NLI intent) is in preparation. Link will be added here on publication.
## Limitations
- Extraction patterns are primarily English; avoidance frames in other languages (`ohne`, `sans`, `senza`) are not NER targets — they are handled by a separate parser
- Multilingual product names are included in training, but evaluation is English-only
- Origin recognition covers ~13 European countries drawn from product records; global coverage is partial
- Barcode and price extraction are not NER tasks — handled by a dedicated parser
## Citation
If you use this model, please cite the base model:
```bibtex
@misc{queryner,
  author    = {Björklund, Love and Ljunglöf, Peter},
  title     = {QueryNER: Named Entity Recognition for Product Search Queries},
  year      = {2024},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/bltlab/queryner-bert-base-uncased}
}
```