Hebrew Grocery Item Classifier v2
A fine-tuned multilingual-MiniLMv2-L6-mnli-xnli model for classifying Hebrew grocery shopping list items into 12 supermarket categories.
It improves on v1 with a larger, more diverse training set and targeted augmentation for weak classes.
Fine-tuning code: https://github.com/davidit17/shopping_list_bot
What's new in v2
- Larger dataset: 3043 total items (2434 train / 609 test), up from ~600 in v1
- 4 training sources: manual lists, Excel grocery data, Gemini-labelled web data, and a ChatGPT-labelled set
- Synthetic augmentation: rule-based Hebrew item generator to balance under-represented categories
- Weak-class booster: extra hand-curated examples for משקאות, קפואים וסלטים, and רטבים וממרחים
- Per-class F1 tracking: evaluation now reports per-category F1 to surface weak spots
Categories
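- מוצרי חלב וביצים (dairy products and eggs)
- ירקות ופירות (vegetables and fruit)
- לחם מאפים ודגנים (bread, baked goods and grains)
- חטיפים ומתוקים (snacks and sweets)
- ניקיון טיפוח וחד פעמי (cleaning, personal care and disposables)
- משקאות (beverages)
- מוצרים לאפייה (baking products)
- בשר ודגים (meat and fish)
- קפואים וסלטים (frozen foods and salads)
- יבשים (dry goods)
- רטבים וממרחים (sauces and spreads)
- אחר (other)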
Usage
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="davidit17/minilm-grocery-hebrew-v2",
)

# yellow cheese, cherry tomatoes, shampoo, Coca-Cola, chicken breast
items = ["גבינה צהובה", "עגבניות שרי", "שמפו", "קוקה קולה", "חזה עוף"]
for item in items:
    result = classifier(item)[0]  # top prediction: {"label": ..., "score": ...}
    print(item, "→", result["label"], f"({result['score']:.2f})")
With confidence threshold (recommended)
CONFIDENCE_THRESHOLD = 0.5

def predict(text: str) -> str:
    # Fall back to the catch-all category "אחר" (other) when the model is unsure.
    result = classifier(text)[0]
    return result["label"] if result["score"] >= CONFIDENCE_THRESHOLD else "אחר"
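To sort a whole shopping list at once, the same thresholding rule can be applied item by item and the results grouped by category. The sketch below is illustrative only: `group_by_category`, `FALLBACK_LABEL`, and the `classify` parameter are hypothetical names, and `classify` stands for any callable that behaves like the pipeline object (text in, list of `{"label", "score"}` dicts out).

```python
from collections import defaultdict
from typing import Callable, Dict, List

FALLBACK_LABEL = "אחר"  # catch-all category ("other")


def group_by_category(
    items: List[str],
    classify: Callable[[str], List[dict]],
    threshold: float = 0.5,
) -> Dict[str, List[str]]:
    """Group items by predicted category; low-confidence items go to the fallback."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for item in items:
        top = classify(item)[0]  # first (top) prediction for this item
        label = top["label"] if top["score"] >= threshold else FALLBACK_LABEL
        grouped[label].append(item)
    return dict(grouped)
```

Passing the pipeline object as `classify` should yield a dict that maps each category name to the items assigned to it, which is a convenient shape for rendering a list sorted by supermarket aisle.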
Training
| Parameter | Value |
| --- | --- |
| Base model | multilingual-MiniLMv2-L6-mnli-xnli |
| Max epochs | 50 (early stopping, patience=8) |
| Learning rate | 2e-5 |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Batch size | 16 |
| Max sequence length | 32 |
| Train / test split | 80% / 20% stratified |
| Hardware | GPU (CUDA) |
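The 80% / 20% stratified split listed in the table keeps each category's share roughly equal in the train and test sets. The following is a minimal pure-Python sketch of that idea, not the project's actual splitting code: `stratified_split`, the `Example` alias, and the seed value are assumptions made for illustration.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (item text, category label)


def stratified_split(
    examples: List[Example], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[Example], List[Example]]:
    """Split so each category sends about `test_fraction` of its items to the test set."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    by_label: Dict[str, List[Example]] = defaultdict(list)
    for example in examples:
        by_label[example[1]].append(example)

    train: List[Example] = []
    test: List[Example] = []
    for label in sorted(by_label):  # sorted so the result is deterministic
        group = by_label[label]
        rng.shuffle(group)
        # At least one test item per category, so every class is evaluated.
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Because rounding happens per category, the overall test share can differ slightly from exactly 20%. In practice, scikit-learn's `train_test_split(..., stratify=labels)` is the more common way to get this behaviour.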
Data sources
| Source | Description |
| --- | --- |
| Source A | Manual Hebrew grocery lists from an Israeli forum |
| Source B | Excel-format grocery list, cleaned and melted |
| Source C | Web grocery list labelled with Gemini |
| Source D | ChatGPT-labelled items across all 12 categories |
| Source E | Synthetic items generated with a rule-based Hebrew augmentor |
| Source F | Hand-curated booster set for weak categories |
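The rule-based augmentor behind Source E is not described in detail here, so the following is only a guess at what such a generator could look like: it expands base item names with quantity templates, and each variant would keep the category of its base item. The `TEMPLATES` list and the `augment` function are hypothetical and are not taken from the project's code.

```python
import random
from typing import List

# Hypothetical quantity templates; "{item}" is replaced by the base item name.
TEMPLATES = [
    "{item}",
    "2 {item}",
    "{item} x3",
    "{item} 500 גרם",  # "500 grams"
    '{item} 1 ק"ג',  # "1 kg"
]


def augment(base_items: List[str], n_samples: int, seed: int = 0) -> List[str]:
    """Return up to `n_samples` distinct template variants of the base items."""
    rng = random.Random(seed)
    # Build every item/template combination once, de-duplicated and ordered.
    candidates = sorted({t.format(item=item) for item in base_items for t in TEMPLATES})
    rng.shuffle(candidates)
    return candidates[:n_samples]
```

Sampling more heavily from under-represented categories is one way a generator like this could be used to balance the class distribution.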
Evaluation
Overall
| Metric | Base (zero-shot) | Fine-tuned v2 |
| --- | --- | --- |
| Accuracy | 0.1494 | 0.8818 |
| Weighted F1 | 0.1363 | 0.8808 |
Per-class F1 (fine-tuned v2)
| Category | F1 |
| --- | --- |
| מוצרי חלב וביצים | 0.9057 |
| ירקות ופירות | 0.8696 |
| לחם מאפים ודגנים | 0.8767 |
| חטיפים ומתוקים | 0.9091 |
| ניקיון טיפוח וחד פעמי | 0.9020 |
| משקאות | 0.8750 |
| מוצרים לאפייה | 0.8687 |
| בשר ודגים | 0.9333 |
| קפואים וסלטים | 0.8632 |
| יבשים | 0.8000 |
| רטבים וממרחים | 0.8913 |
| אחר | 0.9024 |
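For readers who want to reproduce this kind of report on their own labelled items, per-class F1 and support-weighted F1 can be computed from paired true and predicted labels. The sketch below is a plain-Python illustration using F1 = 2·TP / (2·TP + FP + FN); `per_class_f1` is a hypothetical helper, not the evaluation script used for the numbers above.

```python
from collections import Counter
from typing import Dict, List, Tuple


def per_class_f1(y_true: List[str], y_pred: List[str]) -> Tuple[Dict[str, float], float]:
    """Return ({label: F1}, support-weighted F1) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # the predicted class gains a false positive
            fn[true] += 1  # the true class gains a false negative

    support = Counter(y_true)  # number of true examples per class
    scores: Dict[str, float] = {}
    for label in support:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    weighted = sum(scores[label] * support[label] for label in support) / len(y_true)
    return scores, weighted
```

scikit-learn's `f1_score` with `average=None` or `average="weighted"` provides the same metrics and is the usual choice for larger evaluations.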
Limitations
- Primarily covers Israeli supermarket conventions and Hebrew product naming.
- Brand-specific or very niche items may fall back to אחר (other).
- Low-confidence predictions (score < 0.5) should be treated as אחר.
- Synthetic training examples may not fully reflect natural shopping list variation.
- The אחר category is a catch-all and may have lower precision than domain-specific categories.