AlephBERT Hebrew Shopping Intent Classifier
Hebrew intent classification (סיווג כוונות בעברית)
This is a small Hebrew text classifier. You give it a short message like "תוסיף חלב וביצים" ("add milk and eggs") and it tells you what the person wants, out of 17 possible intents (add an item, show the list, clear the list, and so on). It is a fine-tune of AlephBERT, a Hebrew BERT model, and I built it as a learning project.
I am not an ML researcher, and this is not a Hebrew NLP benchmark. It is a worked example of turning one narrow problem into a small fine-tuned model and checking it honestly. The full tutorial and code are in the GitHub repo: github.com/spivi/alephbert-intent-he.
Important limitation: the train and test data are synthetic paraphrases, not real chat traffic. The split avoids leakage by holding out source seeds before paraphrasing, but read the numbers as a controlled experiment, not as proof of real-world accuracy.
Try it
```python
from transformers import pipeline

clf = pipeline("text-classification", model="spivi87/alephbert-intent-he", top_k=3)
clf("תקנה ביצים")
# [{'label': 'GROCERY_REQUEST', 'score': 0.90}, ...]
```
You get readable label names back (not LABEL_0). The mapping from numbers to
names is saved inside config.json. If you would rather run without PyTorch, the
GitHub repo has a one-command ONNX export.
There is also a live demo you can try in the browser, with no setup: spivi87/alephbert-intent-he-demo.
What it is for
It sorts short Hebrew messages from a shopping or grocery chat into one of 17 intents: add an item, show the list, clear the list, a recipe link, group admin actions, and so on (the full list is in the table further down).
It also works as a starting point if you want to fine-tune your own Hebrew classifier for a different topic. The weights are already comfortable with short, informal Hebrew that has typos, emoji, and a little English mixed in.
A practical tip: I use about 0.7 as a confidence heuristic in the demo. I did not
calibrate the probabilities, so treat it as an operational rule of thumb, not a
proven threshold. Below it, fall back to a larger model or treat the message as
OTHER.
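The rule of thumb above can be sketched as a small helper. This is only an illustration: `route_intent` and its `threshold` parameter are hypothetical names, the input is assumed to be a list of `{"label", "score"}` dicts like the pipeline output in the "Try it" example, and 0.7 is the uncalibrated heuristic, not a tuned value.

```python
def route_intent(predictions, threshold=0.7):
    """Return the top label, or "OTHER" when the top score is below threshold.

    `predictions` is assumed to be a list of {"label": str, "score": float}
    dicts, like the output of the text-classification pipeline.
    """
    if not predictions:
        return "OTHER"
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= threshold else "OTHER"


# Hand-written example scores (not real model output).
confident = [{"label": "GROCERY_REQUEST", "score": 0.90},
             {"label": "REMOVE_ITEM", "score": 0.05}]
unsure = [{"label": "BUG_REPORT", "score": 0.41},
          {"label": "OTHER", "score": 0.38}]

print(route_intent(confident))  # GROCERY_REQUEST
print(route_intent(unsure))     # OTHER
```

If you prefer to hand low-confidence messages to a larger model instead, replace the `"OTHER"` return value with a call to that fallback.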
Results
Mean over 3 training runs (seeds 42, 43, 44), on a held-out test set of 374 messages (22 per intent). The published checkpoint is the seed 42 run.
- Accuracy: 0.773 ± 0.008
- Macro F1: 0.763 ± 0.008
Per-intent metrics (seed 42)
| Intent | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| GROCERY_REQUEST | 0.800 | 0.909 | 0.851 | 22 |
| RECIPE_URL | 0.900 | 0.818 | 0.857 | 22 |
| LIST_QUERY | 0.760 | 0.864 | 0.809 | 22 |
| CLEAR_LIST | 0.722 | 0.591 | 0.650 | 22 |
| REMOVE_ITEM | 0.750 | 0.818 | 0.783 | 22 |
| PARTIAL_COMPLETION | 0.909 | 0.909 | 0.909 | 22 |
| GROUP_INFO | 1.000 | 0.545 | 0.706 | 22 |
| GET_INVITE_CODE | 0.786 | 1.000 | 0.880 | 22 |
| CREATE_INVITE | 0.619 | 0.591 | 0.605 | 22 |
| RENAME_GROUP | 0.957 | 1.000 | 0.978 | 22 |
| LEAVE_GROUP | 0.714 | 0.909 | 0.800 | 22 |
| NOTIFICATION_SETTINGS | 0.857 | 0.545 | 0.667 | 22 |
| REVOKE_INVITE | 0.808 | 0.955 | 0.875 | 22 |
| RECIPE_SEARCH | 0.808 | 0.955 | 0.875 | 22 |
| UPDATE_QUANTITY | 1.000 | 0.955 | 0.977 | 22 |
| BUG_REPORT | 0.375 | 0.273 | 0.316 | 22 |
| OTHER | 0.600 | 0.682 | 0.638 | 22 |
The full report and confusion matrix are in EVALUATION.md and confusion_matrix.png.
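As a sanity check, the headline numbers can be recomputed from the per-intent precision and recall values listed above. The sketch below is not the evaluation script from the repo; it only redoes the arithmetic on the rounded table values, so the last digit may differ slightly. Under those assumptions it gives a macro F1 of about 0.775 and an accuracy of about 0.783 for seed 42.

```python
# (precision, recall) per intent, copied from the seed-42 table.
# Every intent has support 22.
per_intent = {
    "GROCERY_REQUEST": (0.800, 0.909),
    "RECIPE_URL": (0.900, 0.818),
    "LIST_QUERY": (0.760, 0.864),
    "CLEAR_LIST": (0.722, 0.591),
    "REMOVE_ITEM": (0.750, 0.818),
    "PARTIAL_COMPLETION": (0.909, 0.909),
    "GROUP_INFO": (1.000, 0.545),
    "GET_INVITE_CODE": (0.786, 1.000),
    "CREATE_INVITE": (0.619, 0.591),
    "RENAME_GROUP": (0.957, 1.000),
    "LEAVE_GROUP": (0.714, 0.909),
    "NOTIFICATION_SETTINGS": (0.857, 0.545),
    "REVOKE_INVITE": (0.808, 0.955),
    "RECIPE_SEARCH": (0.808, 0.955),
    "UPDATE_QUANTITY": (1.000, 0.955),
    "BUG_REPORT": (0.375, 0.273),
    "OTHER": (0.600, 0.682),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


f1_scores = {name: f1(p, r) for name, (p, r) in per_intent.items()}

# Macro F1 is the unweighted mean of the per-class F1 scores.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

# With equal support per class, accuracy equals the mean of the recalls.
accuracy = sum(r for _, r in per_intent.values()) / len(per_intent)

print(f"macro F1 = {macro_f1:.3f}")
print(f"accuracy = {accuracy:.3f}")
```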
How it compares to a zero-shot LLM
On the same 374-message test set:
| Approach | Accuracy | Cost per 1,000 messages |
|---|---|---|
| Random guessing | 0.0668 | $0 |
| Always the most common label | 0.0588 | $0 |
| Keyword rules (hand-written) | 0.2487 | $0 |
| GPT-4o-mini, zero-shot | 0.5722 | about $0.05 (estimate, see note) |
| AlephBERT fine-tune (this model, seed 42) | 0.7834 | $0 |
On this narrow synthetic grocery-intent task the fine-tune scored higher than my zero-shot GPT-4o-mini baseline on the same split. That does not mean it is a better general Hebrew model. It means that for a narrow, repeated task, a small task-specific model can be cheaper, private, and surprisingly effective. The GPT-4o-mini cost is an estimate for my zero-shot prompt and these short messages; OpenAI bills per token, not per message, so your cost will vary with prompt length, number of labels, and output format.
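To make the accuracies in the comparison table more concrete, they can be converted back into counts of correctly classified messages. This is plain arithmetic on the published numbers, assuming the balanced 374-message test set (22 messages for each of the 17 intents) described above.

```python
TEST_SIZE = 374    # messages in the held-out test set
NUM_INTENTS = 17   # balanced: 22 messages per intent

accuracies = {
    "Random guessing": 0.0668,
    "Always the most common label": 0.0588,
    "Keyword rules (hand-written)": 0.2487,
    "GPT-4o-mini, zero-shot": 0.5722,
    "AlephBERT fine-tune (seed 42)": 0.7834,
}

# Number of test messages each approach got right.
correct = {name: round(acc * TEST_SIZE) for name, acc in accuracies.items()}
for name, n in correct.items():
    print(f"{name}: {n}/{TEST_SIZE}")

# On a balanced set, always predicting one label gets exactly one class right.
majority_baseline = (TEST_SIZE // NUM_INTENTS) / TEST_SIZE
print(f"majority baseline = {majority_baseline:.4f}")
```

Read this way, the fine-tune gets 293 of 374 messages right, against 214 for the zero-shot baseline.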
How it was trained (short version)
The training data is synthetic. I started from a handful of Hebrew example
sentences per intent and asked an LLM to rewrite each one into many variations.
After testing the first version I found gaps and fed the failures back into the
data: "buy X" requests like "תקנה חלב" ("buy milk") were misread, and short single-item requests
like "תקנה ביצים" ("buy eggs") were confused with removing an item. I added hand-authored
examples to cover the missing phrasings and retrained. No real user messages were
used, so there is no private data. A sample is in the GitHub repo, under
data/sample.jsonl.
The train/test split happens at the seed level, before the rewriting step. For each intent, 2 seeds are held out completely, and the test set only contains rewrites of those held-out seeds. Splitting after rewriting would let rewrites of the same sentence land on both sides, so the model would effectively have seen the test messages and the accuracy would look better than it is.
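The idea of splitting before rewriting can be sketched as follows. This is a minimal illustration, not the script from the repo: `split_seeds` and `paraphrase` are hypothetical names, the seed sentences are placeholders, and `paraphrase` stands in for the LLM rewriting step.

```python
import random


def split_seeds(seeds_by_intent, held_out_per_intent=2, rng_seed=42):
    """Split seed sentences per intent BEFORE any paraphrasing happens."""
    rng = random.Random(rng_seed)
    train, test = {}, {}
    for intent, seeds in seeds_by_intent.items():
        shuffled = list(seeds)
        rng.shuffle(shuffled)
        test[intent] = shuffled[:held_out_per_intent]
        train[intent] = shuffled[held_out_per_intent:]
    return train, test


def paraphrase(seed, n):
    """Placeholder for the LLM rewriting step (hypothetical)."""
    return [f"{seed} (variation {i})" for i in range(n)]


# Placeholder seeds; the real ones are Hebrew example sentences.
seeds_by_intent = {
    "GROCERY_REQUEST": ["seed A", "seed B", "seed C", "seed D", "seed E"],
    "LIST_QUERY": ["seed F", "seed G", "seed H", "seed I"],
}

train_seeds, test_seeds = split_seeds(seeds_by_intent)

# Paraphrase each side separately, so no seed has rewrites on both sides.
train_rows = [(text, intent)
              for intent, seeds in train_seeds.items()
              for seed in seeds
              for text in paraphrase(seed, 3)]
test_rows = [(text, intent)
             for intent, seeds in test_seeds.items()
             for seed in seeds
             for text in paraphrase(seed, 3)]

print(len(train_rows), len(test_rows))
```

Because the split is made on seeds, every rewrite in `test_rows` descends from a sentence the training side never saw in any form.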
Training settings
| Setting | Value |
|---|---|
| Base model | onlplab/alephbert-base |
| Optimizer | AdamW (HF Trainer default) |
| Learning rate | 2e-5 (linear warmup, linear decay) |
| Batch size | 16 (train) / 32 (eval) |
| Max sequence length | 128 tokens |
| Max epochs | 10 (early stopping on eval_accuracy, patience 3) |
| Seeds | 42, 43, 44 (seed 42 is the published checkpoint) |
The full step-by-step instructions are in the GitHub repo.
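The early-stopping setting in the table (patience 3 on eval_accuracy, at most 10 epochs) can be illustrated with a small simulation. This is a simplified sketch of the general technique, not the Trainer's own implementation or the code from the repo; `epochs_run` is a hypothetical helper and the accuracy curve is made up.

```python
def epochs_run(eval_accuracies, patience=3, max_epochs=10):
    """Return (epochs actually run, best epoch), both counted from 1.

    Training stops once accuracy has failed to improve on the best
    value for `patience` evaluations in a row.
    """
    best_acc = float("-inf")
    best_epoch = 0
    stale = 0
    for epoch, acc in enumerate(eval_accuracies[:max_epochs], start=1):
        if acc > best_acc:
            best_acc, best_epoch, stale = acc, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch, best_epoch
    return min(len(eval_accuracies), max_epochs), best_epoch


# Made-up accuracy curve, for illustration only (not from the real runs).
curve = [0.55, 0.68, 0.74, 0.77, 0.76, 0.77, 0.75, 0.74]
print(epochs_run(curve))  # (7, 4)
```

In this example the best value is reached at epoch 4, the next three evaluations do not beat it, and training stops after epoch 7 rather than running all 10 epochs.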
Labels
| ID | Label | What it means |
|---|---|---|
| 0 | GROCERY_REQUEST | Add items to the shopping list |
| 1 | RECIPE_URL | A recipe link: pull the ingredients from it |
| 2 | LIST_QUERY | Show the current shopping list |
| 3 | CLEAR_LIST | Mark everything bought and clear the list |
| 4 | REMOVE_ITEM | Remove one item from the list |
| 5 | PARTIAL_COMPLETION | Mark most items bought, except a few |
| 6 | GROUP_INFO | Show the group members and details |
| 7 | GET_INVITE_CODE | Get the existing group invite code |
| 8 | CREATE_INVITE | Generate a new group invite code |
| 9 | RENAME_GROUP | Change the group name |
| 10 | LEAVE_GROUP | Leave the current group |
| 11 | NOTIFICATION_SETTINGS | Toggle notification preferences |
| 12 | REVOKE_INVITE | Cancel a group invite code |
| 13 | RECIPE_SEARCH | Build a shopping list for a known dish |
| 14 | UPDATE_QUANTITY | Change the quantity of an item already on the list |
| 15 | BUG_REPORT | Report a problem with the bot |
| 16 | OTHER | A chat or off-topic message, not a shopping action |
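If you work with raw model outputs rather than the pipeline, the table above can be written as a pair of lookup dictionaries. The sketch below copies the IDs and names from the table; `label_from_logits` is a hypothetical helper, and the logits in the example are invented. Note that in config.json the keys of the id-to-label mapping are stored as strings, while integers are used here for convenience.

```python
ID2LABEL = {
    0: "GROCERY_REQUEST",
    1: "RECIPE_URL",
    2: "LIST_QUERY",
    3: "CLEAR_LIST",
    4: "REMOVE_ITEM",
    5: "PARTIAL_COMPLETION",
    6: "GROUP_INFO",
    7: "GET_INVITE_CODE",
    8: "CREATE_INVITE",
    9: "RENAME_GROUP",
    10: "LEAVE_GROUP",
    11: "NOTIFICATION_SETTINGS",
    12: "REVOKE_INVITE",
    13: "RECIPE_SEARCH",
    14: "UPDATE_QUANTITY",
    15: "BUG_REPORT",
    16: "OTHER",
}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def label_from_logits(logits):
    """Return the label whose raw score (logit) is highest."""
    best = max(range(len(logits)), key=logits.__getitem__)
    return ID2LABEL[best]


# Invented logits: one score per intent, highest at index 2.
example_logits = [0.1, -0.3, 2.4] + [0.0] * 14
print(label_from_logits(example_logits))  # LIST_QUERY
```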
Good to know before you rely on it
- It was trained only on shopping and grocery messages. For another topic, treat it as a base to fine-tune, not a finished classifier.
- BUG_REPORT and OTHER are among the weakest classes (F1 of about 0.32 and 0.64 on seed 42). Read a low-confidence prediction as "not sure" rather than a firm label.
- It expects Hebrew, in short messages of up to 128 tokens. A little Hebrew and English mixing is fine.
- The training data is synthetic, so unusual or very noisy phrasing may be classified less reliably than the test examples.
Credit and license
This model is only a fine-tune. The real work, teaching a model the Hebrew language in the first place, was done by the AlephBERT team at the ONLP Lab, Bar-Ilan University: Amit Seker, Elron Bandel, Dan Bareket, Idan Brusilovsky, Refael Greenfeld, and Reut Tsarfaty. Thank you to them. Their code and models are at github.com/OnlpLab/AlephBERT. If AlephBERT is useful in your own work, please cite their paper.
Base model: onlplab/alephbert-base (Apache 2.0). This fine-tune is released under Apache 2.0 as well.
@inproceedings{seker-etal-2022-alephbert,
title = "{A}leph{BERT}: Language Model Pre-training and Evaluation from Sub-Word to Sentence Level",
author = "Seker, Amit and Bandel, Elron and Bareket, Dan and Brusilovsky, Idan and Greenfeld, Refael and Tsarfaty, Reut",
booktitle = "Proceedings of the 60th Annual Meeting of the ACL",
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.4",
pages = "46--56",
}