HeBERT_sentiment_analysis
This model is a fine-tuned version of avichr/heBERT_sentiment_analysis for Hebrew sentiment classification.
It achieves the following results on the evaluation set:
- Loss: 0.3750
- Accuracy: 0.8683
- Macro F1: 0.8646
- Weighted F1: 0.8682
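The card reports both macro and weighted F1. As a reminder of how the two differ, here is a minimal sketch on hypothetical toy labels (not the model's evaluation data). Macro F1 averages the per-class F1 scores equally, while weighted F1 weights each class by its number of true examples. The function names are illustrative only.

```python
from collections import Counter

def f1_per_class(y_true, y_pred):
    """Return {label: F1} computed from true/false positives and negatives."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)

def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by each class's true-example count."""
    scores = f1_per_class(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)

# Hypothetical, imbalanced toy data: four positives and two negatives.
toy_true = ["positive"] * 4 + ["negative"] * 2
toy_pred = ["positive"] * 5 + ["negative"]
print(macro_f1(toy_true, toy_pred), weighted_f1(toy_true, toy_pred))
```

On imbalanced data like this toy set, the weighted score is pulled toward the majority class, which is why the two numbers can diverge.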
🚀 Use this model
Quickstart with pipeline
The easiest way to run the model is with the transformers pipeline API:
from transformers import pipeline

# top_k=None returns scores for every label (replaces the deprecated return_all_scores=True).
classifier = pipeline(
    task="text-classification",
    model="haimgoldfisher/HeBERT_sentiment_analysis",
    tokenizer="haimgoldfisher/HeBERT_sentiment_analysis",
    top_k=None,
)
text = "השירות היה מצוין והאוכל היה טעים מאוד!"  # "The service was excellent and the food was very tasty!"
print(classifier(text))
# Example output (scores are illustrative):
# [[{'label': 'positive', 'score': 0.97}, {'label': 'neutral', 'score': 0.02}, {'label': 'negative', 'score': 0.01}]]
Direct loading with AutoModel
For more control (batching, custom thresholds, ONNX export, etc.):
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "haimgoldfisher/HeBERT_sentiment_analysis"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # disable dropout for inference

texts = [
    "השירות היה מצוין והאוכל היה טעים מאוד!",  # "The service was excellent and the food was very tasty!"
    "החוויה הייתה מאכזבת והמחיר היה גבוה מדי.",  # "The experience was disappointing and the price was too high."
    "ההזמנה הגיעה בזמן.",  # "The order arrived on time."
]
# Tokenize the whole batch at once, padding to the longest text.
inputs = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)
preds = probs.argmax(dim=-1)
labels = [model.config.id2label[p.item()] for p in preds]
for text, label, prob in zip(texts, labels, probs):
    print(f"{label}\t({prob.max():.3f})\t{text}")
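The "custom thresholds" mentioned above could look something like the following. This is a minimal sketch in plain Python, not the author's method: the label order in ID2LABEL, the 0.6 threshold, and the "uncertain" fallback are all assumptions for illustration. In practice, read the mapping from model.config.id2label and pass in the logits the model produces.

```python
import math

# Hypothetical label order; in practice read it from model.config.id2label.
ID2LABEL = {0: "negative", 1: "neutral", 2: "positive"}

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, threshold=0.6, fallback="uncertain"):
    """Return (label, confidence); use `fallback` when confidence is below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    label = ID2LABEL[best] if probs[best] >= threshold else fallback
    return label, probs[best]

print(classify([0.1, 0.2, 3.0]))  # one logit dominates, so the label is kept
print(classify([0.9, 1.0, 1.1]))  # near-uniform logits fall back to "uncertain"
```

Routing low-confidence predictions to a fallback label is one simple way to keep borderline texts out of automated decisions; the right threshold depends on the application and should be tuned on held-out data.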
GPU / half-precision
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"
# Use float16 only on GPU; CPU inference stays in float32.
model = AutoModelForSequenceClassification.from_pretrained(
    "haimgoldfisher/HeBERT_sentiment_analysis",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)
# Move tokenized inputs to the same device before calling the model.
🌐 Deploy
Option 1 — Hugging Face Inference API (zero infra)
Once pushed to the Hub, the model can be queried through the free hosted Inference API, using an access token stored in $HF_TOKEN:
curl https://api-inference.huggingface.co/models/haimgoldfisher/HeBERT_sentiment_analysis \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"inputs": "השירות היה מצוין והאוכל היה טעים מאוד!"}'
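The same request can be issued from Python. The sketch below uses only the standard library and copies the URL pattern from the curl example. The helper names are hypothetical, and the response shape (a nested list of label/score entries, as in the pipeline output shown earlier) is an assumption that should be checked against a real response.

```python
import json
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/{model_id}"

def build_request(model_id, text, token):
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"inputs": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        API_URL.format(model_id=model_id),
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def top_label(response):
    """Pick the highest-scoring entry from a [[{label, score}, ...]] response.

    Also accepts a flat [{label, score}, ...] list, in case the service
    returns the scores without the outer nesting.
    """
    scores = response[0] if response and isinstance(response[0], list) else response
    return max(scores, key=lambda item: item["score"])

# Sending is left to the caller, for example:
#   request = build_request("haimgoldfisher/HeBERT_sentiment_analysis", text, token)
#   with urllib.request.urlopen(request) as resp:
#       print(top_label(json.load(resp)))
```

Separating request construction from sending keeps the network call in one place, which makes it easier to add retries or timeouts later.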
Option 2 — Inference Endpoints (production)
Click Deploy → Inference Endpoints on the model page. If you work from the CLI, log in first:
huggingface-cli login
# In the UI: choose CPU (small/medium) or a T4/A10G GPU for higher throughput.
Recommended starting config:
- Hardware: CPU-Small for < 50 req/min, GPU T4 for higher load
- Replicas: 1 (autoscale 1→3)
- Task: text-classification
- Max input length: 128 tokens
Option 3 — Docker (self-hosted with TGI / TEI)
For the lowest-latency self-hosted deployment, use text-embeddings-inference (supports BERT classifiers):
docker run -p 8080:80 \
  -v $PWD/data:/data \
  --gpus all \
  ghcr.io/huggingface/text-embeddings-inference:1.5 \
  --model-id haimgoldfisher/HeBERT_sentiment_analysis
Then call it:
curl http://localhost:8080/predict \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "השירות היה מצוין והאוכל היה טעים מאוד!"}'
Option 4 — ONNX / quantized for edge
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "haimgoldfisher/HeBERT_sentiment_analysis"
# export=True converts the PyTorch checkpoint to ONNX while loading.
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Save the ONNX model and tokenizer side by side for deployment.
model.save_pretrained("./onnx-hebert-sentiment")
tokenizer.save_pretrained("./onnx-hebert-sentiment")
🎮 Demo
A live Gradio demo is available as a Hugging Face Space: 👉 Try it on Spaces
Run the same demo locally:
# app.py
import gradio as gr
from transformers import pipeline

# top_k=None returns scores for every label (replaces the deprecated return_all_scores=True).
clf = pipeline("text-classification",
               model="haimgoldfisher/HeBERT_sentiment_analysis",
               top_k=None)

def predict(text):
    # Passing a one-item list yields one list of {label, score} dicts per input.
    scores = clf([text])[0]
    return {item["label"]: float(item["score"]) for item in scores}

demo = gr.Interface(
    fn=predict,
    inputs=gr.Textbox(label="טקסט בעברית", rtl=True, lines=3,  # "Hebrew text"
                      placeholder="הכנס טקסט לניתוח רגש..."),  # "Enter text for sentiment analysis..."
    outputs=gr.Label(num_top_classes=3, label="סנטימנט"),  # "Sentiment"
    title="HeBERT Sentiment Analysis",
    description="ניתוח רגש בעברית — חיובי / נייטרלי / שלילי",  # "Hebrew sentiment analysis: positive / neutral / negative"
    examples=[
        ["השירות היה מצוין והאוכל היה טעים מאוד!"],
        ["החוויה הייתה מאכזבת והמחיר היה גבוה מדי."],
        ["ההזמנה הגיעה בזמן."],
    ],
)

if __name__ == "__main__":
    demo.launch()
pip install gradio transformers torch
python app.py
Model description
HeBERT_sentiment_analysis is a Hebrew sentiment classifier built on top of avichr/heBERT_sentiment_analysis, itself a HeBERT (Hebrew BERT) checkpoint pre-trained on the Hebrew portion of OSCAR, Wikipedia, and a large Hebrew news corpus.
This fine-tune adapts the base classifier to a new domain-specific labeled dataset, improving accuracy and F1 on in-domain examples.
Intended uses & limitations
Intended uses
- Sentiment classification of Hebrew short-to-medium text (reviews, comments, social posts, support tickets).
- Backbone for downstream Hebrew NLP pipelines (alerting, content moderation triage, customer-feedback analytics).
Limitations
- Trained on Hebrew only — performance on code-switched (Hebrew + English/Arabic) text is not guaranteed.
- Optimized for inputs up to 128 tokens. Longer documents should be chunked.
- The model may reflect biases present in the underlying HeBERT pre-training data and the fine-tuning dataset; review predictions before using in high-stakes settings.
- Not designed for sarcasm-heavy text, multi-aspect sentiment, or emotion classification beyond polarity.
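The chunking suggested for longer documents could be sketched as follows. This is one possible approach under stated assumptions, not the author's pipeline: the window reserves two positions for the [CLS] and [SEP] tokens, the 32-token overlap is arbitrary, and averaging per-chunk probabilities is only one of several ways to aggregate. The helper names are hypothetical.

```python
def chunk_token_ids(token_ids, max_length=128, stride=32, num_special_tokens=2):
    """Split token ids into overlapping windows that fit within max_length.

    num_special_tokens reserves room for the [CLS] and [SEP] tokens that are
    added around each window before it is passed to the model.
    """
    window = max_length - num_special_tokens
    if window <= stride:
        raise ValueError("stride must be smaller than the usable window")
    step = window - stride
    chunks = []
    for start in range(0, max(len(token_ids), 1), step):
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # the current window already reaches the end of the text
    return chunks

def average_probs(chunk_probs):
    """Average per-chunk probability vectors into one document-level vector."""
    n = len(chunk_probs)
    return [sum(column) / n for column in zip(*chunk_probs)]

# Stand-in for real token ids, e.g. tokenizer(text, add_special_tokens=False)["input_ids"].
ids = list(range(300))
print([len(chunk) for chunk in chunk_token_ids(ids)])  # [126, 126, 112]
```

Hugging Face tokenizers can also produce overlapping windows directly via return_overflowing_tokens=True together with stride, which avoids handling special tokens by hand.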
Training and evaluation data
The model was fine-tuned on a labeled Hebrew sentiment dataset (positive / neutral / negative). Detailed dataset card to be added.
Training procedure
Training hyperparameters
| Hyperparameter | Value |
|---|---|
| learning_rate | 2e-05 |
| train_batch_size | 16 |
| eval_batch_size | 32 |
| seed | 42 |
| gradient_accumulation_steps | 2 |
| total_train_batch_size | 32 |
| optimizer | ADAMW_TORCH_FUSED (β=(0.9, 0.999), ε=1e-08) |
| lr_scheduler_type | linear |
| lr_scheduler_warmup_steps | 0.06 |
| num_epochs | 2 |
| mixed_precision_training | Native AMP |
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Macro F1 | Weighted F1 |
|---|---|---|---|---|---|---|
| 0.7800 | 1.0 | 1784 | 0.4106 | 0.8362 | 0.8334 | 0.8373 |
| 0.4435 | 2.0 | 3568 | 0.3750 | 0.8683 | 0.8646 | 0.8682 |
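The numbers in the two tables above can be cross-checked with a little arithmetic. The sketch below assumes single-device training (consistent with 16 × 2 = 32) and assumes the 0.06 warmup value is a fraction of total steps rather than a step count, rounded up. The schedule function illustrates a standard linear warmup followed by linear decay and is not taken from the author's training code.

```python
import math

LEARNING_RATE = 2e-5
PER_DEVICE_BATCH = 16
GRAD_ACCUM_STEPS = 2
STEPS_PER_EPOCH = 1784
NUM_EPOCHS = 2
WARMUP = 0.06  # assumed to be a fraction of total steps, not a step count

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS  # 32, matches total_train_batch_size
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS             # 3568, matches the final Step value
warmup_steps = math.ceil(total_steps * WARMUP)         # 215 under the ratio assumption

def linear_schedule_lr(step):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / warmup_steps
    return LEARNING_RATE * (total_steps - step) / (total_steps - warmup_steps)

for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, linear_schedule_lr(step))
```

Under the same assumptions, 1784 optimizer steps per epoch at an effective batch size of 32 imply at most 1784 × 32 = 57,088 training examples per epoch.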
Framework versions
- Transformers 5.8.0
- PyTorch 2.11.0+cu130
- Datasets 4.8.5
- Tokenizers 0.22.2
Model tree for haimgoldfisher/HeBERT_sentiment_analysis
Base model: avichr/heBERT_sentiment_analysis
Evaluation results (self-reported)
- Accuracy: 0.868
- Macro F1: 0.865
- Weighted F1: 0.868