Polish Twitter Emotion Classifier (ONNX)

This is the ONNX FP32 version of yazoniak/twitter-emotion-pl-classifier

This model is an ONNX-converted version of the Polish Twitter Emotion Classifier, offering roughly 2x faster CPU inference with no loss of accuracy. The model was converted using Hugging Face Optimum.

Model Description

This model predicts 8 emotion and sentiment labels simultaneously for Polish text:

  • Emotions: radość (joy), wstręt (disgust), gniew (anger), przeczuwanie (anticipation)
  • Sentiment: pozytywny (positive), negatywny (negative), neutralny (neutral)
  • Special: sarkazm (sarcasm)
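Because this is multi-label classification, each label gets an independent sigmoid probability rather than one softmax distribution, so several labels can fire at once. A minimal numpy sketch with hypothetical logits (the values and the label ordering here are illustrative, not actual model output):

```python
import numpy as np

labels = ["radość", "wstręt", "gniew", "przeczuwanie",
          "pozytywny", "negatywny", "neutralny", "sarkazm"]

# Hypothetical logits: independent sigmoids can flag several labels at once
logits = np.array([2.1, -3.0, -2.5, -1.0, 2.4, -2.8, -1.5, -2.0])
probs = 1 / (1 + np.exp(-logits))

# With the default 0.5 threshold, a positive logit means an active label
active = [label for label, p in zip(labels, probs) if p > 0.5]
print(active)  # ['radość', 'pozytywny']
```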

Model Details

| Attribute | Value |
|---|---|
| Base Model | PKOBP/polish-roberta-8k |
| Original Model | yazoniak/twitter-emotion-pl-classifier |
| Architecture | RoBERTa for Sequence Classification |
| Task | Multi-label text classification |
| Language | Polish |
| Format | ONNX (FP32) |
| ONNX Opset | 18 |
| Model Size | 1.7 GB |
| License | GPL-3.0 |

Performance

ONNX vs PyTorch Comparison

| Metric | PyTorch | ONNX FP32 | Improvement |
|---|---|---|---|
| Mean Latency (CPU) | 110.71 ms | 55.28 ms | 2.00x faster |
| P95 Latency | 116.11 ms | 56.70 ms | 2.05x faster |
| Throughput | 9.03/sec | 18.09/sec | 2.00x |
| Std Deviation | 5.25 ms | 0.69 ms | 7.6x more consistent |
| Model Size | 1.7 GB | 1.7 GB | Same |

Note: ONNX has slower cold start (2.6s vs 0.3s) but significantly faster inference.

Model Accuracy

The ONNX model maintains the same accuracy as the original PyTorch model:

| Metric | Score |
|---|---|
| F1 Macro | 0.8500 |
| F1 Micro | 0.8900 |
| F1 Weighted | 0.8895 |
| Exact Match Accuracy | 0.5125 |
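Exact match is the strictest of these metrics: a sample counts as correct only if all 8 labels agree with the ground truth. A toy sketch of how it is computed (illustrative arrays, not the actual evaluation data):

```python
import numpy as np

# Toy ground-truth and predicted label matrices (rows = samples, cols = 8 labels)
y_true = np.array([[1, 0, 0, 0, 1, 0, 0, 0],
                   [0, 1, 1, 0, 0, 1, 0, 0],
                   [0, 0, 0, 0, 0, 0, 1, 0]])
y_pred = np.array([[1, 0, 0, 0, 1, 0, 0, 0],
                   [0, 1, 0, 0, 0, 1, 0, 0],   # one label missed -> sample fails
                   [0, 0, 0, 0, 0, 0, 1, 0]])

# Exact match: fraction of samples where every label matches
exact_match = np.mean(np.all(y_true == y_pred, axis=1))
print(exact_match)  # 2 of 3 rows match exactly
```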

For detailed per-label performance, see the original model card.

Numerical Validation

  • Structural Validation: Passed ONNX checker
  • Numerical Accuracy: All tests passed
    • Max absolute difference: 5.65e-06
    • Max relative difference: 1.93e-04

Installation

pip install "optimum[onnxruntime]" transformers numpy

For GPU support:

pip install "optimum[onnxruntime-gpu]" transformers numpy

Usage

Quick Start (Command Line)

# Download the inference scripts
wget https://huggingface.co/yazoniak/twitter-emotion-pl-classifier-onnx/resolve/main/predict.py
wget https://huggingface.co/yazoniak/twitter-emotion-pl-classifier-onnx/resolve/main/predict_calibrated.py

# Basic inference
python predict.py "Wspaniały dzień! Jestem bardzo szczęśliwy :)"

# Calibrated inference (recommended for best accuracy)
python predict_calibrated.py "Wspaniały dzień! Jestem bardzo szczęśliwy :)"

Python API - Basic Inference

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer
import numpy as np
import re

# Load model and tokenizer
model_name = "yazoniak/twitter-emotion-pl-classifier-onnx"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = ORTModelForSequenceClassification.from_pretrained(
    model_name,
    provider="CPUExecutionProvider"  # or "CUDAExecutionProvider" for GPU
)

# Preprocess text (anonymize @mentions - IMPORTANT!)
def preprocess_text(text):
    return re.sub(r"@\w+", "@anonymized_account", text)

text = "@user To jest wspaniały dzień!"
processed_text = preprocess_text(text)

# Tokenize and run inference
inputs = tokenizer(processed_text, return_tensors="pt", truncation=True, max_length=8192)
outputs = model(**inputs)

# Get probabilities (sigmoid for multi-label)
logits = outputs.logits.squeeze().numpy()
probabilities = 1 / (1 + np.exp(-logits))

# Get labels above threshold
labels = [model.config.id2label[i] for i in range(model.config.num_labels)]
threshold = 0.5
predictions = {labels[i]: float(probabilities[i]) 
               for i in range(len(labels)) if probabilities[i] > threshold}

print(predictions)
# Output: {'radość': 0.9758, 'pozytywny': 0.9856}

Python API - Calibrated Inference (Recommended)

For improved accuracy, use temperature scaling and optimal thresholds:

import json
from huggingface_hub import hf_hub_download

# Download calibration artifacts
calib_path = hf_hub_download(
    repo_id="yazoniak/twitter-emotion-pl-classifier-onnx",
    filename="calibration_artifacts.json"
)

with open(calib_path) as f:
    calib = json.load(f)

temperatures = calib["temperatures"]
optimal_thresholds = calib["optimal_thresholds"]

# Apply per-label temperature scaling and optimal thresholds
# (reuses `logits` and `labels` from the basic inference example above)
calibrated_probs = {}
for i, label in enumerate(labels):
    temp = temperatures[label]
    thresh = optimal_thresholds[label]
    
    # Temperature scaling
    calibrated_logit = logits[i] / temp
    prob = 1 / (1 + np.exp(-calibrated_logit))
    
    if prob > thresh:
        calibrated_probs[label] = float(prob)

print(calibrated_probs)

GPU Inference

model = ORTModelForSequenceClassification.from_pretrained(
    "yazoniak/twitter-emotion-pl-classifier-onnx",
    provider="CUDAExecutionProvider"
)

When to Use This Model

Use ONNX FP32 when:

  • You need 2x faster inference than PyTorch
  • You want full FP32 precision
  • You're deploying on CPU servers
  • You need cross-platform compatibility

Consider alternatives:

  • Original PyTorch: For fine-tuning or GPU training
  • ONNX INT8: For even faster inference (3x) and smaller size (75% reduction)

Important Notes

Text Preprocessing

⚠️ The model expects @mentions to be anonymized!

The model was trained with anonymized Twitter mentions. Always preprocess text:

text = re.sub(r"@\w+", "@anonymized_account", text)

The provided scripts (predict.py, predict_calibrated.py) handle this automatically.
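As a sanity check, the same one-line regex handles any number of mentions in a text:

```python
import re

def preprocess_text(text: str) -> str:
    # Replace every @mention with the placeholder the model saw during training
    return re.sub(r"@\w+", "@anonymized_account", text)

result = preprocess_text("@jan @maria świetny mecz!")
print(result)  # @anonymized_account @anonymized_account świetny mecz!
```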

Calibration

For best accuracy, use calibrated inference with:

  • Temperature scaling (per-label)
  • Optimized thresholds (per-label)

See predict_calibrated.py or the calibrated inference example above.
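To see why temperature scaling matters: dividing a logit by a temperature above 1 pulls overconfident probabilities back toward 0.5 without changing their ranking. A small numeric sketch (the logit and temperature values here are made up; the real per-label temperatures live in calibration_artifacts.json):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

logit = 2.0        # hypothetical raw logit for one label
temperature = 1.5  # hypothetical per-label temperature (> 1 softens confidence)

raw_prob = sigmoid(logit)                       # ~0.881
calibrated_prob = sigmoid(logit / temperature)  # ~0.791

# Scaling shrinks overconfidence but preserves which side of 0.5 we are on
assert calibrated_prob < raw_prob
```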

Limitations

  • Twitter-specific: Optimized for informal Polish social media text
  • Sarcasm detection: Lower performance (F1: 0.53); sarcasm is inherently difficult to detect
  • Context length: Optimal for tweet-length texts (up to 8,192 tokens)
  • Formal text: May not generalize well to news or academic writing

For detailed limitations, see the original model card.

Files in This Repository

| File | Size | Description |
|---|---|---|
| model.onnx | 1.7 GB | ONNX model weights (FP32) |
| config.json | 2 KB | Model configuration |
| tokenizer.json | 8.2 MB | Tokenizer vocabulary |
| tokenizer_config.json | 12 KB | Tokenizer settings |
| calibration_artifacts.json | 1 KB | Temperature scaling & optimal thresholds |
| predict.py | 4 KB | Simple inference script |
| predict_calibrated.py | 5 KB | Calibrated inference script (recommended) |

Citation

@misc{yazoniak2025twitteremotionpl,
  title={Polish Twitter Emotion Classifier (RoBERTa-8k)},
  author={yazoniak},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/yazoniak/twitter-emotion-pl-classifier}
}

Also cite the dataset and base model:

@dataset{yazoniak_twitteremo_pl_refined_2025,
  title={TwitterEmo-PL-Refined: Polish Twitter Emotions (8 labels, refined)},
  author={yazoniak},
  year={2025},
  url={https://huggingface.co/datasets/yazoniak/TwitterEmo-PL-Refined}
}

@inproceedings{bogdanowicz2023twitteremo,
  title={TwitterEmo: Annotating Emotions and Sentiment in Polish Twitter},
  author={Bogdanowicz, S. and Cwynar, H. and Zwierzchowska, A. and Klamra, C. and Kiera{\'s}, W. and Kobyli{\'n}ski, {\L}.},
  booktitle={Computational Science -- ICCS 2023},
  series={Lecture Notes in Computer Science},
  volume={14074},
  publisher={Springer, Cham},
  year={2023},
  doi={10.1007/978-3-031-36021-3_20}
}

License

This model is released under the GNU General Public License v3.0 (GPL-3.0), inherited from the training dataset.

Model Version: v1.0-onnx
Last Updated: 2026-01-29
