EGD DistilBERT (Multilingual Cased)
Model Overview
This model is based on DistilBERT-base-multilingual-cased and has been fine-tuned on English, Hungarian, and German data for text classification of European Parliamentary speeches into rhetorical categories.
The model classifies text into three categories:
- 0 - Other (text that does not fit into moralist or realist categories)
- 1 - Moralist (arguments emphasizing moral reasoning)
- 2 - Realist (arguments applying pragmatic or realist reasoning)
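For reporting, the integer class IDs can be mapped to the category names listed above. The following is a minimal sketch: the names `ID2LABEL`, `LABEL2ID`, and `label_name` are illustrative, and the label names stored in the model's own configuration may differ from these, so it is worth checking before relying on them.

```python
# Class IDs and names as listed in this model card.
ID2LABEL = {0: "Other", 1: "Moralist", 2: "Realist"}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def label_name(class_id: int) -> str:
    """Return the readable category name for a predicted class ID."""
    return ID2LABEL[class_id]


print(label_name(1))  # prints "Moralist"
```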
This model is useful for analyzing political discourse and rhetorical styles in multiple languages.
Evaluation Results
The model was evaluated on a test set of 938 sentences, with the following results:
| Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 - Other | 0.91 | 0.92 | 0.92 | 783 |
| 1 - Moralist | 0.49 | 0.40 | 0.44 | 65 |
| 2 - Realist | 0.43 | 0.44 | 0.44 | 90 |
- Overall accuracy: 0.84
- Macro average F1-score: 0.60
- Weighted average F1-score: 0.84
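The model reliably distinguishes the Other class (F1-score 0.92) from moralist and realist arguments, but performance on the minority classes 1 and 2 is much lower (F1-score 0.44 each). The test set is heavily imbalanced: 783 of the 938 sentences belong to Other, which is why the weighted average F1-score (0.84) sits well above the macro average (0.60).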
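The two averaged F1-scores can be reproduced from the per-class rows of the table. The sketch below uses only the rounded values reported above, so it is a consistency check on the published figures and not a re-evaluation of the model; the variable names are illustrative.

```python
# Per-class F1 and support, copied from the evaluation table above.
f1 = {"Other": 0.92, "Moralist": 0.44, "Realist": 0.44}
support = {"Other": 783, "Moralist": 65, "Realist": 90}

total = sum(support.values())  # 938 test sentences

# Macro average: every class counts equally.
macro_f1 = sum(f1.values()) / len(f1)

# Weighted average: each class counts in proportion to its support.
weighted_f1 = sum(f1[c] * support[c] for c in f1) / total

# Share of the test set taken by the largest class, for context.
majority_share = max(support.values()) / total

print(f"macro F1:       {macro_f1:.2f}")        # 0.60
print(f"weighted F1:    {weighted_f1:.2f}")     # 0.84
print(f"majority share: {majority_share:.2f}")  # 0.83
```

Because the Other class alone makes up about 83% of the test sentences, accuracy and the weighted average are dominated by that class; the macro average is the more informative figure for the Moralist and Realist categories.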
Usage
This model can be used with the Hugging Face Transformers library:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model_name = "uvegesistvan/EGD_distilbert-base-multilingual-cased"
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Classify an example text (truncate inputs longer than the model's maximum length)
text = "The European Union has a responsibility towards future generations."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits
# Get predicted class (0 = Other, 1 = Moralist, 2 = Realist)
predicted_class = logits.argmax(dim=-1).item()
print(f"Predicted class: {predicted_class}")
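The predicted class alone hides how confident the model is, which matters given the weaker results on the Moralist and Realist classes. One common way to inspect this is to convert the logits into probabilities with a softmax. The sketch below is a plain-Python illustration: the `softmax` helper and the example logits are hypothetical and are not output from this model.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one sentence, ordered as classes 0, 1, 2.
example_logits = [2.0, 0.5, -1.0]

probs = softmax(example_logits)
# Index of the highest probability, equivalent to taking the argmax of the logits.
predicted_class = max(range(len(probs)), key=probs.__getitem__)

print(predicted_class, [round(p, 3) for p in probs])
```

With the real model, the list of logits for a single input could be obtained as `outputs.logits[0].tolist()` and passed to the same helper.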