---
tags:
- tf-keras
- bert
- alberto
- multi-task-learning
- text-classification
- italian
- gender-classification
- ideology-detection
library_name: tf-keras
language:
- it
datasets:
- custom
---
# PIDIT: Political Ideology Detection in Italian Texts
A Multi-Task BERT + ALBERTO Model for Gender and Ideology Prediction 🇮🇹
This `tf.keras` model combines two pre-trained encoders — `BERT` and `ALBERTO` — to perform multi-task classification on Italian-language texts.
It is designed to predict:
- **Author gender** (binary classification)
- **Binary ideology** (left vs right)
- **Multiclass ideology** (4 ideological classes)
## ✨ Architecture
- `TFBertModel` from `bert-base-italian-uncased` (frozen)
- `TFAutoModel` from `alberto-base-uncased` (frozen)
- Concatenated outputs + dense layers
- Three output heads:
- `gender`: `Dense(1, activation="sigmoid")`
- `ideology_binary`: `Dense(1, activation="sigmoid")`
- `ideology_multiclass`: `Dense(4, activation="softmax")`
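The data flow above can be sketched numerically. This is a minimal NumPy stand-in, not the actual model: the pooled-output size (768), the shared dense layer width (256), and the random weights are illustrative assumptions; only the concatenation and the three heads' activations mirror the architecture described here.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

hidden, batch = 768, 2  # pooled-output size assumed (768 for base encoders)

# Stand-ins for the two frozen encoders' pooled outputs
bert_pooled = rng.standard_normal((batch, hidden))
alberto_pooled = rng.standard_normal((batch, hidden))

# Concatenate encoder outputs, then a shared dense layer (width 256 assumed)
concat = np.concatenate([bert_pooled, alberto_pooled], axis=-1)  # (batch, 1536)
W_shared = rng.standard_normal((concat.shape[1], 256)) * 0.01
shared = np.tanh(concat @ W_shared)

# Three task-specific output heads
gender = sigmoid(shared @ (rng.standard_normal((256, 1)) * 0.01))              # (batch, 1)
ideology_binary = sigmoid(shared @ (rng.standard_normal((256, 1)) * 0.01))     # (batch, 1)
ideology_multiclass = softmax(shared @ (rng.standard_normal((256, 4)) * 0.01)) # (batch, 4)
```

Each sigmoid head emits one probability per example, while the softmax head emits a distribution over the 4 ideology classes that sums to 1.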
## 📥 Input
The model takes **6 input tensors**:
- `bert_input_ids`, `bert_token_type_ids`, `bert_attention_mask`
- `alberto_input_ids`, `alberto_token_type_ids`, `alberto_attention_mask`
All tensors have shape `(batch_size, max_length)`.
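For reference, the expected input dictionary can be mocked with zero-filled tensors; the `max_length` of 250 matches the preprocessing example in the Usage section:

```python
import numpy as np

batch_size, max_length = 1, 250
dummy_inputs = {
    name: np.zeros((batch_size, max_length), dtype=np.int32)
    for name in [
        "bert_input_ids", "bert_token_type_ids", "bert_attention_mask",
        "alberto_input_ids", "alberto_token_type_ids", "alberto_attention_mask",
    ]
}
```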
---
## 🚀 Usage
### Load model and tokenizers
```python
import tensorflow as tf
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, TFAutoModel, TFBertModel

# Download the full repository (model + tokenizer subfolders) locally
model_path = snapshot_download("leeeov4/PIDIT")

# Load the Keras model, registering the custom encoder layers
model = tf.keras.models.load_model(model_path, custom_objects={
    "TFBertModel": TFBertModel,
    "TFAutoModel": TFAutoModel
})

# Load the tokenizers from their subfolders in the downloaded snapshot
bert_tokenizer = AutoTokenizer.from_pretrained(f"{model_path}/bert_tokenizer")
alberto_tokenizer = AutoTokenizer.from_pretrained(f"{model_path}/alberto_tokenizer")
```
### Preprocessing Example
```python
def preprocess_text(text, max_length=250):
    bert_tokens = bert_tokenizer(
        text, max_length=max_length, padding='max_length',
        truncation=True, return_tensors='tf'
    )
    alberto_tokens = alberto_tokenizer(
        text, max_length=max_length, padding='max_length',
        truncation=True, return_tensors='tf'
    )
    return {
        'bert_input_ids': bert_tokens['input_ids'],
        'bert_token_type_ids': bert_tokens['token_type_ids'],
        'bert_attention_mask': bert_tokens['attention_mask'],
        'alberto_input_ids': alberto_tokens['input_ids'],
        'alberto_token_type_ids': alberto_tokens['token_type_ids'],
        'alberto_attention_mask': alberto_tokens['attention_mask']
    }
```
### Inference
```python
text = "Oggi, sabato 31 dicembre, alle ore 9.34, nel Monastero Mater Ecclesiae in Vaticano, il Signore ha chiamato a Sé il Santo Padre Emerito Benedetto XVI."
inputs = preprocess_text(text)
# `predict` returns a list in head order: [gender, ideology_binary, ideology_multiclass]
outputs = model.predict(inputs)
gender_prob = outputs[0][0][0]             # scalar probability
ideology_binary_prob = outputs[1][0][0]    # scalar probability
ideology_multiclass_probs = outputs[2][0]  # vector of 4 class probabilities
print("Predicted gender (male probability):", gender_prob)
print("Predicted binary ideology (left probability):", ideology_binary_prob)
print("Multiclass ideology distribution (left, right, moderate left, moderate right):", ideology_multiclass_probs)
``` |
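### Interpreting the outputs

To turn the raw probabilities into labels, the sigmoid heads can be thresholded at 0.5 and the softmax head decoded with `argmax`. A minimal sketch, using hypothetical probability values and the class order stated in the print statements above (left, right, moderate left, moderate right):

```python
import numpy as np

# Hypothetical output probabilities, in the model's head order
gender_prob = 0.82            # sigmoid output of the `gender` head
ideology_binary_prob = 0.31   # sigmoid output of the `ideology_binary` head
ideology_multiclass_probs = np.array([0.10, 0.55, 0.20, 0.15])

# Threshold the sigmoid heads at 0.5; argmax the softmax head
gender_label = "male" if gender_prob >= 0.5 else "female"
ideology_binary_label = "left" if ideology_binary_prob >= 0.5 else "right"
classes = ["left", "right", "moderate left", "moderate right"]
ideology_multiclass_label = classes[int(np.argmax(ideology_multiclass_probs))]

print(gender_label, ideology_binary_label, ideology_multiclass_label)
# → male right right
```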