---
tags:
- tf-keras
- bert
- alberto
- multi-task-learning
- text-classification
- italian
- gender-classification
- ideology-detection
library_name: tf-keras
language:
- it
datasets:
- custom
---

# PIDIT: Political Ideology Detection in Italian Texts
A Multi-Task BERT + ALBERTO Model for Gender and Ideology Prediction 🇮🇹

This `tf.keras` model combines two pre-trained encoders — `BERT` and `ALBERTO` — to perform multi-task classification on Italian-language texts.  
It is designed to predict:

- **Author gender** (binary classification)
- **Binary ideology** (e.g., progressive vs conservative)
- **Multiclass ideology** (4 ideological classes)

## ✨ Architecture

- `TFBertModel` from `bert-base-italian-uncased` (frozen)
- `TFAutoModel` from `alberto-base-uncased` (frozen)
- Concatenated outputs + dense layers
- Three output heads:
  - `gender`: `Dense(1, activation="sigmoid")`
  - `ideology_binary`: `Dense(1, activation="sigmoid")`
  - `ideology_multiclass`: `Dense(4, activation="softmax")`
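The head structure above can be sketched in `tf.keras`. This is a minimal illustration only, not the released model's exact code: the placeholder encoder inputs stand in for the frozen BERT and ALBERTO pooled outputs, and the shared-layer size (256) is an assumption.

```python
import tensorflow as tf

def build_multitask_head(encoder_dim=768):
    # Placeholder pooled outputs standing in for the two frozen encoders
    bert_pooled = tf.keras.Input(shape=(encoder_dim,), name="bert_pooled")
    alberto_pooled = tf.keras.Input(shape=(encoder_dim,), name="alberto_pooled")

    # Concatenate both encoder representations, then a shared dense layer
    merged = tf.keras.layers.Concatenate()([bert_pooled, alberto_pooled])
    shared = tf.keras.layers.Dense(256, activation="relu")(merged)

    # Three task-specific output heads, as listed above
    gender = tf.keras.layers.Dense(1, activation="sigmoid", name="gender")(shared)
    ideology_binary = tf.keras.layers.Dense(1, activation="sigmoid", name="ideology_binary")(shared)
    ideology_multiclass = tf.keras.layers.Dense(4, activation="softmax", name="ideology_multiclass")(shared)

    return tf.keras.Model(
        inputs=[bert_pooled, alberto_pooled],
        outputs=[gender, ideology_binary, ideology_multiclass],
    )
```

Because both encoders are frozen, only the concatenation, shared layer, and the three heads carry trainable weights during fine-tuning.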

## 📥 Input

The model takes **6 input tensors**:
- `bert_input_ids`, `bert_token_type_ids`, `bert_attention_mask`
- `alberto_input_ids`, `alberto_token_type_ids`, `alberto_attention_mask`

All tensors have shape `(batch_size, max_length)`.

---

## 🚀 Usage

### Load model and tokenizers

```python
from huggingface_hub import snapshot_download
from transformers import TFBertModel, TFAutoModel
import tensorflow as tf

# Download the model locally
model_path = snapshot_download("leeeov4/PIDIT")

# Load the model
model = tf.keras.models.load_model(model_path, custom_objects={
    "TFBertModel": TFBertModel,
    "TFAutoModel": TFAutoModel
})

# Load the tokenizers

from transformers import AutoTokenizer

# The tokenizers live in subfolders of the repo, so pass `subfolder`
# rather than appending it to the repo id
bert_tokenizer = AutoTokenizer.from_pretrained("leeeov4/PIDIT", subfolder="bert_tokenizer")
alberto_tokenizer = AutoTokenizer.from_pretrained("leeeov4/PIDIT", subfolder="alberto_tokenizer")
```

### Preprocessing Example

```python
def preprocess_text(text, max_length=250):
    # Tokenize the same text with both tokenizers; the model expects all six tensors
    bert_tokens = bert_tokenizer(text, max_length=max_length, padding='max_length', truncation=True, return_tensors='tf')
    alberto_tokens = alberto_tokenizer(text, max_length=max_length, padding='max_length', truncation=True, return_tensors='tf')

    return {
        'bert_input_ids': bert_tokens['input_ids'],
        'bert_token_type_ids': bert_tokens['token_type_ids'],
        'bert_attention_mask': bert_tokens['attention_mask'],
        'alberto_input_ids': alberto_tokens['input_ids'],
        'alberto_token_type_ids': alberto_tokens['token_type_ids'],
        'alberto_attention_mask': alberto_tokens['attention_mask']
    }
```


### Inference

```python
text = "Oggi, sabato 31 dicembre, alle ore 9.34, nel Monastero Mater Ecclesiae in Vaticano, il Signore ha chiamato a Sé il Santo Padre Emerito Benedetto XVI."
inputs = preprocess_text(text)
outputs = model.predict(inputs)

# `predict` returns one array per output head, in order:
# gender, ideology_binary, ideology_multiclass
gender_prob = outputs[0][0][0]
ideology_binary_prob = outputs[1][0][0]
ideology_multiclass_probs = outputs[2][0]

print("Predicted gender (male probability):", gender_prob)
print("Predicted binary ideology (left probability):", ideology_binary_prob)
print("Multiclass ideology distribution (left, right, moderate left, moderate right):", ideology_multiclass_probs)
```
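To turn the raw probabilities into discrete labels, one could threshold the binary heads and take the argmax of the multiclass head. A small sketch, assuming the label orderings printed above and a conventional 0.5 threshold (the threshold is an assumption, not something the model card specifies):

```python
import numpy as np

def decode_predictions(gender_prob, ideology_binary_prob, ideology_multiclass_probs, threshold=0.5):
    # Class names follow the ordering used in the inference example above
    multiclass_labels = ["left", "right", "moderate left", "moderate right"]
    return {
        "gender": "male" if gender_prob >= threshold else "female",
        "ideology_binary": "left" if ideology_binary_prob >= threshold else "right",
        "ideology_multiclass": multiclass_labels[int(np.argmax(ideology_multiclass_probs))],
    }

print(decode_predictions(0.8, 0.3, [0.1, 0.6, 0.2, 0.1]))
# → {'gender': 'male', 'ideology_binary': 'right', 'ideology_multiclass': 'right'}
```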