@@ -15,49 +15,95 @@ datasets:
 - custom
 ---
-# PIDIT: Modello Multi-Task BERT + ALBERTO per analisi ideologica e di genere 🇮🇹
-Questo modello `tf.keras` unisce due encoder pre-addestrati (`BERT` e `ALBERTO`) per effettuare predizioni multi-task su testi in italiano.
-È progettato per classificare:
-- 🧑‍🤝‍🧑 **Genere** dell'autore (binary classification)
-- 🏛️ **Ideologia binaria** (es. conservatore vs progressista)
-- 🧭 **Ideologia multiclasse** (4 classi ideologiche)
-## ✨ Architettura
-- `TFBertModel` da `bert-base-italian-uncased` (non fine-tuned)
-- `TFAutoModel` da `alberto-base-uncased` (non fine-tuned)
-- Layer di concatenazione e densi condivisi
-- 3 teste di output:
   - `gender`: `Dense(1, activation="sigmoid")`
   - `ideology_binary`: `Dense(1, activation="sigmoid")`
   - `ideology_multiclass`: `Dense(4, activation="softmax")`
 ## 📥 Input
-Il modello accetta **6 input**:
 - `bert_input_ids`, `bert_token_type_ids`, `bert_attention_mask`
 - `alberto_input_ids`, `alberto_token_type_ids`, `alberto_attention_mask`
-Tutti con shape `(batch_size, max_length)`.
 ---
-## 🚀 Utilizzo
-### 1. Caricamento del modello
 ```python
 from huggingface_hub import snapshot_download
 from transformers import TFBertModel, TFAutoModel
 import tensorflow as tf
-# Scarica localmente il modello
 model_path = snapshot_download("leeeov4/PIDIT")
-# Carica il modello
 model = tf.keras.models.load_model(model_path, custom_objects={
     "TFBertModel": TFBertModel,
     "TFAutoModel": TFAutoModel
 })

 - custom
 ---
+# PIDIT: Multi-Task BERT + ALBERTO Model for Gender and Ideology Prediction 🇮🇹
+This `tf.keras` model combines two pre-trained encoders, `BERT` and `ALBERTO`, to perform multi-task classification on Italian-language text.
+It is designed to predict:
+- 🧑‍🤝‍🧑 **Author gender** (binary classification)
+- 🏛️ **Binary ideology** (e.g., progressive vs conservative)
+- 🧭 **Multiclass ideology** (4 ideological classes)
+## ✨ Architecture
+- `TFBertModel` from `bert-base-italian-uncased` (frozen)
+- `TFAutoModel` from `alberto-base-uncased` (frozen)
+- Concatenation of both encoder outputs, followed by shared dense layers
+- Three output heads:
   - `gender`: `Dense(1, activation="sigmoid")`
   - `ideology_binary`: `Dense(1, activation="sigmoid")`
   - `ideology_multiclass`: `Dense(4, activation="softmax")`
 ## 📥 Input
+The model takes **6 input tensors**:
 - `bert_input_ids`, `bert_token_type_ids`, `bert_attention_mask`
 - `alberto_input_ids`, `alberto_token_type_ids`, `alberto_attention_mask`
+All tensors have shape `(batch_size, max_length)`.
 ---
+## 🚀 Usage
+### 1. Load the model
 ```python
 from huggingface_hub import snapshot_download
 from transformers import TFBertModel, TFAutoModel
 import tensorflow as tf
+# Download the model locally
 model_path = snapshot_download("leeeov4/PIDIT")
+# Load the model
 model = tf.keras.models.load_model(model_path, custom_objects={
     "TFBertModel": TFBertModel,
     "TFAutoModel": TFAutoModel
 })
+```
+### 2. Load the tokenizers
+```python
+from transformers import AutoTokenizer
+# Each tokenizer is stored in its own subfolder of the model repository
+bert_tokenizer = AutoTokenizer.from_pretrained("leeeov4/PIDIT", subfolder="bert_tokenizer")
+alberto_tokenizer = AutoTokenizer.from_pretrained("leeeov4/PIDIT", subfolder="alberto_tokenizer")
+```
+## 🧼 Preprocessing Example
+```python
+def preprocess_text(text, max_length=250):
+    # Tokenize the same text with both tokenizers, padded/truncated to the same max_length
+    bert_tokens = bert_tokenizer(text, max_length=max_length, padding='max_length', truncation=True, return_tensors='tf')
+    alberto_tokens = alberto_tokenizer(text, max_length=max_length, padding='max_length', truncation=True, return_tensors='tf')
+    return {
+        'bert_input_ids': bert_tokens['input_ids'],
+        'bert_token_type_ids': bert_tokens['token_type_ids'],
+        'bert_attention_mask': bert_tokens['attention_mask'],
+        'alberto_input_ids': alberto_tokens['input_ids'],
+        'alberto_token_type_ids': alberto_tokens['token_type_ids'],
+        'alberto_attention_mask': alberto_tokens['attention_mask']
+    }
+```
+## 🧼 Inference
+```python
+text = "Questo è un esempio di testo italiano per testare il modello."
+inputs = preprocess_text(text)
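+outputs = model.predict(inputs)
+# One array per output head (gender, ideology_binary, ideology_multiclass), indexed as [head][sample]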
+gender_prob = outputs[0][0][0]
+ideology_binary_prob = outputs[1][0][0]
+ideology_multiclass_probs = outputs[2][0]
+print("Predicted gender (male probability):", gender_prob)
+print("Predicted binary ideology (conservative probability):", ideology_binary_prob)
+print("Multiclass ideology distribution:", ideology_multiclass_probs)
+```
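The Architecture section describes a hard-parameter-sharing design: both encoder outputs are concatenated, passed through shared dense layers, and fed to three task-specific heads. The toy forward pass below traces that data flow in plain Python. It is a sketch under stated assumptions, not the saved model: the two encoders are replaced by fixed feature vectors, the shared trunk is a single dense layer with ReLU (the card does not give the number, width, or activation of the shared layers), and every weight in `params` is a made-up placeholder. All function names here are hypothetical.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: one weight row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def softmax(values):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def multi_task_forward(bert_features, alberto_features, params):
    """Toy version of: concatenate -> shared dense -> three heads."""
    h = list(bert_features) + list(alberto_features)  # concatenation of both encoders
    shared = relu(dense(h, *params["shared"]))  # shared trunk (ReLU is an assumption)
    return {
        "gender": sigmoid(dense(shared, *params["gender"])[0]),
        "ideology_binary": sigmoid(dense(shared, *params["ideology_binary"])[0]),
        "ideology_multiclass": softmax(dense(shared, *params["ideology_multiclass"])),
    }


# Hypothetical placeholder weights: 2-dim "BERT" + 2-dim "ALBERTO" features -> 3 shared units
params = {
    "shared": ([[0.5, -0.2, 0.1, 0.0], [0.3, 0.8, -0.5, 0.2], [-0.1, 0.4, 0.9, -0.3]], [0.0, 0.1, -0.1]),
    "gender": ([[0.7, -0.4, 0.2]], [0.0]),
    "ideology_binary": ([[-0.3, 0.6, 0.5]], [0.05]),
    "ideology_multiclass": (
        [[0.2, 0.1, -0.3], [-0.5, 0.4, 0.3], [0.1, -0.2, 0.6], [0.0, 0.3, -0.1]],
        [0.0, 0.0, 0.0, 0.0],
    ),
}

out = multi_task_forward([1.0, 0.5], [0.2, -0.4], params)
print(out)
```

The point of the design is that the three tasks share one representation; only the final `Dense` layers differ, with sigmoid units for the two binary tasks and a 4-way softmax for the multiclass task.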
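Because the model expects six tensors that must all share the shape `(batch_size, max_length)`, a mismatch (for example, a missing `token_type_ids` tensor, or tokenizing the two branches with different `max_length` values) is easy to introduce. A minimal, hypothetical checker, `check_pidit_inputs`, can catch this before calling `model.predict`. It is not part of the repository; it accepts any value with a `.shape` attribute (such as the `tf.Tensor`s produced by `preprocess_text`) or plain nested lists.

```python
# Input names listed in the model card's Input section
EXPECTED_KEYS = (
    "bert_input_ids", "bert_token_type_ids", "bert_attention_mask",
    "alberto_input_ids", "alberto_token_type_ids", "alberto_attention_mask",
)


def _shape(x):
    """Return the shape of a tensor/array (via .shape) or of a 2-D nested list."""
    if hasattr(x, "shape"):
        return tuple(int(d) for d in x.shape)
    row_lengths = {len(row) for row in x}
    if len(row_lengths) > 1:
        raise ValueError("ragged input: rows have different lengths")
    return (len(x), row_lengths.pop() if row_lengths else 0)


def check_pidit_inputs(inputs, max_length):
    """Raise ValueError if `inputs` breaks the six-tensor contract; else return (batch_size, max_length)."""
    missing = [key for key in EXPECTED_KEYS if key not in inputs]
    if missing:
        raise ValueError(f"missing inputs: {missing}")
    batch_size = _shape(inputs[EXPECTED_KEYS[0]])[0]
    for key in EXPECTED_KEYS:
        shape = _shape(inputs[key])
        if shape != (batch_size, max_length):
            raise ValueError(f"{key} has shape {shape}, expected {(batch_size, max_length)}")
    return batch_size, max_length
```

For a single string, `check_pidit_inputs(preprocess_text(text), max_length=250)` should return `(1, 250)`.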
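The inference example prints raw probabilities. A minimal sketch of turning one sample's outputs into labels follows, assuming a 0.5 decision threshold for the two sigmoid heads (the model card does not specify one) and reading the label directions from the print statements in the inference example (gender head = probability of male, binary ideology head = probability of conservative). The names of the four multiclass labels are not documented, so the sketch returns only the class index. `decode_predictions` is a hypothetical helper, not part of the repository.

```python
def decode_predictions(gender_prob, ideology_binary_prob, ideology_multiclass_probs, threshold=0.5):
    """Turn one sample's head outputs into labels.

    Assumptions, not documented by the model card:
    - the gender head outputs P(male) and the binary ideology head P(conservative);
    - a fixed threshold (default 0.5) separates the two classes of each binary head;
    - the four multiclass labels are unnamed, so only the argmax index is returned.
    """
    probs = [float(p) for p in ideology_multiclass_probs]
    top_class = max(range(len(probs)), key=lambda i: probs[i])
    return {
        "gender": "male" if gender_prob >= threshold else "female",
        "ideology_binary": "conservative" if ideology_binary_prob >= threshold else "progressive",
        "ideology_multiclass": top_class,
        "ideology_multiclass_prob": probs[top_class],
    }


# Example with made-up probabilities (not real model output)
print(decode_predictions(0.8, 0.3, [0.1, 0.6, 0.2, 0.1]))
```

With the variables from the inference example, the call would be `decode_predictions(gender_prob, ideology_binary_prob, ideology_multiclass_probs)`.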