---
tags:
- tf-keras
- bert
- alberto
- multi-task-learning
- text-classification
- italian
- gender-classification
- ideology-detection
library_name: tf-keras
language:
- it
datasets:
- custom
---

# PIDIT: Political Ideology Detection in Italian Texts

A Multi-Task BERT + ALBERTO Model for Gender and Ideology Prediction 🇮🇹

This `tf.keras` model combines two pre-trained encoders, `BERT` and `ALBERTO`, to perform multi-task classification on Italian-language texts. It is designed to predict:

- **Author gender** (binary classification)
- **Binary ideology** (e.g., progressive vs conservative)
- **Multiclass ideology** (4 ideological classes)

## ✨ Architecture

- `TFBertModel` from `bert-base-italian-uncased` (frozen)
- `TFAutoModel` from `alberto-base-uncased` (frozen)
- Concatenated outputs + dense layers
- Three output heads:
  - `gender`: `Dense(1, activation="sigmoid")`
  - `ideology_binary`: `Dense(1, activation="sigmoid")`
  - `ideology_multiclass`: `Dense(4, activation="softmax")`

## 📥 Input

The model takes **6 input tensors**:

- `bert_input_ids`, `bert_token_type_ids`, `bert_attention_mask`
- `alberto_input_ids`, `alberto_token_type_ids`, `alberto_attention_mask`

All tensors have shape `(batch_size, max_length)`.

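
The input contract above can be checked before calling the model. The following is a minimal sketch, not part of the released model: `check_inputs` and `EXPECTED_KEYS` are hypothetical names, and the helper only assumes that each value exposes a `.shape` attribute, as TensorFlow tensors and NumPy arrays do.

```python
# Hypothetical helper: these names are illustrative and not part of the PIDIT release.
EXPECTED_KEYS = (
    "bert_input_ids", "bert_token_type_ids", "bert_attention_mask",
    "alberto_input_ids", "alberto_token_type_ids", "alberto_attention_mask",
)


def check_inputs(inputs, max_length):
    """Return the batch size if `inputs` holds the six expected tensors,
    each of shape (batch_size, max_length); raise otherwise."""
    missing = [key for key in EXPECTED_KEYS if key not in inputs]
    if missing:
        raise KeyError(f"missing input tensors: {missing}")

    batch_sizes = set()
    for key in EXPECTED_KEYS:
        shape = tuple(inputs[key].shape)
        if len(shape) != 2 or shape[1] != max_length:
            raise ValueError(
                f"{key} has shape {shape}, expected (batch_size, {max_length})"
            )
        batch_sizes.add(shape[0])

    # All six tensors must describe the same number of examples.
    if len(batch_sizes) != 1:
        raise ValueError(f"inconsistent batch sizes: {batch_sizes}")
    return batch_sizes.pop()
```

Running this on the dictionary built in the preprocessing step of the Usage section, with the same `max_length` used for tokenization, would catch a missing key or a mismatched sequence length before `model.predict` fails with a less readable error.
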

---

## 🚀 Usage

### Load model and tokenizers

```python
import os

import tensorflow as tf
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, TFAutoModel, TFBertModel

# Download the full repository snapshot locally
model_path = snapshot_download("leeeov4/PIDIT")

# Load the model, registering the Hugging Face layers as custom objects
model = tf.keras.models.load_model(model_path, custom_objects={
    "TFBertModel": TFBertModel,
    "TFAutoModel": TFAutoModel
})

# Load the tokenizers from the downloaded snapshot.
# Assumes the repository stores them in the `bert_tokenizer/` and
# `alberto_tokenizer/` subfolders; "leeeov4/PIDIT/bert_tokenizer" is not a
# valid Hub repo id, so the local path is used instead.
bert_tokenizer = AutoTokenizer.from_pretrained(os.path.join(model_path, "bert_tokenizer"))
alberto_tokenizer = AutoTokenizer.from_pretrained(os.path.join(model_path, "alberto_tokenizer"))
```

### Preprocessing Example

```python
def preprocess_text(text, max_length=250):
    # Tokenize the same text once per encoder, padded/truncated to max_length
    bert_tokens = bert_tokenizer(
        text, max_length=max_length, padding='max_length',
        truncation=True, return_tensors='tf'
    )
    alberto_tokens = alberto_tokenizer(
        text, max_length=max_length, padding='max_length',
        truncation=True, return_tensors='tf'
    )

    # Keys match the names of the model's six input tensors
    return {
        'bert_input_ids': bert_tokens['input_ids'],
        'bert_token_type_ids': bert_tokens['token_type_ids'],
        'bert_attention_mask': bert_tokens['attention_mask'],
        'alberto_input_ids': alberto_tokens['input_ids'],
        'alberto_token_type_ids': alberto_tokens['token_type_ids'],
        'alberto_attention_mask': alberto_tokens['attention_mask']
    }
```

### Inference

```python
text = "Oggi, sabato 31 dicembre, alle ore 9.34, nel Monastero Mater Ecclesiae in Vaticano, il Signore ha chiamato a Sé il Santo Padre Emerito Benedetto XVI."

inputs = preprocess_text(text)
outputs = model.predict(inputs)

# One array per output head; index 0 selects the single example in the batch
gender_prob = outputs[0][0][0]
ideology_binary_prob = outputs[1][0][0]
ideology_multiclass_probs = outputs[2][0]

print("Predicted gender (male probability):", gender_prob)
print("Predicted binary ideology (left probability):", ideology_binary_prob)
print("Multiclass ideology distribution (left, right, moderate left, moderate right):", ideology_multiclass_probs)
```
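
The raw probabilities printed above can be mapped to labels with a small post-processing step. The sketch below is illustrative only: `decode_predictions` and `MULTICLASS_LABELS` are hypothetical names, the multiclass label order is taken from the print statement above, and the 0.5 threshold and the complementary labels `"female"` and `"right"` are assumptions rather than documented settings.

```python
# Hypothetical post-processing; not part of the PIDIT release.
# Label order follows the print statement in the inference example.
MULTICLASS_LABELS = ("left", "right", "moderate left", "moderate right")


def decode_predictions(gender_prob, ideology_binary_prob,
                       ideology_multiclass_probs, threshold=0.5):
    """Turn the three head outputs for one example into readable labels."""
    probs = [float(p) for p in ideology_multiclass_probs]
    # Index of the highest multiclass probability (argmax)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        # Assumption: the sigmoid output is P(male), complement is "female"
        "gender": "male" if float(gender_prob) >= threshold else "female",
        # Assumption: the sigmoid output is P(left), complement is "right"
        "ideology_binary": "left" if float(ideology_binary_prob) >= threshold else "right",
        "ideology_multiclass": MULTICLASS_LABELS[best],
    }
```

Calling `decode_predictions(gender_prob, ideology_binary_prob, ideology_multiclass_probs)` with the values from the inference example returns a dictionary with one label per task. A different threshold may suit applications that weigh false positives and false negatives differently.
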