---
tags:
- tf-keras
- bert
- alberto
- multi-task-learning
- text-classification
- italian
- gender-classification
- ideology-detection
library_name: tf-keras
language:
- it
datasets:
- custom
---
# PIDIT: Political Ideology Detection in Italian Texts
A Multi-Task BERT + ALBERTO Model for Gender and Ideology Prediction 🇮🇹
This `tf.keras` model combines two pre-trained encoders — `BERT` and `ALBERTO` — to perform multi-task classification on Italian-language texts.
It is designed to predict:
- **Author gender** (binary classification)
- **Binary ideology** (e.g., progressive vs conservative)
- **Multiclass ideology** (4 ideological classes)
## ✨ Architecture
- `TFBertModel` from `bert-base-italian-uncased` (frozen)
- `TFAutoModel` from `alberto-base-uncased` (frozen)
- Concatenated outputs + dense layers
- Three output heads:
- `gender`: `Dense(1, activation="sigmoid")`
- `ideology_binary`: `Dense(1, activation="sigmoid")`
- `ideology_multiclass`: `Dense(4, activation="softmax")`
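The head structure above can be sketched as a small Keras functional model. This is an illustrative sketch only: the pooled-output width (768) and the intermediate dense width (256) are assumptions, not the model's actual dimensions, and the real model feeds the frozen encoders' outputs into these heads.

```python
import tensorflow as tf

def build_heads(hidden_size=768):
    # Pooled outputs from the two frozen encoders (width is an assumption)
    bert_pooled = tf.keras.Input(shape=(hidden_size,), name="bert_pooled")
    alberto_pooled = tf.keras.Input(shape=(hidden_size,), name="alberto_pooled")

    # Concatenate the encoder representations and pass through a shared dense layer
    x = tf.keras.layers.Concatenate()([bert_pooled, alberto_pooled])
    x = tf.keras.layers.Dense(256, activation="relu")(x)

    # Three task-specific output heads, as listed above
    gender = tf.keras.layers.Dense(1, activation="sigmoid", name="gender")(x)
    ideology_binary = tf.keras.layers.Dense(1, activation="sigmoid", name="ideology_binary")(x)
    ideology_multiclass = tf.keras.layers.Dense(4, activation="softmax", name="ideology_multiclass")(x)

    return tf.keras.Model(
        inputs=[bert_pooled, alberto_pooled],
        outputs=[gender, ideology_binary, ideology_multiclass],
    )
```

Sharing the dense layer across heads is what makes this multi-task: the three objectives regularize a common representation.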
## 📥 Input
The model takes **6 input tensors**:
- `bert_input_ids`, `bert_token_type_ids`, `bert_attention_mask`
- `alberto_input_ids`, `alberto_token_type_ids`, `alberto_attention_mask`
All tensors have shape `(batch_size, max_length)`.
---
## 🚀 Usage
### Load model and tokenizers
```python
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, TFBertModel, TFAutoModel
import tensorflow as tf

# Download the model files locally
model_path = snapshot_download("leeeov4/PIDIT")

# Load the Keras model, registering the custom encoder layers
model = tf.keras.models.load_model(model_path, custom_objects={
    "TFBertModel": TFBertModel,
    "TFAutoModel": TFAutoModel
})

# Load the tokenizers from their subfolders in the repo
bert_tokenizer = AutoTokenizer.from_pretrained("leeeov4/PIDIT", subfolder="bert_tokenizer")
alberto_tokenizer = AutoTokenizer.from_pretrained("leeeov4/PIDIT", subfolder="alberto_tokenizer")
```
### Preprocessing Example
```python
def preprocess_text(text, max_length=250):
    """Tokenize `text` with both tokenizers and build the model's 6-tensor input dict."""
    bert_tokens = bert_tokenizer(text, max_length=max_length, padding='max_length', truncation=True, return_tensors='tf')
    alberto_tokens = alberto_tokenizer(text, max_length=max_length, padding='max_length', truncation=True, return_tensors='tf')
    return {
        'bert_input_ids': bert_tokens['input_ids'],
        'bert_token_type_ids': bert_tokens['token_type_ids'],
        'bert_attention_mask': bert_tokens['attention_mask'],
        'alberto_input_ids': alberto_tokens['input_ids'],
        'alberto_token_type_ids': alberto_tokens['token_type_ids'],
        'alberto_attention_mask': alberto_tokens['attention_mask'],
    }
```
### Inference
```python
text = "Oggi, sabato 31 dicembre, alle ore 9.34, nel Monastero Mater Ecclesiae in Vaticano, il Signore ha chiamato a Sé il Santo Padre Emerito Benedetto XVI."
inputs = preprocess_text(text)
outputs = model.predict(inputs)

# Outputs arrive in head order: gender, ideology_binary, ideology_multiclass
gender_prob = outputs[0][0][0]           # P(male)
ideology_binary_prob = outputs[1][0][0]  # P(left)
ideology_multiclass_probs = outputs[2][0]

print("Predicted gender (male probability):", gender_prob)
print("Predicted binary ideology (left probability):", ideology_binary_prob)
print("Multiclass ideology distribution (left, right, moderate left, moderate right):", ideology_multiclass_probs)
```
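To turn the raw probabilities into labels, you can threshold the binary heads and take the arg-max of the multiclass head. The label order below follows the print statement above, and the 0.5 threshold is an assumption; adjust both if your downstream use calls for it.

```python
import numpy as np

# Hypothetical label order, matching the order printed in the inference example
MULTICLASS_LABELS = ["left", "right", "moderate left", "moderate right"]

def decode_outputs(gender_prob, ideology_binary_prob, multiclass_probs, threshold=0.5):
    """Map the three raw model outputs to human-readable labels."""
    return {
        "gender": "male" if gender_prob >= threshold else "female",
        "ideology_binary": "left" if ideology_binary_prob >= threshold else "right",
        "ideology_multiclass": MULTICLASS_LABELS[int(np.argmax(multiclass_probs))],
    }

# Example with made-up probabilities:
decode_outputs(0.8, 0.2, [0.1, 0.6, 0.2, 0.1])
# → {'gender': 'male', 'ideology_binary': 'right', 'ideology_multiclass': 'right'}
```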