|
|
--- |
|
|
license: cc-by-4.0 |
|
|
datasets: |
|
|
- DSL-13-SRMAP/TeSent_Benchmark-Dataset |
|
|
language: |
|
|
- te |
|
|
--- |
|
|
# Multilingual Sentiment Classification & Explanation Pipeline |
|
|
|
|
|
This repository provides a full pipeline for training, tuning, and evaluating multilingual sentiment classification models (with a focus on Telugu and other Indian languages) using both standard and rationale-supervised approaches. The pipeline uses human-annotated rationales together with the FERRET framework to assess model explanations for both **faithfulness** and **plausibility**.
|
|
|
|
|
--- |
|
|
|
|
|
## Table of Contents |
|
|
|
|
|
- [Project Overview](#project-overview) |
|
|
- [Dataset Format](#dataset-format) |
|
|
- [Model Selection](#model-selection) |
|
|
- [Pipeline Steps](#pipeline-steps) |
|
|
- [1. Hyperparameter Tuning](#1-hyperparameter-tuning) |
|
|
- [2. Model Training](#2-model-training) |
|
|
- [3. FERRET Faithfulness Evaluation](#3-ferret-faithfulness-evaluation) |
|
|
- [4. FERRET Plausibility Evaluation](#4-ferret-plausibility-evaluation) |
|
|
- [Metric Aggregation](#metric-aggregation) |
|
|
- [How to Run](#how-to-run) |
|
|
- [Outputs](#outputs) |
|
|
- [Citation](#citation) |
|
|
- [Contact](#contact) |
|
|
|
|
|
--- |
|
|
|
|
|
## Project Overview |
|
|
|
|
|
This pipeline supports: |
|
|
|
|
|
- **Hyperparameter tuning** for both attention-supervised (with rationale) and standard (without rationale) models. |
|
|
- **Model training** for both approaches. |
|
|
- **Faithfulness evaluation** using FERRET to measure how well explanations justify model predictions. |
|
|
- **Plausibility evaluation** using FERRET to measure how closely model explanations align with human rationales. |
|
|
- **Metric aggregation** for reporting in papers, using annotator-wise and sentence-wise averages. |
|
|
|
|
|
--- |
|
|
|
|
|
## Dataset Format |
|
|
|
|
|
The dataset must be in CSV format, with the following columns: |
|
|
|
|
|
| Content | Annotations | Rationale | Label | |
|
|
|---------|-------------|-----------|-------| |
|
|
| Telugu/Indian-language text | Per-annotator sentiment labels (pipe-separated) | Per-annotator rationale tokens (annotators pipe-separated, tokens comma-separated) | Final label |
|
|
|
|
|
**Example:** |
|
|
|
|
|
| Content | Annotations | Rationale | Label | |
|
|
|---------|-------------|-----------|-------| |
|
|
| గేలుపు దీశగా అందరికీ అదరగొట్టిన అక్క | Positive\|Positive\|Neutral | గేలుపు,దీశగా,అదరగొట్టిన\|గేలుపు\| | Positive | |
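A minimal sketch of how these fields can be parsed, assuming the layout shown in the example row (`train.csv` is a placeholder filename):

```python
import pandas as pd

df = pd.read_csv("train.csv")
row = df.iloc[0]

# One sentiment label per annotator, separated by "|"
annotations = row["Annotations"].split("|")  # e.g. ["Positive", "Positive", "Neutral"]

# One rationale per annotator, also separated by "|"; tokens within a
# rationale are separated by ",". An empty field (e.g. a trailing "|")
# means that annotator marked no rationale tokens.
rationales = [r.split(",") if r else [] for r in str(row["Rationale"]).split("|")]
```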
|
|
|
|
|
--- |
|
|
|
|
|
## Model Selection |
|
|
|
|
|
Models considered for training and evaluation: |
|
|
|
|
|
1. **bert-base-multilingual-cased** (used for tuning and baseline) |
|
|
2. **ai4bharat/IndicBERTv2-MLM-only** |
|
|
3. **google/muril-base-cased** |
|
|
4. **FacebookAI/xlm-roberta-base** |
|
|
5. **l3cube-pune/telugu-bert** |
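All of these are standard Hugging Face checkpoints, so they can be loaded interchangeably with `transformers`. A minimal sketch (`num_labels=3` assumes a Positive/Negative/Neutral label set):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-multilingual-cased"  # or any model from the list above

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)
```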
|
|
|
|
|
--- |
|
|
|
|
|
## Pipeline Steps |
|
|
|
|
|
### 1. Hyperparameter Tuning |
|
|
|
|
|
**Scripts:** |
|
|
- With rationale: `hyperparameter_tuning_for_rationale.py` |
|
|
- Without rationale: `hyperparameter_tuning_without_rationale.py` |
|
|
|
|
|
- Grid search over learning rate, batch size, and (for rationale models) the rationale loss weight (`lambda`); see the sketch after this list.
|
|
- Conducted separately for models trained **with** and **without** human rationale supervision. |
|
|
- Results are saved as CSVs with detailed metrics for each configuration. |
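A minimal sketch of the grid loop (the parameter values are illustrative, and `train_and_evaluate` is a hypothetical stand-in for the training and validation code inside the tuning scripts):

```python
import itertools

import pandas as pd

# Illustrative grid; the actual values live in the tuning scripts
learning_rates = [1e-5, 2e-5, 3e-5]
batch_sizes = [16, 32]
lambdas = [0.1, 0.5, 1.0]  # rationale loss weight (rationale models only)

results = []
for lr, bs, lam in itertools.product(learning_rates, batch_sizes, lambdas):
    # train_and_evaluate is hypothetical: train with this configuration
    # and return a dict of validation metrics
    metrics = train_and_evaluate(lr=lr, batch_size=bs, rationale_weight=lam)
    results.append({"lr": lr, "batch_size": bs, "lambda": lam, **metrics})

pd.DataFrame(results).to_csv("grid_results_detailed.csv", index=False)
```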
|
|
|
|
|
### 2. Model Training |
|
|
|
|
|
**Scripts:** |
|
|
- With rationale: `model_training_with_rationale.py` |
|
|
- Without rationale: `model_training_without_rationale.py` |
|
|
|
|
|
- Trains models using selected hyperparameters from tuning. |
|
|
- Both approaches (with and without rationale supervision) are supported; a sketch of the rationale-supervised loss follows this list.
|
|
- Trained models and tokenizers are saved for downstream evaluation. |
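The exact objective is defined in `model_training_with_rationale.py`; the sketch below shows one common formulation of rationale supervision (cross-entropy plus an MSE term between the model's token attention and the normalized human-rationale mask), which is an assumption rather than the script's verbatim loss:

```python
import torch.nn.functional as F


def rationale_supervised_loss(logits, labels, attention, rationale_mask, lam=0.5):
    """Classification loss plus an attention-supervision term.

    logits:         (batch, num_labels) classifier outputs
    attention:      (batch, seq_len) model attention over tokens
    rationale_mask: (batch, seq_len) binary human-rationale mask
    lam:            weight of the rationale term (the tuned `lambda`)
    """
    ce = F.cross_entropy(logits, labels)
    # Normalize the rationale mask into a distribution over tokens;
    # clamp avoids division by zero for examples with no rationale
    mask = rationale_mask.float()
    target = mask / mask.sum(dim=-1, keepdim=True).clamp(min=1.0)
    return ce + lam * F.mse_loss(attention, target)
```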
|
|
|
|
|
### 3. FERRET Faithfulness Evaluation |
|
|
|
|
|
**Script:** `ferret_faithfullness.py` |
|
|
**Input:** Predictions and explanations from trained models. |
|
|
|
|
|
- Runs model prediction on the test set. |
|
|
- Retains only "matched" samples (where prediction equals ground-truth label). |
|
|
- Generates and evaluates FERRET explanations for faithfulness: |
|
|
- Faithfulness metrics reflect how well the explanation supports the model's own prediction. |
|
|
- **Metric aggregation:** |
|
|
- The average of each faithfulness metric **over all sentences** gives the value reported in papers. |
|
|
|
|
|
**Output:** `<model_name>_ferret_matched.csv` (faithfulness metrics per sentence). |
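For reference, a minimal sketch of driving this step through the `ferret` library's `Benchmark` interface (paths and the target index are placeholders; the script itself handles batching over the test set and filtering to matched samples):

```python
from ferret import Benchmark
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_dir = "path/to/trained_model"  # placeholder
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
tokenizer = AutoTokenizer.from_pretrained(model_dir)

bench = Benchmark(model, tokenizer)

text = "గేలుపు దీశగా అందరికీ అదరగొట్టిన అక్క"  # a matched test sentence
target = 1  # the predicted (= gold) class index

# Explanations from ferret's default explainers, then faithfulness
# metrics (e.g. AOPC comprehensiveness/sufficiency) for each explainer
explanations = bench.explain(text, target=target)
evaluations = bench.evaluate_explanations(explanations, target=target)
```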
|
|
|
|
|
### 4. FERRET Plausibility Evaluation |
|
|
|
|
|
**Script:** `ferret_plausibility.py` |
|
|
**Input:** Output file from Step 3 (`<model_name>_ferret_matched.csv`). |
|
|
|
|
|
- For each matched sample: |
|
|
- Generates attention vectors from human rationales (for each annotator). |
|
|
- Evaluates FERRET explanations for plausibility against each annotator's rationale using metrics such as AUPRC, token-wise F1, and IoU. |
|
|
- **Metric aggregation:** |
|
|
- For each metric, average **over all annotators and all sentences** is computed. |
|
|
- These averages are the plausibility scores presented in papers. |
|
|
|
|
|
**Output:** `<model_name>_ferret_plausibility.csv` (plausibility metrics per sentence and annotator). |
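A minimal sketch of the per-annotator comparison behind these metrics (`explanation_scores` is a hypothetical per-token attribution vector from Step 3; assumes the annotator marked at least one rationale token):

```python
import numpy as np
from sklearn.metrics import average_precision_score, f1_score


def plausibility_metrics(explanation_scores, rationale_mask, threshold=0.5):
    """Compare per-token attribution scores to one annotator's rationale."""
    scores = np.asarray(explanation_scores, dtype=float)
    gold = np.asarray(rationale_mask, dtype=int)  # 1 = rationale token

    auprc = average_precision_score(gold, scores)

    pred = (scores >= threshold).astype(int)  # binarize the explanation
    token_f1 = f1_score(gold, pred, zero_division=0)

    inter = np.logical_and(pred, gold).sum()
    union = np.logical_or(pred, gold).sum()
    iou = inter / union if union else 0.0

    return {"auprc": auprc, "token_f1": token_f1, "iou": iou}
```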
|
|
|
|
|
--- |
|
|
|
|
|
## Metric Aggregation |
|
|
|
|
|
- **Faithfulness Metrics:** |
|
|
- For each metric in `<model_name>_ferret_matched.csv`, compute the average **across all sentences**. |
|
|
- These are reported as overall faithfulness scores. |
|
|
|
|
|
- **Plausibility Metrics:** |
|
|
- For each metric in `<model_name>_ferret_plausibility.csv`, compute the average **across all annotators and all sentences**. |
|
|
- These are reported as overall plausibility scores (per metric). |
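Both aggregations are one-liners in pandas (`mbert` is an illustrative model name; the metric column names come from the CSVs themselves):

```python
import pandas as pd

# Faithfulness: mean of each metric column over all sentences
faithfulness = pd.read_csv("mbert_ferret_matched.csv")
print(faithfulness.mean(numeric_only=True))

# Plausibility: mean of each metric column over all (sentence, annotator) rows
plausibility = pd.read_csv("mbert_ferret_plausibility.csv")
print(plausibility.mean(numeric_only=True))
```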
|
|
|
|
|
--- |
|
|
|
|
|
## How to Run |
|
|
|
|
|
1. **Prepare dataset:** Format train, validation, and test CSVs as described above. |
|
|
2. **Add emoji vocabulary:** Place `emoji.csv` in the project root. |
|
|
3. **Hyperparameter tuning:** |
|
|
```bash |
|
|
python hyperparameter_tuning_for_rationale.py |
|
|
python hyperparameter_tuning_without_rationale.py |
|
|
``` |
|
|
4. **Train final models:** |
|
|
```bash |
|
|
python model_training_with_rationale.py |
|
|
python model_training_without_rationale.py |
|
|
``` |
|
|
5. **FERRET Faithfulness evaluation:** |
|
|
```bash |
|
|
python ferret_faithfullness.py |
|
|
``` |
|
|
6. **FERRET Plausibility evaluation:** |
|
|
```bash |
|
|
python ferret_plausibility.py |
|
|
``` |
|
|
|
|
|
*Edit script configs (model names, paths, batch sizes) as needed.* |
|
|
|
|
|
--- |
|
|
|
|
|
## Outputs |
|
|
|
|
|
- **Hyperparameter tuning results:** `grid_results_detailed.csv` |
|
|
- **Model training:** Model weights, tokenizer, and metric CSVs. |
|
|
- **Faithfulness metrics:** `<model_name>_ferret_matched.csv` |
|
|
- **Plausibility metrics:** `<model_name>_ferret_plausibility.csv` |
|
|
- **Test metrics & predictions:** `overall_test_metrics.csv`, `labelwise_test_metrics.csv`, `test_predictions.csv`, `confusion_matrix.csv`, `confusion_matrix.png` |
|
|
- **Metric averages:** computed with the provided scripts or pandas, as described in [Metric Aggregation](#metric-aggregation).
|
|
|
|
|
--- |
|
|
|