---
license: cc-by-4.0
datasets:
- DSL-13-SRMAP/TeSent_Benchmark-Dataset
language:
- te
---
# Multilingual Sentiment Classification & Explanation Pipeline

This repository provides a full pipeline for training, tuning, and evaluating multilingual sentiment classification models (with a focus on Telugu and other Indian languages) using both standard and rationale-supervised approaches. The pipeline employs human-annotated rationales and the FERRET framework to assess model explanations for both **faithfulness** and **plausibility**.

---

## Table of Contents

- [Project Overview](#project-overview)
- [Dataset Format](#dataset-format)
- [Model Selection](#model-selection)
- [Pipeline Steps](#pipeline-steps)
  - [1. Hyperparameter Tuning](#1-hyperparameter-tuning)
  - [2. Model Training](#2-model-training)
  - [3. FERRET Faithfulness Evaluation](#3-ferret-faithfulness-evaluation)
  - [4. FERRET Plausibility Evaluation](#4-ferret-plausibility-evaluation)
- [Metric Aggregation](#metric-aggregation)
- [How to Run](#how-to-run)
- [Outputs](#outputs)
- [Citation](#citation)
- [Contact](#contact)
+
|
| 30 |
+
---
|
| 31 |
+
|
| 32 |
+
## Project Overview
|
| 33 |
+
|
| 34 |
+
This pipeline supports:
|
| 35 |
+
|
| 36 |
+
- **Hyperparameter tuning** for both attention-supervised (with rationale) and standard (without rationale) models.
|
| 37 |
+
- **Model training** for both approaches.
|
| 38 |
+
- **Faithfulness evaluation** using FERRET to measure how well explanations justify model predictions.
|
| 39 |
+
- **Plausibility evaluation** using FERRET to measure how closely model explanations align with human rationales.
|
| 40 |
+
- **Metric aggregation** for reporting in papers, using annotator-wise and sentence-wise averages.
|
| 41 |
+
|
| 42 |
+
---

## Dataset Format

The dataset must be in CSV format, with the following columns:

| Content | Annotations | Rationale | Label |
|---------|-------------|-----------|-------|
| Text (Telugu/Indian) | Annotators' sentiment labels (pipe-separated, one per annotator) | Rationale spans (pipe-separated per annotator; comma-separated words within a span, empty if the annotator gave none) | Final label |

**Example:**

| Content | Annotations | Rationale | Label |
|---------|-------------|-----------|-------|
| గేలుపు దీశగా అందరికీ అదరగొట్టిన అక్క | Positive\|Positive\|Neutral | గేలుపు,దీశగా,అదరగొట్టిన\|గేలుపు\| | Positive |
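
A minimal parsing sketch under these conventions (assumes pandas; the file name `train.csv` is illustrative). Note that a trailing empty field after the last `\|`, as in the example, marks an annotator who highlighted no rationale:

```python
# Sketch of reading one row under the column conventions above.
# The file name "train.csv" is illustrative.
import pandas as pd

df = pd.read_csv("train.csv")

for _, row in df.iterrows():
    text = row["Content"]
    # One sentiment label per annotator, pipe-separated.
    annotations = row["Annotations"].split("|")
    # One rationale span per annotator; words within a span are
    # comma-separated, and an empty field means no rationale.
    rationales = [
        [w for w in span.split(",") if w]
        for span in row["Rationale"].split("|")
    ]
    label = row["Label"]
```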

---

## Model Selection

Models considered for training and evaluation:

1. **bert-base-multilingual-cased** (used for tuning and baseline)
2. **ai4bharat/IndicBERTv2-MLM-only**
3. **google/muril-base-cased**
4. **FacebookAI/xlm-roberta-base**
5. **l3cube-pune/telugu-bert**
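
Any of these checkpoints can be loaded through Hugging Face `transformers`; a minimal sketch (the `num_labels=3` below assumes a Positive/Negative/Neutral label scheme and is not confirmed by the scripts):

```python
# Sketch of loading one of the listed checkpoints.
# num_labels=3 assumes a Positive/Negative/Neutral scheme.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "google/muril-base-cased"  # any checkpoint from the list above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=3
)
```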

---

## Pipeline Steps

### 1. Hyperparameter Tuning

**Scripts:**
- With rationale: `hyperparameter_tuning_for_rationale.py`
- Without rationale: `hyperparameter_tuning_without_rationale.py`

- Grid search over learning rate, batch size, and (for rationale models) the rationale loss weight (`lambda`); see the sketch below.
- Conducted separately for models trained **with** and **without** human rationale supervision.
- Results are saved as CSVs with detailed metrics for each configuration.
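
The grids themselves live in the scripts; the following sketch shows only the loop structure. The grid values and the `train_and_evaluate` helper are hypothetical stand-ins, not the scripts' actual settings:

```python
# Illustrative grid-search skeleton; grid values and the
# train_and_evaluate helper are hypothetical stand-ins.
import itertools
import pandas as pd

def train_and_evaluate(lr, batch_size, rationale_lambda):
    """Hypothetical stand-in for one train/validate cycle."""
    return {"val_accuracy": 0.0, "val_macro_f1": 0.0}

learning_rates = [1e-5, 2e-5, 3e-5]
batch_sizes = [16, 32]
lambdas = [0.1, 0.5, 1.0]  # rationale loss weight; rationale models only

results = []
for lr, bs, lam in itertools.product(learning_rates, batch_sizes, lambdas):
    metrics = train_and_evaluate(lr, bs, lam)
    results.append({"lr": lr, "batch_size": bs, "lambda": lam, **metrics})

pd.DataFrame(results).to_csv("grid_results_detailed.csv", index=False)
```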

### 2. Model Training

**Scripts:**
- With rationale: `model_training_with_rationale.py`
- Without rationale: `model_training_without_rationale.py`

- Trains models using the hyperparameters selected during tuning.
- Both approaches (with and without rationale supervision) are supported.
- Trained models and tokenizers are saved for downstream evaluation.
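
For the rationale-supervised variant, a common formulation of attention supervision combines the classification loss with a `lambda`-weighted divergence between the model's attention and the human rationale distribution. The sketch below is an assumption about the general shape of such a loss, not the scripts' exact objective:

```python
# Sketch of a rationale-supervised objective: cross-entropy on the label
# plus a lambda-weighted KL term pulling the model's attention toward the
# human rationale distribution. An assumed formulation, not necessarily
# the exact loss in model_training_with_rationale.py.
import torch
import torch.nn.functional as F

def combined_loss(logits, labels, model_attn, rationale_mask, lam=0.5):
    # logits: (batch, num_labels); labels: (batch,)
    # model_attn, rationale_mask: (batch, seq_len), each row summing to 1
    cls_loss = F.cross_entropy(logits, labels)
    attn_loss = F.kl_div((model_attn + 1e-12).log(), rationale_mask,
                         reduction="batchmean")
    return cls_loss + lam * attn_loss
```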

### 3. FERRET Faithfulness Evaluation

**Script:** `ferret_faithfullness.py`
**Input:** Predictions and explanations from trained models.

- Runs model prediction on the test set.
- Retains only "matched" samples (those where the prediction equals the ground-truth label).
- Generates and evaluates FERRET explanations for faithfulness:
  - Faithfulness metrics reflect how well the explanation supports the model's own prediction.
- **Metric aggregation:**
  - The average of each faithfulness metric **over all sentences** gives the value reported in papers.

**Output:** `<model_name>_ferret_matched.csv` (faithfulness metrics per sentence).
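
A minimal sketch of the FERRET calls involved, using the `ferret` Python package; the checkpoint, example sentence, and target label are placeholders, and `ferret_faithfullness.py` may pass different arguments:

```python
# Sketch of FERRET faithfulness evaluation for one matched sentence.
# The checkpoint and inputs are placeholders.
from ferret import Benchmark
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "bert-base-multilingual-cased"  # in practice, the model from Step 2
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=3)
tokenizer = AutoTokenizer.from_pretrained(name)
bench = Benchmark(model, tokenizer)

text = "..."          # one matched test sentence
predicted_label = 1   # the model's (correct) prediction for it

explanations = bench.explain(text, target=predicted_label)
faithfulness = bench.evaluate_explanations(explanations, target=predicted_label)
bench.show_evaluation_table(faithfulness)
```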

### 4. FERRET Plausibility Evaluation

**Script:** `ferret_plausibility.py`
**Input:** Output file from Step 3 (`<model_name>_ferret_matched.csv`).

- For each matched sample:
  - Generates attention vectors from the human rationales (one per annotator).
  - Evaluates FERRET explanations for plausibility against each annotator's rationale, using metrics such as AUPRC, token-wise F1, and IoU.
- **Metric aggregation:**
  - For each metric, the average **over all annotators and all sentences** is computed.
  - These averages are the plausibility scores presented in papers.

**Output:** `<model_name>_ferret_plausibility.csv` (plausibility metrics per sentence and annotator).
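
Ferret computes its plausibility metrics when a binary human-rationale mask over the tokens is supplied. The sketch below reuses the objects from the faithfulness sketch above; the word-to-token alignment is an illustrative assumption, and `ferret_plausibility.py` may align rationales differently:

```python
# Sketch of plausibility evaluation against one annotator's rationale.
# Builds a binary token mask from the rationale words, then passes it
# to ferret; the alignment below is an illustrative assumption.
rationale_words = {"గేలుపు", "దీశగా", "అదరగొట్టిన"}  # one annotator's span

tokens = tokenizer.tokenize(text)
human_rationale = [1 if tok.lstrip("#") in rationale_words else 0
                   for tok in tokens]

plausibility = bench.evaluate_explanations(
    explanations, target=predicted_label, human_rationale=human_rationale
)
bench.show_evaluation_table(plausibility)
```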

---

## Metric Aggregation

- **Faithfulness Metrics:**
  - For each metric in `<model_name>_ferret_matched.csv`, compute the average **across all sentences**.
  - These are reported as the overall faithfulness scores.

- **Plausibility Metrics:**
  - For each metric in `<model_name>_ferret_plausibility.csv`, compute the average **across all annotators and all sentences**.
  - These are reported as the overall plausibility scores (per metric).
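
With pandas, each aggregation is a column mean; a sketch (`mbert` stands in for a model name, and the metric columns are assumed numeric):

```python
# Sketch of the two aggregations; "mbert" stands in for a model name.
import pandas as pd

# Faithfulness: mean of each metric over all sentences.
faith = pd.read_csv("mbert_ferret_matched.csv")
print(faith.mean(numeric_only=True))

# Plausibility: one row per (sentence, annotator) pair, so a plain
# column mean averages over all annotators and all sentences at once.
plaus = pd.read_csv("mbert_ferret_plausibility.csv")
print(plaus.mean(numeric_only=True))
```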

---

## How to Run

1. **Prepare dataset:** Format train, validation, and test CSVs as described above.
2. **Add emoji vocabulary:** Place `emoji.csv` in the project root.
3. **Hyperparameter tuning:**
   ```bash
   python hyperparameter_tuning_for_rationale.py
   python hyperparameter_tuning_without_rationale.py
   ```
4. **Train final models:**
   ```bash
   python model_training_with_rationale.py
   python model_training_without_rationale.py
   ```
5. **FERRET faithfulness evaluation:**
   ```bash
   python ferret_faithfullness.py
   ```
6. **FERRET plausibility evaluation:**
   ```bash
   python ferret_plausibility.py
   ```

*Edit script configs (model names, paths, batch sizes) as needed.*

---

## Outputs

- **Hyperparameter tuning results:** `grid_results_detailed.csv`
- **Model training:** model weights, tokenizer, and metric CSVs.
- **Faithfulness metrics:** `<model_name>_ferret_matched.csv`
- **Plausibility metrics:** `<model_name>_ferret_plausibility.csv`
- **Test metrics & predictions:** `overall_test_metrics.csv`, `labelwise_test_metrics.csv`, `test_predictions.csv`, `confusion_matrix.csv`, `confusion_matrix.png`
- **Metric averages:** compute with the provided scripts or pandas (see the sketch under [Metric Aggregation](#metric-aggregation)).

---

## Citation

If you use this pipeline, please cite the FERRET benchmark ([link](https://github.com/ferret-benchmark/ferret)) and the relevant associated work.

---

## Contact

For questions or support, contact [rajkumar411](https://github.com/rajkumar411).

---