Telugu
Raj411 committed · verified
Commit d359910 · 1 Parent(s): 4890177

Update README.md

Files changed (1)
  1. README.md +184 -3
README.md CHANGED
@@ -1,3 +1,184 @@
- ---
- license: cc-by-4.0
- ---
+ ---
+ license: cc-by-4.0
+ datasets:
+ - DSL-13-SRMAP/TeSent_Benchmark-Dataset
+ language:
+ - te
+ ---
+ # Multilingual Sentiment Classification & Explanation Pipeline
+
+ This repository provides a full pipeline for training, tuning, and evaluating multilingual sentiment classification models (with a focus on Telugu and other Indian languages) using both standard and rationale-supervised approaches. The pipeline employs human-annotated rationales and the FERRET framework to assess model explanations for both **faithfulness** and **plausibility**.
+
+ ---
+
+ ## Table of Contents
+
+ - [Project Overview](#project-overview)
+ - [Dataset Format](#dataset-format)
+ - [Model Selection](#model-selection)
+ - [Pipeline Steps](#pipeline-steps)
+   - [1. Hyperparameter Tuning](#1-hyperparameter-tuning)
+   - [2. Model Training](#2-model-training)
+   - [3. FERRET Faithfulness Evaluation](#3-ferret-faithfulness-evaluation)
+   - [4. FERRET Plausibility Evaluation](#4-ferret-plausibility-evaluation)
+ - [Metric Aggregation](#metric-aggregation)
+ - [How to Run](#how-to-run)
+ - [Outputs](#outputs)
+ - [Citation](#citation)
+ - [Contact](#contact)
+
+ ---
+
+ ## Project Overview
+
+ This pipeline supports:
+
+ - **Hyperparameter tuning** for both attention-supervised (with rationale) and standard (without rationale) models.
+ - **Model training** for both approaches.
+ - **Faithfulness evaluation** using FERRET to measure how well explanations justify model predictions.
+ - **Plausibility evaluation** using FERRET to measure how closely model explanations align with human rationales.
+ - **Metric aggregation** for reporting in papers, using annotator-wise and sentence-wise averages.
+
+ ---
+
+ ## Dataset Format
+
+ The dataset must be a CSV file with the following columns:
+
+ | Content | Annotations | Rationale | Label |
+ |---------|-------------|-----------|-------|
+ | Text (Telugu or another Indian language) | Annotators' sentiment labels (pipe-separated) | Rationale spans (pipe-separated per annotator, comma-separated within an annotator) | Final label |
+
+ **Example:**
+
+ | Content | Annotations | Rationale | Label |
+ |---------|-------------|-----------|-------|
+ | గేలుపు దీశగా అందరికీ అదరగొట్టిన అక్క | Positive\|Positive\|Neutral | గేలుపు,దీశగా,అదరగొట్టిన\|గేలుపు\| | Positive |
+
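+ Since the Rationale column nests two delimiters (pipes between annotators, commas between tokens), here is a minimal parsing sketch with pandas. The file name `train.csv` and the helper `parse_row` are illustrative, not files or functions shipped with this repo:
+
+ ```python
+ import pandas as pd
+
+ def parse_row(row):
+     """Split the pipe/comma-separated annotator fields of one CSV row."""
+     labels = row["Annotations"].split("|")   # one sentiment label per annotator
+     # One comma-separated token list per annotator; an empty field means
+     # that annotator marked no rationale tokens.
+     rationales = [
+         [tok for tok in field.split(",") if tok]
+         for field in row["Rationale"].split("|")
+     ]
+     return labels, rationales
+
+ df = pd.read_csv("train.csv")
+ labels, rationales = parse_row(df.iloc[0])
+ # For the example row above: labels == ["Positive", "Positive", "Neutral"]
+ # and rationales[2] == [] (the third annotator marked nothing).
+ ```
+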
+ ---
+
+ ## Model Selection
+
+ Models considered for training and evaluation:
+
+ 1. **bert-base-multilingual-cased** (used for tuning and baseline)
+ 2. **ai4bharat/IndicBERTv2-MLM-only**
+ 3. **google/muril-base-cased**
+ 4. **FacebookAI/xlm-roberta-base**
+ 5. **l3cube-pune/telugu-bert**
+
+ ---
+
+ ## Pipeline Steps
+
+ ### 1. Hyperparameter Tuning
+
+ **Scripts:**
+ - With rationale: `hyperparameter_tuning_for_rationale.py`
+ - Without rationale: `hyperparameter_tuning_without_rationale.py`
+
+ - Grid search over learning rate, batch size, and (for rationale models) rationale loss weight (`lambda`).
+ - Conducted separately for models trained **with** and **without** human rationale supervision.
+ - Results are saved as CSVs with detailed metrics for each configuration.
+
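+ For concreteness, the grid can be written as a product over the three axes. The value ranges below are placeholders, not the ones hard-coded in the tuning scripts, and `train_and_eval` is a hypothetical helper:
+
+ ```python
+ from itertools import product
+
+ learning_rates = [1e-5, 2e-5, 3e-5]   # placeholder ranges
+ batch_sizes = [16, 32]
+ lambdas = [0.1, 0.5, 1.0]             # rationale loss weight; rationale models only
+
+ for lr, bs, lam in product(learning_rates, batch_sizes, lambdas):
+     config = {"lr": lr, "batch_size": bs, "lambda": lam}
+     # train_and_eval(config): fine-tune on train, score on validation,
+     # and append one row of metrics to grid_results_detailed.csv.
+ ```
+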
+ ### 2. Model Training
+
+ **Scripts:**
+ - With rationale: `model_training_with_rationale.py`
+ - Without rationale: `model_training_without_rationale.py`
+
+ - Trains models using the hyperparameters selected during tuning.
+ - Both approaches (with and without rationale supervision) are supported.
+ - Trained models and tokenizers are saved for downstream evaluation.
+
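+ The rationale-supervised objective combines the usual classification loss with an attention-supervision term weighted by `lambda`. A minimal PyTorch sketch, assuming a 0/1 rationale mask already aligned to the model's tokens; the MSE form of the attention term is illustrative, the exact formulation lives in the training script:
+
+ ```python
+ import torch.nn.functional as F
+
+ def joint_loss(logits, labels, attn, rationale_mask, lam):
+     """attn: (batch, seq_len) token attention; rationale_mask: (batch, seq_len) 0/1."""
+     cls_loss = F.cross_entropy(logits, labels)
+     # Turn the mask into a target distribution over tokens (guard against all-zero masks).
+     target = rationale_mask / rationale_mask.sum(-1, keepdim=True).clamp(min=1.0)
+     attn_loss = F.mse_loss(attn, target)
+     return cls_loss + lam * attn_loss
+ ```
+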
+ ### 3. FERRET Faithfulness Evaluation
+
+ **Script:** `ferret_faithfullness.py`
+ **Input:** Trained models and the test set.
+
+ - Runs model prediction on the test set.
+ - Retains only "matched" samples (where the prediction equals the ground-truth label).
+ - Generates and evaluates FERRET explanations for faithfulness:
+   - Faithfulness metrics reflect how well the explanation supports the model's own prediction.
+ - **Metric aggregation:**
+   - The average of each faithfulness metric **over all sentences** gives the value reported in papers.
+
+ **Output:** `<model_name>_ferret_matched.csv` (faithfulness metrics per sentence).
+
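+ Explanation generation and scoring follow ferret's documented `Benchmark` API. A minimal sketch; the checkpoint path and `target` class id are placeholders for a model trained in Step 2:
+
+ ```python
+ from transformers import AutoModelForSequenceClassification, AutoTokenizer
+ from ferret import Benchmark
+
+ ckpt = "path/to/trained_checkpoint"   # placeholder
+ model = AutoModelForSequenceClassification.from_pretrained(ckpt)
+ tokenizer = AutoTokenizer.from_pretrained(ckpt)
+
+ bench = Benchmark(model, tokenizer)
+ # Explain one matched test sentence for its predicted class id.
+ explanations = bench.explain("గేలుపు దీశగా అందరికీ అదరగొట్టిన అక్క", target=1)
+ # Faithfulness metrics (comprehensiveness, sufficiency, ...) per explainer.
+ evaluations = bench.evaluate_explanations(explanations, target=1)
+ ```
+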
+ ### 4. FERRET Plausibility Evaluation
+
+ **Script:** `ferret_plausibility.py`
+ **Input:** Output file from Step 3 (`<model_name>_ferret_matched.csv`).
+
+ - For each matched sample:
+   - Generates attention vectors from the human rationales (one per annotator).
+   - Evaluates FERRET explanations for plausibility against each annotator's rationale using metrics such as AUPRC, token-wise F1, and IoU.
+ - **Metric aggregation:**
+   - For each metric, the average **over all annotators and all sentences** is computed.
+   - These averages are the plausibility scores presented in papers.
+
+ **Output:** `<model_name>_ferret_plausibility.csv` (plausibility metrics per sentence and annotator).
+
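+ Per annotator, plausibility reduces to comparing an explanation's token scores against that annotator's binary rationale mask. A from-scratch sketch of the three metrics named above, only to make the definitions concrete (ferret ships its own implementations):
+
+ ```python
+ import numpy as np
+ from sklearn.metrics import average_precision_score, f1_score
+
+ def plausibility(scores, mask, k):
+     """scores: per-token explanation weights; mask: 0/1 human rationale;
+     k: top-k tokens taken as the hard rationale."""
+     auprc = average_precision_score(mask, scores)
+     hard = np.zeros_like(mask)
+     hard[np.argsort(scores)[-k:]] = 1        # top-k scoring tokens
+     f1 = f1_score(mask, hard)
+     iou = (hard & mask).sum() / max((hard | mask).sum(), 1)
+     return auprc, f1, iou
+ ```
+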
+ ---
+
+ ## Metric Aggregation
+
+ - **Faithfulness Metrics:**
+   - For each metric in `<model_name>_ferret_matched.csv`, compute the average **across all sentences**.
+   - These are reported as overall faithfulness scores.
+
+ - **Plausibility Metrics:**
+   - For each metric in `<model_name>_ferret_plausibility.csv`, compute the average **across all annotators and all sentences**.
+   - These are reported as overall plausibility scores (per metric).
+
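+ Both aggregations are one-liners in pandas; the file and column names below are assumptions about the CSV headers, so adjust them to the actual ones:
+
+ ```python
+ import pandas as pd
+
+ faith = pd.read_csv("mbert_ferret_matched.csv")        # <model_name> placeholder
+ print(faith.mean(numeric_only=True))                   # average across sentences
+
+ plaus = pd.read_csv("mbert_ferret_plausibility.csv")   # <model_name> placeholder
+ # Annotators and sentences are rows, so pooling both is one mean over all rows.
+ print(plaus[["auprc", "token_f1", "iou"]].mean())      # assumed column names
+ ```
+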
+ ---
+
+ ## How to Run
+
+ 1. **Prepare dataset:** Format train, validation, and test CSVs as described above.
+ 2. **Add emoji vocabulary:** Place `emoji.csv` in the project root.
+ 3. **Hyperparameter tuning:**
+    ```bash
+    python hyperparameter_tuning_for_rationale.py
+    python hyperparameter_tuning_without_rationale.py
+    ```
+ 4. **Train final models:**
+    ```bash
+    python model_training_with_rationale.py
+    python model_training_without_rationale.py
+    ```
+ 5. **FERRET faithfulness evaluation:**
+    ```bash
+    python ferret_faithfullness.py
+    ```
+ 6. **FERRET plausibility evaluation:**
+    ```bash
+    python ferret_plausibility.py
+    ```
+
+ *Edit the script configs (model names, paths, batch sizes) as needed.*
+
+ ---
+
+ ## Outputs
+
+ - **Hyperparameter tuning results:** `grid_results_detailed.csv`
+ - **Model training:** Model weights, tokenizer, and metric CSVs.
+ - **Faithfulness metrics:** `<model_name>_ferret_matched.csv`
+ - **Plausibility metrics:** `<model_name>_ferret_plausibility.csv`
+ - **Test metrics & predictions:** `overall_test_metrics.csv`, `labelwise_test_metrics.csv`, `test_predictions.csv`, `confusion_matrix.csv`, `confusion_matrix.png`
+ - **Metric averages:** Compute with the provided scripts or pandas (see [Metric Aggregation](#metric-aggregation)).
+
+ ---
+
+ ## Citation
+
+ If you use this pipeline, please cite the FERRET benchmark ([link](https://github.com/ferret-benchmark/ferret)) and the associated dataset (DSL-13-SRMAP/TeSent_Benchmark-Dataset).
+
+ ---
+
+ ## Contact
+
+ For questions or support, contact [rajkumar411](https://github.com/rajkumar411).
+
+ ---