Update Docs.txt
Docs.txt CHANGED
@@ -1,49 +1,76 @@
- • Or, if the complaint does not align well with existing topics, forms a new topic cluster dynamically.
- ⸻
- • As new complaints mentioning phrases like “fake app,” “OTP stolen,” or “link sent on SMS” come in, the system will cluster them into a new, distinct topic.
- • Within a few hours or days, this cluster grows in size and is detected as an emerging issue on the dashboard.
- • Management can then quickly investigate the trend, issue customer advisories, and initiate app takedown procedures, mitigating customer losses early.
- ⸻
- • Automatic triaging of complaints to relevant teams,
- • Alert generation for anomalous topic spikes (potential fraud),
- • Resolution status tracking per topic cluster to analyze policy impact.

⸻

5. Benchmarking, Evaluation, and Model Selection

To determine the most effective topic modeling technique for the SBI UPI complaint corpus, four models were benchmarked on the July 2023 complaint dataset: Latent Semantic Analysis (LSA), Non-Negative Matrix Factorization (NMF), Latent Dirichlet Allocation (LDA), and BERTopic. These models were evaluated using three standard metrics:

• Topic Coherence: Measures the semantic similarity between the top keywords of a topic.
• Topic Diversity: Measures the uniqueness of the top words across all topics (sketched in code after this list).
• Topic Exclusivity: Measures how uniquely a word belongs to a single topic (low overlap across topics).
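
To ground these definitions, the snippet below sketches the diversity computation; the helper function and the toy topics are illustrative assumptions, not the project's evaluation code.

# Illustrative helper (not from the report): topic diversity as the
# fraction of unique words among the top words of every topic.
# A score of 1.0 means no keyword is shared between topics.
def topic_diversity(topics):
    top_words = [word for topic in topics for word in topic]
    return len(set(top_words)) / len(top_words)

# Toy topics: "upi" appears in both, so 5 of the 6 slots are unique.
toy_topics = [["upi", "failed", "refund"], ["upi", "fraud", "otp"]]
print(topic_diversity(toy_topics))  # 5 / 6 ≈ 0.83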

The following summarizes the performance of each model:

⸻

5.1 Latent Semantic Analysis (LSA)

LSA was evaluated over a topic range of 1 to 100 to identify an optimal number of topics. Based on the inflection point in the coherence curve, 15 topics were selected. The evaluation scores were:
• Coherence Score: 0.4616
• Diversity Score: 0.3000
• Exclusivity Score: 0.0530

While LSA was computationally efficient, the low exclusivity and diversity scores indicated substantial overlap in topic content, making the results less interpretable for business insights.
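
The sweep described above can be sketched as follows, assuming a TF-IDF plus TruncatedSVD pipeline and gensim's c_v coherence; the placeholder complaints merely stand in for the July 2023 corpus, and none of this code comes from the report itself.

# Hedged sketch of the LSA coherence sweep (assumed tooling, toy corpus).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel

docs = ["upi payment failed but amount debited",
        "refund not received for failed transaction",
        "fraudulent payment link received over sms"]
tokenized = [d.split() for d in docs]
dictionary = Dictionary(tokenized)

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)
terms = vectorizer.get_feature_names_out()

coherence_by_k = {}
for k in range(1, min(101, X.shape[1])):   # the report sweeps k = 1..100
    svd = TruncatedSVD(n_components=k, random_state=42).fit(X)
    # Treat the 10 highest-weighted terms of each component as a topic.
    topics = [[terms[i] for i in comp.argsort()[::-1][:10]]
              for comp in svd.components_]
    cm = CoherenceModel(topics=topics, texts=tokenized,
                        dictionary=dictionary, coherence="c_v")
    coherence_by_k[k] = cm.get_coherence()
# The inflection point of coherence_by_k selects the topic count (15 here).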

⸻

5.2 Non-Negative Matrix Factorization (NMF)

NMF outperformed LSA in both interpretability and distinctiveness of topic clusters:
• Coherence Score: 0.6693
• Diversity Score: 0.5100
• Exclusivity Score: 0.3100

The higher scores indicated that NMF provided more meaningful topics with reduced word overlap, although semantic similarity between topics remained a limitation due to its reliance on bag-of-words representations.
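
The exact NMF pipeline is not specified in this section; a minimal sklearn sketch under the same assumptions (TF-IDF features, toy corpus, and a deliberately small topic count in place of the tuned one) might look like this:

# Hedged sketch of the NMF variant (assumed tooling, toy corpus).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

docs = ["upi payment failed but amount debited",
        "refund not received for failed transaction",
        "fraudulent payment link received over sms",
        "cashback offer not applied on upi payment"]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)
terms = vectorizer.get_feature_names_out()

# Two topics only because the toy corpus is tiny; the report tunes this.
nmf = NMF(n_components=2, init="nndsvd", random_state=42).fit(X)
for t, comp in enumerate(nmf.components_):
    top = [terms[i] for i in comp.argsort()[::-1][:10]]
    print(f"topic {t}:", ", ".join(top))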

⸻

5.3 Latent Dirichlet Allocation (LDA)

Multiple hyperparameter configurations were tested for LDA, including unigram, bigram, and trigram models. After tuning, the unigram model performed best:
• Coherence Score: 0.5639
• Diversity Score: 0.5800
• Exclusivity Score: 0.5800

Although LDA maintained a good balance between coherence and diversity, topic quality was still constrained by its inability to capture contextual semantics, especially in short or repetitive complaint data.
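
A hedged sketch of the unigram-versus-bigram comparison follows, using gensim's Phrases to build the bigram variant; the tool choice and the toy corpus are assumptions, not the report's stated setup.

# Sketch of the n-gram comparison for LDA (assumed tooling, toy corpus).
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel
from gensim.models.phrases import Phraser, Phrases

texts = [["upi", "payment", "failed", "amount", "debited"],
         ["refund", "not", "received", "failed", "transaction"],
         ["fraudulent", "payment", "link", "received", "sms"]]

bigram = Phraser(Phrases(texts, min_count=1, threshold=1))
variants = {"unigram": texts, "bigram": [bigram[t] for t in texts]}

for name, tokens in variants.items():
    dictionary = Dictionary(tokens)
    corpus = [dictionary.doc2bow(t) for t in tokens]
    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=2, passes=10, random_state=42)
    score = CoherenceModel(model=lda, texts=tokens, dictionary=dictionary,
                           coherence="c_v").get_coherence()
    print(name, round(score, 4))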

⸻

5.4 BERTopic

The BERTopic model was configured with a minimum topic size of 100 and top_n_words = 10. It produced the best results across all three metrics:
• Coherence Score: 0.7300
• Diversity Score: 0.9930
• Exclusivity Score: 0.9924

In addition to its high evaluation scores, BERTopic's use of transformer-based document embeddings, followed by density-based clustering (HDBSCAN) and class-based TF-IDF (c-TF-IDF) scoring, allowed semantically similar complaints to be discovered and grouped into topics automatically. This provided a qualitatively richer and more interpretable topic structure than the bag-of-words-based models.
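
A minimal sketch of this configuration is shown below. Here load_complaints is a hypothetical loader for the complaint corpus, and everything beyond the two stated parameters falls back to BERTopic's defaults (sentence-transformer embeddings, UMAP, HDBSCAN).

# Hedged sketch of the stated BERTopic configuration.
from bertopic import BERTopic

docs = load_complaints()  # hypothetical loader for the July 2023 corpus

# Only the two parameters named in the report are set explicitly.
topic_model = BERTopic(min_topic_size=100, top_n_words=10)
topics, probs = topic_model.fit_transform(docs)
print(topic_model.get_topic_info().head())

# Incoming complaints can later be assigned to the learned topics:
new_topics, new_probs = topic_model.transform(["fake app link sent on sms"])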

⸻

5.5 Model Selection Rationale

Based on both quantitative evaluation and qualitative assessment of topic coherence and semantic similarity, BERTopic was selected as the final topic modeling technique for processing and analyzing the full SBI UPI complaint dataset.

In summary:

Model            Coherence   Diversity   Exclusivity
LSA              0.4616      0.3000      0.0530
NMF              0.6693      0.5100      0.3100
LDA (Unigram)    0.5639      0.5800      0.5800
BERTopic         0.7300      0.9930      0.9924

The BERTopic model's superior scores, along with its ability to model topics without predefining their number and to adapt dynamically to new data, make it a highly suitable choice for customer complaint analysis in dynamic domains like digital payments.
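
For a visual version of this comparison, the reported scores can be re-plotted as a grouped bar chart; the matplotlib sketch below uses only the numbers from the table above.

# Grouped bar chart of the benchmark scores reported above.
import matplotlib.pyplot as plt
import numpy as np

models = ["LSA", "NMF", "LDA (Unigram)", "BERTopic"]
scores = {
    "Coherence":   [0.4616, 0.6693, 0.5639, 0.7300],
    "Diversity":   [0.3000, 0.5100, 0.5800, 0.9930],
    "Exclusivity": [0.0530, 0.3100, 0.5800, 0.9924],
}

x = np.arange(len(models))
width = 0.25
for i, (metric, values) in enumerate(scores.items()):
    plt.bar(x + (i - 1) * width, values, width, label=metric)
plt.xticks(x, models)
plt.ylabel("Score")
plt.title("Topic model benchmark, July 2023 complaint dataset")
plt.legend()
plt.show()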

⸻