Daksh C Jain

metadata
title: Topic Modelling Agentic AI
emoji: 🔬
colorFrom: indigo
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false

🔬 Topic Modelling Agentic AI

An agent-driven platform for automating Reflexive Thematic Analysis (Braun & Clarke, 2006) with sentence-embedding topic models and LLM labeling. Built with LangGraph, BERTopic, and Mistral AI, the agent automates the discovery, labeling, and synthesis of research topics from large academic datasets (e.g., Scopus CSV exports).


🚀 Overview

This project implements a "Golden Thread" pipeline for qualitative research. Instead of relying on keyword extraction alone, it clusters sentence-level embeddings and uses an LLM to label each cluster in context, so that more nuanced themes can be identified.

Key Features

  • Agentic Workflow: Powered by LangGraph, the agent autonomously decides when to load data, run clustering, or call the LLM for labeling.
  • Precision Clustering: Uses BERTopic with Agglomerative Clustering (cosine similarity) on 384-dimensional sentence embeddings (all-MiniLM-L6-v2).
  • Human-in-the-Loop: An interactive Gradio UI allows researchers to review, rename, or reject agent-generated topics before final synthesis.
  • Automated Synthesis: Generates a 500-word research narrative and maps themes to established taxonomies (e.g., PAJAIS).
  • Rich Visualizations: Interactive Plotly charts including Intertopic Distance Maps, Hierarchical Clustering, and Heatmaps.
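
The Agentic Workflow feature above can be pictured as a routing decision over the agent's state. The following is a minimal sketch, not the project's actual logic: the state fields and step names are hypothetical, and the real agent may let the LLM choose tools rather than use fixed rules.

```python
from typing import TypedDict


class AgentState(TypedDict, total=False):
    """Hypothetical agent state; field names are illustrative only."""
    documents: list[str]           # sentences loaded from the uploaded CSV
    topics: dict[int, list[str]]   # topic id -> member sentences
    labels: dict[int, str]         # topic id -> LLM-generated label


def next_step(state: AgentState) -> str:
    """Pick the next pipeline stage from what the state already contains."""
    if not state.get("documents"):
        return "load_data"
    if not state.get("topics"):
        return "run_clustering"
    if len(state.get("labels", {})) < len(state["topics"]):
        return "label_topics"
    # Everything is labeled; hand over to the researcher for review.
    return "await_review"
```

A function of this shape could serve as the routing function of a LangGraph conditional edge, with each returned string mapped to a node of the same name.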

πŸ› οΈ Technology Stack

  • Framework: LangGraph (Agentic logic & state management)
  • Engine: BERTopic (Topic Modeling pipeline)
  • LLM: Mistral AI (mistral-small-latest)
  • Embeddings: sentence-transformers/all-MiniLM-L6-v2
  • UI: Gradio 5.x
  • Data: Pandas, NumPy, Scikit-Learn
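
As a rough illustration of how the engine and embedding components could be wired together, the sketch below configures BERTopic with scikit-learn's AgglomerativeClustering and the MiniLM embedding model. The distance threshold, the linkage choice, and the decision to skip UMAP are assumptions made for this example; the project's actual settings may differ.

```python
# Assumed clustering settings; the threshold is illustrative, not the project's value.
CLUSTER_KWARGS = {
    "n_clusters": None,          # let the distance threshold set the number of clusters
    "distance_threshold": 0.7,   # hypothetical cut-off on cosine distance
    "metric": "cosine",          # parameter name in scikit-learn >= 1.2
    "linkage": "average",        # ward linkage only supports Euclidean distance
}


def build_topic_model():
    """Build a BERTopic model that clusters raw MiniLM embeddings agglomeratively."""
    # Imported lazily so this module loads before the heavy dependencies are installed.
    from bertopic import BERTopic
    from bertopic.dimensionality import BaseDimensionalityReduction
    from sklearn.cluster import AgglomerativeClustering

    return BERTopic(
        embedding_model="sentence-transformers/all-MiniLM-L6-v2",
        umap_model=BaseDimensionalityReduction(),  # skip UMAP, keep all 384 dimensions
        hdbscan_model=AgglomerativeClustering(**CLUSTER_KWARGS),
    )
```

With the dependencies installed, `topics, probs = build_topic_model().fit_transform(sentences)` would assign each input sentence to a topic.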

📋 Methodology

The agent follows the Braun & Clarke (2006) six-phase thematic analysis framework:

  1. Familiarization: Loading and preprocessing Scopus CSV metadata.
  2. Initial Coding: Sentence-level clustering to identify "semantic atoms."
  3. Searching for Themes: Aggregating clusters into broader research themes.
  4. Reviewing Themes: Researcher validation via the Review Table.
  5. Defining and Naming: Refined LLM labeling based on centroid-nearest evidence.
  6. Producing the Report: Exporting narrative sections and comparison matrices.
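
Phase 5 refers to "centroid-nearest evidence". One plausible reading, sketched here with the standard library only, is to pick the sentences whose embeddings lie closest to their cluster's mean vector and pass those to the LLM as labeling evidence. The function names and the choice of cosine similarity for ranking are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid_nearest(sentences, embeddings, k=3):
    """Return the k sentences whose embeddings are closest to the cluster centroid."""
    dim = len(embeddings[0])
    # The centroid is the element-wise mean of all embeddings in the cluster.
    centroid = [sum(vec[i] for vec in embeddings) / len(embeddings) for i in range(dim)]
    ranked = sorted(
        zip(sentences, embeddings),
        key=lambda pair: cosine(pair[1], centroid),
        reverse=True,
    )
    return [sentence for sentence, _ in ranked[:k]]
```

The selected sentences can then be placed in the labeling prompt so the LLM names the topic from representative text rather than from keywords alone.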

💻 Setup & Installation

Prerequisites

  • Python 3.10+
  • Mistral AI API Key

Installation

  1. Clone the repository:

    git clone https://github.com/your-repo/topic-modelling-agent.git
    cd topic-modelling-agent
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Configure environment: Create a .env file in the root directory:

    MISTRAL_API_KEY=your_api_key_here
    
  4. Run the application:

    python app.py
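
Before launching, it can help to confirm that the key from step 3 is actually readable. The check below is a standard-library sketch; how app.py itself loads the .env file (for example through python-dotenv) is not stated here, and both helper names are hypothetical.

```python
import os


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_mistral_key(env: dict[str, str]) -> str:
    """Return the API key, or fail early if it is missing or still the placeholder."""
    key = env.get("MISTRAL_API_KEY", "")
    if not key or key == "your_api_key_here":
        raise RuntimeError("Set MISTRAL_API_KEY in .env before running app.py")
    return key
```

Calling `get_mistral_key(load_env_file())` from the project root raises a clear error when the key has not been configured.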
    

📖 Usage

  1. Upload Data: Drag and drop a Scopus CSV export.
  2. Initialize: Type "Analyze my CSV" or "run abstract only" in the chat.
  3. Iterate: Use the chat to refine topics (e.g., "group topics 5 and 10 into Sustainability").
  4. Review: Use the Review Table tab to approve or rename topics.
  5. Export: Download the generated Narrative and Comparison CSV from the Download tab.
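
To show what an abstract-only run might feed into clustering, the sketch below reads a Scopus-style CSV and splits each abstract into sentences. The "Abstract" column name and the "[No abstract available]" placeholder are assumptions about the export format, and the regex splitter is a simple stand-in for whatever sentence segmentation the app really uses.

```python
import csv
import io
import re

# Split after ., ! or ? when followed by whitespace (a deliberately simple heuristic).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def abstract_sentences(csv_text: str, column: str = "Abstract") -> list[str]:
    """Return all abstract sentences from CSV text, skipping empty abstracts."""
    sentences: list[str] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        abstract = (row.get(column) or "").strip()
        # Assumed placeholder for records exported without an abstract.
        if not abstract or abstract == "[No abstract available]":
            continue
        sentences.extend(s.strip() for s in SENTENCE_END.split(abstract) if s.strip())
    return sentences
```

For a file on disk, pass `open(path, encoding="utf-8-sig").read()` as `csv_text`; the `utf-8-sig` codec drops a byte-order mark if the export includes one.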

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.