title: Topic Modelling Agentic AI
emoji: π¬
colorFrom: indigo
colorTo: purple
sdk: gradio
app_file: app.py
pinned: false
Topic Modelling Agentic AI
An agent-driven platform for automated Reflexive Thematic Analysis (Braun & Clarke, 2006) that combines embedding-based topic modeling with a large language model. Built with LangGraph, BERTopic, and Mistral AI, the agent automates the discovery, labeling, and synthesis of research topics from large academic datasets such as Scopus CSV exports.
Overview
This project implements a "Golden Thread" pipeline for qualitative research. Instead of relying on keyword extraction alone, it combines sentence-level embeddings with context-aware LLM labeling to identify nuanced themes.
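Working at sentence level means each Scopus record is split into sentences before embedding. The following is a minimal standard-library sketch of that preprocessing step, not the project's own code. It assumes the export has an `Abstract` column and uses a naive regex splitter where a real pipeline would likely use a dedicated sentence tokenizer; the name `abstracts_to_sentences` is hypothetical.

```python
import csv
import io
import re

# Assumption: the export has an "Abstract" column; check your own CSV header.
ABSTRACT_COLUMN = "Abstract"

# Naive boundary rule: split after ".", "!" or "?" when followed by whitespace.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def abstracts_to_sentences(csv_text: str, column: str = ABSTRACT_COLUMN) -> list[tuple[int, str]]:
    """Return (row_index, sentence) pairs for every non-empty abstract."""
    pairs: list[tuple[int, str]] = []
    for row_index, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        abstract = (row.get(column) or "").strip()
        if not abstract:
            continue  # records without an abstract contribute no sentences
        for sentence in SENTENCE_BOUNDARY.split(abstract):
            if sentence.strip():
                pairs.append((row_index, sentence.strip()))
    return pairs


sample = 'Title,Abstract\n"Paper A","AI adoption is rising. Trust matters!"\n"Paper B",""\n'
print(abstracts_to_sentences(sample))
# [(0, 'AI adoption is rising.'), (0, 'Trust matters!')]
```

Keeping the row index with each sentence lets every cluster member be traced back to its source paper.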
Key Features
- Agentic Workflow: Powered by LangGraph, the agent autonomously decides when to load data, run clustering, or call the LLM for labeling.
- Precision Clustering: Uses BERTopic with agglomerative clustering (cosine similarity) on 384-dimensional sentence embeddings (`all-MiniLM-L6-v2`).
- Human-in-the-Loop: An interactive Gradio UI lets researchers review, rename, or reject agent-generated topics before final synthesis.
- Automated Synthesis: Generates a 500-word research narrative and maps themes to established taxonomies (e.g., PAJAIS).
- Rich Visualizations: Interactive Plotly charts including Intertopic Distance Maps, Hierarchical Clustering, and Heatmaps.
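Agglomerative clustering starts with every embedding as its own cluster and repeatedly merges the closest pair until the target number of clusters remains. The dependency-free sketch below shows that loop with cosine distance and average linkage, on toy 2-d vectors standing in for the 384-d `all-MiniLM-L6-v2` embeddings. Average linkage is an assumption (Ward linkage is Euclidean-only, so it cannot be paired with cosine distance), and the function names are hypothetical. In BERTopic itself, a scikit-learn model such as `AgglomerativeClustering(metric="cosine", linkage="average")` would normally be passed through the `hdbscan_model` argument instead.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0.0 for vectors pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def agglomerative_cosine(vectors: list[list[float]], n_clusters: int) -> list[int]:
    """Average-linkage agglomerative clustering; returns one cluster label per vector."""
    clusters: list[list[int]] = [[i] for i in range(len(vectors))]  # singletons
    dist = [[cosine_distance(u, v) for v in vectors] for u in vectors]

    def linkage(c1: list[int], c2: list[int]) -> float:
        # Average linkage: mean distance over all cross-cluster pairs.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (a < b) and merge b into a.
        _, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a].extend(clusters[b])
        del clusters[b]

    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


toy = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
print(agglomerative_cosine(toy, n_clusters=2))  # [0, 0, 1, 1]
```

This naive loop recomputes every pairwise linkage on each merge, so it only suits toy inputs; scikit-learn's implementation is far more efficient on real datasets.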
Technology Stack
- Framework: LangGraph (Agentic logic & state management)
- Engine: BERTopic (Topic Modeling pipeline)
- LLM: Mistral AI (`mistral-small-latest`)
- Embeddings: `sentence-transformers/all-MiniLM-L6-v2`
- UI: Gradio 5.x
- Data: Pandas, NumPy, Scikit-Learn
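LangGraph expresses a workflow as nodes that read and update a shared state, with conditional edges choosing the next node. The sketch below mimics that control flow in plain Python to show how an agent can skip steps whose results already exist. It does not use the LangGraph API; the node names, state fields, and the deterministic `route` rule are hypothetical (in the project, the LLM presumably makes this choice).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentState:
    # Hypothetical fields; the project's real graph state is not documented here.
    sentences: list[str] = field(default_factory=list)
    topics: dict[int, list[str]] = field(default_factory=dict)
    labels: dict[int, str] = field(default_factory=dict)
    trace: list[str] = field(default_factory=list)


def route(state: AgentState) -> str:
    """Rule-based stand-in for the agent's choice of the next step."""
    if not state.sentences:
        return "load_data"
    if not state.topics:
        return "cluster"
    if len(state.labels) < len(state.topics):
        return "label"
    return "done"


def load_data(state: AgentState) -> None:
    state.sentences = ["AI adoption is rising.", "Trust drives adoption."]  # stub input


def cluster(state: AgentState) -> None:
    state.topics = {0: list(state.sentences)}  # stub: everything in one topic


def label(state: AgentState) -> None:
    # Stub for the LLM call that would name each topic.
    state.labels = {tid: f"Topic {tid}" for tid in state.topics}


NODES: dict[str, Callable[[AgentState], None]] = {
    "load_data": load_data,
    "cluster": cluster,
    "label": label,
}


def run(state: AgentState, max_steps: int = 10) -> AgentState:
    """Route, execute the chosen node, repeat until 'done' or the step budget runs out."""
    for _ in range(max_steps):
        step = route(state)
        state.trace.append(step)
        if step == "done":
            break
        NODES[step](state)
    return state


final = run(AgentState())
print(final.trace)  # ['load_data', 'cluster', 'label', 'done']
```

Starting from a state that already holds sentences, the same loop skips straight to clustering, which is the kind of "decide what to do next" behaviour the agent provides.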
Methodology
The agent follows the Braun & Clarke (2006) six-phase thematic analysis framework:
- Familiarization: Loading and preprocessing Scopus CSV metadata.
- Initial Coding: Sentence-level clustering to identify "semantic atoms."
- Searching for Themes: Aggregating clusters into broader research themes.
- Reviewing Themes: Researcher validation via the Review Table.
- Defining and Naming: Refining LLM labels using the sentences nearest each cluster centroid as evidence.
- Producing the Report: Exporting narrative sections and comparison matrices.
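Phase five labels each cluster from its most representative sentences. One common reading of "centroid-nearest evidence", assumed here, is: average the cluster's embeddings into a centroid, rank member sentences by cosine similarity to it, and send the top k to the LLM. The sketch uses toy 2-d embeddings; `centroid_nearest` and the prompt wording in `labeling_prompt` are hypothetical, and no API call is made.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Component-wise mean of the cluster's embeddings."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def centroid_nearest(sentences: list[str], embeddings: list[list[float]], k: int = 3) -> list[str]:
    """Return the k sentences whose embeddings are most similar to the cluster centroid."""
    center = centroid(embeddings)
    order = sorted(
        range(len(sentences)),
        key=lambda i: cosine_similarity(embeddings[i], center),
        reverse=True,
    )
    return [sentences[i] for i in order[:k]]


def labeling_prompt(evidence: list[str]) -> str:
    """Hypothetical prompt wording; the project's actual prompt is not documented."""
    bullets = "\n".join(f"- {s}" for s in evidence)
    return f"Propose a short theme label for these representative sentences:\n{bullets}"


sentences = ["Trust shapes AI adoption.", "Users adopt AI they trust.", "Adoption varies by sector."]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(centroid_nearest(sentences, embeddings, k=2))
# ['Users adopt AI they trust.', 'Trust shapes AI adoption.']
```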
Setup & Installation
Prerequisites
- Python 3.10+
- Mistral AI API Key
Installation
Clone the repository:
```bash
git clone https://github.com/your-repo/topic-modelling-agent.git
cd topic-modelling-agent
```
Install dependencies:
```bash
pip install -r requirements.txt
```
Configure environment: create a `.env` file in the root directory:
```
MISTRAL_API_KEY=your_api_key_here
```
Run the application:
```bash
python app.py
```
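The app needs `MISTRAL_API_KEY` at runtime. Projects commonly load `.env` files with `python-dotenv`'s `load_dotenv()`; the standard-library sketch below shows the same idea without that dependency, assuming a simple `KEY=VALUE` format. `parse_env` and `load_mistral_key` are hypothetical helpers, not functions from this repository.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments; strips matching quotes."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # only the first '=' separates key and value
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_mistral_key(env_file: str = ".env") -> str:
    """Prefer an already-set environment variable, fall back to the .env file."""
    path = Path(env_file)
    if path.is_file():
        for key, value in parse_env(path.read_text(encoding="utf-8")).items():
            os.environ.setdefault(key, value)  # never overwrite real env vars
    key = os.environ.get("MISTRAL_API_KEY")
    if not key:
        raise RuntimeError("MISTRAL_API_KEY is not set; add it to .env or the environment")
    return key
```

Values already in the environment win over the file, matching `load_dotenv()`'s default behaviour. This also suits Hugging Face Spaces, where secrets configured in the Space settings reach the app as environment variables.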
Usage
- Upload Data: Drag and drop a Scopus CSV export.
- Initialize: Type `Analyze my CSV` or `run abstract only` in the chat.
- Iterate: Use the chat to refine topics (e.g., `group topics 5 and 10 into "Sustainability"`).
- Review: Use the Review Table tab to approve or rename topics.
- Export: Download the generated Narrative and Comparison CSV from the Download tab.
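The Iterate and Review steps boil down to two edits on the topic table: merging several topics under one label, and dropping or relabelling topics after review. The sketch below models that with a hypothetical `Topic` dataclass; keeping the lowest id for a merged topic, and treating untouched topics as approved, are assumptions rather than documented behaviour. At the model level, BERTopic also provides a `merge_topics(docs, topics_to_merge)` method.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    label: str
    sentences: list[str]


def merge_topics(topics: dict[int, Topic], ids: list[int], new_label: str) -> dict[int, Topic]:
    """Fold the given topics into the lowest id among them under a new label."""
    keep = min(ids)
    merged = Topic(new_label, [s for tid in sorted(ids) for s in topics[tid].sentences])
    result = {tid: t for tid, t in topics.items() if tid not in ids}
    result[keep] = merged
    return dict(sorted(result.items()))  # input dict is left unchanged


def apply_review(topics: dict[int, Topic], renames: dict[int, str], rejected: set[int]) -> dict[int, Topic]:
    """Drop rejected topics, relabel renamed ones, keep the rest as approved (assumption)."""
    return {
        tid: Topic(renames.get(tid, t.label), t.sentences)
        for tid, t in topics.items()
        if tid not in rejected
    }


topics = {
    5: Topic("Green IT", ["Data centres cut emissions."]),
    7: Topic("Trust", ["Users adopt AI they trust."]),
    10: Topic("Energy use", ["Model training consumes power."]),
}
merged = merge_topics(topics, [5, 10], "Sustainability")  # "group topics 5 and 10"
reviewed = apply_review(merged, renames={7: "Trust in AI"}, rejected=set())
print({tid: t.label for tid, t in reviewed.items()})
# {5: 'Sustainability', 7: 'Trust in AI'}
```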
License
This project is licensed under the MIT License - see the LICENSE file for details.