title: BERTopic Agentic Topic Modelling
emoji: 🧠
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: false
🔬 BERTopic Agentic Topic Modelling
Computational Thematic Analysis following Braun & Clarke (2006)
Overview
BERTopic Agentic Topic Modelling is a research tool that automates and supports Thematic Analysis of academic literature. It combines BERTopic's transformer-based clustering with a LangGraph-driven agentic workflow and guides researchers through the six-phase framework of Braun & Clarke (2006).
It doesn't just cluster text; it reasons about it. An "AI Council" of two Large Language Models (Mistral Large and Llama 3 served via Groq) debates each topic label and reports a consensus, which the researcher then reviews before results are exported.
🧠 Theoretical Foundation: Braun & Clarke (2006)
This tool is strictly mapped to the six phases of thematic analysis as defined in the seminal work:
- Familiarisation with data: Automatic cleaning, boilerplate removal, and dataset profiling.
- Generating initial codes: BERTopic discovery and AI-assisted initial labeling.
- Searching for themes: LLM-driven consolidation of topics into overarching themes.
- Reviewing potential themes: Saturation checks and coverage analysis.
- Defining and naming themes: Generation of academic definitions and core narratives.
- Producing the report: Narrative writing (Section 7 draft) and PAJAIS taxonomy mapping.
✨ Key Features
- 🤖 Agentic Workflow: A LangGraph agent manages the entire pipeline, maintaining memory and ensuring a step-by-step scientific process.
- ⚖️ AI Council: Real-time debates between Mistral-Large and Llama-3 (Groq) to determine the most accurate thematic labels.
- Dynamic Visualizations: 8+ interactive Plotly charts (Intertopic maps, Frequency bars, Heatmaps, Treemaps, and DBSCAN scatter plots).
- 🛡️ Multi-Model Analysis: Run separate analyses on Abstracts vs. Titles and generate a side-by-side convergence CSV.
- Density Refinement: Optional DBSCAN clustering to complement traditional hierarchical methods and handle noise points elegantly.
- 🏷️ PAJAIS Taxonomy Mapping: Automated gap analysis by mapping themes to the standard 25 PAJAIS Information Systems categories.
- 📥 One-Click Export: Download structured JSON, side-by-side CSVs, PNG charts, and a 500-word academic narrative report.
🛠️ Architecture
graph TD
A[Scopus CSV Upload] --> B{Agentic Workflow}
B -->|Phase 1| C[Data Loading & Cleaning]
C -->|Phase 2| D[BERTopic / DBSCAN Discovery]
D --> E[AI Council Labeling]
E -->|Phase 3| F[Theme Consolidation]
F -->|Phase 4| G[Saturation Check]
G -->|Phase 5| H[Definition & Naming]
H -->|Phase 5.5| I[PAJAIS Taxonomy Mapping]
I -->|Phase 6| J[Report Generation]
subgraph "AI Council"
E1[Mistral-Large] <--> E2[Groq Llama-3]
end
subgraph "Outputs"
J --> K[narrative.txt]
J --> L[comparison.csv]
J --> M[Interactive Charts]
end
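The flow in the diagram can be read as an ordered list of phases with a human review gate. The sketch below is a plain-Python illustration of that idea, not the app's LangGraph implementation: the names `PHASES`, `STOP_GATES`, and `next_phase` are hypothetical, and the choice of Phase 2 as the only gate is an assumption made for the example.

```python
from typing import Optional

# Phase order taken from the architecture diagram; "5.5" is the taxonomy step.
PHASES = [
    ("1", "Data Loading & Cleaning"),
    ("2", "BERTopic / DBSCAN Discovery"),
    ("3", "Theme Consolidation"),
    ("4", "Saturation Check"),
    ("5", "Definition & Naming"),
    ("5.5", "PAJAIS Taxonomy Mapping"),
    ("6", "Report Generation"),
]

# Assumption for illustration: only Phase 2 waits for a submitted review.
STOP_GATES = {"2"}


def next_phase(current: str, review_submitted: bool) -> Optional[str]:
    """Return the next phase id, the current id if a gate blocks, or None at the end."""
    ids = [phase_id for phase_id, _ in PHASES]
    index = ids.index(current)
    if current in STOP_GATES and not review_submitted:
        return current  # blocked until the review is submitted
    if index + 1 == len(ids):
        return None  # the report phase is the last one
    return ids[index + 1]


print(next_phase("2", review_submitted=False))  # stays on phase 2
print(next_phase("2", review_submitted=True))   # moves on to phase 3
```

Keeping the gate set separate from the phase list makes it easy to add or remove review points without touching the ordering.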
🖥️ App Navigation & Expected UI
The interface is divided into three logical zones for a streamlined user experience:
1. Control Center (Top & Left)
- Phase Progress Bar: A visual indicator of your progress through Braun & Clarke’s 6 phases.
- Data Input (Left): The upload zone for your Scopus CSV. Once uploaded, Phase 1 triggers automatically.
2. The Agent Laboratory (Center)
- Chatbot Interface: Your main point of interaction. The agent will ask questions, provide stats, and guide you. You can type commands like "run abstract" or "Continue".
- AI Council Feedback: Every time a label is generated, look for the reasoning block. It shows the consensus score between models.
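How the consensus score is computed is not described here. One simple way to quantify agreement between two proposed labels is token-level Jaccard overlap, sketched below. The function names are hypothetical and the metric is an assumption, not necessarily what the app uses.

```python
import re


def tokens(label: str) -> set:
    """Lowercase a label and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", label.lower()))


def consensus_score(label_a: str, label_b: str) -> float:
    """Jaccard overlap between two labels: 1.0 means identical token sets."""
    a, b = tokens(label_a), tokens(label_b)
    if not a and not b:
        return 1.0  # two empty labels trivially agree
    return len(a & b) / len(a | b)


# Three shared tokens out of four distinct tokens overall.
score = consensus_score("Deep Learning for Healthcare", "Healthcare Deep Learning")
print(f"{score:.2f}")  # 0.75
```

A lexical measure like this ignores synonyms, so an embedding-based similarity may be a better fit when the two models phrase the same idea differently.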
3. Results Dashboard (Bottom Tabs)
- Review Table: The "Heart" of the app. This is where you approve, rename, and refine the AI's findings. You MUST click "Submit Review" to move past STOP GATES.
- Charts Tab: Switch between Intertopic Map, Frequency Bars, Hierarchy (Treemap), and Similarity Heatmap.
- ⚖️ AI Council Tab: A dedicated view showing the full transcript of debates between Mistral and Groq.
- 💾 Download Tab: Your final repository. All files are generated in real-time and appear here for one-click downloading.
Expected Output Preview
- In Chat: Summary tables, saturation percentages (e.g., "92.4% Coverage"), and phase completion checkmarks.
- In Files:
  - narrative.txt: Academic prose with structured headings.
  - comparison.csv: Columns for Abstract Theme, Title Theme, and Convergence (marked with a check mark).
  - taxonomy_map.json: A mapping showing each theme's link to the PAJAIS framework and its Novelty score.
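To make the shape of comparison.csv concrete, here is a minimal sketch that builds a file with the three columns named above. The matching rule (case-insensitive equality of theme names), the "yes"/"no" values, and the function name `build_comparison_csv` are assumptions for illustration; the app's own convergence logic may differ.

```python
import csv
import io


def build_comparison_csv(abstract_themes, title_themes):
    """Return CSV text with one row per theme and a convergence flag."""
    title_by_key = {t.strip().lower(): t for t in title_themes}
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Abstract Theme", "Title Theme", "Convergence"])
    matched = set()
    for theme in abstract_themes:
        key = theme.strip().lower()
        match = title_by_key.get(key, "")
        if match:
            matched.add(key)
        writer.writerow([theme, match, "yes" if match else "no"])
    # Title-only themes get their own rows so nothing is silently dropped.
    for key, theme in title_by_key.items():
        if key not in matched:
            writer.writerow(["", theme, "no"])
    return buffer.getvalue()


print(build_comparison_csv(
    ["AI in Healthcare", "Blockchain"],
    ["ai in healthcare", "E-Government"],
))
```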
1. Prerequisites
- Python 3.9+
- API Keys for Mistral AI and Groq (optional but recommended for the Council feature).
2. Installation
Clone the repository and install the dependencies:
# Clone the repo
git clone https://github.com/ShivamKadam63s/BERT_Topic_Modelling.git
cd BERT_Topic_Modelling
# Install dependencies
pip install -r requirements.txt
3. Environment Setup
Create a .env file or export your API keys in your terminal (PowerShell syntax shown):
$env:MISTRAL_API_KEY="your_mistral_key"
$env:GROQ_API_KEY="your_groq_key"
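Since the keys are optional, application code typically has to check for them before enabling the Council. The following is a small sketch of such a check using only the standard library; `council_enabled` is a hypothetical helper and not a function from this repository.

```python
import os


def council_enabled(env=None):
    """Return True only when both API keys are present and non-empty."""
    env = os.environ if env is None else env
    return bool(env.get("MISTRAL_API_KEY")) and bool(env.get("GROQ_API_KEY"))


if not council_enabled():
    print("AI Council disabled: set MISTRAL_API_KEY and GROQ_API_KEY to enable it.")
```

Passing a mapping as `env` keeps the check easy to test without touching the real process environment.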
4. Running the App
Start the Gradio interface:
python app.py
Open your browser at http://localhost:7860.
User Guide: Phase-by-Phase Walkthrough
Step 1: Data Input
Upload your Scopus CSV file. The agent will immediately scan the file, remove boilerplate text (Copyright notices, DOIs, etc.), and provide a dataset profile including paper counts and year ranges.
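The cleaning and profiling step can be pictured roughly as follows. This is a sketch under stated assumptions: the regular expressions are illustrative rather than the app's actual rules, the row dictionaries are assumed to carry a `Year` field as in a typical Scopus export, and the function names are hypothetical.

```python
import re

# Illustrative patterns only; the app's real cleaning rules are not documented here.
COPYRIGHT = re.compile(r"(?:©|\(c\)|copyright)\s*\d{4}.*$", re.IGNORECASE)
DOI = re.compile(r"\b(?:doi:\s*|https?://(?:dx\.)?doi\.org/)?10\.\d{4,9}/\S+", re.IGNORECASE)


def clean_abstract(text: str) -> str:
    """Strip DOIs and a trailing copyright notice, then normalise whitespace."""
    text = DOI.sub(" ", text)
    text = COPYRIGHT.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def profile(rows):
    """Summarise paper count and publication-year range for a list of row dicts."""
    years = [int(r["Year"]) for r in rows if str(r.get("Year", "")).isdigit()]
    return {
        "papers": len(rows),
        "year_min": min(years) if years else None,
        "year_max": max(years) if years else None,
    }


raw = "We study topic models. doi:10.1000/xyz123 © 2023 Elsevier Ltd. All rights reserved."
print(clean_abstract(raw))  # We study topic models.
print(profile([{"Year": "2019"}, {"Year": "2023"}, {"Year": ""}]))
```

Requiring a four-digit year after the copyright marker avoids truncating abstracts that merely discuss copyright as a topic.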
Step 2: Discovery & Coding
- Type "run abstract" or "run title" in the chat.
- The system will generate clusters and invoke the AI Council.
- Navigation: Check the "⚖️ AI Council" tab to see the reasoning behind each label.
- Action: In the "Review Table", tick Approve for clusters you accept or provide a custom name in Rename To. Click "Submit Review".
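The effect of a submitted review can be sketched as a small function that turns table rows into final code labels. The row keys (`topic_id`, `label`, `approve`, `rename_to`) and the rule that a custom name implies acceptance are assumptions for this example, not the app's documented behaviour.

```python
def apply_review(rows):
    """Keep approved clusters and apply any custom name from 'Rename To'."""
    final = {}
    for row in rows:
        rename = (row.get("rename_to") or "").strip()
        if rename:
            final[row["topic_id"]] = rename  # assumption: a custom name implies acceptance
        elif row.get("approve"):
            final[row["topic_id"]] = row["label"]
    return final


review = [
    {"topic_id": 0, "label": "Deep Learning", "approve": True, "rename_to": ""},
    {"topic_id": 1, "label": "Misc", "approve": False, "rename_to": "Digital Health"},
    {"topic_id": 2, "label": "Noise", "approve": False, "rename_to": ""},
]
print(apply_review(review))  # {0: 'Deep Learning', 1: 'Digital Health'}
```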
Step 3: Themes & Saturation
The agent combines approved codes into 4-8 themes. It will report Thematic Saturation (e.g., "Themes cover 92% of the corpus").
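A coverage figure of this kind can be computed as the share of documents whose topic ended up inside some theme. The sketch below assumes one topic id per document and treats BERTopic's outlier topic (-1) and unapproved topics as uncovered; the function name and data layout are hypothetical.

```python
def coverage(topic_assignments, theme_of_topic):
    """Share of documents whose topic belongs to some theme, as a percentage."""
    if not topic_assignments:
        return 0.0
    covered = sum(1 for t in topic_assignments if t in theme_of_topic)
    return 100.0 * covered / len(topic_assignments)


assignments = [0, 0, 1, 2, -1, 1, 0, 3]              # one topic id per document
themes = {0: "Theme A", 1: "Theme A", 2: "Theme B"}  # topic 3 was not approved
print(f"Themes cover {coverage(assignments, themes):.1f}% of the corpus")  # 75.0%
```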
Step 4: Taxonomy Mapping
The tool automatically maps your themes to the PAJAIS Taxonomy.
- Themes marked with NOVEL are identified as potential new research contributions not found in standard taxonomies.
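As a rough illustration of mapping with a novelty flag, the sketch below assigns each theme to the category with the highest token overlap and marks weak matches as novel. Everything here is an assumption: the overlap metric, the `threshold` value, the definition of novelty as one minus the best overlap, and the category names, which are placeholders to be replaced with the actual PAJAIS categories.

```python
import json
import re


def _tokens(text):
    """Lowercase alphabetic tokens of a theme or category name."""
    return set(re.findall(r"[a-z]+", text.lower()))


def map_to_taxonomy(themes, categories, threshold=0.2):
    """Map each theme to its best-overlapping category; flag weak matches as novel."""
    result = {}
    for theme in themes:
        theme_tokens = _tokens(theme)
        best, best_score = None, 0.0
        for category in categories:
            union = theme_tokens | _tokens(category)
            score = len(theme_tokens & _tokens(category)) / len(union) if union else 0.0
            if score > best_score:
                best, best_score = category, score
        novel = best_score < threshold
        result[theme] = {
            "category": None if novel else best,
            "novelty": round(1.0 - best_score, 2),
            "novel": novel,
        }
    return result


# Placeholder names chosen for illustration; substitute the actual PAJAIS categories.
categories = ["Knowledge Management", "Electronic Commerce"]
mapping = map_to_taxonomy(["Knowledge Management Systems", "Quantum Sensing"], categories)
print(json.dumps(mapping, indent=2))
```

Writing the result with `json.dumps` gives a structure similar in spirit to taxonomy_map.json, though the real file's keys may differ.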
Step 5: Final Report
The agent generates a 500-word Section 7 draft. Check the "💾 Download" tab for your full suite of results.
Expected Outputs
| Output File | Description |
|---|---|
| narrative.txt | A complete Section 7 draft following academic standards. |
| comparison.csv | Side-by-side comparison of Abstract and Title themes. |
| taxonomy_map.json | JSON mapping of themes to PAJAIS categories. |
| chart_*.html | Interactive Plotly visualizations for intertopic distance and hierarchy. |
| *.png | High-resolution static exports of all charts. |
🛠️ Built With
- Gradio: Modern UI Framework
- LangGraph: Agentic Multi-Model Workflows
- BERTopic: Advanced Topic Modeling
- Sentence-Transformers: all-MiniLM-L6-v2 embeddings
- Mistral Large: Primary Reasoning LLM
- Groq (Llama-3): Secondary Council LLM
- Plotly: Dynamic Data Science Charts
⚖️ License & Citation
If you use this tool in your research, please cite: Shivam Kadam, "BERTopic Agentic Topic Modelling for Systematic Literature Reviews," 2026.
Based on: Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77-101.
Made with ❤️ for the Research Community
