---
title: EIS Topic Intelligence
sdk: gradio
sdk_version: "5.25.2"
app_file: app.py
pinned: true
license: mit
short_description: EIS topic modelling with LLM council validation
---

# EIS Topic Intelligence

SPJIMR Research Analytics: topic modelling pipeline for the **Enterprise Information Systems** journal corpus.

## What It Does

- Loads a Scopus-exported CSV (at minimum, it needs `Title` and `Abstract` columns).
- Builds paper-level embeddings from `Title + Abstract` using the **SPECTER2** transformer model; falls back to TF-IDF + SVD if transformers are unavailable.
- Runs **UMAP + HDBSCAN** parameter optimization targeting 15–25 crisp clusters with 5–100 papers per cluster.
- Falls back to KMeans only if density clustering cannot meet the required range.
- Labels each cluster through a **3-member LLM council**:
  - Three Mistral council personas when `MISTRAL_API_KEY` is configured (live LLM mode).
  - Deterministic keyword/PAJAIS/local semantic fallback when no key is set; the app still runs end to end.
- Maps clusters to the **25 PAJAIS IS-research categories**.
- Exports TCCM/computational-technique validation for the top-cited 100 papers.
- Provides a **Compliance tab** showing PASS / CONFIG_REQUIRED / INPUT_REQUIRED / MANUAL_REQUIRED for each requirement.
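
The clustering targets above (15–25 clusters, 5–100 papers each) can be expressed as a simple acceptance check. The following is a minimal sketch, not the app's actual code: the function name `meets_cluster_targets` and its parameters are hypothetical, and it assumes noise points carry the label `-1` (the HDBSCAN convention) and are excluded from the count.

```python
from collections import Counter


def meets_cluster_targets(labels, min_clusters=15, max_clusters=25,
                          min_size=5, max_size=100, noise_label=-1):
    """Return True if a labelling satisfies the stated cluster targets.

    Hypothetical helper: the defaults mirror the ranges in this README;
    ignoring noise points is an assumption.
    """
    # Count papers per cluster, skipping points marked as noise.
    sizes = Counter(label for label in labels if label != noise_label)
    if not (min_clusters <= len(sizes) <= max_clusters):
        return False
    # Every remaining cluster must fall inside the size range.
    return all(min_size <= n <= max_size for n in sizes.values())


# 20 clusters of 10 papers each, plus a few noise points.
labels = [c for c in range(20) for _ in range(10)] + [-1] * 7
print(meets_cluster_targets(labels))
```

A check like this could decide when a UMAP + HDBSCAN trial is acceptable and when the KMeans fallback is needed.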

## Main Deliverables

- `outputs/comparison.csv`: All clusters with labels, PAJAIS category, confidence, agreement
- `outputs/taxonomy_map.json`: PAJAIS taxonomy mapping + gap analysis
- `outputs/topic_model_report.md`: Full markdown report
- `outputs/narrative.txt`: Narrative summary
- `outputs/cluster_optimization_log.csv`: All UMAP/HDBSCAN parameter trials + scores
- `outputs/llm_council_validation.csv`: Per-cluster council vote evidence
- `outputs/tccm_validation.csv`: Top-100 cited papers with theory/method extraction
- `outputs/compliance_checklist.csv`: Professor requirement compliance
- `outputs/run_metadata.json`: Embedding model + selected parameters
- `outputs/combined_labels.json`: Full cluster data with keywords and titles

## Run Locally

```bash
pip install -r requirements.txt
python app.py
```

Open the Gradio URL, upload your Scopus CSV, and click **▶ Run Complete Pipeline**.

For command-line generation (no UI):

```bash
python run_pipeline.py path/to/scopus.csv
```
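
Before running either entry point, it can help to confirm that the CSV has the required `Title` and `Abstract` columns. The sketch below is an illustration using only the standard library; `load_documents` and `REQUIRED_COLUMNS` are hypothetical names, and joining title and abstract with a space is an assumption about how `Title + Abstract` is formed.

```python
import csv
import io

REQUIRED_COLUMNS = ("Title", "Abstract")  # columns this README says are required


def load_documents(handle):
    """Read a CSV file object and return one 'title abstract' string per row."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"Missing required columns: {', '.join(missing)}")

    docs = []
    for row in reader:
        title = (row["Title"] or "").strip()
        abstract = (row["Abstract"] or "").strip()
        # Assumed combination rule: non-empty parts joined by a space.
        text = " ".join(part for part in (title, abstract) if part)
        if text:  # skip rows with neither a title nor an abstract
            docs.append(text)
    return docs


sample = io.StringIO(
    'Title,Abstract,Year\n"ERP adoption","We study ERP rollouts.",2020\n'
)
print(load_documents(sample))
```

For a file on disk, `open(path, newline="", encoding="utf-8-sig")` is a reasonable way to obtain the handle, since that encoding tolerates a leading byte-order mark if the export has one.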

## LLM Council Setup

Set `MISTRAL_API_KEY` as a Space secret (or in a local `.env` file) to activate live 3-LLM council labelling. The app runs end to end without it, using the deterministic fallback.
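
The switch between the two modes can be pictured as a check on the environment. This is a minimal sketch under assumptions, not the app's implementation: `council_mode` and the mode names `"live"` and `"fallback"` are hypothetical, and a blank value is treated as "no key". It reads only the process environment; loading a `.env` file would need a separate step.

```python
import os


def council_mode(env=None):
    """Return 'live' when MISTRAL_API_KEY is set and non-blank, else 'fallback'."""
    env = os.environ if env is None else env
    key = (env.get("MISTRAL_API_KEY") or "").strip()
    return "live" if key else "fallback"


print(council_mode({"MISTRAL_API_KEY": "example-key"}))
print(council_mode({}))
```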