---
title: EIS Topic Intelligence
sdk: gradio
sdk_version: "5.25.2"
app_file: app.py
pinned: true
license: mit
short_description: EIS topic modelling with LLM council validation
---
# EIS Topic Intelligence
SPJIMR Research Analytics: topic modelling pipeline for the **Enterprise Information Systems** journal corpus.
## What It Does
- Loads a Scopus-exported CSV (at minimum, it needs `Title` and `Abstract` columns).
- Builds paper-level embeddings from `Title + Abstract` using the **SPECTER2** transformer model; falls back to TF-IDF + SVD if transformers are unavailable.
- Runs **UMAP + HDBSCAN** parameter optimization targeting 15–25 crisp clusters with 5–100 papers per cluster.
- Falls back to KMeans only if density clustering cannot meet the required range.
- Labels each cluster through a **3-member LLM council**:
  - Three Mistral council personas when `MISTRAL_API_KEY` is configured (live LLM mode).
  - Deterministic keyword/PAJAIS/local semantic fallback when no key is set; the app still runs end to end.
- Maps clusters to the **25 PAJAIS IS-research categories**.
- Exports TCCM/computational-technique validation for the top-cited 100 papers.
- Provides a **Compliance tab** showing PASS / CONFIG_REQUIRED / INPUT_REQUIRED / MANUAL_REQUIRED for each requirement.
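
The clustering target described above (15–25 clusters, each holding 5–100 papers) can be written as a simple acceptance check on a list of cluster labels. This is a minimal sketch, not the app's actual code: the function name `meets_cluster_target` is hypothetical, and it assumes that `-1` marks HDBSCAN noise points and that noise is not counted as a cluster.

```python
from collections import Counter


def meets_cluster_target(labels, min_clusters=15, max_clusters=25,
                         min_size=5, max_size=100):
    """Return True if a labelling falls inside the stated target ranges."""
    # Assumption: -1 marks noise points and is ignored when counting clusters.
    sizes = Counter(label for label in labels if label != -1)
    if not (min_clusters <= len(sizes) <= max_clusters):
        return False
    # Every remaining cluster must hold between min_size and max_size papers.
    return all(min_size <= n <= max_size for n in sizes.values())
```

A parameter search could call a check like this on each UMAP/HDBSCAN trial and only switch to KMeans when no trial passes.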
## Main Deliverables
- `outputs/comparison.csv`: All clusters with labels, PAJAIS category, confidence, agreement
- `outputs/taxonomy_map.json`: PAJAIS taxonomy mapping + gap analysis
- `outputs/topic_model_report.md`: Full markdown report
- `outputs/narrative.txt`: Narrative summary
- `outputs/cluster_optimization_log.csv`: All UMAP/HDBSCAN parameter trials + scores
- `outputs/llm_council_validation.csv`: Per-cluster council vote evidence
- `outputs/tccm_validation.csv`: Top-100 cited papers with theory/method extraction
- `outputs/compliance_checklist.csv`: Professor requirement compliance
- `outputs/run_metadata.json`: Embedding model + selected parameters
- `outputs/combined_labels.json`: Full cluster data with keywords and titles
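
After a run, it can be useful to confirm that every deliverable listed above was written. The following is a minimal sketch of such a check; the helper name `missing_deliverables` is hypothetical and not part of the app, and it only tests that each file exists, not that its contents are valid.

```python
from pathlib import Path

# File names taken from the deliverables list above.
EXPECTED_OUTPUTS = [
    "comparison.csv",
    "taxonomy_map.json",
    "topic_model_report.md",
    "narrative.txt",
    "cluster_optimization_log.csv",
    "llm_council_validation.csv",
    "tccm_validation.csv",
    "compliance_checklist.csv",
    "run_metadata.json",
    "combined_labels.json",
]


def missing_deliverables(output_dir="outputs"):
    """Return the expected file names that are absent from output_dir."""
    base = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if not (base / name).is_file()]
```

An empty list means all ten files are present.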
## Run Locally
```bash
pip install -r requirements.txt
python app.py
```
Open the Gradio URL, upload your Scopus CSV, and click **▶ Run Complete Pipeline**.
For command-line generation (no UI):
```bash
python run_pipeline.py path/to/scopus.csv
```
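
Before starting a run, you may want to confirm that the export contains the two required columns, `Title` and `Abstract`. The sketch below uses only the standard library; the function name `missing_columns` is hypothetical, and reading the file as UTF-8 (with an optional byte-order mark) is an assumption about how the export was saved.

```python
import csv

REQUIRED_COLUMNS = {"Title", "Abstract"}


def missing_columns(csv_path):
    """Return the required column names that the CSV header lacks."""
    # Assumption: the export is UTF-8; "utf-8-sig" also tolerates a leading BOM.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])  # empty file -> empty header
    present = {name.strip() for name in header}
    return sorted(REQUIRED_COLUMNS - present)
```

An empty result means the header has both columns; anything else names what is missing.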
## LLM Council Setup
Set `MISTRAL_API_KEY` as a Space secret (or in a local `.env` file) to activate live 3-member LLM council labelling. Without it, the app still runs end to end using the deterministic fallback.
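
The choice between live and fallback labelling comes down to whether the key is present in the environment. As a minimal sketch of that decision (the function name `council_mode` and the `"live"` / `"fallback"` return values are hypothetical, not the app's own identifiers):

```python
import os


def council_mode(env=None):
    """Return "live" if MISTRAL_API_KEY is set and non-blank, else "fallback"."""
    env = os.environ if env is None else env
    # A blank or whitespace-only value is treated as "not configured".
    key = env.get("MISTRAL_API_KEY", "").strip()
    return "live" if key else "fallback"
```

This reads the process environment only; values kept in a local `.env` file would need to be loaded into the environment by a separate step first.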