---
title: EIS Topic Intelligence
sdk: gradio
sdk_version: "5.25.2"
app_file: app.py
pinned: true
license: mit
short_description: EIS topic modelling with LLM council validation
---

# EIS Topic Intelligence

SPJIMR Research Analytics: topic modelling pipeline for the **Enterprise Information Systems** journal corpus.

## What It Does

- Loads a Scopus-exported CSV (requires at least the `Title` and `Abstract` columns).
- Builds paper-level embeddings from `Title + Abstract` using the **SPECTER2** transformer model; falls back to TF-IDF + SVD if transformers are unavailable.
- Runs **UMAP + HDBSCAN** parameter optimization targeting 15–25 crisp clusters with 5–100 papers per cluster.
- Falls back to KMeans only if density clustering cannot meet the required range.
- Labels each cluster through a **3-member LLM council**:
  - Three Mistral council personas when `MISTRAL_API_KEY` is configured (live LLM mode).
  - Deterministic keyword/PAJAIS/local semantic fallback when no key is set; the app still runs end to end.
- Maps clusters to the **25 PAJAIS IS-research categories**.
- Exports TCCM/computational-technique validation for the 100 top-cited papers.
- Provides a **Compliance tab** showing PASS / CONFIG_REQUIRED / INPUT_REQUIRED / MANUAL_REQUIRED for each requirement.
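The clustering targets listed above (15–25 clusters, 5–100 papers per cluster) can be expressed as a simple acceptance check on a label assignment. The following is a minimal sketch, not the app's actual code: the function name `meets_cluster_targets` and the constants are hypothetical, and ignoring HDBSCAN noise points (label `-1`) when counting clusters is an assumption about how the pipeline treats them.

```python
from collections import Counter

# Ranges quoted from the README; all names here are hypothetical.
MIN_CLUSTERS, MAX_CLUSTERS = 15, 25
MIN_SIZE, MAX_SIZE = 5, 100
NOISE_LABEL = -1  # HDBSCAN assigns -1 to points it leaves unclustered


def meets_cluster_targets(labels):
    """Return True if the labels satisfy the cluster-count and cluster-size ranges."""
    # Assumption: noise points do not count as a cluster.
    sizes = Counter(label for label in labels if label != NOISE_LABEL)
    if not MIN_CLUSTERS <= len(sizes) <= MAX_CLUSTERS:
        return False
    return all(MIN_SIZE <= n <= MAX_SIZE for n in sizes.values())


labels = [i % 20 for i in range(400)]  # 20 clusters of 20 papers each
print(meets_cluster_targets(labels))  # True
```

A check of this kind could decide whether a UMAP/HDBSCAN trial is acceptable or whether the KMeans fallback is needed.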

## Main Deliverables

- `outputs/comparison.csv`: All clusters with labels, PAJAIS category, confidence, agreement
- `outputs/taxonomy_map.json`: PAJAIS taxonomy mapping + gap analysis
- `outputs/topic_model_report.md`: Full markdown report
- `outputs/narrative.txt`: Narrative summary
- `outputs/cluster_optimization_log.csv`: All UMAP/HDBSCAN parameter trials + scores
- `outputs/llm_council_validation.csv`: Per-cluster council vote evidence
- `outputs/tccm_validation.csv`: Top-100 cited papers with theory/method extraction
- `outputs/compliance_checklist.csv`: Professor requirement compliance
- `outputs/run_metadata.json`: Embedding model + selected parameters
- `outputs/combined_labels.json`: Full cluster data with keywords and titles

## Run Locally

```bash
pip install -r requirements.txt
python app.py
```

Open the Gradio URL, upload your Scopus CSV, and click **▶ Run Complete Pipeline**.

For command-line generation (no UI):

```bash
python run_pipeline.py path/to/scopus.csv
```

## LLM Council Setup

Set `MISTRAL_API_KEY` as a Space secret (or in a local `.env` file) to activate live 3-LLM council labelling. The app runs fully without it using deterministic fallback.
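
The choice between live council labelling and the deterministic fallback depends only on whether `MISTRAL_API_KEY` is set. A minimal sketch of such a switch follows; the function name `council_mode` and the return values `"live"` and `"fallback"` are hypothetical and not taken from the app. It reads only the process environment, so loading a local `.env` file would need to happen beforehand.

```python
import os


def council_mode(env=None):
    """Return "live" when a non-empty MISTRAL_API_KEY is present, else "fallback"."""
    # Accept an explicit mapping so the check can be exercised without touching os.environ.
    env = os.environ if env is None else env
    key = env.get("MISTRAL_API_KEY", "").strip()
    return "live" if key else "fallback"


print(council_mode({"MISTRAL_API_KEY": "example-key"}))  # live
print(council_mode({}))  # fallback
```

Treating a blank or whitespace-only value as unset is a design choice in this sketch; it avoids attempting live calls with an empty secret.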