Daksh C Jain
fix: shorten HF Space short_description to under 60 chars
bb50b0a
---
title: EIS Topic Intelligence
sdk: gradio
sdk_version: "5.25.2"
app_file: app.py
pinned: true
license: mit
short_description: EIS topic modelling with LLM council validation
---
# EIS Topic Intelligence
SPJIMR Research Analytics: a topic modelling pipeline for the **Enterprise Information Systems** journal corpus.
## What It Does
- Loads a Scopus-exported CSV (needs `Title` and `Abstract` columns minimum).
- Builds paper-level embeddings from `Title + Abstract` using the **SPECTER2** transformer model, falling back to TF-IDF + SVD if the transformer model is unavailable.
- Runs **UMAP + HDBSCAN** parameter optimization targeting 15–25 crisp clusters with 5–100 papers per cluster.
- Falls back to KMeans only if density clustering cannot produce a solution within these ranges.
- Labels each cluster through a **3-member LLM council**:
  - Three Mistral council personas when `MISTRAL_API_KEY` is configured (live LLM mode).
  - A deterministic keyword/PAJAIS/local semantic fallback when no key is set; the app still runs end to end.
- Maps clusters to the **25 PAJAIS IS-research categories**.
- Exports TCCM/computational-technique validation for the 100 most-cited papers.
- Provides a **Compliance tab** showing PASS / CONFIG_REQUIRED / INPUT_REQUIRED / MANUAL_REQUIRED for each requirement.
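The loading step can be pictured with a minimal sketch, assuming a comma-separated export with `Title` and `Abstract` header cells, of how rows might be validated and joined into the `Title + Abstract` text that gets embedded. The `load_documents` name, the `". "` separator, and skipping rows with an empty abstract are hypothetical choices, not taken from `app.py`.

```python
import csv
import io
from typing import List, TextIO

REQUIRED_COLUMNS = ("Title", "Abstract")  # minimum columns named in this README


def load_documents(handle: TextIO) -> List[str]:
    """Hypothetical loader: check required columns, build 'Title + Abstract' strings."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"CSV is missing required column(s): {', '.join(missing)}")

    documents = []
    for row in reader:
        title = (row.get("Title") or "").strip()
        abstract = (row.get("Abstract") or "").strip()
        # Assumption: rows without an abstract are skipped, not embedded title-only.
        if not abstract:
            continue
        documents.append(f"{title}. {abstract}")
    return documents


if __name__ == "__main__":
    sample = io.StringIO(
        "Title,Abstract,Year\n"
        "ERP adoption,A study of ERP rollouts.,2020\n"
        "Empty row,,2021\n"
    )
    print(load_documents(sample))  # prints ['ERP adoption. A study of ERP rollouts.']
```

When reading a real export from disk, opening it with `encoding="utf-8-sig"` strips a leading byte-order mark if one is present, so the first header name still matches.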
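The cluster-range rule can be sketched as a filter over parameter trials, assuming each UMAP + HDBSCAN trial has already been summarised as a list of cluster sizes (noise excluded) plus a single quality score where higher is better. The `Trial` type, the `score` field, and picking the highest-scoring valid trial are illustrative assumptions; only the 15–25 cluster bound, the 5–100 papers-per-cluster bound, and the KMeans fallback come from this README.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

MIN_CLUSTERS, MAX_CLUSTERS = 15, 25  # target cluster count from the README
MIN_SIZE, MAX_SIZE = 5, 100          # allowed papers per cluster


@dataclass
class Trial:
    """Hypothetical summary of one UMAP + HDBSCAN parameter trial."""
    params: dict
    cluster_sizes: List[int]  # papers per cluster, noise points excluded
    score: float              # assumed quality metric, higher is better


def meets_constraints(trial: Trial) -> bool:
    """True if the trial has an allowed number of clusters, each of allowed size."""
    if not MIN_CLUSTERS <= len(trial.cluster_sizes) <= MAX_CLUSTERS:
        return False
    return all(MIN_SIZE <= size <= MAX_SIZE for size in trial.cluster_sizes)


def select_trial(trials: Sequence[Trial]) -> Optional[Trial]:
    """Return the best valid trial, or None to signal the KMeans fallback."""
    valid = [t for t in trials if meets_constraints(t)]
    return max(valid, key=lambda t: t.score) if valid else None


if __name__ == "__main__":
    trials = [
        Trial({"min_cluster_size": 10}, [40] * 12, 0.61),        # too few clusters
        Trial({"min_cluster_size": 8}, [30] * 18, 0.55),         # valid
        Trial({"min_cluster_size": 6}, [25] * 20 + [3], 0.58),   # one cluster under 5
    ]
    best = select_trial(trials)
    print(best.params if best else "fall back to KMeans")  # prints {'min_cluster_size': 8}
```

Treating the ranges as hard constraints and the score only as a tie-breaker among valid trials keeps a high-scoring but out-of-range solution from being chosen.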
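One plausible way to reduce the three council votes for a cluster to a single label with agreement and confidence figures is a simple majority vote, sketched below. The `Vote` record, the case-insensitive label matching, the tie-break on summed confidence, and the definitions of agreement (share of members backing the winner) and confidence (their mean confidence) are all assumptions; this README does not say how `app.py` aggregates votes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Vote:
    """Hypothetical record of one council member's label for a cluster."""
    member: str
    label: str
    confidence: float  # assumed to lie in [0, 1]


def aggregate_votes(votes: Sequence[Vote]) -> Tuple[str, float, float]:
    """Return (label, agreement, confidence) for one cluster.

    agreement  = share of members that chose the winning label
    confidence = mean confidence of those members
    Ties on vote count are broken by summed confidence.
    """
    if not votes:
        raise ValueError("no votes to aggregate")

    # Group votes by normalised label text so "ERP" and "erp" count together.
    backers: Dict[str, List[Vote]] = defaultdict(list)
    for vote in votes:
        backers[vote.label.strip().lower()].append(vote)

    _, support = max(
        backers.items(),
        key=lambda item: (len(item[1]), sum(v.confidence for v in item[1])),
    )
    agreement = len(support) / len(votes)
    confidence = sum(v.confidence for v in support) / len(support)
    # Report the label as written by the first member who backed it.
    return support[0].label, agreement, confidence


if __name__ == "__main__":
    votes = [
        Vote("persona_1", "ERP Implementation", 0.9),
        Vote("persona_2", "erp implementation", 0.7),
        Vote("persona_3", "Supply Chain Integration", 0.8),
    ]
    print(aggregate_votes(votes))
```

With three members, agreement can only be 1/3, 2/3, or 1, which makes a three-way split easy to flag for manual review.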
## Main Deliverables
- `outputs/comparison.csv`: All clusters with label, PAJAIS category, confidence, and agreement
- `outputs/taxonomy_map.json`: PAJAIS taxonomy mapping and gap analysis
- `outputs/topic_model_report.md`: Full Markdown report
- `outputs/narrative.txt`: Narrative summary
- `outputs/cluster_optimization_log.csv`: Every UMAP/HDBSCAN parameter trial with its scores
- `outputs/llm_council_validation.csv`: Per-cluster council vote evidence
- `outputs/tccm_validation.csv`: The 100 most-cited papers with extracted theories and methods
- `outputs/compliance_checklist.csv`: Compliance status for each professor requirement
- `outputs/run_metadata.json`: Embedding model and selected parameters
- `outputs/combined_labels.json`: Full cluster data with keywords and titles
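After a run, one quick way to confirm that every deliverable was written is to compare the `outputs/` directory against the file list above. The `missing_outputs` helper below is hypothetical and not part of the repository; it only checks that each file exists and is non-empty, not that its contents are valid.

```python
from pathlib import Path
from typing import List

# File names as listed under "Main Deliverables".
EXPECTED_OUTPUTS = [
    "comparison.csv",
    "taxonomy_map.json",
    "topic_model_report.md",
    "narrative.txt",
    "cluster_optimization_log.csv",
    "llm_council_validation.csv",
    "tccm_validation.csv",
    "compliance_checklist.csv",
    "run_metadata.json",
    "combined_labels.json",
]


def missing_outputs(output_dir: str = "outputs") -> List[str]:
    """Return expected deliverables that are absent or empty in output_dir."""
    base = Path(output_dir)
    return [
        name
        for name in EXPECTED_OUTPUTS
        # is_file() is checked first, so stat() only runs on files that exist.
        if not (base / name).is_file() or (base / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    missing = missing_outputs()
    print("All deliverables present." if not missing else f"Missing: {missing}")
```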
## Run Locally
```bash
pip install -r requirements.txt
python app.py
```
Open the Gradio URL, upload your Scopus CSV, and click **▶ Run Complete Pipeline**.
For command-line generation (no UI):
```bash
python run_pipeline.py path/to/scopus.csv
```
## LLM Council Setup
Set `MISTRAL_API_KEY` as a Space secret (or in a local `.env` file) to enable live labelling by the three-member LLM council. Without it, the app still runs end to end using the deterministic fallback.
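The switch between live and fallback labelling could look roughly like the sketch below. It reads `MISTRAL_API_KEY` from the process environment, which is how Space secrets are exposed, and otherwise from a local `.env` file. The minimal `.env` parser and the `council_mode` and `read_dotenv_value` names are illustrative assumptions; the app itself may use a library such as `python-dotenv` instead.

```python
import os
from pathlib import Path
from typing import Optional


def read_dotenv_value(key: str, path: str = ".env") -> Optional[str]:
    """Minimal KEY=VALUE reader; ignores blank lines and # comments."""
    env_file = Path(path)
    if not env_file.is_file():
        return None
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.strip() == key:
            # Drop surrounding whitespace and optional single or double quotes.
            return value.strip().strip('"').strip("'")
    return None


def council_mode(env_path: str = ".env") -> str:
    """Return 'live' if a Mistral key is configured, else 'fallback'."""
    key = os.environ.get("MISTRAL_API_KEY") or read_dotenv_value("MISTRAL_API_KEY", env_path)
    return "live" if key and key.strip() else "fallback"


if __name__ == "__main__":
    print(f"LLM council mode: {council_mode()}")
```

Checking the environment first means a Space secret always takes precedence over any `.env` file that happens to be committed alongside the app.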