---
title: EIS Topic Intelligence
sdk: gradio
sdk_version: "5.25.2"
app_file: app.py
pinned: true
license: mit
short_description: EIS topic modelling with LLM council validation
---

# EIS Topic Intelligence

SPJIMR Research Analytics: a topic modelling pipeline for the **Enterprise Information Systems** journal corpus.

## What It Does

- Loads a Scopus-exported CSV (at minimum, it needs `Title` and `Abstract` columns).
- Builds paper-level embeddings from `Title + Abstract` using the **SPECTER2** transformer model; falls back to TF-IDF + SVD if transformers are unavailable.
- Runs **UMAP + HDBSCAN** parameter optimization targeting 15–25 crisp clusters with 5–100 papers per cluster.
- Falls back to KMeans only if density clustering cannot meet the required range.
- Labels each cluster through a **3-member LLM council**:
  - Three Mistral council personas when `MISTRAL_API_KEY` is configured (live LLM mode).
  - Deterministic keyword/PAJAIS/local semantic fallback when no key is set; the app still runs end to end.
- Maps clusters to the **25 PAJAIS IS-research categories**.
- Exports TCCM/computational-technique validation for the 100 most-cited papers.
- Provides a **Compliance tab** showing PASS / CONFIG_REQUIRED / INPUT_REQUIRED / MANUAL_REQUIRED for each requirement.

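The cluster targets above (15–25 clusters, 5–100 papers each) can be expressed as a simple acceptance check on a label vector. The sketch below is illustrative only and is not taken from the app's code: the function name `meets_cluster_targets` is hypothetical, and it assumes that noise points carry the label `-1` (the HDBSCAN convention) and are not counted as a cluster.

```python
from collections import Counter


def meets_cluster_targets(labels, min_clusters=15, max_clusters=25,
                          min_size=5, max_size=100):
    """Return True if the labelling meets the cluster-count and size targets.

    Hypothetical helper: the defaults mirror the ranges stated in this README.
    """
    # Assumption: -1 marks noise (HDBSCAN convention) and is not a cluster.
    sizes = Counter(label for label in labels if label != -1)
    if not (min_clusters <= len(sizes) <= max_clusters):
        return False
    # Every remaining cluster must fall inside the allowed size range.
    return all(min_size <= n <= max_size for n in sizes.values())


# 16 clusters of 10 papers each, plus 3 noise points
example_labels = [c for c in range(16) for _ in range(10)] + [-1, -1, -1]
print(meets_cluster_targets(example_labels))
```

A check of this kind could also serve as the trigger for the KMeans fallback: if no UMAP/HDBSCAN trial passes, the density-based result is rejected.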
## Main Deliverables

- `outputs/comparison.csv`: All clusters with labels, PAJAIS category, confidence, agreement
- `outputs/taxonomy_map.json`: PAJAIS taxonomy mapping + gap analysis
- `outputs/topic_model_report.md`: Full markdown report
- `outputs/narrative.txt`: Narrative summary
- `outputs/cluster_optimization_log.csv`: All UMAP/HDBSCAN parameter trials + scores
- `outputs/llm_council_validation.csv`: Per-cluster council vote evidence
- `outputs/tccm_validation.csv`: Top-100 cited papers with theory/method extraction
- `outputs/compliance_checklist.csv`: Professor requirement compliance
- `outputs/run_metadata.json`: Embedding model + selected parameters
- `outputs/combined_labels.json`: Full cluster data with keywords and titles

## Run Locally

```bash
pip install -r requirements.txt
python app.py
```

Open the Gradio URL, upload your Scopus CSV, then click **▶ Run Complete Pipeline**.

For command-line generation (no UI):

```bash
python run_pipeline.py path/to/scopus.csv
```

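Before running either entry point, it can help to confirm that the export contains the two required columns. The following is a minimal pre-flight sketch using only the standard library; `missing_columns` and `REQUIRED_COLUMNS` are hypothetical names, not part of `app.py` or `run_pipeline.py`.

```python
import csv
import io

# The two columns this README lists as the minimum input.
REQUIRED_COLUMNS = {"Title", "Abstract"}


def missing_columns(csv_file):
    """Return the required column names absent from the CSV header, sorted."""
    reader = csv.DictReader(csv_file)
    # fieldnames is None for an empty file, so fall back to an empty header.
    header = set(reader.fieldnames or [])
    return sorted(REQUIRED_COLUMNS - header)


# In-memory sample standing in for a real export
sample = io.StringIO("Title,Abstract,Year\nA paper,Some text,2020\n")
print(missing_columns(sample))
```

For a file on disk, pass `open(path, newline="", encoding="utf-8-sig")`; the `utf-8-sig` codec strips a byte-order mark if the export includes one.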
## LLM Council Setup

Set `MISTRAL_API_KEY` as a Space secret (or in a local `.env` file) to activate live 3-LLM council labelling. Without the key, the app still runs end to end using the deterministic fallback.
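
The choice between live and fallback labelling comes down to whether the key is present in the environment. The sketch below illustrates that decision; `council_mode` and its return values are assumptions for illustration, and the app's actual logic may differ.

```python
import os


def council_mode(env=None):
    """Return "live" when MISTRAL_API_KEY is set and non-empty, else "fallback".

    Hypothetical helper: only the variable name comes from this README.
    """
    env = os.environ if env is None else env
    # Treat a blank or whitespace-only value the same as a missing key.
    key = env.get("MISTRAL_API_KEY", "").strip()
    return "live" if key else "fallback"


print(council_mode({}))
print(council_mode({"MISTRAL_API_KEY": "example-key"}))
```

Passing the environment as an argument keeps the check easy to test without touching real secrets.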