Daksh C Jain

metadata
title: EIS Topic Intelligence
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: true
license: mit
short_description: EIS topic modelling with LLM council validation

EIS Topic Intelligence

SPJIMR Research Analytics: a topic-modelling pipeline for the Enterprise Information Systems journal corpus.

What It Does

  • Loads a Scopus-exported CSV (at minimum, the Title and Abstract columns are required).
  • Builds paper-level embeddings from Title + Abstract using the SPECTER2 transformer model; falls back to TF-IDF + SVD if the transformer dependencies are unavailable.
  • Searches UMAP + HDBSCAN parameters for a solution with 15–25 crisp clusters of 5–100 papers each.
  • Falls back to KMeans only if density-based clustering cannot produce a solution within those ranges.
  • Labels each cluster through a 3-member LLM council:
    • Three Mistral council personas when MISTRAL_API_KEY is configured (live LLM mode).
    • Deterministic keyword/PAJAIS/local semantic fallback when no key is set; the app still runs end to end.
  • Maps clusters to the 25 PAJAIS IS-research categories.
  • Exports TCCM/computational-technique validation for the 100 most-cited papers.
  • Provides a Compliance tab showing PASS / CONFIG_REQUIRED / INPUT_REQUIRED / MANUAL_REQUIRED for each requirement.
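The cluster-range requirement can be expressed as a filter over parameter trials. The following is a minimal sketch, not the pipeline's actual code: it assumes each UMAP/HDBSCAN trial yields one score (higher is better) and one cluster label per paper, with HDBSCAN's noise label -1 excluded from the counts. `meets_target` and `pick_trial` are hypothetical names, and the real scoring metric is not documented here.

```python
from collections import Counter


def meets_target(labels, min_clusters=15, max_clusters=25, min_size=5, max_size=100):
    """Check the stated target: 15-25 clusters, each with 5-100 papers.

    `labels` holds one cluster id per paper; HDBSCAN marks noise with -1,
    so those papers are excluded from both the count and the sizes.
    """
    sizes = Counter(label for label in labels if label != -1)
    if not min_clusters <= len(sizes) <= max_clusters:
        return False
    return all(min_size <= n <= max_size for n in sizes.values())


def pick_trial(trials):
    """Return the highest-scoring (score, labels) trial that meets the target.

    Returns None when no trial qualifies, which is the point where a
    KMeans fallback would be needed.
    """
    valid = [(score, labels) for score, labels in trials if meets_target(labels)]
    if not valid:
        return None
    return max(valid, key=lambda trial: trial[0])


# Usage: 20 clusters of 10 papers plus 3 noise points passes;
# a 3-cluster solution fails despite its higher score.
good = [c for c in range(20) for _ in range(10)] + [-1, -1, -1]
bad = [c for c in range(3) for _ in range(10)]
best = pick_trial([(0.41, bad), (0.35, good)])
print(best[0])  # 0.35
```

Returning `None` rather than silently running KMeans keeps the fallback decision explicit at the call site.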

Main Deliverables

  • outputs/comparison.csv: All clusters with labels, PAJAIS category, confidence, and agreement
  • outputs/taxonomy_map.json: PAJAIS taxonomy mapping + gap analysis
  • outputs/topic_model_report.md: Full Markdown report
  • outputs/narrative.txt: Narrative summary
  • outputs/cluster_optimization_log.csv: All UMAP/HDBSCAN parameter trials with their scores
  • outputs/llm_council_validation.csv: Per-cluster council vote evidence
  • outputs/tccm_validation.csv: The 100 most-cited papers with theory/method extraction
  • outputs/compliance_checklist.csv: Compliance status for each professor requirement
  • outputs/run_metadata.json: Embedding model and selected parameters
  • outputs/combined_labels.json: Full cluster data with keywords and titles
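The confidence and agreement values summarise how the three council members voted, but their exact definitions are not given here. As an illustration only, assuming simple majority voting, agreement could be the share of members backing the winning label. `council_vote` is a hypothetical helper, not part of the app.

```python
from collections import Counter


def council_vote(votes):
    """Combine one proposed label per council member into (label, agreement).

    Labels are compared case-insensitively (after trimming whitespace) and the
    winner is returned in that normalised lowercase form. `agreement` is the
    fraction of members that chose the winning label; on a tie, the label
    proposed first wins.
    """
    if not votes:
        raise ValueError("the council needs at least one vote")
    counts = Counter(vote.strip().lower() for vote in votes)
    label, n = counts.most_common(1)[0]
    return label, n / len(votes)


label, agreement = council_vote(["Digital Platforms", "digital platforms ", "ERP Implementation"])
print(label, round(agreement, 2))  # digital platforms 0.67
```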

Run Locally

pip install -r requirements.txt
python app.py

Open the Gradio URL, upload your Scopus CSV, and click ▶ Run Complete Pipeline.

To generate the outputs from the command line (no UI):

python run_pipeline.py path/to/scopus.csv
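Both the UI and the command line need the Title and Abstract columns, so a quick pre-check can catch a wrong export before a full run. This is a minimal sketch using only Python's standard library; `parse_papers` and `load_papers` are hypothetical helpers, and the app's own loader may handle more columns or encodings.

```python
import csv

REQUIRED_COLUMNS = ("Title", "Abstract")


def parse_papers(lines):
    """Parse CSV text lines into (title, abstract) pairs.

    Raises ValueError naming any required column that is missing. Rows whose
    title and abstract are both empty are skipped.
    """
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError("missing required column(s): " + ", ".join(missing))
    papers = []
    for row in reader:
        # Short rows yield None for absent fields, hence the `or ""`.
        title = (row["Title"] or "").strip()
        abstract = (row["Abstract"] or "").strip()
        if title or abstract:
            papers.append((title, abstract))
    return papers


def load_papers(path):
    """Read a CSV file from disk and parse it with parse_papers."""
    # newline="" is what the csv module expects; utf-8-sig also accepts
    # files that begin with a UTF-8 byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as f:
        return parse_papers(f)
```

For example, `load_papers("path/to/scopus.csv")` returns the non-empty (title, abstract) pairs, or raises `ValueError` naming the missing columns.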

LLM Council Setup

Set MISTRAL_API_KEY as a Space secret (or in a local .env file) to enable live labelling by the three-persona Mistral council. Without the key, the app still runs end to end using the deterministic fallback.
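How the app reads the key is not specified here. A sketch of the switch between live and fallback mode, assuming the key arrives either as an environment variable (Hugging Face exposes Space secrets to the app as environment variables) or as a `MISTRAL_API_KEY=...` line in `.env`. `read_dotenv` and `council_mode` are hypothetical; a real project would more likely use the python-dotenv package than this simplified parser.

```python
import os


def read_dotenv(path=".env"):
    """Minimal .env reader: KEY=VALUE lines, '#' comments, optional quotes.

    A simplified stand-in for python-dotenv; it does not handle multi-line
    values, `export` prefixes, or variable expansion.
    """
    values = {}
    if not os.path.exists(path):
        return values
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip("\"'")
    return values


def council_mode(environ=None, dotenv_path=".env"):
    """Return "live" if a non-blank MISTRAL_API_KEY is available, else "fallback"."""
    env = dict(os.environ) if environ is None else dict(environ)
    # Real environment variables (e.g. Space secrets) take precedence over .env.
    for key, value in read_dotenv(dotenv_path).items():
        env.setdefault(key, value)
    return "live" if env.get("MISTRAL_API_KEY", "").strip() else "fallback"
```

Treating a blank key as absent means an empty secret falls back cleanly instead of producing failed API calls.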