insightpilot / README.md
Himanshu Gangwar
Rename app.py to gradio_app.py to avoid naming conflict with backend
a70c3c3

A newer version of the Gradio SDK is available: 6.2.0

Upgrade
metadata
title: InsightPilot - Autonomous Analytics Agent
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.8.0
app_file: gradio_app.py
pinned: false
license: mit

InsightPilot – Autonomous Analytics Agent

InsightPilot is a production-style AI analyst that turns natural-language business questions into validated SQL, visualizations, insights, and executive-ready PDF reports. The stack combines LangGraph (deterministic agent workflow), FastAPI, SQLAlchemy, ReportLab, and a modern React dashboard.

Project Highlights

  • Agentic LangGraph pipeline – deterministic tool-calling chain (intent → schema → NL2SQL → execution → diagnostics → visualization → PDF) keeps results auditable.
  • Advanced analytics automation – dedicated trend and anomaly modules mine every result set and stream context-aware summaries to the UI and PDF.
  • ReportLab title-page exports – branded title page, SQL appendix, visuals, and analytics sections ready for exec briefings.
  • Draw.io workflow assets – editable diagram (docs/agent-workflow.drawio) plus lightweight SVG for README embeds.
  • Reusable data ingestion – dataset catalog + multi-table CSV upload lets you retarget the agent to any warehouse quickly.
  • Groq Llama‑3 tooling – low-latency NL→SQL and narrative insight generation.

Workflow Diagram (Draw.io)

Agent Workflow

Edit the source diagram at docs/agent-workflow.drawio in Draw.io/Lucidchart and export a fresh SVG/PNG if the workflow changes.

Advanced Analytics Modules

Module What it does Output surfaces
Trend detection Converts the query result into a monthly time series, fits a regression, and quantifies direction, slope, and % change. Chat insights, Latest Analysis cards, PDF “Trend Diagnostics”.
Anomaly detection Computes z-scores across the same period and flags statistically significant outliers. Latest Analysis cards, PDF “Anomaly Highlights”.

The insight LLM receives both diagnostics, so follow-up prompts remain grounded in prior conclusions.

Architecture

Layer Tech Notes
API & Orchestration FastAPI + LangGraph Defines graph nodes for schema, NL2SQL, execution, viz, insight, and report building.
LLM Groq llama3-70b-8192 Deterministic, low-latency NL2SQL + insight generation.
Data SQLite via SQLAlchemy Auto-generated sales dataset for local testing.
Visualization & Reports Matplotlib, ReportLab/Platypus Charts saved to backend/static, referenced in UI/PDF.
Frontend React + Vite Modern dashboard with prompt chips, chat, result stack, PDF download.

Prerequisites

  • Python 3.9+
  • Node.js 16+
  • Groq API Key (for NL2SQL + insight generation)

Backend Setup

cd backend
python -m venv venv
source venv/bin/activate        # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env            # add GROQ_API_KEY and optional DATABASE_URL
python create_db.py             # optional; startup also seeds the sales table
python -m app.main              # runs on http://localhost:8000

DATABASE_URL defaults to sqlite:///./test.db. Ensure the .env file includes your GROQ_API_KEY for Groq access.

Frontend Setup

cd frontend
npm install
npm run dev                     # http://localhost:5173

Usage Flow

  1. Start backend and frontend dev servers.

  2. Upload CSVs via the “Dataset Control” card (or call POST /api/upload-csv with file + table_name). The agent now validates table names, swaps in new tables on demand, and refreshes the dataset catalog for quick reuse.

  3. Open the React app, pick a quick prompt (e.g., “What were total sales by category?”) or type your own.

  4. InsightPilot:

    • Reads DB schema & crafts SQL via Groq Llama‑3.
    • Executes SQL with SQLAlchemy/Pandas.
    • Generates Matplotlib charts and Groq insight summaries.
    • Streams insights to the chat, shows SQL/data preview, and produces a PDF.
  5. Download the PDF to share with stakeholders.

Dataset Upload & Multi-table Ingestion

  • Upload endpoint: POST /api/upload-csv accepts file + table_name. Table names are validated ([A-Za-z_][A-Za-z0-9_]*) to keep SQL safe, and uploads replace/create tables atomically.
  • Catalog endpoint: GET /api/datasets returns discovered tables along with row counts and column names so the frontend can surface a live catalog.
  • Frontend controls: the Dataset Control card now includes a target-table input, catalog refresh button, and a list of available tables—making it trivial to pivot the agent to another fact table without redeploying anything.

Project Structure

backend/
  app/
    agents/graph.py          # LangGraph workflow
    api/routes.py            # FastAPI routes
    core/config.py           # Settings + env
    db/database.py           # Engine + auto seed
    services/analytics.py    # Trend & anomaly detection modules
    services/pdf_generator.py# ReportLab report builder
    services/csv_loader.py   # CSV ingestion + dataset catalog helpers
  static/                    # Generated charts & PDFs
  create_db.py               # Manual seed script
  requirements.txt

frontend/
  src/App.jsx, App.css       # React dashboard
  package.json

Future Ideas

  • Intent-specific LangGraph branches (comparison vs. forecasting).
  • Supabase/Postgres adapters with connection pooling.
  • Auth + team workspaces for insights and PDF history.

Enjoy exploring data with InsightPilot! 🚀