insightpilot / README.md
Himanshu Gangwar
Rename app.py to gradio_app.py to avoid naming conflict with backend
a70c3c3
---
title: InsightPilot - Autonomous Analytics Agent
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.8.0
app_file: gradio_app.py
pinned: false
license: mit
---
# InsightPilot – Autonomous Analytics Agent
InsightPilot is a production-style AI analyst that turns natural-language business questions into validated SQL, visualizations, insights, and executive-ready PDF reports. The stack combines LangGraph (deterministic agent workflow), FastAPI, SQLAlchemy, ReportLab, and a modern React dashboard.
## Project Highlights
- **Agentic LangGraph pipeline** – deterministic tool-calling chain (intent → schema → NL2SQL → execution → diagnostics → visualization → PDF) keeps results auditable.
- **Advanced analytics automation** – dedicated trend and anomaly modules mine every result set and stream context-aware summaries to the UI and PDF.
- **ReportLab title-page exports** – branded title page, SQL appendix, visuals, and analytics sections ready for exec briefings.
- **Draw.io workflow assets** – editable diagram (`docs/agent-workflow.drawio`) plus lightweight SVG for README embeds.
- **Reusable data ingestion** – dataset catalog + multi-table CSV upload lets you retarget the agent to any warehouse quickly.
- **Groq Llama‑3 tooling** – low-latency NL→SQL and narrative insight generation.
## Workflow Diagram (Draw.io)
![Agent Workflow](docs/agent-workflow.svg)
> Edit the source diagram at `docs/agent-workflow.drawio` in Draw.io/Lucidchart and export a fresh SVG/PNG if the workflow changes.
## Advanced Analytics Modules
| Module | What it does | Output surfaces |
| --- | --- | --- |
| **Trend detection** | Converts the query result into a monthly time series, fits a regression, and quantifies direction, slope, and % change. | Chat insights, Latest Analysis cards, PDF “Trend Diagnostics”. |
| **Anomaly detection** | Computes z-scores across the same period and flags statistically significant outliers. | Latest Analysis cards, PDF “Anomaly Highlights”. |
The insight LLM receives both diagnostics, so follow-up prompts remain grounded in prior conclusions.
## Architecture
| Layer | Tech | Notes |
| --- | --- | --- |
| API & Orchestration | FastAPI + LangGraph | Defines graph nodes for schema, NL2SQL, execution, viz, insight, and report building. |
| LLM | Groq `llama3-70b-8192` | Deterministic, low-latency NL2SQL + insight generation. |
| Data | SQLite via SQLAlchemy | Auto-generated `sales` dataset for local testing. |
| Visualization & Reports | Matplotlib, ReportLab/Platypus | Charts saved to `backend/static`, referenced in UI/PDF. |
| Frontend | React + Vite | Modern dashboard with prompt chips, chat, result stack, PDF download. |
## Prerequisites
- Python 3.9+
- Node.js 16+
- Groq API Key (for NL2SQL + insight generation)
## Backend Setup
```bash
cd backend
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env # add GROQ_API_KEY and optional DATABASE_URL
python create_db.py # optional; startup also seeds the sales table
python -m app.main # runs on http://localhost:8000
```
`DATABASE_URL` defaults to `sqlite:///./test.db`. Ensure the `.env` file includes your `GROQ_API_KEY` for Groq access.
## Frontend Setup
```bash
cd frontend
npm install
npm run dev # http://localhost:5173
```
## Usage Flow
1. Start backend and frontend dev servers.
1. Upload CSVs via the “Dataset Control” card (or call `POST /api/upload-csv` with `file` + `table_name`). The agent now validates table names, swaps in new tables on demand, and refreshes the dataset catalog for quick reuse.
1. Open the React app, pick a quick prompt (e.g., “What were total sales by category?”) or type your own.
1. InsightPilot:
- Reads DB schema & crafts SQL via Groq Llama‑3.
- Executes SQL with SQLAlchemy/Pandas.
- Generates Matplotlib charts and Groq insight summaries.
- Streams insights to the chat, shows SQL/data preview, and produces a PDF.
1. Download the PDF to share with stakeholders.
## Dataset Upload & Multi-table Ingestion
- **Upload endpoint:** `POST /api/upload-csv` accepts `file` + `table_name`. Table names are validated (`[A-Za-z_][A-Za-z0-9_]*`) to keep SQL safe, and uploads replace/create tables atomically.
- **Catalog endpoint:** `GET /api/datasets` returns discovered tables along with row counts and column names so the frontend can surface a live catalog.
- **Frontend controls:** the Dataset Control card now includes a target-table input, catalog refresh button, and a list of available tables—making it trivial to pivot the agent to another fact table without redeploying anything.
## Project Structure
```text
backend/
app/
agents/graph.py # LangGraph workflow
api/routes.py # FastAPI routes
core/config.py # Settings + env
db/database.py # Engine + auto seed
services/analytics.py # Trend & anomaly detection modules
services/pdf_generator.py# ReportLab report builder
services/csv_loader.py # CSV ingestion + dataset catalog helpers
static/ # Generated charts & PDFs
create_db.py # Manual seed script
requirements.txt
frontend/
src/App.jsx, App.css # React dashboard
package.json
```
## Future Ideas
- Intent-specific LangGraph branches (comparison vs. forecasting).
- Supabase/Postgres adapters with connection pooling.
- Auth + team workspaces for insights and PDF history.
Enjoy exploring data with InsightPilot! 🚀