# System Architecture & Design Document

## Overview
The Autonomous Multi-Agent Data Analytics System is a production-grade, state-graph-based AI agent designed to act as an expert data analyst. It converts natural language queries into safe, executable database operations, retrieves exact datasets, visualizes them, and provides human-readable insights.

This document details the architectural decisions, the system workflow, the tools chosen, and the logic underpinning all operations.

---

## 🛠️ Technology Stack & "Why" We Chose It

| Technology | Role | Why We Chose It |
|---|---|---|
| **LangGraph** | Agent Orchestrator | Unlike linear pipelines (like standard LangChain chains), LangGraph allows for cyclic execution. This is critical for our **Self-Correction Loop** where the agent writes code, hits a syntax error, routes backward, and rewrites the code. |
| **FastAPI** | Backend API Framework | Provides highly concurrent, asynchronous endpoint handling which is essential for our parallel LLM node execution. Automatic OpenAPI documentation (Swagger) is a plus. |
| **React + Vite + Tailwind** | Frontend Client | Vite offers extremely fast HMR (Hot Module Replacement) during development. Tailwind CSS allows for rapid, premium UI component styling, and React's state management perfectly handles the chat streams. |
| **Groq (LLaMA-3.1-70B & 8B)** | LLM Provider | Groq's LPU architecture delivers staggering inference speeds (often >300 tokens/sec). The 70B model handles complex SQL/Pandas generation, while the 8B model is used for extremely fast sub-tasks like intent routing and insight synthesis. |
| **Neon PostgreSQL** | Metadata & Schema Vector DB | Neon offers serverless Postgres with the `pgvector` extension natively installed. We use it to perform cosine similarity searches over compressed schema tables. |
| **Upstash Redis** | High-speed Caching | Used to cache expensive LLM execution results. If an exact generated SQL query was executed recently, Redis serves the result in <50ms. |

---

## 🔄 System Workflow & Node Logic

The core logic is orchestrated as a **State Machine** where a global `AgentState` is passed from node to node.

### 1. Intent Router (`intent_router.py`)
- **Logic:** The first node of the graph. It intercepts the natural language query and categorizes it using the fast 8B model into one of three intents: `sql`, `pandas`, or `insight_only`.
- **Implementation Detail:** Contains a hardcoded safety override. If a user asks for a "chart" or "visualization", but the LLM tries to classify it as `insight_only` (which would bypass data extraction), the router forces it back into `sql` mode.

### 2. Query Planner & Schema RAG (`query_planner.py` & `graph.py`)
- **Logic:** Instead of dumping a massive database schema into the LLM prompt (which causes context bloat and hallucination), we perform **Retrieval-Augmented Generation (RAG) on the schema**.
- **Implementation:** The user's query is converted to a vector embedding. We query the `schema_embeddings` table in our Neon DB to fetch the top 15 most relevant tables. This strict filtering guarantees the AI focuses only on the necessary schema structures.
- **Memory Retrieval:** Simultaneously, we query past successful conversational queries and their results. If a user says "filter that by last month", the memory retrieval injects the past SQL query into the context.

### 3. Code Generation (`sql_generator.py` / `pandas_generator.py`)
- **Logic:** Uses the heavyweight `LLaMA-3.1-70B-versatile` model to generate raw syntax. 
- **Implementation:** It is strictly prompted to output *only* executable code without markdown fences or explanations to prevent parsing errors down the line.

### 4. AST Safety Validator (`safety_validator.py`)
- **Logic:** Security is paramount when an AI writes database queries. This pure-Python node verifies the generated code without making any network calls.
- **Implementation:** 
  - For SQL, it uses regex/parsing to block destructive commands (`DROP`, `DELETE`, `UPDATE`, `ALTER`, etc.).
  - For Pandas, it parses the Python Abstract Syntax Tree (AST) using Python's built-in `ast` module to ensure the code does not attempt to import `os`, `sys`, or access the local filesystem.

### 5. Execution Engine (`executor.py`)
- **Logic:** Executes the validated code against the active data connector (Postgres, SQLite, or CSV).
- **Implementation:** 
  - **Caching:** Before executing, it hashes the generated SQL string and checks Upstash Redis. If a cache hit occurs, it returns the result in milliseconds.
  - **Execution Guardrails:** Hard-caps the returned rows to a maximum of 500 to prevent massive memory payloads from crashing the FastAPI server.

### 6. Error Classification & Self-Correction (`error_classifier.py` & `self_corrector.py`)
- **Logic:** If execution fails (e.g., column not found, type mismatch), the graph does not just fail. It routes to the Error Classifier.
- **Implementation:** The classifier reads the Postgres/Python traceback and maps it to a predefined error class (e.g., `nonexistent_table`, `syntax`). The Self-Corrector node takes this specific hint alongside the original failed code, queries the LLM to fix it, and re-routes the state back to the Validation and Execution nodes (up to 3 maximum attempts).

### 7. Concurrent Output Pipeline (`graph.py: _output_pipeline`)
- **Logic:** Once data is successfully retrieved, three independent tasks must be performed: Anomaly Detection, Chart Visualization, and Natural Language Insight Synthesis.
- **Implementation:** To minimize latency, these three nodes are submitted to a `ThreadPoolExecutor` and run **concurrently**. The system's final latency for this block is bounded entirely by the slowest node (the LLM Insight Synthesizer), reducing sequential execution time drastically.

---

## 🧩 Key Architecture Implementations

### Dynamic AI Chat Suggestions
Instead of providing users with static "placeholder" questions (e.g., "What are my top products?"), the system uses a dedicated `/api/query/suggest` endpoint. When a user connects a database, the backend fetches the top 10 tables of their schema and asks the LLM to generate 3-4 custom, highly relevant analytical questions. These are then streamed to the React frontend, allowing the user to begin data analysis in one click without knowing the database layout.

### Abstracted Connector System
To support various data sources cleanly, we implemented a `BaseConnector` abstract class (`connectors/base.py`). `PostgresConnector`, `CSVConnector`, and `SQLiteConnector` all inherit from this base class and must implement `get_schema()`, `execute_sql()`, and `load_dataframe()`. This factory pattern allows us to trivially add new data sources (like BigQuery or Snowflake) in the future without changing the core LangGraph agent logic.

### Statelessness for Deployment
The application is designed to be completely stateless to support horizontal scaling on platforms like Hugging Face Spaces or Railway. 
- All session history is managed via Session IDs.
- Caching is offloaded to Upstash Redis.
- Long-term memory is persisted in the Neon PostgreSQL metadata database. 
If the FastAPI pod restarts, no conversational context or data analysis progress is permanently lost.