# System Architecture & Design Document ## Overview The Autonomous Multi-Agent Data Analytics System is a production-grade, state-graph-based AI agent designed to act as an expert data analyst. It converts natural language queries into safe, executable database operations, retrieves exact datasets, visualizes them, and provides human-readable insights. This document details the architectural decisions, the system workflow, the tools chosen, and the logic underpinning all operations. --- ## 🛠️ Technology Stack & "Why" We Chose It | Technology | Role | Why We Chose It | |---|---|---| | **LangGraph** | Agent Orchestrator | Unlike linear pipelines (like standard LangChain chains), LangGraph allows for cyclic execution. This is critical for our **Self-Correction Loop** where the agent writes code, hits a syntax error, routes backward, and rewrites the code. | | **FastAPI** | Backend API Framework | Provides highly concurrent, asynchronous endpoint handling which is essential for our parallel LLM node execution. Automatic OpenAPI documentation (Swagger) is a plus. | | **React + Vite + Tailwind** | Frontend Client | Vite offers extremely fast HMR (Hot Module Replacement) during development. Tailwind CSS allows for rapid, premium UI component styling, and React's state management perfectly handles the chat streams. | | **Groq (LLaMA-3.1-70B & 8B)** | LLM Provider | Groq's LPU architecture delivers staggering inference speeds (often >300 tokens/sec). The 70B model handles complex SQL/Pandas generation, while the 8B model is used for extremely fast sub-tasks like intent routing and insight synthesis. | | **Neon PostgreSQL** | Metadata & Schema Vector DB | Neon offers serverless Postgres with the `pgvector` extension natively installed. We use it to perform cosine similarity searches over compressed schema tables. | | **Upstash Redis** | High-speed Caching | Used to cache expensive LLM execution results. If an exact generated SQL query was executed recently, Redis serves the result in <50ms. | --- ## 🔄 System Workflow & Node Logic The core logic is orchestrated as a **State Machine** where a global `AgentState` is passed from node to node. ### 1. Intent Router (`intent_router.py`) - **Logic:** The first node of the graph. It intercepts the natural language query and categorizes it using the fast 8B model into one of three intents: `sql`, `pandas`, or `insight_only`. - **Implementation Detail:** Contains a hardcoded safety override. If a user asks for a "chart" or "visualization", but the LLM tries to classify it as `insight_only` (which would bypass data extraction), the router forces it back into `sql` mode. ### 2. Query Planner & Schema RAG (`query_planner.py` & `graph.py`) - **Logic:** Instead of dumping a massive database schema into the LLM prompt (which causes context bloat and hallucination), we perform **Retrieval-Augmented Generation (RAG) on the schema**. - **Implementation:** The user's query is converted to a vector embedding. We query the `schema_embeddings` table in our Neon DB to fetch the top 15 most relevant tables. This strict filtering guarantees the AI focuses only on the necessary schema structures. - **Memory Retrieval:** Simultaneously, we query past successful conversational queries and their results. If a user says "filter that by last month", the memory retrieval injects the past SQL query into the context. ### 3. Code Generation (`sql_generator.py` / `pandas_generator.py`) - **Logic:** Uses the heavyweight `LLaMA-3.1-70B-versatile` model to generate raw syntax. - **Implementation:** It is strictly prompted to output *only* executable code without markdown fences or explanations to prevent parsing errors down the line. ### 4. AST Safety Validator (`safety_validator.py`) - **Logic:** Security is paramount when an AI writes database queries. This pure-Python node verifies the generated code without making any network calls. - **Implementation:** - For SQL, it uses regex/parsing to block destructive commands (`DROP`, `DELETE`, `UPDATE`, `ALTER`, etc.). - For Pandas, it parses the Python Abstract Syntax Tree (AST) using Python's built-in `ast` module to ensure the code does not attempt to import `os`, `sys`, or access the local filesystem. ### 5. Execution Engine (`executor.py`) - **Logic:** Executes the validated code against the active data connector (Postgres, SQLite, or CSV). - **Implementation:** - **Caching:** Before executing, it hashes the generated SQL string and checks Upstash Redis. If a cache hit occurs, it returns the result in milliseconds. - **Execution Guardrails:** Hard-caps the returned rows to a maximum of 500 to prevent massive memory payloads from crashing the FastAPI server. ### 6. Error Classification & Self-Correction (`error_classifier.py` & `self_corrector.py`) - **Logic:** If execution fails (e.g., column not found, type mismatch), the graph does not just fail. It routes to the Error Classifier. - **Implementation:** The classifier reads the Postgres/Python traceback and maps it to a predefined error class (e.g., `nonexistent_table`, `syntax`). The Self-Corrector node takes this specific hint alongside the original failed code, queries the LLM to fix it, and re-routes the state back to the Validation and Execution nodes (up to 3 maximum attempts). ### 7. Concurrent Output Pipeline (`graph.py: _output_pipeline`) - **Logic:** Once data is successfully retrieved, three independent tasks must be performed: Anomaly Detection, Chart Visualization, and Natural Language Insight Synthesis. - **Implementation:** To minimize latency, these three nodes are submitted to a `ThreadPoolExecutor` and run **concurrently**. The system's final latency for this block is bounded entirely by the slowest node (the LLM Insight Synthesizer), reducing sequential execution time drastically. --- ## 🧩 Key Architecture Implementations ### Dynamic AI Chat Suggestions Instead of providing users with static "placeholder" questions (e.g., "What are my top products?"), the system uses a dedicated `/api/query/suggest` endpoint. When a user connects a database, the backend fetches the top 10 tables of their schema and asks the LLM to generate 3-4 custom, highly relevant analytical questions. These are then streamed to the React frontend, allowing the user to begin data analysis in one click without knowing the database layout. ### Abstracted Connector System To support various data sources cleanly, we implemented a `BaseConnector` abstract class (`connectors/base.py`). `PostgresConnector`, `CSVConnector`, and `SQLiteConnector` all inherit from this base class and must implement `get_schema()`, `execute_sql()`, and `load_dataframe()`. This factory pattern allows us to trivially add new data sources (like BigQuery or Snowflake) in the future without changing the core LangGraph agent logic. ### Statelessness for Deployment The application is designed to be completely stateless to support horizontal scaling on platforms like Hugging Face Spaces or Railway. - All session history is managed via Session IDs. - Caching is offloaded to Upstash Redis. - Long-term memory is persisted in the Neon PostgreSQL metadata database. If the FastAPI pod restarts, no conversational context or data analysis progress is permanently lost.