{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "xIDUHu0Rd6TZ" }, "source": [ "Project 1 \u2014 Banking domain (your \"POC-level\" project)\n", "Intelligent Banking Assistant with RAG + Agentic Routing\n", "Build a multi-agent system where a user can ask banking queries \u2014 loan eligibility, fraud alerts, transaction summaries \u2014 and the system routes them intelligently between a RAG agent (policy documents), a SQL agent (transaction data), and a summarization agent. Deploy end-to-end with Docker + Streamlit on HuggingFace Spaces with CI/CD via GitHub Actions." ] }, { "cell_type": "markdown", "metadata": { "id": "pffZZAN2d7sp" }, "source": [ "Why it stands out: combines RAG + agentic AI + your domain expertise + full deployment. It directly tells the story of \"I know banking AND I know GenAI AND I can ship.\" New tool to explore: LangGraph for multi-agent orchestration \u2014 this is becoming the industry standard and will impress interviewers." ] }, { "cell_type": "markdown", "metadata": { "id": "kQGZvNWOsIJY" }, "source": [ "### Import necessary libraries" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "E86VJ-DPUn_G", "outputId": "47f8531b-d72d-4b44-8e00-055e1de28330" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: langchain in /usr/local/lib/python3.12/dist-packages (1.2.10)\n", "Requirement already satisfied: langchain-community in /usr/local/lib/python3.12/dist-packages (0.4.1)\n", "Requirement already satisfied: langchain-groq in /usr/local/lib/python3.12/dist-packages (1.1.2)\n", "Requirement already satisfied: langgraph in /usr/local/lib/python3.12/dist-packages (1.0.9)\n", "Requirement already satisfied: faiss-cpu in /usr/local/lib/python3.12/dist-packages (1.13.2)\n", "Requirement already satisfied: sentence-transformers in /usr/local/lib/python3.12/dist-packages (5.2.3)\n", "Requirement already satisfied: python-dotenv in /usr/local/lib/python3.12/dist-packages (1.2.1)\n", "Requirement already satisfied: sqlalchemy in /usr/local/lib/python3.12/dist-packages (2.0.47)\n", "Requirement already satisfied: pandas in /usr/local/lib/python3.12/dist-packages (2.2.2)\n", "Requirement already satisfied: groq in /usr/local/lib/python3.12/dist-packages (0.37.1)\n", "Requirement already satisfied: langchain-core<2.0.0,>=1.2.10 in /usr/local/lib/python3.12/dist-packages (from langchain) (1.2.18)\n", "Requirement already satisfied: pydantic<3.0.0,>=2.7.4 in /usr/local/lib/python3.12/dist-packages (from langchain) (2.12.3)\n", "Requirement already satisfied: langchain-classic<2.0.0,>=1.0.0 in /usr/local/lib/python3.12/dist-packages (from langchain-community) (1.0.2)\n", "Requirement already satisfied: requests<3.0.0,>=2.32.5 in /usr/local/lib/python3.12/dist-packages (from langchain-community) (2.32.5)\n", "Requirement already satisfied: PyYAML<7.0.0,>=5.3.0 in /usr/local/lib/python3.12/dist-packages (from langchain-community) (6.0.3)\n", "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /usr/local/lib/python3.12/dist-packages (from langchain-community) (3.13.3)\n", "Requirement already satisfied: tenacity!=8.4.0,<10.0.0,>=8.1.0 in /usr/local/lib/python3.12/dist-packages (from langchain-community) (9.1.4)\n", "Requirement already satisfied: dataclasses-json<0.7.0,>=0.6.7 in /usr/local/lib/python3.12/dist-packages (from langchain-community) (0.6.7)\n", "Requirement already satisfied: pydantic-settings<3.0.0,>=2.10.1 in /usr/local/lib/python3.12/dist-packages (from langchain-community) (2.13.1)\n", "Requirement already satisfied: langsmith<1.0.0,>=0.1.125 in /usr/local/lib/python3.12/dist-packages (from langchain-community) (0.7.7)\n", "Requirement already satisfied: httpx-sse<1.0.0,>=0.4.0 in /usr/local/lib/python3.12/dist-packages (from langchain-community) (0.4.3)\n", "Requirement already satisfied: numpy>=1.26.2 in /usr/local/lib/python3.12/dist-packages (from langchain-community) (2.0.2)\n", "Requirement already satisfied: langgraph-checkpoint<5.0.0,>=2.1.0 in /usr/local/lib/python3.12/dist-packages (from langgraph) (4.0.0)\n", "Requirement already satisfied: langgraph-prebuilt<1.1.0,>=1.0.8 in /usr/local/lib/python3.12/dist-packages (from langgraph) (1.0.8)\n", "Requirement already satisfied: langgraph-sdk<0.4.0,>=0.3.0 in /usr/local/lib/python3.12/dist-packages (from langgraph) (0.3.9)\n", "Requirement already satisfied: xxhash>=3.5.0 in /usr/local/lib/python3.12/dist-packages (from langgraph) (3.6.0)\n", "Requirement already satisfied: packaging in /usr/local/lib/python3.12/dist-packages (from faiss-cpu) (26.0)\n", "Requirement already satisfied: transformers<6.0.0,>=4.41.0 in /usr/local/lib/python3.12/dist-packages (from sentence-transformers) (5.0.0)\n", "Requirement already satisfied: huggingface-hub>=0.20.0 in /usr/local/lib/python3.12/dist-packages (from sentence-transformers) (1.5.0)\n", "Requirement already satisfied: torch>=1.11.0 in /usr/local/lib/python3.12/dist-packages (from sentence-transformers) (2.10.0+cpu)\n", "Requirement already satisfied: scikit-learn in /usr/local/lib/python3.12/dist-packages (from sentence-transformers) (1.6.1)\n", "Requirement already satisfied: scipy in /usr/local/lib/python3.12/dist-packages (from sentence-transformers) (1.16.3)\n", "Requirement already satisfied: typing_extensions>=4.5.0 in /usr/local/lib/python3.12/dist-packages (from sentence-transformers) (4.15.0)\n", "Requirement already satisfied: tqdm in /usr/local/lib/python3.12/dist-packages (from sentence-transformers) (4.67.3)\n", "Requirement already satisfied: greenlet>=1 in /usr/local/lib/python3.12/dist-packages (from sqlalchemy) (3.3.2)\n", "Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.12/dist-packages (from pandas) (2.9.0.post0)\n", "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.12/dist-packages (from pandas) (2025.2)\n", "Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.12/dist-packages (from pandas) (2025.3)\n", "Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.12/dist-packages (from groq) (4.12.1)\n", "Requirement already satisfied: distro<2,>=1.7.0 in /usr/local/lib/python3.12/dist-packages (from groq) (1.9.0)\n", "Requirement already satisfied: httpx<1,>=0.23.0 in /usr/local/lib/python3.12/dist-packages (from groq) (0.28.1)\n", "Requirement already satisfied: sniffio in /usr/local/lib/python3.12/dist-packages (from groq) (1.3.1)\n", "Requirement already satisfied: aiohappyeyeballs>=2.5.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (2.6.1)\n", "Requirement already satisfied: aiosignal>=1.4.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (1.4.0)\n", "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (25.4.0)\n", "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.12/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (1.8.0)\n", "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.12/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (6.7.1)\n", "Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (0.4.1)\n", "Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain-community) (1.22.0)\n", "Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.12/dist-packages (from anyio<5,>=3.5.0->groq) (3.11)\n", "Requirement already satisfied: marshmallow<4.0.0,>=3.18.0 in /usr/local/lib/python3.12/dist-packages (from dataclasses-json<0.7.0,>=0.6.7->langchain-community) (3.26.2)\n", "Requirement already satisfied: typing-inspect<1,>=0.4.0 in /usr/local/lib/python3.12/dist-packages (from dataclasses-json<0.7.0,>=0.6.7->langchain-community) (0.9.0)\n", "Requirement already satisfied: certifi in /usr/local/lib/python3.12/dist-packages (from httpx<1,>=0.23.0->groq) (2026.2.25)\n", "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.12/dist-packages (from httpx<1,>=0.23.0->groq) (1.0.9)\n", "Requirement already satisfied: h11>=0.16 in /usr/local/lib/python3.12/dist-packages (from httpcore==1.*->httpx<1,>=0.23.0->groq) (0.16.0)\n", "Requirement already satisfied: filelock>=3.10.0 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers) (3.24.3)\n", "Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers) (2025.3.0)\n", "Requirement already satisfied: hf-xet<2.0.0,>=1.2.0 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers) (1.3.1)\n", "Requirement already satisfied: typer in /usr/local/lib/python3.12/dist-packages (from huggingface-hub>=0.20.0->sentence-transformers) (0.24.1)\n", "Requirement already satisfied: langchain-text-splitters<2.0.0,>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from langchain-classic<2.0.0,>=1.0.0->langchain-community) (1.1.1)\n", "Requirement already satisfied: jsonpatch<2.0.0,>=1.33.0 in /usr/local/lib/python3.12/dist-packages (from langchain-core<2.0.0,>=1.2.10->langchain) (1.33)\n", "Requirement already satisfied: uuid-utils<1.0,>=0.12.0 in /usr/local/lib/python3.12/dist-packages (from langchain-core<2.0.0,>=1.2.10->langchain) (0.14.1)\n", "Requirement already satisfied: ormsgpack>=1.12.0 in /usr/local/lib/python3.12/dist-packages (from langgraph-checkpoint<5.0.0,>=2.1.0->langgraph) (1.12.2)\n", "Requirement already satisfied: orjson>=3.11.5 in /usr/local/lib/python3.12/dist-packages (from langgraph-sdk<0.4.0,>=0.3.0->langgraph) (3.11.7)\n", "Requirement already satisfied: requests-toolbelt>=1.0.0 in /usr/local/lib/python3.12/dist-packages (from langsmith<1.0.0,>=0.1.125->langchain-community) (1.0.0)\n", "Requirement already satisfied: zstandard>=0.23.0 in /usr/local/lib/python3.12/dist-packages (from langsmith<1.0.0,>=0.1.125->langchain-community) (0.25.0)\n", "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.12/dist-packages (from pydantic<3.0.0,>=2.7.4->langchain) (0.7.0)\n", "Requirement already satisfied: pydantic-core==2.41.4 in /usr/local/lib/python3.12/dist-packages (from pydantic<3.0.0,>=2.7.4->langchain) (2.41.4)\n", "Requirement already satisfied: typing-inspection>=0.4.2 in /usr/local/lib/python3.12/dist-packages (from pydantic<3.0.0,>=2.7.4->langchain) (0.4.2)\n", "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.12/dist-packages (from python-dateutil>=2.8.2->pandas) (1.17.0)\n", "Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests<3.0.0,>=2.32.5->langchain-community) (3.4.4)\n", "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests<3.0.0,>=2.32.5->langchain-community) (2.5.0)\n", "Requirement already satisfied: setuptools in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (75.2.0)\n", "Requirement already satisfied: sympy>=1.13.3 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (1.14.0)\n", "Requirement already satisfied: networkx>=2.5.1 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (3.6.1)\n", "Requirement already satisfied: jinja2 in /usr/local/lib/python3.12/dist-packages (from torch>=1.11.0->sentence-transformers) (3.1.6)\n", "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.12/dist-packages (from transformers<6.0.0,>=4.41.0->sentence-transformers) (2025.11.3)\n", "Requirement already satisfied: tokenizers<=0.23.0,>=0.22.0 in /usr/local/lib/python3.12/dist-packages (from transformers<6.0.0,>=4.41.0->sentence-transformers) (0.22.2)\n", "Requirement already satisfied: typer-slim in /usr/local/lib/python3.12/dist-packages (from transformers<6.0.0,>=4.41.0->sentence-transformers) (0.24.0)\n", "Requirement already satisfied: safetensors>=0.4.3 in /usr/local/lib/python3.12/dist-packages (from transformers<6.0.0,>=4.41.0->sentence-transformers) (0.7.0)\n", "Requirement already satisfied: joblib>=1.2.0 in /usr/local/lib/python3.12/dist-packages (from scikit-learn->sentence-transformers) (1.5.3)\n", "Requirement already satisfied: threadpoolctl>=3.1.0 in /usr/local/lib/python3.12/dist-packages (from scikit-learn->sentence-transformers) (3.6.0)\n", "Requirement already satisfied: jsonpointer>=1.9 in /usr/local/lib/python3.12/dist-packages (from jsonpatch<2.0.0,>=1.33.0->langchain-core<2.0.0,>=1.2.10->langchain) (3.0.0)\n", "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from sympy>=1.13.3->torch>=1.11.0->sentence-transformers) (1.3.0)\n", "Requirement already satisfied: mypy-extensions>=0.3.0 in /usr/local/lib/python3.12/dist-packages (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7.0,>=0.6.7->langchain-community) (1.1.0)\n", "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.12/dist-packages (from jinja2->torch>=1.11.0->sentence-transformers) (3.0.3)\n", "Requirement already satisfied: click>=8.2.1 in /usr/local/lib/python3.12/dist-packages (from typer->huggingface-hub>=0.20.0->sentence-transformers) (8.3.1)\n", "Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.12/dist-packages (from typer->huggingface-hub>=0.20.0->sentence-transformers) (1.5.4)\n", "Requirement already satisfied: rich>=12.3.0 in /usr/local/lib/python3.12/dist-packages (from typer->huggingface-hub>=0.20.0->sentence-transformers) (13.9.4)\n", "Requirement already satisfied: annotated-doc>=0.0.2 in /usr/local/lib/python3.12/dist-packages (from typer->huggingface-hub>=0.20.0->sentence-transformers) (0.0.4)\n", "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.12/dist-packages (from rich>=12.3.0->typer->huggingface-hub>=0.20.0->sentence-transformers) (4.0.0)\n", "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.12/dist-packages (from rich>=12.3.0->typer->huggingface-hub>=0.20.0->sentence-transformers) (2.19.2)\n", "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.12/dist-packages (from markdown-it-py>=2.2.0->rich>=12.3.0->typer->huggingface-hub>=0.20.0->sentence-transformers) (0.1.2)\n" ] } ], "source": [ "!pip install langchain langchain-community langchain-groq langgraph \\\n", " faiss-cpu sentence-transformers python-dotenv \\\n", " sqlalchemy pandas groq\n" ] }, { "cell_type": "markdown", "metadata": { "id": "0wGMBwEVsOFm" }, "source": [ "### Connect to LLM" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "FOjYbmheZYHo", "outputId": "35ccb13a-d03e-4fc2-ea61-c31e2823aa06" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "LLM connected successfully.\n" ] } ], "source": [ "import os\n", "from langchain_groq import ChatGroq\n", "\n", "# (In production/VS Code, we load from .env \u2014 never hardcode in real repos)\n", "GROQ_API_KEY = os.getenv(\"GROQ_API_KEY\") # Load from .env\n", "\n", "# Initialise the LLM\n", "llm = ChatGroq(\n", "GROQ_API_KEY = os.getenv(\"GROQ_API_KEY\") # Load from .env\n", " model_name=\"llama-3.1-8b-instant\",\n", " temperature=0\n", ")\n", "\n", "#Why `temperature=0`?**\n", "#For a banking assistant we want consistent, factual answers \u2014 not creative ones. Temperature 0 means the model always picks the most confident response. Worth mentioning in interviews!\n", "\n", "# Quick test\n", "response = llm.invoke(\"Say 'LLM connected successfully' and nothing else.\")\n", "print(response.content)" ] }, { "cell_type": "markdown", "metadata": { "id": "X5oy16HYej54" }, "source": [ "**Observation** :\n", "\n", "I chose Llama 3.1 8B Instant on Groq because the task is retrieval-augmented \u2014 the model doesn't need to reason deeply, just understand queries and summarise retrieved context. The instant variant gives low latency which matters for a real-time banking assistant, and Groq's LPU hardware makes it significantly faster than GPU-based inference." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "X-IHdGYxeopv", "outputId": "a49c2273-3eee-4666-d5b7-4c6e9d6c33da" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting pypdf\n", " Downloading pypdf-6.8.0-py3-none-any.whl.metadata (7.1 kB)\n", "Downloading pypdf-6.8.0-py3-none-any.whl (332 kB)\n", "\u001b[?25l \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m0.0/332.2 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K \u001b[91m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m\u001b[90m\u257a\u001b[0m \u001b[32m327.7/332.2 kB\u001b[0m \u001b[31m13.0 MB/s\u001b[0m eta \u001b[36m0:00:01\u001b[0m\r\u001b[2K \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m332.2/332.2 kB\u001b[0m \u001b[31m8.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hInstalling collected packages: pypdf\n", "Successfully installed pypdf-6.8.0\n" ] } ], "source": [ "!pip install pypdf" ] }, { "cell_type": "markdown", "metadata": { "id": "HrgZfpcesTSP" }, "source": [ "### Load PDF documents" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "w7EIs4o_wcKN", "outputId": "71ebffd8-5b9d-47f9-c013-64b679ba133f" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Mounted at /content/drive\n" ] } ], "source": [ "from google.colab import drive\n", "drive.mount('/content/drive')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "WjTOgedhwea9" }, "outputs": [], "source": [ "# Your PDF path will be:\n", "DOCS_PATH = \"/content/drive/MyDrive/AIML GL/Projects/Banking Intelligence Assistance/Documents/\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "eVW_SeXaqU_E", "outputId": "274005dd-8a9a-4077-f184-6e5feb4b6891" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Loaded: hdfc_credit_card_policy.pdf \u2014 6 pages\n", " Loaded: hdfc_customer_compensation_policy.pdf \u2014 5 pages\n", " Loaded: hdfc_grievance_policy.pdf \u2014 16 pages\n", " Loaded: hdfc_personal_loan_agreement.pdf \u2014 16 pages\n", " Loaded: hdfc_savings_account_charges.pdf \u2014 2 pages\n", " Loaded: hdfc_general_terms_conditions.pdf \u2014 6 pages\n", "\n", " Total pages loaded: 51\n" ] } ], "source": [ "from langchain_community.document_loaders import PyPDFLoader\n", "from langchain_text_splitters import RecursiveCharacterTextSplitter\n", "import os\n", "\n", "# Path to your PDFs in Google Drive\n", "DOCS_PATH = \"/content/drive/MyDrive/AIML GL/Projects/Banking Intelligence Assistance/Documents/\"\n", "\n", "# All 6 PDFs\n", "PDF_FILES = [\n", " \"hdfc_credit_card_policy.pdf\",\n", " \"hdfc_customer_compensation_policy.pdf\",\n", " \"hdfc_grievance_policy.pdf\",\n", " \"hdfc_personal_loan_agreement.pdf\",\n", " \"hdfc_savings_account_charges.pdf\",\n", " \"hdfc_general_terms_conditions.pdf\"\n", "]\n", "\n", "# Load all PDFs\n", "all_documents = []\n", "for pdf in PDF_FILES:\n", " full_path = os.path.join(DOCS_PATH, pdf);\n", " try:\n", " loader = PyPDFLoader(full_path)\n", " docs = loader.load()\n", " all_documents.extend(docs)\n", " print(f\" Loaded: {pdf} \u2014 {len(docs)} pages\")\n", " except Exception as e:\n", " print(f\" Failed to load {pdf}: {e}\")\n", "\n", "print(f\"\\n Total pages loaded: {len(all_documents)}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "BBSYe6DNsYsP" }, "source": [ "### Splitting the PDF data" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "G32ZcpexsiE9", "outputId": "4e66131b-109f-4e76-858c-6a99504d5d15" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Chunks before cleaning: 578\n", " Chunks after cleaning: 577\n", "\n", "--- Sample chunk ---\n", "Repayment Through any of the payment channels (E.g. Net banking, ATMs, \n", "Cheque /cash deposit in branches, Standing instructions for \n", "account holders with HDFC Bank). For credit cards issued to NRIs / \n", "PIOs, the repayment shall be as per the defi ned process for NRI \n", "sourcing and FEMA directions issued by RBI from time to time. Security / Collateral No collateral required for regular Cards. There is a FD based card \n", "where the Credit card is issued with a lien marked against the Fixed\n" ] } ], "source": [ "# Split documents into chunks\n", "import re\n", "\n", "# Split into chunks\n", "text_splitter = RecursiveCharacterTextSplitter(\n", " chunk_size=500,\n", " chunk_overlap=50,\n", " separators=[\"\\n\\n\", \"\\n\", \".\", \" \"]\n", ")\n", "\n", "chunks = text_splitter.split_documents(all_documents)\n", "print(f\" Chunks before cleaning: {len(chunks)}\")\n", "\n", "# Clean chunks\n", "def clean_text(text):\n", " text = re.sub(r'Classification\\s*[-\u2013]\\s*Internal', '', text)\n", " text = re.sub(r'\\n{3,}', '\\n\\n', text)\n", " text = re.sub(r'\\s{3,}', ' ', text)\n", " text = re.sub(r'as on \\d{2}\\.\\d{2}\\.\\d{4}', '', text)\n", " return text.strip()\n", "\n", "for chunk in chunks:\n", " chunk.page_content = clean_text(chunk.page_content)\n", "\n", "# Remove noise chunks\n", "chunks = [chunk for chunk in chunks if len(chunk.page_content) > 50]\n", "print(f\" Chunks after cleaning: {len(chunks)}\")\n", "print(f\"\\n--- Sample chunk ---\")\n", "print(chunks[10].page_content)" ] }, { "cell_type": "markdown", "metadata": { "id": "6WHMzHuVtHCo" }, "source": [ "**Observation :**\n", "\n", "Why these specific settings? \u2014 Great interview talking point:\n", "\n", "- chunk_size=500 \u2014 Each chunk is ~500 characters. Small enough to be precise when retrieving, large enough to contain meaningful context. Too large and retrieval becomes noisy; too small and you lose context.\n", "\n", " - chunk_overlap=50 \u2014 50 characters of overlap between consecutive chunks. This ensures that if an important sentence falls at the boundary of two chunks, it still gets captured in at least one of them.\n", "\n", " - separators \u2014 We split preferentially on paragraph breaks first (\\n\\n), then line breaks, then sentences. This keeps semantically related text together rather than cutting mid-sentence." ] }, { "cell_type": "markdown", "metadata": { "id": "8lsx9vGRJa1b" }, "source": [ "We use chunk-level **embeddings**, not word-level. Each chunk is converted into a **384**-**dimensional** dense vector that captures its semantic meaning. This allows FAISS to retrieve **contextually** **relevant** chunks even when the user's query uses completely different words than what's in the document \u2014 because similar meanings produce geometrically close vectors in the embedding space.\"" ] }, { "cell_type": "markdown", "metadata": { "id": "jWcaZuvd0rKC" }, "source": [ "### Building the FAISS vector store" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 240, "referenced_widgets": [ "e6e6ae14093942a783a0251a9e641419", "4fc4a731772a43bc888bb92e87a042a0", "6b424f3980624d858ded9569fe57936f", "4d1ef31ab0fc4fc9b87633db156bba39", "4878abad46944afeb96e9280c850601e", "eeeec036ab0445bb81abb27d532817e1", "706f9875c6b74d5380d8336f7c435cba", "1d992192b56d4196b2eb98e940e455c4", "2d07c4273cad4fdc8d9ca704d8cb0680", "92a6e1559ba84233b1c82352ceba6216", "fc402a5cef4d4e4a8089217bf942942d" ] }, "id": "Cj8X1SWa0sZJ", "outputId": "aa9a7161-47b4-40c9-afb7-de27c19f8249" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Loading embedding model...\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "e6e6ae14093942a783a0251a9e641419", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Loading weights: 0%| | 0/103 [00:00 Pay > Cards > Credit Cards> Key Fact Statement cum MITC OR Refer to https://www.hdfcbank.com/personal/pay/cards/credit-cards/membership-kit\n", "\n" ] } ], "source": [ "# Test retrieval \u2014 does FAISS find the right chunks?\n", "retriever = vectorstore.as_retriever(\n", " search_type=\"similarity\",\n", " search_kwargs={\"k\": 3} # Return top 3 most relevant chunks\n", ")\n", "\n", "# Test query\n", "test_query = \"What is the late payment fee for credit cards?\"\n", "results = retriever.invoke(test_query)\n", "\n", "print(f\" Query: {test_query}\")\n", "print(f\" Top {len(results)} chunks retrieved:\\n\")\n", "for i, doc in enumerate(results):\n", " print(f\"--- Chunk {i+1} ---\")\n", " print(doc.page_content)\n", " print()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Mcxh1m4A5Ed2", "outputId": "815e4ae8-adbc-4920-fca7-4f161f8e80ab" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Searching raw chunks for 'late payment'...\n", "\n", "Found 1 chunks mentioning 'late payment':\n", "\n", "--- Match 1 ---\n", "above and for reasons directly attributable to the Bank, the customer shall be compensated at the prevailing \n", "Savings interest rate for the period between the due date of direct / EC S debit and the date of actual debit \n", "carried out by the bank. Direct / ECS debits which are towards payments of an Equated Monthly Instalment \n", "(EMI), the Bank would reimburse the customer, penal interest, late payment charges, if any levied upon\n", "\n" ] } ], "source": [ "# Let's search directly in the raw text to see if the info exists\n", "print(\" Searching raw chunks for 'late payment'...\\n\")\n", "\n", "matches = [chunk for chunk in chunks if \"late payment\" in chunk.page_content.lower()]\n", "\n", "print(f\"Found {len(matches)} chunks mentioning 'late payment':\\n\")\n", "for i, match in enumerate(matches):\n", " print(f\"--- Match {i+1} ---\")\n", " print(match.page_content)\n", " print()" ] }, { "cell_type": "markdown", "metadata": { "id": "5phR_WSb6Kss" }, "source": [ "**Observation -**\n", "\n", "Now we know exactly what's happening \u2014 the information exists but is **incomplete**. The chunk mentions \"late payment charges\" but doesn't state the specific fee amount. This is a data coverage issue, not a retrieval bug. The retriever is actually working correctly!\n", "This is a realistic RAG scenario \u2014 in production, documents are always imperfect.\n", "\n", "Here's how we handle it professionally:\n", "Fix 1 \u2014 Switch to MMR search (better retrieval diversity)." ] }, { "cell_type": "markdown", "metadata": { "id": "6GSkCTDl6aZS" }, "source": [ "### Fix 1 \u2014 Switch to MMR search" ] }, { "cell_type": "markdown", "metadata": { "id": "HaBlal037Drj" }, "source": [ "**Observation -**\n", "\n", "Why **MMR** over simple similarity search? \u2014 Strong interview answer:\n", "\n", "Simple similarity search often returns 3 nearly identical chunks \u2014 lots of redundancy, little new information. MMR solves this by penalising chunks that are too similar to ones already selected, giving you broader, more useful context for the LLM to work with." ] }, { "cell_type": "markdown", "metadata": { "id": "xDk0qzh37K-a" }, "source": [ "### Building the RAG agent chain" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "g4i9fWRi8SV5", "outputId": "46d365ed-49ad-4bc8-b107-61df7304f7db" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " RAG Agent ready with LCEL!\n" ] } ], "source": [ "from langchain_core.prompts import PromptTemplate\n", "from langchain_core.runnables import RunnablePassthrough\n", "from langchain_core.output_parsers import StrOutputParser\n", "\n", "# MMR retriever\n", "retriever = vectorstore.as_retriever(\n", " search_type=\"mmr\",\n", " search_kwargs={\"k\": 4, \"fetch_k\": 20, \"lambda_mult\": 0.7}\n", ")\n", "\n", "# Grounded prompt\n", "prompt_template = \"\"\"You are a helpful HDFC Bank policy assistant.\n", "Use ONLY the context below to answer the customer's question.\n", "If the answer is not in the context, say \"I don't have enough information\n", "in the policy documents to answer this. Please contact HDFC Bank directly.\"\n", "\n", "Context:\n", "{context}\n", "\n", "Customer Question: {question}\n", "\n", "Answer:\"\"\"\n", "\n", "prompt = PromptTemplate(\n", " template=prompt_template,\n", " input_variables=[\"context\", \"question\"]\n", ")\n", "\n", "# Helper to format retrieved chunks into single string\n", "def format_docs(docs):\n", " return \"\\n\\n\".join(doc.page_content for doc in docs)\n", "\n", "# Build RAG chain using LCEL (LangChain Expression Language)\n", "rag_chain = (\n", " {\"context\": retriever | format_docs, \"question\": RunnablePassthrough()}\n", " | prompt\n", " | llm\n", " | StrOutputParser()\n", ")\n", "\n", "print(\" RAG Agent ready with LCEL!\")" ] }, { "cell_type": "markdown", "metadata": { "id": "A-CaZ6GE8fch" }, "source": [ "**Observation -**\n", "\n", "**Why LCEL is better** \u2014 good interview point:\n", "LCEL uses a pipe | operator to chain components, making the data flow explicit and readable. It's also more flexible \u2014 easier to swap components, add middleware, and integrate with LangSmith for tracing. RetrievalQA is now considered legacy." ] }, { "cell_type": "markdown", "metadata": { "id": "x1Yul-ZLLay5" }, "source": [ "#### Testing the queries" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "VftxKSI2NkU0", "outputId": "371ed82b-7f16-4338-b6a2-7616a853311f" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Testing RAG Agent across all policy areas:\n", "\n", " What is the minimum balance required for a savings account?\n", " The minimum balance requirement for a savings account varies based on the branch location. \n", "\n", "For Metro and Urban branches, the minimum balance requirement is either `10,000/- or an FD of `1 Lac for a minimum 1 year 1 day period.\n", "\n", "For Semi Urban branches, the minimum balance requirement is either `5,000/- or an FD of `50,000 for a minimum 1 year 1 day period.\n", "\n", "For Rural branches, the minimum balance requirement is either `2,500/- or an FD of `25,000/- for a minimum 1 year 1 day period.\n", "\n", " What happens if I miss my loan EMI payment?\n", " If you miss your loan EMI payment, you will be charged an Overdue EMI interest of 2% per month on the EMI amount.\n", "\n", " How can I raise a grievance against HDFC Bank?\n", " You can raise a grievance against HDFC Bank by contacting the Grievance Redressal Officer, Mrs. Deepa Balakrishnan, at +91 22 61606160. Alternatively, you can also contact the Call Centres at 1800 1600 / 1800 2600 (if you are in India) or +91 22 61606160 (if you are not in India). \n", "\n", "Additionally, you can write to the Manager, HDFC Bank Cards, at the following address: \n", "HDFC Bank Cards Division, PO Box No 8654, Door No 94 SP, Estate Bus Stand, Wavin Main Road, Mogappair West, Chennai 600058. \n", "\n", "You can also email your grievance to customerservices.cards@hdfcbank.com.\n", "\n", " What is the interest rate on revolving credit for classic cards?\n", " The interest rate on revolving credit for Classic Cards is up to 3.75% per month (45% per annum).\n", "\n", " What documents are needed for KYC verification?\n", " According to the policy documents, the following documents are mentioned for KYC verification:\n", "\n", "- Customer Identification Number\n", "- Telephone Identification Number (TIN)\n", "- Manual verification checks as decided by the Bank from time to time.\n", "\n", "Please note that the Bank may decide on additional documents or methods for KYC verification from time to time.\n", "\n" ] } ], "source": [ "# Test queries covering all 6 PDFs\n", "test_queries = [\n", " \"What is the minimum balance required for a savings account?\",\n", " \"What happens if I miss my loan EMI payment?\",\n", " \"How can I raise a grievance against HDFC Bank?\",\n", " \"What is the interest rate on revolving credit for classic cards?\",\n", " \"What documents are needed for KYC verification?\"\n", "]\n", "\n", "print(\" Testing RAG Agent across all policy areas:\\n\")\n", "for query in test_queries:\n", " result = rag_chain.invoke(query) # LCEL takes plain string, not dict\n", " print(f\" {query}\")\n", " print(f\" {result}\") # LCEL returns string directly, not dict\n", " print()" ] }, { "cell_type": "markdown", "metadata": { "id": "doUeHnB7PlwA" }, "source": [ "**Observations -**\n", "\n", "The response received from the RAG is good and relevant:\n", "\n", "- Savings account \u2014 correctly differentiates Metro/Urban/Semi-Urban/Rural minimums\n", "- Loan EMI \u2014 specific 2% per month overdue interest\n", "- Grievance \u2014 real officer name, phone number, and Chennai address\n", "- Credit card interest \u2014 exact 3.75%/month (45% per annum)\n", "- KYC documents \u2014 accurate list from the policy" ] }, { "cell_type": "markdown", "metadata": { "id": "UO5QDDiiP3vi" }, "source": [ "## Building the SQL Agent" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "HqNa2jDiUOxD", "outputId": "aa547caf-bb55-4ae8-b657-92e84aa09f71" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Tables created!\n", " Inserted 50 customers\n", " Inserted 511 transactions\n", " Inserted 50 credit cards\n", "\n", " Database ready!\n", " \u2192 50 customers (Metro, Urban, Semi Urban, Rural)\n", " \u2192 500 transactions (mix of Success, Failed, Pending)\n", " \u2192 50 credit cards (Classic, Platinum, Titanium)\n", " \u2192 Realistic credit scores, balances & limits by branch\n" ] } ], "source": [ "import sqlite3\n", "import random\n", "from datetime import datetime, timedelta\n", "\n", "# Create SQLite database\n", "conn = sqlite3.connect('banking.db')\n", "cursor = conn.cursor()\n", "\n", "# Drop existing tables to start fresh\n", "cursor.execute(\"DROP TABLE IF EXISTS transactions\")\n", "cursor.execute(\"DROP TABLE IF EXISTS credit_cards\")\n", "cursor.execute(\"DROP TABLE IF EXISTS customers\")\n", "\n", "# \u2500\u2500 TABLE 1: CUSTOMERS \u2500\u2500\n", "cursor.execute('''\n", "CREATE TABLE customers (\n", " customer_id TEXT PRIMARY KEY,\n", " name TEXT,\n", " account_type TEXT,\n", " balance REAL,\n", " credit_score INTEGER,\n", " branch TEXT,\n", " email TEXT,\n", " phone TEXT\n", ")\n", "''')\n", "\n", "# \u2500\u2500 TABLE 2: TRANSACTIONS \u2500\u2500\n", "cursor.execute('''\n", "CREATE TABLE transactions (\n", " txn_id TEXT PRIMARY KEY,\n", " customer_id TEXT,\n", " date TEXT,\n", " amount REAL,\n", " type TEXT,\n", " merchant TEXT,\n", " status TEXT,\n", " FOREIGN KEY (customer_id) REFERENCES customers(customer_id)\n", ")\n", "''')\n", "\n", "# \u2500\u2500 TABLE 3: CREDIT CARDS \u2500\u2500\n", "cursor.execute('''\n", "CREATE TABLE credit_cards (\n", " card_id TEXT PRIMARY KEY,\n", " customer_id TEXT,\n", " card_type TEXT,\n", " credit_limit REAL,\n", " outstanding REAL,\n", " minimum_due REAL,\n", " due_date TEXT,\n", " status TEXT,\n", " FOREIGN KEY (customer_id) REFERENCES customers(customer_id)\n", ")\n", "''')\n", "\n", "print(\" Tables created!\")\n", "\n", "# \u2500\u2500 SEED DATA \u2500\u2500\n", "random.seed(42)\n", "\n", "first_names = [\"Sushmita\", \"Rahul\", \"Priya\", \"Arjun\", \"Deepa\", \"Vikram\",\n", " \"Ananya\", \"Rohan\", \"Kavya\", \"Suresh\", \"Meera\", \"Kiran\",\n", " \"Sanjay\", \"Divya\", \"Arun\", \"Nisha\", \"Rajesh\", \"Pooja\",\n", " \"Harish\", \"Lakshmi\", \"Amit\", \"Sunita\", \"Ganesh\", \"Rekha\",\n", " \"Vijay\", \"Swati\", \"Manoj\", \"Asha\", \"Ravi\", \"Geeta\",\n", " \"Naveen\", \"Usha\", \"Prakash\", \"Shobha\", \"Dinesh\", \"Radha\",\n", " \"Sunil\", \"Vasantha\", \"Mohan\", \"Saritha\", \"Ajay\", \"Padma\",\n", " \"Venkat\", \"Hema\", \"Rajan\", \"Geetha\", \"Kumar\", \"Vani\",\n", " \"Balaji\", \"Nirmala\"]\n", "\n", "last_names = [\"Sharma\", \"Mehta\", \"Nair\", \"Patel\", \"Krishnan\", \"Reddy\",\n", " \"Iyer\", \"Pillai\", \"Joshi\", \"Gupta\", \"Singh\", \"Kumar\",\n", " \"Rao\", \"Shah\", \"Verma\", \"Menon\", \"Chatterjee\", \"Mishra\",\n", " \"Bhat\", \"Kaur\", \"Das\", \"Srinivasan\", \"Hegde\", \"Desai\",\n", " \"Naidu\", \"Rajan\", \"Subramanian\", \"Murthy\", \"Pandey\", \"Nambiar\"]\n", "\n", "branches = [\"Metro\", \"Metro\", \"Metro\", \"Urban\", \"Urban\", \"Semi Urban\", \"Rural\"]\n", "account_types = [\"Savings\", \"Savings\", \"Savings\", \"Current\"]\n", "card_types = [\"Classic\", \"Classic\", \"Platinum\", \"Titanium\"]\n", "card_statuses = [\"Active\", \"Active\", \"Active\", \"Active\", \"Overdue\", \"Blocked\"]\n", "\n", "merchants = [\"Swiggy\", \"Amazon\", \"Zomato\", \"BigBasket\", \"Flipkart\",\n", " \"Netflix\", \"Uber\", \"HDFC ATM\", \"Salary Credit\", \"PhonePe\",\n", " \"Myntra\", \"BookMyShow\", \"Ola\", \"Paytm\", \"Dunzo\",\n", " \"IRCTC\", \"MakeMyTrip\", \"Nykaa\", \"Decathlon\", \"Apple Store\"]\n", "\n", "txn_statuses = [\"Success\", \"Success\", \"Success\", \"Success\", \"Failed\", \"Pending\"]\n", "\n", "# \u2500\u2500 INSERT 50 CUSTOMERS \u2500\u2500\n", "customers = []\n", "for i in range(1, 51):\n", " cust_id = f\"CUST{i:03d}\"\n", " name = f\"{random.choice(first_names)} {random.choice(last_names)}\"\n", " account_type = random.choice(account_types)\n", " branch = random.choice(branches)\n", "\n", " # Realistic balance ranges by branch\n", " if branch == \"Metro\":\n", " balance = round(random.uniform(20000, 500000), 2)\n", " credit_score = random.randint(680, 850)\n", " elif branch == \"Urban\":\n", " balance = round(random.uniform(10000, 200000), 2)\n", " credit_score = random.randint(650, 800)\n", " elif branch == \"Semi Urban\":\n", " balance = round(random.uniform(5000, 100000), 2)\n", " credit_score = random.randint(620, 750)\n", " else: # Rural\n", " balance = round(random.uniform(1000, 50000), 2)\n", " credit_score = random.randint(580, 720)\n", "\n", " customers.append((\n", " cust_id, name, account_type, balance, credit_score, branch,\n", " f\"{name.split()[0].lower()}{i}@email.com\",\n", " f\"+91-98765{i:05d}\"\n", " ))\n", "\n", "cursor.executemany(\n", " \"INSERT INTO customers VALUES (?,?,?,?,?,?,?,?)\", customers)\n", "print(f\" Inserted 50 customers\")\n", "\n", "# \u2500\u2500 INSERT 500 TRANSACTIONS \u2500\u2500\n", "transactions = []\n", "for i, (cust_id, *_) in enumerate(customers):\n", " num_txns = random.randint(8, 12) # 8-12 transactions per customer\n", " for j in range(num_txns):\n", " date = datetime.now() - timedelta(days=random.randint(1, 90))\n", " txn_type = random.choices(\n", " [\"Debit\", \"Credit\"], weights=[75, 25])[0]\n", " amount = round(random.uniform(100, 50000), 2)\n", " transactions.append((\n", " f\"TXN{i+1:03d}{j:03d}\",\n", " cust_id,\n", " date.strftime(\"%Y-%m-%d\"),\n", " amount,\n", " txn_type,\n", " random.choice(merchants),\n", " random.choice(txn_statuses)\n", " ))\n", "\n", "cursor.executemany(\n", " \"INSERT INTO transactions VALUES (?,?,?,?,?,?,?)\", transactions)\n", "print(f\" Inserted {len(transactions)} transactions\")\n", "\n", "# \u2500\u2500 INSERT 50 CREDIT CARDS \u2500\u2500\n", "credit_cards = []\n", "for i, (cust_id, _, _, _, credit_score, branch, *__) in enumerate(customers):\n", " card_type = random.choice(card_types)\n", "\n", " # Credit limit based on credit score\n", " if credit_score >= 780:\n", " credit_limit = round(random.uniform(300000, 800000), 2)\n", " elif credit_score >= 720:\n", " credit_limit = round(random.uniform(150000, 300000), 2)\n", " elif credit_score >= 670:\n", " credit_limit = round(random.uniform(75000, 150000), 2)\n", " else:\n", " credit_limit = round(random.uniform(20000, 75000), 2)\n", "\n", " outstanding = round(random.uniform(0, credit_limit * 0.7), 2)\n", " minimum_due = round(outstanding * 0.05, 2)\n", " due_date = (datetime.now() + timedelta(\n", " days=random.randint(5, 30))).strftime(\"%Y-%m-%d\")\n", " status = random.choice(card_statuses)\n", "\n", " credit_cards.append((\n", " f\"CARD{i+1:03d}\", cust_id, card_type,\n", " credit_limit, outstanding, minimum_due,\n", " due_date, status\n", " ))\n", "\n", "cursor.executemany(\n", " \"INSERT INTO credit_cards VALUES (?,?,?,?,?,?,?,?)\", credit_cards)\n", "print(f\" Inserted 50 credit cards\")\n", "\n", "conn.commit()\n", "conn.close()\n", "\n", "print(\"\\n Database ready!\")\n", "print(\" \u2192 50 customers (Metro, Urban, Semi Urban, Rural)\")\n", "print(\" \u2192 500 transactions (mix of Success, Failed, Pending)\")\n", "print(\" \u2192 50 credit cards (Classic, Platinum, Titanium)\")\n", "print(\" \u2192 Realistic credit scores, balances & limits by branch\")" ] }, { "cell_type": "markdown", "metadata": { "id": "01FGtEDqVDCJ" }, "source": [ "### Tables data preview" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "AIuw1Nu9VGd5", "outputId": "2290b687-e1b7-4bfa-8685-2049473477d7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "============================================================\n", " CUSTOMERS TABLE (first 5 rows)\n", "============================================================\n", "customer_id name account_type balance credit_score branch email phone\n", " CUST001 Ajay Patel Savings 31127.79 677 Semi Urban ajay1@email.com +91-9876500001\n", " CUST002 Kavya Desai Savings 75363.44 642 Semi Urban kavya2@email.com +91-9876500002\n", " CUST003 Vasantha Shah Savings 64973.72 739 Metro vasantha3@email.com +91-9876500003\n", " CUST004 Prakash Kaur Savings 47779.15 789 Urban prakash4@email.com +91-9876500004\n", " CUST005 Manoj Pillai Current 62856.23 651 Urban manoj5@email.com +91-9876500005\n", "\n", "============================================================\n", " TRANSACTIONS TABLE (first 5 rows)\n", "============================================================\n", " txn_id customer_id date amount type merchant status\n", "TXN001000 CUST001 2026-01-07 42650.72 Debit Netflix Success\n", "TXN001001 CUST001 2026-01-03 9372.67 Debit Apple Store Success\n", "TXN001002 CUST001 2025-12-13 20248.40 Credit Decathlon Success\n", "TXN001003 CUST001 2025-12-25 31007.14 Debit Paytm Pending\n", "TXN001004 CUST001 2025-12-25 15886.86 Debit Salary Credit Success\n", "\n", "============================================================\n", " CREDIT CARDS TABLE (first 5 rows)\n", "============================================================\n", "card_id customer_id card_type credit_limit outstanding minimum_due due_date status\n", "CARD001 CUST001 Classic 109316.28 20075.65 1003.78 2026-03-30 Overdue\n", "CARD002 CUST002 Platinum 52833.40 22608.25 1130.41 2026-04-02 Active\n", "CARD003 CUST003 Platinum 202797.74 64997.16 3249.86 2026-03-20 Blocked\n", "CARD004 CUST004 Platinum 759341.29 313697.45 15684.87 2026-04-05 Active\n", "CARD005 CUST005 Platinum 26482.39 5661.03 283.05 2026-03-20 Active\n", "\n", "============================================================\n", " DATABASE SUMMARY\n", "============================================================\n", " branch customers avg_balance avg_credit_score\n", " Metro 18 319752.30 755.0\n", " Rural 5 25889.10 637.0\n", "Semi Urban 12 60346.07 688.0\n", " Urban 15 81534.73 729.0\n", "\n", " status count\n", " Failed 89\n", "Pending 88\n", "Success 334\n", "\n", "card_type count avg_limit\n", " Classic 21 246726.79\n", " Platinum 14 246112.26\n", " Titanium 15 146153.20\n" ] } ], "source": [ "import sqlite3\n", "import pandas as pd\n", "\n", "conn = sqlite3.connect('banking.db')\n", "\n", "# \u2500\u2500 CUSTOMERS \u2500\u2500\n", "print(\"=\" * 60)\n", "print(\" CUSTOMERS TABLE (first 5 rows)\")\n", "print(\"=\" * 60)\n", "df_customers = pd.read_sql(\"SELECT * FROM customers LIMIT 5\", conn)\n", "print(df_customers.to_string(index=False))\n", "\n", "# \u2500\u2500 TRANSACTIONS \u2500\u2500\n", "print(\"\\n\" + \"=\" * 60)\n", "print(\" TRANSACTIONS TABLE (first 5 rows)\")\n", "print(\"=\" * 60)\n", "df_txns = pd.read_sql(\"SELECT * FROM transactions LIMIT 5\", conn)\n", "print(df_txns.to_string(index=False))\n", "\n", "# \u2500\u2500 CREDIT CARDS \u2500\u2500\n", "print(\"\\n\" + \"=\" * 60)\n", "print(\" CREDIT CARDS TABLE (first 5 rows)\")\n", "print(\"=\" * 60)\n", "df_cards = pd.read_sql(\"SELECT * FROM credit_cards LIMIT 5\", conn)\n", "print(df_cards.to_string(index=False))\n", "\n", "# \u2500\u2500 SUMMARY STATS \u2500\u2500\n", "print(\"\\n\" + \"=\" * 60)\n", "print(\" DATABASE SUMMARY\")\n", "print(\"=\" * 60)\n", "print(pd.read_sql(\"SELECT branch, COUNT(*) as customers, ROUND(AVG(balance),2) as avg_balance, ROUND(AVG(credit_score),0) as avg_credit_score FROM customers GROUP BY branch\", conn).to_string(index=False))\n", "print()\n", "print(pd.read_sql(\"SELECT status, COUNT(*) as count FROM transactions GROUP BY status\", conn).to_string(index=False))\n", "print()\n", "print(pd.read_sql(\"SELECT card_type, COUNT(*) as count, ROUND(AVG(credit_limit),2) as avg_limit FROM credit_cards GROUP BY card_type\", conn).to_string(index=False))\n", "\n", "conn.close()" ] }, { "cell_type": "markdown", "metadata": { "id": "N07dkO6MWsay" }, "source": [ "**Observation-**\n", "\n", "For this prototype I used **SQLite** for portability, but the SQL agent is database-agnostic and would connect to PostgreSQL in production with just a connection string change." ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "3HeS4qAkWyWd", "outputId": "95ff9d1b-bd96-420a-b249-42c05e286c96" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Database connected!\n", " Tables found: ['credit_cards', 'customers', 'transactions']\n" ] } ], "source": [ "from langchain_community.utilities import SQLDatabase\n", "from langchain_community.agent_toolkits import create_sql_agent\n", "\n", "# Connect LangChain to our SQLite database\n", "db = SQLDatabase.from_uri(\"sqlite:///banking.db\")\n", "\n", "# Verify connection\n", "print(\" Database connected!\")\n", "print(f\" Tables found: {db.get_usable_table_names()}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "97FNUNTxZYTo" }, "source": [ "### The SQL Agent" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "d8hcp73mZack", "outputId": "b9a0d7d7-8870-4106-c52a-1b437a24961a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SQL Agent ready!\n" ] } ], "source": [ "from langchain_community.utilities import SQLDatabase\n", "from langchain_community.agent_toolkits import create_sql_agent\n", "\n", "# SQL agent with ReAct style \u2014 compatible with Groq/Llama\n", "sql_agent = create_sql_agent(\n", " llm=llm,\n", " db=db,\n", " agent_type=\"zero-shot-react-description\",\n", " verbose=True,\n", " max_iterations=5,\n", " handle_parsing_errors=True\n", ")\n", "\n", "print(\"SQL Agent ready!\")" ] }, { "cell_type": "markdown", "metadata": { "id": "MkrOjRdOf7CU" }, "source": [ "**Observation -**\n", "\n", "When we use this model, the agent is stuck in a loop \u2014 it keeps listing tables but never progresses to actually querying them. This is a known **limitation** of Llama 3.1 8B \u2014 it's too small for complex SQL agent reasoning. The fix is simple \u2014 upgrade to a larger model just for the SQL agent." ] }, { "cell_type": "markdown", "metadata": { "id": "smWp5OSpgF2a" }, "source": [ "So now, for RAG Agent, we use **Llama 3.1 8B** because Simple task \u2014 just summarise retrieved text. Speed matters.\n", "\n", "For SQL Agent, we use **Llama 3.3 70B** because Complex task \u2014 multi-step reasoning, schema understanding, SQL generation. Accuracy matters." ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "4nZ59qcfe7rx", "outputId": "fe8aeff7-1983-4df5-fb24-e4fd3d8daffe" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SQL Agent ready with Llama 3.3 70B!\n" ] } ], "source": [ "from langchain_groq import ChatGroq\n", "from langchain_community.utilities import SQLDatabase\n", "from langchain_community.agent_toolkits import create_sql_agent\n", "\n", "# Larger model for SQL agent \u2014 70B handles ReAct reasoning much better\n", "llm_70b = ChatGroq(\n", "GROQ_API_KEY = os.getenv(\"GROQ_API_KEY\") # Load from .env\n", " model_name=\"llama-3.3-70b-versatile\",\n", " temperature=0\n", ")\n", "\n", "# Rebuild SQL agent with 70B model\n", "sql_agent = create_sql_agent(\n", " llm=llm_70b,\n", " db=db,\n", " agent_type=\"zero-shot-react-description\",\n", " verbose=True,\n", " max_iterations=10,\n", " handle_parsing_errors=True\n", ")\n", "\n", "print(\"SQL Agent ready with Llama 3.3 70B!\")" ] }, { "cell_type": "markdown", "metadata": { "id": "t2f8evTFcAGy" }, "source": [ "**Observation-**\n", "\n", "I set **max_iterations=10** as a guardrail to prevent runaway agent loops. In production, unbounded agent execution is a real cost and reliability concern \u2014 every iteration is an LLM API call. Five iterations is sufficient for our banking query complexity while protecting against failure modes.\"" ] }, { "cell_type": "markdown", "metadata": { "id": "vMLLXSf_dKQH" }, "source": [ "### Testing SQL Agent" ] }, { "cell_type": "code", "execution_count": 43, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ddF1VpB5eKlF", "outputId": "4e28a8bb-c4c9-4530-aaa1-cb6241d96a83" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "\u001b[1m> Entering new SQL Agent Executor chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3mThought: I should look at the tables in the database to see what I can query. Then I should query the schema of the most relevant tables.\n", "Action: sql_db_list_tables\n", "Action Input: \u001b[0m\u001b[38;5;200m\u001b[1;3mcredit_cards, customers, transactions\u001b[0m\u001b[32;1m\u001b[1;3mWith the list of tables, I can see that the \"customers\" table is likely to have the information I need. However, I also need to know how the customers are related to the branches. I should query the schema of the \"customers\" table to see if it has a column related to branches.\n", "\n", "Action: sql_db_schema\n", "Action Input: customers\u001b[0m\u001b[33;1m\u001b[1;3m\n", "CREATE TABLE customers (\n", "\tcustomer_id TEXT, \n", "\tname TEXT, \n", "\taccount_type TEXT, \n", "\tbalance REAL, \n", "\tcredit_score INTEGER, \n", "\tbranch TEXT, \n", "\temail TEXT, \n", "\tphone TEXT, \n", "\tPRIMARY KEY (customer_id)\n", ")\n", "\n", "/*\n", "3 rows from customers table:\n", "customer_id\tname\taccount_type\tbalance\tcredit_score\tbranch\temail\tphone\n", "CUST001\tAjay Patel\tSavings\t31127.79\t677\tSemi Urban\tajay1@email.com\t+91-9876500001\n", "CUST002\tKavya Desai\tSavings\t75363.44\t642\tSemi Urban\tkavya2@email.com\t+91-9876500002\n", "CUST003\tVasantha Shah\tSavings\t64973.72\t739\tMetro\tvasantha3@email.com\t+91-9876500003\n", "*/\u001b[0m\u001b[32;1m\u001b[1;3mWith the schema of the \"customers\" table, I can see that it has a column named \"branch\" which is likely to be the column I need to query to get the number of customers in each branch.\n", "\n", "Action: sql_db_query_checker\n", "Action Input: SELECT branch, COUNT(customer_id) FROM customers GROUP BY branch\u001b[0m\u001b[36;1m\u001b[1;3m```sql\n", "SELECT branch, COUNT(customer_id) \n", "FROM customers \n", "GROUP BY branch;\n", "```\u001b[0m\u001b[32;1m\u001b[1;3mThe query looks correct, so I can proceed to execute it.\n", "\n", "Action: sql_db_query\n", "Action Input: SELECT branch, COUNT(customer_id) FROM customers GROUP BY branch\u001b[0m\u001b[36;1m\u001b[1;3m[('Metro', 18), ('Rural', 5), ('Semi Urban', 12), ('Urban', 15)]\u001b[0m\u001b[32;1m\u001b[1;3mI now know the final answer\n", "Final Answer: There are 18 customers in the Metro branch, 5 customers in the Rural branch, 12 customers in the Semi Urban branch, and 15 customers in the Urban branch.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n", " There are 18 customers in the Metro branch, 5 customers in the Rural branch, 12 customers in the Semi Urban branch, and 15 customers in the Urban branch.\n" ] } ], "source": [ "# Test one query first\n", "result = sql_agent.invoke(\n", " {\"input\": \"How many customers are there in each branch?\"}\n", ")\n", "print(f\" {result['output']}\")" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "7Qvft0sZdMZl", "outputId": "f4319e3a-7f98-417f-e2df-35a3d407671a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Testing SQL Agent across all query types:\n", "\n", "============================================================\n", " What are the top 5 customers with highest credit card outstanding balance?\n", "============================================================\n", "\n", "\n", "\u001b[1m> Entering new SQL Agent Executor chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3mThought: I should look at the tables in the database to see what I can query. Then I should query the schema of the most relevant tables.\n", "Action: sql_db_list_tables\n", "Action Input: \u001b[0m\u001b[38;5;200m\u001b[1;3mcredit_cards, customers, transactions\u001b[0m\u001b[32;1m\u001b[1;3mWith the list of tables, I can now query the schema of the most relevant tables to find the columns I need to answer the question.\n", "\n", "Action: sql_db_schema\n", "Action Input: customers, credit_cards\u001b[0m\u001b[33;1m\u001b[1;3m\n", "CREATE TABLE credit_cards (\n", "\tcard_id TEXT, \n", "\tcustomer_id TEXT, \n", "\tcard_type TEXT, \n", "\tcredit_limit REAL, \n", "\toutstanding REAL, \n", "\tminimum_due REAL, \n", "\tdue_date TEXT, \n", "\tstatus TEXT, \n", "\tPRIMARY KEY (card_id), \n", "\tFOREIGN KEY(customer_id) REFERENCES customers (customer_id)\n", ")\n", "\n", "/*\n", "3 rows from credit_cards table:\n", "card_id\tcustomer_id\tcard_type\tcredit_limit\toutstanding\tminimum_due\tdue_date\tstatus\n", "CARD001\tCUST001\tClassic\t109316.28\t20075.65\t1003.78\t2026-03-30\tOverdue\n", "CARD002\tCUST002\tPlatinum\t52833.4\t22608.25\t1130.41\t2026-04-02\tActive\n", "CARD003\tCUST003\tPlatinum\t202797.74\t64997.16\t3249.86\t2026-03-20\tBlocked\n", "*/\n", "\n", "\n", "CREATE TABLE customers (\n", "\tcustomer_id TEXT, \n", "\tname TEXT, \n", "\taccount_type TEXT, \n", "\tbalance REAL, \n", "\tcredit_score INTEGER, \n", "\tbranch TEXT, \n", "\temail TEXT, \n", "\tphone TEXT, \n", "\tPRIMARY KEY (customer_id)\n", ")\n", "\n", "/*\n", "3 rows from customers table:\n", "customer_id\tname\taccount_type\tbalance\tcredit_score\tbranch\temail\tphone\n", "CUST001\tAjay Patel\tSavings\t31127.79\t677\tSemi Urban\tajay1@email.com\t+91-9876500001\n", "CUST002\tKavya Desai\tSavings\t75363.44\t642\tSemi Urban\tkavya2@email.com\t+91-9876500002\n", "CUST003\tVasantha Shah\tSavings\t64973.72\t739\tMetro\tvasantha3@email.com\t+91-9876500003\n", "*/\u001b[0m\u001b[32;1m\u001b[1;3mWith the schema of the customers and credit_cards tables, I can see that the outstanding balance is stored in the credit_cards table. To find the top 5 customers with the highest credit card outstanding balance, I need to join the customers and credit_cards tables on the customer_id column, and then order the results by the outstanding balance in descending order.\n", "\n", "Action: sql_db_query_checker\n", "Action Input: SELECT customers.name, credit_cards.outstanding FROM customers JOIN credit_cards ON customers.customer_id = credit_cards.customer_id ORDER BY credit_cards.outstanding DESC LIMIT 5\u001b[0m\u001b[36;1m\u001b[1;3m```sql\n", "SELECT customers.name, credit_cards.outstanding \n", "FROM customers \n", "JOIN credit_cards \n", "ON customers.customer_id = credit_cards.customer_id \n", "ORDER BY credit_cards.outstanding DESC \n", "LIMIT 5;\n", "```\u001b[0m\u001b[32;1m\u001b[1;3mThe query looks correct, so I'll execute it.\n", "\n", "Action: sql_db_query\n", "Action Input: SELECT customers.name, credit_cards.outstanding FROM customers JOIN credit_cards ON customers.customer_id = credit_cards.customer_id ORDER BY credit_cards.outstanding DESC LIMIT 5\u001b[0m\u001b[36;1m\u001b[1;3m[('Suresh Iyer', 327729.61), ('Prakash Kaur', 313697.45), ('Swati Joshi', 305654.3), ('Geeta Sharma', 148875.25), ('Meera Shah', 140248.04)]\u001b[0m\u001b[32;1m\u001b[1;3mI now know the final answer\n", "\n", "Final Answer: The top 5 customers with the highest credit card outstanding balance are Suresh Iyer with $327729.61, Prakash Kaur with $313697.45, Swati Joshi with $305654.3, Geeta Sharma with $148875.25, and Meera Shah with $140248.04.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n", " The top 5 customers with the highest credit card outstanding balance are Suresh Iyer with $327729.61, Prakash Kaur with $313697.45, Swati Joshi with $305654.3, Geeta Sharma with $148875.25, and Meera Shah with $140248.04.\n", "\n", "============================================================\n", " How many transactions have Failed status?\n", "============================================================\n", "\n", "\n", "\u001b[1m> Entering new SQL Agent Executor chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3mThought: I should look at the tables in the database to see what I can query. Then I should query the schema of the most relevant tables.\n", "Action: sql_db_list_tables\n", "Action Input: \u001b[0m\u001b[38;5;200m\u001b[1;3mcredit_cards, customers, transactions\u001b[0m\u001b[32;1m\u001b[1;3mNow that I have the list of tables, I can query the schema of the most relevant tables to find out which table contains the transaction status.\n", "\n", "Action: sql_db_schema\n", "Action Input: transactions\u001b[0m\u001b[33;1m\u001b[1;3m\n", "CREATE TABLE transactions (\n", "\ttxn_id TEXT, \n", "\tcustomer_id TEXT, \n", "\tdate TEXT, \n", "\tamount REAL, \n", "\ttype TEXT, \n", "\tmerchant TEXT, \n", "\tstatus TEXT, \n", "\tPRIMARY KEY (txn_id), \n", "\tFOREIGN KEY(customer_id) REFERENCES customers (customer_id)\n", ")\n", "\n", "/*\n", "3 rows from transactions table:\n", "txn_id\tcustomer_id\tdate\tamount\ttype\tmerchant\tstatus\n", "TXN001000\tCUST001\t2026-01-07\t42650.72\tDebit\tNetflix\tSuccess\n", "TXN001001\tCUST001\t2026-01-03\t9372.67\tDebit\tApple Store\tSuccess\n", "TXN001002\tCUST001\t2025-12-13\t20248.4\tCredit\tDecathlon\tSuccess\n", "*/\u001b[0m\u001b[32;1m\u001b[1;3mNow that I have the schema of the transactions table, I can see that the status column contains the transaction status. I can query this table to find the number of transactions with a Failed status.\n", "\n", "Action: sql_db_query_checker\n", "Action Input: SELECT COUNT(txn_id) FROM transactions WHERE status = 'Failed'\u001b[0m\u001b[36;1m\u001b[1;3m```sql\n", "SELECT COUNT(txn_id) FROM transactions WHERE status = 'Failed'\n", "```\u001b[0m\u001b[32;1m\u001b[1;3mThe query looks correct, so I can proceed to execute it.\n", "\n", "Action: sql_db_query\n", "Action Input: SELECT COUNT(txn_id) FROM transactions WHERE status = 'Failed'\u001b[0m\u001b[36;1m\u001b[1;3m[(89,)]\u001b[0m\u001b[32;1m\u001b[1;3mI now know the final answer\n", "Final Answer: 89\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n", " 89\n", "\n", "============================================================\n", " What is the average balance of customers by account type?\n", "============================================================\n", "\n", "\n", "\u001b[1m> Entering new SQL Agent Executor chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3mThought: I should look at the tables in the database to see what I can query. Then I should query the schema of the most relevant tables.\n", "Action: sql_db_list_tables\n", "Action Input: \u001b[0m\u001b[38;5;200m\u001b[1;3mcredit_cards, customers, transactions\u001b[0m\u001b[32;1m\u001b[1;3mWith the list of tables, I can now query the schema of the most relevant tables to find the columns I need.\n", "\n", "Action: sql_db_schema\n", "Action Input: customers, credit_cards\u001b[0m\u001b[33;1m\u001b[1;3m\n", "CREATE TABLE credit_cards (\n", "\tcard_id TEXT, \n", "\tcustomer_id TEXT, \n", "\tcard_type TEXT, \n", "\tcredit_limit REAL, \n", "\toutstanding REAL, \n", "\tminimum_due REAL, \n", "\tdue_date TEXT, \n", "\tstatus TEXT, \n", "\tPRIMARY KEY (card_id), \n", "\tFOREIGN KEY(customer_id) REFERENCES customers (customer_id)\n", ")\n", "\n", "/*\n", "3 rows from credit_cards table:\n", "card_id\tcustomer_id\tcard_type\tcredit_limit\toutstanding\tminimum_due\tdue_date\tstatus\n", "CARD001\tCUST001\tClassic\t109316.28\t20075.65\t1003.78\t2026-03-30\tOverdue\n", "CARD002\tCUST002\tPlatinum\t52833.4\t22608.25\t1130.41\t2026-04-02\tActive\n", "CARD003\tCUST003\tPlatinum\t202797.74\t64997.16\t3249.86\t2026-03-20\tBlocked\n", "*/\n", "\n", "\n", "CREATE TABLE customers (\n", "\tcustomer_id TEXT, \n", "\tname TEXT, \n", "\taccount_type TEXT, \n", "\tbalance REAL, \n", "\tcredit_score INTEGER, \n", "\tbranch TEXT, \n", "\temail TEXT, \n", "\tphone TEXT, \n", "\tPRIMARY KEY (customer_id)\n", ")\n", "\n", "/*\n", "3 rows from customers table:\n", "customer_id\tname\taccount_type\tbalance\tcredit_score\tbranch\temail\tphone\n", "CUST001\tAjay Patel\tSavings\t31127.79\t677\tSemi Urban\tajay1@email.com\t+91-9876500001\n", "CUST002\tKavya Desai\tSavings\t75363.44\t642\tSemi Urban\tkavya2@email.com\t+91-9876500002\n", "CUST003\tVasantha Shah\tSavings\t64973.72\t739\tMetro\tvasantha3@email.com\t+91-9876500003\n", "*/\u001b[0m\u001b[32;1m\u001b[1;3mWith the schema of the customers and credit_cards tables, I can see that the customers table has a column named \"balance\" and a column named \"account_type\". I can use these columns to calculate the average balance of customers by account type.\n", "\n", "Action: sql_db_query_checker\n", "Action Input: SELECT account_type, AVG(balance) FROM customers GROUP BY account_type\u001b[0m\u001b[36;1m\u001b[1;3m```sql\n", "SELECT \"account_type\", AVG(\"balance\") \n", "FROM \"customers\" \n", "GROUP BY \"account_type\"\n", "```\u001b[0m\u001b[32;1m\u001b[1;3mThe query looks good, so I'll execute it.\n", "\n", "Action: sql_db_query\n", "Action Input: SELECT account_type, AVG(balance) FROM customers GROUP BY account_type\u001b[0m\u001b[36;1m\u001b[1;3m[('Current', 198743.39083333334), ('Savings', 143348.41868421054)]\u001b[0m\u001b[32;1m\u001b[1;3mI now know the final answer\n", "Final Answer: The average balance of customers by account type is 198743.39 for Current accounts and 143348.42 for Savings accounts.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n", " The average balance of customers by account type is 198743.39 for Current accounts and 143348.42 for Savings accounts.\n", "\n", "============================================================\n", " Which merchant has the highest number of transactions?\n", "============================================================\n", "\n", "\n", "\u001b[1m> Entering new SQL Agent Executor chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3mThought: I should look at the tables in the database to see what I can query. Then I should query the schema of the most relevant tables.\n", "Action: sql_db_list_tables\n", "Action Input: \u001b[0m\u001b[38;5;200m\u001b[1;3mcredit_cards, customers, transactions\u001b[0m\u001b[32;1m\u001b[1;3mWith the list of tables, I can now query the schema of the most relevant tables to find out which columns are available.\n", "\n", "Action: sql_db_schema\n", "Action Input: transactions, customers, credit_cards\u001b[0m\u001b[33;1m\u001b[1;3m\n", "CREATE TABLE credit_cards (\n", "\tcard_id TEXT, \n", "\tcustomer_id TEXT, \n", "\tcard_type TEXT, \n", "\tcredit_limit REAL, \n", "\toutstanding REAL, \n", "\tminimum_due REAL, \n", "\tdue_date TEXT, \n", "\tstatus TEXT, \n", "\tPRIMARY KEY (card_id), \n", "\tFOREIGN KEY(customer_id) REFERENCES customers (customer_id)\n", ")\n", "\n", "/*\n", "3 rows from credit_cards table:\n", "card_id\tcustomer_id\tcard_type\tcredit_limit\toutstanding\tminimum_due\tdue_date\tstatus\n", "CARD001\tCUST001\tClassic\t109316.28\t20075.65\t1003.78\t2026-03-30\tOverdue\n", "CARD002\tCUST002\tPlatinum\t52833.4\t22608.25\t1130.41\t2026-04-02\tActive\n", "CARD003\tCUST003\tPlatinum\t202797.74\t64997.16\t3249.86\t2026-03-20\tBlocked\n", "*/\n", "\n", "\n", "CREATE TABLE customers (\n", "\tcustomer_id TEXT, \n", "\tname TEXT, \n", "\taccount_type TEXT, \n", "\tbalance REAL, \n", "\tcredit_score INTEGER, \n", "\tbranch TEXT, \n", "\temail TEXT, \n", "\tphone TEXT, \n", "\tPRIMARY KEY (customer_id)\n", ")\n", "\n", "/*\n", "3 rows from customers table:\n", "customer_id\tname\taccount_type\tbalance\tcredit_score\tbranch\temail\tphone\n", "CUST001\tAjay Patel\tSavings\t31127.79\t677\tSemi Urban\tajay1@email.com\t+91-9876500001\n", "CUST002\tKavya Desai\tSavings\t75363.44\t642\tSemi Urban\tkavya2@email.com\t+91-9876500002\n", "CUST003\tVasantha Shah\tSavings\t64973.72\t739\tMetro\tvasantha3@email.com\t+91-9876500003\n", "*/\n", "\n", "\n", "CREATE TABLE transactions (\n", "\ttxn_id TEXT, \n", "\tcustomer_id TEXT, \n", "\tdate TEXT, \n", "\tamount REAL, \n", "\ttype TEXT, \n", "\tmerchant TEXT, \n", "\tstatus TEXT, \n", "\tPRIMARY KEY (txn_id), \n", "\tFOREIGN KEY(customer_id) REFERENCES customers (customer_id)\n", ")\n", "\n", "/*\n", "3 rows from transactions table:\n", "txn_id\tcustomer_id\tdate\tamount\ttype\tmerchant\tstatus\n", "TXN001000\tCUST001\t2026-01-07\t42650.72\tDebit\tNetflix\tSuccess\n", "TXN001001\tCUST001\t2026-01-03\t9372.67\tDebit\tApple Store\tSuccess\n", "TXN001002\tCUST001\t2025-12-13\t20248.4\tCredit\tDecathlon\tSuccess\n", "*/\u001b[0m\u001b[32;1m\u001b[1;3mWith the schema of the relevant tables, I can now identify the columns that are necessary to answer the question. The question asks for the merchant with the highest number of transactions. This means I need to query the transactions table and group the results by the merchant column, then count the number of transactions for each merchant.\n", "\n", "Action: sql_db_query_checker\n", "Action Input: SELECT merchant, COUNT(txn_id) AS num_transactions FROM transactions GROUP BY merchant ORDER BY num_transactions DESC LIMIT 1\u001b[0m\u001b[36;1m\u001b[1;3m```sql\n", "SELECT merchant, COUNT(txn_id) AS num_transactions \n", "FROM transactions \n", "GROUP BY merchant \n", "ORDER BY num_transactions DESC \n", "LIMIT 1\n", "```\u001b[0m\u001b[32;1m\u001b[1;3mThe query looks correct. It selects the merchant and counts the number of transactions for each merchant, groups the results by merchant, orders the results by the count of transactions in descending order, and limits the result to the top 1 merchant with the most transactions.\n", "\n", "Action: sql_db_query\n", "Action Input: SELECT merchant, COUNT(txn_id) AS num_transactions FROM transactions GROUP BY merchant ORDER BY num_transactions DESC LIMIT 1\u001b[0m\u001b[36;1m\u001b[1;3m[('Myntra', 37)]\u001b[0m\u001b[32;1m\u001b[1;3mI now know the final answer\n", "Final Answer: Myntra\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n", " Myntra\n", "\n", "============================================================\n", " Which customers have a blocked or overdue credit card?\n", "============================================================\n", "\n", "\n", "\u001b[1m> Entering new SQL Agent Executor chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3mThought: I should look at the tables in the database to see what I can query. Then I should query the schema of the most relevant tables.\n", "Action: sql_db_list_tables\n", "Action Input: \u001b[0m\u001b[38;5;200m\u001b[1;3mcredit_cards, customers, transactions\u001b[0m\u001b[32;1m\u001b[1;3mWith the list of tables, I can now query the schema of the most relevant tables to find out the columns I can use to answer the question.\n", "\n", "Action: sql_db_schema\n", "Action Input: customers, credit_cards\u001b[0m\u001b[33;1m\u001b[1;3m\n", "CREATE TABLE credit_cards (\n", "\tcard_id TEXT, \n", "\tcustomer_id TEXT, \n", "\tcard_type TEXT, \n", "\tcredit_limit REAL, \n", "\toutstanding REAL, \n", "\tminimum_due REAL, \n", "\tdue_date TEXT, \n", "\tstatus TEXT, \n", "\tPRIMARY KEY (card_id), \n", "\tFOREIGN KEY(customer_id) REFERENCES customers (customer_id)\n", ")\n", "\n", "/*\n", "3 rows from credit_cards table:\n", "card_id\tcustomer_id\tcard_type\tcredit_limit\toutstanding\tminimum_due\tdue_date\tstatus\n", "CARD001\tCUST001\tClassic\t109316.28\t20075.65\t1003.78\t2026-03-30\tOverdue\n", "CARD002\tCUST002\tPlatinum\t52833.4\t22608.25\t1130.41\t2026-04-02\tActive\n", "CARD003\tCUST003\tPlatinum\t202797.74\t64997.16\t3249.86\t2026-03-20\tBlocked\n", "*/\n", "\n", "\n", "CREATE TABLE customers (\n", "\tcustomer_id TEXT, \n", "\tname TEXT, \n", "\taccount_type TEXT, \n", "\tbalance REAL, \n", "\tcredit_score INTEGER, \n", "\tbranch TEXT, \n", "\temail TEXT, \n", "\tphone TEXT, \n", "\tPRIMARY KEY (customer_id)\n", ")\n", "\n", "/*\n", "3 rows from customers table:\n", "customer_id\tname\taccount_type\tbalance\tcredit_score\tbranch\temail\tphone\n", "CUST001\tAjay Patel\tSavings\t31127.79\t677\tSemi Urban\tajay1@email.com\t+91-9876500001\n", "CUST002\tKavya Desai\tSavings\t75363.44\t642\tSemi Urban\tkavya2@email.com\t+91-9876500002\n", "CUST003\tVasantha Shah\tSavings\t64973.72\t739\tMetro\tvasantha3@email.com\t+91-9876500003\n", "*/\u001b[0m\u001b[32;1m\u001b[1;3mWith the schema of the customers and credit_cards tables, I can now construct a query to find the customers who have a blocked or overdue credit card.\n", "\n", "Action: sql_db_query_checker\n", "Action Input: SELECT c.customer_id, c.name, cc.status FROM customers c JOIN credit_cards cc ON c.customer_id = cc.customer_id WHERE cc.status IN ('Blocked', 'Overdue') LIMIT 10\u001b[0m\u001b[36;1m\u001b[1;3m```sql\n", "SELECT c.customer_id, c.name, cc.status \n", "FROM customers c \n", "JOIN credit_cards cc ON c.customer_id = cc.customer_id \n", "WHERE cc.status IN ('Blocked', 'Overdue') \n", "LIMIT 10\n", "```\u001b[0m\u001b[32;1m\u001b[1;3mThe query looks correct, so I'll execute it.\n", "\n", "Action: sql_db_query\n", "Action Input: SELECT c.customer_id, c.name, cc.status FROM customers c JOIN credit_cards cc ON c.customer_id = cc.customer_id WHERE cc.status IN ('Blocked', 'Overdue') LIMIT 10\u001b[0m\u001b[36;1m\u001b[1;3m[('CUST001', 'Ajay Patel', 'Overdue'), ('CUST003', 'Vasantha Shah', 'Blocked'), ('CUST006', 'Balaji Rajan', 'Overdue'), ('CUST007', 'Suresh Iyer', 'Blocked'), ('CUST010', 'Ajay Kaur', 'Overdue'), ('CUST014', 'Deepa Kaur', 'Overdue'), ('CUST019', 'Suresh Joshi', 'Blocked'), ('CUST023', 'Vijay Rao', 'Blocked'), ('CUST026', 'Geeta Sharma', 'Blocked'), ('CUST027', 'Ananya Murthy', 'Blocked')]\u001b[0m\u001b[32;1m\u001b[1;3mI now know the final answer\n", "\n", "Final Answer: The customers who have a blocked or overdue credit card are: \n", "- Ajay Patel (CUST001) with an Overdue credit card\n", "- Vasantha Shah (CUST003) with a Blocked credit card\n", "- Balaji Rajan (CUST006) with an Overdue credit card\n", "- Suresh Iyer (CUST007) with a Blocked credit card\n", "- Ajay Kaur (CUST010) with an Overdue credit card\n", "- Deepa Kaur (CUST014) with an Overdue credit card\n", "- Suresh Joshi (CUST019) with a Blocked credit card\n", "- Vijay Rao (CUST023) with a Blocked credit card\n", "- Geeta Sharma (CUST026) with a Blocked credit card\n", "- Ananya Murthy (CUST027) with a Blocked credit card\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n", " The customers who have a blocked or overdue credit card are: \n", "- Ajay Patel (CUST001) with an Overdue credit card\n", "- Vasantha Shah (CUST003) with a Blocked credit card\n", "- Balaji Rajan (CUST006) with an Overdue credit card\n", "- Suresh Iyer (CUST007) with a Blocked credit card\n", "- Ajay Kaur (CUST010) with an Overdue credit card\n", "- Deepa Kaur (CUST014) with an Overdue credit card\n", "- Suresh Joshi (CUST019) with a Blocked credit card\n", "- Vijay Rao (CUST023) with a Blocked credit card\n", "- Geeta Sharma (CUST026) with a Blocked credit card\n", "- Ananya Murthy (CUST027) with a Blocked credit card\n", "\n" ] } ], "source": [ "test_queries = [\n", " \"What are the top 5 customers with highest credit card outstanding balance?\",\n", " \"How many transactions have Failed status?\",\n", " \"What is the average balance of customers by account type?\",\n", " \"Which merchant has the highest number of transactions?\",\n", " \"Which customers have a blocked or overdue credit card?\"\n", "]\n", "\n", "print(\" Testing SQL Agent across all query types:\\n\")\n", "for query in test_queries:\n", " print(f\"{'='*60}\")\n", " print(f\" {query}\")\n", " print(f\"{'='*60}\")\n", " result = sql_agent.invoke({\"input\": query})\n", " print(f\" {result['output']}\\n\")" ] }, { "cell_type": "markdown", "metadata": { "id": "VK39-fRTir1M" }, "source": [ "**Observations -**\n", "\n", "Both agents are now ready:\n", "\n", "- **RAG Agent** \u2014 answers policy questions from 6 HDFC PDFs\n", "- **SQL Agent** \u2014 answers data questions from 3 database tables" ] }, { "cell_type": "markdown", "metadata": { "id": "blBp9MeBi1gM" }, "source": [ "### The LangGraph Orchestrator" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "o81Hb7Fzkiti", "outputId": "b3001c7a-4626-467e-dad2-6888bb62eccc" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " Agent nodes defined!\n" ] } ], "source": [ "from langgraph.graph import StateGraph, END\n", "from typing import TypedDict\n", "\n", "# \u2500\u2500 DEFINE STATE \u2500\u2500\n", "# This is the shared memory that flows through the entire graph\n", "class AgentState(TypedDict):\n", " query: str # Original user question\n", " agent_used: str # Which agent handled it\n", " response: str # Final answer\n", "\n", "# \u2500\u2500 NODE 1: ROUTER \u2500\u2500\n", "# Decides which agent should handle the query\n", "def router(state: AgentState) -> AgentState:\n", " query = state[\"query\"].lower()\n", "\n", " # SQL keywords \u2192 personal/transactional data questions\n", " sql_keywords = [\n", " \"transaction\", \"balance\", \"how many\", \"outstanding\",\n", " \"credit card\", \"merchant\", \"customer\", \"branch\",\n", " \"average\", \"total\", \"count\", \"highest\", \"lowest\",\n", " \"failed\", \"blocked\", \"overdue\", \"statement\"\n", " ]\n", "\n", " # RAG keywords \u2192 policy/rule questions\n", " rag_keywords = [\n", " \"policy\", \"rule\", \"guideline\", \"what is\", \"how does\",\n", " \"eligibility\", \"penalty\", \"interest rate\", \"fee\",\n", " \"grievance\", \"kyc\", \"document\", \"complaint\", \"process\",\n", " \"minimum balance\", \"loan\", \"terms\", \"conditions\"\n", " ]\n", "\n", " sql_score = sum(1 for kw in sql_keywords if kw in query)\n", " rag_score = sum(1 for kw in rag_keywords if kw in query)\n", "\n", " if sql_score > rag_score:\n", " state[\"agent_used\"] = \"sql\"\n", " else:\n", " state[\"agent_used\"] = \"rag\"\n", "\n", " print(f\" Router Decision: {state['agent_used'].upper()} Agent \"\n", " f\"(SQL score: {sql_score}, RAG score: {rag_score})\")\n", " return state\n", "\n", "# \u2500\u2500 NODE 2: RAG AGENT NODE \u2500\u2500\n", "def run_rag_agent(state: AgentState) -> AgentState:\n", " print(f\" RAG Agent processing...\")\n", " response = rag_chain.invoke(state[\"query\"])\n", " state[\"response\"] = response\n", " return state\n", "\n", "# \u2500\u2500 NODE 3: SQL AGENT NODE \u2500\u2500\n", "def run_sql_agent(state: AgentState) -> AgentState:\n", " print(f\" SQL Agent processing...\")\n", " result = sql_agent.invoke({\"input\": state[\"query\"]})\n", " state[\"response\"] = result[\"output\"]\n", " return state\n", "\n", "# \u2500\u2500 CONDITIONAL EDGE: Which agent to call? \u2500\u2500\n", "def route_to_agent(state: AgentState) -> str:\n", " return state[\"agent_used\"] # Returns \"sql\" or \"rag\"\n", "\n", "print(\" Agent nodes defined!\")" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ygeXidIgl4Su", "outputId": "eeeba29b-d4dd-4cba-acbd-9a6633c661cd" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " LangGraph Orchestrator compiled!\n", "\n", "Graph Structure:\n", " START\n", " \u2193\n", " [ROUTER] \u2500\u2500 decides \u2500\u2500\u2192 [RAG AGENT] \u2192 END\n", " \u2514\u2500\u2500\u2192 [SQL AGENT] \u2192 END\n", "\n" ] } ], "source": [ "# \u2500\u2500 BUILD THE GRAPH \u2500\u2500\n", "workflow = StateGraph(AgentState)\n", "\n", "# Add nodes\n", "workflow.add_node(\"router\", router)\n", "workflow.add_node(\"rag\", run_rag_agent)\n", "workflow.add_node(\"sql\", run_sql_agent)\n", "\n", "# Set entry point\n", "workflow.set_entry_point(\"router\")\n", "\n", "# Add conditional edges from router\n", "workflow.add_conditional_edges(\n", " \"router\",\n", " route_to_agent,\n", " {\n", " \"rag\": \"rag\",\n", " \"sql\": \"sql\"\n", " }\n", ")\n", "\n", "# Both agents lead to END\n", "workflow.add_edge(\"rag\", END)\n", "workflow.add_edge(\"sql\", END)\n", "\n", "# Compile the graph\n", "app = workflow.compile()\n", "\n", "print(\" LangGraph Orchestrator compiled!\")\n", "print(\"\"\"\n", "Graph Structure:\n", " START\n", " \u2193\n", " [ROUTER] \u2500\u2500 decides \u2500\u2500\u2192 [RAG AGENT] \u2192 END\n", " \u2514\u2500\u2500\u2192 [SQL AGENT] \u2192 END\n", "\"\"\")" ] }, { "cell_type": "markdown", "metadata": { "id": "nV5OU3RxowMC" }, "source": [ "### Full end-to-end test \u2014 queries should route to different agents" ] }, { "cell_type": "code", "execution_count": 47, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "pA_9TtORos4n", "outputId": "a79a5870-e80e-4a14-9a48-fb406ba0e124" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "==========================================================================================\n", " What is the penalty for missed loan EMI payment?\n", "==========================================================================================\n", " Router Decision: RAG Agent (SQL score: 0, RAG score: 3)\n", " RAG Agent processing...\n", " I don't have enough information in the policy documents to answer this. Please contact HDFC Bank directly.\n", "\n", "==========================================================================================\n", " Which customers have overdue credit cards?\n", "==========================================================================================\n", " Router Decision: SQL Agent (SQL score: 3, RAG score: 0)\n", " SQL Agent processing...\n", "\n", "\n", "\u001b[1m> Entering new SQL Agent Executor chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3mThought: I should look at the tables in the database to see what I can query. Then I should query the schema of the most relevant tables.\n", "Action: sql_db_list_tables\n", "Action Input: \u001b[0m\u001b[38;5;200m\u001b[1;3mcredit_cards, customers, transactions\u001b[0m\u001b[32;1m\u001b[1;3mWith the list of tables, I can now query the schema of the most relevant tables to find out the column names and data types.\n", "\n", "Action: sql_db_schema\n", "Action Input: credit_cards, customers\u001b[0m\u001b[33;1m\u001b[1;3m\n", "CREATE TABLE credit_cards (\n", "\tcard_id TEXT, \n", "\tcustomer_id TEXT, \n", "\tcard_type TEXT, \n", "\tcredit_limit REAL, \n", "\toutstanding REAL, \n", "\tminimum_due REAL, \n", "\tdue_date TEXT, \n", "\tstatus TEXT, \n", "\tPRIMARY KEY (card_id), \n", "\tFOREIGN KEY(customer_id) REFERENCES customers (customer_id)\n", ")\n", "\n", "/*\n", "3 rows from credit_cards table:\n", "card_id\tcustomer_id\tcard_type\tcredit_limit\toutstanding\tminimum_due\tdue_date\tstatus\n", "CARD001\tCUST001\tClassic\t109316.28\t20075.65\t1003.78\t2026-03-30\tOverdue\n", "CARD002\tCUST002\tPlatinum\t52833.4\t22608.25\t1130.41\t2026-04-02\tActive\n", "CARD003\tCUST003\tPlatinum\t202797.74\t64997.16\t3249.86\t2026-03-20\tBlocked\n", "*/\n", "\n", "\n", "CREATE TABLE customers (\n", "\tcustomer_id TEXT, \n", "\tname TEXT, \n", "\taccount_type TEXT, \n", "\tbalance REAL, \n", "\tcredit_score INTEGER, \n", "\tbranch TEXT, \n", "\temail TEXT, \n", "\tphone TEXT, \n", "\tPRIMARY KEY (customer_id)\n", ")\n", "\n", "/*\n", "3 rows from customers table:\n", "customer_id\tname\taccount_type\tbalance\tcredit_score\tbranch\temail\tphone\n", "CUST001\tAjay Patel\tSavings\t31127.79\t677\tSemi Urban\tajay1@email.com\t+91-9876500001\n", "CUST002\tKavya Desai\tSavings\t75363.44\t642\tSemi Urban\tkavya2@email.com\t+91-9876500002\n", "CUST003\tVasantha Shah\tSavings\t64973.72\t739\tMetro\tvasantha3@email.com\t+91-9876500003\n", "*/\u001b[0m\u001b[32;1m\u001b[1;3mWith the schema of the credit_cards and customers tables, I can now construct a query to find the customers with overdue credit cards.\n", "\n", "Action: sql_db_query_checker\n", "Action Input: SELECT c.name, cc.card_id, cc.due_date FROM customers c JOIN credit_cards cc ON c.customer_id = cc.customer_id WHERE cc.status = 'Overdue' LIMIT 10\u001b[0m\u001b[36;1m\u001b[1;3m```sql\n", "SELECT c.name, cc.card_id, cc.due_date \n", "FROM customers c \n", "JOIN credit_cards cc \n", "ON c.customer_id = cc.customer_id \n", "WHERE cc.status = 'Overdue' \n", "LIMIT 10\n", "```\u001b[0m\u001b[32;1m\u001b[1;3mThe query looks correct, so I'll execute it.\n", "\n", "Action: sql_db_query\n", "Action Input: SELECT c.name, cc.card_id, cc.due_date FROM customers c JOIN credit_cards cc ON c.customer_id = cc.customer_id WHERE cc.status = 'Overdue' LIMIT 10\u001b[0m\u001b[36;1m\u001b[1;3m[('Ajay Patel', 'CARD001', '2026-03-30'), ('Balaji Rajan', 'CARD006', '2026-03-19'), ('Ajay Kaur', 'CARD010', '2026-04-01'), ('Deepa Kaur', 'CARD014', '2026-03-30'), ('Vani Mishra', 'CARD050', '2026-03-18')]\u001b[0m\u001b[32;1m\u001b[1;3mI now know the final answer\n", "\n", "Final Answer: The customers with overdue credit cards are Ajay Patel, Balaji Rajan, Ajay Kaur, Deepa Kaur, and Vani Mishra, with card IDs CARD001, CARD006, CARD010, CARD014, and CARD050, and due dates 2026-03-30, 2026-03-19, 2026-04-01, 2026-03-30, and 2026-03-18 respectively.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n", " The customers with overdue credit cards are Ajay Patel, Balaji Rajan, Ajay Kaur, Deepa Kaur, and Vani Mishra, with card IDs CARD001, CARD006, CARD010, CARD014, and CARD050, and due dates 2026-03-30, 2026-03-19, 2026-04-01, 2026-03-30, and 2026-03-18 respectively.\n", "\n", "==========================================================================================\n", " What documents are needed for KYC verification?\n", "==========================================================================================\n", " Router Decision: RAG Agent (SQL score: 0, RAG score: 2)\n", " RAG Agent processing...\n", " According to the policy documents, the following documents are mentioned for KYC verification:\n", "\n", "- Customer Identification Number\n", "- Telephone Identification Number (TIN)\n", "- Manual verification checks as decided by the Bank from time to time.\n", "\n", "Please note that the Bank may decide on additional documents or methods for KYC verification from time to time.\n", "\n", "==========================================================================================\n", " Which merchant has the highest number of transactions?\n", "==========================================================================================\n", " Router Decision: SQL Agent (SQL score: 3, RAG score: 0)\n", " SQL Agent processing...\n", "\n", "\n", "\u001b[1m> Entering new SQL Agent Executor chain...\u001b[0m\n", "\u001b[32;1m\u001b[1;3mThought: I should look at the tables in the database to see what I can query. Then I should query the schema of the most relevant tables.\n", "Action: sql_db_list_tables\n", "Action Input: \u001b[0m\u001b[38;5;200m\u001b[1;3mcredit_cards, customers, transactions\u001b[0m\u001b[32;1m\u001b[1;3mWith the list of tables, I can now query the schema of the most relevant tables to find out which columns are available.\n", "\n", "Action: sql_db_schema\n", "Action Input: transactions, customers, credit_cards\u001b[0m\u001b[33;1m\u001b[1;3m\n", "CREATE TABLE credit_cards (\n", "\tcard_id TEXT, \n", "\tcustomer_id TEXT, \n", "\tcard_type TEXT, \n", "\tcredit_limit REAL, \n", "\toutstanding REAL, \n", "\tminimum_due REAL, \n", "\tdue_date TEXT, \n", "\tstatus TEXT, \n", "\tPRIMARY KEY (card_id), \n", "\tFOREIGN KEY(customer_id) REFERENCES customers (customer_id)\n", ")\n", "\n", "/*\n", "3 rows from credit_cards table:\n", "card_id\tcustomer_id\tcard_type\tcredit_limit\toutstanding\tminimum_due\tdue_date\tstatus\n", "CARD001\tCUST001\tClassic\t109316.28\t20075.65\t1003.78\t2026-03-30\tOverdue\n", "CARD002\tCUST002\tPlatinum\t52833.4\t22608.25\t1130.41\t2026-04-02\tActive\n", "CARD003\tCUST003\tPlatinum\t202797.74\t64997.16\t3249.86\t2026-03-20\tBlocked\n", "*/\n", "\n", "\n", "CREATE TABLE customers (\n", "\tcustomer_id TEXT, \n", "\tname TEXT, \n", "\taccount_type TEXT, \n", "\tbalance REAL, \n", "\tcredit_score INTEGER, \n", "\tbranch TEXT, \n", "\temail TEXT, \n", "\tphone TEXT, \n", "\tPRIMARY KEY (customer_id)\n", ")\n", "\n", "/*\n", "3 rows from customers table:\n", "customer_id\tname\taccount_type\tbalance\tcredit_score\tbranch\temail\tphone\n", "CUST001\tAjay Patel\tSavings\t31127.79\t677\tSemi Urban\tajay1@email.com\t+91-9876500001\n", "CUST002\tKavya Desai\tSavings\t75363.44\t642\tSemi Urban\tkavya2@email.com\t+91-9876500002\n", "CUST003\tVasantha Shah\tSavings\t64973.72\t739\tMetro\tvasantha3@email.com\t+91-9876500003\n", "*/\n", "\n", "\n", "CREATE TABLE transactions (\n", "\ttxn_id TEXT, \n", "\tcustomer_id TEXT, \n", "\tdate TEXT, \n", "\tamount REAL, \n", "\ttype TEXT, \n", "\tmerchant TEXT, \n", "\tstatus TEXT, \n", "\tPRIMARY KEY (txn_id), \n", "\tFOREIGN KEY(customer_id) REFERENCES customers (customer_id)\n", ")\n", "\n", "/*\n", "3 rows from transactions table:\n", "txn_id\tcustomer_id\tdate\tamount\ttype\tmerchant\tstatus\n", "TXN001000\tCUST001\t2026-01-07\t42650.72\tDebit\tNetflix\tSuccess\n", "TXN001001\tCUST001\t2026-01-03\t9372.67\tDebit\tApple Store\tSuccess\n", "TXN001002\tCUST001\t2025-12-13\t20248.4\tCredit\tDecathlon\tSuccess\n", "*/\u001b[0m\u001b[32;1m\u001b[1;3mWith the schema of the tables, I can see that the transactions table has a column for merchant. To find the merchant with the highest number of transactions, I need to count the number of transactions for each merchant and order the results in descending order.\n", "\n", "Action: sql_db_query_checker\n", "Action Input: SELECT merchant, COUNT(txn_id) as num_transactions FROM transactions GROUP BY merchant ORDER BY num_transactions DESC LIMIT 10\u001b[0m\u001b[36;1m\u001b[1;3m```sql\n", "SELECT \"merchant\", COUNT(\"txn_id\") AS \"num_transactions\" \n", "FROM \"transactions\" \n", "GROUP BY \"merchant\" \n", "ORDER BY \"num_transactions\" DESC \n", "LIMIT 10;\n", "```\u001b[0m\u001b[32;1m\u001b[1;3mThe query looks good, now I can execute it to get the results.\n", "\n", "Action: sql_db_query\n", "Action Input: SELECT \"merchant\", COUNT(\"txn_id\") AS \"num_transactions\" FROM \"transactions\" GROUP BY \"merchant\" ORDER BY \"num_transactions\" DESC LIMIT 10\u001b[0m\u001b[36;1m\u001b[1;3m[('Myntra', 37), ('PhonePe', 35), ('Flipkart', 35), ('Apple Store', 32), ('IRCTC', 31), ('HDFC ATM', 28), ('Decathlon', 28), ('Amazon', 27), ('Nykaa', 26), ('Paytm', 25)]\u001b[0m\u001b[32;1m\u001b[1;3mI now know the final answer\n", "Final Answer: The merchant with the highest number of transactions is Myntra, with 37 transactions.\u001b[0m\n", "\n", "\u001b[1m> Finished chain.\u001b[0m\n", " The merchant with the highest number of transactions is Myntra, with 37 transactions.\n", "\n", "==========================================================================================\n", " How can I raise a grievance against HDFC Bank?\n", "==========================================================================================\n", " Router Decision: RAG Agent (SQL score: 0, RAG score: 1)\n", " RAG Agent processing...\n", " You can raise a grievance against HDFC Bank by contacting the Grievance Redressal Officer, Mrs. Deepa Balakrishnan, at +91 22 61606160. Alternatively, you can also contact the Call Centres at 1800 1600 / 1800 2600 (if you are in India) or +91 22 61606160 (if you are not in India). \n", "\n", "Additionally, you can write to the Manager, HDFC Bank Cards, at the following address: \n", "HDFC Bank Cards Division, PO Box No 8654, Door No 94 SP, Estate Bus Stand, Wavin Main Road, Mogappair West, Chennai 600058. \n", "\n", "You can also email your grievance to customerservices.cards@hdfcbank.com.\n" ] } ], "source": [ "# Full end-to-end test \u2014 queries should route to different agents\n", "test_queries = [\n", " \"What is the penalty for missed loan EMI payment?\", # \u2192 RAG\n", " \"Which customers have overdue credit cards?\", # \u2192 SQL\n", " \"What documents are needed for KYC verification?\", # \u2192 RAG\n", " \"Which merchant has the highest number of transactions?\", # \u2192 SQL\n", " \"How can I raise a grievance against HDFC Bank?\", # \u2192 RAG\n", "]\n", "\n", "def ask(query):\n", " print(f\"\\n{'='*90}\")\n", " print(f\" {query}\")\n", " print(f\"{'='*90}\")\n", " result = app.invoke({\n", " \"query\": query,\n", " \"agent_used\": \"\",\n", " \"response\": \"\"\n", " })\n", " print(f\" {result['response']}\")\n", "\n", "for query in test_queries:\n", " ask(query)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "G1mszo7_p4l9" }, "outputs": [], "source": [] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "1d992192b56d4196b2eb98e940e455c4": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "2d07c4273cad4fdc8d9ca704d8cb0680": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "ProgressStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "ProgressStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "bar_color": null, "description_width": "" } }, "4878abad46944afeb96e9280c850601e": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "4d1ef31ab0fc4fc9b87633db156bba39": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_92a6e1559ba84233b1c82352ceba6216", "placeholder": "\u200b", "style": "IPY_MODEL_fc402a5cef4d4e4a8089217bf942942d", "value": "\u2007103/103\u2007[00:00<00:00,\u2007442.42it/s,\u2007Materializing\u2007param=pooler.dense.weight]" } }, "4fc4a731772a43bc888bb92e87a042a0": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HTMLModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HTMLModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HTMLView", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_eeeec036ab0445bb81abb27d532817e1", "placeholder": "\u200b", "style": "IPY_MODEL_706f9875c6b74d5380d8336f7c435cba", "value": "Loading\u2007weights:\u2007100%" } }, "6b424f3980624d858ded9569fe57936f": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "FloatProgressModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "FloatProgressModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "ProgressView", "bar_style": "success", "description": "", "description_tooltip": null, "layout": "IPY_MODEL_1d992192b56d4196b2eb98e940e455c4", "max": 103, "min": 0, "orientation": "horizontal", "style": "IPY_MODEL_2d07c4273cad4fdc8d9ca704d8cb0680", "value": 103 } }, "706f9875c6b74d5380d8336f7c435cba": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } }, "92a6e1559ba84233b1c82352ceba6216": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "e6e6ae14093942a783a0251a9e641419": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "HBoxModel", "state": { "_dom_classes": [], "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "HBoxModel", "_view_count": null, "_view_module": "@jupyter-widgets/controls", "_view_module_version": "1.5.0", "_view_name": "HBoxView", "box_style": "", "children": [ "IPY_MODEL_4fc4a731772a43bc888bb92e87a042a0", "IPY_MODEL_6b424f3980624d858ded9569fe57936f", "IPY_MODEL_4d1ef31ab0fc4fc9b87633db156bba39" ], "layout": "IPY_MODEL_4878abad46944afeb96e9280c850601e" } }, "eeeec036ab0445bb81abb27d532817e1": { "model_module": "@jupyter-widgets/base", "model_module_version": "1.2.0", "model_name": "LayoutModel", "state": { "_model_module": "@jupyter-widgets/base", "_model_module_version": "1.2.0", "_model_name": "LayoutModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "LayoutView", "align_content": null, "align_items": null, "align_self": null, "border": null, "bottom": null, "display": null, "flex": null, "flex_flow": null, "grid_area": null, "grid_auto_columns": null, "grid_auto_flow": null, "grid_auto_rows": null, "grid_column": null, "grid_gap": null, "grid_row": null, "grid_template_areas": null, "grid_template_columns": null, "grid_template_rows": null, "height": null, "justify_content": null, "justify_items": null, "left": null, "margin": null, "max_height": null, "max_width": null, "min_height": null, "min_width": null, "object_fit": null, "object_position": null, "order": null, "overflow": null, "overflow_x": null, "overflow_y": null, "padding": null, "right": null, "top": null, "visibility": null, "width": null } }, "fc402a5cef4d4e4a8089217bf942942d": { "model_module": "@jupyter-widgets/controls", "model_module_version": "1.5.0", "model_name": "DescriptionStyleModel", "state": { "_model_module": "@jupyter-widgets/controls", "_model_module_version": "1.5.0", "_model_name": "DescriptionStyleModel", "_view_count": null, "_view_module": "@jupyter-widgets/base", "_view_module_version": "1.2.0", "_view_name": "StyleView", "description_width": "" } } } } }, "nbformat": 4, "nbformat_minor": 0 }