Spaces: sariskiat/donecase

Brandon Hancock committed on May 9, 2025

Commit

55e16d5

0 Parent(s):

ready for YouTube

Files changed (15)

.gitignore +3 -0
README.md +127 -0
rag-agent/.env copy +2 -0
rag-agent/__init__.py +1 -0
rag-agent/agent.py +106 -0
rag-agent/config.py +17 -0
rag-agent/tools/__init__.py +27 -0
rag-agent/tools/add_data.py +142 -0
rag-agent/tools/delete_corpus.py +76 -0
rag-agent/tools/delete_document.py +68 -0
rag-agent/tools/get_corpus_info.py +118 -0
rag-agent/tools/list_corpora.py +68 -0
rag-agent/tools/rag_query.py +125 -0
rag-agent/tools/utils.py +173 -0
requirements.txt +5 -0

.gitignore ADDED

	@@ -0,0 +1,3 @@

+.env
+__pycache__/
+.venv/

README.md ADDED

	@@ -0,0 +1,127 @@

+# Vertex AI RAG Agent with ADK
+This repository contains a Google Agent Development Kit (ADK) implementation of a Retrieval Augmented Generation (RAG) agent using Google Cloud Vertex AI.
+## Overview
+The Vertex AI RAG Agent allows you to:
+- Query document corpora with natural language questions
+- List available document corpora
+- Add new documents to existing corpora
+- Get detailed information about specific corpora
+- Delete corpora when they're no longer needed
+## Prerequisites
+- A Google Cloud account with billing enabled
+- A Google Cloud project with the Vertex AI API enabled
+- Appropriate access to create and manage Vertex AI resources
+- Python 3.9+ environment
+## Setting Up Google Cloud Authentication
+Before running the agent, you need to set up authentication with Google Cloud:
+1. **Install Google Cloud CLI**:
+   - Visit [Google Cloud SDK](https://cloud.google.com/sdk/docs/install) for installation instructions for your OS
+   - For macOS with Homebrew: `brew install google-cloud-sdk`
+   - For Windows: Download and run the installer from the Google Cloud website
+   - For Linux: Follow the distribution-specific instructions on the Google Cloud website
+2. **Initialize the Google Cloud CLI**:
+   ```bash
+   gcloud init
+   ```
+   This will guide you through logging in and selecting your project.
+3. **Set up Application Default Credentials**:
+   ```bash
+   gcloud auth application-default login
+   ```
+   This will open a browser window for authentication and store credentials in:
+   `~/.config/gcloud/application_default_credentials.json`
+4. **Verify Authentication**:
+   ```bash
+   gcloud auth list
+   gcloud config list
+   ```
+5. **Enable Required APIs** (if not already enabled):
+   ```bash
+   gcloud services enable aiplatform.googleapis.com
+   ```
+## Installation
+1. **Clone the repository**:
+   ```bash
+   git clone [repository URL]
+   cd [repository directory]
+   ```
+2. **Set up a virtual environment**:
+   ```bash
+   python -m venv .venv
+   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+   ```
+3. **Install Dependencies**:
+   ```bash
+   pip install -r requirements.txt
+   ```
+## Using the Agent
+The agent provides the following functionality through its tools:
+### 1. Query Documents
+Allows you to ask questions and get answers from your document corpus:
+- Automatically retrieves relevant information from the specified corpus
+- Generates informative responses based on the retrieved content
+### 2. List Corpora
+Shows all available document corpora in your project:
+- Displays corpus names and basic information
+- Helps you understand what data collections are available
+### 3. Add New Data
+Add documents to existing corpora or create new ones:
+- Supports Google Drive URLs and GCS (Google Cloud Storage) paths
+- Automatically creates new corpora if they don't exist
+### 4. Get Corpus Information
+Provides detailed information about a specific corpus:
+- Shows document count, file metadata, and creation time
+- Useful for understanding corpus contents and structure
+### 5. Delete Corpus
+Removes corpora that are no longer needed:
+- Requires confirmation to prevent accidental deletion
+- Permanently removes the corpus and all associated files
+## Troubleshooting
+If you encounter issues:
+- **Authentication Problems**:
+  - Run `gcloud auth application-default login` again
+  - Check if your service account has the necessary permissions
+- **API Errors**:
+  - Ensure the Vertex AI API is enabled: `gcloud services enable aiplatform.googleapis.com`
+  - Verify your project has billing enabled
+- **Quota Issues**:
+  - Check your Google Cloud Console for any quota limitations
+  - Request quota increases if needed
+- **Missing Dependencies**:
+  - Ensure all requirements are installed: `pip install -r requirements.txt`
+## Additional Resources
+- [Vertex AI RAG Documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/rag-overview)
+- [Google Agent Development Kit (ADK) Documentation](https://google.github.io/adk-docs/)
+- [Google Cloud Authentication Guide](https://cloud.google.com/docs/authentication)
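
Review note on the authentication steps in the README: the check below is a minimal sketch of how one might confirm that Application Default Credentials are present before starting the agent. It assumes a Linux or macOS layout (the `~/.config/gcloud/...` path quoted in the README) and the standard `GOOGLE_APPLICATION_CREDENTIALS` environment variable; `find_adc_file` is a hypothetical helper, not part of this commit, and it simplifies what the Google client libraries actually do.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_adc_file(env: Mapping[str, str], home: Path) -> Optional[Path]:
    """Return the credentials file that would likely be picked up, if any."""
    # An explicit GOOGLE_APPLICATION_CREDENTIALS path is checked first.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit and Path(explicit).is_file():
        return Path(explicit)
    # Otherwise look for the file written by
    # `gcloud auth application-default login` (Linux/macOS location).
    default = home / ".config" / "gcloud" / "application_default_credentials.json"
    return default if default.is_file() else None


if __name__ == "__main__":
    found = find_adc_file(os.environ, Path.home())
    if found:
        print(f"Credentials file found: {found}")
    else:
        print("No credentials file found; run `gcloud auth application-default login`.")
```

On Windows the gcloud configuration directory lives elsewhere, so the fallback path here would need adjusting.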

rag-agent/.env copy ADDED

	@@ -0,0 +1,2 @@


+GOOGLE_GENAI_USE_VERTEXAI=FALSE
+GOOGLE_API_KEY=...

rag-agent/__init__.py ADDED

	@@ -0,0 +1 @@


+from . import agent

rag-agent/agent.py ADDED

	@@ -0,0 +1,106 @@

+from google.adk.agents import Agent
+from .tools.add_data import add_data
+from .tools.delete_corpus import delete_corpus
+from .tools.delete_document import delete_document
+from .tools.get_corpus_info import get_corpus_info
+from .tools.list_corpora import list_corpora
+from .tools.rag_query import rag_query
+root_agent = Agent(
+    name="RagAgent",
+    # Using Gemini 2.5 Flash for best performance with RAG operations
+    model="gemini-2.5-flash-preview-04-17",
+    description="Vertex AI RAG Agent",
+    tools=[
+        rag_query,
+        list_corpora,
+        add_data,
+        get_corpus_info,
+        delete_corpus,
+        delete_document,
+    ],
+    instruction="""
+    # 🧠 Vertex AI RAG Agent
+    You are a helpful RAG (Retrieval Augmented Generation) agent that can interact with Vertex AI's document corpora.
+    You can retrieve information from corpora, list available corpora, add new documents to corpora,
+    get detailed information about specific corpora, delete specific documents from corpora,
+    and delete entire corpora when they're no longer needed.
+    If a corpus doesn't exist when needed, it will be automatically created for the user.
+    ## Your Capabilities
+    1. **Query Documents**: You can answer questions by retrieving relevant information from document corpora.
+    2. **List Corpora**: You can list all available document corpora to help users understand what data is available.
+    3. **Add New Data**: You can add new documents (Google Drive URLs, etc.) to existing corpora.
+    4. **Get Corpus Info**: You can provide detailed information about a specific corpus, including file metadata and statistics.
+    5. **Delete Document**: You can delete a specific document from a corpus when it's no longer needed.
+    6. **Delete Corpus**: You can delete an entire corpus and all its associated files when it's no longer needed.
+    ## How to Approach User Requests
+    When a user asks a question:
+    1. First, determine if they want to manage corpora (list/add data/get info/delete) or query existing information.
+    2. If they're asking a knowledge question, use the `rag_query` tool to search the corpus.
+    3. If they're asking about available corpora, use the `list_corpora` tool.
+    4. If they want to add data, ensure you know which corpus to add to, then use the `add_data` tool.
+    5. If they want information about a specific corpus, use the `get_corpus_info` tool.
+    6. If they want to delete a specific document, use the `delete_document` tool with confirmation.
+    7. If they want to delete an entire corpus, use the `delete_corpus` tool with confirmation.
+    ## Using Tools
+    You have six specialized tools at your disposal:
+    1. `rag_query`: Query a corpus to answer questions
+       - Parameters:
+         - corpus_name: The full resource name of the corpus to query (preferably use the full resource name from list_corpora results)
+         - query: The text question to ask
+    2. `list_corpora`: List all available corpora
+       - When this tool is called, it returns the full resource names that should be used with other tools
+    3. `add_data`: Add new data to a corpus (will create the corpus if it doesn't exist)
+       - Parameters:
+         - corpus_name: The name of the corpus to add data to (can be a simple name for new corpora)
+         - paths: List of Google Drive or GCS URLs
+    4. `get_corpus_info`: Get detailed information about a specific corpus
+       - Parameters:
+         - corpus_name: The full resource name of the corpus to get information about (preferably use the full resource name from list_corpora results)
+    5. `delete_document`: Delete a specific document from a corpus
+       - Parameters:
+         - corpus_name: The full resource name of the corpus containing the document (preferably use the full resource name from list_corpora results)
+         - document_id: The ID of the document to delete (can be obtained from get_corpus_info results)
+    6. `delete_corpus`: Delete an entire corpus and all its associated files
+       - Parameters:
+         - corpus_name: The full resource name of the corpus to delete (preferably use the full resource name from list_corpora results)
+         - confirm: Boolean flag that must be set to True to confirm deletion
+    ## INTERNAL: Technical Implementation Details
+    This section is NOT user-facing information - don't repeat these details to users:
+    - Whenever possible, use the full resource name returned by the list_corpora tool when calling other tools
+    - Using the full resource name instead of just the display name will ensure more reliable operation
+    - Do not tell users to use full resource names in your responses - just use them internally in your tool calls
+    ## Communication Guidelines
+    - Be clear and concise in your responses.
+    - If querying a corpus, explain which corpus you're using to answer the question.
+    - If managing corpora, explain what actions you've taken.
+    - When new data is added, confirm what was added and to which corpus.
+    - If a corpus is created automatically, let the user know.
+    - When displaying corpus information, organize it clearly for the user.
+    - When deleting a document or corpus, always ask for confirmation before proceeding.
+    - If an error occurs, explain what went wrong and suggest next steps.
+    - When listing corpora, just provide the display names and basic information - don't tell users about resource names.
+    Remember, your primary goal is to help users access and manage information through RAG capabilities.
+    """,
+)

rag-agent/config.py ADDED

	@@ -0,0 +1,17 @@

+"""
+Configuration settings for the RAG Agent.
+"""
+import os
+# Vertex AI settings
+PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "adk-vertexai-rag")
+LOCATION = os.environ.get("GOOGLE_CLOUD_LOCATION", "us-central1")
+# RAG settings
+DEFAULT_CHUNK_SIZE = 512
+DEFAULT_CHUNK_OVERLAP = 100
+DEFAULT_TOP_K = 3
+DEFAULT_DISTANCE_THRESHOLD = 0.5
+DEFAULT_EMBEDDING_MODEL = "publishers/google/models/text-embedding-005"
+DEFAULT_EMBEDDING_REQUESTS_PER_MIN = 1000
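
Review note on `DEFAULT_CHUNK_SIZE` and `DEFAULT_CHUNK_OVERLAP`: with a size of 512 and an overlap of 100, each chunk starts 412 units after the previous one. The sketch below only illustrates that arithmetic on a plain Python list; how Vertex AI actually tokenizes and splits documents is internal to the service, and `chunk_tokens` is a hypothetical helper that does not appear in this commit.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunk_tokens(tokens: Sequence[T], chunk_size: int = 512, overlap: int = 100) -> List[Sequence[T]]:
    """Split a sequence into fixed-size windows that share `overlap` items."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # 512 - 100 = 412 with the defaults above
    chunks: List[Sequence[T]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    chunks = chunk_tokens(list(range(1000)))
    print([len(c) for c in chunks])
```

A larger overlap repeats more context across neighbouring chunks at the cost of more chunks to embed.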

rag-agent/tools/__init__.py ADDED

	@@ -0,0 +1,27 @@

+"""
+RAG Tools package for interacting with Vertex AI RAG corpora.
+"""
+from .add_data import add_data
+from .delete_corpus import delete_corpus
+from .delete_document import delete_document
+from .get_corpus_info import get_corpus_info
+from .list_corpora import list_corpora
+from .rag_query import rag_query
+from .utils import (
+    check_corpus_exists,
+    create_corpus_if_not_exists,
+    get_corpus_resource_name,
+)
+__all__ = [
+    "add_data",
+    "list_corpora",
+    "rag_query",
+    "get_corpus_info",
+    "delete_corpus",
+    "delete_document",
+    "check_corpus_exists",
+    "create_corpus_if_not_exists",
+    "get_corpus_resource_name",
+]

rag-agent/tools/add_data.py ADDED

	@@ -0,0 +1,142 @@

+"""
+Tool for adding new data sources to a Vertex AI RAG corpus.
+"""
+import re
+from typing import Dict, List
+import vertexai
+from google.adk.tools.tool_context import ToolContext
+from vertexai import rag
+from ..config import (
+    DEFAULT_CHUNK_OVERLAP,
+    DEFAULT_CHUNK_SIZE,
+    DEFAULT_EMBEDDING_REQUESTS_PER_MIN,
+    LOCATION,
+    PROJECT_ID,
+)
+from .utils import create_corpus_if_not_exists, get_corpus_resource_name
+def add_data(
+    corpus_name: str,
+    paths: List[str],
+    tool_context: ToolContext,
+) -> Dict:
+    """
+    Add new data sources to a Vertex AI RAG corpus.
+    If the specified corpus doesn't exist, it will be created automatically.
+    Args:
+        corpus_name (str): The name or full resource name of the corpus to add data to
+        paths (List[str]): List of URLs or GCS paths to add to the corpus.
+                          Supported formats:
+                          - Google Drive: "https://drive.google.com/file/d/{FILE_ID}/view"
+                          - Google Docs/Sheets/Slides: "https://docs.google.com/{type}/d/{FILE_ID}/..."
+                          - Google Cloud Storage: "gs://{BUCKET}/{PATH}"
+                          Example: ["https://drive.google.com/file/d/123", "gs://my_bucket/my_files_dir"]
+        tool_context (ToolContext): The tool context
+    Returns:
+        dict: Information about the added data and status
+    """
+    try:
+        # Initialize Vertex AI
+        vertexai.init(project=PROJECT_ID, location=LOCATION)
+        # Validate and convert URLs to the proper format
+        validated_paths = []
+        invalid_paths = []
+        conversions = {}
+        for path in paths:
+            # Direct Google Drive file link - already valid
+            if "drive.google.com/file/d/" in path:
+                validated_paths.append(path)
+            # Google Cloud Storage path - already valid
+            elif path.startswith("gs://"):
+                validated_paths.append(path)
+            # Google Docs/Sheets/Slides links - extract ID and convert to Drive format
+            elif "docs.google.com" in path:
+                # Extract the document ID
+                match = re.search(r"/d/([a-zA-Z0-9_-]+)", path)
+                if match:
+                    file_id = match.group(1)
+                    drive_url = f"https://drive.google.com/file/d/{file_id}/view"
+                    validated_paths.append(drive_url)
+                    conversions[path] = drive_url
+                else:
+                    invalid_paths.append(path)
+            # Not a recognized format
+            else:
+                invalid_paths.append(path)
+        if not validated_paths:
+            return {
+                "status": "error",
+                "message": "No valid paths provided. Please provide Google Drive URLs, Google Docs/Sheets/Slides URLs, or GCS paths.",
+                "invalid_paths": invalid_paths,
+            }
+        # Check if corpus exists and create it if needed
+        corpus_created = False
+        corpus_result = create_corpus_if_not_exists(corpus_name, tool_context)
+        if not corpus_result["success"]:
+            return {
+                "status": "error",
+                "message": f"Unable to access or create corpus '{corpus_name}': {corpus_result['message']}",
+                "corpus_name": corpus_name,
+                "paths": paths,
+            }
+        corpus_created = corpus_result.get("was_created", False)
+        # Get the corpus resource name
+        corpus_resource_name = get_corpus_resource_name(corpus_name)
+        # Set up chunking configuration
+        transformation_config = rag.TransformationConfig(
+            chunking_config=rag.ChunkingConfig(
+                chunk_size=DEFAULT_CHUNK_SIZE,
+                chunk_overlap=DEFAULT_CHUNK_OVERLAP,
+            ),
+        )
+        # Import files to the corpus
+        import_result = rag.import_files(
+            corpus_resource_name,
+            validated_paths,
+            transformation_config=transformation_config,
+            max_embedding_requests_per_min=DEFAULT_EMBEDDING_REQUESTS_PER_MIN,
+        )
+        # Build the success message
+        creation_msg = (
+            f"Created new corpus '{corpus_name}' and " if corpus_created else ""
+        )
+        conversion_msg = ""
+        if conversions:
+            conversion_msg = " (Converted Google Docs URLs to Drive format)"
+        return {
+            "status": "success",
+            "message": f"{creation_msg}Successfully added {import_result.imported_rag_files_count} file(s) to corpus '{corpus_name}'{conversion_msg}",
+            "corpus_name": corpus_name,
+            "corpus_created": corpus_created,
+            "files_added": import_result.imported_rag_files_count,
+            "paths": validated_paths,
+            "invalid_paths": invalid_paths,
+            "conversions": conversions,
+        }
+    except Exception as e:
+        return {
+            "status": "error",
+            "message": f"Error adding data to corpus: {str(e)}",
+            "corpus_name": corpus_name,
+            "paths": paths,
+        }
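
Review note on the path validation in `add_data`: the branch logic can be read as a pure function that either returns an importable path or rejects the input. The sketch below mirrors the three cases in the diff (Drive file links, `gs://` paths, and Docs/Sheets/Slides links rewritten to Drive form). `normalize_path` is a hypothetical name used for illustration; the commit keeps this logic inline.

```python
import re
from typing import Optional

# Same pattern as in add_data: capture the file ID that follows "/d/".
_DOC_ID = re.compile(r"/d/([a-zA-Z0-9_-]+)")


def normalize_path(path: str) -> Optional[str]:
    """Return an importable path, or None if the format is not recognized."""
    # Drive file links and GCS paths are passed through unchanged.
    if "drive.google.com/file/d/" in path or path.startswith("gs://"):
        return path
    # Docs/Sheets/Slides links are rewritten to the Drive file form.
    if "docs.google.com" in path:
        match = _DOC_ID.search(path)
        if match:
            return f"https://drive.google.com/file/d/{match.group(1)}/view"
    return None


if __name__ == "__main__":
    for candidate in [
        "https://docs.google.com/document/d/abc_123-X/edit",
        "gs://my_bucket/my_files_dir",
        "https://example.com/file.pdf",
    ]:
        print(candidate, "->", normalize_path(candidate))
```

Pulling the checks into one function like this would make them easy to unit-test without any Vertex AI calls.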

rag-agent/tools/delete_corpus.py ADDED

	@@ -0,0 +1,76 @@

+"""
+Tool for deleting a Vertex AI RAG corpus when it's no longer needed.
+"""
+from typing import Dict
+import vertexai
+from google.adk.tools.tool_context import ToolContext
+from vertexai import rag
+from ..config import LOCATION, PROJECT_ID
+from .utils import check_corpus_exists, get_corpus_resource_name
+def delete_corpus(
+    corpus_name: str,
+    confirm: bool,
+    tool_context: ToolContext,
+) -> Dict:
+    """
+    Delete a Vertex AI RAG corpus when it's no longer needed.
+    Requires confirmation to prevent accidental deletion.
+    Args:
+        corpus_name (str): The full resource name of the corpus to delete.
+                           Preferably use the resource_name from list_corpora results.
+        confirm (bool): Must be set to True to confirm deletion
+        tool_context (ToolContext): The tool context
+    Returns:
+        dict: Status information about the deletion operation
+    """
+    try:
+        # Check if deletion is confirmed
+        if not confirm:
+            return {
+                "status": "error",
+                "message": "Deletion not confirmed. Please set confirm=True to delete the corpus.",
+                "corpus_name": corpus_name,
+            }
+        # Initialize Vertex AI
+        vertexai.init(project=PROJECT_ID, location=LOCATION)
+        # Check if corpus exists
+        if not check_corpus_exists(corpus_name, tool_context):
+            return {
+                "status": "error",
+                "message": f"Corpus '{corpus_name}' does not exist, so it cannot be deleted.",
+                "corpus_name": corpus_name,
+            }
+        # Get the corpus resource name
+        corpus_resource_name = get_corpus_resource_name(corpus_name)
+        # Delete the corpus
+        rag.delete_corpus(corpus_resource_name)
+        # Update state to reflect the deletion
+        state_key = f"corpus_exists_{corpus_name}"
+        if state_key in tool_context.state:
+            # Set the value to False instead of deleting the key
+            tool_context.state[state_key] = False
+        return {
+            "status": "success",
+            "message": f"Successfully deleted corpus '{corpus_name}'",
+            "corpus_name": corpus_name,
+        }
+    except Exception as e:
+        return {
+            "status": "error",
+            "message": f"Error deleting corpus: {str(e)}",
+            "corpus_name": corpus_name,
+        }
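
Review note on the confirmation gate in `delete_corpus`: the pattern is to refuse the destructive call unless `confirm` is true, then mark the cached existence flag as `False` rather than removing it. The sketch below reproduces that flow with a stand-in delete function and a plain dict in place of `tool_context.state`; `guarded_delete` and its `delete_fn` parameter are hypothetical and exist only so the flow can be run without Vertex AI.

```python
from typing import Callable, Dict, MutableMapping


def guarded_delete(
    corpus_name: str,
    confirm: bool,
    delete_fn: Callable[[str], None],
    state: MutableMapping[str, bool],
) -> Dict[str, str]:
    """Delete only when explicitly confirmed, then update the cached flag."""
    if not confirm:
        return {
            "status": "error",
            "message": "Deletion not confirmed.",
            "corpus_name": corpus_name,
        }
    delete_fn(corpus_name)  # stand-in for the real delete call
    state_key = f"corpus_exists_{corpus_name}"
    if state_key in state:
        # Keep the key but record that the corpus is gone.
        state[state_key] = False
    return {
        "status": "success",
        "message": f"Deleted corpus '{corpus_name}'",
        "corpus_name": corpus_name,
    }


if __name__ == "__main__":
    deleted = []
    cache = {"corpus_exists_docs": True}
    print(guarded_delete("docs", False, deleted.append, cache)["status"])
    print(guarded_delete("docs", True, deleted.append, cache)["status"])
```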

rag-agent/tools/delete_document.py ADDED

	@@ -0,0 +1,68 @@

+"""
+Tool for deleting a specific document from a Vertex AI RAG corpus.
+"""
+from typing import Dict
+import vertexai
+from google.adk.tools.tool_context import ToolContext
+from vertexai import rag
+from ..config import LOCATION, PROJECT_ID
+from .utils import check_corpus_exists, get_corpus_resource_name
+def delete_document(
+    corpus_name: str,
+    document_id: str,
+    tool_context: ToolContext,
+) -> Dict:
+    """
+    Delete a specific document from a Vertex AI RAG corpus.
+    Args:
+        corpus_name (str): The full resource name of the corpus containing the document.
+                          Preferably use the resource_name from list_corpora results.
+        document_id (str): The ID of the specific document/file to delete. This can be
+                          obtained from get_corpus_info results.
+        tool_context (ToolContext): The tool context
+    Returns:
+        dict: Status information about the deletion operation
+    """
+    try:
+        # Initialize Vertex AI
+        vertexai.init(project=PROJECT_ID, location=LOCATION)
+        # Check if corpus exists
+        if not check_corpus_exists(corpus_name, tool_context):
+            return {
+                "status": "error",
+                "message": f"Corpus '{corpus_name}' does not exist, so the document cannot be deleted.",
+                "corpus_name": corpus_name,
+                "document_id": document_id,
+            }
+        # Get the corpus resource name
+        corpus_resource_name = get_corpus_resource_name(corpus_name)
+        # Construct the full document resource name
+        document_resource_name = f"{corpus_resource_name}/ragFiles/{document_id}"
+        # Delete the document
+        rag.delete_file(name=document_resource_name)
+        return {
+            "status": "success",
+            "message": f"Successfully deleted document '{document_id}' from corpus '{corpus_name}'",
+            "corpus_name": corpus_name,
+            "document_id": document_id,
+        }
+    except Exception as e:
+        return {
+            "status": "error",
+            "message": f"Error deleting document: {str(e)}",
+            "corpus_name": corpus_name,
+            "document_id": document_id,
+        }
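
Review note on the document resource name in `delete_document`: the file name is formed by appending `/ragFiles/{document_id}` to the corpus resource name, and `get_corpus_info` recovers the ID by taking the last path segment. The sketch below shows both directions with some light validation added. The helper names are hypothetical, and the validation is an extra that the commit itself does not perform.

```python
import re

# Expected shape of a full corpus resource name.
_CORPUS_RE = re.compile(r"^projects/[^/]+/locations/[^/]+/ragCorpora/[^/]+$")


def rag_file_resource_name(corpus_resource_name: str, document_id: str) -> str:
    """Build the full file resource name from a corpus name and a file ID."""
    if not _CORPUS_RE.match(corpus_resource_name):
        raise ValueError(f"not a full corpus resource name: {corpus_resource_name!r}")
    if not document_id or "/" in document_id:
        raise ValueError(f"not a bare document ID: {document_id!r}")
    return f"{corpus_resource_name}/ragFiles/{document_id}"


def file_id_from_resource_name(file_resource_name: str) -> str:
    """Return the last path segment, which is the file ID."""
    return file_resource_name.rsplit("/", 1)[-1]


if __name__ == "__main__":
    corpus = "projects/my-project/locations/us-central1/ragCorpora/123"
    name = rag_file_resource_name(corpus, "456")
    print(name)
    print(file_id_from_resource_name(name))
```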

rag-agent/tools/get_corpus_info.py ADDED

	@@ -0,0 +1,118 @@

+"""
+Tool for retrieving detailed information about a specific RAG corpus.
+"""
+from typing import Dict
+import vertexai
+from google.adk.tools.tool_context import ToolContext
+from vertexai import rag
+from ..config import LOCATION, PROJECT_ID
+from .utils import check_corpus_exists, get_corpus_resource_name
+def get_corpus_info(
+    corpus_name: str,
+    tool_context: ToolContext,
+) -> Dict:
+    """
+    Get detailed information about a specific RAG corpus, including its files.
+    Args:
+        corpus_name (str): The full resource name of the corpus to get information about.
+                           Preferably use the resource_name from list_corpora results.
+        tool_context (ToolContext): The tool context
+    Returns:
+        dict: Information about the corpus and its files
+    """
+    try:
+        # Initialize Vertex AI
+        vertexai.init(project=PROJECT_ID, location=LOCATION)
+        # Check if corpus exists
+        if not check_corpus_exists(corpus_name, tool_context):
+            return {
+                "status": "error",
+                "message": f"Corpus '{corpus_name}' does not exist",
+                "corpus_name": corpus_name,
+            }
+        # Get the corpus resource name
+        corpus_resource_name = get_corpus_resource_name(corpus_name)
+        print(f"Corpus resource name: {corpus_resource_name}")
+        # Try to get corpus details first
+        corpus_display_name = corpus_name  # Default if we can't get actual display name
+        try:
+            corpus = rag.get_corpus(corpus_resource_name)
+            if hasattr(corpus, "display_name") and corpus.display_name:
+                corpus_display_name = corpus.display_name
+        except Exception as corpus_error:
+            # Continue without corpus details; the display name falls back to corpus_name
+            print(f"Error getting corpus details: {str(corpus_error)}")
+        # Process file information
+        file_details = []
+        try:
+            # Get files in the corpus
+            files = list(rag.list_files(corpus_resource_name))
+            print(f"Found {len(files)} files")
+            for file in files:
+                file_info = {
+                    "file_id": (
+                        file.name.split("/")[-1] if hasattr(file, "name") else ""
+                    ),
+                    "source_uri": (
+                        file.source_uri if hasattr(file, "source_uri") else ""
+                    ),
+                    "display_name": (
+                        file.display_name if hasattr(file, "display_name") else ""
+                    ),
+                    "create_time": (
+                        str(file.create_time) if hasattr(file, "create_time") else ""
+                    ),
+                    "update_time": (
+                        str(file.update_time) if hasattr(file, "update_time") else ""
+                    ),
+                    "mime_type": file.mime_type if hasattr(file, "mime_type") else "",
+                    "state": str(file.state) if hasattr(file, "state") else "",
+                }
+                file_details.append(file_info)
+        except Exception as files_error:
+            print(f"Error retrieving files: {str(files_error)}")
+            return {
+                "status": "error",
+                "message": f"Error retrieving files for corpus '{corpus_name}': {str(files_error)}",
+                "corpus_name": corpus_name,
+                "corpus_resource_name": corpus_resource_name,
+            }
+        # Corpus statistics
+        corpus_stats = {
+            "file_count": len(file_details),
+        }
+        return {
+            "status": "success",
+            "message": f"Successfully retrieved information for corpus '{corpus_name}'",
+            "corpus_name": corpus_name,
+            "corpus_resource_name": corpus_resource_name,
+            "display_name": corpus_display_name,
+            "stats": corpus_stats,
+            "files": file_details,
+        }
+    except Exception as e:
+        print(f"Error in get_corpus_info: {str(e)}")
+        return {
+            "status": "error",
+            "message": f"Error retrieving corpus information: {str(e)}",
+            "corpus_name": corpus_name,
+        }
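
Review note on the repeated `hasattr` checks in `get_corpus_info`: the same defensive reads can be written more compactly with `getattr` and a default. The sketch below uses `types.SimpleNamespace` as a stand-in for the file objects returned by the service. `summarize_file` is a hypothetical helper, and unlike the code in the diff it maps a present-but-`None` timestamp to an empty string instead of the text "None".

```python
from types import SimpleNamespace
from typing import Any, Dict


def summarize_file(file: Any) -> Dict[str, str]:
    """Read optional attributes from a file-like object with safe defaults."""
    name = getattr(file, "name", "")
    create_time = getattr(file, "create_time", None)
    return {
        # The file ID is the last segment of the full resource name.
        "file_id": name.split("/")[-1] if name else "",
        "display_name": getattr(file, "display_name", ""),
        "create_time": str(create_time) if create_time is not None else "",
    }


if __name__ == "__main__":
    fake = SimpleNamespace(
        name="projects/p/locations/l/ragCorpora/1/ragFiles/42",
        display_name="report.pdf",
    )
    print(summarize_file(fake))
```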

rag-agent/tools/list_corpora.py ADDED

	@@ -0,0 +1,68 @@

+"""
+Tool for listing all available Vertex AI RAG corpora.
+"""
+from typing import Dict, List, Union
+import vertexai
+from google.adk.tools.tool_context import ToolContext
+from vertexai import rag
+from ..config import LOCATION, PROJECT_ID
+def list_corpora(
+    tool_context: ToolContext,
+) -> Dict:
+    """
+    List all available Vertex AI RAG corpora.
+    Args:
+        tool_context (ToolContext): The tool context
+    Returns:
+        dict: A list of available corpora and status, with each corpus containing:
+            - resource_name: The full resource name to use with other tools
+            - display_name: The human-readable name of the corpus
+            - create_time: When the corpus was created
+            - update_time: When the corpus was last updated
+    """
+    try:
+        print("Listing corpora...", PROJECT_ID)
+        # Initialize Vertex AI
+        vertexai.init(project=PROJECT_ID, location=LOCATION)
+        # Get the list of corpora
+        corpora = rag.list_corpora()
+        # Process corpus information into a more usable format
+        corpus_info: List[Dict[str, Union[str, int]]] = []
+        for corpus in corpora:
+            corpus_data: Dict[str, Union[str, int]] = {
+                "resource_name": corpus.name,  # Full resource name for use with other tools
+                "display_name": corpus.display_name,
+                "create_time": (
+                    str(corpus.create_time) if hasattr(corpus, "create_time") else ""
+                ),
+                "update_time": (
+                    str(corpus.update_time) if hasattr(corpus, "update_time") else ""
+                ),
+            }
+            corpus_info.append(corpus_data)
+        print(f"Corpus info: {corpus_info}")
+        return {
+            "status": "success",
+            "message": f"Found {len(corpus_info)} corpus/corpora",
+            "corpora": corpus_info,
+            "count": len(corpus_info),
+            "note": "Use the 'resource_name' field (not 'display_name') when referencing corpora in other tools",
+        }
+    except Exception as e:
+        return {
+            "status": "error",
+            "message": f"Error listing corpora: {str(e)}",
+        }
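
Review note on the `note` field returned by `list_corpora`: other tools are expected to receive the `resource_name`, not the `display_name`. The sketch below shows one way a caller might map a display name back to a resource name from the dictionary shape returned above. `resolve_resource_name` is a hypothetical helper, and it deliberately returns `None` when the display name is missing or ambiguous, since nothing in the diff guarantees that display names are unique.

```python
from typing import Any, Dict, Optional


def resolve_resource_name(list_result: Dict[str, Any], display_name: str) -> Optional[str]:
    """Map a display name to its resource name using a list_corpora-style result."""
    matches = [
        corpus["resource_name"]
        for corpus in list_result.get("corpora", [])
        if corpus.get("display_name") == display_name
    ]
    # Only return a name when exactly one corpus matches.
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    result = {
        "status": "success",
        "corpora": [
            {"resource_name": "projects/p/locations/l/ragCorpora/1", "display_name": "handbook"},
            {"resource_name": "projects/p/locations/l/ragCorpora/2", "display_name": "policies"},
        ],
    }
    print(resolve_resource_name(result, "policies"))
```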

rag-agent/tools/rag_query.py ADDED

	@@ -0,0 +1,125 @@

+"""
+Tool for querying Vertex AI RAG corpora and retrieving relevant information.
+"""
+from typing import Dict
+import vertexai
+from google.adk.tools.tool_context import ToolContext
+from vertexai import rag
+from ..config import (
+    DEFAULT_DISTANCE_THRESHOLD,
+    DEFAULT_TOP_K,
+    LOCATION,
+    PROJECT_ID,
+)
+from .utils import create_corpus_if_not_exists, get_corpus_resource_name
+def rag_query(
+    corpus_name: str,
+    query: str,
+    tool_context: ToolContext,
+) -> Dict:
+    """
+    Query a Vertex AI RAG corpus with a user question and return relevant information.
+    If the specified corpus doesn't exist, it will be created automatically.
+    Args:
+        corpus_name (str): The full resource name of the corpus to query.
+                           Preferably use the resource_name from list_corpora results.
+        query (str): The text query to search for in the corpus
+        tool_context (ToolContext): The tool context
+    Returns:
+        dict: The query results and status
+    """
+    try:
+        # Initialize Vertex AI
+        vertexai.init(project=PROJECT_ID, location=LOCATION)
+        # Check if corpus exists and create it if needed
+        corpus_result = create_corpus_if_not_exists(corpus_name, tool_context)
+        if not corpus_result["success"]:
+            return {
+                "status": "error",
+                "message": f"Unable to access or create corpus '{corpus_name}': {corpus_result['message']}",
+                "query": query,
+                "corpus_name": corpus_name,
+            }
+        # If corpus was created, there's no data to query yet
+        if corpus_result.get("was_created", False):
+            return {
+                "status": "warning",
+                "message": f"Created a new corpus '{corpus_name}', but it doesn't contain any data yet. Please add data to the corpus before querying.",
+                "query": query,
+                "corpus_name": corpus_name,
+                "results": [],
+                "results_count": 0,
+            }
+        # Get the corpus resource name
+        corpus_resource_name = get_corpus_resource_name(corpus_name)
+        # Configure retrieval parameters
+        rag_retrieval_config = rag.RagRetrievalConfig(
+            top_k=DEFAULT_TOP_K,
+            filter=rag.Filter(vector_distance_threshold=DEFAULT_DISTANCE_THRESHOLD),
+        )
+        # Perform the query
+        response = rag.retrieval_query(
+            rag_resources=[
+                rag.RagResource(
+                    rag_corpus=corpus_resource_name,
+                )
+            ],
+            text=query,
+            rag_retrieval_config=rag_retrieval_config,
+        )
+        # Process the response into a more usable format
+        results = []
+        if hasattr(response, "contexts") and response.contexts:
+            for ctx_group in response.contexts.contexts:
+                result = {
+                    "source_uri": (
+                        ctx_group.source_uri if hasattr(ctx_group, "source_uri") else ""
+                    ),
+                    "source_name": (
+                        ctx_group.source_display_name
+                        if hasattr(ctx_group, "source_display_name")
+                        else ""
+                    ),
+                    "text": ctx_group.text if hasattr(ctx_group, "text") else "",
+                    "score": ctx_group.score if hasattr(ctx_group, "score") else 0.0,
+                }
+                results.append(result)
+        # If we didn't find any results
+        if not results:
+            return {
+                "status": "warning",
+                "message": f"No results found in corpus '{corpus_name}' for query: '{query}'",
+                "query": query,
+                "corpus_name": corpus_name,
+                "results": [],
+                "results_count": 0,
+            }
+        return {
+            "status": "success",
+            "message": f"Successfully queried corpus '{corpus_name}'",
+            "query": query,
+            "corpus_name": corpus_name,
+            "results": results,
+            "results_count": len(results),
+        }
+    except Exception as e:
+        return {
+            "status": "error",
+            "message": f"Error querying corpus: {str(e)}",
+            "query": query,
+            "corpus_name": corpus_name,
+        }
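
The query tool above returns a dictionary whose `status` is `"success"`, `"warning"`, or `"error"`. As a minimal sketch of how a caller might branch on that shape, the following uses a hypothetical helper, `summarize_query_result`, which is not part of this commit; the sample dictionary is made up for illustration.

```python
from typing import Any, Dict


def summarize_query_result(result: Dict[str, Any]) -> str:
    """Turn a rag_query-style result dict into a one-line summary (hypothetical helper)."""
    status = result.get("status")
    if status == "error":
        return f"ERROR: {result.get('message', 'unknown error')}"
    if status == "warning":
        return f"WARNING: {result.get('message', '')}"
    # Anything else is treated as success: report the count and the first
    # chunk in the order the tool returned it (no re-sorting by score).
    results = result.get("results", [])
    if not results:
        return "OK: 0 result(s)"
    top = results[0]
    source = top.get("source_name") or "unknown source"
    return f"OK: {len(results)} result(s), first from {source}"


# Illustrative input only; real results come from the tool above.
sample = {
    "status": "success",
    "results": [{"source_name": "a.pdf", "text": "x", "score": 0.2}],
    "results_count": 1,
}
print(summarize_query_result(sample))
```

Keeping the three statuses distinct lets an agent tell "the corpus is empty" apart from "the call failed", which is the distinction the tool's return values are built around.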

rag-agent/tools/utils.py ADDED Viewed

	@@ -0,0 +1,173 @@

+"""
+Utility functions for the RAG tools.
+"""
+import re
+from typing import Any, Dict
+import vertexai
+from google.adk.tools.tool_context import ToolContext
+from vertexai import rag
+from ..config import (
+    DEFAULT_EMBEDDING_MODEL,
+    LOCATION,
+    PROJECT_ID,
+)
+def get_corpus_resource_name(corpus_name: str) -> str:
+    """
+    Convert a corpus name to its full resource name if needed.
+    Handles various input formats and ensures the returned name follows Vertex AI's requirements.
+    Args:
+        corpus_name (str): The corpus name or display name
+    Returns:
+        str: The full resource name of the corpus
+    """
+    print(f"Corpus name: {corpus_name}")
+    # If it's already a full resource name with the projects/locations/ragCorpora format
+    if re.match(r"^projects/[^/]+/locations/[^/]+/ragCorpora/[^/]+$", corpus_name):
+        return corpus_name
+    # Check if this is a display name of an existing corpus
+    try:
+        # Initialize Vertex AI if needed
+        vertexai.init(project=PROJECT_ID, location=LOCATION)
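+        # List all corpora and check if there's a match with the display name.
+        # Corpora are created with a sanitized display name (see
+        # create_corpus_if_not_exists), so match the raw and the sanitized form.
+        sanitized_name = re.sub(r"[^a-zA-Z0-9_-]", "_", corpus_name)
+        corpora = rag.list_corpora()
+        for corpus in corpora:
+            if hasattr(corpus, "display_name") and corpus.display_name in (
+                corpus_name,
+                sanitized_name,
+            ):
+                return corpus.name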
+    except Exception:
+        # If we can't check, continue with the default behavior
+        pass
+    # If it contains partial path elements, extract just the corpus ID
+    if "/" in corpus_name:
+        # Extract the last part of the path as the corpus ID
+        corpus_id = corpus_name.split("/")[-1]
+    else:
+        corpus_id = corpus_name
+    # Remove any special characters that might cause issues
+    corpus_id = re.sub(r"[^a-zA-Z0-9_-]", "_", corpus_id)
+    # Construct the standardized resource name
+    return f"projects/{PROJECT_ID}/locations/{LOCATION}/ragCorpora/{corpus_id}"
+def check_corpus_exists(corpus_name: str, tool_context: ToolContext) -> bool:
+    """
+    Check if a corpus with the given name exists.
+    Args:
+        corpus_name (str): The name of the corpus to check
+        tool_context (ToolContext): The tool context for state management
+    Returns:
+        bool: True if the corpus exists, False otherwise
+    """
+    # Check the cached session state first to skip the list_corpora() call
+    if tool_context.state.get(f"corpus_exists_{corpus_name}"):
+        return True
+    try:
+        # Initialize Vertex AI
+        vertexai.init(project=PROJECT_ID, location=LOCATION)
+        # Get full resource name
+        corpus_resource_name = get_corpus_resource_name(corpus_name)
+        # List all corpora and check if this one exists
+        corpora = rag.list_corpora()
+        for corpus in corpora:
+            if (
+                corpus.name == corpus_resource_name
+                or corpus.display_name == corpus_name
+            ):
+                # Update state
+                tool_context.state[f"corpus_exists_{corpus_name}"] = True
+                return True
+        return False
+    except Exception:
+        # If we can't check, assume it doesn't exist
+        return False
+def create_corpus_if_not_exists(
+    corpus_name: str, tool_context: ToolContext
+) -> Dict[str, Any]:
+    """
+    Create a corpus if it doesn't already exist.
+    Args:
+        corpus_name (str): The name of the corpus to create if needed
+        tool_context (ToolContext): The tool context for state management
+    Returns:
+        Dict[str, Any]: Status information about the operation with the following keys:
+            - success (bool): True if the corpus was created or already exists
+            - corpus_name (str): The name as given, or the full resource name if the corpus was newly created
+            - display_name (str): The sanitized display name (present only when newly created)
+            - was_created (bool): Whether the corpus was newly created
+            - status (str): Status code ("success" or "error")
+            - message (str): Detailed message about the operation
+    """
+    # Check if corpus already exists
+    exists = check_corpus_exists(corpus_name, tool_context)
+    if exists:
+        return {
+            "success": True,
+            "status": "success",
+            "message": f"Corpus '{corpus_name}' already exists",
+            "corpus_name": corpus_name,
+            "was_created": False,
+        }
+    try:
+        # Initialize Vertex AI
+        vertexai.init(project=PROJECT_ID, location=LOCATION)
+        # Clean corpus name for use as display name
+        display_name = re.sub(r"[^a-zA-Z0-9_-]", "_", corpus_name)
+        # Configure embedding model
+        embedding_model_config = rag.RagEmbeddingModelConfig(
+            vertex_prediction_endpoint=rag.VertexPredictionEndpoint(
+                publisher_model=DEFAULT_EMBEDDING_MODEL
+            )
+        )
+        # Create the corpus
+        rag_corpus = rag.create_corpus(
+            display_name=display_name,
+            backend_config=rag.RagVectorDbConfig(
+                rag_embedding_model_config=embedding_model_config
+            ),
+        )
+        # Update state
+        tool_context.state[f"corpus_exists_{corpus_name}"] = True
+        return {
+            "success": True,
+            "status": "success",
+            "message": f"Successfully created corpus '{corpus_name}'",
+            "corpus_name": rag_corpus.name,
+            "display_name": rag_corpus.display_name,
+            "was_created": True,
+        }
+    except Exception as e:
+        return {
+            "success": False,
+            "status": "error",
+            "message": f"Error creating corpus: {str(e)}",
+            "corpus_name": corpus_name,
+            "was_created": False,
+        }
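
The name handling in `get_corpus_resource_name` can be tried offline. The sketch below reproduces only the regex check and the ID sanitizing; it deliberately skips the `rag.list_corpora()` display-name lookup, so it is an approximation rather than a drop-in replacement. `build_resource_name` and the project and location values are hypothetical.

```python
import re

# Same pattern the utility uses to recognize a full resource name.
FULL_NAME_PATTERN = re.compile(r"^projects/[^/]+/locations/[^/]+/ragCorpora/[^/]+$")


def build_resource_name(corpus_name: str, project_id: str, location: str) -> str:
    """Offline approximation of get_corpus_resource_name (no display-name lookup)."""
    if FULL_NAME_PATTERN.match(corpus_name):
        return corpus_name
    # Keep only the last path segment, then replace characters outside
    # [a-zA-Z0-9_-] with underscores, as the original function does.
    corpus_id = corpus_name.split("/")[-1]
    corpus_id = re.sub(r"[^a-zA-Z0-9_-]", "_", corpus_id)
    return f"projects/{project_id}/locations/{location}/ragCorpora/{corpus_id}"


print(build_resource_name("ragCorpora/my docs", "demo-project", "us-central1"))
# → projects/demo-project/locations/us-central1/ragCorpora/my_docs
```

A name built this way is only a guess at the resource path; whether it refers to a real corpus still has to be confirmed against the service.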

requirements.txt ADDED Viewed

	@@ -0,0 +1,5 @@

+google-cloud-aiplatform==1.92.0
+google-cloud-storage==2.19.0
+google-genai==1.14.0
+gitpython==3.1.40
+google-adk==0.4.0
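
Because every dependency is pinned to an exact version, it can help to confirm that the active environment matches before running the agent. A minimal sketch using only the standard library follows; `check_pins` is a hypothetical helper and not part of this commit, and the pins are copied from the list above.

```python
from importlib import metadata
from typing import Dict

# Pins copied from requirements.txt.
PINS = {
    "google-cloud-aiplatform": "1.92.0",
    "google-cloud-storage": "2.19.0",
    "google-genai": "1.14.0",
    "gitpython": "3.1.40",
    "google-adk": "0.4.0",
}


def check_pins(pins: Dict[str, str]) -> Dict[str, str]:
    """Compare installed distribution versions against exact pins."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if installed == wanted else f"mismatch (installed {installed})"
    return report


for package, state in check_pins(PINS).items():
    print(f"{package}: {state}")
```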