Brandon Hancock committed on
Commit
55e16d5
·
0 Parent(s):

ready for YouTube

.gitignore ADDED
@@ -0,0 +1,3 @@
+.env
+__pycache__/
+.venv/
README.md ADDED
@@ -0,0 +1,127 @@
+# Vertex AI RAG Agent with ADK
+
+This repository contains a Google Agent Development Kit (ADK) implementation of a Retrieval Augmented Generation (RAG) agent using Google Cloud Vertex AI.
+
+## Overview
+
+The Vertex AI RAG Agent allows you to:
+
+- Query document corpora with natural language questions
+- List available document corpora
+- Add new documents to existing corpora
+- Get detailed information about specific corpora
+- Delete individual documents or entire corpora when they're no longer needed
+
+## Prerequisites
+
+- A Google Cloud account with billing enabled
+- A Google Cloud project with the Vertex AI API enabled
+- Appropriate access to create and manage Vertex AI resources
+- Python 3.9+ environment
+
+## Setting Up Google Cloud Authentication
+
+Before running the agent, you need to set up authentication with Google Cloud:
+
+1. **Install Google Cloud CLI**:
+   - Visit [Google Cloud SDK](https://cloud.google.com/sdk/docs/install) for installation instructions for your OS
+   - For macOS with Homebrew: `brew install google-cloud-sdk`
+   - For Windows: Download and run the installer from the Google Cloud website
+   - For Linux: Follow the distribution-specific instructions on the Google Cloud website
+
+2. **Initialize the Google Cloud CLI**:
+   ```bash
+   gcloud init
+   ```
+   This will guide you through logging in and selecting your project.
+
+3. **Set up Application Default Credentials**:
+   ```bash
+   gcloud auth application-default login
+   ```
+   This will open a browser window for authentication and store credentials in:
+   `~/.config/gcloud/application_default_credentials.json`
+
+4. **Verify Authentication**:
+   ```bash
+   gcloud auth list
+   gcloud config list
+   ```
+
+5. **Enable Required APIs** (if not already enabled):
+   ```bash
+   gcloud services enable aiplatform.googleapis.com
+   ```
+
+## Installation
+
+1. **Clone the repository**:
+   ```bash
+   git clone [repository URL]
+   cd [repository directory]
+   ```
+
+2. **Set up a virtual environment**:
+   ```bash
+   python -m venv .venv
+   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+   ```
+
+3. **Install Dependencies**:
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+## Using the Agent
+
+The agent provides the following functionality through its tools:
+
+### 1. Query Documents
+Allows you to ask questions and get answers from your document corpus:
+- Automatically retrieves relevant information from the specified corpus
+- Generates informative responses based on the retrieved content
+
+### 2. List Corpora
+Shows all available document corpora in your project:
+- Displays corpus names and basic information
+- Helps you understand what data collections are available
+
+### 3. Add New Data
+Add documents to existing corpora or create new ones:
+- Supports Google Drive URLs and GCS (Google Cloud Storage) paths
+- Automatically creates new corpora if they don't exist
+
+### 4. Get Corpus Information
+Provides detailed information about a specific corpus:
+- Shows document count, file metadata, and creation time
+- Useful for understanding corpus contents and structure
+
+### 5. Delete Corpus
+Removes corpora that are no longer needed:
+- Requires confirmation to prevent accidental deletion
+- Permanently removes the corpus and all associated files
+
+## Troubleshooting
+
+If you encounter issues:
+
+- **Authentication Problems**:
+  - Run `gcloud auth application-default login` again
+  - Check if your service account has the necessary permissions
+
+- **API Errors**:
+  - Ensure the Vertex AI API is enabled: `gcloud services enable aiplatform.googleapis.com`
+  - Verify your project has billing enabled
+
+- **Quota Issues**:
+  - Check your Google Cloud Console for any quota limitations
+  - Request quota increases if needed
+
+- **Missing Dependencies**:
+  - Ensure all requirements are installed: `pip install -r requirements.txt`
+
+## Additional Resources
+
+- [Vertex AI RAG Documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/rag-overview)
+- [Google Agent Development Kit (ADK) Documentation](https://github.com/google/agents-framework)
+- [Google Cloud Authentication Guide](https://cloud.google.com/docs/authentication)
rag-agent/.env copy ADDED
@@ -0,0 +1,2 @@
+GOOGLE_GENAI_USE_VERTEXAI=FALSE
+GOOGLE_API_KEY=...
rag-agent/__init__.py ADDED
@@ -0,0 +1 @@
+from . import agent
rag-agent/agent.py ADDED
@@ -0,0 +1,106 @@
+from google.adk.agents import Agent
+
+from .tools.add_data import add_data
+from .tools.delete_corpus import delete_corpus
+from .tools.delete_document import delete_document
+from .tools.get_corpus_info import get_corpus_info
+from .tools.list_corpora import list_corpora
+from .tools.rag_query import rag_query
+
+root_agent = Agent(
+    name="RagAgent",
+    # Using Gemini 2.5 Flash for best performance with RAG operations
+    model="gemini-2.5-flash-preview-04-17",
+    description="Vertex AI RAG Agent",
+    tools=[
+        rag_query,
+        list_corpora,
+        add_data,
+        get_corpus_info,
+        delete_corpus,
+        delete_document,
+    ],
+    instruction="""
+    # 🧠 Vertex AI RAG Agent
+
+    You are a helpful RAG (Retrieval Augmented Generation) agent that can interact with Vertex AI's document corpora.
+    You can retrieve information from corpora, list available corpora, add new documents to corpora,
+    get detailed information about specific corpora, delete specific documents from corpora,
+    and delete entire corpora when they're no longer needed.
+    If a corpus doesn't exist when needed, it will be automatically created for the user.
+
+    ## Your Capabilities
+
+    1. **Query Documents**: You can answer questions by retrieving relevant information from document corpora.
+    2. **List Corpora**: You can list all available document corpora to help users understand what data is available.
+    3. **Add New Data**: You can add new documents (Google Drive URLs, etc.) to existing corpora.
+    4. **Get Corpus Info**: You can provide detailed information about a specific corpus, including file metadata and statistics.
+    5. **Delete Document**: You can delete a specific document from a corpus when it's no longer needed.
+    6. **Delete Corpus**: You can delete an entire corpus and all its associated files when it's no longer needed.
+
+    ## How to Approach User Requests
+
+    When a user asks a question:
+    1. First, determine if they want to manage corpora (list/add data/get info/delete) or query existing information.
+    2. If they're asking a knowledge question, use the `rag_query` tool to search the corpus.
+    3. If they're asking about available corpora, use the `list_corpora` tool.
+    4. If they want to add data, ensure you know which corpus to add to, then use the `add_data` tool.
+    5. If they want information about a specific corpus, use the `get_corpus_info` tool.
+    6. If they want to delete a specific document, use the `delete_document` tool with confirmation.
+    7. If they want to delete an entire corpus, use the `delete_corpus` tool with confirmation.
+
+    ## Using Tools
+
+    You have six specialized tools at your disposal:
+
+    1. `rag_query`: Query a corpus to answer questions
+       - Parameters:
+         - corpus_name: The full resource name of the corpus to query (preferably use the full resource name from list_corpora results)
+         - query: The text question to ask
+
+    2. `list_corpora`: List all available corpora
+       - When this tool is called, it returns the full resource names that should be used with other tools
+
+    3. `add_data`: Add new data to a corpus (will create the corpus if it doesn't exist)
+       - Parameters:
+         - corpus_name: The name of the corpus to add data to (can be a simple name for new corpora)
+         - paths: List of Google Drive or GCS URLs
+
+    4. `get_corpus_info`: Get detailed information about a specific corpus
+       - Parameters:
+         - corpus_name: The full resource name of the corpus to get information about (preferably use the full resource name from list_corpora results)
+
+    5. `delete_document`: Delete a specific document from a corpus
+       - Parameters:
+         - corpus_name: The full resource name of the corpus containing the document (preferably use the full resource name from list_corpora results)
+         - document_id: The ID of the document to delete (can be obtained from get_corpus_info results)
+       - This tool takes no confirm flag: ask the user to confirm in the conversation before calling it
+
+    6. `delete_corpus`: Delete an entire corpus and all its associated files
+       - Parameters:
+         - corpus_name: The full resource name of the corpus to delete (preferably use the full resource name from list_corpora results)
+         - confirm: Boolean flag that must be set to True to confirm deletion
+
+    ## INTERNAL: Technical Implementation Details
+
+    This section is NOT user-facing information - don't repeat these details to users:
+
+    - Whenever possible, use the full resource name returned by the list_corpora tool when calling other tools
+    - Using the full resource name instead of just the display name will ensure more reliable operation
+    - Do not tell users to use full resource names in your responses - just use them internally in your tool calls
+
+    ## Communication Guidelines
+
+    - Be clear and concise in your responses.
+    - If querying a corpus, explain which corpus you're using to answer the question.
+    - If managing corpora, explain what actions you've taken.
+    - When new data is added, confirm what was added and to which corpus.
+    - If a corpus is created automatically, let the user know.
+    - When displaying corpus information, organize it clearly for the user.
+    - When deleting a document or corpus, always ask for confirmation before proceeding.
+    - If an error occurs, explain what went wrong and suggest next steps.
+    - When listing corpora, just provide the display names and basic information - don't tell users about resource names.
+
+    Remember, your primary goal is to help users access and manage information through RAG capabilities.
+    """,
+)
rag-agent/config.py ADDED
@@ -0,0 +1,17 @@
+"""
+Configuration settings for the RAG Agent.
+"""
+
+import os
+
+# Vertex AI settings
+PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "adk-vertexai-rag")
+LOCATION = os.environ.get("GOOGLE_CLOUD_LOCATION", "us-central1")
+
+# RAG settings
+DEFAULT_CHUNK_SIZE = 512
+DEFAULT_CHUNK_OVERLAP = 100
+DEFAULT_TOP_K = 3
+DEFAULT_DISTANCE_THRESHOLD = 0.5
+DEFAULT_EMBEDDING_MODEL = "publishers/google/models/text-embedding-005"
+DEFAULT_EMBEDDING_REQUESTS_PER_MIN = 1000
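Note on `config.py`: `PROJECT_ID` and `LOCATION` are read from the environment once, when the module is imported, so `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` need to be set before the agent package is loaded. The sketch below illustrates the same lookup-with-fallback behavior in isolation. `load_settings` is a hypothetical helper written for this note and is not part of the commit; only the variable names and defaults are taken from `config.py`.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Resolve project and location the way config.py does (hypothetical helper)."""
    # Fall back to the real process environment when no mapping is supplied.
    env = os.environ if env is None else env
    return {
        # Same variable names and defaults as config.py in this commit.
        "project_id": env.get("GOOGLE_CLOUD_PROJECT", "adk-vertexai-rag"),
        "location": env.get("GOOGLE_CLOUD_LOCATION", "us-central1"),
    }


if __name__ == "__main__":
    # With nothing set, the hard-coded defaults apply.
    print(load_settings({}))
    # An explicit value overrides only the key it names.
    print(load_settings({"GOOGLE_CLOUD_PROJECT": "my-project"}))
```

Passing a plain mapping instead of touching `os.environ` keeps the example side-effect free, which may also be convenient when testing.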
rag-agent/tools/__init__.py ADDED
@@ -0,0 +1,27 @@
+"""
+RAG Tools package for interacting with Vertex AI RAG corpora.
+"""
+
+from .add_data import add_data
+from .delete_corpus import delete_corpus
+from .delete_document import delete_document
+from .get_corpus_info import get_corpus_info
+from .list_corpora import list_corpora
+from .rag_query import rag_query
+from .utils import (
+    check_corpus_exists,
+    create_corpus_if_not_exists,
+    get_corpus_resource_name,
+)
+
+__all__ = [
+    "add_data",
+    "list_corpora",
+    "rag_query",
+    "get_corpus_info",
+    "delete_corpus",
+    "delete_document",
+    "check_corpus_exists",
+    "create_corpus_if_not_exists",
+    "get_corpus_resource_name",
+]
rag-agent/tools/add_data.py ADDED
@@ -0,0 +1,142 @@
+"""
+Tool for adding new data sources to a Vertex AI RAG corpus.
+"""
+
+import re
+from typing import Dict, List
+
+import vertexai
+from google.adk.tools.tool_context import ToolContext
+from vertexai import rag
+
+from ..config import (
+    DEFAULT_CHUNK_OVERLAP,
+    DEFAULT_CHUNK_SIZE,
+    DEFAULT_EMBEDDING_REQUESTS_PER_MIN,
+    LOCATION,
+    PROJECT_ID,
+)
+from .utils import create_corpus_if_not_exists, get_corpus_resource_name
+
+
+def add_data(
+    corpus_name: str,
+    paths: List[str],
+    tool_context: ToolContext,
+) -> Dict:
+    """
+    Add new data sources to a Vertex AI RAG corpus.
+    If the specified corpus doesn't exist, it will be created automatically.
+
+    Args:
+        corpus_name (str): The name or full resource name of the corpus to add data to
+        paths (List[str]): List of URLs or GCS paths to add to the corpus.
+            Supported formats:
+            - Google Drive: "https://drive.google.com/file/d/{FILE_ID}/view"
+            - Google Docs/Sheets/Slides: "https://docs.google.com/{type}/d/{FILE_ID}/..."
+            - Google Cloud Storage: "gs://{BUCKET}/{PATH}"
+            Example: ["https://drive.google.com/file/d/123", "gs://my_bucket/my_files_dir"]
+        tool_context (ToolContext): The tool context
+
+    Returns:
+        dict: Information about the added data and status
+    """
+    try:
+        # Initialize Vertex AI
+        vertexai.init(project=PROJECT_ID, location=LOCATION)
+
+        # Validate and convert URLs to the proper format
+        validated_paths = []
+        invalid_paths = []
+        conversions = {}
+
+        for path in paths:
+            # Direct Google Drive file link - already valid
+            if "drive.google.com/file/d/" in path:
+                validated_paths.append(path)
+
+            # Google Cloud Storage path - already valid
+            elif path.startswith("gs://"):
+                validated_paths.append(path)
+
+            # Google Docs/Sheets/Slides links - extract ID and convert to Drive format
+            elif "docs.google.com" in path:
+                # Extract the document ID
+                match = re.search(r"/d/([a-zA-Z0-9_-]+)", path)
+                if match:
+                    file_id = match.group(1)
+                    drive_url = f"https://drive.google.com/file/d/{file_id}/view"
+                    validated_paths.append(drive_url)
+                    conversions[path] = drive_url
+                else:
+                    invalid_paths.append(path)
+
+            # Not a recognized format
+            else:
+                invalid_paths.append(path)
+
+        if not validated_paths:
+            return {
+                "status": "error",
+                "message": "No valid paths provided. Please provide Google Drive URLs, Google Docs/Sheets/Slides URLs, or GCS paths.",
+                "invalid_paths": invalid_paths,
+            }
+
+        # Check if corpus exists and create it if needed
+        corpus_created = False
+        corpus_result = create_corpus_if_not_exists(corpus_name, tool_context)
+        if not corpus_result["success"]:
+            return {
+                "status": "error",
+                "message": f"Unable to access or create corpus '{corpus_name}': {corpus_result['message']}",
+                "corpus_name": corpus_name,
+                "paths": paths,
+            }
+
+        corpus_created = corpus_result.get("was_created", False)
+
+        # Get the corpus resource name
+        corpus_resource_name = get_corpus_resource_name(corpus_name)
+
+        # Set up chunking configuration
+        transformation_config = rag.TransformationConfig(
+            chunking_config=rag.ChunkingConfig(
+                chunk_size=DEFAULT_CHUNK_SIZE,
+                chunk_overlap=DEFAULT_CHUNK_OVERLAP,
+            ),
+        )
+
+        # Import files to the corpus
+        import_result = rag.import_files(
+            corpus_resource_name,
+            validated_paths,
+            transformation_config=transformation_config,
+            max_embedding_requests_per_min=DEFAULT_EMBEDDING_REQUESTS_PER_MIN,
+        )
+
+        # Build the success message
+        creation_msg = (
+            f"Created new corpus '{corpus_name}' and " if corpus_created else ""
+        )
+        conversion_msg = ""
+        if conversions:
+            conversion_msg = " (Converted Google Docs URLs to Drive format)"
+
+        return {
+            "status": "success",
+            "message": f"{creation_msg}Successfully added {import_result.imported_rag_files_count} file(s) to corpus '{corpus_name}'{conversion_msg}",
+            "corpus_name": corpus_name,
+            "corpus_created": corpus_created,
+            "files_added": import_result.imported_rag_files_count,
+            "paths": validated_paths,
+            "invalid_paths": invalid_paths,
+            "conversions": conversions,
+        }
+
+    except Exception as e:
+        return {
+            "status": "error",
+            "message": f"Error adding data to corpus: {str(e)}",
+            "corpus_name": corpus_name,
+            "paths": paths,
+        }
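Note on `add_data.py`: the path-validation loop has no dependency on Vertex AI, so it can be exercised on its own. The sketch below lifts that loop into a standalone function using the same checks and the same regular expression as the commit. The name `normalize_paths` and its tuple return shape are assumptions made for this illustration and do not appear in the repository.

```python
import re
from typing import Dict, List, Tuple

# Same pattern add_data uses to pull the file ID out of a Docs/Sheets/Slides URL.
_FILE_ID = re.compile(r"/d/([a-zA-Z0-9_-]+)")


def normalize_paths(paths: List[str]) -> Tuple[List[str], List[str], Dict[str, str]]:
    """Split paths into (valid, invalid, conversions), mirroring add_data's loop."""
    valid: List[str] = []
    invalid: List[str] = []
    conversions: Dict[str, str] = {}

    for path in paths:
        # Drive file links and GCS URIs are passed through unchanged.
        if "drive.google.com/file/d/" in path or path.startswith("gs://"):
            valid.append(path)
        # Docs/Sheets/Slides links are rewritten to the Drive file form.
        elif "docs.google.com" in path:
            match = _FILE_ID.search(path)
            if match:
                drive_url = f"https://drive.google.com/file/d/{match.group(1)}/view"
                valid.append(drive_url)
                conversions[path] = drive_url
            else:
                invalid.append(path)
        # Anything else is reported back as invalid.
        else:
            invalid.append(path)

    return valid, invalid, conversions


if __name__ == "__main__":
    print(normalize_paths([
        "gs://my_bucket/my_files_dir",
        "https://docs.google.com/document/d/abc_123/edit",
        "https://example.com/not-supported",
    ]))
```

Keeping the validation separate from the `rag.import_files` call would make it possible to unit test the URL handling without Google Cloud credentials.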
rag-agent/tools/delete_corpus.py ADDED
@@ -0,0 +1,76 @@
+"""
+Tool for deleting a Vertex AI RAG corpus when it's no longer needed.
+"""
+
+from typing import Dict
+
+import vertexai
+from google.adk.tools.tool_context import ToolContext
+from vertexai import rag
+
+from ..config import LOCATION, PROJECT_ID
+from .utils import check_corpus_exists, get_corpus_resource_name
+
+
+def delete_corpus(
+    corpus_name: str,
+    confirm: bool,
+    tool_context: ToolContext,
+) -> Dict:
+    """
+    Delete a Vertex AI RAG corpus when it's no longer needed.
+    Requires confirmation to prevent accidental deletion.
+
+    Args:
+        corpus_name (str): The full resource name of the corpus to delete.
+            Preferably use the resource_name from list_corpora results.
+        confirm (bool): Must be set to True to confirm deletion
+        tool_context (ToolContext): The tool context
+
+    Returns:
+        dict: Status information about the deletion operation
+    """
+    try:
+        # Check if deletion is confirmed
+        if not confirm:
+            return {
+                "status": "error",
+                "message": "Deletion not confirmed. Please set confirm=True to delete the corpus.",
+                "corpus_name": corpus_name,
+            }
+
+        # Initialize Vertex AI
+        vertexai.init(project=PROJECT_ID, location=LOCATION)
+
+        # Check if corpus exists
+        if not check_corpus_exists(corpus_name, tool_context):
+            return {
+                "status": "error",
+                "message": f"Corpus '{corpus_name}' does not exist, so it cannot be deleted.",
+                "corpus_name": corpus_name,
+            }
+
+        # Get the corpus resource name
+        corpus_resource_name = get_corpus_resource_name(corpus_name)
+
+        # Delete the corpus
+        rag.delete_corpus(corpus_resource_name)
+
+        # Update state to reflect the deletion
+        state_key = f"corpus_exists_{corpus_name}"
+        if state_key in tool_context.state:
+            # Set the value to False instead of deleting the key
+            tool_context.state[state_key] = False
+
+        return {
+            "status": "success",
+            "message": f"Successfully deleted corpus '{corpus_name}'",
+            "corpus_name": corpus_name,
+        }
+
+    except Exception as e:
+        return {
+            "status": "error",
+            "message": f"Error deleting corpus: {str(e)}",
+            "corpus_name": corpus_name,
+        }
rag-agent/tools/delete_document.py ADDED
@@ -0,0 +1,68 @@
+"""
+Tool for deleting a specific document from a Vertex AI RAG corpus.
+"""
+
+from typing import Dict
+
+import vertexai
+from google.adk.tools.tool_context import ToolContext
+from vertexai import rag
+
+from ..config import LOCATION, PROJECT_ID
+from .utils import check_corpus_exists, get_corpus_resource_name
+
+
+def delete_document(
+    corpus_name: str,
+    document_id: str,
+    tool_context: ToolContext,
+) -> Dict:
+    """
+    Delete a specific document from a Vertex AI RAG corpus.
+
+    Args:
+        corpus_name (str): The full resource name of the corpus containing the document.
+            Preferably use the resource_name from list_corpora results.
+        document_id (str): The ID of the specific document/file to delete. This can be
+            obtained from get_corpus_info results.
+        tool_context (ToolContext): The tool context
+
+    Returns:
+        dict: Status information about the deletion operation
+    """
+    try:
+        # Initialize Vertex AI
+        vertexai.init(project=PROJECT_ID, location=LOCATION)
+
+        # Check if corpus exists
+        if not check_corpus_exists(corpus_name, tool_context):
+            return {
+                "status": "error",
+                "message": f"Corpus '{corpus_name}' does not exist, so the document cannot be deleted.",
+                "corpus_name": corpus_name,
+                "document_id": document_id,
+            }
+
+        # Get the corpus resource name
+        corpus_resource_name = get_corpus_resource_name(corpus_name)
+
+        # Construct the full document resource name
+        document_resource_name = f"{corpus_resource_name}/ragFiles/{document_id}"
+
+        # Delete the document
+        rag.delete_file(name=document_resource_name)
+
+        return {
+            "status": "success",
+            "message": f"Successfully deleted document '{document_id}' from corpus '{corpus_name}'",
+            "corpus_name": corpus_name,
+            "document_id": document_id,
+        }
+
+    except Exception as e:
+        return {
+            "status": "error",
+            "message": f"Error deleting document: {str(e)}",
+            "corpus_name": corpus_name,
+            "document_id": document_id,
+        }
rag-agent/tools/get_corpus_info.py ADDED
@@ -0,0 +1,118 @@
+"""
+Tool for retrieving detailed information about a specific RAG corpus.
+"""
+
+from typing import Dict
+
+import vertexai
+from google.adk.tools.tool_context import ToolContext
+from vertexai import rag
+
+from ..config import LOCATION, PROJECT_ID
+from .utils import check_corpus_exists, get_corpus_resource_name
+
+
+def get_corpus_info(
+    corpus_name: str,
+    tool_context: ToolContext,
+) -> Dict:
+    """
+    Get detailed information about a specific RAG corpus, including its files.
+
+    Args:
+        corpus_name (str): The full resource name of the corpus to get information about.
+            Preferably use the resource_name from list_corpora results.
+        tool_context (ToolContext): The tool context
+
+    Returns:
+        dict: Information about the corpus and its files
+    """
+    try:
+        # Initialize Vertex AI
+        vertexai.init(project=PROJECT_ID, location=LOCATION)
+
+        # Check if corpus exists
+        if not check_corpus_exists(corpus_name, tool_context):
+            return {
+                "status": "error",
+                "message": f"Corpus '{corpus_name}' does not exist",
+                "corpus_name": corpus_name,
+            }
+
+        # Get the corpus resource name
+        corpus_resource_name = get_corpus_resource_name(corpus_name)
+        print(f"Corpus resource name: {corpus_resource_name}")
+
+        # Try to get corpus details first
+        corpus_display_name = corpus_name  # Default if we can't get actual display name
+
+        try:
+            corpus = rag.get_corpus(corpus_resource_name)
+            if hasattr(corpus, "display_name") and corpus.display_name:
+                corpus_display_name = corpus.display_name
+        except Exception as corpus_error:
+            print(f"Error getting corpus details: {str(corpus_error)}")
+            # Just continue without corpus details
+            pass
+
+        # Process file information
+        file_details = []
+
+        try:
+            # Get files in the corpus
+            files = list(rag.list_files(corpus_resource_name))
+            print(f"Found {len(files)} files")
+
+            for file in files:
+                file_info = {
+                    "file_id": (
+                        file.name.split("/")[-1] if hasattr(file, "name") else ""
+                    ),
+                    "source_uri": (
+                        file.source_uri if hasattr(file, "source_uri") else ""
+                    ),
+                    "display_name": (
+                        file.display_name if hasattr(file, "display_name") else ""
+                    ),
+                    "create_time": (
+                        str(file.create_time) if hasattr(file, "create_time") else ""
+                    ),
+                    "update_time": (
+                        str(file.update_time) if hasattr(file, "update_time") else ""
+                    ),
+                    "mime_type": file.mime_type if hasattr(file, "mime_type") else "",
+                    "state": str(file.state) if hasattr(file, "state") else "",
+                }
+                file_details.append(file_info)
+
+        except Exception as files_error:
+            print(f"Error retrieving files: {str(files_error)}")
+            return {
+                "status": "error",
+                "message": f"Error retrieving files for corpus '{corpus_name}': {str(files_error)}",
+                "corpus_name": corpus_name,
+                "corpus_resource_name": corpus_resource_name,
+            }
+
+        # Corpus statistics
+        corpus_stats = {
+            "file_count": len(file_details),
+        }
+
+        return {
+            "status": "success",
+            "message": f"Successfully retrieved information for corpus '{corpus_name}'",
+            "corpus_name": corpus_name,
+            "corpus_resource_name": corpus_resource_name,
+            "display_name": corpus_display_name,
+            "stats": corpus_stats,
+            "files": file_details,
+        }
+
+    except Exception as e:
+        print(f"Error in get_corpus_info: {str(e)}")
+        return {
+            "status": "error",
+            "message": f"Error retrieving corpus information: {str(e)}",
+            "corpus_name": corpus_name,
+        }
rag-agent/tools/list_corpora.py ADDED
@@ -0,0 +1,68 @@
+"""
+Tool for listing all available Vertex AI RAG corpora.
+"""
+
+from typing import Dict, List, Union
+
+import vertexai
+from google.adk.tools.tool_context import ToolContext
+from vertexai import rag
+
+from ..config import LOCATION, PROJECT_ID
+
+
+def list_corpora(
+    tool_context: ToolContext,
+) -> Dict:
+    """
+    List all available Vertex AI RAG corpora.
+
+    Args:
+        tool_context (ToolContext): The tool context
+
+    Returns:
+        dict: A list of available corpora and status, with each corpus containing:
+            - resource_name: The full resource name to use with other tools
+            - display_name: The human-readable name of the corpus
+            - create_time: When the corpus was created
+            - update_time: When the corpus was last updated
+    """
+    try:
+        print("Listing corpora...", PROJECT_ID)
+        # Initialize Vertex AI
+        vertexai.init(project=PROJECT_ID, location=LOCATION)
+
+        # Get the list of corpora
+        corpora = rag.list_corpora()
+
+        # Process corpus information into a more usable format
+        corpus_info: List[Dict[str, Union[str, int]]] = []
+        for corpus in corpora:
+            corpus_data: Dict[str, Union[str, int]] = {
+                "resource_name": corpus.name,  # Full resource name for use with other tools
+                "display_name": corpus.display_name,
+                "create_time": (
+                    str(corpus.create_time) if hasattr(corpus, "create_time") else ""
+                ),
+                "update_time": (
+                    str(corpus.update_time) if hasattr(corpus, "update_time") else ""
+                ),
+            }
+
+            corpus_info.append(corpus_data)
+
+        print(f"Corpus info: {corpus_info}")
+
+        return {
+            "status": "success",
+            "message": f"Found {len(corpus_info)} corpus/corpora",
+            "corpora": corpus_info,
+            "count": len(corpus_info),
+            "note": "Use the 'resource_name' field (not 'display_name') when referencing corpora in other tools",
+        }
+
+    except Exception as e:
+        return {
+            "status": "error",
+            "message": f"Error listing corpora: {str(e)}",
+        }
rag-agent/tools/rag_query.py ADDED
@@ -0,0 +1,125 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ Tool for querying Vertex AI RAG corpora and retrieving relevant information.
3
+ """
4
+
5
+ from typing import Dict
6
+
7
+ import vertexai
8
+ from google.adk.tools.tool_context import ToolContext
9
+ from vertexai import rag
10
+
11
+ from ..config import (
12
+ DEFAULT_DISTANCE_THRESHOLD,
13
+ DEFAULT_TOP_K,
14
+ LOCATION,
15
+ PROJECT_ID,
16
+ )
17
+ from .utils import create_corpus_if_not_exists, get_corpus_resource_name
18
+
19
+
20
+ def rag_query(
21
+ corpus_name: str,
22
+ query: str,
23
+ tool_context: ToolContext,
24
+ ) -> Dict:
25
+ """
26
+ Query a Vertex AI RAG corpus with a user question and return relevant information.
27
+ If the specified corpus doesn't exist, it will be created automatically.
28
+
29
+ Args:
30
+ corpus_name (str): The full resource name of the corpus to query.
31
+ Preferably use the resource_name from list_corpora results.
32
+ query (str): The text query to search for in the corpus
33
+ tool_context (ToolContext): The tool context
34
+
35
+ Returns:
36
+         dict: The query results and status
+     """
+     try:
+         # Initialize Vertex AI
+         vertexai.init(project=PROJECT_ID, location=LOCATION)
+
+         # Check if corpus exists and create it if needed
+         corpus_result = create_corpus_if_not_exists(corpus_name, tool_context)
+         if not corpus_result["success"]:
+             return {
+                 "status": "error",
+                 "message": f"Unable to access or create corpus '{corpus_name}': {corpus_result['message']}",
+                 "query": query,
+                 "corpus_name": corpus_name,
+             }
+
+         # If corpus was created, there's no data to query yet
+         if corpus_result.get("was_created", False):
+             return {
+                 "status": "warning",
+                 "message": f"Created a new corpus '{corpus_name}', but it doesn't contain any data yet. Please add data to the corpus before querying.",
+                 "query": query,
+                 "corpus_name": corpus_name,
+                 "results": [],
+                 "results_count": 0,
+             }
+
+         # Get the corpus resource name
+         corpus_resource_name = get_corpus_resource_name(corpus_name)
+
+         # Configure retrieval parameters
+         rag_retrieval_config = rag.RagRetrievalConfig(
+             top_k=DEFAULT_TOP_K,
+             filter=rag.Filter(vector_distance_threshold=DEFAULT_DISTANCE_THRESHOLD),
+         )
+
+         # Perform the query
+         response = rag.retrieval_query(
+             rag_resources=[
+                 rag.RagResource(
+                     rag_corpus=corpus_resource_name,
+                 )
+             ],
+             text=query,
+             rag_retrieval_config=rag_retrieval_config,
+         )
+
+         # Process the response into a more usable format
+         results = []
+         if hasattr(response, "contexts") and response.contexts:
+             for ctx_group in response.contexts.contexts:
+                 result = {
+                     "source_uri": (
+                         ctx_group.source_uri if hasattr(ctx_group, "source_uri") else ""
+                     ),
+                     "source_name": (
+                         ctx_group.source_display_name
+                         if hasattr(ctx_group, "source_display_name")
+                         else ""
+                     ),
+                     "text": ctx_group.text if hasattr(ctx_group, "text") else "",
+                     "score": ctx_group.score if hasattr(ctx_group, "score") else 0.0,
+                 }
+                 results.append(result)
+
+         # If we didn't find any results
+         if not results:
+             return {
+                 "status": "warning",
+                 "message": f"No results found in corpus '{corpus_name}' for query: '{query}'",
+                 "query": query,
+                 "results": [],
+                 "results_count": 0,
+             }
+
+         return {
+             "status": "success",
+             "message": f"Successfully queried corpus '{corpus_name}'",
+             "query": query,
+             "results": results,
+             "results_count": len(results),
+         }
+
+     except Exception as e:
+         return {
+             "status": "error",
+             "message": f"Error querying corpus: {str(e)}",
+             "query": query,
+             "corpus_name": corpus_name,
+         }
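The response-processing step above can be tried without a Vertex AI project. The following is a minimal sketch, not the repository's code: `flatten_contexts` is a hypothetical helper name, and the `SimpleNamespace` objects are stand-ins that only imitate the `response.contexts.contexts` nesting the diff iterates over. It uses `getattr` with defaults in place of the repeated `hasattr` checks.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def flatten_contexts(response: Any) -> List[Dict[str, Any]]:
    """Turn a retrieval-style response into a list of plain dicts."""
    results = []
    outer = getattr(response, "contexts", None)
    # Mirror the nesting used in the diff: response.contexts.contexts.
    for ctx in getattr(outer, "contexts", None) or []:
        results.append(
            {
                "source_uri": getattr(ctx, "source_uri", ""),
                "source_name": getattr(ctx, "source_display_name", ""),
                "text": getattr(ctx, "text", ""),
                "score": getattr(ctx, "score", 0.0),
            }
        )
    return results


# Stand-in response (assumption); a real one would come from rag.retrieval_query.
fake_response = SimpleNamespace(
    contexts=SimpleNamespace(
        contexts=[
            SimpleNamespace(
                source_uri="gs://example-bucket/doc.pdf",
                source_display_name="doc.pdf",
                text="Example passage",
                score=0.42,
            ),
            # A context with missing metadata falls back to the defaults.
            SimpleNamespace(text="Passage without source metadata"),
        ]
    )
)

rows = flatten_contexts(fake_response)
```

A response with no `contexts` attribute yields an empty list, which corresponds to the "No results found" warning branch in the tool.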
rag-agent/tools/utils.py ADDED
@@ -0,0 +1,173 @@
+ """
+ Utility functions for the RAG tools.
+ """
+
+ import re
+ from typing import Any, Dict
+
+ import vertexai
+ from google.adk.tools.tool_context import ToolContext
+ from vertexai import rag
+
+ from ..config import (
+     DEFAULT_EMBEDDING_MODEL,
+     LOCATION,
+     PROJECT_ID,
+ )
+
+
+ def get_corpus_resource_name(corpus_name: str) -> str:
+     """
+     Convert a corpus name to its full resource name if needed.
+     Handles various input formats and ensures the returned name follows Vertex AI's requirements.
+
+     Args:
+         corpus_name (str): The corpus name or display name
+
+     Returns:
+         str: The full resource name of the corpus
+     """
+     print(f"Corpus name: {corpus_name}")
+
+     # If it's already a full resource name with the projects/locations/ragCorpora format
+     if re.match(r"^projects/[^/]+/locations/[^/]+/ragCorpora/[^/]+$", corpus_name):
+         return corpus_name
+
+     # Check if this is a display name of an existing corpus
+     try:
+         # Initialize Vertex AI if needed
+         vertexai.init(project=PROJECT_ID, location=LOCATION)
+
+         # List all corpora and check if there's a match with the display name
+         corpora = rag.list_corpora()
+         for corpus in corpora:
+             if hasattr(corpus, "display_name") and corpus.display_name == corpus_name:
+                 return corpus.name
+     except Exception:
+         # If we can't check, continue with the default behavior
+         pass
+
+     # If it contains partial path elements, extract just the corpus ID
+     if "/" in corpus_name:
+         # Extract the last part of the path as the corpus ID
+         corpus_id = corpus_name.split("/")[-1]
+     else:
+         corpus_id = corpus_name
+
+     # Remove any special characters that might cause issues
+     corpus_id = re.sub(r"[^a-zA-Z0-9_-]", "_", corpus_id)
+
+     # Construct the standardized resource name
+     return f"projects/{PROJECT_ID}/locations/{LOCATION}/ragCorpora/{corpus_id}"
+
+
+ def check_corpus_exists(corpus_name: str, tool_context: ToolContext) -> bool:
+     """
+     Check if a corpus with the given name exists.
+
+     Args:
+         corpus_name (str): The name of the corpus to check
+         tool_context (ToolContext): The tool context for state management
+
+     Returns:
+         bool: True if the corpus exists, False otherwise
+     """
+     # Check state first if tool_context is provided
+     if tool_context.state.get(f"corpus_exists_{corpus_name}"):
+         return True
+
+     try:
+         # Initialize Vertex AI
+         vertexai.init(project=PROJECT_ID, location=LOCATION)
+
+         # Get full resource name
+         corpus_resource_name = get_corpus_resource_name(corpus_name)
+
+         # List all corpora and check if this one exists
+         corpora = rag.list_corpora()
+         for corpus in corpora:
+             if (
+                 corpus.name == corpus_resource_name
+                 or corpus.display_name == corpus_name
+             ):
+                 # Update state
+                 tool_context.state[f"corpus_exists_{corpus_name}"] = True
+                 return True
+
+         return False
+     except Exception:
+         # If we can't check, assume it doesn't exist
+         return False
+
+
+ def create_corpus_if_not_exists(
+     corpus_name: str, tool_context: ToolContext
+ ) -> Dict[str, Any]:
+     """
+     Create a corpus if it doesn't already exist.
+
+     Args:
+         corpus_name (str): The name of the corpus to create if needed
+         tool_context (ToolContext): The tool context for state management
+
+     Returns:
+         Dict[str, Any]: Status information about the operation with the following keys:
+             - success (bool): True if the corpus was created or already exists
+             - corpus_name (str): The name of the corpus
+             - was_created (bool): Whether the corpus was newly created
+             - status (str): Status message ("success" or "error")
+             - message (str): Detailed message about the operation
+     """
+     # Check if corpus already exists
+     exists = check_corpus_exists(corpus_name, tool_context)
+     if exists:
+         return {
+             "success": True,
+             "status": "success",
+             "message": f"Corpus '{corpus_name}' already exists",
+             "corpus_name": corpus_name,
+             "was_created": False,
+         }
+
+     try:
+         # Initialize Vertex AI
+         vertexai.init(project=PROJECT_ID, location=LOCATION)
+
+         # Clean corpus name for use as display name
+         display_name = re.sub(r"[^a-zA-Z0-9_-]", "_", corpus_name)
+
+         # Configure embedding model
+         embedding_model_config = rag.RagEmbeddingModelConfig(
+             vertex_prediction_endpoint=rag.VertexPredictionEndpoint(
+                 publisher_model=DEFAULT_EMBEDDING_MODEL
+             )
+         )
+
+         # Create the corpus
+         rag_corpus = rag.create_corpus(
+             display_name=display_name,
+             backend_config=rag.RagVectorDbConfig(
+                 rag_embedding_model_config=embedding_model_config
+             ),
+         )
+
+         # Update state
+         tool_context.state[f"corpus_exists_{corpus_name}"] = True
+
+         return {
+             "success": True,
+             "status": "success",
+             "message": f"Successfully created corpus '{corpus_name}'",
+             "corpus_name": rag_corpus.name,
+             "display_name": rag_corpus.display_name,
+             "was_created": True,
+         }
+
+     except Exception as e:
+         return {
+             "success": False,
+             "status": "error",
+             "message": f"Error creating corpus: {str(e)}",
+             "corpus_name": corpus_name,
+             "was_created": False,
+         }
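The fallback order in `get_corpus_resource_name` (full resource name, then display-name match, then sanitized ID) is easier to follow when the network lookup is taken out. The sketch below is an illustration under assumptions, not the repository's function: `resolve_corpus_name` is a hypothetical name, and the optional `display_names` mapping stands in for the result of `rag.list_corpora()`.

```python
import re
from typing import Dict, Optional

FULL_NAME = re.compile(r"^projects/[^/]+/locations/[^/]+/ragCorpora/[^/]+$")


def resolve_corpus_name(
    corpus_name: str,
    project_id: str,
    location: str,
    display_names: Optional[Dict[str, str]] = None,
) -> str:
    """Resolve a corpus name using the same three-step order as the diff."""
    # 1. A full resource name passes through unchanged.
    if FULL_NAME.match(corpus_name):
        return corpus_name
    # 2. A known display name maps to its stored resource name.
    if display_names and corpus_name in display_names:
        return display_names[corpus_name]
    # 3. Otherwise keep the last path segment and replace unsafe characters.
    corpus_id = corpus_name.split("/")[-1]
    corpus_id = re.sub(r"[^a-zA-Z0-9_-]", "_", corpus_id)
    return f"projects/{project_id}/locations/{location}/ragCorpora/{corpus_id}"


# Example values below are placeholders, not real project settings.
built = resolve_corpus_name("my docs", "demo-project", "us-central1")
partial = resolve_corpus_name("ragCorpora/123", "demo-project", "us-central1")
```

Note that step 3 always returns a well-formed name even when no such corpus exists, so callers that need certainty still have to check existence separately, as `check_corpus_exists` does.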
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ google-cloud-aiplatform==1.92.0
+ google-cloud-storage==2.19.0
+ google-genai==1.14.0
+ gitpython==3.1.40
+ google-adk==0.4.0
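Because every dependency is pinned with `==`, a quick environment check can catch version drift before the agent is run. This is a rough sketch rather than part of the commit: `parse_pins` and `report_mismatches` are hypothetical helpers, the pin list is copied inline instead of read from `requirements.txt`, and only the standard-library `importlib.metadata` module is used.

```python
from importlib import metadata
from typing import Dict

# Pins copied from the requirements.txt in this commit.
PINS = """\
google-cloud-aiplatform==1.92.0
google-cloud-storage==2.19.0
google-genai==1.14.0
gitpython==3.1.40
google-adk==0.4.0
"""


def parse_pins(text: str) -> Dict[str, str]:
    """Parse 'name==version' lines, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def report_mismatches(pins: Dict[str, str]) -> Dict[str, str]:
    """Map each mismatched package to its installed version or 'not installed'."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = "not installed"
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


pins = parse_pins(PINS)
mismatches = report_mismatches(pins)
```

An empty `mismatches` dict means the active environment matches the pins; anything else lists the packages to reinstall.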