Brandon Hancock committed
Commit 55e16d5
ready for YouTube

Files changed:
- .gitignore +3 -0
- README.md +127 -0
- rag-agent/.env copy +2 -0
- rag-agent/__init__.py +1 -0
- rag-agent/agent.py +106 -0
- rag-agent/config.py +17 -0
- rag-agent/tools/__init__.py +27 -0
- rag-agent/tools/add_data.py +142 -0
- rag-agent/tools/delete_corpus.py +76 -0
- rag-agent/tools/delete_document.py +68 -0
- rag-agent/tools/get_corpus_info.py +118 -0
- rag-agent/tools/list_corpora.py +68 -0
- rag-agent/tools/rag_query.py +125 -0
- rag-agent/tools/utils.py +173 -0
- requirements.txt +5 -0
.gitignore
ADDED
@@ -0,0 +1,3 @@
.env
__pycache__/
.venv/
README.md
ADDED
@@ -0,0 +1,127 @@
# Vertex AI RAG Agent with ADK

This repository contains a Google Agent Development Kit (ADK) implementation of a Retrieval Augmented Generation (RAG) agent using Google Cloud Vertex AI.

## Overview

The Vertex AI RAG Agent allows you to:

- Query document corpora with natural language questions
- List available document corpora
- Add new documents to existing corpora
- Get detailed information about specific corpora
- Delete individual documents or entire corpora when they're no longer needed

## Prerequisites

- A Google Cloud account with billing enabled
- A Google Cloud project with the Vertex AI API enabled
- Appropriate access to create and manage Vertex AI resources
- Python 3.9+ environment

## Setting Up Google Cloud Authentication

Before running the agent, you need to set up authentication with Google Cloud:

1. **Install Google Cloud CLI**:
   - Visit [Google Cloud SDK](https://cloud.google.com/sdk/docs/install) for installation instructions for your OS
   - For macOS with Homebrew: `brew install google-cloud-sdk`
   - For Windows: Download and run the installer from the Google Cloud website
   - For Linux: Follow the distribution-specific instructions on the Google Cloud website

2. **Initialize the Google Cloud CLI**:
   ```bash
   gcloud init
   ```
   This will guide you through logging in and selecting your project.

3. **Set up Application Default Credentials**:
   ```bash
   gcloud auth application-default login
   ```
   This opens a browser window for authentication and stores the credentials in
   `~/.config/gcloud/application_default_credentials.json` (on macOS and Linux).

4. **Verify Authentication**:
   ```bash
   gcloud auth list
   gcloud config list
   ```

5. **Enable Required APIs** (if not already enabled):
   ```bash
   gcloud services enable aiplatform.googleapis.com
   ```

## Installation

1. **Clone the repository**:
   ```bash
   git clone [repository URL]
   cd [repository directory]
   ```

2. **Set up a virtual environment**:
   ```bash
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

3. **Install Dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

## Using the Agent

The agent provides the following functionality through its tools:

### 1. Query Documents
Allows you to ask questions and get answers from your document corpus:
- Automatically retrieves relevant information from the specified corpus
- Generates informative responses based on the retrieved content

### 2. List Corpora
Shows all available document corpora in your project:
- Displays corpus names and basic information
- Helps you understand what data collections are available

### 3. Add New Data
Add documents to existing corpora or create new ones:
- Supports Google Drive URLs, Google Docs/Sheets/Slides URLs (converted to Drive links), and GCS (Google Cloud Storage) paths
- Automatically creates new corpora if they don't exist

### 4. Get Corpus Information
Provides detailed information about a specific corpus:
- Shows the file count and per-file metadata such as source URI, display name, and creation time
- Useful for understanding corpus contents and structure

### 5. Delete Corpus
Removes corpora that are no longer needed:
- Requires confirmation to prevent accidental deletion
- Permanently removes the corpus and all associated files

## Troubleshooting

If you encounter issues:

- **Authentication Problems**:
  - Run `gcloud auth application-default login` again
  - Check that the account you authenticated with has the necessary IAM permissions on the project

- **API Errors**:
  - Ensure the Vertex AI API is enabled: `gcloud services enable aiplatform.googleapis.com`
  - Verify your project has billing enabled

- **Quota Issues**:
  - Check your Google Cloud Console for any quota limitations
  - Request quota increases if needed

- **Missing Dependencies**:
  - Ensure all requirements are installed: `pip install -r requirements.txt`

## Additional Resources

- [Vertex AI RAG Documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/rag-overview)
- [Google Agent Development Kit (ADK) Documentation](https://github.com/google/agents-framework)
- [Google Cloud Authentication Guide](https://cloud.google.com/docs/authentication)
rag-agent/.env copy
ADDED
@@ -0,0 +1,2 @@
GOOGLE_GENAI_USE_VERTEXAI=FALSE
GOOGLE_API_KEY=...
rag-agent/__init__.py
ADDED
@@ -0,0 +1 @@
from . import agent
rag-agent/agent.py
ADDED
@@ -0,0 +1,106 @@
from google.adk.agents import Agent

from .tools.add_data import add_data
from .tools.delete_corpus import delete_corpus
from .tools.delete_document import delete_document
from .tools.get_corpus_info import get_corpus_info
from .tools.list_corpora import list_corpora
from .tools.rag_query import rag_query

root_agent = Agent(
    name="RagAgent",
    # Using Gemini 2.5 Flash for best performance with RAG operations
    model="gemini-2.5-flash-preview-04-17",
    description="Vertex AI RAG Agent",
    tools=[
        rag_query,
        list_corpora,
        add_data,
        get_corpus_info,
        delete_corpus,
        delete_document,
    ],
    instruction="""
    # 🧠 Vertex AI RAG Agent

    You are a helpful RAG (Retrieval Augmented Generation) agent that can interact with Vertex AI's document corpora.
    You can retrieve information from corpora, list available corpora, add new documents to corpora,
    get detailed information about specific corpora, delete specific documents from corpora,
    and delete entire corpora when they're no longer needed.
    If a corpus doesn't exist when needed, it will be automatically created for the user.

    ## Your Capabilities

    1. **Query Documents**: You can answer questions by retrieving relevant information from document corpora.
    2. **List Corpora**: You can list all available document corpora to help users understand what data is available.
    3. **Add New Data**: You can add new documents (Google Drive URLs, etc.) to existing corpora.
    4. **Get Corpus Info**: You can provide detailed information about a specific corpus, including file metadata and statistics.
    5. **Delete Document**: You can delete a specific document from a corpus when it's no longer needed.
    6. **Delete Corpus**: You can delete an entire corpus and all its associated files when it's no longer needed.

    ## How to Approach User Requests

    When a user asks a question:
    1. First, determine if they want to manage corpora (list/add data/get info/delete) or query existing information.
    2. If they're asking a knowledge question, use the `rag_query` tool to search the corpus.
    3. If they're asking about available corpora, use the `list_corpora` tool.
    4. If they want to add data, ensure you know which corpus to add to, then use the `add_data` tool.
    5. If they want information about a specific corpus, use the `get_corpus_info` tool.
    6. If they want to delete a specific document, use the `delete_document` tool with confirmation.
    7. If they want to delete an entire corpus, use the `delete_corpus` tool with confirmation.

    ## Using Tools

    You have six specialized tools at your disposal:

    1. `rag_query`: Query a corpus to answer questions
       - Parameters:
         - corpus_name: The full resource name of the corpus to query (preferably use the full resource name from list_corpora results)
         - query: The text question to ask

    2. `list_corpora`: List all available corpora
       - When this tool is called, it returns the full resource names that should be used with other tools

    3. `add_data`: Add new data to a corpus (will create the corpus if it doesn't exist)
       - Parameters:
         - corpus_name: The name of the corpus to add data to (can be a simple name for new corpora)
         - paths: List of Google Drive or GCS URLs

    4. `get_corpus_info`: Get detailed information about a specific corpus
       - Parameters:
         - corpus_name: The full resource name of the corpus to get information about (preferably use the full resource name from list_corpora results)

    5. `delete_document`: Delete a specific document from a corpus
       - Parameters:
         - corpus_name: The full resource name of the corpus containing the document (preferably use the full resource name from list_corpora results)
         - document_id: The ID of the document to delete (can be obtained from get_corpus_info results)
         - confirm: Boolean flag that must be set to True to confirm deletion

    6. `delete_corpus`: Delete an entire corpus and all its associated files
       - Parameters:
         - corpus_name: The full resource name of the corpus to delete (preferably use the full resource name from list_corpora results)
         - confirm: Boolean flag that must be set to True to confirm deletion

    ## INTERNAL: Technical Implementation Details

    This section is NOT user-facing information - don't repeat these details to users:

    - Whenever possible, use the full resource name returned by the list_corpora tool when calling other tools
    - Using the full resource name instead of just the display name will ensure more reliable operation
    - Do not tell users to use full resource names in your responses - just use them internally in your tool calls

    ## Communication Guidelines

    - Be clear and concise in your responses.
    - If querying a corpus, explain which corpus you're using to answer the question.
    - If managing corpora, explain what actions you've taken.
    - When new data is added, confirm what was added and to which corpus.
    - If a corpus is created automatically, let the user know.
    - When displaying corpus information, organize it clearly for the user.
    - When deleting a document or corpus, always ask for confirmation before proceeding.
    - If an error occurs, explain what went wrong and suggest next steps.
    - When listing corpora, just provide the display names and basic information - don't tell users about resource names.

    Remember, your primary goal is to help users access and manage information through RAG capabilities.
    """,
)
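The instruction string above documents a parameter list for every tool, while the schema the model actually sees is derived from each function's signature and docstring. A quick way to catch drift between the two (for example, a documented `confirm` flag that a function does not accept) is to compare them with the standard `inspect` module. This is a minimal sketch that assumes, as the tool signatures in this commit suggest, that ADK supplies `tool_context` itself; `model_visible_params` and `check_documented_params` are hypothetical helpers, not ADK APIs, and the `delete_corpus` stub only copies the signature from `delete_corpus.py`.

```python
import inspect
from typing import Any, Callable, Dict, List

# Parameters the framework injects instead of the model (assumption: only
# tool_context is injected, matching the tool signatures in this commit).
FRAMEWORK_PARAMS = {"tool_context"}


def model_visible_params(func: Callable[..., Any]) -> List[str]:
    """Return the parameter names the model would be expected to supply."""
    return [
        name
        for name in inspect.signature(func).parameters
        if name not in FRAMEWORK_PARAMS
    ]


def check_documented_params(
    func: Callable[..., Any], documented: List[str]
) -> Dict[str, List[str]]:
    """Compare parameters named in the instruction text with the real signature."""
    visible = set(model_visible_params(func))
    return {
        "missing_from_signature": sorted(set(documented) - visible),
        "undocumented": sorted(visible - set(documented)),
    }


# Stub with the same signature as rag-agent/tools/delete_corpus.py (body omitted).
def delete_corpus(corpus_name: str, confirm: bool, tool_context: Any) -> Dict:
    return {}


print(check_documented_params(delete_corpus, ["corpus_name", "confirm"]))
```

Running the same check for every entry in `tools=[...]` against the parameter lists in the instruction string would flag any tool whose documented arguments the model cannot actually pass.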
rag-agent/config.py
ADDED
@@ -0,0 +1,17 @@
"""
Configuration settings for the RAG Agent.
"""

import os

# Vertex AI settings
PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "adk-vertexai-rag")
LOCATION = os.environ.get("GOOGLE_CLOUD_LOCATION", "us-central1")

# RAG settings
DEFAULT_CHUNK_SIZE = 512
DEFAULT_CHUNK_OVERLAP = 100
DEFAULT_TOP_K = 3
DEFAULT_DISTANCE_THRESHOLD = 0.5
DEFAULT_EMBEDDING_MODEL = "publishers/google/models/text-embedding-005"
DEFAULT_EMBEDDING_REQUESTS_PER_MIN = 1000
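`DEFAULT_CHUNK_SIZE` and `DEFAULT_CHUNK_OVERLAP` are passed to `rag.ChunkingConfig` in `add_data.py`. Vertex AI performs the chunking server-side with its own tokenizer, so the following is only an illustration of the overlap arithmetic on a plain list of tokens: with a size of 512 and an overlap of 100, each window starts 412 tokens after the previous one. `sliding_chunks` is a hypothetical helper, not part of the Vertex AI SDK.

```python
from typing import List, Sequence

DEFAULT_CHUNK_SIZE = 512
DEFAULT_CHUNK_OVERLAP = 100


def sliding_chunks(tokens: Sequence[str], size: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # distance between the starts of consecutive windows
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


tokens = [f"t{i}" for i in range(1000)]
chunks = sliding_chunks(tokens, DEFAULT_CHUNK_SIZE, DEFAULT_CHUNK_OVERLAP)
print(len(chunks), [len(c) for c in chunks])
```

Larger overlaps preserve more context across chunk boundaries at the cost of more chunks (and more embedding requests) per document.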
rag-agent/tools/__init__.py
ADDED
@@ -0,0 +1,27 @@
"""
RAG Tools package for interacting with Vertex AI RAG corpora.
"""

from .add_data import add_data
from .delete_corpus import delete_corpus
from .delete_document import delete_document
from .get_corpus_info import get_corpus_info
from .list_corpora import list_corpora
from .rag_query import rag_query
from .utils import (
    check_corpus_exists,
    create_corpus_if_not_exists,
    get_corpus_resource_name,
)

__all__ = [
    "add_data",
    "list_corpora",
    "rag_query",
    "get_corpus_info",
    "delete_corpus",
    "delete_document",
    "check_corpus_exists",
    "create_corpus_if_not_exists",
    "get_corpus_resource_name",
]
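`get_corpus_resource_name` is re-exported from `utils.py`, which is not shown in this excerpt. Vertex AI RAG corpora are addressed as `projects/{project}/locations/{location}/ragCorpora/{corpus_id}`, so a helper like it plausibly has to tell full resource names apart from shorter identifiers. The sketch below is a hypothetical stand-in (`to_corpus_resource_name`), not the repository's implementation: it treats any non-matching string as a corpus ID, whereas the real helper may resolve display names differently.

```python
import re

# Full resource name of a Vertex AI RAG corpus.
CORPUS_RESOURCE_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/ragCorpora/(?P<corpus_id>[^/]+)$"
)


def to_corpus_resource_name(name: str, project_id: str, location: str) -> str:
    """Return `name` unchanged if it is already a full resource name,
    otherwise treat it as a corpus ID and build the full name (hypothetical)."""
    if CORPUS_RESOURCE_RE.match(name):
        return name
    return f"projects/{project_id}/locations/{location}/ragCorpora/{name}"


full = to_corpus_resource_name("123", "my-project", "us-central1")
print(full)
print(to_corpus_resource_name(full, "other-project", "europe-west1"))
```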
rag-agent/tools/add_data.py
ADDED
@@ -0,0 +1,142 @@
"""
Tool for adding new data sources to a Vertex AI RAG corpus.
"""

import re
from typing import Dict, List

import vertexai
from google.adk.tools.tool_context import ToolContext
from vertexai import rag

from ..config import (
    DEFAULT_CHUNK_OVERLAP,
    DEFAULT_CHUNK_SIZE,
    DEFAULT_EMBEDDING_REQUESTS_PER_MIN,
    LOCATION,
    PROJECT_ID,
)
from .utils import create_corpus_if_not_exists, get_corpus_resource_name


def add_data(
    corpus_name: str,
    paths: List[str],
    tool_context: ToolContext,
) -> Dict:
    """
    Add new data sources to a Vertex AI RAG corpus.
    If the specified corpus doesn't exist, it will be created automatically.

    Args:
        corpus_name (str): The name or full resource name of the corpus to add data to
        paths (List[str]): List of URLs or GCS paths to add to the corpus.
                          Supported formats:
                          - Google Drive: "https://drive.google.com/file/d/{FILE_ID}/view"
                          - Google Docs/Sheets/Slides: "https://docs.google.com/{type}/d/{FILE_ID}/..."
                          - Google Cloud Storage: "gs://{BUCKET}/{PATH}"
                          Example: ["https://drive.google.com/file/d/123", "gs://my_bucket/my_files_dir"]
        tool_context (ToolContext): The tool context

    Returns:
        dict: Information about the added data and status
    """
    try:
        # Initialize Vertex AI
        vertexai.init(project=PROJECT_ID, location=LOCATION)

        # Validate and convert URLs to the proper format
        validated_paths = []
        invalid_paths = []
        conversions = {}

        for path in paths:
            # Direct Google Drive file link - already valid
            if "drive.google.com/file/d/" in path:
                validated_paths.append(path)

            # Google Cloud Storage path - already valid
            elif path.startswith("gs://"):
                validated_paths.append(path)

            # Google Docs/Sheets/Slides links - extract ID and convert to Drive format
            elif "docs.google.com" in path:
                # Extract the document ID
                match = re.search(r"/d/([a-zA-Z0-9_-]+)", path)
                if match:
                    file_id = match.group(1)
                    drive_url = f"https://drive.google.com/file/d/{file_id}/view"
                    validated_paths.append(drive_url)
                    conversions[path] = drive_url
                else:
                    invalid_paths.append(path)

            # Not a recognized format
            else:
                invalid_paths.append(path)

        if not validated_paths:
            return {
                "status": "error",
                "message": "No valid paths provided. Please provide Google Drive URLs, Google Docs/Sheets/Slides URLs, or GCS paths.",
                "invalid_paths": invalid_paths,
            }

        # Check if corpus exists and create it if needed
        corpus_created = False
        corpus_result = create_corpus_if_not_exists(corpus_name, tool_context)
        if not corpus_result["success"]:
            return {
                "status": "error",
                "message": f"Unable to access or create corpus '{corpus_name}': {corpus_result['message']}",
                "corpus_name": corpus_name,
                "paths": paths,
            }

        corpus_created = corpus_result.get("was_created", False)

        # Get the corpus resource name
        corpus_resource_name = get_corpus_resource_name(corpus_name)

        # Set up chunking configuration
        transformation_config = rag.TransformationConfig(
            chunking_config=rag.ChunkingConfig(
                chunk_size=DEFAULT_CHUNK_SIZE,
                chunk_overlap=DEFAULT_CHUNK_OVERLAP,
            ),
        )

        # Import files to the corpus
        import_result = rag.import_files(
            corpus_resource_name,
            validated_paths,
            transformation_config=transformation_config,
            max_embedding_requests_per_min=DEFAULT_EMBEDDING_REQUESTS_PER_MIN,
        )

        # Build the success message
        creation_msg = (
            f"Created new corpus '{corpus_name}' and " if corpus_created else ""
        )
        conversion_msg = ""
        if conversions:
            conversion_msg = " (Converted Google Docs URLs to Drive format)"

        return {
            "status": "success",
            "message": f"{creation_msg}Successfully added {import_result.imported_rag_files_count} file(s) to corpus '{corpus_name}'{conversion_msg}",
            "corpus_name": corpus_name,
            "corpus_created": corpus_created,
            "files_added": import_result.imported_rag_files_count,
            "paths": validated_paths,
            "invalid_paths": invalid_paths,
            "conversions": conversions,
        }

    except Exception as e:
        return {
            "status": "error",
            "message": f"Error adding data to corpus: {str(e)}",
            "corpus_name": corpus_name,
            "paths": paths,
        }
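The URL-validation loop in `add_data.py` has no Vertex AI dependency, so it can be pulled out and tested on its own. The sketch below restates the same three rules: Drive file links and `gs://` paths pass through unchanged, Google Docs/Sheets/Slides links are rewritten to a Drive `file/d/{id}/view` URL, and everything else is rejected. `normalize_paths` is a hypothetical refactoring, not a function in this commit.

```python
import re
from typing import Dict, List, Tuple

# Same ID pattern add_data uses for Docs/Sheets/Slides URLs.
DOC_ID_RE = re.compile(r"/d/([a-zA-Z0-9_-]+)")


def normalize_paths(paths: List[str]) -> Tuple[List[str], List[str], Dict[str, str]]:
    """Split paths into (valid, invalid, conversions) using add_data's rules."""
    valid: List[str] = []
    invalid: List[str] = []
    conversions: Dict[str, str] = {}
    for path in paths:
        if "drive.google.com/file/d/" in path or path.startswith("gs://"):
            valid.append(path)  # already in an accepted format
        elif "docs.google.com" in path:
            match = DOC_ID_RE.search(path)
            if match:
                drive_url = f"https://drive.google.com/file/d/{match.group(1)}/view"
                valid.append(drive_url)
                conversions[path] = drive_url
            else:
                invalid.append(path)  # Docs URL without a /d/{id} segment
        else:
            invalid.append(path)  # unrecognized host or scheme
    return valid, invalid, conversions


valid, invalid, conversions = normalize_paths([
    "https://docs.google.com/document/d/abc_123-XYZ/edit",
    "gs://my_bucket/my_files_dir",
    "https://example.com/report.pdf",
])
print(valid, invalid, conversions)
```

Keeping the rules in a pure function like this would let `add_data` stay focused on the Vertex AI calls while the URL handling gets ordinary unit tests.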
rag-agent/tools/delete_corpus.py
ADDED
@@ -0,0 +1,76 @@
"""
Tool for deleting a Vertex AI RAG corpus when it's no longer needed.
"""

from typing import Dict

import vertexai
from google.adk.tools.tool_context import ToolContext
from vertexai import rag

from ..config import LOCATION, PROJECT_ID
from .utils import check_corpus_exists, get_corpus_resource_name


def delete_corpus(
    corpus_name: str,
    confirm: bool,
    tool_context: ToolContext,
) -> Dict:
    """
    Delete a Vertex AI RAG corpus when it's no longer needed.
    Requires confirmation to prevent accidental deletion.

    Args:
        corpus_name (str): The full resource name of the corpus to delete.
                           Preferably use the resource_name from list_corpora results.
        confirm (bool): Must be set to True to confirm deletion
        tool_context (ToolContext): The tool context

    Returns:
        dict: Status information about the deletion operation
    """
    try:
        # Check if deletion is confirmed
        if not confirm:
            return {
                "status": "error",
                "message": "Deletion not confirmed. Please set confirm=True to delete the corpus.",
                "corpus_name": corpus_name,
            }

        # Initialize Vertex AI
        vertexai.init(project=PROJECT_ID, location=LOCATION)

        # Check if corpus exists
        if not check_corpus_exists(corpus_name, tool_context):
            return {
                "status": "error",
                "message": f"Corpus '{corpus_name}' does not exist, so it cannot be deleted.",
                "corpus_name": corpus_name,
            }

        # Get the corpus resource name
        corpus_resource_name = get_corpus_resource_name(corpus_name)

        # Delete the corpus
        rag.delete_corpus(corpus_resource_name)

        # Update state to reflect the deletion
        state_key = f"corpus_exists_{corpus_name}"
        if state_key in tool_context.state:
            # Set the value to False instead of deleting the key
            tool_context.state[state_key] = False

        return {
            "status": "success",
            "message": f"Successfully deleted corpus '{corpus_name}'",
            "corpus_name": corpus_name,
        }

    except Exception as e:
        return {
            "status": "error",
            "message": f"Error deleting corpus: {str(e)}",
            "corpus_name": corpus_name,
        }
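`delete_corpus` combines two small patterns: an explicit `confirm` flag checked before any API call, and a per-corpus `corpus_exists_{name}` entry in `tool_context.state` that is flipped to `False` after deletion. The sketch below isolates that control flow under two stated substitutions: a plain `dict` stands in for the ADK session state, and a callback stands in for `rag.delete_corpus`. `guarded_delete` is a hypothetical name.

```python
from typing import Callable, Dict


def guarded_delete(
    corpus_name: str,
    confirm: bool,
    state: Dict[str, bool],
    delete_fn: Callable[[str], None],
) -> Dict[str, str]:
    """Mirror delete_corpus's flow: refuse without confirmation, then delete and mark state."""
    if not confirm:
        # Nothing is deleted and the state is left untouched.
        return {"status": "error", "message": "Deletion not confirmed."}
    delete_fn(corpus_name)
    state_key = f"corpus_exists_{corpus_name}"
    if state_key in state:
        # Keep the key but flip it, as delete_corpus does.
        state[state_key] = False
    return {"status": "success", "message": f"Deleted '{corpus_name}'"}


deleted = []
state = {"corpus_exists_docs": True}
first = guarded_delete("docs", False, state, deleted.append)
second = guarded_delete("docs", True, state, deleted.append)
print(first["status"], second["status"], deleted, state)
```

Setting the flag to `False` rather than removing the key means a later existence check can read a cached "known missing" answer instead of treating the corpus as unseen.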
rag-agent/tools/delete_document.py
ADDED
@@ -0,0 +1,68 @@
"""
Tool for deleting a specific document from a Vertex AI RAG corpus.
"""

from typing import Dict

import vertexai
from google.adk.tools.tool_context import ToolContext
from vertexai import rag

from ..config import LOCATION, PROJECT_ID
from .utils import check_corpus_exists, get_corpus_resource_name


def delete_document(
    corpus_name: str,
    document_id: str,
    confirm: bool,
    tool_context: ToolContext,
) -> Dict:
    """
    Delete a specific document from a Vertex AI RAG corpus.
    Requires confirmation to prevent accidental deletion.

    Args:
        corpus_name (str): The full resource name of the corpus containing the document.
                           Preferably use the resource_name from list_corpora results.
        document_id (str): The ID of the specific document/file to delete. This can be
                           obtained from get_corpus_info results.
        confirm (bool): Must be set to True to confirm deletion
        tool_context (ToolContext): The tool context

    Returns:
        dict: Status information about the deletion operation
    """
    try:
        # Check if deletion is confirmed (the agent instruction documents this flag)
        if not confirm:
            return {
                "status": "error",
                "message": "Deletion not confirmed. Please set confirm=True to delete the document.",
                "corpus_name": corpus_name,
                "document_id": document_id,
            }

        # Initialize Vertex AI
        vertexai.init(project=PROJECT_ID, location=LOCATION)

        # Check if corpus exists
        if not check_corpus_exists(corpus_name, tool_context):
            return {
                "status": "error",
                "message": f"Corpus '{corpus_name}' does not exist, so the document cannot be deleted.",
                "corpus_name": corpus_name,
                "document_id": document_id,
            }

        # Get the corpus resource name
        corpus_resource_name = get_corpus_resource_name(corpus_name)

        # Construct the full document resource name
        document_resource_name = f"{corpus_resource_name}/ragFiles/{document_id}"

        # Delete the document
        rag.delete_file(name=document_resource_name)

        return {
            "status": "success",
            "message": f"Successfully deleted document '{document_id}' from corpus '{corpus_name}'",
            "corpus_name": corpus_name,
            "document_id": document_id,
        }

    except Exception as e:
        return {
            "status": "error",
            "message": f"Error deleting document: {str(e)}",
            "corpus_name": corpus_name,
            "document_id": document_id,
        }
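`delete_document` builds the file's resource name by appending `/ragFiles/{document_id}` to the corpus resource name, and `get_corpus_info` recovers `file_id` by taking the last path segment of `file.name`. The round trip below checks that the two conventions agree. Both function names are hypothetical helpers, and the corpus name is a placeholder.

```python
def rag_file_resource_name(corpus_resource_name: str, document_id: str) -> str:
    """Build a ragFile resource name the way delete_document does."""
    return f"{corpus_resource_name}/ragFiles/{document_id}"


def rag_file_id(file_resource_name: str) -> str:
    """Recover the document ID the way get_corpus_info does (last path segment)."""
    return file_resource_name.split("/")[-1]


corpus = "projects/my-project/locations/us-central1/ragCorpora/123"
name = rag_file_resource_name(corpus, "456")
print(name, rag_file_id(name))
```

Because the ID returned by `get_corpus_info` is exactly what `delete_document` expects, the agent can pass it through without any extra parsing.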
rag-agent/tools/get_corpus_info.py
ADDED
|
@@ -0,0 +1,118 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Tool for retrieving detailed information about a specific RAG corpus.
|
| 3 |
+
"""
|
| 4 |
+
|
| 5 |
+
from typing import Dict
|
| 6 |
+
|
| 7 |
+
import vertexai
|
| 8 |
+
from google.adk.tools.tool_context import ToolContext
|
| 9 |
+
from vertexai import rag
|
| 10 |
+
|
| 11 |
+
from ..config import LOCATION, PROJECT_ID
|
| 12 |
+
from .utils import check_corpus_exists, get_corpus_resource_name
|
| 13 |
+
|
| 14 |
+
|
| 15 |
+
def get_corpus_info(
|
    corpus_name: str,
    tool_context: ToolContext,
) -> Dict:
    """
    Get detailed information about a specific RAG corpus, including its files.

    Args:
        corpus_name (str): The full resource name of the corpus to get information about.
                           Preferably use the resource_name from list_corpora results.
        tool_context (ToolContext): The tool context

    Returns:
        dict: Information about the corpus and its files
    """
    try:
        # Initialize Vertex AI
        vertexai.init(project=PROJECT_ID, location=LOCATION)

        # Check if corpus exists
        if not check_corpus_exists(corpus_name, tool_context):
            return {
                "status": "error",
                "message": f"Corpus '{corpus_name}' does not exist",
                "corpus_name": corpus_name,
            }

        # Get the corpus resource name
        corpus_resource_name = get_corpus_resource_name(corpus_name)
        print(f"Corpus resource name: {corpus_resource_name}")

        # Try to get corpus details first
        corpus_display_name = corpus_name  # Default if we can't get actual display name

        try:
            corpus = rag.get_corpus(corpus_resource_name)
            if hasattr(corpus, "display_name") and corpus.display_name:
                corpus_display_name = corpus.display_name
        except Exception as corpus_error:
            print(f"Error getting corpus details: {str(corpus_error)}")
            # Just continue without corpus details
            pass

        # Process file information
        file_details = []

        try:
            # Get files in the corpus
            files = list(rag.list_files(corpus_resource_name))
            print(f"Found {len(files)} files")

            for file in files:
                file_info = {
                    "file_id": (
                        file.name.split("/")[-1] if hasattr(file, "name") else ""
                    ),
                    "source_uri": (
                        file.source_uri if hasattr(file, "source_uri") else ""
                    ),
                    "display_name": (
                        file.display_name if hasattr(file, "display_name") else ""
                    ),
                    "create_time": (
                        str(file.create_time) if hasattr(file, "create_time") else ""
                    ),
                    "update_time": (
                        str(file.update_time) if hasattr(file, "update_time") else ""
                    ),
                    "mime_type": file.mime_type if hasattr(file, "mime_type") else "",
                    "state": str(file.state) if hasattr(file, "state") else "",
                }
                file_details.append(file_info)

        except Exception as files_error:
            print(f"Error retrieving files: {str(files_error)}")
            return {
                "status": "error",
                "message": f"Error retrieving files for corpus '{corpus_name}': {str(files_error)}",
                "corpus_name": corpus_name,
                "corpus_resource_name": corpus_resource_name,
            }

        # Corpus statistics
        corpus_stats = {
            "file_count": len(file_details),
        }

        return {
            "status": "success",
            "message": f"Successfully retrieved information for corpus '{corpus_name}'",
            "corpus_name": corpus_name,
            "corpus_resource_name": corpus_resource_name,
            "display_name": corpus_display_name,
            "stats": corpus_stats,
            "files": file_details,
        }

    except Exception as e:
        print(f"Error in get_corpus_info: {str(e)}")
        return {
            "status": "error",
            "message": f"Error retrieving corpus information: {str(e)}",
            "corpus_name": corpus_name,
        }
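Every field in `file_info` follows the same pattern, `value if hasattr(obj, "attr") else ""`. The built-in `getattr(obj, name, default)` expresses the same fallback more compactly. The sketch below is an illustration, not code from the repository: `summarize_file` is a hypothetical helper, and `types.SimpleNamespace` objects stand in for the file objects returned by `rag.list_files`, so it runs without Google Cloud access.

```python
from types import SimpleNamespace
from typing import Any, Dict


def summarize_file(file: Any) -> Dict[str, str]:
    """Flatten a file object into the dict shape get_corpus_info builds.

    getattr(obj, name, default) returns the default when the attribute is
    missing, which matches the hasattr(...) checks in the original tool.
    """
    return {
        # Last path segment of the file's resource name; "" when there is no name
        "file_id": getattr(file, "name", "").split("/")[-1],
        "source_uri": getattr(file, "source_uri", ""),
        "display_name": getattr(file, "display_name", ""),
        "create_time": str(getattr(file, "create_time", "")),
        "update_time": str(getattr(file, "update_time", "")),
        "mime_type": getattr(file, "mime_type", ""),
        "state": str(getattr(file, "state", "")),
    }


# Hypothetical stand-in objects; real file objects may carry more fields.
full = SimpleNamespace(
    name="projects/p/locations/us-central1/ragCorpora/123/ragFiles/456",
    source_uri="gs://my-bucket/report.pdf",
    display_name="report.pdf",
)
partial = SimpleNamespace()  # no attributes at all

files = [summarize_file(f) for f in (full, partial)]
stats = {"file_count": len(files)}
print(files[0]["file_id"], stats)
```

Both forms fall back only when the attribute is absent; an attribute that exists but holds `None` is passed through unchanged in either version.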
rag-agent/tools/list_corpora.py
ADDED
@@ -0,0 +1,68 @@

```python
"""
Tool for listing all available Vertex AI RAG corpora.
"""

from typing import Dict, List, Union

import vertexai
from google.adk.tools.tool_context import ToolContext
from vertexai import rag

from ..config import LOCATION, PROJECT_ID


def list_corpora(
    tool_context: ToolContext,
) -> Dict:
    """
    List all available Vertex AI RAG corpora.

    Args:
        tool_context (ToolContext): The tool context

    Returns:
        dict: A list of available corpora and status, with each corpus containing:
            - resource_name: The full resource name to use with other tools
            - display_name: The human-readable name of the corpus
            - create_time: When the corpus was created
            - update_time: When the corpus was last updated
    """
    try:
        print("Listing corpora...", PROJECT_ID)
        # Initialize Vertex AI
        vertexai.init(project=PROJECT_ID, location=LOCATION)

        # Get the list of corpora
        corpora = rag.list_corpora()

        # Process corpus information into a more usable format
        corpus_info: List[Dict[str, Union[str, int]]] = []
        for corpus in corpora:
            corpus_data: Dict[str, Union[str, int]] = {
                "resource_name": corpus.name,  # Full resource name for use with other tools
                "display_name": corpus.display_name,
                "create_time": (
                    str(corpus.create_time) if hasattr(corpus, "create_time") else ""
                ),
                "update_time": (
                    str(corpus.update_time) if hasattr(corpus, "update_time") else ""
                ),
            }

            corpus_info.append(corpus_data)

        print(f"Corpus info: {corpus_info}")

        return {
            "status": "success",
            "message": f"Found {len(corpus_info)} corpus/corpora",
            "corpora": corpus_info,
            "count": len(corpus_info),
            "note": "Use the 'resource_name' field (not 'display_name') when referencing corpora in other tools",
        }

    except Exception as e:
        return {
            "status": "error",
            "message": f"Error listing corpora: {str(e)}",
        }
```
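The `note` field tells the agent to pass `resource_name`, not `display_name`, to the other tools. As a sketch of that lookup, the hypothetical `resolve_resource_name` helper below (not part of the repository) scans a hand-written dict shaped like the `list_corpora` return value; the project, location, and corpus IDs in it are made up.

```python
from typing import Any, Dict, Optional


def resolve_resource_name(result: Dict[str, Any], display_name: str) -> Optional[str]:
    """Return the resource_name for a display name in a list_corpora-style result.

    Returns None when the call failed or no corpus matches. If several corpora
    share a display name, the first match wins.
    """
    if result.get("status") != "success":
        return None
    for corpus in result.get("corpora", []):
        if corpus.get("display_name") == display_name:
            return corpus.get("resource_name")
    return None


# Hand-written sample shaped like the dict list_corpora returns
sample = {
    "status": "success",
    "message": "Found 2 corpus/corpora",
    "corpora": [
        {"resource_name": "projects/p/locations/us-central1/ragCorpora/111",
         "display_name": "docs", "create_time": "", "update_time": ""},
        {"resource_name": "projects/p/locations/us-central1/ragCorpora/222",
         "display_name": "notes", "create_time": "", "update_time": ""},
    ],
    "count": 2,
}

print(resolve_resource_name(sample, "notes"))
```

Returning `None` instead of raising keeps the helper in the same spirit as the tools, which report failures through their return value.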
rag-agent/tools/rag_query.py
ADDED
@@ -0,0 +1,125 @@

```python
"""
Tool for querying Vertex AI RAG corpora and retrieving relevant information.
"""

from typing import Dict

import vertexai
from google.adk.tools.tool_context import ToolContext
from vertexai import rag

from ..config import (
    DEFAULT_DISTANCE_THRESHOLD,
    DEFAULT_TOP_K,
    LOCATION,
    PROJECT_ID,
)
from .utils import create_corpus_if_not_exists, get_corpus_resource_name


def rag_query(
    corpus_name: str,
    query: str,
    tool_context: ToolContext,
) -> Dict:
    """
    Query a Vertex AI RAG corpus with a user question and return relevant information.
    If the specified corpus doesn't exist, it will be created automatically.

    Args:
        corpus_name (str): The full resource name of the corpus to query.
                           Preferably use the resource_name from list_corpora results.
        query (str): The text query to search for in the corpus
        tool_context (ToolContext): The tool context

    Returns:
        dict: The query results and status
    """
    try:
        # Initialize Vertex AI
        vertexai.init(project=PROJECT_ID, location=LOCATION)

        # Check if corpus exists and create it if needed
        corpus_result = create_corpus_if_not_exists(corpus_name, tool_context)
        if not corpus_result["success"]:
            return {
                "status": "error",
                "message": f"Unable to access or create corpus '{corpus_name}': {corpus_result['message']}",
                "query": query,
                "corpus_name": corpus_name,
            }

        # If corpus was created, there's no data to query yet
        if corpus_result.get("was_created", False):
            return {
                "status": "warning",
                "message": f"Created a new corpus '{corpus_name}', but it doesn't contain any data yet. Please add data to the corpus before querying.",
                "query": query,
                "corpus_name": corpus_name,
                "results": [],
                "results_count": 0,
            }

        # Get the corpus resource name
        corpus_resource_name = get_corpus_resource_name(corpus_name)

        # Configure retrieval parameters
        rag_retrieval_config = rag.RagRetrievalConfig(
            top_k=DEFAULT_TOP_K,
            filter=rag.Filter(vector_distance_threshold=DEFAULT_DISTANCE_THRESHOLD),
        )

        # Perform the query
        response = rag.retrieval_query(
            rag_resources=[
                rag.RagResource(
                    rag_corpus=corpus_resource_name,
                )
            ],
            text=query,
            rag_retrieval_config=rag_retrieval_config,
        )

        # Process the response into a more usable format
        results = []
        if hasattr(response, "contexts") and response.contexts:
            for ctx_group in response.contexts.contexts:
                result = {
                    "source_uri": (
                        ctx_group.source_uri if hasattr(ctx_group, "source_uri") else ""
                    ),
                    "source_name": (
                        ctx_group.source_display_name
                        if hasattr(ctx_group, "source_display_name")
                        else ""
                    ),
                    "text": ctx_group.text if hasattr(ctx_group, "text") else "",
                    "score": ctx_group.score if hasattr(ctx_group, "score") else 0.0,
                }
                results.append(result)

        # If we didn't find any results
        if not results:
            return {
                "status": "warning",
                "message": f"No results found in corpus '{corpus_name}' for query: '{query}'",
                "query": query,
                "results": [],
                "results_count": 0,
            }

        return {
            "status": "success",
            "message": f"Successfully queried corpus '{corpus_name}'",
            "query": query,
            "results": results,
            "results_count": len(results),
        }

    except Exception as e:
        return {
            "status": "error",
            "message": f"Error querying corpus: {str(e)}",
            "query": query,
            "corpus_name": corpus_name,
        }
```
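The post-processing in `rag_query` expects the retrieved chunks one level down, under `response.contexts.contexts`, and treats an empty list as a warning rather than an error. The sketch below reproduces that flattening and status decision offline. `flatten_contexts` and `to_tool_result` are hypothetical names, and the `SimpleNamespace` response is a stand-in with the same nesting, not a real `rag.retrieval_query` result; `score` is passed through unchanged.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def flatten_contexts(response: Any) -> List[Dict[str, Any]]:
    """Mirror rag_query's loop over response.contexts.contexts.

    Missing attributes fall back to "" (or 0.0 for score), as in the tool.
    """
    outer = getattr(response, "contexts", None)
    if not outer:
        return []
    return [
        {
            "source_uri": getattr(ctx, "source_uri", ""),
            "source_name": getattr(ctx, "source_display_name", ""),
            "text": getattr(ctx, "text", ""),
            "score": getattr(ctx, "score", 0.0),
        }
        for ctx in outer.contexts
    ]


def to_tool_result(corpus_name: str, query: str, results: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Pick the status the way rag_query does: empty results are a warning."""
    if not results:
        return {
            "status": "warning",
            "message": f"No results found in corpus '{corpus_name}' for query: '{query}'",
            "query": query,
            "results": [],
            "results_count": 0,
        }
    return {
        "status": "success",
        "message": f"Successfully queried corpus '{corpus_name}'",
        "query": query,
        "results": results,
        "results_count": len(results),
    }


# Stand-in response with the nesting the tool expects
fake_response = SimpleNamespace(
    contexts=SimpleNamespace(
        contexts=[
            SimpleNamespace(
                source_uri="gs://my-bucket/handbook.pdf",
                source_display_name="handbook.pdf",
                text="Expenses are reimbursed monthly.",
                score=0.42,
            ),
            SimpleNamespace(text="A chunk with no source metadata."),
        ]
    )
)

out = to_tool_result("docs", "How are expenses reimbursed?", flatten_contexts(fake_response))
empty = to_tool_result("docs", "anything", flatten_contexts(SimpleNamespace()))
print(out["status"], out["results_count"], empty["status"])
```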
rag-agent/tools/utils.py
ADDED
@@ -0,0 +1,173 @@

```python
"""
Utility functions for the RAG tools.
"""

import re
from typing import Any, Dict

import vertexai
from google.adk.tools.tool_context import ToolContext
from vertexai import rag

from ..config import (
    DEFAULT_EMBEDDING_MODEL,
    LOCATION,
    PROJECT_ID,
)


def get_corpus_resource_name(corpus_name: str) -> str:
    """
    Convert a corpus name to its full resource name if needed.
    Handles various input formats and ensures the returned name follows Vertex AI's requirements.

    Args:
        corpus_name (str): The corpus name or display name

    Returns:
        str: The full resource name of the corpus
    """
    print(f"Corpus name: {corpus_name}")

    # If it's already a full resource name with the projects/locations/ragCorpora format
    if re.match(r"^projects/[^/]+/locations/[^/]+/ragCorpora/[^/]+$", corpus_name):
        return corpus_name

    # Check if this is a display name of an existing corpus
    try:
        # Initialize Vertex AI if needed
        vertexai.init(project=PROJECT_ID, location=LOCATION)

        # List all corpora and check if there's a match with the display name
        corpora = rag.list_corpora()
        for corpus in corpora:
            if hasattr(corpus, "display_name") and corpus.display_name == corpus_name:
                return corpus.name
    except Exception:
        # If we can't check, continue with the default behavior
        pass

    # If it contains partial path elements, extract just the corpus ID
    if "/" in corpus_name:
        # Extract the last part of the path as the corpus ID
        corpus_id = corpus_name.split("/")[-1]
    else:
        corpus_id = corpus_name

    # Remove any special characters that might cause issues
    corpus_id = re.sub(r"[^a-zA-Z0-9_-]", "_", corpus_id)

    # Construct the standardized resource name
    return f"projects/{PROJECT_ID}/locations/{LOCATION}/ragCorpora/{corpus_id}"


def check_corpus_exists(corpus_name: str, tool_context: ToolContext) -> bool:
    """
    Check if a corpus with the given name exists.

    Args:
        corpus_name (str): The name of the corpus to check
        tool_context (ToolContext): The tool context for state management

    Returns:
        bool: True if the corpus exists, False otherwise
    """
    # Check state first if tool_context is provided
    if tool_context.state.get(f"corpus_exists_{corpus_name}"):
        return True

    try:
        # Initialize Vertex AI
        vertexai.init(project=PROJECT_ID, location=LOCATION)

        # Get full resource name
        corpus_resource_name = get_corpus_resource_name(corpus_name)

        # List all corpora and check if this one exists
        corpora = rag.list_corpora()
        for corpus in corpora:
            if (
                corpus.name == corpus_resource_name
                or corpus.display_name == corpus_name
            ):
                # Update state
                tool_context.state[f"corpus_exists_{corpus_name}"] = True
                return True

        return False
    except Exception:
        # If we can't check, assume it doesn't exist
        return False


def create_corpus_if_not_exists(
    corpus_name: str, tool_context: ToolContext
) -> Dict[str, Any]:
    """
    Create a corpus if it doesn't already exist.

    Args:
        corpus_name (str): The name of the corpus to create if needed
        tool_context (ToolContext): The tool context for state management

    Returns:
        Dict[str, Any]: Status information about the operation with the following keys:
            - success (bool): True if the corpus was created or already exists
            - corpus_name (str): The name of the corpus
            - was_created (bool): Whether the corpus was newly created
            - status (str): Status message ("success" or "error")
            - message (str): Detailed message about the operation
    """
    # Check if corpus already exists
    exists = check_corpus_exists(corpus_name, tool_context)
    if exists:
        return {
            "success": True,
            "status": "success",
            "message": f"Corpus '{corpus_name}' already exists",
            "corpus_name": corpus_name,
            "was_created": False,
        }

    try:
        # Initialize Vertex AI
        vertexai.init(project=PROJECT_ID, location=LOCATION)

        # Clean corpus name for use as display name
        display_name = re.sub(r"[^a-zA-Z0-9_-]", "_", corpus_name)

        # Configure embedding model
        embedding_model_config = rag.RagEmbeddingModelConfig(
            vertex_prediction_endpoint=rag.VertexPredictionEndpoint(
                publisher_model=DEFAULT_EMBEDDING_MODEL
            )
        )

        # Create the corpus
        rag_corpus = rag.create_corpus(
            display_name=display_name,
            backend_config=rag.RagVectorDbConfig(
                rag_embedding_model_config=embedding_model_config
            ),
        )

        # Update state
        tool_context.state[f"corpus_exists_{corpus_name}"] = True

        return {
            "success": True,
            "status": "success",
            "message": f"Successfully created corpus '{corpus_name}'",
            "corpus_name": rag_corpus.name,
            "display_name": rag_corpus.display_name,
            "was_created": True,
        }

    except Exception as e:
        return {
            "success": False,
            "status": "error",
            "message": f"Error creating corpus: {str(e)}",
            "corpus_name": corpus_name,
            "was_created": False,
        }
```
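`get_corpus_resource_name` resolves a name in three steps (full resource name, then display-name match, then a sanitized ID), and `check_corpus_exists` caches positive answers in `tool_context.state`. The sketch below mirrors both offline so the resolution order can be tested without Vertex AI. It is an assumption-laden stand-in, not repository code: `normalize_corpus_name` and `corpus_exists` are hypothetical names, a plain dict `known` (display name to resource name) replaces `rag.list_corpora()`, and a plain dict `state` replaces `tool_context.state`.

```python
import re
from typing import Dict

# Same pattern get_corpus_resource_name uses to spot a full resource name
FULL_NAME = re.compile(r"^projects/[^/]+/locations/[^/]+/ragCorpora/[^/]+$")


def normalize_corpus_name(
    corpus_name: str, known: Dict[str, str], project: str, location: str
) -> str:
    """Offline mirror of get_corpus_resource_name."""
    # 1. Already a full resource name: return it unchanged
    if FULL_NAME.match(corpus_name):
        return corpus_name
    # 2. Display name of an existing corpus: use that corpus's resource name
    if corpus_name in known:
        return known[corpus_name]
    # 3. Otherwise keep the last path segment and replace unsafe characters
    corpus_id = re.sub(r"[^a-zA-Z0-9_-]", "_", corpus_name.split("/")[-1])
    return f"projects/{project}/locations/{location}/ragCorpora/{corpus_id}"


def corpus_exists(
    corpus_name: str,
    state: Dict[str, bool],
    known: Dict[str, str],
    project: str,
    location: str,
) -> bool:
    """Offline mirror of check_corpus_exists; `state` plays tool_context.state."""
    key = f"corpus_exists_{corpus_name}"
    if state.get(key):  # cached positive answer, no lookup needed
        return True
    resource = normalize_corpus_name(corpus_name, known, project, location)
    # Match on resource name or display name, as the original loop does
    if resource in known.values() or corpus_name in known:
        state[key] = True  # only positive results are cached
        return True
    return False


known = {"My Docs": "projects/demo/locations/us-central1/ragCorpora/4611686018427387904"}
state: Dict[str, bool] = {}
print(normalize_corpus_name("team notes v2", known, "demo", "us-central1"))
print(corpus_exists("My Docs", state, known, "demo", "us-central1"), state)
```

Step 3 only builds a name; it does not check that a corpus with that ID exists. Because `create_corpus_if_not_exists` passes only a `display_name` to `rag.create_corpus`, the ID in the created corpus's resource name comes back from the service, which is why `check_corpus_exists` also matches on `display_name`.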
requirements.txt
ADDED
@@ -0,0 +1,5 @@

```text
google-cloud-aiplatform==1.92.0
google-cloud-storage==2.19.0
google-genai==1.14.0
gitpython==3.1.40
google-adk==0.4.0
```