# Document Reader Tools

This module provides function tools for your Innoscribe chatbot agent to read documents from local files (PDF, DOCX) and Firebase Firestore.

## Features

- **Read Local Documents**: Automatically reads `data.docx` and any PDF files from the root directory
- **Read Firestore Documents**: Reads documents from the `data` collection in Firebase Firestore
- **Auto Mode**: Tries local files first, then falls back to Firestore
- **List Available Documents**: Shows all available documents from both sources
## Setup

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

Required packages:

- `firebase-admin` - For Firebase Firestore integration
- `python-docx` - For reading DOCX files
- `PyPDF2` - For reading PDF files
### 2. Firebase Configuration

Make sure your `serviceAccount.json` file is in the root directory of the project. This file is used to authenticate with Firebase.
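
Before starting the agent, it can help to confirm that the key file is present and looks like a service account key. The following is a minimal sketch using only the standard library; `check_service_account` is a hypothetical helper, not part of this module, and the required field names are the ones normally found in a Google service account key file.

```python
import json
from pathlib import Path

# Fields normally present in a Google service account key file.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}


def check_service_account(path="serviceAccount.json"):
    """Return a list of problems found with the key file (empty if none)."""
    key_file = Path(path)
    if not key_file.is_file():
        return [f"{key_file} not found"]
    try:
        data = json.loads(key_file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{key_file} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{key_file} does not contain a JSON object"]
    # Report every required key that is absent.
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - data.keys())]
    # A present but wrong "type" usually means the wrong kind of credential file.
    if "type" in data and data["type"] != "service_account":
        problems.append('"type" should be "service_account"')
    return problems
```

Calling `check_service_account()` from the project root returns an empty list when the file passes these basic checks. It does not verify that the credentials are valid or that the account has Firestore permissions.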
### 3. Document Storage

**Local Documents:**

- Place your `data.docx` file in the root directory
- Place any PDF files in the root directory

**Firestore Documents:**

- Upload documents to the `data` collection in Firebase Firestore
- Each document should have a `content`, `text`, or `data` field containing the text
- Optionally include a `name` field for identification
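
To see which local files would qualify under the rules above, the lookup can be sketched as follows. `find_local_documents` is a hypothetical helper written for illustration. It assumes that only files directly in the root directory count and that the PDF extension is matched case-insensitively; the actual tool may behave differently.

```python
from pathlib import Path


def find_local_documents(root="."):
    """Return data.docx (if present) followed by any PDFs directly in root."""
    root_dir = Path(root)
    found = []
    docx_file = root_dir / "data.docx"
    if docx_file.is_file():
        found.append(docx_file)
    # Match the extension case-insensitively and skip subdirectories.
    pdfs = sorted(
        p for p in root_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    return found + pdfs
```

Running `find_local_documents()` from the project root is a quick way to confirm that the files are where the reader expects them.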
## Usage

### Basic Integration with Agent

```python
from agents import Agent
from config.chabot_config import model
from instructions.chatbot_instructions import innscribe_dynamic_instructions
from tools.document_reader_tool import read_document_data, list_available_documents

# Create agent with document reading tools
innscribe_assistant = Agent(
    name="Innoscribe Assistant",
    instructions=innscribe_dynamic_instructions,
    model=model,
    tools=[read_document_data, list_available_documents]
)
```
### Tool Functions

#### `read_document_data(query: str, source: str = "auto")`

Reads and searches for information from documents.

**Parameters:**

- `query`: The search query or topic to look for
- `source`: Where to read from - `"local"`, `"firestore"`, or `"auto"` (default)

**Returns:** Formatted content from matching documents

**Example:**

```python
result = read_document_data("product information", source="auto")
```

#### `list_available_documents()`

Lists all available documents from both local storage and Firestore.

**Returns:** Formatted list of available documents

**Example:**

```python
docs = list_available_documents()
print(docs)
```
## How It Works

### Automatic Fallback Strategy

1. **Auto Mode (default)**:
   - First tries to read from local files (`data.docx`, `*.pdf`)
   - If no data found, tries Firebase Firestore
   - Returns combined results if both sources have data
2. **Local Mode**:
   - Only reads from local files
3. **Firestore Mode**:
   - Only reads from Firebase Firestore
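
The three modes can be sketched as a small dispatcher. This is an illustration, not the module's actual code: `read_with_fallback` and the `readers` mapping are hypothetical names. It assumes a strict fallback in auto mode, where Firestore is consulted only when local files yield nothing, so it does not cover combining results from both sources.

```python
from typing import Callable, Dict, List

# A reader takes a query and returns the matching text snippets.
Reader = Callable[[str], List[str]]


def read_with_fallback(query: str, source: str, readers: Dict[str, Reader]) -> List[str]:
    """Dispatch on source; in "auto" mode try local first, then Firestore."""
    if source in ("local", "firestore"):
        return readers[source](query)
    if source != "auto":
        raise ValueError(f"unknown source: {source!r}")
    results = readers["local"](query)
    if not results:
        # Local files yielded nothing, so fall back to Firestore.
        results = readers["firestore"](query)
    return results


# Stub readers standing in for the real file and Firestore lookups.
stub_readers = {
    "local": lambda q: [],
    "firestore": lambda q: [f"firestore hit for {q}"],
}
print(read_with_fallback("pricing", "auto", stub_readers))
```

Passing the readers in as callables keeps the dispatch logic testable without a Firebase connection.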
### Agent Behavior

When a user asks a question requiring document data, the agent will:

1. Detect that document information is needed
2. Automatically call `read_document_data()` with the relevant query
3. Search through local files and/or Firestore
4. Return the relevant information to answer the user's question

## Example User Interactions

**User:** "What information do you have about our company?"

- Agent calls: `read_document_data("company information")`
- Returns relevant content from documents

**User:** "List all available documents"

- Agent calls: `list_available_documents()`
- Returns formatted list of all documents

**User:** "Tell me about product pricing"

- Agent calls: `read_document_data("product pricing")`
- Returns pricing information from documents
## Firestore Collection Structure

Your Firestore `data` collection should have documents structured like:

```json
{
  "name": "Product Catalog",
  "content": "This is the product information...",
  "type": "product",
  "created_at": "2024-01-01"
}
```

Or simply:

```json
{
  "text": "Document content here..."
}
```

The tool will look for `content`, `text`, or `data` fields to extract the document text.
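
The field lookup described above can be sketched on a plain dictionary, which is the shape a Firestore document takes once converted with `to_dict()`. `extract_text` is a hypothetical helper. The priority order (`content`, then `text`, then `data`) and the skipping of empty strings are assumptions, since the order is not specified here.

```python
# Candidate field names, checked in this (assumed) order of priority.
TEXT_FIELDS = ("content", "text", "data")


def extract_text(doc: dict):
    """Return the first non-empty string under content, text, or data, else None."""
    for field in TEXT_FIELDS:
        value = doc.get(field)
        if isinstance(value, str) and value.strip():
            return value
    return None


print(extract_text({"name": "Product Catalog", "content": "This is the product information..."}))
print(extract_text({"text": "Document content here..."}))
```

A document with none of the three fields returns `None`, which matches the "Documents not found" symptom when the fields are named differently.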
## Testing

Run the example usage file to test the tools:

```bash
python tools/example_usage.py
```
## Troubleshooting

**Firebase not initializing:**

- Check that `serviceAccount.json` exists in the root directory
- Verify the service account has Firestore permissions

**Documents not found:**

- Verify `data.docx` or PDF files exist in the root directory
- Check Firestore collection is named `data`
- Ensure documents have `content`, `text`, or `data` fields

**Import errors:**

- Make sure all dependencies are installed: `pip install -r requirements.txt`
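
When chasing import errors, note that the pip package names differ from the module names used in `import` statements. The sketch below reports which of the three required packages cannot be found in the current environment. `missing_packages` is a hypothetical helper, and the list covers only the packages named in this README, not everything in `requirements.txt`.

```python
from importlib.util import find_spec

# pip package name -> module name used in import statements
PACKAGES = {
    "firebase-admin": "firebase_admin",
    "python-docx": "docx",
    "PyPDF2": "PyPDF2",
}


def missing_packages(packages=PACKAGES):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for pip_name, module in packages.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it with the same interpreter that runs the agent, because a package installed into a different virtual environment shows up as missing.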