# Document Reader Tools

This module provides function tools for your Innoscribe chatbot agent to read documents from local files (PDF, DOCX) and Firebase Firestore.

## Features

- **Read Local Documents**: Automatically reads `data.docx` and any PDF files from the root directory
- **Read Firestore Documents**: Reads documents from the `data` collection in Firebase Firestore
- **Auto Mode**: Tries local files first, then falls back to Firestore
- **List Available Documents**: Shows all available documents from both sources

## Setup

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

Required packages:

- `firebase-admin` - For Firebase Firestore integration
- `python-docx` - For reading DOCX files
- `PyPDF2` - For reading PDF files

### 2. Firebase Configuration

Make sure your `serviceAccount.json` file is in the root directory of the project. This file is used to authenticate with Firebase.

### 3. Document Storage

**Local Documents:**

- Place your `data.docx` file in the root directory
- Place any PDF files in the root directory

**Firestore Documents:**

- Upload documents to the `data` collection in Firebase Firestore
- Each document should have a `content`, `text`, or `data` field containing the text
- Optionally include a `name` field for identification

## Usage

### Basic Integration with Agent

```python
from agents import Agent
from config.chabot_config import model
from instructions.chatbot_instructions import innscribe_dynamic_instructions
from tools.document_reader_tool import read_document_data, list_available_documents

# Create agent with document reading tools
innscribe_assistant = Agent(
    name="Innoscribe Assistant",
    instructions=innscribe_dynamic_instructions,
    model=model,
    tools=[read_document_data, list_available_documents]
)
```

### Tool Functions

#### `read_document_data(query: str, source: str = "auto")`

Reads and searches for information from documents.

**Parameters:**

- `query`: The search query or topic to look for
- `source`: Where to read from - `"local"`, `"firestore"`, or `"auto"` (default)

**Returns:** Formatted content from matching documents

**Example:**

```python
result = read_document_data("product information", source="auto")
```

#### `list_available_documents()`

Lists all available documents from both local storage and Firestore.

**Returns:** Formatted list of available documents

**Example:**

```python
docs = list_available_documents()
print(docs)
```

## How It Works

### Automatic Fallback Strategy

1. **Auto Mode (default)**:
   - First tries to read from local files (`data.docx`, `*.pdf`)
   - If no data found, tries Firebase Firestore
   - Returns combined results if both sources have data
2. **Local Mode**:
   - Only reads from local files
3. **Firestore Mode**:
   - Only reads from Firebase Firestore

### Agent Behavior

When a user asks a question requiring document data, the agent will:

1. Detect that document information is needed
2. Automatically call `read_document_data()` with the relevant query
3. Search through local files and/or Firestore
4. Return the relevant information to answer the user's question

## Example User Interactions

**User:** "What information do you have about our company?"

- Agent calls: `read_document_data("company information")`
- Returns relevant content from documents

**User:** "List all available documents"

- Agent calls: `list_available_documents()`
- Returns formatted list of all documents

**User:** "Tell me about product pricing"

- Agent calls: `read_document_data("product pricing")`
- Returns pricing information from documents

## Firestore Collection Structure

Your Firestore `data` collection should have documents structured like:

```json
{
  "name": "Product Catalog",
  "content": "This is the product information...",
  "type": "product",
  "created_at": "2024-01-01"
}
```

Or simply:

```json
{
  "text": "Document content here..."
}
```

The tool will look for `content`, `text`, or `data` fields to extract the document text.

## Testing

Run the example usage file to test the tools:

```bash
python tools/example_usage.py
```

## Troubleshooting

**Firebase not initializing:**

- Check that `serviceAccount.json` exists in the root directory
- Verify the service account has Firestore permissions

**Documents not found:**

- Verify `data.docx` or PDF files exist in the root directory
- Check Firestore collection is named `data`
- Ensure documents have `content`, `text`, or `data` fields

**Import errors:**

- Make sure all dependencies are installed: `pip install -r requirements.txt`
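
**Checking field lookup and fallback by hand:**

When chasing a "Documents not found" problem, it can help to reproduce the field lookup and the auto-mode fallback in isolation. The sketch below is an illustration only, not the module's actual implementation. `extract_text` and `read_auto` are hypothetical names, the two readers are passed in as plain callables so nothing here touches Firebase or the filesystem, and the field priority (`content`, then `text`, then `data`) is an assumption. It also implements only the "fall back when local returns nothing" reading of auto mode; combining results from both sources is not shown.

```python
from typing import Callable, Mapping, Optional

# Assumed priority order; this README only says these three fields are checked.
TEXT_FIELDS = ("content", "text", "data")


def extract_text(doc: Mapping[str, object]) -> Optional[str]:
    """Return the first non-empty string found in the known text fields."""
    for field in TEXT_FIELDS:
        value = doc.get(field)
        if isinstance(value, str) and value.strip():
            return value
    return None


def read_auto(
    query: str,
    read_local: Callable[[str], str],
    read_firestore: Callable[[str], str],
) -> str:
    """Try the local reader first; use Firestore only if local returns nothing."""
    local_result = read_local(query)
    if local_result.strip():
        return local_result
    return read_firestore(query)


if __name__ == "__main__":
    # Stand-in documents and readers so the sketch runs without any setup.
    docs = [
        {"name": "Product Catalog", "content": "This is the product information..."},
        {"type": "product"},  # no text field, so nothing can be extracted
    ]
    print([extract_text(d) for d in docs])
    print(read_auto("pricing", lambda q: "", lambda q: "result from Firestore"))
```

Running a Firestore document's dictionary through a check like `extract_text` quickly shows whether it carries one of the expected fields at all.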