# Document Reader Tools
This module provides function tools for your Innoscribe chatbot agent to read documents from local files (PDF, DOCX) and Firebase Firestore.
## Features
- **Read Local Documents**: Automatically reads `data.docx` and any PDF files from the root directory
- **Read Firestore Documents**: Reads documents from the `data` collection in Firebase Firestore
- **Auto Mode**: Tries local files first, then falls back to Firestore
- **List Available Documents**: Shows all available documents from both sources
## Setup
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
Required packages:
- `firebase-admin` - For Firebase Firestore integration
- `python-docx` - For reading DOCX files
- `PyPDF2` - For reading PDF files
### 2. Firebase Configuration
Make sure your `serviceAccount.json` file is in the root directory of the project. This file is used to authenticate with Firebase.
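A small pre-flight check can confirm the credentials file is where the tools expect it before Firebase is initialized. The sketch below assumes the project root is the directory passed in (the current working directory by default); `find_service_account` is a hypothetical helper, not part of this module. The commented lines show the usual firebase-admin initialization calls and are not executed here because they need the package and valid credentials.

```python
from pathlib import Path


def find_service_account(root: str = ".") -> Path:
    """Return the path to serviceAccount.json, or raise if it is missing."""
    path = Path(root) / "serviceAccount.json"
    if not path.is_file():
        raise FileNotFoundError(
            f"serviceAccount.json not found in {Path(root).resolve()}"
        )
    return path


# Typical firebase-admin initialization once the file is found
# (requires the firebase-admin package and a real service account):
#
#   import firebase_admin
#   from firebase_admin import credentials, firestore
#
#   cred = credentials.Certificate(str(find_service_account()))
#   firebase_admin.initialize_app(cred)
#   db = firestore.client()
```

Failing early with a clear `FileNotFoundError` tends to be easier to diagnose than an authentication error raised later by the first Firestore call.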
### 3. Document Storage
**Local Documents:**
- Place your `data.docx` file in the root directory
- Place any PDF files in the root directory
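To check which local files would be picked up, a discovery step along these lines may help. This is a minimal sketch, assuming only files directly in the root directory count and that PDF extensions are matched case-insensitively; `find_local_documents` is a hypothetical name, and the module's actual lookup may differ. Text extraction (with `python-docx` and `PyPDF2`) is deliberately left out.

```python
from pathlib import Path


def find_local_documents(root: str = ".") -> list[Path]:
    """Collect data.docx (if present) plus every PDF directly under root."""
    root_path = Path(root)
    found: list[Path] = []

    docx = root_path / "data.docx"
    if docx.is_file():
        found.append(docx)

    # Case-insensitive suffix match so report.PDF is picked up as well.
    pdfs = sorted(
        p for p in root_path.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    found.extend(pdfs)
    return found
```

Calling `find_local_documents()` from the project root returns `data.docx` first (when it exists), followed by the PDF files in sorted order.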
**Firestore Documents:**
- Upload documents to the `data` collection in Firebase Firestore
- Each document should have a `content`, `text`, or `data` field containing the text
- Optionally include a `name` field for identification
## Usage
### Basic Integration with Agent
```python
from agents import Agent
from config.chabot_config import model
from instructions.chatbot_instructions import innscribe_dynamic_instructions
from tools.document_reader_tool import read_document_data, list_available_documents
# Create agent with document reading tools
innscribe_assistant = Agent(
    name="Innoscribe Assistant",
    instructions=innscribe_dynamic_instructions,
    model=model,
    tools=[read_document_data, list_available_documents],
)
```
### Tool Functions
#### `read_document_data(query: str, source: str = "auto")`
Reads and searches for information from documents.
**Parameters:**
- `query`: The search query or topic to look for
- `source`: Where to read from - `"local"`, `"firestore"`, or `"auto"` (default)
**Returns:** Formatted content from matching documents
**Example:**
```python
result = read_document_data("product information", source="auto")
```
#### `list_available_documents()`
Lists all available documents from both local storage and Firestore.
**Returns:** Formatted list of available documents
**Example:**
```python
docs = list_available_documents()
print(docs)
```
## How It Works
### Automatic Fallback Strategy
1. **Auto Mode (default)**:
   - First tries to read from local files (`data.docx`, `*.pdf`)
   - If no data is found, tries Firebase Firestore
   - Returns combined results if both sources have data
2. **Local Mode**:
   - Only reads from local files
3. **Firestore Mode**:
   - Only reads from Firebase Firestore
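The dispatch between the three modes can be sketched as follows. This is an illustration, not the module's actual code: `read_with_fallback` and the two reader callables are hypothetical, and the sketch covers only the local-first fallback path (it does not merge results from both sources). An empty or whitespace-only string is assumed to mean "no data found".

```python
from typing import Callable

# A reader takes a query and returns matching text ("" when nothing matches).
Reader = Callable[[str], str]


def read_with_fallback(
    query: str,
    source: str,
    read_local: Reader,
    read_firestore: Reader,
) -> str:
    """Dispatch on source; in auto mode try local first, then Firestore."""
    if source == "local":
        return read_local(query)
    if source == "firestore":
        return read_firestore(query)
    if source != "auto":
        raise ValueError(f"unknown source: {source!r}")

    local_result = read_local(query)
    if local_result.strip():
        return local_result
    # Local files produced nothing, so fall back to Firestore.
    return read_firestore(query)
```

Passing the readers in as arguments keeps the fallback logic testable with simple stubs, without touching the filesystem or Firebase.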
### Agent Behavior
When a user asks a question requiring document data, the agent will:
1. Detect that document information is needed
2. Automatically call `read_document_data()` with the relevant query
3. Search through local files and/or Firestore
4. Return the relevant information to answer the user's question
## Example User Interactions
**User:** "What information do you have about our company?"
- Agent calls: `read_document_data("company information")`
- Returns relevant content from documents
**User:** "List all available documents"
- Agent calls: `list_available_documents()`
- Returns formatted list of all documents
**User:** "Tell me about product pricing"
- Agent calls: `read_document_data("product pricing")`
- Returns pricing information from documents
## Firestore Collection Structure
Documents in your Firestore `data` collection should be structured like this:
```json
{
  "name": "Product Catalog",
  "content": "This is the product information...",
  "type": "product",
  "created_at": "2024-01-01"
}
```
Or simply:
```json
{
  "text": "Document content here..."
}
```
The tool will look for `content`, `text`, or `data` fields to extract the document text.
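The field lookup can be pictured with a short sketch. It assumes the fields are checked in the order `content`, `text`, `data` and that only non-empty strings count; both are assumptions, since the actual priority is not stated here. `extract_text` and `TEXT_FIELDS` are hypothetical names.

```python
from typing import Any, Optional

# Assumed lookup order; the real tool may check the fields differently.
TEXT_FIELDS = ("content", "text", "data")


def extract_text(doc: dict[str, Any]) -> Optional[str]:
    """Return the first non-empty string among the known text fields."""
    for field in TEXT_FIELDS:
        value = doc.get(field)
        if isinstance(value, str) and value.strip():
            return value
    return None
```

With firebase-admin, the dictionary would typically come from a document snapshot's `to_dict()` method. A return value of `None` indicates a document that has none of the expected fields, which matches the "Documents not found" case under Troubleshooting.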
## Testing
Run the example usage file to test the tools:
```bash
python tools/example_usage.py
```
## Troubleshooting
**Firebase not initializing:**
- Check that `serviceAccount.json` exists in the root directory
- Verify the service account has Firestore permissions
**Documents not found:**
- Verify `data.docx` or PDF files exist in the root directory
- Check that the Firestore collection is named `data`
- Ensure documents have `content`, `text`, or `data` fields
**Import errors:**
- Make sure all dependencies are installed: `pip install -r requirements.txt`