Muhammad Saad
# Document Reader Tools
This module provides function tools for your Innoscribe chatbot agent to read documents from local files (PDF, DOCX) and Firebase Firestore.
## Features
- **Read Local Documents**: Automatically reads `data.docx` and any PDF files from the root directory
- **Read Firestore Documents**: Reads documents from the `data` collection in Firebase Firestore
- **Auto Mode**: Tries local files first, then falls back to Firestore
- **List Available Documents**: Shows all available documents from both sources
## Setup
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
Required packages:
- `firebase-admin` - For Firebase Firestore integration
- `python-docx` - For reading DOCX files
- `PyPDF2` - For reading PDF files
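Two of these packages are imported under different names than the ones they are installed under: `python-docx` is imported as `docx`, and `firebase-admin` as `firebase_admin`. The following small check reports which of them cannot be imported in the current environment. It is a hypothetical helper; `missing_packages` is not part of this module.

```python
import importlib.util

# pip package name -> module name used in `import` statements
REQUIRED_MODULES = {
    "firebase-admin": "firebase_admin",
    "python-docx": "docx",
    "PyPDF2": "PyPDF2",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names of packages whose top-level module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Run: pip install -r requirements.txt")
    else:
        print("All document reader dependencies are installed.")
```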
### 2. Firebase Configuration
Make sure your `serviceAccount.json` file is in the root directory of the project. This file is used to authenticate with Firebase.
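This README does not show how the module initializes Firebase. The following is a minimal sketch of initialization, assuming the standard `firebase-admin` API and a key file in the working directory. The `get_firestore_client` function is hypothetical.

```python
from pathlib import Path


def get_firestore_client(key_path="serviceAccount.json"):
    """Initialize Firebase once and return a Firestore client.

    Returns None when the service account file is missing, so callers
    can skip Firestore instead of crashing.
    """
    key_file = Path(key_path)
    if not key_file.is_file():
        return None

    # Imported lazily so this function can be defined without firebase-admin installed.
    import firebase_admin
    from firebase_admin import credentials, firestore

    try:
        # Reuse the default app if it has already been initialized.
        firebase_admin.get_app()
    except ValueError:
        cred = credentials.Certificate(str(key_file))
        firebase_admin.initialize_app(cred)

    return firestore.client()
```

Calling `get_firestore_client()` a second time reuses the existing default app instead of raising. Returning `None` for a missing key file lets a caller skip Firestore when no credentials are available.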
### 3. Document Storage
**Local Documents:**
- Place your `data.docx` file in the root directory
- Place any PDF files in the root directory
**Firestore Documents:**
- Upload documents to the `data` collection in Firebase Firestore
- Each document should have a `content`, `text`, or `data` field containing the text
- Optionally include a `name` field for identification
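The exact discovery rules used by `tools.document_reader_tool` are not documented here. The sketch below assumes a non-recursive search of a single directory and uses the `python-docx` (`docx.Document`) and `PyPDF2` (`PdfReader`) APIs to extract text. The helper names `find_local_documents` and `read_local_document` are hypothetical.

```python
from pathlib import Path


def find_local_documents(root="."):
    """Return data.docx (if present) followed by the PDFs directly in `root`."""
    root_dir = Path(root)
    docs = []
    docx_file = root_dir / "data.docx"
    if docx_file.is_file():
        docs.append(docx_file)
    # Non-recursive: only PDFs at the top level of `root` are included.
    docs.extend(sorted(root_dir.glob("*.pdf")))
    return docs


def read_local_document(path):
    """Extract plain text from a .docx or .pdf file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".docx":
        import docx  # provided by the python-docx package

        document = docx.Document(str(path))
        return "\n".join(p.text for p in document.paragraphs)
    if suffix == ".pdf":
        from PyPDF2 import PdfReader

        reader = PdfReader(str(path))
        # extract_text() may yield little or no text for scanned (image-only) pages.
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    raise ValueError(f"Unsupported file type: {path.name}")
```

`glob("*.pdf")` is case-sensitive on most Linux filesystems, so this sketch would not pick up a file named `REPORT.PDF`.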
## Usage
### Basic Integration with Agent
```python
from agents import Agent
from config.chabot_config import model
from instructions.chatbot_instructions import innscribe_dynamic_instructions
from tools.document_reader_tool import read_document_data, list_available_documents
# Create agent with document reading tools
innscribe_assistant = Agent(
name="Innoscribe Assistant",
instructions=innscribe_dynamic_instructions,
model=model,
tools=[read_document_data, list_available_documents]
)
```
### Tool Functions
#### `read_document_data(query: str, source: str = "auto")`
Reads documents and searches them for information related to the query.
**Parameters:**
- `query`: The search query or topic to look for
- `source`: Where to read from - `"local"`, `"firestore"`, or `"auto"` (default)
**Returns:** Formatted content from matching documents
**Example:**
```python
result = read_document_data("product information", source="auto")
```
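The README does not say how `query` is matched against document text. The following sketch shows one simple approach: ranking paragraphs by keyword overlap. It is illustrative only, and the `search_text` helper is hypothetical, not part of `tools.document_reader_tool`.

```python
import re


def search_text(text, query, max_results=3):
    """Return up to `max_results` paragraphs of `text`, ranked by how many
    distinct query words each one contains (simple keyword overlap)."""
    query_words = set(re.findall(r"\w+", query.lower()))
    scored = []
    for paragraph in text.splitlines():
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        words = set(re.findall(r"\w+", paragraph.lower()))
        score = len(query_words & words)
        if score:
            scored.append((score, paragraph))
    # Highest score first; the sort is stable, so ties keep document order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [paragraph for _, paragraph in scored[:max_results]]
```

Because this matches whole words only, a query for "price" would not match a paragraph that only says "pricing".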
#### `list_available_documents()`
Lists all available documents from both local storage and Firestore.
**Returns:** Formatted list of available documents
**Example:**
```python
docs = list_available_documents()
print(docs)
```
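The README does not specify the exact output format. The following is a hypothetical formatter in the same spirit that groups names by source. The `format_document_list` name and the layout are assumptions.

```python
def format_document_list(local_names, firestore_names):
    """Render two lists of document names, grouped by source, as plain text."""
    lines = ["Local documents:"]
    lines.extend(f"- {name}" for name in local_names)
    if not local_names:
        lines.append("- (none)")
    lines.append("Firestore documents:")
    lines.extend(f"- {name}" for name in firestore_names)
    if not firestore_names:
        lines.append("- (none)")
    return "\n".join(lines)


print(format_document_list(["data.docx", "brochure.pdf"], ["Product Catalog"]))
```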
## How It Works
### Automatic Fallback Strategy
1. **Auto Mode (default)**:
   - First tries to read from local files (`data.docx`, `*.pdf`)
   - If no data is found, tries Firebase Firestore
   - Returns combined results if both sources have data
2. **Local Mode**:
   - Only reads from local files
3. **Firestore Mode**:
   - Only reads from Firebase Firestore
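The mode selection above can be sketched as a small dispatcher. This is an illustration, not the module's code. `read_by_source` and its `read_local`/`read_firestore` callables are hypothetical. The sketch models only the fallback path of auto mode (local first, then Firestore when local returns nothing), not the merging of results from both sources.

```python
def read_by_source(query, source="auto", read_local=None, read_firestore=None):
    """Route a query to local files, Firestore, or auto mode (local, then Firestore).

    `read_local` and `read_firestore` take the query and return a
    (possibly empty) list of text snippets.
    """
    read_local = read_local or (lambda q: [])
    read_firestore = read_firestore or (lambda q: [])

    if source == "local":
        return read_local(query)
    if source == "firestore":
        return read_firestore(query)
    if source == "auto":
        results = read_local(query)
        # Query Firestore only when the local files produced nothing.
        return results if results else read_firestore(query)
    raise ValueError(f"source must be 'local', 'firestore', or 'auto', got {source!r}")
```

Passing the readers in as callables keeps the routing logic testable without a Firebase project or any files on disk.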
### Agent Behavior
When a user asks a question requiring document data, the agent will:
1. Detect that document information is needed
2. Automatically call `read_document_data()` with the relevant query
3. Search through local files and/or Firestore
4. Return the relevant information to answer the user's question
## Example User Interactions
**User:** "What information do you have about our company?"
- Agent calls: `read_document_data("company information")`
- Returns relevant content from documents
**User:** "List all available documents"
- Agent calls: `list_available_documents()`
- Returns formatted list of all documents
**User:** "Tell me about product pricing"
- Agent calls: `read_document_data("product pricing")`
- Returns pricing information from documents
## Firestore Collection Structure
Your Firestore `data` collection should contain documents structured like this:
```json
{
"name": "Product Catalog",
"content": "This is the product information...",
"type": "product",
"created_at": "2024-01-01"
}
```
Or simply:
```json
{
"text": "Document content here..."
}
```
The tool will look for `content`, `text`, or `data` fields to extract the document text.
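The field lookup can be sketched as follows. This sketch makes two assumptions: that the fields are checked in the order listed (`content`, then `text`, then `data`), and that the document ID stands in for a missing `name`. `extract_text` and `collect_documents` are hypothetical helpers.

```python
TEXT_FIELDS = ("content", "text", "data")


def extract_text(doc):
    """Return the first non-empty string among the known text fields, or None."""
    for field in TEXT_FIELDS:
        value = doc.get(field)
        if isinstance(value, str) and value.strip():
            return value
    return None


def collect_documents(docs):
    """Turn (doc_id, dict) pairs into (name, text) pairs, skipping documents
    without usable text. Uses the document ID when there is no `name` field."""
    collected = []
    for doc_id, data in docs:
        text = extract_text(data)
        if text is not None:
            collected.append((data.get("name", doc_id), text))
    return collected
```

With a real Firestore client `db`, the input pairs could come from `((snap.id, snap.to_dict()) for snap in db.collection("data").stream())`, since each `DocumentSnapshot` exposes an `id` attribute and a `to_dict()` method.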
## Testing
Run the example usage file to test the tools:
```bash
python tools/example_usage.py
```
## Troubleshooting
**Firebase not initializing:**
- Check that `serviceAccount.json` exists in the root directory
- Verify the service account has Firestore permissions
**Documents not found:**
- Verify `data.docx` or PDF files exist in the root directory
- Check that the Firestore collection is named `data`
- Ensure documents have `content`, `text`, or `data` fields
**Import errors:**
- Make sure all dependencies are installed: `pip install -r requirements.txt`