Document Reader Tools
This module provides function tools for your Innoscribe chatbot agent to read documents from local files (PDF, DOCX) and Firebase Firestore.
Features
- Read Local Documents: Automatically reads data.docx and any PDF files from the root directory
- Read Firestore Documents: Reads documents from the data collection in Firebase Firestore
- Auto Mode: Tries local files first, then falls back to Firestore
- List Available Documents: Shows all available documents from both sources
Setup
1. Install Dependencies
pip install -r requirements.txt
Required packages:
- firebase-admin - For Firebase Firestore integration
- python-docx - For reading DOCX files
- PyPDF2 - For reading PDF files
2. Firebase Configuration
Make sure your serviceAccount.json file is in the root directory of the project. This file is used to authenticate with Firebase.
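As an illustration of how this file is typically used, here is a minimal sketch of initializing Firestore with the firebase-admin SDK. The helper name init_firestore and its return-None-on-missing-file behavior are assumptions made for this example, not necessarily what the module does internally.

```python
from pathlib import Path


def init_firestore(key_path: str = "serviceAccount.json"):
    """Return a Firestore client, or None if the key file is missing.

    Hypothetical helper: the name and the None fallback are assumptions.
    """
    if not Path(key_path).is_file():
        # Without the key file there is nothing to authenticate with.
        return None

    # Imported lazily so the file check above works without firebase-admin.
    import firebase_admin
    from firebase_admin import credentials, firestore

    try:
        # Reuse the default app if this process already initialized one.
        firebase_admin.get_app()
    except ValueError:
        firebase_admin.initialize_app(credentials.Certificate(key_path))
    return firestore.client()
```

Returning None instead of raising lets a caller skip Firestore and rely on local files when the key is absent.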
3. Document Storage
Local Documents:
- Place your data.docx file in the root directory
- Place any PDF files in the root directory
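To make the local layout concrete, the following sketch shows one way data.docx and the PDF files could be read with python-docx and PyPDF2. The function name read_local_documents and the name-to-text dictionary it returns are hypothetical, and the PdfReader class assumes PyPDF2 2.x or later.

```python
from pathlib import Path


def read_local_documents(root: str = ".") -> dict[str, str]:
    """Map each local file name to its extracted text (hypothetical helper)."""
    texts: dict[str, str] = {}
    root_dir = Path(root)

    docx_path = root_dir / "data.docx"
    if docx_path.is_file():
        from docx import Document  # provided by the python-docx package

        document = Document(str(docx_path))
        # Join paragraph texts; tables and headers are not covered here.
        texts[docx_path.name] = "\n".join(p.text for p in document.paragraphs)

    for pdf_path in sorted(root_dir.glob("*.pdf")):
        from PyPDF2 import PdfReader  # assumes PyPDF2 2.x or later

        reader = PdfReader(str(pdf_path))
        # Image-only pages may yield no text, so fall back to "".
        texts[pdf_path.name] = "\n".join(
            page.extract_text() or "" for page in reader.pages
        )

    return texts
```

The third-party imports sit inside the branches that need them, so a directory with no documents can be scanned even before the dependencies are installed.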
Firestore Documents:
- Upload documents to the data collection in Firebase Firestore
- Each document should have a content, text, or data field containing the text
- Optionally include a name field for identification
Usage
Basic Integration with Agent
from agents import Agent
from config.chabot_config import model
from instructions.chatbot_instructions import innscribe_dynamic_instructions
from tools.document_reader_tool import read_document_data, list_available_documents
# Create agent with document reading tools
innscribe_assistant = Agent(
    name="Innoscribe Assistant",
    instructions=innscribe_dynamic_instructions,
    model=model,
    tools=[read_document_data, list_available_documents],
)
Tool Functions
read_document_data(query: str, source: str = "auto")
Reads and searches for information from documents.
Parameters:
- query: The search query or topic to look for
- source: Where to read from - "local", "firestore", or "auto" (default)
Returns: Formatted content from matching documents
Example:
result = read_document_data("product information", source="auto")
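This README does not specify how documents are matched against the query. As a rough sketch only, one simple approach is a case-insensitive keyword match over already-extracted text. The function search_documents, the any-term matching rule, and the output format are all assumptions for illustration.

```python
def search_documents(query: str, documents: dict[str, str]) -> str:
    """Return formatted text of documents containing any query term.

    Hypothetical sketch: the real tool's matching rule may differ.
    """
    terms = query.lower().split()
    sections = []
    for name, text in documents.items():
        lowered = text.lower()
        # A document matches if at least one query word appears in it.
        if any(term in lowered for term in terms):
            sections.append(f"--- {name} ---\n{text}")
    return "\n\n".join(sections) if sections else "No matching documents found."
```

Matching on any term favors recall over precision, which may suit an agent that filters the returned text itself.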
list_available_documents()
Lists all available documents from both local storage and Firestore.
Returns: Formatted list of available documents
Example:
docs = list_available_documents()
print(docs)
How It Works
Automatic Fallback Strategy
Auto Mode (default):
- First tries to read from local files (data.docx, *.pdf)
- If no data found, tries Firebase Firestore
- Returns combined results if both sources have data
Local Mode:
- Only reads from local files
Firestore Mode:
- Only reads from Firebase Firestore
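The three modes above can be sketched as a small dispatcher. This is an illustration under assumptions: read_with_fallback and the two reader callables are hypothetical names, and the auto branch follows the "local first, Firestore only if local is empty" reading. It does not model merging results from both sources.

```python
from typing import Callable, Dict

# A reader returns a mapping of document name to text (assumed shape).
Reader = Callable[[], Dict[str, str]]


def read_with_fallback(
    source: str, read_local: Reader, read_firestore: Reader
) -> Dict[str, str]:
    """Pick a document source according to the requested mode."""
    if source == "local":
        return read_local()
    if source == "firestore":
        return read_firestore()
    if source == "auto":
        local = read_local()
        # Firestore is consulted only when local files yield nothing.
        return local if local else read_firestore()
    raise ValueError(f"unknown source: {source!r}")
```

Passing the readers in as callables keeps the dispatch logic testable without touching the file system or Firebase.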
Agent Behavior
When a user asks a question requiring document data, the agent will:
- Detect that document information is needed
- Automatically call read_document_data() with the relevant query
- Search through local files and/or Firestore
- Return the relevant information to answer the user's question
Example User Interactions
User: "What information do you have about our company?"
- Agent calls: read_document_data("company information")
- Returns relevant content from documents
User: "List all available documents"
- Agent calls: list_available_documents()
- Returns formatted list of all documents
User: "Tell me about product pricing"
- Agent calls: read_document_data("product pricing")
- Returns pricing information from documents
Firestore Collection Structure
Your Firestore data collection should have documents structured like:
{
  "name": "Product Catalog",
  "content": "This is the product information...",
  "type": "product",
  "created_at": "2024-01-01"
}
Or simply:
{
  "text": "Document content here..."
}
The tool will look for content, text, or data fields to extract the document text.
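The field lookup described above might look like the following sketch. The helper extract_text is hypothetical, and the priority order (content, then text, then data) is an assumption. The README only states that these three fields are checked.

```python
from typing import Any, Mapping, Optional

# Assumed priority order; the README lists the fields but not their precedence.
TEXT_FIELDS = ("content", "text", "data")


def extract_text(doc: Mapping[str, Any]) -> Optional[str]:
    """Return the first non-empty string among the known text fields."""
    for field in TEXT_FIELDS:
        value = doc.get(field)
        if isinstance(value, str) and value.strip():
            return value
    # None signals that the document has no usable text field.
    return None
```

With the two sample documents shown earlier, the first would yield its content value and the second its text value. A document with only a name field would yield None.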
Testing
Run the example usage file to test the tools:
python tools/example_usage.py
Troubleshooting
Firebase not initializing:
- Check that serviceAccount.json exists in the root directory
- Verify the service account has Firestore permissions
Documents not found:
- Verify data.docx or PDF files exist in the root directory
- Check Firestore collection is named data
- Ensure documents have content, text, or data fields
Import errors:
- Make sure all dependencies are installed: pip install -r requirements.txt