Document Reader Tools

This module provides function tools for your Innoscribe chatbot agent to read documents from local files (PDF, DOCX) and Firebase Firestore.

Features

  • Read Local Documents: Automatically reads data.docx and any PDF files from the root directory
  • Read Firestore Documents: Reads documents from the data collection in Firebase Firestore
  • Auto Mode: Tries local files first, then falls back to Firestore
  • List Available Documents: Shows all available documents from both sources

Setup

1. Install Dependencies

pip install -r requirements.txt

Required packages:

  • firebase-admin - For Firebase Firestore integration
  • python-docx - For reading DOCX files
  • PyPDF2 - For reading PDF files

2. Firebase Configuration

Make sure your serviceAccount.json file is in the root directory of the project. This file is used to authenticate with Firebase.
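
As a rough illustration of what this step enables, the sketch below shows one way a Firestore client could be created from the key file using the standard firebase-admin calls (credentials.Certificate, initialize_app, firestore.client). The helper name get_firestore_client is hypothetical and not part of this module. It returns None when the key file is missing, on the assumption that callers would then rely on local files only.

```python
import os


def get_firestore_client(key_path="serviceAccount.json"):
    """Return a Firestore client, or None if the key file is missing.

    Hypothetical helper for illustration; not part of this module.
    """
    if not os.path.exists(key_path):
        return None

    # Imported lazily so a missing key file never requires firebase-admin.
    import firebase_admin
    from firebase_admin import credentials, firestore

    try:
        # get_app() raises ValueError when no default app exists yet.
        firebase_admin.get_app()
    except ValueError:
        firebase_admin.initialize_app(credentials.Certificate(key_path))

    return firestore.client()
```

Checking for an existing app first avoids the error that initialize_app raises when it is called twice in the same process.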

3. Document Storage

Local Documents:

  • Place your data.docx file in the root directory
  • Place any PDF files in the root directory
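
For orientation, here is a minimal sketch of how files placed this way could be read with python-docx and PyPDF2. It assumes PyPDF2 2.x or later (which provides PdfReader); the function name read_local_documents and its return shape are hypothetical and may differ from what the tool does internally.

```python
import glob
import os


def read_local_documents(root="."):
    """Return {filename: text} for data.docx and every *.pdf in `root`.

    Hypothetical helper for illustration; not part of this module.
    """
    texts = {}

    docx_path = os.path.join(root, "data.docx")
    if os.path.exists(docx_path):
        from docx import Document  # provided by the python-docx package

        doc = Document(docx_path)
        texts["data.docx"] = "\n".join(p.text for p in doc.paragraphs)

    for pdf_path in sorted(glob.glob(os.path.join(root, "*.pdf"))):
        from PyPDF2 import PdfReader  # assumes PyPDF2 2.x or later

        reader = PdfReader(pdf_path)
        # extract_text() may yield nothing for image-only pages.
        pages = [page.extract_text() or "" for page in reader.pages]
        texts[os.path.basename(pdf_path)] = "\n".join(pages)

    return texts
```

Note that this sketch only collects paragraph text from the DOCX file; text inside tables would need separate handling.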

Firestore Documents:

  • Upload documents to the data collection in Firebase Firestore
  • Each document should have a content, text, or data field containing the text
  • Optionally include a name field for identification
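
One possible way to upload such a document from Python is sketched below. It assumes db is a Firestore client obtained from firebase-admin, and uses the client's collection(), document(), and set() calls. The helper name upload_document is hypothetical.

```python
def upload_document(db, name, content, collection="data"):
    """Store one text document in Firestore and return its reference.

    `db` is assumed to be a Firestore client, for example the object
    returned by firebase_admin.firestore.client().
    Hypothetical helper for illustration; not part of this module.
    """
    payload = {"name": name, "content": content}
    # document() with no argument creates a reference with an auto-generated ID.
    doc_ref = db.collection(collection).document()
    doc_ref.set(payload)
    return doc_ref
```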

Usage

Basic Integration with Agent

from agents import Agent
from config.chabot_config import model
from instructions.chatbot_instructions import innscribe_dynamic_instructions
from tools.document_reader_tool import read_document_data, list_available_documents

# Create agent with document reading tools
innscribe_assistant = Agent(
    name="Innoscribe Assistant",
    instructions=innscribe_dynamic_instructions,
    model=model,
    tools=[read_document_data, list_available_documents]
)

Tool Functions

read_document_data(query: str, source: str = "auto")

Reads and searches for information from documents.

Parameters:

  • query: The search query or topic to look for
  • source: Where to read from - "local", "firestore", or "auto" (default)

Returns: Formatted content from matching documents

Example:

result = read_document_data("product information", source="auto")

list_available_documents()

Lists all available documents from both local storage and Firestore.

Returns: Formatted list of available documents

Example:

docs = list_available_documents()
print(docs)

How It Works

Automatic Fallback Strategy

  1. Auto Mode (default):

    • First tries to read from local files (data.docx, *.pdf)
    • If no data is found, tries Firebase Firestore
    • Returns combined results if both sources have data
  2. Local Mode:

    • Only reads from local files
  3. Firestore Mode:

    • Only reads from Firebase Firestore
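
The three modes can be sketched as a small dispatcher. This is an illustration under stated assumptions, not the tool's actual code: read_local and read_firestore are hypothetical callables that take the query and return a (possibly empty) string, and auto mode here follows the strict fallback reading, consulting Firestore only when local files return nothing. The list above also mentions combined results, so the real tool may merge both sources instead.

```python
def read_with_fallback(query, source="auto", read_local=None, read_firestore=None):
    """Dispatch a query to local files, Firestore, or local-then-Firestore.

    read_local and read_firestore are hypothetical callables supplied by
    the caller; each takes the query and returns a string.
    """
    if source not in ("auto", "local", "firestore"):
        raise ValueError(f"unknown source: {source!r}")

    if source == "local":
        return read_local(query)
    if source == "firestore":
        return read_firestore(query)

    # Auto mode: prefer local results, fall back to Firestore if empty.
    local_result = read_local(query)
    if local_result:
        return local_result
    return read_firestore(query)
```

Passing the readers in as arguments keeps the dispatch logic testable without real files or a Firebase project.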

Agent Behavior

When a user asks a question requiring document data, the agent will:

  1. Detect that document information is needed
  2. Automatically call read_document_data() with the relevant query
  3. Search through local files and/or Firestore
  4. Return the relevant information to answer the user's question

Example User Interactions

User: "What information do you have about our company?"

  • Agent calls: read_document_data("company information")
  • Returns relevant content from documents

User: "List all available documents"

  • Agent calls: list_available_documents()
  • Returns formatted list of all documents

User: "Tell me about product pricing"

  • Agent calls: read_document_data("product pricing")
  • Returns pricing information from documents

Firestore Collection Structure

Documents in your Firestore data collection should be structured like this:

{
  "name": "Product Catalog",
  "content": "This is the product information...",
  "type": "product",
  "created_at": "2024-01-01"
}

Or simply:

{
  "text": "Document content here..."
}

The tool will look for content, text, or data fields to extract the document text.
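
A minimal sketch of that lookup, assuming the fields are checked in the order content, text, data and that only non-empty strings count. The order, the string check, and the name extract_text are assumptions made for illustration.

```python
TEXT_FIELDS = ("content", "text", "data")  # assumed priority order


def extract_text(doc):
    """Return the first non-empty text field of a Firestore document dict.

    Hypothetical helper for illustration; returns None if no field matches.
    """
    for field in TEXT_FIELDS:
        value = doc.get(field)
        if isinstance(value, str) and value.strip():
            return value
    return None
```

With a real Firestore snapshot, the dictionary would typically come from the snapshot's to_dict() method.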

Testing

Run the example usage file to test the tools:

python tools/example_usage.py

Troubleshooting

Firebase not initializing:

  • Check that serviceAccount.json exists in the root directory
  • Verify the service account has Firestore permissions

Documents not found:

  • Verify data.docx or PDF files exist in the root directory
  • Check that the Firestore collection is named data
  • Ensure documents have content, text, or data fields

Import errors:

  • Make sure all dependencies are installed: pip install -r requirements.txt