# Document Reader Tools

This module provides function tools for your Innoscribe chatbot agent to read documents from local files (PDF, DOCX) and Firebase Firestore.

## Features

- **Read Local Documents**: Automatically reads `data.docx` and any PDF files from the root directory
- **Read Firestore Documents**: Reads documents from the `data` collection in Firebase Firestore
- **Auto Mode**: Tries local files first, then falls back to Firestore
- **List Available Documents**: Shows all available documents from both sources

## Setup

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

Required packages:
- `firebase-admin` - For Firebase Firestore integration
- `python-docx` - For reading DOCX files
- `PyPDF2` - For reading PDF files

### 2. Firebase Configuration

Make sure your `serviceAccount.json` file is in the root directory of the project. This file is used to authenticate with Firebase.
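
If Firebase fails to initialize, a quick pre-flight check of the key file can narrow down the cause. The following is a minimal sketch and is not part of this module: `check_service_account` and `REQUIRED_KEYS` are hypothetical names, and the check only confirms that the file exists, parses as JSON, and has the fields a Google service-account key normally carries. It does not contact Firebase or verify permissions.

```python
import json
from pathlib import Path

# Fields a Google service-account key file normally contains.
# (Hypothetical helper constant, not part of this module.)
REQUIRED_KEYS = ("type", "project_id", "private_key", "client_email")


def check_service_account(path="serviceAccount.json"):
    """Return a list of problems found with the key file (empty if none)."""
    key_file = Path(path)
    if not key_file.is_file():
        return [f"{path} not found"]
    try:
        data = json.loads(key_file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{path} does not contain a JSON object"]

    problems = [f"missing field: {key}" for key in REQUIRED_KEYS if key not in data]
    # A user or OAuth client file has a different "type" and will not work here.
    if "type" in data and data["type"] != "service_account":
        problems.append('field "type" should be "service_account"')
    return problems


if __name__ == "__main__":
    for problem in check_service_account():
        print(problem)
```

An empty list means the file looks structurally sound; any remaining failure is more likely a permissions or network issue.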

### 3. Document Storage

**Local Documents:**
- Place your `data.docx` file in the root directory
- Place any PDF files in the root directory

**Firestore Documents:**
- Upload documents to the `data` collection in Firebase Firestore
- Each document should have a `content`, `text`, or `data` field containing the text
- Optionally include a `name` field for identification
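
To confirm which local files would be picked up under the convention above, the lookup can be sketched with the standard library alone. This is an illustration, not the module's actual code: `find_local_documents` is a hypothetical name, and matching the `.pdf` extension case-insensitively is an assumption. The sketch only locates files; extracting their text would still go through `python-docx` and `PyPDF2`.

```python
from pathlib import Path


def find_local_documents(root="."):
    """Return data.docx (if present) followed by all PDFs in `root`, sorted by name."""
    root_dir = Path(root)
    if not root_dir.is_dir():
        return []

    found = []
    docx = root_dir / "data.docx"
    if docx.is_file():
        found.append(docx)

    # Assumption: match the extension case-insensitively so "Report.PDF" also counts.
    pdfs = sorted(
        p for p in root_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    found.extend(pdfs)
    return found


if __name__ == "__main__":
    for path in find_local_documents("."):
        print(path.name)
```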

## Usage

### Basic Integration with Agent

```python
from agents import Agent
from config.chabot_config import model
from instructions.chatbot_instructions import innscribe_dynamic_instructions
from tools.document_reader_tool import read_document_data, list_available_documents

# Create agent with document reading tools
innscribe_assistant = Agent(
    name="Innoscribe Assistant",
    instructions=innscribe_dynamic_instructions,
    model=model,
    tools=[read_document_data, list_available_documents]
)
```

### Tool Functions

#### `read_document_data(query: str, source: str = "auto")`

Reads the available documents and searches them for information related to the query.

**Parameters:**
- `query`: The search query or topic to look for
- `source`: Where to read from - `"local"`, `"firestore"`, or `"auto"` (default)

**Returns:** Formatted content from matching documents

**Example:**
```python
result = read_document_data("product information", source="auto")
```

#### `list_available_documents()`

Lists all available documents from both local storage and Firestore.

**Returns:** Formatted list of available documents

**Example:**
```python
docs = list_available_documents()
print(docs)
```

## How It Works

### Automatic Fallback Strategy

1. **Auto Mode (default)**:
   - First tries to read from local files (`data.docx`, `*.pdf`)
   - If no data found, tries Firebase Firestore
   - Returns combined results if both sources have data

2. **Local Mode**:
   - Only reads from local files

3. **Firestore Mode**:
   - Only reads from Firebase Firestore
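
The three modes can be sketched as a small dispatcher. This is a minimal illustration under stated assumptions, not the module's implementation: `read_with_fallback` is a hypothetical name, the two reader callables are stand-ins for the real local and Firestore readers, and auto mode is shown in its "fall back only when local returns nothing" reading.

```python
from typing import Callable

# A reader takes the query and returns matching text ("" when nothing is found).
Reader = Callable[[str], str]


def read_with_fallback(
    query: str,
    source: str,
    read_local: Reader,
    read_firestore: Reader,
) -> str:
    """Dispatch to the requested source; in auto mode, prefer local files."""
    if source == "local":
        return read_local(query)
    if source == "firestore":
        return read_firestore(query)
    if source == "auto":
        local_result = read_local(query)
        # Assumption: Firestore is queried only when local files yield nothing.
        return local_result if local_result else read_firestore(query)
    raise ValueError(f"unknown source: {source!r}")


if __name__ == "__main__":
    # Stand-in readers for demonstration only.
    empty_local = lambda q: ""
    fake_firestore = lambda q: f"firestore result for {q}"
    print(read_with_fallback("pricing", "auto", empty_local, fake_firestore))
```

Passing the readers in as arguments keeps the fallback logic testable without local files or a Firebase connection.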

### Agent Behavior

When a user asks a question requiring document data, the agent will:

1. Detect that document information is needed
2. Automatically call `read_document_data()` with the relevant query
3. Search through local files and/or Firestore
4. Return the relevant information to answer the user's question

## Example User Interactions

**User:** "What information do you have about our company?"
- Agent calls: `read_document_data("company information")`
- Returns relevant content from documents

**User:** "List all available documents"
- Agent calls: `list_available_documents()`
- Returns formatted list of all documents

**User:** "Tell me about product pricing"
- Agent calls: `read_document_data("product pricing")`
- Returns pricing information from documents

## Firestore Collection Structure

Your Firestore `data` collection should contain documents structured like this:

```json
{
  "name": "Product Catalog",
  "content": "This is the product information...",
  "type": "product",
  "created_at": "2024-01-01"
}
```

Or simply:

```json
{
  "text": "Document content here..."
}
```

The tool will look for `content`, `text`, or `data` fields to extract the document text.
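
The field lookup can be illustrated on a plain dictionary, which is the shape a Firestore document takes once converted to a dict. This is a sketch, not the tool's actual code: `extract_text` and `TEXT_FIELDS` are hypothetical names, and both the priority order (`content`, then `text`, then `data`) and the skipping of empty or non-string values are assumptions.

```python
# Assumed lookup order; the real tool may check these fields differently.
TEXT_FIELDS = ("content", "text", "data")


def extract_text(doc: dict) -> str:
    """Return the first non-empty string found in the known text fields."""
    for field in TEXT_FIELDS:
        value = doc.get(field)
        if isinstance(value, str) and value.strip():
            return value
    return ""


if __name__ == "__main__":
    sample = {"name": "Product Catalog", "content": "This is the product information..."}
    print(extract_text(sample))
```

A document with none of the three fields yields an empty string, which matches the "Documents not found" symptom described under Troubleshooting.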

## Testing

Run the example usage file to test the tools:

```bash
python tools/example_usage.py
```

## Troubleshooting

**Firebase not initializing:**
- Check that `serviceAccount.json` exists in the root directory
- Verify the service account has Firestore permissions

**Documents not found:**
- Verify `data.docx` or PDF files exist in the root directory
- Check that the Firestore collection is named `data`
- Ensure documents have `content`, `text`, or `data` fields

**Import errors:**
- Make sure all dependencies are installed: `pip install -r requirements.txt`