## Backend
This folder contains the backend services for the Document Chat App.
### `app.py`
This file is the main entry point for the Streamlit web application. It renders the user interface, manages chat history, and calls the `agent` to process user queries and generate responses.
### `main.py`
This script processes documents. It loads PDF files from the `data` directory, extracts their content (tables, text, and images), summarizes that content with the `summerizer` module, and then chunks the results and adds them to the vector store. It records processed files in `processed_files.txt` so they are not processed again.
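As an illustration of the chunking step, here is a minimal sketch of fixed-size chunking with overlap. The function name `chunk_text` and the `chunk_size` and `overlap` values are assumptions made for this example; `main.py` may well use a different splitter or different settings.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`.

    Hypothetical helper for illustration only, not taken from main.py.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk, at the cost of storing some text twice.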
### `data/`
This directory is intended to store the raw PDF documents that need to be processed by the system.
### `vectorStore/`
This directory stores the generated vector embeddings of the processed documents. These embeddings are used by the `agent` for retrieving relevant information during the chat.
### `agent/`
This module contains the logic for the conversational agent, which uses the vector store to answer questions based on the processed documents.
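To make the retrieval idea concrete, the sketch below ranks stored chunks by cosine similarity to a query embedding. It is a toy in pure Python under stated assumptions: the names `cosine_similarity` and `top_k`, and the in-memory list of `(text, embedding)` pairs, are hypothetical and stand in for whatever vector store and embedding model the `agent` actually uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the texts of the k chunks most similar to query_vec.

    `store` is a list of (chunk_text, embedding) pairs; this layout is an
    assumption for the example, not the project's real storage format.
    """
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Usage with made-up 2-dimensional embeddings:
store = [("chunk a", [1.0, 0.0]), ("chunk b", [0.0, 1.0]), ("chunk c", [1.0, 1.0])]
best = top_k([1.0, 0.0], store, k=2)
```

The retrieved chunks would then be passed to the language model as context for answering the user's question.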
### `summerizer/`
This module provides functions for summarizing the different types of content (text, images) extracted from the documents.
### `utils/`
This module contains utility functions, such as `helper.py` for loading and extracting data from documents.
### `tool/`
This module likely contains tools or functions used by the agent to perform specific tasks.
### `generator.py`
This file likely contains code related to generating responses or content within the application.
### How to Run
To run the backend application, install all dependencies, process your documents with `main.py`, and then launch `app.py` with Streamlit:
```bash
streamlit run app.py
```
### Running `main.py` for Document Embedding
To process and embed documents, run the `main.py` script. This script will load PDF files from the `data` directory, extract and summarize their contents, and then add them to the vector store.
```bash
python main.py
```
Make sure that the `data` directory contains the PDF files you want to process. The script will log processed files in `processed_files.txt` to avoid reprocessing them.
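The skip-if-already-processed behavior can be sketched as follows. This assumes `processed_files.txt` holds one file name per line, which is a guess about the format; the helper names `load_processed`, `pending_pdfs`, and `mark_processed` are hypothetical and not taken from `main.py`.

```python
from pathlib import Path


def load_processed(log_path: Path) -> set[str]:
    """Read the log of processed file names (assumed: one name per line)."""
    if not log_path.exists():
        return set()
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def pending_pdfs(data_dir: Path, log_path: Path) -> list[Path]:
    """List PDFs in data_dir whose names are not yet in the log, sorted by path."""
    done = load_processed(log_path)
    return sorted(p for p in data_dir.glob("*.pdf") if p.name not in done)


def mark_processed(log_path: Path, pdf: Path) -> None:
    """Append a file name to the log after it has been embedded successfully."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(pdf.name + "\n")
```

Under this assumed format, deleting a file's line from `processed_files.txt` (or deleting the log entirely) would cause that file to be processed again on the next run.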