| ## Backend | |
| This folder contains the backend services for the Document Chat App. | |
| ### `app.py` | |
| This file is the main entry point for the Streamlit web application. It handles the user interface, manages chat history, and passes user queries to the `agent` to generate responses. | |
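As a rough illustration of the chat history management described above, the sketch below keeps a list of role/content messages in a dict-like state object. It assumes the history is stored under a `messages` key, which is a hypothetical choice; the function name `append_message` is likewise invented for this example and is not taken from `app.py`. A plain dict stands in for Streamlit's `st.session_state`.

```python
from typing import MutableMapping


def append_message(state: MutableMapping, role: str, content: str) -> list:
    """Append one chat turn to the history kept under state["messages"].

    The "messages" key and the role/content layout are assumptions made
    for this sketch, not the actual structure used in app.py.
    """
    history = state.setdefault("messages", [])
    history.append({"role": role, "content": content})
    return history


# A plain dict stands in for a dict-like session state object here.
state = {}
append_message(state, "user", "What does the report say about Q3?")
append_message(state, "assistant", "(agent response would go here)")
```

Keeping the history in one list makes it straightforward to replay earlier turns each time the page reruns.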
| ### `main.py` | |
| This script processes documents. It loads PDF files from the `data` directory, extracts their tables, text, and images, summarizes them using the `summerizer` module, and then chunks the results and adds them to the vector store. It records processed files in `processed_files.txt` so they are not processed again. | |
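The chunking step mentioned above could look something like the following minimal sketch. It uses fixed-size character windows with overlap, which is one common approach and not necessarily the one `main.py` uses; the function name `chunk_text` and its parameters are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    This is an illustrative strategy only; the real splitter used by
    main.py is not described in this README.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


# Tiny input so the overlap is easy to see.
chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
```

Overlap lets a sentence that straddles a chunk boundary appear whole in at least one chunk, at the cost of storing some text twice.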
| ### `data/` | |
| This directory is intended to store the raw PDF documents that need to be processed by the system. | |
| ### `vectorStore/` | |
| This directory stores the generated vector embeddings of the processed documents. These embeddings are used by the `agent` for retrieving relevant information during the chat. | |
| ### `agent/` | |
| This module contains the logic for the conversational agent, which uses the vector store to answer questions based on the processed documents. | |
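To make the retrieval idea concrete, here is a minimal sketch of similarity search over embeddings using cosine similarity. The in-memory `store` list, the toy three-dimensional vectors, and the names `cosine_similarity` and `top_k` are all assumptions for illustration; the actual vector store and embedding model used by the `agent` are not specified here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k stored texts whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical (text, embedding) pairs standing in for the vector store.
store = [
    ("Revenue table summary", [0.9, 0.1, 0.0]),
    ("Methodology section", [0.1, 0.8, 0.1]),
    ("Chart of quarterly revenue", [0.8, 0.2, 0.1]),
]
results = top_k([1.0, 0.0, 0.0], store, k=2)
```

A real vector store would typically use an index to avoid scoring every entry, but the ranking principle is the same.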
| ### `summerizer/` | |
| This module provides functions for summarizing the different types of content (text, images) extracted from the documents. | |
| ### `utils/` | |
| This module contains utility code, such as `helper.py`, which loads and extracts data from documents. | |
| ### `tool/` | |
| This module likely contains tools or functions used by the agent to perform specific tasks. | |
| ### `generator.py` | |
| This file likely contains code related to generating responses or content within the application. | |
| ### How to Run | |
| To run the backend application, first install all dependencies and process your documents with `main.py`, then launch `app.py` with Streamlit. | |
| ```bash | |
| streamlit run app.py | |
| ``` | |
| ### Running `main.py` for Document Embedding | |
| To process and embed documents, run the `main.py` script. This script will load PDF files from the `data` directory, extract and summarize their contents, and then add them to the vector store. | |
| ```bash | |
| python main.py | |
| ``` | |
| Make sure that the `data` directory contains the PDF files you want to process. The script will log processed files in `processed_files.txt` to avoid reprocessing them. | |
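The skip-if-already-processed behavior can be sketched as follows. This assumes `processed_files.txt` holds one file name per line, which is a guess at the format; the helper names `load_processed`, `find_unprocessed`, and `mark_processed` are hypothetical and not taken from `main.py`. The demo runs in a temporary directory so it does not touch real data.

```python
import tempfile
from pathlib import Path


def load_processed(log_path: Path) -> set[str]:
    """Read the log of processed file names (assumed: one name per line)."""
    if not log_path.exists():
        return set()
    lines = log_path.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def find_unprocessed(data_dir: Path, log_path: Path) -> list[Path]:
    """Return PDFs in data_dir whose names are not yet in the log."""
    done = load_processed(log_path)
    return sorted(p for p in data_dir.glob("*.pdf") if p.name not in done)


def mark_processed(log_path: Path, pdf_path: Path) -> None:
    """Append a file name to the log once it has been handled."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(pdf_path.name + "\n")


# Demo with empty placeholder files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    data_dir.mkdir()
    (data_dir / "a.pdf").touch()
    (data_dir / "b.pdf").touch()
    log_path = Path(tmp) / "processed_files.txt"

    first_run = [p.name for p in find_unprocessed(data_dir, log_path)]
    mark_processed(log_path, data_dir / "a.pdf")
    second_run = [p.name for p in find_unprocessed(data_dir, log_path)]
```

Under this scheme, deleting a line from the log (or the whole file) would cause the corresponding PDFs to be picked up again on the next run.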