Spaces:

sami606713
/

RagChatbot

Configuration error

App Files Files Community

RagChatbot / README.md

sami606713

Upload 17 files

27a8994 verified 7 months ago

preview code

raw

history blame contribute delete

2.31 kB

	## Backend

	This folder contains the backend services for the Document Chat App.

	### `app.py`
	This file is the main entry point for the Streamlit web application. It handles the user interface, chat history management, and interacts with the `agent` to process user queries and generate responses.

	### `main.py`
	This script is responsible for processing documents. It loads and extracts data (tables, texts, images) from PDF files in the `data` directory, summarizes them using the `summerizer` module, and then chunks and adds the processed documents to the vector store. It keeps track of processed files in `processed_files.txt` to avoid reprocessing.

	### `data/`
	This directory is intended to store the raw PDF documents that need to be processed by the system.

	### `vectorStore/`
	This directory stores the generated vector embeddings of the processed documents. These embeddings are used by the `agent` for retrieving relevant information during the chat.

	### `agent/`
	This module contains the logic for the conversational agent, which uses the vector store to answer questions based on the processed documents.

	### `summerizer/`
	This module provides functionalities for summarizing different types of content (text, images) extracted from the documents.

	### `utils/`
	This module contains utility functions, such as `helper.py` for loading and extracting data from documents.

	### `tool/`
	This module likely contains tools or functions used by the agent to perform specific tasks.

	### `generator.py`
	This file likely contains code related to generating responses or content within the application.

	### How to Run
	To run the backend application, you will typically run `app.py` using Streamlit after ensuring all dependencies are installed and documents are processed by `main.py`.

	```bash
	streamlit run app.py
	```

	### Running `main.py` for Document Embedding
	To process and embed documents, run the `main.py` script. This script will load PDF files from the `data` directory, extract and summarize their contents, and then add them to the vector store.

	```bash
	python main.py
	```

	Make sure that the `data` directory contains the PDF files you want to process. The script will log processed files in `processed_files.txt` to avoid reprocessing them.