Delete README.md
---
title: Rag-System-with-Gemini
sdk: streamlit
sdk_version: 1.50.0
app_file: app.py
pinned: false
license: mit
---
README.md
DELETED
@@ -1,127 +0,0 @@

# Agentic RAG Streamlit Application

This project implements a Retrieval-Augmented Generation (RAG) system using **Gemini** and **Streamlit**. Users can ingest data from PDF files and web URLs, ask questions, and receive answers generated by a **Large Language Model (LLM)** that draws on the ingested context and, optionally, web search results.

![image](https://github.com/user-attachments/assets/4a6ddcc5-e2cb-45d4-80dc-b17c7ed7ec4c)

### How it works

* The user uploads PDF documents or provides web URLs. These documents are processed and stored in the **Chroma** vector database.
* When the user submits a query, it is first sent to a **Rewrite Agent**, which analyzes and reformulates the original query to improve its clarity and effectiveness for retrieval.
* The rewritten query is forwarded to the LLM. The LLM searches the vector database (**Chroma**), retrieving relevant text chunks based on semantic similarity. In parallel, or depending on configuration, it can also use web search (**DuckDuckGo**) to gather information not present in the uploaded documents. If no specific context is found, the LLM answers from its general knowledge.
* The generated response is sent back to the Streamlit interface, where it is displayed to the user.
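
The flow above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: `answer_query`, `Answer`, and the four callables passed in are hypothetical names, and the real application wires these steps through Agno agents, Chroma, and DuckDuckGo rather than plain functions.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Answer:
    text: str
    source: str  # "documents", "web", or "general knowledge"


def answer_query(
    query: str,
    rewrite: Callable[[str], str],
    search_documents: Callable[[str], Sequence[str]],
    search_web: Callable[[str], Sequence[str]],
    generate: Callable[[str, Sequence[str]], str],
    use_web_search: bool = True,
) -> Answer:
    """Run one query through rewrite -> retrieve -> (web search) -> generate."""
    rewritten = rewrite(query)

    # Prefer chunks retrieved from the vector store.
    context = list(search_documents(rewritten))
    source = "documents"

    # Fall back to web search when nothing relevant was retrieved.
    if not context and use_web_search:
        context = list(search_web(rewritten))
        source = "web"

    # With no context at all, the model answers from general knowledge.
    if not context:
        source = "general knowledge"

    return Answer(text=generate(rewritten, context), source=source)
```

Passing the steps in as callables keeps the sketch independent of any particular library, so each stage can be replaced by a stub when testing the control flow.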

## Features

* **Data Ingestion:** Upload PDF files or enter web URLs to populate the knowledge base.
* **Persistent Vector Store:** Uses **ChromaDB** to store and retrieve text embeddings locally.
* **Query Rewriting:** Uses an **Agno** agent to reformulate user questions for potentially better retrieval results.
* **Retrieval-Augmented Generation (RAG):**
    * Retrieves relevant text chunks from the **ChromaDB** vector store based on the (rewritten) query.
    * Uses a RAG agent (**Gemini**) to synthesize an answer based on the retrieved context.
* **Web Search:** Optionally performs a web search via **DuckDuckGo** if:
    * No relevant documents are found in the local vector store.
    * Web search is explicitly forced via the UI.
* **Configuration:** Lets users:
    * Enable or disable web search.
    * Force web search.
    * Adjust the similarity score threshold for document retrieval.
* **Database Management:** Options to clear chat history and the vector database.
* **Dockerized:** Includes a `Dockerfile` for easy containerization and deployment.
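
The interaction between the similarity threshold and the web search options might look roughly like the sketch below. All names here (`RetrievalSettings`, `filter_by_score`, `should_search_web`) are hypothetical, the default threshold is illustrative, and two assumptions are made that the README does not confirm: that a higher score means a closer match, and that forcing web search has no effect while web search is disabled.

```python
from dataclasses import dataclass


@dataclass
class RetrievalSettings:
    web_search_enabled: bool = True
    force_web_search: bool = False
    similarity_threshold: float = 0.7  # illustrative default, not taken from the project


def filter_by_score(scored_chunks, threshold):
    """Keep the text of (text, score) pairs whose score meets the threshold.

    Assumes a higher score means a more similar chunk.
    """
    return [text for text, score in scored_chunks if score >= threshold]


def should_search_web(settings, relevant_chunks):
    """Decide whether to run a web search for the current query.

    Assumption: search runs only when enabled, and then either when it is
    forced or when no relevant chunks were retrieved.
    """
    if not settings.web_search_enabled:
        return False
    return settings.force_web_search or not relevant_chunks
```

Raising the threshold makes retrieval stricter, which in turn makes the web search fallback trigger more often.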

## Tech Stack

* **Web Framework:** Streamlit
* **Vector Database:** ChromaDB
* **LLM & Embeddings:** Gemini
* **Core Logic:** LangChain (for document processing and vector store integration), Agno (for agents)
* **Containerization:** Docker

## Prerequisites

* **Python:** Version 3.11 or higher recommended.
* **pip:** Python package installer.
* **Git:** For cloning the repository.
* **Docker:** Required only for the Docker setup (recommended for easy setup and persistence).
* **Google API Key:** You need an API key for Google Generative AI (e.g., Gemini API). You can obtain one from [Google AI Studio](https://aistudio.google.com/app/apikey).
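
As a convenience, the local prerequisites can be checked with a short script before starting. This is only a sketch and not part of the project: `missing_prerequisites` is a hypothetical helper, and it checks for Docker even though Docker is needed only for the containerized setup.

```python
import shutil
import sys


def missing_prerequisites(which=shutil.which, version=sys.version_info):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(version[:2]) < (3, 11):
        problems.append("Python 3.11 or higher is recommended")
    for tool in ("pip", "git", "docker"):
        # shutil.which returns None when the executable is not on PATH.
        if which(tool) is None:
            problems.append(f"{tool} was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("warning:", problem)
```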

## How to use
### Without Docker

1. **Clone the Repository:**
    ```bash
    git clone https://github.com/luanntd/RAG-System-with-Gemini.git
    cd RAG-System-with-Gemini
    ```

2. **Create a Virtual Environment (Recommended):**
    ```bash
    python -m venv venv
    # Activate it (Linux/macOS)
    source venv/bin/activate
    # Activate it (Windows)
    .\venv\Scripts\activate
    ```

3. **Install Dependencies:**
    ```bash
    pip install -r requirements.txt
    ```

4. **Create Directory for Vector Store:**
    ```bash
    mkdir chroma_db
    ```

5. **Set Up Environment Variables:**
    * Create a file named `.env` in the project's root directory.
    * Add the following variables:

    ```dotenv
    GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY
    COLLECTION_NAME=rag_system
    DB_PATH=chroma_db
    ```
    * Replace `YOUR_GOOGLE_API_KEY` with your actual Google API key.

6. **Run the Application:**
    ```bash
    streamlit run main.py
    ```
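
For reference, the three variables from step 5 could be read and validated on the Python side along these lines. This is a hedged sketch rather than the project's own loader: `AppConfig` and `load_config` are hypothetical names, the fallback values simply mirror the example `.env` above, and the code assumes the variables are already in the process environment (for example, loaded from `.env` by the application or passed in by Docker).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AppConfig:
    google_api_key: str
    collection_name: str
    db_path: str


def load_config(env: Optional[Mapping[str, str]] = None) -> AppConfig:
    """Build the config from environment variables, failing early if the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY", "").strip()
    # Reject both an empty value and the unreplaced placeholder from the example file.
    if not api_key or api_key == "YOUR_GOOGLE_API_KEY":
        raise ValueError("GOOGLE_API_KEY is not set; add it to your .env file")
    return AppConfig(
        google_api_key=api_key,
        # Assumed fallbacks, copied from the example .env values.
        collection_name=env.get("COLLECTION_NAME", "rag_system"),
        db_path=env.get("DB_PATH", "chroma_db"),
    )
```

Failing at startup with a clear message is usually easier to debug than an authentication error raised later by the first API call.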

### With Docker (Recommended)

Complete steps 1 and 5 of the previous section first.

1. **Build the Docker Image:**
    ```bash
    docker build -t rag-system .
    ```

2. **Run the Docker Container:**
    * Create the volume:
        ```bash
        docker volume create chroma_data
        ```
    * Run the container:
        ```bash
        docker run -d \
          -p 8501:8501 \
          --env-file ./.env \
          -v chroma_data:/chroma_db \
          --name rag-system-container \
          rag-system
        ```

    * **Explanation of `docker run` flags:**
        * `-d`: Run the container in detached mode (in the background).
        * `-p 8501:8501`: Map port 8501 on your host machine to port 8501 inside the container.
        * `--env-file ./.env`: Load environment variables from your local `.env` file into the container.
        * `-v chroma_data:/chroma_db`: Mount persistent storage by linking the named volume `chroma_data` to the `/chroma_db` directory *inside* the container. This path (`/chroma_db`) is where ChromaDB will store its data.
        * `--name rag-system-container`: Assign a name to your running container.
        * `rag-system`: The name of the Docker image you built.

3. **Access the Application:**
    * Open your web browser and navigate to `http://localhost:8501`.
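
Because the container starts in detached mode, the app may take a few seconds to become reachable. The sketch below polls until it responds; it is illustrative only. `is_healthy` and `wait_until_ready` are hypothetical helpers, and the `/_stcore/health` path is an assumption about the health endpoint exposed by recent Streamlit releases, so adjust it if your version differs.

```python
import time
import urllib.error
import urllib.request


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_until_ready(url, attempts=10, delay=1.0, probe=is_healthy, sleep=time.sleep):
    """Poll the URL until the probe succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
    return False


if __name__ == "__main__":
    # Assumed health endpoint; change the path if your Streamlit version uses another one.
    print(wait_until_ready("http://localhost:8501/_stcore/health"))
```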

## Demo
![image](https://github.com/user-attachments/assets/9ed9a7c0-46d7-4d40-a2b5-47ee6898f59e)