Adding Readme.md
README.md
CHANGED
@@ -9,3 +9,117 @@ short_description: This is a RAG Based ChatBOT by Mohammed Adil.
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# RAG Based Chatbot

The RAG-based AI chatbot streamlines information retrieval from lengthy PDF files for researchers, teachers, engineers, and anyone else who works with long documents. Using the **Pinecone Vector Database**, the **Gemini API**, and **Sarvam AI**, it generates relevant answers from internal documents. It also converts each response to speech using the Sarvam AI text-to-speech API. Built with Streamlit for the frontend and FastAPI for the backend, the chatbot uses Langchain to handle queries. The project aims to deliver fast and accurate insights for document-based research and decision-making.
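
To make the retrieve-then-generate idea concrete, here is a minimal, self-contained sketch. It is not the project's code: a word-count similarity stands in for the embedding model and the Pinecone index, the sample sentences are made up for illustration, and the prompt is only printed rather than sent to Gemini. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (stand-in for a vector search)."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question, context_chunks):
    """Combine the retrieved chunks and the question into one prompt for an LLM."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up sample chunks; a real pipeline would split the PDF into chunks instead.
chunks = [
    "Sound is produced by vibrating objects.",
    "Sound needs a medium such as air or water to travel.",
    "The chapter also describes how the human ear works.",
]
question = "How is sound produced?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real pipeline, the prompt would be passed to the language model, and the model's answer would then go to the text-to-speech step.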

The default data for this project is Chapter 11, "Sound", of the CBSE Class 9 Science textbook. To use your own document, open **"api.py"**, go to **line 164**, and replace **"./data/ncert_data.pdf"** with the path to your document. Rerun the script, and the program is ready to use.
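
As an alternative to editing the path in place, the document path could be read from an environment variable, with the bundled PDF as a fallback. The sketch below assumes a hypothetical `RAG_PDF_PATH` variable and a hypothetical helper name; neither is part of the project as described here.

```python
import os
from pathlib import Path

# Default path taken from the text above; RAG_PDF_PATH is a hypothetical variable name.
DEFAULT_PDF = "./data/ncert_data.pdf"


def resolve_pdf_path(env=None):
    """Return the PDF path from RAG_PDF_PATH, or the default if it is unset or empty."""
    env = os.environ if env is None else env
    path = Path(env.get("RAG_PDF_PATH") or DEFAULT_PDF)
    if path.suffix.lower() != ".pdf":
        raise ValueError(f"expected a .pdf file, got: {path}")
    return path


if __name__ == "__main__":
    print(resolve_pdf_path())
```

With this approach, switching documents would not require touching the source file, only setting the variable before starting the app.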

[Click to watch the YouTube Video](https://youtu.be/wphBupOCq28)

## 💻 Built with

Python is used as the main language to build this project.

### Python Libraries mainly used in the project:

* Streamlit [Check here](https://docs.streamlit.io/)
* Langchain [Check here](https://python.langchain.com/docs/introduction/)
* FastAPI [Check here](https://fastapi.tiangolo.com/learn/)

### APIs used in the project:

* Sarvam AI Text to Speech [Check here](https://docs.sarvam.ai/api-reference-docs/endpoints/text-to-speech)
* Google Gemini 1.5 Flash [Check here](https://ai.google.dev/gemini-api)
* Pinecone Vector Database [Check here](https://www.pinecone.io/)

### Version control tool and containerization technologies:

* Docker [Check here](https://www.docker.com/)
* GitHub [Check here](https://github.com/aadil080)

## 🧐 Features

Here are some of the project's best features:

* Built entirely with Python.
* Agent-based query handling.
* Whole project containerized using Docker.
* Support for long content files.
* Built with free and open-source resources.

## 🛠️ Installation Steps

### Basic Way

1. Clone the repo

```bash
git clone https://github.com/aadil080/Sarvam-ML-Assignment.git
```

2. Change the working directory and install the requirements

```bash
cd Sarvam-ML-Assignment
pip install -r requirements.txt
```

3. Create a ".env" file and add the environment variables

```plaintext
PINECONE_API_KEY = <your_pinecone_index_api_key>
PINECONE_INDEX_NAME = <your_pinecone_index_name>
GOOGLE_API_KEY = <your_google_gemini_1.5_flash_api_key>
SARVAM_API_KEY = <your_sarvam_ai_text_to_speech_api_key>
```

4. Execute the bash file

```bash
bash start.sh
```
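
A missing key is an easy mistake to make in the ".env" step, so it can help to check the four variables before starting the app. The sketch below is a hypothetical helper, not part of the project. It assumes the variables have already been loaded into the process environment by whatever mechanism the project uses.

```python
import os

# Variable names as listed in the ".env" step above.
REQUIRED_VARS = (
    "PINECONE_API_KEY",
    "PINECONE_INDEX_NAME",
    "GOOGLE_API_KEY",
    "SARVAM_API_KEY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not (env.get(name) or "").strip()]
```

Calling `missing_env_vars()` at startup returns an empty list when everything is set. Otherwise it returns the names to fix, which can be included in an error message.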

### Using Docker

1. Clone the repo

```bash
git clone https://github.com/aadil080/Sarvam-ML-Assignment.git
```

2. Create a ".env" file and add the environment variables

```plaintext
PINECONE_API_KEY = <your_pinecone_index_api_key>
PINECONE_INDEX_NAME = <your_pinecone_index_name>
GOOGLE_API_KEY = <your_google_gemini_1.5_flash_api_key>
SARVAM_API_KEY = <your_sarvam_ai_text_to_speech_api_key>
```

3. Build the Docker image

```bash
docker build -t <image_name> . # the period is the build context (the directory containing the Dockerfile)
```

4. Create a new container from the created image

```bash
docker run -p 8000:80 <image_name>
```

## Usage

After completing the steps above, open a browser on the same machine and go to:

```plaintext
http://localhost:8501
```
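
If the page does not load, a quick reachability check can tell whether anything is listening at that address. This is a minimal sketch using only the Python standard library. The helper name is hypothetical, and it assumes the app is expected at the address shown above.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP server answers at url, False if the connection fails."""
    # Skip any configured proxies, since the target is a local address.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, even though the status code was an error.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or host not found.
        return False
```

For example, `is_reachable("http://localhost:8501")` returning `False` suggests the app has not started or is listening on a different port.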

## 🛡️ License

This project is licensed under the Apache-2.0 license.