Streamlit Chatbot Application
This project is a simple and extensible chatbot application built using Streamlit. The chatbot answers questions using content retrieved from GitLab's Handbook and Direction pages, supports conversation clearing, and has a flexible codebase for future enhancements.
Features
- Interactive Chatbot Interface: Engages users through a conversational interface built with Streamlit.
- Conversation History: Stores the entire conversation history for reference.
- Error Handling: Ensures the chatbot gracefully handles errors and unexpected inputs.
- Modular and Extensible: Easily extendable to incorporate more advanced functionality in the future.
Prerequisites
Before setting up the project, make sure you have the following software installed on your system:
- Python: Version 3.8 or higher.
- Pip: Python's package manager.
- Virtual Environment (optional but recommended for isolation).
Installation
To get started with the chatbot, follow these steps:
1. Clone the Repository
Clone the project repository to your local machine:
git clone https://github.com/gupta-bhavesh/gitlab-rag-chatbot.git
cd gitlab-rag-chatbot
2. Set Up a Virtual Environment (Optional)
It's recommended to use a virtual environment to keep your project dependencies isolated. To create and activate a virtual environment, use the following commands:
python3 -m venv venv
source venv/bin/activate
# On Windows, use venv\Scripts\activate
3. Install Dependencies
Once the virtual environment is active, install the required dependencies using pip:
pip install -r requirements.txt
This will install all the necessary libraries listed in the requirements.txt file.
Data Setup
The chatbot relies on data gathered from GitLab's Handbook and Direction pages, which are scraped using a web crawler.
1. Configure URLs for Crawling
You can configure the URLs the crawler should start with in the gitlab_crawler/gitlab_crawler/spiders/handbook.py file. Add or update the start_urls list with the desired URLs:
start_urls = [
"https://handbook.gitlab.com/",
"https://about.gitlab.com/direction/"
]
2. Configure Max Depth for Crawling
Control the depth of the crawl by adjusting the DEPTH_LIMIT setting in the gitlab_crawler/gitlab_crawler/settings.py file. This parameter limits how many levels deep the spider will follow links.
DEPTH_LIMIT = 2 # Set the crawl depth limit
3. Run the Spider
To start the web crawler, run the following command in the terminal:
cd gitlab_crawler
scrapy crawl gitlab -o ../gitlab_data.json
This command will start the crawler and save the scraped data into a gitlab_data.json file. The output will include:
- title: Title of the page.
- url: URL of the page.
- content: HTML content of the page.
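As a rough sketch of that output format, a single record in gitlab_data.json looks like the following (the field values here are invented for illustration; only the key names come from the spider):

```python
import json

# One illustrative record as produced by the spider. The keys match the
# fields listed above; the values are made up for this example.
item = {
    "title": "GitLab Handbook",
    "url": "https://handbook.gitlab.com/",
    "content": "<html>...page markup...</html>",
}

# gitlab_data.json is a JSON array of such records.
print(json.dumps([item], indent=2))
```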
Environment Configuration
To run the application, you'll need to set up a few environment variables.
1. Create .env File
In the project's root directory, create a .env file and add the following environment variables:
GOOGLE_API_KEY=your_google_api_key
PINECONE_API_KEY=your_pinecone_api_key
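In the application code, these variables are typically loaded with python-dotenv. The snippet below is a sketch of that pattern, not the project's actual loading code; the require_env helper is a hypothetical convenience added here:

```python
import os

try:
    from dotenv import load_dotenv  # assumes the python-dotenv package is installed
    load_dotenv()  # reads KEY=value pairs from .env into os.environ
except ImportError:
    pass  # fall back to variables already exported in the shell

def require_env(name: str) -> str:
    """Return the named environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value
```

Failing fast with a clear message when a key is missing is easier to debug than a cryptic authentication error from the Google or Pinecone client later on.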
2. Embed and Store Data in Pinecone
Use the index_data.ipynb Jupyter notebook to process the scraped data, generate embeddings, and store them in Pinecone for efficient search and retrieval. Run the notebook cells sequentially.
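The notebook's indexing step can be sketched roughly as follows. Everything here is an assumption about the notebook's internals: the chunk size, the chunk_text helper, and the upsert_chunks function are illustrative, and the embedding function and Pinecone index object come from your own setup:

```python
from typing import Callable, List

def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> List[str]:
    """Split a document into overlapping character chunks for embedding."""
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text), 1), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
    return chunks

def upsert_chunks(index, embed: Callable, doc_id: str, text: str) -> int:
    """Embed each chunk and upsert it into a Pinecone-style index.

    `index` is expected to expose upsert(vectors=...) like the Pinecone
    client; `embed` maps a string to a vector. Returns the chunk count.
    """
    vectors = [
        (f"{doc_id}-{i}", embed(chunk), {"text": chunk})
        for i, chunk in enumerate(chunk_text(text))
    ]
    index.upsert(vectors=vectors)
    return len(vectors)
```

Overlapping chunks help a retrieved passage keep enough surrounding context to be useful, at the cost of some duplicated storage.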
Running the Application
Once everything is set up, you can run the chatbot application locally by following these steps:
1. Open your terminal and navigate to the project directory.
2. Start the Streamlit application with the following command:
streamlit run chatbot.py
This will launch the Streamlit application in your browser, and you can start interacting with the chatbot.
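A minimal chatbot.py along these lines might look like the sketch below. This is not the repository's actual file: the retrieval-and-generation step is stubbed out with a placeholder reply, and build_history is a hypothetical helper. In the real app, the body of main() would run at module top level so `streamlit run chatbot.py` executes it directly:

```python
def build_history(history, role, content):
    """Return a new history list with one chat turn appended (pure helper)."""
    return history + [{"role": role, "content": content}]

def main():
    # Lazy import so build_history can be reused without Streamlit installed.
    import streamlit as st

    st.title("GitLab RAG Chatbot")

    if "messages" not in st.session_state:
        st.session_state.messages = []

    # Sidebar button implements the conversation-clearing feature.
    if st.sidebar.button("Clear conversation"):
        st.session_state.messages = []

    # Replay the stored conversation history.
    for msg in st.session_state.messages:
        with st.chat_message(msg["role"]):
            st.write(msg["content"])

    if prompt := st.chat_input("Ask about GitLab's Handbook or Direction"):
        st.session_state.messages = build_history(
            st.session_state.messages, "user", prompt
        )
        # Placeholder: the real app retrieves context from Pinecone and
        # generates an answer with the Google API instead of this stub.
        reply = f"You asked: {prompt}"
        st.session_state.messages = build_history(
            st.session_state.messages, "assistant", reply
        )
```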