
Streamlit Chatbot Application

This project is a simple, extensible chatbot application built with Streamlit. The chatbot currently echoes user input and supports clearing the conversation, while the surrounding tooling (a Scrapy crawler and Pinecone indexing) lays the groundwork for retrieval-augmented answers over GitLab's Handbook and Direction pages.


Features

  • Interactive Chatbot Interface: Engages users through a conversational interface built with Streamlit.
  • Conversation History: Stores the entire conversation history for reference.
  • Error Handling: Ensures the chatbot gracefully handles errors and unexpected inputs.
  • Modular and Extensible: Easily extendable to incorporate more advanced functionality in the future.
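The echo-and-clear behaviour described above can be sketched independently of the Streamlit UI. The function and message names below are illustrative, not necessarily those used in chatbot.py:

```python
# Minimal sketch of the chatbot's conversation state.
# Names here are illustrative, not necessarily those in chatbot.py.

def make_history():
    """Start a fresh conversation."""
    return []

def send(history, user_input):
    """Record the user's message and the bot's reply (currently an echo)."""
    if not user_input or not user_input.strip():
        raise ValueError("empty input")  # basic error handling
    history.append({"role": "user", "content": user_input})
    reply = f"You said: {user_input}"    # echo for now; swap in RAG later
    history.append({"role": "assistant", "content": reply})
    return reply

def clear(history):
    """Reset the conversation, as the UI's clear action would."""
    history.clear()

history = make_history()
print(send(history, "hello"))  # → You said: hello
```

Keeping the state logic separate from the UI like this makes it easy to later replace the echo with a retrieval-augmented response without touching the Streamlit widgets.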

Prerequisites

Before setting up the project, make sure you have the following software installed on your system:

  • Python: Version 3.8 or higher.
  • Pip: Python's package manager.
  • Virtual Environment (optional but recommended for isolation).

Installation

To get started with the chatbot, follow these steps:

1. Clone the Repository

Clone the project repository to your local machine:

git clone https://github.com/gupta-bhavesh/gitlab-rag-chatbot.git
cd gitlab-rag-chatbot

2. Set Up a Virtual Environment (Optional)

It's recommended to use a virtual environment to keep your project dependencies isolated. To create and activate a virtual environment, use the following commands:

python3 -m venv venv
source venv/bin/activate
# On Windows, use `venv\Scripts\activate`

3. Install Dependencies

Once the virtual environment is active, install the required dependencies using pip:

pip install -r requirements.txt

This will install all the necessary libraries listed in the requirements.txt file.


Data Setup

The chatbot relies on data gathered from GitLab's Handbook and Direction pages, which are scraped using a web crawler.

1. Configure URL for Crawling

You can configure the URLs the crawler should start with in the gitlab_crawler/gitlab_crawler/spiders/handbook.py file. Add or update the start_urls list with the desired URLs:

start_urls = [
    "https://handbook.gitlab.com/",
    "https://about.gitlab.com/direction/",
]
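Scrapy spiders typically also restrict the crawl with an allowed_domains attribute so the spider doesn't wander off-site. If you add URLs on new domains, the sketch below shows one way to derive the matching domain list from start_urls (illustrative, not necessarily the project's own code):

```python
from urllib.parse import urlparse

# Same start URLs as configured in handbook.py.
start_urls = [
    "https://handbook.gitlab.com/",
    "https://about.gitlab.com/direction/",
]

# Hostnames the spider should stay within; Scrapy's OffsiteMiddleware
# drops requests to domains outside allowed_domains.
allowed_domains = sorted({urlparse(url).netloc for url in start_urls})
print(allowed_domains)  # → ['about.gitlab.com', 'handbook.gitlab.com']
```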

2. Configure Max Depth for Crawling

Control the depth of the crawl by adjusting the DEPTH_LIMIT setting in the gitlab_crawler/gitlab_crawler/settings.py file. This parameter limits how many levels deep the spider will follow links.

DEPTH_LIMIT = 2  # Set the crawl depth limit

3. Run the Spider

To start the web crawler, run the following command in the terminal:

cd gitlab_crawler
scrapy crawl gitlab -o ../gitlab_data.json

This command runs the crawler and writes the scraped items to gitlab_data.json (the -o flag sets the output file). Each item includes:

  • title: Title of the page.
  • url: URL of the page.
  • content: HTML content of the page.
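Since the content field holds raw HTML, a typical next step is converting it to plain text before indexing. A stdlib-only sketch (the sample item below is made up; only the field names match the crawler output described above):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text nodes of an HTML document, dropping all tags."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        # Join fragments and collapse runs of whitespace.
        return " ".join(" ".join(self.parts).split())

# A made-up item in the shape the spider writes to gitlab_data.json.
item = {
    "title": "Example page",
    "url": "https://handbook.gitlab.com/example/",
    "content": "<h1>Example page</h1><p>Some handbook text.</p>",
}

parser = TextExtractor()
parser.feed(item["content"])
print(parser.text())  # → Example page Some handbook text.
```

A real pipeline would likely use a proper HTML-to-text library, but this illustrates the transformation the indexing step needs.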

Environment Configuration

To run the application, you'll need to set up a few environment variables.

1. Create .env File

In the project's root directory, create a .env file with the following environment variables (keep this file out of version control, since it contains secrets):

GOOGLE_API_KEY=your_google_api_key
PINECONE_API_KEY=your_pinecone_api_key

2. Embed and Store Data in Pinecone

Use the index_data.ipynb Jupyter notebook to process the scraped data, generate embeddings, and store them in a Pinecone index for efficient search and retrieval. Run the notebook cells sequentially.
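The notebook's exact steps aren't reproduced here, but embedding pipelines commonly split each page into overlapping chunks before upserting vectors, so that retrieved passages fit in a prompt. A minimal, generic chunker (the chunk size and overlap are arbitrary illustrative choices, not values from index_data.ipynb):

```python
def chunk_text(text, size=500, overlap=50):
    """Split text into chunks of at most `size` characters, with `overlap`
    characters of shared context between consecutive chunks."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping some overlap
    return chunks

chunks = chunk_text("a" * 1200, size=500, overlap=50)
print(len(chunks))  # → 3
```

Each chunk would then be embedded and upserted into Pinecone with metadata such as the source url and title, so the chatbot can cite where an answer came from.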


Running the Application

Once everything is set up, you can run the chatbot application locally by following these steps:

  1. Open your terminal and navigate to the project directory.

  2. Start the Streamlit application with the following command:

    streamlit run chatbot.py
    

This will launch the Streamlit application in your browser, and you can start interacting with the chatbot.