# Abalone RAG Chatbot

This project implements a Retrieval-Augmented Generation (RAG) chatbot about Abalone using LangChain + OpenAI with a Streamlit frontend. It's designed to be deployed on Hugging Face Spaces.

## Contents

- `app.py` - Streamlit app entrypoint
- `src/ingest.py` - Ingest files from `data/` into a persisted Chroma vectorstore
- `src/vectorstore.py` - Helpers to build/load the Chroma vectorstore and return a retriever
- `src/qa_chain.py` - Build the conversational retrieval QA chain
- `data/` - Put Abalone source files here (CSV/MD/TXT/PDF)
- `vectorstore/` - Persisted vectorstore directory (created by ingestion)

## Quickstart (local)

1. Create a venv and install dependencies:

   ```bash
   python -m venv .venv
   source .venv/bin/activate
   pip install -r requirements.txt
   ```

2. Set your OpenAI API key:

   ```bash
   export OPENAI_API_KEY="sk-..."
   ```

3. Add Abalone files into `data/` (for example `abalone.csv`).

4. Build the vectorstore:

   ```bash
   python -m src.ingest --data-dir ./data --persist-dir ./vectorstore
   ```

5. Run the Streamlit app:

   ```bash
   streamlit run app.py
   ```

## Deploying to Hugging Face Spaces

- Add `OPENAI_API_KEY` in the Spaces secrets (Settings -> Secrets).
- Push this repository to your HF Space. HF will install `requirements.txt` and run the Streamlit app.
- On first run, click the "Ingest data" button or allow the app to rebuild the index.

## Security

- Do NOT commit your OpenAI API key. Use HF Spaces Secrets for deployment.

## License

- MIT
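
## Example: ingestion CLI sketch

The Quickstart runs `python -m src.ingest --data-dir ./data --persist-dir ./vectorstore`. As a rough illustration of what the front half of such a command might look like, the sketch below parses those two flags and collects the file types listed for `data/` (CSV/MD/TXT/PDF). It is not the project's actual `src/ingest.py`: the function names `parse_args`, `find_source_files`, and `main` are hypothetical, the default paths are assumed from the Quickstart command, and the loading, chunking, and embedding into Chroma are deliberately left out.

```python
import argparse
from pathlib import Path

# File types the README lists for data/ (CSV/MD/TXT/PDF).
SUPPORTED_SUFFIXES = {".csv", ".md", ".txt", ".pdf"}


def parse_args(argv=None):
    """Parse the two flags used by the Quickstart ingestion command."""
    parser = argparse.ArgumentParser(
        description="Illustrative ingestion CLI (not the project's real script)."
    )
    # Defaults are assumptions taken from the Quickstart example.
    parser.add_argument("--data-dir", type=Path, default=Path("./data"))
    parser.add_argument("--persist-dir", type=Path, default=Path("./vectorstore"))
    return parser.parse_args(argv)


def find_source_files(data_dir):
    """Return supported files under data_dir, sorted for a stable order."""
    return sorted(
        path
        for path in Path(data_dir).rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES
    )


def main(argv=None):
    args = parse_args(argv)
    files = find_source_files(args.data_dir)
    if not files:
        raise SystemExit(f"No supported files found in {args.data_dir}")
    # Hypothetical placeholder: a real ingest step would load, chunk, and
    # embed these files into the vectorstore here. This sketch only creates
    # the target directory and lists the inputs it would process.
    args.persist_dir.mkdir(parents=True, exist_ok=True)
    for path in files:
        print(path)
    return files
```

Calling `main(["--data-dir", "./data", "--persist-dir", "./vectorstore"])` prints the files that would be ingested and exits with an error message if `data/` holds nothing usable, which can help catch an empty data directory before any embedding calls are made.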