# Abalone RAG Chatbot

This project implements a Retrieval-Augmented Generation (RAG) chatbot about Abalone using LangChain + OpenAI with a Streamlit frontend. It's designed to be deployed on Hugging Face Spaces.
## Contents

- `app.py` - Streamlit app entrypoint
- `src/ingest.py` - Ingests files from `data/` into a persisted Chroma vectorstore
- `src/vectorstore.py` - Helpers to build/load the Chroma vectorstore and return a retriever
- `src/qa_chain.py` - Builds the conversational retrieval QA chain
- `data/` - Put Abalone source files here (CSV/MD/TXT/PDF)
- `vectorstore/` - Persisted vectorstore directory (created by ingestion)
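Before being embedded, source files are typically split into overlapping chunks. The actual splitter in `src/ingest.py` is not shown here; the following is a minimal sketch of the idea, and `chunk_text` is a hypothetical helper, not part of the repo:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks, similar to what a text splitter
    does before documents are embedded into the vectorstore."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars of context
    return chunks
```

The overlap keeps sentence context shared between adjacent chunks, which tends to improve retrieval quality at chunk boundaries.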
## Quickstart (local)

1. Create a virtual environment and install dependencies:

   ```bash
   python -m venv .venv
   source .venv/bin/activate
   pip install -r requirements.txt
   ```
2. Set your OpenAI API key:

   ```bash
   export OPENAI_API_KEY="sk-..."
   ```
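   The app needs this variable at runtime. A small guard like the following (a sketch; `require_openai_key` is a hypothetical helper, not part of the repo) can fail fast with a clear message instead of a deep stack trace from inside LangChain:

   ```python
   import os

   def require_openai_key() -> str:
       """Return the OpenAI API key from the environment, or raise a clear error."""
       key = os.environ.get("OPENAI_API_KEY", "")
       if not key.strip():
           raise RuntimeError(
               "OPENAI_API_KEY is not set. Export it (or add it to the "
               "HF Spaces secrets) before running the app."
           )
       return key
   ```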
3. Add Abalone source files to `data/` (for example `abalone.csv`).
4. Build the vectorstore:

   ```bash
   python -m src.ingest --data-dir ./data --persist-dir ./vectorstore
   ```
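   The two flags suggest `src/ingest.py` parses its arguments roughly like this sketch (the actual module may differ; the defaults shown are assumptions):

   ```python
   import argparse

   def parse_args(argv=None):
       """Parse the CLI flags used by the ingest command shown above."""
       parser = argparse.ArgumentParser(
           description="Ingest files into a persisted Chroma vectorstore"
       )
       parser.add_argument("--data-dir", default="./data",
                           help="Directory containing the source files")
       parser.add_argument("--persist-dir", default="./vectorstore",
                           help="Directory where the index is persisted")
       return parser.parse_args(argv)
   ```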
5. Run the Streamlit app:

   ```bash
   streamlit run app.py
   ```
## Deploying to Hugging Face Spaces

- Add `OPENAI_API_KEY` to the Space's secrets (Settings -> Secrets).
- Push this repository to your HF Space. HF will install `requirements.txt` and run the Streamlit app.
- On first run, click the "Ingest data" button or allow the app to rebuild the index.
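On a fresh Space the persisted `vectorstore/` directory does not exist yet. A check along these lines (a hypothetical helper, not part of the repo) is one way the app can decide whether to offer a rebuild:

```python
from pathlib import Path

def index_exists(persist_dir: str = "./vectorstore") -> bool:
    """Return True if a persisted vectorstore appears to be present
    (the directory exists and is non-empty)."""
    path = Path(persist_dir)
    return path.is_dir() and any(path.iterdir())
```

If this returns `False`, the app can surface the "Ingest data" button instead of attempting to load a missing index.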
## Security

- Do NOT commit your OpenAI API key. Use HF Spaces Secrets for deployment.
## License

- MIT