# Abalone RAG Chatbot

This project implements a Retrieval-Augmented Generation (RAG) chatbot that answers questions about Abalone, built with LangChain and OpenAI behind a Streamlit frontend. It is designed for deployment on Hugging Face Spaces.

## Contents

- `app.py` - Streamlit app entrypoint
- `src/ingest.py` - Ingest files from `data/` into a persisted Chroma vectorstore
- `src/vectorstore.py` - Helpers to build/load the Chroma vectorstore and return a retriever
- `src/qa_chain.py` - Build the conversational retrieval QA chain
- `data/` - Put Abalone source files here (CSV/MD/TXT/PDF)
- `vectorstore/` - Persisted vectorstore directory (created by ingestion)
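
The files above follow the usual RAG flow: ingest documents, retrieve the chunks most relevant to a question, then pass them to the model as context. The sketch below illustrates that flow with the standard library only. It is not the project's code: the real app uses Chroma and LangChain, and the function names here (`tokenize`, `cosine`, `retrieve`, `build_prompt`) and the bag-of-words scoring are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question.

    Stand-in for the retriever; a real vectorstore would compare
    embeddings rather than raw word counts.
    """
    q = Counter(tokenize(question))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q, Counter(tokenize(c))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Assemble the text that would be sent to the language model."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In the project itself, the retrieval step corresponds to the retriever returned by `src/vectorstore.py`, and the prompt assembly plus model call to the chain built in `src/qa_chain.py`.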

## Quickstart (local)

1. Create a venv and install dependencies:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

2. Set your OpenAI API key:

```bash
export OPENAI_API_KEY="sk-..."
```

3. Add Abalone files into `data/` (for example `abalone.csv`).

4. Build the vectorstore:

```bash
python -m src.ingest --data-dir ./data --persist-dir ./vectorstore
```

5. Run the Streamlit app:

```bash
streamlit run app.py
```
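
Before running steps 4 and 5, it can help to confirm that the prerequisites from steps 2 and 3 are in place. The following is a minimal sketch of such a check, not part of the repository: `find_source_files` and `preflight` are hypothetical helpers, and searching `data/` recursively is an assumption, since this README does not say how `src/ingest.py` walks the directory.

```python
import os
from pathlib import Path

# Extensions taken from the file types this README lists for data/.
SUPPORTED_SUFFIXES = {".csv", ".md", ".txt", ".pdf"}


def find_source_files(data_dir):
    """Return supported files under data_dir, sorted for stable output."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("*")  # recursive search is an assumption
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def preflight(data_dir="./data", env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("OPENAI_API_KEY", "").strip():
        problems.append("OPENAI_API_KEY is not set")
    if not find_source_files(data_dir):
        problems.append(f"no CSV/MD/TXT/PDF files found in {data_dir}")
    return problems
```

Calling `preflight()` from the project root returns an empty list when both the key and at least one source file are present; otherwise each string names what is missing.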

## Deploying to Hugging Face Spaces

- Add `OPENAI_API_KEY` in the Spaces secrets (Settings -> Secrets).
- Push this repository to your HF Space. HF will install the dependencies listed in `requirements.txt` and run the Streamlit app.
- On first run, click the "Ingest data" button or allow the app to rebuild the index.
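
One way an app could decide whether the index needs to be built on first run is to check whether the persist directory exists and contains anything. The sketch below is an assumption about how such a check might look, not the logic in `app.py`; `index_needs_build` is a hypothetical name, and the check deliberately makes no claims about which files Chroma writes.

```python
from pathlib import Path


def index_needs_build(persist_dir="./vectorstore"):
    """Return True when the persist directory is missing or empty."""
    path = Path(persist_dir)
    if not path.is_dir():
        return True
    # any() stops at the first entry, so large directories stay cheap.
    return not any(path.iterdir())
```

A Streamlit app could call this at startup and, when it returns `True`, either prompt the user to ingest data or trigger ingestion automatically.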

## Security

- Do NOT commit your OpenAI API key. Use HF Spaces Secrets for deployment.

## License

- MIT