---
title: AB Testing RAG Agent
emoji: ๐
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.32.0
app_file: streamlit_app.py
pinned: false
---
# AB Testing RAG Agent

This repository contains a Retrieval-Augmented Generation (RAG) agent specialized in A/B testing that:

1. Answers questions about A/B testing using a collection of Ron Kohavi's work
2. Automatically searches ArXiv for academic papers when needed for better responses
3. Preserves privacy by pre-processing PDFs locally and only deploying processed data

## Features

- Interactive chat interface built with Streamlit
- Vector search using Qdrant with OpenAI embeddings
- Two-tier approach:
  - Initial RAG search for efficiency
  - Advanced agent with tools for complex questions
- Smart source handling and deduplication
- ArXiv integration
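The source handling mentioned above can be sketched as a simple deduplication pass. A minimal illustration — the `dedupe_sources` function and the dict shape are assumptions for this sketch, not the app's actual data model:

```python
def dedupe_sources(sources):
    """Drop duplicate sources, keeping the first occurrence per (title, url)."""
    seen = set()
    unique = []
    for src in sources:
        # Normalize the title so case differences don't produce duplicates
        key = (src.get("title", "").strip().lower(), src.get("url", ""))
        if key not in seen:
            seen.add(key)
            unique.append(src)
    return unique

docs = [
    {"title": "Trustworthy Online Controlled Experiments", "url": "a"},
    {"title": "trustworthy online controlled experiments", "url": "a"},
    {"title": "Seven Rules of Thumb", "url": "b"},
]
print(len(dedupe_sources(docs)))  # 2
```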
## Quick Start

### Local Development

1. Clone this repository:

   ```bash
   git clone https://github.com/yourusername/AB_Testing_RAG_Agent.git
   cd AB_Testing_RAG_Agent
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Create a `.env` file with your OpenAI API key:

   ```
   OPENAI_API_KEY=your_openai_api_key_here
   ```

4. Process your PDF files (only needed once):

   ```bash
   python scripts/preprocess_data.py
   ```

5. Run the Streamlit app:

   ```bash
   streamlit run streamlit_app.py
   ```
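The app presumably reads the key from the environment at startup. A standard-library-only sketch — the function name `get_openai_key` is illustrative, and the project may load `.env` via `python-dotenv` instead:

```python
import os

def get_openai_key():
    """Read the OpenAI key from the environment and fail fast if it is missing."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or the shell environment")
    return key
```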
### Docker Deployment

1. Build the Docker image:

   ```bash
   docker build -t ab-testing-rag-agent .
   ```

2. Run the container:

   ```bash
   docker run -p 8000:8000 -e OPENAI_API_KEY=your_openai_api_key_here ab-testing-rag-agent
   ```
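The image above can be built from a Dockerfile along these lines. This is a sketch assuming a slim Python base image and Streamlit serving on port 8000; the repository's actual Dockerfile may differ:

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["streamlit", "run", "streamlit_app.py", "--server.port=8000", "--server.address=0.0.0.0"]
```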
## Deployment to Hugging Face

1. Prepare for deployment (checks that all required files are ready):

   ```bash
   python scripts/prepare_for_deployment.py
   ```

2. Push to your Hugging Face Space:

   ```bash
   # Initialize a git repository if you haven't already
   git init
   git add .
   git commit -m "Initial commit"

   # Add the Hugging Face Space remote
   git remote add hf https://huggingface.co/spaces/yourusername/ab-testing-rag

   # Push to Hugging Face
   git push hf main
   ```

3. Set both required environment variables in the Hugging Face Space settings:
   - `OPENAI_API_KEY`: your OpenAI API key
   - `HF_TOKEN`: your Hugging Face token with read access to the dataset
### Setting Up the PDF Dataset on Hugging Face

The deployment uses PDFs stored in a separate Hugging Face dataset repository. To set up your own:

1. Create a dataset repository on Hugging Face called `yourusername/ab_testing_pdfs`
2. Upload all your PDF files to this repository via the Hugging Face UI or git:

   ```bash
   git clone https://huggingface.co/datasets/yourusername/ab_testing_pdfs
   cd ab_testing_pdfs
   cp /path/to/your/pdfs/*.pdf .
   git add .
   git commit -m "Add AB Testing PDFs"
   git push
   ```

3. Update the dataset name in `download_pdfs.py` if you used a different repository name
4. Make sure your `HF_TOKEN` has read access to this dataset repository
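The download step presumably pulls the dataset snapshot with `huggingface_hub`. A hedged sketch of what `download_pdfs.py` might do — the repo id, target directory, and helper names are assumptions:

```python
import os

def pdf_files(filenames):
    """Keep only PDF filenames from a repository file listing."""
    return [f for f in filenames if f.lower().endswith(".pdf")]

def download_dataset(repo_id="yourusername/ab_testing_pdfs", local_dir="data"):
    """Fetch the dataset snapshot; requires `huggingface_hub` and a valid HF_TOKEN."""
    from huggingface_hub import snapshot_download  # imported lazily; makes a network call
    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        local_dir=local_dir,
        token=os.environ.get("HF_TOKEN"),
    )
```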
## Architecture

- **Pre-processing Pipeline**: PDF files are processed locally, converted to embeddings, and stored in a vector database
- **Retrieval System**: uses OpenAI's `text-embedding-3-small` model and Qdrant for vector search
- **Response Generation**:
  - initial attempt with `gpt-4.1-mini` for efficiency
  - falls back to `gpt-4.1` with tools for complex queries
- **ArXiv Integration**: searches academic papers when necessary
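The two-tier response flow above can be sketched as a simple escalation rule. The threshold, the self-grading score, and the function names are illustrative assumptions, not the app's actual logic:

```python
def should_escalate(answer: str, quality_score: float, threshold: float = 0.7) -> bool:
    """Escalate to the tool-using agent when the cheap answer looks weak."""
    if not answer.strip():
        return True
    return quality_score < threshold

def answer_question(question, cheap_model, agent):
    """Try the cheap model first; hand off to the agent only when needed."""
    draft, score = cheap_model(question)   # e.g. gpt-4.1-mini plus a quality self-grade
    if should_escalate(draft, score):
        return agent(question)             # e.g. gpt-4.1 with RAG and ArXiv tools
    return draft
```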
## Adding Your Own PDFs

1. Add PDF files to the `data/` directory
2. Run the preprocessing script:

   ```bash
   python scripts/preprocess_data.py
   ```
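Preprocessing pipelines like this typically split extracted PDF text into overlapping chunks before embedding. A stdlib-only sketch — the chunk size and overlap are assumed values, not the script's actual settings:

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 200):
    """Split text into overlapping character chunks ready for embedding."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, re-covering `overlap` characters
    return chunks
```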
## Implementation Notes

- Uses the `text-embedding-3-small` model for embeddings
- Uses `gpt-4.1-mini` for initial responses
- Uses `gpt-4.1` for agent tools and quality evaluation
- Stores preprocessed data in the `processed_data/` directory