---
title: AB Testing RAG Agent
emoji: 📊
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.32.0
app_file: streamlit_app.py
pinned: false
---

# AB Testing RAG Agent

This repository contains a Retrieval-Augmented Generation (RAG) agent specialized in A/B testing that:

1. Answers questions about A/B testing using a collection of Ron Kohavi's work
2. Automatically searches ArXiv for academic papers when needed for better responses
3. Preserves privacy by pre-processing PDFs locally and deploying only the processed data

## Features

- Interactive chat interface built with Streamlit
- Vector search using Qdrant with OpenAI embeddings
- Two-tier approach:
  - Initial RAG search for efficiency
  - Advanced agent with tools for complex questions
- Smart source handling and deduplication
- ArXiv integration

## Quick Start

### Local Development

1. Clone this repository:

   ```bash
   git clone https://github.com/yourusername/AB_Testing_RAG_Agent.git
   cd AB_Testing_RAG_Agent
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Create a `.env` file with your OpenAI API key:

   ```
   OPENAI_API_KEY=your_openai_api_key_here
   ```

4. Process your PDF files (only needed once):

   ```bash
   python scripts/preprocess_data.py
   ```

5. Run the Streamlit app:

   ```bash
   streamlit run streamlit_app.py
   ```

### Docker Deployment

1. Build the Docker image:

   ```bash
   docker build -t ab-testing-rag-agent .
   ```

2. Run the container:

   ```bash
   docker run -p 8000:8000 -e OPENAI_API_KEY=your_openai_api_key_here ab-testing-rag-agent
   ```

## Deployment to Hugging Face

1. Prepare for deployment (checks that all required files are ready):

   ```bash
   python scripts/prepare_for_deployment.py
   ```

2. Push to your Hugging Face Space:

   ```bash
   # Initialize the git repository if not already done
   git init
   git add .
   git commit -m "Initial commit"

   # Add the Hugging Face Space remote
   git remote add hf https://huggingface.co/spaces/yourusername/ab-testing-rag

   # Push to Hugging Face
   git push hf main
   ```

3. Set both required environment variables in the Hugging Face Space settings:
   - `OPENAI_API_KEY`: your OpenAI API key
   - `HF_TOKEN`: your Hugging Face token with read access to the dataset

### Setting Up the PDF Dataset on Hugging Face

The deployment uses PDFs stored in a separate Hugging Face dataset repository. To set up your own:

1. Create a dataset repository on Hugging Face named `yourusername/ab_testing_pdfs`
2. Upload your PDF files to this repository via the Hugging Face UI or git:

   ```bash
   git clone https://huggingface.co/datasets/yourusername/ab_testing_pdfs
   cd ab_testing_pdfs
   cp /path/to/your/pdfs/*.pdf .
   git add .
   git commit -m "Add AB Testing PDFs"
   git push
   ```

3. Update the dataset name in `download_pdfs.py` if you used a different repository name
4. Make sure your `HF_TOKEN` has read access to this dataset repository

## Architecture

- **Pre-processing pipeline**: PDF files are processed locally, converted to embeddings, and stored in a vector database
- **Retrieval system**: uses OpenAI's text-embedding-3-small model and Qdrant for vector search
- **Response generation**:
  - Initial attempt with gpt-4.1-mini for efficiency
  - Falls back to gpt-4.1 with tools for complex queries
- **ArXiv integration**: searches academic papers when necessary

## Adding Your Own PDFs

1. Add PDF files to the `data/` directory
2. Run the preprocessing script:

   ```bash
   python scripts/preprocess_data.py
   ```

## Implementation Notes

- Uses the text-embedding-3-small model for embeddings
- Uses gpt-4.1-mini for initial responses
- Uses gpt-4.1 for agent tools and quality evaluation
- Stores preprocessed data in the `processed_data/` directory
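
The "smart source handling and deduplication" feature can be sketched roughly as follows. This is an illustrative sketch, not the repository's actual code: `dedupe_sources` and the shape of the hit dictionaries are assumed names, standing in for whatever the app does with Qdrant search results before building its answer.

```python
def dedupe_sources(hits):
    """Collapse retrieved chunks so each source document appears only once,
    keeping the highest-scoring chunk per source.

    `hits` is assumed to be a list of dicts with "source", "score", and
    "text" keys (an illustrative shape, not the repo's real schema).
    """
    best = {}
    for hit in hits:
        source = hit["source"]
        # Keep only the strongest match for each source document.
        if source not in best or hit["score"] > best[source]["score"]:
            best[source] = hit
    # Return the surviving chunks ordered by descending relevance.
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)


# Hypothetical retrieval results: two chunks from the same Kohavi PDF
# plus one from an ArXiv paper.
hits = [
    {"source": "kohavi_trustworthy.pdf", "score": 0.82, "text": "..."},
    {"source": "kohavi_trustworthy.pdf", "score": 0.74, "text": "..."},
    {"source": "arxiv_2106.pdf", "score": 0.79, "text": "..."},
]
print([h["source"] for h in dedupe_sources(hits)])
# ['kohavi_trustworthy.pdf', 'arxiv_2106.pdf']
```

Collapsing duplicates before prompting keeps the context window focused: the model sees each source once, at its most relevant passage, and the cited source list stays clean.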